PDFsharp - moved to http://forum.pdfsharp.net/

antesima · Joined: 02 Jul 2008 Posts: 5

Hello,

thanks for this excellent library, I use it almost every day.

Now I'm having problems retrieving raw text from an existing PDF file.

I use the following code, and the resulting strings are then parsed to find some info.

antesima · Joined: 02 Jul 2008 Posts: 5

Does anybody have a clue?

Do you need sample code that generates the PDF?
(It is generated with PDFsharp using the Times New Roman font.)

Thomas Hoevel · Joined: 16 Oct 2006 Posts: 387 Location: Cologne, Germany

Between the brackets you see Unicode characters in hex format.
You can convert them to Unicode strings using .NET (take 4 chars, convert to int, convert to char, add to string).

Since the high byte is always 00 (in the samples shown), these are ordinary ANSI chars.
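
The conversion described above can be sketched roughly as follows. This is only a minimal sketch, written in Python rather than .NET, and it assumes the hex digits have already been extracted from between the brackets. The function names and the sample string are made up for illustration. In .NET the same steps would be `Convert.ToInt32(chunk, 16)` followed by a cast to `char`.

```python
def decode_hex_units(hex_string: str) -> str:
    """Decode a string of 4-digit hex groups into text, one character per group."""
    # Whitespace may appear inside a PDF hex string; drop it first.
    digits = "".join(hex_string.split())
    if len(digits) % 4 != 0:
        raise ValueError("expected a multiple of 4 hex digits")
    chars = []
    for i in range(0, len(digits), 4):
        code = int(digits[i:i + 4], 16)  # take 4 chars, convert to int
        chars.append(chr(code))          # convert to char, add to string
    return "".join(chars)


def is_single_byte(hex_string: str) -> bool:
    """Return True when every 4-digit group has a high byte of 00."""
    digits = "".join(hex_string.split())
    return all(digits[i:i + 2] == "00" for i in range(0, len(digits), 4))


# Made-up sample, not taken from the PDF discussed in this thread.
sample = "00480065006C006C006F"
print(decode_hex_units(sample))  # Hello
print(is_single_byte(sample))    # True
```

Each group is treated independently, so the sketch does not combine surrogate pairs, and it only gives meaningful text if the values really are Unicode character codes.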

I may be wrong: maybe these are not Unicode chars, but indices into the font subset.

It should also be possible to create ANSI PDF files with MigraDoc (it's a parameter of PdfDocumentRenderer).
OTOH for compatibility of your application with unknown PDF files you should support Unicode, too.
_________________
Regards
Thomas Hoevel
PDFsharp Team

antesima · Joined: 02 Jul 2008 Posts: 5

OK, thank you. I will give it a try and report back.

Regards,
Antesima

antesima · Joined: 02 Jul 2008 Posts: 5

It doesn't seem to fit...

Here is the string I am trying to convert:

"00280057005800470048"

and the code I use:

Thomas Hoevel · Joined: 16 Oct 2006 Posts: 387 Location: Cologne, Germany

So it seems these are indices into the font subsets, not Unicode character codes (that would be too simple); don't blame me, I warned you about it.

So you have to add another level of indirection by looking into the font table. That's not my area of expertise so I can't give you any clue.
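
For what it's worth, the extra level of indirection can at least be pictured as a table lookup. The Python sketch below assumes you already have a table that maps glyph indices to characters, for example one built from the font's /ToUnicode CMap or by inverting the cmap table of the embedded font; building that table is not shown. The table used here is only a stand-in, based on the assumption that the font puts the space at glyph index 3 with the remaining printable ASCII characters following in order (a common TrueType layout, but not guaranteed). All function names are hypothetical.

```python
def hex_to_glyph_ids(hex_string: str) -> list[int]:
    """Split a hex string into 4-digit groups and return them as integers."""
    digits = "".join(hex_string.split())
    if len(digits) % 4 != 0:
        raise ValueError("expected a multiple of 4 hex digits")
    return [int(digits[i:i + 4], 16) for i in range(0, len(digits), 4)]


def build_assumed_table() -> dict[int, str]:
    """Stand-in glyph-index-to-character table.

    ASSUMPTION: glyph 3 is the space and printable ASCII follows in order,
    so glyph index = character code - 29. A real table must come from the
    font data stored in the PDF.
    """
    return {code - 29: chr(code) for code in range(0x20, 0x7F)}


def glyphs_to_text(glyph_ids: list[int], table: dict[int, str]) -> str:
    """Look each glyph index up in the table; unknown glyphs become U+FFFD."""
    return "".join(table.get(g, "\ufffd") for g in glyph_ids)


glyph_ids = hex_to_glyph_ids("00280057005800470048")
print("".join(chr(g) for g in glyph_ids))                # (WXGH  (naive Unicode reading)
print(glyphs_to_text(glyph_ids, build_assumed_table()))  # Etude  (via the assumed table)
```

With the string posted above, the naive reading gives "(WXGH", while the assumed table gives "Etude", which at least looks like a word. Treat that as a hint that the values are glyph indices, not as a general decoding rule: a subset font may number its glyphs differently.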

The other solution: create ANSI PDF files ...

antesima · Joined: 02 Jul 2008 Posts: 5

Ok thank you, I will try to get the fonts and extract the text.

If I manage to get some code that works, I will publish it here.