PDFsharp - moved to http://forum.pdfsharp.net/ Forum Index
Please visit the new PDFsharp forum at http://forum.pdfsharp.net/
 

Read Text of a PDF-File

 
gasi



Joined: 07 Oct 2008
Posts: 1

Posted: Tue Oct 07, 2008 2:36 pm    Post subject: Read Text of a PDF-File

Hi,

I've only been using PDFsharp for a short time. I'm trying to read the whole text of a PDF file, for example headlines and body text, but I haven't found a way to do this.
I tried using PdfDictionary to navigate through some objects (e.g. "/MediaBox", "/XObject"), but without success.

Can somebody give me some advice? For example, which class(es) and methods should I use?

Thanks.
PeterGillespie



Joined: 14 Oct 2008
Posts: 8
Location: England

Posted: Fri Nov 07, 2008 10:36 am

You should probably look at MigraDoc to accomplish this. I would imagine the steps are:

    Load your PDF into a MigraDoc Document object.

    You can then iterate through each section within it. (I have not tried importing a pre-created PDF file into MigraDoc, so I'm not sure how this works.)

    Assuming you get this far, you can then iterate through each element within the section, which would look something like this:


Code:

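using System.Collections.Generic;
using MigraDoc.DocumentObjectModel;

// "section" is the MigraDoc Section whose text you want to collect.
List<string> allText = new List<string>();
foreach (DocumentObject element in section.Elements)
{
    // Text objects are children of paragraphs, not of Section.Elements,
    // so skip tables, images etc. and look inside each Paragraph.
    // (Text nested in FormattedText, e.g. bold runs, is not visited here.)
    Paragraph paragraph = element as Paragraph;
    if (paragraph == null)
        continue;

    foreach (DocumentObject child in paragraph.Elements)
    {
        MigraDoc.DocumentObjectModel.Text textObj =
            child as MigraDoc.DocumentObjectModel.Text;
        if (textObj != null)
            allText.Add(textObj.Content);
    }
}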
marihanzo



Joined: 17 Mar 2009
Posts: 2

Posted: Tue Mar 17, 2009 4:26 pm

Unfortunately I wasn't able to apply this solution in my case.
So I've implemented another solution that uses low-level parsing of the PDF content.

My solution has been posted here:
http://pdfsharp.s3.bizhat.com/viewtopic.php?p=1603#1603
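The code behind that link is not reproduced here. As a rough idea of what "low-level parsing" involves: the text on a page is drawn by the text-showing operators Tj, TJ, ' and " inside the page's content stream, so extracting it means tokenizing that stream and collecting those operators' string operands. The following is a minimal C# sketch of that idea, not marihanzo's code. The names PdfTextSketch and ExtractShownText are hypothetical, and it assumes you already have the page's content stream decoded (Flate-inflated) into a string; getting it out of the PDF is not shown.

```csharp
using System.Collections.Generic;
using System.Text;

// Hypothetical helper (not the code from the linked post): scans a decoded PDF content
// stream and returns the strings shown by the Tj, ', " and TJ operators. Assumes the stream
// is already inflated (no /FlateDecode), simple one-byte font encodings, no inline images.
public static class PdfTextSketch
{
    const string Delimiters = "()<>[]{}/%";
    public static List<string> ExtractShownText(string content)
    {
        var shown = new List<string>();
        var arrays = new Stack<List<string>>(); // open '[' arrays; only their strings are kept
        object last = null;                     // most recent top-level operand
        int i = 0;
        while (i < content.Length)
        {
            char c = content[i];
            string str = null;
            if (char.IsWhiteSpace(c)) { i++; continue; }
            if (c == '%') { while (i < content.Length && content[i] != '\n' && content[i] != '\r') i++; continue; } // comment
            if (c == '(') str = ReadLiteral(content, ref i);
            else if (c == '<' && i + 1 < content.Length && content[i + 1] != '<') str = ReadHex(content, ref i);
            else if (c == '[') { arrays.Push(new List<string>()); i++; continue; }
            else if (c == ']') // close an array; nested arrays are dropped, a stray ']' is ignored
            {
                i++;
                if (arrays.Count > 0) { var closed = arrays.Pop(); if (arrays.Count == 0) last = closed; }
                continue;
            }
            else if (c == '<' || c == '>') // '<<' / '>>' dictionary brackets
            {
                i += (i + 1 < content.Length && content[i + 1] == c) ? 2 : 1;
                if (arrays.Count == 0) last = null;
                continue;
            }
            if (str != null) { if (arrays.Count > 0) arrays.Peek().Add(str); else last = str; continue; }
            int start = i++; // anything else: a number, a /Name or an operator keyword
            while (i < content.Length && !char.IsWhiteSpace(content[i]) && Delimiters.IndexOf(content[i]) < 0) i++;
            string token = content.Substring(start, i - start);
            if (arrays.Count > 0) continue; // e.g. the kerning numbers inside a TJ array
            if ((token == "Tj" || token == "'" || token == "\"") && last is string s) shown.Add(s);
            else if (token == "TJ" && last is List<string> parts) shown.Add(string.Concat(parts));
            last = null; // operators consume their operands; numbers and names are not text
        }
        return shown;
    }

    // Reads a literal string "( ... )" starting at s[i] == '('; handles nesting and escapes.
    static string ReadLiteral(string s, ref int i)
    {
        var sb = new StringBuilder();
        int depth = 0;
        i++; // skip '('
        while (i < s.Length)
        {
            char c = s[i++];
            if (c == '\\' && i < s.Length)
            {
                char e = s[i++];
                int esc = "nrtbf".IndexOf(e);
                if (esc >= 0) sb.Append("\n\r\t\b\f"[esc]); // \n \r \t \b \f
                else if (e >= '0' && e <= '7')              // \ddd octal character code
                {
                    int code = e - '0';
                    for (int k = 0; k < 2 && i < s.Length && s[i] >= '0' && s[i] <= '7'; k++)
                        code = code * 8 + (s[i++] - '0');
                    sb.Append((char)(code & 0xFF));
                }
                else if (e == '\r') { if (i < s.Length && s[i] == '\n') i++; } // line continuation
                else if (e != '\n') sb.Append(e); // \( \) \\; unknown escapes just drop the backslash
            }
            else if (c == '(') { depth++; sb.Append(c); }
            else if (c == ')') { if (depth == 0) break; depth--; sb.Append(c); }
            else sb.Append(c);
        }
        return sb.ToString();
    }

    // Reads a hex string "<48656C6C6F>" starting at s[i] == '<'; two hex digits per character.
    static string ReadHex(string s, ref int i)
    {
        var sb = new StringBuilder();
        int high = -1;
        i++; // skip '<'
        while (i < s.Length && s[i] != '>')
        {
            int v = "0123456789abcdef".IndexOf(char.ToLowerInvariant(s[i++]));
            if (v < 0) continue; // whitespace between digits is allowed
            if (high < 0) high = v;
            else { sb.Append((char)(high * 16 + v)); high = -1; }
        }
        if (high >= 0) sb.Append((char)(high * 16)); // odd digit count: missing digit is 0
        if (i < s.Length) i++; // skip '>'
        return sb.ToString();
    }
}
```

For example, ExtractShownText("BT /F1 12 Tf 72 712 Td (Hello) Tj ET") returns a one-element list containing Hello. The strings are raw character codes, so for subset or CID fonts (e.g. Identity-H encoding) they only become readable after mapping through the font's /ToUnicode CMap, and the spacing hints inside TJ arrays are ignored, so words can run together. Newer PDFsharp releases also ship their own content-stream parser (ContentReader.ReadContent in the PdfSharp.Pdf.Content namespace), which is worth checking before writing a tokenizer by hand.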

I hope this will help you.
Enjoy it!
All times are GMT