PDFsharp - moved to http://forum.pdfsharp.net/ Forum Index PDFsharp - moved to http://forum.pdfsharp.net/
Please visit the new PDFsharp forum at http://forum.pdfsharp.net/
 

Extracting text from pdf

 
luizpapa



Joined: 08 Nov 2007
Posts: 2

Posted: Thu Nov 08, 2007 8:04 pm    Post subject: Extracting text from pdf

Hi,

Is it possible to extract text from a PDF file?

It would be even better if I could extract the text from an area of a single page instead of the entire file...

I am trying to do it with PDFsharp, but I can't find a way to do it.

TIA,
Luiz Papa
dgalloway



Joined: 09 Nov 2007
Posts: 1

Posted: Fri Nov 09, 2007 2:33 pm    Post subject: Extracting Text from PDF

I have been trying to do that, too. I was able to use the ContentReader to read a page and loop through all of the CObjects in it, but I couldn't figure out how to display the content of an object or how to determine whether it contains any text.

Dave Galloway
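
For anyone following the ContentReader route described above, here is a minimal sketch of one way to pull text out of the parsed objects. It assumes the page draws its text with the Tj and TJ operators and that the string operands decode directly to readable characters, which is not the case for every font encoding. The class name PdfTextSketch and its method names are made up for illustration and are not part of PDFsharp. It returns all text on the page and does not address extracting text from a given area.

```csharp
using System;
using System.Collections.Generic;
using PdfSharp.Pdf;
using PdfSharp.Pdf.Content;
using PdfSharp.Pdf.Content.Objects;
using PdfSharp.Pdf.IO;

// Hypothetical helper class; the name is an assumption for this sketch.
public static class PdfTextSketch
{
    // Parses the page's content stream and returns the strings found in it.
    public static IEnumerable<string> ExtractText(PdfPage page)
    {
        CSequence content = ContentReader.ReadContent(page);
        return ExtractFromObject(content);
    }

    // Walks the CObject tree recursively.
    private static IEnumerable<string> ExtractFromObject(CObject obj)
    {
        COperator op = obj as COperator;
        if (op != null)
        {
            // Only the two basic text-showing operators are handled here.
            if (op.OpCode.Name == OpCodeName.Tj.ToString() ||
                op.OpCode.Name == OpCodeName.TJ.ToString())
            {
                foreach (CObject operand in op.Operands)
                    foreach (string s in ExtractFromObject(operand))
                        yield return s;
            }
            yield break;
        }

        // CSequence also covers the arrays used as TJ operands.
        CSequence seq = obj as CSequence;
        if (seq != null)
        {
            foreach (CObject element in seq)
                foreach (string s in ExtractFromObject(element))
                    yield return s;
            yield break;
        }

        // A string operand: this is where the text actually lives.
        CString str = obj as CString;
        if (str != null)
            yield return str.Value;
    }

    public static void Main(string[] args)
    {
        // args[0] is the path to a PDF file.
        PdfDocument doc = PdfReader.Open(args[0], PdfDocumentOpenMode.ReadOnly);
        foreach (string s in ExtractText(doc.Pages[0]))
            Console.Write(s);
        Console.WriteLine();
    }
}
```

The strings come back in content-stream order, which is not necessarily reading order, and numeric kerning adjustments inside TJ arrays are skipped, so words may run together or split.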
luizpapa



Joined: 08 Nov 2007
Posts: 2

Posted: Fri Nov 09, 2007 4:39 pm    Post subject: Pdfbox

I think I will use PDFBox to do that.

The code below does exactly what I want. The only problem is that I have to add IKVM to my project references...

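// C# calling the Java PDFBox library through IKVM; x, y, width and height define the area to read.
string ExtractRegionText(string path, double x, double y, double width, double height)
{
    org.pdfbox.pdmodel.PDDocument doc = org.pdfbox.pdmodel.PDDocument.load(path);
    try
    {
        org.pdfbox.util.PDFTextStripperByArea stripper = new org.pdfbox.util.PDFTextStripperByArea();
        java.awt.geom.Rectangle2D rect = new java.awt.geom.Rectangle2D.Double(x, y, width, height);
        stripper.addRegion("regiao1", rect); // register the area under a name
        stripper.setSortByPosition(true);
        org.pdfbox.pdmodel.PDDocumentCatalog cat = doc.getDocumentCatalog();
        org.pdfbox.pdmodel.PDPageNode pn = cat.getPages();
        org.pdfbox.pdmodel.PDPage pag = pn.getKids().toArray()[0] as org.pdfbox.pdmodel.PDPage; // first page
        stripper.extractRegions(pag);
        return stripper.getTextForRegion("regiao1");
    }
    finally { doc.close(); } // release the document even if extraction fails
}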
All times are GMT