PDFsharp - moved to http://forum.pdfsharp.net/

jigsaw · Joined: 02 Nov 2006 Posts: 1 Location: Australia

Hi,
I was wondering whether PDFsharp could be used to find text within a PDF file and retrieve the number of the page it appears on. What I need to do is split a PDF file based on some text it contains.

i.e., the top of every page has the text
[Customer:xxxxxxxx]
where xxxxxxxx is the customer name. Whenever xxxxxxxx changes, I need to split the PDF. So a single 10-page PDF, with 4 pages for Customer X, 3 for Customer Y and 3 for Customer Z, would need to produce 3 files:
one for Customer X of four pages
one for Customer Y of three pages
one for Customer Z of three pages

It would also be great if the text search could use a regular expression.
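For illustration, here is a minimal sketch of just the grouping step: deciding which pages belong to which output file once each page's text is available. It assumes the text of every page has already been extracted by some other means (not shown, and not necessarily possible with PDFsharp). It also assumes a page without a header continues the previous customer. The function name `group_pages_by_customer` is hypothetical, and the sketch is written in Python rather than C# purely to show the logic.

```python
import re
from typing import List, Tuple

# Matches the page header, e.g. "[Customer:ACME Pty Ltd]", capturing the name.
CUSTOMER_RE = re.compile(r"\[Customer:([^\]]+)\]")


def group_pages_by_customer(page_texts: List[str]) -> List[Tuple[str, List[int]]]:
    """Return (customer, [0-based page indices]) for each run of consecutive
    pages that share the same customer header.

    Hypothetical helper: page_texts is assumed to hold the already-extracted
    text of each page, and a page with no header is assumed to belong to the
    previous customer.
    """
    groups: List[Tuple[str, List[int]]] = []
    for index, text in enumerate(page_texts):
        match = CUSTOMER_RE.search(text)
        customer = match.group(1).strip() if match else None
        if customer is None:
            # No header on this page: attach it to the current run, if any.
            if not groups:
                raise ValueError(f"page {index + 1} has no [Customer:...] header")
            groups[-1][1].append(index)
        elif groups and groups[-1][0] == customer:
            # Same customer as the previous page: extend the current run.
            groups[-1][1].append(index)
        else:
            # Customer changed: start a new output file.
            groups.append((customer, [index]))
    return groups


pages = ["[Customer:X] ..."] * 4 + ["[Customer:Y] ..."] * 3 + ["[Customer:Z] ..."] * 3
for customer, indices in group_pages_by_customer(pages):
    print(customer, [i + 1 for i in indices])
# X [1, 2, 3, 4]
# Y [5, 6, 7]
# Z [8, 9, 10]
```

Because a new group starts only when the name differs from the previous page, a customer who reappears later in the file gets a separate output file, which matches "when the xxxxxxxx changes". Each run's page indices would then be copied into a new output document.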

Is this possible with PDFsharp?

Stefan Lange · Joined: 12 Oct 2006 Posts: 47 Location: Cologne, Germany

Hello,

The content of a PDF page is a sequence of bytes that represents graphical commands. These bytes are called the "content stream" of the page. You can get it uncompressed with this code: