Please visit the new PDFsharp forum at http://forum.pdfsharp.net/
 

Iterating through a PDF and retrieving image properties

 
Author: Vermis (joined 02 Jul 2007, 1 post)
Posted: Mon Jul 02, 2007 5:00 pm

The project I'm working on needs to go through a directory of PDFs and spit out information about all of the images in them. The PDFs will all be created from scanned images, so the scope is fairly narrow.

I have some proof-of-concept code that works in the limited testing I've done so far (it is far from production code). However, I've only just started working with PDFsharp, and this approach doesn't seem very elegant or robust. Is there a better way that I'm overlooking?

Code:

Dim pDoc As PdfSharp.Pdf.PdfDocument
Dim pDict As PdfSharp.Pdf.PdfDictionary
Dim pRef As PdfSharp.Pdf.Advanced.PdfReference
Dim iPage As Integer = 0
Dim iWidth As String = ""
Dim iHeight As String = ""
Dim iColor As String = ""
Dim iBits As String = ""
Dim iFilter As String = ""
Dim tmp As String = ""

' Open the PDF in read-only mode
pDoc = PdfSharp.Pdf.IO.PdfReader.Open("C:\Test Files\Color PDFs\8902-01-0003.pdf", PdfSharp.Pdf.IO.PdfDocumentOpenMode.ReadOnly)

' Loop through each page, find the image, and report on it
For iPage = 0 To pDoc.Pages.Count - 1
   iWidth = ""
   iHeight = ""
   iColor = ""
   iBits = ""
   iFilter = ""

   ' Does this page have a Resources Element?
   If pDoc.Pages(iPage).Elements.Contains("/Resources") Then
      pDict = pDoc.Pages(iPage).Elements("/Resources")

      ' Does the Resources Element contain an XObject?
      If pDict.Elements.Contains("/XObject") Then
         pDict = pDict.Elements("/XObject")

         ' Does the XObject contain an Im1 image element?
         If pDict.Elements.Contains("/Im1") Then
            pRef = pDict.Elements("/Im1")

            ' Get the dictionary by the reference under Im1
            pDict = pDoc.Internals.GetObject(pRef.ObjectID)

            ' Get image details
            If pDict.Elements.Contains("/Width") Then iWidth = pDict.Elements("/Width").ToString
            If pDict.Elements.Contains("/Height") Then iHeight = pDict.Elements("/Height").ToString
            If pDict.Elements.Contains("/ColorSpace") Then
               iColor = pDict.Elements("/ColorSpace").ToString
               ' Strip the leading slash of a PDF name; StartsWith is safe on an empty string
               If iColor.StartsWith("/") Then iColor = iColor.Substring(1)
            End If
            If pDict.Elements.Contains("/BitsPerComponent") Then
               iBits = pDict.Elements("/BitsPerComponent").ToString
            End If
            If pDict.Elements.Contains("/Filter") Then
               iFilter = pDict.Elements("/Filter").ToString
               If iFilter.StartsWith("/") Then iFilter = iFilter.Substring(1)
            End If
         End If

         ' {0} Delim
         ' {1} Filename
         ' {2} Page Number
         ' {3} Page Width (inch)
         ' {4} Page Height (inch)
         ' {5} Page Orientation
         ' {6} Image Dimensions (pixels)
         ' {7} Bits Per Component
         ' {8} Colorspace
         ' {9} Decode Filter
         tmp = String.Format("{1}{0}{2}{0}{3}{0}{4}{0}{5}{0}{6}{0}{7}{0}{8}{0}{9}", _
                                       "|", _
                                       "filename.ext goes here", _
                                       iPage + 1, _
                                       String.Format("{0:0.##}", PdfSharp.Drawing.XUnit.FromPoint(pDoc.Pages(iPage).Width).Inch), _
                                       String.Format("{0:0.##}", PdfSharp.Drawing.XUnit.FromPoint(pDoc.Pages(iPage).Height).Inch), _
                                       pDoc.Pages(iPage).Orientation.ToString, _
                                       iWidth & "x" & iHeight, _
                                       iBits, _
                                       iColor, _
                                       iFilter)

         Console.WriteLine(tmp)
      End If
   End If

Next

pDoc.Close()
pDoc = Nothing


The output looks like this:
Code:

filename.ext goes here|1|11.04|8.49|Portrait|3311x2544|1|DeviceGray|CCITTFaxDecode
filename.ext goes here|2|8.49|11.06|Portrait|2544x3315|1|DeviceGray|CCITTFaxDecode
...
filename.ext goes here|39|8.49|11|Portrait|1696x2198|8|DeviceRGB|DCTDecode
filename.ext goes here|40|8.49|11|Portrait|1696x2198|8|DeviceRGB|DCTDecode
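
One weakness of the code above is that it only looks for an XObject named /Im1, so images stored under any other name are missed. A possible variation, offered as a rough sketch and not as the recommended PDFsharp approach, is to walk every entry in the /XObject dictionary and keep those whose /Subtype is /Image. It assumes a PDFsharp release that provides Elements.GetDictionary, Elements.GetInteger, Elements.GetString and PdfReference.Value (these may not exist in older builds); the file path, the module name and the Describe helper are made up for illustration.

```vb
Imports System
Imports PdfSharp.Pdf
Imports PdfSharp.Pdf.Advanced
Imports PdfSharp.Pdf.IO

Module ImageReportSketch

    ' Hypothetical helper: returns the item's text without a leading "/",
    ' or an empty string if the key was missing. Works for names and arrays alike.
    Function Describe(ByVal item As PdfItem) As String
        If item Is Nothing Then Return ""
        Return item.ToString().TrimStart("/"c)
    End Function

    Sub Main()
        ' Placeholder path; substitute a real file
        Dim doc As PdfDocument = PdfReader.Open("C:\Test Files\sample.pdf", PdfDocumentOpenMode.ReadOnly)

        For pageIndex As Integer = 0 To doc.Pages.Count - 1
            Dim page As PdfPage = doc.Pages(pageIndex)

            ' GetDictionary returns Nothing when the key is absent
            Dim resources As PdfDictionary = page.Elements.GetDictionary("/Resources")
            If resources Is Nothing Then Continue For

            Dim xObjects As PdfDictionary = resources.Elements.GetDictionary("/XObject")
            If xObjects Is Nothing Then Continue For

            ' Visit every XObject, whatever its name (/Im1, /Im2, /X0, ...)
            For Each item As PdfItem In xObjects.Elements.Values
                Dim reference As PdfReference = TryCast(item, PdfReference)
                If reference Is Nothing Then Continue For

                Dim xObject As PdfDictionary = TryCast(reference.Value, PdfDictionary)
                If xObject Is Nothing Then Continue For

                ' Skip form XObjects and anything else that is not an image
                If xObject.Elements.GetString("/Subtype") <> "/Image" Then Continue For

                ' Page number | pixel dimensions | bits | colorspace | filter
                Console.WriteLine("{0}|{1}x{2}|{3}|{4}|{5}", _
                                  pageIndex + 1, _
                                  xObject.Elements.GetInteger("/Width"), _
                                  xObject.Elements.GetInteger("/Height"), _
                                  xObject.Elements.GetInteger("/BitsPerComponent"), _
                                  Describe(xObject.Elements("/ColorSpace")), _
                                  Describe(xObject.Elements("/Filter")))
            Next
        Next
    End Sub

End Module
```

Reading /ColorSpace and /Filter through ToString instead of a typed getter is deliberate: either entry can be an array (for example an ICCBased colorspace or a chain of filters), and a name-only getter would fail on those. Images nested inside form XObjects are not visited by this sketch.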
All times are GMT