PDFsharp - moved to http://forum.pdfsharp.net/ Please visit the new PDFsharp forum at http://forum.pdfsharp.net/
peteratoce
Joined: 20 Feb 2007 Posts: 5
|
Posted: Tue Feb 20, 2007 2:18 pm Post subject: Image compression |
|
|
It would be most welcome if the library could compress images (not reduce resolution, as is sometimes appropriate).
Here are the results of some tests I did:
I started with a 100-page TIF file (A4, resolution 1200 dpi). BTW, such a high resolution is absolutely necessary when a scanned document is to be printed on an offset press.
First, I opened the TIF in Acrobat (V 7) and saved it as PDF. The file size barely grew, from 30.507.933 Bytes to 30.561.981 Bytes.
Then I used PDFsharp to do the equivalent (TIF acquired through System.Drawing.Image.FromFile, each page passed to PDFsharp through XImage.FromGdiPlusImage and then inserted into the output PDF with XGraphics.DrawImage). The conversion took about four times as long, and the resulting file size was 100.594.087 Bytes, i.e. more than three times the size of the Acrobat output.
Another consideration is the amount of memory needed during conversion. My understanding is that all newly created PDF pages have to be kept in memory by PDFsharp, until they are finally saved to file. My first test, done with a similar TIF file, but with 1012 pages in it, ran into an OutOfMemoryException. I expect that pages with compressed images on them would need far less memory during processing. |
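To give a feeling for the memory question, here is a rough back-of-the-envelope sketch of the raw raster size of such pages. It assumes A4 is 210 x 297 mm and compares bilevel (1 bit per pixel) with 24-bit RGB. How PDFsharp actually holds pages in memory is not stated in this thread, and the function name is made up for illustration.

```python
MM_PER_INCH = 25.4

def raw_page_bytes(width_mm, height_mm, dpi, bits_per_pixel):
    """Uncompressed raster size of one page, each row padded to whole bytes."""
    width_px = round(width_mm / MM_PER_INCH * dpi)
    height_px = round(height_mm / MM_PER_INCH * dpi)
    row_bytes = (width_px * bits_per_pixel + 7) // 8
    return row_bytes * height_px

# Assumption: A4 page scanned at 1200 dpi, as described in the post.
bilevel = raw_page_bytes(210, 297, 1200, 1)
rgb = raw_page_bytes(210, 297, 1200, 24)

for pages in (100, 1012):
    print(f"{pages} pages: "
          f"{pages * bilevel / 2**30:.1f} GiB at 1 bpp, "
          f"{pages * rgb / 2**30:.1f} GiB at 24 bpp")
```

If anything close to the uncompressed raster had to stay in memory per page, a 1012-page job would be far beyond what a 32-bit process can address, which would at least be consistent with the OutOfMemoryException.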
|
|
|
Thomas Hoevel
Joined: 16 Oct 2006 Posts: 387 Location: Cologne, Germany
|
Posted: Tue Feb 20, 2007 5:38 pm Post subject: |
|
|
I cannot explain why the file is so much bigger.
Have you tried a release build? The debug build by default produces "verbose" PDF files that are bigger.
Images in the PDF file use lossless LZ compression (except for JPEG images - those are copied byte by byte into the PDF file).
Not sure if the verbose mode can account for a factor of 3 - I don't expect that.
I'd like to know which image format and compression were used for the TIFF file. If it was JPEG or CCITT/FAX, then this could be the reason - PDFsharp uses the standard LZ compression, but other methods may be better for your scanned image.
Or maybe the image got converted to 24-bit RGB - this could explain the factor of 3.
PDFsharp does not read the files itself - it relies on GDI+ to read them; the 8-bit-to-24-bit conversion could occur there.
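A small sketch of the 8-bit-to-24-bit effect mentioned above. The sample data is synthetic, and zlib (Deflate) is only a stand-in for a lossless LZ-family compressor; it is an assumption, not a statement about what PDFsharp does internally.

```python
import zlib

# Synthetic 8-bit greyscale data: mostly white with a few dark runs.
grey = (b"\xff" * 200 + b"\x00" * 20 + b"\xff" * 36) * 64

# Expand every grey sample to an identical R, G, B triple (24 bits per pixel).
rgb = bytes(v for v in grey for _ in range(3))

raw_factor = len(rgb) / len(grey)
grey_deflated = len(zlib.compress(grey))
rgb_deflated = len(zlib.compress(rgb))

print(f"raw:      {len(grey)} -> {len(rgb)} bytes (factor {raw_factor:.0f})")
print(f"deflated: {grey_deflated} -> {rgb_deflated} bytes")
```

The raw size triples by construction. How much of that factor survives compression depends on the image content, which is why the deflated sizes are printed here rather than assumed.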
Long story short: we do compress image data. I'd like to know what happens there.
BTW: all pages are kept in memory. With 1000 scanned pages this really could be a problem, but for most applications this approach is appropriate.
Regards
Thomas Hoevel
PDFsharp Team |
|
|
|
peteratoce
Joined: 20 Feb 2007 Posts: 5
|
Posted: Mon Mar 05, 2007 8:48 am Post subject: |
|
|
Differences in file size really seem to be caused by differing compression schemes:
A 100 page TIF (CCITT G4): 30.507.933 Bytes,
the same TIF (LZW): 100.349.200 Bytes.
PDFsharp created a file of size 100.521.556 Bytes from the G4, so the result is consistent.
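The comparison can be checked with a few lines of arithmetic. The byte counts are the ones quoted in this post; the dictionary keys and variable names are just labels chosen for this sketch.

```python
# File sizes in bytes, as reported above.
sizes = {
    "TIFF, CCITT G4": 30_507_933,
    "TIFF, LZW": 100_349_200,
    "PDF from PDFsharp (G4 input)": 100_521_556,
}

g4 = sizes["TIFF, CCITT G4"]
lzw = sizes["TIFF, LZW"]
pdf = sizes["PDF from PDFsharp (G4 input)"]

lzw_vs_g4 = lzw / g4            # how much larger LZW is than G4
pdf_vs_lzw = (pdf - lzw) / lzw  # relative gap between the PDF and the LZW TIFF

print(f"LZW / G4:           {lzw_vs_g4:.2f}x")
print(f"PDF vs. LZW TIFF:   {pdf_vs_lzw:+.2%}")
```

So the PDFsharp output is within a fraction of a percent of the LZW-compressed TIF, while both are more than three times the size of the G4 file.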
I wish somebody (perhaps a knowledgeable user?) would turn his/her attention to image import and export in the library, including questions of different (= optimal) compression schemes for differing content types! From my experience I can say that GDI+ as an intermediate would have to go, though...
And, it would be nice to have more control over memory allocation, creation of temporary files or whatever is necessary to successfully process really large files.
Peter |
|