PDFsharp Forum Index - moved to http://forum.pdfsharp.net/
Please visit the new PDFsharp forum at http://forum.pdfsharp.net/
 

Image compression

 
peteratoce



Joined: 20 Feb 2007
Posts: 5

Posted: Tue Feb 20, 2007 2:18 pm    Post subject: Image compression

It would be most welcome if the library could compress images (I mean real compression, not a reduction in resolution, although that is sometimes appropriate too).
Here are the results of some tests I did:
I started with a 100 page TIF file (A4, resolution 1200 dpi). BTW, such high resolution is absolutely necessary when a scanned document is to be printed on an offset press.

First, I opened the TIF in Acrobat (V 7) and saved as PDF. The file size barely grew, from 30.507.933 Bytes to 30.561.981 Bytes.

Then I used PDFsharp to do the equivalent (TIF acquired through System.Drawing.Image.FromFile, each page passed to PDFsharp through XImage.FromGdiPlusImage and then inserted in the output PDF with XGraphics.DrawImage). The conversion took about four times as long, and the resulting file size was 100.594.087 Bytes, i.e. more than three times as much.

Another consideration is the amount of memory needed during conversion. My understanding is that all newly created PDF pages have to be kept in memory by PDFsharp, until they are finally saved to file. My first test, done with a similar TIF file, but with 1012 pages in it, ran into an OutOfMemoryException. I expect that pages with compressed images on them would need far less memory during processing.
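To put rough numbers on the memory concern, here is a minimal Python sketch (plain arithmetic, not PDFsharp code). It assumes an A4 page of 210 x 297 mm, a bitonal scan (1 bit per pixel), and that the 1012-page file would produce about the same output per page as the 100-page test. None of these assumptions is confirmed in this post, and the helper names are made up for illustration.

```python
MM_PER_INCH = 25.4

def page_pixels(width_mm, height_mm, dpi):
    """Pixel dimensions of a page scanned at the given resolution."""
    width_px = round(width_mm / MM_PER_INCH * dpi)
    height_px = round(height_mm / MM_PER_INCH * dpi)
    return width_px, height_px

def raw_bytes(width_px, height_px, bits_per_pixel):
    """Uncompressed bitmap size, with each row padded to whole bytes."""
    row_bytes = (width_px * bits_per_pixel + 7) // 8
    return row_bytes * height_px

# Assumption: A4 is 210 x 297 mm, scanned at 1200 dpi as stated in the post.
w, h = page_pixels(210, 297, 1200)

bitonal_page = raw_bytes(w, h, 1)   # assumption: 1 bit per pixel scan
rgb_page = raw_bytes(w, h, 24)      # same page expanded to 24-bit RGB

# Figures from the post: 100 pages produced a PDF of 100.594.087 bytes.
pdf_bytes_per_page = 100_594_087 / 100
# Assumption: the 1012-page file behaves like the 100-page one.
projected_pdf_bytes = pdf_bytes_per_page * 1012

print(f"page size: {w} x {h} pixels")
print(f"raw 1-bit page:  {bitonal_page / 1e6:.1f} MB")
print(f"raw 24-bit page: {rgb_page / 1e6:.1f} MB")
print(f"projected PDF data for 1012 pages: {projected_pdf_bytes / 1e9:.2f} GB")
```

Whether the library holds the compressed or the uncompressed image data in memory is not stated here, so both figures are for orientation only.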
Thomas Hoevel



Joined: 16 Oct 2006
Posts: 387
Location: Cologne, Germany

Posted: Tue Feb 20, 2007 5:38 pm    Post subject: Image compression

I cannot explain why the file is so much bigger.

Have you tried a release build? The debug build by default produces "verbose" PDF files that are bigger.

Images in the PDF file use lossless LZ compression (except for JPEG images - those are copied byte by byte into the PDF file).

Not sure if the verbose mode can account for a factor of 3 - I don't expect that.

I'd like to know which image format and compression was used for the TIFF file. If it was JPEG or CCITT/FAX then this could be the reason - PDFsharp uses the standard LZ compression, but other methods may be better for your scanned image.
Or maybe the image got converted to 24 bit RGB - this could explain the factor of 3.

PDFsharp does not read the files - it relies on GDI+ to read them; the 8-bit-to-24-bit-conversion could occur here.
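The 8-bit-to-24-bit idea can be illustrated with a small Python sketch. This is not PDFsharp or GDI+ code: the "scan" is synthetic random data, Python's zlib (Deflate) merely stands in for a lossless LZ-style compressor, and gray_to_rgb is a hypothetical helper that mimics what a conversion to 24 bit RGB would do to the sample data.

```python
import random
import zlib

def gray_to_rgb(gray: bytes) -> bytes:
    """Expand 8-bit gray samples to 24-bit RGB by writing each sample three times."""
    out = bytearray(len(gray) * 3)
    out[0::3] = gray  # red channel
    out[1::3] = gray  # green channel
    out[2::3] = gray  # blue channel
    return bytes(out)

random.seed(0)
# Synthetic stand-in for a scan: mostly white (255) with some black (0) pixels.
gray = bytes(0 if random.random() < 0.05 else 255 for _ in range(200_000))
rgb = gray_to_rgb(gray)

# The uncompressed data grows by exactly the factor of 3.
raw_ratio = len(rgb) / len(gray)
print(raw_ratio)  # 3.0

# How much of that factor survives lossless compression depends on the
# image content, so the compressed sizes are only printed, not predicted.
packed_gray = zlib.compress(gray, 9)
packed_rgb = zlib.compress(rgb, 9)
print(len(packed_gray), len(packed_rgb))
```

The raw size always triples; for real scanned pages the compressed sizes would have to be measured on the actual files.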

Long story short: we do compress image data. I'd like to know what happens there.

BTW: all pages are kept in memory. With 1000 scanned pages this really could be a problem, but for most applications this approach is appropriate.
_________________
Regards
Thomas Hoevel
PDFsharp Team
peteratoce



Joined: 20 Feb 2007
Posts: 5

Posted: Mon Mar 05, 2007 8:48 am    Post subject: Image compression

Differences in file size really seem to be caused by differing compression schemes:
A 100 page TIF (CCITT G4): 30.507.933 Bytes,
the same TIF (LZW): 100.349.200 Bytes.

PDFsharp created a file of size 100.521.556 Bytes from the G4, so the result is consistent.
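For anyone who wants to check the arithmetic, here is a short Python sketch using only the byte counts quoted in this thread. The dictionary keys and the ratio helper are made-up names for illustration.

```python
# File sizes in bytes, as reported in this thread.
sizes = {
    "tiff_g4": 30_507_933,        # 100-page TIF, CCITT G4
    "tiff_lzw": 100_349_200,      # same TIF saved with LZW
    "acrobat_pdf": 30_561_981,    # PDF saved from Acrobat 7
    "pdfsharp_pdf": 100_521_556,  # PDF created by PDFsharp from the G4 TIF
}

def ratio(a: str, b: str) -> float:
    """Size of file a divided by size of file b."""
    return sizes[a] / sizes[b]

# LZW versus G4 for the same pages: a bit more than a factor of 3.
print(f"LZW TIF / G4 TIF:       {ratio('tiff_lzw', 'tiff_g4'):.2f}")

# Each PDF is only a fraction of a percent larger than the TIF it resembles.
print(f"Acrobat PDF / G4 TIF:   {ratio('acrobat_pdf', 'tiff_g4'):.4f}")
print(f"PDFsharp PDF / LZW TIF: {ratio('pdfsharp_pdf', 'tiff_lzw'):.4f}")
```

The PDFsharp output tracks the LZW file size and the Acrobat output tracks the G4 file size, which fits the explanation that the difference comes from the compression scheme.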

I wish somebody (perhaps a knowledgeable user?) would turn his/her attention to image import and export in the library, including questions of different (= optimal) compression schemes for differing content types! From my experience I can say that GDI+ as an intermediate would have to go, though...

And, it would be nice to have more control over memory allocation, creation of temporary files or whatever is necessary to successfully process really large files.

Peter