How does Ghostscript compress PDFs when it applies a greyscale conversion?

I have a single-page PDF that is 6.3 MB. Since it already appears to be in greyscale, applying a greyscale conversion should not make a huge difference.
But when I apply a greyscale to the PDF with:
gs \
-sOutputFile=output.pdf \
-sDEVICE=pdfwrite \
-sColorConversionStrategy=Gray \
-dProcessColorModel=/DeviceGray \
-dCompatibilityLevel=1.4 \
-dNOPAUSE \
-dBATCH \
input.pdf
"output.pdf" is only 128.4 kB, and you can see the presence of new artifacts. The artifacts are not noticeable when the PDF is at full scale, but if you zoom in you can clearly tell a difference. You can see the greyscaled image here.
What is occurring in Ghostscript that causes the artifacts? And, more importantly, what causes such a dramatic loss in file size?
EDIT:
I think I overstated the artifacts in the output file. For all intents and purposes, the files look very similar.
Version: GPL Ghostscript 9.23
Here is the original PDF file: https://send.firefox.com/download/e47df175af/#tdZSodyN2CuQL8X0VIFC1g
Here is the greyscaled PDF:
https://send.firefox.com/download/a63b3d641c/#ce9Ctu6obfXlvvNZJvPnUA
I found that Scribd and Imgur compressed the original PDF file, so there was no point in using an image host.

Supplying PNG images, rather than the actual PDF files, makes it impossible for anyone to tell for certain what your problem is. If you had posted the PDF files I'd be able to look and tell you.
However, I'm going to guess that you are using an older version of Ghostscript (again you don't say), and that the image in the original file is DCT (JPEG) compressed.
Because you haven't specified a particular compression method, the pdfwrite device (not Ghostscript, but the Ghostscript device which writes PDF files) uses 'Automatic' compression. It writes the image data multiple times with different compression filters, and selects the one which produces the smallest output.
Almost certainly this will again be the DCT (JPEG) compression filter, since it almost invariably produces the smallest output. This is also the default filter used if you disable automatic selection and don't specify a different compression filter.
The problem is that DCT is a lossy compression, so every time you decompress and recompress the image you lose fidelity, although the size in bytes does decrease each time.
So that's the reason for both your results; the compression artefacts and at least part of the reduction in size. It may also be the case that your original Grayscale image is actually not Gray but RGB (or Lab or CalRGB, or ICCBased...), in which case converting it to grayscale will result in a decrease in size of 66%. Without seeing the file I can't tell.
Note that current versions of Ghostscript use a JPEG passthrough feature. Provided that the image is not being downsampled, or having its colour space altered, the image is not decompressed. It is passed unchanged to the output device, which embeds it unchanged. This avoids the artefacts introduced by decompression and recompression.
Obviously if you want to change the colour space, then the pdfwrite device does have to manipulate the image, so it has to decompress it.
You can select the compression filter you want to use, instead of permitting automatic selection, by using the GrayImageFilter distiller parameter; see here.
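For example, to disable automatic selection and force lossless Flate compression for grey images, a command along these lines should work (a sketch using the standard AutoFilterGrayImages and GrayImageFilter distiller parameters; check the documentation for your version):
gs \
-sOutputFile=output.pdf \
-sDEVICE=pdfwrite \
-sColorConversionStrategy=Gray \
-dCompatibilityLevel=1.4 \
-dNOPAUSE \
-dBATCH \
-c "<< /AutoFilterGrayImages false /GrayImageFilter /FlateEncode >> setdistillerparams" \
-f input.pdf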

Related

Suppress "Caution: quantization tables are too coarse for baseline JPEG"

I have a custom-built libtiff with JPEG support. When I create TIFF images using JPEG compression with a low quality, like 10, I get the following warning printed to stderr:
Caution: quantization tables are too coarse for baseline JPEG
I am looking to suppress this warning. I know it indicates a real issue, but the intent here is a very small quick-look image produced before the full-size image is created, so the warning is meaningless under the circumstances and floods stderr.
What I don't know is whether there is a mechanism to suppress these via the TIFF interface, or whether I am going to have to recompile the library and set a field. If the latter, is there a CMake option I missed, or do I need to modify code?
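One mechanism worth trying (a sketch, not a confirmed fix): libtiff lets you replace or silence its warning handler with TIFFSetWarningHandler(). Whether that catches this particular message depends on whether your build routes libjpeg's output through libtiff's warning handler; the filtering function and the message substring below are assumptions.
#include <cstdarg>
#include <cstdio>
#include <cstring>
#include <tiffio.h>

// Drop the specific libjpeg caution, pass everything else to stderr as before.
static void quietWarningHandler(const char* module, const char* fmt, va_list ap)
{
    char message[1024];
    vsnprintf(message, sizeof(message), fmt, ap);
    if (strstr(message, "quantization tables are too coarse") != nullptr)
        return;  // suppress this warning only
    fprintf(stderr, "%s: %s\n", module ? module : "libtiff", message);
}

int main()
{
    TIFFSetWarningHandler(quietWarningHandler);  // or TIFFSetWarningHandler(nullptr) to silence all warnings
    // ... open the TIFF with TIFFOpen() and write the JPEG-compressed quick-look image as before ...
    return 0;
}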

Bad quality of images using GDI+ with PostScript driver

I'm developing a program to print images of different formats (BMP, JPEG, EMF, ...) to an HDC using C++ and Windows GDI+. Using the MS Publisher Imagesetter driver I can generate a PostScript file, and through Ghostscript functions I obtain the PDF file. If I try to print the following image:
I obtain the following bad-quality result, with strange squares that are not present in the original image:
The part of my code that I used to print the image is:
SetMapMode(hdcPrint, MM_TEXT);
Gdiplus::Graphics graphics(hdcPrint);
graphics.SetPageUnit(Gdiplus::UnitMillimeter);
Gdiplus::Image* image = Gdiplus::Image::FromFile(srPicture->swPathImage);
graphics.DrawImage(image, x, y, w, h);
delete image;  // release the GDI+ image after drawing
I tried to print the same image with many drivers and different kind of formats (different from PostScript: PDF, EMF, real printer) and the result is always acceptable (the squares are not present).
Furthermore, I tried to open the bad quality result with a pdf reader different from Adobe Acrobat Reader DC (Wondershare PDFelement and Chrome) and, even then, the result is acceptable.
I also noticed that if the image contains some different shapes (i.e. a big red line, like in the next image) the result is good too.
At this point, I have no idea if the problem is Adobe reader or my implementation.
Is there a different way to print images of different formats with GDI+ (or pure GDI)?
The PostScript file generated is this.
Well... You haven't supplied either the PostScript or PDF files, which makes it really hard to comment.
It's not completely obvious to me at what point you are getting the image you show. Is this what you see in the PDF file? Is it something you are getting when printing the PDF file to a physical printer? If it's the latter, how are you printing the PDF file to the printer?
The JPEG you have supplied a link to is really small (6 KB); are you genuinely trying to use that JPEG file?
My guess (and in the absence of any files, a guess is all it can be) is that you are using an old version of Ghostscript. Old versions would decompress the JPEG image, then recompress the image using whatever filter produced the smallest result, usually JPEG again.
Because JPEG is a lossy format, every time you apply it to an image the quality decreases.
Newer versions of Ghostscript don't decompress the JPEG image data when going to the pdfwrite device, unless other options (e.g. colour conversion or image downsampling) make it necessary. The current version of Ghostscript is 9.27 and the release of 9.28 is imminent; I'd suggest you try one of those.
Another possibility would be that either the PostScript program has been created in such a way as to degenerate every image sample to a rectangle, or you are using an extremely old version of Ghostscript where that technique was also used.
Note that none of these would, in my opinion, lead to exactly the result you've pasted here, but the version is certainly worth investigating. Posting the PostScript program file (i.e. the file you send to Ghostscript) would be more helpful, because it would allow me to at least narrow down where the problem has occurred.
[EDIT]
The fault appears to be an intriguing bug in Acrobat.
The PostScript program uses a colour transfer function to invert the colour samples of the RGB JPEG image (this is a frowned-upon practice; it's not what transfer functions are for, but it's not uncommon). Ghostscript's pdfwrite device preserves the transfer function.
When rendered by Ghostscript this correctly produces the expected result; Acrobat, however, spectacularly does not. I have no idea what kind of mess they've made that leads to the result you get, but it's clearly wrong.
If I alter Ghostscript's pdfwrite production settings to Apply transfer functions instead of preserving them:
-c "<</TransferFunctionInfo /Apply>> setdistillerparams" -f PostScript.ps
then the resulting file views correctly in Acrobat. If I modify Adobe Acrobat's settings so that it uses Preserve instead of Apply for transfer functions (look in Settings -> Edit Adobe PDF Settings, then the Color tab, and at 'when transfer functions are found' set the drop-down to Preserve instead of Apply), the resulting PDF file renders correctly in Ghostscript, and incorrectly in Acrobat in the same way as the Ghostscript pdfwrite output file.
In short, I'm afraid what you are seeing here is an Acrobat rendering bug. You can work around it by altering the Ghostscript transfer function settings as above, but it's really not a problem in Ghostscript.
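For reference, a complete conversion command with that switch might look like this (a sketch; -o is shorthand for -sOutputFile= together with -dBATCH and -dNOPAUSE):
gs -sDEVICE=pdfwrite -o output.pdf \
-c "<</TransferFunctionInfo /Apply>> setdistillerparams" \
-f PostScript.ps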

Writing image data to the disk as fast as possible

I am writing an application that generates a huge amount of images. Each frame is 1280x800 pixels large and has 1 byte per pixel for color information (greyscale). Each of the frames must be written to disk.
Currently I simply dump the raw pixel data to a binary file on the disk. The file can then be viewed with a special viewer I also created.
This is a very unsatisfactory solution, since the images can't be viewed/processed directly. They always have to run through my custom viewer/converter.
Is there an image format I could use to write my images to disk that:
Is fast to write (no compression, etc.)
Does not increase the final file size much
Supports dumping my raw pixel buffer as-is (no alignment changes, etc.)
Can be read by common applications (Windows Explorer, Paint, Photoshop, etc.)
I already tried to use .png, but the file generation takes much too long due to the compression.
Have a look at the binary Portable GrayMap (P5) format. It consists of an extremely simple header followed by raw image data (without any alignment requirements), and is widely supported by image viewers.
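For illustration, a minimal sketch of writing one frame this way (file name, dimensions and buffer are placeholders; the header is just "P5", the width and height, and the maximum sample value, followed by the raw bytes):
#include <cstdint>
#include <fstream>
#include <string>
#include <vector>

// Write one 8-bit greyscale frame as a binary PGM (P5) file.
static void writePgm(const std::string& path, const uint8_t* pixels, int width, int height)
{
    std::ofstream out(path, std::ios::binary);
    out << "P5\n" << width << ' ' << height << "\n255\n";     // plain-text header
    out.write(reinterpret_cast<const char*>(pixels),
              static_cast<std::streamsize>(width) * height);  // raw pixel dump, no padding
}

int main()
{
    const int width = 1280, height = 800;
    std::vector<uint8_t> frame(static_cast<size_t>(width) * height, 128);  // placeholder frame data
    writePgm("frame_0001.pgm", frame.data(), width, height);
    return 0;
}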
Both BMP and TIFF can be used to save raw data. BMP has the oddity of storing the image upside down unless the height is negative, and TIFF has plenty of encoding options. It should anyway be feasible to reverse-engineer the format and use it as a template into which the image data is copied, so there is no need for a library: just a header, the image data and an optional footer concatenated.
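As a sketch of that header-as-template idea for an 8-bit greyscale BMP (the function and file layout are illustrative; fields follow the standard BITMAPFILEHEADER and BITMAPINFOHEADER layout, and a negative height keeps the rows top-down as noted above):
#include <cstdint>
#include <fstream>
#include <vector>

// Append little-endian values to a byte buffer.
static void put16(std::vector<uint8_t>& b, uint16_t v) { b.push_back(v & 0xFF); b.push_back(v >> 8); }
static void put32(std::vector<uint8_t>& b, uint32_t v) { for (int i = 0; i < 4; ++i) b.push_back((v >> (8 * i)) & 0xFF); }

// Write an 8-bit greyscale frame as a BMP: file header, info header, grey palette, raw rows.
static void writeBmp(const char* path, const uint8_t* pixels, int width, int height)
{
    const uint32_t rowSize   = (static_cast<uint32_t>(width) + 3) & ~3u;  // rows padded to 4 bytes
    const uint32_t dataSize  = rowSize * height;
    const uint32_t dataStart = 14 + 40 + 256 * 4;                         // headers + 256-entry palette

    std::vector<uint8_t> h;
    h.push_back('B'); h.push_back('M');        // BITMAPFILEHEADER
    put32(h, dataStart + dataSize);            // total file size
    put32(h, 0);                               // reserved
    put32(h, dataStart);                       // offset to pixel data
    put32(h, 40);                              // BITMAPINFOHEADER size
    put32(h, static_cast<uint32_t>(width));
    put32(h, static_cast<uint32_t>(-height));  // negative height = top-down rows
    put16(h, 1);                               // planes
    put16(h, 8);                               // bits per pixel
    put32(h, 0);                               // BI_RGB, no compression
    put32(h, dataSize);
    put32(h, 2835); put32(h, 2835);            // ~72 DPI, largely ignored
    put32(h, 256);  put32(h, 0);               // palette entries, important colours
    for (int i = 0; i < 256; ++i) { h.push_back(i); h.push_back(i); h.push_back(i); h.push_back(0); }  // grey ramp

    std::ofstream out(path, std::ios::binary);
    out.write(reinterpret_cast<const char*>(h.data()), h.size());
    std::vector<uint8_t> pad(rowSize - width, 0);
    for (int y = 0; y < height; ++y) {
        out.write(reinterpret_cast<const char*>(pixels + static_cast<size_t>(y) * width), width);
        if (!pad.empty()) out.write(reinterpret_cast<const char*>(pad.data()), pad.size());
    }
}
A file written this way should open directly in Windows Explorer and Paint.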

Shrink the size of a .png file

There are many programs that claim to reduce the size of a .png file, but none of the well-known ones (optipng, pngcrush, pngquant) allow me to shrink to a specified size. pngcrush tried its hardest, but the result was still way too big for my needs. For .jpg files, jpegoptim has an -m option that does allow me to shrink to the size I need. The obvious solution seemed to be to convert to .jpg, shrink to the right size, then convert back, but that doesn't work either: the reconstituted .png file just jumps back to its original size.
Presumably, this has something to do with the structure of .png files.
Is there any way to get a small png file? This png file is an example of something I need to shrink to below 1 KB.
Thanks for any suggestions!
Use ImageMagick to reduce the colors, then pngcrush to get rid of ancillary chunks:
magick in.png -colors 8 temp.png
pngcrush -rem alla temp.png out.png
results in a 1621-byte file. If you have an older version of ImageMagick, use "convert" instead of "magick". Using "-colors 4" instead of "-colors 8" gets you a 1015-byte file, but the dithering looks very spotty.
Note that these preserve the transparency in the image, while converting to JPEG loses the transparency and makes the background a solid color.
The only solution to your problem that I can think of is to use .jpg instead of .png. The .jpg format was mainly created for strong lossy compression that still gives a good-enough image. On the other hand, .png goes for full transparency support and no quality loss. To sum it up, .jpg is ideal for getting smaller files if quality doesn't matter too much, and .png is perfect for high-quality images where quality and colour really matter.
Sources:
http://www.labnol.org/software/tutorials/jpeg-vs-png-image-quality-or-bandwidth/5385/, http://www.interactivesearchmarketing.com/jpeg-png-proper-image-formatting/
I can get that 9.5 KB file down to 3.4 KB using the 8-bit palette PNG format. The image has a transparent boundary, which adds unnecessary pixels, and an alpha channel for the whole image, which isn't needed since the visible content is rectangular. After stripping the transparent boundary, eliminating the alpha channel, and using a palette, I can get it down to 3.2 KB.
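A rough sketch of those steps with ImageMagick (option names from current releases: -trim removes the transparent border, -alpha off drops the alpha channel, and the PNG8: prefix forces an 8-bit palette):
magick in.png -trim +repage -alpha off -colors 256 PNG8:out.png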
To get any further, I have to use JPEG for lossy compression. At a very low image quality of 5 (out of 100), I can get it down to 1 KB. It shows some artifacts from the severe compression (look around the prompt > and _ to see some of those):

Can't find logic behind png file sizes

I'm saving a large number of small png files for use in a game on a phone, so space is at a premium.
I'm trying to figure out the logic behind the file sizes so I can save things most efficiently, but even after using pngcrush the sizes are totally inconsistent.
I saved a 1x1 image and it takes 3 KB. I have another 23x21 image which takes only 2 KB. I have two images which are almost the same size, but one takes 6 KB and the other takes 13 KB. I doubled the image height and copied one image into the empty space of the other and saved that. The combined image is only 11 KB!
Why is a 1x1 image larger than a 23x21 image? Why can I combine a 13 KB image and a 6 KB image and get an 11 KB image?
Here are the images I'm talking about (there's a 1x1-pixel image between the first and second images; it's difficult to see, so I'll just give the URL: http://g42.org/temp/png/1x1.png):
http://g42.org/temp/png/hat.png
http://g42.org/temp/png/1x1.png
http://g42.org/temp/png/helmet1.png
http://g42.org/temp/png/helmet2.png
http://g42.org/temp/png/helmet1_2.png
It's not a compression thing; the problem with the 1x1 image is that it has metadata (added by Photoshop, it seems): a color profile (iCCP chunk). If you look inside the binary, it's the data between the strings "iCCP" and "IDAT"; it can be removed and you get a 69-byte file.
If you reopen and save the file in most image viewers (e.g. XnView), or use pngcrush, you can strip that chunk. See it here: http://i.stack.imgur.com/fmOdA.png
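For example, stripping the profile (and other text chunks) with pngcrush would look roughly like this:
pngcrush -rem iCCP -rem text 1x1.png 1x1_stripped.png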
And regarding the helmet images: besides other informational chunks (ImageReady adds some informational text, as you can see), the difference is due to different formats: the two-helmet image is a paletted image (8 bits per pixel), while the single helmet is RGB with alpha (32 bits per pixel).
PNG compression is based on the same algorithm as zlib and is highly sensitive to the data being compressed, so you won't see a consistent relationship between image size and file size. In the case of the combined image, it is still bigger than the smaller image, and given the similarity of the two halves, the compressor was probably able to reuse a lot of the Huffman tree. I don't know enough about the algorithm to say for certain how the combined image ended up smaller than the larger of the two originals.
As long as you are not seeing oddities like the 1x1 image, which you seem to have figured out in the comments, I don't think this will make a lot of sense without extensive study of image compression.
There is a great utility called pngcrush
http://pmt.sourceforge.net/pngcrush/
Compressing to PNG is a rather difficult task - there are lots of assumptions and strategies to try - do we create a palette, or are we better off without it?
pngcrush essentially brute-forces 100+ different compression strategies, while at the same time trimming useless tags and sections.
PNG has several sub-formats: 24-bit with or without alpha, 8-bit (with optional alpha), grayscale, etc., which use different numbers of bytes per pixel and have different "compressibility".
Plus PNG supports several compression tricks (scanline filters and zlib settings) which affect how well the image data is compressed.
On top of that PNG can contain metadata, which sometimes can be pretty large, like some embedded color profiles.
ImageAlpha converts images to the most space-efficient PNG8+alpha variant.
ImageOptim removes junk metadata and finds best compression parameters.
With a combination of those two your images can be reduced by 30-50%.