Lossless manipulation of JPEG in C++


I have a JPEG file on disk which is not quite normal - this JPEG file has additional rubbish data appended behind End Of Image (FFD9). This JPEG file can still be opened by JPEG viewers though.
I wish to remove the additional data after End Of Image (FFD9). The constraints are:
Must be done programmatically
Must be lossless
Must use native code, e.g. C++
One way I've found is to re-save the file using the IrfanView command-line tool with the lossless JPG_TRANSFORM command /jpg_rotate. The additional data at the end is stripped automatically.
However, is there another way to do this in C++ code? If possible, I do not wish to rely on executables like IrfanView. I wish to do everything in code to keep things lean.
I am thinking of detecting the End Of Image marker (FFD9), then saving the buffer up to and including that marker into another JPEG file. But how can I save the buffer losslessly?

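I guess this is a file handling question and NOT an image processing one.
All you need to do is write all the data up to and including the End Of Image marker from your image file into the resulting image file. Copying the bytes verbatim involves no decoding or re-encoding, so it is lossless.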
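A rough sketch of that idea, under some assumptions: the whole file fits in memory, and the names find_jpeg_end and strip_after_eoi are made up for this example. Rather than searching for the first FFD9 byte pair (which can also occur inside an embedded thumbnail in an APPn segment), it walks the marker segments and only accepts an End Of Image marker at the top level of the stream.

```cpp
#include <cstddef>
#include <fstream>
#include <iterator>
#include <string>
#include <vector>

// Returns the offset one past the End Of Image marker (FFD9), or 0 if the
// buffer does not look like a well-formed JPEG stream.
std::size_t find_jpeg_end(const std::vector<unsigned char>& d) {
    if (d.size() < 4 || d[0] != 0xFF || d[1] != 0xD8) return 0;  // no SOI
    std::size_t i = 2;
    while (i + 1 < d.size()) {
        if (d[i] != 0xFF) return 0;            // a marker must start here
        const unsigned char m = d[i + 1];
        if (m == 0xFF) { ++i; continue; }      // fill byte before a marker
        if (m == 0xD9) return i + 2;           // EOI: the image ends here
        if (m == 0x01 || (m >= 0xD0 && m <= 0xD7)) {  // TEM / RSTn: no length
            i += 2;
            continue;
        }
        if (i + 3 >= d.size()) return 0;       // truncated segment header
        const std::size_t len = (static_cast<std::size_t>(d[i + 2]) << 8) | d[i + 3];
        if (len < 2) return 0;                 // the length counts its own 2 bytes
        i += 2 + len;                          // skip marker + segment payload
        if (m == 0xDA) {
            // Start Of Scan: entropy-coded data follows. Inside it, FF00 is a
            // stuffed byte and FFD0..FFD7 are restart markers; skip past both.
            while (i + 1 < d.size() &&
                   !(d[i] == 0xFF && d[i + 1] != 0x00 &&
                     !(d[i + 1] >= 0xD0 && d[i + 1] <= 0xD7))) {
                ++i;
            }
        }
    }
    return 0;  // ran off the end without finding EOI
}

// Copies in_path to out_path, dropping everything after the EOI marker.
bool strip_after_eoi(const std::string& in_path, const std::string& out_path) {
    std::ifstream in(in_path, std::ios::binary);
    if (!in) return false;
    const std::vector<unsigned char> data((std::istreambuf_iterator<char>(in)),
                                          std::istreambuf_iterator<char>());
    const std::size_t end = find_jpeg_end(data);
    if (end == 0) return false;
    std::ofstream out(out_path, std::ios::binary);
    out.write(reinterpret_cast<const char*>(data.data()),
              static_cast<std::streamsize>(end));
    return static_cast<bool>(out);
}
```

Calling strip_after_eoi("in.jpg", "out.jpg") (file names are placeholders) would write the cleaned copy. The pixel data is never decoded, so nothing is recompressed.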

Related

How to add JPEG comments to an existing JPEG image file

Is there a way to add JPEG comments ("COM" markers) to an existing JPEG image file using libjpeg?
It is certainly possible to do so by first decompressing an existing image to an in-memory buffer, then compressing the raw image again with jpeg_write_marker( ... JPEG_COM ... ) to add comments, and saving to disk. Unless there is a need to decompress first, this seems like overkill.

There's a tool called wrjpgcom, which is part of libjpeg. I think it does what you want. Perhaps you could look through its source to find out how it's done.
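For illustration only, here is a minimal sketch of inserting a comment at the byte level, without libjpeg and without touching the compressed image data. It is not taken from the wrjpgcom source, and add_jpeg_comment is a made-up name. It assumes the file is already in memory and places the COM segment (marker FFFE) after the SOI marker and any leading APPn segments, so that a JFIF APP0 segment stays first.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Returns a copy of `jpeg` with a COM (FFFE) segment holding `text` inserted
// after SOI and any leading APPn segments. Returns an empty vector on error.
std::vector<unsigned char> add_jpeg_comment(const std::vector<unsigned char>& jpeg,
                                            const std::string& text) {
    if (jpeg.size() < 4 || jpeg[0] != 0xFF || jpeg[1] != 0xD8) return {};
    // The 16-bit segment length includes its own two bytes.
    if (text.size() > 65533) return {};

    std::size_t pos = 2;  // first byte after SOI
    // Skip APP0..APP15 segments (markers FFE0..FFEF).
    while (pos + 3 < jpeg.size() && jpeg[pos] == 0xFF &&
           jpeg[pos + 1] >= 0xE0 && jpeg[pos + 1] <= 0xEF) {
        const std::size_t len =
            (static_cast<std::size_t>(jpeg[pos + 2]) << 8) | jpeg[pos + 3];
        if (len < 2 || pos + 2 + len > jpeg.size()) return {};
        pos += 2 + len;
    }

    const std::size_t seg_len = text.size() + 2;
    const auto split = jpeg.begin() + static_cast<std::ptrdiff_t>(pos);

    std::vector<unsigned char> out(jpeg.begin(), split);
    out.push_back(0xFF);
    out.push_back(0xFE);  // COM marker
    out.push_back(static_cast<unsigned char>(seg_len >> 8));
    out.push_back(static_cast<unsigned char>(seg_len & 0xFF));
    out.insert(out.end(), text.begin(), text.end());
    out.insert(out.end(), split, jpeg.end());  // rest of the file, unchanged
    return out;
}
```

Everything after the insertion point is copied byte for byte, so the image itself is not recompressed.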

You can use jpeg_write_marker() during the writing of the output file to write the comment after setting it up. Then, use jpeg_read_coefficients() and jpeg_write_coefficients() (in place of the ordinary jpeg_read_scanlines() and jpeg_write_scanlines()) to read and write the raw, compressed data without actually decompressing and recompressing it.
See the section "Really raw data: DCT coefficients" in the libjpeg documentation. Be sure to read all the caveats mentioned there.

You could also use JPEG Comment Editor, created by Mwisoft. It automates adding and editing JPEG comments on Windows, instead of right-clicking the JPEG file and opening Properties to add comments manually.

Why is my libharu PDF oversized with .png images?

I am creating a pdf using libharu in C++ (compiled as a .cgi) that features .png images.
The code is fine, but my PDFs are ridiculously oversized.
Each page features one image of around 30 KB and around 4 text characters in libharu's system font. If I open a 20-page output file of 25 MB and "print" it to a file in my operating system, it becomes 256 KB or so with no visible change to the images.
I think the issue is related to libharu because this guy sees it too, here. He is using PHP, so libharu as a compiled .cgi (my C++ code is also a compiled .cgi, linked to libharu).
Another guy here on stack overflow has also seen size issues with libharu, but his problem does not mention anything to do with .png so it may be unrelated.
Code for reference:
WorkingGraphic = HPDF_LoadPngImageFromMem (*gPdfPtr,
                                           PngAssets[AssetIndex],  // Image data ptr
                                           PngSizes[AssetIndex]);  // data length

// Render Appropriate
HPDF_Page_DrawImage (*BlitParams->page,
                     WorkingGraphic,
                     BlitParams->OutputRect->X,
                     BlitParams->OutputRect->Y,
                     BlitParams->OutputRect->Width,
                     BlitParams->OutputRect->Height);
Does anyone know how to drive libharu so it creates sensibly sized PDFs when you use .png images?

Right I don't know how to remove a question but maybe this info will be useful to others anyway.
I may have had the same issue as this fellow here where I have duplicated this answer.
What I needed to do was enable compression of the .pdf, which I had not done.
Documentation link
C Code:
HPDF_SetCompressionMode (pdf, HPDF_COMP_ALL);
I hadn't done enough research to know that the PDF format does not natively support PNG (or, if it has been updated to do so, libharu still doesn't). This option tells libharu to use zlib to compress everything it can, including your images.
The implementation is not perfect (you will still see a size difference if you zip your output .pdf) but it is acceptable for my use case.

If you don't need the full-size image in the PDF, you can reduce the image to a thumbnail using GDI+ APIs, equal in size to however big you want the image to appear in the PDF.
Save the scaled PNG to a temporary file, and pass the thumbnail PNG to Haru PDF. This will reduce the size of the PDF file.
The image will be pixellated when the viewer zooms in.

Can we load, display and manipulate an image's matrix without using any library in C++?

Is it possible to change an image's matrix without using any library in C++? And to load and display the image as well?

Sure. Grab a copy of the specification for whatever image format you're interested in and write the read/write functions yourself.
Note that to write display functionality without an external library you'll likely need to run your code in kernel mode to get to the frame buffer memory, but that can certainly be done.
Not that you'd necessarily want to do it that way...

Like any typical file, an image file is simply made up of bytes; there is nothing special about an image file.
In my opinion, the most difficult part of reading/writing image files without the use of a library is understanding the file format. Once you understand the format, all you need to do is define appropriate data structures and read the image data into them (for more advanced formats you may have to do some extra work e.g. decompression).
The simplest image format to work with would have to be PPM. It's a pretty bad format but it's nice and easy to read in and write back to a file.
http://netpbm.sourceforge.net/doc/ppm.html
Apart from that, bitmaps are also pretty simple to work with. Like Drew said, just download a copy of the specification and work from there.
As for displaying images, I think you're best off using a library or framework unless you want to see how it's done for the sake of learning.
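To give an idea of how little code the PPM route takes, here is a minimal sketch that uses only the standard library. It handles just the binary "P6" variant with a maximum value of 255 and no comment lines in the header, and the names PpmImage, write_ppm, read_ppm and invert are made up for this example.

```cpp
#include <cstddef>
#include <fstream>
#include <string>
#include <vector>

struct PpmImage {
    int width = 0;
    int height = 0;
    std::vector<unsigned char> rgb;  // width * height * 3 bytes, row by row
};

// Writes a binary (P6) PPM: a short text header followed by raw RGB bytes.
bool write_ppm(const std::string& path, const PpmImage& img) {
    std::ofstream out(path, std::ios::binary);
    if (!out) return false;
    out << "P6\n" << img.width << ' ' << img.height << "\n255\n";
    out.write(reinterpret_cast<const char*>(img.rgb.data()),
              static_cast<std::streamsize>(img.rgb.size()));
    return static_cast<bool>(out);
}

// Reads a binary (P6) PPM with maxval 255. Header comments are not handled.
bool read_ppm(const std::string& path, PpmImage& img) {
    std::ifstream in(path, std::ios::binary);
    std::string magic;
    int maxval = 0;
    if (!(in >> magic >> img.width >> img.height >> maxval)) return false;
    if (magic != "P6" || maxval != 255 || img.width <= 0 || img.height <= 0) {
        return false;
    }
    in.get();  // consume the single whitespace byte that ends the header
    img.rgb.resize(static_cast<std::size_t>(img.width) *
                   static_cast<std::size_t>(img.height) * 3);
    in.read(reinterpret_cast<char*>(img.rgb.data()),
            static_cast<std::streamsize>(img.rgb.size()));
    return in.gcount() == static_cast<std::streamsize>(img.rgb.size());
}

// Example manipulation of the pixel matrix: invert every channel in place.
void invert(PpmImage& img) {
    for (unsigned char& v : img.rgb) v = static_cast<unsigned char>(255 - v);
}
```

Reading a file, calling invert on it and writing it back out is then three function calls. Displaying the result is a separate problem.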

C++ Importing and Renaming/Resaving an Image

Greetings all,
I am currently a rising Sophomore (CS major), and this summer, I'm trying to teach myself C++ (my school codes mainly in Java).
I have read many guides on C++ and gotten to the part with ofstream, saving and editing .txt files.
Now, I am interested in simply importing an image (jpeg, bitmap, not really important) and renaming the aforementioned image.
I have googled and asked around, but to no avail.
Is this process possible without downloading external libraries (I downloaded CImg)?
Any hints or tips on how to expedite my goal would be much appreciated.

Renaming an image is typically about the same as renaming any other file.
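For the plain file-name case, a minimal sketch with std::rename from the standard library; the wrapper name rename_file is just for this example. What happens when the target name already exists is implementation-defined, so check for that first if it matters.

```cpp
#include <cstdio>
#include <string>

// Renames (or moves) a file. The file contents are not read or modified.
// Returns true on success.
bool rename_file(const std::string& from, const std::string& to) {
    return std::rename(from.c_str(), to.c_str()) == 0;
}
```

A call such as rename_file("holiday.jpg", "beach.jpg") (placeholder names) works the same for an image as for a .txt file.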
If you want to do more than that, you can also change the data in the Title field of the IPTC metadata. This does not require JPEG decoding, or anything like that -- you need to know the file format well enough to be able to find the IPTC metadata, and study the IPTC format well enough to find the Title field, but that's about all. Exactly how you'll get to the IPTC metadata will vary -- navigating a TIFF (for one example) takes a fair amount of code all by itself.

When you say "renaming the aforementioned image," do you mean changing metadata in the image file, or just changing the file name? If you are referring to metadata, then you need to either understand the file format or use a library that understands the file format. It's going to be different for each type of image file. If you basically just want to copy a file, you can either stream the contents from one file stream to another, or use a file system API.
#include <fstream>

// Copy input.txt to output.txt byte for byte.
std::ifstream infs("input.txt", std::ios::binary);
std::ofstream outfs("output.txt", std::ios::binary);
outfs << infs.rdbuf();
An example of a file system API is CopyFile on Win32.

It's possible without libraries. You just need the image specs and C. The question is: why?
Targa or BMP are probably the easiest: each is just a header followed by the image data as a binary block of values.
GIF, JPEG and PNG are more complex because the data is compressed.
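As an illustration of the "header plus block of values" layout, here is a small sketch that builds an uncompressed 24-bit Targa file in memory. The function name make_tga is made up, and the sketch only covers this one variant of the format (no colour map, no RLE, no alpha).

```cpp
#include <cstddef>
#include <vector>

// Builds an uncompressed 24-bit TGA image: an 18-byte header followed by the
// pixels in B, G, R order. Returns an empty vector if the arguments are
// inconsistent.
std::vector<unsigned char> make_tga(unsigned width, unsigned height,
                                    const std::vector<unsigned char>& bgr) {
    if (width == 0 || height == 0 || width > 0xFFFF || height > 0xFFFF) return {};
    if (bgr.size() != static_cast<std::size_t>(width) * height * 3) return {};

    std::vector<unsigned char> out(18, 0);  // header, all fields default to 0
    out[2] = 2;                             // image type 2: uncompressed true-colour
    out[12] = static_cast<unsigned char>(width & 0xFF);   // width, little-endian
    out[13] = static_cast<unsigned char>(width >> 8);
    out[14] = static_cast<unsigned char>(height & 0xFF);  // height, little-endian
    out[15] = static_cast<unsigned char>(height >> 8);
    out[16] = 24;                           // bits per pixel
    out[17] = 0x20;                         // bit 5 set: rows run top to bottom

    out.insert(out.end(), bgr.begin(), bgr.end());  // the raw pixel block
    return out;
}
```

Writing the returned bytes to a file opened in binary mode gives something most image viewers should open.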

How to check if file is/isn't an image without loading full file? Is there an image header-reading library?

edit:
Sorry, I guess my question was vague. I'd like to have a way to check if a file is not an image without wasting time loading the whole image, because then I can do the rest of the loading later. I don't want to just check the file extension.
The application just views the images. By 'checking the validity', I meant 'detecting and skipping the non-image files' also in the directory. If the pixel data is corrupt, I'd like to still treat it as an image.
I assign page numbers and pair up these images. Some images are the single left or right page. Some images are wide and are the "spread" of the left and right pages. For example, pagesAt(3) and pagesAt(4) could return the same std::pair of images or a std::pair of the same wide image.
Sometimes, there is an odd number of 'thin' images, and the first image is to be displayed on its own, similar to a wide image. An example would be a single cover page.
Not knowing which files in the directory are non-images means I can't confidently assign those page numbers and pair up the files for displaying. Also, the user may decide to jump to page X, and when I later discover and remove a non-image file and reassign page numbers accordingly, page X could appear to be a different image.
original:
In case it matters, I'm using C++ and QImage from the Qt library.
I'm iterating through a directory and using the QImage constructor on the paths to the images. This is, of course, pretty slow and makes the application feel unresponsive. However, it does allow me to detect invalid image files and ignore them early on.
I could just save only the paths to the images while going through the directory and actually load them only when they're needed, but then I wouldn't know if the image is invalid or not.
I'm considering doing a combination of these two. i.e. While iterating through the directory, reading only the headers of the images to check validity and then load image data when needed.
So,
Will just loading the image headers be much faster than loading the whole image? Or does doing a bit of I/O to read the header mean I might as well finish loading the image in full? Later on, I'll be uncompressing images from archives as well, so this also applies to uncompressing just the header vs uncompressing the whole file.
Also, I don't know how to load/read just the image headers. Is there a library that can read just the headers of images? Otherwise, I'd have to open each file as a stream and code image header readers for all the filetypes on my own.

The Unix file tool (which has been around since almost forever) does exactly this. It is a simple tool that uses a database of known file headers and binary signatures to identify the type of the file (and potentially extract some simple information).
The database is a simple text file (which gets compiled for efficiency) that describes a plethora of binary file formats, using a simple structured format (documented in man magic). The source is in /usr/share/file/magic (in Ubuntu). For example, the entry for the PNG file format looks like this:
0    string  \x89PNG\x0d\x0a\x1a\x0a  PNG image
!:mime  image/png
>16  belong  x  \b, %ld x
>20  belong  x  %ld,
>24  byte    x  %d-bit
>25  byte    0  grayscale,
>25  byte    2  \b/color RGB,
>25  byte    3  colormap,
>25  byte    4  gray+alpha,
>25  byte    6  \b/color RGBA,
>28  byte    0  non-interlaced
>28  byte    1  interlaced
You could extract the signatures for just the image file types, and build your own "sniffer", or even use the parser from the file tool (which seems to be BSD-licensed).
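A hand-rolled "sniffer" along those lines could look roughly like the sketch below. It only knows four formats, the names ImageInfo and sniff_image are invented for this example, and it expects the caller to have read the first few dozen bytes of the file into a buffer. For PNG it also pulls the width and height from offsets 16 and 20 as big-endian 32-bit values, matching the magic entry above.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <string>

struct ImageInfo {
    std::string format;        // "png", "jpeg", "gif", "bmp", or "" if unknown
    std::uint32_t width = 0;   // only filled in for PNG in this sketch
    std::uint32_t height = 0;
};

// Reads a 32-bit big-endian integer (the "belong" type in the magic file).
static std::uint32_t read_be32(const unsigned char* p) {
    return (static_cast<std::uint32_t>(p[0]) << 24) |
           (static_cast<std::uint32_t>(p[1]) << 16) |
           (static_cast<std::uint32_t>(p[2]) << 8) |
           static_cast<std::uint32_t>(p[3]);
}

// Identifies an image type from the first `n` bytes of a file.
ImageInfo sniff_image(const unsigned char* buf, std::size_t n) {
    static const unsigned char png_sig[8] = {0x89, 'P', 'N', 'G',
                                             0x0D, 0x0A, 0x1A, 0x0A};
    ImageInfo info;
    if (n >= 8 && std::memcmp(buf, png_sig, 8) == 0) {
        info.format = "png";
        if (n >= 24) {  // IHDR fields directly follow the signature
            info.width = read_be32(buf + 16);
            info.height = read_be32(buf + 20);
        }
    } else if (n >= 3 && buf[0] == 0xFF && buf[1] == 0xD8 && buf[2] == 0xFF) {
        info.format = "jpeg";
    } else if (n >= 6 && (std::memcmp(buf, "GIF87a", 6) == 0 ||
                          std::memcmp(buf, "GIF89a", 6) == 0)) {
        info.format = "gif";
    } else if (n >= 2 && buf[0] == 'B' && buf[1] == 'M') {
        info.format = "bmp";
    }
    return info;
}
```

Reading 32 bytes per file is enough for this check. A matching signature only says the file claims to be an image, not that the pixel data is intact, which fits the requirement in the question.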

Just to add my 2 cents: you can use QImageReader to get information about image files without actually loading the files.
For example, with the format() method you can check a file's image format.
From the official Qt doc ( http://qt-project.org/doc/qt-4.8/qimagereader.html#format ):
Returns the format QImageReader uses for reading images. You can call this function after assigning a device to the reader to determine the format of the device. For example:
QImageReader reader("image.png");
// reader.format() == "png"
If the reader cannot read any image from the device (e.g., there is no image there, or the image has already been read), or if the format is unsupported, this function returns an empty QByteArray().

I don't know the answer about just loading the header, and it likely depends on the image type that you are trying to load. You might consider using QtConcurrent to go through the images while allowing the rest of the program to continue, if that's possible. In this case, you would probably initially represent all of the entries as being in an unknown state, and then change each to image or not-an-image when the verification is done.

If you're talking about image files in general, and not just a specific format, I'd be willing to bet there are cases where the image header is valid, but the image data isn't. You haven't said anything about your application. Is there no way you could add a background thread that keeps a few images in RAM and swaps them in and out depending on what the user may load next? For example, a slide show app would load 1 or 2 images ahead of and behind the current one. Or maybe display a question mark next to the image name until the background thread can verify the validity of the data.

While opening and reading the header of a file on a local filesystem should not be too expensive, it can be expensive if the file is on a remote (networked) file system. Even worse, if you are accessing files saved with hierarchical storage management, reading the file can be very expensive.
If this app is just for you, then you can decide not to worry about those issues. But if you are distributing your app to the public, reading the file before you absolutely have to will cause problems for some users.
Raymond Chen wrote an article about this for his blog The Old New Thing.
