Image steganography that could survive jpeg compression

I am trying to implement a steganographic algorithm in which the hidden message survives JPEG compression.
The typical scenario is the following:
Hide data in an image
Compress the image using JPEG
The hidden data is not destroyed by the JPEG compression and can be restored
I tried several published algorithms, but with no success.
For example, I tried a simple repetition code, but the JPEG compression destroyed the hidden data. I also tried to implement the algorithms described in the following articles:
http://nas.takming.edu.tw/chkao/lncs2001.pdf
http://www.securiteinfo.com/ebooks/palm/irvine-stega-jpg.pdf
Do you know of any algorithm that can actually survive JPEG compression?

You can hide the data in the frequency domain. JPEG stores information using the DCT (Discrete Cosine Transform) of every 8x8 pixel block, with the resulting coefficients arranged in a matrix. The coefficients most likely to survive compression are the low-frequency ones in the upper-left part of the matrix. The loss occurs during quantization, when small coefficients, mostly the high-frequency ones, are rounded to 0. These zeros cluster in the lower-right part of the matrix, which is why the compression works and why information stored there is lost.
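To make the frequency-domain idea concrete, here is a minimal sketch in Python with NumPy. It is not taken from either linked paper. It hides one bit per 8x8 block by forcing a low-frequency DCT coefficient onto an even or odd multiple of a step size (a technique usually called quantization index modulation). The coefficient position `(2, 1)`, the step of 24, and the function names are all assumptions chosen for illustration. For a real JPEG round trip the step would have to be larger than the quantizer step the encoder applies to that coefficient, and colour conversion and chroma subsampling are ignored here.

```python
import numpy as np

def dct_matrix(n=8):
    """Orthonormal DCT-II matrix C, so that C @ block @ C.T is the 2-D DCT."""
    k = np.arange(n).reshape(-1, 1)
    x = np.arange(n).reshape(1, -1)
    c = np.sqrt(2.0 / n) * np.cos(np.pi * (2 * x + 1) * k / (2 * n))
    c[0, :] = np.sqrt(1.0 / n)
    return c

C = dct_matrix()

def embed_bit(block, bit, pos=(2, 1), step=24.0):
    """Hide one bit in an 8x8 grayscale block (pos and step are illustrative choices)."""
    coeffs = C @ block.astype(np.float64) @ C.T
    scaled = coeffs[pos] / step
    q = int(np.round(scaled))
    if q % 2 != bit:
        # Move to the neighbouring multiple of `step` on the side the value leans towards.
        q += 1 if scaled >= q else -1
    coeffs[pos] = q * step
    pixels = C.T @ coeffs @ C  # inverse DCT
    return np.clip(np.round(pixels), 0, 255).astype(np.uint8)

def extract_bit(block, pos=(2, 1), step=24.0):
    """Recover the bit as the parity of the nearest multiple of `step`."""
    coeffs = C @ block.astype(np.float64) @ C.T
    return int(np.round(coeffs[pos] / step)) % 2
```

The bit survives any distortion that moves the chosen coefficient by less than half a step, which is why a larger step is more robust but also more visible. Spreading each message bit over several blocks, as in the repetition code mentioned in the question, can be layered on top.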

Quite a few applications seem to implement Steganography on JPEG, so it's feasible:
http://www.jjtc.com/Steganography/toolmatrix.htm
Here's an article regarding a relevant algorithm (PM1) to get you started:
http://link.springer.com/article/10.1007%2Fs00500-008-0327-7#page-1

Perhaps this answer is late, but ...
You can do steganography in the compressed domain. Read the image as a binary file and analyze it with a library such as a JPEG parser. Based on the algorithm you select, find the locations to modify, compute the new value for each of them, and replace the resulting bits in the file data. Finally, write the file with the same extension as the input.
I hope I helped.

What you're looking for is called watermarking.
A little warning: Watermarking algorithms use insane amounts of redundancy to ensure high robustness of the information being embedded. That means the amount of data you'll be able to hide in an image will be orders of magnitude lower compared to standard steganographic algorithms.

Related

Serialize OpenCv Mat using JSON in C++

I'm trying to write a TCP client/server application that transmits objects containing OpenCv Mat. I'd like to serialize these objects using JSON. I found some libraries that help me in doing that (rapidjson), but they of course do not take into account images as object members.
How would you suggest serializing a cv::Mat variable into a JSON object? How can I use RapidJson, for example, to achieve that?

imencode can be used to encode a viewable image (with CV_8UC1 or CV_8UC3 pixel formats) into a std::vector<uchar>.
The vector<uchar> will contain the same bytes as if OpenCV had saved the image in one of the supported image file formats (such as JPEG or PNG) and the file bytes had then been loaded back into a byte array.
imencode can be found in highgui module when using OpenCV 2.x, or imgcodecs module when using OpenCV 3.x.
With the compressed data in a vector<uchar>, you can use Base64 encoding to format it into a string, which can then be added as a JSON value inside a JSON object.
When using JSON to transmit large amounts of data, consider very carefully the character encoding that the JSON library is instructed to emit. Normally, if a large portion of the data is going to be Base64, you will want to make sure the JSON is emitted in UTF-8, where each Base64 character occupies a single byte.
If you have the option of sending in binary (which requires an "out-of-band" design in the web service, something not always doable), it should be seriously considered.
When considering different serialization choices for images, these things should be taken into account:
Typical image sizes (total number of pixels)
Size efficiency is less of a concern if images are small.
Pixel format (number of channels and precision)
Most common image file formats will only allow 8-bit grayscale and 24-bit RGB pixel data. Trying to save higher-precision pixel data into these image formats will result in partial loss of precision.
Available transmission bandwidth (if it is scarce enough to be a concern). With less available bandwidth, compression becomes more important.
Compression options.
Typical (photographic or synthetic) images are highly compressible, because images that are too "dense" with detail would be too hard for human viewers to comprehend.
Compression can be lossless or lossy.
Choice of compression may depend on the statistical characteristics of the pixel values (image content).
As mentioned above, if compression is performed by encoding into some image formats, you have to make sure the image format can satisfy the pixel value precision requirements of your application.
If no existing image format meets your requirements and you still want to perform lossless compression, consider using the zlib API that is integrated into the OpenCV Core module.
If you are good at image processing and data compression theory, you may be able to devise an application-specific compression method based on your own needs.
Remember that reducing the image resolution can be a powerful (and super-lossy) way of reducing the transmission file size. Consider carefully what minimum image resolution is actually needed for your application.
Other considerations
Binary or text
Endianness
Availability of highgui, imgcodecs or an image decoder for the chosen image format on the receiving end.
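As a rough illustration of the lossless zlib route and of the binary/text and endianness points, here is a minimal sketch in Python. It uses a NumPy array as a stand-in for cv::Mat, and the JSON field names (`shape`, `dtype`, `data`) and function names are my own assumptions, not part of OpenCV or RapidJSON. The same layout (dimensions, element type, Base64 of compressed bytes) carries over to C++.

```python
import base64
import json
import zlib

import numpy as np

def mat_to_json(image):
    """Serialize a NumPy image (stand-in for cv::Mat) into a JSON string."""
    # Force little-endian storage so the receiver does not depend on the sender's CPU.
    little = image.astype(image.dtype.newbyteorder("<"), copy=False)
    raw = np.ascontiguousarray(little).tobytes()
    payload = {
        "shape": list(image.shape),   # rows, cols, channels
        "dtype": little.dtype.str,    # e.g. "<u2" for 16-bit unsigned
        "data": base64.b64encode(zlib.compress(raw)).decode("ascii"),
    }
    return json.dumps(payload)

def mat_from_json(text):
    """Rebuild the array from the JSON string produced by mat_to_json."""
    payload = json.loads(text)
    raw = zlib.decompress(base64.b64decode(payload["data"]))
    flat = np.frombuffer(raw, dtype=np.dtype(payload["dtype"]))
    return flat.reshape(payload["shape"])
```

Because nothing is re-encoded into an image format, pixel precision is preserved exactly (16-bit or float data included), at the cost of weaker compression than JPEG or PNG would give on typical photos.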
Information source: just did this a few months ago.

Assessing the quality of an image with respect to compression?

I have images that I am using for a computer vision task. The task is sensitive to image quality. I'd like to remove all images that are below a certain threshold, but I am unsure if there is any method/heuristic to automatically detect images that are heavily compressed via JPEG. Anyone have an idea?

Image Quality Assessment is a rapidly developing research field. As you don't mention being able to access the original (uncompressed) images, you are interested in no-reference image quality assessment. This is actually a pretty hard problem, but here are some points to get you started:
Since you mention JPEG, there are two major degradation features that manifest themselves in JPEG-compressed images: blocking and blurring
No-reference image quality assessment metrics typically look for those two features
Blocking is fairly easy to pick up, as it appears only on macroblock boundaries. Macroblocks are a fixed size -- 8x8 or 16x16 depending on what the image was encoded with
Blurring is a bit more difficult. It occurs because higher frequencies in the image have been attenuated (removed). You can break up the image into blocks, DCT (Discrete Cosine Transform) each block and look at the high-frequency components of the DCT result. If the high-frequency components are lacking for a majority of blocks, then you are probably looking at a blurry image
Another approach to blur detection is to measure the average width of edges of the image. Perform Sobel edge detection on the image and then measure the distance between local minima/maxima on each side of the edge. Google for "A no-reference perceptual blur metric" by Marziliano -- it's a famous approach. "No Reference Block Based Blur Detection" by Debing is a more recent paper
Regardless of what metric you use, think about how you will deal with false positives/negatives. As opposed to simple thresholding, I'd use the metric result to sort the images and then snip the end of the list that looks like it contains only blurry images.
Your task will be a lot simpler if your image set contains fairly similar content (e.g. faces only). This is because the image quality assessment metrics can often be influenced by image content, unfortunately.
Google Scholar is truly your friend here. I wish I could give you a concrete solution, but I don't have one yet -- if I did, I'd be a very successful Masters student.
UPDATE:
Just thought of another idea: for each image, re-compress the image with JPEG and examine the change in file size before and after re-compression. If the file size after re-compression is significantly smaller than before, then it's likely the image is not heavily compressed, because it had some significant detail that was removed by re-compression. Otherwise (very little difference or file size after re-compression is greater) it is likely that the image was heavily compressed.
The use of the quality setting during re-compression will allow you to determine what exactly heavily compressed means.
If you're on Linux, this shouldn't be too hard to implement using bash and imageMagick's convert utility.
You can try other variations of this approach:
Instead of JPEG compression, try another form of degradation, such as Gaussian blurring
Instead of merely comparing file-sizes, try a full reference metric such as SSIM -- there's an OpenCV implementation freely available. Other implementations (e.g. Matlab, C#) also exist, so look around.
Let me know how you go.

I had many photos of an ancient book (so a similar layout, two pages per image), but some were so blurred that the text could not be read. I searched for a ready-made batch script to find the most blurred ones, but didn't find anything useful. So I took part of a script I had found on the net (based on ImageMagick, but no longer working; I couldn't track down the author to give credit) that assesses the blur level of a single image, tweaked it, and automated it over a whole folder. I uploaded it here:
https://gist.github.com/888239
hoping it will be useful to someone else. It works on a Linux system and uses ImageMagick (and some command-line tools that are usually installed, such as gawk, sort, grep, etc.).

One simple heuristic could be to look at width * height * color depth < sigma * file size. You would have to determine a good value for sigma, of course. sigma would be dependent on the expected entropy of the images you are looking at.
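A minimal sketch of that check in Python. I am reading the inequality as the condition an image must satisfy to be kept, which is an interpretation rather than something the answer states, and I take the colour depth in bytes per pixel so that sigma is a plain ratio. The function name is hypothetical.

```python
def passes_size_heuristic(width, height, bytes_per_pixel, file_size, sigma):
    """True when the raw pixel data is less than sigma times the file size on disk."""
    raw_size = width * height * bytes_per_pixel
    return raw_size < sigma * file_size
```

For example, a 1000x1000 RGB image has 3,000,000 raw bytes. With sigma = 20, a 300,000-byte file passes, while a 100,000-byte file (a 30:1 ratio) is flagged as heavily compressed.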

jpeg compression ratio

Is there a table that gives the compression ratio of a jpeg image at a given quality?
Something like the table given on the wiki page, except for more values.
A formula could also do the trick.
Bonus: Are the [compression ratio] values on the wiki page roughly true for all images? Does the ratio depend on what the image is and the size of the image?
Purpose of these questions: I am trying to determine the upper bound of the size of a compressed image for a given quality.
Note: I am not looking to make a table myself (I already have). I am looking for other data to check against my own.

I had exactly the same question and was disappointed that no one had created such a table (studies based on a single classic Lena image or the JPEG tombstone look ridiculous). That's why I did my own study. I cannot say that it is perfect, but it is definitely better than the others.
I took 60 real-life photos from different devices with different dimensions. I wrote a script which compresses them with different JPEG quality values (it uses our company's imaging library, but that is based on libjpeg, so the results should hold for other software as well) and saves the results to a CSV file. After some Excel magic, I came to the following values (note that I did not calculate anything for JPEG quality lower than 55, as those settings seem useless to me):
Q=55 43.27
Q=60 36.90
Q=65 34.24
Q=70 31.50
Q=75 26.00
Q=80 25.06
Q=85 19.08
Q=90 14.30
Q=95 9.88
Q=100 5.27
To tell the truth, the dispersion of the values is significant (e.g. for Q=55 min compression ratio is 22.91 while max value is 116.55) and the distribution is not normal. So it is not so easy to understand what value should be taken as typical for a specific JPEG quality. But I think these values are good as a rough estimate.
I wrote a blog post which explains how I obtained these numbers.
http://www.graphicsmill.com/blog/2014/11/06/Compression-ratio-for-different-JPEG-quality-values
Hopefully someone will find it useful.
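To turn the table into a size estimate, here is a small sketch in Python. It assumes the ratio means uncompressed 24-bit size divided by JPEG file size, which I have not verified against the blog post, and the function name is hypothetical. Since the question asks for an upper bound, the `ratio` parameter lets you substitute a pessimistic value, such as the minimum of 22.91 reported above for Q=55, in place of the typical one.

```python
# Typical compression ratios copied from the table above (JPEG quality -> ratio).
TYPICAL_RATIO = {
    55: 43.27, 60: 36.90, 65: 34.24, 70: 31.50, 75: 26.00,
    80: 25.06, 85: 19.08, 90: 14.30, 95: 9.88, 100: 5.27,
}

def estimate_jpeg_bytes(width, height, quality, channels=3, ratio=None):
    """Estimate the JPEG file size in bytes from the uncompressed size and a ratio."""
    if ratio is None:
        ratio = TYPICAL_RATIO[quality]  # only the tabulated qualities are supported
    raw_bytes = width * height * channels
    return raw_bytes / ratio
```

For a 4000x3000 photo at Q=90, the raw size is 36,000,000 bytes, so the typical estimate is about 2.5 MB. Given the dispersion mentioned above, individual files can deviate from this by a large factor.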

Browsing Wikipedia a little more led to http://en.wikipedia.org/wiki/Standard_test_image and Kodak's test suite. Although they're a little outdated and small, you could make your own table.
Alternately, pictures of stars and galaxies from NASA.gov should stress the compressor well, being large, almost exclusively composed of tiny speckled detail, and distributed in uncompressed format. In other words, HUBBLE GOTCHOO!

The compression you get will depend on what the image is of as well as the size. Obviously a larger image will produce a larger file even if it's of the same scene.
As an example, a random set of photos from my digital camera (a Canon EOS 450) range from 1.8MB to 3.6MB. Another set has even more variation - 1.5MB to 4.6MB.

If I understand correctly, one of the key mechanisms for attaining compression in JPEG is using frequency analysis on every 8x8 pixel block of the image and scaling the resulting amplitudes with a "quantization matrix" that varies with the specified compression quality.
The scaling of high-frequency components often results in the block containing many zeros, which can be encoded at negligible cost.
From this we can deduce that in principle there is no relation between the quality and the final compression ratio that will be independent of the image. The number of frequency components that can be dropped from a block without perceptually altering its content significantly will necessarily depend on the intensity of those components, i.e. whether the block contains a sharp edge, highly variable content, noise, etc.
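A small sketch in Python with NumPy illustrates the mechanism. The base table is the example luminance quantization table from the JPEG standard (Annex K), and the quality scaling follows the formula used by libjpeg as far as I know it. Other encoders may scale differently, so treat both as assumptions, and the helper names are mine.

```python
import numpy as np

# Example luminance quantization table from the JPEG standard (Annex K).
BASE_LUMA = np.array([
    [16, 11, 10, 16, 24, 40, 51, 61],
    [12, 12, 14, 19, 26, 58, 60, 55],
    [14, 13, 16, 24, 40, 57, 69, 56],
    [14, 17, 22, 29, 51, 87, 80, 62],
    [18, 22, 37, 56, 68, 109, 103, 77],
    [24, 35, 55, 64, 81, 104, 113, 92],
    [49, 64, 78, 87, 103, 121, 120, 101],
    [72, 92, 95, 98, 112, 100, 103, 99],
])

def quant_table(quality):
    """Scale the base table by a 1..100 quality setting, libjpeg-style."""
    quality = min(max(int(quality), 1), 100)
    scale = 5000 // quality if quality < 50 else 200 - 2 * quality
    return np.clip((BASE_LUMA * scale + 50) // 100, 1, 255)

def dct_matrix(n=8):
    """Orthonormal DCT-II matrix C, so that C @ block @ C.T is the 2-D DCT."""
    k = np.arange(n).reshape(-1, 1)
    x = np.arange(n).reshape(1, -1)
    c = np.sqrt(2.0 / n) * np.cos(np.pi * (2 * x + 1) * k / (2 * n))
    c[0, :] = np.sqrt(1.0 / n)
    return c

def nonzero_after_quantization(block, quality):
    """Count the DCT coefficients of an 8x8 block that survive quantization."""
    c = dct_matrix()
    coeffs = c @ (block.astype(np.float64) - 128.0) @ c.T  # level shift, then DCT
    return int(np.count_nonzero(np.round(coeffs / quant_table(quality))))
```

Running `nonzero_after_quantization` on a flat block and on a noisy block at the same quality gives very different counts, which is the image dependence described above.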

If I take a loss-compressed file and save it again (e.g. JPEG) will there be loss of quality?

I've often wondered: if I load a compressed image file, edit it and then save it again, will it lose some quality? What if I use the same quality grade when saving? Will the algorithms somehow detect that the file has already been compressed as a JPEG, so that there is no point trying to compress the displayed representation again?
Would it be a better idea to always keep the original (say, a PSD) and always make changes to it and then save it as a JPEG or whatever I need?

Yes, you will lose further file information. If making multiple changes, work off of the original uncompressed file.

When it comes to lossy image formats such as JPEG, successive compression will lead to perceptible quality loss. The loss can take forms such as compression artifacts and blurriness.
Even if one uses the same quality settings to save an image, there will still be quality loss. The only way to "preserve quality", or better yet, to lose as little quality as possible, is to use the highest quality setting that is available. Even then, there is no guarantee that there won't be quality loss.
Yes, it would be a good idea to keep a copy of the original if one is going to make an image using a lossy compression scheme such as JPEG. The original could be saved with a compression scheme which is lossless such as PNG, which will preserve the quality of the file at the cost of (generally) larger file size.
(Note: There is a lossless version of JPEG, however, the most common one uses techniques such as DCT to process the image and is lossy.)

In general, yes. However, depending on the compression format there are usually certain operations (mainly rotation and mirroring) that can be performed without any loss of quality by software designed to work with the properties of the file format.
Theoretically, since JPEG compresses each 8x8 block of pixels independently, it should be possible to keep all unchanged blocks of an image if it is saved with the same compression settings, but I'm not aware of any software that implements this.

Of course, because the compression level used initially will probably differ from the one used in your subsequent saves. You can easily check this with image manipulation software (e.g. Photoshop): save your file several times, changing the compression level slightly each time. You'll see image degradation.

If the changes are local (fixing a few pixels, rather than reshading a region) and you use the original editing tool with the same settings, you may avoid degradation in the areas that you do not affect. Still, expect some additional quality loss around the area of change as the compressed blocks are affected, and cannot be recovered.
The real answer remains to carry out editing on the source image, captured without compression where possible, and applying the desired degree of compression before targeting the image for use.

Yes, you will always lose a bit of information when you re-save an image as JPEG. How much you lose depends on what you have done to the image after loading it.
If you keep the image the same size and only make minor changes, you will not lose that much data. When the image is loaded, an approximation of the original image is recreated from the compressed data. If you resave the image using the same compression, most of the data that you lose will be data that was recreated when loading.
If you resize the image, or edit large areas of it, you will lose more data when resaving it. Any edited part of the image will lose about the same amount of information as when you first compressed it.
If you want to get the best possible quality, you should always keep the original.
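The point about re-saving with the same compression can be explored with a toy model in Python with NumPy. The sketch below pushes a single 8x8 grayscale block through DCT, quantization, dequantization, inverse DCT and rounding to 8-bit pixels. The quantization table is a made-up one for illustration, not the real JPEG table, and colour conversion, chroma subsampling and encoder differences, which all add loss in practice, are left out.

```python
import numpy as np

def dct_matrix(n=8):
    """Orthonormal DCT-II matrix C, so that C @ block @ C.T is the 2-D DCT."""
    k = np.arange(n).reshape(-1, 1)
    x = np.arange(n).reshape(1, -1)
    c = np.sqrt(2.0 / n) * np.cos(np.pi * (2 * x + 1) * k / (2 * n))
    c[0, :] = np.sqrt(1.0 / n)
    return c

def toy_table():
    """Hypothetical quantization steps that grow with frequency (not a real JPEG table)."""
    v, u = np.mgrid[0:8, 0:8]
    return 10.0 + 4.0 * (u + v)

def jpeg_like_roundtrip(block, table):
    """One simulated save-and-load cycle for an 8x8 grayscale block."""
    c = dct_matrix()
    coeffs = c @ (block.astype(np.float64) - 128.0) @ c.T
    dequantized = np.round(coeffs / table) * table  # the lossy step
    pixels = c.T @ dequantized @ c + 128.0
    return np.clip(np.round(pixels), 0, 255).astype(np.uint8)

def mean_abs_diff(a, b):
    """Average absolute pixel difference between two blocks."""
    return float(np.abs(a.astype(int) - b.astype(int)).mean())
```

Comparing `mean_abs_diff(original, gen1)` with `mean_abs_diff(gen1, gen2)`, where `gen1` and `gen2` are successive round trips with the same table, shows that in this idealized setup the first save does most of the damage and an unedited re-save with identical settings changes little. Editing, resizing or changing the settings between saves breaks that and brings the loss back.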

What is the cause/use/reason for the blocks that show up in high compression videos?

Be patient since I haven't worked with compression algorithms much so this may be obvious to some of you. Something I've always noticed when some streaming video starts to lag. I only realized I was curious when looking over this question:
Twitter image encoding challenge
I'm not talking about the pixels themselves but rather the grid like layout that results from the compression. What sort of algorithm or technique is this indicative of? What can you tell me about it?

Take a look at this Wikipedia article on MPEG-2. To quote a part of it:
Briefly, the raw frame is divided into 8 pixel by 8 pixel blocks. The data in each block is transformed by a discrete cosine transform. The result is an 8 by 8 matrix of coefficients. The transform converts spatial variations into frequency variations, but it does not change the information in the block; the original block can be recreated exactly by applying the inverse cosine transform.
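In other words, the grid-like structure you see is a direct effect of this DCT being applied to the 8x8 blocks of pixels. More precisely, it comes from each block's coefficients being quantized independently, so that neighbouring blocks no longer match at their shared edges.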

The rationale for blocks is linked to the location/frequency trade off. The image is divided into blocks before the compression in the spectral domain (DCT) so that the artefacts due to the compression are more localized. In standard JPEG, the blocks are of constant size on the whole picture. For more recent formats like JPEG2000, the blocks are adapted to the picture, using wavelets. I am not familiar with video formats details, but the rationale is the same.
This is the same phenomenon for audio coding (mp3): instead of computing the spectrum on the whole audio file, you split the file into some sections of a few samples (a few hundred generally for 44.1 kHz signals). And similarly, if there is corruption of the compressed data (network, corrupted file), you will hear noises which are due to missing windows.

It's called Macroblocking.
