I'm looking for some image compression operations, preferably simple in nature, that provide moderate compression ratios while preserving the edges in the images.
Please note that algorithms like JPEG which pack multiple operations are not applicable (unfortunately).
If you're using numpy, I suggest you take a look at the scipy.misc.imsave function:
https://docs.scipy.org/doc/scipy-0.14.0/reference/generated/scipy.misc.imsave.html
You can easily store your data as PNG without any loss and with compression ratios in the range you mentioned in your comment, e.g.:
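import numpy as np
from scipy.misc import imsave  # removed in SciPy 1.2; imageio.imwrite is the usual replacement

rgb = np.zeros((255, 255, 3), dtype=np.uint8)
rgb[..., 0] = np.arange(255)
rgb[..., 1] = 55
rgb[..., 2] = 1 - np.arange(255)
imsave('/tmp/rgb_gradient.png', rgb)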
Edit after comment 1:
It is really difficult to answer this question because of the lack of specifics.
Retaining a compressed version of the image in memory will certainly slow down your processing, as you will either need to decode and encode relevant parts of the image in each operation, or you'll need to use very specific algorithms that allow you to access and modify pixel values in the compressed domain (e.g., http://ieeexplore.ieee.org/document/232097/).
Now, to answer your question, the simplest way I can think of is to use Huffman coding (https://www.geeksforgeeks.org/greedy-algorithms-set-3-huffman-coding/) and store the codewords in memory. You will probably need to encode groups of pixels together so that each byte of codewords covers more than one pixel (otherwise you won't get any real compression). Alternatively, you'd need to find a way to efficiently pack small codewords (say 2 or 3 bits) together, which will certainly hinder your ability to read and write individual pixel values.
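To make the Huffman suggestion concrete, here is a minimal Python sketch, assuming 8-bit grayscale values in a flat list. The helper names (`huffman_code`, `encode`, `decode`) are made up for illustration and each codeword covers a single pixel; grouping pixels, as discussed above, would mean feeding tuples of pixels in as the symbols instead.

```python
import heapq
from collections import Counter

def huffman_code(symbols):
    """Return a dict mapping each symbol to its Huffman bit string."""
    freq = Counter(symbols)
    if len(freq) == 1:                       # degenerate case: one distinct symbol
        return {next(iter(freq)): "0"}
    # Heap entries: (weight, unique tie breaker, {symbol: code so far})
    heap = [(w, i, {s: ""}) for i, (s, w) in enumerate(freq.items())]
    heapq.heapify(heap)
    tie = len(heap)
    while len(heap) > 1:
        w1, _, c1 = heapq.heappop(heap)      # the two least frequent subtrees
        w2, _, c2 = heapq.heappop(heap)
        merged = {s: "0" + c for s, c in c1.items()}
        merged.update({s: "1" + c for s, c in c2.items()})
        heapq.heappush(heap, (w1 + w2, tie, merged))
        tie += 1
    return heap[0][2]

def encode(symbols, code):
    """Concatenate codewords and pack them into bytes (zero padded at the end)."""
    bits = "".join(code[s] for s in symbols)
    padded = bits + "0" * (-len(bits) % 8)
    packed = bytes(int(padded[i:i + 8], 2) for i in range(0, len(padded), 8))
    return packed, len(bits)

def decode(packed, nbits, code):
    """Walk the bit string and emit a symbol whenever a full codeword is seen."""
    inverse = {c: s for s, c in code.items()}
    bits = "".join(f"{b:08b}" for b in packed)[:nbits]
    out, cur = [], ""
    for bit in bits:
        cur += bit
        if cur in inverse:
            out.append(inverse[cur])
            cur = ""
    return out

pixels = [0] * 50 + [255] * 30 + [128] * 15 + [64] * 5
code = huffman_code(pixels)
packed, nbits = encode(pixels, code)
```

Note that `decode` has to walk the stream from the start, which is exactly the random-access problem mentioned above.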
I'm looking for an audio or image compression algorithm that can compress a torrent of 16-bit samples
by a fairly predictable amount (2-3x)
at very high speed (say, 60 cycles per sample at most: >100MB/s)
with lossiness being acceptable but, of course, undesirable
My data has characteristics of images and audio (2-dimensional, correlated in both dimensions and audiolike in one dimension) so algorithms for audio or images might both be appropriate.
An obvious thing to try would be this one-dimensional algorithm:
break up the data into segments of 64 samples
measure the range of values among those samples (as an example, the samples might be between 3101 and 9779 in one segment, a difference of 6678)
use 2 to 4 additional bytes to encode the range
linearly downsample each 16-bit sample to 8 bits in that segment.
For example, I could store 3101 in 16 bits, and store a scaling factor ceil(6678/256) = 27 in 8 bits, then convert each 16-bit sample to 8-bit as s8 = (s16 - base) / scale where base = 3101 + (27 >> 1), scale = 27, with the obvious decompression "algorithm" of s16 = s8 * 27 + 3101. Compression ratio: 128/67 = 1.91.
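For reference, the segment scheme described above could be sketched in Python roughly as follows. The function names are hypothetical, and two details are my own choices rather than part of the description: the scale is computed as `span // 256 + 1` (so a span that is an exact multiple of 256 still fits in 8 bits), and the decoder reconstructs the midpoint of each bin.

```python
def compress_block(samples):
    """Quantize one segment of 16-bit samples down to 8 bits each.

    Returns (base, scale, codes): base is the segment minimum, scale the
    quantization step, and codes the 8-bit values.
    """
    base = min(samples)
    span = max(samples) - base
    # Assumption: span // 256 + 1 instead of ceil(span / 256), so that the
    # largest code never exceeds 255 (e.g. span == 256 gives scale 2).
    scale = span // 256 + 1
    codes = bytes((s - base) // scale for s in samples)
    return base, scale, codes

def decompress_block(base, scale, codes):
    """Reconstruct each sample as the midpoint of its quantization bin."""
    return [c * scale + base + scale // 2 for c in codes]

segment = [3101, 9779, 5000, 7000]
base, scale, codes = compress_block(segment)
restored = decompress_block(base, scale, codes)
```

With the numbers from the example, this gives a base of 3101 and a scale of 27, and every restored sample is within 13 of the original.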
I've played with some ideas to avoid the division operation, but hasn't someone by now invented a superfast algorithm that could preserve fidelity better than this one?
Note: this page says that FLAC compresses at 22 million samples per second (44MB/s) at -q6 which is pretty darn good (assuming its implementation is still single-threaded), if not quite enough for my application. Another page says FLAC has similar performance (40MB/s on a 3.4GHz i3-3240, -q5) as 3 other codecs, depending on quality level.
Take a look at the PNG filters for examples of how to tease out your correlations. The most obvious filter is "sub", which simply subtracts successive samples. The differences should be more clustered around zero. You can then run that through a fast compressor like lz4. Other filter choices may result in even better clustering around zero, if they can find advantage in the correlations in your other dimension.
For lossy compression, you can decimate the differences before compressing them, dropping a few low bits until you get the compression you want, and still retain the character of the data that you would like to preserve.
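A rough Python sketch of the "sub" filter idea follows. Assumptions: `zlib` stands in for lz4 only because it ships with Python, the test signal is synthetic, and the function names are made up. For the lossy variant I take each difference against the previously reconstructed sample, which is my own tweak so that dropping low bits does not let the error accumulate.

```python
import struct
import zlib

def sub_encode(samples, drop_bits=0):
    """Sub filter with optional quantization of the differences.

    Each difference is taken against the previously *reconstructed* sample,
    so with drop_bits > 0 the error stays below 2**drop_bits per sample.
    """
    prev, out = 0, []
    for s in samples:
        q = (s - prev) >> drop_bits      # floor division by 2**drop_bits
        out.append(q)
        prev += q << drop_bits           # what the decoder will reconstruct
    return out

def sub_decode(diffs, drop_bits=0):
    """Undo sub_encode by accumulating the (rescaled) differences."""
    prev, out = 0, []
    for q in diffs:
        prev += q << drop_bits
        out.append(prev)
    return out

def compressed_size(values, fmt):
    """Size in bytes after zlib (stand-in for a fast compressor like lz4)."""
    return len(zlib.compress(struct.pack(f"<{len(values)}{fmt}", *values)))

# Smooth synthetic signal standing in for correlated 16-bit data.
signal = [1000 + 3 * i + (i % 7) for i in range(4096)]
raw_size = compressed_size(signal, "H")
sub_size = compressed_size(sub_encode(signal), "i")   # differences are signed
lossy = sub_decode(sub_encode(signal, drop_bits=2), drop_bits=2)
```

On this signal the filtered stream compresses far better than the raw samples, because the differences repeat while the raw values never do.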
I have a video source that produces many streams for different devices (such as HD televisions, pads, smartphones, etc.), and each of them has to be checked against the others for similarity. The video stream releases 50 images per second, one image every 20 milliseconds.
Let's take, for instance, img1 coming from stream1 at time ts1=1, img2 coming from stream2 at ts2=1, and img1.1 taken from stream1 at ts=2 (20 milliseconds later than ts=1). The comparison result should look something like this:
compare(img1, img1) = 1 same image same size
compare(img1, img2) = 0.9 same image different size
compare(img1, img1.1) = 0.8 different images same size
Ideally this should be done in real time, i.e. within 20 milliseconds. The goal is to detect whether the streams are out of synchronization. I have already implemented some comparison methods (none of them works for this case yet):
1) histogram (SSE and OpenCV CUDA), result compare(img1, img2) ~= compare(img1, img1.1)
2) PSNR (SSE and OpenCV CUDA), result compare(img1, img2) < compare(img1, img1.1)
3) SSIM (SSE and OpenCV CUDA), same result as PSNR
Maybe I get bad results because of the resize interpolation method?
Is it possible to implement a comparison method that fulfills my requirements? Any ideas?
I'm afraid that you're running into a Real Problem (TM). This is not a trivial let's-give-it-to-the-intern problem.
The main challenge is that you can't do a brute-force comparison. HD images are 3 MB or more, and you're talking about O(N*M) comparisons (in time and across streams).
What you essentially need is a fingerprint that's robust against resizing but time-variant. And as you didn't realize that (the histogram idea, for instance, is quite time-stable), you didn't include the necessary information in this question.
So this isn't a C++ question, really. You need to understand your inputs.
I am reading the official WebP lossless bitstream spec, and I have a feeling that the document is missing some explanation.
Let me describe some fragments of the specification:
1. Introduction - clear
2. Riff header - clear
3. Transformations
The transformations are used only for the main level ARGB image: the
subresolution images have no transforms, not even the 0 bit indicating
the end-of-transforms.
Nowhere earlier was it mentioned that the container holds some sub-resolution images. What are they? Where are they described, if not in the specification? How do they add to the final image?
Then, in the Predictor transform paragraph:
We divide the image into squares...
..what image? The main image or sub-resolution image? What if the image cannot be divided into squares (apart from pixel-size squares)?
The first 4 bits of prediction data define the block width and height
in number of bits. The number of block columns, block_xsize, is used
in indexing two-dimensionally.
Does this mean that the image width is block_xsize * block_width ?
The transform data contains the prediction mode for each block of the image.
In what way, what format?
I don't know why I am having a hard time understanding this. Maybe because I am not a native English speaker, or because the description is too laconic.
I'd appreciate any help in decoding this specification :)
It was mentioned earlier. Right at the top of the document it says:
The format uses subresolution images, recursively embedded into the
format itself, for storing statistical data about the images, such as
the used entropy codes, spatial predictors, color space conversion,
and color table.
These are arrays (or a vector in the case of the color table) of data where each element applies to a block of pixels in the actual image, e.g. a 16x16 block. These "subresolution images" are not themselves subsamples of the image being compressed.
The format description calls them images because they are stored exactly like the main image is in the format. The transforms are instructions to the decoder to apply to the decompressed main image data. The entropy image is used to decompress the main image, by virtue of providing the Huffman codes for each block.
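On the block-size question, here is a small Python sketch of how I read the spec's indexing. The names `size_bits` and `block_xsize` follow the spec's pseudocode as far as I remember it; how `size_bits` is read from the bitstream is deliberately left out, and the helper functions themselves are hypothetical.

```python
def div_round_up(num, den):
    """Integer division rounding up, so partial blocks at the edges count."""
    return (num + den - 1) // den

def predictor_block_layout(image_width, image_height, size_bits):
    """Dimensions of the predictor subresolution image for a given block size."""
    block_size = 1 << size_bits                            # block width == block height
    block_xsize = div_round_up(image_width, block_size)    # number of block columns
    block_ysize = div_round_up(image_height, block_size)   # number of block rows
    return block_size, block_xsize, block_ysize

def block_index(x, y, size_bits, block_xsize):
    """Index in the subresolution image of the block that holds pixel (x, y)."""
    return (y >> size_bits) * block_xsize + (x >> size_bits)

# A 100x50 image with 16x16 blocks: 7 block columns and 4 block rows.
block_size, block_xsize, block_ysize = predictor_block_layout(100, 50, 4)
last = block_index(99, 49, 4, block_xsize)
```

So the image width does not have to equal block_xsize * block_width: the block count is rounded up and the last column or row of blocks simply hangs over the edge. As far as I can tell, the prediction mode for each block is then stored in the green component of the corresponding subresolution pixel.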
I'm looking to filter a 1 bit per pixel image using a 3x3 filter: for each input pixel, the corresponding output pixel is set to 1 if the weighted sum of the pixels surrounding it (with weights determined by the filter) exceeds some threshold.
I was hoping that this would be more efficient than converting to 8 bpp and then filtering that, but I can't think of a good way to do it. A naive method is to keep track of nine pointers to bytes (three consecutive rows and also pointers to either side of the current byte in each row, for calculating the output for the first and last bits in these bytes) and for each input pixel compute
sum = filter[0] * ((*lastRowPtr & aMask) > 0) + filter[1] * ((*lastRowPtr & bMask) > 0) + ... + filter[8] * ((*nextRowPtr & hMask) > 0),
with extra faff for bits at the edge of a byte. However, this is slow and seems really ugly. You're not gaining any parallelism from the fact that you've got eight pixels in each byte and instead are having to do tonnes of extra work masking things.
Are there any good sources for how to best do this sort of thing? A solution to this particular problem would be amazing, but I'd be happy being pointed to any examples of efficient image processing on 1bpp images in C/C++. I'd like to replace some more 8 bpp stuff with 1 bpp algorithms in future to avoid image conversions and copying, so any general resources on this would be appreciated.
I found a number of years ago that unpacking the bits to bytes, doing the filter, then packing the bytes back to bits was faster than working with the bits directly. It seems counter-intuitive because it's 3 loops instead of 1, but the simplicity of each loop more than made up for it.
I can't guarantee that it's still the fastest; compilers and especially processors are prone to change. However simplifying each loop not only makes it easier to optimize, it makes it easier to read. That's got to be worth something.
A further advantage to unpacking to a separate buffer is that it gives you flexibility for what you do at the edges. By making the buffer 2 bytes larger than the input, you unpack starting at byte 1 then set byte 0 and n to whatever you like and the filtering loop doesn't have to worry about boundary conditions at all.
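The unpack, filter, pack approach might look something like this in Python (a sketch only; in C the three loops would be written the same way). It assumes MSB-first bit order and zero padding at the borders, and all function names are invented for the example.

```python
def unpack_row(row_bytes, width):
    """Expand packed bits (MSB first) to one 0/1 byte per pixel, padded with a zero on each side."""
    out = bytearray(width + 2)
    for x in range(width):
        out[x + 1] = (row_bytes[x >> 3] >> (7 - (x & 7))) & 1
    return out

def pack_row(pixels):
    """Pack a sequence of 0/1 values back into bytes, MSB first."""
    out = bytearray((len(pixels) + 7) // 8)
    for x, p in enumerate(pixels):
        if p:
            out[x >> 3] |= 0x80 >> (x & 7)
    return bytes(out)

def filter_1bpp(rows, width, weights, threshold):
    """3x3 weighted-sum threshold filter on a 1 bpp image given as packed rows."""
    blank = bytearray(width + 2)
    padded = [blank] + [unpack_row(r, width) for r in rows] + [blank]
    result = []
    for y in range(1, len(rows) + 1):
        above, here, below = padded[y - 1], padded[y], padded[y + 1]
        out = []
        for x in range(1, width + 1):
            # Nine neighbours in row-major order, matching the order of weights.
            window = above[x - 1:x + 2] + here[x - 1:x + 2] + below[x - 1:x + 2]
            total = sum(w * px for w, px in zip(weights, window))
            out.append(1 if total > threshold else 0)
        result.append(pack_row(out))
    return result

image = [b"\x00", b"\xff", b"\x00"]          # 8 pixels wide, 3 rows
filtered = filter_1bpp(image, 8, [1] * 9, 2)
```

Because of the padding, the inner loop never has to check whether it is at an edge.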
Look into separable filters. Among other things, they allow massive parallelism in the cases where they work.
For example, in your 3x3 sample-weight-and-filter case:
Sample 1x3 (horizontal) pixels into a buffer. This can be done in isolation for each pixel, so a 1024x1024 image can run 1024^2 simultaneous tasks, all of which perform 3 samples.
Sample 3x1 (vertical) pixels from the buffer. Again, this can be done on every pixel simultaneously.
Use the contents of the buffer to cull pixels from the original texture.
The advantage to this approach, mathematically, is that it cuts the number of sample operations from n^2 to 2n, although it requires a buffer of equal size to the source (if you're already performing a copy, that can be used as the buffer; you just can't modify the original source for step 2). In order to keep memory use at 2n, you can perform steps 2 and 3 together (this is a bit tricky and not entirely pleasant); if memory isn't an issue, you can spend 3n on two buffers (source, hblur, vblur).
Because each operation is working in complete isolation from an immutable source, you can perform the filter on every pixel simultaneously if you have enough cores. Or, in a more realistic scenario, you can take advantage of paging and caching to load and process a single column or row. This is convenient when working with odd strides, padding at the end of a row, etc. The second round of samples (vertical) may screw with your cache, but at the very worst, one round will be cache-friendly and you've cut the per-pixel work from quadratic to linear in the kernel width.
Now, I've yet to touch on the case of storing data in bits specifically. That does make things slightly more complicated, but not terribly much so. Assuming you can use a rolling window, something like:
d = s[x-1] + s[x] + s[x+1]
works. Interestingly, if you were to rotate the image 90 degrees during the output of step 1 (trivial, sample from (y,x) when reading), you can get away with loading at most two horizontally adjacent bytes for any sample, and only a single byte something like 75% of the time. This plays a little less friendly with cache during the read, but greatly simplifies the algorithm (enough that it may regain the loss).
Pseudo-code:
buffer source, dest, vbuf, hbuf;
for_each (y, x) // Loop over each row, then each column. Generally works better wrt paging
{
hbuf(x, y) = (source(y, x-1) + source(y, x) + source(y, x+1)) / 3 // swap x and y to spin 90 degrees
}
for_each (y, x)
{
vbuf(x, 1-y) = (hbuf(y, x-1) + hbuf(y, x) + hbuf(y, x+1)) / 3 // 1-y to reverse the 90 degree spin
}
for_each (y, x)
{
dest(x, y) = threshold(vbuf(x, y)) // threshold the result of the second (vertical) pass
}
Accessing bits within the bytes (source(x, y) indicates access/sample) is relatively simple to do, but kind of a pain to write out here, so is left to the reader. The principle, particularly implemented in this fashion (with the 90 degree rotation), only requires 2 passes of n samples each, and always samples from immediately adjacent bits/bytes (never requiring you to calculate the position of the bit in the next row). All in all, it's massively faster and simpler than any alternative.
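As a sanity check on the two-pass idea, here is a small Python sketch that works on already-unpacked 0/1 pixels and leaves out the 90 degree rotation and the bit access. It assumes a box filter (all nine weights equal), which is separable; a general 3x3 kernel only splits this way if it is the outer product of a column vector and a row vector. Function names are illustrative.

```python
def box3_separable(img):
    """3x3 box sum via two 1-D passes; pixels outside the image count as 0."""
    h, w = len(img), len(img[0])
    # Pass 1: horizontal sum of three neighbours for every pixel.
    hbuf = [[(row[x - 1] if x > 0 else 0) + row[x] + (row[x + 1] if x < w - 1 else 0)
             for x in range(w)] for row in img]
    # Pass 2: vertical sum of three horizontal sums.
    zero = [0] * w
    return [[(hbuf[y - 1] if y > 0 else zero)[x]
             + hbuf[y][x]
             + (hbuf[y + 1] if y < h - 1 else zero)[x]
             for x in range(w)] for y in range(h)]

def box3_direct(img):
    """Reference: the same sum computed with up to nine samples per pixel."""
    h, w = len(img), len(img[0])
    return [[sum(img[j][i]
                 for j in range(max(0, y - 1), min(h, y + 2))
                 for i in range(max(0, x - 1), min(w, x + 2)))
             for x in range(w)] for y in range(h)]

def threshold(sums, t):
    """Set the output pixel to 1 wherever the weighted sum exceeds t."""
    return [[1 if s > t else 0 for s in row] for row in sums]

img = [[0, 1, 1, 0],
       [1, 1, 1, 0],
       [0, 1, 0, 0]]
sums = box3_separable(img)
out = threshold(sums, 4)
```

Both versions produce the same sums; the separable one just reads three samples per pixel in each pass instead of nine.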
Rather than expanding the entire image to 1 bit/byte (or 8bpp, essentially, as you noted), you can simply expand the current window - read the first byte of the first row, shift and mask, then read out the three bits you need; do the same for the other two rows. Then, for the next window, you simply discard the left column and fetch one more bit from each row. The logic and code to do this right isn't as easy as simply expanding the entire image, but it'll take a lot less memory.
As a middle ground, you could just expand the three rows you're currently working on. Probably easier to code that way.
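The three-row middle ground could be sketched like this in Python, with a `deque` holding only the rows currently needed. Same assumptions as usual for a sketch: MSB-first bits, zero padding outside the image, and made-up function names.

```python
from collections import deque
from itertools import chain

def unpack(row_bytes, width):
    """One packed row as a list of 0/1 values with a zero pad on each side."""
    return [0] + [(row_bytes[x >> 3] >> (7 - (x & 7))) & 1 for x in range(width)] + [0]

def filter_rolling(rows, width, weights, threshold):
    """Yield filtered rows (0/1 lists) while holding only three unpacked rows."""
    blank = [0] * (width + 2)
    window = deque([blank], maxlen=3)        # starts with the row "above" the image
    for packed in chain(rows, [None]):       # None marks the row "below" the image
        window.append(blank if packed is None else unpack(packed, width))
        if len(window) == 3:
            above, here, below = window
            yield [
                1 if sum(w * p for w, p in zip(
                    weights,
                    above[x - 1:x + 2] + here[x - 1:x + 2] + below[x - 1:x + 2],
                )) > threshold else 0
                for x in range(1, width + 1)
            ]

image = [b"\x00", b"\xff", b"\x00"]          # 8 pixels wide, 3 rows
result = list(filter_rolling(image, 8, [1] * 9, 2))
```

Memory use is three unpacked rows regardless of image height, at the cost of a slightly fiddlier loop.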
I had been reading a webpage on Image Compression (Lossy and Non-lossy).
Now this is my problem: I successfully built a face detection project using OpenCV, but my project guide is not satisfied. My project simply captures frames from a capture device (a webcam), passes them to a function that detects the faces in those frames, and displays the detected frames in windows.
My project guide wants me to implement some algorithm myself, either image compression or morphing, etc., and was not happy to see such heavy use of the library.
So what I would like to know - is it possible to code using C or C++ - image compression algorithms? If yes would not the code size be huge? (my project is supposed to be a minor one)
Please help me out, suppose I want to use the RLE compression using C++ how should I go about it?
You want to invent your own image compression or implement one of the standard ones?
( I assume this is for some sort of class/assignment, you wouldn't do this in the real world!)
You can compress simple images a little using something like run-length encoding, especially if you can reduce the number of colours (i.e., a cartoon or graphic), but for a real photo-style image it isn't going to work. That's why complex lossy techniques like JPEG or wavelets were invented.
It's very possible, and RLE compression is quite easy. If you want to look at a relatively straight-forward approach to RLE that won't use a lot of code, look at implementing a version of packbits.
Here's another link as well: http://michael.dipperstein.com/rle/index.html (includes an implementation with source-code for both traditional RLE and packbits)
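For a feel of how little code PackBits needs, here is a Python sketch of an encoder and decoder (Python only to keep it short; it ports to C++ almost line for line). It follows the usual PackBits convention as I understand it: a header byte of 0..127 means "copy the next header+1 bytes", and a header of 129..255 means "repeat the next byte 257-header times". Function names are my own.

```python
def packbits_encode(data):
    """Encode bytes with PackBits-style run and literal packets."""
    out = bytearray()
    i, n = 0, len(data)
    while i < n:
        # Measure the run of identical bytes starting at i (capped at 128).
        run = 1
        while i + run < n and run < 128 and data[i + run] == data[i]:
            run += 1
        if run >= 2:
            out += bytes([257 - run, data[i]])   # header is -(run - 1) as an unsigned byte
            i += run
        else:
            # Collect literals until a run of two or more starts (capped at 128).
            start = i
            i += 1
            while i < n and i - start < 128 and not (i + 1 < n and data[i] == data[i + 1]):
                i += 1
            out.append(i - start - 1)            # header 0..127: copy header + 1 bytes
            out += data[start:i]
    return bytes(out)

def packbits_decode(packed):
    """Invert packbits_encode."""
    out = bytearray()
    i = 0
    while i < len(packed):
        header = packed[i]
        i += 1
        if header < 128:                         # literal packet
            out += packed[i:i + header + 1]
            i += header + 1
        elif header > 128:                       # run packet
            out += bytes([packed[i]]) * (257 - header)
            i += 1
        # header == 128 is a no-op by convention
    return bytes(out)

data = b"\xaa" * 5 + b"\x01\x02\x03" + b"\x00" * 200
encoded = packbits_encode(data)
noisy = packbits_encode(bytes(range(256)))       # no runs at all
```

Here the 208 input bytes shrink to 10, while the run-free 256-byte input grows to 258 bytes because every 128 literals cost one extra header byte.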
BTW, keep in mind that you could, with noisy data, actually end up with more data than uncompressed using RLE schemes. For most "real-world" images, though, that have some form of low-pass filtering applied and a relatively good signal-to-noise ratio (i.e., above 40 dB), you should expect around 1.5:1 to 1.7:1 compression ratios.
Another option for lossless compression would be huffman-encoding ... that algorithm is more tolerant of noisy images, in that it generally prevents the data-expansion that could occur with those types of images when encoded with a RLE compression algorithm.
Finally, you didn't mention whether you were working with color or grayscale images ... if it's a color image, remember that you will find much greater redundancy if you compress each color-channel in a planar-color-channel image, rather than trying to compress contiguous RGB data.
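A tiny Python illustration of the planar point, using a made-up flat-coloured row of pixels: a byte-wise RLE sees a new value at every step in interleaved RGB data, but one long run per channel in planar layout.

```python
def count_runs(values):
    """Number of maximal runs of equal consecutive values."""
    runs = 0
    prev = object()                      # sentinel that equals nothing
    for v in values:
        if v != prev:
            runs += 1
            prev = v
    return runs

# Eight identical RGB pixels (illustrative values only).
pixels = [(200, 120, 30)] * 8
interleaved = [c for px in pixels for c in px]          # R G B R G B ...
planar = [px[ch] for ch in range(3) for px in pixels]   # R R R ... G G G ... B B B ...

interleaved_runs = count_runs(interleaved)
planar_runs = count_runs(planar)
```

If your RLE works on whole pixels rather than bytes, interleaving matters less, but per-channel runs can still be longer when only one channel varies.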
RLE is the best way to go here. Even the "simplest" of the other compression algorithms are non-trivial and require in-depth knowledge of color space transforms, discrete sine/cosine transforms, entropy, etc.
Back to RLE... to loop through the pixels use something like this:
#include <iostream>
#include <opencv2/opencv.hpp>

cv::Mat img = cv::imread("lenna.png");
for (int i = 0; i < img.rows; i++)
    for (int j = 0; j < img.cols; j++) {
        // Access the pixel as a cv::Vec3b; cast to int so the channels print as numbers
        const cv::Vec3b& px = img.at<cv::Vec3b>(i, j);
        std::cout << int(px[0]) << " " << int(px[1]) << " " << int(px[2]) << std::endl;
    }
Count the number of identical consecutive pixels in a row and store them in any data structure (maybe a < #Occurrences, Vec3b > tuple in a vector?). Once you have your final vector, don't forget to store the size of your image somewhere with the aforementioned vector (maybe in a simple compressedImage struct) and voilà, you just compressed an image. To store it in a file, I suggest you use Boost.Serialization or something similar.
Your final struct may look something similar to:
struct compressedImage {
    int height;
    int width;
    std::vector<std::pair<int, cv::Vec3b>> data;  // (run length, pixel value)
};
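To show the whole round trip in a few lines, here is the same idea sketched in Python rather than C++ (only for brevity; `CompressedImage` mirrors the struct above and the function names are invented). Pixels are plain (B, G, R) tuples and runs are allowed to continue across row boundaries.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

Pixel = Tuple[int, int, int]

@dataclass
class CompressedImage:
    height: int
    width: int
    data: List[Tuple[int, Pixel]] = field(default_factory=list)  # (run length, pixel)

def rle_compress(rows):
    """Run-length encode an image given as a list of rows of pixel tuples."""
    out = CompressedImage(len(rows), len(rows[0]))
    for row in rows:
        for px in row:
            if out.data and out.data[-1][1] == px:
                out.data[-1] = (out.data[-1][0] + 1, px)   # extend the current run
            else:
                out.data.append((1, px))                   # start a new run
    return out

def rle_decompress(img):
    """Expand the runs and cut the flat pixel list back into rows."""
    flat = [px for count, px in img.data for _ in range(count)]
    return [flat[y * img.width:(y + 1) * img.width] for y in range(img.height)]

black, white = (0, 0, 0), (255, 255, 255)
rows = [[black, black, black, white],
        [white, white, white, white]]
compressed = rle_compress(rows)
```

The eight pixels collapse to two runs, and the stored width and height are what let the decoder rebuild the rows.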
Happy coding!
You want to implement a compression based on colour reduction with a space-filling curve or a spatial index. A spatial index reduces the 2D complexity to a 1D complexity; it looks like a quadtree and a bit like a fractal. You want to look for Nick's Hilbert curve quadtree spatial index blog!
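In case it helps, here is a short Python sketch of the Hilbert curve mapping, using the well-known iterative formulation. It assumes a square grid whose side is a power of two; the function names are mine. Visiting pixels in this order instead of row by row keeps consecutive samples spatially adjacent, which is the property a 1D scheme such as RLE can benefit from.

```python
def hilbert_index(n, x, y):
    """Distance along the Hilbert curve of cell (x, y) in an n x n grid (n a power of two)."""
    d = 0
    s = n // 2
    while s > 0:
        rx = 1 if x & s else 0
        ry = 1 if y & s else 0
        d += s * s * ((3 * rx) ^ ry)
        # Rotate/flip the quadrant so the sub-curve has the standard orientation.
        if ry == 0:
            if rx == 1:
                x, y = n - 1 - x, n - 1 - y
            x, y = y, x
        s //= 2
    return d

def hilbert_order(n):
    """All cells of the grid sorted by their position on the curve."""
    cells = ((x, y) for y in range(n) for x in range(n))
    return sorted(cells, key=lambda p: hilbert_index(n, p[0], p[1]))

order = hilbert_order(4)
```

Each cell in `order` is a direct neighbour of the one before it, so the curve never jumps across the image the way a raster scan does at the end of a row.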
Here is another interesting RLE encoding idea: Lossless hierarchical run length encoding. Maybe that's something for you?
If you need to abstract the raster type, you can use the GDAL C++ library. Here is the list of raster formats it supports, either by default or on request:
http://gdal.org/formats_list.html