What algorithms to use for image downsizing? - c++

What algorithms to use for image downsizing?
What is faster?
What algorithm is used for image resizing (especially downsizing from a big 600x600 to a super small 6x6, for example) by such giants as Flash, the Silverlight player, and HTML5?

Bilinear is the most widely used method and can be made to run about as fast as the nearest neighbor down-sampling algorithm, which is the fastest but least accurate.
The trouble with a naive implementation of bilinear sampling is that if you use it to reduce an image by more than half, you can run into aliasing artifacts similar to what you would encounter with nearest neighbor. The solution to this is to use a pyramid-based approach. Basically, if you want to reduce 600x600 to 30x30, you first reduce to 300x300, then 150x150, then 75x75, then 38x38, and only then use bilinear to reduce to 30x30.
When reducing an image by exactly half, the bilinear sampling algorithm becomes much simpler. Basically, for each even row index i and even column index j, average the 2x2 block of pixels:
y[i/2][j/2] = (x[i][j] + x[i+1][j] + x[i][j+1] + x[i+1][j+1]) / 4;
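A minimal C++ sketch of the halving step and the pyramid loop, assuming an 8-bit grayscale image stored row-major in a small hypothetical `Gray` struct. `halve` is the 2x2 average above (with rounding to nearest added), and `pyramidReduce` keeps halving while the result is still at least as large as the target; the final bilinear step to the exact size is left to the caller. Odd sizes are truncated in this sketch, so 75 halves to 37 here rather than 38.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical 8-bit grayscale image, row-major.
struct Gray {
    int w = 0, h = 0;
    std::vector<std::uint8_t> px;
    std::uint8_t at(int x, int y) const { return px[static_cast<std::size_t>(y) * w + x]; }
};

// Halve by averaging each 2x2 block:
// y[i/2][j/2] = (x[i][j] + x[i+1][j] + x[i][j+1] + x[i+1][j+1]) / 4
// A trailing odd row/column is dropped.
Gray halve(const Gray& src) {
    Gray dst;
    dst.w = src.w / 2;
    dst.h = src.h / 2;
    dst.px.resize(static_cast<std::size_t>(dst.w) * dst.h);
    for (int y = 0; y < dst.h; ++y) {
        for (int x = 0; x < dst.w; ++x) {
            int sum = src.at(2 * x, 2 * y) + src.at(2 * x + 1, 2 * y) +
                      src.at(2 * x, 2 * y + 1) + src.at(2 * x + 1, 2 * y + 1);
            // +2 rounds to nearest instead of truncating.
            dst.px[static_cast<std::size_t>(y) * dst.w + x] = static_cast<std::uint8_t>((sum + 2) / 4);
        }
    }
    return dst;
}

// Halve repeatedly while the halved image would still be at least as large
// as the target in both dimensions. The remaining ratio is between 1x and 2x,
// which a single bilinear pass can handle without aliasing.
Gray pyramidReduce(Gray img, int targetW, int targetH) {
    while (img.w / 2 >= targetW && img.h / 2 >= targetH) {
        img = halve(img);
    }
    return img;
}
```

For color images the same averaging is applied to each channel independently.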

There is an excellent article at The Code Project showing the effects of various image filters.
For shrinking an image I suggest the bicubic algorithm; this has a natural sharpening effect, so detail in the image is retained at smaller sizes.
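To see where the sharpening comes from, here is a sketch of the cubic convolution kernel in Keys' form, with the commonly used parameter a = -0.5 (an assumption here, not necessarily what any given library uses). The kernel has small negative lobes between distances 1 and 2, so interpolating next to an edge overshoots slightly, which reads as extra crispness. `cubicKernel` and `cubicInterp` are hypothetical names.

```cpp
#include <array>
#include <cassert>
#include <cmath>

// Keys cubic convolution kernel with parameter a (a = -0.5 assumed by default).
double cubicKernel(double x, double a = -0.5) {
    x = std::fabs(x);
    if (x <= 1.0) return ((a + 2.0) * x - (a + 3.0)) * x * x + 1.0;
    if (x < 2.0) return ((a * x - 5.0 * a) * x + 8.0 * a) * x - 4.0 * a;
    return 0.0;
}

// Interpolate between s[1] and s[2] at fractional position t in [0, 1),
// weighting the four neighbours s[0..3] by the kernel of their distance.
double cubicInterp(const std::array<double, 4>& s, double t) {
    double sum = 0.0;
    for (int k = 0; k < 4; ++k) {
        // The sample position is 1 + t; neighbour k sits at position k.
        sum += s[k] * cubicKernel((1.0 + t) - k);
    }
    return sum;
}
```

For example, interpolating halfway between the two 1s in {0, 1, 1, 1} gives 1.0625 rather than 1: that small overshoot is the sharpening.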

There's one special case: downsizing JPEGs by a factor of 8 or more. A direct factor-of-8 rescale can be done on the raw JPEG data without fully decompressing it. JPEGs are stored as compressed blocks of 8x8 pixels, and the first (DC) coefficient of each block is proportional to the block's average pixel value, so no inverse DCT is needed. As a result, it typically takes more time to read the file from disk or the network than it takes to downscale it.
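The "average first" property can be checked directly. This sketch computes the forward 8x8 DCT with the normalisation from the JPEG standard and recovers the block mean from the DC coefficient alone, since F(0,0) equals 8 times the mean of the level-shifted samples. In a real file the DC value is also quantized and differentially coded, so this only shows the principle; libjpeg, for example, exposes reduced-size decoding through the `scale_num`/`scale_denom` fields of its decompression struct. The function names are hypothetical.

```cpp
#include <array>
#include <cassert>
#include <cmath>

using Block = std::array<std::array<double, 8>, 8>;

// Forward 8x8 DCT-II with JPEG normalisation:
// F(u,v) = 1/4 C(u) C(v) sum_x sum_y f(x,y) cos((2x+1)u*pi/16) cos((2y+1)v*pi/16),
// C(0) = 1/sqrt(2), C(k) = 1 otherwise. Samples are level-shifted by -128 first.
Block forwardDct8x8(const Block& pixels) {
    const double pi = std::acos(-1.0);
    Block out{};
    for (int u = 0; u < 8; ++u) {
        for (int v = 0; v < 8; ++v) {
            double sum = 0.0;
            for (int x = 0; x < 8; ++x)
                for (int y = 0; y < 8; ++y)
                    sum += (pixels[x][y] - 128.0) *
                           std::cos((2 * x + 1) * u * pi / 16.0) *
                           std::cos((2 * y + 1) * v * pi / 16.0);
            double cu = (u == 0) ? 1.0 / std::sqrt(2.0) : 1.0;
            double cv = (v == 0) ? 1.0 / std::sqrt(2.0) : 1.0;
            out[u][v] = 0.25 * cu * cv * sum;
        }
    }
    return out;
}

// F(0,0) = (1/8) * sum of level-shifted samples = 8 * their mean,
// so the block average (one pixel of a 1/8-scale image) is F(0,0)/8 + 128.
double blockMeanFromDc(const Block& coeffs) {
    return coeffs[0][0] / 8.0 + 128.0;
}
```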

Normally I would stick to a bilinear filter for scaling down. For resizing images to tiny sizes, though, you may be out of luck. Most icons are pixel-edited by hand to make them look their best.
Here is a good resource which explains the concepts quite well.

Related

What is the fastest mosaic image blending algorithm?

I'm currently working on building panoramas from a large number of mosaic images whose luminance varies among virtually all of them. Therefore, I have to devise an algorithm to blend the images (which can be assumed to be well aligned) in each panorama to remove the seams between adjacent tiles. Following Perez's paper (https://www.cs.jhu.edu/~misha/Fall07/Papers/Perez03.pdf), I've developed a program that works well, but the critical issue is that it is too computationally expensive. Blending each pair of images takes 10 to 15 minutes to solve the Poisson equation, while each tile in my panorama is 2048x2448; there are around 100 tiles in each panorama, and 4500 panoramas to build in total, so the method is definitely infeasible. Given the large amount of data, speed is virtually everything, so I've been looking for the most efficient algorithm for blending large images (gradient domain, optimal seam, etc., but not alpha blending, since it's not very effective). Any suggestions are appreciated!
What about Laplacian pyramid blending?
Look at
https://compvisionlab.wordpress.com/2013/05/13/image-blending-using-pyramid/
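A compact sketch of Laplacian pyramid blending on grayscale float images, to show why it is cheap: every level costs time proportional to its pixel count, so the whole blend is linear in image size, with no equation system to solve. To keep it short it uses a 2x2 box average for REDUCE and nearest-neighbour upsampling for EXPAND instead of the Gaussian filter of the classic Burt and Adelson formulation, and it assumes dimensions divisible by 2^(levels-1); both are simplifications, and all names are hypothetical. The mask (1 selects image a, 0 selects image b) is itself reduced, so the transition is wide at coarse scales and narrow at fine ones, which is what hides the seam.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical float grayscale image, row-major.
struct Img {
    int w = 0, h = 0;
    std::vector<float> px;
    Img() = default;
    Img(int w_, int h_, float v = 0.f) : w(w_), h(h_), px(static_cast<std::size_t>(w_) * h_, v) {}
    float& at(int x, int y) { return px[static_cast<std::size_t>(y) * w + x]; }
    float at(int x, int y) const { return px[static_cast<std::size_t>(y) * w + x]; }
};

// REDUCE: 2x2 box average (stand-in for a Gaussian low-pass + decimation).
Img reduce(const Img& s) {
    Img d(s.w / 2, s.h / 2);
    for (int y = 0; y < d.h; ++y)
        for (int x = 0; x < d.w; ++x)
            d.at(x, y) = 0.25f * (s.at(2 * x, 2 * y) + s.at(2 * x + 1, 2 * y) +
                                  s.at(2 * x, 2 * y + 1) + s.at(2 * x + 1, 2 * y + 1));
    return d;
}

// EXPAND: nearest-neighbour upsampling by 2 (also a simplification).
Img expand(const Img& s) {
    Img d(s.w * 2, s.h * 2);
    for (int y = 0; y < d.h; ++y)
        for (int x = 0; x < d.w; ++x)
            d.at(x, y) = s.at(x / 2, y / 2);
    return d;
}

// Blend a and b using mask m (same size); levels >= 1.
Img laplacianBlend(const Img& a, const Img& b, const Img& m, int levels) {
    // Gaussian pyramids of both images and of the mask.
    std::vector<Img> ga{a}, gb{b}, gm{m};
    for (int i = 1; i < levels; ++i) {
        ga.push_back(reduce(ga.back()));
        gb.push_back(reduce(gb.back()));
        gm.push_back(reduce(gm.back()));
    }
    // Coarsest level: blend the low-pass residuals directly.
    Img out(ga.back().w, ga.back().h);
    for (std::size_t k = 0; k < out.px.size(); ++k)
        out.px[k] = gm.back().px[k] * ga.back().px[k] + (1.f - gm.back().px[k]) * gb.back().px[k];
    // Finer levels: blend each Laplacian band L_i = G_i - EXPAND(G_{i+1}) and add it back.
    for (int i = levels - 2; i >= 0; --i) {
        Img ea = expand(ga[i + 1]), eb = expand(gb[i + 1]);
        Img up = expand(out);
        for (std::size_t k = 0; k < up.px.size(); ++k) {
            float la = ga[i].px[k] - ea.px[k];
            float lb = gb[i].px[k] - eb.px[k];
            float mk = gm[i].px[k];
            up.px[k] += mk * la + (1.f - mk) * lb;
        }
        out = up;
    }
    return out;
}
```

For an overlapping pair of tiles you would typically set the mask to 1 on one side of the seam and 0 on the other.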

Algorithm to zoom images clearly

I know images can be zoomed with the help of image pyramids, and I know OpenCV's pyrUp() method can zoom images. But beyond a certain extent, the image becomes blurry. For example, if we zoom a small image to 15 times its original size, it is definitely not clear.
Is there any method in OpenCV to zoom images but keep them as clear as the original? Or else, any algorithm to do this?
One thing to remember: You can't pull extra resolution out of nowhere. When you scale up an image, you can have either a blurry, smooth image, or a sharp, blocky image, or something in between. Better algorithms, which appear to perform better on specific types of subjects, make certain assumptions about the contents of the image; if those assumptions are true, they can yield higher apparent quality, but they will mess up if the assumptions prove false. There you are trading accuracy for sharpness.
There are several good algorithms out there for zooming specific types of subjects, including pixel art,
faces, or text.
More general algorithms for sharpening images include unsharp masking, edge enhancement, and others; however, all of these assume specific things about the contents of the image, for instance, that the image contains text, or that a noisy area would still be noisy (or not) at a higher resolution.
A low-resolution polka-dot pattern, or a sandy beach's gritty pattern, will not go over very well, and the computer may turn your seascape into something more reminiscent of a mosh pit. Every zoom algorithm or sharpening filter has a number of costs associated with it.
In order to select a zoom or sharpening algorithm correctly, more context, including sample images, is absolutely necessary.
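Unsharp masking makes that trade-off concrete. This sketch sharpens a row-major float grayscale image as original + amount * (original - blurred), using a 3x3 box blur with clamped edges as a stand-in for the usual Gaussian (an assumption). On a step edge it produces an undershoot on the dark side and an overshoot on the bright side, which looks crisper on text but also amplifies noise and fine texture such as sand. `unsharpMask` is a hypothetical helper.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// sharpened = original + amount * (original - blurred), 3x3 box blur, edges clamped.
std::vector<float> unsharpMask(const std::vector<float>& img, int w, int h, float amount) {
    std::vector<float> out(img.size());
    for (int y = 0; y < h; ++y) {
        for (int x = 0; x < w; ++x) {
            float sum = 0.f;
            for (int dy = -1; dy <= 1; ++dy) {
                for (int dx = -1; dx <= 1; ++dx) {
                    // Clamp neighbour coordinates to the image border.
                    int sx = std::max(0, std::min(x + dx, w - 1));
                    int sy = std::max(0, std::min(y + dy, h - 1));
                    sum += img[static_cast<std::size_t>(sy) * w + sx];
                }
            }
            float blurred = sum / 9.f;
            float orig = img[static_cast<std::size_t>(y) * w + x];
            out[static_cast<std::size_t>(y) * w + x] = orig + amount * (orig - blurred);
        }
    }
    return out;
}
```

Flat regions are left unchanged, so the filter only acts where the image already has contrast.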
OpenCV has the Super Resolution module. I haven't had a chance to try it yet so not too sure how well it works.
You should check out Super-Resolution From a Single Image:
Methods for super-resolution (SR) can be broadly classified into two families of methods: (i) The classical multi-image super-resolution (combining images obtained at subpixel misalignments), and (ii) Example-Based super-resolution (learning correspondence between low and high resolution image patches from a database). In this paper we propose a unified framework for combining these two families of methods.
You most likely want to experiment with different interpolation schemes for your images. OpenCV provides the resize function, which can be used with various interpolation schemes (see its documentation). You will likely be trading off blurriness (e.g., in bicubic or bilinear interpolation schemes) against jagged aliasing effects (for example, in nearest-neighbour interpolation). I'd recommend experimenting with the different schemes it provides to see which give you the best results.
The supported interpolation schemes are listed as:
INTER_NEAREST nearest-neighbor interpolation
INTER_LINEAR bilinear interpolation (used by default)
INTER_AREA resampling using pixel area relation. It may be the preferred method for image decimation, as it gives moire-free results. But when the image is zoomed, it is similar to the INTER_NEAREST method
INTER_CUBIC bicubic interpolation over 4x4 pixel neighborhood
INTER_LANCZOS4 Lanczos interpolation over 8x8 pixel neighborhood
Wikimedia Commons provides this nice comparison image for nearest-neighbour, bilinear, and bicubic interpolation:
You can see that you are unlikely to get the same sharpness as the original image when zoomed, but you can trade off "smoothness" for aliasing effects (i.e., jagged edges).
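A tiny 1-D illustration of that trade-off, in plain C++ rather than OpenCV: upscaling the two-sample edge {0, 10} to 8 samples with nearest-neighbour gives a hard step (0, 0, 0, 0, 10, 10, 10, 10), while linear interpolation spreads the edge over several samples (0, 0, 1.25, 3.75, 6.25, 8.75, 10, 10). `upscaleRow` is a hypothetical helper; it maps output pixel centres to input coordinates, which is one common convention (an assumption, not a statement about OpenCV's exact internals). The input row is assumed non-empty.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Upscale a non-empty row to newLen samples. Output pixel centre i maps to
// source coordinate (i + 0.5) * oldLen / newLen - 0.5, clamped to the row.
// nearest == true: blocky/aliased; nearest == false: linear, smooth/blurry.
std::vector<float> upscaleRow(const std::vector<float>& row, std::size_t newLen, bool nearest) {
    const std::size_t n = row.size();
    std::vector<float> out(newLen);
    for (std::size_t i = 0; i < newLen; ++i) {
        double src = (i + 0.5) * static_cast<double>(n) / newLen - 0.5;
        if (src < 0.0) src = 0.0;
        if (src > n - 1.0) src = n - 1.0;
        if (nearest) {
            out[i] = row[static_cast<std::size_t>(std::lround(src))];
        } else {
            std::size_t i0 = static_cast<std::size_t>(std::floor(src));
            std::size_t i1 = (i0 + 1 < n) ? i0 + 1 : i0;
            double t = src - static_cast<double>(i0);
            out[i] = static_cast<float>((1.0 - t) * row[i0] + t * row[i1]);
        }
    }
    return out;
}
```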
Take a look at quick image scaling algorithms.
First, I will discuss a simple algorithm, dubbed "smooth Bresenham" that can best be described as nearest neighbour interpolation on a zoomed grid, using a Bresenham algorithm. The algorithm is quick, it produces a quality equivalent to that of linear interpolation and it can zoom up and down, but it is only suitable for a zoom factor that is within a fairly small range. To offset this, I next develop a directional interpolation algorithm that can only magnify (scale up) and only with a factor of 2×, but that does so in a way that keeps edges sharp. This directional interpolation method is quite a bit slower than the smooth Bresenham algorithm, and it is therefore practical to cache those 2× images, once computed. Caching images with relative sizes that are powers of 2, combined with simple interpolation, is actually a third image zooming technique: MIP-mapping.
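For reference, the plain Bresenham-style line scaler that this family of methods builds on can be sketched as follows (a reconstruction for illustration, not code from the article; the article's "smooth" refinement is not shown). It is nearest-neighbour scaling, but the source pointer advances by the integer part of srcWidth/tgtWidth per output pixel while an error term accumulates the remainder, so the inner loop uses only integer additions and comparisons. After the last output pixel the pointer ends exactly one past the end of the source row. `scaleLine` is a hypothetical name.

```cpp
#include <cassert>
#include <vector>

// Nearest-neighbour scaling of one row with Bresenham-style integer stepping.
template <typename Pixel>
void scaleLine(Pixel* target, const Pixel* source, int srcWidth, int tgtWidth) {
    const int intPart = srcWidth / tgtWidth;    // whole source pixels per output pixel
    const int fractPart = srcWidth % tgtWidth;  // remainder, tracked by the error term
    int err = 0;
    for (int n = 0; n < tgtWidth; ++n) {
        *target++ = *source;
        source += intPart;
        err += fractPart;
        if (err >= tgtWidth) {  // accumulated remainder reached a full pixel
            err -= tgtWidth;
            ++source;
        }
    }
}
```

A 2-D image can be handled by scaling each row this way and choosing source rows with the same stepping.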
A related question is Image scaling and rotating in C/C++. Also, you can use CImg.
What you're asking goes beyond the physics of this universe: there are simply not enough bits in the original image to represent 15x15 times more detail. No algorithm can invent the "right information" that is not there. It can only find a suitable interpolation, but it will never increase the detail.
Despite what happens in much police fiction, getting a picture of a fingerprint on a car door handle starting from a panoramic view of a city is definitely fake.
You can easily zoom in or out of an image in OpenCV using the following two functions.
For Zoom In
pyrUp(tmp, dst, Size(tmp.cols * 2, tmp.rows * 2));
For Zoom Out
pyrDown(tmp, dst, Size(tmp.cols / 2, tmp.rows / 2));
You can get details about the method in the following link:
Image Zoom Out and Zoom In using OpenCV

Detect if images are different in real-time

I am working on a microscope that streams live images via a built-in video camera to a PC, where further image processing can be performed on the streamed image. Any processing done on the streamed image must be done in "real-time" (minimal frames dropped).
We take the average of a series of static images to counter random noise from the camera to improve the output of some of our image processing routines.
My question is: how do I know if the image is no longer static (either the sample under inspection has moved or rotated, or the camera has zoomed in or out), so I can reset the image series used for averaging?
I looked through some of the threads and found some ideas that seemed interesting:
1. Hash the images, and compare the hashes (normal hash or perceptual hash?)
2. Use normalized cross correlation (IPP has many variations; which to use?)
Note: I'm using Windows, C++ and Intel IPP. With IPP the image is a byte array (Ipp8u).
Which do you guys think is suitable for my situation (speed)?
If your camera doesn't shake, you can, as inVader said, subtract images. Then the sum of the absolute values of all pixels of the difference image is sometimes enough to tell whether the images are the same or different. However, if your noise, lighting level, etc. varies, this will not give you a good enough S/N ratio.
And in noisy conditions, normal hashes are even more useless.
The best approach would be to detect that some feature of your object has changed, like its boundary (if it's regular) or its center of mass (if it's irregular). If you have a boundary position, you'll need to analyze just one line of pixels, perpendicular to that boundary, to tell whether the boundary has moved.
The center-of-mass position may be subject to frequent false-negative responses, but adding the total mass and/or moment of inertia may help.
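A sketch of those measurements on an 8-bit frame, treating intensity as mass: total mass, center of mass, and the polar moment of inertia about that center. All three come from one pass over the pixels. The struct and function names are hypothetical, and you would still have to pick a tolerance for each quantity.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

struct Moments {
    double mass = 0.0;          // sum of intensities (zeroth moment)
    double cx = 0.0, cy = 0.0;  // center of mass
    double inertia = 0.0;       // sum of I * ((x-cx)^2 + (y-cy)^2)
};

Moments computeMoments(const std::vector<std::uint8_t>& img, int w, int h) {
    Moments m;
    double sx = 0.0, sy = 0.0, sxx = 0.0, syy = 0.0;
    for (int y = 0; y < h; ++y) {
        for (int x = 0; x < w; ++x) {
            double v = img[static_cast<std::size_t>(y) * w + x];
            m.mass += v;
            sx += v * x;
            sy += v * y;
            sxx += v * x * x;
            syy += v * y * y;
        }
    }
    if (m.mass > 0.0) {
        m.cx = sx / m.mass;
        m.cy = sy / m.mass;
        // Parallel-axis theorem: moment about the centroid from raw second moments.
        m.inertia = (sxx + syy) - m.mass * (m.cx * m.cx + m.cy * m.cy);
    }
    return m;
}
```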
If the camera shakes, you may have to align images before comparing (depending on comparison method and required accuracy, a single pixel misalignment might be huge), and that's where cross-correlation helps.
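For reference, this is the quantity normalized cross-correlation computes, sketched for two equally sized 8-bit frames at a single (zero) offset. Subtracting the means and dividing by the standard deviations makes it insensitive to global brightness and contrast changes. To align shaken frames you would evaluate it over a range of offsets and take the peak, which is what library cross-correlation routines such as IPP's compute over whole images; the `ncc` helper here is hypothetical and not an IPP function.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <vector>

// Zero-mean NCC of two non-empty, equally sized frames at zero offset.
// Returns a value in [-1, 1]; near 1 means "same content".
double ncc(const std::vector<std::uint8_t>& a, const std::vector<std::uint8_t>& b) {
    const std::size_t n = a.size();
    double meanA = 0.0, meanB = 0.0;
    for (std::size_t i = 0; i < n; ++i) { meanA += a[i]; meanB += b[i]; }
    meanA /= n;
    meanB /= n;
    double num = 0.0, varA = 0.0, varB = 0.0;
    for (std::size_t i = 0; i < n; ++i) {
        double da = a[i] - meanA, db = b[i] - meanB;
        num += da * db;
        varA += da * da;
        varB += db * db;
    }
    if (varA == 0.0 || varB == 0.0) return 0.0;  // flat frame: correlation undefined
    return num / std::sqrt(varA * varB);
}
```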
And further, you don't have to analyze every image. You can skip one, and if the next one differs, discard both of them. This gives you twice as much time to analyze each image.
And if you are averaging images, you might just define an optimal amount of images you need and compare just the first and the last image in the sequence.
So, the simplest thing to try would be to take subsequent images, subtract them from each other, and have a look at the difference. Then define some rules, including local and global thresholds for the difference, within which two images are considered equal. Simple subtraction of bitmap/array data, looking for maxima and calculating the average difference across the whole frame should be no problem to do in real time.
If there are varying light conditions or something moving in a predictable way (like a door opening and closing), then something more powerful, albeit slower, like Gaussian mixture models for background modeling, might be worth looking into. It is quite compute-intensive, but can be parallelized pretty easily.
Motion detection algorithms are what is used for this:
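A minimal sketch of those rules for two 8-bit frames of the same size: one pass computes the mean absolute difference (global) and the largest single-pixel difference (local), and the frame counts as changed if either exceeds its threshold. The threshold values are hypothetical and would have to be tuned to your camera's noise; since a single noisy pixel can trip the local maximum, in practice you might apply it to a downsampled or smoothed difference image.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstdlib>
#include <vector>

struct DiffStats {
    double meanAbsDiff = 0.0;  // global measure
    int maxAbsDiff = 0;        // local measure
};

DiffStats frameDiff(const std::vector<std::uint8_t>& a, const std::vector<std::uint8_t>& b) {
    DiffStats s;
    long long sum = 0;
    for (std::size_t i = 0; i < a.size(); ++i) {
        int d = std::abs(int(a[i]) - int(b[i]));
        sum += d;
        if (d > s.maxAbsDiff) s.maxAbsDiff = d;
    }
    if (!a.empty()) s.meanAbsDiff = static_cast<double>(sum) / a.size();
    return s;
}

// Changed if either the global or the local threshold is exceeded.
bool framesDiffer(const DiffStats& s, double globalThresh, int localThresh) {
    return s.meanAbsDiff > globalThresh || s.maxAbsDiff > localThresh;
}
```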
http://www.codeproject.com/Articles/10248/Motion-Detection-Algorithms
http://www.codeproject.com/Articles/22243/Real-Time-Object-Tracker-in-C
First of all, I would take a series of images at a low frame rate and downsample them to make them smaller: not too much, but enough to speed up the process.
Now you have several options:
You could compute the sum of absolute differences (SAD) of the two images by subtracting them, and use a threshold to decide whether the image has changed.
If you want to speed it up even further, I would suggest doing a progressive SAD using a small kernel and moving from the top of the image to the bottom. You can track the overall amount of difference during the process and stop early once you are satisfied.
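A sketch of the progressive variant with early exit, assuming the threshold is on the total SAD. For simplicity it accumulates whole rows from top to bottom instead of sliding a small kernel, and returns as soon as the running sum exceeds the threshold, so a frame that changed near the top is rejected after reading only a few rows. The function name and signature are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstdlib>
#include <vector>

// Returns true ("changed") as soon as the running SAD exceeds threshold.
// rowsChecked reports how many rows were actually read.
bool changedProgressive(const std::vector<std::uint8_t>& a, const std::vector<std::uint8_t>& b,
                        int w, int h, long long threshold, int& rowsChecked) {
    long long sad = 0;
    rowsChecked = 0;
    for (int y = 0; y < h; ++y) {
        const std::size_t row = static_cast<std::size_t>(y) * w;
        for (int x = 0; x < w; ++x)
            sad += std::abs(int(a[row + x]) - int(b[row + x]));
        rowsChecked = y + 1;
        if (sad > threshold) return true;  // early exit: already clearly different
    }
    return false;
}
```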

Why JPEG compression processes image by 8x8 blocks?

Why JPEG compression processes image by 8x8 blocks instead of applying Discrete Cosine Transform to the whole image?
8x8 was chosen after numerous experiments with other sizes.
The conclusions of experiments are:
1. Blocks larger than 8x8 make the mathematical operations (transforms, etc.) harder, are not supported by hardware, or take longer.
2. Blocks smaller than 8x8 don't carry enough information for the rest of the pipeline, which results in poor quality of the compressed image.
Because that would take "forever" to decode. I don't fully remember now, but I think you need at least as many coefficients as there are pixels in the block. If you code the whole image as a single block, I think you need to iterate through all the DCT coefficients for every pixel.
I'm not very good at big O calculations but I guess the complexity would be O("forever"). ;-)
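To put rough numbers on this, assume a direct, non-separable evaluation of the 2-D DCT with no fast algorithm: each of the B² coefficients of a BxB block needs B² multiply-adds, so an image of N pixels split into N/B² blocks costs

```latex
\text{cost}(B) \;\approx\; \underbrace{\frac{N}{B^2}}_{\text{blocks}} \cdot \underbrace{B^2 \cdot B^2}_{\text{per block}} \;=\; N B^2,
\qquad
\text{cost}_{8\times 8} = 64N,
\qquad
\text{cost}_{\text{one block}}\big|_{B^2 = N} = N^2 .
```

For a 512x512 image (N = 262144) that is roughly 1.7 × 10^7 multiply-adds with 8x8 blocks versus about 6.9 × 10^10 for a single whole-image block, 4096 times more. Separable and fast DCT algorithms reduce both figures considerably, so "forever" is an exaggeration, but the whole-image transform still costs more per pixel.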
For modern video codecs I think they've started using 16x16 blocks instead.
One good reason is that images (or at least the kind of images humans like to look at) have a high degree of information correlation locally, but not globally.
Every relatively smooth patch of skin, or piece of sky or grass or wall eventually ends in a sharp edge and is replaced by something entirely different. This means you still need a high frequency cutoff in order to represent the image adequately rather than just blur it out.
Now, because Fourier-like transforms such as the DCT "jumble" all the spatial information, you wouldn't be able to throw away any intermediate coefficients either, nor the high-frequency components "you don't like".
There are of course other ways to try to discard visual noise and reconstruct edges at the same time, by preserving high-frequency components only when needed, or to do some iterative reconstruction of the image at finer levels of detail. You might want to look into scale-space representations and wavelet transforms.

Video upsampling with C/C++

I want to upsample an array of captured (from a webcam) OpenCV images or the corresponding float arrays (pixel values don't need to be discrete integers). Unfortunately, the upsampling ratio is not always an integer, so I cannot figure out how to do it with simple linear interpolation.
Is there an easier way or a library to do this?
Well, I don't know of a library to do frame-rate scaling.
But I can tell you that the most appropriate way to do it yourself is by just dropping or doubling frames.
Blending pictures by simple linear pixel interpolation will not improve quality: playback will still look jerky, and now blurry as well.
To properly interpolate frame rates, much more complicated algorithms are needed.
Modern TVs have built-in hardware for that, and video editing software such as After Effects has functions that do it.
These algorithms are able to create in-between pictures through motion analysis. But that is beyond the scope of a small problem solution.
So either go on searching for an existing library you can use or do it by just dropping/doubling frames.
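Dropping and doubling frames with a non-integer ratio only needs a mapping from output frame index to input frame index. A minimal sketch, assuming integer frame rates (a rational rate such as 30000/1001 would need its numerator and denominator carried separately): output frame k is shown at time k / outFps, so it repeats the latest input frame at that time, floor(k * inFps / outFps). Integer arithmetic avoids accumulated drift. `frameMap` is a hypothetical helper name.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// For each output frame, the index of the input frame to show.
std::vector<std::int64_t> frameMap(std::int64_t inFps, std::int64_t outFps, std::int64_t outFrames) {
    std::vector<std::int64_t> map(static_cast<std::size_t>(outFrames));
    for (std::int64_t k = 0; k < outFrames; ++k)
        map[static_cast<std::size_t>(k)] = (k * inFps) / outFps;  // floor for non-negative values
    return map;
}
```

Converting 24 fps to 30 fps this way repeats input frames 0 and 4 within the first ten output frames; converting 30 fps to 24 fps skips input frame 4 within the first eight.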
The ImageMagick MagickWand library will resize images using proper filtering algorithms - see the MagickResizeImage() function (and use the Sinc filter).
I am not 100% familiar with video capture, so I'm not sure what you mean by "pixel values don't need to be discrete integer". Does this mean the color information per pixel may not be integers?
I am assuming that by "the upsampling ratio is not always integer", you mean that you will upsample from one resolution to another, but you might not be doubling or tripling. For example, instead of 640x480 -> 1280x960, you may be doing 640x480 -> 800x600.
A simple algorithm might be:
For each pixel in the larger grid
Scale the x/y values to lie between 0 and 1 (divide x by the larger grid's width, y by its height)
Scale the x/y values by the width/height of the smaller grid -> xSmaller, ySmaller
Determine the four pixels that surround your point, via floating-point floor/ceiling functions
Get the x/y position of the point within that rectangle, between 0 and 1 (subtract floor(xSmaller), floor(ySmaller) from xSmaller, ySmaller) -> xInterp, yInterp
Start with black, and add your four colors, each scaled by its weight: (1 - xInterp)(1 - yInterp) for the top-left, xInterp(1 - yInterp) for the top-right, (1 - xInterp)yInterp for the bottom-left, and xInterp*yInterp for the bottom-right
You can make this faster for multiple frames by creating a lookup table to map pixels -> xInterp/yInterp values
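The steps above can be sketched in C++ for a single-channel float frame. Two deliberate deviations, both assumptions: output pixel centres (rather than corners) are mapped into the source grid, which avoids a half-pixel shift, and coordinates are clamped at the edges. The `Axis` tables play the role of the lookup table from the last step: they depend only on the frame sizes, so you build them once and reuse them for every frame. All names are hypothetical.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Per-axis lookup table: the two neighbouring source indices and the weight t.
struct Axis {
    std::vector<int> i0, i1;
    std::vector<float> t;
};

Axis buildAxis(int srcLen, int dstLen) {
    Axis a;
    for (int d = 0; d < dstLen; ++d) {
        // Map output pixel centre to source coordinates; any ratio works.
        float s = (d + 0.5f) * srcLen / dstLen - 0.5f;
        if (s < 0.f) s = 0.f;
        if (s > srcLen - 1.f) s = srcLen - 1.f;
        int lo = static_cast<int>(std::floor(s));
        a.i0.push_back(lo);
        a.i1.push_back(lo + 1 < srcLen ? lo + 1 : lo);
        a.t.push_back(s - lo);
    }
    return a;
}

// src is row-major with width sw; output size comes from the tables.
std::vector<float> resizeBilinear(const std::vector<float>& src, int sw, const Axis& ax, const Axis& ay) {
    const int dw = static_cast<int>(ax.t.size()), dh = static_cast<int>(ay.t.size());
    std::vector<float> dst(static_cast<std::size_t>(dw) * dh);
    for (int y = 0; y < dh; ++y) {
        const float* r0 = &src[static_cast<std::size_t>(ay.i0[y]) * sw];
        const float* r1 = &src[static_cast<std::size_t>(ay.i1[y]) * sw];
        const float ty = ay.t[y];
        for (int x = 0; x < dw; ++x) {
            const float tx = ax.t[x];
            // Weights (1-tx)(1-ty), tx(1-ty), (1-tx)ty, tx*ty, applied as two lerps.
            float top = (1.f - tx) * r0[ax.i0[x]] + tx * r0[ax.i1[x]];
            float bot = (1.f - tx) * r1[ax.i0[x]] + tx * r1[ax.i1[x]];
            dst[static_cast<std::size_t>(y) * dw + x] = (1.f - ty) * top + ty * bot;
        }
    }
    return dst;
}
```

For 640x480 -> 800x600 you would call `buildAxis(640, 800)` and `buildAxis(480, 600)` once, then `resizeBilinear` per frame (and per channel).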
I am sure there are much better algorithms out there than linear interpolation (bilinear, and many more). This seems like the sort of thing you'd want optimized at the processor level.
Use libswscale from the ffmpeg project. It is the most optimized and supports a number of different resampling algorithms.