What is the fastest mosaic image blending algorithm? - computer-vision

I've recently been working on building panoramas from a large number of mosaic images whose luminance varies across virtually all of them. I therefore have to devise an algorithm to blend the images (which can be assumed to be well aligned) in each panorama and remove the seams between adjacent tiles. Following Perez's paper (https://www.cs.jhu.edu/~misha/Fall07/Papers/Perez03.pdf) I've developed a program that works well, but the critical issue is that it is far too computationally expensive: blending each pair of images takes 10-15 minutes to solve the Poisson equation. Each tile in my panorama is 2048x2448 pixels, there are around 100 tiles per panorama, and 4,500 panoramas to build in total, so the method is simply not feasible. Given the amount of data, speed is virtually everything, so I've been looking for the algorithm (gradient domain, optimal seam, etc., but not alpha blending, since it's not very effective) that is most efficient for blending large images. Any suggestions are appreciated!

What about Laplacian pyramid blending? Have a look at
https://compvisionlab.wordpress.com/2013/05/13/image-blending-using-pyramid/
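A rough sketch of the idea with OpenCV (assuming two aligned, same-sized images and a float mask in [0, 1] that selects image A on one side of the seam; function and parameter names are illustrative):

#include <opencv2/opencv.hpp>
#include <vector>

// Sketch: Laplacian pyramid blending of two aligned, same-sized colour images.
// 'mask' is a single-channel CV_32F image in [0, 1] selecting image A.
cv::Mat pyramidBlend(const cv::Mat& A, const cv::Mat& B,
                     const cv::Mat& mask, int levels = 5)
{
    cv::Mat a, b, m;
    A.convertTo(a, CV_32FC3);
    B.convertTo(b, CV_32FC3);
    cv::cvtColor(mask, m, cv::COLOR_GRAY2BGR);   // 3 channels so it can weight the images

    // Gaussian pyramids of both images and of the mask.
    std::vector<cv::Mat> gpA{a}, gpB{b}, gpM{m};
    for (int i = 0; i < levels; ++i) {
        cv::Mat da, db, dm;
        cv::pyrDown(gpA.back(), da); gpA.push_back(da);
        cv::pyrDown(gpB.back(), db); gpB.push_back(db);
        cv::pyrDown(gpM.back(), dm); gpM.push_back(dm);
    }

    // Blend each Laplacian level with the corresponding mask level.
    std::vector<cv::Mat> blended(levels + 1);
    blended[levels] = gpB[levels] + (gpA[levels] - gpB[levels]).mul(gpM[levels]);
    for (int i = levels - 1; i >= 0; --i) {
        cv::Mat upA, upB;
        cv::pyrUp(gpA[i + 1], upA, gpA[i].size());
        cv::pyrUp(gpB[i + 1], upB, gpB[i].size());
        cv::Mat lapA = gpA[i] - upA;             // Laplacian = Gaussian - expanded level
        cv::Mat lapB = gpB[i] - upB;
        blended[i] = lapB + (lapA - lapB).mul(gpM[i]);
    }

    // Collapse the blended pyramid back to a full-resolution image.
    cv::Mat result = blended[levels];
    for (int i = levels - 1; i >= 0; --i) {
        cv::Mat up;
        cv::pyrUp(result, up, blended[i].size());
        result = up + blended[i];
    }
    result.convertTo(result, CV_8UC3);
    return result;
}

The expensive parts are just a handful of pyrDown/pyrUp passes, so this is dramatically cheaper than solving a Poisson equation per tile pair.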

Related

OpenCV Image stitching when camera parameters are known

We have pictures taken from a plane flying over an area with 50% overlap, and we are using the OpenCV stitching algorithm to stitch them together. This works fine for our version 1. In our next iteration we want to look into a few extra things that I could use a few comments on.
1. Currently the stitching algorithm estimates the camera parameters. We do have the camera parameters and a lot of information available from the plane, such as camera angle and position (GPS). Would we be able to benefit from this information, in contrast to just letting the algorithm estimate everything from matched feature points?
2. These images are taken at high resolution and the algorithm uses quite a lot of RAM at this point. That is not a big problem, as we just spin up large machines in the cloud, but in the next iteration I would like to compute the homography from downsampled images and apply it to the large images later (see the sketch below). This would also give us more options to manipulate and visualize other information on the original images, and to go back and forth between the original and stitched images.
3. If, as in question 1, we are going to take the stitching algorithm apart to put in the known information, is findHomography the right method to get that transform, or are there better alternatives for creating the homography when we actually know the plane position and angles and the camera parameters?
I have a basic understanding of OpenCV and am fine with C++ programming, so it's not a problem to write our own customized stitcher, but the theory is a bit rusty here.
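For question 2, the usual relation is: if H_small maps between the downsampled images and both were downsampled by the same factor s, then H_full = S * H_small * S^-1 with S = diag(s, s, 1) maps between the full-resolution images. A minimal sketch (function and variable names are illustrative):

#include <opencv2/opencv.hpp>

// Sketch: lift a homography estimated on downsampled images to full resolution.
// 'scale' is the factor by which both images were downsampled (e.g. 4.0).
cv::Mat liftHomography(const cv::Mat& H_small, double scale)
{
    cv::Mat S = (cv::Mat_<double>(3, 3) <<
        scale, 0.0,   0.0,
        0.0,   scale, 0.0,
        0.0,   0.0,   1.0);
    return S * H_small * S.inv();   // H_full = S * H_small * S^-1
}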
Since you are using homographies to warp your imagery, I assume you are capturing areas small enough that you don't have to worry about Earth curvature effects. Also, I assume you don't use an elevation model.
Generally speaking, you will always want to tighten your (homography) model using matched image points, since your final output is a stitched image. If you have the RAM and CPU budget, you could refine your linear model using a max likelihood estimator.
Having a prior motion model (e.g. from GPS + IMU) can be used to initialize the feature search and matching. With a good enough initial estimate of the apparent feature motion, you can dispense with expensive feature descriptor computation and storage, and just go with normalized cross-correlation.
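A rough sketch of that idea (assuming 8-bit grayscale images and that the prior predicts where a patch from image A should land in image B; names and the window size are illustrative):

#include <opencv2/opencv.hpp>

// Sketch: refine the position of 'patch' (cut from image A) inside image B,
// searching only a small window around the location predicted by the GPS/IMU
// prior, using normalized cross-correlation instead of feature descriptors.
cv::Point2f refineWithNCC(const cv::Mat& imageB, const cv::Mat& patch,
                          cv::Point predicted, int searchRadius = 32)
{
    cv::Rect window(predicted.x - searchRadius, predicted.y - searchRadius,
                    patch.cols + 2 * searchRadius, patch.rows + 2 * searchRadius);
    window &= cv::Rect(0, 0, imageB.cols, imageB.rows);   // clamp to image bounds
    if (window.width < patch.cols || window.height < patch.rows)
        return cv::Point2f((float)predicted.x, (float)predicted.y);  // prediction too close to border

    cv::Mat response;
    cv::matchTemplate(imageB(window), patch, response, cv::TM_CCORR_NORMED);

    double maxVal; cv::Point maxLoc;
    cv::minMaxLoc(response, nullptr, &maxVal, nullptr, &maxLoc);

    // Top-left corner of the best match, in imageB coordinates.
    return cv::Point2f((float)(window.x + maxLoc.x), (float)(window.y + maxLoc.y));
}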
If I understand correctly, the images are taken vertically and overlap by a known number of pixels. In that case calculating a homography is a bit overkill: you're just talking about a translation matrix, and using more powerful algorithms can only give you badly conditioned matrices.
In 2D, if H is a generalised homography matrix representing a perspective transformation,
H = [[a1 a2 a3] [a4 a5 a6] [a7 a8 a9]]
then, with a9 == 1, the submatrices R and T represent the rotation/scaling part and the translation, respectively:
R = [[a1 a2] [a4 a5]], T = [[a3] [a6]]
while [a7 a8] encodes the perspective (projective) distortion. (All of this is a bit approximate, since when all effects are present they influence each other.)
So, if you know the lateral displacement, you can create a 3x3 matrix with just a3, a6 and a9 = 1 set and pass it to cv::warpPerspective (or its top 2x3 part to cv::warpAffine).
As a criterion of matching correctness you can, for example, calculate a normalized difference between pixels.
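A minimal sketch of that translation-only warp (tx and ty are the known displacements; names are illustrative):

#include <opencv2/opencv.hpp>

// Sketch: warp 'src' by a pure translation (tx, ty) using a 3x3 matrix with
// only a3, a6 set and a9 = 1, as described above.
cv::Mat translateImage(const cv::Mat& src, double tx, double ty)
{
    cv::Mat H = (cv::Mat_<double>(3, 3) <<
        1.0, 0.0, tx,
        0.0, 1.0, ty,
        0.0, 0.0, 1.0);
    cv::Mat dst;
    cv::warpPerspective(src, dst, H, src.size());
    return dst;
}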

Algorithm to zoom images clearly

I know images can be zoomed with the help of image pyramids, and that the OpenCV pyrUp() method can zoom images. But beyond a certain point the image stops being clear. For example, if we zoom a small image to 15 times its original size, it is definitely not clear.
Is there any method in OpenCV to zoom an image but keep it as clear as the original? Or any other algorithm to do this?
One thing to remember: you can't pull extra resolution out of nowhere. When you scale up an image, you can have a blurry, smooth image, a sharp, blocky image, or something in between. Better algorithms, which appear to perform better on specific types of subjects, make certain assumptions about the contents of the image; if those assumptions hold, they can yield higher apparent quality, but they will mess up if the assumptions prove false. You are trading accuracy for sharpness.
There are several good algorithms out there for zooming specific types of subjects, including pixel art, faces, or text.
More general algorithms for sharpening images include unsharp masking, edge enhancement, and others; however, all of these assume specific things about the contents of the image, for instance that the image contains text, or that a noisy area would still be noisy (or not) at a higher resolution.
A low-resolution polka-dot pattern, or a sandy beach's gritty pattern, will not go over very well, and the computer may turn your seascape into something more reminiscent of a mosh pit. Every zoom algorithm or sharpening filter has a number of costs associated with it.
In order to correctly select a zoom or sharpening algorithm, more context, including sample images, is absolutely necessary.
OpenCV has the Super Resolution module. I haven't had a chance to try it yet, so I'm not too sure how well it works.
You should check out Super-Resolution From a Single Image:
Methods for super-resolution (SR) can be broadly classified into two families of methods: (i) The classical multi-image super-resolution (combining images obtained at subpixel misalignments), and (ii) Example-Based super-resolution (learning correspondence between low and high resolution image patches from a database). In this paper we propose a unified framework for combining these two families of methods.
You most likely want to experiment with different interpolation schemes for your images. OpenCV provides the resize function, which can be used with various interpolation schemes (docs). You will likely be trading off blurriness (e.g., in the bicubic or bilinear interpolation schemes) against jagged aliasing effects (for example, in nearest-neighbour interpolation). I'd recommend experimenting with the different schemes it provides and seeing which ones give you the best results.
The supported interpolation schemes are listed as:
INTER_NEAREST - nearest-neighbor interpolation
INTER_LINEAR - bilinear interpolation (used by default)
INTER_AREA - resampling using pixel area relation. It may be the preferred method for image decimation, as it gives moiré-free results. But when the image is zoomed, it is similar to the INTER_NEAREST method.
INTER_CUBIC - bicubic interpolation over a 4x4 pixel neighborhood
INTER_LANCZOS4 - Lanczos interpolation over an 8x8 pixel neighborhood
Wikimedia Commons provides a nice comparison image of nearest-neighbour, bilinear, and bicubic interpolation.
You can see that you are unlikely to get the same sharpness as the original image when zoomed, but you can trade off "smoothness" for aliasing effects (i.e., jagged edges).
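As a quick sketch, you can write out the same image upscaled with each scheme and compare the results (the file names are illustrative):

#include <opencv2/opencv.hpp>

// Sketch: upscale the same image with several interpolation schemes so the
// trade-off between blurring and aliasing can be inspected side by side.
int main()
{
    cv::Mat src = cv::imread("input.png");
    if (src.empty()) return 1;

    cv::Size target(src.cols * 4, src.rows * 4);
    cv::Mat nearest, linear, cubic, lanczos;

    cv::resize(src, nearest, target, 0, 0, cv::INTER_NEAREST);
    cv::resize(src, linear,  target, 0, 0, cv::INTER_LINEAR);
    cv::resize(src, cubic,   target, 0, 0, cv::INTER_CUBIC);
    cv::resize(src, lanczos, target, 0, 0, cv::INTER_LANCZOS4);

    cv::imwrite("nearest.png", nearest);
    cv::imwrite("linear.png",  linear);
    cv::imwrite("cubic.png",   cubic);
    cv::imwrite("lanczos.png", lanczos);
    return 0;
}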
Take a look at quick image scaling algorithms.
First, I will discuss a simple algorithm, dubbed "smooth Bresenham" that can best be described as nearest neighbour interpolation on a zoomed grid, using a Bresenham algorithm. The algorithm is quick, it produces a quality equivalent to that of linear interpolation and it can zoom up and down, but it is only suitable for a zoom factor that is within a fairly small range. To offset this, I next develop a directional interpolation algorithm that can only magnify (scale up) and only with a factor of 2×, but that does so in a way that keeps edges sharp. This directional interpolation method is quite a bit slower than the smooth Bresenham algorithm, and it is therefore practical to cache those 2× images, once computed. Caching images with relative sizes that are powers of 2, combined with simple interpolation, is actually a third image zooming technique: MIP-mapping.
A related question is Image scaling and rotating in C/C++. Also, you can use CImg.
What you're asking goes beyond the physics of this universe: there simply aren't enough bits in the original image to represent 15x15 times more detail. No algorithm can invent the "right" information that isn't there; it can only find a suitable interpolation, and it will never increase the detail.
Despite what happens in a lot of police fiction, getting a picture of a fingerprint on a car door handle starting from a panoramic view of a city is definitely fake.
You can easily zoom in or zoom out an image in OpenCV using the following two functions.
For Zoom In
pyrUp(tmp, dst, Size(tmp.cols * 2, tmp.rows * 2));
For Zoom Out
pyrDown(tmp, dst, Size(tmp.cols / 2, tmp.rows / 2));
You can get details about the method in the following link:
Image Zoom Out and Zoom In using OpenCV

Why does JPEG compression process the image in 8x8 blocks?

Why does JPEG compression process the image in 8x8 blocks instead of applying the Discrete Cosine Transform to the whole image?
8x8 was chosen after numerous experiments with other sizes.
The conclusions of those experiments were:
1. Matrices larger than 8x8 are harder to do mathematical operations on (transforms etc.), are not well supported by hardware, or take longer.
2. Matrices smaller than 8x8 don't carry enough information to continue along the pipeline, which results in poor quality of the compressed image.
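To make the block structure concrete, here is a rough sketch of how an image is processed one 8x8 block at a time, using OpenCV's cv::dct purely for illustration (quantization and entropy coding are omitted):

#include <opencv2/opencv.hpp>

// Sketch: apply a 2D DCT independently to each 8x8 block of a grayscale
// image, as JPEG does before quantization and entropy coding.
cv::Mat blockwiseDCT(const cv::Mat& gray)
{
    cv::Mat src;
    gray.convertTo(src, CV_32F);             // cv::dct needs floating point
    cv::Mat coeffs = src.clone();

    const int B = 8;
    for (int y = 0; y + B <= src.rows; y += B) {
        for (int x = 0; x + B <= src.cols; x += B) {
            cv::Mat block;
            src(cv::Rect(x, y, B, B)).copyTo(block);   // contiguous 8x8 copy
            cv::Mat blockCoeffs;
            cv::dct(block, blockCoeffs);               // 2D DCT of one block
            blockCoeffs.copyTo(coeffs(cv::Rect(x, y, B, B)));
        }
    }
    return coeffs;
}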
Because it would take "forever" to decode. I don't remember the details, but I think you need at least as many coefficients as there are pixels in the block, and a naive inverse transform iterates over all the coefficients for every pixel; if you coded the whole image as a single block, every pixel would depend on every DCT coefficient of the whole image.
I'm not very good at big-O calculations, but I guess the complexity would be O("forever"). ;-)
For modern video codecs, I think they've started using 16x16 blocks instead.
One good reason is that images (or at least the kind of images humans like to look at) have a high degree of information correlation locally, but not globally.
Every relatively smooth patch of skin, or piece of sky or grass or wall eventually ends in a sharp edge and is replaced by something entirely different. This means you still need a high frequency cutoff in order to represent the image adequately rather than just blur it out.
Now, because Fourier-like transforms such as the DCT "jumble" all the spatial information, you wouldn't be able to throw away any intermediate coefficients either, nor the high-frequency components "you don't like".
There are of course other ways to try to discard visual noise and reconstruct edges at the same time by preserving high frequency components only when needed, or do some iterative reconstruction of the image at finer levels of detail. You might want to look into space-scale representation and wavelet transforms.

Fastest way to calculate OpenGL texture similarity/distance?

Here's what I have:
I load a texture from disk, say 256x256, say a picture of a penguin
I create another texture with same dimensions and I draw some stuff on it
I want to find the distance between the two OpenGL textures AS FAST AS POSSIBLE (the accuracy of the distance function is of LEAST concern).
Actually, they might not be textures in the first place.
I might as well compare a region of the framebuffer to the loaded texture or do some other "magic".
How to do this super-fast?
This has little to do with OpenGL, in fact; OpenGL is just a 3D rasterization "driver".
The main idea is to pick a distance/similarity algorithm, which is a tricky task. To begin with, you could use a root mean square deviation (RMSD) algorithm. It will give you a number for how far apart the pixel values are.
When you implement it on the CPU you can benchmark it for your needs and maybe port it to OpenCL. I don't think loading the "penguin" to the GPU just to compare it to other shapes you prepare there is a super-fast process by itself.
Try to be more specific with your next question, and avoid the attitude of "I have a very good microscope, how do I hit nails with it super-fast?"
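A minimal RMSD sketch with OpenCV (assuming both images have already been read back to the CPU, e.g. via glReadPixels, as same-sized, same-type buffers):

#include <opencv2/opencv.hpp>
#include <cmath>

// Sketch: root mean square deviation between two same-sized images.
// 0 means identical; larger values mean the pixel values are further apart.
double rmsd(const cv::Mat& a, const cv::Mat& b)
{
    CV_Assert(a.size() == b.size() && a.type() == b.type());
    double l2 = cv::norm(a, b, cv::NORM_L2);   // sqrt of the sum of squared differences
    return l2 / std::sqrt(static_cast<double>(a.total() * a.channels()));
}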
You could use the Pearson product-moment correlation for matching different images. You can find it in an answer of mine. Instead of matching a template smaller than the original image, you can correlate the two images directly.
In short, you take the deviation of each texel from the average, which gives you a correlation factor.
But improving that algorithm by using shaders could be a little hard. First, you have to compute the texel average: maybe some OpenGL extension (like the histogram extension) can help you with this task.
Then, you could use fragment shaders to perform the per-texel computation (the difference between each texel and the average computed previously). The average texel has to be passed as a uniform, and the result should be stored in a floating-point texture (you can render it to a framebuffer object).
Then, you would sum up all the texels of the resulting texture in order to get the correlation between the two source textures.
This may be worthwhile when the images are very big. Otherwise I think it's better to just run the algorithm on the CPU, using SIMD instruction sets (MMX, SSE, AVX).
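As a CPU-side sketch of that idea (Pearson correlation between two same-sized grayscale images; names are illustrative):

#include <opencv2/opencv.hpp>

// Sketch: Pearson product-moment correlation between two grayscale images.
// Returns a value in [-1, 1]; 1 means the images are perfectly correlated.
double pearson(const cv::Mat& a, const cv::Mat& b)
{
    CV_Assert(a.size() == b.size() && a.channels() == 1 && b.channels() == 1);
    cv::Mat fa, fb;
    a.convertTo(fa, CV_64F);
    b.convertTo(fb, CV_64F);

    cv::Scalar meanA, stdA, meanB, stdB;
    cv::meanStdDev(fa, meanA, stdA);
    cv::meanStdDev(fb, meanB, stdB);

    // Covariance = mean of the element-wise product of the deviations.
    double cov = cv::mean((fa - meanA[0]).mul(fb - meanB[0]))[0];
    return cov / (stdA[0] * stdB[0]);
}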

What algorithms to use for image downsizing?

What algorithms to use for image downsizing?
What is faster?
What algorithm is used for image resizing (especially downsizing, say from a big 600x600 down to a tiny 6x6) by such giants as Flash, Silverlight, and HTML5?
Bilinear is the most widely used method and can be made to run about as fast as the nearest neighbor down-sampling algorithm, which is the fastest but least accurate.
The trouble with a naive implementation of bilinear sampling is that if you use it to reduce an image by more than half, you can run into aliasing artifacts similar to those you would get with nearest neighbor. The solution is a pyramid-based approach: to reduce 600x600 to 30x30, you first reduce to 300x300, then 150x150, then 75x75, then 38x38, and only then use bilinear to get to 30x30.
When reducing an image by half, the bilinear sampling algorithm becomes much simpler. Basically for each alternating row and column of pixels:
y[i/2][j/2] = (x[i][j] + x[i+1][j] + x[i][j+1] + x[i+1][j+1]) / 4;
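A rough sketch of that pyramid approach, assuming an 8-bit 3-channel input (repeated 2x2 averaging as in the formula above, then one final bilinear resize; OpenCV is used only for the container and the last step):

#include <opencv2/opencv.hpp>

// Sketch: reduce 'src' to 'target' size by repeatedly halving with a 2x2
// average, then finishing with a single bilinear resize.
cv::Mat pyramidDownsize(cv::Mat src, cv::Size target)
{
    src.convertTo(src, CV_32FC3);            // work in float, assume 3 channels
    while (src.cols / 2 >= target.width && src.rows / 2 >= target.height) {
        cv::Mat half(src.rows / 2, src.cols / 2, src.type());
        for (int i = 0; i + 1 < src.rows; i += 2)
            for (int j = 0; j + 1 < src.cols; j += 2)
                half.at<cv::Vec3f>(i / 2, j / 2) =
                    (src.at<cv::Vec3f>(i, j)     + src.at<cv::Vec3f>(i + 1, j) +
                     src.at<cv::Vec3f>(i, j + 1) + src.at<cv::Vec3f>(i + 1, j + 1)) * 0.25f;
        src = half;
    }
    cv::Mat dst;
    cv::resize(src, dst, target, 0, 0, cv::INTER_LINEAR);   // final bilinear step
    dst.convertTo(dst, CV_8UC3);
    return dst;
}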
There is an excellent article at The Code Project showing the effects of various image filters.
For shrinking an image I suggest the bicubic algorithm; this has a natural sharpening effect, so detail in the image is retained at smaller sizes.
There's one special case: downsizing JPEGs by more than a factor of 8. A direct factor-of-8 rescale can be done on the raw JPEG data, without reconstructing the full-resolution image: JPEGs are stored as compressed 8x8 pixel blocks, with the average (DC) value of each block stored first. As a result, it typically takes more time to read the file from disk or the network than it takes to downscale it.
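A sketch of that trick with libjpeg, which exposes scaled decoding through the scale_num/scale_denom fields (error handling is omitted and the decoded rows are simply discarded here):

#include <cstdio>
#include <jpeglib.h>

// Sketch: decode a JPEG at 1/8 of its original size directly from the
// compressed data, without reconstructing the full-resolution image first.
bool decodeEighth(const char* path)
{
    FILE* f = std::fopen(path, "rb");
    if (!f) return false;

    jpeg_decompress_struct cinfo;
    jpeg_error_mgr jerr;
    cinfo.err = jpeg_std_error(&jerr);
    jpeg_create_decompress(&cinfo);
    jpeg_stdio_src(&cinfo, f);
    jpeg_read_header(&cinfo, TRUE);

    cinfo.scale_num = 1;                     // request 1/8 scale: essentially only
    cinfo.scale_denom = 8;                   // the DC (average) coefficient per block
    jpeg_start_decompress(&cinfo);

    // cinfo.output_width / output_height now describe the downscaled image.
    JSAMPARRAY row = (*cinfo.mem->alloc_sarray)(
        (j_common_ptr)&cinfo, JPOOL_IMAGE,
        cinfo.output_width * cinfo.output_components, 1);
    while (cinfo.output_scanline < cinfo.output_height)
        jpeg_read_scanlines(&cinfo, row, 1);   // consume (or store) each downscaled row

    jpeg_finish_decompress(&cinfo);
    jpeg_destroy_decompress(&cinfo);
    std::fclose(f);
    return true;
}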
Normally I would stick to a bilinear filter for scaling down. For resizing images to tiny sizes, though, you may be out of luck. Most icons are pixel-edited by hand to make them look their best.
Here is a good resource which explains the concepts quite well.