Algorithm to zoom images clearly - c++

I know images can be zoomed with the help of image pyramids, and I know the OpenCV pyrUp() method can zoom images. But beyond a certain point the image becomes unclear. For example, if we zoom a small image to 15 times its original size, it is definitely not clear.
Is there any method in OpenCV to zoom an image while keeping the clarity of the original? Or, failing that, any algorithm to do this?

One thing to remember: you can't pull extra resolution out of nowhere. When you scale up an image, you can have a blurry, smooth image, a sharp, blocky image, or something in between. Better algorithms, which appear to perform better on specific types of subjects, make certain assumptions about the contents of the image; if those assumptions hold, they can yield higher apparent quality, but if they prove false the result is a mess. You are trading accuracy for sharpness.
There are several good algorithms out there for zooming specific types of subjects, including pixel art, faces, or text.
More general algorithms for sharpening images include unsharp masking, edge enhancement, and others; however, all of these assume specific things about the contents of the image, for instance that the image contains text, or that a noisy area would still be noisy (or not) at a higher resolution.
A low-resolution polka-dot pattern, or a sandy beach's gritty pattern, will not go over very well, and the computer may turn your seascape into something more reminiscent of a mosh pit. Every zoom algorithm or sharpening filter has a number of costs associated with it.
In order to correctly select a zoom or sharpening algorithm, more context, including sample images, is absolutely necessary.

OpenCV has the Super Resolution module. I haven't had a chance to try it yet, so I'm not too sure how well it works.
You should check out Super-Resolution From a Single Image:
Methods for super-resolution (SR) can be broadly classified into two families of methods: (i) The classical multi-image super-resolution (combining images obtained at subpixel misalignments), and (ii) Example-Based super-resolution (learning correspondence between low and high resolution image patches from a database). In this paper we propose a unified framework for combining these two families of methods.

You most likely want to experiment with different interpolation schemes for your images. OpenCV provides the resize function, which can be used with various interpolation schemes (docs). You will likely be trading off blurriness (e.g., in bicubic or bilinear interpolation schemes) against jagged aliasing effects (for example, in nearest-neighbour interpolation). I'd recommend experimenting with the different schemes it provides and seeing which ones give you the best results.
The supported interpolation schemes are listed as:
INTER_NEAREST nearest-neighbor interpolation
INTER_LINEAR bilinear interpolation (used by default)
INTER_AREA resampling using pixel area relation. It may be the preferred method for image decimation, as it gives moire-free results. But when the image is zoomed, it is similar to the INTER_NEAREST method
INTER_CUBIC bicubic interpolation over 4x4 pixel neighborhood
INTER_LANCZOS4 Lanczos interpolation over 8x8 pixel neighborhood
Wikimedia Commons provides a nice comparison image for nearest-neighbour, bilinear, and bicubic interpolation.
You can see that you are unlikely to get the same sharpness as the original image when zoomed, but you can trade off "smoothness" for aliasing effects (i.e., jagged edges).
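If you want to try this quickly, here is a minimal sketch (the file names are placeholders) that upscales the same image with three of the flags above, so you can compare the trade-offs yourself:
cv::Mat src = cv::imread("input.png");
cv::Mat nearest, cubic, lanczos;
cv::resize(src, nearest, cv::Size(), 15, 15, cv::INTER_NEAREST);   // sharp but blocky
cv::resize(src, cubic,   cv::Size(), 15, 15, cv::INTER_CUBIC);     // smoother, some blur
cv::resize(src, lanczos, cv::Size(), 15, 15, cv::INTER_LANCZOS4);  // often a good compromise
cv::imwrite("nearest.png", nearest);
cv::imwrite("cubic.png", cubic);
cv::imwrite("lanczos.png", lanczos);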

Take a look at quick image scaling algorithms.
First, I will discuss a simple algorithm, dubbed "smooth Bresenham" that can best be described as nearest neighbour interpolation on a zoomed grid, using a Bresenham algorithm. The algorithm is quick, it produces a quality equivalent to that of linear interpolation and it can zoom up and down, but it is only suitable for a zoom factor that is within a fairly small range. To offset this, I next develop a directional interpolation algorithm that can only magnify (scale up) and only with a factor of 2×, but that does so in a way that keeps edges sharp. This directional interpolation method is quite a bit slower than the smooth Bresenham algorithm, and it is therefore practical to cache those 2× images, once computed. Caching images with relative sizes that are powers of 2, combined with simple interpolation, is actually a third image zooming technique: MIP-mapping.
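As a rough illustration of the MIP-mapping idea from that quote (the zoom factor and file name below are placeholders, not from the article): step up through power-of-two levels with pyrUp, then finish with a single plain resize.
double factor = 15.0;                         // assumed total zoom factor
cv::Mat level = cv::imread("input.png");
while (factor >= 2.0) {                       // each pyramid level doubles the size
    cv::Mat up;
    cv::pyrUp(level, up);                     // these levels are what you would cache
    level = up;
    factor /= 2.0;
}
cv::Mat zoomed;
cv::resize(level, zoomed, cv::Size(), factor, factor, cv::INTER_LINEAR);  // remaining non-power-of-2 part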
A related question is Image scaling and rotating in C/C++. Also, you can use CImg.

What you're asking for is physically impossible: there are simply not enough bits in the original image to represent 15*15 times more detail. No algorithm can invent the "right" information that is not there; it can only find a suitable interpolation, and it will never increase the detail.
Despite what happens in a lot of police fiction, getting a picture of a fingerprint on a car door handle starting from a panoramic view of a city is definitely fake.

You can easily zoom in or zoom out on an image in OpenCV using the following two functions.
For Zoom In
pyrUp(tmp, dst, Size(tmp.cols * 2, tmp.rows * 2));
For Zoom Out
pyrDown(tmp, dst, Size(tmp.cols / 2, tmp.rows / 2));
You can get details about the method in the following link:
Image Zoom Out and Zoom In using OpenCV
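For completeness, a self-contained sketch wrapping the two calls above (the file name is a placeholder); note that pyrUp/pyrDown work in factors of two, so larger zooms need repeated calls.
#include <opencv2/opencv.hpp>
int main()
{
    cv::Mat tmp = cv::imread("input.png");
    cv::Mat zoomedIn, zoomedOut;
    cv::pyrUp(tmp, zoomedIn, cv::Size(tmp.cols * 2, tmp.rows * 2));     // zoom in by 2x
    cv::pyrDown(tmp, zoomedOut, cv::Size(tmp.cols / 2, tmp.rows / 2));  // zoom out by 2x
    cv::imwrite("zoom_in.png", zoomedIn);
    cv::imwrite("zoom_out.png", zoomedOut);
    return 0;
}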

Related

How to detect the coordinates of certain points on an image

I'm using the ORB algorithm to detect and get the coordinates of the crossing of the rope shown in the image, which is represented by the red dot. I want to detect the coordinates of the four points surrounding the crossing, represented by the blue dots. All four points are the same distance from the red spot.
Any idea how to get their coordinates by making use of the red spot's coordinates?
Thank you in advance.
Although you're using ORB, you're still going to need an algorithm to segment the rope from the background, or at least some technique to identify image chunks that belong to the rope and that are equidistant from the red dot. There are a number of options to explore.
It's important to consider your lighting & imaging as separate problems to be solved if this is meant to be a real-world application. This looks a bit like a problem for a class rather than for an application you'll sell and support, but you should still consider lighting:
Will your algorithm(s) still work when light level is reduced?
How will detection be affected by changes in camera pose relative to the surface where the rope will be located?
If you'll be detecting "black" rope, will the algorithm also be required to detect rope of different colors? dirty rope? rope on different backgrounds?
Since your object of interest is rope, you have to consider a class of algorithms suitable for detection of non-rigid objects. Always consider the simplest solution first!
Connected Components
Connected components labeling is a traditional image processing algorithm and still suitable as the starting point for many applications. The last I knew, this was implemented in OpenCV as findContours(). This can also be called "blob finding" or some variant thereof.
https://en.wikipedia.org/wiki/Connected-component_labeling
https://docs.opencv.org/2.4/modules/imgproc/doc/structural_analysis_and_shape_descriptors.html?highlight=findcontours
Depending on lighting, you may have to take different steps to binarize the image before running connected components. As a start, convert the color image to grayscale, which will simplify the task significantly.
Try a manual threshold since you can quickly test a number of values to see the effect. Don't be too discouraged if the binarization isn't quite right--this can often be fixed with preprocessing.
If a range of manual thresholds works (e.g. 52 - 76 in an 8-bit grayscale range), then use an algorithm that will automatically calculate the threshold for you: Otsu, entropy-based methods, etc., will all offer comparable performance. Whichever technique works best, the code/algorithm can be tweaked further to optimize for your rope application.
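As a concrete starting point, a minimal sketch of the binarize-then-label step (assuming the rope is dark on a lighter background; the file name is a placeholder):
cv::Mat gray = cv::imread("rope.png", cv::IMREAD_GRAYSCALE);
cv::Mat binary;
// THRESH_BINARY_INV so the dark rope becomes the white foreground; Otsu picks the threshold.
cv::threshold(gray, binary, 0, 255, cv::THRESH_BINARY_INV | cv::THRESH_OTSU);
std::vector<std::vector<cv::Point>> contours;
cv::findContours(binary, contours, cv::RETR_EXTERNAL, cv::CHAIN_APPROX_SIMPLE);
// The largest contour by area is most likely the rope.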
If thresholding and binarization don't work--which for your rope application seems unlikely, at least how you've presented it--then switch to thinking in terms of gradient-based (edge-based, energy-based) techniques.
But assuming you can separate the rope from the background, you're still going to need a method to start at the red dot [within the rope] and move equal distances out to the blue points. More about that later after a discussion of other rope segmentation methods.
Note: connected components labeling can work in scenarios beyond just binarizing black & white images. If you can create a texture field or some other 2D representation of the image that makes it possible to distinguish the black rope from the relatively light background, you may be able to use a connected components algorithm. (Finding a "more complicated" or "more modern" algorithm isn't necessarily going to be the right approach.)
In a binarized image, blobs can be nested: on a white background you can have several black blobs, inside of one or more of which are white blobs, inside of which are black blobs, etc. An earlier version of OpenCV handled this reasonably well. (OpenCV is a nice starting point, and a touchpoint for many, but for a number of reasons it doesn't always compare favorably to other open source and commercial packages; popularity notwithstanding, OpenCV has some issues.)
Once you have a "blob" (a 4-connected region of pixels) in a 2D digital image, you can treat the blob as an object, at which point you have a number of options:
Edge tracing: trace around the inside and outside edges of the blob. From what I recall, OpenCV does (or at least should) have some relatively straightforward method to get the edges.
Split the blob into component blobs, each of which can be treated separately
Convert the blob to a polygon
...
A connected components algorithm should be high on the list of techniques to try if you have a non-rigid object.
Boolean Operations
Once you have the rope as a connected component (and possibly even without this), you can use boolean image operations to find the spots at the blue dots in your image:
Create a circular region in data, or even in the image
Find the intersection of the circle (an annulus) and the black region representing the rope. Using your original image, you should have four regions.
Find the center point of the intersection regions.
You could even try this without using connected components at all, but using connected components as part of the solution could make it more robust.
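A hedged sketch of that circle-intersection idea, assuming `binary` is the rope mask from the thresholding sketch above and `redDot`/`radius` come from your existing detector:
cv::Mat ring = cv::Mat::zeros(binary.size(), CV_8U);
cv::circle(ring, redDot, radius, 255, 2);          // thin circle, ~2 px wide (an annulus)
cv::Mat crossings;
cv::bitwise_and(binary, ring, crossings);          // rope AND circle: four small regions
cv::Mat labels, stats, centroids;
int n = cv::connectedComponentsWithStats(crossings, labels, stats, centroids);
for (int i = 1; i < n; ++i) {                      // label 0 is the background
    cv::Point2d p(centroids.at<double>(i, 0), centroids.at<double>(i, 1));
    std::cout << "candidate blue point: " << p << std::endl;
}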
Polygon Simplification
If you have a blob, which in your application would be a connected set of black pixels representing the rope on the floor, then you can consider converting this blob to one or more polygons for further processing. There are advantages to working with polygons.
If you consider only the outside boundary of the rope, then you can see that the set of pixels defining the boundary represents a polygon. It's a polygon with a lot of points, and not a convex polygon, but a polygon nonetheless.
To simplify the polygon, you can use an algorithm such as Ramer-Douglas-Peucker:
https://en.wikipedia.org/wiki/Ramer%E2%80%93Douglas%E2%80%93Peucker_algorithm
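In OpenCV this simplification is available as cv::approxPolyDP; a minimal sketch, assuming `contours` holds the rope boundary found earlier (the 1% tolerance is just a starting guess):
std::vector<cv::Point> simplified;
double epsilon = 0.01 * cv::arcLength(contours[0], true);  // tolerance ~1% of the perimeter
cv::approxPolyDP(contours[0], simplified, epsilon, true);  // true = treat as a closed polygon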
Once you have a simplified polygon, you can try a few techniques to render useful data from the polygon
Angle Bisector Network
Triangulation (e.g. using ear clipping)
Triangulation is typically dependent on initial conditions, so the resulting triangulation can differ for slightly different polygons (that is, slightly different outputs of the rope -> blob -> polygon -> simplified polygon chain). So in your application it might be useful to triangulate the dark rope region, and then to connect the center of one triangle to the center of the next nearest triangle. You'll also have to deal with crossings, such as the rope overlap. Ultimately this can yield a "skeletonization" of the rope. Speaking of which...
Skeletonization
If the rope problem was posed to you as a class exercise, then it may have been a prompt to try skeletonization. You can read about it here:
https://en.wikipedia.org/wiki/Topological_skeleton
Skeletonization and thinning have their own problems to solve, but you should dig into them a bit and see those problems for yourself.
The Medial Axis Transform (MAT) is a related concept. Long story there.
Edge-based techniques
There are a number of techniques to generate "edge images" based on edge strength, energy, entropy, etc. Making them robust takes a little effort. If you've had academic training in image processing you've likely heard of Harris, Sobel, Canny, and similar processing methods--none are magic bullets, but they're simple and dependable and will yield data you need.
An "edge image" consists of pixels representing the image gradient strength [and sometimes the gradient direction]. People may call this edge image something else, but it's the concept that matters.
What you then do with the edge data is another subject altogether. But one reason to think of edge images (or at least object borders) is that it reduces the amount of information your algorithm(s) will need to process.
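As a small illustration of what an "edge image" looks like in code (the Canny thresholds are placeholders to tune; `gray` is the grayscale image from earlier):
cv::Mat gx, gy, mag, edges;
cv::Sobel(gray, gx, CV_32F, 1, 0);     // horizontal gradient
cv::Sobel(gray, gy, CV_32F, 0, 1);     // vertical gradient
cv::magnitude(gx, gy, mag);            // gradient strength per pixel
cv::Canny(gray, edges, 50, 150);       // binary edge map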
Mean Shift (and related)
To get back to segmentation mentioned in the section on connected components, there are other methods for segmenting figures from a background: K-means, mean shift, and so on. You probably won't need any of those, but they're neat and worth studying.
Stroke Width Transform
This is an intriguing technique used to extract text from noisy backgrounds. Although it's intended for OCR, it could work for rope since the rope width is relatively constant, the rope shape varies, there are crossings, etc.
In short, and simplifying quite a bit, you can think of SWT as a means to find "strokes" (thick lines) by finding gradients antiparallel to each other. On either side of a stroke (or line), the edge gradient points normal to the object edge. The normal on one side of the stroke points opposite the direction of the normal on the other side of the stroke. By filtering for pixel-gradient pairs within a certain distance of each other, you can isolate certain strokes--even automatically. For your example the collection of points representing edge pairs for the rope would be much more common than other point pairs.
Non-Rigid Matching
There are techniques for matching non-rigid shapes, but they would not be worth exploring. If any of the techniques I mentioned above is unfamiliar to you, explore some of those first before you try any fancier algorithms.
CNNs, machine learning, etc.
Just don't even think of these methods as a starting point.
Other Considerations
If this were an application for industry, security, or whatnot, you'd have to determine how well your image processing worked under all environmental considerations. That's not an easy task, and can make all the difference between a setup that "works" in the lab and a setup that actually works in practice.
I hope that's of some help. Feel free to post a reply if I've confused more than helped, or if you want to explore some idea in more detail. Though I tried to touch on some common(ish) techniques, I didn't mention all the different ways of addressing this problem.
And briefly: once you have a skeleton, point network, or whatever representing a reduced data set for the rope and the red dot (the identified feature), a few techniques to find the items at the blue dots:
For a skeleton, trace along each "branch" of the rope outward from the knot until the geodesic distance or straight-line 2D distance is the distance D that you want.
To use geometry, create a circle of width 1 - 2 pixels. Find the intersection of that circle and the rope. Find the center point of the intersections of circle and rope. (Also described above.)
Good luck!

Calculating dispersion in Human Tracking

I am currently trying to track human heads from a CCTV feed. I am using colour histogram and LBP histogram comparison to check the affinity between bounding boxes. However, sometimes these are not enough.
I was reading through the paper at the following link: paper, where a dispersion metric is described. However, I still cannot clearly get it. For example, I cannot understand what p_ij refers to in the equation. Can someone kindly and clearly explain how I can find the dispersion between bounding boxes in separate frames, please?
Your assistance is much appreciated :)
This paper tackles the tracking problem using a background model, as most CCTV tracking methods do. The BG model produces a foreground mask, and the aforementioned p_ij relates to this mask after some morphology. Specifically, they try to separate foreground blobs into components, based on thresholds on allowed 'gaps' in FG mask holes. The end result of this procedure is a set of binary masks, one for each hypothesized object. These masks are then used for tracking using spatial and temporal consistency. In my opinion, this is an old fashioned way of processing video sequences, only relevant if you're limited in processing power and the scenes are not crowded.
To answer your question, if O is the mask related to one of the hypothesized objects, then p_ij is the binary pixel at location (i,j) within the mask. Thus, c_x and c_y are the center of mass of the binary shape, and the dispersion is simply the average distance from the center of mass over the shape (it is larger for larger objects). This enforces scale consistency in tracking, but in a very weak manner. You can do much better if you have a calibrated camera.
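A minimal sketch of that computation, assuming `mask` is one of the per-object binary masks (CV_8U, non-zero for object pixels):
cv::Moments m = cv::moments(mask, true);              // binary image moments
double cx = m.m10 / m.m00;                            // c_x, center of mass
double cy = m.m01 / m.m00;                            // c_y
double sum = 0.0;
int count = 0;
for (int i = 0; i < mask.rows; ++i)
    for (int j = 0; j < mask.cols; ++j)
        if (mask.at<uchar>(i, j)) {                   // p_ij is non-zero
            sum += std::sqrt((j - cx) * (j - cx) + (i - cy) * (i - cy));
            ++count;
        }
double dispersion = count > 0 ? sum / count : 0.0;    // mean distance from the centroid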

OpenCV Image stiching when camera parameters are known

We have pictures taken from a plane flying over an area with 50% overlap, and we are using the OpenCV stitching algorithm to stitch them together. This works fine for our version 1. In our next iteration we want to look into a few extra things that I could use a few comments on.
Currently the stitching algorithm estimates the camera parameters. We do have camera parameters and a lot of information available from the plane about camera angle, position (GPS), etc. Would we be able to benefit from this information, in contrast to just letting the algorithm estimate everything based on matched feature points?
These images are taken in high resolution and the algorithm takes up quite an amount of RAM at this point; not a big problem, as we just spin up large machines in the cloud. But I would like, in our next iteration, to compute the homography from downsampled images and apply it to the large images later. This will also give us more options to manipulate and visualize other information on the original images, and to go back and forth between original and stitched images.
If, as in question 1, we are going to take apart the stitching algorithm to feed in the known information, is it just a matter of using the findHomography method, or are there better alternatives for creating the homography when we actually know the plane position and angles and the camera parameters?
I have a basic understanding of OpenCV and am fine with C++ programming, so it's not a problem to write our own customized stitcher, but the theory is a bit rusty here.
Since you are using homographies to warp your imagery, I assume you are capturing areas small enough that you don't have to worry about Earth curvature effects. Also, I assume you don't use an elevation model.
Generally speaking, you will always want to tighten your (homography) model using matched image points, since your final output is a stitched image. If you have the RAM and CPU budget, you could refine your linear model using a max likelihood estimator.
Having a prior motion model (e.g. from GPS + IMU) could be used to initialize the feature search and match. With a good enough initial estimate of the apparent feature motion, you could dispense with expensive feature descriptor computation and storage, and just go with normalized cross-correlation.
If I understand correctly, the images are taken vertically and overlap by a known number of pixels. In that case calculating a homography is a bit overkill: you're just talking about a translation matrix, and using more powerful algorithms can only give you badly conditioned matrices.
In 2D, if H is a generalised homography matrix representing a perspective transformation,
H=[[a1 a2 a3] [a4 a5 a6] [a7 a8 a9]]
then the submatrices R and T represent rotation and translation, respectively, if a9==1.
R= [[a1 a2] [a4 a5]], T=[[a3] [a6]]
while [a7 a8] contains the perspective terms for each axis. (All of this is a bit approximate, since when all effects are present they will influence each other.)
So, if you know the lateral displacement, you can create a 3x3 matrix having just a3, a6 and a9=1 and pass it to cv::warpPerspective or cv::warpAffine.
As a criterion of matching correctness you can, for example, calculate a normalized difference between pixels.
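A small sketch of the translation-only case suggested above, assuming a known displacement of (tx, ty) pixels between the images (the values are placeholders):
double tx = 512.0, ty = 0.0;                 // known lateral displacement (assumed)
cv::Mat H = (cv::Mat_<double>(3, 3) << 1, 0, tx,
                                       0, 1, ty,
                                       0, 0, 1);
cv::Mat warped;
cv::warpPerspective(src, warped, H,
                    cv::Size(src.cols + (int)tx, src.rows + (int)ty));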

Which deconvolution algorithm is best suited for removing motion blur from text?

I'm using OpenCV to process pictures taken with a mobile phone. The pictures contain text, and they have small amounts of motion blur, which I need to remove.
What would be the most viable algorithm to use? So far I have tested Lucy-Richardson and Wiener deconvolution, but they did not yield satisfactory results.
Agree with @TheJuice: your problem lies in the PSF estimation. Usually, to be able to do this from a single frame, several assumptions need to be made about the factors leading to the blur (motion of the object, type of motion of the sensor, etc.).
You can find some pointers, especially on the monodimensional case, here. They use a filtering method that leaves mostly correlation from the blur, discarding spatial correlation of original image, and use this to deduce motion direction and thence the PSF. For small blurs you might be able to consider the motion as constant; otherwise you will have to use a more complex accelerated motion model.
Unfortunately, mobile phone blur is often a compound of CCD integration and non-linear motion (translation perpendicular to line of sight, yaw from wrist motion, and rotation around the wrist), so Yitzhaky and Kopeika's method will probably only yield acceptable results in a minority of cases. I know there are methods to deal with that ("depth awareness" and other) but I have never had occasion of dealing with them.
You can preview the results using photo recovery software such as Focus Magic; while they do not employ the YK estimator (motion description is left to you), the remaining workflow is necessarily very similar. If your pictures are amenable to Focus Magic recovery, then the YK method will probably work. If they are not (or not enough, or not enough of them to be worthwhile), then there's no point even trying to implement it.
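For reference, a minimal frequency-domain Wiener deconvolution sketch; the blur length, noise-to-signal ratio, and file names are assumptions, and, as discussed above, the hard part is estimating the PSF, not applying the filter.
#include <opencv2/opencv.hpp>
int main()
{
    cv::Mat img = cv::imread("blurred_text.png", cv::IMREAD_GRAYSCALE);
    img.convertTo(img, CV_32F, 1.0 / 255.0);
    // Assumed horizontal linear-motion PSF of length `len`, embedded in a
    // full-size image and circularly shifted so its centre sits at (0,0).
    const int len = 15;                       // assumed blur length in pixels
    const double nsr = 0.01;                  // assumed noise-to-signal ratio
    cv::Mat psf = cv::Mat::zeros(img.size(), CV_32F);
    for (int x = 0; x < len; ++x)
        psf.at<float>(0, (x - len / 2 + img.cols) % img.cols) = 1.0f / len;
    cv::Mat G, H;
    cv::dft(img, G, cv::DFT_COMPLEX_OUTPUT);
    cv::dft(psf, H, cv::DFT_COMPLEX_OUTPUT);
    // Wiener filter: F = conj(H) * G / (|H|^2 + nsr)
    cv::Mat num;
    cv::mulSpectrums(G, H, num, 0, true);     // G * conj(H)
    cv::Mat planes[2];
    cv::split(H, planes);
    cv::Mat denom = planes[0].mul(planes[0]) + planes[1].mul(planes[1]);
    denom += nsr;
    cv::split(num, planes);
    planes[0] /= denom;
    planes[1] /= denom;
    cv::merge(planes, 2, num);
    cv::Mat restored;
    cv::idft(num, restored, cv::DFT_REAL_OUTPUT | cv::DFT_SCALE);
    cv::normalize(restored, restored, 0, 255, cv::NORM_MINMAX);
    restored.convertTo(restored, CV_8U);
    cv::imwrite("restored.png", restored);
    return 0;
}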
Motion blur is a difficult problem to overcome. The best results are gained when
The speed of the camera relative to the scene is known
You have many pictures of the blurred object which you can correlate.
You do have one major advantage in that you are looking at text (which normally constitutes high contrast features). If you only apply deconvolution to high contrast (I know that the theory is often to exclude high contrast) areas of your image you should get results which may enable you to better recognise characters. Also a combination of sharpening/blurring filters pre/post processing may help.
I remember being impressed with this paper previously. Perhaps an adaption on their implementation would be worth a go.
I think the estimation of your point-spread function is likely to be more important than the algorithm used. It depends on the kind of motion blur you're trying to remove; linear motion is likely to be the easiest, but is unlikely to be the kind you're trying to remove: I imagine it's non-linear, caused by hand movement during the exposure.
You cannot eliminate motion blur. The information is lost forever. What you are dealing with is a CCD that is recording multiple real objects to a single pixel, smearing them together. In other words if the pixel reads 56, you cannot magically determine that the actual reading should have been 37 at time 1, and 62 at time 2, and 43 at time 3.
Another way to look at this: imagine you have 5 pictures. You then use photoshop to blend the pictures together, averaging the value of each pixel. Can you now somehow from the blended picture tell what the original 5 pictures were? No, you cannot, because you do not have the information to do that.

Target Detection - Algorithm suggestions

I am trying to do image detection in C++. I have two images:
Image Scene: 1024x786
Person: 36x49
And I need to identify this particular person from the scene. I've tried to use Correlation but the image is too noisy and therefore doesn't give correct/accurate results.
I've been thinking/researching methods that would best solve this task and these seem the most logical:
Gaussian filters
Convolution
FFT
Basically, I would like to remove the noise from the images, so that I can then use correlation to find the person more effectively.
I understand that an FFT will be hard to implement and/or may be slow especially with the size of the image I'm using.
Could anyone offer any pointers to solving this? What would the best technique/algorithm be?
In Andrew Ng's Machine Learning class we did this exact problem using neural networks and a sliding window:
train a neural network to recognize the particular feature you're looking for using data with tags for what the images are, using a 36x49 window (or whatever other size you want).
for recognizing a new image, take the 36x49 rectangle and slide it across the image, testing at each location. When you move to a new location, move the window right by a certain number of pixels, call it the jump_size (say 5 pixels). When you reach the right-hand side of the image, go back to 0 and increment the y of your window by jump_size.
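A sketch of that scan; `classifyWindow` is a hypothetical stand-in for whatever classifier you train, and the window size and jump_size follow the description above:
const int winW = 36, winH = 49, jump_size = 5;
cv::Mat display = image.clone();
for (int y = 0; y + winH <= image.rows; y += jump_size) {
    for (int x = 0; x + winW <= image.cols; x += jump_size) {
        cv::Mat window = image(cv::Rect(x, y, winW, winH));   // current window
        if (classifyWindow(window))                           // hypothetical trained classifier
            cv::rectangle(display, cv::Rect(x, y, winW, winH), cv::Scalar(0, 0, 255), 2);
    }
}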
Neural networks are good for this because the noise isn't a huge issue: you don't need to remove it. It's also good because it can recognize images similar to ones it has seen before, but are slightly different (the face is at a different angle, the lighting is slightly different, etc.).
Of course, the downside is that you need the training data to do it. If you don't have a set of pre-tagged images then you might be out of luck - although if you have a Facebook account you can probably write a script to pull all of yours and your friends' tagged photos and use that.
An FFT only makes sense when you have already sorted the image with a k-d tree or a hierarchical tree. I would suggest mapping the image's 2D RGB values to a 1D curve and reducing some complexity before doing a frequency analysis.
I do not have an exact algorithm to propose, because I have found that target detection methods depend greatly on the specific situation. Instead, I have some tips and advice. Here is what I would suggest: find a specific characteristic of your target and design your code around it.
For example, if you have access to the color image, use the fact that Wally doesn't have much green and blue color. Subtract the average of blue and green from the red channel, and you'll have a much better starting point. (Apply the same operation to both the image and the target.) This will not work, though, if the noise is color-dependent (i.e., different in each color).
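A hedged sketch of that red-emphasis transform, assuming a BGR colour image loaded with cv::imread:
std::vector<cv::Mat> ch;
cv::split(image, ch);                                   // ch[0]=B, ch[1]=G, ch[2]=R
cv::Mat avgBG, emphasis;
cv::addWeighted(ch[0], 0.5, ch[1], 0.5, 0.0, avgBG);    // (B + G) / 2
cv::subtract(ch[2], avgBG, emphasis);                   // R - (B + G)/2, saturated at 0
// Apply the same transform to the target patch before correlating.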
You could then use correlation on the transformed images with better result. The negative point of correlation is that it will work only with an exact cut-out of the first image... Not very useful if you need to find the target to help you find the target! Instead, I suppose that an averaged version of your target (a combination of many Wally pictures) would work up to some point.
My final advice: in my personal experience of working with noisy images, spectral analysis is usually a good thing, because the noise tends to contaminate only one particular scale (which would hopefully be a different scale than Wally's!). In addition, correlation is mathematically equivalent to comparing the spectral characteristics of your image and the target.