I need to know the location of black portions such as in the following image
.
So far I am able to get the coordinates for the biggest contour in the image, which is the image without the black part, by help of the code given here.
For example, coordinates for the first image (1598 x 1288):
(Please note **cv2.approxPolyDP()** outputted the coordinates as (y,x), opposite to convention)
[column# row#]
[[[ 1 358]]
[[ 1 1288]]
[[ 1598 1288]]
[[1598 1]]
[[677 1]]
[[677 358]]
In this case I can find coordinates for black patch as follows:
1)Since patch is located top left (assuming we know this), so there must be 2 pair of coordinates like [min.col.# , row#]. Here, [1 358] and [1 1288]. Take the minimum row# (here 358) from this. This will be the maximum row# for the black patch. (assuming top left is [1 1] not [0 0])
2) Search for a coordinate like [some_col.# 358] (row # extracted from above step). Here, we get [677 358]. So 677 is the maximum column# for the black patch.
3) So points for black patch become [1 1], [1 358], [677 358], [677 1]
This is of course a very clumsy way of determining the coordinates, and also requires knowledge about location (top left, top right, bottom right, bottom left) of the patch.
After determining the contour, it should be fairly easy to know the coordinates of black patch.
What can you suggest?
Any other way of finding black patch in an image, besides finding contours and bounding rectangle, as suggested in the link?
Related
I am writing a disparity matching algorithm using block matching, but I am not sure how to find the corresponding pixel values in the secondary image.
Given a square window of some size, what techniques exist to find the corresponding pixels? Do I need to use feature matching algorithms or is there a simpler method, such as summing the pixel values and determining whether they are within some threshold, or perhaps converting the pixel values to binary strings where the values are either greater than or less than the center pixel?
I'm going to assume you're talking about Stereo Disparity, in which case you will likely want to use a simple Sum of Absolute Differences (read that wiki article before you continue here). You should also read this tutorial by Chris McCormick before you read more here.
side note: SAD is not the only method, but it's really common and should solve your problem.
You already have the right idea. Make windows, move windows, sum pixels, find minimums. So I'll give you what I think might help:
To start:
If you have color images, first you will want to convert them to black and white. In python you might use a simple function like this per pixel, where x is a pixel that contains RGB.
def rgb_to_bw(x):
return int(x[0]*0.299 + x[1]*0.587 + x[2]*0.114)
You will want this to be black and white to make the SAD easier to computer. If you're wondering why you don't loose significant information from this, you might be interested in learning what a Bayer Filter is. The Bayer Filter, which is typically RGGB, also explains the multiplication ratios of the Red, Green, and Blue portions of the pixel.
Calculating the SAD:
You already mentioned that you have a window of some size, which is exactly what you want to do. Let's say this window is n x n in size. You would also have some window in your left image WL and some window in your right image WR. The idea is to find the pair that has the smallest SAD.
So, for each left window pixel pl at some location in the window (x,y) you would the absolute value of difference of the right window pixel pr also located at (x,y). you would also want some running value, which is the sum of these absolute differences. In sudo code:
SAD = 0
from x = 0 to n:
from y = 0 to n:
SAD = SAD + absolute_value|pl - pr|
After you calculate the SAD for this pair of windows, WL and WR you will want to "slide" WR to a new location and calculate another SAD. You want to find the pair of WL and WR with the smallest SAD - which you can think of as being the most similar windows. In other words, the WL and WR with the smallest SAD are "matched". When you have the minimum SAD for the current WL you will "slide" WL and repeat.
Disparity is calculated by the distance between the matched WL and WR. For visualization, you can scale this distance to be between 0-255 and output that to another image. I posted 3 images below to show you this.
Typical Results:
Left Image:
Right Image:
Calculated Disparity (from the left image):
you can get test images here: http://vision.middlebury.edu/stereo/data/scenes2003/
I am studying image processing these days and I am a beginner to the subject. I got stuck on the subject of convolution and how to implement it for images. Let me brief - there is a general formula of convolution for images like so:
x(n1,n2) represents a pixel in the output image, but I do not know what k1 and k2 stand for. Actually, this is what would like to learn. In order to implement this in some programming language, I need to know what k1 and k2 stand for. Can someone explain me this to me or lead me to an article? I would be really appreciative of any help.
Convolution in this case deals with extracting out patches of image pixels that surround a target image pixel. When you perform image convolution, you perform this with what is known as a mask or point spread function or kernel and this is usually much smaller than the size of the image itself.
For each target image pixel in the output image, you grab a neighbourhood of pixel values from the input, including the pixel that is at the same target coordinates in the input. The size of this neighbourhood coincides with exactly the same size as the mask. At that point, you rotate the mask so that it's 180 degrees, then do an element-by-element multiplication of each value in the mask with the pixel values that coincide at each location in the neighbourhood. You add all of these up, and that is the output for the target pixel in the target image.
For example, let's say I had this small image:
1 2 3 4 5
6 7 8 9 10
11 12 13 14 15
16 17 18 19 20
21 22 23 24 25
And let's say I wanted to perform an averaging within a 3 x 3 window, so my mask would all be:
[1 1 1]
1/9*[1 1 1]
[1 1 1]
To perform 2D image convolution, rotating the mask by 180 degrees still gives us the same mask, and so let's say I wanted to find the output at row 2, column 2. The 3 x 3 neighbourhood I would extract is:
1 2 3
6 7 8
11 12 13
To find the output, I would multiply each value in the mask by the same location of the neighbourhood:
[1 2 3 ] [1 1 1]
[6 7 8 ] ** (1/9)*[1 1 1]
[11 12 13] [1 1 1]
Perform a point by point multiplication and adding the values would give us:
1(1/9) + 2(1/9) + 3(1/9) + 6(1/9) + 7(1/9) + 8(1/9) + 11(1/9) + 12(1/9) + 13(1/9) = 63/9 = 7
The output at location (2,2) in the output image would be 7.
Bear in mind that I didn't tackle the case where the mask would go out of bounds. Specifically, if I tried to find the output at row 1, column 1 for example, there would be five locations where the mask would go out of bounds. There are many ways to handle this. Some people consider those pixels outside to be zero. Other people like to replicate the image border so that the border pixels are copied outside of the image dimensions. Some people like to pad the image using more sophisticated techniques like doing symmetric padding where the border pixels are a mirror reflection of what's inside the image, or a circular padding where the border pixels are copied from the other side of the image.
That's beyond the scope of this post, but in your case, start with the most simplest case where any pixels that go outside the bounds of the image when you're collecting neighbourhoods, set those to zero.
Now, what does k1 and k2 mean? k1 and k2 denote the offset with respect to the centre of the neighbourhood and mask. Notice that the n1 - k1 and n2 - k2 are important in the sum. The output position is denoted by n1 and n2. Therefore, n1 - k1 and n2 - k2 are the offsets with respect to this centre in both the horizontal sense n1 - k1 and the vertical sense n2 - k2. If we had a 3 x 3 mask, the centre would be k1 = k2 = 0. The top-left corner would be k1 = k2 = -1. The bottom right corner would be k1 = k2 = 1. The reason why they go to infinity is because we need to make sure we cover all elements in the mask. Masks are finite in size so that's just to ensure that we cover all of the mask elements. Therefore, the above sum simplifies to that point by point summation I was talking about earlier.
Here's a better illustration where the mask is a vertical Sobel filter which finds vertical gradients in an image:
Source: http://blog.saush.com/2011/04/20/edge-detection-with-the-sobel-operator-in-ruby/
As you can see, for each output pixel in the target image, we take a look at a neighbourhood of pixels in the same spatial location in the input image, and that's 3 x 3 in this case, we perform a weighted element by element sum between the mask and the neighbourhood and we set the output pixel to be the total sum of these weighted elements. Bear in mind that this example does not rotate the mask by 180 degrees, but that's what you do when it comes to convolution.
Hope this helps!
$k_1$ and $k_2$ are variables that should cover the whole definition area of your kernel.
Check out wikipedia for further description:
http://en.wikipedia.org/wiki/Kernel_%28image_processing%29
I need your expertise in this problem: I am currently processing an image and would like to extract the skeleton. So far by means of preprocessing I could reach a skeleton of 2 pixel thickness. However I would really like to minimize the size of the skeleton to end up with a thickness of 1. Therefore I propose the following algorithm,which makes sense to me. Before I start the whole code process, I would like to get rid of some suspicions.
Let me explain:
My algorithm is as follows:
Scour the image pixels (remember the ROI described acts as a sliding window)
for the first pixel (skipping boundary pixels) create a region of interest of 3x3 ( the pixel being the anchor(center) of the ROI)
does that pixel carry a maximum value ? (check this condition using pointers w.r.t its 8 neighbors)
take at the same time a second 3x3 ROI for the previous pixel's right neighbor
Is it also a maximum ?
Now create the following logic:
If the first ROI returns true and the second ROI returns true
take the center of the first ROI as true and skip 1 pixel to the right
If the first ROI return true and the other false
take the center of the first ROI true and continue to next pixel
Any other suggestions ? my idea is to get a skeleton of thickness 1.
Please anyone help me to resolve my issue. I am working on image processing based project and I stuck at a point. I got this image after some processing and for further processing i need to crop or detect only deer and remove other portion of image.
This is my Initial image:
And my result should be something like this:
It will be more better if I get only a single biggest blob in the image and save it as a image.
It looks like the deer in your image is pretty much connected and closed. What we can do is use regionprops to find all of the bounding boxes in your image. Once we do this, we can find the bounding box that gives the largest area, which will presumably be your deer. Once we find this bounding box, we can crop your image and focus on the deer entirely. As such, assuming your image is stored in im, do this:
im = im2bw(im); %// Just in case...
bound = regionprops(im, 'BoundingBox', 'Area');
%// Obtaining Bounding Box co-ordinates
bboxes = reshape([bound.BoundingBox], 4, []).';
%// Obtain the areas within each bounding box
areas = [bound.Area].';
%// Figure out which bounding box has the maximum area
[~,maxInd] = max(areas);
%// Obtain this bounding box
%// Ensure all floating point is removed
finalBB = floor(bboxes(maxInd,:));
%// Crop the image
out = im(finalBB(2):finalBB(2)+finalBB(4), finalBB(1):finalBB(1)+finalBB(3));
%// Show the images
figure;
subplot(1,2,1);
imshow(im);
subplot(1,2,2);
imshow(out);
Let's go through this code slowly. We first convert the image to binary just in case. Your image may be an RGB image with intensities of 0 or 255... I can't say for sure, so let's just do a binary conversion just in case. We then call regionprops with the BoundingBox property to find every bounding box of every unique object in the image. This bounding box is the minimum spanning bounding box to ensure that the object is contained within it. Each bounding box is a 4 element array that is structured like so:
[x y w h]
Each bounding box is delineated by its origin at the top left corner of the box, denoted as x and y, where x is the horizontal co-ordinate while y is the vertical co-ordinate. x increases positively from left to right, while y increases positively from top to bottom. w,h are the width and height of the bounding box. Because these points are in a structure, I extract them and place them into a single 1D vector, then reshape it so that it becomes a M x 4 matrix. Bear in mind that this is the only way that I know of that can extract values in arrays for each structuring element efficiently without any for loops. This will facilitate our searching to be quicker. I have also done the same for the Area property. For each bounding box we have in our image, we also have the attribute of the total area encapsulated within the bounding box.
Thanks to #Shai for the spot, we can't simply use the bounding box co-ordinates to determine whether or not something has the biggest area within it as we could have a thin diagonal line that could drive the bounding box co-ordinates to be higher. As such, we also need to rely on the total area that the object takes up within the bounding box as well. Simply put, it's just the sum of all of the pixels that are contained within the object.
Therefore, we search the entire area vector that we have created to see which has the maximum area. This corresponds to your deer. Once we find this location, extract the bounding box locations, then use this to crop the image. Bear in mind that the bounding box values may have floating point numbers. As the image co-ordinates are in integer based, we need to remove these floating point values before we decide to crop. I decided to use floor. I then write code that displays the original image, with the cropped result.
Bear in mind that this will only work if there is just one object in the image. If you want to find multiple objects, check bwboundaries in MATLAB. Otherwise, I believe this should get you started.
Just for completeness, we get the following result:
While object detection is a very general CV task, you can start with something simple if the assumptions are strong enough and you can guarantee that the input images will contain a single prominent white blob well described by a bounding box.
One very simple idea is to subdivide the picture in 3x3=9 patches, calculate the statistics for each patch and compute some objective function. In the most simple case you just do a grid search over various partitions and select that with the highest objective metric. Here's an illustration:
If every line is a parameter x_1, x_2, y_1 and y_2, then you want to optimize
either by
grid search (try all x_i, y_i in some quantization steps)
genetic-algorithm-like random search
gradient descent (move every parameter in that direction that optimizes the target function)
The target function F can be define over statistics of the patches, e.g. like this
F(9 patches) {
brightest_patch = max(patches)
others = patches \ brightest_patch
score = brightness(brightest_patch) - 1/8 * brightness(others)
return score
}
or anything else that incorporates relevant statistics of the patches as well as their size. This also allows to incorporate a "prior knowledge": if you expect the blob to appear in the middle of the image, then you can define a "regularization" term that will penalize F if the parameters x_i and y_i deviate from the expected position too much.
Thanks to all who answer and comment on my Question. With your help I got my exact solution. I am posting my final code and result for others.
img = im2bw(imread('deer.png'));
[L, num] = bwlabel(img, 4);
%%// Get biggest blob or object
count_pixels_per_obj = sum(bsxfun(#eq,L(:),1:num));
[~,ind] = max(count_pixels_per_obj);
biggest_blob = (L==ind);
%%// crop only deer
bound = regionprops(biggest_blob, 'BoundingBox');
%// Obtaining Bounding Box co-ordinates
bboxes = reshape([bound.BoundingBox], 4, []).';
%// Obtain this bounding box
%// Ensure all floating point is removed
finalBB = floor(bboxes);
out = biggest_blob(finalBB(2):finalBB(2)+finalBB(4),finalBB(1):finalBB(1)+finalBB(3));
%%// Show images
figure;
imshow(out);
I'm working on stackfile export to JSON for use in a VCS system and I've found some bizarre results from exporting/importing gradients. The dictionary says the following about the properties:
fillGradient["from"] - A coordinate specifying the starting point of
the gradient
fillGradient["to"] - A coordinate specifying the end point of the
gradient
fillGradient["via"] - A coordinate specifying the intermediate point
of the gradient (affects scaling and shearing of the gradient)
As you can see the coordinate system isn't specified. From some tests it appears the coordinates are relative to the card however this does not make sense to me as the value would change with every move. Does anyone have any further documentation on these properties and/or reasons the properties don't follow the markerPoints convention being relative to the object points where it clearly could do so.
Points locations are relative to the card as you found.
You might want to see this stack for reference: http://www.tactilemedia.com/site_files/downloads/gradient_explorer.rev
Actually, these gradient properties are relative to the topleft of the card.
This is the way that I was able to import Gradients from Adobe Ilustrator
version 7 into LiveCode.
You could check the code in this stack:
http://andregarzia.on-rev.com/alejandro/stacks/Eps_Import_V05C.zip
Some time ago when I also got irritated over the strange coordinate system I added the following behavior to my graphics:
setProp relFillGradient[pKind] pPoint
put round(item 1 of pPoint*the width of me + item 1 of the topLeft of me) into tX
put round(item 2 of pPoint*the height of me + item 2 of the topLeft of me) into tY
set the fillGradient[pKind] of me to tX,tY
end relFillGradient
getProp relFillGradient[pKind]
put the fillGradient[pKind] of me into tPoint
put (item 1 of tPoint - item 1 of the topleft of me)/the width of me into tRelX
put (item 2 of tPoint - item 2 of the topleft of me)/the height of me into tRelY
return (tRelX,tRelY)
end relFillGradient
Then to set the fillGradient you can do:
set the relFillGradient["from"] of graphic "myGraphic" to 0.1,0.3
Where the relative points is 0,0 for top left and 1,1 for bottom right.
NOTE: As you need to set the values to a rounded value you might not get the exact same value back from getProp.
If you don't want a percentage value (as I did) it gets even simpler as you can remove the multiplication and you get the benefit of not having to round your values.