I am using the OpenCV library for an image processing project to detect hands. I initialized the image as an IplImage, loaded it in colour, and then converted it to HSV with cvCvtColor(imageHand, imageHand, CV_BGR2HSV);
I don't know an efficient algorithm for this, so that's my problem. Please check my code:
for (int row = 0; row < imageHand->height; row++)
{
    for (int col = 0; col < imageHand->width; col++)
    {
        // imageData is a signed char*, so cast to unsigned char before reading
        // the channels; otherwise values above 127 come back negative.
        h = (unsigned char)(imageHand->imageData[imageHand->widthStep * row + col * 3]);
        s = (unsigned char)(imageHand->imageData[imageHand->widthStep * row + col * 3 + 1]);
        v = (unsigned char)(imageHand->imageData[imageHand->widthStep * row + col * 3 + 2]);
        if (h > 85)
        {
            // Not skin: black out the pixel.
            imageHand->imageData[imageHand->widthStep * row + col * 3]     = 0;
            imageHand->imageData[imageHand->widthStep * row + col * 3 + 1] = 0;
            imageHand->imageData[imageHand->widthStep * row + col * 3 + 2] = 0;
        }
        else
        {
            // Skin candidate: mark the pixel white.
            imageHand->imageData[imageHand->widthStep * row + col * 3]     = 255;
            imageHand->imageData[imageHand->widthStep * row + col * 3 + 1] = 255;
            imageHand->imageData[imageHand->widthStep * row + col * 3 + 2] = 255;
        }
    }
}
I think the range of H to search for is > 85!?
If you know a better algorithm, then please guide me.
If you take a look at this site, Hand detection using opencv, you'll find a similar algorithm to what you're using. I would say that the easiest way of detecting a hand would be through the use of colour (i.e. skin detection). I would definitely recommend looking at the algorithm provided by that site first. There's another part that also goes into gesture recognition, if that's an eventual problem you're going to need to handle.
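As a rough illustration of the colour-based approach (not the exact code from that site), here is a minimal sketch using the OpenCV C++ API; the HSV bounds and morphology kernel size are assumptions and will need tuning for your lighting and skin tones:
#include <opencv2/opencv.hpp>

// Minimal skin-colour segmentation sketch. The HSV range below is an
// assumption (roughly hue 0-25, moderate saturation) and must be tuned.
cv::Mat detectSkin(const cv::Mat& bgr)
{
    cv::Mat hsv, mask;
    cv::cvtColor(bgr, hsv, cv::COLOR_BGR2HSV);

    // Keep pixels whose hue/saturation/value fall inside the assumed skin range.
    cv::inRange(hsv, cv::Scalar(0, 40, 60), cv::Scalar(25, 255, 255), mask);

    // Clean up small holes and speckles with a morphological open/close.
    cv::Mat kernel = cv::getStructuringElement(cv::MORPH_ELLIPSE, cv::Size(5, 5));
    cv::morphologyEx(mask, mask, cv::MORPH_OPEN, kernel);
    cv::morphologyEx(mask, mask, cv::MORPH_CLOSE, kernel);
    return mask; // 255 where the pixel looks like skin, 0 elsewhere
}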
Other possibilities include:
Background Subtraction
This is very simple and prone to breaking, especially if you're planning on the background changing. But, if you're expecting to only use it in front of, say, a white wall... this could be an easy way of going about it.
Shape Analysis
There has been some success with detecting fingertips using the Generalised Hough Transform. False positives can become a concern, however, and efficiency is a worry as well, particularly in scenes with a significant amount of background.
As Ancallan has mentioned hand detection using OpenCV above, I would like to add some more information on the topic of gesture detection. In that post the author used a method of skin colour segmentation, which gives quite good results under specific circumstances.
A newer post on hand gesture detection using OpenCV has since been published, in which the author used a Haar classifier to detect a closed palm, and the results are much more robust than the earlier ones. It needs to be pointed out, though, that the detectable gestures are somewhat limited, as one classifier only works for one gesture.
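For illustration, here is a minimal sketch of the classifier-based approach; the cascade file name "closed_palm.xml" is a hypothetical placeholder for whatever trained Haar cascade you use, and the detection parameters are only starting points:
#include <opencv2/opencv.hpp>
#include <vector>

// Sketch of closed-palm detection with a trained Haar cascade.
// "closed_palm.xml" is a placeholder; substitute your own cascade file.
std::vector<cv::Rect> detectClosedPalms(const cv::Mat& frame)
{
    static cv::CascadeClassifier cascade("closed_palm.xml");

    cv::Mat gray;
    cv::cvtColor(frame, gray, cv::COLOR_BGR2GRAY);
    cv::equalizeHist(gray, gray);

    std::vector<cv::Rect> palms;
    // Scale factor, minimum neighbours and minimum size are tuning parameters.
    cascade.detectMultiScale(gray, palms, 1.1, 3, 0, cv::Size(40, 40));
    return palms;
}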
I am drawing a graph with 2000+ points to a PDF file. The resolution of the PDF is 612 x 792. I can only draw 612 points to the PDF because the width is 612, as I am mapping 1 point to 1 pixel. How can I plot all 2000+ samples to the PDF? I am using this library: http://www.vulcanware.com/cpp_pdf/index.html.
Option 1: Scale the points, using x = (x * 612) / 2000. This will mean that if 2 points are close to each other (including having similar y) they will overwrite each other.
Option 2: Treat each point as a square; calculate floating point values for the scaled "left edge x" and "right edge x" (left_x = ((x - width/2.0) * 612.0) / 2000.0; right_x = ((x + width/2.0) * 612.0) / 2000.0;), and draw the square using anti-aliasing by calculating the "area of the destination pixel that the square overlaps" for each destination pixel the square touches. In this case you will need to do "dest_pixel = min(dest_pixel + area, 1);" to clamp pixel values when squares overlap.
Option 3: Rotate the whole thing 90 degrees so that "x axis" goes vertically down the page (and can be split across multiple pages if necessary); and if this causes a problem for y then use one of the options above for y.
Note that "option 2" can be done in both (vertical and horizontal) directions at the same time. To do this, start by determining the edges of the square, like:
left_x = point_x / MAX_SRC_X * MAX_DEST_X;
right_x = (point_x + 1) / MAX_SRC_X * MAX_DEST_X;
top_y = point_y / MAX_SRC_Y * MAX_DEST_Y;
bottom_y = (point_y + 1) / MAX_SRC_Y * MAX_DEST_Y;
Then have a "for each row that is effected" loop that calculates how much each row is effected, like:
for (int y = top_y; y < bottom_y; y++) {
    row_top = fmax(y, top_y);
    row_bottom = fmin(y + 1, bottom_y);
    row_weight = row_bottom - row_top;
Then have a similar "for each column that is affected" loop, like:
    for (int x = left_x; x < right_x; x++) {
        column_left = fmax(x, left_x);
        column_right = fmin(x + 1, right_x);
        column_weight = column_right - column_left;
Then calculate the area for the pixel, set the pixel, and complete the loops:
        dest_pixel_area = row_weight * column_weight;
        pixel[y][x].red = min(pixel[y][x].red + dest_pixel_area * red, MAX_RED);
        pixel[y][x].green = min(pixel[y][x].green + dest_pixel_area * green, MAX_GREEN);
        pixel[y][x].blue = min(pixel[y][x].blue + dest_pixel_area * blue, MAX_BLUE);
    }
}
Note: All code above is untested and simplified. It can be faster to break the loops up into "first line/column; loop for middle area only; then last line/column" to remove most of the fmin/fmax.
If you only need to do this in one direction, delete the parts for the direction you don't need and use 1.0 for the corresponding row_weight or column_weight.
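Pulling the pieces above together, a self-contained sketch of the two-dimensional version might look like this; the buffer layout, the destination size, the source range and the 0..1 colour range are assumptions made only so that the example is runnable:
#include <algorithm>
#include <cmath>
#include <vector>

// Assumed destination size and source ranges, just for this sketch.
const int DEST_W = 612, DEST_H = 792;
const double MAX_SRC_X = 2000.0, MAX_SRC_Y = 100.0; // e.g. 2000 samples, values 0..100

struct Pixel { double red = 0, green = 0, blue = 0; };
std::vector<Pixel> image(DEST_W * DEST_H);

// Splat one source point onto the destination with area-based anti-aliasing.
void plotPoint(double point_x, double point_y, double red, double green, double blue)
{
    double left_x   = point_x       / MAX_SRC_X * DEST_W;
    double right_x  = (point_x + 1) / MAX_SRC_X * DEST_W;
    double top_y    = point_y       / MAX_SRC_Y * DEST_H;
    double bottom_y = (point_y + 1) / MAX_SRC_Y * DEST_H;

    for (int y = (int)top_y; y < bottom_y && y < DEST_H; y++) {
        double row_weight = std::fmin(y + 1.0, bottom_y) - std::fmax((double)y, top_y);
        for (int x = (int)left_x; x < right_x && x < DEST_W; x++) {
            double col_weight = std::fmin(x + 1.0, right_x) - std::fmax((double)x, left_x);
            double area = row_weight * col_weight;   // fraction of this pixel covered
            Pixel& p = image[y * DEST_W + x];
            p.red   = std::min(p.red   + area * red,   1.0);
            p.green = std::min(p.green + area * green, 1.0);
            p.blue  = std::min(p.blue  + area * blue,  1.0);
        }
    }
}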
I would like to create my own nonlinear filter in OpenCV using C++, and if I understand it correctly, I can use the FilterEngine class to do so. Unfortunately, I'm not really able to follow the documentation of this class. (Link: http://docs.opencv.org/2.4/modules/imgproc/doc/filtering.html#filterengine)
Could someone be so kind as to explain the class to me in a little more detail?
I'm grateful for every input and every example you can provide me with :-)
My specific needs:
1) I would like to learn how to create my own nonlinear filters in general.
2) I would like to apply a rank-transform filter to my images:
Meaning: I have a kernel/region and I would like to flag every pixel inside that region with a one if the intensity value of that (neighbourhood) pixel is lower than the intensity of the centre pixel. Next, I want to use a simple convolution to save the sum of the transformed region, storing the value at the centre pixel. Let's look at a simple example:
100 120 200   rank-trans.   1 0 0   convolution
110 120 220      -->        1 0 0      -->       2
180 200 200                 0 0 0
P.S.: I know that I can achieve the result of 2) by combining 255 threshold operations with 255 box-filter operations, and then looping over every pixel and selecting the correct value. However, that seems quite inefficient to me ...
Code snippet [Edit]:
As I still struggle to understand the FilterEngine(), I started to write my own function for the above-described use case. I would also be happy if you could comment on it to improve its efficiency, as it is quite slow at the moment (~2 sec. for a 1080x1920 image on one CPU core).
void rankTransform(Mat& out, Mat in, int kernel_size, int borderType) {
    // Issue a warning if necessary:
    if (kernel_size >= 17) {
        std::cout << "Warning: need to change the Mat type. Unsigned short only "
                     "supports kernels up to a size of 15x15." << std::endl << std::endl;
    }

    // First: pad the image with a border:
    int border_size = (kernel_size - 1) / 2;
    Mat in_incl_border;
    copyMakeBorder(in, in_incl_border, border_size, border_size, border_size, border_size, borderType);

    // Second: loop through the image, conduct a rank transform of each
    // neighbourhood and sum over the kernel:
    for (int i = border_size; i < in.rows + border_size; ++i) {
        int x_1 = i - border_size;          // top row of the window / output row
        int x_2 = i + border_size + 1;      // one past the bottom row of the window
        for (int j = border_size; j < in.cols + border_size; ++j) {
            int y_1 = j - border_size;      // left column of the window / output column
            int y_2 = j + border_size + 1;  // one past the right column of the window
            // The comparison yields 255 for every neighbour darker than the
            // centre pixel, so dividing the sum by 255 gives the count.
            out.at<unsigned short>(x_1, y_1) = static_cast<unsigned short>(
                sum(in_incl_border(Range(x_1, x_2), Range(y_1, y_2))
                    < in_incl_border.at<unsigned short>(i, j))[0] / 255);
        }
    }
}
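One way to reduce the cost, sketched here under the assumption of a CV_16U input as above (the function name is mine): the per-window comparison in the loop allocates a temporary matrix for every pixel, and replacing it with a plain pointer-based count over the window usually helps considerably.
// Pointer-based rank transform sketch; counts neighbours darker than the centre.
void rankTransformFast(const cv::Mat& in, cv::Mat& out, int kernel_size, int borderType)
{
    int border = (kernel_size - 1) / 2;
    cv::Mat padded;
    cv::copyMakeBorder(in, padded, border, border, border, border, borderType);
    out.create(in.size(), CV_16U);

    for (int r = 0; r < in.rows; ++r) {
        unsigned short* dst = out.ptr<unsigned short>(r);
        for (int c = 0; c < in.cols; ++c) {
            unsigned short centre = padded.at<unsigned short>(r + border, c + border);
            int count = 0;
            for (int dr = 0; dr < kernel_size; ++dr) {
                // Window rows r..r+kernel_size-1 in padded coordinates.
                const unsigned short* win = padded.ptr<unsigned short>(r + dr) + c;
                for (int dc = 0; dc < kernel_size; ++dc)
                    count += (win[dc] < centre);
            }
            dst[c] = static_cast<unsigned short>(count);
        }
    }
}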
I need to filter lines of a given width in an image.
I am coding a program which will detect lines in a road image, and I found something like the code below but can't understand the logic of it. My function has to do the following:
I will pass in the image and the width of the line in terms of pixel size (e.g. 30 pixels wide), and the function will filter just those lines in the image.
I found this code:
void filterWidth(Mat quad, Mat quadDst, int tau) // tau = width of the line I want to filter
{
    int aux = 0;
    for (int j = 0; j < quad.rows; ++j)
    {
        unsigned char *ptRowSrc = quad.ptr<uchar>(j);
        unsigned char *ptRowDst = quadDst.ptr<uchar>(j);

        for (int i = tau; i < quad.cols - tau; ++i)
        {
            if (ptRowSrc[i] != 0)
            {
                aux = 2 * ptRowSrc[i];
                aux += -ptRowSrc[i - tau];
                aux += -ptRowSrc[i + tau];
                aux += -abs((int)(ptRowSrc[i - tau] - ptRowSrc[i + tau]));

                aux = (aux < 0) ? (0) : (aux);
                aux = (aux > 255) ? (255) : (aux);

                ptRowDst[i] = (unsigned char)aux;
            }
        }
    }
}
What is the mathematical explanation of that code? And how does that work?
Read up on convolution filters. This code is a particular case of a 1-dimensional convolution filter (it only convolves with other pixels on the currently processed line).
The value of aux starts as 2 * the current pixel value; then the pixels at distance tau on either side of it are subtracted from that value. Next, the absolute difference of those two pixels is also subtracted from it. Finally, it is capped to the range 0...255 before being stored in the output image.
If you have an image:
0011100
This convolution will cause the centre 1 to gain the value:
2 * 1
- 0
- 0
- abs(0 - 0)
= 2
The first '1' will become:
2 * 1
- 0
- 1
- abs(0 - 1)
= 0
And so will the third '1' (it's a mirror image).
And of course the 0 values will always stay zero or become negative, which will be capped back to 0.
This is a rather unusual filter. It takes pixel values three at a time on the same line, with a spacing of tau. Let these values be Vl, V and Vr.
The filter computes -Vl + 2V - Vr, which can be seen as a second derivative, and deducts |Vl - Vr|, which can be seen as a first derivative (also called the gradient). The second derivative gives a maximum response for a maximum configuration (Vl < V > Vr); the first derivative gives a minimum response for a symmetric configuration (Vl = Vr).
So the overall filter gives a maximum response for a symmetric maximum (like a light road on a dark background, vertical, with a width of less than 2*tau).
By rearranging the terms, you can see that the filter yields twice the smaller of the left and right gradients, V - Vl and V - Vr (clamped to zero).
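To make that rearrangement explicit (a small derivation, using the same Vl, V, Vr notation):
2V - Vl - Vr - |Vl - Vr| = (V - Vl) + (V - Vr) - |(V - Vr) - (V - Vl)| = 2 * min(V - Vl, V - Vr)
since for any two numbers a and b, a + b - |b - a| = 2 * min(a, b).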
I am fond of random generation - and random colors - so I decided to combine them both and made a simple 2D landscape generator. My idea was to make each block lighter or darker depending on how high it is (yes, the terrain is made of blocks): things nearest the top are lighter, and things towards the bottom are darker. I got it working in grayscale, but as I figured out, you cannot really take a base RGB color and just make it lighter, since the ratio between the RGB values, or anything of the sort, seems unusable for that. Solution? HSL. Or perhaps HSV; to be honest I still don't know the difference. I am referring to H 0-360 and S & V/L 0-100. Although... well, 360 = 0, so that is 360 values, but if you actually have 0-100, that is 101. Is it really 0-359 and 1-100 (or 0-99?), even though color selection editors (currently referring to GIMP... MS Paint had over 100 for saturation) allow you to input such values?
Anyhow, I found a formula for HSL->RGB conversion (here & here). As far as I know, the final formulas are the same, but nonetheless I will provide the code (note that this is from the latter easyrgb.com link):
Hue_2_RGB
float Hue_2_RGB(float v1, float v2, float vH) //Function Hue_2_RGB
{
if ( vH < 0 )
vH += 1;
if ( vH > 1 )
vH -= 1;
if ( ( 6 * vH ) < 1 )
return ( v1 + ( v2 - v1 ) * 6 * vH );
if ( ( 2 * vH ) < 1 )
return ( v2 );
if ( ( 3 * vH ) < 2 )
return ( v1 + ( v2 - v1 ) * ( ( 2 / 3 ) - vH ) * 6 );
return ( v1 );
}
and the other piece of code:
float var_1 = 0, var_2 = 0;
if (saturation == 0) //HSL from 0 to 1
{
red = luminosity * 255; //RGB results from 0 to 255
green = luminosity * 255;
blue = luminosity * 255;
}
else
{
if ( luminosity < 0.5 )
var_2 = luminosity * (1 + saturation);
else
var_2 = (luminosity + saturation) - (saturation * luminosity);
var_1 = 2 * luminosity - var_2;
red = 255 * Hue_2_RGB(var_1, var_2, hue + ( 1 / 3 ) );
green = 255 * Hue_2_RGB( var_1, var_2, hue );
blue = 255 * Hue_2_RGB( var_1, var_2, hue - ( 1 / 3 ) );
}
Sorry, not sure of a good way to fix the whitespace on those.
I replaced the H, S, L values with my own names: hue, saturation, and luminosity. I looked it back over, but unless I am missing something I replaced them correctly. The Hue_2_RGB function, though, is completely unedited besides the parts needed for C++ (e.g. variable types). I also used to have ints for everything - R, G, B, H, S, L - then it occurred to me... HSL should be floating point for the formula - or at least, it would seem it should be. So I made all the variables used (var_1, var_2, all the v's, R, G, B, hue, saturation, luminosity) floats. So I don't believe it is some sort of data-loss error here. Additionally, before entering the formula, I have hue /= 360, saturation /= 100, and luminosity /= 100. Note that before that point, I have hue = 59, saturation = 100, and luminosity = 70. I believe dividing hue by 360 is right to ensure 0-1, but trying /= 100 instead didn't fix it either.
And so, my question is: why is the formula not working? Thanks if you can help.
EDIT: If the question is not clear, please comment on it.
Your premise is wrong. You can just scale the RGB color. The Color class in Java, for example, includes methods called .darker() and .brighter(); these use a factor of 0.7, but you can use anything you want.
public Color darker() {
    return new Color(Math.max((int)(getRed()  *FACTOR), 0),
                     Math.max((int)(getGreen()*FACTOR), 0),
                     Math.max((int)(getBlue() *FACTOR), 0),
                     getAlpha());
}

public Color brighter() {
    int r = getRed();
    int g = getGreen();
    int b = getBlue();
    int alpha = getAlpha();

    /* From 2D group:
     * 1. black.brighter() should return grey
     * 2. applying brighter to blue will always return blue, brighter
     * 3. non pure color (non zero rgb) will eventually return white
     */
    int i = (int)(1.0/(1.0-FACTOR));
    if ( r == 0 && g == 0 && b == 0) {
        return new Color(i, i, i, alpha);
    }
    if ( r > 0 && r < i ) r = i;
    if ( g > 0 && g < i ) g = i;
    if ( b > 0 && b < i ) b = i;

    return new Color(Math.min((int)(r/FACTOR), 255),
                     Math.min((int)(g/FACTOR), 255),
                     Math.min((int)(b/FACTOR), 255),
                     alpha);
}
In short, multiply all three colors by the same static factor and you will keep the same ratio between the colors. It's a lossy operation and you need to be sure to clamp the colors to stay in range (which is more lossy than the rounding error).
Frankly, any conversion from RGB to HSV is just math, changing the HSV V value is just math, and changing it back is more math. You don't need any of that. You can just do the math directly: make the maximum component color greater without messing up the ratio between the colors.
--
If the question is more specific and you simply want better results, there are better ways to calculate this. Rather than statically scaling the lightness (L does not refer to luminosity), you can convert to a luma component, which is basically a weighted sum of the channels. Color science and computing deal with human observers, and those observers matter more than the actual math. To account for some of these human quirks there's a need to "fix things" to be more similar to what the average human perceives. Luma is computed as follows:
Y = 0.2126 R + 0.7152 G + 0.0722 B
This is similarly reflected in the weights 30, 59, 11, which are wrongly thought to be good color-distance weights. These weights are really each color's contribution to the human perception of brightness. For example, the brightest blue is seen by humans as pretty dark, whereas yellow (exactly opposed to blue) is seen as so damned bright that you can't even make it out against a white background. A number of colorspaces, Y'CbCr included, account for these differences in the perception of lightness by scaling. You can then change that value and it will be scaled again when you convert back.
This results in a different color, which should be closer to what humans would call a "lighter" version of the same color. There are increasingly good approximations of this human system, and using better and fancier math to account for it will typically give better results.
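As a rough sketch of that idea (assuming BT.709 weights and a simple Y'CbCr-style transform; the function name is mine, not from any particular library):
#include <algorithm>

// Hypothetical helper: lighten or darken an RGB colour (components 0..255)
// by scaling its BT.709 luma while leaving the chroma offsets untouched.
void scaleLuma(int& r, int& g, int& b, double factor)
{
    // Forward transform: luma plus two colour-difference terms.
    double y  = 0.2126 * r + 0.7152 * g + 0.0722 * b;
    double cb = (b - y) / 1.8556;
    double cr = (r - y) / 1.5748;

    // Scale only the luma.
    y *= factor;

    // Inverse transform, then clamp back into 0..255.
    double rr = y + 1.5748 * cr;
    double bb = y + 1.8556 * cb;
    double gg = (y - 0.2126 * rr - 0.0722 * bb) / 0.7152;

    r = std::min(255, std::max(0, (int)(rr + 0.5)));
    g = std::min(255, std::max(0, (int)(gg + 0.5)));
    b = std::min(255, std::max(0, (int)(bb + 0.5)));
}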
For a good overview that touches on these issues, see:
http://www.compuphase.com/cmetric.htm
I am trying to figure out how to port the del2() function in Matlab to C++.
I have a couple of masks that I am working with that are ones and zeros, so I wrote code like this:
for (size_t i = 1; i < nmax-1; i++)
{
    for (size_t j = 1; j < nmax-1; j++)
    {
        transmask[i*nmax+j] = .25*(posmask[(i+1)*nmax + j]+posmask[(i-1)*nmax+j]+posmask[i*nmax+(j+1)]+posmask[i*nmax+(j-1)]);
    }
}
to compute the interior points of the Laplacian. I think, according to the info in "doc del2" in Matlab, the border conditions just use the available info to compute, right? So I guess I just need to write cases for the border conditions at i,j = 0 and nmax.
However, I would think the values from the code I have posted here would be correct for the interior points as is, but it seems like the del2 results are different!
I dug through the del2 source, and I guess I am not enough of a Matlab wizard to figure out what is going on with some of the code for the interior computation.
You can see the code of del2 by running "edit del2" or "type del2".
Note that del2 does cubic interpolation on the boundaries.
The problem is that the line you have there:
transmask[i*nmax+j] = .25*(posmask[(i+1)*nmax + j]+posmask[(i-1)*nmax+j]+posmask[i*nmax+(j+1)]+posmask[i*nmax+(j-1)]);
isn't the discrete Laplacian at all.
What you have is (I(i+1,j) + I(i-1,j) + I(i,j+1) + I(i,j-1) ) / 4
I don't know what this mask is, but the discrete Laplacian (assuming the spacing between each pixel in each dimension is 1) is:
(-4 * I(i,j) + I(i+1,j) + I(i-1,j) + I(i,j+1) + I(i,j-1) )
So basically, you missed a term, and you don't need to divide by 4. I suggest going back and rederiving the discrete Laplacian from its definition, which is the second x derivative of the image plus the second y derivative of the image.
Edit: I see where you got the /4 from, as Matlab uses this definition for some reason (even though this isn't standard mathematically).
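To match Matlab's convention, a sketch for the interior points only (using the same posmask/transmask/nmax names as above, under the assumption of unit spacing; the boundaries would still need the cubic extrapolation mentioned in the other answer):
// Interior points of del2: L(i,j) = (sum of the 4 neighbours - 4*centre) / 4,
// i.e. the neighbour average minus the centre value.
for (size_t i = 1; i < nmax - 1; i++)
{
    for (size_t j = 1; j < nmax - 1; j++)
    {
        double neighbours = posmask[(i+1)*nmax + j] + posmask[(i-1)*nmax + j]
                          + posmask[i*nmax + (j+1)] + posmask[i*nmax + (j-1)];
        transmask[i*nmax + j] = 0.25 * (neighbours - 4.0 * posmask[i*nmax + j]);
    }
}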
I think that with the Matlab Compiler you can convert the M code into C code. Have you tried that?
I found this link where another method of converting to C is explained:
http://www.kluid.com/mlib/viewtopic.php?t=337
Good luck.