How to code for a probability map in opencv? - c++

Hope I am posting in the correct forum.
I just want to sound out my ideas and approach to solving the problem. I would welcome any pointers or help (code would definitely be ideal :) ).
Problem:
I want to compute a probability distribution (over a 400 x 400 map) in order to find the spatial location (x, y) of another line (let us call it fL) based upon the probabilities in the map.
From prior processing I have a nearly horizontal line cue (call it lC) to use for calculating the probability of fL. fL is estimated to lie at distance D away from this horizontal line cue. My task is to calculate this probability map.
Approach:
1) I would take the probability distribution to be Gaussian:
P(fL | point) = exp( -(x - D)^2 / sigma^2 )
which gives the probability of the line fL given that the point on the line cue lC is at distance D away, depending on sigma (which defines how fast the probability decreases)
2) I would use a LineIterator to find every pixel that lies on the line cue lC (given that I know the start and end points of the line). Let's say I get n pixels on this line
3) For every pixel in the 400 x 400 image, I would calculate the probability using 1) as described above for all n points that I got from the line, and sum up each line point's contribution (see the sketch after this list)
4) After finishing the calculation for all pixels in the 400 x 400 image, I would normalize the probability map. This is the part I am unsure about: should I normalize by the sum of all pixel probabilities, or by the largest pixel value?
5) After this I would multiply this probability map with the other probability maps, so I would get
P(fL | Cuefromthisline, Cuefromsomeother, ...) = P(fL | Cuefromthisline) * P(fL | Cuefromsomeother) * ...
and I would set pixels with near-zero probability to 0.001
6) That outlines my approach
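For illustration, here is a minimal OpenCV sketch of steps 2) to 4), assuming the line cue lC is given by two endpoints p0 and p1 and that we normalize by the maximum value (one of the two options mentioned above); the names below (probabilityMap, cuePoints, etc.) are placeholders, not from the original post:
#include <opencv2/opencv.hpp>
#include <cmath>
#include <vector>

// Build a 400x400 probability map from a line cue (p0 -> p1), the expected
// offset D and the spread sigma. This is only a sketch of the approach
// described above; p0, p1, D and sigma are assumed inputs.
cv::Mat probabilityMap(cv::Point p0, cv::Point p1, double D, double sigma)
{
    cv::Mat probMap = cv::Mat::zeros(400, 400, CV_64F);

    // Step 2: collect every pixel on the line cue lC.
    cv::LineIterator it(probMap, p0, p1);
    std::vector<cv::Point> cuePoints;
    for (int i = 0; i < it.count; ++i, ++it)
        cuePoints.push_back(it.pos());

    // Step 3: for every pixel, sum the Gaussian contribution of every cue point.
    for (int y = 0; y < probMap.rows; ++y)
        for (int x = 0; x < probMap.cols; ++x)
        {
            double p = 0.0;
            for (const cv::Point& c : cuePoints)
            {
                double dx = x - c.x, dy = y - c.y;
                double dist = std::sqrt(dx * dx + dy * dy);
                p += std::exp(-(dist - D) * (dist - D) / (sigma * sigma));
            }
            probMap.at<double>(y, x) = p;
        }

    // Step 4: normalize, here by the maximum value so the peak becomes 1.
    double minVal, maxVal;
    cv::minMaxLoc(probMap, &minVal, &maxVal);
    if (maxVal > 0.0)
        probMap = probMap / maxVal;
    return probMap;
}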
Questions
1) Is this workable? Or is there a better method of doing this, i.e. getting the probability map?
2) How do I normalize the map: by the sum of all pixel probabilities, or by the maximum value?
Thanks in advance for reading this long post

Related

Wall detection using PCL and RANSAC

I have been working on wall detection in a PCD (Point Cloud Data) file using PCL (Point Cloud Library). The PCD file was generated with a depth camera. I found that in many similar applications, e.g. floor detection, RANSAC has been used, so I thought of applying RANSAC here as well. I have tried my best to understand RANSAC in general, but I still have certain questions pertaining to my application:
In brief, RANSAC iteratively tries to remove the outliers in the given data and generalize the inliers through a model. So, in the case of floor detection, would it consider the point clouds corresponding to other objects as just outliers and the floor as the inliers? Is the same true for walls?
As per the Plane model segmentation tutorial by PCL, RANSAC gives the coefficients of the model plane, i.e. a, b, c and d in the plane equation a*x + b*y + c*z + d = 0, through the coefficients->values vector. So I assume that in the case of wall detection it would try to give the equation of the plane corresponding to the wall. However, what if the depth camera is at the corner of a room and the top view of the walls looks like this:
             wall 1
       ______________
       |
       |
       | wall 2
       |
       |
So, in this case, what would the resultant model plane look like? Would it be kind of a hypotenuse (making a triangle)?
             wall 1
       ---------------------
                                |
                                | wall 2
                                |
                                 ----------------------
                                
         wall 3
And in this case, what would the result look like?
As per the Extracting indices from a PointCloud tutorial by PCL, the pcl::ExtractIndices filter is used to extract a subset of points from a point cloud based on the indices output by a segmentation algorithm. But what exactly does this filter do? In the case of floor detection or wall detection (assuming there is only one straight wall), RANSAC already gives the equation of one plane. So is there any need to use that filter? If yes, then why and how?
How can I detect multiple walls in the following case? Can the pcl::ExtractIndices filter do this? If yes, then how?
             wall 1
       ---------------------
                                |
                                | wall 2
                                |
                                 ----------------------
                                
         wall 3
If you think that there are better ways than using RANSAC then also please let me know.
Answers to a few of the questions you asked:
As far as I know, when using the plane model, RANSAC randomly chooses 3 points from the cloud and treats them as a plane. All the points whose perpendicular distance to this plane is smaller than the given threshold are considered inliers.
The algorithm gives back the plane which contains the most points (the plane found also depends on the iteration count you choose; if it is too low it may miss the largest plane).
In the case of walls the story is the same. You can search for planes, but you should choose the search directions well. Walls are usually perpendicular to the x-y plane, and the parameters should be set with this in mind.
Example:
#include <pcl/point_types.h>
#include <pcl/ModelCoefficients.h>
#include <pcl/filters/extract_indices.h>
#include <pcl/segmentation/sac_segmentation.h>
#include <cstdlib>

class PlaneSegment {
public:
    PlaneSegment(float x, float y, float z, float set_angle, float threshold,
                 int max_iteration, float probability)
        : coefficients(new pcl::ModelCoefficients)
    {
        axis = Eigen::Vector3f(x, y, z);
        angle = set_angle;
        seg.setAxis(axis);
        seg.setEpsAngle(angle * (3.14159265f / 180.0f));
        // SET SEGMENTATION
        set_segmentation(threshold, max_iteration, probability);
    }

    // Segment the largest plane (constrained by the axis/angle set above)
    // and return its inlier indices.
    pcl::PointIndices::Ptr segment_plane(pcl::PointCloud<pcl::PointXYZI>::Ptr cloud,
                                         pcl::PointIndices::Ptr inliers)
    {
        seg.setInputCloud(cloud);
        seg.segment(*inliers, *coefficients);
        if (inliers->indices.size() == 0)
        {
            PCL_ERROR("COULD NOT ESTIMATE PLANAR MODEL.\n");
            exit(-1);
        }
        return inliers;
    }

    // Remove the found plane's inliers from the cloud.
    pcl::PointCloud<pcl::PointXYZI>::Ptr extraction(pcl::PointCloud<pcl::PointXYZI>::Ptr cloud,
                                                    pcl::PointIndices::Ptr inliers)
    {
        extract.setInputCloud(cloud);
        extract.setIndices(inliers);
        extract.setNegative(true); // true: keep everything except the plane inliers
        extract.filter(*cloud);
        return cloud;
    }

private:
    void set_segmentation(float threshold, int max_iteration, float probability)
    {
        seg.setModelType(pcl::SACMODEL_PERPENDICULAR_PLANE);
        seg.setMethodType(pcl::SAC_RANSAC);
        // set cloud, threshold, and other parameters
        seg.setDistanceThreshold(threshold); // distance needs to be adjusted according to the object
        seg.setMaxIterations(max_iteration);
        seg.setProbability(probability);
    }

    pcl::SACSegmentation<pcl::PointXYZI> seg;
    pcl::ExtractIndices<pcl::PointXYZI> extract;
    pcl::ModelCoefficients::Ptr coefficients;
    Eigen::Vector3f axis;
    // HELPER VARIABLES
    float angle = 12.0f;
};
You can set an acceptance angle, in this case 12 degrees, and also the search direction via the axis.
For your second point:
In the case of multiple walls, it will give back the one which contains the most points. But you should be able to extract the other planes as well if needed (advice: save all planes which contain more points than a threshold you choose).
I checked on your problem; this is also a solution for it: pcl::RANSAC segmentation, get all planes in cloud?. The first comment gives a very good answer.
Third point:
Check the example code. Note that this is a class, which is why there is a constructor. The segment_plane function returns the inliers. Based on that you can call the extraction function, which removes the inliers from the cloud. This is a very simple and fast solution. You can avoid the suffering with the coefficient values. Also, if you don't want to remove the inliers, just colour them by iterating through them and setting their intensity to a chosen value.
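For illustration, a hypothetical usage of the class above might look like this (the axis, thresholds and the way the cloud is filled are assumed values, not part of the original answer):
// Look for planes whose normal is roughly the x axis (one family of walls),
// then drop their points from the cloud. All parameter values are examples.
pcl::PointCloud<pcl::PointXYZI>::Ptr cloud(new pcl::PointCloud<pcl::PointXYZI>);
// ... fill `cloud`, e.g. with pcl::io::loadPCDFile ...

PlaneSegment segmenter(1.0f, 0.0f, 0.0f, /*set_angle=*/12.0f,
                       /*threshold=*/0.05f, /*max_iteration=*/1000, /*probability=*/0.99f);

pcl::PointIndices::Ptr inliers(new pcl::PointIndices);
segmenter.segment_plane(cloud, inliers); // inliers of the largest matching plane
segmenter.extraction(cloud, inliers);    // remove that plane from the cloud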
The RANSAC algorithm can be robust, but sometimes it just does not work. It can also be slow because of the number of iterations.
There are other ways to solve this problem as well.
Just an example: consider a grid below the cloud, made of a lot of equal-sized square cells. In each cell you check the minimum and maximum point heights. Based on this you can find the ground plane (the maximum and minimum heights in a ground cell are only slightly different and close to each other, so their difference is very low) or the walls (in a wall cell you can assume the points are distributed over the cell's whole height, so the difference between the maximum and minimum values is high).
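A rough sketch of that grid idea, assuming the cloud is already loaded and that cell_size and height_span_threshold are tuning parameters (all names here are made up for illustration):
#include <pcl/point_types.h>
#include <pcl/point_cloud.h>
#include <algorithm>
#include <cmath>
#include <limits>
#include <map>
#include <utility>
#include <vector>

// Classify grid cells as ground-like or wall-like from the height span of the
// points that fall into them, as described above.
void classifyCells(const pcl::PointCloud<pcl::PointXYZI>& cloud,
                   float cell_size, float height_span_threshold)
{
    struct MinMax {
        float min_z =  std::numeric_limits<float>::max();
        float max_z = -std::numeric_limits<float>::max();
    };
    std::map<std::pair<int, int>, MinMax> cells;

    // Bin every point into an x-y cell and track the height range per cell.
    for (const auto& p : cloud.points) {
        auto key = std::make_pair(static_cast<int>(std::floor(p.x / cell_size)),
                                  static_cast<int>(std::floor(p.y / cell_size)));
        auto& mm = cells[key];
        mm.min_z = std::min(mm.min_z, p.z);
        mm.max_z = std::max(mm.max_z, p.z);
    }

    std::vector<std::pair<int, int>> ground_cells, wall_cells;
    for (const auto& kv : cells) {
        float span = kv.second.max_z - kv.second.min_z;
        if (span < height_span_threshold)
            ground_cells.push_back(kv.first); // small height span: likely ground
        else
            wall_cells.push_back(kv.first);   // large height span: likely a wall or other vertical structure
    }
    // ground_cells / wall_cells can now be used to label or extract the points.
}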
Best regards.

Linear interpolation of two vector arrays with different lengths

I have two curves: one hand-drawn and one a smoothed version of the hand-drawn one.
The data of each curve is stored in 2 separate vector arrays.
The time delta is also stored in the hand-drawn curve's vector, so I can replay the drawing process and make it look natural.
Now I need to transfer the time delta from curve 1 (the raw input) to curve 2 (the already smoothed curve).
Sometimes the size of the first vector is larger and sometimes smaller than that of the second vector (it depends on the input draw speed).
So my question is: how do I fill the vector PenSmoot.time with the correct values?
Case 1: Input vector is larger
PenInput.time[0] = 0 PenSmoot.time[0] = 0
PenInput.time[1] = 5 PenSmoot.time[1] = ?
PenInput.time[2] = 12 PenSmoot.time[2] = ?
PenInput.time[3] = 2 PenSmoot.time[3] = ?
PenInput.time[4] = 50 PenSmoot.time[4] = ?
PenInput.time[5] = 100
PenInput.time[6] = 20
PenInput.time[7] = 3
PenInput.time[8] = 9
PenInput.time[9] = 33
Case 2: Input vector is smaller
PenInput.time[0] = 0 PenSmoot.time[0] = 0
PenInput.time[1] = 5 PenSmoot.time[1] = ?
PenInput.time[2] = 12 PenSmoot.time[2] = ?
PenInput.time[3] = 2 PenSmoot.time[3] = ?
PenInput.time[4] = 50 PenSmoot.time[4] = ?
PenSmoot.time[5] = ?
PenSmoot.time[6] = ?
PenSmoot.time[7] = ?
PenSmoot.time[8] = ?
PenSmoot.time[9] = ?
Simplified representation:
PenInput holds the whole data of a drawn curve (raw input):
PenInput.x        // X coordinate
PenInput.y        // Y coordinate
PenInput.pressure // the pressure of the pen
PenInput.timetotl // total elapsed time
PenInput.timepart // time fragments
PenSmoot holds the data of the massaged (smoothed, evenly distributed) curve of PenInput:
PenSmoot.x        // X coordinate
PenSmoot.y        // Y coordinate
PenSmoot.pressure // unknown - the pressure of the pen
PenSmoot.timetotl // unknown - total elapsed time
PenSmoot.timepart // unknown - time fragments
This is the struct that I have:
struct Pencil
{
    sf::VertexArray vertices;
    std::vector<int> pressure;
    std::vector<sf::Int32> timetotl;
    std::vector<sf::Int32> timepart;
};
[This answer has been extensively revised based on edits to the question.]
Okay, it seems to me that you just about need to interpolate the time stamps in parallel with the points.
I'm going to guess that the incoming data is something on the order of an array of points (e.g., X, Y coordinates) and an array of time deltas with the same number of each, so time-delta N tells you the time it took to get from point N-1 to point N.
When you interpolate the points, you're probably going to want to do it intelligently. For example, in the shape shown in the question, we have what look like two nearly straight lines, one with positive slope, and the other with negative slope. According to the picture, that's composed of 263 points. We could reduce that to three points and still have a fairly reasonable representation of the original shape by choosing the two end-points plus one point where the two lines meet.
We probably don't need to go quite that far though. Especially taking time into account, we'd probably want to use at least 7 points for the output--one for each end-point of each colored segment. That would give us 6 straight line segments. Let's say those are at points 0, 30, 140, 180, 200, 250, and 263.
We'd then use exactly the same segmentation on the time deltas. Add up the deltas from 0 to 30 to get an average speed for the first segment. Add up the deltas for 31 through 140 to get an average speed for the second segment (and so on to the end).
Increasing the number of points works out roughly the same way. We need to look at exactly which input points were used to create a pair of output points. For a simplistic example, let's assume we produced output that was precisely double the number of input points. We'd then interpolate time deltas exactly halfway between each pair of input points.
In the case shown in the question, we start with unevenly distributed inputs, but produce evenly distributed outputs. So the second output point might be an average of the first four input points. The next output point might be an average of three input points (and so on). In many cases, it's likely that neither end-point of a segment in the output corresponds precisely to any point in the input.
That's fine too. We interpolate between two points of the input to figure out the time hack for the starting point of the output segment. Likewise for the ending point. Then we can compute the total time it should have taken to travel between them based on the time delta between the points.
If you want to get fancy, you could use a higher-order interpolation instead of linear. That does require more input points per interpolation, but it looks like you probably have plenty to do something like a quadratic or cubic interpolation (in most cases). This is likely to make the most difference at transitions--places where the "pen" was accelerating or decelerating quickly. In such a place, linear interpolation can give somewhat misleading results (though, given the number of points you seem to be working with, it may not make enough difference to notice).
As an illustration, let's consider a straight line. We're going to start from 5 input points, and produce 7 output points.
So, the input points are [0, 2, 7, 10, 15], and the associated time deltas are [0, 1, 4, 8, 3].
So, our total distance traveled is 15, and we want our 7 output points to be evenly distributed, which means 6 equal output segments of length 15/6 = 2.5.
So, obviously the first output point and time are both 0. The second output point is at distance 2.5. To compute its time delta, we take the entire time for the first input segment (0->2), which is 1, plus the fraction of the second input segment (2->7) that we cover: (2.5 - 2) / (7 - 2) = 0.1, multiplied by that segment's time delta of 4, giving 0.4. So our first output time delta is 1.4.
The next output point is at distance 5.0. That lies entirely inside the second input segment (2->7), so this output segment occupies 2.5 / (7 - 2) = 0.5 of it. We then multiply that by the time delta for the second input segment to get the time delta for this output segment: 0.5 * 4 = 2.0.
[...and it continues on the same way until we reach the end.]
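A small sketch of that idea under the assumptions above: build the cumulative distance and cumulative time at each input point, linearly interpolate cumulative time at evenly spaced distances, and difference the result to get the output time deltas. The names (Pt, resampleTimes) and the use of float instead of sf::Int32 are just for illustration:
#include <cmath>
#include <cstddef>
#include <vector>

struct Pt { float x, y; };

// Given raw input points and their per-segment time deltas (timepart[i] is the
// time from point i-1 to point i, timepart[0] = 0), return time deltas for
// outputCount evenly spaced output points along the same curve.
std::vector<float> resampleTimes(const std::vector<Pt>& in,
                                 const std::vector<float>& timepart,
                                 std::size_t outputCount)
{
    // Cumulative arc length and cumulative time at every input point.
    std::vector<float> dist(in.size(), 0.f), time(in.size(), 0.f);
    for (std::size_t i = 1; i < in.size(); ++i) {
        float dx = in[i].x - in[i - 1].x, dy = in[i].y - in[i - 1].y;
        dist[i] = dist[i - 1] + std::sqrt(dx * dx + dy * dy);
        time[i] = time[i - 1] + timepart[i];
    }

    // Sample cumulative time at evenly spaced distances, then difference it
    // again to get per-segment output time deltas.
    std::vector<float> out(outputCount, 0.f);
    float step = dist.back() / (outputCount - 1);
    float prevTime = 0.f;
    std::size_t seg = 1;
    for (std::size_t k = 1; k < outputCount; ++k) {
        float d = k * step;
        while (seg + 1 < in.size() && dist[seg] < d) ++seg;
        float t = (d - dist[seg - 1]) / (dist[seg] - dist[seg - 1]); // fraction within the input segment
        float cumTime = time[seg - 1] + t * (time[seg] - time[seg - 1]);
        out[k] = cumTime - prevTime;
        prevTime = cumTime;
    }
    return out;
}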

Canny edge detector

I am coding my own version of Canny. So, from the literature we have to:
Smooth with a Gaussian
Here, I'm using a 5x5 mask (a short sketch of steps 1 and 2 follows this list)
Compute gradient magnitude and orientation
Here, I'm using Sobel and then
Grad = abs(Gx) + abs(Gy)
Orient = ( atan2(Gy, Gx) * 180/3.14159265 ) + 180
Non-maximum suppression
For example, if Orient = 0° => if G(i,j) > G(i,j-1) && G(i,j) > G(i,j+1) => MAX here, otherwise = 0
Double threshold
In this step, we get NL and NH
At this point, it is clear that NL contains NH, so NL = NL - NH
Now, for each non-zero pixel p in NH(x,y), I have to mark as valid all the weak pixels in NL(x,y) that are connected to p
Final image
It will be NL + NH
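For reference, a minimal OpenCV sketch of the smoothing and gradient steps (1 and 2) as described above; smoothAndGradient and the variable names are placeholders:
#include <opencv2/opencv.hpp>
#include <cmath>

// Step 1: 5x5 Gaussian smoothing. Step 2: Sobel gradients, with
// Grad = |Gx| + |Gy| and Orient = atan2(Gy, Gx) * 180/pi + 180 as above.
void smoothAndGradient(const cv::Mat& gray, cv::Mat& grad, cv::Mat& orient)
{
    cv::Mat blurred, gx, gy;
    cv::GaussianBlur(gray, blurred, cv::Size(5, 5), 0); // 5x5 Gaussian mask
    cv::Sobel(blurred, gx, CV_32F, 1, 0);               // Gx
    cv::Sobel(blurred, gy, CV_32F, 0, 1);               // Gy

    grad = cv::abs(gx) + cv::abs(gy);                   // Grad = |Gx| + |Gy|

    orient.create(gx.size(), CV_32F);
    for (int y = 0; y < gx.rows; ++y)
        for (int x = 0; x < gx.cols; ++x)
            orient.at<float>(y, x) =
                std::atan2(gy.at<float>(y, x), gx.at<float>(y, x))
                * 180.0f / 3.14159265f + 180.0f;        // orientation in [0, 360]
}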
At the end I get the output from OpenCV's Canny to compare.
What am I doing wrong?
myCanny
openCVCanny
OpenCV's Canny does not do any Gaussian filtering. Try not filtering, then compare the results.
P.S. I did not review all your steps; they may contain other errors.
Bloody hell! There was an error in rounding the orientations to the four possible cases. I forgot to put the equal sign in some cases. Now it's all fixed.
Now I'm happy with the result :)
Thank you all!
Best regards
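For anyone hitting the same issue, here is a sketch of quantizing the orientation into the four non-maximum-suppression cases with inclusive boundaries; the exact bin edges (22.5, 67.5, ...) are the conventional ones, not taken from the original code:
#include <cmath>

// Quantize an orientation in [0, 360] degrees to one of the four NMS directions:
// 0 (horizontal neighbours), 45, 90 (vertical neighbours) or 135.
// Note the inclusive comparisons on the bin edges - a missing equal sign there
// is exactly the kind of bug described above.
int quantizeOrientation(float deg)
{
    deg = std::fmod(deg, 180.0f);                // the direction is the same modulo 180°
    if (deg <= 22.5f || deg > 157.5f) return 0;
    if (deg <= 67.5f)  return 45;
    if (deg <= 112.5f) return 90;
    return 135;                                  // 112.5 < deg <= 157.5
}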

HOG: What is done in the contrast-normalization step?

According to the HOG process, as described in the paper Histograms of Oriented Gradients for Human Detection (see link below), the contrast normalization step is done after the binning and the weighted vote.
I don't understand something - If I already computed the cells' weighted gradients, how can the normalization of the image's contrast help me now?
As far as I understand, contrast normalization is done on the original image, whereas for computing the gradients, I already computed the X,Y derivatives of the ORIGINAL image. So, if I normalize the contrast and I want it to take effect, I should compute everything again.
Is there something I don't understand well?
Should I normalize the cells' values?
Is the normalization in HOG not about contrast anyway, but about the histogram values (the counts in each bin)?
Link to the paper:
http://lear.inrialpes.fr/people/triggs/pubs/Dalal-cvpr05.pdf
The contrast normalization is achieved by normalization of each block's local histogram.
The whole HOG extraction process is well explained here: http://www.geocities.ws/talh_davidc/#cst_extract
When you normalize the block histogram, you actually normalize the contrast in this block, if your histogram really contains the sum of magnitudes for each direction.
The term "histogram" is confusing here, because you do not count how many pixels has direction k, but instead you sum the magnitudes of such pixels. Thus you can normalize the contrast after computing the block's vector, or even after you computed the whole vector, assuming that you know in which indices in the vector a block starts and a block ends.
The steps of the algorithm due to my understanding - worked for me with 95% success rate:
Define the following parameters (In this example, the parameters are like HOG for Human Detection paper):
A cell size in pixels (e.g. 6x6)
A block size in cells (e.g. 3x3 ==> Means that in pixels it is 18x18)
Block overlapping rate (e.g. 50% ==> Means that both block width and block height in pixels have to be even. It is satisfied in this example, because the cell width and cell height are even (6 pixels), making the block width and height also even)
Detection window size. The size must be divisible by half of the block size without remainder (so it is possible to exactly place the blocks within it with 50% overlapping). For example, the block width is 18 pixels, so the window width must be a multiple of 9 (e.g. 9, 18, 27, 36, ...). The same goes for the window height. In our example, the window width is 63 pixels, and the window height is 126 pixels.
Calculate gradient:
Compute the X difference using convolution with the vector [-1 0 1]
Compute the Y difference using convolution with the transpose of the above vector
Compute the gradient magnitude in each pixel using sqrt(diffX^2 + diffY^2)
Compute the gradient direction in each pixel using atan(diffY / diffX). Note that atan will return values between -90 and 90, while you will probably want the values between 0 and 180. So just flip all the negative values by adding to them +180 degrees. Note that in HOG for Human Detection, they use unsigned directions (between 0 and 180). If you want to use signed directions, you should make a little more effort: If diffX and diffY are positive, your atan value will be between 0 and 90 - leave it as is. If diffX and diffY are negative, again, you'll get the same range of possible values - here, add +180, so the direction is flipped to the other side. If diffX is positive and diffY is negative, you'll get values between -90 and 0 - leave them the same (You can add +360 if you want it positive). If diffY is positive and diffX is negative, you'll again get the same range, so add +180, to flip the direction to the other side.
"Bin" the directions. For example, 9 unsigned bins: 0-20, 20-40, ..., 160-180. You can easily achieve that by dividing each value by 20 and flooring the result. Your new binned directions will be between 0 and 8.
Do for each block separately, using copies of the original matrix (because some blocks are overlapping and we do not want to destroy their data):
Split to cells
For each cell, create a vector with 9 members (one for each bin). For each index in the bin, set the sum of all the magnitudes of all the pixels with that direction. We have totally 6x6 pixels in a cell. So for example, if 2 pixels have direction 0 while the magnitude of the first one is 0.231 and the magnitude of the second one is 0.13, you should write in index 0 in your vector the value 0.361 (= 0.231 + 0.13).
Concatenate all the vectors of all the cells in the block into a large vector. This vector size should of course be NUMBER_OF_BINS * NUMBER_OF_CELLS_IN_BLOCK. In our example, it is 9 * (3 * 3) = 81.
Now, normalize this vector. Use k = sqrt(v[0]^2 + v[1]^2 + ... + v[n]^2 + eps^2) (I used eps = 1). After you have computed k, divide each value in the vector by k - thus your vector will be normalized (a short sketch of this step follows the list).
Create final vector:
Concatenate all the vectors of all the blocks into 1 large vector. In my example, the size of this vector was 6318
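A minimal sketch of the block-normalization step described in the list above; the function name normalizeBlock is made up, the vector layout and eps are as described:
#include <cmath>
#include <vector>

// L2 normalization of one concatenated block vector:
// k = sqrt(v[0]^2 + ... + v[n]^2 + eps^2), then divide every entry by k.
void normalizeBlock(std::vector<double>& v, double eps = 1.0)
{
    double sumSq = eps * eps;
    for (double x : v)
        sumSq += x * x;
    const double k = std::sqrt(sumSq);
    for (double& x : v)
        x /= k;
}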

Determine difference in stops between images with no EXIF data

I have a set of images of the same scene but shot with different exposures. These images have no EXIF data so there is no way to extract useful info like f-stop, shutter speed etc.
What I'm trying to do is to determine the difference in stops between the images i.e. Image1 is +1.3 stops of Image0.
My current approach is to first calculate luminance from the image's RGB values using the equation
L = 0.2126 * R + 0.7152 * G + 0.0722 * B
I've seen different coefficients being used in the equation, but generally they should not affect the end result L too much.
After that I derive the log-average luminance of the image.
exp(avg of log(luminance of image))
But somehow the log-average luminance doesn't seem to give much indication of the exposure difference between the images.
Any ideas on how to determine the exposure difference?
Edit: in C/C++.
You generally have to solve two problems:
1. Linearize your image data
(In case it's not obvious what is meant: two times more light collected by your pixel shall result in two times the intensity value in your linearized image.)
Your image input might already be (sufficiently) linearized -> you may skip to part 2. If your content came from a camera and it's a JPEG, then this will most certainly not be the case.
The real 'solution' to this problem is finding the camera response function, which you want to invert and apply to your image data to get linear intensity values. This is by no means a trivial task. The EMoR model is widely used in all sorts of software (Photoshop, PTGui, Photomatix, etc.) to describe camera response functions. Some open source software solving this problem (but using a different model iirc) is PFScalibrate.
Having said that, you may get away with a simple inverse gamma application. A rough guesstimate of the right gamma value might be found by doing this (a code sketch follows the list):
capture an evenly lit, static scene with two exposure times e and e/2
apply a couple of inverse gamma transforms (e.g. for 1.8 to 2.4 in 0.1 steps) on both images
multiply all the short exposure images with 2.0 and subtract them from the respective long exposure images
pick the gamma that leads to the smallest overall difference
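A rough sketch of that gamma search, assuming the two exposures are already loaded as 8-bit grayscale cv::Mats; estimateGamma and the parameter names are made up for illustration:
#include <opencv2/opencv.hpp>
#include <limits>

// Find the inverse gamma (1.8 .. 2.4 in 0.1 steps) that makes
// 2 * linearize(shortExp) closest to linearize(longExp), as outlined above.
double estimateGamma(const cv::Mat& longExp, const cv::Mat& shortExp)
{
    double bestGamma = 1.8;
    double bestDiff = std::numeric_limits<double>::max();
    for (double gamma = 1.8; gamma <= 2.4 + 1e-9; gamma += 0.1) {
        cv::Mat l, s;
        longExp.convertTo(l, CV_64F, 1.0 / 255.0);
        shortExp.convertTo(s, CV_64F, 1.0 / 255.0);
        cv::pow(l, gamma, l);                       // apply inverse gamma: linearize
        cv::pow(s, gamma, s);
        cv::Mat s2 = 2.0 * s;                       // the short exposure collected half the light
        double diff = cv::norm(l, s2, cv::NORM_L1); // overall difference
        if (diff < bestDiff) { bestDiff = diff; bestGamma = gamma; }
    }
    return bestGamma;
}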
2. Find the actual difference of irradiation in stops, i.e. log2(scale factor)
Presuming the scene was static (no moving objects or camera), this is relatively easy:
sum1 = sum2 = 0
foreach pixel pair (p1, p2) from the two images:
    if p1 or p2 is close to 0 or 255:
        skip this pair
    sum1 += p1 and sum2 += p2
return log2(sum1 / sum2)
On large images this will certainly work just as well, and a lot faster, if you sub-sample the images.
If the camera was static but the scene was not (moving objects), this starts to work less well. I produced acceptable results in this case by simply repeating the above procedure several times, using the output of the previous run as an estimate for the correct scale factor, and then discarding pixel pairs whose quotient is too far away from the current estimate. So basically, replace the above if line with the following:
if <see above> or if abs(log2(p1/p2) - estimate) > 0.5:
I'd stop the repetition after a fixed number of iterations, or once two consecutive estimates are sufficiently close to each other.
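A compact sketch of that iterative procedure in C++ with OpenCV, assuming both images are already linearized 8-bit grayscale and using the saturation limits, 0.5-stop tolerance and a fixed iteration count as described above (stopDifference is a made-up name):
#include <opencv2/opencv.hpp>
#include <cmath>

// Estimate the exposure difference in stops between two linearized images,
// iteratively discarding pixel pairs that disagree with the current estimate.
double stopDifference(const cv::Mat& img1, const cv::Mat& img2, int iterations = 3)
{
    double estimate = 0.0; // log2 of the current scale-factor estimate
    for (int it = 0; it < iterations; ++it) {
        double sum1 = 0.0, sum2 = 0.0;
        for (int y = 0; y < img1.rows; ++y)
            for (int x = 0; x < img1.cols; ++x) {
                double p1 = img1.at<uchar>(y, x);
                double p2 = img2.at<uchar>(y, x);
                // Skip pairs close to under- or over-exposure.
                if (p1 < 5 || p1 > 250 || p2 < 5 || p2 > 250)
                    continue;
                // After the first pass, also skip pairs far from the estimate.
                if (it > 0 && std::abs(std::log2(p1 / p2) - estimate) > 0.5)
                    continue;
                sum1 += p1;
                sum2 += p2;
            }
        estimate = std::log2(sum1 / sum2);
    }
    return estimate;
}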
EDIT: A note about conversion to luminance
You don't need to do that at all (as Tony D already mentioned) and if you insist, then do it after the linearization step (as Mark Ransom noted). In a perfect setting (static scene, no noise, no de-mosaicing, no quantization) every channel of every pixel would have the same ratio p1/p2 (if neither is saturated). Therefore the relative weighting of the different channels is irrelevant. You may sum over all pixels/channels (weighting R, G and B equally) or maybe only use the green channel.