I am looking for an algorithm or method for the fuzzy comparison of small, similar datasets (x, y world coordinates, sensor angle, etc.).
Basically, I am developing an autonomous mapping robot that seeks the edges and objects in its environment. Because of small amounts of jitter in the drive train and in the distance returned by the time-of-flight sensor, the system identifies several points that are in fact the same point or edge. I would therefore like to do a fuzzy comparison of a pair of datasets to see whether they are the same.
Any ideas or code would be most welcome.
Many thanks imk
I have created a dataset with a very accurate ground truth in 6DoF (both position and attitude) and would like to use this to compare the accuracy of the path in 6DoF for different monocular SLAM algorithms.
The ground truth results in a path in 6DoF relative to the ground truth's coordinate frame. The SLAM algorithms result in a path in 6DoF relative to the SLAM coordinate frame.
Due to the nature of monocular SLAM algorithms, I do not have a scale of the path.
How can I solve this with my dataset? Are there any scripts available?
What you want to do is find a transform between local and global coordinates. The equations will change depending on your exact state model, but the basic idea is to start off with a point that is known in both frames. Say that at the initial time the position in the global frame is Pg = [xg0 yg0 zg0 rg0 pg0 yg0] and the robot's coordinate is Pr = [xr0 yr0 zr0 rr0 pr0 yr0] (position followed by roll, pitch and yaw). At this point we need to create the mapping from Pg to Pr. Once we have it, we can represent all data in the same frame.
Mapping from one 6DoF frame to another is difficult and highly nonlinear. It can usually be thought of in two steps:
1. Map between the xyz coordinates to bring the two sets of axes into the same area.
2. Map the orientations between the two sets of axes (roll, pitch, yaw).
I couldn't find many sources on doing both simultaneously, but if you do them sequentially it will still work (order matters, so be consistent). Here is a nice post that covers xyz transforms: https://gamedev.stackexchange.com/questions/79765/how-do-i-convert-from-the-global-coordinate-space-to-a-local-space
This website is great (I used it for a 3D SLAM problem and it was incredibly helpful) and it has information on roll, pitch and yaw transforms: http://planning.cs.uiuc.edu/node104.html. If you explore the website you should also find xyz transforms. Sometimes it helps to start off with the 2D examples so you understand the concept, then look at the 3D case afterwards.
Good luck
edit
I originally posted the wrong link to the planning website, but it's fixed. Here is the main equation
Your landmark points for SLAM are the output of this equation: Global landmark = T * (landmark with respect to the robot), where each point is represented as [x, y, z, 1]; the 1 is needed to preserve translation. The roll (alpha), pitch (beta) and yaw (gamma) are obtained from the rotation matrix between the global coordinates and the robot's coordinates.
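As a minimal sketch of the relation Global landmark = T * (landmark with respect to the robot), the Python below builds a 4x4 homogeneous transform from a position and roll/pitch/yaw angles and applies it to one point. The function names, the example pose and the rotation order (R = Rz(yaw) * Ry(pitch) * Rx(roll)) are assumptions made for illustration; use whatever convention your own state model defines.

```python
import math

def make_transform(x, y, z, roll, pitch, yaw):
    """Return a 4x4 homogeneous transform (robot frame -> global frame).

    Assumed convention: R = Rz(yaw) * Ry(pitch) * Rx(roll), angles in radians.
    """
    cr, sr = math.cos(roll), math.sin(roll)
    cp, sp = math.cos(pitch), math.sin(pitch)
    cy, sy = math.cos(yaw), math.sin(yaw)
    return [
        [cy * cp, cy * sp * sr - sy * cr, cy * sp * cr + sy * sr, x],
        [sy * cp, sy * sp * sr + cy * cr, sy * sp * cr - cy * sr, y],
        [-sp,     cp * sr,                cp * cr,                z],
        [0.0,     0.0,                    0.0,                    1.0],
    ]

def apply_transform(T, point_xyz):
    """Map a 3D point through T using the homogeneous form [x, y, z, 1]."""
    p = [point_xyz[0], point_xyz[1], point_xyz[2], 1.0]
    out = [sum(T[i][j] * p[j] for j in range(4)) for i in range(4)]
    return out[:3]

# Hypothetical robot pose in the global frame: at (2, 1, 0), yawed by 90 degrees.
T = make_transform(2.0, 1.0, 0.0, 0.0, 0.0, math.pi / 2)
landmark_robot = [1.0, 0.0, 0.0]  # 1 m straight ahead of the robot
landmark_global = apply_transform(T, landmark_robot)  # approximately [2, 2, 0]
```

The trailing 1 in the homogeneous point is what lets the last column of T add the translation. Note that this only handles a rigid transform; it does not recover the unknown scale of a monocular SLAM path.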
Suppose I want to find out if there is a person in a bed or not using cameras and computer vision algorithms. One can assume that the camera provides RGB, infrared and depth data.
I don't really have a good idea how to solve this. So far I came up with this:
Estimate a plane for the bed object using RANSAC. This plane should be further away from the ground plane if there is a person in the bed. This seems very unstable, though: it assumes that the normal height of the bed is known, and it can easily break if the bed has an adjustable head part (e.g. in a hospital).
Face detection. Try to detect a face in the bed. Probably also isn't very reliable since the face can be sideways to the camera and partly covered.
Use the infrared-image. I am not sure how much you would see through the blanket and what would happen if the person just left the bed and the bed is still warm?
Is there a good way to do this? Or would you have to use pressure sensors in the bed to be reliable?
Thanks!
I don't know about infrared images, but for camera-based video processing this kind of problem is widely studied.
If your problem is to detect a person in a bed which is "Normally empty" then I think the simplest algorithm would be to capture successive frames and calculate their difference.
The presence of a human in the frame would make it different from a frame capturing only the empty bed. The reliability you get depends on which algorithm of this kind you use.
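A minimal sketch of the differencing idea, assuming grayscale frames stored as nested lists and a stored reference frame of the empty bed. The function names and both thresholds are hypothetical and would need tuning on real footage.

```python
def changed_fraction(reference, current, pixel_tolerance=20):
    """Fraction of pixels whose gray value differs by more than pixel_tolerance."""
    changed = 0
    total = 0
    for ref_row, cur_row in zip(reference, current):
        for ref_px, cur_px in zip(ref_row, cur_row):
            total += 1
            if abs(ref_px - cur_px) > pixel_tolerance:
                changed += 1
    return changed / total

def looks_occupied(reference, current, min_fraction=0.1):
    """Guess occupancy when enough pixels differ from the empty-bed reference."""
    return changed_fraction(reference, current) >= min_fraction

# Toy 4x4 frames: an empty bed, the same bed with sensor noise, and a bright blob.
empty_bed = [[50] * 4 for _ in range(4)]
noisy_empty = [[55] * 4 for _ in range(4)]
with_person = [row[:] for row in empty_bed]
for r in (1, 2):
    for c in (1, 2):
        with_person[r][c] = 200

occupied = looks_occupied(empty_bed, with_person)    # True
still_empty = looks_occupied(empty_bed, noisy_empty)  # False
```

The per-pixel tolerance absorbs small sensor noise, but lighting changes or a rumpled blanket will still trigger it, which is why combining several cues is suggested further down.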
Otherwise you can go directly for human detection in video frames. One possible algorithm is described here.
Edit:
Your problem is harder than I thought. The following approach might cover these cases.
The main idea is to use a bunch of features at once to get higher accuracy and remove false positives.
Use a HOG person detector at the top level to detect a person's entry into the scene. If the positions of the possible entry doors are known, or detectable using edge lines in the scene, use them to increase accuracy. (At the point of entry, the difference between successive frames will be located near the doors.)
Use edge lines to track the human, and use the bed edges to track the position of the human. The edges of the human should be bounded by the edges of the bed.
If the difference is located within the bed, it implies that the human is in the bed but moving.
If needed, include analysis of texture and connected components as a preprocessing step to remove other moving objects in the room for higher accuracy (for example, movement of clothes because of air).
Also use face detectors to increase accuracy.
The infrared that the camera uses has a different frequency than the infrared radiated by a warm object. Unless you are using military-grade IR scanners, you can forget about any connection between IR and warmth. IR is still useful if there is limited light or if you use it for depth maps.
Go with depth (Kinect style) and estimate the bed as a segment in your image. It should have some recognizable features in depth (certain dimensions, flatness, etc.). The bed is usually surrounded by walls or floor that are easy to segment out. Your algorithm can also be tuned to the distance to the bed and cut it out based just on depth range.
As other people said, it would be useful to learn more about your particular goal or application. What is the background or environment around the bed? How does it look when there is no person in it? Can a person simulate his or her presence (as in a prison escape scenario)?
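A rough sketch of the depth-range cut and a flatness check, assuming a depth image in metres stored as nested lists and a camera looking down at the bed. The function names, the range limits and the toy values are all hypothetical.

```python
import statistics

def depth_range_mask(depth, near, far):
    """Mark pixels whose depth (in metres) lies inside [near, far]."""
    return [[near <= d <= far for d in row] for row in depth]

def depth_spread(depth, mask):
    """Population standard deviation of the masked depths; a small value means flat."""
    values = [d for row, mrow in zip(depth, mask) for d, m in zip(row, mrow) if m]
    return statistics.pstdev(values)  # raises StatisticsError if the mask is empty

# Toy overhead view: floor at 2.5 m, bed surface at 2.0 m, a body at 1.8 m.
empty = [[2.5, 2.0, 2.0, 2.5],
         [2.5, 2.0, 2.0, 2.5]]
occupied = [[2.5, 1.8, 2.0, 2.5],
            [2.5, 1.8, 2.0, 2.5]]

spread_empty = depth_spread(empty, depth_range_mask(empty, 1.5, 2.2))
spread_occupied = depth_spread(occupied, depth_range_mask(occupied, 1.5, 2.2))
```

Here the empty bed comes out perfectly flat while the occupied one does not. A real bed with pillows and blankets will not be this clean, so the spread would have to be compared against a baseline measured on the actual scene.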
I have an image which was shown to groups of people with different domain knowledge of its content. I then recorded gaze fixation data of them watching the image.
I now want to compare the results of the two groups, so what I need to know is whether there is a correlation between the positions of the sampled data of the two groups or not.
I have the original image as well as the fixation coords. Do you have any good idea how to start analyzing the data?
It's more about the idea or the plan so you don't have to be too technical on that one.
Thanks
Simple idea: render all the coordinates on the original image in a 'heat map' like way, one image for each group. You can then visually compare the images for correlation, and you have some nice graphics for your paper.
There is something like the two-dimensional correlation coefficient. With software like R or Matlab you can do the number crunching for the correlation.
Matlab has a function for this, corr2, which computes the two-dimensional correlation coefficient between two matrices. The matrices must be of the same size:
r = corr2(A,B)
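For readers without Matlab, here is a small Python sketch of the same idea: bin each group's fixations into a coarse grid (a numeric heat map) and compute the correlation coefficient over all cells. The function names, the cell size and the sample coordinates are made up for illustration; the corr2 function here is a plain Pearson correlation over the grid cells, written to mirror what the Matlab function computes.

```python
import math

def fixation_heat_map(fixations, width, height, cell=50):
    """Count fixations per grid cell; fixations are (x, y) pixel coordinates."""
    cols = math.ceil(width / cell)
    rows = math.ceil(height / cell)
    grid = [[0] * cols for _ in range(rows)]
    for x, y in fixations:
        if 0 <= x < width and 0 <= y < height:
            grid[int(y // cell)][int(x // cell)] += 1
    return grid

def corr2(a, b):
    """2D correlation coefficient of two equally sized grids (Pearson r over all cells)."""
    flat_a = [v for row in a for v in row]
    flat_b = [v for row in b for v in row]
    mean_a = sum(flat_a) / len(flat_a)
    mean_b = sum(flat_b) / len(flat_b)
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(flat_a, flat_b))
    var_a = sum((x - mean_a) ** 2 for x in flat_a)
    var_b = sum((y - mean_b) ** 2 for y in flat_b)
    return cov / math.sqrt(var_a * var_b)  # undefined if either grid is constant

# Hypothetical fixations for two groups on a 200x100 pixel image.
group_a = [(10, 10), (20, 30), (120, 40)]
group_b = [(15, 20), (130, 30), (140, 45)]
map_a = fixation_heat_map(group_a, 200, 100)
map_b = fixation_heat_map(group_b, 200, 100)
r = corr2(map_a, map_b)
```

The result depends strongly on the cell size: very small cells make almost every pair of groups look uncorrelated, so it is worth trying a few sizes or smoothing the maps first.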
In gaze tracking, the most interesting data lies in two areas:
Where all people look. For that you can use the heat map Daan suggests. Make a heat map for all people, and heat maps for the separate groups of people.
When people look there. For that I would recommend you start by making heat maps as above, but for short time intervals starting from the time the picture was first shown. Again, do this for all people and for the separate groups you have.
The resulting set of heat maps, perhaps animated for the ones from the second point, should give you some pointers for further analysis.
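The time-sliced variant could start from something like the sketch below, which assumes each fixation carries a timestamp in milliseconds since the picture was first shown. The function name, the tuple layout and the sample data are hypothetical; each returned window can then be rendered as its own heat map.

```python
def fixations_by_interval(fixations, interval_ms, total_ms):
    """Group (t_ms, x, y) fixations into consecutive time windows of interval_ms."""
    n_windows = -(-total_ms // interval_ms)  # ceiling division
    windows = [[] for _ in range(n_windows)]
    for t, x, y in fixations:
        if 0 <= t < total_ms:
            windows[int(t // interval_ms)].append((x, y))
    return windows

# Hypothetical recording: 2 seconds of viewing, split into 500 ms windows.
recording = [(100, 10, 10), (450, 20, 30), (900, 120, 40), (1900, 60, 60)]
windows = fixations_by_interval(recording, 500, 2000)
```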
I've been asked to take a given video, probably a simple cartoon, and return an array of its scenes.
I need to use the OpenCV library to do it, and the result format is irrelevant (i.e. I can return the timespans of each scene or actually split the video).
Any help would be appreciated.
Thanks
Technically, a scene is a group of shots which are taken successively at a single location. A shot is a basic narrative element of the video, composed of a number of frames that are presented from a continuous viewpoint.
Automatically dividing a video into its shots is called the shot boundary detection problem in which the basic idea is identifying consecutive frames that form a transition from one shot to another.
Identifying transitions generally involves calculating a similarity (or distance) value between two frames. This value can be calculated using low-level image features such as color, edges or motion. A simple metric could be:
s(f1, f2) = sum(i in all pixel locations)(abs(f1color(i) - f2color(i))) / N
where f1 and f2 represent two distinct video frames and N represents the number of pixels in those frames. This is the average first-order (Manhattan) pixel color distance between two frames, so a larger value means the frames are less similar.
Say you have a video composed of frames { f1, f2 ... fM } and you have calculated this distance between neighboring frames. A simple decision rule could be labeling a transition from fa to fb as a shot boundary if s(fa, fb) is above a certain threshold.
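A minimal Python sketch of this metric and decision rule, assuming each frame is a flat list of (r, g, b) tuples. Because s is a distance, the sketch flags a boundary when the value is large. The function names, the toy frames and the threshold are made up; with OpenCV you would read real frames from the video instead.

```python
def frame_distance(f1, f2):
    """Average Manhattan color distance between two frames.

    Each frame is a flat list of (r, g, b) tuples of equal length N.
    """
    total = sum(abs(c1 - c2) for p1, p2 in zip(f1, f2) for c1, c2 in zip(p1, p2))
    return total / len(f1)

def shot_boundaries(frames, threshold):
    """Indices k where a cut is detected between frames[k - 1] and frames[k]."""
    return [k for k in range(1, len(frames))
            if frame_distance(frames[k - 1], frames[k]) > threshold]

# Toy 4-pixel frames: two nearly identical dark frames, then two bright ones.
dark = [(10, 10, 10)] * 4
dark_noisy = [(12, 10, 11)] * 4
bright = [(200, 180, 160)] * 4
frames = [dark, dark_noisy, bright, bright]

cuts = shot_boundaries(frames, threshold=100)  # [2]
```

A fixed threshold like this only catches hard cuts; gradual transitions such as dissolves produce a run of moderate distances and need a decision over several frames.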
A successful shot boundary detector uses distances of second order (or more) such as Euclidean distance or Pearson correlation coefficient and utilizes a combination of different features instead of using only one, say color.
Usually, camera or object movement breaks the pixel correspondence between frames. Comparing the frequencies of low-level details with the help of histograms is a remedy here, because histograms do not depend on where in the frame a value occurs.
Also, making the decision over more than two frames helps in finding smooth transitions where one shot dissolves into or replaces another over some duration. Deciding over a group of frames also helps in identifying false transitions caused by light flashes or fast-moving cameras.
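To illustrate why histograms help, the sketch below compares normalized gray-level histograms instead of individual pixels. The function names, the bin count and the toy frames are assumptions for illustration; frames are flat lists of gray levels in the range 0 to 255.

```python
def gray_histogram(frame, bins=8, max_value=256):
    """Normalized histogram of a frame given as a flat list of gray levels."""
    counts = [0] * bins
    for value in frame:
        counts[value * bins // max_value] += 1
    return [c / len(frame) for c in counts]

def histogram_distance(h1, h2):
    """L1 distance between two normalized histograms, in the range [0, 2]."""
    return sum(abs(a - b) for a, b in zip(h1, h2))

frame = [10, 10, 200, 200]
moved = [200, 10, 10, 200]        # same content, shifted positions
different = [120, 120, 120, 120]  # entirely different content

d_moved = histogram_distance(gray_histogram(frame), gray_histogram(moved))
d_different = histogram_distance(gray_histogram(frame), gray_histogram(different))
```

The moved frame has a histogram distance of zero even though a pixel-by-pixel comparison would report a large change, while the genuinely different frame reaches the maximum distance.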
For your problem, please start from basic approaches like comparing RGB colors and edge responses between video frames. Analyze your results and data together and try to adapt new features, distance metrics and decision making methods for better performance.
The best way of segmenting a video into shots will vary depending on your data. Machine learning approaches like probabilistically modeling frame transitions with Gaussian mixture models or classification through support vector machines are expected to perform better than hand selected thresholds. However it is important that you learn the basics before effectively choosing input features.
Automatically finding shot boundaries will be sufficient to divide your video into meaningful parts. Dividing your video into scenes, on the other hand, is considered a harder semantic problem. Nevertheless, shot segmentation is the first step towards it.
There is a huge research area focusing on this. Search for papers, you can find various algorithms described in detail.
Here are some examples:
Scene detection in Hollywood movies and TV shows
A framework for video scene boundary detection
Video summarization and scene detection by graph modeling
There are plenty more out there, just google.
I am currently working on a data visualization project. My aim is to produce contour lines (in other words, isolines) from gridded data. The data can be temperature, weather data or any other kind of environmental parameter; the only condition is that it must be regularly spaced.
I searched the internet, but I could not find a good algorithm, pseudo-code or source code for producing contour lines from grids.
Does anybody know a library, source code or an algorithm for producing contour lines from gridded data?
It would be good if your suggestion has good run-time performance; I don't want to make my users wait too long :)
Edit: thanks for the responses, but isolines have some constraints, for example they should not intersect,
so just generating Bezier curves does not accomplish my goal.
See this question: How to approximate a vector contour from an elevation raster?
It's a near duplicate, but uses quite different terminology. You'll find that cartography and computer graphics solve many of the same problems, but use different terminology for them.
There's some reasonably good contouring available in gnuplot. If you're able to use GPL code, that may help.
If your data is placed at regular intervals, this can be done fairly easily (assuming I understand your problem correctly). First you need to determine at what interval you want your contours. Next, create the grid you are going to use to store the contour information (I'm assuming just a simple on/off flag, or the elevation at this contour level), which should be one cell smaller than the source data in each dimension.
Now the trick here is to offset the two grids by half an interval (this won't actually show up in code, but it's the concept I'm dealing with here) and compare the 4 source points surrounding the current point in the contour grid you are calculating. If any of the 4 points are in a different interval range, then that 'pixel' in the contour grid should be set to true (or to the value of the contour range being crossed).
With this method, there will be a problem when the interval is too fine which will cause several contours to overlap onto each other.
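A minimal sketch of the method described above, assuming the source data is a nested list of values and the contour interval is a plain number. The function name and the sample grid are hypothetical.

```python
import math

def contour_mask(grid, interval):
    """Mark cells of a (rows-1) x (cols-1) grid that a contour level passes through.

    A cell is marked when its four surrounding source points do not all fall
    in the same interval range.
    """
    rows, cols = len(grid), len(grid[0])
    mask = []
    for r in range(rows - 1):
        mask_row = []
        for c in range(cols - 1):
            corners = (grid[r][c], grid[r][c + 1], grid[r + 1][c], grid[r + 1][c + 1])
            bands = {math.floor(v / interval) for v in corners}
            mask_row.append(len(bands) > 1)
        mask.append(mask_row)
    return mask

# Toy 3x3 elevation grid with one value above the 10-unit contour level.
elevation = [[1, 2, 3],
             [2, 3, 4],
             [3, 4, 12]]
mask = contour_mask(elevation, 10)
```

This produces a raster of contour pixels rather than vector lines, and it runs in a single pass over the grid. As noted, a cell that spans several levels is still marked only once, which is where neighboring contours merge when the interval is too fine.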
As the link from Paul Tomblin suggests, Bezier curves (which are a subset of B-splines) are a ripe solution for your problem. If runtime performance is an issue, Bezier curves have the added benefit of being constructible via the very fast de Casteljau algorithm, instead of drawing them according to the parametric equations. On the off chance you're working with DirectX, it has a library function for de Casteljau, but it should not be challenging to brew one yourself using the 1001 web pages that describe it.
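For reference, a home-brewed de Casteljau evaluation can be as short as the sketch below. The function name and the sample control points are made up for illustration; control points are tuples of coordinates in any dimension.

```python
def de_casteljau(control_points, t):
    """Evaluate a Bezier curve at parameter t in [0, 1] by repeated linear interpolation."""
    points = [tuple(p) for p in control_points]
    while len(points) > 1:
        # Replace each adjacent pair of points by the point a fraction t between them.
        points = [
            tuple((1 - t) * a + t * b for a, b in zip(p, q))
            for p, q in zip(points, points[1:])
        ]
    return points[0]

# Quadratic curve with hypothetical control points.
control = [(0.0, 0.0), (1.0, 2.0), (2.0, 0.0)]
midpoint = de_casteljau(control, 0.5)  # (1.0, 1.0)
```

Sampling t at evenly spaced values between 0 and 1 gives a polyline for drawing. This only smooths a given sequence of points; it does not by itself guarantee that neighboring isolines stay apart.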