I need to obtain a 3D plot of the joint probability distribution of two random variables x and y. While this plot is easy to produce in Mathematica, I couldn't find any documentation on how to do it in Python.
Can you help me out with that?
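A minimal sketch of one possible approach, assuming the joint PDF is a bivariate Gaussian evaluated on a grid and plotted with matplotlib's plot_surface (the mean, covariance, and grid limits below are made up for illustration):

import numpy as np
from scipy.stats import multivariate_normal
import matplotlib.pyplot as plt
from mpl_toolkits.mplot3d import Axes3D  # registers the 3D projection

# evaluate a hypothetical bivariate Gaussian joint PDF on a grid
x = np.linspace(-3, 3, 100)
y = np.linspace(-3, 3, 100)
X, Y = np.meshgrid(x, y)
pos = np.dstack((X, Y))
Z = multivariate_normal(mean=[0, 0], cov=[[1, 0.5], [0.5, 1]]).pdf(pos)

# surface plot of p(x, y)
fig = plt.figure()
ax = fig.add_subplot(111, projection="3d")
ax.plot_surface(X, Y, Z, cmap="viridis")
ax.set_xlabel("x")
ax.set_ylabel("y")
ax.set_zlabel("p(x, y)")
plt.show()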
I'm trying to generate a set of numbers that follow a 1/R density distribution, where R is the distance from the origin in polar coordinates. Basically, the number of points (concentration of points) should fall off as 1/R away from the center. Since I'm still fairly new to Fortran 90, I've managed to make an array of random numbers, but taking the next step to make it follow 1/R seems tough. The way I create the array of random numbers is below:
PROGRAM randomnumbers
IMPLICIT NONE
REAL :: mynum                       ! currently unused
REAL, DIMENSION(1,10) :: matrix     ! array to hold 10 uniform random numbers in [0, 1)
call random_number(matrix)          ! fill the array with uniform deviates
write(*,*) matrix
END PROGRAM randomnumbers
The way I did this in Python was to use the inverse CDF and then interpolate, but I'm not sure how to go about doing that in Fortran 90. The way it was done in Python is described here: Points that follow a 1/R density distribution in an XY grid?
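For reference, a minimal Python sketch of the inverse-CDF step. It assumes the radial density p(R) is proportional to 1/R on a made-up interval [r_min, r_max] (R = 0 has to be excluded); under that assumption the CDF is log(R/r_min)/log(r_max/r_min), and inverting it gives the sampling rule below. A Fortran 90 version would apply the same formula to the output of random_number:

import numpy as np

# hypothetical bounds; the 1/R law needs r_min > 0
r_min, r_max = 0.1, 10.0
n = 10000

# inverse CDF of p(R) ~ 1/R on [r_min, r_max]:
# F(R) = log(R/r_min) / log(r_max/r_min)  =>  R = r_min * (r_max/r_min)**u
u = np.random.random(n)
r = r_min * (r_max / r_min) ** u

# uniform angle, then convert to Cartesian coordinates
theta = 2.0 * np.pi * np.random.random(n)
x, y = r * np.cos(theta), r * np.sin(theta)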
I have two arrays with the same dimension; let's call them x and y.
When I plot them with plt.plot(x, y), the plot shows a continuous interpolation of my discrete data x and y.
How can I recover this interpolation from the plot?
Is there any other alternative to obtain more data points in (x, y)?
pyplot.plot() connects points on the graph with a line. That corresponds to linear interpolation if the plot is not logarithmic.
Among ready-made functions, look at numpy.interp().
For theory, refer to https://en.wikipedia.org/wiki/Linear_interpolation
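A minimal sketch of that, with made-up data:

import numpy as np

# hypothetical discrete data
x = np.array([0.0, 1.0, 2.0, 3.0])
y = np.array([0.0, 0.8, 0.9, 0.1])

# sample the piecewise-linear curve (the same one plt.plot draws) on a denser grid
x_new = np.linspace(x[0], x[-1], 50)
y_new = np.interp(x_new, x, y)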
I'm applying 3D graph cuts based on Yuri Boykov's implementation in C++, ITK, and Boost for min-cut/max-flow. First I provide some foreground and background seeds. Then I create the graph and assign the edge weights using a 3D neighborhood (boundary term):
weight = vcl_exp(-vcl_pow(pixelDifference, 2) / (2.0 * sigma * sigma))
where sigma is a noise parameter.
Then I assign the source/sink edges depending on the intensity probability histogram (regional term):
this->Graph->add_tweights(nodeIterator.Get(),
-this->Lambda*log(sinkHistogramValue),
-this->Lambda*log(sourceHistogramValue));
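For concreteness, a small numerical sketch of how these two kinds of weights come out (in Python rather than the C++/ITK code above; the intensities, sigma, and histogram probabilities are made up):

import numpy as np

sigma = 10.0   # hypothetical noise parameter
lam = 1.0      # hypothetical regional-term weight (Lambda)

# boundary term: n-link weight between two neighbouring voxels
intensity_p, intensity_q = 120.0, 135.0
pixel_difference = intensity_p - intensity_q
n_link = np.exp(-pixel_difference ** 2 / (2.0 * sigma * sigma))

# regional term: t-link weights, mirroring the add_tweights call above
source_prob, sink_prob = 0.7, 0.2          # made-up histogram probabilities
t_source = -lam * np.log(sink_prob)        # capacity towards the source
t_sink = -lam * np.log(source_prob)        # capacity towards the sink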
So the energy function is E = regional term + boundary term. Then the cut is computed with Boykov's implementation, but I don't understand exactly how. Anyway, now I want to add a shape prior to the cut, but I have no clue how to do it.
Do I have to change the weight of the edges?
Do I have to create another energy function? And if so, how?
How could I provide both functions to the mincut/max flow algorithm?
Hope my questions are easily understandable.
Thank you very much,
Karen
I am trying to classify MRI images of brain tumors as benign or malignant using C++ and OpenCV. I am planning to use the bag-of-words (BoW) method after clustering SIFT descriptors with k-means. That is, I will represent each image as a histogram, with the whole "codebook"/dictionary on the x-axis and each word's occurrence count in the image on the y-axis. These histograms will then be the input to my SVM (with RBF kernel) classifier.
However, the disadvantage of BoW is that it ignores the spatial information of the descriptors in the image. Someone suggested using SPM (spatial pyramid matching) instead. I read about it and came across this link, which gives the following steps:
1. Compute K visual words from the training set and map each local feature to its visual word.
2. For each image, initialize K multi-resolution coordinate histograms to zero. Each coordinate histogram consists of L levels, and level i has 4^i cells that evenly partition the image.
3. For each local feature in the image (say its visual word ID is k), pick out the k-th coordinate histogram and accumulate one count in each of the L corresponding cells, according to the coordinate of the local feature. The L cells are the cells the local feature falls into at the L different resolutions.
4. Concatenate the K multi-resolution coordinate histograms to form a final "long" histogram of the image. When concatenating, the k-th histogram is weighted by the probability of the k-th visual word.
5. To compute the kernel value for two images, sum up, over all cells, the intersection of their "long" histograms (sketched below).
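A minimal numpy sketch of that last step, assuming the "long" histograms have already been built (one row per image):

import numpy as np

def intersection_kernel(h1, h2):
    # histogram intersection: sum over cells of min(h1, h2)
    return np.minimum(h1, h2).sum()

# Gram (kernel) matrix over a set of "long" histograms H
def gram_matrix(H):
    n = len(H)
    K = np.zeros((n, n))
    for i in range(n):
        for j in range(n):
            K[i, j] = intersection_kernel(H[i], H[j])
    return K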
Now, I have the following questions:
What is a coordinate histogram? Doesn't a histogram just show the counts for each grouping on the x-axis? How does it provide information on the coordinates of a point?
How would I compute the probability of the k-th visual word?
What will be the use of the "kernel value" that I will get? How will I use it as input to the SVM? If I understand it right, is the kernel value used only in the testing phase and not in the training phase? If so, how will I train my SVM?
Or do you think I don't need to burden myself with the spatial info and should just stick with normal BoW for my situation (benign vs. malignant tumors)?
Someone please help this poor little undergraduate. You'll have my eternal gratitude if you do. If you have any clarifications, please don't hesitate to ask.
Here is the link to the actual paper, http://www.csd.uwo.ca/~olga/Courses/Fall2014/CS9840/Papers/lazebnikcvpr06b.pdf
MATLAB code is provided here http://web.engr.illinois.edu/~slazebni/research/SpatialPyramid.zip
A coordinate histogram (mentioned in your post) is just a histogram computed over a sub-region of the image. These slides explain it visually: http://web.engr.illinois.edu/~slazebni/slides/ima_poster.pdf.
You have multiple histograms here, one for each sub-region of the image. The probability (or the number of items) for each depends on the SIFT points that fall in that sub-region.
I think you need to define your pyramid kernel as mentioned in the slides.
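On the question of how the kernel values feed into the SVM: they are needed for training as well as testing. As an illustration only (scikit-learn in Python as a stand-in for your C++/OpenCV pipeline, with random placeholder histograms and labels), a precomputed kernel is typically used like this:

import numpy as np
from sklearn.svm import SVC

# placeholder "long" histograms: 20 training and 5 test images, 50 cells each
H_train = np.random.rand(20, 50)
H_test = np.random.rand(5, 50)
y_train = np.random.randint(0, 2, 20)   # 0 = benign, 1 = malignant (made up)

def intersection_gram(A, B):
    # kernel value = sum of element-wise minima between two histograms
    return np.minimum(A[:, None, :], B[None, :, :]).sum(axis=2)

clf = SVC(kernel="precomputed")
clf.fit(intersection_gram(H_train, H_train), y_train)   # training also uses kernel values
pred = clf.predict(intersection_gram(H_test, H_train))  # test-vs-training kernel values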
A Convolutional Neural Network may be better suited for your task if you have enough training samples. You can probably have a look at Torch or Caffe.
What would be the best way to implement a simple shape-matching algorithm to match a plot interpolated from just 8 points (x, y) against a database of similar plots (> 12,000 entries), each plot having > 100 nodes? The database has 6 categories of plots (signals measured under 6 different conditions), and the main aim is to find the right category (so for every category there are around 2000 plots to compare against).
The 8-node plot would represent actual data from a measurement, but for now I am simulating this by selecting a random plot from the database, picking 8 points from it, and smearing them with a Gaussian random number generator.
What would be the best way to implement non-linear least squares to compare the shape of the 8-node plot against each plot from the database? Are there any C++ libraries you know of that could help with this?
Is it necessary to find the actual formula (f(x)) of the 8-node plot to use it with least squares, or will it be sufficient to use interpolation at the requested points, such as the interpolation from the GSL library?
You can certainly use least squares without knowing the actual formula. If all of your plots are measured at the same x values, then this is easy -- you simply compute the usual weighted sum of squared residuals:
chi^2 = sum_i (y_i - Y(x_i))^2 / sigma_i^2
where y_i is a point in your 8-node plot, sigma_i is the error on that point, and Y(x_i) is the value of the plot from the database at the same x position as y_i. You can see why this is trivial if all your plots are measured at the same x values.
If they're not, you can get Y(x_i) either by fitting the plot from the database with some function (if you know it) or by interpolating between the points (if you don't know it). The simplest interpolation is just to connect the points with straight lines and find the value of the straight lines at the x_i that you want. Other interpolations might do better.
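A minimal Python/numpy sketch of that procedure, with made-up test and database plots (a ROOT or GSL version would follow the same steps):

import numpy as np

# hypothetical 8-point test plot with per-point errors
x_test = np.linspace(0.0, 7.0, 8)
y_test = np.sin(x_test) + 0.1 * np.random.randn(8)
sigma = np.full(8, 0.1)

# hypothetical database plot with >100 nodes
x_db = np.linspace(0.0, 7.0, 120)
y_db = np.sin(x_db)

# straight-line interpolation of the database plot at the test x positions
y_db_at_test = np.interp(x_test, x_db, y_db)

# weighted least-squares comparison (the chi^2 sum above)
chi2 = np.sum(((y_test - y_db_at_test) / sigma) ** 2)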
In my field, we use ROOT for this kind of thing. However, scipy has a great collection of functions, and it might be easier to get started with -- if you don't mind using Python.
One major problem you could have would be that the two plots are not independent. Wikipedia suggests McNemar's test in this case.
Another problem you could have is that you don't have much information in your test plot, so your results will be strongly affected by statistical fluctuations. In other words, if you only have 8 test points and two plots match, how will you know whether the underlying functions are really the same, or whether the 8 points simply jumped around (inside their error bars) in such a way that they look like the plot from the database -- purely by chance? I'm afraid you won't really know. So the plots that test well will include false positives (low purity), and some of the plots that don't happen to test well were probably actually good matches (low efficiency).
To solve that, you would need to either use a test plot with more points or else bring in other information. If you can throw away plots from the database that you know can't match for other reasons, that will help a lot.