Why do 100x100 images form a 10,000-dimensional space? - computer-vision

While reading a paper, I came across the claim that, when viewed as vectors of pixel values, face images are extremely high-dimensional: for example, 100x100 images form a 10,000-dimensional space.
How is that possible? I don't seem to understand it.

A vector has only one axis, so if you convert a 2D array into 1D (known as Flatten in neural-network terms), the result is a vector of 100*100 = 10,000 values. Each pixel becomes one coordinate of that vector, so every image is a single point in a 10,000-dimensional space; you are just rearranging a 2D quantity into 1D.
If you need more on this topic, the concept of Flatten is well covered on YouTube; a video will give you a pictorial understanding of it.
Hope this helps clear your doubt.
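If it helps to see it concretely, here is a minimal NumPy sketch (mine, not from the answer above) of the flattening step; the random image is purely illustrative:

```python
import numpy as np

# Hypothetical 100x100 grayscale face image (random values, just for illustration).
image = np.random.randint(0, 256, size=(100, 100), dtype=np.uint8)

# Flattening turns the 2D grid into one long vector: each of the 10,000 pixels
# becomes one coordinate, so the image is a single point in a 10,000-dimensional space.
vector = image.flatten()          # equivalently, image.reshape(-1)

print(image.shape)   # (100, 100)
print(vector.shape)  # (10000,)
```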

Related

Get slice number from 4D DICOM tags

I have 4D DCE images and have already written code to sort them and build 3D images from the 4D DICOM series. For the sorting I used the acquisition number from the DICOM tags, divided by the total number of slices to get the slice number, but in two of my data sets the acquisition number is not correct (I don't know why), so the resulting 3D image comes out one and a half times the size of a normal image. I am thinking of using the slice number tag directly (0054,0081), but because I have 3D images my program just crashes during debugging. Do you have any idea how to use that?
Any other ideas for obtaining the slice number are welcome.
Cheers,
Nady
I don't have experience with 4D DICOMs, but with 3D DICOM it's better to use the Patient Position rather than the slice number. See https://stackoverflow.com/a/6598664/1136458; http://public.kitware.com/pipermail/insight-users/2005-September/014711.html could also be useful.
From your question it's not very clear how you used the acquisition number. Could Number of Frames (0028,0008) and Frame Increment Pointer (0028,0009) be used in your case? I found their definitions here: http://nipy.org/nibabel/dicom/dicom_fields.html#multi-frame-images
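For what it's worth, here is a rough sketch of the Patient Position approach using pydicom; the one-file-per-slice layout and the directory name are assumptions, not something from your data:

```python
import glob
import numpy as np
import pydicom

# Assumed layout: one single-frame DICOM file per slice in ./series/.
files = [pydicom.dcmread(f) for f in glob.glob("series/*.dcm")]

# Slice normal = cross product of the row and column direction cosines
# (Image Orientation Patient, tag 0020,0037).
iop = np.array(files[0].ImageOrientationPatient, dtype=float)
normal = np.cross(iop[:3], iop[3:])

# Sort slices by the projection of Image Position Patient (0020,0032) onto the
# slice normal, instead of relying on acquisition or slice numbers.
files.sort(key=lambda ds: float(np.dot(normal, np.array(ds.ImagePositionPatient, dtype=float))))

volume = np.stack([ds.pixel_array for ds in files])  # shape: (slices, rows, cols)
```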

3D reconstruction in C++ with OpenCV... fundamental matrix too large

OK, I am posting my conundrums of life to Stack Overflow after 4 days of mindless programming when nothing seems to come out right, or at least close to right. Sorry for being a little dramatic, but I feel like a lousy programmer today.
Anyway, my problem is:
To obtain the fundamental matrix using RANSAC (N > 8).
I have two images with a wide baseline but sufficient overlap, so an adequate number of SURF keypoints (~308) are matched correctly (I plot them).
Now lies the problem. I pass the 2D points to cv::findFundamentalMat, but I get completely baseless results. The function returns:
FundMat = [ 2.05148e-13   3.72341   -2.03671e+10
            1.6701e+26   -4.17712    4.59533e+29
            3.32414e+18   2.8843     1.91069e-26 ]
To deal with the large dynamic range of the matrix, Hartley suggests normalising the data points (normalization in Euclidean space, not in projective space). Even after doing that, the result is almost the same (values ranging from about 10^-9 to 10^9).
I understand that FundMat is accurate only up to scale, but a spread from 10^-9 to 10^+9 is too much.
I referred to other questions here but I don't seem to get any leads: findfundamentalmatrix-doesnt-find-fundamental-matrix
how-to-calculate-the-fundamental-matrix-for-stereo-vision
Any ideas would be great. This is a very important step when considering uncalibrated images for the rest of the software pipeline.
In case the code is helpful (it's not indented or syntax-highlighted, though; there isn't enough space here):
https://sites.google.com/site/3drecon124/
It's solved... silly human error. There was a data-type conversion from double to float, which caused data to be fetched from incorrect locations in memory. Now it's smooth and the epipolar constraint is satisfied up to scale.
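For anyone landing here later, a minimal sketch of the same step using OpenCV's Python bindings (not the original C++ code); the synthetic correspondences are made up purely for illustration, and the main point is to keep one floating-point type end to end and sanity-check the epipolar constraint:

```python
import cv2
import numpy as np

# Synthetic correspondences for illustration: random 3D points projected into two
# views with a known baseline (in practice pts1/pts2 come from your SURF matching).
rng = np.random.default_rng(0)
X = np.hstack([rng.uniform(-1, 1, (100, 2)), rng.uniform(4, 8, (100, 1))])  # 3D points
K = np.array([[800.0, 0, 320], [0, 800.0, 240], [0, 0, 1]])                 # intrinsics
R, t = np.eye(3), np.array([[1.0], [0.0], [0.0]])                           # second camera pose

def project(P, pts3d):
    x = (P @ np.hstack([pts3d, np.ones((len(pts3d), 1))]).T).T
    return x[:, :2] / x[:, 2:]

# Keep a single floating-point type end to end; the mixed double/float buffers
# were exactly the bug described above.
pts1 = project(K @ np.hstack([np.eye(3), np.zeros((3, 1))]), X).astype(np.float64)
pts2 = project(K @ np.hstack([R, t]), X).astype(np.float64)

F, mask = cv2.findFundamentalMat(pts1, pts2, cv2.FM_RANSAC, 1.0, 0.99)

# Sanity check: for the RANSAC inliers, x2^T F x1 should be close to 0 (up to scale).
x1 = np.hstack([pts1, np.ones((len(pts1), 1))])
x2 = np.hstack([pts2, np.ones((len(pts2), 1))])
residuals = np.abs(np.sum(x2 * (F @ x1.T).T, axis=1))
print("mean |x2^T F x1| over inliers:", residuals[mask.ravel() == 1].mean())
```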

Bilinear interpolation to enlarge bitmap images

I'm a student, and I've been tasked to optimize bilinear interpolation of images by invoking parallelism from CUDA.
The image is given as a 24-bit .bmp format. I already have a reader for the .bmp and have stored the pixels in an array.
Now I need to perform bilinear interpolation on the array. I do not understand the math behind it (even after going through the Wikipedia article and other Google results), so I'm unable to come up with an algorithm.
Is there anyone who can help me with a link to an existing bilinear interpolation algorithm on a 1-D array? Or perhaps a link to an open-source image-processing library that uses bilinear and bicubic interpolation for scaling images?
The easiest way to understand bilinear interpolation is to understand linear interpolation in 1D.
This first figure should give you flashbacks to middle school math. Given some location a at which we want to know f(a), we take the neighboring "known" values and fit a line between them.
So we just used the old middle-school equations y=mx+b and y-y1=m(x-x1). Nothing fancy.
We basically carry over this concept to 2-D in order to get bilinear interpolation. We can attack the problem of finding f(a,b) for any a,b by doing three interpolations. Study the next figure carefully. Don't get intimidated by all the labels. It is actually pretty simple.
For bilinear interpolation, we again use the neighboring points. Now there are four of them, since we are in 2D. The trick is to attack the problem one dimension at a time.
We project our (a,b) to the sides and first compute two (one-dimensional!) interpolations:
f(a, yj), where yj is held constant, and
f(a, yj+1), where yj+1 is held constant.
Now there is just one last step. You take the two points you calculated, f(a,yj) and f(a,yj+1), and fit a line between them. That's the blue one going left to right in the diagram, passing through f(a,b). Interpolating along this last line gives you the final answer.
I'll leave the math for the 2-D case for you. It's not hard if you work from the diagram. And going through it yourself will help you really learn what's going on.
One last little note: it doesn't matter which sides you pick for the first two interpolations. You could have picked the top and bottom and then done the third interpolation between those two instead. The answer would have been the same.
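If you want to check your derivation against something concrete, here is a small Python sketch (mine, not part of the original answer) of exactly the three interpolations described above; it assumes (a, b) lies strictly inside the grid:

```python
import math

def lerp(v0, v1, t):
    # 1D linear interpolation between v0 and v1 for t in [0, 1].
    return (1 - t) * v0 + t * v1

def bilinear(f, a, b):
    # f[y][x] holds the known values at the integer grid points.
    # Interpolate along x on the two rows bracketing b, then along y between them.
    xj, yj = math.floor(a), math.floor(b)
    tx, ty = a - xj, b - yj
    top    = lerp(f[yj][xj],     f[yj][xj + 1],     tx)   # f(a, yj)
    bottom = lerp(f[yj + 1][xj], f[yj + 1][xj + 1], tx)   # f(a, yj+1)
    return lerp(top, bottom, ty)                          # f(a, b): the third interpolation

grid = [[10, 20],
        [30, 40]]
print(bilinear(grid, 0.5, 0.5))  # 25.0
```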
When you enlarge an image by scaling the sides by an integral factor, you may treat the result as the original image with extra pixels inserted between the original pixels.
See the pictures in IMAGE RESIZE EXAMPLE.
The f(x,y)=... formula in this article in Wikipedia gives you a method to compute the color f of an inserted pixel:
For every inserted pixel you combine the colors of the four original pixels (Q11, Q12, Q21, Q22) surrounding it. The combination depends on the distance between the inserted pixel and each surrounding original pixel: the closer it is to one of them, the more that pixel's color contributes:
The original pixels are shown as red. The inserted pixel is shown as green.
That's the idea.
If you scale the sides by a non-integral factor, the formulas still hold, but now you need to recalculate all pixel colors as you can't just take the original pixels and simply insert extra pixels between them.
Don't get hung up on the fact that 2D arrays in C are really 1D arrays. It's an implementation detail. Mathematically, you'll still need to think in terms of 2D arrays.
Think about linear interpolation on a 1D array. You know the value at 0, 1, 2, 3, ... Now suppose I ask you for the value at 1.4. You'd give me a weighted mix of the values at 1 and 2: (1 - 0.4)*A[1] + 0.4*A[2]. Simple, right?
Now you need to extend to 2D. No problem. 2D interpolation can be decomposed into two 1D interpolations, in the x-axis and then y-axis. Say you want (1.4, 2.8). Get the 1D interpolants between (1, 2)<->(2,2) and (1,3)<->(2,3). That's your x-axis step. Now 1D interpolate between them with the appropriate weights for y = 2.8.
This should be simple to make massively parallel. Just calculate each interpolated pixel separately. With shared memory access to the original image, you'll only be doing reads, so no synchronization issues.
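Putting it together for the original question (24-bit pixels stored in a flat array), here is a plain, serial Python sketch; the row-major, 3-values-per-pixel layout is an assumption, and the nested loop is exactly what you would map to one CUDA thread per output pixel:

```python
def resize_bilinear(src, src_w, src_h, dst_w, dst_h, channels=3):
    """src: flat row-major list of src_w*src_h*channels values (e.g. 24-bit BMP pixels)."""
    dst = [0] * (dst_w * dst_h * channels)
    x_ratio = (src_w - 1) / (dst_w - 1) if dst_w > 1 else 0.0
    y_ratio = (src_h - 1) / (dst_h - 1) if dst_h > 1 else 0.0
    for dy in range(dst_h):          # in a CUDA kernel this double loop becomes
        for dx in range(dst_w):      # one thread per (dx, dy) output pixel
            sx, sy = dx * x_ratio, dy * y_ratio
            x0, y0 = int(sx), int(sy)
            x1, y1 = min(x0 + 1, src_w - 1), min(y0 + 1, src_h - 1)
            tx, ty = sx - x0, sy - y0
            for c in range(channels):
                # The four surrounding original pixels (Q11, Q21, Q12, Q22).
                q11 = src[(y0 * src_w + x0) * channels + c]
                q21 = src[(y0 * src_w + x1) * channels + c]
                q12 = src[(y1 * src_w + x0) * channels + c]
                q22 = src[(y1 * src_w + x1) * channels + c]
                top = (1 - tx) * q11 + tx * q21          # interpolate along x, row y0
                bot = (1 - tx) * q12 + tx * q22          # interpolate along x, row y1
                dst[(dy * dst_w + dx) * channels + c] = int((1 - ty) * top + ty * bot)
    return dst
```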

The dimensionality reduction issue in self-organizing maps (SOM)

The self-organizing map is claimed to be able to visualize/cluster high-dimensional data in a lower-dimensional space. I have some difficulty understanding this statement.
Consider a six-dimensional data set: the codebook vectors/reference vectors are also six-dimensional. According to the SOM algorithm, updating these reference vectors is also conducted in the six-dimensional vector space. If we are considering a two-dimensional map, how should I understand the mapping between the six-dimensional data space and the two-dimensional map space?
The map between the N-dimensional input space and the 2D SOM space is a non-linear projection preserving as much of the topology as possible.
It means that information about distance and angle is lost in the process, but the proximity relationship between points is preserved (i.e. two points which are close to one another in the input space should be close in the SOM space).
I got my best insight in "what does a SOM do?" by using it on the 3D RGB color space: the work of the SOM can easily be visualized in this case and should help to grasp the concept.
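To make that concrete, here is a minimal SOM training sketch (my own, using the RGB example above, with made-up grid size and schedules): the codebook vectors live in the 3D input space, while the neighborhood is measured on the 2D map grid:

```python
import numpy as np

rng = np.random.default_rng(0)
data = rng.random((1000, 3))                 # RGB colors: the 3D input space
grid_h, grid_w, dim = 20, 20, 3
weights = rng.random((grid_h, grid_w, dim))  # codebook vectors, one per map node

# Grid coordinates of every node: this is the 2D "map space".
yy, xx = np.mgrid[0:grid_h, 0:grid_w]
coords = np.stack([yy, xx], axis=-1).astype(float)

n_steps = 5000
for step in range(n_steps):
    lr = 0.5 * (1 - step / n_steps)                      # decaying learning rate
    sigma = max(1.0, grid_w / 2 * (1 - step / n_steps))  # decaying neighborhood radius
    x = data[rng.integers(len(data))]

    # Best matching unit: nearest codebook vector in the 3D *input* space.
    bmu = np.unravel_index(np.argmin(((weights - x) ** 2).sum(-1)), (grid_h, grid_w))

    # Neighborhood is measured in the 2D *map* space around the BMU.
    d2 = ((coords - np.array(bmu, dtype=float)) ** 2).sum(-1)
    h = np.exp(-d2 / (2 * sigma ** 2))[..., None]

    # The update happens back in the 3D input space: pull codebooks toward the sample.
    weights += lr * h * (x - weights)
```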
The 2D self-organizing map (SOM) distributes the input vectors onto a 2D plane. Mathematically the SOM is a 3D matrix, where the length of the third dimension equals the dimensionality of your input data. To visualize the SOM it is usual to compute the U-matrix, which gives, for each neuron of the SOM, the mean Euclidean distance between that neuron and its neighbors.
The resulting 2D matrix allows the visualization of the high-dimensional space on a 2D plane. High values mark the barriers between clusters; the clusters themselves appear as deep blue valleys in the U-matrix figure.
That U-matrix was obtained by training on a 3D data set, and it can also be displayed back in the original 3D space.
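A small sketch of the U-matrix computation described above, assuming a trained weight grid shaped like the `weights` array from the previous sketch:

```python
import numpy as np

def u_matrix(weights):
    """weights: (grid_h, grid_w, dim) array of trained SOM codebook vectors."""
    h, w, _ = weights.shape
    u = np.zeros((h, w))
    for i in range(h):
        for j in range(w):
            dists = []
            for di, dj in ((-1, 0), (1, 0), (0, -1), (0, 1)):   # 4-connected neighbors
                ni, nj = i + di, j + dj
                if 0 <= ni < h and 0 <= nj < w:
                    dists.append(np.linalg.norm(weights[i, j] - weights[ni, nj]))
            u[i, j] = np.mean(dists)   # mean distance to grid neighbors
    return u
```

Low values (the valleys) lie inside clusters; high values mark the borders between them.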
It is hard to understand intuitively, but it is possible to use it, so try to think of it as a discrete function that can map, for example, a 4D vector space to a 1D vector. Most important is that the function involves some sort of recursion; an L-system, for example, uses recursion or repetition a lot. A better description of such monster curves can be found at Nick's spatial index Hilbert curve blog.

Analyzing gaze tracking data

I have an image which was shown to groups of people with different domain knowledge of its content. I then recorded gaze-fixation data of them watching the image.
I now want to compare the results of the two groups, so what I need to know is whether the positions of the fixation samples are correlated between the two groups or not.
I have the original image as well as the fixation coordinates. Do you have any good idea how to start analyzing the data?
It's more about the idea or the plan, so you don't have to be too technical on that one.
Thanks
Simple idea: render all the coordinates on the original image in a 'heat map' style, one image per group. You can then visually compare the images for correlation, and you have some nice graphics for your paper.
There is something like the two-dimensional correlation coefficient. With software like R or Matlab you can do the number crunching for the correlation.
Matlab has a function for this: corr2, the two-dimensional correlation function. It computes the two-dimensional correlation coefficient between two matrices, which must be of the same size:
r = corr2(A, B)
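If you end up doing it in Python instead of Matlab, here is a rough sketch (my own; the fixation coordinates and image size are made up) of the heat-map plus correlation idea; the corr2 helper reproduces the same statistic as Matlab's corr2:

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def heat_map(fix_x, fix_y, width, height, sigma=25):
    # 2D histogram of fixation coordinates, smoothed into a heat map.
    hist, _, _ = np.histogram2d(fix_y, fix_x, bins=(height, width),
                                range=[[0, height], [0, width]])
    return gaussian_filter(hist, sigma)

def corr2(a, b):
    # Same statistic as Matlab's corr2: Pearson correlation of the two maps.
    a, b = a - a.mean(), b - b.mean()
    return (a * b).sum() / np.sqrt((a ** 2).sum() * (b ** 2).sum())

# Hypothetical fixation data for two groups (x, y pixel coordinates on a 640x480 image).
rng = np.random.default_rng(0)
g1_x, g1_y = rng.uniform(0, 640, 500), rng.uniform(0, 480, 500)
g2_x, g2_y = rng.uniform(0, 640, 500), rng.uniform(0, 480, 500)

m1 = heat_map(g1_x, g1_y, 640, 480)
m2 = heat_map(g2_x, g2_y, 640, 480)
print("2D correlation between the groups:", corr2(m1, m2))
```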
In gaze tracking, the most interesting data lies in two areas:
Where people look. For that you can use the heat map Daan suggests: make a heat map for all people, and heat maps for the separate groups of people.
When people look there. For that I would recommend you start by making heat maps as above, but for short time intervals starting from the time the picture was first shown. Again, for all people, and for the separate groups you have.
The resulting set of heat maps, perhaps animated for the ones from the second point, should give you some pointers for further analysis.