Discrete cosine transform in HTK is not the same as the DCT on Wikipedia

I saw the DCT on Wikipedia:
but the DCT in the HTK book is:
These two formulas are not the same and I can't transform the first into the second one. Can anybody tell me which one is used in MFCC?
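For reference, assuming the two missing images show the standard Wikipedia DCT-II and the HTK-book MFCC formula, they are (written from memory, please check against your sources):

    Wikipedia DCT-II (0-based indexing over N inputs x_n):
        X_k = sum_{n=0}^{N-1} x_n * cos( (pi/N) * (n + 1/2) * k ),   k = 0, ..., N-1

    HTK book (cepstral coefficients from N log filterbank energies m_j, 1-based):
        c_i = sqrt(2/N) * sum_{j=1}^{N} m_j * cos( (pi*i/N) * (j - 0.5) )

Substituting j = n + 1 makes the cosine terms identical, so the HTK formula is just the DCT-II with an extra sqrt(2/N) scale factor; that scale changes only the magnitude of the coefficients, not their shape. HTK, and hence standard MFCC, uses the second form.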

Related

Shadow removal by inverse Laplacian in C++

I have some questions about coding the algorithm in
"Removing shadows from images" by Finlayson.
I have obtained the MATLAB code for computing the illumination-invariant image by Jose Alvarez and converted it to C++.
I have continued to follow the image reconstruction steps outlined in Finlayson's algorithm, but got stuck at the Poisson equation, which is the part after removing the shadow edges from the log-channel image.
How should I proceed after that? The discussion of this part is vague to me. I have read the following presentation slides. They say I must perform an inverse Laplacian operation on the image.
What should I do? The inverse Laplacian is not so common to code, so I would appreciate any advice I could get.
Thanks.
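Not a full answer, but one common way to "invert the Laplacian" is to solve the Poisson equation laplacian(u) = f iteratively. Below is a minimal Jacobi-relaxation sketch in C++ with OpenCV; the function name solvePoissonJacobi, the iteration count, and the crude Neumann-style border handling are my own illustrative choices, not Finlayson's exact method:

    #include <opencv2/core.hpp>

    // Solve laplacian(u) = f on a CV_32F image by Jacobi relaxation.
    // f: modified Laplacian of the log-channel image (shadow edges zeroed out).
    // Returns the reconstructed log-channel image (up to an additive constant).
    cv::Mat solvePoissonJacobi(const cv::Mat& f, int iterations = 5000)
    {
        CV_Assert(f.type() == CV_32F);
        cv::Mat u = cv::Mat::zeros(f.size(), CV_32F);
        cv::Mat next = u.clone();

        for (int it = 0; it < iterations; ++it)
        {
            for (int y = 1; y < f.rows - 1; ++y)
            {
                for (int x = 1; x < f.cols - 1; ++x)
                {
                    // Discrete Laplacian stencil: sum of 4 neighbours - 4*u = f
                    // rearranged for the Jacobi update of u.
                    next.at<float>(y, x) = 0.25f * (u.at<float>(y - 1, x) + u.at<float>(y + 1, x)
                                                  + u.at<float>(y, x - 1) + u.at<float>(y, x + 1)
                                                  - f.at<float>(y, x));
                }
            }
            // Crude Neumann boundary: copy the nearest interior values to the border.
            next.row(1).copyTo(next.row(0));
            next.row(next.rows - 2).copyTo(next.row(next.rows - 1));
            next.col(1).copyTo(next.col(0));
            next.col(next.cols - 2).copyTo(next.col(next.cols - 1));
            cv::swap(u, next);
        }
        return u;
    }

Jacobi converges slowly; in practice a multigrid solver or a DCT/FFT-based Poisson solver is much faster, but the sketch shows the idea.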

Spatial pyramid matching (SPM) for SIFT then input to SVM in C++

I am trying to classify MRI images of brain tumors into benign and malignant using C++ and OpenCV. I am planning on using bag-of-words (BoW) method after clustering SIFT descriptors using kmeans. Meaning, I will represent each image as a histogram with the whole "codebook"/dictionary for the x-axis and their occurrence count in the image for the y-axis. These histograms will then be my input for my SVM (with RBF kernel) classifier.
However, the disadvantage of using BoW is that it ignores the spatial information of the descriptors in the image. Someone suggested using SPM instead. I read about it and came across this link, which gives the following steps:
1. Compute K visual words from the training set and map every local feature to its visual word.
2. For each image, initialize K multi-resolution coordinate histograms to zero. Each coordinate histogram consists of L levels, and each level i has 4^i cells that evenly partition the current image.
3. For each local feature (say its visual word ID is k) in this image, pick out the k-th coordinate histogram, then accumulate one count in each of the L corresponding cells in this histogram, according to the coordinate of the local feature. The L cells are the cells where the local feature falls in the L different resolutions.
4. Concatenate the K multi-resolution coordinate histograms to form a final "long" histogram of the image. When concatenating, the k-th histogram is weighted by the probability of the k-th visual word.
5. To compute the kernel value over two images, sum up all the cells of the intersection of their "long" histograms.
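To make steps 2-4 concrete, here is a minimal C++/OpenCV-style sketch of building the "long" pyramid histogram for one image. The function name buildPyramidHistogram, the flat (unweighted) cells, and the layout of the output vector are illustrative assumptions, not the exact scheme from the paper:

    #include <opencv2/core.hpp>
    #include <algorithm>
    #include <vector>

    // Build a spatial-pyramid histogram for one image.
    // wordIds[i] : visual-word ID (0..K-1) of the i-th local feature
    // points[i]  : pixel coordinates of the i-th local feature
    // K          : codebook size, L : number of pyramid levels (level l has 2^l x 2^l cells)
    std::vector<float> buildPyramidHistogram(const std::vector<int>& wordIds,
                                             const std::vector<cv::Point2f>& points,
                                             int K, int L, cv::Size imageSize)
    {
        // Total number of cells over all levels: 1 + 4 + 16 + ... = sum over l of 4^l.
        int cellsPerWord = 0;
        for (int l = 0; l < L; ++l) cellsPerWord += (1 << l) * (1 << l);

        std::vector<float> hist(static_cast<size_t>(K) * cellsPerWord, 0.0f);

        for (size_t i = 0; i < wordIds.size(); ++i)
        {
            int k = wordIds[i];
            int offset = 0;                     // offset of level l inside the k-th histogram
            for (int l = 0; l < L; ++l)
            {
                int grid = 1 << l;              // grid x grid cells at this level
                int cx = std::min(grid - 1, static_cast<int>(points[i].x * grid / imageSize.width));
                int cy = std::min(grid - 1, static_cast<int>(points[i].y * grid / imageSize.height));
                hist[static_cast<size_t>(k) * cellsPerWord + offset + cy * grid + cx] += 1.0f;
                offset += grid * grid;
            }
        }
        return hist;
    }

Note that the per-level weighting used in the original paper (finer levels count more) is omitted here; the steps quoted above weight the k-th histogram by the probability of the k-th visual word instead.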
Now, I have the following questions:
What is a coordinate histogram? Doesn't a histogram just show the counts for each grouping on the x-axis? How will it provide information on the coordinates of a point?
How would I compute the probability of the k-th visual word?
What will be the use of the "kernel value" that I will get? How will I use it as input to the SVM? If I understand it right, is the kernel value used in the testing phase and not in the training phase? If yes, then how will I train my SVM?
Or do you think I don't need to burden myself with the spatial info and should just stick with normal BoW for my situation (benign and malignant tumors)?
Someone please help this poor little undergraduate. You'll have my eternal gratitude if you do. If you have any clarifications, please don't hesitate to ask.
Here is the link to the actual paper, http://www.csd.uwo.ca/~olga/Courses/Fall2014/CS9840/Papers/lazebnikcvpr06b.pdf
MATLAB code is provided here http://web.engr.illinois.edu/~slazebni/research/SpatialPyramid.zip
A coordinate histogram (mentioned in your post) is just a histogram computed over a sub-region of the image. These slides explain it visually: http://web.engr.illinois.edu/~slazebni/slides/ima_poster.pdf.
You have multiple histograms here, one for each region in the image. The probability (or the count) of a visual word depends on the SIFT points that fall in that sub-region.
I think you need to define your pyramid kernel as mentioned in the slides.
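Regarding questions 2 and 3: the kernel value is what the SVM consumes, in both training and testing, when you use a precomputed kernel. Here is a minimal sketch of the histogram-intersection kernel over two "long" histograms; the function name and the use of a plain intersection rather than the paper's level-weighted pyramid match are my own simplifications:

    #include <algorithm>
    #include <vector>

    // Histogram-intersection kernel between two "long" pyramid histograms.
    // Larger values mean the two images share more visual words in the same cells.
    float intersectionKernel(const std::vector<float>& a, const std::vector<float>& b)
    {
        float sum = 0.0f;
        for (size_t i = 0; i < a.size() && i < b.size(); ++i)
            sum += std::min(a[i], b[i]);
        return sum;
    }

You would fill an N x N Gram matrix with these values over your N training images and train a kernel SVM on it; OpenCV's ml::SVM also offers an INTER (histogram intersection) kernel type, so you can alternatively feed it the long histograms directly. At test time the kernel is evaluated between the test image's histogram and those of the training images / support vectors.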
A Convolutional Neural Network may be better suited for your task if you have enough training samples. You can probably have a look at Torch or Caffe.

Projection of images into fisherspace (LDA)

In the Linear Discriminant Analysis algorithm for face recognition, the between-class scatter matrix and the within-class scatter matrix are both of size MxM (M = total number of images, C = number of classes). The fisherspace (the matrix with eigenvectors as columns) consists of the eigenvectors corresponding to non-zero eigenvalues and hence has dimension Mx(C-1). How am I supposed to project the training images, each in N-dimensional space, onto the fisherspace? (Correct me if I am wrong.) Can anybody help me figure this out?
http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=5256630
This is the research paper I followed to implement LDA. I am trying to implement it in OpenCV using C++
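The projection itself is just y = W^T (x - mean), where W holds the LDA eigenvectors as columns and mean is the mean training image. A minimal OpenCV sketch, assuming W and mean come from your own LDA implementation (the function name is mine):

    #include <opencv2/core.hpp>

    // Project one image onto the fisherspace.
    // image : the face image (any depth), reshaped internally to a 1 x N row
    // mean  : 1 x N mean of the training images, CV_32F
    // W     : N x (C-1) matrix whose columns are the LDA eigenvectors, CV_32F
    // Returns a 1 x (C-1) row of fisherspace coordinates.
    cv::Mat projectToFisherspace(const cv::Mat& image, const cv::Mat& mean, const cv::Mat& W)
    {
        cv::Mat x = image.reshape(1, 1);   // flatten to a row vector
        x.convertTo(x, CV_32F);
        return (x - mean) * W;             // y^T = (x - mean)^T * W, i.e. y = W^T (x - mean)
    }

If your eigenvectors were computed in a reduced PCA space (as in the usual Fisherfaces pipeline), apply the PCA projection first or pre-multiply the two projection matrices into a single W.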

Smoothing motion parameters

I have been working on video stabilization for quite a few weeks now. The algorithm I'm following basically involves 3 steps:
1. FAST feature detection and matching
2. Calculating the affine transformation (scale + rotation + translation x + translation y) from matched keypoints
3. Smoothing the motion parameters using a cubic spline or B-spline
I have been able to calculate the affine transform, but I am stuck at smoothing the motion parameters. I have been unable to evaluate the spline function to smooth the three parameters.
Here is a graph for smoothed data points
Any suggestion or help as to how I can code this to get the desired result shown in the graph?
Here is the code that calculates the points on the curve:
B-spline Curves
But the code will use all control points as transform parameters when building the curve.
I think I will run it in post-processing (not real time).
Did you run the B-spline smoothing in real time?
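If a full B-spline fit turns out to be awkward, a much simpler way to smooth the motion parameters is a moving-average (box) filter over each accumulated trajectory, which is often enough for basic stabilization. A minimal sketch; the function name, the radius parameter, and treating each parameter (dx, dy, angle) as an independent 1-D signal are my own illustrative choices:

    #include <vector>
    #include <cstddef>

    // Smooth a 1-D trajectory (e.g. the accumulated x-translation, y-translation,
    // or rotation angle over all frames) with a centred moving-average filter.
    std::vector<double> movingAverage(const std::vector<double>& traj, int radius)
    {
        std::vector<double> smoothed(traj.size(), 0.0);
        for (size_t i = 0; i < traj.size(); ++i)
        {
            double sum = 0.0;
            int count = 0;
            for (int j = -radius; j <= radius; ++j)
            {
                long k = static_cast<long>(i) + j;
                if (k >= 0 && k < static_cast<long>(traj.size()))
                {
                    sum += traj[static_cast<size_t>(k)];
                    ++count;
                }
            }
            smoothed[i] = sum / count;
        }
        return smoothed;
    }

The per-frame correction is then (smoothed trajectory - raw trajectory), added back onto each frame's transform. Unlike a global spline fit, this only needs a window of frames, so it can also run near real time with a small delay.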

Displacement between two images using OpenCV SURF

I am working on image processing with OpenCV.
I want to find the x, y and rotational displacement between two images in OpenCV.
I have found the features of the images using SURF and the features have been matched.
Now I want to find the displacement between the images. How do I do that? Can RANSAC be useful here?
regards,
shiksha
A rotation and two translations are three unknowns, so your minimum number of matches is two (since each match delivers two equations, or constraints). Indeed, imagine a line segment between two points in one image and the corresponding (matched) line segment in the other image. The difference between the segments' orientations gives you the rotation angle. After you have rotated, just use any of the matched points to find the translation. Thus this is a 3-DOF problem that requires two points. It is called a Euclidean transformation, rigid-body transformation, or orthogonal Procrustes.
Using a homography (an 8-DOF problem), which has no closed-form solution and relies on non-linear optimization, is a bad idea. It is slow (in the RANSAC case) and inaccurate, since it adds 5 extra DOF. RANSAC is only needed if you have outliers. In the case of pure noise and an overdetermined system (more than 2 points), the optimal solution that minimizes the sum of squared geometric distances between matched points is given in closed form by:
Problem statement: min ||R*P + t - Q||^2 over R (rotation) and t (translation)
Solution: R = V*U^T, t = Qmean - R*Pmean
where X = P - Pmean, Y = Q - Qmean, and the SVD gives X*Y^T = U*L*V^T; all matrices have the data points as columns. For a gentle intro to rigid transformations see this
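A minimal C++/OpenCV sketch of this closed-form solution (a Kabsch/Procrustes fit on 2-D matched points); the function name, the use of cv::SVD, and the determinant check against reflections are my own choices:

    #include <opencv2/core.hpp>
    #include <vector>

    // Fit R (2x2 rotation) and t (2x1 translation) so that R*p + t ~ q
    // for matched point pairs (p[i], q[i]), in the least-squares sense.
    void fitRigid2D(const std::vector<cv::Point2f>& p, const std::vector<cv::Point2f>& q,
                    cv::Matx22d& R, cv::Vec2d& t)
    {
        // Means of both point sets.
        cv::Vec2d pMean(0, 0), qMean(0, 0);
        for (size_t i = 0; i < p.size(); ++i)
        {
            pMean += cv::Vec2d(p[i].x, p[i].y);
            qMean += cv::Vec2d(q[i].x, q[i].y);
        }
        pMean *= 1.0 / p.size();
        qMean *= 1.0 / q.size();

        // Cross-covariance H = X * Y^T with the centred points as columns.
        cv::Matx22d H(0, 0, 0, 0);
        for (size_t i = 0; i < p.size(); ++i)
        {
            cv::Vec2d x = cv::Vec2d(p[i].x, p[i].y) - pMean;
            cv::Vec2d y = cv::Vec2d(q[i].x, q[i].y) - qMean;
            H += x * y.t();
        }

        // SVD: H = U * L * V^T, then R = V * U^T, with a determinant check
        // to rule out a reflection.
        cv::SVD svd(cv::Mat(H), cv::SVD::FULL_UV);
        cv::Mat Rm = svd.vt.t() * svd.u.t();
        if (cv::determinant(Rm) < 0)
        {
            cv::Mat D = cv::Mat::eye(2, 2, Rm.type());
            D.at<double>(1, 1) = -1;
            Rm = svd.vt.t() * D * svd.u.t();
        }
        R = cv::Matx22d(Rm.at<double>(0, 0), Rm.at<double>(0, 1),
                        Rm.at<double>(1, 0), Rm.at<double>(1, 1));
        t = qMean - R * pMean;
    }

If you are on OpenCV 3.2 or later, cv::estimateAffinePartial2D (optionally with RANSAC) solves essentially the same problem for you, with a uniform scale as a fourth degree of freedom.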