t-SNE: Choosing the Number of Dimensions

I am using t-SNE for exploratory data analysis. I am using it instead of PCA because PCA is linear and t-SNE is non-linear.
With PCA it is straightforward to know how many dimensions are required to capture the necessary variance.
How do I know how many dimensions are required for my data using t-SNE?
I have read a popular article with a lot of useful information, but it doesn't discuss dimensionality.
https://distill.pub/2016/misread-tsne/

Related

Principal component analysis on proportional data

Is it valid to run a PCA on data that is composed of proportions? For example, I have data on the proportion of various food items in the diet of different species. Can I run a PCA on this type of data, or should I transform the data or do something else beforehand?
I had a similar question. You should search for "compositional data analysis". There are transformations you can apply to proportions in order to analyze them with multivariate techniques such as PCA. You can also find "robust" PCA algorithms to run your analysis in R. Let us know if you find an appropriate solution to your specific problem.
I don't think so.
PCA will give you "impossible" answers. You might get principal components with values that proportions can't have, like negative values or values greater than 1. How would you interpret this component?
In technical terms, the support of your data is a subset of the support assumed by PCA. Say you have $k$ classes. Then:
the support for PCA vectors is $\mathbb{R}^k$
the support for your proportion vectors is the $(k-1)$-dimensional simplex. By simplex I mean the set of vectors $p$ of length $k$ such that:
$0 \le p_i \le 1$ for $i = 1, \ldots, k$
$\sum_{i=1}^k p_i = 1$
One way around this would be a one-to-one mapping between the simplex and all of $\mathbb{R}^k$: you could map your proportions to $\mathbb{R}^k$, do PCA there, then map the PCA vectors back to the simplex.
But note that the simplex is not a vector space: if you add two elements of the simplex, you don't get an element of the simplex.
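For what it's worth, the log-ratio transforms from compositional data analysis are the usual way to get such a mapping. Below is a minimal sketch using the centred log-ratio (CLR) transform with numpy and scikit-learn, assuming strictly positive proportions (zeros would need to be handled first); the toy data is made up for illustration.
import numpy as np
from sklearn.decomposition import PCA

# Toy compositional data: each row is a vector of proportions summing to 1.
X = np.array([
    [0.20, 0.30, 0.50],
    [0.10, 0.60, 0.30],
    [0.40, 0.40, 0.20],
    [0.25, 0.25, 0.50],
])

# Centred log-ratio (CLR): log of each part minus the row mean of the logs,
# i.e. log(p_i / geometric mean). Maps the simplex into a hyperplane of R^k.
log_X = np.log(X)
clr = log_X - log_X.mean(axis=1, keepdims=True)

# Ordinary PCA in the transformed space.
pca = PCA(n_components=2)
scores = pca.fit_transform(clr)
print(pca.explained_variance_ratio_)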
A better approach, I think, is clustering, e.g. with Gaussian mixtures or spectral clustering. This is related to PCA, but a nice property of clustering is that you can express any element of your data as a convex combination of the clusters. If you analyze your proportion data and find clusters, they (unlike PCA vectors) will lie within the simplex, and any mixture of them will too.
I also recommend looking into non-negative matrix factorization (NMF). This is like PCA but, as the name suggests, it avoids negative entries in both the components and the weights. It's very useful for inferring structure in non-negative data, like proportions. But NMF does not give you a basis for the simplex.
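A rough sketch of that idea with scikit-learn's NMF, on the same kind of toy proportion matrix as above (the component count and solver settings are just placeholders):
import numpy as np
from sklearn.decomposition import NMF

# Each row is a vector of non-negative proportions summing to 1.
X = np.array([
    [0.20, 0.30, 0.50],
    [0.10, 0.60, 0.30],
    [0.40, 0.40, 0.20],
    [0.25, 0.25, 0.50],
])

# Factor X ~ W @ H with W >= 0 and H >= 0: the rows of H are non-negative
# "parts", and W gives non-negative weights of those parts for each sample.
nmf = NMF(n_components=2, init="nndsvda", max_iter=1000, random_state=0)
W = nmf.fit_transform(X)
H = nmf.components_
print(W.shape, H.shape)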

Why in CNN for image recognition tasks, the filters are always chosen to be extremely localized?

In CNNs, the filters are usually set to 3x3 or 5x5 spatially. Can the sizes be comparable to the image size? One reason for small filters is to reduce the number of parameters to be learnt. Apart from this, are there any other key reasons? For example, that people want to detect edges first?
You have answered one point of the question yourself. Another reason is that most of these useful features may be found in more than one place in an image, so it makes sense to slide a single kernel over the whole image in the hope of extracting that feature in different parts of the image with the same kernel. If you use a big kernel, the features could be interleaved and not cleanly detected.
In addition to your own answer, the reduction in computational cost is a key point. Since we use the same kernel for different sets of pixels in an image, the same weights are shared across these pixel sets as we convolve over them. And since the number of weights is smaller than in a fully connected layer, we have fewer weights to back-propagate through.
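To make the parameter-count argument concrete, a back-of-the-envelope comparison in Python (the image and channel sizes are just assumptions for illustration):
# Assumed sizes for illustration only.
H, W, C_in, C_out = 224, 224, 3, 64

# 3x3 convolution: one small kernel is shared across all positions,
# so the number of weights does not depend on the image size.
conv3x3_params = 3 * 3 * C_in * C_out + C_out        # weights + biases

# A kernel as large as the image (equivalently, a fully connected layer
# from every input pixel to C_out units): weights grow with the image area.
image_sized_params = H * W * C_in * C_out + C_out

print(conv3x3_params)       # 1792
print(image_sized_params)   # 9633856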

dense sift and vlfeat

I want to ask two questions about dense SIFT (dsift) and vlfeat:
Is there any material that details dsift? I have seen many sources say that "dense SIFT is SIFT applied to a dense grid", but what does this mean? Can it be described in more detail? I read the source code dsift.c and dsift.h in vlfeat and the technical details about dsift, but there are many things I cannot understand. Existing papers usually focus on applications of dsift.
I use vlfeat in my C program and it works fine, but when I customize the parameters with vl_dsift_set_geometry, it goes wrong. Because I do not know how dsift works, I do not know how to set binSizeX/Y and numBinX/Y properly. I read "patch size 76" in a paper. Does a patch refer to a 4*4 grid? I am somewhat confused by the terms bin, patch and grid. So my question is: with a patch size of 76, how should I set binSizeX/Y and numBinX/Y (image size 256*256)?
In SIFT, the first step is to detect key points. Key point detection is performed at multiple scales.
The next step is to describe each key point to generate its descriptor.
The distribution of the key points over the image is not uniform; it depends on where key points happen to be detected.
With dense SIFT, there is no key point detection: descriptors are computed on a regular grid of points at a single, pre-specified scale. This is not useful if you are matching objects that may appear at different scales.
There is the PHOW variant, which combines dense SIFT and SIFT. Instead of computing SIFT at pre-specified locations and a single pre-specified scale, SIFT features are computed at pre-specified locations but at several scales. In PHOW, all SIFT features computed at the same point (at different scales) are combined to construct a single feature at that location.
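Setting vlfeat's internals aside, the "SIFT on a dense grid" idea itself can be shown in a few lines. The sketch below uses OpenCV in Python rather than vlfeat, with an assumed grid step, patch size and input file, just to illustrate that the key point detector is replaced by a fixed grid:
import cv2

img = cv2.imread("image.png", cv2.IMREAD_GRAYSCALE)  # hypothetical input image

step = 8    # assumed grid spacing in pixels
size = 16   # assumed patch (key point) size in pixels

# Dense SIFT: instead of running a detector, place key points on a regular
# grid at a fixed scale and only run the SIFT descriptor on them.
keypoints = [
    cv2.KeyPoint(float(x), float(y), size)
    for y in range(step, img.shape[0] - step, step)
    for x in range(step, img.shape[1] - step, step)
]

sift = cv2.SIFT_create()
keypoints, descriptors = sift.compute(img, keypoints)
print(descriptors.shape)  # (number of grid points, 128)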

Given that the data features are all nominal, does it make any sense to apply PCA to the data?

If PCA also helps to normalize the data, how is data that is already normalized going to be improved by PCA? Thanks.
PCA does more than normalize the data (in fact, it only normalizes if whiten=True). It also projects into a different space using the n_components eigenvectors with maximal variation (largest eigenvalues); this can give better performance for your classifier/clustering if your data is "stretched" along a particular dimension. See this example for more information.
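For reference, a minimal scikit-learn sketch of that projection, on made-up data that is stretched along one direction (nominal features would first have to be numerically encoded, e.g. one-hot, before any of this applies):
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)

# Toy data stretched much more along one axis than the other.
X = rng.normal(size=(200, 2)) @ np.array([[5.0, 0.0], [0.0, 0.5]])

# Project onto the directions of maximal variance; whiten=True additionally
# rescales each kept component to unit variance.
pca = PCA(n_components=2, whiten=True)
X_proj = pca.fit_transform(X)

print(pca.explained_variance_ratio_)  # most of the variance is in component 1
print(X_proj.std(axis=0))             # roughly 1 for each component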

cvSVM training produces poor results for HOGDescriptor

My objective is to train an SVM and get support vectors which I can plug into OpenCV's HOGDescriptor for object detection.
I have gathered ~4000 positives and ~15000 negatives, and I train using the SVM provided by OpenCV. The results give me too many false positives (up to 20 per image). I would clip out the false positives and add them to the pool of negatives to retrain, and at times I would end up with even more false positives! I have tried adjusting L2HysThreshold of my HOGDescriptor upwards to 300 without significant improvement. Is my pool of positives and negatives large enough?
The SVM training is also much faster than expected. I have tried with feature vector sizes of 2916 and 12996, using grayscale images and color images in separate tries. SVM training has never taken longer than 20 minutes. I use auto_train. I am new to machine learning, but from what I hear, shouldn't training with a dataset as large as mine take at least a day?
I believe cvSVM is not doing much learning, and according to http://opencv-users.1802565.n2.nabble.com/training-a-HOG-descriptor-td6363437.html, it is not suited for this purpose. Does anyone with experience with cvSVM have more input on this?
I am considering using SVMLight (http://svmlight.joachims.org/), but it looks like there isn't a way to visualize the SVM hyperplane. What are my options?
I use OpenCV 2.4.3 and have tried the following setups for the HOGDescriptor:
hog.winSize = cv::Size(100,100);
hog.cellSize = cv::Size(5,5);
hog.blockSize = cv::Size(10,10);
hog.blockStride = cv::Size(5,5); //12996 feature vector
hog.winSize = cv::Size(100,100);
hog.cellSize = cv::Size(10,10);
hog.blockSize = cv::Size(20,20);
hog.blockStride = cv::Size(10,10); //2916 feature vector
Your first descriptor dimension is way too large to be useful. To form a reliable SVM hyperplane, you need at least as many positive and negative samples as your descriptor has dimensions, because ideally you need separating information in every dimension of the hyperplane.
The number of positive and negative samples should be more or less the same unless you provide your SVM trainer with a bias parameter (may not be available in cvSVM).
There is no guarantee that HOG is a good descriptor for the type of problem you are trying to solve. Can you visually confirm that the object you are trying to detect has a distinct shape with a similar orientation in all samples? A single type of flower, for example, may have a unique shape, but many types of flowers together don't share one unique shape. Bamboo has a unique shape but may not be easily distinguishable from other objects, or may not have the same orientation in all sample images.
cvSVM is normally not the tool used to train SVMs for OpenCV's HOG. Use the binary form of SVMLight (not free for commercial purposes) or libSVM (OK for commercial purposes). Calculate HOGs for all samples using your C++/OpenCV code and write them to a text file in the correct input format for SVMLight/libSVM. Use either program to train a model with a linear kernel and the optimal C; find the optimal C by searching for the best accuracy while changing C in a loop. Calculate the detector vector (an N+1 dimensional vector, where N is the dimension of your descriptor) by taking all the support vectors, multiplying each support vector by its alpha value, and summing the results dimension by dimension to obtain an N-dimensional vector. As the last element append -b, where b is the hyperplane bias (you can find it in the model file produced by SVMLight/libSVM training). Feed this N+1 dimensional detector to HOGDescriptor::setSVMDetector() and use HOGDescriptor::detect() or HOGDescriptor::detectMultiScale() for detection.
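A rough numpy/OpenCV sketch (in Python rather than the asker's C++) of that detector-vector step, assuming the support vectors, their alpha values and the bias b have already been parsed out of the SVMLight/libSVM model file; the file names and bias value are hypothetical, and sign conventions differ between tools, so verify against your own model:
import numpy as np
import cv2

# Hypothetical arrays parsed from the SVMLight/libSVM model file beforehand:
# support_vectors has shape (num_sv, N), alphas has shape (num_sv,).
support_vectors = np.load("support_vectors.npy")
alphas = np.load("alphas.npy")
b = 0.42  # hypothetical hyperplane bias from the model file

# Weighted sum of the support vectors gives the N-dimensional hyperplane
# normal; append -b as the (N+1)-th element, as described above.
w = (alphas[:, None] * support_vectors).sum(axis=0)
detector = np.append(w, -b).astype(np.float32)

# Same geometry as the second setup above: 100x100 window, 20x20 blocks,
# 10x10 block stride, 10x10 cells, 9 bins -> 2916-dimensional descriptor.
hog = cv2.HOGDescriptor((100, 100), (20, 20), (10, 10), (10, 10), 9)
hog.setSVMDetector(detector)

rects, weights = hog.detectMultiScale(cv2.imread("test.png"))  # hypothetical test image
print(rects)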
I have had successful results using SVMLight to learn SVM models with features computed in OpenCV, but I haven't used cvSVM, so I can't compare.
The hogDraw function from http://vision.ucsd.edu/~pdollar/toolbox/doc/index.html will visualise your descriptor.