Which is the better choice, LDA or PCA, for feature reduction in a supervised learning model? - data-mining

PCA -> unsupervised method, but it can also be used as a preprocessing step for supervised learning
LDA -> supervised method
Both are used for feature reduction.
Which is better for feature reduction in supervised learning, LDA or PCA, and why?
Dataset: the well-known wine dataset, used to predict the customer category.

If you have labels, a supervised approach will usually be much better than an unsupervised approach.
At least if the labels suit your problem.
If you do not have labels, then you can't use LDA.
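A rough way to check this on your own data is to cross-validate the same classifier behind each reducer. The sketch below is only an illustration: it assumes scikit-learn's bundled load_wine data as a stand-in for your wine dataset, and the logistic regression classifier and 5-fold split are arbitrary choices of mine, not something from the question. Note that LDA can produce at most (number of classes - 1) components, so 2 is the maximum with 3 classes.

```python
from sklearn.datasets import load_wine
from sklearn.decomposition import PCA
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Assumption: scikit-learn's wine data (3 classes, 13 features) stands in
# for the dataset mentioned in the question.
X, y = load_wine(return_X_y=True)


def mean_cv_accuracy(reducer):
    """Cross-validated accuracy of scale -> reduce -> classify."""
    pipeline = make_pipeline(
        StandardScaler(),
        reducer,
        LogisticRegression(max_iter=1000),
    )
    return cross_val_score(pipeline, X, y, cv=5).mean()


# PCA ignores the labels; LDA uses them to find discriminative directions.
pca_score = mean_cv_accuracy(PCA(n_components=2))
lda_score = mean_cv_accuracy(LinearDiscriminantAnalysis(n_components=2))

print(f"PCA (2 components): {pca_score:.3f}")
print(f"LDA (2 components): {lda_score:.3f}")
```

Because the reducer sits inside the pipeline, it is refit on each training fold, so the LDA step never sees the labels of the held-out fold.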


Dealing with an imbalanced dataset for multi-label classification

In my case, I have 33 labels per sample. The label tensor for each image looks like [0,0,1,0,1,1,1,0,0,0,0,0…...33]. Some labels have very few samples and others have many. I want to predict regression values. What would be the best approach to improve the predictions? I would like to apply a data balancing technique, but so far I have only found balancing techniques for multi-class problems. I would be grateful for any advice on this problem, or any other idea for improving performance. Thanks in advance.
When using a single model to regress multiple values, it is usually beneficial to rescale the predicted values (the targets) so that they are all in roughly the same range.
Look, for example, at the way detection models predict (regress) bounding box coordinates: the values are scaled and the net predicts only corrections.
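As a minimal sketch of that idea, the targets can be standardized per output before training and the predictions mapped back afterwards. The data and the helper names (scale_targets, unscale_predictions) below are hypothetical, and plain NumPy is used so the snippet does not depend on any particular framework.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical targets: three regression outputs on very different scales.
y_train = rng.normal(
    loc=[0.5, 200.0, -3000.0],
    scale=[0.1, 50.0, 800.0],
    size=(500, 3),
)

# Per-output statistics, computed on the training targets only.
target_mean = y_train.mean(axis=0)
target_std = y_train.std(axis=0)


def scale_targets(y):
    """Map raw targets to zero mean and unit variance per output."""
    return (y - target_mean) / target_std


def unscale_predictions(y_scaled):
    """Map model outputs back to the original units."""
    return y_scaled * target_std + target_mean


# Train the model against y_scaled instead of y_train ...
y_scaled = scale_targets(y_train)
# ... and convert its predictions back before reporting them.
y_restored = unscale_predictions(y_scaled)
```

With all outputs on a comparable scale, no single output dominates a summed loss such as mean squared error.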

Anomaly detection using Bayesian networks

I am working on the problem of anomaly detection in multivariate time series data using Bayesian networks.
I am not sure whether a dynamic Bayesian network model is a good approach for anomaly detection, since with a Bayesian approach one can only calculate the probability that new data is similar to the training data.
Is there any other approach to solving this problem, for example with another outlier score method?
If 'similar to the training data' is considered 'normal', then
1 - (probability that the data is similar to the training data)
would be your anomaly score/probability. Many different methods can be used for anomaly detection with this simple anomaly score design.
Examples of univariate time series anomaly detection
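To make the "1 - probability" score concrete, here is a small sketch. It does not use a dynamic Bayesian network: as a stand-in I assume a plain multivariate Gaussian fitted to the training data and ignore the time ordering, so treat the model as a placeholder. The probability used is the chance of seeing a point at least as far from the training mean (in Mahalanobis distance) as the observed one.

```python
import numpy as np
from scipy.stats import chi2

rng = np.random.default_rng(0)

# Hypothetical "normal" training data: 1000 observations of 3 variables.
train = rng.normal(size=(1000, 3))

# Stand-in model (assumption): one multivariate Gaussian, no temporal structure.
mean = train.mean(axis=0)
cov_inv = np.linalg.inv(np.cov(train, rowvar=False))


def anomaly_score(x):
    """Return 1 - P(a normal point is at least this far from the mean)."""
    diff = np.atleast_2d(x) - mean
    # Squared Mahalanobis distance of each row.
    d2 = np.einsum("ij,jk,ik->i", diff, cov_inv, diff)
    # Under the Gaussian assumption d2 follows a chi-square distribution.
    p_normal = chi2.sf(d2, df=train.shape[1])
    return 1.0 - p_normal


new_points = np.array([
    [0.1, -0.2, 0.0],  # close to the training data
    [6.0, 6.0, 6.0],   # far from the training data
])
scores = anomaly_score(new_points)
print(scores)
```

Any model that yields a probability of "normal" can be swapped in for the Gaussian; the scoring step stays the same.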

Similarity measure in scikit-learn document classification

I am doing some work in document classification with scikit-learn. For this purpose, I represent my documents as a tf-idf matrix and feed it to a Random Forest classifier, which works perfectly well. I was just wondering which similarity measure the classifier uses (cosine, Euclidean, etc.) and how I can change it. I haven't found any parameters or information in the documentation.
Thanks in advance!
As with most supervised learning algorithms, Random Forest classifiers do not use a similarity measure; they work directly on the features supplied to them. So the decision trees are built from the terms in your tf-idf vectors.
If you want to use similarity, you will have to compute a similarity matrix for your documents and use it as your features.
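A minimal sketch of that second option, assuming cosine similarity and a tiny made-up corpus (the documents and labels are hypothetical). Each document is described by its similarity to every training document, so test documents must be compared against the training set as well.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

# Hypothetical toy corpus.
train_docs = [
    "the cat sat on the mat",
    "dogs and cats are pets",
    "stock markets fell sharply today",
    "investors sold shares as markets dropped",
]
train_labels = ["animals", "animals", "finance", "finance"]
test_docs = ["my cat chased the dogs", "shares and markets rose"]

vectorizer = TfidfVectorizer()
X_train = vectorizer.fit_transform(train_docs)
X_test = vectorizer.transform(test_docs)

# Similarity features: one column per training document.
S_train = cosine_similarity(X_train, X_train)
S_test = cosine_similarity(X_test, X_train)

clf = RandomForestClassifier(n_estimators=100, random_state=0)
clf.fit(S_train, train_labels)
predictions = clf.predict(S_test)
print(predictions)
```

The feature count then grows with the number of training documents, which is worth keeping in mind for larger corpora.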

What does the class_weight parameter do in scikit-learn SGD

I am a frequent user of scikit-learn, and I would like some insight into the "class_weight" parameter with SGD.
I was able to trace it as far as this function call:
plain_sgd(coef, intercept, est.loss_function,
          penalty_type, alpha, C, est.l1_ratio,
          dataset, n_iter, int(est.fit_intercept),
          int(est.verbose), int(est.shuffle), est.random_state,
          pos_weight, neg_weight,
          learning_rate_type, est.eta0,
          est.power_t, est.t_, intercept_decay)
After this it goes into sgd_fast, and I am not very good with Cython. Can you give some clarity on these questions?
My dev set is class-imbalanced: the positive class has around 15k samples and the negative class around 36k. Will class_weight resolve this problem, or would undersampling be a better idea? I am getting better numbers with it, but it's hard to explain why.
If yes, how does it actually do it? I mean, is it applied to the feature penalization, or is it a weight in the optimization function? How can I explain this to a layman?
class_weight can indeed help increase the ROC AUC or F1 score of a classification model trained on imbalanced data.
You can try class_weight="auto" (called "balanced" in scikit-learn 0.17 and later) to select weights that are inversely proportional to the class frequencies. You can also pass your own weights as a Python dictionary with class labels as keys and weights as values.
Tuning the weights can be achieved via grid search with cross-validation.
Internally this is done by deriving sample_weight from the class_weight (depending on the class label of each sample). The sample weights are then used to scale the contribution of individual samples to the loss function used to train the linear classification model with Stochastic Gradient Descent.
The feature penalization is controlled independently via the penalty and alpha hyperparameters. sample_weight / class_weight have no impact on it.
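To see the numbers involved, here is a small sketch on made-up data with the same 15:36 class ratio as in the question, scaled down to 150 positives and 360 negatives. It assumes a recent scikit-learn where the option is spelled "balanced"; the synthetic features are purely illustrative.

```python
import numpy as np
from sklearn.linear_model import SGDClassifier
from sklearn.utils.class_weight import compute_class_weight, compute_sample_weight

rng = np.random.default_rng(0)

# Hypothetical imbalanced data: 150 positives, 360 negatives, 5 features.
n_pos, n_neg = 150, 360
X = np.vstack([
    rng.normal(1.0, 1.0, size=(n_pos, 5)),
    rng.normal(-1.0, 1.0, size=(n_neg, 5)),
])
y = np.array([1] * n_pos + [0] * n_neg)

# "balanced" weights: n_samples / (n_classes * count_of_that_class).
classes = np.unique(y)
weights = compute_class_weight(class_weight="balanced", classes=classes, y=y)
class_weight = dict(zip(classes, weights))
print(class_weight)

# The per-sample view: each sample carries the weight of its class,
# which scales that sample's contribution to the loss.
sample_weight = compute_sample_weight(class_weight="balanced", y=y)

# The estimator derives the same weights itself when asked to.
clf = SGDClassifier(class_weight="balanced", random_state=0)
clf.fit(X, y)
```

For a layman: every mistake on a rare-class example counts about 1.7 times, and every mistake on a common-class example about 0.7 times, so both classes end up with the same total say in the training objective.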

Conditional Random Fields

Is there a training and optimization algorithm for 2-D (two dimensional) conditional random fields (CRF) suited for classification of imagery?
Has anyone used the CRF package in R (http://crf.r-forge.r-project.org/html/CRF-package.html) for image classification? I would like to see a working code example.
Look up Markov Random Fields. Here's a paper you might be interested in: Patrick Pérez: Markov Random Fields and Images (1998).
I do not think it will work alone. Image classification has to cope with scaling and affine transformations, so the key to accurate image classification is the preprocessing, not the classification algorithm.
Classification of imagery usually involves bag-of-words features, feature pooling and the like, whereas conditional random fields are meant for labeling sequential data. So it might not be appropriate to use a CRF in this scenario.