Anomaly detection using Bayesian networks - python-2.7

I am working on the problem of anomaly detection in multivariate time series data using Bayesian networks.
I am confused: is a dynamic Bayesian network model a good approach for anomaly detection? As I understand it, a Bayesian approach can only calculate the probability that new data is similar to the training data.
Is there another way to solve this using some other outlier score method?

Data 'similar to the trained data' would be considered 'normal'; then
1 - (probability of the data under the trained model)
would be your anomaly score/probability. Many different anomaly detection methods are built on exactly this kind of simple anomaly-score design.
See, for examples, univariate time series anomaly detection.
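To make the score concrete, here is a minimal sketch under a strong simplifying assumption: the "trained model" is a single Gaussian fitted to univariate training data, and the score is one minus the likelihood of a point relative to the most likely point (the mean), so it falls in [0, 1]. A real dynamic Bayesian network would supply the likelihood instead, but the `1 - p` construction is the same.

```python
import math

def fit_gaussian(train):
    """Fit mean and variance to the training data (the 'normal' model)."""
    n = len(train)
    mu = sum(train) / n
    var = sum((x - mu) ** 2 for x in train) / n
    return mu, var

def anomaly_score(x, mu, var):
    # Likelihood of x relative to the most likely point (the mean),
    # so the score lies in [0, 1]: near 0 = normal, near 1 = anomalous.
    rel_likelihood = math.exp(-((x - mu) ** 2) / (2 * var))
    return 1.0 - rel_likelihood

train = [10.1, 9.8, 10.3, 10.0, 9.9, 10.2, 10.1, 9.7]
mu, var = fit_gaussian(train)
print(anomaly_score(10.0, mu, var))  # near 0: similar to training data
print(anomaly_score(14.0, mu, var))  # near 1: anomalous
```

With any probabilistic model (including a DBN) the design is identical: replace the Gaussian likelihood with the model's likelihood of the observation and threshold the resulting score.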

Related

Dealing with imbalance dataset for multi-label classification

In my case, I have 33 labels per sample. The input label tensor for a corresponding image looks like [0,0,1,0,1,1,1,0,0,0,0,0…], with 33 entries in total. The number of samples is quite low for some labels and high for others. I'm trying to predict regression values, so what would be the best approach to improve the predictions? I would like to apply a data-balancing technique, but so far I have only found balancing techniques for multi-class problems. I would be grateful if you could share your knowledge about my problem or any other idea to improve the performance. Thanks in advance.
When using a single model to regress multiple values, it is usually beneficial to preprocess the targets so that they are in roughly the same range.
Look, for example, at the way detection models predict (regress) bounding-box coordinates: the values are scaled, and the net predicts only corrections.
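As an illustration of bringing the 33 regression targets into a common range, here is a sketch using per-label min-max scaling on a hypothetical target matrix (the random data and shapes are placeholders; standardization per label would work similarly):

```python
import numpy as np

# Hypothetical targets: 100 samples x 33 labels, each label on a different scale.
rng = np.random.default_rng(0)
y = rng.uniform(0, 1, size=(100, 33)) * rng.uniform(1, 1000, size=33)

# Per-label min-max scaling so every regression target lies in [0, 1].
y_min, y_max = y.min(axis=0), y.max(axis=0)
y_scaled = (y - y_min) / (y_max - y_min)

# Train on y_scaled; invert the transform when reporting predictions:
# y_pred = y_pred_scaled * (y_max - y_min) + y_min
print(y_scaled.min(), y_scaled.max())  # 0.0 1.0
```

Keep `y_min` and `y_max` from the training set only, and reuse them to scale validation/test targets and to invert the model's predictions.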

Using AutoML to evaluate the hyperparameters of the Word2Vec algorithm

Is it possible with AutoML (from H2O) to use only the Word2Vec algorithm and try out different values for its parameters, to find out which parameter settings give me the most accurate vectors for my dataset? I don't want AutoML to apply the algorithms DeepLearning, GBM, etc. to my dataset; only the Word2Vec algorithm. How do I do that?
So far I have only managed to build a word2vec model with H2O.
I would like to test different settings of the Word2Vec hyperparameters with AutoML to evaluate which settings are optimal...
The Word2Vec algorithm is a data transformation algorithm (it converts rows of text into a matrix), not a supervised machine learning algorithm (which is what AutoML and all the algorithms inside it are).
The typical workflow is to apply Word2Vec to your text data so that the transformed data can be used to train a supervised ML algorithm. From there you can run any supervised algorithm (GLM, Random Forest, GBM, etc.) on the transformed dataset, or, my recommendation, just pass the transformed data to AutoML so it can find the best algorithm for you.
You will have to try out different Word2Vec settings manually and see how well each performs, given some particular supervised learning algorithm that you want to apply to your problem. Hopefully that clears up the confusion.
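Since AutoML will not tune Word2Vec for you, a manual grid over its settings is the practical route. A minimal sketch of that loop follows; `train_and_score` is a hypothetical placeholder for "build the Word2Vec model with these settings, transform the text, train a downstream supervised model, and return its validation metric", and the parameter names in the grid are illustrative, not exact H2O arguments.

```python
from itertools import product

# Hypothetical search space mirroring common Word2Vec knobs.
grid = {
    "vec_size": [50, 100, 200],
    "epochs": [5, 10],
    "window_size": [3, 5],
}

def train_and_score(settings):
    # Placeholder: build Word2Vec with `settings`, transform the text,
    # train a downstream supervised model, return its validation score.
    return -abs(settings["vec_size"] - 100)  # dummy score for illustration

best_settings, best_score = None, float("-inf")
for values in product(*grid.values()):
    settings = dict(zip(grid.keys(), values))
    score = train_and_score(settings)
    if score > best_score:
        best_settings, best_score = settings, score

print(best_settings)
```

The key design point is that Word2Vec settings can only be evaluated indirectly, through the performance of whatever supervised model consumes the vectors.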

How can I send custom input to the meta classifier of Stacking using the Weka API

From a research paper "In addition to stacking across all classifier outputs, we also evaluate stacking using only the aggregate output of each resampled (bagged) base classifier. For example, the outputs of all 10 SVM classifiers are averaged and used as a single level 0 input to the meta learner."
I am wondering how can I implement this. Actually I need to implement this for my thesis.
If you need only the average of the 10 classifiers, you can add a Vote classifier as one of the base classifiers for Stacking. The Vote classifier can use as many SVM classifiers as you need.
If you also want to use the predictions of the individual SVM classifiers as inputs for the stacking classifier, you can add SVM classifiers alongside the Vote classifier (as base classifiers for Stacking). However, this would not be very effective.
Otherwise you can modify the code yourself, as Weka is open source.

Comparison of HoG with CNN

I am working on a comparison of Histogram of Oriented Gradients (HoG) and a Convolutional Neural Network (CNN) for weed detection. I have two datasets of two different weeds.
The CNN architecture is a 3-layer network.
1) The first dataset contains two classes and has 18 images.
The dataset is enlarged using data augmentation (rotation, added noise, illumination changes).
Using the CNN I get a test accuracy of 77%, and for HoG with an SVM, 78%.
2) The second dataset contains leaves of two different plants.
Each class contains 2500 images, without data augmentation.
For this dataset, using the CNN I get a test accuracy of 94%, and for HoG with an SVM, 80%.
My question is: why am I getting higher accuracy for HoG on the first dataset? The CNN should be much better than HoG.
The only reason that comes to my mind is that the first dataset has only 18 images and is less diverse compared to the second dataset. Is that correct?
Yes, your intuition is right: such a small dataset (just 18 images before data augmentation) can cause the worse performance. In general, CNNs usually need at least thousands of images. The SVM does not perform as badly because of the regularization (which you most probably use) and because the model probably has far fewer parameters. There are ways to regularize deep nets; for example, with your first dataset you might want to give dropout a try, but I would rather try to acquire a substantially larger dataset.
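To make the dropout suggestion concrete, here is a framework-agnostic sketch of inverted dropout (the mechanism most deep learning libraries implement); the layer shape and drop probability are arbitrary illustration values:

```python
import numpy as np

def dropout(activations, drop_prob, rng):
    """Inverted dropout: zero out units with probability drop_prob and
    rescale the survivors so the expected activation is unchanged."""
    keep_prob = 1.0 - drop_prob
    mask = rng.random(activations.shape) < keep_prob
    return activations * mask / keep_prob

rng = np.random.default_rng(42)
a = np.ones((4, 8))          # toy layer activations
out = dropout(a, 0.5, rng)   # roughly half the units zeroed, rest scaled to 2.0
print(out)
```

Dropout is applied only during training; at test time the layer is used as-is, which is exactly why the survivors are rescaled by `1 / keep_prob` here.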

similarity measure scikit-learn document classification

I am doing some work in document classification with scikit-learn. For this purpose, I represent my documents in a tf-idf matrix and feed this to a Random Forest classifier, which works perfectly well. I was just wondering which similarity measure the classifier uses (cosine, Euclidean, etc.) and how I can change it. I haven't found any parameters or information in the documentation.
Thanks in advance!
As with most supervised learning algorithms, Random Forest classifiers do not use a similarity measure; they work directly on the features supplied to them. So the decision trees are built based on the terms in your tf-idf vectors.
If you want to use similarity, you will have to compute a similarity matrix for your documents and use that as your features.
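If you do want to go that route, here is a minimal sketch of turning tf-idf rows into a cosine-similarity matrix; the toy 3x4 matrix stands in for your real vectorizer output:

```python
import numpy as np

# Toy tf-idf matrix: 3 documents x 4 terms (rows would come from your vectorizer).
X = np.array([
    [0.5, 0.0, 0.8, 0.0],
    [0.4, 0.1, 0.7, 0.0],
    [0.0, 0.9, 0.0, 0.6],
])

# Cosine similarity: normalize rows to unit length, then take pairwise dot products.
X_unit = X / np.linalg.norm(X, axis=1, keepdims=True)
sim = X_unit @ X_unit.T

# Each row of `sim` can now serve as a feature vector for the classifier:
# document i is described by its similarity to every other document.
print(np.round(sim, 2))
```

Note the trade-off: the feature dimension becomes the number of documents rather than the number of terms, so this representation grows with the training set.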