WEKA: Can I combine two trained machine learning models into one model?

I want to merge two machine learning models that were trained on two different data sets. Using the Weka Java library, how can I merge those two models into one, instead of building a single model on the combined data sets?
Use case: I'm splitting my whole data set (12 million) across a cluster and building individual models to reduce training time. In the end I want a single model that combines all of those models. Is that possible?

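You can combine multiple classifiers with Weka's Vote meta-classifier (weka.classifiers.meta.Vote). Note that the result is an ensemble that keeps every individual model and combines their outputs at prediction time; it is not a single merged model.
If you want to code it yourself, do something like this:
// classifier1, classifier2: already-built weka.classifiers.Classifier objects; ins: a weka.core.Instance
double prediction1 = classifier1.classifyInstance(ins);
double prediction2 = classifier2.classifyInstance(ins);
// combinePredictions is your own (hypothetical) method, e.g. a majority vote over the predicted class indices
double combinedPrediction = combinePredictions(prediction1, prediction2);
Also see https://machinelearningmastery.com/use-ensemble-machine-learning-algorithms-weka/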

Related

Python TensorFlow: Summarize Two Histograms Together

I was wondering if there is a way to summarize two histograms together in TensorFlow and get something resembling the behavior of tf.summary.histogram. I need to summarize the batch logits of two different operations. I also need the histograms to be "superimposed" so that I can compare their dynamics during training just by looking at the log file in TensorBoard.

Training and Test Sets Incompatible in Weka Text Classification

I have two data sets that record whether a sentence mentions an adverse drug event. Both the training and test sets have only two fields: the text and the label {Adverse Event, No Adverse Event}. I used Weka's StringToWordVector filter and built a Random Forest model on the training set.
I want to test the model by removing the class labels from the test set, applying the StringToWordVector filter to it, and evaluating the model on it. When I try this, I get an error saying the training and test sets are not compatible. This is probably because the filter produces a different set of attributes for the test set. How do I fix this and output the predictions for the test set?
For a one-off test, the easiest approach is not to pre-filter the training set. Instead, use Weka's FilteredClassifier, configured with the StringToWordVector filter and your chosen classifier. The filter's structure is then determined from the training data only, and test instances pass through the same filter without changing that structure. This is explained well in this video from the More Data Mining with Weka online course.
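A toy version of the word-vector step shows why this removes the incompatibility. The sketch below is Python. build_vocabulary and to_word_vector are hypothetical helpers, a much simplified stand-in for StringToWordVector rather than Weka's implementation.

The attribute list is built from the training sentences only. Any test sentence is then mapped onto that same list, so training and test attributes always line up. Filtering the test set on its own, as in the question, would produce a different attribute list instead.

```python
from typing import Dict, List, Sequence


def build_vocabulary(train_texts: Sequence[str]) -> Dict[str, int]:
    """Give every word seen in the *training* texts a fixed attribute index."""
    vocab: Dict[str, int] = {}
    for text in train_texts:
        for word in text.lower().split():
            if word not in vocab:
                vocab[word] = len(vocab)
    return vocab


def to_word_vector(text: str, vocab: Dict[str, int]) -> List[int]:
    """0/1 word-presence vector laid out by the training vocabulary.

    Words never seen in training are ignored, so test vectors always have
    exactly the same attributes as the training vectors.
    """
    vector = [0] * len(vocab)
    for word in text.lower().split():
        index = vocab.get(word)
        if index is not None:
            vector[index] = 1
    return vector


# Hypothetical sentences; the vocabulary is fitted on the training text only.
vocab = build_vocabulary(["patient reported severe nausea", "no side effects reported"])
print(to_word_vector("severe headache reported", vocab))  # [0, 1, 1, 0, 0, 0, 0]
```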
For a more general solution, where you build the model once and then evaluate it on different test sets in the future, you need to use InputMappedClassifier:
Wrapper classifier that addresses incompatible training and test data by building a mapping between the training data that a classifier has been built with and the incoming test instances' structure. Model attributes that are not found in the incoming instances receive missing values, so do incoming nominal attribute values that the classifier has not seen before. A new classifier can be trained or an existing one loaded from a file.
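The mapping described in that quote can be sketched in a few lines. map_instance is a hypothetical Python helper that only mirrors the quoted behaviour; it is not Weka's code:

- attributes are matched by name;
- model attributes missing from the incoming instance become missing values (None here);
- nominal values the model has never seen also become missing values.

```python
from typing import Dict, List, Optional, Sequence


def map_instance(
    model_attributes: Sequence[str],
    nominal_values: Dict[str, Sequence[str]],
    incoming: Dict[str, object],
) -> List[Optional[object]]:
    """Reorder an incoming instance into the model's attribute layout.

    - attributes the model expects but the instance lacks -> None (missing)
    - nominal values the model has never seen -> None (missing)
    - extra attributes in the incoming instance are ignored
    """
    mapped: List[Optional[object]] = []
    for name in model_attributes:
        value = incoming.get(name)
        allowed = nominal_values.get(name)  # None for numeric attributes
        if value is not None and allowed is not None and value not in allowed:
            value = None
        mapped.append(value)
    return mapped


# Hypothetical model layout and incoming instance.
print(map_instance(["age", "color", "size"],
                   {"color": ["red", "blue"]},
                   {"size": 3.0, "color": "green", "weight": 70}))  # [None, None, 3.0]
```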
Weka requires a label even for the test data. It compares the model's predictions against the test labels (the "ground truth") to measure performance. After all, how would you tell whether a model is performing well if you don't know whether its predictions are right or wrong? Thus, in Weka the test data needs the very same structure as the training data, including the class attribute. Don't worry: the labels are not used to help the model make its predictions. If you only need predictions for unlabelled instances, keep the class attribute and mark its values as missing ("?"). Weka then outputs predictions, but no accuracy can be computed for those instances.
The best way to go is to select cross-validation (e.g. 10-fold cross-validation). It automatically splits your data into 10 parts, using 9 for training and the remaining 1 for testing. This procedure is repeated 10 times so that each of the 10 parts is used once as test data. The final performance verdict is the average over all 10 rounds. Cross-validation gives you a fairly realistic estimate of the model's performance on new, unseen data.
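The fold bookkeeping behind k-fold cross-validation is short enough to sketch. This Python version uses hypothetical helper names, and a toy majority-class learner stands in for a real classifier. It assigns indices to folds round-robin, trains on k-1 folds, tests on the remaining fold and averages the fold accuracies. Weka's own cross-validation additionally shuffles the data and, for nominal classes, stratifies the folds; this sketch does neither.

```python
from collections import Counter
from typing import Callable, List, Sequence, Tuple


def k_fold_indices(n: int, k: int) -> List[Tuple[List[int], List[int]]]:
    """Split indices 0..n-1 into k folds; each fold is the test set exactly once."""
    if not 2 <= k <= n:
        raise ValueError("need 2 <= k <= n")
    splits = []
    for fold in range(k):
        test = list(range(fold, n, k))  # round-robin: every k-th index
        train = [i for i in range(n) if i % k != fold]
        splits.append((train, test))
    return splits


def cross_validate(xs: Sequence, ys: Sequence, k: int,
                   fit: Callable, predict: Callable) -> float:
    """Train on k-1 folds, test on the held-out fold, and average the k accuracies."""
    accuracies = []
    for train, test in k_fold_indices(len(xs), k):
        model = fit([xs[i] for i in train], [ys[i] for i in train])
        correct = sum(predict(model, xs[i]) == ys[i] for i in test)
        accuracies.append(correct / len(test))
    return sum(accuracies) / len(accuracies)


def fit_majority(train_xs: Sequence, train_ys: Sequence):
    """Toy learner: just remembers the most frequent training label."""
    return Counter(train_ys).most_common(1)[0][0]


def predict_majority(model, x):
    return model  # always predicts the remembered label


xs = list(range(20))
ys = [0] * 15 + [1] * 5  # made-up labels
print(cross_validate(xs, ys, 10, fit_majority, predict_majority))  # 0.75
```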
What you were trying to do, namely using the exact same data for training and testing, is a bad idea. The measured performance you end up with is far too optimistic. You'll get very impressive figures, such as 98% accuracy during testing. As soon as you apply the model to new, unseen data, however, your accuracy might drop to a much worse level.
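A small experiment shows how optimistic training-set evaluation can be. This Python sketch uses made-up data and a hypothetical 1-nearest-neighbour classifier. The model simply memorises the training points, including two deliberately noisy labels. It therefore scores 100% on the data it was trained on, but only 80% on nearby unseen points.

```python
from typing import List, Sequence


def predict_1nn(train_x: Sequence[float], train_y: Sequence[int], x: float) -> int:
    """Label of the closest training point (first one on ties)."""
    nearest = min(range(len(train_x)), key=lambda i: abs(train_x[i] - x))
    return train_y[nearest]


def accuracy(train_x: Sequence[float], train_y: Sequence[int],
             xs: Sequence[float], ys: Sequence[int]) -> float:
    hits = sum(predict_1nn(train_x, train_y, x) == y for x, y in zip(xs, ys))
    return hits / len(xs)


train_x: List[float] = [float(i) for i in range(10)]
# Assumed true rule: label 1 if x >= 5. The labels at x=1 and x=6 are flipped (noise).
train_y = [0, 1, 0, 0, 0, 1, 0, 1, 1, 1]
test_x = [x + 0.4 for x in train_x]  # unseen points near the training points
test_y = [1 if x >= 5 else 0 for x in test_x]

print(accuracy(train_x, train_y, train_x, train_y))  # 1.0 (the model memorised the noise)
print(accuracy(train_x, train_y, test_x, test_y))    # 0.8
```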

Can we give the test data without labelling it?

I came across this snippet in the TensorFlow documentation, MNIST For ML Beginners.
import numpy as np
# mnist is the MNIST data set object loaded earlier in the tutorial
eval_data = mnist.test.images  # Returns np.array
eval_labels = np.asarray(mnist.test.labels, dtype=np.int32)
Now I want to feed in my own test images without labelling them and have the model predict the labels. How do I achieve this?
Yes, you can, but then it would not be deep learning; it would be clustering (e.g. k-means clustering).
The basic idea is as follows:
1. Create two placeholders, one for the input and one for the centroids.
2. Decide on a distance metric.
3. Create the graph.
4. Feed only the data set (no labels) to run the graph.
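The steps above can be sketched without TensorFlow. The following plain-Python k-means (Lloyd's algorithm) is swapped in for the placeholder/graph version described in the answer:

- the initial centroids passed in play the role of the centroid placeholder;
- squared Euclidean distance is the assumed metric;
- only the unlabelled points are supplied.

The function name kmeans and its parameters are illustrative, not a TensorFlow API.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def squared_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Distance metric: squared Euclidean distance."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points: Sequence[Point], init_centroids: Sequence[Point],
           iterations: int = 10) -> Tuple[List[Point], List[int]]:
    """Plain k-means starting from the given centroids; no labels are used anywhere."""
    centroids: List[Point] = [tuple(c) for c in init_centroids]
    assignments: List[int] = []
    for _ in range(iterations):
        # Assignment step: each point goes to its nearest centroid.
        assignments = [
            min(range(len(centroids)), key=lambda k: squared_distance(p, centroids[k]))
            for p in points
        ]
        # Update step: move each centroid to the mean of its assigned points.
        new_centroids: List[Point] = []
        for k, c in enumerate(centroids):
            members = [p for p, a in zip(points, assignments) if a == k]
            if members:
                new_centroids.append(tuple(sum(dim) / len(members) for dim in zip(*members)))
            else:
                new_centroids.append(c)  # keep an empty cluster's centroid in place
        centroids = new_centroids
    return centroids, assignments


points = [(0.0, 0.0), (0.0, 1.0), (10.0, 10.0), (10.0, 11.0)]  # unlabelled data
centroids, labels = kmeans(points, [(0.0, 0.0), (10.0, 10.0)])
print(centroids)  # [(0.0, 0.5), (10.0, 10.5)]
print(labels)     # [0, 0, 1, 1]
```

The resulting cluster indices are not class labels. Mapping a cluster to a meaningful name still needs some outside knowledge.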

How do I use an MXNet model in C++?

After I have trained a model, how do I use it with C++?
I have tried the MXNet examples in incubator-mxnet/example/image-classification/predict-cpp/ and incubator-mxnet/cpp-package/example/.
As part of training, you should periodically evaluate your model against a validation set, for example at the end of each epoch. You will then have a good idea of the accuracy to expect when the model scores new data. This lets you determine whether the model really performs worse than expected at inference time.
If the model's validation accuracy during training is no better than random (i.e. 1/number of classes), there could be many reasons. These include poor model selection, an incorrect loss calculation, or the wrong optimization technique and hyperparameters (e.g. learning rate).
If the model's test accuracy on unseen data is poor, you might be applying it to a different domain from the one it was trained on. You can't use a model trained on handwritten characters (e.g. MNIST) to classify real-world objects (e.g. ImageNet).
If you need a C++ example of model training, take a look at this tutorial.

Simple prediction model for multiple features

I am new to prediction models. I am currently using Python 2.7 and sklearn. I would like to know a simple model that combines many features to predict one target.
To make it clearer: let's say I have 4 arrays of size 10: A, B, C and Y. I would like to use the values of A, B and C to predict the values of Y.
Thank you
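A common starting point for this in scikit-learn is linear regression. Stack A, B and C as the columns of a feature matrix (e.g. with numpy.column_stack) and call LinearRegression().fit(X, Y) from sklearn.linear_model.

The sketch below shows what that fit computes, using only the standard library. It is written for Python 3, uses hypothetical helper names, and solves the least-squares normal equations for Y ≈ w0 + w1*A + w2*B + w3*C. The data is made up so that Y = 1 + 2A + 3B - C exactly.

```python
from typing import List, Sequence


def solve(matrix: List[List[float]], rhs: List[float]) -> List[float]:
    """Solve matrix @ x = rhs by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    a = [row[:] + [b] for row, b in zip(matrix, rhs)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(a[r][col]))
        a[col], a[pivot] = a[pivot], a[col]
        for r in range(col + 1, n):
            factor = a[r][col] / a[col][col]
            for c in range(col, n + 1):
                a[r][c] -= factor * a[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        x[r] = (a[r][n] - sum(a[r][c] * x[c] for c in range(r + 1, n))) / a[r][r]
    return x


def fit_linear(features: Sequence[Sequence[float]], target: Sequence[float]) -> List[float]:
    """Least-squares weights [w0, w1, ...] for target ~ w0 + w1*f1 + ...

    `features` holds one list per feature, e.g. [A, B, C].
    """
    rows = [[1.0] + list(values) for values in zip(*features)]  # intercept column first
    p = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(p)] for i in range(p)]
    xty = [sum(r[i] * y for r, y in zip(rows, target)) for i in range(p)]
    return solve(xtx, xty)


def predict(weights: Sequence[float], values: Sequence[float]) -> float:
    return weights[0] + sum(w * v for w, v in zip(weights[1:], values))


A = [float(i) for i in range(10)]
B = [float(i * i % 7) for i in range(10)]
C = [float(3 * i % 5) for i in range(10)]
Y = [1 + 2 * a + 3 * b - c for a, b, c in zip(A, B, C)]  # made-up exact relation
weights = fit_linear([A, B, C], Y)
print([round(w, 6) for w in weights])  # should be close to [1, 2, 3, -1]
print(predict(weights, [2.0, 2.0, 1.0]))  # should be close to 10
```

Solving the normal equations directly is fine for small, well-conditioned data like this. Library implementations typically use more numerically robust least-squares solvers.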