Generating PMML in WEKA - weka

Can someone tell me how to download the “WekaScoring” plugin for WEKA? The only link I could find so far was
http://wiki.pentaho.com/display/EAI/List+of+Available+Pentaho+Data+Integration+Plug-In but it is not accessible.
What I need is to generate a PMML model from WEKA. Is that possible in WEKA?

What are the current limitations of Weka's PMML support?
Only PMML Regression, GeneralRegression, NeuralNetwork, TreeModel, RuleSetModel and SupportVectorMachineModel are implemented so far. GeneralRegression supports a single Predictor-to-Parameter matrix (i.e. in the case of classification, each target class value shares the same PPMatrix). Aggregate and MapValues expressions are not supported yet. The first six of the eleven PMML built-in functions are supported so far. There is no support for exporting PMML models from Weka yet.
http://wiki.pentaho.com/display/DATAMINING/PMML+Support+in+Weka

Related

Import sklearn2pmml generated .pmml back into ScikitLearn or Python

Apologies if this may have been answered somewhere but I've been looking for about an hour and can't find a good answer.
I have a simple Logistic Regression model trained in Scikit-Learn that I'm exporting to a .pmml file.
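from sklearn.linear_model import LogisticRegression
from sklearn2pmml import PMMLPipeline, sklearn2pmml
# PMMLPipeline takes a list of (name, estimator) tuples
my_pipeline = PMMLPipeline([
    ("classifier", LogisticRegression())
])
# X, y stand for the training features and labels (elided in the question)
my_pipeline.fit(X, y)
sklearn2pmml(my_pipeline, "filename.pmml")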
etc....
So what I'm wondering is whether, and how, I can import this file back into Python (2.7 preferably) or Scikit-Learn and use it as I would in Java/Scala. Something along the lines of
import (filename.pmml) as pm
pm.predict(data)
Thanks for any help!
Scikit-learn does not support importing PMML files, so I'm afraid what you're trying to achieve cannot be done with scikit-learn itself.
Libraries such as sklearn2pmml exist to add something sklearn lacks: exporting a model to the PMML format.
Typically, those who use sklearn2pmml want to re-use the PMML models on other platforms (e.g. IBM's SPSS, Apache Spark ML, Weka or any other consumer listed on the Data Mining Group's website).
If you want to save a model created with scikit-learn and re-use it later with scikit-learn as well, you should look at Python's native persistence mechanism, pickle, which uses a binary data format.
You can read more about how to save/load models in Pickle format (together with its known issues) here.
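As a minimal sketch of the pickle round trip described above: the snippet uses only the standard library, and the `model` dict and `predict` helper are hypothetical stand-ins for a fitted scikit-learn estimator, not scikit-learn code. Pickle serializes whatever Python object it is given, so a real estimator would be dumped and loaded with the same two calls.

```python
import io
import pickle

# Hypothetical stand-in for a fitted model (assumption for illustration only).
model = {"weights": [0.5, -1.0], "bias": 0.25}


def predict(model, rows):
    """Label a row 1 when the weighted sum plus bias is positive, else 0."""
    labels = []
    for row in rows:
        score = sum(w * x for w, x in zip(model["weights"], row)) + model["bias"]
        labels.append(1 if score > 0 else 0)
    return labels


# Save: with a real file this would be open("model.pkl", "wb").
buffer = io.BytesIO()
pickle.dump(model, buffer)

# Load: with a real file this would be open("model.pkl", "rb").
buffer.seek(0)
restored = pickle.load(buffer)

rows = [[1.0, 0.0], [0.0, 1.0]]
same_predictions = predict(restored, rows) == predict(model, rows)
```

The restored object predicts exactly as the original did. Keep in mind that pickle files should only be loaded from sources you trust, since unpickling can execute arbitrary code.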
I created a simple solution that generates sklearn k-means models from PMML files, which I exported from the KNIME Analytics Platform. You can check it out: pmml2sklearn
You could use PyPMML to make predictions on a new dataset using PMML in Python, for example:
from pypmml import Model
model = Model.fromFile('the/pmml/file/path')
result = model.predict(data)
The data could be dict, json, Series or DataFrame of Pandas.
I believe you can import and export a PMML file with Python. After you load your model back, you can predict again without any problem. However, the output format can differ, e.g. a 1-D array or an nxn pandas DataFrame.
from sklearn2pmml import make_pmml_pipeline, sklearn2pmml
from pypmml import Model
#Extract as pmml
yourModelPipeline = make_pmml_pipeline(yourModelObjectGoesHere)
sklearn2pmml(yourModelPipeline, "yourModel.pmml")
#Load from pmml
yourModelLoaded = Model.fromFile('yourModel.pmml')
prediction = yourModelLoaded.predict(yourPredictionDataSet)
Lastly, reproducing the result may take a long time; don't let it discourage you :). I would like to share the developer's comment about the issue: https://github.com/autodeployai/pypmml/issues/53

Can GraphEngine support RDF?

Does GraphEngine support RDF and SPARQL, as described in the paper "A Distributed Graph Engine for Web Scale RDF Data": https://www.graphengine.io/downloads/papers/Trinity.RDF.pdf
If not, could it be implemented on top of the engine, or is it in the roadmap?
Please take a look at our sample code for hosting Freebase:
https://github.com/Microsoft/GraphEngine/tree/master/samples/freebase-likq
The implementation is in src/LIKQ, and samples/freebase-likq provides an example of integrating an index service, multi-typed entity adapters with the LIKQ module.
The Freebase dataset is imported as a Trinity image via samples/GraphEngine.DataImporter (currently in the experimental branch). It scans the data twice: the first pass decides the data types for the entities and generates the TSL storage layout schema, and the second pass does the actual import.
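The two-pass idea can be illustrated with a small, generic sketch. This is not the GraphEngine.DataImporter code and it generates no TSL; the function names, the triple layout and the sample data are all assumptions made for illustration. The first pass settles one value type per field, and only then does the second pass build typed records.

```python
def parses_as_int(text):
    """Return True when the string can be converted with int()."""
    try:
        int(text)
        return True
    except ValueError:
        return False


def infer_field_types(triples):
    """Pass 1: a field is "int" only if every one of its values parses as int."""
    types = {}
    for _subject, field, value in triples:
        previous = types.get(field, "int")
        types[field] = "int" if (previous == "int" and parses_as_int(value)) else "str"
    return types


def import_entities(triples, types):
    """Pass 2: build one record per subject, converting values by the inferred types."""
    entities = {}
    for subject, field, value in triples:
        converted = int(value) if types[field] == "int" else value
        entities.setdefault(subject, {})[field] = converted
    return entities


# Hypothetical sample data: (subject, field, raw string value).
triples = [
    ("e1", "name", "Alice"),
    ("e1", "age", "30"),
    ("e2", "name", "1984"),
    ("e2", "age", "42"),
]
types = infer_field_types(triples)
entities = import_entities(triples, types)
```

A single pass could not do this safely: the value "1984" looks like an integer on its own, and only the full scan shows that the `name` field must be stored as a string.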

Extracting MatConvnet model weights

I am currently developing an application for facial recognition.
The algorithms are implemented and trained using the MatConvnet library (http://www.vlfeat.org/matconvnet/). At the end, I have a Network (.mat file) which looks like that:
I would like to know whether it is possible to extract the weights of the network from its .mat file, write them to an XML file and read them with Caffe C++. I would like to reuse them in Caffe C++ for testing and hardware implementation. Is there an efficient and practical way to do this?
Thank you very much for your help.
The layer whose parameters you'd like to store, must be set as 'precious'. In net.var you can access the parameters and write them.
There is a conversion script that converts matconvnet models to caffe models here which you may find useful.
You can't directly use the weights of a network trained with matconvnet in caffe. You can only import your model from matconvnet to caffe (https://github.com/vlfeat/matconvnet/blob/4ce2871ec55f0d7deed1683eb5bd77a8a19a50cd/utils/import-caffe.py). However, this script does not support all layers, so you may have difficulty using it.
The best way is to define your caffe prototxt in python so that it mirrors the matconvnet model.

weka SVM multi class classifier

I understand that Weka uses a 1-vs-1 approach for SVMs. However, I would like to classify documents, and I have 10 class labels.
Is it possible to change the parameters to switch to a 1-vs-rest approach instead?
How should I go about doing it?
The official site http://weka.wikispaces.com/LibSVM does not help much.
Other classification methods such as naive Bayes have been tried, but I would like to compare the results against SVM methods.
LIBSVM also allows multi-label classification. You can find implementation examples here:
http://www.csie.ntu.edu.tw/~cjlin/libsvmtools/multilabel/
You can also search for papers that use LibSVM with non-binary datasets,
e.g. http://link.springer.com/article/10.1007/s00521-011-0793-1
Another alternative to LibSVM is WEKA's SMO classifier.
In Weka version 3.8, the multi-class meta classifier (MultiClassClassifier) can be used. Its options include the 1-against-1 and 1-against-all multi-class methods.
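To make the difference between the two schemes concrete, here is a minimal sketch that only enumerates the binary sub-problems each scheme creates for the 10 labels in the question. It is not Weka code; the function names and the label names are hypothetical.

```python
from itertools import combinations


def one_vs_rest_tasks(labels):
    """One binary task per class: that class against all the others."""
    return [(label, "rest") for label in labels]


def one_vs_one_tasks(labels):
    """One binary task per unordered pair of classes."""
    return list(combinations(labels, 2))


labels = [f"class_{i}" for i in range(10)]  # 10 labels, as in the question
ovr = one_vs_rest_tasks(labels)
ovo = one_vs_one_tasks(labels)
print(len(ovr), len(ovo))
```

With k classes, 1-against-all trains k binary classifiers, each on the full data set, while 1-against-1 trains k(k-1)/2 classifiers, each on only the examples of two classes.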

NLTK wrapper for Weka to build a classifier

I'm building a Named Entity classifier with nltk and I have my focus on location retrieval (of any type, from countries to museums, restaurants or roads). I'm trying to vary featuresets and methods I use.
For now, I've used NLTK's built-in Maxent, NaiveBayes, PositiveNaiveBayes, DecisionTrees and SVM. I'm using 40 different combinations of featuresets.
Maxent seems to be the best, but it's too slow. nltk's SVM is for binary classification and I had some issues with pickling the final classifier. Then I tried nltk's wrapper for scikit-learn SVM, but it didn't accept my inputs; I tried to adapt them but ran into a float coercion problem.
Now, I'm considering using nltk's wrapper for Weka, but I don't know whether it would give results different enough to be worth trying, and I don't have much time. My question is: what advantages does Weka have over nltk's built-in classifiers?