Import sklearn2pmml generated .pmml back into ScikitLearn or Python - python-2.7

Apologies if this may have been answered somewhere but I've been looking for about an hour and can't find a good answer.
I have a simple Logistic Regression model trained in Scikit-Learn that I'm exporting to a .pmml file.
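from sklearn.linear_model import LogisticRegression
from sklearn2pmml import PMMLPipeline, sklearn2pmml
# PMMLPipeline takes a list of (name, estimator) tuples
my_pipeline = PMMLPipeline([
    ("classifier", LogisticRegression())
])
my_pipeline.fit(X, y)  # X, y: training data (elided in the original post)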
sklearn2pmml(my_pipeline, "filename.pmml")
etc....
So what I'm wondering is whether, and how, I can import this file back into Python (2.7 preferably) or Scikit-Learn and use it the way I would in Java/Scala. Something along the lines of (pseudocode):
import (filename.pmml) as pm
pm.predict(data)
Thanks for any help!

Scikit-learn does not offer support for importing PMML files, so what you're trying to achieve cannot be done I'm afraid.
Libraries such as sklearn2pmml exist to add a feature that sklearn lacks: exporting a model to the PMML format.
Typically, those who use sklearn2pmml are really looking to re-use the PMML models in other platforms (e.g. IBM's SPSS, Apache Spark ML, Weka or any other consumer as listed in the Data Mining Group's website).
If you're looking to save a model created with scikit-learn and re-use it afterwards with scikit-learn as well, then you should explore its native model persistence mechanism, which is based on Python's pickle module and uses a binary data format.
You can read more about how to save/load models in Pickle format (together with its known issues) here.

I created a simple solution to generate sklearn KMeans models from PMML files which I exported from the KNIME Analytics Platform. You can check it out: pmml2sklearn

You could use PyPMML to make predictions on a new dataset using PMML in Python, for example:
from pypmml import Model
model = Model.fromFile('the/pmml/file/path')
result = model.predict(data)
The data can be a dict, a JSON string, or a pandas Series or DataFrame.
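Because PMML is plain XML, a simple linear model can also be read back by hand with the standard library. The following is only a minimal sketch, not what PyPMML does internally: it assumes the file contains a RegressionModel with the logit normalization, whose first RegressionTable holds the intercept and one NumericPredictor per feature. The embedded SAMPLE_PMML document, its coefficients, and the helper names are made up for illustration; a real export may be laid out differently.

```python
import math
import xml.etree.ElementTree as ET

# Hypothetical, stripped-down PMML document used only for this sketch.
SAMPLE_PMML = """<PMML xmlns="http://www.dmg.org/PMML-4_3" version="4.3">
  <RegressionModel functionName="classification" normalizationMethod="logit">
    <RegressionTable intercept="-1.0" targetCategory="1">
      <NumericPredictor name="x1" coefficient="2.0"/>
      <NumericPredictor name="x2" coefficient="-0.5"/>
    </RegressionTable>
    <RegressionTable intercept="0.0" targetCategory="0"/>
  </RegressionModel>
</PMML>"""


def _local(tag):
    """Strip the XML namespace from an element tag."""
    return tag.rsplit("}", 1)[-1]


def read_first_regression_table(pmml_text):
    """Return (intercept, {feature: coefficient}) from the first RegressionTable."""
    root = ET.fromstring(pmml_text)
    for elem in root.iter():
        if _local(elem.tag) == "RegressionTable":
            intercept = float(elem.get("intercept", "0"))
            coefs = {
                child.get("name"): float(child.get("coefficient"))
                for child in elem
                if _local(child.tag) == "NumericPredictor"
            }
            return intercept, coefs
    raise ValueError("no RegressionTable found")


def predict_proba(intercept, coefs, row):
    """Logistic probability for one row given as a {feature: value} dict."""
    z = intercept + sum(c * row[name] for name, c in coefs.items())
    return 1.0 / (1.0 + math.exp(-z))


intercept, coefs = read_first_regression_table(SAMPLE_PMML)
probability = predict_proba(intercept, coefs, {"x1": 1.0, "x2": 2.0})
```

For anything beyond a single linear model (preprocessing steps, trees, ensembles), a full PMML evaluator such as PyPMML is the safer route.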

I believe you can import and export a PMML file with Python. After you load your model back, you can predict again without any problem. However, the output format can differ, e.g. a 1-D array or a pandas DataFrame.
from sklearn2pmml import make_pmml_pipeline, sklearn2pmml
from pypmml import Model
#Extract as pmml
yourModelPipeline = make_pmml_pipeline(yourModelObjectGoesHere)
sklearn2pmml(yourModelPipeline, "yourModel.pmml")
#Load from pmml
yourModelLoaded = Model.fromFile('yourModel.pmml')
prediction = yourModelLoaded.predict(yourPredictionDataSet)
Lastly, reproducing the results may take a long time; don't let it discourage you :). I would like to share the developers' comment about the issue: https://github.com/autodeployai/pypmml/issues/53

Related

How can I import a meta graph with the tensorflow c++ API?

I've found a few resources on how to import a model with the TensorFlow C++ API after exporting it to a .pb file, but my understanding is that the .pb file method has been replaced with a newer method which uses tf.train.Saver.save to produce .meta, .index, .data-00000-of-00001, and checkpoint files. I cannot find anything on how to import a model from these file types with the C++ API.
How can I do this?
I use the TFLearn wrapper on top of TensorFlow, but the process should be identical with plain TensorFlow models. You can save a checkpoint from TFLearn in this way, but you can also freeze the graph if you want to use a .pb model. It is possible to load either a checkpoint or a model file in C++. In both cases, the inference part is identical in C++.

Check sklearn version before loading model using joblib

I've followed this guide to save a machine learning model for later use. The model was dumped in one machine:
from sklearn.externals import joblib
joblib.dump(clf, 'model.pkl')
And when I loaded it joblib.load('model.pkl') in another machine, I got this warning:
UserWarning: Trying to unpickle estimator DecisionTreeClassifier from
version pre-0.18 when using version 0.18.1. This might lead to
breaking code or invalid results. Use at your own risk.
So is there any way to know the sklearn version of the saved model to compare it with the current version?
Versioning of pickled estimators was added in scikit-learn 0.18. Starting from v0.18, you can get the version of scikit-learn used to create the estimator with,
estimator.__getstate__()['_sklearn_version']
The warning you get is produced by the __setstate__ method of the estimator which is automatically called upon unpickling. It doesn't look like there is a straightforward way of getting this version without loading the estimator from disk. You can filter out the warning, with,
import warnings
with warnings.catch_warnings():
    warnings.simplefilter("ignore", category=UserWarning)
    estimator = joblib.load('model.pkl')
For pre-0.18 versions, there is no such mechanism, but I imagine you could, for instance, use not hasattr(estimator, '__getstate__') as a test to detect, at least, pre-0.18 versions.
I had the same problem. Re-training the model and saving the 'model.pkl' file again with joblib.dump resolved it. Good luck!
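Instead of silencing the warning, it can be recorded and the version read out of its text. This is a minimal sketch, assuming the message has the wording quoted in the question ("from version ... when using version ..."); that wording is not a stable API and may differ between scikit-learn releases. The helper name load_with_version_check and the fake_loader stand-in are hypothetical; in practice the loader would be something like lambda: joblib.load('model.pkl').

```python
import re
import warnings

# Pattern based on the warning text quoted in the question; the exact
# wording is an assumption and may change between scikit-learn releases.
_VERSION_RE = re.compile(
    r"from\s+version\s+(\S+)\s+when\s+using\s+version\s+(\S+?)\.\s"
)


def load_with_version_check(loader):
    """Call loader() and return (obj, saved_version, current_version).

    Both versions are None if no matching warning was raised.
    Warnings raised during loading are recorded, not printed.
    """
    with warnings.catch_warnings(record=True) as caught:
        warnings.simplefilter("always")
        obj = loader()
    for w in caught:
        match = _VERSION_RE.search(str(w.message))
        if match:
            return obj, match.group(1), match.group(2)
    return obj, None, None


def fake_loader():
    """Stand-in for joblib.load that emits the warning from the question."""
    warnings.warn(
        "Trying to unpickle estimator DecisionTreeClassifier from "
        "version pre-0.18 when using version 0.18.1. This might lead to "
        "breaking code or invalid results. Use at your own risk.",
        UserWarning,
    )
    return "estimator"


obj, saved_version, current_version = load_with_version_check(fake_loader)
```

If saved_version comes back as None, either the versions match or the message format differs from the one assumed here.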

Convert Keras model to TensorFlow protobuf

We're currently training various neural networks using Keras, which is ideal because it has a nice interface and is relatively easy to use, but we'd like to be able to apply them in our production environment.
Unfortunately the production environment is C++, so our plan is to:
Use the TensorFlow backend to save the model to a protobuf
Link our production code to TensorFlow, and then load in the protobuf
Unfortunately I don't know how to access the TensorFlow saving utilities from Keras, which normally saves to HDF5 and JSON. How do I save to protobuf?
In case you don't need to utilize a GPU in the environment you are deploying to, you could also use my library, called frugally-deep. It is available on GitHub and published under the MIT License: https://github.com/Dobiasd/frugally-deep
frugally-deep allows running forward passes on already-trained Keras models directly in C++ without the need to link against TensorFlow or any other backend.
This seems to be answered in "Keras as a simplified interface to TensorFlow: tutorial", posted on The Keras Blog by Francois Chollet.
In particular, section II, "Using Keras models with TensorFlow".
You can access TensorFlow backend by:
import keras.backend.tensorflow_backend as K
Then you can call any TensorFlow utility or function like:
K.tf.ConfigProto
Save your keras model as an HDF5 file.
You can then do the conversion with the following code:
import os.path as osp
from keras import backend as K
from keras.models import load_model
from tensorflow.python.framework import graph_util
from tensorflow.python.framework import graph_io
weight_file_path = 'path to your keras model'
net_model = load_model(weight_file_path)
sess = K.get_session()
# output_node_names must be a list of node names
constant_graph = graph_util.convert_variables_to_constants(sess, sess.graph.as_graph_def(), ['name of the output tensor'])
graph_io.write_graph(constant_graph, 'output_folder_path', 'output.pb', as_text=False)
print('saved the constant graph (ready for inference) at: ', osp.join('output_folder_path', 'output.pb'))
Here is my sample code which handles multiple input and multiple output cases:
https://github.com/amir-abdi/keras_to_tensorflow
Make sure you set the learning phase of the Keras backend to inference mode before exporting, so that layers which behave differently during training (like dropout or batch normalization) are stored with the proper values. Here is a discussion about it.

Generating PMML in WEKA

Can someone tell me how to download the “WekaScoring” plugin for WEKA? The only link I could find so far was
http://wiki.pentaho.com/display/EAI/List+of+Available+Pentaho+Data+Integration+Plug-In but it is not accessible.
What I need is to generate a PMML model from WEKA. Is that possible in WEKA?
What are the current limitations of Weka's PMML support?
Only PMML Regression, GeneralRegression, NeuralNetwork, TreeModel, RuleSetModel and SupportVectorMachineModel are implemented so far. GeneralRegression supports a single Predictor-to-Parameter matrix (i.e. in the case of classification, each target class value shares the same PPMatrix). Aggregate and MapValues expressions are not supported yet. The first six of the eleven PMML built-in functions are supported so far. There is no support for exporting PMML models from Weka yet.
http://wiki.pentaho.com/display/DATAMINING/PMML+Support+in+Weka
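Since Weka can only consume the model types listed above, a PMML file produced elsewhere can be checked before trying to load it. This is a rough sketch using only the Python standard library; the mapping from the names in the quoted list to PMML element names (for example "Regression" to RegressionModel) is my assumption, as are the helper names and the two stripped-down sample documents.

```python
import xml.etree.ElementTree as ET

# PMML element names assumed to correspond to the model types listed above.
WEKA_SUPPORTED = {
    "RegressionModel",
    "GeneralRegressionModel",
    "NeuralNetwork",
    "TreeModel",
    "RuleSetModel",
    "SupportVectorMachineModel",
}

# Top-level PMML children that are not models.
NON_MODEL = {
    "Header",
    "MiningBuildTask",
    "DataDictionary",
    "TransformationDictionary",
    "Extension",
}


def _local(tag):
    """Strip the XML namespace from an element tag."""
    return tag.rsplit("}", 1)[-1]


def pmml_model_types(pmml_text):
    """Return the element names of the models directly under the PMML root."""
    root = ET.fromstring(pmml_text)
    return [_local(c.tag) for c in root if _local(c.tag) not in NON_MODEL]


def weka_can_import(pmml_text):
    """True if every model in the document is of a type listed as supported."""
    types = pmml_model_types(pmml_text)
    return bool(types) and all(t in WEKA_SUPPORTED for t in types)


# Hypothetical, stripped-down documents used only for illustration.
TREE_PMML = (
    '<PMML xmlns="http://www.dmg.org/PMML-4_3" version="4.3">'
    "<Header/><DataDictionary/><TreeModel functionName='classification'/>"
    "</PMML>"
)
CLUSTER_PMML = (
    '<PMML xmlns="http://www.dmg.org/PMML-4_3" version="4.3">'
    "<Header/><DataDictionary/><ClusteringModel functionName='clustering'/>"
    "</PMML>"
)

tree_ok = weka_can_import(TREE_PMML)
cluster_ok = weka_can_import(CLUSTER_PMML)
```

A passing check only covers the model type; the other limitations quoted above (unsupported expressions and built-in functions) are not inspected here.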

Import Adobe Illustrator AI files to vector format structure

First of all: I do not want to implement AI import functionality based on the AI file format specification on my own. Second: Adobe Illustrator is not installed on my target system, so I cannot use its programming interface.
But I want to import AI files into some CAD-like application to access the vector data out of this file afterwards (can be any kind of data structure, converting it to my own format is not a problem). How can this be done? Is there a library or something like this available which provides related functionality?
Thanks!