Check sklearn version before loading model using joblib - python-2.7

I've followed this guide to save a machine learning model for later use. The model was dumped in one machine:
from sklearn.externals import joblib
joblib.dump(clf, 'model.pkl')
And when I loaded it with joblib.load('model.pkl') on another machine, I got this warning:
UserWarning: Trying to unpickle estimator DecisionTreeClassifier from
version pre-0.18 when using version 0.18.1. This might lead to
breaking code or invalid results. Use at your own risk.
So is there any way to know the sklearn version of the saved model to compare it with the current version?

Versioning of pickled estimators was added in scikit-learn 0.18. Starting from v0.18, you can get the version of scikit-learn used to create the estimator with,
estimator.__getstate__()['_sklearn_version']
The warning you get is produced by the __setstate__ method of the estimator, which is automatically called upon unpickling. It doesn't look like there is a straightforward way of getting this version without loading the estimator from disk. You can filter out the warning with:
import warnings

with warnings.catch_warnings():
    warnings.simplefilter("ignore", category=UserWarning)
    estimator = joblib.load('model.pkl')
For pre-0.18 versions, there is no such mechanism, but I imagine you could, for instance, use not hasattr(estimator, '__getstate__') as a test to detect, at least, pre-0.18 versions.
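Putting the pieces above together, a minimal sketch of such a version check could look as follows. DummyEstimator here is a hypothetical stand-in for an object returned by joblib.load; real estimators pickled with scikit-learn >= 0.18 carry the '_sklearn_version' key in their __getstate__() dict.

```python
def saved_sklearn_version(estimator):
    """Return the sklearn version stored in a pickled estimator,
    or 'pre-0.18' when no versioned state is present."""
    if not hasattr(estimator, '__getstate__'):
        return 'pre-0.18'  # no pickle state hook at all
    state = estimator.__getstate__()
    if isinstance(state, dict):
        return state.get('_sklearn_version', 'pre-0.18')
    return 'pre-0.18'

class DummyEstimator(object):
    """Hypothetical stand-in for a real unpickled estimator."""
    def __getstate__(self):
        return {'max_depth': 3, '_sklearn_version': '0.18.1'}

print(saved_sklearn_version(DummyEstimator()))  # prints 0.18.1
```

You could then compare the returned string against sklearn.__version__ before trusting the model's predictions.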

I had the same problem. Just re-train the model on your datasets and save 'model.pkl' again with joblib.dump under your current scikit-learn version; the warning will be resolved. Good luck!

Related

Can't find pvlib.pvsystem.Array

I am using pvlib-python to model a series of photovoltaic installations.
I have been running the normal pvlib-python procedural code just fine (as described in the intro tutorial).
I am now trying to extend my model to be able to cope with several arrays of panels in different directions etc., but connected to the same inverter. For this I thought the easiest way would be to use pvlib.pvsystem.Array to create a list of Array objects that I can then pass to the pvlib.pvsystem.PVSystem class (as described here).
My issue now is that I can't find pvsystem.Array at all; e.g. I'm just getting:
AttributeError: module 'pvlib.pvsystem' has no attribute 'Array'
when I try to create an instance of Array using:
from pvlib import pvsystem
module_parameters = {'pdc0': 5000, 'gamma_pdc': -0.004}
array_one = pvsystem.Array(module_parameters=module_parameters)
array_two = pvsystem.Array(module_parameters=module_parameters)
system_two_arrays = pvsystem.PVSystem(arrays=[array_one, array_two],
                                      inverter_parameters=inverter_parameters)
as described in the examples in the PVSystem and Arrays page.
I am using pvlib-python=0.8.1, installed in my conda env using conda install -c conda-forge pvlib-python.
I am quite confused about this since I can obviously see all the documentation on pvsystem.Array on read-the-docs and see the source code on pvlib's github.
When I look at the code in my conda env it doesn't have Array under pvsystem (nor does it appear if I list it using dir(pvlib.pvsystem)), so something is wrong with the installation, but I simply can't figure out what. I've tried reinstalling pvlib and using different installation methods, but always get the same issue.
Am I missing something really obvious here?
Kind regards and thank you,
This feature is not present in the current stable version (0.8.1). If you want to use it now, you could download the latest source as a zip file and install it, or clone the pvlib git repository on your computer.
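A quick way to confirm this before relying on the class is to compare the installed version against the first release that, to the best of my knowledge, ships pvsystem.Array (0.9.0). A minimal sketch, where '0.8.1' stands in for the asker's installed version (in practice you would pass pvlib.__version__):

```python
def version_tuple(v):
    """Turn a 'major.minor.patch' string into a comparable tuple."""
    return tuple(int(part) for part in v.split('.')[:3])

installed = '0.8.1'  # e.g. pvlib.__version__
if version_tuple(installed) < version_tuple('0.9.0'):
    print('pvsystem.Array is not available in this pvlib; upgrade first')
```

Tuple comparison avoids the classic string-comparison pitfall where '0.10.0' sorts before '0.9.0'.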

Building models and estimators in tf2.0 without tf.keras

Given that the layers API has been deprecated, how do I build models in TF2 without using tf.keras (or what is the recommended way to build models)? Issue #30829 asks the same question, but was closed without any answers.
Update:
I'm okay with using tf.keras.layers instead of tf.layers, but once I've built all the layers and I need to return the model, is there a way NOT to use the Keras model, compile, fit, predict and evaluate, and just do it TensorFlow's way?
If you were wondering why I would want to do something like that: I would like to use estimators to train, rather than Keras' fit function. There exists a keras_model_to_estimator, but it seems it's not mature enough yet.
Google released a migration guide from TF 1 to TF 2; see the section "Converting Models".
Recommended way to build models
The guide (section "Models based on tf.layers") recommends converting tf.layers models to tf.keras.layers:
The conversion was one-to-one because there is a direct mapping from v1.layers to tf.keras.layers.
Build models without Keras
Another option is to provide your own layer implementations (example from the guide):
import tensorflow as tf

W = tf.Variable(tf.ones(shape=(2, 2)), name="W")
b = tf.Variable(tf.zeros(shape=(2)), name="b")

@tf.function
def forward(x):
    return W * x + b

out_a = forward([1, 0])
print(out_a)
But it is worth considering tf.keras.layers.Layer (example), which gives some degree of freedom while still integrating with the rest of Keras (and its layers).
Even with layers written with tf.keras, you are able to write your own training loop (example).
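Such a custom loop can be sketched as follows (a minimal sketch, assuming TensorFlow 2.x is installed): it builds a model from tf.keras layers but trains it with a hand-written tf.GradientTape loop, using no compile(), fit(), or evaluate() at all.

```python
import tensorflow as tf

# One dense layer; the model is built lazily on first call.
model = tf.keras.Sequential([tf.keras.layers.Dense(1)])
optimizer = tf.keras.optimizers.SGD(learning_rate=0.1)
loss_fn = tf.keras.losses.MeanSquaredError()

x = tf.constant([[1.0], [2.0], [3.0]])
y = 2.0 * x  # toy target: learn y = 2x

for _ in range(200):
    with tf.GradientTape() as tape:
        loss = loss_fn(y, model(x))
    grads = tape.gradient(loss, model.trainable_variables)
    optimizer.apply_gradients(zip(grads, model.trainable_variables))

print(float(loss))  # should be close to zero after training
```

This keeps full control over the training step while still reusing Keras layers as building blocks.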
To sum up, in practice TF 2.0 requires you to use tf.keras.

Import sklearn2pmml generated .pmml back into ScikitLearn or Python

Apologies if this may have been answered somewhere but I've been looking for about an hour and can't find a good answer.
I have a simple Logistic Regression model trained in Scikit-Learn that I'm exporting to a .pmml file.
from sklearn2pmml import PMMLPipeline, sklearn2pmml
my_pipeline = PMMLPipeline([
    ("classifier", LogisticRegression())
])
my_pipeline.fit(blah blah)
sklearn2pmml(my_pipeline, "filename.pmml")
etc....
So what I'm wondering is if/how I can import this file back into Python (2.7 preferably) or Scikit-Learn to use as I would in Java/Scala. Something along the lines of
"import (filename.pmml) as pm
pm.predict(data)
Thanks for any help!
Scikit-learn does not offer support for importing PMML files, so what you're trying to achieve cannot be done I'm afraid.
Libraries such as sklearn2pmml exist to extend scikit-learn with functionality it lacks natively: exporting models to the PMML format.
Typically, those who use sklearn2pmml are really looking to re-use the PMML models in other platforms (e.g. IBM's SPSS, Apache Spark ML, Weka or any other consumer as listed in the Data Mining Group's website).
If you're looking to save a model created with scikit-learn and re-use it afterwards with scikit-learn as well then you should explore its native persistence model mechanism named Pickle, which uses a binary data format.
You can read more about how to save/load models in Pickle format (together with its known issues) here.
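A minimal sketch of that round trip with the standard-library pickle module. TinyModel here is a hypothetical stand-in for a fitted scikit-learn estimator; a real estimator is saved and loaded exactly the same way.

```python
import pickle

class TinyModel(object):
    """Hypothetical stand-in for a fitted scikit-learn estimator."""
    def __init__(self, factor):
        self.factor = factor
    def predict(self, xs):
        return [x * self.factor for x in xs]

# Save the "fitted" model to a binary pickle file.
with open('model.pkl', 'wb') as f:
    pickle.dump(TinyModel(factor=2), f)

# Load it back and predict, as you would with a real estimator.
with open('model.pkl', 'rb') as f:
    loaded = pickle.load(f)

print(loaded.predict([1, 2, 3]))  # prints [2, 4, 6]
```

Note the known caveats mentioned above: a pickle is tied to the scikit-learn version that produced it and should only be loaded from trusted sources.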
I created a simple solution to generate sklearn k-means models from PMML files which I exported from the KNIME Analytics Platform. You can check it out: pmml2sklearn
You could use PyPMML to make predictions on a new dataset using PMML in Python, for example:
from pypmml import Model
model = Model.fromFile('the/pmml/file/path')
result = model.predict(data)
The data can be a dict, a JSON string, or a pandas Series or DataFrame.
I believe you can import/export a PMML file with Python. After you load your model back, you can predict again without any problem. However, output formats can differ, e.g. a 1-D array or an n×n pandas table.
from sklearn2pmml import make_pmml_pipeline, sklearn2pmml
from pypmml import Model
#Extract as pmml
yourModelPipeline = make_pmml_pipeline(yourModelObjectGoesHere)
sklearn2pmml(yourModelPipeline, "yourModel.pmml")
#Load from pmml
yourModelLoaded = Model.fromFile('yourModel.pmml')
prediction = yourModelLoaded.predict(yourPredictionDataSet)
Lastly, reproducing the result may take a long time; don't let it discourage you :). I would like to share the developers' comment about the issue: https://github.com/autodeployai/pypmml/issues/53

Convert Keras model to TensorFlow protobuf

We're currently training various neural networks using Keras, which is ideal because it has a nice interface and is relatively easy to use, but we'd like to be able to apply them in our production environment.
Unfortunately the production environment is C++, so our plan is to:
Use the TensorFlow backend to save the model to a protobuf
Link our production code to TensorFlow, and then load in the protobuf
Unfortunately I don't know how to access the TensorFlow saving utilities from Keras, which normally saves to HDF5 and JSON. How do I save to protobuf?
In case you don't need to utilize a GPU in the environment you are deploying to, you could also use my library, called frugally-deep. It is available on GitHub and published under the MIT License: https://github.com/Dobiasd/frugally-deep
frugally-deep allows running forward passes on already-trained Keras models directly in C++ without the need to link against TensorFlow or any other backend.
This seems to be answered in "Keras as a simplified interface to TensorFlow: tutorial", posted on The Keras Blog by Francois Chollet.
In particular, section II, "Using Keras models with TensorFlow".
You can access TensorFlow backend by:
import keras.backend.tensorflow_backend as K
Then you can call any TensorFlow utility or function like:
K.tf.ConfigProto
Save your keras model as an HDF5 file.
You can then do the conversion with the following code:
from keras import backend as K
from keras.models import load_model
from tensorflow.python.framework import graph_util
from tensorflow.python.framework import graph_io
import os.path as osp

weight_file_path = 'path to your keras model'
net_model = load_model(weight_file_path)
sess = K.get_session()
constant_graph = graph_util.convert_variables_to_constants(
    sess, sess.graph.as_graph_def(), ['name of the output tensor'])
graph_io.write_graph(constant_graph, 'output_folder_path', 'output.pb', as_text=False)
print('saved the constant graph (ready for inference) at: ',
      osp.join('output_folder_path', 'output.pb'))
Here is my sample code which handles multiple input and multiple output cases:
https://github.com/amir-abdi/keras_to_tensorflow
Make sure you set the learning phase of the Keras backend so that the layers store proper values (such as dropout or batch normalization). Here is a discussion about it.

Weka - Measuring testing time

I'm using Weka 3.6.8 to carry out some machine learning and I want to find the 'time taken to test model on training/testing data'. When I test a predictive model on evaluation data, this figure seems to be missing. Has this feature been removed from Weka, or is it just a setting I'm missing? All I seem to be able to find is the time taken to build the actual predictive model. (I've also checked the Weka Manual but can't find anything.)
Thanks in advance
That feature was added in 3.7.7; you need to upgrade. You should be able to get this data by running the test on the command line with the -T parameter.