I have created a TensorFlow model that I would like to save to a file so that I can predict against it later. In particular, I need to save the:
input_placeholder
(= tf.placeholder(tf.float32, [None, iVariableLen]))
solution_space
(= tf.nn.sigmoid(tf.matmul(input_placeholder, weight_variable) + bias_variable))
session
(= tf.Session())
I've tried using pickle, which works on other objects such as sklearn binarizers, but not on the objects above; for those I get the error at the bottom.
How I pickle:
import pickle
with open(sModelSavePath, 'w') as fiModel:
    pickle.dump(dModel, fiModel)
where dModel is a dictionary that contains all the objects I want to persist, which I use for fitting against.
Any suggestions on how to pickle tensorflow objects?
Error message:
pickle.dump(dModel, fiModel)
...
raise TypeError, "can't pickle %s objects" % base.__name__
TypeError: can't pickle module objects
The way I solved this was by pickling the scikit-learn objects such as binarizers, and using TensorFlow's built-in save functions for the actual model:
Saving tensorflow model:
1) Build the model as you usually would
2) Save the session with tf.train.Saver(). For example:
oSaver = tf.train.Saver()
oSess = oSession
oSaver.save(oSess, sModelPath) #filename ends with .ckpt
3) This saves all available variables etc in that session to their variable names.
Loading tensorflow model:
1) The entire flow needs to be re-initialized. In other words, variables, weights, bias, loss function etc need to be declared, and then initialized with tf.initialize_all_variables() being passed into oSession.run()
2) That session now needs to be passed to the loader. I abstracted the flow, so my loader looks like this:
dAlg = tf_training_algorithm() #defines variables etc and initializes session
oSaver = tf.train.Saver()
oSaver.restore(dAlg['oSess'], sModelPath)
return {
    'oSess': dAlg['oSess'],
    #the other stuff I need from my algorithm, like my solution space etc
}
3) All objects you need for prediction must be retrieved from your initialisation code; in my case they sit in dAlg
PS: Pickle like this:
with open(sSavePathFilename, 'wb') as fiModel:  # pickle data is binary
    pickle.dump(dModel, fiModel)
with open(sFilename, 'rb') as fiModel:
    dModel = pickle.load(fiModel)
You should save your project in two separate parts: one for TensorFlow's objects and another for everything else. I recommend the following tools:
tf.saved_model: it contains the procedures you need to save and load TensorFlow models.
dill: a more powerful serialization tool built on pickle; it can help you get around most of the errors you encounter with pickle.
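To decide which entries of a dictionary like dModel belong in which part, one option is to try pickling each value on its own. The sketch below does that; the helper name split_by_picklability and the sample dictionary are made up for illustration, and a module object stands in for the TensorFlow objects because it triggers the same kind of TypeError.

```python
import pickle
import types


def split_by_picklability(objects):
    """Return (picklable, unpicklable) dicts based on a trial pickle.dumps per value."""
    picklable, unpicklable = {}, {}
    for name, value in objects.items():
        try:
            pickle.dumps(value)
        except Exception:  # pickle can raise TypeError, PicklingError, AttributeError, ...
            unpicklable[name] = value
        else:
            picklable[name] = value
    return picklable, unpicklable


# Hypothetical stand-in for dModel; the module object cannot be pickled.
dModel = {"labels": ["a", "b"], "threshold": 0.5, "some_module": types}
ok, not_ok = split_by_picklability(dModel)
print(sorted(ok))
print(sorted(not_ok))
```

The picklable part can then go through pickle (or dill), while the remaining entries are saved with the library's own tools.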
Pardon me if I use the wrong terminology but what I want is to train a set of data (using GaussianNB Naive Bayes from Scikit Learn), save the model/classifier and then load it whenever I need and predict a category.
from sklearn.externals import joblib
from sklearn.naive_bayes import GaussianNB
from sklearn.feature_extraction.text import TfidfVectorizer
self.vectorizer = TfidfVectorizer(decode_error='ignore')
self.X_train_tfidf = self.vectorizer.fit_transform(train_data)
# Fit the model to my training data
self.clf = self.gnb.fit(self.X_train_tfidf.toarray(), category)
# Save the classifier to file
joblib.dump(self.clf, 'trained/NB_Model.pkl')
# Save the vocabulary to file
joblib.dump(self.vectorizer.vocabulary_, 'trained/vectorizer_vocab.pkl')
#Next time, I read the saved classifier
self.clf = joblib.load('trained/NB_Model.pkl')
# Read the saved vocabulary
self.vocab =joblib.load('trained/vectorizer_vocab.pkl')
# Initialize the vectorizer
self.vectorizer = TfidfVectorizer(vocabulary=self.vocab, decode_error='ignore')
# Try to predict a category for new data
X_new_tfidf = self.vectorizer.transform(new_data)
print self.clf.predict(X_new_tfidf.toarray())
# After running the predict command above, I get the error
'idf vector is not fitted'
Can anyone tell me what I'm missing?
Note: The saving of the model, the reading of the saved model and trying to predict a new category are all different methods of a class. I have collapsed all of them into a single screen here to make for easier reading.
Thanks
You need to pickle the self.vectorizer and load it again. Currently you are only saving the vocabulary learnt by the vectorizer.
Change the following line in your program:
joblib.dump(self.vectorizer.vocabulary_, 'trained/vectorizer_vocab.pkl')
to:
joblib.dump(self.vectorizer, 'trained/vectorizer.pkl')
And the following line:
self.vocab =joblib.load('trained/vectorizer_vocab.pkl')
to:
self.vectorizer =joblib.load('trained/vectorizer.pkl')
Delete this line:
self.vectorizer = TfidfVectorizer(vocabulary=self.vocab, decode_error='ignore')
Problem explanation:
You are right that, in principle, saving the learnt vocabulary and reusing it should be enough. But the scikit-learn TfidfVectorizer also has the idf_ attribute, which contains the IDF weights of the saved vocabulary, so you need to save that as well. Even if you save both and load them into a new TfidfVectorizer instance, you will still get the "not fitted" error, because that is just the way most scikit-learn transformers and estimators are defined. So, without doing anything "hacky", saving the whole vectorizer is your best bet. If you still want to go down the path of saving only the vocabulary, take a look here for how to do that properly:
http://thiagomarzagao.com/2015/12/08/saving-TfidfVectorizer-without-pickles/
The above page saves the vocabulary as JSON and idf_ as a simple array. You could use pickles there instead, but the page will give you an idea of how TfidfVectorizer works.
Hope it helps.
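To illustrate why the vocabulary alone is not enough, here is a toy sketch. TinyTfidf is a made-up class, not scikit-learn's TfidfVectorizer, and it only mimics the idea that the fitted state consists of a vocabulary plus IDF weights. As noted above, the real TfidfVectorizer needs more than setting two attributes, so treat this as a picture of the principle rather than a recipe.

```python
import json
import math


class TinyTfidf:
    """Toy stand-in (not scikit-learn): fitted state is vocabulary_ plus idf_."""

    def fit(self, docs):
        tokenized = [doc.split() for doc in docs]
        terms = sorted({term for tokens in tokenized for term in tokens})
        self.vocabulary_ = {term: index for index, term in enumerate(terms)}
        n_docs = len(tokenized)
        # Smoothed inverse document frequency, one weight per vocabulary entry.
        self.idf_ = [
            math.log((1 + n_docs) / (1 + sum(term in tokens for tokens in tokenized))) + 1
            for term in terms
        ]
        return self

    def transform(self, doc):
        if not hasattr(self, "idf_"):
            raise RuntimeError("idf vector is not fitted")
        row = [0.0] * len(self.vocabulary_)
        for term in doc.split():
            index = self.vocabulary_.get(term)
            if index is not None:
                row[index] += self.idf_[index]
        return row


fitted = TinyTfidf().fit(["red apple", "green apple"])

# Persist BOTH pieces of fitted state as JSON text.
saved = json.dumps({"vocabulary": fitted.vocabulary_, "idf": fitted.idf_})

# Restoring everything gives the same output as the original object.
state = json.loads(saved)
restored = TinyTfidf()
restored.vocabulary_ = state["vocabulary"]
restored.idf_ = state["idf"]
same_output = restored.transform("red apple") == fitted.transform("red apple")

# Restoring only the vocabulary reproduces the failure mode from the question.
vocab_only = TinyTfidf()
vocab_only.vocabulary_ = state["vocabulary"]
try:
    vocab_only.transform("red apple")
    vocab_only_failed = False
except RuntimeError:
    vocab_only_failed = True

print(same_output, vocab_only_failed)
```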
Pyomo solver invocation can be achieved by command line usage or from a Python script.
How does the command line call with the summary flag
pyomo solve model.py input.dat --solver=glpk --summary
translate to e.g. the usage of a SolverFactory class in a Python script?
Specifically, in the following example, how can one specify a summary option? Is it an (undocumented?) argument to SolverFactory.solve?
from pyomo.opt import SolverFactory
import pyomo.environ
from model import model
opt = SolverFactory('glpk')
instance = model.create_instance('input.dat')
results = opt.solve(instance)
The --summary option is specific to the pyomo command. It is not a solver option. I believe all it really does is execute the line
pyomo.environ.display(instance)
after the solve, which you can easily add to your script. A more direct way of querying the solution is just to access the value of model variables or the objective by "evaluating" them. E.g.,
instance.some_objective()
instance.some_variable()
instance.some_indexed_variable[0]()
or
pyomo.environ.value(instance.some_objective)
pyomo.environ.value(instance.some_variable)
pyomo.environ.value(instance.some_indexed_variable[0])
I prefer the former, but the latter is more appropriate if you are accessing the values of immutable, indexed Param objects. Also, note that variables have a .value attribute that you can access directly (and update if you want to provide a warmstart).
By default, the --summary command-line option stores a 'result' file in JSON format in the directory of your model.
You can achieve the same result by adding the following to your code:
results = opt.solve(instance, load_solutions=True)
results.write(filename='results.json', format='json')
or:
results = opt.solve(instance)
instance.solutions.store_to(results)
results.write(filename='results.json', format='json')
An object has been serialized with pickle, and it will be used by a model that may be placed anywhere (under any directory). Since the object is used frequently and is effectively part of the model, I want the model to contain the pickle file (placed in a directory of the model) and expose it as a variable.
./data/constant.py
object = pickle.load(open('object.pkl'))
./data/object.pkl
./code/model01.py
from ..data import constant
# or
# from __future__ import absolute_import
# from model.data import constant
object = constant.object
./code/model02.py
from ..data import constant
object = constant.object
The problem is obviously that Python will search for object.pkl under ./code/ (or wherever I call the model's functions from outside the model) rather than under ./data/.
Am I doing it right? Any better solutions? Thanks.
I think this question may be duplicated (this is a very common issue) but I cannot find any related archive here. If so, please help redirect me there.
Doing a little bit of path manipulation should work:
In module constant.py:
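import os
import pickle
# Resolve the pickle relative to this module, not the current working directory
path = os.path.dirname(os.path.abspath(__file__))
with open(os.path.join(path, 'object.pkl'), 'rb') as f:
    obj = pickle.load(f)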
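The same idea can be wrapped in a small helper, sketched here with pathlib. The name load_sibling_pickle is an assumption made for this example, and the temporary directory only simulates the ./data/ folder so that the snippet runs on its own.

```python
import pickle
import tempfile
from pathlib import Path


def load_sibling_pickle(module_file, filename):
    """Load a pickle stored in the same directory as module_file."""
    path = Path(module_file).resolve().parent / filename
    with path.open("rb") as handle:
        return pickle.load(handle)


# Simulate ./data/ with a temporary directory holding object.pkl.
with tempfile.TemporaryDirectory() as data_dir:
    with open(Path(data_dir) / "object.pkl", "wb") as handle:
        pickle.dump([1, 2, 3], handle)
    # Stand-in for __file__ inside constant.py; the file itself need not exist.
    fake_module_file = Path(data_dir) / "constant.py"
    loaded = load_sibling_pickle(fake_module_file, "object.pkl")

print(loaded)
```

Inside constant.py the call would be load_sibling_pickle(__file__, "object.pkl"), which works regardless of the directory the importing script is run from.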
Looks like you want object to be part of the module constant.
One way would be just putting the pickled object in constant.py:
my_object = pickle.loads(pickled_object) # don't use the name `object` it is a built-in
Note the s in loads.
pickled_object needs to be placed inside constant.py before the line shown above. It has to be a byte string.
You can create it either directly from the object:
pickled_object = pickle.dumps(obj)
or take it from the pickled file and paste it in.
Example
Pickle your object:
>>> import pickle
>>> obj = [1, 2, 3]
>>> pickle.dumps(obj)
b'\x80\x03]q\x00(K\x01K\x02K\x03e.'
Now, in constant.py:
pickled_object = b'\x80\x03]q\x00(K\x01K\x02K\x03e.'
my_object = pickle.loads(pickled_object)
As a result, my_object is [1, 2, 3] and can be accessed via constant.my_object
Can't find a direct, head on answer to this. Is there a way to access a tempfile in Django across 2 distinct views? Say I have the following code:
view#1(request):
    temp_file = tempfile.NamedTemporaryFile()
    write_book.save(temp_file)
    temp_file_name = temp_file.name
    print temp_file_name
    request.session['output_file_name'] = temp_file_name
    request.session.modified = True
    return #something or other
view#2(request):
    temp_file_name = request.session['output_file_name']
    temp_file = open(str(temp_file_name))
    #do something with 'temp_file' here
My problem comes in specifically on view#2, the 2nd line "open(temp_file_name)". Django complains that this file/path doesn't exist, which is consistent with my understanding of the tempfile module (that the file is 'hidden' and only available to Django).
Is there a way for me to access this file? In case it matters, I ONLY need to read from it (technically serve it for download).
I'd think of this as how to access a NamedTemporaryFile across different requests, rather than different views. Looking at this documentation on NamedTemporaryFile, it says that the file can be opened across the same process, but not necessarily across multiple processes. Perhaps your other view is being called in a different Django process.
My suggestion would be to abandon the use of NamedTemporaryFile and instead just write it as a permanent file, then delete the file in the other view.
Thanks seddonym for attempting to answer. My partner clarified this for me: seddonym is correct for the Django version of NamedTemporaryFile. By calling the Python standard-library version instead, you CAN access the file across requests.
The trick is setting the delete=False parameter and closing the file before returning at the end of the request. Then, in the subsequent request, just open(file_name). Pseudocode below:
>>> import tempfile
>>> file = tempfile.NamedTemporaryFile(delete=False)
>>> file.name
'c:\\users\\(blah)\(blah)\(blah)\\temp\\tmp9drcz9'
>>> file.close()
>>> file
<closed file '<fdopen>', mode 'w+b' at 0x00EF5390>
>>> f = open(file.name)
>>> f
<open file 'c:\users\ymalik\appdata\local\temp\tmp9drcz9', mode 'r' at 0x0278C128>
This is, of course, done in the console, but it works in Django as well.
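Outside the console, the two steps might look roughly like the sketch below. The function names are invented for this example and plain functions stand in for the two views; in Django the returned path would be kept in request.session as in the question. Since delete=False leaves the file on disk, the second step removes it explicitly.

```python
import os
import tempfile


def write_download_file(payload):
    """First 'request': write bytes to a temp file that survives being closed."""
    with tempfile.NamedTemporaryFile(delete=False, suffix=".bin") as temp_file:
        temp_file.write(payload)
    # The file is closed here but still exists because delete=False.
    return temp_file.name


def read_and_remove(path):
    """Second 'request': read the file back, then clean it up manually."""
    with open(path, "rb") as handle:
        data = handle.read()
    os.remove(path)
    return data


saved_path = write_download_file(b"report contents")
content = read_and_remove(saved_path)
print(content)
```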
I have a list of 4000 elements in Python, where each element is an object of the following class with several attributes.
class Point:
    def __init__(self):
        self.coords = []
        self.IP = []
        self.BW = 20
        self.status = 'M'
    def __repr__(self):
        return str(self.coords)
I do not know how to save this list for future use.
I have tried opening a file and using the write() function, but this is not what I want.
I want to save it and load it in another program, the way MATLAB lets you save a variable and load it again later.
pickle is a good choice:
import pickle
with open("output.bin", "wb") as output:
    pickle.dump(yourList, output)
and symmetric:
import pickle
with open("output.bin", "rb") as data:
    yourList = pickle.load(data)
It is a good choice because it is included with the standard library, it can serialize almost any Python object without effort and has a good implementation, although the output is not human readable. Please note that you should use pickle only for your personal scripts, since it will happily load anything it receives, including malicious code: I would not recommend it for production or released projects.
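If the saved file may come from an untrusted source, a plain-data format such as JSON avoids the code-execution risk mentioned above. The following is only a sketch: it assumes every attribute of Point holds JSON-friendly values (lists, numbers, strings), and the helper names points_to_json and points_from_json are made up for this example.

```python
import json


class Point:
    def __init__(self):
        self.coords = []
        self.IP = []
        self.BW = 20
        self.status = 'M'


def points_to_json(points):
    """Serialize each Point's attributes as a plain dictionary."""
    return json.dumps([vars(point) for point in points])


def points_from_json(text):
    """Rebuild Point objects from the dictionaries produced above."""
    points = []
    for fields in json.loads(text):
        point = Point()
        point.__dict__.update(fields)
        points.append(point)
    return points


original = Point()
original.coords = [1.5, 2.0]
original.status = 'S'

text = points_to_json([original])
restored = points_from_json(text)
print(restored[0].coords, restored[0].BW, restored[0].status)
```

The text can be written to and read from a file with the usual open(...).write() and open(...).read() calls. Unlike pickle, loading it only ever produces lists, dictionaries, numbers and strings.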
This might be an option:
f = open('foo', 'wb')
np.save(f, my_list)
for loading then use
data = np.load(open('foo', 'rb'), allow_pickle=True)  # allow_pickle is needed for arrays of Python objects
If 'b' is not present in 'wb', the program gives an error:
TypeError: write() argument must be str, not bytes
The "b" (binary) flag makes the difference.
Since you say Matlab, numpy should be an option.
f = open('foo', 'wb')
np.save(f, my_list)
# later
data = np.load(open('foo', 'rb'), allow_pickle=True)
Of course, it'll return an array, not a list, but you can coerce it if you really want a list...