Python 'rawpy._rawpy.RawPy' object has no attribute 'imread' after second pass - python-2.7

I am trying to process a series of DNG raw picture files, and it all works well on the first pass (first file). When I try to read the second DNG file during the second pass through the for loop, I receive the error message 'rawpy._rawpy.RawPy' object has no attribute 'imread' when executing the line "with raw.imread(file) as raw:".
import numpy as np
import rawpy as raw
import pyexiv2
from scipy import stats

for file in list:
    metadata = pyexiv2.ImageMetadata(file)
    metadata.read()
    with raw.imread(file) as raw:
        rgb16 = raw.postprocess(gamma=(1,1), no_auto_bright=True, output_bps=16)
        avgR = stats.describe(np.ravel(rgb16[:,:,0]))[2]
        avgG = stats.describe(np.ravel(rgb16[:,:,1]))[2]
        avgB = stats.describe(np.ravel(rgb16[:,:,2]))[2]
        print i, file, 'T=', metadata['Exif.PentaxDng.Temperature'].raw_value, 'C', avgR, avgG, avgB
        i += 1
I already tried closing the raw object, but from googling I understand that is not necessary when a context manager is used.
Help or suggestions are very welcome.
Thanks in advance.

You're overwriting your alias of the rawpy module (raw) with the image you're reading. That means you'll get an error on the second pass through the loop.
import rawpy as raw  # here's the first thing named "raw"
# ...
for file in list:
    # ...
    with raw.imread(file) as raw:  # here's the second
        # ...
Pick a different name for one of the variables and your code should work.
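For example, a minimal sketch of the loop with the module alias left intact (the raw_image and file_list names are just illustrative):
import rawpy

for file in file_list:  # also avoid naming the list "list"; it shadows a built-in
    with rawpy.imread(file) as raw_image:
        rgb16 = raw_image.postprocess(gamma=(1, 1), no_auto_bright=True, output_bps=16)
    # rgb16 remains usable here; exiting the with-block only closes the raw file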

Related

How to loop Python scripts together

I have two files: an 'initialization' script, file1.py, and a 'main code' script, main_code.py. The main_code.py is really a several-hundred-line .ipynb that was converted to a .py file. I want to run the same skeleton of the code, with the only adjustment being to pass in the different parameters found in the file1.py script.
In reality it is much more complex than what I have laid out below, with more references to other locations/DBs and so on.
However, I receive errors such as 'each_item[0]' is not defined. I can't seem to pass the values/variables that come from my loop in file1.py into the script that is called inside the loop.
I must be doing something very obviously wrong, as I imagine this is a simple fix.
file1.py:
import pandas as pd
import os
import numpy as np
import jaydebeapi as jd
# etc...

cities = ['NYC','LA','DC','MIA']  # really comes from a query/column
value_list = [5,500,5000,300]  # comes from a query/column
zipped = list(zip(cities, value_list))  # make tuples

for each_item in zipped:
    os.system('python loop_file.py')  # where I'm getting errors
main_code.py:
names = each_item[0]
value = each_item[1]
# lots of things going on here in the real code but, for simplicity...
value = value * 4
print value
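For context, os.system('python loop_file.py') starts a brand-new Python process that shares no variables with file1.py, which is why each_item is not defined there. A minimal sketch of one common fix, passing the values as command-line arguments (the sys.argv handling below is an assumption, not code from the original thread):
# file1.py
import os

zipped = [('NYC', 5), ('LA', 500), ('DC', 5000), ('MIA', 300)]
for each_item in zipped:
    # pass the tuple fields as arguments to the child process
    os.system('python main_code.py %s %s' % (each_item[0], each_item[1]))

# main_code.py
import sys

names = sys.argv[1]       # first command-line argument: the city
value = int(sys.argv[2])  # second command-line argument: the value
value = value * 4
print value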

read text file content with python at zapier

I have problems getting the content of a txt file into a Zapier object using https://zapier.com/help/code-python/. Here is the code I am using:
with open('file', 'r') as content_file:
    content = content_file.read()
I'd be glad if you could help me with this. Thanks for that!
David here, from the Zapier Platform team.
Your code as written doesn't work because the first argument for the open function is the filepath. There's no file at the path 'file', so you'll get an error. You access the input via the input_data dictionary.
That being said, the input is a URL, not a file. You need to use urllib to read that URL. I found the answer here.
I've got a working copy of the code like so:
import urllib2  # the lib that handles the url stuff

result = []
data = urllib2.urlopen(input_data['file'])
for line in data:  # file lines are iterable
    result.append(line)  # keep each line, or parse, etc.
return {'lines': result}
The key takeaway is that you need to return a dictionary from the function, so make sure you somehow squish your file into one.
Let me know if you've got any other questions!
@xavid, did you test this in Zapier?
It fails miserably because urllib2 doesn't exist in the Zapier Python environment.
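For what it's worth, a sketch that avoids urllib2 by using Python 3's standard urllib.request instead (assuming the input is still exposed as input_data['file']):
from urllib.request import urlopen  # Python 3 equivalent of urllib2.urlopen

result = []
data = urlopen(input_data['file'])
for line in data:
    result.append(line.decode('utf-8'))  # lines come back as bytes in Python 3
return {'lines': result}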

'idf vector is not fitted' error when using a saved classifier/model

Pardon me if I use the wrong terminology, but what I want is to train on a set of data (using GaussianNB naive Bayes from scikit-learn), save the model/classifier, and then load it whenever I need it and predict a category.
from sklearn.externals import joblib
from sklearn.naive_bayes import GaussianNB
from sklearn.feature_extraction.text import TfidfVectorizer

self.gnb = GaussianNB()
self.vectorizer = TfidfVectorizer(decode_error='ignore')
self.X_train_tfidf = self.vectorizer.fit_transform(train_data)
# Fit the model to my training data
self.clf = self.gnb.fit(self.X_train_tfidf.toarray(), category)
# Save the classifier to file
joblib.dump(self.clf, 'trained/NB_Model.pkl')
# Save the vocabulary to file
joblib.dump(self.vectorizer.vocabulary_, 'trained/vectorizer_vocab.pkl')

# Next time, I read the saved classifier
self.clf = joblib.load('trained/NB_Model.pkl')
# Read the saved vocabulary
self.vocab = joblib.load('trained/vectorizer_vocab.pkl')
# Initialize the vectorizer
self.vectorizer = TfidfVectorizer(vocabulary=self.vocab, decode_error='ignore')
# Try to predict a category for new data
X_new_tfidf = self.vectorizer.transform(new_data)
print self.clf.predict(X_new_tfidf.toarray())
After running the predict call above, I get the error:
'idf vector is not fitted'
Can anyone tell me what I'm missing?
Note: The saving of the model, the reading of the saved model and trying to predict a new category are all different methods of a class. I have collapsed all of them into a single screen here to make for easier reading.
Thanks
You need to pickle the self.vectorizer and load it again. Currently you are only saving the vocabulary learnt by the vectorizer.
Change the following line in your program:
joblib.dump(self.vectorizer.vocabulary_, 'trained/vectorizer_vocab.pkl')
to:
joblib.dump(self.vectorizer, 'trained/vectorizer.pkl')
And the following line:
self.vocab = joblib.load('trained/vectorizer_vocab.pkl')
to:
self.vectorizer = joblib.load('trained/vectorizer.pkl')
Delete this line:
self.vectorizer = TfidfVectorizer(vocabulary=self.vocab, decode_error='ignore')
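Put together, a minimal sketch of the corrected save/load flow (same paths and attributes as in the question):
# Training side: persist the whole fitted vectorizer, not just its vocabulary
joblib.dump(self.clf, 'trained/NB_Model.pkl')
joblib.dump(self.vectorizer, 'trained/vectorizer.pkl')

# Prediction side: load both objects back and transform as before
self.clf = joblib.load('trained/NB_Model.pkl')
self.vectorizer = joblib.load('trained/vectorizer.pkl')
X_new_tfidf = self.vectorizer.transform(new_data)
print self.clf.predict(X_new_tfidf.toarray())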
Problem explanation:
You are correct in your thinking to just save the learnt vocabulary and reuse it. But the scikit-learn TfidfVectorizer also has the idf_ attribute, which holds the IDF weights for that vocabulary, so you would need to save that as well. Even if you save both and load them into a new TfidfVectorizer instance, you will still get the "not fitted" error, because that is simply how most scikit-learn transformers and estimators are defined. So, without doing anything "hacky", saving the whole vectorizer is your best bet. If you still want to go down the save-the-vocabulary path, take a look here at how to do it properly:
http://thiagomarzagao.com/2015/12/08/saving-TfidfVectorizer-without-pickles/
That page saves the vocabulary as JSON and idf_ as a plain array. You could use pickles there instead, but it will give you a good idea of how TfidfVectorizer works internally.
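Roughly, that post reconstructs the vectorizer from the two saved pieces by setting a private attribute, so it is version-dependent and genuinely "hacky" (a sketch, assuming vocabulary_ and idf_ were saved from the fitted vectorizer; _tfidf._idf_diag is scikit-learn internal API):
import json
import numpy as np
import scipy.sparse as sp
from sklearn.feature_extraction.text import TfidfVectorizer

vocabulary = json.load(open('trained/vocabulary.json'))  # the saved vectorizer.vocabulary_
idfs = np.load('trained/idfs.npy')                       # the saved vectorizer.idf_

vectorizer = TfidfVectorizer(vocabulary=vocabulary, decode_error='ignore')
# rebuild the fitted state: the IDF weights live in a sparse diagonal matrix
vectorizer._tfidf._idf_diag = sp.spdiags(idfs, diags=0, m=len(idfs), n=len(idfs))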
Hope it helps.

Import a file like a model in python?

An object has been serialized with pickle, and it will be used by a model that may be placed anywhere (under any directory). Since the object is frequently used and is effectively part of the model, I want the model to contain the pickle file (placed under one of the model's directories) as a variable.
./data/constant.py
object = pickle.load(open('object.pkl'))
./data/object.pkl
./code/model01.py
from ..data import constant
# or
# from __future__ import absolute_import
# from model.data import constant
object = constant.object
./code/model02.py
from ..data import constant
object = constant.object
The problem is obviously that Python will search for object.pkl under ./code/ (and wherever else I use the model's functions from outside the model) rather than under ./data/.
Am I doing this right? Any better solutions? Thanks.
I think this question may be a duplicate (this is a very common issue), but I cannot find any related thread here. If so, please help redirect me there.
Doing a little bit of path manipulation should work.
In module constant.py:
import os
import pickle

path = os.path.dirname(os.path.abspath(__file__))
obj = pickle.load(open(os.path.join(path, 'object.pkl'), 'rb'))
Looks like you want object to be part of the module constant.
One way would be just putting the pickled object in constant.py:
my_object = pickle.loads(pickled_object) # don't use the name `object` it is a built-in
Note the s in loads.
pickled_object needs to be placed inside constant.py before the line shown above. It has to be a byte string.
You can create it either directly from the object:
pickled_object = pickle.dumps(obj)
or take it from the pickled file and paste it in.
Example
Pickle your object:
>>> import pickle
>>> obj = [1, 2, 3]
>>> pickle.dumps(obj)
b'\x80\x03]q\x00(K\x01K\x02K\x03e.'
Now, in constant.py:
pickled_object = b'\x80\x03]q\x00(K\x01K\x02K\x03e.'
my_object = pickle.loads(pickled_object)
As a result, my_object is [1, 2, 3] and can be accessed via constant.my_object.
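For completeness, a usage sketch assuming the layout from the question, with data and code as subpackages of one top-level package (the __init__.py files are assumed):
# ./code/model01.py
from ..data import constant

print(constant.my_object)  # -> [1, 2, 3]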

How to save lists in python

I have a list of 4000 elements in Python, each of which is an object of the following class with several values.
class Point:
    def __init__(self):
        self.coords = []
        self.IP = []
        self.BW = 20
        self.status = 'M'

    def __repr__(self):
        return str(self.coords)
I do not know how to save this list for future use.
I have tried saving it by opening a file and using the write() function, but that is not what I want.
I want to save it and load it in the next program, like in MATLAB, where you can save a variable and load it again later.
pickle is a good choice:
import pickle

with open("output.bin", "wb") as output:
    pickle.dump(yourList, output)
and, symmetrically, to load it back:
import pickle

with open("output.bin", "rb") as data:
    yourList = pickle.load(data)
It is a good choice because it is included in the standard library, it can serialize almost any Python object without effort, and it has a good implementation, although the output is not human readable. Please note that you should use pickle only for your personal scripts, since it will happily load anything it is given, including malicious code; I would not recommend it for production or released projects.
This might be an option:
f = open('foo', 'wb')
np.save(f, my_list)
For loading, use:
data = np.load(open('foo', 'rb'))
If the 'b' is not present in 'wb', the program gives an error:
TypeError: write() argument must be str, not bytes
The 'b' for binary mode makes the difference.
Since you mention MATLAB, numpy should be an option.
f = open('foo', 'wb')
np.save(f, my_list)
# later
data = np.load(open('foo', 'rb'))
Of course, it'll return an array, not a list, but you can convert it back if you really want a list...
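One caveat: with a list of custom objects like Point, numpy stores an object array, and numpy 1.16.3 and later refuse to unpickle object arrays by default, so loading needs an explicit flag (a sketch):
import numpy as np

np.save('foo.npy', my_list)  # np.save appends .npy to the name if it's missing
data = np.load('foo.npy', allow_pickle=True)  # object arrays require allow_pickle
points = list(data)  # back to a plain Python list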