How to loop Python scripts together - python-2.7

I have two files: an 'initialization' script, file1.py, and a 'main code' script, main_code.py. main_code.py is really a several-hundred-line .ipynb notebook that was converted to a .py file. I want to run the same skeleton of code repeatedly, with the only change being the parameters passed in from file1.py.
In reality, it is much more complex than what I have laid out below, with more references to other locations, DBs and so on.
However, I receive errors such as 'each_item[0]' is not defined. I can't seem to pass the values/variables from my loop in file1.py into the script that is run inside the loop.
I must be doing something obviously wrong, as I imagine this is a simple fix.
file1.py:
import pandas as pd
import os
import numpy as np
import jaydebeapi as jd
#etc...
cities = ['NYC','LA','DC','MIA'] # really comes from a query/column
value_list = [5,500,5000,300] # comes from a query/column
zipped = list(zip(cities,value_list)) # make tuples
for each_item in zipped:
    os.system('python loop_file.py')
    # where I'm getting errors.
main_code.py:
names = each_item[0]
value = each_item[1]
# lots of things going on here in real code but for simplicity...
value = value * 4
print value
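Each os.system call starts a fresh Python process that shares no variables with file1.py, which is why each_item is undefined in the second script. One possible way around this, sketched below, is to wrap the body of the main script in a function and call it from the loop. The function name run_main_code is hypothetical, and its body is only a stand-in for the real several-hundred-line script.

```python
# Sketch only: run_main_code is a hypothetical wrapper around the body of
# main_code.py; the real script would do much more than multiply by 4.
def run_main_code(name, value):
    """Stand-in for the main script, taking its parameters explicitly."""
    value = value * 4
    return name, value


cities = ['NYC', 'LA', 'DC', 'MIA']   # really comes from a query/column
value_list = [5, 500, 5000, 300]      # comes from a query/column

results = []
for name, value in zip(cities, value_list):
    # The loop variables are handed over as ordinary function arguments.
    results.append(run_main_code(name, value))

for name, value in results:
    print("%s %d" % (name, value))
```

If the main script has to stay a separate program, the same values could instead be passed on the command line, for example with subprocess.call([sys.executable, 'main_code.py', name, str(value)]) in the loop and sys.argv[1] and sys.argv[2] read inside main_code.py. Command-line arguments always arrive as strings, so the value would need converting back with int().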

Related

Cleaning up re.search output?

I have been writing a script that retrieves CVSS3 scores when I enter a vulnerability name. I've pretty much got it working as intended, except for a minor annoying detail.
π ~/Documents/Tools/Scripts ❯ python3 CVSS3-Grabber.py
Paste Vulnerability Name: PHP 7.2.x < 7.2.21 Multiple Vulnerabilities.
Base Score: None
Vector: <re.Match object; span=(27869, 27913), match='CVSS:3.0/AV:N/AC:L/PR:N/UI:R/S:U/C:L/I:N/A:H'>
Temporal Vector: <re.Match object; span=(27986, 28008), match='CVSS:3.0/E:U/RL:O/RC:C'>
As can be seen, the output could be much neater. I would much prefer something like this:
π ~/Documents/Tools/Scripts ❯ python3 CVSS3-Grabber.py
Paste Vulnerability Name: PHP 7.2.x < 7.2.21 Multiple Vulnerabilities.
Base Score: None
Vector: CVSS:3.0/AV:N/AC:L/PR:N/UI:R/S:U/C:L/I:N/A:H
However, I have been struggling to figure out how to get nicer output. Is there an easy part of the re module that I'm missing that can do this for me? Or would writing the output to a file first allow me to manipulate the text into the form I need?
Here is my code. I would appreciate any feedback on how to improve it, as I have only recently gotten back into Python and scripting in general.
import requests
import re
from bs4 import BeautifulSoup
from googlesearch import search
def get_url():
    vuln = input("Paste Vulnerability Name: ") + "tenable"
    for url in search(vuln, tld='com',lang='en',num=1,start=0,stop=1,pause=2.0):
        return url
def get_scores(url):
    response = requests.get(url)
    html = response.text
    cvss3_temporal_v = re.search("CVSS:3.0/E:./RL:./RC:.",html)
    cvss3_v = re.search("CVSS:3.0/AV:./AC:./PR:./UI:./S:./C:./I:./A:.",html)
    cvss3_basescore = re.search("Base Score:....",html)
    print("Base Score: ",cvss3_basescore)
    print("Vector: ",cvss3_v)
    print("Temporal Vector: ",cvss3_temporal_v)
urll = get_url()
get_scores(urll)
### IMPROVEMENTS ###
# Include the base score in output
# Tidy up output
# Vulnerability list?
# modify to accept flags, i.e python3 CVSS3-Grabber.py -v VULNAME ???
# State whether it is a failing issue or Action point
Thanks!
Don't print the match object. Print the match value.
In Python the value is accessible through the .group() method. If there are no regex subgroups (or you want the entire match, like in this case), don't specify any arguments when you call it:
print("Vector: ", cvss3_v.group())
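One caveat: re.search returns None when nothing matches (as happened for the base score in the output above), and calling .group() on None raises an AttributeError. A minimal sketch of a guard follows. The helper name match_text and the sample html string are made up for illustration, and the patterns are the ones from the question with the dot in "3.0" escaped.

```python
import re


def match_text(pattern, text):
    """Return the matched substring, or None when the pattern is not found."""
    found = re.search(pattern, text)
    return found.group() if found else None


# Hypothetical page fragment built from the vectors shown in the question.
html = ("... CVSS:3.0/AV:N/AC:L/PR:N/UI:R/S:U/C:L/I:N/A:H ... "
        "CVSS:3.0/E:U/RL:O/RC:C ...")

vector = match_text(r"CVSS:3\.0/AV:./AC:./PR:./UI:./S:./C:./I:./A:.", html)
temporal = match_text(r"CVSS:3\.0/E:./RL:./RC:.", html)
base_score = match_text(r"Base Score:....", html)  # not in the sample, so None

print("Base Score:", base_score)
print("Vector:", vector)
print("Temporal Vector:", temporal)
```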

Python 'rawpy._rawpy.RawPy' object has no attribute 'imread' after second pass

I am trying to process a series of DNG raw picture files, and it all works well for the first pass (first file). When I try to read the second DNG file during the second pass through the for loop, I receive the error message 'rawpy._rawpy.RawPy' object has no attribute 'imread' when executing the line "with raw.imread(file) as raw:".
import numpy as np
import rawpy as raw
import pyexiv2
from scipy import stats
for file in list:
    metadata = pyexiv2.ImageMetadata(file)
    metadata.read()
    with raw.imread(file) as raw:
        rgb16 = raw.postprocess(gamma=(1,1), no_auto_bright=True, output_bps=16)
    avgR=stats.describe(np.ravel(rgb16[:,:,0]))[2]
    avgG=stats.describe(np.ravel(rgb16[:,:,1]))[2]
    avgB=stats.describe(np.ravel(rgb16[:,:,2]))[2]
    print i,file,'T=', metadata['Exif.PentaxDng.Temperature'].raw_value,'C',avgR,avgG,avgB
    i+=1
I already tried closing the raw object, but from googling I understand that this is not necessary when a context manager is used.
Help or suggestions are very welcome.
Thanks in advance.
You're overwriting your alias of the rawpy module (raw) with the image you're reading. That means you'll get an error on the second pass through the loop.
import rawpy as raw # here's the first thing named "raw"
#...
for file in list:
    #...
    with raw.imread(file) as raw: # here's the second
        #...
Pick a different name for one of the variables and your code should work.
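The same effect can be reproduced without rawpy. The sketch below (written for Python 3) uses the standard io module as a stand-in for the aliased module. The function names are made up for illustration, and the import is placed inside the function only so that the shadowing stays local to it.

```python
import io


def read_all_shadowed(texts):
    import io as stream  # module alias, playing the role of "import rawpy as raw"
    out = []
    for text in texts:
        # "as stream" rebinds the alias to the StringIO object, so on the
        # second pass "stream.StringIO" is looked up on that object instead.
        with stream.StringIO(text) as stream:
            out.append(stream.read())
    return out


def read_all_fixed(texts):
    out = []
    for text in texts:
        # A distinct name for the opened object leaves the module name intact.
        with io.StringIO(text) as handle:
            out.append(handle.read())
    return out


print(read_all_fixed(["first", "second"]))
try:
    read_all_shadowed(["first", "second"])
except AttributeError as err:
    print("second pass failed:", err)
```

In the original script the equivalent change would be something like "with raw.imread(file) as image:", with image.postprocess(...) used inside the block.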

'idf vector is not fitted' error when using a saved classifier/model

Pardon me if I use the wrong terminology, but what I want is to train a model on a data set (using GaussianNB Naive Bayes from scikit-learn), save the model/classifier, and then load it whenever I need to predict a category.
from sklearn.externals import joblib
from sklearn.naive_bayes import GaussianNB
from sklearn.feature_extraction.text import TfidfVectorizer
self.vectorizer = TfidfVectorizer(decode_error='ignore')
self.X_train_tfidf = self.vectorizer.fit_transform(train_data)
# Fit the model to my training data
self.clf = self.gnb.fit(self.X_train_tfidf.toarray(), category)
# Save the classifier to file
joblib.dump(self.clf, 'trained/NB_Model.pkl')
# Save the vocabulary to file
joblib.dump(self.vectorizer.vocabulary_, 'trained/vectorizer_vocab.pkl')
#Next time, I read the saved classifier
self.clf = joblib.load('trained/NB_Model.pkl')
# Read the saved vocabulary
self.vocab =joblib.load('trained/vectorizer_vocab.pkl')
# Initialize the vectorizer
self.vectorizer = TfidfVectorizer(vocabulary=self.vocab, decode_error='ignore')
# Try to predict a category for new data
X_new_tfidf = self.vectorizer.transform(new_data)
print self.clf.predict(X_new_tfidf.toarray())
# After running the predict command above, I get the error
'idf vector is not fitted'
Can anyone tell me what I'm missing?
Note: The saving of the model, the reading of the saved model and trying to predict a new category are all different methods of a class. I have collapsed all of them into a single screen here to make for easier reading.
Thanks
You need to pickle the self.vectorizer and load it again. Currently you are only saving the vocabulary learnt by the vectorizer.
Change the following line in your program:
joblib.dump(self.vectorizer.vocabulary_, 'trained/vectorizer_vocab.pkl')
to:
joblib.dump(self.vectorizer, 'trained/vectorizer.pkl')
And the following line:
self.vocab =joblib.load('trained/vectorizer_vocab.pkl')
to:
self.vectorizer =joblib.load('trained/vectorizer.pkl')
Delete this line:
self.vectorizer = TfidfVectorizer(vocabulary=self.vocab, decode_error='ignore')
Problem explanation:
You are right that the learnt vocabulary can be saved and reused. But the scikit-learn TfidfVectorizer also has the idf_ attribute, which contains the IDF weights for that vocabulary, so you need to save that as well. Even if you save both and load them into a new TfidfVectorizer instance, you will still get the "not fitted" error, because that is just the way most scikit-learn transformers and estimators are defined. So, without doing anything "hacky", saving the whole vectorizer is your best bet. If you still want to go down the path of saving only the vocabulary, take a look at the following page to see how to do that properly:
http://thiagomarzagao.com/2015/12/08/saving-TfidfVectorizer-without-pickles/
The above page saves vocabulary into json and idf_ into a simple array. You can use pickles there, but you will get the idea about the working of TfidfVectorizer.
Hope it helps.
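To illustrate why the vocabulary alone is not enough, here is a toy stand-in written with the standard library only. TinyTfidf is a hypothetical class and not scikit-learn's implementation. It only mimics the relevant behaviour: fitting learns both a vocabulary and an idf vector, and transform refuses to run without the latter.

```python
import math
import pickle


class TinyTfidf(object):
    """Toy stand-in for a TF-IDF vectorizer (not scikit-learn's code)."""

    def __init__(self, vocabulary=None):
        self.vocabulary_ = vocabulary  # word -> column index
        self.idf_ = None               # only set by fit()

    def fit(self, docs):
        words = sorted({w for d in docs for w in d.split()})
        self.vocabulary_ = {w: i for i, w in enumerate(words)}
        n = len(docs)
        # One idf weight per word, based on how many documents contain it.
        self.idf_ = [
            math.log((1.0 + n) / (1.0 + sum(w in d.split() for d in docs))) + 1.0
            for w in words
        ]
        return self

    def transform(self, doc):
        if self.idf_ is None:
            raise ValueError("idf vector is not fitted")
        row = [0.0] * len(self.vocabulary_)
        for w in doc.split():
            if w in self.vocabulary_:
                col = self.vocabulary_[w]
                row[col] += self.idf_[col]
        return row


fitted = TinyTfidf().fit(["red apple", "green apple"])

# Vocabulary only: the rebuilt object has no idf_ and cannot transform.
rebuilt = TinyTfidf(vocabulary=fitted.vocabulary_)
try:
    rebuilt.transform("red apple")
except ValueError as err:
    print("vocabulary only:", err)

# Whole object: the fitted state travels with the pickle.
restored = pickle.loads(pickle.dumps(fitted))
print(restored.transform("red apple") == fitted.transform("red apple"))
```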

How to save lists in python

I have a list of 4000 elements in Python, each of which is an object of the following class with several values.
class Point:
    def __init__(self):
        self.coords = []
        self.IP=[]
        self.BW=20
        self.status='M'
    def __repr__(self):
        return str(self.coords)
I do not know how to save this list for future use.
I have tried saving it by opening a file and using the write() function, but this is not what I want.
I want to save it and import it in the next program, the way MATLAB lets you save a variable and load it again later.
pickle is a good choice:
import pickle
with open("output.bin", "wb") as output:
    pickle.dump(yourList, output)
and, symmetrically, to load it again:
import pickle
with open("output.bin", "rb") as data:
    yourList = pickle.load(data)
It is a good choice because it is included with the standard library, it can serialize almost any Python object without effort and has a good implementation, although the output is not human readable. Please note that you should use pickle only for your personal scripts, since it will happily load anything it receives, including malicious code: I would not recommend it for production or released projects.
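As a rough end-to-end sketch with the Point class from the question: the three sample points and the temporary file location are made up for illustration.

```python
import os
import pickle
import tempfile


class Point(object):
    def __init__(self):
        self.coords = []
        self.IP = []
        self.BW = 20
        self.status = 'M'

    def __repr__(self):
        return str(self.coords)


# Build a small sample list (the real one has 4000 elements).
points = []
for i in range(3):
    p = Point()
    p.coords = [i, i * 2]
    points.append(p)

# Hypothetical location; any writable path works.
path = os.path.join(tempfile.mkdtemp(), "points.bin")

with open(path, "wb") as output:
    pickle.dump(points, output)

with open(path, "rb") as data:
    loaded = pickle.load(data)

print(loaded)  # prints [[0, 0], [1, 2], [2, 4]]
```

Note that pickle stores instances by reference to their class, so the program that loads the file must be able to import the same Point definition.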
This might be an option:
f = open('foo', 'wb')
np.save(f, my_list)
f.close()
for loading then use
# open in binary mode; arrays of Python objects also need allow_pickle=True
data = np.load(open('foo', 'rb'))
if 'b' is not present in 'wb' then the program gives an error:
TypeError: write() argument must be str, not bytes
"b" for binary makes the difference.
Since you say Matlab, numpy should be an option.
f = open('foo', 'wb')
np.save(f, my_list)
# later
data = np.load(open('foo', 'rb'))
Of course, it'll return an array, not a list, but you can coerce it if you really want a list...

Import a list from a different file

Is there a way I can import a list from a different Python file? For example if I have a list:
list1 = ['horses', 'sheep', 'cows', 'chickens', 'dog']
Can I import this list into other files? I know to import other functions you do
from FileName import DefName
This is a user defined list and I don't want to have the user input the same list a million times.
Just a few maybes as to how this could be done:
from FileName import ListName or put all the lists into a function and then import the definition name
Thanks for the help
One option is to dump that list into a temp file, and read it from your other python script.
Another option (if one python script calls the other), is to pass the list as an argument (e.g. using sys.argv[1] and *args, etc).
I'll just export the lists in a file. Therefore every piece of code can read it.
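A minimal sketch of the file-based option, using JSON from the standard library. The file name and temporary location are assumptions, and both halves are shown in one script only for brevity.

```python
import json
import os
import tempfile

list1 = ['horses', 'sheep', 'cows', 'chickens', 'dog']

# Hypothetical location that both scripts agree on.
path = os.path.join(tempfile.mkdtemp(), "list1.json")

# In the script that owns the list: write it out once.
with open(path, "w") as handle:
    json.dump(list1, handle)

# In any other script: read it back.
with open(path) as handle:
    shared = json.load(handle)

print(shared == list1)  # prints True
```

If the list is defined at module level in FileName.py and that file is importable, the "from FileName import list1" form suggested in the question also works, since an import can bring in any module-level name and not only functions.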