Jupyter string tokenization for python - python-2.7

I'm trying to implement simple_tokenize using the dictionary produced by my previous code, but I get an error message. Any assistance with the following code would be much appreciated. I'm using Python 2.7 in Jupyter.
import csv
reader = csv.reader(open('data.csv'))
dictionary = {}
for row in reader:
    key = row[0]
    dictionary[key] = row[1:]
print dictionary
The above works well, but the issue is with the following:
import re
words = dictionary
split_regex = r'\W+'
def simple_tokenize(string):
    for i in rows:
        word = words.split
    #pass
    print word
I get this error:
NameError Traceback (most recent call last)
<ipython-input-2-0d0e05fb1556> in <module>()
1 import re
2
----> 3 words = dictionary
4 split_regex = r'\W+'
5
NameError: name 'dictionary' is not defined

Variables are not saved between Jupyter sessions unless you explicitly save them yourself. So if you ran the first code section, quit Jupyter, started a new session and ran only the second code block, dictionary no longer exists in the new session, which is exactly what the NameError reports.
If you ran the code blocks differently (e.g., not across Jupyter sessions), you should say so, but the tags and traceback suggest this is what happened.
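The simplest fix is to re-run the cell that builds dictionary before running the cell that uses it. If rebuilding is expensive, one option is to save the object to disk and load it in the new session. The following is a minimal sketch using the standard pickle module; the file name dictionary.pkl and the sample contents are assumptions for illustration, not taken from the question.

```python
import os
import pickle
import tempfile

# Stand-in for the dictionary built from data.csv (contents are made up).
dictionary = {"id1": ["first text", "second text"]}

# Hypothetical location for the saved object.
path = os.path.join(tempfile.gettempdir(), "dictionary.pkl")

# Session 1: write the dictionary to disk.
with open(path, "wb") as fh:
    pickle.dump(dictionary, fh)

# Session 2 (after a restart): load it back before using it.
with open(path, "rb") as fh:
    restored = pickle.load(fh)
```

After loading, restored can be assigned to words exactly as dictionary was in the second code block.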

Related

How to get a row from a file that starts with multiple prefixes

I'm trying to get the rows that start with 's1' or 's2'. Below is the code.
import os
f = open("test.csv","r")
lines = f.readlines()
f.close()
f = open("test.csv","w")
for line in lines:
    if (line.startsWith("s1") || line.startsWith("s2"))
        f.write(line)
f.close()
When I run this script I get a syntax error near '||'.
I looked into this link and made changes accordingly -
How to check if a string starts with one of several prefixes?
Please let me know if I'm doing anything wrong.
FYI, I'm using Python 2.7.
Thanks!
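For reference, Python has no || operator (the keyword is or), the string method is spelled startswith in lowercase, and an if line must end with a colon. str.startswith also accepts a tuple of prefixes, which is the approach the linked question describes. Here is a small sketch under those facts; the helper name keep_prefixed and the sample lines are hypothetical.

```python
def keep_prefixed(lines, prefixes=("s1", "s2")):
    """Return only the lines that start with one of the given prefixes."""
    # str.startswith accepts a tuple and returns True if any prefix matches.
    return [line for line in lines if line.startswith(prefixes)]


# Made-up sample rows standing in for the contents of test.csv.
sample = ["s1,alpha\n", "x9,beta\n", "s2,gamma\n"]
kept = keep_prefixed(sample)
```

The filtered list could then be written back with f.writelines(kept) in place of the loop in the question.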

How to load retrained_graph.pb and retrained_label.txt using pycharm editor

Following Pete Warden's tutorials, I retrained the Inception network, and the training produced two files:
1. retrained_graph.pb
2. retrained_label.txt
I want to use these to classify a flower image.
I installed PyCharm and linked all the TensorFlow libraries. I also tested the sample TensorFlow code and it works fine.
Now when I run the label_image.py program, which is
import tensorflow as tf, sys
image_path = sys.argv[1]
# Read in the image_data
image_data = tf.gfile.FastGFile(image_path, 'rb').read()
# Loads label file, strips off carriage return
label_lines = [line.rstrip() for line
               in tf.gfile.GFile("/tf_files/retrained_labels.txt")]
# Unpersists graph from file
with tf.gfile.FastGFile("/tf_files/retrained_graph.pb", 'rb') as f:
    graph_def = tf.GraphDef()
    graph_def.ParseFromString(f.read())
    _ = tf.import_graph_def(graph_def, name='')
with tf.Session() as sess:
    # Feed the image_data as input to the graph and get first prediction
    softmax_tensor = sess.graph.get_tensor_by_name('final_result:0')
    predictions = sess.run(softmax_tensor, \
                           {'DecodeJpeg/contents:0': image_data})
    # Sort to show labels of first prediction in order of confidence
    top_k = predictions[0].argsort()[-len(predictions[0]):][::-1]
    for node_id in top_k:
        human_string = label_lines[node_id]
        score = predictions[0][node_id]
        print('%s (score = %.5f)' % (human_string, score))
I am getting this error message:
/home/chandan/Tensorflow/bin/python /home/chandan/PycharmProjects/tf/tf_folder/tf_files/label_image.py
Traceback (most recent call last):
File "/home/chandan/PycharmProjects/tf/tf_folder/tf_files/label_image.py", line 7, in <module>
image_path = sys.argv[1]
IndexError: list index out of range
Could anyone please help me with this issue?
You are getting this error because the script expects the image name (with its path) as an argument.
In PyCharm go to View -> Tool Windows -> Terminal.
This is the same as opening a separate terminal. Then run
python label_image.py /image_path/image_name.jpg
You are reading a command-line argument with sys.argv[1], so you need to supply one when you run the script. It looks like the required argument is a test image, so pass its location as a parameter.
PyCharm has a script parameters and interpreter options dialog that you can use to enter the required parameters.
Or you can call the script from a command line and pass the parameter like this:
>python my_python_script.py my_python_parameter.jpg
EDIT:
According to the documentation (I don't have PyCharm installed on this computer), you should go to the Run/Debug Configurations menu and edit the configuration for your script. Add the absolute path of your file, in quotes, to the Script Parameters box.
Alternatively, if you want to skip the parameter entirely, read the path with raw_input (input in Python 3), or simply hard-code it: image_path = r"absolute_image_path.jpg"
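Whichever way the argument is supplied, the script can check for it before indexing sys.argv, so a missing argument produces a usage hint instead of an IndexError. This is only a sketch; the helper name get_image_path is hypothetical and not part of label_image.py.

```python
import sys


def get_image_path(argv):
    """Return the first command-line argument, or None if it is missing."""
    # argv[0] is the script name, so the image path is expected at argv[1].
    if len(argv) < 2:
        return None
    return argv[1]


# Illustrative calls with hand-built argument lists instead of sys.argv.
missing = get_image_path(["label_image.py"])
present = get_image_path(["label_image.py", "/image_path/image_name.jpg"])
```

In the real script this would be called as get_image_path(sys.argv), followed by printing a usage message and exiting when the result is None.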

Handling map function in python2 & python3

Recently I came across a question and was confused about a possible solution.
The code is
# code part in result reader
result = map(int, input())
# consumer call
result_consumer(result)
This is not about how they work. The problem is that in Python 2 the exception is raised when the result is fetched, so the result reader can handle it, but in Python 3 a map object is returned, so only the consumer is able to handle the exception.
Is there any solution that keeps the map function and handles the exception in both Python 2 and Python 3?
python3
>>> d = map(int, input())
1,2,3,a
>>> d
<map object at 0x7f70b11ee518>
>>>
python2
>>> d = map(int, input())
1,2,3,'a'
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
ValueError: invalid literal for int() with base 10: 'a'
>>>
The behavior of map is not the only difference between Python 2 and Python 3; input is also different. You need to keep the basic differences between the two in mind to make code compatible with both.
Python 3 name = Python 2 equivalent
map = itertools.imap
zip = itertools.izip
filter = itertools.ifilter
range = xrange
input = raw_input
So to write code for both, you can use alternatives such as list comprehensions, which work the same in both. For things that have no easy alternative, you can write new functions and/or use conditional renames, for example
my_input = input
try:
    raw_input
except NameError: #we are in python 3
    my_input = lambda msj=None: eval(input(msj))
(or with your favorite way to check which version of python is in execution)
# code part in result reader
result = [ int(x) for x in my_input() ]
# consumer call
result_consumer(result)
That way your code does the same thing regardless of which version of Python you run it with.
But as jsbueno mentioned, eval and Python 2's input are dangerous, so use the more secure raw_input or Python 3's input instead:
try:
    input = raw_input
except NameError: #we are in python 3
    pass
(or with your favorite way to check which version of python is in execution)
Then, if you plan to provide your input as 1,2,3, add an appropriate split:
# code part in result reader
result = [ int(x) for x in input().split(",") ]
# consumer call
result_consumer(result)
If you always need the exception to occur at the same place you can always force the map object to yield its results by wrapping it in a list call:
result = list(map(int, input()))
If an error occurs in Python 2 it will be during the call to map while, in Python 3, the error is going to surface during the list call.
The slight downside is that in the case of Python 2 you'll create a new list. To avoid this you could alternatively branch based on sys.version and use the list only in Python 3 but that might be too tedious for you.
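To illustrate the list(map(...)) approach, here is a minimal sketch in which the reader catches the ValueError itself on either Python version. The function name read_ints and the comma-separated input format are assumptions for the example.

```python
def read_ints(text):
    """Parse comma-separated integers, handling bad input in the reader."""
    try:
        # list() forces evaluation here, so in Python 3 the ValueError is
        # raised inside this try block instead of later in the consumer.
        return list(map(int, text.split(",")))
    except ValueError as exc:
        print("invalid input: %s" % exc)
        return None


good = read_ints("1,2,3")
bad = read_ints("1,2,a")
```

Returning None on failure is just one possible policy; the reader could equally re-raise or ask for the input again.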
I usually use my own version of map in these situations to avoid any problem that may occur:
def my_map(func,some_list):
    done = []
    for item in some_list:
        done.append( func(item) )
    return done
and my own version of input too:
def getinput(text):
    import sys
    ver = sys.version[0]
    if ver=="3":
        return input(text)
    else:
        return raw_input(text)
If you are working on a big project, put them in a Python file and import them whenever you need them, as I do.

Python 2.7 and Textblob - TypeError: The `text` argument passed to `__init__(text)` must be a string, not <type 'list'>

Update: Issue resolved. (see comment section below.) Ultimately, the following two lines were required to transform my .csv to unicode and utilize TextBlob: row = [cell.decode('utf-8') for cell in row], and text = ' '.join(row).
Original question:
I am trying to use a Python library called TextBlob to analyze text from a .csv file. The error I receive when I call TextBlob in my code is:
Traceback (most recent call last):
  File "C:\Users\Marcus\Documents\Blog\Python\Scripts\Brooks\textblob_sentiment.py", line 30, in <module>
    blob = TextBlob(row)
  File "C:\Python27\lib\site-packages\textblob\blob.py", line 344, in __init__
    'must be a string, not {0}'.format(type(text)))
TypeError: The `text` argument passed to `__init__(text)` must be a string, not <type 'list'>
My code is:
#from __future__ import division, unicode_literals #(This was recommended for Python 2.x, but didn't help in my case.)
#-*- coding: utf-8 -*-
import csv
from textblob import TextBlob
with open(u'items.csv', 'rb') as scrape_file:
    reader = csv.reader(scrape_file, delimiter=',', quotechar='"')
    for row in reader:
        row = [unicode(cell, 'utf-8') for cell in row]
        print row
        blob = TextBlob(row)
        print type(blob)
I have been working through UTF/unicode issues. I'd originally had a different subject which I posed to this thread. (Since my code and the error have changed, I'm posting to a new thread.) Print statements indicate that the variable "row" is of type=str, which I thought indicated that the reader object had been transformed as required by Textblob. The source .csv file is saved as UTF-8. Can anyone provide feedback as to how I can get unblocked on this, and the flaws in my code?
Thanks so much for the help.
Maybe you can change it as below:
row = str([cell.encode('utf-8') for cell in row])
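Note that calling str() on a list returns the list's printed form, brackets and quotes included, so the text handed to TextBlob would contain that punctuation. The update at the top of the question reports that decoding each cell and joining the cells with a space is what resolved the issue. A minimal sketch of that step follows; the helper name row_to_text and the sample row are hypothetical, and the TextBlob call itself is left out so the snippet runs without the library.

```python
def row_to_text(row, encoding="utf-8"):
    """Turn one csv row (a list of cells) into a single unicode string."""
    # Decode byte strings (what Python 2's csv module yields); leave cells
    # that are already unicode text untouched.
    cells = [cell.decode(encoding) if isinstance(cell, bytes) else cell
             for cell in row]
    return u" ".join(cells)


# Made-up row standing in for one line of items.csv.
text = row_to_text([b"great", b"product", u"overall"])
```

The resulting string is what would be passed on, as in blob = TextBlob(text).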

Element not found in cache - Selenium (Python)

I just wrote a simple web scraping script to give me all the episode links on a particular site's page. The script was working fine, but now it's broken. I didn't change anything.
Try this URL (for scraping): http://www.crunchyroll.com/tabi-machi-late-show
Now the script works midway and then gives me an error stating, 'Element not found in the cache - perhaps the page has changed since it was looked up'
I looked it up on the internet and people suggested using the 'implicit wait' command at certain places. I did that, still no luck.
UPDATE: I tried this script on a remote desktop and it works there without any problems.
Here's my script:
from selenium import webdriver
from selenium.webdriver.common.keys import Keys
import os
import time
from subprocess import Popen
#------------------------------------------------
try:
    Link = raw_input("Please enter your Link : ")
    if not Link:
        raise ValueError('Please Enter A Link To The Anime Page. This Application Will now Exit in 5 Seconds.')
except ValueError as e:
    print(e)
    time.sleep(5)
    exit()
print 'Analyzing the Page. Hold on a minute.'
driver = webdriver.Firefox()
driver.get(Link)
assert "Crunchyroll" in driver.title
driver.implicitly_wait(5) # <-- I tried removing these lines as well. No luck.
elem = driver.find_elements_by_xpath("//*[@href]")
driver.implicitly_wait(10) # <-- I tried removing these lines as well. No luck.
text_file = open("BatchLink.txt", "w")
print 'Fetching The Links, please wait.'
for elem in elem:
    x = elem.get_attribute("href")
    #print x
    text_file.write(x+'\n')
print 'Links have been fetched. Just doing the final cleaning now.'
text_file.close()
CleanFile = open("queue.txt", "w")
with open('BatchLink.txt') as f:
    mylist = f.read().splitlines()
    #print mylist
with open('BatchLink.txt', 'r') as inF:
    for line in inF:
        if 'episode' in line:
            CleanFile.write(line)
print 'Please Check the file named queue.txt'
CleanFile.close()
os.remove('BatchLink.txt')
driver.close()
Here's a screenshot of the error (might be of some help) :
http://i.imgur.com/SaANlsg.png
OK, I haven't worked with Python, but I know the problem.
You have a variable that you initialize: elem = driver.find_elements_by_xpath("//*[@href]")
After that you do some things with it in a loop.
Before the loop finishes, try initializing this variable again:
elem = driver.find_elements_by_xpath("//*[@href]")
The thing is that the DOM changes and you lose the element collection.
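One way to apply this idea is to look the elements up again and re-read their attributes whenever a stale reference is reported. The sketch below is deliberately library-agnostic so it runs without a browser: collect_attribute, the fake element class and the fake exception are all hypothetical. With Selenium, the exception passed in would presumably be selenium.common.exceptions.StaleElementReferenceException and the lookup would be the find_elements_by_xpath call from the question.

```python
def collect_attribute(find_elements, attribute, stale_error, attempts=3):
    """Read `attribute` from every element, redoing the lookup if it goes stale."""
    for attempt in range(attempts):
        try:
            # Look the elements up again on every attempt so references are fresh.
            return [el.get_attribute(attribute) for el in find_elements()]
        except stale_error:
            if attempt == attempts - 1:
                raise  # give up after the last attempt


# Stand-ins that simulate a page whose DOM changes once after loading.
class FakeStale(Exception):
    pass


lookups = {"count": 0}


class FakeElement(object):
    def __init__(self, href):
        self.href = href

    def get_attribute(self, name):
        if lookups["count"] < 2:
            raise FakeStale()  # first lookup returns stale references
        return self.href


def fake_find():
    lookups["count"] += 1
    return [FakeElement("/episode-1"), FakeElement("/about")]


links = collect_attribute(fake_find, "href", FakeStale)
```

Collecting all href values into a plain list first also means the later file-writing loop no longer touches live page elements.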