def func():
    import csv
    file=open("cmiday.csv")
    x,y=[],[]
    reader=csv.DictReader(file)
    for row in reader:
        if(type(row["max_rel_hum"])%1==0):
            continue
        if(type(row["precip"])%1==0):
            continue
        if(row["max_rel_hum"]>100):
            continue
        if(row["max_rel_hum"]<0):
            continue
        if (row["precip"]>10):
            continue
        if(row["precip"]<0):
            continue
        x.append(row["max_rel_hum"])
        y.append(row["precip"])
    print x
    print y
I'm trying to collect data from a csv file into lists x and y. I don't want any values for row["max_rel_hum"] to be integers or be more than 100 or less than 0. Similarly, I don't want any values for row["precip"] to be more than 10 or less than 0. I'm getting this error when I try to run the function:
>>> func()
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "hw.py", line 7, in func
if(row["max_rel_hum"]%1==0)
ValueError: incomplete format
Please help out. Thanks
Values from a CSV are strings, not integers. You're expecting % to do modulo, but on a string it does string formatting.
You need something like this:
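if ( float(row["max_rel_hum"]) % 1 == 0):
And you need to do the conversion in all the lines, even the < and > ones. Python 2 allows those comparisons when one side is a string, but they don't compare the numeric values, so you won't get the results you expect. Use float() rather than int(): int() rejects a string such as "45.5", and int(...) % 1 is always 0, so every row would be skipped.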
You don't need type() in the if line at all.
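A minimal sketch of the corrected loop, assuming Python 3 and that both columns hold decimal numbers. collect() is a hypothetical helper split out so the filtering can be tried without the real cmiday.csv, and skipping rows with empty or non-numeric cells is an added assumption, not something the question asked for.

```python
import csv


def collect(rows):
    """Return (x, y) lists of floats built from csv.DictReader rows.

    A row is skipped when max_rel_hum is a whole number or outside [0, 100],
    or when precip is a whole number or outside [0, 10].
    """
    x, y = [], []
    for row in rows:
        try:
            hum = float(row["max_rel_hum"])
            precip = float(row["precip"])
        except (TypeError, ValueError):
            continue  # assumption: skip missing, empty or non-numeric cells
        if hum % 1 == 0 or precip % 1 == 0:
            continue  # whole numbers are excluded, as in the question
        if not 0 <= hum <= 100 or not 0 <= precip <= 10:
            continue  # out-of-range values are excluded
        x.append(hum)
        y.append(precip)
    return x, y


def func(path="cmiday.csv"):
    # The with block closes the file once all rows have been read
    with open(path, newline="") as f:
        return collect(csv.DictReader(f))
```

Because collect() accepts any iterable of dicts, it can also be fed a csv.DictReader over an in-memory string when experimenting.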
I am working on Spark using the Python API. Below is my code. When I execute the line wordCount.first(), I receive ValueError: need more than 1 value to unpack. Any light you can shed on this error would be appreciated. Thanks...
#create an RDD with textFile method
text_data_file=sc.textFile('/resources/yelp_labelled.txt')
#import the required library for word count operation
from operator import add
#Use filter to return RDD for words length greater than zero
wordCountFilter=text_data_file.filter(lambda x:len(x)>0)
#use flat map to split each line into words
wordFlatMap=wordCountFilter.flatMap(lambda x: x.split())
#map each key with value 1 using map function
wordMapper=wordFlatMap.flatMap(lambda x:(x,5))
#Use reducebykey function to reduce above mapped keys
#returns the key-value pairs by adding values for similar keys
wordCount=wordMapper.reduceByKey(add)
#view the first element
wordCount.first()
File "/home/notebook/spark-1.6.0-bin-hadoop2.6/python/lib/pyspark.zip/pyspark/shuffle.py", line 236, in mergeValues
    for k, v in iterator:
ValueError: need more than 1 value to unpack
Your mistake is here:
wordMapper=wordFlatMap.flatMap(lambda x:(x,5))
it should be
wordMapper=wordFlatMap.map(lambda x:(x,5))
otherwise you just emit x and 5 as separate values. Spark will try to unpack x into a key and a value and fail if its length is not equal to 2; otherwise it will try to unpack 5 and fail as well.
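Spark isn't needed to see the difference. The sketch below uses plain Python stand-ins (flat_map, map_each and reduce_by_key are hypothetical helpers, not PySpark APIs) to mimic how flatMap flattens each returned tuple while map keeps it as one (key, value) pair, and how a reduceByKey-style loop then unpacks every element. It uses 1 as the value, as the comment in the question's code describes.

```python
from itertools import chain
from operator import add


def flat_map(f, items):
    # Like RDD.flatMap: each result is an iterable whose elements are flattened out
    return list(chain.from_iterable(f(item) for item in items))


def map_each(f, items):
    # Like RDD.map: exactly one output element per input element
    return [f(item) for item in items]


def reduce_by_key(f, pairs):
    # Like reduceByKey: every element must unpack into (key, value)
    totals = {}
    for k, v in pairs:
        totals[k] = f(totals[k], v) if k in totals else v
    return totals


words = ["good", "bad", "good"]
print(flat_map(lambda w: (w, 1), words))                      # ['good', 1, 'bad', 1, 'good', 1]
print(reduce_by_key(add, map_each(lambda w: (w, 1), words)))  # {'good': 2, 'bad': 1}

try:
    reduce_by_key(add, flat_map(lambda w: (w, 1), words))
except (ValueError, TypeError) as exc:
    print(type(exc).__name__, exc)  # unpacking the 4-letter 'good' into k, v fails
```

A two-letter word such as "ok" would unpack without any error into ('o', 'k'), which is why the flatMap version can fail in confusing, data-dependent ways.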
I have some code that compares a value from a CSV file to a threshold that I set within the .py file.
My CSV file contains values like the ones below, but with 1030 lines:
-46.62
-47.42
-47.36
-47.27
-47.36
-47.24
-47.24
-47.03
-47.12
Note: there are no blank lines between the values, but there is a single space before each one.
My first attempt was with this code:
file_in5 = open('710_edited_capture.csv', 'r')
line5=file_in5.readlines()
a=line5[102]
b=line5[307]
c=line5[512]
d=line5[717]
e=line5[922]
print[a]
print[b]
print[c]
print[d]
print[e]
which gave the output of:
[' -44.94\n']
[' -45.06\n']
[' -45.09\n']
[' -45.63\n']
[' -45.92\n']
My first thought was to use .strip() to remove the space and the \n but this is not supported in lists and returns the error:
Traceback (most recent call last):
File "/root/test.py", line 101, in <module>
line5=line5.strip()
AttributeError: 'list' object has no attribute 'strip'
My next code below:
for line5 in file_in5:
    line5=line5.strip()
    line5=file_in5.readlines()
a=line5[102]
b=line5[307]
c=line5[512]
d=line5[717]
e=line5[922]
print[a]
print[b]
print[c]
print[d]
print[e]
Returns another error:
Traceback (most recent call last):
File "/root/test.py", line 91, in <module>
line5=file_in5.readlines()
ValueError: Mixing iteration and read methods would lose data
What is the most efficient way to read in just 5 specific lines without any spaces or \n, and then be able to use them in subsequent calculations such as:
if a>threshold and a>b and a>c and a>d and a>e:
    print ('a is highest and within limit')
    CF=a
You can use strip(), but you need to call it on each line (a string), not on the list that readlines() returns. Alternatively, if you have more than one comma-separated value in a row, you can use code like this:
with open('710_edited_capture.csv', 'r') as file:
    file_content=file.readlines()
for line in file_content:
    vals = line.strip().split(',')
    print(vals)
You can also append "vals" to an empty list. As a result, you will get a list that contains a list of values for each line.
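Appending vals to an empty list might look like this. split_lines is a hypothetical name; it takes any iterable of lines (an open file works too), so it can be tried on an in-memory list.

```python
def split_lines(lines):
    """Return one list of comma-separated fields per line, whitespace stripped."""
    rows = []
    for line in lines:
        # strip() removes the leading space and trailing newline before splitting
        rows.append(line.strip().split(','))
    return rows


print(split_lines([" -46.62\n", " -47.42\n"]))  # [['-46.62'], ['-47.42']]
```

The fields are still strings, so they need float() before any numeric comparison with a threshold.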
It's a little unclear what you want to do, but if you just want to read a file, compare each value to a threshold and keep the values above it, here is an example:
threshold=46.2
outlist=[]
with open('data.csv', 'r') as data:
    for i in data:
        if float(i)>threshold:
            outlist.append(i)
Then you can adapt it to your needs.
Thanks for all the comments and suggestions, but they are not quite what I needed.
I have, however, applied a workaround, although an admittedly clunky one.
I have created 5 additional files from the original, each containing only one value. From these I can now strip the space and \n and save each value locally as a variable, so I no longer need readlines().
These variables can be compared to each other and the threshold to determine the optimum choice.
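For comparison, the five values can also be taken straight from the original file without creating extra files. This sketch assumes the positions 102, 307, 512, 717 and 922 are 0-based indices into readlines(), as in the question's code; read_values and pick_highest are hypothetical helpers. float() ignores the leading space and the trailing newline, so no separate strip() is needed.

```python
def read_values(path, indices):
    """Return the numbers on the given 0-based line indices as floats."""
    with open(path) as f:
        lines = f.readlines()
    # float(" -44.94\n") == -44.94: surrounding whitespace is ignored
    return [float(lines[i]) for i in indices]


def pick_highest(values, threshold):
    """Return values[0] if it exceeds threshold and every other value, else None."""
    first = values[0]
    if first > threshold and all(first > v for v in values[1:]):
        return first
    return None
```

With the question's names this would be a, b, c, d, e = read_values('710_edited_capture.csv', [102, 307, 512, 717, 922]), after which CF = pick_highest([a, b, c, d, e], threshold) mirrors the if statement in the question.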
I have just started my first research project, and I began programming only about 2 weeks ago. Excuse me if my questions are naive. I might be using Python very inefficiently. I am eager to improve here.
I have experimental data that I want to analyse. My goal is to create a python script that takes the data as input, and that for output gives me graphs, where certain parameters contained in text files (within the experimental data folders) are plotted and fitted to certain equations. This script should be as generalizable as possible so that I can use it for other experiments.
I'm using the Anaconda distribution with Python 2.7, which means I have access to various libraries/modules related to science and mathematics.
I am stuck trying to use for and while loops (for the first time).
The data files are structured like this (I am using regex brackets here):
.../data/B_foo[1-7]/[1-6]/D_foo/E_foo/text.txt
What I want to do is cycle through all 7 top directories and each of their 6 subdirectories (named 1, 2, 3...6). Within each of these 6 subdirectories, a text file can be found (always with the same filename, text.txt), which contains the data I want to access.
The 'text.txt' files are structured something like this:
1 91.146 4.571 0.064 1.393 939.134 14.765
2 88.171 5.760 0.454 0.029 25227.999 137.883
3 88.231 4.919 0.232 0.026 34994.013 247.058
4 ... ... ... ... ... ...
The table continues down. Every other row is empty. I want to extract information from 13 rows starting from the 8th line, and I'm only interested in the 2nd, 3rd and 5th columns. I want to put them into lists 'parameter_a' and 'parameter_b' and 'parameter_c', respectively. I want to do this from each of these 'text.txt' files (of which there is a total of 7*6 = 42), and append them to three large lists (each with a total of 7*6*13 = 546 items when everything is done).
This is my attempt:
First, I made a list, 'list_B_foo', containing the seven different 'B_foo' directories (this part of the script is not shown). Then I made this:
parameter_a = []
parameter_b = []
parameter_c = []
j = 7 # The script starts reading 'text.txt' after the j:th line.
k = 35 # The script stops reading 'text.txt' after the k:th line.
x = 0
while x < 7:
    for i in range(1, 7):
        path = str(list_B_foo[x]) + '/%s/D_foo/E_foo/text.txt' % i
        m = open(path, 'r')
        line = m.readlines()
        while j < k:
            line = line[j]
            info = line.split()
            print 'info:', info
            parameter_a.append(float(info[1]))
            parameter_b.append(float(info[2]))
            parameter_c.append(float(info[5]))
            j = j + 2
    x = x + 1
parameter_a_vect = np.array(parameter_a)
parameter_b_vect = np.array(parameter_b)
parameter_c_vect = np.array(parameter_c)
print 'a_vect:', parameter_a_vect
print 'b_vect:', parameter_b_vect
print 'c_vect:', parameter_c_vect
I have tried to fiddle around with indentation without getting it to work (receiving either syntax error or indentation errors). Currently, I get this output:
info: ['1', '90.647', '4.349', '0.252', '0.033', '93067.188', '196.142']
info: ['.']
Traceback (most recent call last):
File "script.py", line 104, in <module>
parameter_a.append(float(info[1]))
IndexError: list index out of range
I don't understand why I get the "list index out of range" message. If anyone knows why this is the case, I would be happy to hear you out.
How do I solve this problem? Is my approach completely wrong?
EDIT: I went for a pure while-loop solution, taking RebelWithoutAPulse and CamJohnson26's suggestions into account. This is how I solved it:
parameter_a=[]
parameter_b=[]
parameter_c=[]
k=35 # The script stops reading 'text.txt' after the k:th line.
x=0
while x < 7:
    y=1
    while y < 7:
        j=7
        path = str(list_B_foo[x]) + '/%s/pdata/999/dcon2dpeaks.txt' % (y)
        m = open(path, 'r')
        lines = m.readlines()
        m.close()  # close each file once its lines have been read
        while j < k:
            line = lines[j]
            info = line.split()
            parameter_a.append(float(info[1]))
            parameter_b.append(float(info[2]))
            parameter_c.append(float(info[5]))
            j = j+2
        y = y+1
    x = x+1
Meta: I am not sure whether I should accept the answer from the person who answered the quickest and helped me finish my task, or the one I learned the most from. I am sure this is a common issue that I can find an answer to by reading the rules or going to Meta Stack Exchange. Until I've read up on the recommendations, I will hold off on marking the question as answered by either of you.
Welcome to Stack Overflow!
The error is due to a name collision that you have inadvertently created. Note the output before the exception occurs:
info: ['1', '90.647', '4.349', '0.252', '0.033', '93067.188', '196.142']
info: ['.']
Traceback (most recent call last):
...
info[1] cannot be evaluated: the list contains only '.', so there is no element at index 1 (in Python, list positions start at 0).
This happens in your nested while j < k loop, where you redefine the very variable line that you created previously:
line = m.readlines()
while j < k:
    line = line[j]
    info = line.split()
    ...
So on the first pass through the loop, you have read the lines of the file into the list line; you then take one line from that list, assign it to line again, and continue with the loop. At this point line contains a string.
On the next pass, indexing line with j returns the single character at position j of that string, and the code malfunctions.
You could fix this with different naming.
P.S. I would suggest using the with ... as ... syntax when working with files (it is briefly described here). This is called a context manager, and it takes care of opening and closing the files for you.
P.P.S. I would also suggest reading the naming conventions
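Putting both suggestions together, here is a sketch of the collection loop with a with block and separate names for the list (lines) and each row (line). read_parameters is a hypothetical name; it assumes b_dirs holds the seven B_foo directory paths (like list_B_foo) and the D_foo/E_foo/text.txt layout from the question. The slice lines[7:35:2] visits the same indices as the while j < k loop (7, 9, ..., 33). It keeps info[1], info[2] and info[5] from the posted code; if the 5th column is really what's wanted, the last index would be 4.

```python
import os


def read_parameters(b_dirs, first=7, stop=35, step=2):
    """Collect three columns from every text.txt under b_dirs/1..6/D_foo/E_foo."""
    parameter_a, parameter_b, parameter_c = [], [], []
    for b_dir in b_dirs:
        for sub in range(1, 7):
            path = os.path.join(b_dir, str(sub), "D_foo", "E_foo", "text.txt")
            with open(path) as f:  # the file is closed when the block ends
                lines = f.readlines()
            for line in lines[first:stop:step]:  # every other line, like j += 2
                info = line.split()
                parameter_a.append(float(info[1]))
                parameter_b.append(float(info[2]))
                parameter_c.append(float(info[5]))
    return parameter_a, parameter_b, parameter_c
```

The returned lists can then be passed to np.array() exactly as in the question.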
Looks like you are overwriting the line list with a single line of the file. You call line = m.readlines(), which sets line equal to a list of lines. You then set line = line[j], so now the line variable is no longer a list; it's a string equal to
1 91.146 4.571 0.064 1.393 939.134 14.765
This pass works fine, but on the next pass the code treats line as a sequence of characters, takes the character at position j (which is just a period) and assigns it to line. That explains why the info variable only has one element on the second pass through the loop.
To solve this, just use 2 line variables instead of one. Call one lines and the other line.
lines = m.readlines()
while j < k:
    line = lines[j]
    info = line.split()
There may be other errors too, but that should get you started.
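The collision both answers describe can be reproduced in a few lines. The data row is taken from the question, and index 4 is chosen only because it lands on the decimal point; in the real script the index is j.

```python
line = ["1 91.146 4.571 0.064 1.393 939.134 14.765\n"]  # like m.readlines(): a list of str
line = line[0]          # like line = line[j]: the list is replaced by one str
info = line[4].split()  # next pass: indexing the str yields one character, '.'
print(info)             # ['.']

try:
    info[1]
except IndexError as exc:
    print("IndexError:", exc)  # IndexError: list index out of range
```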
I've looked at the other questions on the site about IndexError, but I still don't understand how to fix my own code. I'm a beginner when it comes to Python. Based on the user's input, I want to check whether that input appears in the fourth position of each line in the list of lists.
Here's the code:
#create a list of lists from the missionPlan.txt
from __future__ import with_statement
listoflists = []
with open("missionPlan.txt", "r") as f:
    results = [elem for elem in f.read().split('\n') if elem]
    for result in results:
        listoflists.append(result.split())
#print(listoflists)
#print(listoflists[2][3])
choice = int(input('Which command would you like to alter: '))
i = 0
for rows in listoflists:
    while i < len(listoflists):
        if listoflists[i][3]==choice:
            print (listoflists[i][0])
        i += 1
This is the problem I keep running into:
The code never gets inside the if statement.
So, I think this is what you're trying to do: find any line in your "missionPlan.txt" where the 4th word (after splitting on whitespace) matches the number that was input, and print the first word of such lines.
If that is indeed accurate, then perhaps something along this line would be a better approach.
choice = int(input('Which command would you like to alter: '))
allrecords = []
with open("missionPlan.txt", "r") as f:
    for line in f:
        words = line.split()
        allrecords.append(words)
        try:
            if len(words) > 3 and int(words[3]) == choice:
                print(words[0])
        except ValueError:
            pass
Also, if, as your tags suggest, you are using Python 3.x, the from __future__ import with_statement line isn't necessary: the with statement has been a standard part of the language since Python 2.6.
EDIT: added a couple of lines based on the comments below. In addition to examining every line as it's read and printing the first field of every line whose fourth field matches the input, the code now gathers each line, split into a list of words, into the allrecords list, which corresponds to listoflists in the original question. This enables further processing of the file later in the code. Also fixed one glaring mistake: you need to split line into words, not f.
Also, to answer your "I cant seem to get inside that if statement" observation: that's because you're comparing a string (listoflists[i][3]) with an integer (choice). The code above addresses both that type mismatch and the check that a line actually has enough words for the comparison to be meaningful.
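The same logic as a small function, so it can be tried on a list of strings as well as an open file. find_commands is a hypothetical name, and the sample lines used to exercise it are made up, since the contents of missionPlan.txt aren't shown.

```python
def find_commands(lines, choice):
    """Return the first word of every line whose 4th word equals choice (an int)."""
    matches = []
    for line in lines:
        words = line.split()
        if len(words) <= 3:
            continue  # too few words to have a 4th field
        try:
            if int(words[3]) == choice:  # compare int with int, not str with int
                matches.append(words[0])
        except ValueError:
            continue  # the 4th word is not a number
    return matches


print("3" == 3)       # False: a str never compares equal to an int
print(int("3") == 3)  # True
```

With the real file this would be used as: with open("missionPlan.txt") as f: print(find_commands(f, choice)).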
The error code I get in another file that uses it is:
Traceback (most recent call last):
File "C:\Anaconda\lib\site-packages\pyahoolib-0.2-py2.7.egg\yahoo\session.py", line 107, in listener
t.send_pk(consts.SERVICE_AUTHRESP, auth.hash(t.login_id, t.passwd, p[94]))
File "C:\Anaconda\lib\site-packages\pyahoolib-0.2-py2.7.egg\yahoo\auth.py", line 73, in hash
hs = md5.new(mkeystr+"".join(map(chr,[x,x>>8,y]))).digest()
ValueError: chr() arg not in range(256)
UPDATE: @merlin2011: This is confusing me. The code is hs = md5.new(mkeystr+"".join(map(chr,[x,x>>8,y]))).digest()
where chr has a comma after it. I thought it was a function. From docs.python.org: chr(i)
Return a string of one character whose ASCII code is the integer i. For example, chr(97) returns the string 'a'. This is the inverse of ord(). The argument must be in the range [0..255], inclusive; ValueError will be raised if i is outside that range. See also unichr().
If so, is [x,x>>8,y] some kind of iterable for map() that I just don't recognize yet?
Also, I don't want to change any of this code because it is part of the pyahoolib-0.2 auth.py file. But to get it all working I do not know what to do.
It's the Binary Right Shift Operator:
From Python Wiki:
x >> y:
Returns x with the bits shifted to the right by y places. This is the same as integer-dividing (//) x by 2**y.
In case you were wondering, the error message means that chr only accepts arguments in range(256), that is, 0 to 255 inclusive, and your map function is causing it to be called with a value outside that range.
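To see what map(chr, [x, x>>8, y]) does: map calls chr once for each item in the list (chr itself is passed as the first argument, hence the comma), and >> shifts the bits right. The sketch assumes Python 3, where chr() accepts values up to 0x10FFFF, so the hypothetical helper py2_chr_join emulates Python 2's range(256) check; the values of x and y are made up. Masking with & 0xFF is shown only to illustrate values that fit in a byte, not as a fix for pyahoolib.

```python
def py2_chr_join(values):
    """Join chr() of each value, raising ValueError (as Python 2 does) outside range(256)."""
    for v in values:
        if not 0 <= v < 256:
            raise ValueError("chr() arg not in range(256)")
    return "".join(map(chr, values))


x, y = 0x1234, 0x56          # made-up inputs
print(x >> 8, x // 2 ** 8)   # 18 18: shifting right by 8 is integer division by 256
print([x, x >> 8, y])        # [4660, 18, 86]

try:
    py2_chr_join([x, x >> 8, y])  # 4660 does not fit in a single byte
except ValueError as exc:
    print(exc)                    # chr() arg not in range(256)

print(repr(py2_chr_join([x & 0xFF, (x >> 8) & 0xFF, y])))  # '4\x12V'
```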