I'm trying to read every 10th line in a text file, adding certain parts of the line to lists, and ignore the rest. Each line in my .txt file is a comma-delimited list of 29 elements, and I'm using line.strip(',') to separate each line into a list, called line_list.
for line in current_file:
if line.startswith('#'):
pass
for linenum, line in enumerate(current_file):
if linenum % 10 == 0:
line_list = line.split(",")
time_list.append(line_list[0])
values.append(line_list[4])
else:
pass
However, python doesn't seem to be recognizing all the elements in the list. When I print len(line_list) it returns the value 29, but doesn't recognize any index larger than 0. So the line time_list.append(line_list[0]) works but I get an index out of range error for values.append(line_list[4])
Printing line_list[0] works for all lines and when I print line_list I get the entire list of 29 comma separated values.
Any ideas?
EDIT
Thanks to a commenter I removed the double iteration, so the updated code is
for linenum, line in enumerate(current_file):
if line.startswith('#'):
pass
if linenum % 10 == 0:
line_list = line.split(",")
time_list.append(line_list[0])
values.append(line_list[4])
else:
pass
I'm still getting an index out of range error though, for line_list[4]
Use one for loop — I think you want something like:
for linenum, line in enumerate(current_file):
if line.startswith('#'):
continue # Skip it.
if linenum % 10 == 0:
line_list = line.split(",")
print(line_list)
time_list.append(line_list[0])
values.append(line_list[4])
You are using line twice a a variable to loop over. This is not a good coding practice and may cause the strange behavior. For example change the second line variable to just l:
for line in current_file:
if line.startswith('#'):
pass
for linenum, l in enumerate(current_file):
if linenum % 10 == 0:
line_list = l.split(",")
print(line_list)
time_list.append(line_list[0])
values.append(line_list[4])
else:
pass
Related
I tried a very simple python script, which be used to add certain strings in each row, the code is:
import csv
List = []
list = []
csv_reader = csv.reader(open('moz_press_IDE_mdoc.csv'))
i = 0
for row in csv_reader:
List.append(list)
j = 1
for num in row:
tmp = str(i) + ':'
num = tmp + num
j += 1
List[i].append(num)
i += 1
out = open('newCSV.csv', 'w')
csv_writer = csv.writer(out)
csv_writer.writerow(List)
out.close()
It shows the error message:
Traceback (most recent call last):
File "preprocess.py", line 19, in <module>
csv_writer.writerow(List)
MemoryError
Can someone help me with that?
This line
List.append(list)
doesn't do what you think it does. It appends a fresh reference to the existing variable.
Instead, write
List.append([])
because (I think) you want a new empty list, not a new reference to the one you manipulated in the previous loop iteration. Print out List before passing it to csv.writer() to see the difference.
And please don't use list as a variable name, because it masks the builtin type. That makes your code hard to read and may lead to mystifying bugs.
So I am rather new to programming and just recently started with Classes and we are supposed to make a phonebook that can be loaded in seperate text files.
I however keep running into the problem in this section that when I get into the for-loop. It hits a brick wall on
if storage[2] == permaStorage[i].number:
And tells me "IndexError: list index out of range". I am almost certain it is due to permaStorage starts out empty, but even when I attempt to fill it with temporary instances of Phonebook it tells me it out of range. The main reason it is there is to check if a phone number already exists within the permaStorage.
Anyone got a good tip on how to solve this or work around it?
(Sorry if the text is badly written. Just joined this site and not sure on the style)
class Phonebook():
def __init__(self):
self.name = ''
self.number = ''
def Add(name1, number1):
y = Phonebook()
y.name = name1
y.number = number1
return y
def Main():
permaStorage = []
while True:
print " add name number\n lookup name\n alias name newname\n change name number\n save filename\n load filename\n quit\n"
choices = raw_input ("What would you like to do?: ")
storage = choices.split(" ")
if storage[0] == "add":
for i in range(0, len(permaStorage)+1):
if storage[2] == permaStorage[i].number:
print "This number already exists. No two people can have the same phonenumber!\n"
break
if i == len(permaStorage):
print "hej"
try:
tempbox = Add(storage[1], storage[2])
permaStorage.append(tempbox)
except:
raw_input ("Remember to write name and phonenumber! Press any key to continue \n")
I think problem is that permaStorage is empty list and then u try to:
for i in range(0, len(permaStorage)+1):
if storage[2] == permaStorage[i].number:
will cause an error because permaStorage has 0 items but u trying to get first (i=0, permaStorage[0]) item.
I think you should replace second if clause with first one:
for i in range(0, len(permaStorage)+1):
if i == len(permaStorage):
print "hej"
try:
tempbox = Add(storage[1], storage[2])
permaStorage.append(tempbox)
if storage[2] == permaStorage[i].number:
print "This number already exists. No two people can have the same phonenumber!\n"
break
So in this case if perStorage is blank you will append some value and next if clause will be ok.
Indexing starts at zero in python. Hence, a list of length 5 has the last element index as 4 starting from 0. Change range to range(0, len(permastorage))
You should iterate upto the last element of the list, not beyond.
Try -
for i in range(0, len(permaStorage)):
The list of numbers produced in range() is from the start, but not including the end, so range(3) == [0, 1, 2].
So if your list x has length 10, range(0, len(x)) will give you 0 through 9, which is the correct indices of the elements of your list.
Adding 1 to len(x) will produce the range 0 through 10, and when you try to access x[10], it will fail.
I have two files: one is a single column (call it pred) and has no headers, the other has two columns: ID and IsClick (it has headers). My goal is to use the column ID as an index to pred.
import pandas as pd
import numpy as np
def LinesInFile(path):
with open(path) as f:
for linecount, line in enumerate(f):
pass
f.close()
print 'Found ' + str(linecount) + ' lines'
return linecount
path ='/Users/mas/Documents/workspace/Avito/input/' # path to testing file
submission = path + 'submission1234.csv'
lines = LinesInFile(submission)
lines = LinesInFile(path + 'sampleSubmission.csv')
sample = pd.read_csv(path + 'sampleSubmission.csv')
preds = np.array(pd.read_csv(submission, header = None))
index = sample.ID.values - 1
print index
print len(index)
sample['IsClick'] = preds[index]
sample.to_csv('submission.csv', index=False)
The output is:
Found 7816360 lines
Found 7816361 lines
[ 0 4 5 ..., 15961507 15961508 15961511]
7816361
Traceback (most recent call last):
File "/Users/mas/Documents/workspace/Avito/July3b.py", line 23, in <module>
sample['IsClick'] = preds[index]
IndexError: index 7816362 is out of bounds for axis 0 with size 7816361
there seems something wrong because my file has 7816361 lines counting the header while my list has an extra element (len of list 7816361)
I don't have your csv files to recreate the problem, but the problem looks like it is being caused by your use of index.
index = sample.ID.values - 1 is taking each of your sample ID's and subtracting 1. These are not index values in pred as it is only 7816360 long. Each of the last 3 items in your index array (based on your print output) would go out of bounds as they are >7816360. I suspect the error is showing you the first of your ID-1 that go out of bounds.
Assuming you just want to join the files based on their line number you could do the following:
sample=pd.concat((pd.read_csv(path + 'sampleSubmission.csv'),pd.read_csv(submission, header = None).rename(columns={0:'IsClick'})),axis=1)
Otherwise you'll need to perform a join or merge on your two dataframes.
Hello I have written a python program to parse data specific data from txt file
my code is:
f = open('C:/Users/aikaterini/Desktop/Ericsson_PARSER/BSC_alarms_vf_OSS.txt','r')
from datetime import datetime
import MySQLdb
def firstl():
with f as lines:
lines = lines.readlines()
print len(lines)
for i,line in enumerate(lines):
if line.startswith("*** Disconnected from"):
conline = line.split()
bsc = conline[-2]
print "\n"*5
print bsc
print "*"*70
break
for i,line in enumerate(lines):
if line.startswith("*** Connected to"):
conline = line.split()
bsc = conline[-2]
print "\n"*5
print bsc
print "*"*70
elif line[:3] == "A1/" or line[:3] == "A2/":
if lines[i+1].startswith("RADIO"):
fal = line.split()
first_alarm_line = [fal[0][:2],fal[-2],fal[-1]]
year = first_alarm_line[1][:2]
month = first_alarm_line[1][2:4]
day = first_alarm_line[1][4:]
hours = first_alarm_line[2][:2]
minutes = first_alarm_line[2][2:]
date = datetime.strptime( day + " " + month + " " + year + " " + \
hours+":"+minutes,"%d %m %y %H:%M")
print first_alarm_line
print date, "\n"
print lines[i+1]
print lines[i+4]
print lines[i+5]
desc_line = lines[i+4]
desc_values_line = lines[i+5]
desc = desc_line.split(None,2)
print desc
desc_values = desc_values_line.split(None,2)
rsite = ""
#for x in desc_values[1]:
# if not (x.isalpha() or x == "0"):
# rsite += x
rsite = desc_values[1].lstrip('RBS0')
print "\t"*2 + "rsite:" + rsite
if desc[-1] == "ALARM SLOGAN\n":
alarm_slogan = desc_values[-1]
print alarm_slogan
x = i
print x # to check the line
print len(line) #check length of lines
while not lines[x].startswith("EXTERNAL"):
x+=1
if lines[x].startswith("EXTERNAL"):
while not lines[x] == "\n":
print lines[x]
x+=1
print "\n"*5
elif lines[i+1].startswith("CELL LOGICAL"):
fal = line.split()
first_alarm_line = [fal[0][:2],fal[-2],fal[-1]]
#print i
print first_alarm_line
type = lines[i+1]
print type
cell_line = lines[i+3]
cell = cell_line.split()[0]
print cell
print "\n"*5
##########Database query###########
#db = MySQLdb.connect(host,user,password,database)
firstl()
when i run the program the results are correct
but it prints until line 50672 while there are 51027
and i get the last printed result with the following error:
['A2', '130919', '0309']
2013-09-19 03:09:00
RADIO X-CEIVER ADMINISTRATION
MO RSITE ALARM SLOGAN
RXOCF-18 RBS03668 OML FAULT
['MO', 'RSITE', 'ALARM SLOGAN\n']
rsite:3668
OML FAULT
50672
51027
Traceback (most recent call last):
File "C:\Python27\parser_v3.py", line 106, in <module>
firstl()
File "C:\Python27\parser_v3.py", line 72, in firstl
while not lines[x].startswith("EXTERNAL"):
IndexError: list index out of range
if i comment the while not line i get :
Traceback (most recent call last):
File "C:\Python27\parser_v3.py", line 106, in <module>
firstl()
File "C:\Python27\parser_v3.py", line 60, in firstl
rsite = desc_values[1].lstrip('RBS0')
IndexError: list index out of range
The txt content is like :
A1/EXT "FS G11B/25/13/3" 382 150308 1431
RADIO X-CEIVER ADMINISTRATION
BTS EXTERNAL FAULT
MO RSITE CLASS
RXOCF-16 RBS02190 1
EXTERNAL ALARM
ALARM SYSTEM ON/OFF G2190 DRAMA CNR
A1/EXT "FS G11B/25/13/3" 755 150312 1434
RADIO X-CEIVER ADMINISTRATION
BTS EXTERNAL FAULT
MO RSITE CLASS
RXOCF-113 RBS00674 1
EXTERNAL ALARM
IS.BOAR FAIL G0674 FALAKRO
I don't understand since i do a split with maxnumber 2 and i get 3 elements as u can see and i am picking the 2nd and if i comment that i get another error when i pick an element from a list and the thing is that returning the correct result.Please help me.
Sorry for the long post thank you in advance.
I'm haven't dug deep into your code, but have you tried validating that x does not exceed the number of elements in lines before trying to access that index? Also, for readability I'd suggest using lines[x] != rather than not lines[x] ==
while x < len(lines) and lines[x] != "\n":
I solved it although i don't know if it is correct way but it works.
I think the problem was that the x was exceeding the length of the list lines containing the file and there had to be a check after the split that the list had length larger or equal to number of elements so :
if len(desc_values) > 2 and len(desc) > 2:
rsite = desc_values[1].lstrip('RBS0')
print "\t"*2 + "rsite:" + rsite
if desc[-1] == "ALARM SLOGAN\n":
alarm_slogan = desc_values[-1]
print alarm_slogan
x = i
print x #to check the line
print len(lines) # check length of lines
while [x] < len(lines): #check so that x doesnt exceed the length of file list "line"
while not lines[x].startswith("EXTERNAL"):
x+=1
if lines[x].startswith("EXTERNAL"):
while lines[x] != "\n":
print lines[x]
x+=1
Thank you man you really helped me although i am trying to find a way to stop the iteration of x to gain some computation time i tried break but it throws you completely of the loop.
Thanks anyway
For example I have a list with a month and the month number, I want to add in another value in [0][1], [1][1],[2][1] every time I input a new value. I'm thinking of the append function however I'm unsure how to re-code that to make that value to input into the the next list along in the list.
list = [[['Jan'],1], [['Feb'],2],[['Mar'],3]]
def month():
global list
count = 0
while value_count < 3:
value = int(input("Enter a value: "))
if value in range(101):
list[0][1].append(value)
count += 1
else:
print('Try Again.')
month()
I want to result with something like:
list = [[['Jan'],1,54], [['Feb'],2,65],[['Mar'],3,62]]
Where 54, 65 and 62 are random numbers a user has entered.
Please see the comments in the code
# list is a function used by python to create lists,
# do not use builtins' names for your variables
my_list = [[['Jan'],1], [['Feb'],2],[['Mar'],3]]
def month():
# print("Don't use globals!\n"*100)
# you can pass arguments to a function, like in "def mont(my_list)"
global my_list
# cycling on a numerical index is complicated, use this syntax
# instead, meaning "use, in turn, the same name to refer to each
# element of the list"
for element in my_list:
value = int(input("Enter a value: "))
# if value in range(101):
if 1: # always true, the test doesn't work as you expect
element.append(value)
else:
print('Try Again.')
month()
print my_list
you can write the input loop like this
value = 101
while not (0<=value<101):
value = int(input(...)
element.append(value)
this way you'll miss the alternative prompt but it's good enough.