Parsing txt file using python : List index out of range

Parsing txt file using python : List index out of range - python-2.7

Hello I have written a python program to parse data specific data from txt file
my code is:
f = open('C:/Users/aikaterini/Desktop/Ericsson_PARSER/BSC_alarms_vf_OSS.txt','r')
from datetime import datetime
import MySQLdb
def firstl():
with f as lines:
lines = lines.readlines()
print len(lines)
for i,line in enumerate(lines):
if line.startswith("*** Disconnected from"):
conline = line.split()
bsc = conline[-2]
print "\n"*5
print bsc
print "*"*70
break
for i,line in enumerate(lines):
if line.startswith("*** Connected to"):
conline = line.split()
bsc = conline[-2]
print "\n"*5
print bsc
print "*"*70
elif line[:3] == "A1/" or line[:3] == "A2/":
if lines[i+1].startswith("RADIO"):
fal = line.split()
first_alarm_line = [fal[0][:2],fal[-2],fal[-1]]
year = first_alarm_line[1][:2]
month = first_alarm_line[1][2:4]
day = first_alarm_line[1][4:]
hours = first_alarm_line[2][:2]
minutes = first_alarm_line[2][2:]
date = datetime.strptime( day + " " + month + " " + year + " " + \
hours+":"+minutes,"%d %m %y %H:%M")
print first_alarm_line
print date, "\n"
print lines[i+1]
print lines[i+4]
print lines[i+5]
desc_line = lines[i+4]
desc_values_line = lines[i+5]
desc = desc_line.split(None,2)
print desc
desc_values = desc_values_line.split(None,2)
rsite = ""
#for x in desc_values[1]:
# if not (x.isalpha() or x == "0"):
# rsite += x
rsite = desc_values[1].lstrip('RBS0')
print "\t"*2 + "rsite:" + rsite
if desc[-1] == "ALARM SLOGAN\n":
alarm_slogan = desc_values[-1]
print alarm_slogan
x = i
print x # to check the line
print len(line) #check length of lines
while not lines[x].startswith("EXTERNAL"):
x+=1
if lines[x].startswith("EXTERNAL"):
while not lines[x] == "\n":
print lines[x]
x+=1
print "\n"*5
elif lines[i+1].startswith("CELL LOGICAL"):
fal = line.split()
first_alarm_line = [fal[0][:2],fal[-2],fal[-1]]
#print i
print first_alarm_line
type = lines[i+1]
print type
cell_line = lines[i+3]
cell = cell_line.split()[0]
print cell
print "\n"*5
##########Database query###########
#db = MySQLdb.connect(host,user,password,database)
firstl()
when i run the program the results are correct
but it prints until line 50672 while there are 51027
and i get the last printed result with the following error:
['A2', '130919', '0309']
2013-09-19 03:09:00
RADIO X-CEIVER ADMINISTRATION
MO RSITE ALARM SLOGAN
RXOCF-18 RBS03668 OML FAULT
['MO', 'RSITE', 'ALARM SLOGAN\n']
rsite:3668
OML FAULT
50672
51027
Traceback (most recent call last):
File "C:\Python27\parser_v3.py", line 106, in <module>
firstl()
File "C:\Python27\parser_v3.py", line 72, in firstl
while not lines[x].startswith("EXTERNAL"):
IndexError: list index out of range
if i comment the while not line i get :
Traceback (most recent call last):
File "C:\Python27\parser_v3.py", line 106, in <module>
firstl()
File "C:\Python27\parser_v3.py", line 60, in firstl
rsite = desc_values[1].lstrip('RBS0')
IndexError: list index out of range
The txt content is like :
A1/EXT "FS G11B/25/13/3" 382 150308 1431
RADIO X-CEIVER ADMINISTRATION
BTS EXTERNAL FAULT
MO RSITE CLASS
RXOCF-16 RBS02190 1
EXTERNAL ALARM
ALARM SYSTEM ON/OFF G2190 DRAMA CNR
A1/EXT "FS G11B/25/13/3" 755 150312 1434
RADIO X-CEIVER ADMINISTRATION
BTS EXTERNAL FAULT
MO RSITE CLASS
RXOCF-113 RBS00674 1
EXTERNAL ALARM
IS.BOAR FAIL G0674 FALAKRO
I don't understand since i do a split with maxnumber 2 and i get 3 elements as u can see and i am picking the 2nd and if i comment that i get another error when i pick an element from a list and the thing is that returning the correct result.Please help me.
Sorry for the long post thank you in advance.

I'm haven't dug deep into your code, but have you tried validating that x does not exceed the number of elements in lines before trying to access that index? Also, for readability I'd suggest using lines[x] != rather than not lines[x] ==
while x < len(lines) and lines[x] != "\n":

I solved it although i don't know if it is correct way but it works.
I think the problem was that the x was exceeding the length of the list lines containing the file and there had to be a check after the split that the list had length larger or equal to number of elements so :
if len(desc_values) > 2 and len(desc) > 2:
rsite = desc_values[1].lstrip('RBS0')
print "\t"*2 + "rsite:" + rsite
if desc[-1] == "ALARM SLOGAN\n":
alarm_slogan = desc_values[-1]
print alarm_slogan
x = i
print x #to check the line
print len(lines) # check length of lines
while [x] < len(lines): #check so that x doesnt exceed the length of file list "line"
while not lines[x].startswith("EXTERNAL"):
x+=1
if lines[x].startswith("EXTERNAL"):
while lines[x] != "\n":
print lines[x]
x+=1
Thank you man you really helped me although i am trying to find a way to stop the iteration of x to gain some computation time i tried break but it throws you completely of the loop.
Thanks anyway

Related

Q: Python3 - If Statements for changing list lengths

I am attempting to analyze data sets as lists of differing lengths. I am calling lines (rows) of my data set one by one to be analyzed by my function. I want the function to still be run properly regardless of the length of the list.
My Code:
f = open('DataSet.txt')
for line in iter(f):
remove_blanks = ['']
entries = line.split()
''.join([i for i in entries if i not in remove_blanks])
trash = (entries[0], entries[1])
time = int(entries[2])
column = [int(v) for v in entries[3:]]
def myFun():
print(entries)
print_string = ''
if column[0] == 100:
if column[1] >= 250 and column[2] == 300:
if len(column) >= 9:
digit = [chr(x) for x in column[4:9]]
print_string = ('code: ' + ''.join(str(digit[l]) for l in range(5)) + ' ')
if len(column) >= 13:
optional_digit = [chr(d) for d in column[9:13]]
for m in range(0, 4):
print_string += 'Optional Field: ' + optional_digit[m] + ''
else:
print_string += 'No Optional Field '
pass
pass
print(print_string)
print('')
myFun()
f.close()
What is happening is if the length of a line of my data is not long enough (i.e. the list ends at column[6]), I get the error:
line 17, in function
print('Code: ' + digit[l])
IndexError: list index out of range
I want it to still print Code: #number #number #number #number and leave any non-existent columns as blanks when it is printed so that one line may print as Code: ABC9 and the next print as Code: AB if there are differing list lengths.
Please help! :)

Well, just make sure you're not looping over a list longer than available:
print_string = 'code: ' + ''.join(str(digit[l]) for l in range(min(5, len(digit)))) + ' '
or better:
print_string = "code {} ".format("".join(str(dig) for dig in digit[:5]))
Although I have a feeling you're over-complicating this.

Using Interval tree to find overlapping regions

I have two files
File 1
chr1:4847593-4847993
TGCCGGAGGGGTTTCGATGGAACTCGTAGCA
File 2
Pbsn|X|75083240|75098962|
TTTACTACTTAGTAACACAGTAAGCTAAACAACCAGTGCCATGGTAGGCTTGAGTCAGCT
CTTTCAGGTTCATGTCCATCAAAGATCTACATCTCTCCCCTGGTAGCTTAAGAGAAGCCA
TGGTGGTTGGTATTTCCTACTGCCAGACAGCTGGTTGTTAAGTGAATATTTTGAAGTCC
File 1 has approximately 8000 more lines with different header and sequence below it.
I would first like to match the start and end co ordinates from file1 to file 2 or see if its close to each other let say by +- 100 if yes then match the sequence in file 2 and then print out the header info for file 2 and the matched sequence.
My approach use interval tree(in python i am still trying to get a hang of it), store the co ordinates ?
I tried using re.match but its not giving me accurate results.
Any tips would be highly appreciated.
Thanks.
My first try,
How ever now i have hit another road block so for my second second file if my start and end is 5000 and 8000 respectively I want to change this by subtracting 2000 so my new start and stop is 3000 and 5000 here is my code
from intervaltree import IntervalTree
from collections import defaultdict
binding_factor = some.txt
genome = dict()
with open('file2', 'r') as rows:
for row in rows:
#print row
if row.startswith('>'):
row = row.strip().split('|')
chrom_name = row[5]
start = int[row[3]
end = int(row[3])
# one interval tree per chromosome
if chrom_name not in genome:
genome[chrom_name] = IntervalTree()
# first time we've encountered this chromosome, createtree
# index the feature
genome[chrom_name].addi(start,end,row[2])
#for key,value in genome.iteritems():
#print key, ":", value
mast = defaultdict(list)
with open(file1', 'r') as f:
for row in f:
row = row.strip().split()
row[0] = row[0].replace('chr', '') if row[0].startswith('chr') else row[0]
row[0] = 'MT' if row[0] == 'M' else row[0]
#print row[0]
mast[row[0]].append({
'start':int(row[1]),
'end':int(row[2])
})
#for k,v in mast.iteritems():
#print k, ":", v
with open(binding_factor, 'w') as f :
for k,v in mast.iteritems():
for i in v:
g = genome[k].search(i['start'],i['end'])
if g:
print g
l = gene
f.write(str(l)`enter code here` + '\n')

Why did the if statement in the loop not work?

I was scraping Yahoo finance and somehow the if condition did not work. What the if was supposed to do was to print the stock price if the stock price was greater than 50, but it printed all the stocks which were above 50 and below 50.
Here is the code:
import urllib2
from bs4 import BeautifulSoup as bs4
list = ["aapl","goog","yhoo"]
i = 0
while i < len(list):
url = urllib2.urlopen("http://finance.yahoo.com/q?s="+ list[i] +"&q1=1")
soup = bs4(url,"html.parser")
for price in soup.find(attrs={'id':"yfs_l84_" + list[i]}):
if price > 50:
print price
i += 1
else:
print "failed"
1 += 1
Why did it print the stock "yahoo", cause "yahoo" is less than 50, no?

We can rewrite code in following:
1 += 1 will not work because LHS should be variable name. This is i += 1 . this might be typing mistake. :)
No need of i variable, we can iterate list by for loop. This will remove our i += 1 statements from the code.
Do not use in-built variable names as our variable names. e.g. list is list type variable used to create new list. If we use such variable name then this will create problem in code.
e.g.
>>> list
<type 'list'>
>>> a = list()
>>> a
[]
>>> list = [1,2,3,4]
>>> a = list()
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
TypeError: 'list' object is not callable
>>>
Use exception handling during Type Casting means when we convert one data type to other data type.
Demo:
import urllib2
from bs4 import BeautifulSoup as bs4
item_list = ["aapl","goog","yhoo"]
target_url = "http://finance.yahoo.com/q?s=%s&q1=1"
target_id = "yfs_l84_%s"
for i in item_list:
url = urllib2.urlopen(target_url%i)
soup = bs4(url, "html.parser")
for price in soup.find(attrs={'id':target_id%i}):
try:
price = float(price)
except:
print "Exception during type conversion. Value is %s. Type is %s."%(price, type(price))
continue
if price > 50:
print "Price:", price
else:
print "failed"

Seems to me you've mixed types:
if price > 50:
TypeError: unorderable types: NavigableString() > int()
Use this:
if float(price) > 50:
print price

price is string, you have to convert it to number:
if int(price) > 50:
print price
i += 1
else:
print "failed"
i += 1
or (with fractional part):
if float(price) > 50:
print price
i += 1
else:
print "failed"
i += 1

out of bounds error when using a list as an index

I have two files: one is a single column (call it pred) and has no headers, the other has two columns: ID and IsClick (it has headers). My goal is to use the column ID as an index to pred.
import pandas as pd
import numpy as np
def LinesInFile(path):
with open(path) as f:
for linecount, line in enumerate(f):
pass
f.close()
print 'Found ' + str(linecount) + ' lines'
return linecount
path ='/Users/mas/Documents/workspace/Avito/input/' # path to testing file
submission = path + 'submission1234.csv'
lines = LinesInFile(submission)
lines = LinesInFile(path + 'sampleSubmission.csv')
sample = pd.read_csv(path + 'sampleSubmission.csv')
preds = np.array(pd.read_csv(submission, header = None))
index = sample.ID.values - 1
print index
print len(index)
sample['IsClick'] = preds[index]
sample.to_csv('submission.csv', index=False)
The output is:
Found 7816360 lines
Found 7816361 lines
[ 0 4 5 ..., 15961507 15961508 15961511]
7816361
Traceback (most recent call last):
File "/Users/mas/Documents/workspace/Avito/July3b.py", line 23, in <module>
sample['IsClick'] = preds[index]
IndexError: index 7816362 is out of bounds for axis 0 with size 7816361
there seems something wrong because my file has 7816361 lines counting the header while my list has an extra element (len of list 7816361)

I don't have your csv files to recreate the problem, but the problem looks like it is being caused by your use of index.
index = sample.ID.values - 1 is taking each of your sample ID's and subtracting 1. These are not index values in pred as it is only 7816360 long. Each of the last 3 items in your index array (based on your print output) would go out of bounds as they are >7816360. I suspect the error is showing you the first of your ID-1 that go out of bounds.
Assuming you just want to join the files based on their line number you could do the following:
sample=pd.concat((pd.read_csv(path + 'sampleSubmission.csv'),pd.read_csv(submission, header = None).rename(columns={0:'IsClick'})),axis=1)
Otherwise you'll need to perform a join or merge on your two dataframes.

''str' object does not support item assignment' python 2

So this is my code it is strying tomease car registration plates, start times and end times (In the complete code it would be printed at the bottom).
data = str(list)
sdata = str(list)
edata = str(list)
current = 0
repeats = input ('How many cars do you want to measure?')
def main():
global current
print (current)
print ''
print ''
print '---------------------------------------'
print '---------------------------------------'
print 'Enter the registration number.'
data[current] = raw_input(' ')
print 'Enter the time it passed Camera 1. In this form HH:MM:SS'
sdata[current] = raw_input(' ')
print 'Enter the time it passed Camera 2. In this form HH:MM:SS'
edata[current] = raw_input (' ')
print '---------------------------------------'
print''
print''
print''
print 'The Registration Number is :'
print data[current]
print''
print 'The Start Time Is:'
print sdata[current]
print''
print 'The End Time Is:'
print edata[current]
print''
print''
raw_input('Press enter to confirm.')
print'---------------------------------------'
d = d + 1
s = s + 1
a = a + 1
current = current = 1
while current < repeats:
main()
When I run it and it gets to:
data[current] = raw_input(' ')
I get the error message 'TypeError: 'str' object does not support item assignment'
Thank you in advance for the help. :D

The error is clear. str object does not support item assignment
Strings in python are immutable. You have converted the data to a string when you do
data = str(list)
So, by
data["current"] = raw_input()
you are trying to assign some value to a string, which is not supported in python.
If you want data to be a list,
data = list()
or
data = []
will help, thus preventing the error

Dont use str during assignment
data = str(list)
sdata = str(list)
edata = str(list)
Instead use
data = []
sdata = []
edata = []
and later while printing use str if u want
print str(data[current])
as aswin said its immutable so dont complex it

We Keep Coding

c++ django amazon-web-services regex python-2.7 google-cloud-platform list unit-testing opengl ember.js

Parsing txt file using python : List index out of range - python-2.7

I'm haven't dug deep into your code, but have you tried validating that x does not exceed the number of elements in lines before trying to access that index? Also, for readability I'd suggest using lines[x] != rather than not lines[x] == while x < len(lines) and lines[x] != "\n":

Related

Q: Python3 - If Statements for changing list lengths

Using Interval tree to find overlapping regions

Why did the if statement in the loop not work?

out of bounds error when using a list as an index

''str' object does not support item assignment' python 2

Categories

Resources