IndexError, but more likely I/O error - python-2.7

Unsure of why I am getting this error. I'm reading from a file called columns_unsorted.txt, then trying to write to columns_sorted.txt. The error is on fan_on = int(string_j[1]), saying list index out of range. Here's my code:
#!/usr/bin/python
import fileinput
import collections
# open document to read from
j = open('./columns_unsorted.txt', 'r')
# note this is a file of rows of space-delimited data in the format <1384055277275353 0 0 0 1 0 0 0 0 22:47:57> on each row, the first field being a Unix timestamp, the last a human-readable time, and the middle binary flags indicating which machine event happened
# open document to record results into
l = open('./columns_sorted.txt', 'w')
# CREATE ARRAY CALLED EVENTS
events = collections.deque()
i = 1
# FILL ARRAY WITH "FACTS" ROWS; SPLIT INTO FIELDS, CHANGE TYPES AS APPROPRIATE
for line in j: # columns_unsorted
    line = line.rstrip('\n')
    string_j = line.split(' ')
    time = str(string_j[0])
    fan_on = int(string_j[1])
    fan_off = int(string_j[2])
    heater_on = int(string_j[3])
    heater_off = int(string_j[4])
    space_on = int(string_j[5])
    space_off = int(string_j[6])
    pump_on = int(string_j[7])
    pump_off = int(string_j[8])
    event_time = str(string_j[9])
    row = time, fan_on, fan_off, heater_on, heater_off, space_on, space_off, pump_on, pump_off, event_time
    events.append(row)

You are missing the readlines function, no?
You have to do:
j = open('./columns_unsorted.txt', 'r')
l = j.readlines()
for line in l:
    # what you want to do with each line
In the future, you should print some of your variables, just to be sure the code is working as you want it to, and to help you identify problems.
(For example, if you printed string_j in your code, you would see what kind of problem you have.)

The problem was an inconsistent line in the data file. Forgive my haste in posting.
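In case anyone hits the same thing: a minimal sketch of one way to guard against malformed rows, assuming every valid row has exactly 10 space-delimited fields as described above, is to check the field count before indexing:
# skip (and report) any row that does not have the expected number of fields
EXPECTED_FIELDS = 10  # assumption: timestamp + 8 event flags + human-readable time
with open('./columns_unsorted.txt', 'r') as j:
    for lineno, line in enumerate(j, 1):
        string_j = line.rstrip('\n').split(' ')
        if len(string_j) != EXPECTED_FIELDS:
            print 'Skipping malformed line %d: %r' % (lineno, line)
            continue
        # safe to index string_j[0] through string_j[9] here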

Memory Error when exporting data to csv file

Hello, I was hoping someone could help me with my college coursework; I have an issue with my code. I keep running into a memory error with my data export.
Is there any way I can reduce the memory that is being used, or is there a different approach I can take?
For the coursework I am given a CSV file of 300 records about customer orders, and I have to export the Friday records to a new CSV file. I am also required to print the most popular ordering method and the total money raised from the orders, but I have a plan for that.
This is my first time working with CSV files, so I'm not sure how to do it. When I run the program it tends to crash instantly or stop responding. Once it showed 'MEMORY ERROR', but that is all it showed. I'm using a college-provided computer, so I am not sure of the exact specs, but I know it has 4GB of memory.
# defining the countOccurences helper function
def countOccurences(target,array):
    counter = 0
    for element in array:
        if element == target:
            counter = counter + 1
    print counter
    return counter
# creating user-defined functions for the program
# dataInput function used for collecting data from the provided file
def dataInput():
    import csv
    recordArray = []
    f = open('E:\Portable Python 2.7.6.1\Choral Shield Data File(CSV).csv')
    csv_f = csv.reader(f)
    for row in csv_f:
        customerArray = []
        customerArray.append(row[0])
        ticketID = row[1]
        day, area = datasplit(ticketID)
        customerArray.append(day)
        customerArray.append(area)
        customerArray.append(row[2])
        customerArray.append(row[3])
        recordArray.append(customerArray)
    f.close()
    return recordArray
def datasplit(variable):
    day = variable[0]
    area = variable[1]
    return day, area
def dataProcessing(recordArray):
    methodArray = []
    wed_thursCost = 5
    friCost = 10
    record = 0
    while record < 300:
        method = recordArray[record][4]
        methodArray.append(method)
        record = record + 1
    school = countOccurences('S',methodArray)
    website = countOccurences('W',methodArray)
    if school > website:
        school = True
    elif school < website:
        website = True
    dayArray = []
    record = 0
    while record < 300:
        day = recordArray[record][1]
        dayArray.append(day)
        record = record + 1
    fridays = countOccurences('F',dayArray)
    wednesdays = countOccurences('W',dayArray)
    thursdays = countOccurences('T', dayArray)
    totalFriCost = fridays * friCost
    totalWedCost = wednesdays * wed_thursCost
    totalThurCost = thursdays * wed_thursCost
    totalCost = totalFriCost + totalWedCost + totalThurCost
    return totalCost,school,website
My first attempt at writing to a CSV file:
def dataExport(recordArray):
    import csv
    fridayRecords = []
    record = 0
    customerIDArray = []
    ticketIDArray = []
    numberArray = []
    methodArray = []
    record = 0
    while record < 300:
        if recordArray[record][1] == 'F':
            fridayRecords.append(recordArray[record])
        record = record + 1
    with open('\Courswork output.csv',"wb") as f:
        writer = csv.writer(f)
        for record in fridayRecords:
            writer.writerows(fridayRecords)
        f.close
My second attempt at writing to the CSV file:
def write_file(recordArray): # write selected records to a new csv file
    CustomerID = []
    TicketID = []
    Number = []
    Method = []
    counter = 0
    while counter < 300:
        if recordArray[counter][2] == 'F':
            CustomerID.append(recordArray[counter][0])
            TicketID.append(recordArray[counter][1]+recordArray[counter[2]])
            Number.append(recordArray[counter][3])
            Method.append(recordArray[counter][4])
            fridayRecords = [] # a list to contain the lists before writing to file
            for x in range(len(CustomerID)):
                one_record = CustomerID[x],TicketID[x],Number[x],Method[x]
                fridayRecords.append(one_record)
            # open file for writing
            with open("sample_output.csv", "wb") as f:
                # create the csv writer object
                writer = csv.writer(f)
                # write one row (item) of data at a time
                writer.writerows(recordArray)
                f.close
    counter = counter + 1
#Main Program
recordArray = dataInput()
totalCost,school,website = dataProcessing(recordArray)
write_file(recordArray)
In the function write_file(recordArray) in your second attempt, the counter variable in the while loop is never updated, so the loop continues forever until you run out of memory.
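As a rough illustration only (not the exact coursework solution, and assuming the column layout built by dataInput, where index 1 holds the day code): the export can be done without a hand-maintained counter, which removes the runaway loop entirely.
import csv

def write_file(recordArray):
    # keep only the Friday records (assumption: index 1 is the day code)
    fridayRecords = [record for record in recordArray if record[1] == 'F']
    with open("sample_output.csv", "wb") as f:
        writer = csv.writer(f)
        # writerows takes the whole list at once, so no manual loop counter is needed
        writer.writerows(fridayRecords)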

Creating an if/else that appends data from multiple scraped pages if counts differ?

I'm trying to scrape Oregon teacher licensure information that looks like this or this (this is publicly available data).
This is my code:
import csv
import requests
from lxml import html

# lines2 (the list of ids), lname, and the ltest1/ltest2/ltest3 lists are defined earlier in the script
for t in range(0,2): # refers to txt file with ids
    address = 'http://www.tspc.oregon.gov/lookup_application/LDisplay_Individual.asp?id=' + lines2[t]
    page = requests.get(address)
    tree = html.fromstring(page.text)
    count = 0
    for license_row in tree.xpath(".//tr[td[1] = 'License Type']/following-sibling::tr[1]"):
        license_data = license_row.xpath(".//td/text()")
        count = count + 1
        if count==1:
            ltest1.append(license_data)
        if count==2:
            ltest2.append(license_data)
        if count==3:
            ltest3.append(license_data)
with open('teacher_lic.csv', 'wb') as pensionfile:
    writer = csv.writer(pensionfile, delimiter=",")
    writer.writerow(["Name", "Lic1", "Lic2", "Lic3"])
    pen = zip(lname, ltest1, ltest2, ltest3)
    for penlist in pen:
        writer.writerow(list(penlist))
The problem occurs when, say, Teacher A has 13 licenses and Teacher B has 2. For A my total count = 13 and for B it is 2. When I get to Teacher B and count equals 3, I want to say "if count==3 then ltest3.append(license_data), else if count==3 and license_data=='' then ltest3.append('')", but since count never reaches 3 for B there is no way to tell it to append an empty entry.
I'd want the output to look like this:
Is there a way to do this? I might be approaching this completely wrong so if someone can point me in another direction, that would be helpful as well.
There's probably a more elegant way to do this, but this managed to work pretty well.
I created some blank entries to fill in when Teacher A has 13 licenses and Teacher B has 2. Some errors resulted when license_row.xpath got to count==3 for Teacher B, and I exploited those errors to create the ltest3.append('').
for t in range(0, 2): #Each txt file contains differing amounts
    address = 'http://www.tspc.oregon.gov/lookup_application/LDisplay_Individual.asp?id=' + lines2[t]
    page = requests.get(address)
    tree = html.fromstring(page.text)
    count = 0
    test = tree.xpath(".//tr[td[1] = 'License Type']/following-sibling::tr[1]")
    difference = 15 - len(test)
    for i in range(0, difference):
        test.append('')
    for license_row in test:
        count = count + 1
        try:
            license_data = license_row.xpath(".//td/text()")
        except NameError:
            license_data = ''
            if license_data=='' and count==1:
                ltest1.append('')
            if license_data=='' and count==2:
                ltest2.append('')
            if license_data=='' and count==3:
                ltest3.append('')
        except AttributeError:
            license_data = ''
        if count==1 and True:
            print "True"
        if count==1:
            ltest1.append(license_data)
        if count==2 and True:
            print "True"
        if count==2:
            ltest2.append(license_data)
        if count==3 and True:
            print "True"
        if count==3:
            ltest3.append(license_data)
        del license_data
    for endorse_row in tree.xpath(".//tr[td = 'Endorsements']/following-sibling::tr"):
        endorse_data = endorse_row.xpath(".//td/text()")
        lendorse1.append(endorse_data)
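A tidier variant, if anyone wants one: a minimal sketch that pads each teacher's license list to a fixed number of columns instead of relying on exceptions. It assumes the same lines2, requests, and lxml setup as above, that lname[t] is the name matching id lines2[t], and that three license columns are enough.
import csv
import requests
from lxml import html

NUM_LICENSE_COLS = 3  # assumption: three license columns in the output CSV
rows = []
for t in range(0, 2):
    address = 'http://www.tspc.oregon.gov/lookup_application/LDisplay_Individual.asp?id=' + lines2[t]
    tree = html.fromstring(requests.get(address).text)
    licenses = [' '.join(r.xpath(".//td/text()"))
                for r in tree.xpath(".//tr[td[1] = 'License Type']/following-sibling::tr[1]")]
    # pad (or truncate) so every teacher contributes the same number of columns
    licenses = (licenses + [''] * NUM_LICENSE_COLS)[:NUM_LICENSE_COLS]
    rows.append([lname[t]] + licenses)

with open('teacher_lic.csv', 'wb') as pensionfile:
    writer = csv.writer(pensionfile)
    writer.writerow(["Name", "Lic1", "Lic2", "Lic3"])
    writer.writerows(rows)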

Creating a Dictionary from a while loop (Python)

How can I create a dictionary from my while loop below? infile reads from a file called input.txt, which has the following content:
min:1,23,62,256
max:24,672,5,23
sum:22,14,2,3,89
P90:23,30,45.23
P70:12,23,24,57,32
infile = open("input.txt", "r")
answers = open("output.txt", "w")
while True:
    line = infile.readline()
    if not line: break
    opType = line[0:3]
    numList = line[4:len(line)]
    numList = numList.split(',')
What I'm trying to do is basically build two lists, one that has the operation name (opType) and the other that has the numbers. From there I want to create a dictionary that looks like this:
myDictionary = {
    'min': [1, 23, 62, 256],
    'max': [24, 672, 5, 23],
    'avg': [22, 14, 2, 3, 89],
    'P90': [23, 30, 45.23],
    'P70': [12, 23, 24, 57, 32],
}
The reason for this is that I need to pass the operation type to a self-made function, which will then carry out the operation. I'll figure that part out; I currently just need help making the dictionary from the while loop.
I'm using Python 2.7.
Try the following code.
I believe you would need the 'sum' entry in the dictionary as well. If not, just add a condition to skip it.
myDictionary = {}
with open('input.txt','r') as f:
    for line in f:
        x = line.split(':')[1].rstrip().split(',')
        for i in xrange(len(x)):
            try:
                x[i] = int(x[i])
            except ValueError:
                x[i] = float(x[i])
        myDictionary[line.split(':')[0]] = x
print myDictionary
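For reference, the same idea fitted back into the original while loop would look roughly like this (a sketch that keeps the opType/numList slicing from the question and assumes every key is exactly three characters long, as in the sample input):
infile = open("input.txt", "r")
myDictionary = {}
while True:
    line = infile.readline()
    if not line: break
    opType = line[0:3]
    numList = line[4:].rstrip().split(',')
    # convert each entry to int where possible, otherwise to float
    numList = [int(n) if n.isdigit() else float(n) for n in numList]
    myDictionary[opType] = numList
infile.close()
print myDictionary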

out of bounds error when using a list as an index

I have two files: one is a single column (call it pred) with no header; the other has two columns, ID and IsClick (it has a header). My goal is to use the ID column as an index into pred.
import pandas as pd
import numpy as np

def LinesInFile(path):
    with open(path) as f:
        for linecount, line in enumerate(f):
            pass
    f.close()
    print 'Found ' + str(linecount) + ' lines'
    return linecount

path = '/Users/mas/Documents/workspace/Avito/input/' # path to testing file
submission = path + 'submission1234.csv'
lines = LinesInFile(submission)
lines = LinesInFile(path + 'sampleSubmission.csv')
sample = pd.read_csv(path + 'sampleSubmission.csv')
preds = np.array(pd.read_csv(submission, header = None))
index = sample.ID.values - 1
print index
print len(index)
sample['IsClick'] = preds[index]
sample.to_csv('submission.csv', index=False)
The output is:
Found 7816360 lines
Found 7816361 lines
[ 0 4 5 ..., 15961507 15961508 15961511]
7816361
Traceback (most recent call last):
File "/Users/mas/Documents/workspace/Avito/July3b.py", line 23, in <module>
sample['IsClick'] = preds[index]
IndexError: index 7816362 is out of bounds for axis 0 with size 7816361
Something seems wrong, because my file has 7816361 lines counting the header, while my list has an extra element (len of the list is 7816361).
I don't have your csv files to recreate the problem, but it looks like it is being caused by your use of index.
index = sample.ID.values - 1 takes each of your sample IDs and subtracts 1. These are not valid index values into preds, which is only 7816360 rows long. Each of the last 3 items in your index array (based on your print output) would go out of bounds, as they are greater than 7816360. I suspect the error is showing you the first ID - 1 that goes out of bounds.
Assuming you just want to join the files based on their line number, you could do the following:
sample=pd.concat((pd.read_csv(path + 'sampleSubmission.csv'),pd.read_csv(submission, header = None).rename(columns={0:'IsClick'})),axis=1)
Otherwise you'll need to perform a join or merge on your two dataframes.
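For readability, the same positional join can also be written in a couple of steps; this is only a sketch, assuming (as above) that the n-th prediction belongs to the n-th data row of the sample file, that both frames end up with the same number of rows, and with path and submission defined as in the question.
import pandas as pd

sample = pd.read_csv(path + 'sampleSubmission.csv')
preds = pd.read_csv(submission, header=None, names=['IsClick'])

# align purely by row position, not by the ID column
sample['IsClick'] = preds['IsClick'].values
sample.to_csv('submission.csv', index=False)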

Count the number of files in a folder that have certain strings

I have a folder with 200 files. Each file has data like
VISITERM_90 VISITERM_0 VISITERM_34 ..... etc.
Not every file has the same elements. I would like to count the number of files which contain each of the elements from VISITERM_0 to VISITERM_99. That is, my output should look like:
VISITERM_0 200
VISITERM_1 140
VISITERM_2 150
and so on, depending on the number of files that contain the specified element. I want to run a loop from VISITERM_0 to VISITERM_99 and, for each element, find the number of files.
My code is:
import os

vt = 'VISITERM_'
no = 0
while no < 10:
    for doc in os.listdir('/home/krupa/Krupa/Mirellas_Image_Annotation_Data/Test/sample_codes/Files'):
        doc2 = '/home/krupa/Krupa/Mirellas_Image_Annotation_Data/Test/sample_codes/Files/' + doc
        c = vt + (repr(no))
        with open (doc2, 'r') as inF:
            for line in inF:
                if c in line:
                    print c, doc2
                else:
                    print "DOES NOT EXIST" , c, doc2
    no = no + 1
This code prints each VISITERM and each of the files that contains it. I just want the VISITERM_* terms and their corresponding numbers of files. Please help!
My Python skills are a bit rusty, so bear with me. I think you need a way to store the values while looping; I'll use a dictionary. This is not the complete solution, but it can help you figure out what you need to do:
# vt = 'VISITERM_' and the folder path ('..') are as in the question
dict = {}
for doc in os.listdir('..'):
    doc2 = '..'
    with open (doc2, 'r') as inF:
        for line in inF:
            no = 0  # reset the term counter for each line
            while no < 10:
                c = vt + (repr(no))
                if c in line:
                    numberOfElements = 0
                    if dict.has_key(c):
                        numberOfElements = dict[c]
                        numberOfElements += 1
                    else:
                        numberOfElements = 1
                    dict[c] = numberOfElements
                no += 1
for key in dict.keys():
    print key, dict[key]
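A slightly more complete sketch of the same idea, counting each file at most once per term; the folder path and the 0-99 range are taken from the question, and splitting each file into whitespace-separated tokens avoids VISITERM_9 matching inside VISITERM_90:
import os
from collections import Counter

folder = '/home/krupa/Krupa/Mirellas_Image_Annotation_Data/Test/sample_codes/Files'
terms = ['VISITERM_%d' % n for n in range(100)]
counts = Counter()

for doc in os.listdir(folder):
    with open(os.path.join(folder, doc), 'r') as inF:
        tokens = set(inF.read().split())
    # a file is counted once per term, no matter how often the term appears in it
    for term in terms:
        if term in tokens:
            counts[term] += 1

for term in terms:
    print term, counts[term]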