Memory Error when exporting data to csv file - python-2.7

Hello, I was hoping someone could help me with my college coursework; I have an issue with my code. I keep running into a memory error when exporting my data.
Is there any way I can reduce the memory being used, or is there a different approach I can take?
For the coursework I am given a CSV file of 300 records about customer orders, and I have to export the Friday records to a new CSV file. I am also required to print the most popular method for customers' orders and the total money raised from the orders, but I have an easy plan for that.
This is my first time working with CSV, so I'm not sure how to do it. When I run the program it tends to crash instantly or stop responding. Once it showed 'MEMORY ERROR', but that was all it displayed. I'm using a college-provided computer, so I'm not sure of the exact specs, but I know it has 4GB of memory.
# defining count occurrences predefined function
def countOccurences(target, array):
    counter = 0
    for element in array:
        if element == target:
            counter = counter + 1
    print counter
    return counter

# creating user defined functions for the program
# dataInput function used for collecting data from the provided file
def dataInput():
    import csv
    recordArray = []
    customerArray = []
    f = open('E:\Portable Python 2.7.6.1\Choral Shield Data File(CSV).csv')
    csv_f = csv.reader(f)
    for row in csv_f:
        customerArray.append(row[0])
        ticketID = row[1]
        day, area = datasplit(ticketID)
        customerArray.append(day)
        customerArray.append(area)
        customerArray.append(row[2])
        customerArray.append(row[3])
        recordArray.append(customerArray)
    f.close
    return recordArray

def datasplit(variable):
    day = variable[0]
    area = variable[1]
    return day, area

def dataProcessing(recordArray):
    methodArray = []
    wed_thursCost = 5
    friCost = 10
    record = 0
    while record < 300:
        method = recordArray[record][4]
        methodArray.append(method)
        record = record + 1
    school = countOccurences('S', methodArray)
    website = countOccurences('W', methodArray)
    if school > website:
        school = True
    elif school < website:
        website = True
    dayArray = []
    record = 0
    while record < 300:
        day = recordArray[record][1]
        dayArray.append(day)
        record = record + 1
    fridays = countOccurences('F', dayArray)
    wednesdays = countOccurences('W', dayArray)
    thursdays = countOccurences('T', dayArray)
    totalFriCost = fridays * friCost
    totalWedCost = wednesdays * wed_thursCost
    totalThurCost = thursdays * wed_thursCost
    totalCost = totalFriCost + totalWedCost + totalThurCost
    return totalCost, school, website
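As a side note (not part of the coursework code above, just a sketch), the same day and method tallies could be produced with collections.Counter, which avoids the hard-coded 300 and the two manual while loops; this assumes each record has the [customerID, day, area, number, method] layout that dataInput builds:

from collections import Counter

def dataProcessingCounter(recordArray):
    # hypothetical alternative to dataProcessing: returns the total cost
    # plus the school and website order counts
    days = Counter(rec[1] for rec in recordArray)       # 'F', 'W', 'T' counts
    methods = Counter(rec[4] for rec in recordArray)    # 'S' vs 'W' order methods
    totalCost = days['F'] * 10 + (days['W'] + days['T']) * 5
    return totalCost, methods['S'], methods['W']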
My first attempt at writing to a CSV file:
def dataExport(recordArray):
    import csv
    fridayRecords = []
    record = 0
    customerIDArray = []
    ticketIDArray = []
    numberArray = []
    methodArray = []
    record = 0
    while record < 300:
        if recordArray[record][1] == 'F':
            fridayRecords.append(recordArray[record])
        record = record + 1
    with open('\Courswork output.csv', "wb") as f:
        writer = csv.writer(f)
        for record in fridayRecords:
            writer.writerows(fridayRecords)
        f.close
My second attempt at writing to the CSV file:
def write_file(recordArray):  # write selected records to a new csv file
    CustomerID = []
    TicketID = []
    Number = []
    Method = []
    counter = 0
    while counter < 300:
        if recordArray[counter][2] == 'F':
            CustomerID.append(recordArray[counter][0])
            TicketID.append(recordArray[counter][1] + recordArray[counter][2])
            Number.append(recordArray[counter][3])
            Method.append(recordArray[counter][4])
    fridayRecords = []  # a list to contain the lists before writing to file
    for x in range(len(CustomerID)):
        one_record = CustomerID[x], TicketID[x], Number[x], Method[x]
        fridayRecords.append(one_record)
    # open file for writing
    with open("sample_output.csv", "wb") as f:
        # create the csv writer object
        writer = csv.writer(f)
        # write one row (item) of data at a time
        writer.writerows(recordArray)
        f.close
    counter = counter + 1

# Main Program
recordArray = dataInput()
totalCost, school, website = dataProcessing(recordArray)
write_file(recordArray)

In the function write_file(recordArray) in your second attempt, the counter variable in the first while loop is never updated, so the loop continues forever until you run out of memory.
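Not part of the answer above, but for illustration, here is a minimal sketch of how the Friday export could be written so that counter advances on every pass and rows are written as they are found; it assumes the day code sits at index 1 of each record, as in the first attempt (dataExport):

import csv

def writeFridayRecords(recordArray, out_path="sample_output.csv"):
    # hypothetical helper: filter and write in one pass, so nothing piles up in memory
    with open(out_path, "wb") as f:          # "wb" for the csv module on Python 2
        writer = csv.writer(f)
        counter = 0
        while counter < len(recordArray):    # len() instead of a hard-coded 300
            record = recordArray[counter]
            if record[1] == 'F':             # assumption: day code at index 1
                writer.writerow(record)
            counter = counter + 1            # advanced on every iteration, so the loop terminates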

Related

Creating an if/else that appends data from multiple scraped pages if counts differ?

I"m trying to scrape Oregon teacher licensure information that looks like this or this(this is publicly available data)
This is my code:
for t in range(0,2):  # Refers to txt file with ids
    address = 'http://www.tspc.oregon.gov/lookup_application/LDisplay_Individual.asp?id=' + lines2[t]
    page = requests.get(address)
    tree = html.fromstring(page.text)
    count = 0
    for license_row in tree.xpath(".//tr[td[1] = 'License Type']/following-sibling::tr[1]"):
        license_data = license_row.xpath(".//td/text()")
        count = count + 1
        if count==1:
            ltest1.append(license_data)
        if count==2:
            ltest2.append(license_data)
        if count==3:
            ltest3.append(license_data)

with open('teacher_lic.csv', 'wb') as pensionfile:
    writer = csv.writer(pensionfile, delimiter="," )
    writer.writerow(["Name", "Lic1", "Lic2", "Lic3"])
    pen = zip(lname, ltest1, ltest2, ltest3)
    for penlist in pen:
        writer.writerow(list(penlist))
The problem occurs when this happens: Teacher A has 13 licenses and Teacher B has 2. For A my total count = 13 and for B it is 2. When I get to Teacher B and count should reach 3, I want to say "if count==3 then ltest3.append(license_data), else if count==3 and license_data=='' then ltest3.append('')", but since count never reaches 3 for B there's no way to tell it to append an empty entry.
I'd want the output to look like this:
Is there a way to do this? I might be approaching this completely wrong, so if someone can point me in another direction, that would be helpful as well.
There's probably a more elegant way to do this, but this managed to work pretty well.
I created some blank entries to fill in when Teacher A has 13 licenses and Teacher B has 2. Some errors resulted when license_row.xpath reached count==3 for Teacher B; I exploited those errors to create the ltest3.append('').
for t in range(0, 2):  # Each txt file contains differing amounts
    address = 'http://www.tspc.oregon.gov/lookup_application/LDisplay_Individual.asp?id=' + lines2[t]
    page = requests.get(address)
    tree = html.fromstring(page.text)
    count = 0
    test = tree.xpath(".//tr[td[1] = 'License Type']/following-sibling::tr[1]")
    difference = 15 - len(test)
    for i in range(0, difference):
        test.append('')
    for license_row in test:
        count = count + 1
        try:
            license_data = license_row.xpath(".//td/text()")
        except NameError:
            license_data = ''
            if license_data=='' and count==1:
                ltest1.append('')
            if license_data=='' and count==2:
                ltest2.append('')
            if license_data=='' and count==3:
                ltest3.append('')
        except AttributeError:
            license_data = ''
        if count==1 and True:
            print "True"
        if count==1:
            ltest1.append(license_data)
        if count==2 and True:
            print "True"
        if count==2:
            ltest2.append(license_data)
        if count==3 and True:
            print "True"
        if count==3:
            ltest3.append(license_data)
        del license_data
    for endorse_row in tree.xpath(".//tr[td = 'Endorsements']/following-sibling::tr"):
        endorse_data = endorse_row.xpath(".//td/text()")
        lendorse1.append(endorse_data)
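For what it's worth, a less error-driven way to get the same padding (just a sketch, not from this thread, assuming tree, ltest1, ltest2 and ltest3 are set up as in the question) is to collect each teacher's license rows into one list and pad it to a fixed width before splitting it across the columns:

# inside the for t in range(...) loop, once tree has been built
rows = tree.xpath(".//tr[td[1] = 'License Type']/following-sibling::tr[1]")
licenses = [row.xpath(".//td/text()") for row in rows][:3]   # keep at most 3 licenses
licenses = licenses + [''] * (3 - len(licenses))             # pad the missing slots with ''
ltest1.append(licenses[0])
ltest2.append(licenses[1])
ltest3.append(licenses[2])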

Loop doesn't iterate over all of the CSV file read, python2, pycharm

Here is my code:
import csv

inp1 = raw_input('Enter your Hijjri year:')
intinp1 = int(inp1)

majmouaopen = open('Majmoua.csv')
majmouaread = csv.reader(majmouaopen)
majmouaread.next()

mabsoutaopen = open('Mabsouta.csv')
mabsoutaread = csv.reader(mabsoutaopen)
mabsoutaread.next()

hijrimiladimonthsopened = open('MiladiHijrimonths.csv')
hijrimiladimonthsread = csv.reader(hijrimiladimonthsopened)

yearslist = []
years = []
yearssection = []
monthssection = []
minutessection = []

def miladifromhijri(intinp1):  #, inp2, intinp3):
    fulyear = intinp1 - 1
    n = 0
    for row in majmouaread:
        print row
        introw = int(row[0])
        if introw <= fulyear:
            n += 1
            years.append(introw)
            continue
        if n == len(years):
            near = years[::-1][0]
            nearlessyear = near
            break
    for row in majmouaread:
        print row
My problem is with the last loop: it doesn't print any of the majmouaread rows, while the first loop, which is the same, does print all of the CSV file's rows.
What is causing the problem? Is it something in the code, or did something happen to the CSV reader? It looks fine in the first loop.
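No answer is recorded in this thread, but one common cause worth checking: csv.reader wraps the open file object, which is a one-pass iterator, so after the earlier .next() call and the first for row in majmouaread loop the reader is exhausted and the second loop has nothing left to yield. A small sketch of reading the rows into a list so they can be iterated more than once (file name as in the question):

import csv

with open('Majmoua.csv') as majmouaopen:
    majmouarows = list(csv.reader(majmouaopen))[1:]   # materialise all rows, skip the header

for row in majmouarows:   # first pass
    print row
for row in majmouarows:   # second pass still sees every row
    print row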

Printing Results from Loops

I currently have a piece of code that works in two segments. The first segment opens an existing text file from a specific path on my local drive and then arranges it, based on certain indices, into a list of sub-lists. In the second segment I take the sub-lists I have created and group them on a shared index to simplify them (starting at def merge_subs). I am getting no error, but I also receive no result when I try to print the variable answer. Am I not correctly looping over the original list of sub-lists? Ultimately I would like a variable that contains the final product of these loops so that I can write its contents to a new text file. Here is the code I am working with:
from itertools import groupby, chain
from operator import itemgetter

with open("somepathname") as g:
    # reads text from lines and turns them into a list of sub-lists
    lines = g.readlines()
    for line in lines:
        matrix = line.split()
        JD = matrix[2]
        minTime = matrix[5]
        maxTime = matrix[7]
        newLists = [JD, minTime, maxTime]
        L = newLists

def merge_subs(L):
    dates = {}
    for sub in L:
        date = sub[0]
        if date not in dates:
            dates[date] = []
        dates[date].extend(sub[1:])
    answer = []
    for date in sorted(dates):
        answer.append([date] + dates[date])
New code:
def openfile(self):
    filename = askopenfilename(parent=root)
    self.lines = open(filename)

def simplify(self):
    g = self.lines.readlines()
    for line in g:
        matrix = line.split()
        JD = matrix[2]
        minTime = matrix[5]
        maxTime = matrix[7]
        self.newLists = [JD, minTime, maxTime]
        print(self.newLists)
    dates = {}
    for sub in self.newLists:
        date = sub[0]
        if date not in dates:
            dates[date] = []
        dates[date].extend(sub[1:])
    answer = []
    for date in sorted(dates):
        print(answer.append([date] + dates[date]))
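The thread does not resolve this, but a likely reason nothing prints is that merge_subs builds answer without returning it, and L only ever holds the last [JD, minTime, maxTime] triple because it is reassigned on each pass of the read loop. A minimal sketch of one way the two segments could be wired together (path and column indices as in the question):

def merge_subs(L):
    # group the [JD, minTime, maxTime] triples by their first element
    dates = {}
    for sub in L:
        date = sub[0]
        if date not in dates:
            dates[date] = []
        dates[date].extend(sub[1:])
    answer = []
    for date in sorted(dates):
        answer.append([date] + dates[date])
    return answer                     # hand the merged rows back to the caller

with open("somepathname") as g:
    L = []                            # accumulate every line's triple instead of overwriting
    for line in g:
        matrix = line.split()
        L.append([matrix[2], matrix[5], matrix[7]])

answer = merge_subs(L)
print(answer)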

IndexError, but more likely I/O error

Unsure of why I am getting this error. I'm reading from a file called columns_unsorted.txt, then trying to write to columns_sorted.txt. The error is on fan_on = int(string_j[1]), saying list index out of range. Here's my code:
#!/usr/bin/python
import fileinput
import collections

# open document to read from
j = open('./columns_unsorted.txt', 'r')
# note: this is a file of rows of space-delimited data in the format <1384055277275353 0 0 0 1 0 0 0 0 22:47:57> on each row, the first term being unix time, the last human time, the middle binary fields indicating which machine event happened

# open document to record results into
l = open('./columns_sorted.txt', 'w')

# CREATE ARRAY CALLED EVENTS
events = collections.deque()
i = 1

# FILL ARRAY WITH "FACTS" ROWS; SPLIT INTO FIELDS, CHANGE TYPES AS APPROPRIATE
for line in j:  # columns_unsorted
    line = line.rstrip('\n')
    string_j = line.split(' ')
    time = str(string_j[0])
    fan_on = int(string_j[1])
    fan_off = int(string_j[2])
    heater_on = int(string_j[3])
    heater_off = int(string_j[4])
    space_on = int(string_j[5])
    space_off = int(string_j[6])
    pump_on = int(string_j[7])
    pump_off = int(string_j[8])
    event_time = str(string_j[9])
    row = time, fan_on, fan_off, heater_on, heater_off, space_on, space_off, pump_on, pump_off, event_time
    events.append(row)
You are missing the readlines function, no?
You have to do:
j = open('./columns_unsorted.txt', 'r')
l = j.readlines()
for line in l:
    # what you want to do with each line
In the future, you should print some of your variables, just to be sure the code is working as you want it to, and to help you identify problems.
(For example, if in your code you printed string_j, you would see what kind of problem you have.)
The problem was an inconsistent line in the data file. Forgive my haste in posting.
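That matches the traceback: a blank or short line splits into fewer than 10 fields, so string_j[1] does not exist. A small guard (a sketch, not from the thread, assuming the same space-delimited 10-field rows and the j and events objects above) skips any malformed line instead of crashing:

for line in j:  # columns_unsorted
    string_j = line.rstrip('\n').split(' ')
    if len(string_j) < 10:          # inconsistent line: skip it rather than index into it
        continue
    time = str(string_j[0])
    flags = [int(x) for x in string_j[1:9]]   # the eight 0/1 machine-event columns
    event_time = str(string_j[9])
    events.append(tuple([time] + flags + [event_time]))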

Attempting to Obtain Taxonomic Information from Biopython

I am attempting to alter a previous script that uses Biopython to fetch information about a species' phylum. This script was written to retrieve information for one species at a time; I would like to modify it so that I can do this for 100 organisms at a time.
Here is the initial code:
import sys
from Bio import Entrez

def get_tax_id(species):
    """to get data from ncbi taxomomy, we need to have the taxid. we can
    get that by passing the species name to esearch, which will return
    the tax id"""
    species = species.replace(" ", "+").strip()
    search = Entrez.esearch(term = species, db = "taxonomy", retmode = "xml")
    record = Entrez.read(search)
    return record['IdList'][0]

def get_tax_data(taxid):
    """once we have the taxid, we can fetch the record"""
    search = Entrez.efetch(id = taxid, db = "taxonomy", retmode = "xml")
    return Entrez.read(search)

Entrez.email = ""
if not Entrez.email:
    print "you must add your email address"
    sys.exit(2)

taxid = get_tax_id("Erodium carvifolium")
data = get_tax_data(taxid)
lineage = {d['Rank']:d['ScientificName'] for d in
           data[0]['LineageEx'] if d['Rank'] in ['family', 'order']}
I have managed to modify the script so that it accepts a local file containing one of the organisms I am using, but I need to extend this to 100 organisms.
The idea was to generate a list of my organisms from the file and somehow feed each item of that list into the line taxid = get_tax_id("Erodium carvifolium"), replacing "Erodium carvifolium" with my organism's name, but I have no idea how to do that.
Here is the sample version of the code with some of my adjustments:
import sys
from Bio import Entrez

def get_tax_id(species):
    """to get data from ncbi taxomomy, we need to have the taxid. we can
    get that by passing the species name to esearch, which will return
    the tax id"""
    species = species.replace(' ', "+").strip()
    search = Entrez.esearch(term = species, db = "taxonomy", retmode = "xml")
    record = Entrez.read(search)
    return record['IdList'][0]

def get_tax_data(taxid):
    """once we have the taxid, we can fetch the record"""
    search = Entrez.efetch(id = taxid, db = "taxonomy", retmode = "xml")
    return Entrez.read(search)

Entrez.email = ""
if not Entrez.email:
    print "you must add your email address"
    sys.exit(2)

list = ['Helicobacter pylori 26695', 'Thermotoga maritima MSB8', 'Deinococcus radiodurans R1', 'Treponema pallidum subsp. pallidum str. Nichols', 'Aquifex aeolicus VF5', 'Archaeoglobus fulgidus DSM 4304']
i = iter(list)
item = i.next()
for item in list:
    ???
    taxid = get_tax_id(?)
    data = get_tax_data(taxid)
    lineage = {d['Rank']:d['ScientificName'] for d in
               data[0]['LineageEx'] if d['Rank'] in ['phylum']}
    print lineage, taxid
The question marks refer to places where I am stumped about what to do next. I don't see how to connect my loop so that it replaces the ? in get_tax_id(?). Or do I need to somehow append each of the items in the list so that they are modified each time to contain get_tax_id('Helicobacter pylori 26695'), and then find some way to place them in the line containing taxid =?
Here's what you need; place this below your function definitions, i.e. after the line that says sys.exit(2):
species_list = ['Helicobacter pylori 26695', 'Thermotoga maritima MSB8', 'Deinococcus radiodurans R1', 'Treponema pallidum subsp. pallidum str. Nichols', 'Aquifex aeolicus VF5', 'Archaeoglobus fulgidus DSM 4304']

taxid_list = []     # Initiate the lists to store the data to be parsed in
data_list = []
lineage_list = []

print('parsing taxonomic data...')   # message declaring the parser has begun
for species in species_list:
    print('\t' + species)             # progress messages
    taxid = get_tax_id(species)       # Apply your functions
    data = get_tax_data(taxid)
    lineage = {d['Rank']:d['ScientificName'] for d in data[0]['LineageEx'] if d['Rank'] in ['phylum']}
    taxid_list.append(taxid)          # Append the data to lists already initiated
    data_list.append(data)
    lineage_list.append(lineage)
print('complete!')
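As a quick check of what the loop collects (not part of the original answer; names as defined above), each species can be paired with its taxid and parsed phylum afterwards:

# Print one line per species: "<species>  <taxid>  <phylum>"
for species, taxid, lineage in zip(species_list, taxid_list, lineage_list):
    print '%s\t%s\t%s' % (species, taxid, lineage.get('phylum', 'n/a'))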