Cannot iterate through two csv files and compare - python-2.7

I'm relatively new to Python (2.7) and need help looping through two CSV files. The first file (outer loop) contains the rows I want to write when certain conditions are met against the second file (inner loop).
import csv
f = open('../CI Working Copy.csv')
with open('../first.csv', 'wb') as n:
    theWriter = csv.writer(n)
    csv_f = csv.reader(f)
    g = open('../second.csv')
    csv_g = csv.reader(g)
    for row in csv_f:
        cbd = row[3]
        ced = row[4]
        rbd = row[5]
        red = row[6]
        ciCn = row[10]
        for iRow in csv_g:
            cn = iRow[0]
            startDate = iRow[1]
            endDate = iRow[2]
            iId = iRow[3]
            writeRow = 'false'
            if ciCn == cn:
                if (cbd == startDate and ced == endDate) or (rbd == startDate and red == endDate):
                    theWriter.writerow(row)
    g.close()
f.close()
It makes it into the second (inner loop) file, but never returns to the outer loop. I only need to write the row from the first file.

For each row of the first CSV file you consume the entire second file, so the inner loop has nothing left to read on later iterations. You need to rewind the second file to its beginning on every iteration of the outer loop.
The solution is:
for row in csv_f:
    g.seek(0)  # go back to the start of the second file
    for iRow in csv_g:
        do_smth(iRow, row)
g.close()
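As an alternative to rewinding on every pass, the second file can be read once into a set and each row of the first file checked against it. The following is only a sketch under assumptions: the helper name rows_to_write is hypothetical, the column positions are taken from the question, and each matching row is returned once even if several rows of the second file match it. It runs on Python 3; on Python 2.7 the files would be opened in binary mode as in the question.

```python
import csv
import io


def rows_to_write(first_file, second_file):
    """Return the rows of first_file that match some row of second_file."""
    # Read the second file once and remember its (cn, start, end) triples.
    keys = set()
    for inner in csv.reader(second_file):
        keys.add((inner[0], inner[1], inner[2]))

    matches = []
    for row in csv.reader(first_file):
        # Column positions as used in the question.
        cbd, ced, rbd, red, ci_cn = row[3], row[4], row[5], row[6], row[10]
        if (ci_cn, cbd, ced) in keys or (ci_cn, rbd, red) in keys:
            matches.append(row)
    return matches


# Small in-memory stand-ins for the two CSV files (made-up data).
first = io.StringIO(
    u"a,b,c,2020-01-01,2020-01-31,2020-02-01,2020-02-28,h,i,j,CN1\n"
    u"a,b,c,2020-01-01,2020-01-31,2020-02-01,2020-02-28,h,i,j,CN2\n"
)
second = io.StringIO(u"CN1,2020-02-01,2020-02-28,id1\n")
result = rows_to_write(first, second)
```

The set lookup avoids re-reading the second file for every row of the first, which matters when both files are large.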

Related

Memory Error when exporting data to csv file

Hello, I was hoping someone could help me with my college coursework. I keep running into a memory error when exporting my data.
Is there any way I can reduce the memory being used, or is there a different approach I can take?
For the coursework I am given a CSV file of 300 customer order records, and I have to export the Friday records to a new CSV file. I am also required to print the most popular ordering method and the total money raised from the orders, but I have an easy plan for that.
This is my first time working with CSV, so I'm not sure how to do it. When I run the program it tends to crash instantly or stop responding. Once it showed 'MEMORY ERROR', but nothing else. I'm using a college-provided computer, so I am not sure of the exact specs, but I know it has 4GB of memory.
# defining count occurrences predefined function
def countOccurences(target,array):
    counter = 0
    for element in array:
        if element == target:
            counter= counter + 1
    print counter
    return counter
# creating user defined functions for the program
# dataInput function used for collecting data from provided file
def dataInput():
    import csv
    recordArray = []
    customerArray = []
    f = open('E:\Portable Python 2.7.6.1\Choral Shield Data File(CSV).csv')
    csv_f = csv.reader(f)
    for row in csv_f:
        customerArray.append(row[0])
        ticketID = row[1]
        day, area = datasplit(ticketID)
        customerArray.append(day)
        customerArray.append(area)
        customerArray.append(row[2])
        customerArray.append(row[3])
        recordArray.append(customerArray)
    f.close
    return recordArray
def datasplit(variable):
    day = variable[0]
    area = variable[1]
    return day,area
def dataProcessing(recordArray):
    methodArray = []
    wed_thursCost = 5
    friCost = 10
    record = 0
    while record < 300:
        method = recordArray[record][4]
        methodArray.append(method)
        record = record+1
    school = countOccurences('S',methodArray)
    website = countOccurences('W',methodArray)
    if school > website:
        school = True
    elif school < website:
        website = True
    dayArray = []
    record = 0
    while record < 300:
        day = recordArray[record][1]
        dayArray.append(day)
        record = record + 1
    fridays = countOccurences('F',dayArray)
    wednesdays = countOccurences('W',dayArray)
    thursdays = countOccurences('T', dayArray)
    totalFriCost = fridays * friCost
    totalWedCost = wednesdays * wed_thursCost
    totalThurCost = thursdays * wed_thursCost
    totalCost = totalFriCost + totalWedCost + totalThurCost
    return totalCost,school,website
My first attempt at writing to a CSV file:
def dataExport(recordArray):
    import csv
    fridayRecords = []
    record = 0
    customerIDArray = []
    ticketIDArray = []
    numberArray = []
    methodArray = []
    record = 0
    while record < 300:
        if recordArray[record][1] == 'F':
            fridayRecords.append(recordArray[record])
        record = record + 1
    with open('\Courswork output.csv',"wb") as f:
        writer = csv.writer(f)
        for record in fridayRecords:
            writer.writerows(fridayRecords)
    f.close
My second attempt at writing to the CSV file:
def write_file(recordArray): # write selected records to a new csv file
    CustomerID = []
    TicketID = []
    Number = []
    Method = []
    counter = 0
    while counter < 300:
        if recordArray[counter][2] == 'F':
            CustomerID.append(recordArray[counter][0])
            TicketID.append(recordArray[counter][1]+recordArray[counter[2]])
            Number.append(recordArray[counter][3])
            Method.append(recordArray[counter][4])
    fridayRecords = [] # a list to contain the lists before writing to file
    for x in range(len(CustomerID)):
        one_record = CustomerID[x],TicketID[x],Number[x],Method[x]
        fridayRecords.append(one_record)
    #open file for writing
    with open("sample_output.csv", "wb") as f:
        #create the csv writer object
        writer = csv.writer(f)
        #write one row (item) of data at a time
        writer.writerows(recordArray)
        f.close
    counter = counter + 1
#Main Program
recordArray = dataInput()
totalCost,school,website = dataProcessing(recordArray)
write_file(recordArray)
In your second attempt, write_file(recordArray), the variable counter is never updated inside the first while loop, so the loop never ends and the lists keep growing until you run out of memory. Move counter = counter + 1 inside that while loop.
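Iterating over the records directly avoids a hand-maintained counter altogether. The sketch below is illustrative only: friday_records and export are hypothetical helper names, and it assumes the day letter sits at index 1 of each record, as produced by dataInput() in the question. It writes to any file-like object, so it can be tried without touching the disk.

```python
import csv
import io


def friday_records(records):
    """Keep only the records whose day field (index 1) is 'F'."""
    return [record for record in records if record[1] == 'F']


def export(records, out_file):
    """Write every record as one CSV row to an already opened file."""
    writer = csv.writer(out_file)
    writer.writerows(records)  # one call writes all rows


# Made-up records in the layout [customer, day, area, number, method].
records = [
    ['C1', 'F', 'A', '2', 'S'],
    ['C2', 'W', 'B', '1', 'W'],
    ['C3', 'F', 'C', '4', 'W'],
]
out = io.StringIO()
export(friday_records(records), out)
```

Because the for loop and the list comprehension stop on their own when the records run out, there is no counter that can be left un-incremented.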

loop doesn't iterate over all the csv file read, python2, pycharm

Here is my code:
import csv
inp1 = raw_input('Enter your Hijjri year:')
intinp1 = int(inp1)
majmouaopen = open('Majmoua.csv')
majmouaread = csv.reader(majmouaopen)
majmouaread.next()
mabsoutaopen = open('Mabsouta.csv')
mabsoutaread = csv.reader(mabsoutaopen)
mabsoutaread.next()
hijrimiladimonthsopened = open('MiladiHijrimonths.csv')
hijrimiladimonthsread = csv.reader(hijrimiladimonthsopened)
yearslist = []
years = []
yearssection = []
monthssection = []
minutessection = []
def miladifromhijri(intinp1):#, inp2, intinp3):
    fulyear = intinp1 - 1
    n = 0
    for row in majmouaread:
        print row
        introw = int(row[0])
        if introw <= fulyear:
            n += 1
            years.append(introw)
            continue
        if n == len(years):
            near = years[::-1][0]
            nearlessyear = near
            break
    for row in majmouaread:
        print row
My problem is with the last loop: it doesn't print all of the rows in majmouaread. The first loop, which iterates over the same reader, does print all of the CSV rows.
What is causing the problem? Is it something in the code, or did something happen to the CSV reader? It looks fine in the first loop.
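A likely explanation, assuming the code runs as posted: a csv.reader is a one-pass iterator, so a second for loop over the same reader continues from wherever the first loop stopped (here, at the break) instead of starting over. The sketch below reproduces that behaviour with made-up in-memory data and shows one way to get a full second pass by rewinding the underlying file and creating a new reader.

```python
import csv
import io

# Made-up stand-in for a CSV file with one year per row.
data = io.StringIO(u"1\n2\n3\n4\n")
reader = csv.reader(data)

first_pass = []
for row in reader:
    first_pass.append(row)
    if row[0] == "2":
        break  # stop early, like the break in the question

# The same reader resumes after the last row it handed out.
second_pass = list(reader)

# Rewind the file and build a fresh reader to see every row again.
data.seek(0)
full_pass = list(csv.reader(data))
```

Reading the rows into a list once (rows = list(reader)) and looping over that list as often as needed is another option when the file fits in memory.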

Using Interval tree to find overlapping regions

I have two files
File 1
chr1:4847593-4847993
TGCCGGAGGGGTTTCGATGGAACTCGTAGCA
File 2
Pbsn|X|75083240|75098962|
TTTACTACTTAGTAACACAGTAAGCTAAACAACCAGTGCCATGGTAGGCTTGAGTCAGCT
CTTTCAGGTTCATGTCCATCAAAGATCTACATCTCTCCCCTGGTAGCTTAAGAGAAGCCA
TGGTGGTTGGTATTTCCTACTGCCAGACAGCTGGTTGTTAAGTGAATATTTTGAAGTCC
File 1 has approximately 8000 more lines, each with a different header and the sequence below it.
I would first like to match the start and end coordinates from file 1 against file 2, or see whether they are close to each other, say within ±100. If they are, I want to match the sequence in file 2 and then print the header info for file 2 and the matched sequence.
My approach: use an interval tree (in Python; I am still trying to get the hang of it) to store the coordinates?
I tried using re.match, but it's not giving me accurate results.
Any tips would be highly appreciated.
Thanks.
My first try:
However, I have now hit another roadblock. For my second file, if my start and end are 5000 and 8000 respectively, I want to change this by subtracting 2000, so that my new start and stop are 3000 and 6000. Here is my code:
from intervaltree import IntervalTree
from collections import defaultdict
binding_factor = 'some.txt'
genome = dict()
with open('file2', 'r') as rows:
    for row in rows:
        #print row
        if row.startswith('>'):
            row = row.strip().split('|')
            chrom_name = row[5]
            start = int(row[3])
            end = int(row[3])
            # one interval tree per chromosome
            if chrom_name not in genome:
                # first time we've encountered this chromosome, create tree
                genome[chrom_name] = IntervalTree()
            # index the feature
            genome[chrom_name].addi(start,end,row[2])
#for key,value in genome.iteritems():
    #print key, ":", value
mast = defaultdict(list)
with open('file1', 'r') as f:
    for row in f:
        row = row.strip().split()
        row[0] = row[0].replace('chr', '') if row[0].startswith('chr') else row[0]
        row[0] = 'MT' if row[0] == 'M' else row[0]
        #print row[0]
        mast[row[0]].append({
            'start':int(row[1]),
            'end':int(row[2])
        })
#for k,v in mast.iteritems():
    #print k, ":", v
with open(binding_factor, 'w') as f :
    for k,v in mast.iteritems():
        for i in v:
            g = genome[k].search(i['start'],i['end'])
            if g:
                print g
                l = gene
                f.write(str(l) + '\n')
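The two operations described in the question, shifting an interval and testing whether two intervals lie within ±100 of each other, can be sketched in plain Python without the interval tree. This is only an illustration: shift_interval and overlaps are hypothetical helper names, coordinates are treated as inclusive, and the second interval in the usage example is made up.

```python
def shift_interval(start, end, offset):
    """Move both ends of an interval down by offset."""
    return start - offset, end - offset


def overlaps(a_start, a_end, b_start, b_end, slack=100):
    """True if the intervals overlap or lie within slack of each other."""
    # Equivalent to widening interval b by slack on both sides
    # and testing for an ordinary overlap with interval a.
    return a_start <= b_end + slack and b_start <= a_end + slack


# 5000-8000 shifted down by 2000.
shifted = shift_interval(5000, 8000, 2000)

# File 1 region from the question against a made-up region 57 bases away.
near = overlaps(4847593, 4847993, 4848050, 4849000)
far = overlaps(100, 200, 400, 500)
```

With an interval tree, the same tolerance could presumably be applied by widening the query range by the slack before searching, so that the tree only has to answer plain overlap queries.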

Django CSV file -database upload error

I have a CSV file and I am trying to load its rows into a SQLite database. I get no error message and everything appears to work, but only the last line of the file is loaded.
MD= MD()
database = options.get('database')
filename = options.get('filename')
dataReader = csv.reader(open(filename))
for row in dataReader:
    if row[0] != 'ID':
        black_box = 1 if row[3] == 'YES' else 0
        pro = 'YES' if row[4] == 'Pro' else 'NO'
        MD.id = row[0]
        MD.mol = row[1]
        MD.phase = row[2]
        MD.warning = black_box
        MD.pro = pro
        MD.status = Type.objects.get(description=row[5])
        MD.name = row[6]
        MD.stem = row[7]
        MD.year = row[8]
        MD.iname = row[9]
        MD.iyear = row[10]
        print row[1], row[2],row[3],row[4],row[5],row[6], row[0]
        MD.save()
But the print statement prints all the lines in the CSV file. I have no idea what happens. Thanks
You are only creating one MD instance outside of the for loop, then saving to that same instance on each iteration of the loop. You need to create a new MD() instance for every iteration (per line of the file) if you want to create and save a new record for each line. This is why you are only saving the last line - you are over-writing the pre-existing instance you created. Good luck.
You are always saving the same object. Try this:
Create the instance inside the for loop, and give it a name different from the class so that the class itself is not overwritten:
... # The same here
for row in dataReader:
    if row[0] != 'ID':
        md = MD()
        ... # The same here
        md.id = row[0]
        md.mol = row[1]
        ...
        md.save()

Printing Results from Loops

I currently have a piece of code that works in two segments. The first segment opens an existing text file from a specific path on my local drive and arranges it, based on certain indices, into a list of sub-lists. In the second segment I take the sub-lists I have created and group them on a shared index to simplify them (this starts at def merge_subs). I am getting no error, but I get no result when I try to print the variable answer. Am I not looping over the original list of sub-lists correctly? Ultimately I would like a variable that contains the final product of these loops so that I can write its contents to a new text file. Here is the code I am working with:
from itertools import groupby, chain
from operator import itemgetter
with open ("somepathname") as g:
    # reads text from lines and turns them into a list of sub-lists
    lines = g.readlines()
    for line in lines:
        matrix = line.split()
        JD = matrix [2]
        minTime= matrix [5]
        maxTime= matrix [7]
        newLists = [JD,minTime,maxTime]
L = newLists
def merge_subs(L):
    dates = {}
    for sub in L:
        date = sub[0]
        if date not in dates:
            dates[date] = []
        dates[date].extend(sub[1:])
    answer = []
    for date in sorted(dates):
        answer.append([date] + dates[date])
New code:
def openfile(self):
    filename = askopenfilename(parent=root)
    self.lines = open(filename)
def simplify(self):
    g = self.lines.readlines()
    for line in g:
        matrix = line.split()
        JD = matrix[2]
        minTime = matrix[5]
        maxTime = matrix[7]
        self.newLists = [JD, minTime, maxTime]
        print(self.newLists)
    dates = {}
    for sub in self.newLists:
        date = sub[0]
        if date not in dates:
            dates[date] = []
        dates[date].extend(sub[1:])
    answer = []
    for date in sorted(dates):
        print(answer.append([date] + dates[date]))
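Three things in the posted code would explain the missing output, assuming it runs as shown: newLists is overwritten on every line, so only the last line survives; merge_subs builds answer but neither returns it nor is ever called; and list.append returns None, so print(answer.append(...)) prints None. The sketch below is one possible restructuring, not the only one. parse_lines is a hypothetical helper, and the field positions 2, 5 and 7 are taken from the question.

```python
def parse_lines(lines):
    """Turn each text line into a [JD, minTime, maxTime] sub-list."""
    rows = []
    for line in lines:
        fields = line.split()
        rows.append([fields[2], fields[5], fields[7]])  # keep every line
    return rows


def merge_subs(rows):
    """Group the sub-lists by their first item and return the result."""
    dates = {}
    for sub in rows:
        dates.setdefault(sub[0], []).extend(sub[1:])
    return [[date] + dates[date] for date in sorted(dates)]


# Made-up input lines; only fields 2, 5 and 7 matter here.
lines = [
    "x x JD1 x x 10 x 20",
    "x x JD2 x x 11 x 21",
    "x x JD1 x x 12 x 22",
]
answer = merge_subs(parse_lines(lines))
```

Since merge_subs returns its result, answer can be printed or written to a new text file afterwards.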