How to use user-defined functions in python? - python-2.7

I'm just learning python so be gentle. I want to read a file and that to be one function and then have another function work on what the previous function "read". I am having trouble passing the result on one function to another. Here is what I have no far:
I want to call read_file more than once and to be able to pass its result to more than one function, therefore, I do not want frame to be a global variable. How do I get read_file to pass 'frame' to cost_channelID directly? Perhaps, for cost_channelID to call read_file?
def read_file():
user_input = raw_input("please put date needed in x.xx form: ")
path = r'C:\\Users\\CP\\documents\\' + user_input
allFiles = glob.glob(path + '/*.csv')
frame = pd.DataFrame()
list = []
for file in allFiles:
df = pd.read_csv(file,index_col=None,header=0)
list.append(df)
frame =pd.concat(list,ignore_index=True)
def cost_channelID():
numbers =r'[0,1,2,3,4,5,6,7,8,9]'
Ads = frame['Ad']
ID = []
for ad in Ads:
num = ''.join(re.findall(numbers,ad)[1:7])
ID.append(num)
ID = pd.Series(ID)
pieces = [frame,ID]
frame2 = pd.concat(pieces,ignore_index=True,axis=1)
frame2 = frame2.rename(columns={0:'Ad',1:'Ad Impressions',2:'Total Ad Spend',3:'eCPM (Total Ad Spend/Ad)',4:'Ad Attempts',5:'ID'})
Any and all help is greatly appreciated!

Here is your modified code (comments in uppercase for easier finding, not rudeness):
def read_file():
user_input = raw_input("please put date needed in x.xx form: ")
path = r'C:\\Users\\CP\\documents\\' + user_input
allFiles = glob.glob(path + '/*.csv')
frame = pd.DataFrame()
list = []
for file in allFiles:
df = pd.read_csv(file,index_col=None,header=0)
list.append(df)
frame =pd.concat(list,ignore_index=True)
return(frame) #YOUR FUNCTION WILL BE RETURNING THE READ FRAME
def cost_channelID(read_frame): #YOU WILL BE RECEIVING A DIFFERENT FRAME EVERY TIME
numbers =r'[0,1,2,3,4,5,6,7,8,9]'
Ads = read_frame['Ad']
ID = []
for ad in Ads:
num = ''.join(re.findall(numbers,ad)[1:7])
ID.append(num)
ID = pd.Series(ID)
pieces = [frame,ID]
frame2 = pd.concat(pieces,ignore_index=True,axis=1)
frame2 = frame2.rename(columns={0:'Ad',1:'Ad Impressions',2:'Total Ad Spend',3:'eCPM (Total Ad Spend/Ad)',4:'Ad Attempts',5:'ID'})
#HERE YOU MEANT TO USE FRAME2 WITHIN THE FUNCTION? IT IS WHAT HAPPENING BECAUSE OF THE INDENTATION
frame1 = read_file() #YOU CAN READ AS MANY FRAMES AS YOU WANT AND THEY'LL BE KEPT IN SEPARATED FRAMES
frame2 = read_file()
#...framexx = read_file()
#AND YOU CAN JUST CALL cost_channelID in any of them (or any other function)
const_channelID(frame1)
const_channelID(frame2)
#... AND SO ON

If you are trying to pass frame from one function to the other, you need to declare it outside the scope of the function. Otherwise we need more information about what you are trying to accomplish.
frame = None
def read_file():
user_input = raw_input("please put date needed in x.xx form: ")
path = r'C:\\Users\\CP\\documents\\' + user_input
allFiles = glob.glob(path + '/*.csv')
frame = pd.DataFrame()
list = []
for file in allFiles:
df = pd.read_csv(file,index_col=None,header=0)
list.append(df)
frame =pd.concat(list,ignore_index=True)
def cost_channelID():
numbers =r'[0,1,2,3,4,5,6,7,8,9]'
Ads = frame['Ad']
ID = []
for ad in Ads:
num = ''.join(re.findall(numbers,ad)[1:7])
ID.append(num)
ID = pd.Series(ID)
pieces = [frame,ID]
frame2 = pd.concat(pieces,ignore_index=True,axis=1)
frame2 = frame2.rename(columns={0:'Ad',1:'Ad Impressions',2:'Total Ad Spend',3:'eCPM (Total Ad Spend/Ad)',4:'Ad Attempts',5:'ID'})
Here is a good reference for scope rules in python
Short Description of the Scoping Rules?

Related

Memory Error when exporting data to csv file

Hello I was hoping someone could help me with my college coursework, I have an issue with my code. I keep running into a memory error with my data export.
Is there any way I can reduce the memory that is being used or is there a different approach I can take?
For the course work I am given a file of 300 records about customer orders from a CSV file and then I have to export the Friday records to a new CSV file. Also I am required to print the most popular method for customer's orders and the total money raised from the orders but I have an easy plan for that.
This is my first time working with CSV so I'm not sure how to do it. When I run the program it tends to crash instantly or stop responding. Once it appeared with 'MEMORY ERROR' however that is all it appeared with. I'm using a college provided computer so I am not sure on the exact specs but I know it runs 4GB of memory.
defining count occurences predefined function
def countOccurences(target,array):
counter = 0
for element in array:
if element == target:
counter= counter + 1
print counter
return counter
creating user defined functions for the program
dataInput function used for collecting data from provided file
def dataInput():
import csv
recordArray = []
customerArray = []
f = open('E:\Portable Python 2.7.6.1\Choral Shield Data File(CSV).csv')
csv_f = csv.reader(f)
for row in csv_f:
customerArray.append(row[0])
ticketID = row[1]
day, area = datasplit(ticketID)
customerArray.append(day)
customerArray.append(area)
customerArray.append(row[2])
customerArray.append(row[3])
recordArray.append(customerArray)
f.close
return recordArray
def datasplit(variable):
day = variable[0]
area = variable[1]
return day,area
def dataProcessing(recordArray):
methodArray = []
wed_thursCost = 5
friCost = 10
record = 0
while record < 300:
method = recordArray[record][4]
methodArray.append(method)
record = record+1
school = countOccurences('S',methodArray)
website = countOccurences('W',methodArray)
if school > website:
school = True
elif school < website:
website = True
dayArray = []
record = 0
while record < 300:
day = recordArray[record][1]
dayArray.append(day)
record = record + 1
fridays = countOccurences('F',dayArray)
wednesdays = countOccurences('W',dayArray)
thursdays = countOccurences('T', dayArray)
totalFriCost = fridays * friCost
totalWedCost = wednesdays * wed_thursCost
totalThurCost = thursdays * wed_thursCost
totalCost = totalFriCost + totalWedCost + totalThurCost
return totalCost,school,website
My first attempt to writing to a csv file
def dataExport(recordArray):
import csv
fridayRecords = []
record = 0
customerIDArray = []
ticketIDArray = []
numberArray = []
methodArray = []
record = 0
while record < 300:
if recordArray[record][1] == 'F':
fridayRecords.append(recordArray[record])
record = record + 1
with open('\Courswork output.csv',"wb") as f:
writer = csv.writer(f)
for record in fridayRecords:
writer.writerows(fridayRecords)
f.close
My second attempt at writing to the CSV file
def write_file(recordArray): # write selected records to a new csv file
CustomerID = []
TicketID = []
Number = []
Method = []
counter = 0
while counter < 300:
if recordArray[counter][2] == 'F':
CustomerID.append(recordArray[counter][0])
TicketID.append(recordArray[counter][1]+recordArray[counter[2]])
Number.append(recordArray[counter][3])
Method.append(recordArray[counter][4])
fridayRecords = [] # a list to contain the lists before writing to file
for x in range(len(CustomerID)):
one_record = CustomerID[x],TicketID[x],Number[x],Method[x]
fridayRecords.append(one_record)
#open file for writing
with open("sample_output.csv", "wb") as f:
#create the csv writer object
writer = csv.writer(f)
#write one row (item) of data at a time
writer.writerows(recordArray)
f.close
counter = counter + 1
#Main Program
recordArray = dataInput()
totalCost,school,website = dataProcessing(recordArray)
write_file(recordArray)
In the function write_file(recordArray) in your second attempt the counter variable counter in the first while loop is never updated so the loop continues for ever until you run out of memory.

How do I fill a column with a user inputted value?

I want to add a date column to data exported to csv file. Luckily the files are identified by the date they represent. However, I can't get the column to fill up with the appropriate user inputted value. Here's the code I have so far:
def read_file():
user_input = raw_input("Please put cost folder with date in form Costmm.dd: ")
path = r'C:\\Users\\CP\\documents\\' + user_input
allFiles = glob.glob(path + '/*.csv')
frame = pd.DataFrame()
frame['Date'] = pd.Series()
frame['Date'] = frame['Date'].astype(object).fillna(user_input)
list = []
for file in allFiles:
df = pd.read_csv(file,index_col=None,header=0)
list.append(df)
frame =pd.concat(list,ignore_index=True)
frame['Date'] = pd.Series()
return(frame)
Any and all help is greatly appreciated!
If I understand what you're after the following should work:
import datetime as dt
def read_file():
user_input = raw_input("Please put cost folder with date in form Costmm.dd: ")
path = r'C:\\Users\\CP\\documents\\' + user_input
allFiles = glob.glob(path + '/*.csv')
l = []
for file in allFiles:
df = pd.read_csv(file,index_col=None,header=0)
l.append(df)
frame =pd.concat(l,ignore_index=True)
frame['Date'] = dt.datetime.strptime(user_input, '%m.%d')
return frame
A few notes:
frame = pd.DataFrame()
frame['Date'] = pd.Series()
frame['Date'] = frame['Date'].astype(object).fillna(user_input)
The above was redundant as you overwrite this with the result of pd.concat anyway.
Don't user list for a variable name of type list use something like df_list.
To convert a string to datetime you can use dt.datetime.strptime and pass the format string.

''str' object does not support item assignment' python 2

So this is my code it is strying tomease car registration plates, start times and end times (In the complete code it would be printed at the bottom).
data = str(list)
sdata = str(list)
edata = str(list)
current = 0
repeats = input ('How many cars do you want to measure?')
def main():
global current
print (current)
print ''
print ''
print '---------------------------------------'
print '---------------------------------------'
print 'Enter the registration number.'
data[current] = raw_input(' ')
print 'Enter the time it passed Camera 1. In this form HH:MM:SS'
sdata[current] = raw_input(' ')
print 'Enter the time it passed Camera 2. In this form HH:MM:SS'
edata[current] = raw_input (' ')
print '---------------------------------------'
print''
print''
print''
print 'The Registration Number is :'
print data[current]
print''
print 'The Start Time Is:'
print sdata[current]
print''
print 'The End Time Is:'
print edata[current]
print''
print''
raw_input('Press enter to confirm.')
print'---------------------------------------'
d = d + 1
s = s + 1
a = a + 1
current = current = 1
while current < repeats:
main()
When I run it and it gets to:
data[current] = raw_input(' ')
I get the error message 'TypeError: 'str' object does not support item assignment'
Thank you in advance for the help. :D
The error is clear. str object does not support item assignment
Strings in python are immutable. You have converted the data to a string when you do
data = str(list)
So, by
data["current"] = raw_input()
you are trying to assign some value to a string, which is not supported in python.
If you want data to be a list,
data = list()
or
data = []
will help, thus preventing the error
Dont use str during assignment
data = str(list)
sdata = str(list)
edata = str(list)
Instead use
data = []
sdata = []
edata = []
and later while printing use str if u want
print str(data[current])
as aswin said its immutable so dont complex it

Printing Results from Loops

I currently have a piece of code that works in two segments. The first segment opens the existing text file from a specific path on my local drive and then arranges, based on certain indices, into a list of sub list. In the second segment I take the sub-lists I have created and group them on a similar index to simplify them (starts at def merge_subs). I am getting no error code but I am not receiving a result when I try to print the variable answer. Am I not correctly looping the original list of sub-lists? Ultimately I would like to have a variable that contains the final product from these loops so that I may write the contents of it to a new text file. Here is the code I am working with:
from itertools import groupby, chain
from operator import itemgetter
with open ("somepathname") as g:
# reads text from lines and turns them into a list sub-lists
lines = g.readlines()
for line in lines:
matrix = line.split()
JD = matrix [2]
minTime= matrix [5]
maxTime= matrix [7]
newLists = [JD,minTime,maxTime]
L = newLists
def merge_subs(L):
dates = {}
for sub in L:
date = sub[0]
if date not in dates:
dates[date] = []
dates[date].extend(sub[1:])
answer = []
for date in sorted(dates):
answer.append([date] + dates[date])
new code
def openfile(self):
filename = askopenfilename(parent=root)
self.lines = open(filename)
def simplify(self):
g = self.lines.readlines()
for line in g:
matrix = line.split()
JD = matrix[2]
minTime = matrix[5]
maxTime = matrix[7]
self.newLists = [JD, minTime, maxTime]
print(self.newLists)
dates = {}
for sub in self.newLists:
date = sub[0]
if date not in dates:
dates[date] = []
dates[date].extend(sub[1:])
answer = []
for date in sorted(dates):
print(answer.append([date] + dates[date]))
enter code here
enter code here

Attempting to Obtain Taxonomic Information from Biopython

I am attempting to alter a previous script that utilizes biopython to fetch information about a species phylum. This script was written to retrieve information one species at a time. I would like to modify the script so that I can do this for 100 organisms at a time.
Here is the initial code
import sys
from Bio import Entrez
def get_tax_id(species):
"""to get data from ncbi taxomomy, we need to have the taxid. we can
get that by passing the species name to esearch, which will return
the tax id"""
species = species.replace(" ", "+").strip()
search = Entrez.esearch(term = species, db = "taxonomy", retmode = "xml")
record = Entrez.read(search)
return record['IdList'][0]
def get_tax_data(taxid):
"""once we have the taxid, we can fetch the record"""
search = Entrez.efetch(id = taxid, db = "taxonomy", retmode = "xml")
return Entrez.read(search)
Entrez.email = ""
if not Entrez.email:
print "you must add your email address"
sys.exit(2)
taxid = get_tax_id("Erodium carvifolium")
data = get_tax_data(taxid)
lineage = {d['Rank']:d['ScientificName'] for d in
data[0]['LineageEx'] if d['Rank'] in ['family', 'order']}
I have managed to modify the script so that it accepts a local file that contains one of the organisms I am using. But I need to extend this to a 100 organisms.
So the idea was to generate a list from the file of my organisms and somehow separately fed each item generated from the list into the line taxid = get_tax_id("Erodium carvifolium") and replace "Erodium carvifolium" with my organisms name. But I have no idea how to do that.
Here is the sample version of the code with some of my adjustments
import sys
from Bio import Entrez
def get_tax_id(species):
"""to get data from ncbi taxomomy, we need to have the taxid. we can
get that by passing the species name to esearch, which will return
the tax id"""
species = species.replace(' ', "+").strip()
search = Entrez.esearch(term = species, db = "taxonomy", retmode = "xml")
record = Entrez.read(search)
return record['IdList'][0]
def get_tax_data(taxid):
"""once we have the taxid, we can fetch the record"""
search = Entrez.efetch(id = taxid, db = "taxonomy", retmode = "xml")
return Entrez.read(search)
Entrez.email = ""
if not Entrez.email:
print "you must add your email address"
sys.exit(2)
list = ['Helicobacter pylori 26695', 'Thermotoga maritima MSB8', 'Deinococcus radiodurans R1', 'Treponema pallidum subsp. pallidum str. Nichols', 'Aquifex aeolicus VF5', 'Archaeoglobus fulgidus DSM 4304']
i = iter(list)
item = i.next()
for item in list:
???
taxid = get_tax_id(?)
data = get_tax_data(taxid)
lineage = {d['Rank']:d['ScientificName'] for d in
data[0]['LineageEx'] if d['Rank'] in ['phylum']}
print lineage, taxid
The question marks refer to places where I am stumped as what to do next. I don't see how I can connect my loop to replace the ? in get_tax_id(?). Or do I need to somehow append each of the items in the list so that they are modified each time to contain get_tax_id(Helicobacter pylori 26695) and then find some way to place them in the line containing taxid =
Here's what you need, place this below your function definitions, i.e. after the line that says: sys.exit(2)
species_list = ['Helicobacter pylori 26695', 'Thermotoga maritima MSB8', 'Deinococcus radiodurans R1', 'Treponema pallidum subsp. pallidum str. Nichols', 'Aquifex aeolicus VF5', 'Archaeoglobus fulgidus DSM 4304']
taxid_list = [] # Initiate the lists to store the data to be parsed in
data_list = []
lineage_list = []
print('parsing taxonomic data...') # message declaring the parser has begun
for species in species_list:
print ('\t'+species) # progress messages
taxid = get_tax_id(species) # Apply your functions
data = get_tax_data(taxid)
lineage = {d['Rank']:d['ScientificName'] for d in data[0]['LineageEx'] if d['Rank'] in ['phylum']}
taxid_list.append(taxid) # Append the data to lists already initiated
data_list.append(data)
lineage_list.append(lineage)
print('complete!')