summing up a column in a csv file based on user search - python-2.7

I have the following csv file:
data.cvs
school,students,teachers,subs
us-school1,10,2,0
us-school2,20,4,2
uk-school1,10,2,0
de-school1,10,3,1
de-school1,15,3,3
I am trying to have a user search for the school country (us or uk, or de)
and then sum up the corresponding column. (e.g. sum all students in us-* etc.)
So far i am able to search using the raw_input and display column contents corresponding to the country, appreciate if someone can give me some pointers on how i can achive this.
desired output:
Country: us
Total students: 30
Total teachers: 6
Total subs: 2
--
import csv
import re
search = raw_input('Enter school (e.g. us: ')
with open('data.csv') as csvfile:
reader = csv.DictReader(csvfile)
for row in reader:
school = row['school']
students = row['students']
teachers = row['teachers']
sub = row['subs']
if re.match(search, schools) is not None:
print students

That's relatively easy to do - all you need is a dict to hold group your countries, and then just add together all of the values:
import collections
import csv
result = {} # store the results
with open("data.csv", "rb") as f: # open our file
reader = csv.DictReader(f) # use csv.DictReader for convenience
for row in reader:
country = row.pop("school")[:2] # get our country
result[country] = result.get(country, collections.defaultdict(int)) # country group
for column in row: # loop through all other columns
result[country][column] += int(row[column]) # add them together
# Now you can use or print your result by country:
for country in result:
print("Country: {}".format(country))
print("Total students: {}".format(result[country].get("students", 0)))
print("Total teachers: {}".format(result[country].get("teachers", 0)))
print("Total subs: {}\n".format(result[country].get("subs", 0)))
This is also universal as you can add additional number columns (e.g. janitors :D) and it will happily sum them together, but keep in mind that it works only with integers (if you want floats, replace the references to int with float) and it expects that every field except school is a number.

Your problem could be solved with something like this:
import csv
search = raw_input('Enter school (e.g. us: ')
with open('data.csv') as csvfile:
reader = csv.DictReader(csvfile)
result_countrys = {}
for row in reader:
students = int(row['students'])
teachers = int(row['teachers'])
subs = int(row['subs'])
subs = row['subs']
country = school[: 2]
if country in result_countrys:
count = result_countrys[country]
count['students'] = count['students'] + students
count['teachers'] = count['teachers'] + teachers
count['subs'] = count['subs'] + subs
else :
dic = {}
dic['students'] = students
dic['teachers'] = teachers
dic['subs'] = subs
result_countrys[country] = dic
for k, v in result_countrys[search].iteritems():
print("country " + str(search) + " has " + str(v) + " " + str(k))
I tryed out with this set of values:
reader = [{'school': 'us-school1', 'students': 20, 'teachers': 6, 'subs': 2}, {'school': 'us-school2', 'students': 20, 'teachers': 6, 'subs': 2}, {'school': 'uk-school1', 'students': 20, 'teachers': 6, 'subs': 2}]
and the result is:
Enter school (e.g. us): us
country us has 30 students
country us has 6 teachers
country us has 2 subs

Related

List to Dictionary - multiple values to key

I am very new to coding and seeking guidance on below...
I have a csv output currently like this:
'Age, First Name, Last Name, Mark'
'21, John, Smith, 68'
'16, Alex, Jones, 52'
'42, Michael, Carpenter, 92 '
How do I create a dictionary that will end up looking like this:
dictionary = {('age' : 'First Name', 'Mark'), ('21' : 'John', '68'), etc}
I would like the first value to be the key - and only want two other values, and I'm having difficulty finding ways to approach this.
So far I've got
data = open('test.csv', 'r').read().split('\n')
I've tried to split each part into a string
for row in data:
x = row.split(',')
EDIT:
Thank you for those who have gave some input into solving my problem.
So after using
myDic = {}
for row in data:
tmpLst = row.split(",")
key = tmpLst[0]
value = (tmpLst[1], tmpLst[-1])
myDic[key] = value
my data came out as
['Age', 'First Name', 'Last Name', 'Mark']
['21', 'John', 'Smith', '68']
['16', 'Alex', 'Jones', '52']
['42', 'Michael', 'Carpenter', '92']
But get an IndexError: list index out of range at the line
value = (tmpLst[1], tmpLst[-1])
even though I can see that it should be within the range of the index.
Does anyone know why this error is coming up or what needs to be changed?
Assuming an actual valid CSV file that looks like this:
Age,First Name,Last Name,Mark
21,John,Smith,68
16,Alex,Jones,52
42,Michael,Carpenter,92
the following code should do what you want:
from __future__ import print_function
import csv
with open('test.csv') as csv_file:
reader = csv.reader(csv_file)
d = { row[0]: (row[1], row[3]) for row in reader }
print(d)
# Output:
# {'Age': ('First Name', 'Mark'), '16': ('Alex', '52'), '21': ('John', '68'), '42': ('Michael', '92')}
If d = { row[0]: (row[1], row[3]) for row in reader } is confusing, consider this alternative:
d = {}
for row in reader:
d[row[0]] = (row[1], row[3])
I guess you want output like this:
dictionary = {'age' : ('First Name', 'Mark')}
Then you can use the following code:
myDic = {}
for row in data:
tmpLst = row.split(",")
key = tmpLst[0]
value = (tmpLst[1], tmpLst[-1])
myDic[key] = value

Assigning variable from csv file based on header using python

I have a csv file that has a single record in it that I need to assign variables to and run through a python script. My date looks like the line below. Has header.
"CompanyName","Contact","Street","CityZip","Store","DateRec","apples","appQuan","oranges","orgQuan","peaches","peaQuan","pumpkins","pumQuan","Receive",0
American Grocers","Allison Smith","456 1st. Street","Podunk, California 00990","Store 135 Order","05/14/2015",1,10,0,4,1,4,2,0
Each value needs to be assigned a variable
1st position, "American Grocers" = CompanyName
2nd position, "Allison Smith" = Contact
3rd position = Street, etc.
After the date it gets tricky. The last 11 values are related to each other and get saved to a key.
If value 7 = 1, then variable 7 = "apples" and variable 8 = 10, else skip values 7 and 8 and go to 9
If value 9 = 1, then variable 9 = "oranges" and value 10 = the variable in position 10 (4), else skip values 9 and 10 and go to 11
If value 11 = 1, then variable 11 = "peaches" and value 12 = the variable in position 10 (4), else skip values 11 and 12 and go to 13
If value 13 = 1, then variable 13 = "pumkins" and value 13 = the variable in position 13 (2), else skip values 13 and 14
If value 15 = 1 then variable 15 = "Delivery", else variable = "Pick up"
thus python would assign the following:
CompanyName = "American Grocers"
Contact = "Allison Smith"
Street = "456 1st. Street"
CityZip = "Podunk, California 00990"
Store = "Store 135 Order"
OrderDate (does not need to be date type) = "05/14/2015"
orderList = {"apples" : 10, "peaches" : 4, "pumpkins" : 2}
Recieve = "Pick up"
I need to manipulate these variables further along in the script.
I have the following code to which outputs the data to its corresponding header information.
import csv
MY_FILE = "C:\\tests\\DataRequestsData\\qryFruit.csv"
def parse(raw_file, delimiter):
opened_file = open(raw_file)
csv_data = csv.reader(opened_file, delimiter=delimiter)
parsed_data = []
fields = csv_data.next()
for row in csv_data:
parsed_data.append(dict(zip(fields, row)))
opened_file.close()
return parsed_data
def main():
new_data = parse(MY_FILE, ",")
print new_data
if __name__ == "__main__":
main()
Output looks like this. (I am not sure why the output is not in the same order as the file ...)
[{'DateRec': '05/14/2015', 'orgQuan': '4', 'CompanyName': 'American Grocers', 'appQuan': '10', 'peaQuan': '4', 'oranges': '0', 'peaches': '1', 'Contact': 'Allison Smith', 'CityZip': 'Podunk, California 00990', 'pumpkins': '2', 'apples': '1', 'pumQuan': '0', 'Store': 'Store 135 Order', 'Street': '456 1st. Street'}]
I do not know how to take this and get the variables assigned as listed above. Suggestions? Using python 2.7
I am not sure why the output is not in the same order as the file ...
In a Python dictionary, entries are displayed an arbitrary order.
Below is the general contour of how to parse the program. The detailed logic: "if this field is this do this otherwise do that" is something that I hope you can do on your own. The specifics are way too long, detailed, and specific to be of interest of value for anyone else and I'm guessing that is why there hasn't been much interest in this.
import csv
MY_FILE = "C:\\tests\\DataRequestsData\\qryFruit.csv"
def parse(raw_file, delimiter):
parsed_data = []
with open(raw_file) as opened_file:
rec = {}
csv_data = csv.reader(opened_file, delimiter=delimiter)
fields = csv_data.next()
for row in csv_data:
for i, val in enumerate(row[0:6]):
rec[fields[i]] = val
# This part below is too specific, long, and complicated
# that it is doubtful filling this out in detail will be use
# or interest to anyone else on stackoverflow. But to give
# you an idea of how to proceed...
if row[6] == '1':
rec[fields[6]] = 'apples'
rec[fields[7]] = 10
else:
# continue
pass
# ...
parsed_data.append(rec)
return parsed_data
def main():
new_data = parse(MY_FILE, ",")
print new_data
if __name__ == "__main__":
main()
I finally accomplished what I was looking for with the following code: Thank you to all who offered help. Your ideas spurred me onto the tangents I needed.
import csv
MY_FILE = csv.reader(open("C:\\tests\\DataRequestsData\\qryFruit.csv", "rb"))
for row in MY_FILE:
CompanyName, Contact, Street, CityZip, Store, DateRec, apples, appQuan, oranges, orgQuan, peaches, peaQuan, pumpkins, pumQuan, Receive = row
s='{'
if apples == "1":
s = s + '"apples"' + ":" + appQuan
if oranges == "1":
s = s + '", "oranges"' + ":" + orgQuan
if peaches == "1":
s = s + '", "peaches"' + ":" + peaQuan
if pumpkins == "1":
s = s + '", "pumpkins"' + ":" + pumQuan
s = s + '}'
if Receive == "0":
Receive = "Pick up"
else:
Receive = "Deliver"

Printing Results from Loops

I currently have a piece of code that works in two segments. The first segment opens the existing text file from a specific path on my local drive and then arranges, based on certain indices, into a list of sub list. In the second segment I take the sub-lists I have created and group them on a similar index to simplify them (starts at def merge_subs). I am getting no error code but I am not receiving a result when I try to print the variable answer. Am I not correctly looping the original list of sub-lists? Ultimately I would like to have a variable that contains the final product from these loops so that I may write the contents of it to a new text file. Here is the code I am working with:
from itertools import groupby, chain
from operator import itemgetter
with open ("somepathname") as g:
# reads text from lines and turns them into a list sub-lists
lines = g.readlines()
for line in lines:
matrix = line.split()
JD = matrix [2]
minTime= matrix [5]
maxTime= matrix [7]
newLists = [JD,minTime,maxTime]
L = newLists
def merge_subs(L):
dates = {}
for sub in L:
date = sub[0]
if date not in dates:
dates[date] = []
dates[date].extend(sub[1:])
answer = []
for date in sorted(dates):
answer.append([date] + dates[date])
new code
def openfile(self):
filename = askopenfilename(parent=root)
self.lines = open(filename)
def simplify(self):
g = self.lines.readlines()
for line in g:
matrix = line.split()
JD = matrix[2]
minTime = matrix[5]
maxTime = matrix[7]
self.newLists = [JD, minTime, maxTime]
print(self.newLists)
dates = {}
for sub in self.newLists:
date = sub[0]
if date not in dates:
dates[date] = []
dates[date].extend(sub[1:])
answer = []
for date in sorted(dates):
print(answer.append([date] + dates[date]))
enter code here
enter code here

Importing and analysing text data using Python 2.7

I have created code in Python 2.7 which saves sales data for various products into a text file using the write() method. My limited Python skills have hit the wall with the next step - I need code which can read this data from the text file and then calculate and display the mean average number of sales of each item. The data is stored in the text file like the data shown below (but I am able to format it differently if that would help).
Product A,30
Product B,26
Product C,4
Product A,40
Product B,18
Product A,31
Product B,13
Product C,3
After far too long Googling around this to no avail, any pointers on the best way to manage this would be greatly appreciated. Thanks in advance.
You can read from the file, then split each line by a space (' '). Then, it is just a matter of creating a dictionary, and appending each new item to a list which is the value for each letter key, then using sum and len to get the average.
Example
products = {}
with open("myfile.txt") as product_info:
data = product_info.read().split('\n') #Split by line
for item in data:
_temp = item.split(' ')[1].split(',')
if _temp[0] not in products.keys():
products[_temp[0]] = [_temp[1]]
else:
products[_temp[0]] = products[_temp[0]]+[_temp[1]]
product_list = [[item, float(sum(key))/len(key)] for item, key in d.items()]
product_list.sort(key=lambda x:x[0])
for item in product_list:
print 'The average of {} is {}'.format(item[0], item[1])
from __future__ import division
dict1 = {}
dict2 = {}
file1 = open("input.txt",'r')
for line in file1:
if len(line)>2:
data = line.split(",")
a,b = data[0].strip(),data[1].strip()
if a in dict1:
dict1[a] = dict1[a] + int(b)
else:
dict1[a] = int(b)
if a in dict2:
dict2[a] = dict2[a] + 1
else:
dict2[a] = 1
for k,v in dict1.items():
for m,n in dict2.items():
if k == m:
avg = float(v/n)
print "%s Average is: %0.6f"%(k,float(avg))
Output:
Product A Average is: 33.666667
Product B Average is: 19.000000
Product C Average is: 3.500000

Adding 2 list inside a dictionnary

I've been trying to add the number of 2 list inside a dictionnary. The thing is, I need to verify if the value in the selected row and column is already in the dictionnary, if so I want to add the double entry list to the value (another double entry list) already existing in the dictionnary. I'm using a excel spreadsheet + xlrd so i can read it up. I' pretty new to this.
For exemple, the code is checking the account (a number) in the specified row and columns, let's say the value is 10, then if it's not in the dictionnary, it add the 2 values corresponding to this count, let's say [100, 0] as a value to this key. This is working as intended.
Now, the hard part is when the account number is already in the dictionnary. Let's say its the second entry for the account number 10. and it's [50, 20]. I want the value associated to the key "10" to be [150, 20].
I've tried the zip method but it seems to return radomn result, Sometimes it adds up, sometime it doesn't.
import xlrd
book = xlrd.open_workbook("Entry.xls")
print ("The number of worksheets is", book.nsheets)
print ("Worksheet name(s):", book.sheet_names())
sh = book.sheet_by_index(0)
print (sh.name,"Number of rows", sh.nrows,"Number of cols", sh.ncols)
liste_compte = {}
for rx in range(4, 10):
if (sh.cell_value(rowx=rx, colx=4)) not in liste_compte:
liste_compte[((sh.cell_value(rowx=rx, colx=4)))] = [sh.cell_value(rowx=rx, colx=6), sh.cell_value(rowx=rx, colx=7)]
elif (sh.cell_value(rowx=rx, colx=4)) in liste_compte:
three = [x + y for x, y in zip(liste_compte[sh.cell_value(rowx=rx, colx=4)],[sh.cell_value(rowx=rx, colx=6), sh.cell_value(rowx=rx, colx=7)])]
liste_compte[(sh.cell_value(rowx=rx, colx=4))] = three
print (liste_compte)
I'm not going to directly untangle your code, but just help you with a general example that does what you want:
def update_balance(existing_balance, new_balance):
for column in range(len(existing_balance)):
existing_balance[column] += new_balance[column]
def update_account(accounts, account_number, new_balance):
if account_number in accounts:
update_balance(existing_balance = accounts[account_number], new_balance = new_balance)
else:
accounts[account_number] = new_balance
And finally you'd do something like (assuming your xls looks like [account_number, balance 1, balance 2]:
accounts = dict()
for row in xls:
update_account(accounts = accounts,
account_number = row[0],
new_balance = row[1:2])