List to Dictionary - multiple values to key - python-2.7

I am very new to coding and am seeking guidance on the problem below.
I have a csv output currently like this:
'Age, First Name, Last Name, Mark'
'21, John, Smith, 68'
'16, Alex, Jones, 52'
'42, Michael, Carpenter, 92 '
How do I create a dictionary that will end up looking like this:
dictionary = {('age' : 'First Name', 'Mark'), ('21' : 'John', '68'), etc}
I would like the first value to be the key - and only want two other values, and I'm having difficulty finding ways to approach this.
So far I've got
data = open('test.csv', 'r').read().split('\n')
I've tried to split each part into a string
for row in data:
    x = row.split(',')
EDIT:
Thank you to those who have given some input into solving my problem.
So after using
myDic = {}
for row in data:
    tmpLst = row.split(",")
    key = tmpLst[0]
    value = (tmpLst[1], tmpLst[-1])
    myDic[key] = value
my data came out as
['Age', 'First Name', 'Last Name', 'Mark']
['21', 'John', 'Smith', '68']
['16', 'Alex', 'Jones', '52']
['42', 'Michael', 'Carpenter', '92']
But I get an IndexError: list index out of range at the line
value = (tmpLst[1], tmpLst[-1])
even though I can see that it should be within the range of the index.
Does anyone know why this error is coming up or what needs to be changed?

Assuming an actual valid CSV file that looks like this:
Age,First Name,Last Name,Mark
21,John,Smith,68
16,Alex,Jones,52
42,Michael,Carpenter,92
the following code should do what you want:
from __future__ import print_function
import csv
with open('test.csv') as csv_file:
    reader = csv.reader(csv_file)
    d = { row[0]: (row[1], row[3]) for row in reader }
print(d)
# Output:
# {'Age': ('First Name', 'Mark'), '16': ('Alex', '52'), '21': ('John', '68'), '42': ('Michael', '92')}
If d = { row[0]: (row[1], row[3]) for row in reader } is confusing, consider this alternative:
d = {}
for row in reader:
    d[row[0]] = (row[1], row[3])

I guess you want output like this:
dictionary = {'age' : ('First Name', 'Mark')}
Then you can use the following code:
myDic = {}
for row in data:
    tmpLst = row.split(",")
    key = tmpLst[0]
    value = (tmpLst[1], tmpLst[-1])
    myDic[key] = value
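As for the IndexError in the question's edit: a likely cause (an assumption, since the file itself is not shown) is a trailing newline at the end of the file. In that case read().split('\n') produces a final empty string, and ''.split(',') gives [''], which has no index 1. A minimal sketch that skips blank lines and strips the stray spaces around each field, using an inline string in place of the file:

```python
# Stand-in for open('test.csv').read(); note the trailing newline at the end.
raw = (
    "Age, First Name, Last Name, Mark\n"
    "21, John, Smith, 68\n"
    "16, Alex, Jones, 52\n"
    "42, Michael, Carpenter, 92 \n"
)

data = raw.split('\n')  # the last element is '' because of the trailing newline

my_dic = {}
for row in data:
    if not row.strip():
        continue  # skip blank lines: ''.split(',') == [''] has no index 1
    fields = [field.strip() for field in row.split(',')]
    my_dic[fields[0]] = (fields[1], fields[-1])
```

The csv module approach in the other answer avoids this case because csv.reader yields only real rows for a file ending in a single newline.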

Related

summing up a column in a csv file based on user search

I have the following csv file:
data.csv
school,students,teachers,subs
us-school1,10,2,0
us-school2,20,4,2
uk-school1,10,2,0
de-school1,10,3,1
de-school1,15,3,3
I am trying to have a user search for the school country (us or uk, or de)
and then sum up the corresponding column. (e.g. sum all students in us-* etc.)
So far I am able to search using raw_input and display the column contents corresponding to the country. I would appreciate it if someone could give me some pointers on how I can achieve this.
desired output:
Country: us
Total students: 30
Total teachers: 6
Total subs: 2
--
import csv
import re
search = raw_input('Enter school (e.g. us): ')
with open('data.csv') as csvfile:
    reader = csv.DictReader(csvfile)
    for row in reader:
        school = row['school']
        students = row['students']
        teachers = row['teachers']
        sub = row['subs']
        if re.match(search, school) is not None:
            print students
That's relatively easy to do - all you need is a dict to group your countries, and then you just add together all of the values:
import collections
import csv
result = {}  # store the results
with open("data.csv", "rb") as f:  # open our file
    reader = csv.DictReader(f)  # use csv.DictReader for convenience
    for row in reader:
        country = row.pop("school")[:2]  # get our country
        result[country] = result.get(country, collections.defaultdict(int))  # country group
        for column in row:  # loop through all other columns
            result[country][column] += int(row[column])  # add them together
# Now you can use or print your result by country:
for country in result:
    print("Country: {}".format(country))
    print("Total students: {}".format(result[country].get("students", 0)))
    print("Total teachers: {}".format(result[country].get("teachers", 0)))
    print("Total subs: {}\n".format(result[country].get("subs", 0)))
This is also universal: you can add additional numeric columns (e.g. janitors :D) and it will happily sum them as well. Keep in mind that it works only with integers (if you want floats, replace the references to int with float) and that it expects every field except school to be a number.
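If only the totals for the searched country are needed, the same idea can be narrowed down. The sketch below is one possible variation, not part of the answer above: totals_for is a hypothetical helper name, and the rows are passed as a list of lines (csv.DictReader accepts any iterable of strings) so the example does not depend on a file.

```python
import csv
from collections import defaultdict

# Sample rows copied from the question, standing in for data.csv.
lines = [
    "school,students,teachers,subs",
    "us-school1,10,2,0",
    "us-school2,20,4,2",
    "uk-school1,10,2,0",
    "de-school1,10,3,1",
    "de-school1,15,3,3",
]

def totals_for(country, rows):
    """Sum every numeric column for schools whose name starts with 'country-'."""
    totals = defaultdict(int)
    for row in csv.DictReader(rows):
        school = row.pop("school")
        if school.startswith(country + "-"):
            for column, value in row.items():
                totals[column] += int(value)
    return dict(totals)

us_totals = totals_for("us", lines)
```

Here us_totals holds the three sums for the "us" schools; passing the value read from raw_input instead of "us" would give the search behaviour from the question.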
Your problem could be solved with something like this:
import csv
search = raw_input('Enter school (e.g. us): ')
with open('data.csv') as csvfile:
    reader = csv.DictReader(csvfile)
    result_countrys = {}
    for row in reader:
        school = row['school']
        students = int(row['students'])
        teachers = int(row['teachers'])
        subs = int(row['subs'])
        country = school[:2]
        if country in result_countrys:
            count = result_countrys[country]
            count['students'] = count['students'] + students
            count['teachers'] = count['teachers'] + teachers
            count['subs'] = count['subs'] + subs
        else:
            dic = {}
            dic['students'] = students
            dic['teachers'] = teachers
            dic['subs'] = subs
            result_countrys[country] = dic
    for k, v in result_countrys[search].iteritems():
        print("country " + str(search) + " has " + str(v) + " " + str(k))
I tried it out with this set of values:
reader = [{'school': 'us-school1', 'students': 20, 'teachers': 6, 'subs': 2}, {'school': 'us-school2', 'students': 20, 'teachers': 6, 'subs': 2}, {'school': 'uk-school1', 'students': 20, 'teachers': 6, 'subs': 2}]
and the result is:
Enter school (e.g. us): us
country us has 30 students
country us has 6 teachers
country us has 2 subs

using csv with python 2.7, working with DictReader

I've been trying to wrap my head around this for a while now. I am trying to take a csv file, extract all rows, concatenate 2 values, and use those 2 values to calculate the distance from a third value that is separate from the csv. I then need to store the distance with the correct data from the csv. Finally, I need to find the shortest distance and return a dict with all the values I have not used yet.
with open(filename,'r') as csvfile:
    reader = csv.DictReader(csvfile)
    #create a multi-dimensional dictionary with the store name as keyword
    new_dict = {}
    try:
        for row in reader:
            new_dict[row['name']] = {}
            new_dict[row['name']]['name'] = row['name']
            new_dict[row['name']]['dist'] = {}
            new_dict[row['name']]['address'] = row['address']
            new_dict[row['name']]['city'] = row['city']
            new_dict[row['name']]['state'] = row['state']
            new_dict[row['name']]['zip'] = row['zip']
            latt = str(row['latitude'])
            longi = str(row['longitude'])
            #concatenate latt and longi for use in great-circle distance calculation
            store_loc = latt + ',' + longi
            #add distance from usr_loc for each store to dict for each store
            new_dict[row['name']]['dist'] = str(calc_dist(usr_loc, store_loc))
I finally got this part fixed, now I need help filtering out all but the closest result for 'dist'... I cannot seem to wrap my head around this for some reason. Any help would be greatly appreciated.
---EDIT---
Updated code that is working now. This produces a multidimensional dict as follows:
{'CONTINENTAL ': {'city': 'TOPEKA', 'dist': '50.3131329882', 'name': 'CONTINENTAL PHARMACY LLC', 'zip': '66603', 'state': 'KS', 'address': '821 SW 6TH AVE'}, 'DILLON ': {'city': 'TOPEKA', 'dist': '48.3573823197', 'name': 'DILLON PHARMACY', 'zip': '66605', 'state': 'KS', 'address': '2010 SE 29TH ST'}}
There are a lot more entries in the dict, I just need to filter down to the closest location and return only the values for that location.
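One way to pick the closest location from a dict shaped like the one above is min() with a key function. This is only a sketch under the assumption that every 'dist' value is a string that float() can parse; the variable names stores, closest_key and closest are made up for illustration.

```python
# Two entries copied from the example output above.
stores = {
    'CONTINENTAL ': {'city': 'TOPEKA', 'dist': '50.3131329882',
                     'name': 'CONTINENTAL PHARMACY LLC', 'zip': '66603',
                     'state': 'KS', 'address': '821 SW 6TH AVE'},
    'DILLON ': {'city': 'TOPEKA', 'dist': '48.3573823197',
                'name': 'DILLON PHARMACY', 'zip': '66605',
                'state': 'KS', 'address': '2010 SE 29TH ST'},
}

# Compare the distances numerically; as strings, '100.0' would sort before '48.3'.
closest_key = min(stores, key=lambda name: float(stores[name]['dist']))
closest = stores[closest_key]  # the inner dict for the nearest store
```

Storing the distance as a float in the first place, rather than wrapping it in str(), would make the float() conversion unnecessary.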

Assigning variable from csv file based on header using python

I have a csv file that has a single record in it that I need to assign variables to and run through a python script. My data looks like the lines below. It has a header.
"CompanyName","Contact","Street","CityZip","Store","DateRec","apples","appQuan","oranges","orgQuan","peaches","peaQuan","pumpkins","pumQuan","Receive",0
"American Grocers","Allison Smith","456 1st. Street","Podunk, California 00990","Store 135 Order","05/14/2015",1,10,0,4,1,4,2,0
Each value needs to be assigned a variable
1st position, "American Grocers" = CompanyName
2nd position, "Allison Smith" = Contact
3rd position = Street, etc.
After the date it gets tricky. The last 11 values are related to each other and get saved to a key.
If value 7 = 1, then variable 7 = "apples" and variable 8 = 10, else skip values 7 and 8 and go to 9
If value 9 = 1, then variable 9 = "oranges" and value 10 = the variable in position 10 (4), else skip values 9 and 10 and go to 11
If value 11 = 1, then variable 11 = "peaches" and value 12 = the variable in position 12 (4), else skip values 11 and 12 and go to 13
If value 13 = 1, then variable 13 = "pumpkins" and value 13 = the variable in position 13 (2), else skip values 13 and 14
If value 15 = 1 then variable 15 = "Delivery", else variable = "Pick up"
thus python would assign the following:
CompanyName = "American Grocers"
Contact = "Allison Smith"
Street = "456 1st. Street"
CityZip = "Podunk, California 00990"
Store = "Store 135 Order"
OrderDate (does not need to be date type) = "05/14/2015"
orderList = {"apples" : 10, "peaches" : 4, "pumpkins" : 2}
Receive = "Pick up"
I need to manipulate these variables further along in the script.
I have the following code, which outputs the data mapped to its corresponding header information.
import csv
MY_FILE = "C:\\tests\\DataRequestsData\\qryFruit.csv"
def parse(raw_file, delimiter):
    opened_file = open(raw_file)
    csv_data = csv.reader(opened_file, delimiter=delimiter)
    parsed_data = []
    fields = csv_data.next()
    for row in csv_data:
        parsed_data.append(dict(zip(fields, row)))
    opened_file.close()
    return parsed_data
def main():
    new_data = parse(MY_FILE, ",")
    print new_data
if __name__ == "__main__":
    main()
Output looks like this. (I am not sure why the output is not in the same order as the file ...)
[{'DateRec': '05/14/2015', 'orgQuan': '4', 'CompanyName': 'American Grocers', 'appQuan': '10', 'peaQuan': '4', 'oranges': '0', 'peaches': '1', 'Contact': 'Allison Smith', 'CityZip': 'Podunk, California 00990', 'pumpkins': '2', 'apples': '1', 'pumQuan': '0', 'Store': 'Store 135 Order', 'Street': '456 1st. Street'}]
I do not know how to take this and get the variables assigned as listed above. Suggestions? Using python 2.7
I am not sure why the output is not in the same order as the file ...
In a Python 2.7 dictionary, entries are displayed in an arbitrary order.
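If the file order matters, collections.OrderedDict (available since Python 2.7) remembers insertion order. A minimal sketch, using a shortened three-column sample instead of the real file; the sample lines and the name records are only for illustration:

```python
import csv
from collections import OrderedDict

# Shortened stand-in for the real file: a header line plus one record.
lines = [
    '"CompanyName","Contact","Street"',
    '"American Grocers","Allison Smith","456 1st. Street"',
]

reader = csv.reader(lines)  # csv.reader accepts any iterable of lines
fields = next(reader)       # first row is the header
records = [OrderedDict(zip(fields, row)) for row in reader]
```

Each record then keeps its keys in header order, so printing or iterating follows the column order of the file.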
Below is the general contour of how to parse the file. The detailed logic ("if this field is this, do this, otherwise do that") is something that I hope you can do on your own. The specifics are way too long, detailed, and specific to be of interest or value to anyone else, and I'm guessing that is why there hasn't been much interest in this.
import csv
MY_FILE = "C:\\tests\\DataRequestsData\\qryFruit.csv"
def parse(raw_file, delimiter):
    parsed_data = []
    with open(raw_file) as opened_file:
        csv_data = csv.reader(opened_file, delimiter=delimiter)
        fields = csv_data.next()
        for row in csv_data:
            rec = {}  # fresh dict per row, so records do not share one object
            for i, val in enumerate(row[0:6]):
                rec[fields[i]] = val
            # This part below is too specific, long, and complicated;
            # it is doubtful that filling it out in detail will be of use
            # or interest to anyone else on stackoverflow. But to give
            # you an idea of how to proceed...
            if row[6] == '1':
                rec[fields[6]] = 'apples'
                rec[fields[7]] = int(row[7])  # quantity taken from the row (10 in the sample)
            else:
                # continue
                pass
            # ...
            parsed_data.append(rec)
    return parsed_data
def main():
    new_data = parse(MY_FILE, ",")
    print new_data
if __name__ == "__main__":
    main()
I finally accomplished what I was looking for with the following code: Thank you to all who offered help. Your ideas spurred me onto the tangents I needed.
import csv
MY_FILE = csv.reader(open("C:\\tests\\DataRequestsData\\qryFruit.csv", "rb"))
for row in MY_FILE:
    CompanyName, Contact, Street, CityZip, Store, DateRec, apples, appQuan, oranges, orgQuan, peaches, peaQuan, pumpkins, pumQuan, Receive = row
    s='{'
    if apples == "1":
        s = s + '"apples"' + ":" + appQuan
    if oranges == "1":
        s = s + '", "oranges"' + ":" + orgQuan
    if peaches == "1":
        s = s + '", "peaches"' + ":" + peaQuan
    if pumpkins == "1":
        s = s + '", "pumpkins"' + ":" + pumQuan
    s = s + '}'
    if Receive == "0":
        Receive = "Pick up"
    else:
        Receive = "Deliver"
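An alternative to assembling the order as a string is to build a real dict, which is what the orderList in the question looks like. The sketch below is not the poster's code. The data row is a hypothetical sample (the row in the question is incomplete), arranged so that it reproduces the desired orderList, and FRUITS, record, order_list and receive are names invented for the example.

```python
import csv

# Header from the question plus a HYPOTHETICAL data row with all 15 fields.
lines = [
    '"CompanyName","Contact","Street","CityZip","Store","DateRec",'
    '"apples","appQuan","oranges","orgQuan","peaches","peaQuan",'
    '"pumpkins","pumQuan","Receive"',
    '"American Grocers","Allison Smith","456 1st. Street",'
    '"Podunk, California 00990","Store 135 Order","05/14/2015",'
    '1,10,0,4,1,4,1,2,0',
]

# Each fruit flag column is paired with the column holding its quantity.
FRUITS = [("apples", "appQuan"), ("oranges", "orgQuan"),
          ("peaches", "peaQuan"), ("pumpkins", "pumQuan")]

reader = csv.reader(lines)
fields = next(reader)                    # header row
record = dict(zip(fields, next(reader)))  # the single data record

# Keep a fruit only when its flag column is "1".
order_list = {}
for flag, quantity in FRUITS:
    if record[flag] == "1":
        order_list[flag] = int(record[quantity])

# Same convention as the code above: "0" means pick up, anything else deliver.
receive = "Pick up" if record["Receive"] == "0" else "Deliver"
```

Reading the header with next(reader) also keeps the header row from being treated as data.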

Multidict in python not working ?? How to create it?

I want to create a Python multi-dimensional dictionary.
Currently I am doing it like this:
multidict = {}
IN LOOP
multidict[i] = data
If the loop runs ten times, I get the same value at every index.
E.g.:
I want to have this:
multidict = {0 : {'name':name1, 'age' : age1}, 1: {'name':name2, 'age' : age2}}
but I am getting this:
multidict = {0 : {'name':name1, 'age' : age1}, 1: {'name':name1, 'age' : age1}}
I also tried defaultdict, but every time I get the same value at every index. What is the problem?
Tried code:
csv_parsed_data2 = {}
with open('1112.txt') as infile:
    i =0
    for lineraw in infile:
        line = lineraw.strip()
        if 'sample1 ' in line:
            ### TO GET SOURCE ROUTER NAME ###
            data['sample1'] = line[8:]
        elif 'sample2- ' in line:
            ### TO GET DESTINATION ROUTER NAME ###
            data['sample2'] = line[13:]
        elif 'sample3' in line:
            ### TO GET MIN,MAX,MEAN AND STD VALUES ###
            min_value = line.replace("ms"," ")
            min_data = min_value.split(" ")
            data['sample3'] = min_data[1]
            csv_parsed_data2[i] = data
            i = i + 1
            print i,'::',csv_parsed_data2,'--------------'
print csv_parsed_data2,' all index has same value'
Is there an efficient way to do this?
It sounds like you are assigning the same data dict to each of the values of your outer multidict, and just modifying the values it holds on each pass through the loop. This will result in all the values appearing the same, with the values from the last pass through the loop.
You probably need to make sure that you create a separate dictionary object to hold the data from each value. A crude fix might be to replace multidict[i] = data with multidict[i] = dict(data), but if you know how data is created, you can probably do something more elegant.
Edit: Seeing your code, here's a way to fix the issue:
csv_parsed_data2 = {}
with open('1112.txt') as infile:
    i =0
    data = {} # start with empty data dict
    for lineraw in infile:
        line = lineraw.strip()
        if 'sample1 ' in line:
            ### TO GET SOURCE ROUTER NAME ###
            data['sample1'] = line[8:]
        elif 'sample2- ' in line:
            ### TO GET DESTINATION ROUTER NAME ###
            data['sample2'] = line[13:]
        elif 'sample3' in line:
            ### TO GET MIN,MAX,MEAN AND STD VALUES ###
            min_value = line.replace("ms"," ")
            min_data = min_value.split(" ")
            data['sample3'] = min_data[1]
            csv_parsed_data2[i] = data
            data = {} # after saving a reference to the dict, reinitialize it
            i = i + 1
            print i,'::',csv_parsed_data2,'--------------'
print csv_parsed_data2,' all index has same value'
To understand what was going on, consider this simpler situation, where I change a value in a dictionary after saving a reference to it while it still held an older value:
my_dict = { "foo": "bar" }
some_ref = my_dict
print some_ref["foo"] # prints "bar"
my_dict["foo"] = "baz"
print some_ref["foo"] # prints "baz", since my_dict and some_ref refer to the same object
print some_ref is my_dict # prints "True", confirming that fact
In your code, my_dict was data and some_ref was each of the values of csv_parsed_data2. They would all end up being references to the same object, which would hold whatever values were last assigned to data.
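The difference between storing a reference and storing a copy can be seen side by side in a small sketch. The names shared, aliased and copied are made up for this illustration, and dict(shared) is the "crude fix" mentioned earlier; it makes a shallow copy only, which is enough here because the values are plain strings.

```python
shared = {}   # one dict object that is reused on every pass
aliased = {}  # stores references to that same object
copied = {}   # stores an independent shallow copy per pass

for i, name in enumerate(["name1", "name2"]):
    shared["name"] = name
    aliased[i] = shared       # every value points at the same dict
    copied[i] = dict(shared)  # snapshot of the dict at this moment

# aliased[0] and aliased[1] are the same object, so both show "name2";
# copied[0] still holds "name1".
```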
Try this:
multidict = {}
for j in range(10):
    s = {}
    s['name'] = raw_input()
    s['age'] = input()
    multidict[j] = s
This will have the desired result.

How do I create a dictionary mapping strings to sets given a list and a tuple of tuples?

I am trying to create a dictionary from a list and tuple of tuples as illustrated below. I have to reverse map the tuples to the list and create a set of non-None column names.
Any suggestions on a pythonic way to achieve the solution (the desired dictionary) are much appreciated.
MySQL table 'StateLog':
Name NY TX NJ
Amy 1 None 1
Kat None 1 1
Leo None None 1
Python code :
## Fetching data from MySQL table
#cursor.execute("select * from statelog")
#mydataset = cursor.fetchall()
## Fetching column names for mapping
#state_cols = [fieldname[0] for fieldname in cursor.description]
state_cols = ['Name', 'NY', 'TX', 'NJ']
mydataset = (('Amy', '1', None, '1'), ('Kat', None, '1', '1'), ('Leo', None, None, '1'))
temp = [zip(state_cols, each) for each in mydataset]
# Looks like I can't do a tuple comprehension for the following snippet : finallist = ((eachone[1], eachone[0]) for each in temp for eachone in each if eachone[1] if eachone[0] == 'Name')
for each in temp:
    for eachone in each:
        if eachone[1]:
            if eachone[0] == 'Name':
                k = eachone[1]
            print k, eachone[0]
print '''How do I get a dictionary in this format'''
print '''name_state = {"Amy": set(["NY", "NJ"]),
"Kat": set(["TX", "NJ"]),
"Leo": set(["NJ"])}'''
Output so far :
Amy Name
Amy NY
Amy NJ
Kat Name
Kat TX
Kat NJ
Leo Name
Leo NJ
Desired dictionary :
name_state = {"Amy": set(["NY", "NJ"]),
"Kat": set(["TX", "NJ"]),
"Leo": set(["NJ"])}
To be really honest, I would say your problem is that your code is becoming too cumbersome. Resist the temptation of "one-lining" it and create a function. Everything will become way easier!
state_cols = ['Name', 'NY', 'TX', 'NJ']
mydataset = (
    ('Amy', '1', None, '1'),
    ('Kat', None, '1', '1'),
    ('Leo', None, None, '1')
)
def states(cols, data):
    """
    This function receives one of the tuples with data and returns a pair
    where the first element is the name from the tuple, and the second
    element is a set with all matched states. Well, at least *I* think
    it is more readable :)
    """
    name = data[0]
    states = set(state for state, value in zip(cols, data) if value == '1')
    return name, states
pairs = (states(state_cols, data) for data in mydataset)
# Since dicts can receive an iterator which yields pairs where the first one
# will become a key and the second one will become the value, I just pass
# a generator of all pairs to the dict constructor.
print dict(pairs)
The result is:
{'Amy': set(['NY', 'NJ']), 'Leo': set(['NJ']), 'Kat': set(['NJ', 'TX'])}
Looks like another job for defaultdict!
So let's create our default dict:
import collections
name_state = collections.defaultdict(set)
We now have a dictionary whose default values are all sets, so we can do something like this:
name_state['Amy'].add('NY')
Moving on, you just need to iterate over your dataset and add the right states to each name. Enjoy!
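The iteration could look something like the following sketch. It reuses state_cols and mydataset from the question and assumes that any non-None value marks a state as present.

```python
import collections

state_cols = ['Name', 'NY', 'TX', 'NJ']
mydataset = (('Amy', '1', None, '1'),
             ('Kat', None, '1', '1'),
             ('Leo', None, None, '1'))

name_state = collections.defaultdict(set)
for row in mydataset:
    name = row[0]
    # Pair each state column (skipping 'Name') with the row's value for it.
    for state, value in zip(state_cols[1:], row[1:]):
        if value:  # None means the name has no entry for this state
            name_state[name].add(state)
```

Wrapping the result in dict(name_state) gives a plain dictionary if the default-creating behaviour is not wanted later on.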
You can do this as a dictionary comprehension (Python 2.7+):
from itertools import compress
name_state = {data[0]: set(compress(state_cols[1:], data[1:])) for data in mydataset}
or as a generator expression:
name_state = dict((data[0], set(compress(state_cols[1:], data[1:]))) for data in mydataset)
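For readers who have not met itertools.compress: it keeps the items of its first argument whose matching selector in the second argument is truthy. A small sketch with one row from the question (the names cols, amy_row and amy_states are only for illustration):

```python
from itertools import compress

cols = ['NY', 'TX', 'NJ']
amy_row = ('1', None, '1')  # Amy's values from the question, without the name

# '1' is truthy and None is falsy, so TX is dropped.
amy_states = set(compress(cols, amy_row))

# Caveat: any non-empty string is truthy, so a column stored as the
# string '0' would still be selected.
zero_selected = list(compress(['NY'], ['0']))
```

This is why the approach fits the data here, where missing states are None rather than '0'.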