Accessing list of dictionaries within a dictionary in python - python-2.7

So let's see how badly I messed this up.
This is what the output is supposed to look like:
Students
1 - MICHAEL JORDAN - 13
2 - JOHN ROSALES - 11
3 - MARK GUILLEN - 11
4 - KB TONEL - 7
Instructors
1 - MICHAEL CHOI - 11
2 - MARTIN PURYEAR - 13
However, the two extra arrays are throwing me for a loop. I wrote a for loop over the outer dictionary, which gives me Students and Instructors, and then I tried to get at the keys and values. Could someone please look at this and point me in the right direction to fix this mess?
users = {
'Students': [
{'first_name': 'Michael', 'last_name' : 'Jordan'},
{'first_name' : 'John', 'last_name' : 'Rosales'},
{'first_name' : 'Mark', 'last_name' : 'Guillen'},
{'first_name' : 'KB', 'last_name' : 'Tonel'}
],
'Instructors': [
{'first_name' : 'Michael', 'last_name' : 'Choi'},
{'first_name' : 'Martin', 'last_name' : 'Puryear'}
]
}
for i in users:
for i in Students:
print ([i['first_name'], i['last_name']] , + len([i['first_name'], i['last_name']
for i in Instructors:
print ([i['first_name'], i['last_name']] , + len([i['first_name'], i['last_name'0]

Basically, every one of your loops reuses the name i, so each inner loop overwrites the loop variable of the loop around it; you should avoid doing that. Also, for i in users: iterates only over the keys of the users dict, and the names Students and Instructors are never defined as variables. Use dict.items() (or dict.iteritems() in Python 2) to get each key together with its value.
Here is your updated code, just in case:
for usr, info in users.iteritems():
    print "{}".format(usr)
    for i, _info in enumerate(info, 1):
        name = "{first_name} {last_name}".format(**_info).upper()
        print "{} - {} - {}".format(i, name, len(name)-1)
I will leave it to you to understand the code and the builtin functions being used in the code.
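For readers on Python 3, the same idea can be sketched as below. This is a port under assumptions rather than the answerer's exact code: items() replaces iteritems(), print is a function, and format_roster is a hypothetical helper name. It also relies on dictionaries keeping insertion order (Python 3.7+).

```python
users = {
    'Students': [
        {'first_name': 'Michael', 'last_name': 'Jordan'},
        {'first_name': 'John', 'last_name': 'Rosales'},
        {'first_name': 'Mark', 'last_name': 'Guillen'},
        {'first_name': 'KB', 'last_name': 'Tonel'},
    ],
    'Instructors': [
        {'first_name': 'Michael', 'last_name': 'Choi'},
        {'first_name': 'Martin', 'last_name': 'Puryear'},
    ],
}


def format_roster(users):
    """Return the report lines; format_roster is a hypothetical helper name."""
    lines = []
    # items() yields (key, value) pairs, so each role comes with its list of people.
    for role, people in users.items():
        lines.append(role)
        # enumerate(..., 1) numbers the people starting from 1.
        for position, person in enumerate(people, 1):
            name = "{first_name} {last_name}".format(**person).upper()
            # len(name) - 1 leaves out the single space between the two names.
            lines.append("{} - {} - {}".format(position, name, len(name) - 1))
    return lines


print("\n".join(format_roster(users)))
```

Using distinct names for each loop variable (role, position, person) avoids the overwriting problem described above.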

Related

I'm getting a SyntaxError | I'm learning from Learn Python The Hard Way

This is my code. It's exactly the same as the code in the PDF:
states = [
"Oregon":"OR",
"Florida": "FL",
"California": "CA",
"New York": "NY",
"Michigan": "MI"
]
cities = [
"CA": "San Francisco",
"MI": "Detroit",
"FL": "Jacksonville"
]
cities["NY"] = "New York"
cities["OR"] = "Portland"
print "-" * 10
print "NY state has: ", cities["NY"]
print "OR state has: ", cities["OR"]
print "-" * 10
print "Michigan's abbreviation is: ", states["Michigan"]
print "Florida's abbreviation is: ", states["Florida"]
print "-" * 10
print "Michigan has: ", cities[states["Michigan"]]
print "Florida has: ", cities[states["Florida"]]
print "-" * 10
for state, abbrev in states.items():
    print "%s is abbreviated %s", % (state, abbrev)
print "-" * 10
for abbrev, city in cities.items():
    print "%s has the city %s" % (abbrev, city)
print "-" * 10
for state, abbrev in states.items():
    print "%s state is abbreviated %s and has city %s" % (
        state, abbrev, cities[abbrev])
print "-" * 10
state = states.get("Texas", None)
if not state:
    print "Sorry, no Texas."
city = cities.get("TX", "Does Not Exist")
print "The city for the state 'TX' is: %s" % city
This is the error I get when I run python ex39.py in my terminal:
File "ex39.py", line 3
"Oregon":"OR",
^
SyntaxError: invalid syntax
I'm running macOS 10.13.6 Beta (17G47b)
MacBook (13-inch, Mid 2010)
Processor 2.4 GHz Intel Core 2 Duo
Memory 8 GB 1067 MHz DDR3
Graphics NVIDIA GeForce 320M 256 MB
The issue here is that square brackets [] create a list, like [1,2,3,4,5].
The items 1-5 all live in the same list, but a list has no notion of pairing one item with another, so "Oregon":"OR" is not valid inside it.
You're looking for a dictionary, which uses curly brackets {}. It also holds a collection of items, but it takes them in pairs of a key and its value.
So you need this:
states = {
"Oregon":"OR",
"Florida": "FL",
"California": "CA",
"New York": "NY",
"Michigan": "MI"
}
cities = {
"CA": "San Francisco",
"MI": "Detroit",
"FL": "Jacksonville"
}
The first item of each pair is the key, the second is the value.
Hope this helps! Happy coding!
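As a quick check of the key/value idea, here is a minimal sketch in Python 3 syntax (the exercise itself uses Python 2 print statements, so print() here is an assumption about the reader's interpreter). The variable names michigan_city and texas are only illustrative.

```python
# Curly braces build a dict of key: value pairs; square brackets would build a list.
states = {
    "Oregon": "OR",
    "Florida": "FL",
    "California": "CA",
    "New York": "NY",
    "Michigan": "MI",
}
cities = {
    "CA": "San Francisco",
    "MI": "Detroit",
    "FL": "Jacksonville",
}

# Assigning to a new key adds a pair to the dict.
cities["NY"] = "New York"
cities["OR"] = "Portland"

# A value from one dict can be used as the key into another.
michigan_city = cities[states["Michigan"]]
print("Michigan has:", michigan_city)

# items() yields each key together with its value.
for state, abbrev in states.items():
    print("%s is abbreviated %s and has city %s" % (state, abbrev, cities[abbrev]))

# get() returns None (or a given default) instead of raising KeyError.
texas = states.get("Texas")
if not texas:
    print("Sorry, no Texas.")
```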

how to get the series out of a data frame?

fake = {'EmployeeID' : [0,1,2,3,4,5,6,7,8,9],
'State' : ['a','b','c','d','e','f','g','h','i','j'],
'Email' : ['a','b','c','d','e','f','g','h','i','j']
}
fake_df = pd.DataFrame(fake)
I am trying to define a function that returns a Series of strings holding all the email addresses of the employees in each state. The email addresses should be separated by a given delimiter. I think I will use ";".
Arguments:
- DataFrame
- delimiter (;)
Do I have to use a for loop? To be honest, I don't even know how to start on this.
Edit:
Once the code is done, I should be able to run
emails = getEmailListByState(fake_df, ", ")
for state in sorted(emails.index):
    print "%15s: %s" % (state, emails[state])
and should get something like
a: a
b: b
c: c,d
d: e
e: f,g
as my output
If I understand the problem properly, you want to group by state, select the emails, and join them within each group, i.e.:
fake = {'EmployeeID' : [0,1,2,3,4,5,6,7,8,9],
'State' : ['NZ','NZ','NY','NY','ST','ST','YK','YK','YK','YK'],
'Email' : ['ab#h.com','bab#h.com','cab#h.com','dab#h.com','eab#h.com','fab#h.com','gab#h.com','hab#h.com','iab#h.com','jab#h.com']
}
fake_df = pd.DataFrame(fake)
ndf = fake_df.groupby('State')['Email'].apply(', '.join)
Output:
State
NY cab#h.com, dab#h.com
NZ ab#h.com, bab#h.com
ST eab#h.com, fab#h.com
YK gab#h.com, hab#h.com, iab#h.com, jab#h.com
Name: Email, dtype: object
If you want that in a function, then:
def getEmailListByState(df,delim):
    return df.groupby('State')['Email'].apply(delim.join)
emails = getEmailListByState(fake_df, ", ")
for state in sorted(emails.index):
    print( "%15s: %s" % (state, emails[state]))
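To see what the groupby followed by join computes, the grouping can be sketched without pandas, using only the standard library. This is an illustration under assumptions, not a replacement for the pandas answer: it returns a plain dict rather than a Series, and get_email_list_by_state and rows are hypothetical names.

```python
from collections import defaultdict


def get_email_list_by_state(rows, delim):
    """Group (state, email) pairs by state and join each group's emails."""
    grouped = defaultdict(list)
    for state, email in rows:
        grouped[state].append(email)
    # One joined string per state, mirroring apply(delim.join) on each group.
    return {state: delim.join(emails) for state, emails in grouped.items()}


# Same sample data as in the answer, written as (State, Email) pairs.
rows = [
    ("NZ", "ab#h.com"), ("NZ", "bab#h.com"),
    ("NY", "cab#h.com"), ("NY", "dab#h.com"),
    ("ST", "eab#h.com"), ("ST", "fab#h.com"),
    ("YK", "gab#h.com"), ("YK", "hab#h.com"),
    ("YK", "iab#h.com"), ("YK", "jab#h.com"),
]

emails = get_email_list_by_state(rows, ", ")
for state in sorted(emails):
    print("%15s: %s" % (state, emails[state]))
```

No explicit loop is needed in the pandas version because groupby and apply do this bookkeeping internally.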

extra commas when using read_csv causing too many "s in data frame

I'm trying to read in a large file (~8 GB) using pandas read_csv. In one of the columns in the data, there is sometimes a list which includes commas but is enclosed by curly brackets, e.g.
"label1","label2","label3","label4","label5"
"{A1}","2","","False","{ "apple" : false, "pear" : false, "banana" : null}
Therefore, when these particular lines were read in I was getting the error "Error tokenizing data. C error: Expected 37 fields in line 35, saw 42". I found this solution which said to add
sep=",(?![^{]*})" into the read_csv arguments which worked with splitting the data correctly. However, the data now includes the quotation marks around every entry (this didn't happen before I added the sep argument in).
The data looks something like this now:
"label1" "label2" "label3" "label4" "label5"
"{A1}" "2" "" "False" "{ "apple" : false, "pear" : false, "banana" : null}"
meaning I can't use, for example, .describe(), etc on the numerical data because they're still strings.
Does anyone know of a way of reading it in without the quotation marks but still splitting the data where it is?
Very new to Python so apologies if there is an obvious solution.
serialdev found a solution to removing the "s but the data columns are objects and not what I would expect/want, e.g. the integer values aren't seen as integers.
The data needs to be split at "," explicitly (including the "s), is there a way of stating that in the read_csv arguments?
Thanks!
To read in the data structure you specified, where the last element has an unknown length:
"{A1}","2","","False","{ "apple" : false, "pear" : false, "banana" : null}"
"{A1}","2","","False","{ "apple" : false, "pear" : false, "banana" : null, "orange": "true"}"
Change the separator to a regular expression that uses a negative lookahead assertion. This splits on a ',' only when it is not immediately followed by whitespace.
df = pd.read_csv('my_file.csv', sep='[,](?!\s)', engine='python', thousands='"')
print df
0 1 2 3 4
0 "{A1}" 2 NaN "False" "{ "apple" : false, "pear" : false, "banana" :...
1 "{A1}" 2 NaN "False" "{ "apple" : false, "pear" : false, "banana" :...
Specifying the quote as the thousands separator is a somewhat hacky way to parse fields that contain a quoted integer into the correct datatype. You can achieve the same result with converters, which can also remove the quotes from the strings, should you need that, and cast "True" or "False" to a boolean.
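The separator idea and the kind of clean-up a converter would do can be checked on a single line without pandas. A minimal sketch using only the standard library re module; the sample line is taken from the question, and unquote is a hypothetical helper name:

```python
import re

line = '"{A1}","2","","False","{ "apple" : false, "pear" : false, "banana" : null}"'

# Split on a comma only when it is NOT immediately followed by whitespace,
# so the commas inside the braces (each followed by a space) are left alone.
fields = re.split(r',(?!\s)', line)


def unquote(field):
    """Strip one pair of surrounding double quotes, if present."""
    if len(field) >= 2 and field[0] == '"' and field[-1] == '"':
        return field[1:-1]
    return field


cleaned = [unquote(f) for f in fields]

# Once the quotes are gone, ordinary casts work on the individual fields.
number = int(cleaned[1])
flag = cleaned[3] == "True"

print(len(fields))
print(cleaned)
```

This only works because the commas inside the braces are always followed by a space, which is the same assumption the sep pattern above makes.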
If you need to remove " from a column, use the vectorized function str.strip:
import pandas as pd
mydata = [{'"first_name"': '"Bill"', '"age"': '"7"'},
{'"first_name"': '"Bob"', '"age"': '"8"'},
{'"first_name"': '"Ben"', '"age"': '"9"'}]
df = pd.DataFrame(mydata)
print (df)
"age" "first_name"
0 "7" "Bill"
1 "8" "Bob"
2 "9" "Ben"
df['"first_name"'] = df['"first_name"'].str.strip('"')
print (df)
"age" "first_name"
0 "7" Bill
1 "8" Bob
2 "9" Ben
If you need to apply str.strip() to all columns, use:
df = pd.concat([df[col].str.strip('"') for col in df], axis=1)
df.columns = df.columns.str.strip('"')
print (df)
age first_name
0 7 Bill
1 8 Bob
2 9 Ben
Timings:
mydata = [{'"first_name"': '"Bill"', '"age"': '"7"'},
{'"first_name"': '"Bob"', '"age"': '"8"'},
{'"first_name"': '"Ben"', '"age"': '"9"'}]
df = pd.DataFrame(mydata)
df = pd.concat([df]*3, axis=1)
df.columns = ['"first_name1"','"age1"','"first_name2"','"age2"','"first_name3"','"age3"']
#create sample [300000 rows x 6 columns]
df = pd.concat([df]*100000).reset_index(drop=True)
df1,df2 = df.copy(),df.copy()
def a(df):
    df.columns = df.columns.str.strip('"')
    df['age1'] = df['age1'].str.strip('"')
    df['first_name1'] = df['first_name1'].str.strip('"')
    df['age2'] = df['age2'].str.strip('"')
    df['first_name2'] = df['first_name2'].str.strip('"')
    df['age3'] = df['age3'].str.strip('"')
    df['first_name3'] = df['first_name3'].str.strip('"')
    return df
def b(df):
    #apply str function to all columns in dataframe
    df = pd.concat([df[col].str.strip('"') for col in df], axis=1)
    df.columns = df.columns.str.strip('"')
    return df
def c(df):
    #apply str function to all columns in dataframe
    df = df.applymap(lambda x: x.lstrip('\"').rstrip('\"'))
    df.columns = df.columns.str.strip('"')
    return df
print (a(df))
print (b(df1))
print (c(df2))
In [135]: %timeit (a(df))
1 loop, best of 3: 635 ms per loop
In [136]: %timeit (b(df1))
1 loop, best of 3: 728 ms per loop
In [137]: %timeit (c(df2))
1 loop, best of 3: 1.21 s per loop
Would this work, since you already have all the data that you need?
.map(lambda x: x.lstrip('\"').rstrip('\"'))
It simply cleans up all the occurrences of " afterwards.
EDIT with example:
mydata = [{'"first_name"' : '"bill', 'age': '"75"'},
{'"first_name"' : '"bob', 'age': '"7"'},
{'"first_name"' : '"ben', 'age': '"77"'}]
IN: df = pd.DataFrame(mydata)
OUT:
"first_name" age
0 "bill "75"
1 "bob "7"
2 "ben "77"
IN: df['"first_name"'] = df['"first_name"'].map(lambda x: x.lstrip('\"').rstrip('\"'))
OUT:
0 bill
1 bob
2 ben
Name: "first_name", dtype: object
Use this sequence after selecting the column. It is not ideal, but it will get the job done:
.map(lambda x: x.lstrip('\"').rstrip('\"'))
You can change the dtypes after using this pattern:
df['col'].apply(lambda x: pd.to_numeric(x, errors='ignore'))
or simply:
df[['col2','col3']] = df[['col2','col3']].apply(pd.to_numeric)
It depends on your file. Did you check whether any cell in your data contains a comma? If a single cell holds something like Banana : Fruit, Tropical, Eatable, etc., you are going to get this kind of error. One basic solution is to remove all the commas in the file. Or, if you can read the file in, you can remove the special characters afterwards:
>>>df
Banana
0 Hello, Salut, Salom
1 Bonjour
>>>df['Banana'] = df['Banana'].str.replace(',','')
>>>df
Banana
0 Hello Salut Salom
1 Bonjour

Regex to find specific pattern in R

I have a dataset like below:
dput(d1)
structure(list(FNUM = structure(1L, .Label = "20140824-0227", class = "factor"),
DESCRIPTION = "From : J LTo : feedback#lsd.goe.sfcc : Bcc : Sent On : Mon Apr 13 08:59:18 S 2015Subject : RE:Re: Suspect illegally modified vehiclesBody : Our Ref: BS-CT-1408-0665Date : 2-Apr-2015Our Ref: 2015/Jan/3224Date : 2-Apr-2015Thank you very much! Please conduct a thorough check on the vehicle other than the exhaust system. Warm regards,J L--------------------------------------------On Mon, 4/13/15, feedback#lsd.goe.sf <feedback#lsd.goe.sf> wrote: Subject: RE:Re: Suspect illegally modified vehicles To: jl1229#yahoo.ca Received: Monday, April 13, 2015, 8:56 AM Our Ref: GCE/VS/VS/VE/F20.000.000/38104 Date : 8-Apr-2015 Tel : 1800 2255 582 Fax : 6553 5329 -------------------------------------------- On Mon, 4/6/15, feedback#lsd.goe.sf <feedback#lsd.goe.sf> wrote: Subject: Suspect illegally modified vehicles To: joa#dccs.ca Received: Monday, April 6, 2015, 11:06 AM Our Ref: GCE/VS/VS/VE/F20.000.000/37661 Date : 2-Apr-2015 Tel : 1812 2235 582 Fax : 6553 5329 Dear Ms L Our records show that the vehicle bearing registration"), .Names = c("FNUM",
"DESCRIPTION"), row.names = "1", class = "data.frame")
I use the regex below to identify the values of Our Ref:
> gsub(" *(Our Ref|Date) *:? *","",regmatches(d1[1,2],gregexpr("Our Ref *:[^:]+",d1[1,2]))[[1]])
[1] "BS-CT-1408-0665" "2015/Jan/3224"
[3] "GCE/VS/VS/VE/F20.000.000/38104" "GCE/VS/VS/VE/F20.000.000/37661"
But I only want the values of Our Ref: that start with GCE. How do I limit my output to those values?
Desired Result:
[1] "GCE/VS/VS/VE/F20.000.000/38104" "GCE/VS/VS/VE/F20.000.000/37661"
Update for the second part of the problem:
dput(d1)
structure(list(FNUM = structure(1L, .Label = "20140824-0227", class = "factor"),
DESCRIPTION = "From : J LTo : feedback#lsd.goe.sfcc : Bcc : Sent On : Mon Apr 13 08:59:18 S 2015Subject : RE:Re: Suspect illegally modified vehiclesBody : Our Ref: BS-CT-1408-0665Date : 2-Apr-2015Our Ref: 2015/Jan/3224Date : 2-Apr-2015Thank you very much! Please conduct a thorough check on the vehicle other than the exhaust system. Warm regards,J L--------------------------------------------On Mon, 4/13/15, feedback#lsd.goe.sf <feedback#lsd.goe.sf> wrote: Subject: RE:Re: Suspect illegally modified vehicles To: jl1229#yahoo.ca Received: Monday, April 13, 2015, 8:56 AM Our Ref: GCE/VS/VS/VE/F20.000.000/38104 Date : 8-Apr-2015 Tel : 1800 2255 582 Fax : 6553 5329 -------------------------------------------- On Mon, 4/6/15, feedback#lsd.goe.sf <feedback#lsd.goe.sf> wrote: Subject: Suspect illegally modified vehicles To: joa#dccs.ca Received: Monday, April 6, 2015, 11:06 AM Our Ref: GCE/QSMO/SQSS/SQ/F20.000.000/503533/lc Date : 2-Apr-2015 Tel : 1812 2235 582 Fax : 6553 5329 Our Ref: GCE/CC/PCF/FB/F20.000.000/233546/SK/PW Date : 2-Apr-2015 Dear Ms L Our records show that the vehicle bearing registration "), .Names = c("FNUM",
"DESCRIPTION"), row.names = "1", class = "data.frame")
> gsub(" *(Our Ref|Date) *:? *","",regmatches(d1[1,2],gregexpr("Our Ref *:\\s+GCE[^:]+",d1[1,2]))[[1]])
[1] "GCE/VS/VS/VE/F20.000.000/38104" "GCE/QSMO/SQSS/SQ/F20.000.000/503533/lc"
[3] "GCE/CC/PCF/FB/F20.000.000/233546/SK/PW"
However, I want to limit my result to
[1] "GCE/VS/VS/VE/F20.000.000/38104" "GCE/QSMO/SQSS/SQ/F20.000.000/503533"
[3] "GCE/CC/PCF/FB/F20.000.000/233546"
That is, I only want v1/v2/v3/v4/v5/v6: anything after the sixth value should be removed, so that the result ends with the number after the fifth slash. GCE/QSMO/SQSS/SQ/F20.000.000/503533/lc should change to GCE/QSMO/SQSS/SQ/F20.000.000/503533 and GCE/CC/PCF/FB/F20.000.000/233546/SK/PW should change to GCE/CC/PCF/FB/F20.000.000/233546
You can add a requirement that "GCE" (with whitespace before it) occurs before your [^:]:
regmatches(d1[1,2],gregexpr("Our Ref *:\\s+GCE[^:]+",d1[1,2]))
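EDIT: try this. You can match a group exactly n times with {n}; whitespace is excluded from the character class so that the last segment stops before the " Date" text that follows it:
gsub(" *(Our Ref|Date) *:? *", "",
     regmatches(d1[1,2],
                gregexpr("Our Ref *:\\s+GCE(/[^/\\s-]+){5}",
                         d1[1,2], perl=TRUE))[[1]])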
Here is a different approach using strsplit to split on one or more non-digit characters followed by a space (\\D+ plus a space):
splts <- strsplit(d1$DESCRIPTION, "\\D+ ")[[1]]
splts[grep("GCE", splts)]
# [1] "GCE/VS/VS/VE/F20.000.000/38104" "GCE/QSMO/SQSS/SQ/F20.000.000/503533"
# [3] "GCE/CC/PCF/FB/F20.000.000/233546"
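For comparison, the "GCE plus exactly five slash-separated segments" idea can be tried out with Python's re module. This is a translation for illustration only, not the R code from the answers; the text below is a shortened excerpt of the DESCRIPTION field, and the capture group and the whitespace exclusion in the character class are choices made for this sketch.

```python
import re

# Shortened excerpt of the DESCRIPTION text from the question.
text = (
    "Body : Our Ref: BS-CT-1408-0665Date : 2-Apr-2015"
    "Our Ref: 2015/Jan/3224Date : 2-Apr-2015Thank you very much! "
    "Our Ref: GCE/VS/VS/VE/F20.000.000/38104 Date : 8-Apr-2015 "
    "Our Ref: GCE/QSMO/SQSS/SQ/F20.000.000/503533/lc Date : 2-Apr-2015 "
    "Our Ref: GCE/CC/PCF/FB/F20.000.000/233546/SK/PW Date : 2-Apr-2015"
)

# "GCE" followed by exactly five "/segment" parts. A segment may not contain
# a slash, whitespace, or a hyphen, so it cannot run into the following text.
pattern = re.compile(r"Our Ref *:\s+(GCE(?:/[^/\s-]+){5})")

# With one capturing group, findall returns just the captured references.
refs = pattern.findall(text)
print(refs)
```

Capturing the reference directly means no second substitution step is needed to strip the "Our Ref" prefix.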

Multiple inputs to produce wanted output

I am trying to write code that takes the user input, compares it to a list of tuples (shares.py) and then prints the matching values from the list. For example, if the user input was aia, this code would return:
Please list portfolio: aia
Code Name Price
AIA Auckair 1.50
This works fine for one input, but what I want to do is make it work for multiple inputs.
For example, if the user input was aia, air, amp, it would return:
Please list portfolio: aia, air, amp
Code Name Price
AIA Auckair 1.50
AIR AirNZ 5.60
AMP Amp 3.22
This is what I have so far. Any help would be appreciated!
import shares
a=input("Please input")
s1 = a.replace(' ' , "")
print ('Please list portfolio: ' + a)
print (" ")
n=["Code", "Name", "Price"]
print ('{0: <6}'.format(n[0]) + '{0:<20}'.format(n[1]) + '{0:>8}'.format(n[2]))
z = shares.EXCHANGE_DATA[0:][0]
b=s1.upper()
c=b.split()
f=shares.EXCHANGE_DATA
def find(f, a):
    return [s for s in f if a.upper() in s]
x= (find(f, str(a)))
print ('{0: <6}'.format(x[0][0]) + '{0:<20}'.format(x[0][1]) + ("{0:>8.2f}".format(x[0][2])))
shares.py contains this
EXCHANGE_DATA = [('AIA', 'Auckair', 1.5),
('AIR', 'Airnz', 5.60),
('AMP', 'Amp',3.22),
('ANZ', 'Anzbankgrp', 26.25),
('ARG', 'Argosy', 12.22),
('CEN', 'Contact', 11.22)]
I am assuming a contains values in the following format: 'aia air amp'
raw = a # just in case you want the original string at a later point
toDisplay = []
a = a.split() # a now looks like ['aia','air','amp']
for i in a:
    temp = find(f, i)
    if(temp):
        toDisplay.append(temp)
for i in toDisplay:
    print ('{0: <6}'.format(i[0][0]) + '{0:<20}'.format(i[0][1]) + ("{0:>8.2f}".format(i[0][2])))
Essentially, what I'm trying to do is:
- Split the input into a list
- Do exactly what you were doing for a single input, for each item in that list
Hope this helps!
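Putting the two steps together, a self-contained sketch might look like the following. It is a variation under assumptions, not the code above verbatim: EXCHANGE_DATA is inlined instead of imported from shares.py, commas in the input are treated as separators so that "aia, air, amp" also works, the lookup compares against the code column only, and portfolio_lines is a hypothetical helper name.

```python
# Inlined copy of the data from shares.py so the sketch runs on its own.
EXCHANGE_DATA = [('AIA', 'Auckair', 1.5),
                 ('AIR', 'Airnz', 5.60),
                 ('AMP', 'Amp', 3.22),
                 ('ANZ', 'Anzbankgrp', 26.25),
                 ('ARG', 'Argosy', 12.22),
                 ('CEN', 'Contact', 11.22)]


def find(data, code):
    """Return the rows whose code column matches, ignoring case."""
    code = code.strip().upper()
    return [row for row in data if row[0] == code]


def portfolio_lines(data, user_input):
    """Build the header plus one formatted line per requested share code."""
    lines = ['{0:<6}{1:<20}{2:>8}'.format("Code", "Name", "Price")]
    # Treat commas as whitespace, then split into individual codes.
    for code in user_input.replace(",", " ").split():
        for row in find(data, code):
            lines.append('{0:<6}{1:<20}{2:>8.2f}'.format(*row))
    return lines


print("\n".join(portfolio_lines(EXCHANGE_DATA, "aia, air, amp")))
```

Codes that are not in the list are skipped silently here; printing a message for them would be a small extension.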