Python Dictionary Fuzzy Match on keys - python-2.7

I have the following dictionary:
classes = {'MATH6371': 'Statistics 1', 'COMP7330': 'Database Management',
'MATH6471': 'Statistics 2','COMP7340': 'Creative Computation' }
I am trying to make a raw_input fuzzy match on the dictionary keys. For example, if I type in 'math', the output would be Statistics 1 and Statistics 2.
I have the following code, but it only matches keys exactly:
def print_courses():
    search = raw_input("Type a course ID here:")
    if search in classes:
        print classes.get(search)
    else:
        print "Sorry, that course doesn't exist, try again"

print_courses()
Thanks

Here you go:
>>> search = 'math'
>>> result = [classes[key] for key in classes if search in key.lower()]
>>> result
['Statistics 2', 'Statistics 1']
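A slightly more complete sketch of the same idea: the helper below (find_courses is a hypothetical name) lowercases the user's input too, so 'MATH' and 'math' behave the same, and walks the keys in sorted order so the output order is stable. In the question's program you would pass it the string returned by raw_input instead of calling classes.get(search).

```python
def find_courses(classes, search):
    """Return the titles of all courses whose ID contains `search`, ignoring case.

    Keys are visited in sorted order so the result order is stable
    (plain dict iteration order is arbitrary in Python 2.7).
    """
    needle = search.strip().lower()
    return [classes[key] for key in sorted(classes) if needle in key.lower()]


classes = {'MATH6371': 'Statistics 1', 'COMP7330': 'Database Management',
           'MATH6471': 'Statistics 2', 'COMP7340': 'Creative Computation'}

print(find_courses(classes, 'math'))    # ['Statistics 1', 'Statistics 2']
print(find_courses(classes, 'MATH64'))  # ['Statistics 2']
print(find_courses(classes, 'bio'))     # []
```

This is substring matching, not typo-tolerant matching. If you also want to catch misspelled IDs, difflib.get_close_matches from the standard library is one option to look at.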


How to remove unwanted items from a parse file

from googlefinance import getQuotes
import json
import time as t
import re

List = ["A", "AA", "AAB"]
Time = t.localtime()  # Sets variable Time to retrieve date/time info
Date2 = ('%d-%d-%d %dh:%dm:%dsec' % (Time[0], Time[1], Time[2], Time[3], Time[4], Time[5]))  # formats time stamp

while True:
    for i in List:
        try:  # allows elements to be called and, on an error, moves on to the except block
            Data = json.dumps(getQuotes(i.lower()), indent=1)  # retrieves Data from Google Finance
            regex = ('"LastTradePrice": "(.+?)",')  # sets parse
            pattern = re.compile(regex)  # compiles parse
            price = re.findall(pattern, Data)  # retrieves parse
            print(i)
            print(price)
        except:  # sets Error coding
            Error = (i + ' Failed to load on: ' + Date2)
            print(Error)
It will display the quote as: ['(number)'].
I would like it to only display the number, which means removing the brackets and quotes.
Any help would be great.
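re.findall always returns a list of matches, even when there is only one, so print its first element instead. Changing: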
print(price)
into:
print(price[0])
prints this:
A
42.14
AA
10.13
AAB
0.110
Try using the type() function to check the data type, in your case type(price).
If the data type is list, use print(price[0]) and you will get the output (number). For the brackets, you need to check the Google data and your regex.
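To make the point concrete, here is a sketch that runs the question's regex idea against a small hypothetical quote list. The shape of the getQuotes() result is an assumption, inferred only from the "LastTradePrice" key the regex targets. It shows that re.findall hands back a list, how to take its first element safely, and an alternative (last_trade_price, a hypothetical helper) that reads the key from the objects directly instead of regex-matching the json.dumps text.

```python
import json
import re

# Hypothetical sample: the real structure of getQuotes() output is an
# assumption here, based only on the "LastTradePrice" key the regex targets.
quotes = [{"StockSymbol": "A", "LastTradePrice": "42.14"}]
data = json.dumps(quotes, indent=1)

# re.findall always returns a list of the captured groups...
prices = re.findall(r'"LastTradePrice": "(.+?)"', data)
print(prices)  # ['42.14']

# ...so take the first element, guarding against "no match".
price = prices[0] if prices else None
print(price)   # 42.14


def last_trade_price(quote_list):
    """Return the first LastTradePrice as a float, or None if none is present."""
    for quote in quote_list:
        if "LastTradePrice" in quote:
            return float(quote["LastTradePrice"])
    return None


print(last_trade_price(quotes))  # 42.14
```

Skipping the dumps-then-regex round trip avoids depending on how json.dumps happens to lay out the text (key order, trailing commas).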

Reference a table column by its column header in Python

Is there a Pythonic way to refer to columns of 2D lists by name?
I import a lot of tables from the web so I made a general purpose function that creates 2 dimensional lists out of various HTML tables. So far so good. But the next step is often to parse the table row by row.
# Sample table.
# In real life I would do something like: table = HTML_table('url', 'table id')
table = [
    ['Column A', 'Column B', 'Column C'],
    ['One', 'Two', 3],
    ['Four', 'Five', 6]
]

# Current code:
iA = table[0].index('Column A')
iC = table[0].index('Column C')
for row in table[1:]:
    process_row(row[iA], row[iC])

# Desired code:
for row in table[1:]:
    process_row(row['Column A'], row['Column C'])
I think you'll really like the pandas module! http://pandas.pydata.org/
Put your list into a DataFrame (this could also be done directly from HTML, CSV, etc., e.g. with pd.read_html or pd.read_csv):
import pandas as pd
df = pd.DataFrame(table[1:], columns=table[0]).astype(str)
Access columns:
df['Column A']
Access the first row by position:
df.iloc[0]
Process row by row (axis=1 applies the function to each row):
df.apply(lambda x: '_'.join(x), axis=1)
for index, row in df.iterrows():
    process_row(row['Column A'], row['Column C'])
Process a column:
df['Column C'].astype(int).sum()
Wouldn't an OrderedDict, with column names as keys and each column's values as a list, be a better approach for your problem? I would go with something like:
from collections import OrderedDict

table = OrderedDict([
    ('Column A', [1, 4]),
    ('Column B', [2, 5]),
    ('Column C', [3, 6]),
])
# And you would parse column by column...
for col, rows in table.items():
    pass  # do something with this column's values
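If you would rather keep plain lists, a minimal standard-library sketch is to zip the header row with each data row. That gives exactly the row['Column A'] access the question asks for, and the same header can build a column-oriented OrderedDict like the one in this answer. (csv.DictReader does the equivalent for CSV input.) The variable names below are illustrative.

```python
from collections import OrderedDict

table = [
    ['Column A', 'Column B', 'Column C'],
    ['One', 'Two', 3],
    ['Four', 'Five', 6],
]

header, body = table[0], table[1:]

# Row-oriented: one dict per data row, so row['Column A'] works.
rows = [dict(zip(header, row)) for row in body]

# Column-oriented: column name -> list of that column's values, in header order.
columns = OrderedDict(
    (name, [row[i] for row in body]) for i, name in enumerate(header)
)

for row in rows:
    print('{} {}'.format(row['Column A'], row['Column C']))  # One 3, then Four 6
```

Building the row dicts once costs a little memory, but after that every lookup is by name, so reordering the HTML table's columns no longer breaks the parsing code.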
My QueryList is simple to use.
ql.filter(portfolio='123')
ql.group_by(['portfolio', 'ticker'])
from collections import defaultdict, OrderedDict


class QueryList(list):
    """filter and/or group_by a list of objects."""

    def group_by(self, attrs) -> dict:
        """Like a database group_by function.

        args:
            attrs: str or list.
        Returns:
            {value_of_the_group: list_of_matching_objects, ...}
            When attrs is a list, each key is a tuple.
            Ex:
            {'AMZN': QueryList(),
             'MSFT': QueryList(),
             ...
            }
            -- or --
            {('Momentum', 'FB'): QueryList(),
             ...,
            }
        """
        result = defaultdict(QueryList)
        if isinstance(attrs, str):
            for item in self:
                result[getattr(item, attrs)].append(item)
        else:
            for item in self:
                result[tuple(getattr(item, x) for x in attrs)].append(item)
        return result

    def filter(self, **kwargs):
        """Returns the subset of this QueryList that has matching attributes.

        args:
            kwargs: Attribute name/value pairs.
        Example:
            foo.filter(portfolio='123', account='ABC').
        """
        ordered_kwargs = OrderedDict(kwargs)
        match = tuple(ordered_kwargs.values())

        def is_match(item):
            return tuple(getattr(item, y) for y in ordered_kwargs.keys()) == match

        # Wrap the matches so the result can be filtered or grouped again.
        return QueryList([x for x in self if is_match(x)])

    def scalar(self, default=None, attr=None):
        """Returns the first item in this QueryList.

        args:
            default: The value to return if there is less than one item,
                or if the attr is not found.
            attr: Returns getattr(item, attr) if not None.
        """
        item, = self[0:1] or [default]
        if attr is None:
            result = item
        else:
            result = getattr(item, attr, default)
        return result
I tried pandas. I wanted to like it, I really did. But ultimately it is too complicated for my needs.
For example:
df[(df['portfolio'] == '123') & (df['ticker'] == 'MSFT')]
is not as simple as
ql.filter(portfolio='123', ticker='MSFT')
Furthermore, creating a QueryList is simpler than creating a df.
That's because you tend to use custom classes with a QueryList. The data conversion code would naturally be placed into the custom class which keeps that separate from the rest of the logic. But data conversion for a df would normally be done inline with the rest of the code.

Search a list with words in string as parameter in python

I could use some advice on how to search a list for genres, using the words in a string as the search parameter.
So I have created a list called genre, which contains a string like:
['crime, drama,action']
I want to use this list to search for movies containing all of the genres, or maybe just one of them.
I have created a big list which contains all information about the movie. An example from the list you see here:
('Saving Private Ryan (1998)', '8.5', "Tom Hanks, Matt Damon, Tom Sizemore',\n", 'action, drama, war,\n'),
So if I want to search for Saving Private Ryan, which is a drama + action movie but not crime, how can I use my genre list to search for it?
Is there a way to search by something in the string?
UPDATE:
So this is what I have done so far. I have tried to process my movie tuple and use the def function.
Navn_rating = dict(zip(names1, ratings))
Actor_genre = dict(zip(actorlist, genre_list))
var = raw_input("Enter movie: ")
print "you entered ", var
for row in name_rating_actor_genre:
    if var in row:
        movie.append(row)
        print "Movie found",movie

def process_movie(movie):
    return {'title': names1, 'rating': ratings, 'actors': actorlist, 'genre': genre_list}
You can "search by something in the string" using in:
>>> genres = 'action, drama, war,\n'
>>> 'action' in genres
True
>>> 'drama' in genres
True
>>> 'romantic comedy' in genres
False
But note that this might not always give the result you want:
>>> 'war' in 'award-winning'
True
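One way around that substring problem, sketched under the assumption that genres are always comma-separated as in the tuple above: split the field into a set of whole genre names. Membership then compares complete names, and set operations express "all of these" (subset) and "any of these" (intersection) directly. genre_set is a hypothetical helper name.

```python
def genre_set(field):
    """Split a field like 'action, drama, war,\n' into {'action', 'drama', 'war'}."""
    return {g.strip() for g in field.split(",") if g.strip()}


genres = genre_set('action, drama, war,\n')

print('war' in genres)                              # True
print('war' in genre_set('award-winning, comedy'))  # False: whole names are compared
print({'drama', 'action'} <= genres)                # True: all wanted genres present
print(bool({'crime', 'drama'} & genres))            # True: at least one present
print('crime' in genres)                            # False
```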
I think you should change your data structure. Consider making each movie a dictionary e.g.
{'title': 'Saving Private Ryan', 'year': 1998, 'rating': 8.5, 'actors': ['Tom Hanks', ...], 'genres': ['action', ...]}
then your query becomes
if 'drama' in movie['genres'] and 'action' in movie['genres']:
You can use indexing, split and slicing to process your tuple of strings to make the values of the dictionary, e.g.:
>>> movie = ('Saving Private Ryan (1998)', '8.5', "Tom Hanks, Matt Damon, Tom Sizemore',\n", 'action, drama, war,\n')
>>> int(movie[0][-5:-1])
1998
>>> float(movie[1])
8.5
>>> movie[0][:-7]
'Saving Private Ryan'
>>> movie[2].split(",")
['Tom Hanks', ' Matt Damon', " Tom Sizemore'", '\n']
As you can see, some tidying up may be needed. You could write a function that takes the tuple as an argument and returns the corresponding dictionary:
def process_movie(movie_tuple):
    # ... process the tuple here
    return {'title': title, 'rating': rating, ...}
and apply this to your list of movies using map:
movies = list(map(process_movie, name_rating_actor_genre))
Edit:
You will know your function works when the following line doesn't raise any errors:
assert process_movie(('Saving Private Ryan (1998)', '8.5', "Tom Hanks, Matt Damon, Tom Sizemore',\n", 'action, drama, war,\n')) == {"title": "Saving Private Ryan", "year": 1998, "rating": 8.5, "actors": ["Tom Hanks", "Matt Damon", "Tom Sizemore"], "genres": ["action", "drama", "war"]}
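Here is one possible process_movie that passes that assertion, offered as a sketch: it assumes every tuple follows the same four-field layout, with the year in the last seven characters of the title (a space plus "(1998)"). split_field and find_movies are hypothetical helper names. find_movies then answers the original question: all (or any) of the wanted genres, and none of the excluded ones, comparing whole genre names rather than substrings.

```python
def split_field(field):
    """Split a comma-separated field, dropping spaces, stray quotes and empty items."""
    items = (part.strip().strip("'\"") for part in field.split(","))
    return [item for item in items if item]


def process_movie(movie_tuple):
    """Turn one (title_year, rating, actors, genres) tuple into a dictionary."""
    title_year, rating, actors, genres = movie_tuple
    return {
        "title": title_year[:-7],        # drop the trailing " (1998)"
        "year": int(title_year[-5:-1]),  # the four digits inside the parentheses
        "rating": float(rating),
        "actors": split_field(actors),
        "genres": split_field(genres),
    }


def find_movies(movies, wanted, exclude=(), match_all=True):
    """Titles whose genres include all (or, with match_all=False, any) of
    `wanted` and none of `exclude`."""
    test = all if match_all else any
    return [m["title"] for m in movies
            if test(g in m["genres"] for g in wanted)
            and not any(g in m["genres"] for g in exclude)]


raw_movies = [
    ('Saving Private Ryan (1998)', '8.5',
     "Tom Hanks, Matt Damon, Tom Sizemore',\n", 'action, drama, war,\n'),
]
movies = [process_movie(m) for m in raw_movies]

genre = ['crime, drama,action']  # the question's genre list
wanted = split_field(genre[0])   # ['crime', 'drama', 'action']

print(find_movies(movies, ['drama', 'action'], exclude=['crime']))
print(find_movies(movies, wanted))                   # needs crime too, so []
print(find_movies(movies, wanted, match_all=False))  # any one genre is enough
```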

Python: extract all placeholders from format string

I have to 'parse' a format string in order to extract the variables.
E.g.
>>> s = "%(code)s - %(description)s"
>>> get_vars(s)
'code', 'description'
I managed to do that by using regular expressions:
re.findall(r"%\((\w+)\)", s)
but I wonder whether there is a built-in solution (after all, Python does parse the string in order to evaluate it!).
This seems to work great:
def get_vars(s):
    d = {}
    while True:
        try:
            s % d
        except KeyError as exc:
            # exc.args[0] contains the name of the key that was not found;
            # 0 is used because it appears to work with all types of placeholders.
            d[exc.args[0]] = 0
        else:
            break
    return sorted(d)  # a list of the names in alphabetical order
gives you:
>>> get_vars('%(code)s - %(description)s - %(age)d - %(weight)f')
['age', 'code', 'description', 'weight']
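As far as I know, the standard library has no public parser for %-style strings, so the KeyError trick above is a reasonable workaround. If you can switch to str.format-style templates ("{code} - {description}"), string.Formatter().parse is a built-in way to get the field names. A sketch (get_format_fields is a hypothetical name):

```python
from string import Formatter


def get_format_fields(template):
    """Return the field names in a str.format()-style template, in order of first use."""
    names = []
    for _literal, field_name, _spec, _conversion in Formatter().parse(template):
        # field_name is None for trailing literal text and '' for a bare '{}'.
        if field_name and field_name not in names:
            names.append(field_name)
    return names


print(get_format_fields("{code} - {description} - {age:d} - {weight:.2f}"))
# ['code', 'description', 'age', 'weight']
```

Note that field_name keeps any attribute or index suffix, so {user.name} is reported as 'user.name'.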

What is the most efficient method to parse this line of text?

The following is a row that I have extracted from the web:
AIG $30 AIG is an international renowned insurance company listed on the NYSE. A period is required. Manual Auto Active 3 0.0510, 0.0500, 0.0300 [EXTRACT]
I would like to create 5 separate variables by parsing the text and retrieving the relevant data. However, I seriously don't understand the regex documentation! Can anyone guide me on how to do it correctly with this example?
Name = AIG
CurrentPrice = $30
Status = Active
World_Ranking = 3
History = 0.0510, 0.0500, 0.0300
Not sure what you want to achieve here, but there's no need to use regexps; you could just use str.split:
>>> str = "AIG $30 AIG is an international renowned insurance company listed on the NYSE. A period is required. Manual Auto Active 3 0.0510, 0.0500, 0.0300 [EXTRACT]"
>>> list = str.split()
>>> dict = { "Name": list[0], "CurrentPrice": list[1], "Status": list[19], "WorldRanking": list[20], "History": ' '.join((list[21], list[22], list[23])) }
#output
>>> dict
{'Status': 'Active', 'CurrentPrice': '$30', 'Name': 'AIG', 'WorldRanking': '3', 'History': '0.0510, 0.0500, 0.0300'}
Instead of using list[19] and so on, you may want to count from the end with list[-n], so the result doesn't depend on the length of the company's description. Like this:
>>> history = ' '.join(list[-4:-1])
>>> history
'0.0510, 0.0500, 0.0300'
If the number of history values varies, it could be easier to use re:
>>> import re
>>> history = re.findall(r"\d\.\d{4}", str)
>>> history
['0.0510', '0.0500', '0.0300']
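For identifying the status and ranking, you could get the indexes of the history values and count back from the first one: one position back is the world ranking, two positions back is the status:
>>> [ i for i, substr in enumerate(list) if re.match(r"\d\.\d{4}", substr) ]
[21, 22, 23]
>>> list[21:24]
['0.0510,', '0.0500,', '0.0300']
>>> ranking = list[20]
>>> ranking
'3'
>>> status = list[19]
>>> status
'Active'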
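Since the question asked specifically about regex, here is a sketch using named groups. It assumes the layout holds for every row, which is inferred from the single example line and not from any spec: ticker, dollar price, free-text description, a one-word status right before an integer ranking, then comma-separated decimals and a literal [EXTRACT]. LINE_RE and parse_line are hypothetical names.

```python
import re

# Assumed layout (inferred from the single example line, not a spec):
# NAME $PRICE <free-text description> STATUS RANK h1, h2, h3 [EXTRACT]
LINE_RE = re.compile(
    r"^(?P<name>\S+)\s+"                       # ticker, e.g. AIG
    r"(?P<price>\$\d+(?:\.\d+)?)\s+"           # $30 or $30.50
    r".*\s"                                    # description (greedy, backtracks as needed)
    r"(?P<status>\w+)\s+"                      # one word right before the rank
    r"(?P<rank>\d+)\s+"                        # integer world ranking
    r"(?P<history>\d+\.\d+(?:,\s*\d+\.\d+)*)"  # comma-separated decimals
    r"\s*\[EXTRACT\]\s*$"
)


def parse_line(line):
    """Return the five fields as a dict, or None if the line doesn't fit the layout."""
    m = LINE_RE.match(line)
    if m is None:
        return None
    return {
        "Name": m.group("name"),
        "CurrentPrice": m.group("price"),
        "Status": m.group("status"),
        "World_Ranking": int(m.group("rank")),
        "History": m.group("history"),
    }


line = ("AIG $30 AIG is an international renowned insurance company listed on "
        "the NYSE. A period is required. Manual Auto Active 3 "
        "0.0510, 0.0500, 0.0300 [EXTRACT]")
info = parse_line(line)
print(info["Status"], info["World_Ranking"])
history_values = [float(x) for x in info["History"].split(",")]
```

The greedy .* in the middle swallows the description and then backs off just far enough for the status, rank and history groups to match at the end of the line, so the description's length doesn't matter. If a status could ever be two words, this pattern would need adjusting.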