Search a list with words in string as parameter in python - list

I could use some advice on how to search a list for genres, using the words in a string as the search parameter.
Say I have created a list called genre, which contains a single string like:
['crime, drama,action']
I want to use this list to search for movies that contain all of these genres, or maybe just one of them.
I have created a big list which contains all the information about the movies. Here is an example entry from the list:
('Saving Private Ryan (1998)', '8.5', "Tom Hanks, Matt Damon, Tom Sizemore',\n", 'action, drama, war,\n'),
So if I want to find Saving Private Ryan, which has the genres drama and action but not crime, how can I use my genre list to search for it?
Is there a way to search by something in the string?
UPDATE:
This is what I have done so far. I have tried to process my movie tuple and use a def function.
Navn_rating = dict(zip(names1, ratings))
Actor_genre = dict(zip(actorlist, genre_list))
var = raw_input("Enter movie: ")
print "you entered ", var
for row in name_rating_actor_genre:
    if var in row:
        movie.append(row)
print "Movie found",movie
def process_movie(movie):
    return {'title': names1, 'rating': ratings, 'actors': actorlist, 'genre': genre_list}

You can "search by something in the string" using in:
>>> genres = 'action, drama, war,\n'
>>> 'action' in genres
True
>>> 'drama' in genres
True
>>> 'romantic comedy' in genres
False
But note that this might not always give the result you want:
>>> 'war' in 'award-winning'
True
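One way to avoid such false positives is to split the genre string into individual names and test membership against a set. The following is a minimal sketch; `parse_genres` and `has_genres` are hypothetical helper names, not part of the original code:

```python
def parse_genres(raw):
    """Turn a comma-separated genre string into a set of lowercase names."""
    return {part.strip().lower() for part in raw.split(',') if part.strip()}


def has_genres(raw, wanted, require_all=True):
    """Check whether the genres in raw include all (or any) of the wanted ones."""
    genres = parse_genres(raw)
    wanted = parse_genres(wanted)
    if require_all:
        return wanted <= genres      # every wanted genre is present
    return bool(wanted & genres)     # at least one wanted genre is present


movie_genres = 'action, drama, war,\n'
print(has_genres(movie_genres, 'drama, action'))               # True
print(has_genres(movie_genres, 'crime, drama,action'))         # False
print(has_genres(movie_genres, 'crime, drama,action', False))  # True
print('war' in parse_genres('award-winning'))                  # False
```

Comparing whole names means 'war' no longer matches inside 'award-winning'.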
I think you should change your data structure. Consider making each movie a dictionary e.g.
{'title': 'Saving Private Ryan', 'year': 1998, 'rating': 8.5, 'actors': ['Tom Hanks', ...], 'genres': ['action', ...]}
then your query becomes
if 'drama' in movie['genres'] and 'action' in movie['genres']:
You can use indexing, split and slicing to process your tuple of strings to make the values of the dictionary, e.g.:
>>> movie = ('Saving Private Ryan (1998)', '8.5', "Tom Hanks, Matt Damon, Tom Sizemore',\n", 'action, drama, war,\n')
>>> int(movie[0][-5:-1])
1998
>>> float(movie[1])
8.5
>>> movie[0][:-7]
'Saving Private Ryan'
>>> movie[2].split(",")
['Tom Hanks', ' Matt Damon', " Tom Sizemore'", '\n']
As you can see, some tidying up may be needed. You could write a function that takes the tuple as an argument and returns the corresponding dictionary:
def process_movie(movie_tuple):
    # ... process the tuple here
    return {'title': title, 'rating': rating, ...}
and apply this to your list of movies using map:
movies = list(map(process_movie, name_rating_actor_genre))
Edit:
You will know your function works when the following line doesn't raise any errors:
assert process_movie(('Saving Private Ryan (1998)', '8.5', "Tom Hanks, Matt Damon, Tom Sizemore',\n", 'action, drama, war,\n')) == {"title": "Saving Private Ryan", "year": 1998, "rating": 8.5, "actors": ["Tom Hanks", "Matt Damon", "Tom Sizemore"], "genres": ["action", "drama", "war"]}
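For illustration, here is one possible way to fill in `process_movie` so that the assertion passes. It is only a sketch: it assumes every title ends with a four-digit year in parentheses, and the inner helper `split_clean` is a made-up name.

```python
def process_movie(movie_tuple):
    """Convert a (title, rating, actors, genres) tuple of strings into a dict."""
    raw_title, raw_rating, raw_actors, raw_genres = movie_tuple

    def split_clean(raw):
        # Split on commas, trim whitespace and stray quotes, drop empty pieces.
        parts = (part.strip().strip("'\"") for part in raw.split(','))
        return [part for part in parts if part]

    return {
        'title': raw_title[:-7],          # drop the trailing ' (1998)'
        'year': int(raw_title[-5:-1]),    # the four digits inside the parentheses
        'rating': float(raw_rating),
        'actors': split_clean(raw_actors),
        'genres': split_clean(raw_genres),
    }


raw = ('Saving Private Ryan (1998)', '8.5',
       "Tom Hanks, Matt Damon, Tom Sizemore',\n", 'action, drama, war,\n')
movies = [process_movie(raw)]

# Find the movies that have every wanted genre.
wanted = {'drama', 'action'}
print([m['title'] for m in movies if wanted <= set(m['genres'])])
# ['Saving Private Ryan']
```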

Related

Django Q object AND operation not working

I have a listview of people with a filter. The filter has the following code:
if textquery:
    qs = qs.filter()
    qs = qs.filter(Q(name__icontains=textquery) |
                   Q(surname__icontains=textquery) |
                   Q(surname__icontains=textquery) & Q(name__icontains=textquery)
                   )
return qs
People can have both a first and a last name, and searching for those works as expected. However, when I input both the first AND the lastname, the program does not return any results (even though I thought that the '&' operation in the end should include both variables).
In summary: this is currently the result of my queries:
Person: 'John Doe'
Query: 'John'
Result: 'John Doe'
Person: 'John Doe'
Query: 'Doe'
Result: 'John Doe'
Person: 'John Doe'
Query: 'John Doe'
Result: ''
Does anyone know what I am doing wrong, and why matches for both the NAME and SURNAME do not return any results?
The filter returns nothing because the current code checks whether the name or the surname contains the FULL query. That is never true: for example, 'John' does not contain 'John Doe'. So you need to split textquery into name and surname:
if textquery:
    parts = textquery.split()
    condition = Q(name__icontains=textquery) | Q(surname__icontains=textquery)
    if len(parts) == 2:
        name, surname = parts
        condition |= Q(name__icontains=name) & Q(surname__icontains=surname)
    qs = qs.filter(condition)
@neverwalkaloner has it right. You can use their solution or try the one below, which concatenates name and surname and makes the search more flexible: even a textquery of "n D" still matches.
from django.db.models.functions import Concat
from django.db.models import F, Value
qs = qs.annotate(fullname=Concat(F('name'), Value(' '), F('surname')))\
.filter(fullname__icontains=textquery)
Your query is working as written.
Person: 'John Doe'
Query: 'John Doe'
Result: ''
is also the correct result, because the full string 'John Doe' is contained in neither the name 'John' nor the surname 'Doe'.
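The behaviour can be reproduced without a database. The sketch below imitates the `icontains` lookup with a plain case-insensitive substring test; `icontains`, `original_filter` and `fullname_filter` are hypothetical helpers for illustration only, not Django APIs:

```python
def icontains(field, text):
    """Rough stand-in for the icontains lookup: case-insensitive substring test."""
    return text.lower() in field.lower()


def original_filter(name, surname, textquery):
    # Mirrors the Q expression from the question.
    return (icontains(name, textquery)
            or icontains(surname, textquery)
            or (icontains(surname, textquery) and icontains(name, textquery)))


def fullname_filter(name, surname, textquery):
    # Mirrors the concatenation approach: match against "name surname".
    return icontains(name + ' ' + surname, textquery)


print(original_filter('John', 'Doe', 'John'))      # True
print(original_filter('John', 'Doe', 'John Doe'))  # False
print(fullname_filter('John', 'Doe', 'John Doe'))  # True
print(fullname_filter('John', 'Doe', 'n D'))       # True
```

The third clause of the original expression can only be true when one of the first two already is, so it never adds a match.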

Better way to find sub strings in Datastore?

I have an application where a user inputs a name and the application gives back the address and city for that name.
The names are stored in Datastore:
class Person(ndb.Model):
    name = ndb.StringProperty(repeated=True)
    address = ndb.StringProperty(indexed=False)
    city = ndb.StringProperty()
There are more than 5 million Person entities. Names can have from 2 to 8 words (yes, there are people with 8 words in their names).
Users can enter any of the words of the name, in any order, and the application returns the first match ("John Doe Smith" is equivalent to "Smith Doe John").
This is a sample of my entities, as they were stored with ndb.put_multi:
id="L12802795",nombre=["Smith","Loyola","Peter","","","","",""], city="Cali",address="Conchuela 471"
id="M19181478",nombre=["Hoffa","Manzano","Linda","Rosse","Claudia","Cindy","Patricia",""], comuna="Lima",address=""
id="L18793849",nombre=["Parker","Martinez","Claudio","George","Paul","","",""], comuna="Santiago",address="Calamar 323 Villa Los Pescadores"
This is the way I get the name from the user:
name = self.request.get('content').strip() #The input is the name (an string with several words)
name=" ".join(name.split()).split() #now the name is a list of single words
In my design, to search for the words of the name in each entity, I did this:
q = Person.query()
if len(name)==1:
    names_query =q.filter(Person.name==name[0])
elif len(name)==2:
    names_query =q.filter(Person.name==name[0]).filter(Person.name==name[1])
elif len(name)==3:
    names_query =q.filter(Person.name==name[0]).filter(Person.name==name[1]).filter(Person.name==name[2])
elif len(name)==4:
    names_query =q.filter(Person.name==name[0]).filter(Person.name==name[1]).filter(Person.name==name[2]).filter(Person.name==name[3])
elif len(name)==5:
    names_query =q.filter(Person.name==name[0]).filter(Person.name==name[1]).filter(Person.name==name[2]).filter(Person.name==name[3]).filter(Person.name==name[4])
elif len(name)==6:
    names_query =q.filter(Person.name==name[0]).filter(Person.name==name[1]).filter(Person.name==name[2]).filter(Person.name==name[3]).filter(Person.name==name[4]).filter(Person.name==name[5])
elif len(name)==7:
    names_query =q.filter(Person.name==name[0]).filter(Person.name==name[1]).filter(Person.name==name[2]).filter(Person.name==name[3]).filter(Person.name==name[4]).filter(Person.name==name[5]).filter(Person.name==name[6])
else :
    names_query =q.filter(Person.name==name[0]).filter(Person.name==name[1]).filter(Person.name==name[2]).filter(Person.name==name[3]).filter(Person.name==name[4]).filter(Person.name==name[5]).filter(Person.name==name[6]).filter(Person.name==name[7])
Person = names_query.fetch(1)
person_id=Person.key.id()
Question 1
Do you think there is a better way to search for substrings in strings (ndb.StringProperty) in my design? (I know it works, but I feel it can be improved.)
Question 2
My solution has a problem with entities that have repeated words in the name.
If I want to find an entity with the words "Smith Smith", it returns "Paul Smith Wshite" instead of "Paul Smith Smith". I do not know how to modify my query to match two (or more) repeated words in Person.name.
You could generate a list of all possible tokens for each name and use prefix filters to query them:
import itertools
from google.appengine.ext import ndb
class Person(ndb.Model):
    name = ndb.StringProperty(required=True)
    address = ndb.StringProperty(indexed=False)
    city = ndb.StringProperty()
    def _tokens(self):
        """Returns all possible combinations of name tokens combined.
        For example, for input 'john doe smith' we will get:
        ['john doe smith', 'john smith doe', 'doe john smith', 'doe smith john',
        'smith john doe', 'smith doe john']
        """
        tokens = [t.lower() for t in self.name.split(' ') if t]
        return [' '.join(t) for t in itertools.permutations(tokens)] or None
    tokens = ndb.ComputedProperty(_tokens, repeated=True)
    @classmethod
    def suggest(cls, s):
        s = s.lower()
        return cls.query(ndb.AND(cls.tokens >= s, cls.tokens <= s + u'\ufffd'))
ndb.put_multi([Person(name='John Doe Smith'), Person(name='Jane Doe Smith'),
               Person(name='Paul Smith Wshite'), Person(name='Paul Smith'),
               Person(name='Test'), Person(name='Paul Smith Smith')])
assert Person.suggest('j').count() == 2
assert Person.suggest('ja').count() == 1
assert Person.suggest('jo').count() == 1
assert Person.suggest('doe').count() == 2
assert Person.suggest('t').count() == 1
assert Person.suggest('Smith Smith').get().name == 'Paul Smith Smith'
assert Person.suggest('Paul Smith').count() == 3
And make sure to use keys_only queries if you only want keys/ids. This will make things significantly faster and almost free in terms of datastore OPs.
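To see how the token list and the prefix range behave, here is a plain-Python sketch that runs without Datastore. It is an in-memory imitation, not ndb code; `name_tokens` and `suggest` are hypothetical stand-ins for the computed property and the class method above:

```python
import itertools


def name_tokens(name):
    """Return every ordering of the words in a name, lowercased."""
    words = [w.lower() for w in name.split(' ') if w]
    return [' '.join(p) for p in itertools.permutations(words)]


def suggest(names, s):
    """Return the names that have at least one token starting with s."""
    s = s.lower()
    upper = s + u'\ufffd'   # same upper bound as the range query above
    return [n for n in names
            if any(s <= token <= upper for token in name_tokens(n))]


names = ['John Doe Smith', 'Jane Doe Smith', 'Paul Smith Wshite',
         'Paul Smith', 'Test', 'Paul Smith Smith']
print(suggest(names, 'doe'))          # ['John Doe Smith', 'Jane Doe Smith']
print(suggest(names, 'Smith Smith'))  # ['Paul Smith Smith']
print(len(name_tokens('a b c d e f g h')))  # 40320
```

The last line shows the trade-off: a name with eight distinct words produces 8! = 40320 tokens, so this approach may be expensive for the longest names.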

How to find item in list

I am new and have been working on this for a week now but can't find any solution. I hope someone can help me figure this out.
How can I find the items in the list listitems and output them individually?
listitems = ['Beer, Chicken', 'Cake, Chocolate', 'Lemon with ci, Chicken', 'Beer, Beer, Cake, Chocolate']
Is there any way I can compute the related foods in the receipt list?
So far I am only able to find the foods for ONE item. My code is as follows:
I also built another list containing each individual item, for comparison.
eachitems = ['Beer', 'Cake', 'Chocolate', 'Lemon with ci', 'Chicken']
I would personally use a dictionary with each item as a key and its associated items as the value; that would also make it much easier to get the results you want. I am not sure right now how I would do it from the list alone.
In your original code, add print (combi), print (checklist) and print (correlatedlist) at the end and you will see that it doesn't really append the way you want.
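A minimal sketch of that dictionary idea, assuming each receipt is a comma-separated string as in listitems; the name `related` is made up for this example:

```python
from collections import defaultdict

listitems = ['Beer, Chicken', 'Cake, Chocolate', 'Lemon with ci, Chicken',
             'Beer, Beer, Cake, Chocolate']

related = defaultdict(set)   # food -> set of foods bought together with it
for receipt in listitems:
    foods = {food.strip() for food in receipt.split(',')}
    for food in foods:
        related[food] |= foods - {food}

for food in sorted(related):
    print(food, '->', ', '.join(sorted(related[food])))
# Beer -> Cake, Chicken, Chocolate
# Cake -> Beer, Chocolate
# Chicken -> Beer, Lemon with ci
# Chocolate -> Beer, Cake
# Lemon with ci -> Chicken
```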
In Python 3.5:
import itertools
listOfStrings = ['Beer, Chicken', 'Cake, Chocolate', 'Lemon with ci, Chicken', 'Beer, Beer, Cake, Chocolate']
print ('Original list:', listOfStrings)
print ()
listOfLists = [group.replace (', ', ',') .split (',') for group in listOfStrings]
print ('Turned into list of lists:', listOfLists)
print ()
allFoods = set (itertools.chain (*listOfLists))
print ('All individual foods:', allFoods)
print ()
listOfSets = [set (group) for group in listOfLists]
print ('A list of sets is handy, since sets contain no duplicates:', listOfSets)
print ()
dictOfFoods = dict ([[food, set ()] for food in allFoods])
print ('Prepare a dictionary, where we can put the associated foods:', dictOfFoods)
print ()
for food in dictOfFoods:
    for foodSet in listOfSets:
        if food in foodSet:
            dictOfFoods [food] .update (foodSet)
    dictOfFoods [food] .remove (food)
print ('The dictionary is now filled:', dictOfFoods)
print ()
for food in dictOfFoods:
    print ('People who buy', food, 'also buy:')
    for otherFood in dictOfFoods [food]:
        print (otherFood)
    print ()
Will print:
Original list: ['Beer, Chicken', 'Cake, Chocolate', 'Lemon with ci, Chicken', 'Beer, Beer, Cake, Chocolate']
Turned into list of lists: [['Beer', 'Chicken'], ['Cake', 'Chocolate'], ['Lemon with ci', 'Chicken'], ['Beer', 'Beer', 'Cake', 'Chocolate']]
All individual foods: {'Chocolate', 'Lemon with ci', 'Chicken', 'Cake', 'Beer'}
A list of sets is handy, since sets contain no duplicates: [{'Chicken', 'Beer'}, {'Chocolate', 'Cake'}, {'Chicken', 'Lemon with ci'}, {'Chocolate', 'Cake', 'Beer'}]
Prepare a dictionary, where we can put the associated foods: {'Chocolate': set(), 'Lemon with ci': set(), 'Cake': set(), 'Beer': set(), 'Chicken': set()}
The dictionary is now filled: {'Chocolate': {'Cake', 'Beer'}, 'Lemon with ci': {'Chicken'}, 'Cake': {'Chocolate', 'Beer'}, 'Beer': {'Chocolate', 'Chicken', 'Cake'}, 'Chicken': {'Lemon with ci', 'Beer'}}
People who buy Chocolate also buy:
Cake
Beer
People who buy Lemon with ci also buy:
Chicken
People who buy Cake also buy:
Chocolate
Beer
People who buy Beer also buy:
Chocolate
Chicken
Cake
People who buy Chicken also buy:
Lemon with ci
Beer
If you don't want to use itertools and *, you can also make a loop in a loop to traverse all elements of the listOfLists and add them to allFoods, which you initially make empty.
It took me some time to understand exactly what you wanted, but this is a working solution.
listitems = ['Beer, Chicken', 'Cake, Chocolate', 'Lemon with ci, Chicken', 'Beer, Beer, Cake, Chocolate']
eachitems = ['Beer', 'Cake', 'Chocolate', 'Lemon with ci', 'Chicken']
for item in eachitems:
    assoc = [associated for associated in listitems if item in associated]
    result = set()
    for itemlist in assoc:
        itemlist = itemlist.replace(', ', ',').split(',')
        itemlist = set(itemlist)
        itemlist.remove(item)
        result = result | itemlist
    print('People who buy {} also buy:'.format(item), ', '.join(sorted(result)))
Output
People who buy Beer also buy: Cake, Chicken, Chocolate
People who buy Cake also buy: Beer, Chocolate
People who buy Chocolate also buy: Beer, Cake
People who buy Lemon with ci also buy: Chicken
People who buy Chicken also buy: Beer, Lemon with ci
The key part of this solution is the use of sets to remove duplicate items, together with the | (union) operator.
As a side note, instead of using | like this
result = result | itemlist
you can modify the set in place with
result.update(itemlist)

Python Dictionary Fuzzy Match on keys

I have the following dictionary:
classes = {'MATH6371': 'Statistics 1', 'COMP7330': 'Database Management',
'MATH6471': 'Statistics 2','COMP7340': 'Creative Computation' }
And I am trying to make a raw_input fuzzy match on the dictionary keys. For example, if I type in 'math', the output would be Statistics 1 and Statistics 2.
I have the following code, but it only matches keys exactly:
def print_courses (raw_input):
    search = raw_input("Type a course ID here:")
    if search in classes:
        print classes.get(search)
    else:
        print "Sorry, that course doesn't exist, try again"
print_courses(raw_input)
Thanks
Here you go:
>>> search = 'math'
>>> result = [classes[key] for key in classes if search in key.lower()]
>>> result
['Statistics 2', 'Statistics 1']
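The same idea can be wrapped in a function, shown here as a Python 3 sketch. `find_courses` and `closest_ids` are hypothetical names; the second helper uses the standard library's difflib.get_close_matches as one possible way to tolerate typos in a full course ID, which goes beyond the substring match above:

```python
import difflib

classes = {'MATH6371': 'Statistics 1', 'COMP7330': 'Database Management',
           'MATH6471': 'Statistics 2', 'COMP7340': 'Creative Computation'}


def find_courses(search, courses):
    """Return names of courses whose ID contains the search text, ignoring case."""
    search = search.strip().lower()
    return sorted(name for key, name in courses.items() if search in key.lower())


def closest_ids(search, courses, n=3, cutoff=0.6):
    """Return course IDs that look similar to the search text."""
    return difflib.get_close_matches(search.upper(), list(courses), n=n, cutoff=cutoff)


print(find_courses('math', classes))   # ['Statistics 1', 'Statistics 2']
print(find_courses('xyz', classes))    # []
print(closest_ids('MATH6317', classes))
```

Sorting the result makes the output independent of dictionary ordering.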

What is the most efficient method to parse this line of text?

The following is a row that I have extracted from the web:
AIG $30 AIG is an international renowned insurance company listed on the NYSE. A period is required. Manual Auto Active 3 0.0510, 0.0500, 0.0300 [EXTRACT]
I would like to create 5 separate variables by parsing the text and retrieving the relevant data. However, I really don't understand the regex documentation! Can anyone guide me on how to do it correctly with this example?
Name = AIG
CurrentPrice = $30
Status = Active
World_Ranking = 3
History = 0.0510, 0.0500, 0.0300
I am not sure what you want to achieve here. There is no need to use regexps; you could just use str.split:
>>> str = "AIG $30 AIG is an international renowned insurance company listed on the NYSE. A period is required. Manual Auto Active 3 0.0510, 0.0500, 0.0300 [EXTRACT]"
>>> list = str.split()
>>> dict = { "Name": list[0], "CurrentPrice": list[1], "Status": list[19], "WorldRanking": list[20], "History": ' '.join((list[21], list[22], list[23])) }
#output
>>> dict
{'Status': 'Active', 'CurrentPrice': '$30', 'Name': 'AIG', 'WorldRanking': '3', 'History': '0.0510, 0.0500, 0.0300'}
Instead of using list[19] and so on, you may want to use negative indexes such as list[-n], so the result does not depend on the length of the company's description. Like this:
>>> history = ' '.join(list[-4:-1])
>>> history
'0.0510, 0.0500, 0.0300'
For the history values it could be easier to use re:
>>> import re
>>> history = re.findall(r"\d\.\d{4}", str)
>>> history
['0.0510', '0.0500', '0.0300']
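To locate the ranking and the status without hard-coded positions, you could get the indexes of the history values. The ranking is the token just before the first history value, and the status is the token before that:
>>> indexes = [ i for i, substr in enumerate(list) if re.match(r"\d\.\d{4}", substr) ]
>>> indexes
[21, 22, 23]
>>> list[21:24]
['0.0510,', '0.0500,', '0.0300']
>>> list[indexes[0] - 1]
'3'
>>> list[indexes[0] - 2]
'Active'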
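Since the question asked about regular expressions, here is a sketch of a single pattern with named groups. It assumes every row starts with the name and a dollar price, and ends with the status, an integer ranking, comma-separated history values with four decimals, and the literal [EXTRACT]. The name `ROW` and the group names are made up for this example:

```python
import re

line = ("AIG $30 AIG is an international renowned insurance company listed on "
        "the NYSE. A period is required. Manual Auto Active 3 "
        "0.0510, 0.0500, 0.0300 [EXTRACT]")

ROW = re.compile(
    r"^(?P<name>\S+)\s+"                            # first token: the name
    r"(?P<price>\$\d+(?:\.\d+)?)\s+"                # price such as $30 or $30.50
    r".*\s"                                         # description and other fields, skipped
    r"(?P<status>\S+)\s+"                           # token just before the ranking
    r"(?P<rank>\d+)\s+"                             # integer world ranking
    r"(?P<history>\d\.\d{4}(?:,\s*\d\.\d{4})*)\s+"  # comma-separated history values
    r"\[EXTRACT\]$"
)

match = ROW.match(line)
fields = match.groupdict()
# Convert the history string into a list of floats.
fields['history'] = [float(x) for x in fields['history'].split(',')]
print(fields)
# {'name': 'AIG', 'price': '$30', 'status': 'Active', 'rank': '3',
#  'history': [0.051, 0.05, 0.03]}
```

If a row does not follow this layout, ROW.match returns None, so real code should check the result before calling groupdict.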