I am new and have been working on this for a week now but can't find any solution. I hope someone can help me figure this out.
How can I find items in the list - listitems and output their items individually?
listitems = ['Beer, Chicken', 'Cake, Chocolate', 'Lemon with ci, Chicken', 'Beer, Beer, Cake, Chocolate']
Is there any way I can compute the related foods from the receipt list?
So far I am only able to find the foods for ONE item. My code is as follows:
I also built another list containing each individual item, for comparison.
eachitems = ['Beer', 'Cake', 'Chocolate', 'Lemon with ci', 'Chicken']
I would personally use a dictionary keyed by item, with the associated items as each key's value. That would make it much easier to get the results you want, although I am not sure right now how I would build it from only the list you made.
From your original code, add " print (combi) ", " print (checklist) " and " print (correlatedlist) " at the end and you will see it doesn't really append it the way you want.
In Python 3.5:
import itertools
listOfStrings = ['Beer, Chicken', 'Cake, Chocolate', 'Lemon with ci, Chicken', 'Beer, Beer, Cake, Chocolate']
print ('Original list:', listOfStrings)
print ()
listOfLists = [group.replace (', ', ',') .split (',') for group in listOfStrings]
print ('Turned into list of lists:', listOfLists)
print ()
allFoods = set (itertools.chain (*listOfLists))
print ('All individual foods:', allFoods)
print ()
listOfSets = [set (group) for group in listOfLists]
print ('A list of sets is handy, since sets contain no duplicates:', listOfSets)
print ()
dictOfFoods = dict ([[food, set ()] for food in allFoods])
print ('Prepare a dictionary, where we can put the associated foods:', dictOfFoods)
print ()
for food in dictOfFoods:
    for foodSet in listOfSets:
        if food in foodSet:
            dictOfFoods [food] .update (foodSet)
    dictOfFoods [food] .remove (food)
print ('The dictionary is now filled:', dictOfFoods)
print ()
for food in dictOfFoods:
    print ('People who buy', food, 'also buy:')
    for otherFood in dictOfFoods [food]:
        print (otherFood)
    print ()
Will print:
Original list: ['Beer, Chicken', 'Cake, Chocolate', 'Lemon with ci, Chicken', 'Beer, Beer, Cake, Chocolate']
Turned into list of lists: [['Beer', 'Chicken'], ['Cake', 'Chocolate'], ['Lemon with ci', 'Chicken'], ['Beer', 'Beer', 'Cake', 'Chocolate']]
All individual foods: {'Chocolate', 'Lemon with ci', 'Chicken', 'Cake', 'Beer'}
A list of sets is handy, since sets contain no duplicates: [{'Chicken', 'Beer'}, {'Chocolate', 'Cake'}, {'Chicken', 'Lemon with ci'}, {'Chocolate', 'Cake', 'Beer'}]
Prepare a dictionary, where we can put the associated foods: {'Chocolate': set(), 'Lemon with ci': set(), 'Cake': set(), 'Beer': set(), 'Chicken': set()}
The dictionary is now filled: {'Chocolate': {'Cake', 'Beer'}, 'Lemon with ci': {'Chicken'}, 'Cake': {'Chocolate', 'Beer'}, 'Beer': {'Chocolate', 'Chicken', 'Cake'}, 'Chicken': {'Lemon with ci', 'Beer'}}
People who buy Chocolate also buy:
Cake
Beer
People who buy Lemon with ci also buy:
Chicken
People who buy Cake also buy:
Chocolate
Beer
People who buy Beer also buy:
Chocolate
Chicken
Cake
People who buy Chicken also buy:
Lemon with ci
Beer
If you don't want to use itertools and *, you can also make a loop in a loop to traverse all elements of the listOfLists and add them to allFoods, which you initially make empty.
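The loop-in-a-loop alternative mentioned above could look roughly like this. It is a minimal sketch that hard-codes the listOfLists value printed earlier and keeps the variable names from the answer's code.

```python
# The list of lists produced earlier in the answer, repeated here so the
# snippet runs on its own.
listOfLists = [
    ['Beer', 'Chicken'],
    ['Cake', 'Chocolate'],
    ['Lemon with ci', 'Chicken'],
    ['Beer', 'Beer', 'Cake', 'Chocolate'],
]

# Start with an empty set and add every food from every group.
# A set ignores duplicates, so 'Beer' ends up in it only once.
allFoods = set()
for group in listOfLists:
    for food in group:
        allFoods.add(food)

print('All individual foods:', sorted(allFoods))
```

The result is the same set that set (itertools.chain (*listOfLists)) builds; only the iteration is written out by hand.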
It took me some time to understand exactly what you wanted, but this is a working solution.
listitems = ['Beer, Chicken', 'Cake, Chocolate', 'Lemon with ci, Chicken', 'Beer, Beer, Cake, Chocolate']
eachitems = ['Beer', 'Cake', 'Chocolate', 'Lemon with ci', 'Chicken']
for item in eachitems:
    # Receipts whose text mentions this item (a substring check).
    assoc = [associated for associated in listitems if item in associated]
    result = set()
    for itemlist in assoc:
        itemlist = itemlist.replace(', ', ',').split(',')
        itemlist = set(itemlist)
        itemlist.remove(item)
        result = result | itemlist
    print('People who buy {} also buy:'.format(item), ', '.join(sorted(result)))
Output
People who buy Beer also buy: Cake, Chicken, Chocolate
People who buy Cake also buy: Beer, Chocolate
People who buy Chocolate also buy: Beer, Cake
People who buy Lemon with ci also buy: Chicken
People who buy Chicken also buy: Beer, Lemon with ci
The key part of this solution is the use of sets to remove the duplicate items, together with the | (union) operator.
As a side note, instead of using | like this
result = result | itemlist
you can modify the set in place with
result.update(itemlist)
For my assignment, I am trying to scrape information off the following website: https://www.blueroomcinebar.com/movies/now-showing/.
My code needs to find movie names, times and posters. Both the movie times and posters appear to be displayed in the list I have created according to the order they appear in the HTML, however, the names seem to be in alphabetical order.
We are not allowed to use BeautifulSoup.
This is my current code for scraping movies:
from re import findall, finditer, MULTILINE, DOTALL
from urllib.request import urlopen
movies_name = []
movies_times = []
movies_image = []
movies_list = []
movies_page = urlopen("https://www.blueroomcinebar.com/movies/now-showing/").read().decode('utf-8')
#Add movies to Movies at Blue Room Screen
find_movie_names = findall(r'<h1>(.*?)</h1>', movies_page)
find_movie_times = findall(r'<p>([0-9]{1,2}:[0-9]{2} AM|PM)</p>', movies_page)
find_movie_image = findall(r'<div class="poster" style="background-image: url\((.*?)\)">', movies_page)
print(find_movie_names)
#Add movies to arrays
for movie in find_movie_names:
    movies_name.append(movie)
for movie in find_movie_times:
    movies_times.append(movie)
for movie in find_movie_image:
    movies_image.append(movie)
print(movies_name)
print(movies_image)
for movie in range(len(movies_name)):
    movies_list.append("{};{};{}".format(movies_name[movie], movies_times[movie], movies_image[movie - 1]))
Currently, the names are in the list in the order of
['Aladdin', 'Avengers: Endgame', 'Chandigarh Amritsar Chandigarh', 'John Wick - Parabellum', 'Long Shot', 'Pokemon Detective Pikachu', 'Poms', 'The Hustle', 'Top End Wedding']
They should be in the order:
['Avengers: Endgame', 'Long Shot', 'Pokemon Detective Pikachu', 'The Hustle', 'John Wick - Parabellum', 'Aladdin', 'Chandigarh Amritsar Chandigarh']
N.B.
There may be a movie that comes up a second time with the prefix OCAP. I'm not 100% sure why it has that, but it seems to be some kind of special screening that rotates through different movies each day.
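Separately from the ordering problem, the times pattern in the code above may not match what was intended. Inside the capture group, | splits the whole group into two alternatives, so the pattern accepts either a full time ending in AM or the bare text PM. The sketch below shows the difference on a made-up snippet; sample_html is an assumption and is not taken from the real page.

```python
from re import findall

# Hypothetical markup standing in for the real page, for illustration only.
sample_html = '<p>10:30 AM</p><p>2:15 PM</p>'

# Pattern as written in the question: the alternatives inside the group are
# "digits:digits AM" and just "PM", so the PM time is never captured.
as_written = findall(r'<p>([0-9]{1,2}:[0-9]{2} AM|PM)</p>', sample_html)

# Wrapping the alternation in a non-capturing group limits it to AM or PM.
grouped = findall(r'<p>([0-9]{1,2}:[0-9]{2} (?:AM|PM))</p>', sample_html)

print(as_written)  # ['10:30 AM']
print(grouped)     # ['10:30 AM', '2:15 PM']
```

If the times list comes out shorter than the names list for this reason, indexing the two lists with the same position would pair names with the wrong times.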
I created the following Series and DataFrame:
import pandas as pd
Series_1 = pd.Series({'Name': 'Adam','Item': 'Sweet','Cost': 1})
Series_2 = pd.Series({'Name': 'Bob','Item': 'Candy','Cost': 2})
Series_3 = pd.Series({'Name': 'Cathy','Item': 'Chocolate','Cost': 3})
df = pd.DataFrame([Series_1,Series_2,Series_3], index=['Store 1', 'Store 2', 'Store 3'])
I want to display/print out just one column from the DataFrame (with or without the header row):
Either
Adam
Bob
Cathy
Or:
Sweet
Candy
Chocolate
I have tried the following code which did not work:
print(df['Item'])
print(df.loc['Store 1'])
print(df.loc['Store 1','Item'])
print(df.loc['Store 1','Name'])
print(df.loc[:,'Item'])
print(df.iloc[0])
Can I do it in one simple line of code?
By using to_string
print(df.Name.to_string(index=False))
Adam
Bob
Cathy
For printing the Name column
df['Name']
Not sure what you are really after but if you want to print exactly what you have you can do:
Option 1
print(df['Item'].to_csv(index=False, header=False))
Sweet
Candy
Chocolate
Option 2
for v in df['Item']:
    print(v)
Sweet
Candy
Chocolate
I have a large text file of chat transcripts. My goal is to extract the different components and store them in a pandas DataFrame. A sample of the chat is below:
*****************************************************
Session:123456
Chat Date: 2017-05-01T08:01:45+00:00
Chat exec name: Sam
Member name: Sara
2017-05-01T08:01:45+00:00 Sara: I need help on element A
2017-05-01T08:01:47+00:00 Sam: Sure I can help you on this one
2017-05-01T08:01:48+00:00 Sara: Is there a better product
2017-05-01T08:01:48+10:00 Sam: Sure we have a lot of new products
2017-05-01T08:01:49+18:00 Sara: Can you let me know
2017-05-01T08:01:51+20:00 Sam: Here is the solution
2017-05-01T08:01:52+00:00 Sara: Thanks for this
2017-05-01T08:01:52+11:00 Sam: Have a Nive day Bye!!
*****************************************************
Session:234567
Chat Date: 2017-05-02T18:00:30+00:00
Chat exec name: PAUL
Member name:CHRIS
2017-05-02T18:00:30+00:00 CHRIS: I need help on element A
2017-05-02T18:02:30+00:00 PAUL: Sure I can help you on this one
2017-05-02T18:02:39+00:00 CHRIS: Is there a better product
2017-05-02T18:04:01+00:00 PAUL: Sure we have a lot of new products
2017-05-02T18:04:30+00:00 CHRIS: Can you let me know
2017-05-02T18:08:11+00:00 PAUL: Here is the solution
2017-05-02T18:08:59+00:00 CHRIS: Thanks for this
2017-05-02T18:09:11+00:00 PAUL: Have a Nice day Bye!!
*****************************************************
I would like to create a table with the columns:
Session, ChatDate, ChatExecName, Membername, Time, Person, Sentence
The first four columns should be repeated for every line of a chat block. The delimiters are fixed and never change.
I have tried the following, but it returns all the blocks together. Can somebody please help?
import re
def GetTheSentences(infile):
    Delim1 = '*****************************************************'
    Delim2 = '*****************************************************'
    with open(infile) as fp:
        for result in re.findall('Delim1(.*?)Delim2', fp.read(), re.S):
            print (result)
and
import re
def GetTheSentences2(file):
    start_rx =re.compile('*****************************************************')
    end_rx = re.compile('*****************************************************')
    start = False
    output = []
    with open(file, encoding="latin-1") as datafile:
        for line in datafile.readlines():
            if re.match(start_rx, line):
                start = True
            elif re.match(end_rx, line):
                start = False
            if start:
                output.append(line)
        print (output)
I sure hope this is helpful:
data = '''*****************************************************
Session:123456
Chat Date: 2017-05-01T08:01:45+00:00
Chat exec name: Sam
Member name: Sara
2017-05-01T08:01:45+00:00 Sara: I need help on element A
2017-05-01T08:01:47+00:00 Sam: Sure I can help you on this one
2017-05-01T08:01:48+00:00 Sara: Is there a better product
2017-05-01T08:01:48+10:00 Sam: Sure we have a lot of new products
2017-05-01T08:01:49+18:00 Sara: Can you let me know
2017-05-01T08:01:51+20:00 Sam: Here is the solution
2017-05-01T08:01:52+00:00 Sara: Thanks for this
2017-05-01T08:01:52+11:00 Sam: Have a Nive day Bye!!
*****************************************************
Session:234567
Chat Date: 2017-05-02T18:00:30+00:00
Chat exec name: PAUL
Member name:CHRIS
2017-05-02T18:00:30+00:00 CHRIS: I need help on element A
2017-05-02T18:02:30+00:00 PAUL: Sure I can help you on this one
2017-05-02T18:02:39+00:00 CHRIS: Is there a better product
2017-05-02T18:04:01+00:00 PAUL: Sure we have a lot of new products
2017-05-02T18:04:30+00:00 CHRIS: Can you let me know
2017-05-02T18:08:11+00:00 PAUL: Here is the solution
2017-05-02T18:08:59+00:00 CHRIS: Thanks for this
2017-05-02T18:09:11+00:00 PAUL: Have a Nice day Bye!!
*****************************************************'''
# Split the text into one chunk per session, then each chunk into lines.
data = data.split('*****************************************************')
data = [item.split('\n') for item in data if item]
result = []
for group in data:
    group = [item for item in group if item]  # drop empty lines
    times = []
    people = []
    lines = []
    for item in group:
        if item.startswith('Session'):
            session = item.split(':')[-1].strip()
            print(session)
        elif item.startswith('Chat Date'):
            chatDate = item.split(':', 1)[-1].strip()
        elif item.startswith('Chat exec'):
            execName = item.split(':')[-1].strip()
        elif item.startswith('Member'):
            memberName = item.split(':')[-1].strip()
        else:
            # Chat line: a 25-character timestamp, a space, then "Person: sentence".
            # partition keeps any later colons inside the sentence.
            person, _, sentence = item[26:].partition(':')
            times.append(item[:25])
            people.append(person)
            lines.append(sentence.strip())
    # Repeat the four header fields for every chat line in the session.
    for i in range(len(times)):
        result.append([session, chatDate, execName, memberName, times[i], people[i], lines[i]])
import pandas as pd
df = pd.DataFrame(result, columns=['Session', 'ChatDate', 'ChatExecName', 'Membername', 'Time', 'Person', 'Sentence'])
print(df)
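As a side note on the two attempts in the question: 'Delim1(.*?)Delim2' searches for the literal text Delim1 and Delim2 rather than the variables' values, and re.compile on a string of bare asterisks raises an error because * is a regex quantifier. The sketch below is one possible regex-based variant that treats any line made only of asterisks as a separator. The function name parse_transcripts and the shortened sample are assumptions for illustration, and it returns plain lists so it runs without pandas.

```python
import re

HEADER_KEYS = ('Session', 'Chat Date', 'Chat exec name', 'Member name')
# A chat line: timestamp, whitespace, speaker, colon, sentence.
CHAT_LINE = re.compile(r'^(\S+)\s+([^:]+):\s*(.*)$')


def parse_transcripts(text):
    """Return one row per chat line: four header fields, time, person, sentence."""
    rows = []
    # Split on lines consisting only of asterisks (escaped, since * is special).
    for block in re.split(r'^\*{5,}[ \t]*$', text, flags=re.MULTILINE):
        header = {}
        for line in block.splitlines():
            line = line.strip()
            if not line:
                continue
            key, _, value = line.partition(':')
            if key in HEADER_KEYS:
                header[key] = value.strip()
                continue
            match = CHAT_LINE.match(line)
            if match:
                rows.append([header.get(k) for k in HEADER_KEYS] + list(match.groups()))
    return rows


# Shortened, made-up sample in the same layout as the transcript above.
sample = """********************
Session:123456
Chat Date: 2017-05-01T08:01:45+00:00
Chat exec name: Sam
Member name: Sara
2017-05-01T08:01:45+00:00 Sara: I need help on element A
2017-05-01T08:01:47+00:00 Sam: Sure I can help you on this one
********************
Session:234567
Chat Date: 2017-05-02T18:00:30+00:00
Chat exec name: PAUL
Member name:CHRIS
2017-05-02T18:00:30+00:00 CHRIS: I need help on element A
********************
"""

rows = parse_transcripts(sample)
for row in rows:
    print(row)
```

The rows can be passed to pd.DataFrame with the same seven column names used in the answer above.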
I have information in the following format:
AAMOD, Robert Kevin; Salt Lake, '91; Sales Associate, Xyz, UT; r: 101 Williams Ave, Salt Lake City, UT 84105, cell: (xxx) xxx- xxxx, abc#yahoo.com.
I am trying to convert the information to CSV.
I have converted the information to a list by splitting on ';'. For each item of that list, I now use a regex to build another list that holds only the required information, in a particular sequence, with an empty entry where that information is not present.
for item in list_1:
    direct = []
    if re.search(r'([A-Z]{3,}),([\d\w\s.]+)', item):
        match = re.search(r'([A-Z]{3,}),([\d\w\s.]+)', item)
        direct.append(match.group(1))
        direct.append(match.group(2))
        break
    else:
        match1 = re.search(r'\'(\d+)', item)
        if match1:
            direct.append(match1.group(1))
            break
        else:
            match2 = re.search(r'(r:[\w\d\s.]*,*\s*([\w]*)\s*([A-Z]{2})\s([\d]{5}),\s*(.\d{3}.\s\d{3}-\d{4}),\s*(\w+[\w\d\s.]+#+\s*[\w\d.]+\.+\w+))', item)
            if match2:
                direct.append(match2.group(2))
                direct.append(match2.group(3))
                direct.append(match2.group(4))
                direct.append(match2.group(5))
                direct.append(match2.group(6))
                break
            else:
                direct.append('')
                break
print direct
When I run this code, the list only shows the first match.
Each re.search call works when I run it on its own, but as soon as I combine them with nested if-else, nothing happens. Can anyone suggest where the logic is wrong?
Expected output:
[AAMOD, Robert Kevin, 91, Sales Associate, Salt Lake City, UT, 84105, (xxx) xxx- xxxx, abc#yahoo.com]
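One likely cause, judging from the code as posted: every branch ends in break, which leaves the for loop after the first item, and direct is reset to an empty list at the start of each iteration. The sketch below shows only the control-flow change, under stated assumptions: it reuses the first two patterns from the question, leaves the address pattern out, and uses a shortened record, so it is not a complete converter.

```python
import re

# First two patterns from the question; the address pattern is omitted here.
PATTERNS = [
    re.compile(r'([A-Z]{3,}),([\w\s.]+)'),  # surname in capitals, then given names
    re.compile(r"'(\d+)"),                  # two-digit year after an apostrophe
]

# Shortened record for illustration.
record = "AAMOD, Robert Kevin; Salt Lake, '91; Sales Associate, Xyz, UT"
list_1 = record.split(';')

direct = []  # created once, outside the loop, so results accumulate
for item in list_1:
    for pattern in PATTERNS:
        match = pattern.search(item)
        if match:
            direct.extend(group.strip() for group in match.groups())
            break  # stops trying patterns for this item only
    else:
        # No pattern matched this item: keep its slot with an empty string.
        direct.append('')

print(direct)  # ['AAMOD', 'Robert Kevin', '91', '']
```

Here break sits in the inner loop over patterns, so the outer loop still visits every item in list_1.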
I could use some advice on how to search a list of movies by genre, using the words in a string as the search parameter.
Say I have created a list called genre, which contains a string like:
['crime, drama,action']
I want to use this list to search for movies that contain all of the genres, or maybe just one of them.
I have created a big list which contains all the information about each movie. Here is an example entry from the list:
('Saving Private Ryan (1998)', '8.5', "Tom Hanks, Matt Damon, Tom Sizemore',\n", 'action, drama, war,\n'),
So if I want to search for Saving Private Ryan, which is a drama and action movie but not a crime movie, how can I use my genre list to search for it?
Is there a way to search by something in the string?
UPDATE:
This is what I have done so far. I have tried to process my movie tuple with a function defined using def.
Navn_rating = dict(zip(names1, ratings))
Actor_genre = dict(zip(actorlist, genre_list))
var = raw_input("Enter movie: ")
print "you entered ", var
for row in name_rating_actor_genre:
    if var in row:
        movie.append(row)
print "Movie found",movie
def process_movie(movie):
    return {'title': names1, 'rating': ratings, 'actors': actorlist, 'genre': genre_list}
You can "search by something in the string" using in:
>>> genres = 'action, drama, war,\n'
>>> 'action' in genres
True
>>> 'drama' in genres
True
>>> 'romantic comedy' in genres
False
But note that this might not always give the result you want:
>>> 'war' in 'award-winning'
True
I think you should change your data structure. Consider making each movie a dictionary e.g.
{'title': 'Saving Private Ryan', 'year': 1998, 'rating': 8.5, 'actors': ['Tom Hanks', ...], 'genres': ['action', ...]}
then your query becomes
if 'drama' in movie['genres'] and 'action' in movie['genres']:
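To cover both "all genres" and "just one of them", the same check can be written with sets. This is a minimal sketch: parse_genres and matches_genres are hypothetical helper names, and the movie dictionary is a cut-down version of the structure suggested above.

```python
def parse_genres(text):
    """Split a comma-separated genre string into a clean list."""
    return [part.strip() for part in text.split(',') if part.strip()]


def matches_genres(movie, wanted, require_all=True):
    """Check a movie's genres against the wanted ones (all of them, or any)."""
    wanted = set(wanted)
    available = set(movie['genres'])
    if require_all:
        return wanted <= available   # every wanted genre is present
    return bool(wanted & available)  # at least one wanted genre is present


genre = ['crime, drama,action']      # the list from the question
wanted = parse_genres(genre[0])      # ['crime', 'drama', 'action']
movie = {'title': 'Saving Private Ryan', 'genres': ['action', 'drama', 'war']}

print(matches_genres(movie, wanted))                     # False
print(matches_genres(movie, wanted, require_all=False))  # True
print(matches_genres(movie, ['drama', 'action']))        # True
```

Because genres are compared as whole list items rather than substrings, 'war' no longer matches something like 'award-winning'.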
You can use indexing, split and slicing to process your tuple of strings to make the values of the dictionary, e.g.:
>>> movie = ('Saving Private Ryan (1998)', '8.5', "Tom Hanks, Matt Damon, Tom Sizemore',\n", 'action, drama, war,\n')
>>> int(movie[0][-5:-1])
1998
>>> float(movie[1])
8.5
>>> movie[0][:-7]
'Saving Private Ryan'
>>> movie[2].split(",")
['Tom Hanks', ' Matt Damon', " Tom Sizemore'", '\n']
As you can see, some tidying up may be needed. You could write a function that takes the tuple as an argument and returns the corresponding dictionary:
def process_movie(movie_tuple):
    # ... process the tuple here
    return {'title': title, 'rating': rating, ...}
and apply this to your list of movies using map:
movies = list(map(process_movie, name_rating_actor_genre))
Edit:
You will know your function works when the following line doesn't raise any errors:
assert process_movie(('Saving Private Ryan (1998)', '8.5', "Tom Hanks, Matt Damon, Tom Sizemore',\n", 'action, drama, war,\n')) == {"title": "Saving Private Ryan", "year": 1998, "rating": 8.5, "actors": ["Tom Hanks", "Matt Damon", "Tom Sizemore"], "genres": ["action", "drama", "war"]}
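For reference, here is one possible way to fill in process_movie so that the assertion above passes. It is a sketch that assumes every title ends in a four-digit year in parentheses and that stray quotes and newlines only appear at the edges of each name; clean_list is a hypothetical helper.

```python
def clean_list(text):
    """Split on commas, trim spaces, quotes and newlines, drop empty pieces."""
    parts = (part.strip(" '\n") for part in text.split(','))
    return [part for part in parts if part]


def process_movie(movie_tuple):
    raw_title, raw_rating, raw_actors, raw_genres = movie_tuple
    return {
        'title': raw_title[:-7],        # drop the trailing ' (1998)'
        'year': int(raw_title[-5:-1]),  # the four digits inside the parentheses
        'rating': float(raw_rating),
        'actors': clean_list(raw_actors),
        'genres': clean_list(raw_genres),
    }


example = ('Saving Private Ryan (1998)', '8.5',
           "Tom Hanks, Matt Damon, Tom Sizemore',\n", 'action, drama, war,\n')
print(process_movie(example))
```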