I created the following Series and DataFrame:
import pandas as pd
Series_1 = pd.Series({'Name': 'Adam','Item': 'Sweet','Cost': 1})
Series_2 = pd.Series({'Name': 'Bob','Item': 'Candy','Cost': 2})
Series_3 = pd.Series({'Name': 'Cathy','Item': 'Chocolate','Cost': 3})
df = pd.DataFrame([Series_1,Series_2,Series_3], index=['Store 1', 'Store 2', 'Store 3'])
I want to display/print out just one column from the DataFrame (with or without the header row):
Either
Adam
Bob
Cathy
Or:
Sweet
Candy
Chocolate
I have tried the following code which did not work:
print(df['Item'])
print(df.loc['Store 1'])
print(df.loc['Store 1','Item'])
print(df.loc['Store 1','Name'])
print(df.loc[:,'Item'])
print(df.iloc[0])
Can I do it in one simple line of code?
By using to_string
print(df.Name.to_string(index=False))
Adam
Bob
Cathy
For printing the Name column
df['Name']
Not sure what you are really after but if you want to print exactly what you have you can do:
Option 1
print(df['Item'].to_csv(index=False, header=False))
Sweet
Candy
Chocolate
Option 2
for v in df['Item']:
    print(v)
Sweet
Candy
Chocolate
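Another way to get the bare values, sketched here on the assumption that no column alignment is needed, is to join the column's values with newlines yourself. The frame is rebuilt from a dict only to keep the example self-contained, and column_text is a name made up for this illustration.

```python
import pandas as pd

# Same data as in the question, built from a dict for brevity.
df = pd.DataFrame(
    {
        "Name": ["Adam", "Bob", "Cathy"],
        "Item": ["Sweet", "Candy", "Chocolate"],
        "Cost": [1, 2, 3],
    },
    index=["Store 1", "Store 2", "Store 3"],
)

# Convert each value to str so the same line also works for numeric columns.
column_text = "\n".join(df["Item"].astype(str))
print(column_text)
```

Because the result is an ordinary string, there is no index, header, or padding to switch off.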
I have a text file which contains different news articles about terrorist attacks. Each article starts with an HTML tag (<p>Advertisement) and I would like to extract one specific piece of information from each article: the number of people wounded in the attack.
This is a sample of the text file and how the articles are separated:
[<p>Advertisement , By MILAN SCHREUER and ALISSA J. RUBIN OCT. 5, 2016
, BRUSSELS — A man wounded 2 police officers with a knife in Brussels around noon on Wednesday in what the authorities called “a potential terrorist attack.” , The two officers were attacked on the Boulevard Lambermont.....]
[<p>Advertisement ,, By KAREEM FAHIM and MOHAMAD FAHIM ABED JUNE 30, 2016
, At least 33 people were killed and 25 were injured when the Taliban bombed buses carrying police cadets on the outskirts of Kabul, Afghanistan, on Thursday. , KABUL, Afghanistan — Taliban insurgents bombed a convoy of buses carrying police cadets on the outskirts of Kabul, the Afghan capital, on Thursday, killing at least 33 people, including four civilians, according to government officials and the United Nations. , During a year...]
This is my code so far:
import re

with open("News_cleaned_definitive.csv") as text_open:
    text_read = text_open.read()
splitted = text_read.split("<p>")
pattern = r"wounded (\d+)|(\d+) were wounded|(\d+) were injured"
for article in splitted:
    result = re.findall(pattern, article)
    print(result)
The output that I get is:
[]
[]
[]
[('', '40', '')]
[('', '150', '')]
[('94', '', '')]
I would like to make the output more readable and then save it as a CSV file:
article_1,0
article_2,0
article_3,40
article_3,150
article_3,94
Any suggestions on how to make it more readable?
I rewrote your loop like this and merged with csv write since you requested it:
import csv
with open("wounded.csv", "w", newline="") as f:
    writer = csv.writer(f, delimiter=",")
    for i, article in enumerate(splitted):
        result = re.findall(pattern, article)
        nb_casualties = sum(int(x) for x in result[0] if x) if result else 0
        row = ["article_{}".format(i+1), nb_casualties]
        writer.writerow(row)
Get the index of each article using enumerate.
Sum the non-empty groups of the first match, using a generator expression that converts each one to an integer and passes it to sum; the ternary expression falls back to 0 when nothing matched.
Create the row.
Write it as one row per iteration through the csv.writer object.
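To try the idea without the original file, here is a small self-contained sketch. The three article strings are shortened from the sample in the question, count_wounded is a helper name invented for this example, and the rows go to an in-memory buffer instead of wounded.csv.

```python
import csv
import io
import re

PATTERN = re.compile(r"wounded (\d+)|(\d+) were wounded|(\d+) were injured")

def count_wounded(article):
    """Return the number captured by the first match, or 0 if nothing matched."""
    matches = PATTERN.findall(article)
    if not matches:
        return 0
    # findall returns one tuple per match; only the group that matched is non-empty.
    return sum(int(group) for group in matches[0] if group)

# Shortened excerpts from the sample text in the question.
articles = [
    "A man wounded 2 police officers with a knife in Brussels",
    "At least 33 people were killed and 25 were injured when the Taliban bombed buses",
    "The two officers were attacked on the Boulevard Lambermont",
]

buffer = io.StringIO()  # stands in for the real output file
writer = csv.writer(buffer)
for i, article in enumerate(articles, start=1):
    writer.writerow(["article_{}".format(i), count_wounded(article)])

print(buffer.getvalue())
```

Note that only the first match per article is counted, as in the loop above; an article that reports several figures would need a decision on whether to sum them or keep them separate.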
I am new and have been working on this for a week now but can't find any solution. I hope someone can help me figure this out.
How can I find the items in each entry of the list listitems and output them individually?
listitems = ['Beer, Chicken', 'Cake, Chocolate', 'Lemon with ci, Chicken', 'Beer, Beer, Cake, Chocolate']
Is there any way I can compute the related foods from the receipt list?
So far I am only able to find the foods for ONE item. My code is as follows:
I also built another list, containing each individual item, for comparison.
eachitems = ['Beer', 'Cake', 'Chocolate', 'Lemon with ci', 'Chicken']
I would personally use a dictionary with one key per item and the associated items as its values. That would also make it much easier to get the results you want. I am not exactly sure right now how I would accomplish it using only the list you made.
From your original code, add " print (combi) ", " print (checklist) " and " print (correlatedlist) " at the end and you will see it doesn't really append it the way you want.
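The dictionary idea could look roughly like the sketch below. It assumes every receipt is a comma-separated string, as in listitems; the name related and the use of defaultdict are choices made for this illustration, not something taken from the question.

```python
from collections import defaultdict

listitems = ['Beer, Chicken', 'Cake, Chocolate', 'Lemon with ci, Chicken',
             'Beer, Beer, Cake, Chocolate']

# Map each food to the set of foods that appear on the same receipt.
related = defaultdict(set)
for receipt in listitems:
    # A set drops duplicates such as the repeated 'Beer'.
    foods = {food.strip() for food in receipt.split(',')}
    for food in foods:
        related[food] |= foods - {food}

for food in sorted(related):
    print(food, '->', sorted(related[food]))
```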
In Python 3.5:
import itertools
listOfStrings = ['Beer, Chicken', 'Cake, Chocolate', 'Lemon with ci, Chicken', 'Beer, Beer, Cake, Chocolate']
print ('Original list:', listOfStrings)
print ()
listOfLists = [group.replace (', ', ',') .split (',') for group in listOfStrings]
print ('Turned into list of lists:', listOfLists)
print ()
allFoods = set (itertools.chain (*listOfLists))
print ('All individual foods:', allFoods)
print ()
listOfSets = [set (group) for group in listOfLists]
print ('A list of sets is handy, since sets contain no duplicates:', listOfSets)
print ()
dictOfFoods = dict ([[food, set ()] for food in allFoods])
print ('Prepare a dictionary, where we can put the associated foods:', dictOfFoods)
print ()
for food in dictOfFoods:
    for foodSet in listOfSets:
        if food in foodSet:
            dictOfFoods [food] .update (foodSet)
    dictOfFoods [food] .remove (food)
print ('The dictionary is now filled:', dictOfFoods)
print ()
for food in dictOfFoods:
    print ('People who buy', food, 'also buy:')
    for otherFood in dictOfFoods [food]:
        print (otherFood)
    print ()
Will print:
Original list: ['Beer, Chicken', 'Cake, Chocolate', 'Lemon with ci, Chicken', 'Beer, Beer, Cake, Chocolate']
Turned into list of lists: [['Beer', 'Chicken'], ['Cake', 'Chocolate'], ['Lemon with ci', 'Chicken'], ['Beer', 'Beer', 'Cake', 'Chocolate']]
All individual foods: {'Chocolate', 'Lemon with ci', 'Chicken', 'Cake', 'Beer'}
A list of sets is handy, since sets contain no duplicates: [{'Chicken', 'Beer'}, {'Chocolate', 'Cake'}, {'Chicken', 'Lemon with ci'}, {'Chocolate', 'Cake', 'Beer'}]
Prepare a dictionary, where we can put the associated foods: {'Chocolate': set(), 'Lemon with ci': set(), 'Cake': set(), 'Beer': set(), 'Chicken': set()}
The dictionary is now filled: {'Chocolate': {'Cake', 'Beer'}, 'Lemon with ci': {'Chicken'}, 'Cake': {'Chocolate', 'Beer'}, 'Beer': {'Chocolate', 'Chicken', 'Cake'}, 'Chicken': {'Lemon with ci', 'Beer'}}
People who buy Chocolate also buy:
Cake
Beer
People who buy Lemon with ci also buy:
Chicken
People who buy Cake also buy:
Chocolate
Beer
People who buy Beer also buy:
Chocolate
Chicken
Cake
People who buy Chicken also buy:
Lemon with ci
Beer
If you don't want to use itertools and *, you can also make a loop in a loop to traverse all elements of the listOfLists and add them to allFoods, which you initially make empty.
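The nested-loop variant just described might be written as follows, reusing the listOfLists value printed above:

```python
listOfLists = [['Beer', 'Chicken'], ['Cake', 'Chocolate'],
               ['Lemon with ci', 'Chicken'], ['Beer', 'Beer', 'Cake', 'Chocolate']]

allFoods = set()              # start empty
for group in listOfLists:     # outer loop: one receipt at a time
    for food in group:        # inner loop: each food on that receipt
        allFoods.add(food)    # adding a duplicate to a set has no effect

print(sorted(allFoods))
```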
It took me some time to understand exactly what you wanted, but this is a working solution.
listitems = ['Beer, Chicken', 'Cake, Chocolate', 'Lemon with ci, Chicken', 'Beer, Beer, Cake, Chocolate']
eachitems = ['Beer', 'Cake', 'Chocolate', 'Lemon with ci', 'Chicken']
for item in eachitems:
    assoc = [associated for associated in listitems if item in associated]
    result = set()
    for itemlist in assoc:
        itemlist = itemlist.replace(', ', ',').split(',')
        itemlist = set(itemlist)
        itemlist.remove(item)
        result = result | itemlist
    print('People who buy {} also buy:'.format(item), ', '.join(sorted(result)))
Output
People who buy Beer also buy: Cake, Chicken, Chocolate
People who buy Cake also buy: Beer, Chocolate
People who buy Chocolate also buy: Beer, Cake
People who buy Lemon with ci also buy: Chicken
People who buy Chicken also buy: Beer, Lemon with ci
The key part of this solution is the use of sets to remove the duplicate items, together with the | (union) operator.
As a side note, instead of using | like this
result = result | itemlist
you can modify the set in place with
result.update(itemlist)
I could use some advice on how to search a list for genres, using the words in a string as the search parameter.
So if I have created a list called genre, which contains a string like:
['crime, drama,action']
I want to use this list to search for movies containing all of the genres, or maybe just one of them.
I have created a big list which contains all the information about the movies. Here is an example entry from the list:
('Saving Private Ryan (1998)', '8.5', "Tom Hanks, Matt Damon, Tom Sizemore',\n", 'action, drama, war,\n'),
So if I want to search for Saving Private Ryan, which is drama + action but not crime, how can I use my genre list to search for it?
Is there a way to search by something in the string?
UPDATE:
So this is what I have done so far. I have tried to process my movie tuple and use a function defined with def.
Navn_rating = dict(zip(names1, ratings))
Actor_genre = dict(zip(actorlist, genre_list))
var = raw_input("Enter movie: ")
print "you entered ", var
for row in name_rating_actor_genre:
    if var in row:
        movie.append(row)
print "Movie found",movie
def process_movie(movie):
    return {'title': names1, 'rating': ratings, 'actors': actorlist, 'genre': genre_list}
You can "search by something in the string" using in:
>>> genres = 'action, drama, war,\n'
>>> 'action' in genres
True
>>> 'drama' in genres
True
>>> 'romantic comedy' in genres
False
But note that this might not always give the result you want:
>>> 'war' in 'award-winning'
True
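One way around that pitfall, sketched here on the assumption that genres are always separated by commas, is to split the string into a set of exact genre names first. parse_genres is a helper name invented for this example.

```python
def parse_genres(raw):
    """Split a comma-separated genre string into a set of clean genre names."""
    return {part.strip() for part in raw.split(',') if part.strip()}

movie_genres = parse_genres('action, drama, war,\n')
wanted = parse_genres('crime, drama,action')

print('war' in movie_genres)                   # exact-name membership
print('war' in parse_genres('award-winning'))  # no longer a false positive
print(wanted <= movie_genres)                  # does the movie have ALL wanted genres?
print(bool(wanted & movie_genres))             # does it have AT LEAST ONE of them?
```

With sets, "all genres" becomes a subset test and "any genre" becomes a non-empty intersection.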
I think you should change your data structure. Consider making each movie a dictionary e.g.
{'title': 'Saving Private Ryan', 'year': 1998, 'rating': 8.5, 'actors': ['Tom Hanks', ...], 'genres': ['action', ...]}
then your query becomes
if 'drama' in movie['genres'] and 'action' in movie['genres']:
You can use indexing, split and slicing to process your tuple of strings to make the values of the dictionary, e.g.:
>>> movie = ('Saving Private Ryan (1998)', '8.5', "Tom Hanks, Matt Damon, Tom Sizemore',\n", 'action, drama, war,\n')
>>> int(movie[0][-5:-1])
1998
>>> float(movie[1])
8.5
>>> movie[0][:-7]
'Saving Private Ryan'
>>> movie[2].split(",")
['Tom Hanks', ' Matt Damon', " Tom Sizemore'", '\n']
As you can see, some tidying up may be needed. You could write a function that takes the tuple as an argument and returns the corresponding dictionary:
def process_movie(movie_tuple):
    # ... process the tuple here
    return {'title': title, 'rating': rating, ...}
and apply this to your list of movies using map:
movies = list(map(process_movie, name_rating_actor_genre))
Edit:
You will know your function works when the following line doesn't raise any errors:
assert process_movie(('Saving Private Ryan (1998)', '8.5', "Tom Hanks, Matt Damon, Tom Sizemore',\n", 'action, drama, war,\n')) == {"title": "Saving Private Ryan", "year": 1998, "rating": 8.5, "actors": ["Tom Hanks", "Matt Damon", "Tom Sizemore"], "genres": ["action", "drama", "war"]}
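For reference, one possible implementation that satisfies the assertion is sketched below. It assumes every tuple has the same four fields as the sample and that every title ends in " (YYYY)"; entries that deviate from this would need extra handling.

```python
def process_movie(movie_tuple):
    """Turn a (title, rating, actors, genres) tuple of strings into a dictionary."""
    raw_title, raw_rating, raw_actors, raw_genres = movie_tuple

    # Strip spaces, stray quotes and newlines, then drop the empty leftovers.
    actors = [name.strip(" '\n") for name in raw_actors.split(",")]
    genres = [genre.strip() for genre in raw_genres.split(",")]

    return {
        "title": raw_title[:-7],         # drop the trailing " (1998)"
        "year": int(raw_title[-5:-1]),   # the four digits inside the parentheses
        "rating": float(raw_rating),
        "actors": [name for name in actors if name],
        "genres": [genre for genre in genres if genre],
    }

sample = ('Saving Private Ryan (1998)', '8.5',
          "Tom Hanks, Matt Damon, Tom Sizemore',\n", 'action, drama, war,\n')
movie = process_movie(sample)
print(movie)
print('drama' in movie['genres'] and 'action' in movie['genres'])
```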
I have been trying to find the frequency distribution of nouns in a given sentence. If I do this:
import nltk

text = "This ball is blue, small and extraordinary. Like no other ball."
text=text.lower()
token_text= nltk.word_tokenize(text)
tagged_sent = nltk.pos_tag(token_text)
nouns= []
for word,pos in tagged_sent:
    if pos in ['NN',"NNP","NNS"]:
        nouns.append(word)
freq_nouns=nltk.FreqDist(nouns)
print freq_nouns
It considers "ball" and "ball." as separate words. So I went ahead and tokenized the sentence before tokenizing the words:
text = "This ball is blue, small and extraordinary. Like no other ball."
text=text.lower()
sentences = nltk.sent_tokenize(text)
words = [nltk.word_tokenize(sent)for sent in sentences]
tagged_sent = [nltk.pos_tag(sent)for sent in words]
nouns= []
for word,pos in tagged_sent:
    if pos in ['NN',"NNP","NNS"]:
        nouns.append(word)
freq_nouns=nltk.FreqDist(nouns)
print freq_nouns
It gives the following error:
Traceback (most recent call last):
  File "C:\beautifulsoup4-4.3.2\Trial.py", line 19, in <module>
    for word,pos in tagged_sent:
ValueError: too many values to unpack
What am I doing wrong? Please help.
You were so close!
In this case, you changed your tagged_sent from a list of tuples to a list of lists of tuples because of your list comprehension tagged_sent = [nltk.pos_tag(sent)for sent in words].
Here are some things you can do to discover what type of objects you have:
>>> type(tagged_sent), len(tagged_sent)
(<type 'list'>, 2)
This shows you that you have a list; in this case of 2 sentences. You can further inspect one of those sentences like this:
>>> type(tagged_sent[0]), len(tagged_sent[0])
(<type 'list'>, 9)
You can see that the first sentence is another list, containing 9 items. What does one of those items look like? Let's look at the first item of the first list:
>>> tagged_sent[0][0]
('this', 'DT')
If you're curious to see the entire object, which I frequently am, you can ask the pprint (pretty-print) module to make it nicer to look at, like this:
>>> from pprint import pprint
>>> pprint(tagged_sent)
[[('this', 'DT'),
  ('ball', 'NN'),
  ('is', 'VBZ'),
  ('blue', 'JJ'),
  (',', ','),
  ('small', 'JJ'),
  ('and', 'CC'),
  ('extraordinary', 'JJ'),
  ('.', '.')],
 [('like', 'IN'), ('no', 'DT'), ('other', 'JJ'), ('ball', 'NN'), ('.', '.')]]
So, the long answer is your code needs to iterate over the new second layer of lists, like this:
nouns= []
for sentence in tagged_sent:
    for word,pos in sentence:
        if pos in ['NN',"NNP","NNS"]:
            nouns.append(word)
Of course, this just returns a non-unique list of items, which look like this:
>>> nouns
['ball', 'ball']
You can unique-ify this list in many different ways; a quick one is to use the set() data structure, like so:
>>> unique_nouns = list(set(nouns))
>>> print unique_nouns
['ball']
For an examination of other ways you can unique-ify a list of items, see the slightly older but extremely useful: http://www.peterbe.com/plog/uniqifiers-benchmark
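Since the original goal was a frequency distribution, keep in mind that set() throws the counts away. The sketch below counts the flattened nouns instead. It uses collections.Counter as a stand-in so that it runs without NLTK installed, and it hard-codes the tagged output shown above rather than calling the tagger. Passing the same flattened nouns list to nltk.FreqDist, as in the question, should give the equivalent counts.

```python
from collections import Counter

# Tagged sentences copied from the pprint output above.
tagged_sent = [
    [('this', 'DT'), ('ball', 'NN'), ('is', 'VBZ'), ('blue', 'JJ'), (',', ','),
     ('small', 'JJ'), ('and', 'CC'), ('extraordinary', 'JJ'), ('.', '.')],
    [('like', 'IN'), ('no', 'DT'), ('other', 'JJ'), ('ball', 'NN'), ('.', '.')],
]

NOUN_TAGS = {'NN', 'NNP', 'NNS'}

# Flatten the two-level structure while keeping only the nouns.
nouns = [word
         for sentence in tagged_sent
         for word, pos in sentence
         if pos in NOUN_TAGS]

freq_nouns = Counter(nouns)
print(freq_nouns.most_common())
```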