Convert a text file into a list

This is what my text file consists of:
"I like ice cream very much"
So far this is my code:
f = open('nums.txt', 'r')
list = []
data1 = f.readline()
print (data1)
This is the output:
I like ice cream very much
I was wondering how I could get it so each word is separated e.g. ['I', 'like', 'ice', 'cream', 'very', 'much']
I am working in Python 3.3. Any ideas?

Use the str.split method:
print(data1.split())
>>> data1 = 'I like ice cream very much'
>>> data1.split()
['I', 'like', 'ice', 'cream', 'very', 'much']

Here is the corresponding one-liner:
print(open("nums.txt").read().split())
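The one-liner never closes the file explicitly. A minimal sketch of the same idea with a with block follows; the function name words_from_file and the utf-8 encoding are assumptions for illustration, not part of the original answer.

```python
def words_from_file(path):
    """Return every whitespace-separated word in the file as a list."""
    # The with block closes the file even if reading fails.
    with open(path, "r", encoding="utf-8") as f:
        return f.read().split()
```

Calling words_from_file("nums.txt") on the file from the question should give the same list as data1.split() above.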

append multiple numbers into a list

I have this column of numbers from a txt file that I want to append into a list:
18.0
13.0
10.0
12.0
8.0
My code for placing all these numbers into a list is:
last_number_lis = []
for numbers_to_put_in in (path/to/txt):
    last_number_lis.append(float(last_number))
    print last_number_lis
I want the list to look like
[18.0,13.0,10.0,12.0,8.0]
but instead, when running the code, it shows
[18.0]
[13.0]
[10.0]
[12.0]
[8.0]
Is there any way to get all the numbers on one line? Later on, I would like to add all the numbers up. Thanks for your help!!
You can append to a list like this:
>>> list=[]
>>> list.append(18.0)
>>> list.append(13.0)
>>> list.append(10.0)
>>> list
[18.0, 13.0, 10.0]
But it depends on where your numbers are coming from ...
For example, with input typed in the terminal:
>>> list=[]
>>> t=input("type a number to append the list : ")
type a number to append the list : 12.45
>>> list.append(float(t))
>>> t=input("type a number to append the list : ")
type a number to append the list : 15.098
>>> list.append(float(t))
>>> list
[12.45, 15.098]
Or reading from a file:
>>> list=[]
>>> with open('test.txt', 'r') as infile:
...     for i in infile:
...         list.append(float(i))
...
>>> list
[13.189, 18.8, 15.156, 11.0]
If the numbers come from a .txt file, you can read them with the readlines() method. A loop handles the numbers for you, since you never know how many you may be given:
with open(file_name) as f:
    elements = f.readlines()
elements = [x.strip() for x in elements]
Then loop through the elements and add each one to the list:
last_number_list = []
for last_number in elements:
    last_number_list.append(float(last_number))
print last_number_list
A slightly less compact but easy to read approach is
num_list = []
f = open('file.txt', 'r') # open in read mode 'r'
lines = f.readlines() # read all lines in file
f.close() # safe to close file now
for line in lines:
    num_list.append(float(line.strip()))
print num_list
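Since the question also asks about adding the numbers up later, here is a minimal sketch that combines the approaches above. It is written in Python 3 syntax (the answers above use the Python 2 print statement), and the function name read_floats is made up for illustration.

```python
def read_floats(path):
    """Read one number per line and return them all in a single list."""
    numbers = []
    with open(path) as f:
        for line in f:
            line = line.strip()
            if line:  # skip blank lines, e.g. an empty line at the end of the file
                numbers.append(float(line))
    return numbers
```

With the column of numbers from the question saved in a file, sum(read_floats(path)) adds them up. Note that the print call belongs after the loop, not inside it, if you want the list shown only once.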

How to append a row in CSV file to list?

I have a CSV file containing review data, and I want to append it to a list.
Here is a sample in my file.csv:
I love eating them and they are good for watching TV and looking at movies
This taffy is so good. It is very soft and chewy
I want to save all the words of the second line in a list and print them:
['This', 'taffy', 'is', 'so', 'good.', 'It', 'is', 'very', 'soft', 'and', 'chewy']
I tried this:
import csv
with open('file.csv', 'r') as csvfile:
    data = csv.reader(csvfile, delimiter=',')
    texts = []
    next(data)
    for row in data:
        texts.append(row[2])
print(texts)
My problem is that it doesn't print anything. Can anyone help here? Thanks in advance.
Don't forget to import csv. If you want to save all the words in the second line, enumerate the lines, pick the one you want, then split it and save the words in the list, like this:
import csv
texts = []
with open('csvfile.csv', 'r') as csvfile:
    for i, line in enumerate(csvfile):
        if i == 1:
            for word in line.split():
                texts.append(word)
print(texts)
['This', 'taffy', 'is', 'so', 'good.', 'It', 'is', 'very', 'soft', 'and', 'chewy']
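A variation on the enumerate approach, sketched with itertools.islice so the loop stops as soon as the wanted line has been read. The helper name words_on_line, the 1-based line_number parameter and the utf-8 encoding are assumptions for illustration.

```python
from itertools import islice

def words_on_line(path, line_number):
    """Return the words on the given 1-based line, or [] if the file is shorter."""
    with open(path, encoding="utf-8") as f:
        # islice reads lazily, so lines after the wanted one are never read.
        line = next(islice(f, line_number - 1, line_number), "")
    return line.split()
```

For the sample file in the question, words_on_line(path, 2) should return the words of the "This taffy is so good." line.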

Print line if any of these words are matched

I have a text file with 1000+ lines, each one representing a news article about a topic that I'm researching. Several hundred lines/articles in this dataset are not about the topic, however, and I need to remove these.
I've used grep to remove many of them (grep -vwE "(wordA|wordB)" test8.txt > test9.txt), but I now need to go through the rest manually.
I have a working code that finds all lines that do not contain a certain word, prints this line to me, and asks if it should be removed or not. It works well, but I'd like to include several other words. E.g. let's say my research topic is meat eating trends. I hope to write a script that prints lines that do not contain 'chicken' or 'pork' or 'beef', so I can manually verify if the lines/articles are about the relevant topic.
I know I can do this with elif, but I wonder if there is a better and simpler way? E.g. I tried if "chicken" or "beef" not in line: but it did not work.
Here's the code I have:
orgfile = 'text9.txt'
newfile = 'test10.txt'
newFile = open(newfile, 'wb')
with open("test9.txt") as f:
    for num, line in enumerate(f, 1):
        if "chicken" not in line:
            print "{} {}".format(line.split(',')[0], num)
            testVar = raw_input("1 = delete, enter = skip.")
            testVar = testVar.replace('', '0')
            testVar = int(testVar)
            if testVar == 10:
                print ''
                os.linesep
            else:
                f = open(newfile,'ab')
                f.write(line)
                f.close()
        else:
            f = open(newfile,'ab')
            f.write(line)
            f.close()
Edit: I tried Pieter's answer to this question but it does not work here, presumably because I am not working with integers.
You can use any or all with a generator expression. For example:
>>> key_words={"chicken","beef"}
>>> test_texts=["the price of beef is too high", "the chicken farm now open","tomorrow there is a lunar eclipse","bla"]
>>> for title in test_texts:
        if any(key in title for key in key_words):
            print title
the price of beef is too high
the chicken farm now open
>>>
>>> for title in test_texts:
        if not any(key in title for key in key_words):
            print title
tomorrow there is a lunar eclipse
bla
>>>
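The attempt in the question, if "chicken" or "beef" not in line:, is parsed as "chicken" or ("beef" not in line). A non-empty string is always truthy, so the condition is true for every line. The sketch below applies the any approach to the review loop in Python 3 syntax; the names KEYWORDS, mentions_any and lines_to_review are made up here, and the case-insensitive comparison is an assumption that may or may not suit the data.

```python
KEYWORDS = ("chicken", "pork", "beef")

def mentions_any(line, keywords=KEYWORDS):
    """True if at least one keyword occurs in the line (case-insensitive)."""
    lowered = line.lower()
    return any(word in lowered for word in keywords)

def lines_to_review(lines, keywords=KEYWORDS):
    """Yield (line_number, line) for lines that mention none of the keywords."""
    for num, line in enumerate(lines, 1):
        if not mentions_any(line, keywords):
            yield num, line
```

Because this is plain substring matching, 'pork' also matches inside longer words; a regular expression with word boundaries would be needed to avoid that.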

Search a list with words in string as parameter in python

I could use some advice on how to search a list for genres, using the words in a string as the parameter.
So if I have created a list called genre, which contains a string like:
['crime, drama,action']
I want to use this list to search for movies containing all of the genres, or maybe just one of them.
I have created a big list which contains all the information about the movies. Here is an example from the list:
('Saving Private Ryan (1998)', '8.5', "Tom Hanks, Matt Damon, Tom Sizemore',\n", 'action, drama, war,\n'),
So if I want to search for Saving Private Ryan, which is a drama + action genre, but not crime, how can I then use my genre list to search for it?
Is there a way to search by something in the string?
UPDATE:
This is what I have done so far. I have tried to process my movie tuple and use a def function.
Navn_rating = dict(zip(names1, ratings))
Actor_genre = dict(zip(actorlist, genre_list))
var = raw_input("Enter movie: ")
print "you entered ", var
for row in name_rating_actor_genre:
    if var in row:
        movie.append(row)
print "Movie found",movie
def process_movie(movie):
    return {'title': names1, 'rating': ratings, 'actors': actorlist, 'genre': genre_list}
You can "search by something in the string" using in:
>>> genres = 'action, drama, war,\n'
>>> 'action' in genres
True
>>> 'drama' in genres
True
>>> 'romantic comedy' in genres
False
But note that this might not always give the result you want:
>>> 'war' in 'award-winning'
True
I think you should change your data structure. Consider making each movie a dictionary e.g.
{'title': 'Saving Private Ryan', 'year': 1998, 'rating': 8.5, 'actors': ['Tom Hanks', ...], 'genres': ['action', ...]}
then your query becomes
if 'drama' in movie['genres'] and 'action' in movie['genres']:
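With each movie stored as a dictionary, searching for "all genres or maybe just one of them" can be sketched with all and any. The helper names and the second movie entry below are invented for illustration.

```python
movies = [
    {"title": "Saving Private Ryan", "genres": ["action", "drama", "war"]},
    {"title": "Hypothetical Heist", "genres": ["crime", "drama"]},  # made-up entry
]

def has_all_genres(movie, wanted):
    """True if the movie is tagged with every genre in `wanted`."""
    return all(genre in movie["genres"] for genre in wanted)

def has_any_genre(movie, wanted):
    """True if the movie is tagged with at least one genre in `wanted`."""
    return any(genre in movie["genres"] for genre in wanted)
```

Because the genres are list items rather than one long string, 'war' no longer matches something like 'award-winning'.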
You can use indexing, split and slicing to process your tuple of strings to make the values of the dictionary, e.g.:
>>> movie = ('Saving Private Ryan (1998)', '8.5', "Tom Hanks, Matt Damon, Tom Sizemore',\n", 'action, drama, war,\n')
>>> int(movie[0][-5:-1])
1998
>>> float(movie[1])
8.5
>>> movie[0][:-7]
'Saving Private Ryan'
>>> movie[2].split(",")
['Tom Hanks', ' Matt Damon', " Tom Sizemore'", '\n']
As you can see, some tidying up may be needed. You could write a function that takes the tuple as an argument and returns the corresponding dictionary:
def process_movie(movie_tuple):
    # ... process the tuple here
    return {'title': title, 'rating': rating, ...}
and apply this to your list of movies using map:
movies = list(map(process_movie, name_rating_actor_genre))
Edit:
You will know your function works when the following line doesn't raise any errors:
assert process_movie(('Saving Private Ryan (1998)', '8.5', "Tom Hanks, Matt Damon, Tom Sizemore',\n", 'action, drama, war,\n')) == {"title": "Saving Private Ryan", "year": 1998, "rating": 8.5, "actors": ["Tom Hanks", "Matt Damon", "Tom Sizemore"], "genres": ["action", "drama", "war"]}
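One possible way to fill in process_movie so that the assertion above passes. This is a sketch, not the answerer's own implementation: it assumes every title ends in a four-digit year in parentheses, and the inner helper split_clean is a name made up here.

```python
def process_movie(movie_tuple):
    """Turn a (title, rating, actors, genres) tuple of strings into a dict."""
    raw_title, raw_rating, raw_actors, raw_genres = movie_tuple

    def split_clean(text):
        # Split on commas, then trim spaces, newlines and stray quotes,
        # and drop any pieces that end up empty.
        parts = (part.strip(" \n'\"") for part in text.split(","))
        return [part for part in parts if part]

    return {
        "title": raw_title[:-7],        # drop the trailing " (1998)"
        "year": int(raw_title[-5:-1]),  # the four digits inside the parentheses
        "rating": float(raw_rating),
        "actors": split_clean(raw_actors),
        "genres": split_clean(raw_genres),
    }
```

Titles that do not follow the "Name (YYYY)" pattern would need extra handling, for example a regular expression instead of fixed slices.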

How to split tokens, count number of tokens, and write in a file in python?

I have file which has data in lines as follows:
['Marilyn Manson', 'Web', 'Skydera Inc.', 'Stone Sour', 'The Smashing Pumpkins', 'Warner Bros. Entertainment','This is a good Beer']
['Voices Inside', 'Expressivista', 'The Kentucky Fried Movie', 'The Bridges of Madison County']
and so on. I want to rewrite the data into a file whose lines keep only the tokens with fewer than 3 words (or some other number), e.g.:
['Marilyn Manson', 'Web', 'Skydera Inc.', 'Stone Sour']
['Voices Inside', 'Expressivista']
this is what I have tried so far:
for line in open(file):
    line = line.strip()
    line = line.rstrip()
    prog = re.compile("([a-z0-9]){32}")
    if line:
        line = line.replace('"', '')
        line = line.split(",")
        if re.match(prog, line[0]) and len(line)>2:
            wo=[]
            for words in line:
                word=words.split()
                if len(word)<3:
                    print word.append(word)
But the output says None. Any thoughts where I am making a mistake?
A better way to do what you're doing is to use ast.literal_eval, which automagically converts string representations of Python objects (e.g. lists) into actual Python objects.
import ast
# raw data
data = """
['Marilyn Manson', 'Web', 'Skydera Inc.', 'Stone Sour', 'The Smashing Pumpkins', 'Warner Bros. Entertainment','This is a good Beer']
['Voices Inside', 'Expressivista', 'The Kentucky Fried Movie', 'The Bridges of Madison County']
"""
# set threshold number of tokens
threshold = 3
# split into lines
lines = data.split('\n')
# parse non-blank lines into python lists
lists = [ast.literal_eval(line) for line in lines if line]
# for each list, keep only those items with fewer than `threshold` words
result = [[item for item in lst if len(item.split()) < threshold]
          for lst in lists]
# show result
for line in result:
    print(line)
Result:
['Marilyn Manson', 'Web', 'Skydera Inc.', 'Stone Sour']
['Voices Inside', 'Expressivista']
I think the reason your code isn't working is that you're trying to match line[0] against your regex prog, but line[0] doesn't start with 32 lowercase letters or digits for either of your lines, so your regex won't match. Also note that list.append returns None, so print word.append(word) prints None whenever it does run.
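The question also asks for the result to be written to a file. A minimal sketch of the ast.literal_eval approach reading from one file and writing to another follows; the function name filter_short_items and its parameters are made up for illustration.

```python
import ast

def filter_short_items(in_path, out_path, threshold=3):
    """Copy each list from in_path to out_path, keeping items under `threshold` words."""
    with open(in_path) as src, open(out_path, "w") as dst:
        for line in src:
            line = line.strip()
            if not line:
                continue  # ignore blank lines
            items = ast.literal_eval(line)  # raises an exception on malformed lines
            kept = [item for item in items if len(item.split()) < threshold]
            dst.write(repr(kept) + "\n")
```

Each output line is the repr of the filtered list, so the output file has the same format as the input and can be parsed again with ast.literal_eval.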