I opened the file and then tried to tokenize each word in the CSV file. Is this code correct? I tried to read the file and then tokenize it, but I cannot see the result. I am new to programming, can someone help me with it?
import nltk

filename = open("positivecsv.csv", "r")
tokens = []
for line in filename.readlines():
    tokens += nltk.word_tokenize(line)
print tokens
Python has a built-in CSV reader and writer, so you don't need to do it yourself.
Here is an example:
import csv

with open('positivecsv.csv', 'r') as csvfile:  # the with statement closes the file automatically
    reader = csv.reader(csvfile)
    for row in reader:
        print row
Each row will be a list containing all the fields of the current line.
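If the goal is still to tokenize the words, the two pieces can be combined. This is only a minimal sketch, assuming NLTK and its tokenizer data (e.g. the punkt model) are already installed:

import csv
import nltk

tokens = []
with open('positivecsv.csv', 'r') as csvfile:
    reader = csv.reader(csvfile)
    for row in reader:
        for field in row:                      # tokenize every field of the row
            tokens += nltk.word_tokenize(field)

print tokens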
I'm kind of new to Python, so thanks for your help!
I want to tell Python to take a big .csv list and split it up into many small lists of only two columns.
Take this .csv file
Always use column "year", which is the first column
Then always take the next column (for-loop?), starting with column 2 which is "Object1", then column 3 which is "Object2" and so on...
Save each list as .csv - now only containing two columns - and name it after the second column (e.g. "Object1")
So far I am up to this:
import csv

object = 0
f = open("/home/Data/data.csv")
csv_f = csv.reader(f, delimiter=';', quotechar='|')
writer = csv.writer(csv_f)
for row in csv_f:
    writer("[0],[object]")
    object += 1
f.close()
Your code is trying to open the same file for reading and writing, which may have unexpected results.
Think about your problem as a series of steps; one way to approach the problem is:
Open the big file
Read the first line of the file, which contains the column titles.
Go through the column titles (the first line of your big csv file), skipping the first one, then:
For each column title, create a new csv file, where the filename is the name of the column.
For each row, take the value of the first column plus the value of the column you are currently reading, and write it to the new file.
Close the new file.
Repeat until all column titles have been processed.
Close the big file.
Here is the same approach as above, taking advantage of Python's csv reading capabilities:
import csv

with open('big-file.csv') as f:
    reader = csv.reader(f, delimiter=';', quotechar='|')
    titles = next(reader)
    for index, column_name in enumerate(titles[1:]):
        with open('{}.csv'.format(column_name), 'w') as i:
            writer = csv.writer(i, delimiter=';', quotechar='|')
            for row in reader:
                writer.writerow((row[0], row[index + 1]))
        f.seek(0)      # start from the top of the big file again
        next(reader)   # skip the header row
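Because of the seek(0), the big file is re-read once per column. If the file fits comfortably in memory, a variation that reads it only once is possible; this is just a sketch under that assumption, reusing the same hypothetical file names:

import csv

with open('big-file.csv') as f:
    rows = list(csv.reader(f, delimiter=';', quotechar='|'))  # read everything once

titles, data = rows[0], rows[1:]
for index, column_name in enumerate(titles[1:], start=1):
    with open('{}.csv'.format(column_name), 'w') as out:
        writer = csv.writer(out, delimiter=';', quotechar='|')
        for row in data:
            writer.writerow((row[0], row[index]))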
How do I split CSV file content based on a dot in Python?
e.g.: Smt. Pattu Ramamurthy => [Smt] [Pattu Ramamurthy]
Can anyone please tell me how?
I guess what you are looking for is
import csv
with open(csvfile, 'rb') as f:
    fcsv = csv.reader(f, delimiter='.')
    for row in fcsv:
        print row
I would recommend reading the documentation of the csv module.
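For the exact example in the question, splitting each field on the first dot with str.split may be closer to what is wanted, since it also lets you strip the stray whitespace after the dot. A rough sketch, assuming the names are stored one per field in a file called names.csv (both the layout and the file name are assumptions):

import csv

with open('names.csv', 'rb') as f:
    for row in csv.reader(f):
        for field in row:
            # split only on the first dot, then strip surrounding whitespace
            parts = [p.strip() for p in field.split('.', 1)]
            print parts   # e.g. ['Smt', 'Pattu Ramamurthy']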
I'm writing a Python executable script that does the following:
I want to gather information from a .csv file and read it into Python as a dictionary. This .csv file contains several columns of information with headings, and I only want to extract particular columns (those columns with the specific headings I want) and print those columns out to another .csv file. I am using the functions DictReader and DictWriter.
I am reading in the .csv file as a dictionary (with the headings being the keys and the column values being the items), and output the information as a dictionary to another .csv file.
After I read it in, I print out the items under the particular headings (so I can double-check what I have read in). I then open up a new .csv file and want to write the data (which I have just read in) as a dictionary. I can write in the keys (column headings) but my code doesn't print any of the item values for some reason. The headings that I want in this case are 'Name' and 'DOB'.
Here is my code:
#!/usr/bin/python
import os
import os.path
import re
import sys
import pdb
import csv
csv_file = csv.DictReader(open(sys.argv[1], 'rU'), delimiter=',')

for line in csv_file:
    print line['Name'] + ',' + line['DOB']

fieldnames = ['Name', 'DOB']
test_file = open('test2.csv', 'wr')
csvwriter = csv.DictWriter(test_file, delimiter=',', fieldnames=fieldnames)
csvwriter.writerow(dict((fn, fn) for fn in fieldnames))
for row in csv_file:
    csvwriter.writerow(row)
test_file.close()
Any ideas of where I'm going wrong? I want to print the item values under their corresponding column headers in the output file.
I am using Python 2.7.11 on a Mac machine. I am also printing values to the terminal.
You're unfortunately tricked by your own testing, that is, the printing of the individual rows. By looping through csv_file initially, you've exhausted the iterator and are at the end. Further iterations, as done at the bottom of your code, are not possible and will be ignored.
Your question is essentially a duplicate of various other questions, such as how to read from a CSV file repeatedly. The issue just surfaces differently here: you didn't realise what the problem was, while those questions know the cause but not the solution.
Answers to those questions tell you to simply reset the file pointer of the input file. Unfortunately, in your current code the input file object is never kept around in a variable, so there is nothing to call seek() on.
Thus, something like this should work:
infile = open(sys.argv[1], 'rU')
csv_file = csv.DictReader(infile, delimiter=',')

<all other code>

infile.seek(0)
for row in csv_file:
    csvwriter.writerow(row)
test_file.close()
infile.close()
As an aside, just use the with statement when opening files:
with open(sys.argv[1], 'rU') as infile, open('test2.csv', 'w') as outfile:
    csv_file = csv.DictReader(infile, delimiter=',')
    for line in csv_file:
        print line['Name'] + ',' + line['DOB']

    fieldnames = ['Name', 'DOB']
    # extrasaction='ignore' drops the columns that are not in fieldnames;
    # otherwise DictWriter raises a ValueError for the extra keys
    csvwriter = csv.DictWriter(outfile, delimiter=',', fieldnames=fieldnames,
                               extrasaction='ignore')
    infile.seek(0)
    for row in csv_file:
        csvwriter.writerow(row)
Note: you do not need to build the header dict yourself; DictWriter has a writeheader() method for that (and here the header line ends up in the output anyway, because after the seek(0) the reader yields it again as an ordinary row).
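An alternative that avoids re-reading the input file at all is to collect the wanted values into a list during the first pass and write from that list afterwards. A minimal sketch of that idea, assuming the same 'Name' and 'DOB' headings:

import csv
import sys

with open(sys.argv[1], 'rU') as infile, open('test2.csv', 'w') as outfile:
    fieldnames = ['Name', 'DOB']
    rows = []
    for line in csv.DictReader(infile, delimiter=','):
        print line['Name'] + ',' + line['DOB']
        rows.append({fn: line[fn] for fn in fieldnames})  # keep only the wanted columns

    csvwriter = csv.DictWriter(outfile, delimiter=',', fieldnames=fieldnames)
    csvwriter.writeheader()       # writes the 'Name,DOB' header line
    csvwriter.writerows(rows)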
I have a csv of file names, and I want to search through the file and return only file names with the .tif extension. I thought maybe I had to write a regular expression, but I couldn't figure that out either. I know this is a simple question, but I'm new to Python and could really use some basic help. Thank you!
import csv
import re

with open('all_file_names.csv') as f:
    reader = csv.reader(f)
    #a = re.compile('tif')
    #b = a.search(reader)
    #findall()
    for row in reader:
        str.find(".tif") in reader
        print(row)
with open('all_file_names.csv') as f:
    for line in f:
        if '.tif' in line:
            print(line)
This assumes that every line only has one filename. You could also check the extension explicitly (strip the trailing newline first, otherwise endswith never matches):
if line.rstrip().endswith('.tif'):
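If a line can contain several comma-separated filenames, the csv module from the question can still be used and each field checked on its own. A small sketch, assuming the same all_file_names.csv layout:

import csv
import os

with open('all_file_names.csv') as f:
    for row in csv.reader(f):
        for name in row:
            # splitext gives e.g. ('file', '.tif'); compare the extension case-insensitively
            if os.path.splitext(name.strip())[1].lower() == '.tif':
                print(name.strip())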
I have hundreds of CSV files and I'm trying to write a Python script that will parse through all of them and print out rows that have matching string(s). I'll be happy if we can get this to work using one string (and not a list of strings). Using Python 2.7.5. I've figured out so far:
The csv module in Python will print the row with the matching string in a particular column (here column index 8):
import csv
reader = csv.reader(open('2015-08-25.csv'))
for row in reader:
    col8 = str(row[8])
    if col8 == '36862210':
        print row
So the above works for one .csv file. Now I need to parse hundreds of .csv files with glob. The glob module will print out all the file names with this code:
import glob
for name in glob.glob('20??-??-??.csv'):
    print name
I tried putting the two together into one script but the error message reads:
File "test7.py", line 6, in
reader = csv.reader(open(csvfiles))
TypeError: coercing to Unicode: need string or buffer, list found
import csv
import glob

csvfiles = glob.glob('20??-??-??.csv')
for filename in csvfiles:
    reader = csv.reader(open(csvfiles))
    for row in reader:
        col8 = str(row[8])
        if col8 == '36862210':
            print row
You are trying to open a list - csvfiles is the list you are iterating over.
Use this instead, because open() expects a filename:
reader = csv.reader(open(filename))
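Putting it together, the whole loop would look roughly like this (same hypothetical file pattern, column index, and search string as above; the filename is printed too, so you can see which file each match came from):

import csv
import glob

for filename in glob.glob('20??-??-??.csv'):
    with open(filename) as f:                # open one file at a time
        for row in csv.reader(f):
            if len(row) > 8 and str(row[8]) == '36862210':   # guard against short rows
                print filename, row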