Using Python CSV and glob to find matching strings and print row - python-2.7

I have hundreds of CSV files and I'm trying to write a Python script that will parse through all of them and print out rows that contain matching string(s). I'll be happy if we can get this to work using one string (and not a list of strings). Using Python 2.7.5. Here's what I've figured out so far:
The csv module in Python will print the row with the matching string in a particular column (here column index 8, i.e. the ninth column from the left):
import csv

reader = csv.reader(open('2015-08-25.csv'))
for row in reader:
    col8 = str(row[8])
    if col8 == '36862210':
        print row
So the above works for one .csv file. Now I need to parse hundreds of .csv files with glob. The glob module will print out all the file names with this code:
import glob

for name in glob.glob('20??-??-??.csv'):
    print name
I tried putting the two together into one script but the error message reads:
File "test7.py", line 6, in
reader = csv.reader(open(csvfiles))
TypeError: coercing to Unicode: need string or buffer, list found
import csv
import glob

csvfiles = glob.glob('20??-??-??.csv')
for filename in csvfiles:
    reader = csv.reader(open(csvfiles))
    for row in reader:
        col8 = str(row[8])
        if col8 == '36862210':
            print row

You are trying to open a list: csvfiles is the list you are iterating over.
Use this instead, because open() expects a single filename:
reader = csv.reader(open(filename))
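Putting the two pieces together, a minimal sketch of the corrected script might look like this (the target value is kept from the question; the len(row) guard is an extra check added for short rows):

import csv
import glob

for filename in glob.glob('20??-??-??.csv'):
    with open(filename, 'rb') as f:  # 'rb' is the usual mode for the Python 2 csv module
        reader = csv.reader(f)
        for row in reader:
            if len(row) > 8 and str(row[8]) == '36862210':
                print filename, row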

Related

Read a specific column from multiple Excel sheets and write them to one CSV file using Python

I have multiple sheets in one Excel file, like Sheet1, Sheet2, Sheet3, etc. Now I have to list a particular column from all the sheets in one CSV file. All the sheets have one common column, "Attribute", and only those records should be listed in the CSV file, line by line (the first sheet's 'Attribute' values on the first line, the second sheet's 'Attribute' values on the second line, and so on).
For instance:
Sheet1:
Attribute,Order
P,1
Emp_ID,2
DOJ,3
Name,4
Sheet2:
Attribute,Order
C,1
Emp_ID,2
Exp,3
LWD,4
Expected result: (In some .csv file)
P,Emp_ID,DOJ,Name
C,Emp_ID,Exp,LWD
Note: the line starting with P should be the first line, the line starting with C the second line, and so on.
Below is my code:
import pandas as pd
excel = 'E:\Python Utility\Inbound.xlsx'
K = 'E:\Python Utility\Headers_Files\All_Header.csv'
df = pd.read_excel(excel,sheet_name = None)
data = pd.DataFrame(df,columns=['Attribute']).T
print data
M = data.to_csv(K, encoding='utf-8',index=False,header=False)
print 'done'
The output shows as below:
Empty DataFrame
Columns: []
Index: [Attribute]
done
If I use sheet_name = 'sheet1' then the DataFrame works fine and the data is loaded into the csv file as expected.
Thanks in advance
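For what it's worth, pd.read_excel(..., sheet_name=None) returns a dict of DataFrames keyed by sheet name, which is why wrapping that dict directly in pd.DataFrame gives an empty frame. A rough sketch of iterating over the dict and writing one line per sheet (paths and the 'Attribute' column name are taken from the question; the join-and-write approach is an assumption about the desired output) might be:

import pandas as pd

excel = r'E:\Python Utility\Inbound.xlsx'
out = r'E:\Python Utility\Headers_Files\All_Header.csv'

sheets = pd.read_excel(excel, sheet_name=None)  # dict of {sheet name: DataFrame}

with open(out, 'w') as f:
    for name, df in sheets.items():
        # one comma-separated line per sheet, built from that sheet's 'Attribute' column
        f.write(','.join(str(v) for v in df['Attribute']) + '\n')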

Python: Write two columns in csv for many lines

I have two parameters, filename and time, and I want to write them into columns in a csv file. These two parameters are set inside a for-loop, so their values change in each iteration.
My current python code is the one below but the resulting csv is not what I want:
import csv
import os

with open("txt/scalable_decoding_time.csv", "wb") as csv_file:
    writer = csv.writer(csv_file, delimiter=',')
    filename = ["one", "two", "three"]
    time = ["1", "2", "3"]
    zipped_lists = zip(filename, time)
    for row in zipped_lists:
        print row
        writer.writerow(row)
My csv file must look like the example below. The , must be the delimiter, so I should get two columns.
one, 1
two, 2
three, 3
Right now, the data in my csv file are all stored in one column.
Do you know how to fix this?
Well, the issue here is that you are using writerows instead of writerow.
import csv
import os

with open("scalable_decoding_time.csv", "wb") as csv_file:
    writer = csv.writer(csv_file, delimiter=',')
    level_counter = 0
    max_levels = 3
    filename = ["one", "two", "three"]
    time = ["1", "2", "3"]
    while level_counter < max_levels:
        writer.writerow((filename[level_counter], time[level_counter]))
        level_counter = level_counter + 1
This gave me the result:
one,1
two,2
three,3
This is another solution. Put the following code into a Python script that we will call sc-123.py:
filename = ["one", "two", "three"]
time = ["1", "2", "3"]
for a, b in zip(filename, time):
    print('{}{}{}'.format(a, ',', b))
Once the script is ready, run it like this:
python2 sc-123.py > scalable_decoding_time.csv
You will have the results formatted the way you want
one,1
two,2
three,3

Python3: split up list and save as file - how to?

I'm kinda new to Python, so thx for your help!
I want to tell Python to take a big .csv list and split it up into many small lists of only two columns each.
Take this .csv file:
Always use column "year", which is the first column.
Then always take the next column (for-loop?), starting with column 2, which is "Object1", then column 3, which is "Object2", and so on...
Save each list as a .csv, now containing only two columns, and name it after the second column (e.g. "Object1").
So far I am up to this:
import csv

object = 0
f = open("/home/Data/data.csv")
csv_f = csv.reader(f, delimiter=';', quotechar='|')
writer = csv.writer(csv_f)
for row in csv_f:
    writer("[0],[object]")
    object += 1
f.close()
Your code is trying to open the same file for reading and writing, which may have unexpected results.
Think about your problem as a series of steps; one way to approach the problem is:
Open the big file
Read the first line of the file, which contains the column titles.
Go through the column titles (the first line of your big csv file), skipping the first one, then:
For each column title, create a new csv file, where the filename is the name of the column.
Take the value of the first column, plus the value of the column you are currently reading, and write it to the file.
Repeat till all column titles are read
Close the small file.
Close the big file.
Here is the same approach as above, taking advantage of Python's csv reading capabilities:
import csv

with open('big-file.csv') as f:
    reader = csv.reader(f, delimiter=';', quotechar='|')
    titles = next(reader)
    for index, column_name in enumerate(titles[1:]):
        with open('{}.csv'.format(column_name), 'w') as i:
            writer = csv.writer(i, delimiter=';', quotechar='|')
            for row in reader:
                writer.writerow((row[0], row[index + 1]))
        f.seek(0)     # start from the top of the big file again
        next(reader)  # skip the header row

Python: Reading in .csv data as dictionary and printing out data as dictionary to .csv file?

I'm writing a python executable script that does the following:
I want to gather information from a .csv file and read it into Python as a dictionary. This .csv file contains several columns of information with headings, and I only want to extract particular columns (those with the specific headings I want) and print those columns out to another .csv file. I am using the functions DictReader and DictWriter.
I am reading in the .csv file as a dictionary (with the headings being the keys and the column values being the items), and outputting the information as a dictionary to another .csv file.
After I read it in, I print out the items under the particular headings (so I can double-check what I have read in). I then open up a new .csv file and want to write the data (which I have just read in) as a dictionary. I can write the keys (column headings) but my code doesn't print any of the item values for some reason. The headings that I want in this case are 'Name' and 'DOB'.
Here is my code:
#!/usr/bin/python

import os
import os.path
import re
import sys
import pdb
import csv

csv_file = csv.DictReader(open(sys.argv[1], 'rU'), delimiter=',')

for line in csv_file:
    print line['Name'] + ',' + line['DOB']

fieldnames = ['Name', 'DOB']
test_file = open('test2.csv', 'wr')
csvwriter = csv.DictWriter(test_file, delimiter=',', fieldnames=fieldnames)
csvwriter.writerow(dict((fn, fn) for fn in fieldnames))
for row in csv_file:
    csvwriter.writerow(row)
test_file.close()
Any ideas of where I'm going wrong? I want to print the item values under their corresponding column headers in the output file.
I am using python 2.7.11 on a Mac machine. I am also printing values to the terminal.
You're unfortunately tricked by your own testing, that is, the printing of the individual rows. By looping through csv_file initially, you've exhausted the iterator and are at the end. Further iterations, as done at the bottom of your code, are not possible and will be ignored.
Your question is essentially a duplicate of various other questions, such as how to read from a CSV file repeatedly, albeit that the issue comes up here in a different way: you didn't realise what the problem was, while those questions know the cause but not the solution.
Answers to those questions tell you to simply reset the file pointer of the input file. Unfortunately, the input file gets closed promptly after reading, in your current code.
Thus, something like this should work:
infile = open(sys.argv[1], 'rU')
csv_file = csv.DictReader(infile, delimiter=',')

<all other code>

infile.seek(0)
for row in csv_file:
    csvwriter.writerow(row)
test_file.close()
infile.close()
As an aside, just use the with statement when opening files:
with open(sys.argv[1], 'rU') as infile, open('test2.csv', 'wr') as outfile:
    csv_file = csv.DictReader(infile, delimiter=',')
    for line in csv_file:
        print line['Name'] + ',' + line['DOB']

    fieldnames = ['Name', 'DOB']
    csvwriter = csv.DictWriter(outfile, delimiter=',', fieldnames=fieldnames)

    infile.seek(0)
    for row in csv_file:
        csvwriter.writerow(row)
Note: DictWriter will take care of the header row. No need to write it yourself.
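A minimal sketch of that built-in header handling (field names are from the question; the sample row is purely illustrative):

import csv

fieldnames = ['Name', 'DOB']
with open('test2.csv', 'w') as outfile:
    csvwriter = csv.DictWriter(outfile, delimiter=',', fieldnames=fieldnames)
    csvwriter.writeheader()  # writes the "Name,DOB" header line for you
    csvwriter.writerow({'Name': 'Jane Doe', 'DOB': '1990-01-01'})  # illustrative data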

How can I tokenize my CSV file in Python?

I open the folder and then I try to tokenize each word in the CSV file. Is this code correct? I tried to read the file and then tokenize, but I cannot see the result. I am new to programming; can someone help me with it?
filename = open("positivecsv.csv", "r")
type(raw)  # str
tokens = []
for line in filename.readlines():
    tokens += nltk.word_tokenize(line)

>>> print tokens
Python has a built-in CSV reader and writer, so you don't need to do it yourself.
Here is an example:
import csv

with open('positivecsv.csv', 'r') as csvfile:  # this will close the file automatically
    reader = csv.reader(csvfile)
    for row in reader:
        print row
row will be a list which contains all the elements of the current line.
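If the goal is still to tokenize every cell, one possible sketch combining the csv reader with nltk (assuming nltk and its tokenizer data are installed) could be:

import csv
import nltk

tokens = []
with open('positivecsv.csv', 'r') as csvfile:
    reader = csv.reader(csvfile)
    for row in reader:
        for cell in row:
            tokens += nltk.word_tokenize(cell)  # tokenize each cell and collect the tokens

print tokens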