Python3: split up list and save as file - how to?

I'm fairly new to Python, so thanks for your help!
I want to tell Python to take a big .csv list and split it into many small lists of only two columns each:
Take this .csv file
Always use the column "year", which is the first column
Then take the next column each time (for-loop?), starting with column 2, which is "Object1", then column 3, which is "Object2", and so on...
Save each list as .csv - now containing only two columns - and name it after the second column (e.g. "Object1")
So far I have this:
import csv
object = 0
f = open("/home/Data/data.csv")
csv_f = csv.reader(f, delimiter=';', quotechar='|')
writer = csv.writer(csv_f)
for row in csv_f:
    writer("[0],[object]")
    object += 1
f.close()

Your code is trying to open the same file for reading and writing, which may have unexpected results.
Think about your problem as a series of steps; one way to approach the problem is:
Open the big file
Read the first line of the file, which contains the column titles.
Go through the column titles (the first line of your big csv file), skipping the first one, then:
For each column title, create a new csv file, where the filename is the name of the column.
Take the value of the first column, plus the value of the column you are currently reading, and write it to the file.
Repeat till all column titles are read
Close the file
Close the big file.
Here is the same approach as above, taking advantage of Python's csv reading capabilities:
import csv
# newline='' is what the csv module expects for files in Python 3
with open('big-file.csv', newline='') as f:
    reader = csv.reader(f, delimiter=';', quotechar='|')
    titles = next(reader)
    for index, column_name in enumerate(titles[1:]):
        with open('{}.csv'.format(column_name), 'w', newline='') as out:
            writer = csv.writer(out, delimiter=';', quotechar='|')
            for row in reader:
                writer.writerow((row[0], row[index+1]))
        f.seek(0)  # start from the top of the big file again
        next(reader)  # skip the header row
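As a variation on the answer above, here is a minimal single-pass sketch: it opens one output file per column up front and reads the big file only once, so no seek(0) is needed. The function name split_columns and the out_dir parameter are hypothetical, it assumes the same ';' delimiter and '|' quote character, and it assumes there are few enough columns to keep all output files open at once. Unlike the answer, it also writes a header line into each small file, which is a choice you may not want.

```python
import csv
import os
from contextlib import ExitStack


def split_columns(big_path, out_dir, delimiter=';', quotechar='|'):
    """Write one two-column CSV per data column of big_path into out_dir.

    Each output file is named after its column title and holds the first
    column (e.g. "year") plus that one column. Returns the created paths.
    """
    paths = []
    with ExitStack() as stack:
        f = stack.enter_context(open(big_path, newline=''))
        reader = csv.reader(f, delimiter=delimiter, quotechar=quotechar)
        titles = next(reader)
        writers = []
        for title in titles[1:]:
            path = os.path.join(out_dir, '{}.csv'.format(title))
            out = stack.enter_context(open(path, 'w', newline=''))
            writer = csv.writer(out, delimiter=delimiter, quotechar=quotechar)
            writer.writerow((titles[0], title))  # header for the small file
            writers.append(writer)
            paths.append(path)
        for row in reader:
            # writers[0] belongs to column 1, writers[1] to column 2, ...
            for offset, writer in enumerate(writers, start=1):
                writer.writerow((row[0], row[offset]))
    return paths
```

Calling split_columns('/home/Data/data.csv', '.') would then create Object1.csv, Object2.csv and so on in the current directory.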

Related

How to append a specific row from an existing csv file to a new one with Python 2

I have two csv files, test1.csv and test2.csv, that contain two columns with values (altitude,time).
test1.csv is considerably larger than test2.csv.
I want to compare the altitudes that share the same time.
I have found this piece of code that runs on Python 2:
import csv
with open('test1.csv', 'rb') as master:
    master_indices = dict((r[0], i) for i, r in enumerate(csv.reader(master)))
with open('test2.csv', 'rb') as hosts:
    with open('results.csv', 'wb') as results:
        reader = csv.reader(hosts)
        writer = csv.writer(results)
        writer.writerow(next(reader, []) + ['result'])
        for row in reader:
            index = master_indices.get(row[0])
            if index is not None:
                message = 'Same time is found (row {})'.format(index)
            else:
                message = 'No same time is found'
            writer.writerow(row + [message])
It works fine: it writes the index of the row in test1.csv where the same time was found.
The result csv contains the time and altitude from test2.csv, plus a message that shows whether or not there is a match on the time value.
Since I'm quite new to Python, I'm trying to find a way to make the results.csv file also contain the altitude column from test1.csv.
I tried to replicate the above code for the test1.csv file in order to add the column, by appending the following code to the existing code:
with open('test1.csv', 'rb') as master:
    with open('results.csv', 'wb') as results:
        writer = csv.writer(results)
        reader2 = csv.reader(master)
        writer.writerow(next(reader2, []) + ['altitude'])
        for row in reader2:
            writer.writerow(row)
But I got a csv file without the previous result column and with a new but empty altitude column.
So eventually results.csv should contain the following columns:
time,altitude(from test2.csv),altitude(from test1.csv),result
How can this be achieved?
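One thing worth noting about the second attempt: opening results.csv with 'wb' again truncates the file, which is why the earlier result column disappeared. A possible way to get all four columns, sketched in Python 3 rather than the Python 2 of the snippets above, is to store each master row's index and altitude keyed by time and then write everything in a single pass over test2.csv. This assumes both files have a header row and that the time is in the first column and the altitude in the second (as the code above implies); the function name merge_altitudes and the output header names are hypothetical.

```python
import csv


def merge_altitudes(master_path, hosts_path, results_path):
    """Write time, altitude (hosts), altitude (master), result to results_path."""
    with open(master_path, newline='') as master:
        reader = csv.reader(master)
        next(reader, None)  # skip the header row (assumed to exist)
        # time -> (row number counted from the first data row, altitude)
        master_rows = {}
        for i, row in enumerate(reader, start=1):
            master_rows[row[0]] = (i, row[1])

    with open(hosts_path, newline='') as hosts, \
            open(results_path, 'w', newline='') as results:
        reader = csv.reader(hosts)
        writer = csv.writer(results)
        next(reader, None)  # skip the header row of the smaller file
        writer.writerow(['time', 'altitude_test2', 'altitude_test1', 'result'])
        for row in reader:
            time, altitude = row[0], row[1]
            match = master_rows.get(time)
            if match is not None:
                index, master_altitude = match
                message = 'Same time is found (row {})'.format(index)
            else:
                master_altitude = ''  # no matching time in the master file
                message = 'No same time is found'
            writer.writerow([time, altitude, master_altitude, message])
```

It would be called as merge_altitudes('test1.csv', 'test2.csv', 'results.csv').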

Read a specific column from multiple Excel sheets and write them to one csv file using Python

I have multiple sheets in one Excel file, like Sheet1, Sheet2, Sheet3, etc. I have to list one particular column from all of them in a single csv file. All the sheets share one common column, "Attribute", and only those records should be listed in the csv file, line by line. (The first sheet's 'Attribute' values should be on the 1st line, the second sheet's 'Attribute' values on the 2nd line, and so on.)
For instance,
Sheet1:
Attribute,Order
P,1
Emp_ID,2
DOJ,3
Name,4
Sheet2:
Attribute,Order
C,1
Emp_ID,2
Exp,3
LWD,4
Expected result: (In some .csv file)
P,Emp_ID,DOJ,name
C,Emp_ID,Exp,LWD
Note: The line starting with P should be the first line, the line starting with C the second line, and so on.
Below is my code:
import pandas as pd
excel = 'E:\Python Utility\Inbound.xlsx'
K = 'E:\Python Utility\Headers_Files\All_Header.csv'
df = pd.read_excel(excel,sheet_name = None)
data = pd.DataFrame(df,columns=['Attribute']).T
print data
M = data.to_csv(K, encoding='utf-8',index=False,header=False)
print 'done'
The output is shown below:
Empty DataFrame
Columns: []
Index: [Attribute]
done
If I use sheet_name = 'sheet1', the DataFrame works well and the data is loaded into the csv file as expected.
Thanks in advance
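A likely cause: with sheet_name=None, pd.read_excel returns a dictionary that maps each sheet name to its own DataFrame, so pd.DataFrame(df, columns=['Attribute']) finds no 'Attribute' key among the sheet names and produces an empty frame. A minimal Python 3 sketch that loops over the sheets instead and writes one csv line per sheet follows. The helper name write_attribute_rows is hypothetical, and it only assumes that each sheet can be indexed with ["Attribute"], so it does not import pandas itself.

```python
import csv


def write_attribute_rows(sheets, out_path, column='Attribute'):
    """Write the values of `column` from every sheet as one CSV line per sheet.

    `sheets` maps sheet names to table-like objects that support
    sheet[column], such as the dict of DataFrames that
    pd.read_excel(path, sheet_name=None) returns.
    """
    with open(out_path, 'w', newline='', encoding='utf-8') as out:
        writer = csv.writer(out)
        for sheet in sheets.values():
            writer.writerow(list(sheet[column]))
```

With pandas installed, this would be called as write_attribute_rows(pd.read_excel(excel, sheet_name=None), K), using the excel and K paths from the question.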

Python - webscraping; dictionary data structure

I need to scrape this website (http://setkab.go.id/profil-kabinet/#) and produce an Excel file that has headers "Cabinet names" in column 1 and "Era" in column 2. That means each Cabinet name (e.g. Kabinet Presidensil, Kabinet Sjahrir I) should have its own row - alongside its respective era (e.g. Era Revolusi Fisik, Era Republik Indonesia Serikat).
This is the closest I've gotten:
import requests
from bs4 import BeautifulSoup
response = requests.get('http://setkab.go.id/profil-kabinet/#')
soup = BeautifulSoup(response.text, 'html.parser')
eras = soup.find_all('div', attrs={'class':"wpb_accordion_section group"})
setkab = {}
for element in eras:
    setkab[element.a.get_text()] = {}
for element in eras:
    cabname = element.find('div',attrs={'class':'wpb_wrapper'}).get_text()
    setkab[element.a.get_text()]['cbnm'] = cabname
for item in setkab.keys():
    print item + setkab[item]['cbnm']
import os, csv
os.chdir("/Users/mxcodes/Code")
with open("setkabfinal.csv", "w") as toWrite:
    writer = csv.writer(toWrite, delimiter=",")
    writer.writerow(["Era", "Cabinet name"])
    for a in setkab.keys():
        writer.writerow([a.encode("utf-8"), setkab[a]["cbnm"]])
However, this creates an Excel file with the headers "Era" and "Cabinet names" in column 1 and 2, respectively. It fails to put each Cabinet name in a separate row. For example, it has 'Era Revolusi Fisik' in column 1 and lists all the cabinets together in column 2.
My guess is that I need to switch the key-value pairs somehow so that each Cabinet becomes a key and its era becomes its value - because currently it's the other way around. But I've tried and failed to do so. Any help? Thank you!
From what I can see, the setkab[a]["cbnm"] value you write is just one long Unicode string. So when you call writer.writerow([a.encode("utf-8"), setkab[a]["cbnm"]]), you write the era in the first column and the whole string in a single cell in the next column. Having \n in the string does not prevent this: csv assumes you want the string in exactly one cell, so it wraps the value in double quotes to keep it there. To write every cabinet in its own row, call the writerow method separately for each desired row.
For example, this code worked fine for me:
cabinets = setkab
with open("cabinets.csv", "w") as toWrite:
    writer = csv.writer(toWrite, delimiter=",")
    writer.writerow(["Era", "Cabinet name"])
    for a in setkab.keys():
        writer.writerow([a.encode("utf-8")]) #write the era column
        cabinets_list = [i for i in cabinets[a]["cbnm"].split('\n') if i != ''] #get all the values that are separated by newline chars (if they aren't empty strings)
        for i in cabinets_list:
            writer.writerow([a.encode("utf-8"),i]) #write every value separately in the CABINET NAME column
As you can see, I changed only the last 3 lines.
I hope this will help you!
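The difference between one writerow call per era and one per cabinet can be checked with a small Python 3 sketch that writes into in-memory buffers instead of real files. The two cabinet names are taken from the question; the newline-separated string is only a stand-in for the scraped text.

```python
import csv
import io

era = "Era Revolusi Fisik"
scraped = "Kabinet Presidensil\nKabinet Sjahrir I"

buffer = io.StringIO()
writer = csv.writer(buffer, delimiter=",")
# One field that contains a newline: csv wraps it in double quotes so the
# whole string stays in a single cell.
writer.writerow([era, scraped])
single_cell = buffer.getvalue()

buffer = io.StringIO()
writer = csv.writer(buffer, delimiter=",")
# One writerow call per cabinet: each name lands in its own row.
for name in scraped.split("\n"):
    if name != "":
        writer.writerow([era, name])
separate_rows = buffer.getvalue()

print(repr(single_cell))
print(repr(separate_rows))
```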

Python: Reading in .csv data as dictionary and printing out data as dictionary to .csv file?

I'm writing a python executable script that does the following:
I want to gather information from a .csv file and read it into python as a dictionary. This .csv file contains several columns of information with headings, and I only want to extract particular columns (those columns with specific headings I want) , and print those columns out to another .csv file. I am using the functions DictReader and DictWriter.
I am reading in the .csv file as a dictionary (with the headings being the key and the column values being the items),and output the information as a dictionary to another .csv file.
After I read it in, I print out the items under the particular headings (so I can double check what I have read in). I then open up a new .csv file and want to write the data (which I have just read in) as a dictionary. I can write the keys (column headings), but my code doesn't write any of the item values for some reason. The headings that I want in this case are 'Name' and 'DOB'.
Here is my code:
#!/usr/bin/python
import os
import os.path
import re
import sys
import pdb
import csv
csv_file = csv.DictReader(open(sys.argv[1],'rU'),delimiter = ',')
for line in csv_file:
    print line['Name'] + ',' + line['DOB']
fieldnames = ['Name','DOB']
test_file = open('test2.csv','wr')
csvwriter = csv.DictWriter(test_file, delimiter=',', fieldnames=fieldnames)
csvwriter.writerow(dict((fn,fn) for fn in fieldnames))
for row in csv_file:
    csvwriter.writerow(row)
test_file.close()
Any ideas where I'm going wrong? I want to print the item values under their corresponding column headers in the output file.
I am using python 2.7.11 on a Mac machine. I am also printing values to the terminal.
You're unfortunately tricked by your own testing, that is, the printing of the individual rows. By looping through csv_file initially, you've exhausted the iterator and are at the end. Further iterations, as done in the bottom of your code, are not possible and will be ignored.
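The exhaustion is easy to reproduce in isolation. A minimal Python 3 sketch (the question uses Python 2, but the iterator behaves the same way), with made-up sample rows held in an in-memory buffer:

```python
import csv
import io

# Made-up sample data standing in for the real input file.
source = io.StringIO("Name,DOB\nAda,1815-12-10\nAlan,1912-06-23\n")
reader = csv.DictReader(source)

first_pass = [row["Name"] for row in reader]
second_pass = [row["Name"] for row in reader]  # reader is already exhausted

print(first_pass)   # ['Ada', 'Alan']
print(second_pass)  # []
```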
Your question is essentially a duplicate of various other questions, such as how to read from a CSV file repeatedly. The issue comes up in a different way here, though: you didn't realise what the problem was, while those questions know the cause but not the solution.
Answers to those questions tell you to simply reset the file pointer of the input file. Unfortunately, your current code opens the input file inline and keeps no variable referring to it, so there is nothing to call seek() on.
Thus, something like this should work:
infile = open(sys.argv[1], 'rU')
csv_file = csv.DictReader(infile, delimiter=',')
<all other code>
infile.seek(0)
next(csv_file)  # skip the header line, which is re-read as an ordinary row after seek(0)
for row in csv_file:
    csvwriter.writerow(row)
test_file.close()
infile.close()
As an aside, just use the with statement when opening files:
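with open(sys.argv[1], 'rU') as infile, open('test2.csv', 'wb') as outfile:
    csv_file = csv.DictReader(infile, delimiter=',')
    for line in csv_file:
        print line['Name'] + ',' + line['DOB']
    fieldnames = ['Name', 'DOB']
    # extrasaction='ignore' drops the input columns that are not in fieldnames
    csvwriter = csv.DictWriter(outfile, delimiter=',', fieldnames=fieldnames, extrasaction='ignore')
    csvwriter.writeheader()
    infile.seek(0)
    next(csv_file)  # skip the header line, which is re-read as an ordinary row
    for row in csv_file:
        csvwriter.writerow(row)
Note: DictWriter does not write the header row on its own. Call writeheader() (available since Python 2.7) instead of building the header dictionary yourself. Without extrasaction='ignore', DictWriter raises a ValueError for rows that contain keys missing from fieldnames.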
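An alternative that avoids seek() altogether is to read the rows into a list once and reuse that list for both printing and writing. A minimal Python 3 sketch, assuming the 'Name' and 'DOB' headings from the question and an input small enough to hold in memory; the function name copy_columns is hypothetical:

```python
import csv


def copy_columns(in_path, out_path, fieldnames=("Name", "DOB")):
    """Copy only the columns in `fieldnames` from in_path to out_path."""
    with open(in_path, newline="") as infile:
        rows = list(csv.DictReader(infile))  # read everything once

    # The list can be iterated as often as needed, unlike the reader.
    for row in rows:
        print(",".join(row[name] for name in fieldnames))

    with open(out_path, "w", newline="") as outfile:
        # extrasaction="ignore" skips columns that are not in fieldnames
        writer = csv.DictWriter(outfile, fieldnames=list(fieldnames),
                                extrasaction="ignore")
        writer.writeheader()
        writer.writerows(rows)
    return rows
```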

Extracting columnar data correctly as it is in the file

Suppose I have a table like the one below, and I want to extract the data column by column. I tried extracting it by building a list. The first row is extracted correctly, but from the second row onwards the leading columns (up to the one under CEN/4) are blank. My code therefore treats 5.000000E-01 as column zero of the second row and starts reading from there. How can I extract the data correctly, column by column? The output is scrambled.
0 1 25 CEN/4 -5.000000E-01 -3.607026E+04 -5.747796E+03 -8.912796E+02 -88.3178
5.000000E-01 3.607026E+04 5.747796E+03 8.912796E+02 1.6822
27 -5.000000E-01 -3.641444E+04 -5.783247E+03 -8.912796E+02 -88.3347
5.000000E-01 3.641444E+04 5.783247E+03 8.912796E+02 1.6653
28 -5.000000E-01 -3.641444E+04 -5.712346E+03 -8.912796E+02 -88.3386
5.000000E-01 3.641444E+04 5.712346E+03 8.912796E+02
My code is:
f1=open('newdata1.txt','w')
L = []
for index, line in enumerate(open('Trial_1.txt','r')):
    #print index
    if index < 0: #skip first 5 lines
        continue
    else:
        line =line.split()
        L.append('%s\t%s\t %s\n' %(line[0], line[1],line[2]))
f1.writelines(L)
f1.close()
my output looks like this:
0 1 CEN/4 -5.000000E-01 -5.120107E+04
5.000000E-01 5.120107E+04 1.028093E+04 5.979930E+03 8.1461
I want the columnar data as it is in the file. How can I do that? I am a beginner.
It's hard to tell from the way the input data is presented in your question, but I'm guessing your file uses tabs to separate columns. In any case, consider using the Python csv module with the relevant delimiter, like:
import csv
with open('input.csv') as f_in, open('newdata1', 'w') as f_out:
    reader = csv.reader(f_in, delimiter='\t')
    writer = csv.writer(f_out, delimiter='\t')
    for row in reader:
        writer.writerow(row)
See the Python csv module documentation for further details.