I need to import an ".xls" file in Python.
Can you please give me an idea of how to do this? I need to import an Excel file specifically, not a ".csv" file, because the csv file drops the leading zeros in numbers.
Here’s the code example:
import xlrd

def open_file(path):
    # Open and read an Excel file
    book = xlrd.open_workbook(path)
    print book.nsheets  # print number of sheets
    print book.sheet_names()  # print sheet names
    first_sheet = book.sheet_by_index(0)  # get the first worksheet
    print first_sheet.row_values(0)  # read a row
    cell = first_sheet.cell(0,0)  # read a cell
    print cell
    print cell.value
    # read a row slice
    print first_sheet.row_slice(rowx=0,
                                start_colx=0,
                                end_colx=2)

if __name__ == "__main__":
    path = "test.xls"
    open_file(path)
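On the leading-zeros concern: Python's own csv module returns every field as a string, so a value such as "00123" should come through unchanged. The zeros are usually lost when a spreadsheet program or a numeric conversion such as int() interprets the field. A small sketch in Python 3 syntax, using made-up in-memory data in place of a real file:

```python
import csv
import io

# Hypothetical sample data standing in for a real .csv file.
sample = io.StringIO("id,name\n00123,alice\n00007,bob\n")

reader = csv.reader(sample)
header = next(reader)            # first row holds the column names
rows = list(reader)              # every field is still a string

ids = [row[0] for row in rows]
print(ids)                       # leading zeros are preserved

# The zeros only disappear once a field is converted to a number.
as_numbers = [int(value) for value in ids]
print(as_numbers)
```

If the data must stay in Excel format, the xlrd approach above still applies; this only shows that reading a csv file from Python does not strip the zeros by itself.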
I need to parse through a directory of multiple Excel files to find matches to a set of 500+ strings (which I currently have in a set).
If one of the strings matches in an Excel file, I need to pull that row out into a new file.
Please let me know if you can assist! Thank you in advance for the help!
The directory is called: All_Data
The set is from a list of strings in a file (MRN_file_path)
My code:
MRN = set()
with open(MRN_file_path) as MRN_file:
    for line in MRN_file:
        if line.strip():
            MRN.add(line.strip())

for root, dires, files in os.walk('path/All_Data'):
    for name in files:
        if name.endswith('.xlsx'):
            filepath = os.path.join(root, name)
            with open(search_results_path, "w") as search_results:
                if MRN in filepath:
                    search_results.write(line)
Your code doesn't actually read the .xlsx files; it only checks the file path. As far as I know, the Python standard library has nothing for reading .xlsx files. However, you can check out openpyxl and see if that helps. Here's a solution which reads all the .xlsx files in the specified directory and writes the matching rows into a single tab-delimited txt file.
import os
from openpyxl import load_workbook

MRN = set()
with open(MRN_file_path) as MRN_file:
    for line in MRN_file:
        if line.strip():
            MRN.add(line.strip())

outfile = open(search_results_path, "w")
for root, dires, files in os.walk(path):
    for name in files:
        if name.endswith('.xlsx'):
            filepath = os.path.join(root, name)
            # load in the .xlsx workbook
            wb = load_workbook(filename = filepath, read_only = True)
            # assuming we select the worksheet which is active
            ws = wb.active
            # iterate through each row in the worksheet
            for row in ws.rows:
                # collect all cell values in the row as text; the 'None' check
                # turns empty cells into "" so the row can be joined
                arr = ["" if cell.value is None else str(cell.value) for cell in row]
                # write the row once if any cell matches an MRN
                if any(value in MRN for value in arr):
                    outfile.write("\t".join(arr) + "\n")
outfile.close()
If a tab-delimited output isn't what you're looking for, then you can adjust the second-to-last line to whatever fits your needs.
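The matching step can be tried out without any workbook. The sketch below (Python 3) uses a hypothetical helper, matching_rows, which is not part of openpyxl; it assumes rows arrive as plain tuples of raw cell values and that every value is compared as text. The sample data is made up.

```python
def matching_rows(rows, wanted):
    """Yield each row (as a list of strings) that contains a wanted value.

    rows   -- iterable of tuples of raw cell values (str, int, None, ...)
    wanted -- set of strings to look for
    """
    for row in rows:
        # Empty cells become "", everything else is converted to text.
        values = ["" if value is None else str(value) for value in row]
        if any(value in wanted for value in values):
            yield values


# Made-up sample data for illustration only.
sample_rows = [
    ("1001", "Smith", None),
    (1002, "Jones", "x"),
    ("1003", "Brown", "y"),
]
wanted = {"1001", "1002"}

lines = ["\t".join(values) for values in matching_rows(sample_rows, wanted)]
print(lines)
```

Converting with str() means a numeric cell stored as a float (for example 1002.0) would not match the string "1002", so the conversion may need adjusting to how the identifiers are stored.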
I have been reading up on csv.reader and its next() method but did not see a way to compare the values in a column from one row to the next. For instance, suppose my data looked like this in the Maps.csv file:
County1 C:/maps/map1.pdf
County1 C:/maps/map2.pdf
County2 C:/maps/map1.pdf
County2 C:/maps/map3.pdf
County3 C:/maps/map3.pdf
County4 C:/maps/map2.pdf
County4 C:/maps/map4.pdf
If line two's county equals line one's county do something
The following code compares whole rows; I want to compare only the county values between the current and previous rows.
import csv
f = open("Maps.csv", "r+")
ff = csv.reader(f)
pre_line = ff.next()
while(True):
    try:
        cur_line = ff.next()
        if pre_line == cur_line:
            print "Matches"
        pre_line = cur_line
    except:
        break
I know I can grab the current value (see below) but do not know how to grab the previous value. Is this possible? If so, could someone please tell me how? I am on day three of trying to write my script to append pdf files listed in a csv file, and am about to toss my coffee cup at my monitor. I am breaking the problem down into smaller parts and using simpler data as a pilot; my real file is much larger. I was advised to focus on just one issue at a time when posting to this forum, and this is my latest issue. It seems that no matter what tack I take, I can't read the data the way I want. Arrrggghhhhh.
CurColor = row[color]
Using python 2.7
You already know how to look up the previous row. Why not get the column you need from that row?
import csv
f = open("Maps.csv", "r+")
ff = csv.reader(f)
pre_line = ff.next()
while(True):
    try:
        cur_line = ff.next()
        if pre_line[0] == cur_line[0]:  # <-- compare first column
            print "Matches"
        pre_line = cur_line
    except StopIteration:  # raised when the reader runs out of rows
        break
or more simply:
pre_line = ff.next()
for cur_line in ff:
    if pre_line[0] == cur_line[0]:  # <-- compare first column
        print "Matches"
    pre_line = cur_line
import csv
f = open("Maps.csv", "r+")
# Use a delimiter to split each line into separate elements.
# In my example I used a comma. Your csv may have a different delimiter;
# make sure the delimiter is a single-character string though,
# so no multiple spaces between "County1 C:/maps/map1.pdf".
# It should be something like "County1,C:/maps/map1.pdf"
ff = csv.reader(f, delimiter=',')
COUNTY_INDEX = 0
# each time ff.next() is called, it returns a list like ['County1', 'C:/maps/map1.pdf']
# since you want to compare the value at the first index, you need to reference it like so
# the line below will set pre_line = 'County1'
pre_line = ff.next()[COUNTY_INDEX]
while(True):
    try:
        # the current line will be 'County1' or 'County2' etc., depending on which line is read
        cur_line = ff.next()[COUNTY_INDEX]
        if pre_line == cur_line:
            print "Matches"
        pre_line = cur_line
    except StopIteration:  # raised when the reader runs out of rows
        break
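For readers on Python 3: the reader object has no .next() method there, while the built-in next(ff) works in both versions. Another way to compare each row with the previous one is to pair the rows up with zip. This is a sketch only; it assumes a comma delimiter and uses in-memory sample data mirroring Maps.csv instead of the real file.

```python
import csv
import io

# Sample data mirroring Maps.csv, assuming a comma delimiter.
sample = io.StringIO(
    "County1,C:/maps/map1.pdf\n"
    "County1,C:/maps/map2.pdf\n"
    "County2,C:/maps/map1.pdf\n"
    "County2,C:/maps/map3.pdf\n"
    "County3,C:/maps/map3.pdf\n"
)

rows = list(csv.reader(sample))

# zip(rows, rows[1:]) pairs every row with the one that follows it.
matches = []
for previous, current in zip(rows, rows[1:]):
    if previous[0] == current[0]:      # compare the county column only
        matches.append((previous[1], current[1]))

print(matches)
```

This reads the whole file into a list first, which is convenient for small files; for very large files the pre_line/cur_line loop above keeps only two rows in memory at a time.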
I have two parameters, filename and time, and I want to write each of them to its own column in a csv file. These two parameters are set in a for-loop, so their values change in each iteration.
My current python code is the one below but the resulting csv is not what I want:
import csv
import os
with open("txt/scalable_decoding_time.csv", "wb") as csv_file:
    writer = csv.writer(csv_file, delimiter=',')
    filename = ["one","two", "three"]
    time = ["1","2", "3"]
    zipped_lists = zip(filename,time)
    for row in zipped_lists:
        print row
        writer.writerow(row)
My csv file must look like the one below. The comma must be the delimiter, so I should get two columns.
one, 1
two, 2
three, 3
In my current csv file, the data are all stored in one column instead.
Do you know how to fix this?
Well, the issue here is that you are using writerows instead of writerow.
import csv
import os
with open("scalable_decoding_time.csv", "wb") as csv_file:
    writer = csv.writer(csv_file, delimiter=',')
    level_counter = 0
    max_levels = 3
    filename = ["one","two", "three"]
    time = ["1","2", "3"]
    while level_counter < max_levels:
        writer.writerow((filename[level_counter], time[level_counter]))
        level_counter = level_counter +1
This gave me the result:
one,1
two,2
three,3
This is another solution
Put the following code into a python script that we will call sc-123.py
filename = ["one","two", "three"]
time = ["1","2", "3"]
for a,b in zip(filename,time):
    print('{}{}{}'.format(a,',',b))
Once the script is ready, run it like this:
python2 sc-123.py > scalable_decoding_time.csv
You will have the results formatted the way you want
one,1
two,2
three,3
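For comparison, here is a sketch of the same two-column output with csv.writer in Python 3 syntax. It writes into an io.StringIO buffer so the result can be inspected directly; with a real file, Python 3 expects text mode with newline="" rather than the "wb" mode used in the Python 2 snippets. The variable names are taken from the question, the buffer is an assumption for illustration.

```python
import csv
import io

filename = ["one", "two", "three"]
time = ["1", "2", "3"]

# io.StringIO stands in for a real file here; with a real file, open it
# in text mode with newline="" (Python 3), e.g. open(path, "w", newline="").
buffer = io.StringIO()
writer = csv.writer(buffer, delimiter=",")

# writerows takes an iterable of rows; each (name, t) pair becomes one row.
writer.writerows(zip(filename, time))

output = buffer.getvalue()
print(output)
```

Note the difference between the two methods: writerow expects one row (a sequence of fields), while writerows expects a sequence of such rows.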
Can anyone suggest a way to import a CSV file into an Oracle DB using cx_Oracle? The code below works, but I have to manually delete the CSV header row (row 1) before I run the Python script. Is there a way to change the code to ignore line 1 of the CSV file?
import cx_Oracle
import csv
connection = cx_Oracle.connect(USER,PASSWORD,'adhoc_serv')#DADs
cursor = connection.cursor()
insert = """
INSERT INTO MUK (CODE, UNIT_NAME, GROUP_CODE, GROUP_NAME)
VALUES(:1, :2, :3, :4)"""
# Initialize list that will serve as a container for bind values
L = []
reader = csv.reader(open(r'C:\Projects\MUK\MUK_Latest_PY.csv'),delimiter=',')
for row in reader:
    L.append(tuple(row))
# prepare insert statement
cursor.prepare(insert)
print insert
# execute insert with executemany
cursor.executemany(None, L)
# report number of inserted rows
print 'Inserted: ' + str(cursor.rowcount) + ' rows.'
# commit
connection.commit()
# close cursor and connection
cursor.close()
connection.close()
If you want to simply ignore line 1 of the CSV file, that is easily accomplished by performing this immediately after the reader has been created:
next(reader)
This will simply get the first row from the CSV file and discard it.
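A minimal sketch of that header skip, in Python 3 syntax and without a database connection. The sample rows are made up, and io.StringIO stands in for the real CSV file. Passing None as a default to next() is an optional extra that avoids a StopIteration error if the file turns out to be empty.

```python
import csv
import io

# Made-up sample standing in for the real CSV file.
sample = io.StringIO(
    "CODE,UNIT_NAME,GROUP_CODE,GROUP_NAME\n"
    "A1,Unit one,G1,Group one\n"
    "B2,Unit two,G2,Group two\n"
)

reader = csv.reader(sample, delimiter=",")
header = next(reader, None)      # discard the header; None if the file is empty

# Everything left in the reader is data, ready to be used as bind values.
bind_rows = [tuple(row) for row in reader]
print(bind_rows)
```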
I'm trying to filter a large tab-delimited file and print out just the lines with a score of >0.999 in one of the columns, but for some reason the script's output continues to just print every line. Any insights as to why my "if score > 0.999:" isn't working as intended?
import sys
import string
import re

def split_lines(lines):
    for line in lines:
        if line.find('#') >-1:
            print line
        else:
            #pass
            #fields = re.split('\t',line)
            fields = line.split('\t')
            score = fields[3]
            if score > 0.999:
                print score
            #else:
            #    pass

data = sys.stdin.read()
lines = data.split('\n')
split_lines(lines)
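You need to convert the string score to a numeric type, Decimal or float. fields[3] is a string, and in Python 2 (CPython) a string always compares greater than a number, so score > 0.999 is true for every line:
if float(score) > 0.999: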
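Putting the conversion into a small function gives something like the sketch below (Python 3 syntax). The helper name high_scores and its parameters are made up for illustration. Unlike the original script, it drops comment lines instead of printing them, and it skips lines whose score column is missing or not numeric.

```python
def high_scores(lines, threshold=0.999, column=3):
    """Return the lines whose score column exceeds the threshold."""
    kept = []
    for line in lines:
        if not line or line.startswith("#"):
            continue                      # skip blank lines and comments
        fields = line.split("\t")
        try:
            score = float(fields[column])
        except (IndexError, ValueError):
            continue                      # skip malformed lines
        if score > threshold:
            kept.append(line)
    return kept


# Made-up sample lines for illustration only.
sample = [
    "# header comment",
    "a\tb\tc\t0.9995",
    "a\tb\tc\t0.5",
    "a\tb\tc\tn/a",
    "",
]
result = high_scores(sample)
print(result)
```

In Python 3 the original comparison of a string with 0.999 would raise a TypeError rather than silently evaluating to true, which makes this kind of bug easier to spot.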