import xlrd

new_list = eval(my_list[0])  # my_list contains dictionaries

def get_next_file():
    for key, value in new_list.iteritems():
        yield value

file_contents = next(get_next_file())  # take the first stored value
book = xlrd.open_workbook(file_contents=file_contents)
for sheet in book.sheet_names():
    print sheet
I am trying to take a string from a dict and turn it back into an xls file so it can be processed. It was originally an xls file that I converted with str(list(xls_file)) so that it could be saved in my database. The saved string prints out as hex with some words in it.
Any thoughts?
The xlrd library you are using can only read information from an Excel file.
If you want to write a new Excel file you need to use xlwt.
If you want to change some cells in an existing Excel file you should use xlutils.
Homepage of these three libraries: http://www.python-excel.org/
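That said, if the goal is just to round-trip a workbook through the database, storing the raw bytes of the file is a simpler alternative to str(list(xls_file)), because xlrd.open_workbook can take those bytes back directly via its file_contents argument. A minimal sketch, assuming a binary-capable column (e.g. a BLOB) and a placeholder file name report.xls:

import xlrd

# Read the whole workbook as raw bytes; store this value in a BLOB column.
with open('report.xls', 'rb') as f:  # 'report.xls' is a placeholder name
    raw_bytes = f.read()

# Later, after fetching the same bytes back from the database:
book = xlrd.open_workbook(file_contents=raw_bytes)
for sheet in book.sheet_names():
    print sheet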
I have gone through similar questions but am having trouble fitting this to my needs. I am reading a CSV, creating a list, and appending the list to a separate CSV.
import csv

with open('in_table.csv', 'rb') as vo:
    next(vo)  # skip header row
    reader = csv.reader(vo)
    vo_list = list(reader)
print vo_list

with open('out_table.csv', 'ab') as f:
    cf = csv.writer(f)
    for row in vo_list:
        cf.writerow(row)
I need to write the list starting at the second column and not the first, as the first column will contain separate information. What is the simplest way to do this?
Realistically, I have another input CSV exactly like the first one, and I need to put them both into the output file for a total of 4 columns, like so:
Column1, join_count1, grid_id1, join_count2, grid_id2
Blah, 0, U24, 3, U24
I would go with the built-in csv package. Also, you are opening the CSV files in binary mode; was that intentional? CSV files should be text files by definition, but if yours really are binary then please correct the flags below:
import csv

with open("out_table.csv", "a+") as out_file:
    writer = csv.writer(out_file)
    with open("in_table.csv") as in_file:
        reader = csv.reader(in_file)
        next(reader)  # skip the header
        for oid, join_count, grid_id in reader:
            writer.writerow([join_count, grid_id])
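To get the 4-column layout described in the question, with values from two parallel input files on each row, here is a sketch under the assumptions that both files have the same number of rows and that the second input is named in_table2.csv (a hypothetical name):

import csv

with open("out_table.csv", "a+") as out_file:
    writer = csv.writer(out_file)
    with open("in_table.csv") as f1, open("in_table2.csv") as f2:
        r1, r2 = csv.reader(f1), csv.reader(f2)
        next(r1)  # skip both headers
        next(r2)
        # pair up rows from the two inputs and write the combined columns
        for (oid, join_count1, grid_id1), (_, join_count2, grid_id2) in zip(r1, r2):
            writer.writerow([oid, join_count1, grid_id1, join_count2, grid_id2])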
I am trying to build a tool that can convert .csv files into .yaml files for further use. I found a handy bit of code that does the job nicely from the link below:
Convert CSV to YAML, with Unicode?
which states that this line will take the dict created from a .csv file and dump it to a .yaml file:
out_file.write(ry.safe_dump(dict_example, allow_unicode=True))
However, one small kink I have noticed is that after a single run, the generated .yaml file is typically incomplete by a line or two. For the .csv file to be read through exhaustively and a complete .yaml file created, the code must be run two or even three times. Does anybody know why this could be?
UPDATE
Per request, here is the code I use to parse my .csv file, which is two columns wide (a string in the first column and a list of two strings in the second), and will typically be 50 rows long (or more). Also note that it is designed to remove any '\n' or spaces that could potentially cause problems later on in the code.
import csv

csv_contents = {}
with open("example1.csv", "rU") as csvfile:
    green = csv.reader(csvfile, dialect='excel')
    for line in green:
        candidate_number = line[0]
        first_sequence = line[1].replace(' ', '').replace('\r', '').replace('\n', '')
        second_sequence = line[2].replace(' ', '').replace('\r', '').replace('\n', '')
        csv_contents[candidate_number] = [first_sequence, second_sequence]
csv_contents.pop('Header name', None)
Ultimately, it is not that important that I maintain the order of the rows from the original dict, just that all the information within the rows is properly structured.
I am not sure what the cause could be, but you might be running out of memory, as you create the whole YAML document in memory first and then write it out. It is much better to stream it out directly.
You should also note that the code in the question you link to doesn't preserve the order of the original columns, something easily remedied by using round_trip_dump instead of safe_dump.
You probably want to make a top-level sequence (list) as in the desired output of the linked question, with each element being a mapping (dict).
The following parses the CSV, taking the first line as keys for mappings created for each following line:
import sys
import csv
import ruamel.yaml as ry
import dateutil.parser  # pip install python-dateutil

def process_line(line):
    """convert line elements, trying int, float, date in turn"""
    ret_val = []
    for elem in line:
        try:
            res = int(elem)
            ret_val.append(res)
            continue
        except ValueError:
            pass
        try:
            res = float(elem)
            ret_val.append(res)
            continue
        except ValueError:
            pass
        try:
            res = dateutil.parser.parse(elem)
            ret_val.append(res)
            continue
        except ValueError:
            pass
        ret_val.append(elem.strip())
    return ret_val

csv_file_name = 'xyz.csv'
data = []
header = None
with open(csv_file_name) as inf:
    for line in csv.reader(inf):
        d = process_line(line)
        if header is None:
            header = d
            continue
        data.append(ry.comments.CommentedMap(zip(header, d)))

ry.round_trip_dump(data, sys.stdout, allow_unicode=True)
with input xyz.csv:
id, title_english, title_russian
1, A Title in English, Название на русском
2, Another Title, Другой Название
this generates:
- id: 1
  title_english: A Title in English
  title_russian: Название на русском
- id: 2
  title_english: Another Title
  title_russian: Другой Название
process_line is just some sugar that tries to convert the strings in the CSV file to more useful types, and strips leading spaces from the ones that remain strings (resulting in far fewer quotes in your output YAML file).
I have tested the above on files with 1000 rows, without any problems (I won't post the output though).
The above was done using Python 3 as well as Python 2.7, starting with a UTF-8 encoded file xyz.csv. If you are using Python 2, you can try unicodecsv if you need to handle Unicode input and things don't work out as well as they did for me.
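For reference, a minimal unicodecsv sketch, assuming the same UTF-8 encoded xyz.csv (unicodecsv mirrors the csv module's interface but adds an encoding argument):

import unicodecsv  # pip install unicodecsv

with open('xyz.csv', 'rb') as f:
    for row in unicodecsv.reader(f, encoding='utf-8'):
        print row  # each cell arrives as a unicode object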
I have to read a large CSV file of almost 100K rows, and it will be much easier to process that file if I can read each row as a dictionary.
After a little research I found the csv module's built-in csv.DictReader.
But the documentation does not clearly mention whether it stores the whole file in memory or not.
It does mention that:
The fieldnames parameter is a sequence whose elements are associated with the fields of the input data in order.
But I'm not sure whether that whole sequence is stored in memory or not.
So the question is: does it store the whole file in memory?
If so, is there any other option to read a single row at a time from the file, generator-style, and get each row as a dict?
Here is my code:
def file_to_dictionary(self, file_path):
    """Read CSV rows as a dictionary"""
    file_data_obj = {}
    try:
        self.log("Reading file: [{}]".format(file_path))
        if os.path.exists(file_path):
            file_data_obj = csv.DictReader(open(file_path, 'rU'))
        else:
            self.log("File does not exist: {}".format(file_path))
    except Exception as e:
        self.log("Failed to read file.", e, True)
    return file_data_obj
As far as I'm aware, the DictReader object you create, in your case file_data_obj, is a generator-type object.
Generator-type objects are not stored in memory all at once, but they can only be iterated over once!
To print the fieldnames of your data as a list you can simply use: print file_data_obj.fieldnames
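So to stay memory-friendly you can simply iterate the reader instead of building a list first; a short sketch, where big_file.csv is a placeholder name and handle_row a hypothetical per-row function:

import csv

with open('big_file.csv', 'rU') as f:  # 'big_file.csv' is a placeholder
    # rows are produced lazily, one dict at a time, so the whole
    # 100K-row file is never held in memory at once
    for row in csv.DictReader(f):
        handle_row(row)  # 'handle_row' is a hypothetical handler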
Secondly, in my experience I find it much easier to use a list of dictionaries when reading data from csv files, where each dictionary represents a row in your file. Consider the following:
def csv_to_dict_list(path):
    csv_in = open(path, 'rb')
    reader = csv.DictReader(csv_in, restkey=None, restval=None, dialect='excel')
    fields = reader.fieldnames
    list_out = [row for row in reader]
    return list_out, fields
Using the function above (or something similar), you can achieve your goal in a couple of lines. E.g.:
data, data_fields = csv_to_dict_list(path)
print data_fields  # prints fieldnames
print data[0]      # prints first row of data from file
Hope this helps!
Luke
I am parsing a CSV file (created in Windows) and trying to populate a database table using a model I've created.
I am getting this error:
pl = PriceList.objects.create(code=row[0], description=row[1],.........
Incorrect string value: '\xD0h:NAT...' for column 'description' at row 1
My table and the description column use the utf8 character set and utf8_general_ci collation.
The actual value I am trying to insert is this:
HOUSING:PS-187:1g\xd0h:NATURAL CO
I am not aware of any string processing I should do to get past this error.
I think I previously used a simple Python script to populate the database using conn.escape_string(), and it worked (if that helps).
Thanks
I've had trouble with the CSV reader and unicode before as well. In my case using the following got me past the errors.
From http://docs.python.org/library/csv.html
The csv module doesn’t directly support reading and writing Unicode, ... unicode_csv_reader() below is a generator that wraps csv.reader to handle Unicode CSV data (a list of Unicode strings). utf_8_encoder() is a generator that encodes the Unicode strings as UTF-8, one string (or row) at a time. The encoded strings are parsed by the CSV reader, and unicode_csv_reader() decodes the UTF-8-encoded cells back into Unicode:
import csv

def unicode_csv_reader(unicode_csv_data, dialect=csv.excel, **kwargs):
    # csv.py doesn't do Unicode; encode temporarily as UTF-8:
    csv_reader = csv.reader(utf_8_encoder(unicode_csv_data),
                            dialect=dialect, **kwargs)
    for row in csv_reader:
        # decode UTF-8 back to Unicode, cell by cell:
        yield [unicode(cell, 'utf-8') for cell in row]

def utf_8_encoder(unicode_csv_data):
    for line in unicode_csv_data:
        yield line.encode('utf-8')
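Usage is then straightforward; a sketch assuming the file really is UTF-8 encoded, with prices.csv as a placeholder name:

import codecs

with codecs.open('prices.csv', 'r', encoding='utf-8') as f:
    for row in unicode_csv_reader(f):
        print row  # each cell is now a proper unicode object

If the file turns out not to be UTF-8 (it was created in Windows, so a codepage such as cp1252 is possible), pass that encoding to codecs.open instead.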
I've seen copy & paste converters to MediaWiki or HTML format, but I could not find one that converts to the Trac Wiki format (WikiFormatting), which uses pipes to separate cells, such as:
||Cell 1||Cell 2||Cell 3||
||Cell 4||Cell 5||Cell 6||
You could save your Excel sheet as a CSV file. Then from a command prompt (assuming you are running Windows XP or newer) type this command:
for /f "tokens=1,2,3 delims=," %a in (mycsvfile.csv) do ((echo ^|^|%a^|^|%b^|^|%c^|^|) >> mywikifile.txt)
The number of tokens depends on how many columns you have. You could do up to 26 columns this way in a single pass by increasing the number of tokens and adding the corresponding variable names (%d, %e, etc.).
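If you'd rather not depend on the Windows shell, the same one-pass conversion takes a few lines of Python; a sketch reusing the file names from the batch example:

import csv

# Convert each CSV row into a Trac table row: ||cell||cell||cell||
with open('mycsvfile.csv', 'rb') as f_in, open('mywikifile.txt', 'w') as f_out:
    for row in csv.reader(f_in):
        f_out.write('||' + '||'.join(row) + '||\n')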
I made a jsFiddle to do just this; just put your CSV text in the HTML box and run the script. The content that you would paste into a TracWiki page will then be in the Result box.
In case something happens to the jsFiddle, here is the JavaScript I used (I probably didn't need to use jQuery, but it was faster for me than having to think of the non-jQuery way to do it):
var csv = $('body').html().trim();
csv = csv.replace(/,/g, "||");
csv = csv.replace(/$/gm, "||<br />");
csv = csv.replace(/^/gm, "||");

// set to false if you don't want empty cells
if (true) {
    while (csv.indexOf("||||") > -1) {
        csv = csv.replace(/\|\|\|\|/g, "|| ||");
    }
}

$('body').html(csv);
My port of Shan Carter's Mr. Data Converter now supports Wiki in the format you specified. You can copy & paste directly from Excel or from a CSV file.
http://thdoan.github.io/mr-data-converter/
UPDATE: I've added Trac-specific formatting under the "Trac" option.