Python 2: is there a way to declare the encoding when writing a CSV file? - python-2.7

When I try to write the cursor result from a database query (a list) to a CSV file, I get this error:
a.writerow(lst) TypeError: write() argument 1 must be unicode, not str
This happens in Python 2. The script below works in Python 3, but the system requirements force me to use Python 2.
This is the working Python 3 script:
results_percent = cursor.fetchall()
with open(file4,'w',encoding="utf-8",newline='') as fp:
    a = csv.writer(fp, delimiter=',')
    a.writerow(['MFIName','ClientCountAtSignUp','UploadCountLastMonth','UploadCount','80%','Status'])
    a.writerows(results_percent)
Below is the Python 2 version, which gives the error a.writerow(lst) TypeError: write() argument 1 must be unicode, not str:
results_percent = cursor.fetchall()
with io.open(file4,'w',encoding='utf-8') as fp:
    a = csv.writer(fp, delimiter=',')
    lst = ['MFIName','ClientCountAtSignUp','UploadCountLastMonth','UploadCount','80%','Status']
    a.writerow(lst)
    a.writerows(results_percent)
The goal is to write results_percent to a CSV file.
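The error arises because Python 2's csv module writes byte strings (str), while a file opened with io.open(..., encoding='utf-8') accepts only unicode. One common workaround, sketched here under the assumption that the rows may contain unicode values, is to open the file in binary mode on Python 2 and encode each cell to UTF-8 before writing. The helper names open_csv_for_write, to_cell and write_rows are hypothetical and not part of the csv module.

```python
import csv
import io
import sys

PY2 = sys.version_info[0] == 2


def open_csv_for_write(path):
    """Open path the way the running Python's csv module expects."""
    if PY2:
        # Python 2: csv.writer emits byte strings, so use binary mode.
        return open(path, 'wb')
    # Python 3: csv.writer emits text, so use text mode with an encoding.
    return io.open(path, 'w', encoding='utf-8', newline='')


def to_cell(value):
    """On Python 2, encode unicode cells to UTF-8 bytes; otherwise pass through."""
    if PY2 and isinstance(value, unicode):  # noqa: F821 (name exists only on Python 2)
        return value.encode('utf-8')
    return value


def write_rows(path, header, rows):
    """Write a header and rows to path as UTF-8 encoded CSV."""
    with open_csv_for_write(path) as fp:
        writer = csv.writer(fp, delimiter=',')
        writer.writerow([to_cell(v) for v in header])
        for row in rows:
            writer.writerow([to_cell(v) for v in row])
```

With the names from the question, the call would be write_rows(file4, lst, results_percent).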

Related

3D Drawing from a file in an extra directory [duplicate]

I'm trying to get a data parsing script up and running. It works as far as the data manipulation is concerned. What I'm trying to do is set this up so I can pass multiple user-defined CSV files with a single command.
e.g.
> python script.py One.csv Two.csv Three.csv
If you have any advice on how to automate the naming of the output CSV so that if input = test.csv, output = test1.csv, I'd appreciate that as well.
Getting
TypeError: coercing to Unicode: need string or buffer, list found
for the line
for line in csv.reader(open(args.infile)):
My code:
import csv
import pprint
pp = pprint.PrettyPrinter(indent=4)
res = []
import argparse
parser = argparse.ArgumentParser()
#parser.add_argument("infile", nargs="*", type=str)
#args = parser.parse_args()
parser.add_argument ("infile", metavar="CSV", nargs="+", type=str, help="data file")
args = parser.parse_args()
with open("out.csv","wb") as f:
    output = csv.writer(f)
    for line in csv.reader(open(args.infile)):
        for item in line[2:]:
            #to skip empty cells
            if not item.strip():
                continue
            item = item.split(":")
            item[1] = item[1].rstrip("%")
            print([line[1]+item[0],item[1]])
            res.append([line[1]+item[0],item[1]])
            output.writerow([line[1]+item[0],item[1].rstrip("%")])
I don't really understand what is going on with the error. Can someone explain this in layman's terms?
Bear in mind I am new to programming/python as a whole and am basically learning alone, so if possible could you explain what is going wrong/how to fix it so I can note it for future reference.
args.infile is a list of filenames, not one filename. Loop over it:
import os
for filename in args.infile:
    base, ext = os.path.splitext(filename)
    with open("{}1{}".format(base, ext), "wb") as outf, open(filename, 'rb') as inf:
        output = csv.writer(outf)
        for line in csv.reader(inf):
            [...]
Here I used os.path.splitext() to split the filename into its base and extension, so you can build the output filename by appending 1 to the base.
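The naming step can be tried in isolation. This is a small sketch; the function name output_name and its suffix parameter are made up for illustration.

```python
import os


def output_name(filename, suffix="1"):
    """Return filename with suffix inserted before the extension."""
    base, ext = os.path.splitext(filename)
    return "{}{}{}".format(base, suffix, ext)


for name in ["test.csv", "data/One.csv", "archive.tar.gz"]:
    print("{} -> {}".format(name, output_name(name)))
```

Note that os.path.splitext() only splits off the last extension, so archive.tar.gz becomes archive.tar1.gz.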
If you specify nargs="+" (or nargs="*", or an integer) in .add_argument, the argument will always be returned as a list.
Assuming you want to deal with all of the files specified, loop through that list:
for filename in args.infile:
    for line in csv.reader(open(filename)):
        for item in line[2:]:
            #to skip empty cells
            [...]
Or, if you really only want to accept a single file, just get rid of nargs="+".
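The difference is easy to check by passing an explicit argument list to parse_args() instead of reading the command line. A minimal sketch, with the variable names chosen only for illustration:

```python
import argparse

# nargs="+" collects one or more values into a list.
multi = argparse.ArgumentParser()
multi.add_argument("infile", metavar="CSV", nargs="+", type=str, help="data file")
args = multi.parse_args(["One.csv", "Two.csv"])
print(args.infile)  # a list of filenames

# Without nargs, the same positional argument is a single string.
single = argparse.ArgumentParser()
single.add_argument("infile", metavar="CSV", type=str, help="data file")
one = single.parse_args(["One.csv"])
print(one.infile)  # one filename
```

Passing the list to open() is what produces "coercing to Unicode: need string or buffer, list found" in Python 2.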

Django encoding error when reading from a CSV

When I try to run:
import csv
with open('data.csv', 'rU') as csvfile:
    reader = csv.DictReader(csvfile)
    for row in reader:
        pgd = Player.objects.get_or_create(
            player_name=row['Player'],
            team=row['Team'],
            position=row['Position']
        )
Most of my data gets created in the database, except for one particular row. When my script reaches the row, I receive the error:
ProgrammingError: You must not use 8-bit bytestrings unless you use a
text_factory that can interpret 8-bit bytestrings (like text_factory = str).
It is highly recommended that you instead just switch your application to Unicode strings.
The particular row in the CSV that causes this error is:
>>> row
{'FR\xed\x8aD\xed\x8aRIC.ST-DENIS', 'BOS', 'G'}
I've looked at other Stack Overflow threads with the same or similar issues, but most aren't specific to using SQLite with Django. Any advice?
If it matters, I'm running the script by going into the Django shell by calling python manage.py shell, and copy-pasting it in, as opposed to just calling the script from the command line.
This is the stacktrace I get:
Traceback (most recent call last):
  File "<console>", line 4, in <module>
  File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/csv.py", line 108, in next
    row = self.reader.next()
  File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/codecs.py", line 302, in decode
    (result, consumed) = self._buffer_decode(data, self.errors, final)
UnicodeDecodeError: 'utf8' codec can't decode byte 0xcc in position 1674: invalid continuation byte
EDIT: I decided to just manually import this entry into my database, rather than try to read it from my CSV, based on Alastair McCormack's feedback
Based on the output from your question, it looks like the person who made the CSV mojibaked it - it doesn't seem to represent FRÉDÉRIC.ST-DENIS. You can try using windows-1252 instead of utf-8 but I think you'll end up with FRíŠDíŠRIC.ST-DENIS in your database.
I suspect you're using Python 2, where open() returns str objects, which are simply byte strings.
The error is telling you that you need to decode your text to a Unicode string before using it.
The simplest method is to decode each cell:
with open('data.csv', 'r') as csvfile: # 'U' means Universal line mode and is not necessary
    reader = csv.DictReader(csvfile)
    for row in reader:
        pgd = Player.objects.get_or_create(
            player_name=row['Player'].decode('utf-8'),
            team=row['Team'].decode('utf-8'),
            position=row['Position'].decode('utf-8')
        )
That'll work, but it's ugly to add decodes everywhere, and it won't work in Python 3. Python 3 improves things by opening files in text mode and returning Python 3 strings, which are the equivalent of Unicode strings in Python 2.
To get the same functionality in Python 2, use the io module. This gives you an open() function which has an encoding option. Annoyingly, the Python 2.x csv module is broken with Unicode, so you need to install a backported version:
pip install backports.csv
To tidy your code and future proof it, do:
import io
from backports import csv
with io.open('data.csv', 'r', encoding='utf-8') as csvfile:
    reader = csv.DictReader(csvfile)
    for row in reader:
        # now every row is automatically decoded from UTF-8
        pgd = Player.objects.get_or_create(
            player_name=row['Player'],
            team=row['Team'],
            position=row['Position']
        )
Encode the player name as UTF-8 by calling .encode('utf-8') on it:
import csv
with open('data.csv', 'rU') as csvfile:
    reader = csv.DictReader(csvfile)
    for row in reader:
        pgd = Player.objects.get_or_create(
            player_name=row['Player'].encode('utf-8'),
            team=row['Team'],
            position=row['Position']
        )
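In Django, you can decode with latin-1 instead: csv.DictReader(io.StringIO(csv_file.read().decode('latin-1'))). Latin-1 maps every byte to a character, so it never raises the decode errors you get with utf-8, although bytes that are not really Latin-1 will come out as the wrong characters.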
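The behaviour of the two codecs mentioned in these answers can be compared directly. A minimal sketch, using made-up byte strings rather than the actual contents of data.csv:

```python
# Bytes for "FRÉDÉRIC" encoded as UTF-8 (É is the two bytes 0xC3 0x89).
utf8_bytes = u"FR\u00c9D\u00c9RIC".encode("utf-8")

# Decoding with the matching codec recovers the original text.
round_trip = utf8_bytes.decode("utf-8")

# Latin-1 never raises, but each UTF-8 byte becomes its own character.
as_latin1 = utf8_bytes.decode("latin-1")
print(repr(as_latin1))

# Bytes that are not valid UTF-8 raise UnicodeDecodeError under utf-8 ...
bad_bytes = b"FR\xccD"
try:
    bad_bytes.decode("utf-8")
    utf8_failed = False
except UnicodeDecodeError:
    utf8_failed = True

# ... yet still decode (to some character) under latin-1.
bad_as_latin1 = bad_bytes.decode("latin-1")
```

So latin-1 silences the exception, but whether the resulting names are correct depends on how the file was really encoded.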

Read & write txt file error - 'str' object has no attribute 'name', Polish diacritical chars in path error

I'm using Python 2.7 on Windows 7 Pro SP1.
I tried this code:
import os
path = "E:/data/keyword"
os.chdir(path)
files = os.listdir(path)
query = "{keyword} AND NOT("
result = open("query.txt", "w")
for file in files:
    if file.endswith(".txt"):
        file_path = file.name
        dane = open(file_path, "r")
        query.append(dane)
        result.append(" OR ")
result.write(query)
result.write(")")
result.close()
I get this error:
file_path = file.name AttributeError: 'str' object has no attribute 'name'
I can't figure out why.
I get a second error when the path contains Polish diacritical characters like "ąęłńóżć". I get an error for:
path = "E:/Bieżące projekty/keyword"
I tried changing it to:
path =u"E:/Bieżące projekty/keyword"
but it did not help. I'm just starting with Python and I can't figure out why this code is not working.
What I want:
Find all text files in the directory.
Join all the text files into one text file named "query.txt".
For example:
file 1
data1 data2
file 2
data 3 data 4
Output from "query.txt":
data1 data2 data 3 data 4
The code above works fine when the path variable contains no Polish diacritical characters. When I change the path, I get this error:
SyntaxError: Non-ASCII character '\xc5' in file query.py on line 9, but no encoding declared; see http://python.org/dev/peps/pep-0263/ for details
In PEP 263 I found the magic comment. The standard encoding for Polish characters like "ąęłńóźżć" is ISO-8859-2, so I tried adding that encoding declaration to the code. I tried UTF-8 too and got the same error. My full code is (without the first 5 comment lines describing what the code does):
import os
#path = r"E:/data"
# -*- coding: iso-8859-2 -*-
path = r"E:/Bieżące przedsięwzięcia"
os.chdir(path)
files = os.listdir(path)
query = "{keyword} AND NOT("
for file in files:
    if file.endswith(".txt"):
        dane = open(file, "r")
        text = dane.read()
        query += text
        print(query)
        dane.close()
        query.join(" OR ")
result = open("query.txt", "w")
result.write(query)
result.write(")")
result.close()
In a Unicode/UTF-8 character table I found that the Polish character "ż" is encoded in UTF-8 as "\xc5\xbc". Commenting out the path line containing "ż" with # still produces the error. When I remove the line with this character:
path = r"E:/Bieżące przedsięwzięcia"
the code works fine and I get the result I want.
For editing I use Notepad++ with default settings. I only set it to replace tabs with four spaces in Python code.
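Two details in the question point at the likely cause. PEP 263 requires the coding declaration to be on the first or second line of the file, whereas in the listing it comes after import os and the leading comment lines. And the byte '\xc5' reported in the SyntaxError can be compared against both candidate encodings, as in this small sketch (the variable names are illustrative only):

```python
# U+017C is the Polish letter "ż"; written as an escape so this sketch
# does not itself depend on the source file's encoding.
z_dot = u"\u017c"

utf8_bytes = z_dot.encode("utf-8")         # two bytes: 0xC5 0xBC
latin2_bytes = z_dot.encode("iso-8859-2")  # one byte: 0xBF

print(repr(utf8_bytes))
print(repr(latin2_bytes))

# The '\xc5' in the SyntaxError matches the first UTF-8 byte, which suggests
# the file is saved as UTF-8 rather than ISO-8859-2.
first_byte_matches = utf8_bytes[:1] == b"\xc5"
```

If that reading is right, a "# -*- coding: utf-8 -*-" comment on line 1 or 2 would match how the file is actually saved.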
Second question
I tried to find out in the Python docs what the r in the path variable means, but I can't find it in the Python 2.7 string documentation. Could someone tell me what this part of Python (the u or r before a string value) is called? For example:
path = u"somedata"
path = r"somedata"
I would like to find the docs and read about it.
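For reference, these are called string literal prefixes, and they are described under "String literals" in the lexical analysis chapter of the Python Language Reference. The sketch below shows what the two prefixes do; the example paths are made up.

```python
# r"..." is a raw string literal: backslashes are kept as-is.
raw = r"E:\new\table"
# Without the r prefix, \n and \t are interpreted as escape sequences.
cooked = "E:\new\table"

print(len(raw))     # 12 characters, backslashes included
print(len(cooked))  # 10 characters: \n and \t are one character each

# u"..." is a unicode string literal: a unicode object on Python 2,
# an ordinary str on Python 3.
uni = u"\u017c"
print(len(uni))     # 1 character
```

With forward slashes, as in the question's paths, the r prefix makes no difference because there are no backslashes to protect.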

Python 3: convert str to a bytes-like object without using encode

I wrote an HTTP server to serve HTML files on Python 2.7 and Python 3.5.
def do_GET(self):
    ...
    #if resource is api
    data = json.dumps({'message':['thanks for your answer']})
    #if resource is file name
    with open(resource, 'rb') as f:
        data = f.read()
    self.send_response(response)
    self.send_header('Access-Control-Allow-Origin', '*')
    self.end_headers()
    self.wfile.write(data) # this line raises TypeError: a bytes-like object is required, not 'str'
The code works in Python 2.7, but in Python 3 it raises the error above.
I could use bytearray(data, 'utf-8') to convert str to bytes, but then the HTML displayed in the browser is changed.
My questions:
How can I support both Python 2 and Python 3 without using the 2to3 tool and without changing the file's encoding?
Is there a better way to read a file and send its content to the client that works the same way in Python 2 and Python 3?
Thanks in advance.
You just have to open your file in binary mode, not in text mode:
with open(resource,"rb") as f:
    data = f.read()
Then data is a bytes object in Python 3 and a str in Python 2, and it works for both versions.
As a positive side effect, this code still works when it hits a Windows box (otherwise binary files like images are corrupted by the line-ending conversion that text mode performs).
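The file branch is covered by binary mode, but json.dumps() in the API branch still returns str on Python 3. One possible way to handle both branches, sketched here with a hypothetical helper named to_bytes, is to encode only when the data is not already bytes:

```python
import json


def to_bytes(data, encoding="utf-8"):
    """Return data as bytes, encoding it only if it is text."""
    if isinstance(data, bytes):
        # Already bytes (file read in 'rb' mode, or any str on Python 2).
        return data
    return data.encode(encoding)


# API branch: json.dumps() gives text on Python 3, so it gets encoded.
api_body = to_bytes(json.dumps({'message': ['thanks for your answer']}))

# File branch: content read in binary mode passes through untouched.
file_body = to_bytes(b"<html>\xc5\xbc</html>")
```

In do_GET, the final call would then become self.wfile.write(to_bytes(data)), leaving file contents byte-for-byte unchanged.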

Jupyter string tokenization for python

I'm trying to implement simple_tokenize using the dictionary produced by my previous code, but I get an error message. Any assistance with the following code would be much appreciated. I'm using Python 2.7 in Jupyter.
import csv
reader = csv.reader(open('data.csv'))
dictionary = {}
for row in reader:
    key = row[0]
    dictionary[key] = row[1:]
print dictionary
The above works well, but the problem is with the following:
import re
words = dictionary
split_regex = r'\W+'
def simple_tokenize(string):
    for i in rows:
        word = words.split
        #pass
        print word
I get this error:
NameError                                 Traceback (most recent call last)
<ipython-input-2-0d0e05fb1556> in <module>()
      1 import re
      2
----> 3 words = dictionary
      4 split_regex = r'\W+'
      5
NameError: name 'dictionary' is not defined
Variables are not saved between Jupyter sessions, unless you explicitly do so yourself. Thus, if you ran the first code section, then quit your Jupyter session, started a new Jupyter session and ran the second code block, dictionary is not preserved from the first session and will thus be undefined, as indicated by the error.
If you run the above code blocks differently (e.g., not across Jupyter sessions), you should indicate this, but the tags and traceback suggest this is what you do.
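The simplest fix is to re-run the first code block in the new session. If rebuilding the dictionary is expensive, it can also be saved explicitly. A minimal sketch using the json module, where the sample data and the cache_path location are stand-ins rather than anything from the question:

```python
import json
import os
import tempfile

# Stand-in for the dictionary built from data.csv in the first code block.
dictionary = {"id1": ["some text", "more text"], "id2": ["other text"]}

# Hypothetical cache location; any writable path would do.
cache_path = os.path.join(tempfile.gettempdir(), "dictionary_cache.json")

# At the end of the first session: save the dictionary to disk.
with open(cache_path, "w") as fp:
    json.dump(dictionary, fp)

# In a later session: load it back before using it.
with open(cache_path) as fp:
    restored = json.load(fp)
```

This works here because the dictionary holds only strings and lists of strings, which JSON can represent.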