Error: unknown dialect - python-2.7

I'm using the csv reader in the csv module to read a file in the following format:
Filename, Foo, Label
Each record looks as follows.
file1.wav,"[ 1.92849546e+02 2.86156126e+00 -7.96250116e+00
7.29509485e+02 4.79000000e+02 5.51000000e+02]",1
I get the following error when reading the file.
set_ = csv.reader(open(foo), 'rb', delimiter = ',')
Error: unknown dialect
Also I am using python 2.7 on a windows machine.

You are using the csv.reader API incorrectly.
Per the documentation, the second argument to csv.reader is the dialect, and "rb" is not a dialect name; it is a file mode and belongs in the open() call.
You probably intended something along these lines:
    with open(foo, 'rb') as infile:
        reader = csv.reader(infile, delimiter=',')
        # etc.
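As a rough sketch of the corrected call, the record from the question can be parsed as below. This is written for Python 3 and reads from an in-memory string so that it is self-contained; on Python 2.7 you would pass a file opened with 'rb' instead. The helper name parse_rows and the conversion of the bracketed field to floats are illustrative assumptions, not part of the original question.

```python
import csv
import io

# Sample record copied from the question; the quoted field spans two lines.
SAMPLE = (
    'file1.wav,"[ 1.92849546e+02 2.86156126e+00 -7.96250116e+00\n'
    '7.29509485e+02 4.79000000e+02 5.51000000e+02]",1\n'
)

def parse_rows(stream):
    """Yield (filename, list_of_floats, label) tuples from a CSV stream.

    Hypothetical helper: the question does not say how Foo should be used.
    """
    # delimiter is a keyword argument; the second positional slot is the dialect.
    for filename, foo, label in csv.reader(stream, delimiter=','):
        # Strip the surrounding brackets, then split on any whitespace,
        # including the newline embedded in the quoted field.
        values = [float(x) for x in foo.strip('[] \n').split()]
        yield filename, values, int(label)

rows = list(parse_rows(io.StringIO(SAMPLE, newline='')))
```

The quoted field is returned as a single cell even though it contains a line break, so no special handling is needed beyond passing the file object and keyword arguments correctly.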

Related

Django encoding error when reading from a CSV

When I try to run:
import csv
with open('data.csv', 'rU') as csvfile:
    reader = csv.DictReader(csvfile)
    for row in reader:
        pgd = Player.objects.get_or_create(
            player_name=row['Player'],
            team=row['Team'],
            position=row['Position']
        )
Most of my data gets created in the database, except for one particular row. When my script reaches the row, I receive the error:
ProgrammingError: You must not use 8-bit bytestrings unless you use a
text_factory that can interpret 8-bit bytestrings (like text_factory = str).
It is highly recommended that you instead just switch your application to Unicode strings.
The particular row in the CSV that causes this error is:
>>> row
{'FR\xed\x8aD\xed\x8aRIC.ST-DENIS', 'BOS', 'G'}
I've looked at the other similar Stackoverflow threads with the same or similar issues, but most aren't specific to using Sqlite with Django. Any advice?
If it matters, I'm running the script by going into the Django shell by calling python manage.py shell, and copy-pasting it in, as opposed to just calling the script from the command line.
This is the stacktrace I get:
Traceback (most recent call last):
  File "<console>", line 4, in <module>
  File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/csv.py", line 108, in next
    row = self.reader.next()
  File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/codecs.py", line 302, in decode
    (result, consumed) = self._buffer_decode(data, self.errors, final)
UnicodeDecodeError: 'utf8' codec can't decode byte 0xcc in position 1674: invalid continuation byte
EDIT: I decided to just manually import this entry into my database, rather than try to read it from my CSV, based on Alastair McCormack's feedback
Based on the output from your question, it looks like the person who made the CSV mojibaked it - it doesn't seem to represent FRÉDÉRIC.ST-DENIS. You can try using windows-1252 instead of utf-8 but I think you'll end up with FRíŠDíŠRIC.ST-DENIS in your database.
I suspect you're using Python 2, where a file opened with open() yields str objects, which are simply byte strings.
The error is telling you that you need to decode your text to Unicode strings before use.
The simplest method is to decode each cell:
with open('data.csv', 'r') as csvfile:  # 'U' means universal newlines mode and is not necessary
    reader = csv.DictReader(csvfile)
    for row in reader:
        pgd = Player.objects.get_or_create(
            player_name=row['Player'].decode('utf-8'),
            team=row['Team'].decode('utf-8'),
            position=row['Position'].decode('utf-8')
        )
That'll work, but it's ugly to add decodes everywhere, and it won't work in Python 3. Python 3 improves things by opening files in text mode and returning Python 3 strings, which are the equivalent of Unicode strings in Python 2.
To get the same functionality in Python 2, use the io module, which provides an open() function with an encoding option. Annoyingly, the Python 2.x csv module does not handle Unicode, so you need to install a backported version:
pip install backports.csv
To tidy your code and future proof it, do:
import io
from backports import csv
with io.open('data.csv', 'r', encoding='utf-8') as csvfile:
    reader = csv.DictReader(csvfile)
    for row in reader:
        # now every row is automatically decoded from UTF-8
        pgd = Player.objects.get_or_create(
            player_name=row['Player'],
            team=row['Team'],
            position=row['Position']
        )
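If the file is not valid UTF-8 throughout, as the traceback in the question suggests, it can help to find the offending rows before choosing an encoding. The following is a minimal sketch; find_undecodable_lines is a hypothetical helper and the sample bytes are made up for illustration, not taken from the real data.csv.

```python
def find_undecodable_lines(raw_bytes, encoding='utf-8'):
    """Return (line_number, byte_offset_in_line) for each line that fails to decode."""
    bad = []
    for lineno, line in enumerate(raw_bytes.splitlines(), start=1):
        try:
            line.decode(encoding)
        except UnicodeDecodeError as exc:
            # exc.start is the offset of the first byte that could not be decoded.
            bad.append((lineno, exc.start))
    return bad

# Made-up sample: 0xCC followed by an ASCII letter is an invalid UTF-8 sequence.
RAW = b'Player,Team,Position\nFR\xccD\xccRIC.ST-DENIS,BOS,G\nJOHN.DOE,NYR,C\n'
bad_lines = find_undecodable_lines(RAW)
```

On real data you would read the file in binary mode and pass its contents to the helper; the reported line numbers show which rows need to be fixed or imported by hand.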
Encode the player name as UTF-8 by calling .encode('utf-8') on it:
import csv
with open('data.csv', 'rU') as csvfile:
    reader = csv.DictReader(csvfile)
    for row in reader:
        pgd = Player.objects.get_or_create(
            player_name=row['Player'].encode('utf-8'),
            team=row['Team'],
            position=row['Position']
        )
In Django, decode with latin-1: csv.DictReader(io.StringIO(csv_file.read().decode('latin-1'))). Latin-1 maps every byte to a character, so it accepts all the special characters that raise decoding exceptions with utf-8.
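A small sketch of the latin-1 approach, written for Python 3 (where csv works on text), is shown below. The sample bytes and the read_players helper are assumptions made for illustration. Note that latin-1 never raises a decoding error, but the resulting characters are only correct if the file really is Latin-1.

```python
import csv
import io

# Made-up sample bytes: 0xC9 is 'É' in Latin-1 and is not valid UTF-8 here.
RAW = b'Player,Team,Position\r\nFR\xc9D\xc9RIC.ST-DENIS,BOS,G\r\n'

def read_players(raw_bytes, encoding='latin-1'):
    """Decode the whole upload first, then hand text to csv.DictReader."""
    text = raw_bytes.decode(encoding)
    return list(csv.DictReader(io.StringIO(text, newline='')))

rows = read_players(RAW)

# The same bytes are rejected by the UTF-8 codec.
try:
    RAW.decode('utf-8')
    utf8_ok = True
except UnicodeDecodeError:
    utf8_ok = False
```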

python 2 not recognize "newline" for file stream

With Python 3.3, the following code works fine
import csv
with open(foname, "w", newline='') as outstream:
    csv.writer(outstream, delimiter=' ').writerows(
        [cell.value for cell in row]
        for row in ws.rows
    )
However, Python 2 is unable to run that and reports:
    with open(foname, "w", newline='') as outstream:
TypeError: 'newline' is an invalid keyword argument for this function
What is the equivalent for previous versions?
Use with open(foname, 'wb') as outstream: instead. The newline parameter was added in Python 3; Python 2's built-in open() does not accept it.
This is documented for Python 2 as:
If csvfile is a file object, it must be opened with the ‘b’ flag on platforms where that makes a difference.
Whereas for Python 3, the documentation says:
If csvfile is a file object, it should be opened with newline=''
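If the same script has to run on both versions, one possible approach is to pick the open() arguments at runtime. This is only a sketch: open_csv_for_writing is a hypothetical helper, and the demo rows are made up to stand in for the worksheet cells from the question.

```python
import csv
import os
import sys
import tempfile

def open_csv_for_writing(path):
    """Open path for csv.writer in the way each major Python version expects."""
    if sys.version_info[0] >= 3:
        # Python 3: text mode with newline translation disabled.
        return open(path, 'w', newline='')
    # Python 2: binary mode, so the writer's '\r\n' is written untouched.
    return open(path, 'wb')

# Demo with made-up rows in a temporary directory.
demo_path = os.path.join(tempfile.mkdtemp(), 'out.csv')
with open_csv_for_writing(demo_path) as outstream:
    csv.writer(outstream, delimiter=' ').writerows([['a', 'b'], [1, 2]])

# Read the raw bytes back to check the line terminators.
with open(demo_path, 'rb') as f:
    written = f.read()
```

Either branch leaves the csv module in charge of line endings, which is what both documentation excerpts above are asking for.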

python3 convert str to bytes-like obj without use encode

I wrote an HTTP server to serve HTML files on Python 2.7 and Python 3.5.
def do_GET(self):
    ...
    # if the resource is an API endpoint
    data = json.dumps({'message':['thanks for your answer']})
    # if the resource is a file name
    with open(resource, 'rb') as f:
        data = f.read()
    self.send_response(response)
    self.send_header('Access-Control-Allow-Origin', '*')
    self.end_headers()
    self.wfile.write(data) # this line raises TypeError: a bytes-like object is required, not 'str'
The code works in Python 2.7, but in Python 3 it raises the error above.
I could use bytearray(data, 'utf-8') to convert str to bytes, but then the HTML shown in the browser is changed.
My questions:
How can I support both Python 2 and Python 3 without using the 2to3 tool and without changing the file's encoding?
Is there a better way to read a file and send its content to the client that works the same way in Python 2 and Python 3?
Thanks in advance.
You just have to open your file in binary mode, not in text mode:
with open(resource, "rb") as f:
    data = f.read()
Then data is a bytes object in Python 3 and a str in Python 2, and it works for both versions.
As a positive side effect, this code also works on Windows, where opening binary files such as images in text mode corrupts them because of line-ending conversion.
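The file branch is covered by binary mode, but the JSON branch in the question still produces a str on Python 3. A minimal sketch of one way to handle that on both versions follows; json_body is a hypothetical helper name, and UTF-8 is an assumed encoding for the response.

```python
import json

def json_body(payload):
    """Serialize payload to JSON and return it as bytes on Python 2 and 3."""
    text = json.dumps(payload)
    if isinstance(text, bytes):
        # Python 2: json.dumps already returned a byte string (str is bytes).
        return text
    # Python 3: json.dumps returned text, so encode it explicitly.
    return text.encode('utf-8')

body = json_body({'message': ['thanks for your answer']})
```

The result can be passed to self.wfile.write() in the same way as the bytes read from a file.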

Beginner, python - how to read a list from a file

I have a Word document that is literally a list of lists, that is 8 pages long. Eg:
[['WTCS','Dec 21'],['THWD','Mar 22']...]
I am using Linux Mint, Python 3.2 and the IDLE interface, plus my own .py programs. I need to read and reference this list frequently and when I stored it inside .py programs it seemed to slow down the window considerably as I was editing code. How can I store this information in a separate file and read it into python? I have it in a .txt file now and tried the following code:
def readlist():
    f = open(r'/home/file.txt','r')
    info = list(f.read())
    return(info)
but I get each character as an element of a list. I also tried info = f.read() but I get a string. Thanks!
You can convert a Python list that was read from a text file as a string back into a list using the ast module:
>>> import ast
>>> s = "[['WTCS','Dec 21'],['THWD','Mar 22']]"
>>> ast.literal_eval(s)
[['WTCS', 'Dec 21'], ['THWD', 'Mar 22']]
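Applied to a file, this might look like the sketch below. The function name read_list and the temporary demo file are assumptions for illustration; in the question the path would be /home/file.txt.

```python
import ast
import os
import tempfile

def read_list(path):
    """Read a Python list literal from a text file and return it as a list."""
    with open(path, 'r') as f:
        # literal_eval only accepts literals, so it will not run arbitrary code.
        return ast.literal_eval(f.read())

# Demo: write a small sample file, then read it back.
demo_path = os.path.join(tempfile.mkdtemp(), 'file.txt')
with open(demo_path, 'w') as f:
    f.write("[['WTCS','Dec 21'],['THWD','Mar 22']]")

info = read_list(demo_path)
```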

Weka 3-7 CSVLoader do not work with ";" (semicolon) as field separator

I think I found a bug in Weka 3.7.
When I try to load a CSV file using weka.core.converters.CSVLoader with the separator ";", I get the following error:
Exception in thread "main" java.io.IOException: number expected, read Token[1;2], line 1
at weka.core.converters.ArffLoader$ArffReader.errorMessage(ArffLoader.java:294)
at weka.core.converters.ArffLoader$ArffReader.getInstanceFull(ArffLoader.java:656)
at weka.core.converters.ArffLoader$ArffReader.getInstance(ArffLoader.java:477)
at weka.core.converters.ArffLoader$ArffReader.readInstance(ArffLoader.java:445)
at weka.core.converters.ArffLoader$ArffReader.readInstance(ArffLoader.java:430)
at weka.core.converters.ArffLoader$ArffReader.<init>(ArffLoader.java:202)
at weka.core.converters.CSVLoader.getDataSet(CSVLoader.java:803)
at de.tuhh.thesis.repower.pcanalysis.BinningWindSpeed.from_CSV_to_ARFF(BinningWindSpeed.java:99)
at de.tuhh.thesis.repower.pcanalysis.Main.main(Main.java:49)
My csv file is:
a;b
1;2
my code is:
CSVLoader loader = new CSVLoader();
File inputFile = new File(csvFileName);
loader.setSource(inputFile);
loader.setFieldSeparator(";");
data = loader.getDataSet();
If I try the same code but change ";" to "," and use the following file, the program succeeds:
a,b
1,2
I really need to work with ";"
Thanks and regards
There is (at least by now) an option to set the field separator:
CSVLoader loader = new CSVLoader();
loader.setFieldSeparator(";");
Just in case someone else stumbles upon this question..