Django InMemoryUploadedFile with universal line breaks and utf8 - django

In my django application users can upload their csv files to import data into django. It works fine for CRLF UTF-8 files.
But there are two issues:
When the file is not encoded as UTF-8 I keep getting: 'utf8' codec can't decode byte 0xdc in position 393: invalid continuation byte. I tried to resolve that with the following code:
import codecs
import csv

file = codecs.EncodedFile(request.FILES['import'], "utf-8")
dialect = csv.Sniffer().sniff(file.read(2048))
file.open()  # seek back to 0
reader = csv.reader(file, dialect=dialect)
When the file uses CR line breaks, they are not recognized, or I get: new-line character seen in unquoted field - do you need to open the file in universal-newline mode? But the InMemoryUploadedFile is already an open file object.
My issue is very similar to this one, but the solution mentioned for point 1 didn't work for me (as you can see, my code is very similar) and point 2 isn't answered at all:
Processing a Django UploadedFile as UTF-8 with universal newlines
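Neither issue has an accepted answer quoted here, but in Python 3 terms both can be sidestepped by decoding the upload's raw bytes yourself and letting str.splitlines() normalize CR, LF, and CRLF endings. A sketch, with the fallback encoding list as an assumption (pick the encodings your users actually produce):

```python
import csv
import io

def read_uploaded_csv(uploaded_file, encodings=('utf-8', 'latin-1')):
    # Decode the raw bytes with a fallback list of encodings; the
    # specific list here is an assumption, not part of the question.
    raw = uploaded_file.read()
    for enc in encodings:
        try:
            text = raw.decode(enc)
            break
        except UnicodeDecodeError:
            continue
    else:
        raise ValueError('could not decode upload')
    # splitlines() handles CR, LF, and CRLF uniformly, which avoids the
    # "universal-newline mode" complaint from the csv module.
    lines = text.splitlines()
    dialect = csv.Sniffer().sniff('\n'.join(lines[:50]))
    return csv.reader(lines, dialect=dialect)
```

Because the decoded lines are plain strings, this works the same for an InMemoryUploadedFile, a TemporaryUploadedFile, or any file-like object with a read() method.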

Related

Python: How to solve SyntaxError: Non-ASCII character?

This is the encoding declaration I have at the top of my program:
#!/usr/bin/env python
# -*- coding: utf-8 -*-
And yet I still get this error
SyntaxError: Non-ASCII character '\xfe' in file C:/Users/aaron/Desktop/Python/Bicycle_Diagnosis_System/Main.py on line 1, but no encoding declared; see http://python.org/dev/peps/pep-0263/ for details
I have looked at the website it provides and trawled other websites and still can't find the answer. Any ideas? (I'm using PyCharm Community Edition as my IDE, if that affects it.)
Any help is much appreciated!
Try adding # coding=utf-8 on line 1 and re-running.
Your file is saved as UTF-16 with a BOM (big-endian). I saved your sample code in Notepad++ with that encoding and reproduced the error:
File "test.py", line 1
SyntaxError: Non-ASCII character '\xfe' in file x.py on line 1, but no encoding declared; see http://python.org/dev/peps
/pep-0263/ for details
Make sure your file is saved in the encoding it declares. You have to use an ASCII-compatible encoding for the shebang and encoding lines to be read properly. UTF-16 is not ASCII-compatible, hence the error message when Python read the non-ASCII bytes of the byte order mark (BOM) character.
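As a quick diagnostic (a sketch, not part of the original answer), you can inspect the first bytes of the file for a BOM; b'\xfe\xff' is exactly the UTF-16 big-endian mark, which matches the '\xfe' in the SyntaxError:

```python
import codecs

def detect_bom(raw):
    # Compare the leading bytes against the standard byte order marks.
    # b'\xfe\xff' is UTF-16 big-endian -- the '\xfe' from the error above.
    if raw.startswith(codecs.BOM_UTF16_BE):
        return 'utf-16-be'
    if raw.startswith(codecs.BOM_UTF16_LE):
        return 'utf-16-le'
    if raw.startswith(codecs.BOM_UTF8):
        return 'utf-8-sig'
    return None
```

Running this on the first few bytes of Main.py (e.g. open(path, 'rb').read(4)) would confirm the encoding before you re-save the file as UTF-8.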

How to set the UTF-8 as default encoding in Odoo Build?

Can anyone tell me how to set UTF-8 as the default encoding option in Odoo Build?
Note: I have added "# -*- coding: utf-8 -*-" to all the files, which has no effect on my expected encoding.
If you put # coding: utf-8 at the top of a Python module, this affects how Python interprets the source code. This is important if you have string literals with non-ASCII characters in your code, in order to have them represent the correct characters.
However, since you talk about "default encoding", I assume you care about the encoding of text files opened for reading or writing. In Python 2.x, the default for reading and writing files is not to decode/encode at all. I don't think you can change this default (because the built-in function open simply doesn't support encoding), but you can use io.open() or codecs.open() to open files with an explicit encoding.
Thus, to read from a file encoded with UTF-8, open it as follows:
import io

with io.open(filename, encoding='utf-8') as f:
    for line in f:
        ...
In Python 3, built-in open() is the same as io.open(), and the default encoding is platform-dependent.
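You can see what that platform-dependent default actually is on a given system (a quick check, nothing more):

```python
import locale

# Python 3's open() falls back to this value when no encoding= is given;
# the printed name varies by platform and locale settings.
print(locale.getpreferredencoding(False))
```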

In Python 2 it's OK, but in Python 3 it doesn't work

#!/usr/bin/env python3
f = open('dv.bmp', mode='rb')
slika = f.read()
f.close()
pic = slika[:28]
slika = slika[54:]
# dimensions of the original bitmap
pic_w = ord(pic[18]) + ord(pic[19])*256
pic_h = ord(pic[22]) + ord(pic[23])*256
print(pic_w, pic_h)
Why doesn't this code work in Python 3 (in Python 2 it works fine)? Or:
how do I read a binary file into a string type in Python 3?
In Python 2.x, binary mode (e.g. 'rb') only affects how Python interprets end-of-line characters:
On Windows, 'b' appended to the mode opens the file in binary mode, so
there are also modes like 'rb', 'wb', and 'r+b'. Python on Windows
makes a distinction between text and binary files; the end-of-line
characters in text files are automatically altered slightly when data
is read or written. This behind-the-scenes modification to file data
is fine for ASCII text files, but it’ll corrupt binary data like that
in JPEG or EXE files. Be very careful to use binary mode when reading
and writing such files. On Unix, it doesn’t hurt to append a 'b' to
the mode, so you can use it platform-independently for all binary
files.
However in Python 3.x, binary mode also changes the type of the resulting data:
Normally, files are opened in text mode, that means, you read and
write strings from and to the file, which are encoded in a specific
encoding. If encoding is not specified, the default is platform
dependent (see open()). 'b' appended to the mode opens the file in
binary mode: now the data is read and written in the form of bytes
objects. This mode should be used for all files that don’t contain
text.
Since the read results in a bytes object, indexing it results in an integer, not a one-character string as in Python 2. Passing that integer to the ord() function raises the error mentioned in your comment.
The solution is just to omit the ord() call in Python 3, since the integer you get from indexing the bytes object is the same as what you'd get from calling ord() on the string equivalent.
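A Python 3 version of the snippet above might therefore look like this (dv.bmp and the 16-bit width/height arithmetic come from the question; the function wrapper is added here for illustration):

```python
def bmp_dimensions(slika):
    # In Python 3, indexing a bytes object yields ints directly, so the
    # ord() calls from the Python 2 version are simply dropped.
    pic = slika[:28]
    pic_w = pic[18] + pic[19] * 256
    pic_h = pic[22] + pic[23] * 256
    return pic_w, pic_h

# with open('dv.bmp', mode='rb') as f:
#     print(bmp_dimensions(f.read()))
```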

wrong text encoding on linux

I downloaded a source code .rar file from the internet to my Linux server. Then I extracted all the source files into a local directory. When I use the "cat" command to see the content of each file, the wrong text encoding is shown on my terminal (there are some Chinese characters in the source files).
I use
file -bi testapi.cpp
then shows:
text/plain; charset=iso-8859-1
I tried to convert that file to utf-8 encoding with the following command:
iconv -f ISO88591 -t UTF8 testapi.cpp > new.cpp
But it doesn't work.
I set my .vimrc file with following two lines:
set encoding=utf-8
set fileencoding=utf-8
After this, when I open testapi.cpp in vim, the Chinese characters are displayed correctly. But cat testapi.cpp still doesn't work.
When I compile and run the program, the printf statement with Chinese characters prints wrong characters like ????
What should I do to display correct chinese characters when I run the program?
TLDR Quickest Solution: Copy/Paste the Visible Text to a Brand-New, Confirmed UTF-8 File
Your file is marked as latin1, but the data is stored as utf8.
When you run set enc=utf8 or set fileencoding=utf-8 in Vim, you're not changing the data or converting it. You're looking at the exact same data, but interpreting it as the UTF-8 charset. So, good news: your data is good. No conversion or changing necessary.
You just need to put the same exact data into a file already marked as UTF-8. That can be done easily by making a brand new file in vim, running set enc=utf8, and then copy-pasting your old data into the new file. You can test this by making a test file containing only the text "汉语" ("Chinese language"), setting enc, saving, closing, reopening, and seeing that the text didn't get corrupted. You can also check with file -bi $pathtofile, though that is not super reliable.
Anyway, TLDR: Make a brand new UTF-8 file, confirm that it's utf-8, make your data visible, and then copy/paste and/or transfer it to the new UTF-8 file, without doing any conversion.
Also, theoretically, I considered that iconv -f utf8 -t utf8 would work, since all I wanted to do was make utf-8-encoded data be marked as utf-8-encoded, without changing it. But this gave me an error that indicated it was still trying to do a data conversion.
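The same "interpretation, not conversion" point can be checked from Python (a sketch; the literal below stands in for the real file's bytes):

```python
# Stand-in for: raw = open('testapi.cpp', 'rb').read()
raw = '汉语'.encode('utf-8')

# If this decode succeeds, the bytes are already valid UTF-8 and no
# iconv conversion is needed -- only a correct charset interpretation
# by whatever tool displays them.
text = raw.decode('utf-8')
print(text)
```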

I need to create list in python from OpenOffice Calc columns

The problem is I have large amounts of data in OpenOffice Calc: approximately 3600 entries for each of 4 different categories, and 3 different sets of this data, and I need to run some calculations on it in Python. I want to create lists corresponding to each of the four categories. I am hoping someone can guide me to an easy-ish, efficient way to do this, whether by script or by importing the data. I am using Python 2.7 on a Windows 8 machine. Any help is greatly appreciated.
The current method I am trying is to save the ODF file as CSV, then use genfromtxt (from numpy).
from numpy import genfromtxt
my_data = genfromtxt('C:\Users\tomdi_000\Desktop\Load modeling(WSU)\PMU Data\Data18-1fault-Alvey-csv trial.csv', delimiter=',')
print(my_data)
File "C:\Program Files (x86)\Wing IDE 101 5.0\src\debug\tserver\_sandbox.py", line 5, in <module>
File "c:\Python27\Lib\site-packages\numpy\lib\npyio.py", line 1352, in genfromtxt
fhd = iter(np.lib._datasource.open(fname, 'rbU'))
File "c:\Python27\Lib\site-packages\numpy\lib\_datasource.py", line 147, in open
return ds.open(path, mode)
File "c:\Python27\Lib\site-packages\numpy\lib\_datasource.py", line 496, in open
raise IOError("%s not found." % path)
IOError: C:\Users omdi_000\Desktop\Load modeling(WSU)\PMU Data\Data18-1fault-Alvey-csv trial.csv not found.
the error stems from this code in _datasource.py
# NOTE: _findfile will fail on a new file opened for writing.
found = self._findfile(path)
if found:
    _fname, ext = self._splitzipext(found)
    if ext == 'bz2':
        mode.replace("+", "")
    return _file_openers[ext](found, mode=mode)
else:
    raise IOError("%s not found." % path)
Your problem is that your path string 'C:\Users\tomdi_000\Desktop\Load modeling(WSU)\PMU Data\Data18-1fault-Alvey-csv trial.csv' contains an escape sequence - \t. Since you are not using a raw string literal, the \t is being interpreted as a tab character, similar to the way a \n is interpreted as a newline. If you look at the line starting with IOError:, you'll see a tab has been inserted in its place. You don't get this problem with UNIX-style paths, as they use forward slashes /.
There are two ways around this. The first is to use a raw string literal:
r'C:\Users\tomdi_000\Desktop\Load modeling(WSU)\PMU Data\Data18-1fault-Alvey-csv trial.csv'
(note the r at the beginning). As explained in the link above, raw string literals don't interpret backslashes \ as beginning an escape sequence.
The second way is to use a UNIX-style path with forward slashes as path delimiters:
'C:/Users/tomdi_000/Desktop/Load modeling(WSU)/PMU Data/Data18-1fault-Alvey-csv trial.csv'
This is fine if you're hard-coding paths into your code or reading them from a file that you generate, but if the paths are generated automatically, such as from the results of an os.listdir() call, it's best to use raw strings instead.
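The difference between the two literal forms is easy to check with string lengths (a tiny sketch):

```python
# '\t' in an ordinary literal is a single tab character;
# in a raw literal it stays as two characters, backslash + t.
assert len('\t') == 1
assert len(r'\t') == 2

# An escaped backslash and a raw string spell the same path.
assert 'C:\\Users' == r'C:\Users'
```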
If you're going to be using numpy to do the calculations on your data, then using np.genfromtxt() is fine. However, for working with CSV files, you'd be much better off using the csv module. It includes all sorts of functions for reading columns and rows, and doing data transformation. If you're just reading the data then storing it in a list, for example, csv is definitely the way to go.
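A sketch of the csv approach (the column names and sample rows here are made up for illustration; with Python 2.7 you would open the real file with open(path, 'rb') and pass that to csv.reader instead of the in-memory sample):

```python
import csv
from io import StringIO

# Hypothetical in-memory sample standing in for the real CSV file.
sample = StringIO("time,phase_a,phase_b,phase_c\n"
                  "0.0,1.0,2.0,3.0\n"
                  "0.1,1.1,2.1,3.1\n")

reader = csv.reader(sample)
header = next(reader)

# One list per category/column, as the question asks for.
columns = {name: [] for name in header}
for row in reader:
    for name, value in zip(header, row):
        columns[name].append(float(value))
```

Each entry of columns is then an ordinary Python list, ready for whatever calculations you need, with or without numpy.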