Django and UnicodeDecodeError - django

What I do:
I have an upload form where I upload .zip files containing pictures. Every time a file name contains a non-ASCII character such as ä, ü or õ, I get a UnicodeDecodeError.
title = ' '.join([filename[:filename.rfind('.')], str(count)])
Error:
This line generates the title of the picture, and it is exactly the line that raises the error: 'utf8' codec can't decode byte 0x82 in position 2: invalid start byte. You passed in 'cr\x82ations' (<type 'str'>)
What I tried:
I tried calling .decode('utf-8') on it too, but I get the same result no matter what I try.
I read about changing the default encoding from ASCII to UTF-8 in site.py, but I am not sure it will help, and I am pretty sure I don't want to do it.
Any help is appreciated.

Django has some useful utility methods which you can use.
See: https://docs.djangoproject.com/en/dev/ref/unicode/#conversion-functions
I imagine the code might look something like this:
from django.utils.encoding import smart_str
title = ' '.join([smart_str(filename[:filename.rfind('.')]), str(count)])

I also believe that calling .decode() first is the right approach; however, the code page you used ('utf-8') might be incorrect. Can you try '1252' or another one? The standard encodings are listed here: http://docs.python.org/library/codecs.html?highlight=arabic
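To make the "try another code page" suggestion concrete: byte 0x82 maps to é in cp437, the DOS code page that zip archives commonly use for member names when no UTF-8 flag is set. The sketch below assumes that is what happened here; make_title and the sample file name are hypothetical, and it is written for Python 3.

```python
import os


def make_title(raw_name, count, encoding="cp437"):
    """Decode a raw zip member name and build a title from it.

    The cp437 default is an assumption about how the archive was created.
    """
    # Decode bytes to text first; text input is passed through unchanged.
    if isinstance(raw_name, bytes):
        raw_name = raw_name.decode(encoding)
    # splitext drops the extension more safely than slicing at rfind('.').
    stem, _ext = os.path.splitext(raw_name)
    return u" ".join([stem, str(count)])


# Hypothetical member name containing the byte from the error message.
title = make_title(b"cr\x82ations.jpg", 1)
```

If the archive was created on a system that used a different code page, pass that encoding instead of the default.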

This fails because you are joining with a plain str object:
Instead of
' '.join(..)
use:
u' '.join(..)
Or make your life easier using:
from __future__ import unicode_literals

Related

Django 'ascii' codec can't encode characters despite encoding in UTF-8? What am I doing wrong?

I'm still in the process of learning Django. I have a bit of a problem with encoding Cyrillic strings. I have a text input. I append its value to the URL using JS and then read that value in my view (I know I should probably use a form for that, but that's not the issue).
So here's my code (it's not complete, but it shows the main idea I think).
JS/HTML
var notes = document.getElementById("notes").value;
...
window.location.href = 'http://my-site/example?notes='+notes
<input type="text" class="notes" name="notes" id="notes">
Django/Python
notes = request.GET.get('notes', 0)
try:
    notes = notes.encode('UTF-8')
except:
    pass
...
sql = 'INSERT INTO table(notes) VALUES(%s)' % str(notes)
The issue is, whenever I type a string in cyrillic I get this error message: 'ascii' codec can't encode characters at position... Also I know that I probably shouldn't pass strings like that to the query, but it's a personal project so... that would do for now. I've been stuck there for a while now. Any suggestions as to what's causing this would be appreciated.
request.GET.get("key") already returns a string, so why do you need to encode it?
Setting request.encoding = "utf-8" may work for you.
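One way to avoid both the manual encode() call and the string interpolation is to hand the text to the database driver as a query parameter. The sketch below uses the standard-library sqlite3 module as a stand-in for whatever database the project uses; the table name notes_table and the helper save_note are hypothetical. With Django's own cursor the placeholder would be %s rather than ?.

```python
import sqlite3


def save_note(conn, notes):
    """Insert the note text using a bound parameter."""
    # The driver handles quoting and encoding of the parameter value.
    conn.execute("INSERT INTO notes_table (notes) VALUES (?)", (notes,))
    conn.commit()


# In-memory database used only for this illustration.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE notes_table (notes TEXT)")

save_note(conn, u"Привет, мир")
stored = conn.execute("SELECT notes FROM notes_table").fetchone()[0]
```

Passing the value as a parameter also avoids the SQL injection risk that comes with building the statement by string formatting.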

read text file content with python at zapier

I have problems getting the content of a txt-file into a Zapier
object using https://zapier.com/help/code-python/. Here is the code I am
using:
with open('file', 'r') as content_file:
    content = content_file.read()
I'd be glad if you could help me with this. Thanks for that!
David here, from the Zapier Platform team.
Your code as written doesn't work because the first argument for the open function is the filepath. There's no file at the path 'file', so you'll get an error. You access the input via the input_data dictionary.
That being said, the input is a url, not a file. You need to use urllib to read that url. I found the answer here.
I've got a working copy of the code like so:
import urllib2 # the lib that handles the url stuff
result = []
data = urllib2.urlopen(input_data['file'])
for line in data: # file lines are iterable
    result.append(line) # keep each line, or parse, etc.
return {'lines': result}
The key takeaway is that you need to return a dictionary from the function, so make sure you somehow squish your file into one.
Let me know if you've got any other questions!
@xavid, did you test this in Zapier?
It fails miserably because urllib2 doesn't exist in the Zapier Python environment.
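If the environment runs Python 3, the missing module would be expected: urllib2 was folded into urllib.request there. The following is a sketch of the same idea under that assumption. The function names read_lines and run are hypothetical (Zapier wraps the code step itself and supplies input_data), and UTF-8 is a guess about the file's encoding.

```python
import urllib.request


def read_lines(url):
    """Fetch a URL and return its lines as decoded text."""
    with urllib.request.urlopen(url) as response:
        # Each line arrives as bytes; UTF-8 is an assumption about the file.
        return [line.decode("utf-8").rstrip("\r\n") for line in response]


def run(input_data):
    # Hypothetical wrapper: takes the input dictionary and returns a dict.
    return {"lines": read_lines(input_data["file"])}
```

Inside an actual code step, the body of run would be the code itself, ending with the return statement.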

scraping chinese characters python

I learned how to scrape websites from https://automatetheboringstuff.com. I wanted to scrape http://www.piaotian.net/html/3/3028/1473227.html, whose content is in Chinese, and write that content to a .txt file. However, the .txt file contains random symbols, which I assume is an encoding/decoding problem.
I've read the thread "how to decode and encode web page with python?" and figured out that the encodings for my site are "gb2312" and "windows-1252". I tried decoding with both of them but failed.
Can someone kindly explain the problem with my code? I'm very new to programming, so please point out my misconceptions as well!
Also, when I remove "html.parser" from the code, the .txt file turns out empty instead of at least containing symbols. Why is this the case?
import bs4, requests, sys
reload(sys)
sys.setdefaultencoding("utf-8")
novel = requests.get("http://www.piaotian.net/html/3/3028/1473227.html")
novel.raise_for_status()
novelSoup = bs4.BeautifulSoup(novel.text, "html.parser")
content = novelSoup.select("br")
novelFile = open("novel.txt", "w")
for i in range(len(content)):
    novelFile.write(str(content[i].getText()))
# Set the response encoding explicitly before accessing novel.text:
novel = requests.get("http://www.piaotian.net/html/3/3028/1473227.html")
novel.raise_for_status()
novel.encoding = "GBK"
novelSoup = bs4.BeautifulSoup(novel.text, "html.parser")
Output:
<br>
一元宗,坐落在青峰山上,绵延极长,现在是盛夏时节,天空之中,太阳慢慢落了下去,夕阳将影子拉的很长。<br/>
<br/>
一片不是很大的小湖泊边上,一个约莫着十七八岁的青衣少年坐在湖边,抓起湖边的一块石头扔出,顿时在湖边打出几朵浪花。<br/>
<br/>
叶希文有些茫然,他没想到,他居然穿越了,原本叶希文只是二十一世纪的地球上一个普通的大学生罢了,一个月了,他才后知后觉的反应过来,这不是有人和他进行恶作剧,而是,他真的穿越了。<br/>
Requests will automatically decode content from the server. Most
unicode charsets are seamlessly decoded.
When you make a request, Requests makes educated guesses about the
encoding of the response based on the HTTP headers. The text encoding
guessed by Requests is used when you access r.text. You can find out
what encoding Requests is using, and change it, using the r.encoding
property:
>>> r.encoding
'utf-8'
>>> r.encoding = 'ISO-8859-1'
If you change the encoding, Requests will use the new value of
r.encoding whenever you call r.text.
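The effect of a wrong versus a right encoding guess can be reproduced without any network access. The sketch below encodes a short sample of the page text as GBK to stand in for the raw response body, then decodes it both ways. It is an illustration only; the file name novel.txt in a temporary directory is an assumption.

```python
import io
import os
import tempfile

original = u"一元宗,坐落在青峰山上"
raw = original.encode("gbk")        # bytes as a GBK server would send them

wrong = raw.decode("iso-8859-1")    # wrong guess: succeeds, but yields mojibake
right = raw.decode("gbk")           # correct guess: recovers the text

# Write with an explicit encoding so the result does not depend on the
# platform's default encoding.
path = os.path.join(tempfile.mkdtemp(), "novel.txt")
with io.open(path, "w", encoding="utf-8") as f:
    f.write(right)

with io.open(path, encoding="utf-8") as f:
    round_trip = f.read()
```

Opening the output file with an explicit encoding also removes the need for the reload(sys) and sys.setdefaultencoding workaround.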

How to test whether a Django FileField is blank?

What is the right way to test whether a Django FileField is blank, i.e., when no file has been uploaded?
It looks like the name attribute is u'' when the field is blank, but I don't know whether that is reliable.
I ran into a similar problem and found a possible solution (surely not the best).
I'm currently checking whether the cleaned data for the file field is an instance of the class TemporaryUploadedFile (django.core.files.uploadedfile.TemporaryUploadedFile):
# ... your code here ...
from django.core.files.uploadedfile import TemporaryUploadedFile
# ... your code here ...
# Note: small uploads may arrive as InMemoryUploadedFile instead; both
# classes derive from UploadedFile, which may be the safer class to test.
if isinstance(form_instance.cleaned_data['my_file'], TemporaryUploadedFile):
    # do stuff
    pass
I hope this is going to help.
Cheers!
David
This is a common Python idiom: always test it in the simplest way possible:
if some_object.file:
    # File exists!
    # This of course does not guarantee that the file exists on disk, or that
    # it is readable.
    pass
else:
    # No file.
    pass
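As far as I can tell, the truth test works because Django's File base class defines truthiness as bool(self.name), so checking the field and checking its name should agree. The sketch below mimics that behaviour with a hypothetical stand-in class, FakeFieldFile, so it runs without Django. It is not Django's own code.

```python
class FakeFieldFile(object):
    """Hypothetical stand-in that mimics truthiness based on the file name."""

    def __init__(self, name):
        self.name = name

    def __bool__(self):
        # An empty or missing name means no file is attached.
        return bool(self.name)

    __nonzero__ = __bool__  # Python 2 spelling of the same hook


def has_file(field_file):
    """Return True when the field has a file attached."""
    return bool(field_file)


blank = has_file(FakeFieldFile(""))
also_blank = has_file(FakeFieldFile(None))
filled = has_file(FakeFieldFile("uploads/photo.jpg"))
```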

How to submit image uploads in Django tests?

The Django docs (http://docs.djangoproject.com/en/dev/topics/testing/#django.test.client.Client.post) say to do this:
>>> c = Client()
>>> f = open('wishlist.doc')
>>> c.post('/customers/wishes/', {'name': 'fred', 'attachment': f})
>>> f.close()
But when I do that the field has the error message "The submitted file is empty." That smells like a PIL issue but the form works fine on the actual site.
Reading the file and sending that instead of just a handle doesn't work either and behaves the same as passing an empty string.
OK I figured it out. I was using the same dummy image for multiple fields and Django doesn't reset the pointer after validating the first field.
Also the example in the docs doesn't show that images need to be opened in binary mode as well.
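The pointer problem is easy to reproduce outside Django. In the sketch below an io.BytesIO object stands in for a file opened with open(path, "rb"); the helper looks_empty and the sample bytes are hypothetical.

```python
import io


def looks_empty(fileobj):
    """Return True if reading from the current position yields no bytes."""
    return len(fileobj.read()) == 0


# Stand-in for a dummy image opened in binary mode.
dummy = io.BytesIO(b"\x89PNG fake image bytes")

first = looks_empty(dummy)    # reads from position 0
second = looks_empty(dummy)   # pointer is now at the end of the data
dummy.seek(0)                 # rewind before reusing the same handle
third = looks_empty(dummy)
```

In a test that posts the same handle for several fields, calling seek(0) between uses, or opening a separate handle per field, should avoid the "submitted file is empty" error.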
I think open expects a file path relative to where it’s being called from.
I’m not sure where that would be when a test is being run, but maybe try with an absolute path and see if it works?