Urldecode GET parameter in Django

I have a GET parameter of value Krak%F3w. It should be decoded as Kraków. I tried to urlunquote it, but when I try to print it to the console, I get this:
UnicodeEncodeError at /someurl.html
'charmap' codec can't encode character u'\ufffd' in position 4: character maps to <undefined>
And this:
Unicode error hint
The string that could not be encoded/decoded was: Krak�w

The encoding seems to be ISO-8859-2, so you need to decode it explicitly (Python 2):
url = urllib.unquote(url).decode('iso-8859-2')
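In Python 3 the same fix looks different: urllib.parse.unquote takes an encoding argument instead of returning bytes that you then .decode(). The sketch below uses only the standard library, not Django. It shows why the UTF-8 default produces the U+FFFD from the error message and that a Latin encoding recovers the text. Note that %F3 happens to be 'ó' in both ISO-8859-1 and ISO-8859-2.

```python
from urllib.parse import unquote, unquote_to_bytes

raw = "Krak%F3w"

# The percent-escape stands for a single byte, 0xF3.
print(unquote_to_bytes(raw))  # b'Krak\xf3w'

# Decoding that lone byte as UTF-8 (the default) fails, and errors="replace"
# (also the default) substitutes U+FFFD, the character in the traceback above.
print(unquote(raw) == "Krak\ufffdw")  # True

# Decoding with a single-byte Latin encoding gives the intended text.
print(unquote(raw, encoding="iso-8859-2"))  # Kraków
```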

Related

Django 'ascii' codec can't encode characters despite encoding in UTF-8? What am I doing wrong?

I'm still in the process of learning Django. I have a bit of a problem with encoding Cyrillic strings. I have a text input. I append its value to the URL using JS and then get that value in my view (I know I should probably use a form for that, but that's not the issue).
So here's my code (it's not complete, but it shows the main idea I think).
JS/HTML
var notes = document.getElementById("notes").value;
...
window.location.href = 'http://my-site/example?notes='+notes
<input type="text" class="notes" name="notes" id="notes">
Django/Python
notes = request.GET.get('notes', 0)
try:
    notes = notes.encode('UTF-8')
except:
    pass
...
sql = 'INSERT INTO table(notes) VALUES(%s)' % str(notes)
The issue is that whenever I type a string in Cyrillic, I get this error message: 'ascii' codec can't encode characters at position... I also know that I probably shouldn't pass strings like that to the query, but it's a personal project, so that will do for now. I've been stuck on this for a while. Any suggestions as to what's causing it would be appreciated.
request.GET.get("key") already returns a (unicode) string, so why do you need to encode it?
Setting request.encoding = "utf-8" may work for you.
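Here is a standard-library sketch of both points, with the browser side simulated. The value is percent-encoded as UTF-8 (which is what JavaScript's encodeURIComponent(notes) produces). It is then decoded back to a str roughly the way Django's QueryDict does it. Finally, it is passed to the database as a query parameter instead of being formatted into the SQL. sqlite3 and the table name notes_table are stand-ins chosen for illustration. sqlite3 uses ? placeholders, while Django's connection.cursor() uses %s.

```python
import sqlite3
from urllib.parse import parse_qs, quote

# Simulated browser request: the value percent-encoded as UTF-8.
query_string = "notes=" + quote("Привет")

# Decoding yields a str, so there is nothing left to .encode().
notes = parse_qs(query_string, encoding="utf-8")["notes"][0]
print(notes)  # Привет

# Let the driver bind the value instead of building SQL with % formatting.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE notes_table (notes TEXT)")  # hypothetical table
conn.execute("INSERT INTO notes_table (notes) VALUES (?)", (notes,))
print(conn.execute("SELECT notes FROM notes_table").fetchone()[0])  # Привет
```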

Wrong encoding when retrieving get argument

I have a percent-encoded URL, namely: /filebrowser/?cd=bank/fran%E7ais/essais
The problem is that if I retrieve the argument through:
path = request.GET.get('relative_h', None)
I get:
/filebrowser/?cd=bank/fran�ais/essais
instead of:
/filebrowser/?cd=bank/français/essais
or :
/filebrowser/?cd=bank/fran%E7ais/essais
Yet %E7 does correspond to 'ç' in Latin-1 percent-encoding tables.
And since %E7 is decoded to the replacement character (U+FFFD), I can't even use urllib.parse.unquote to get my 'ç' back...
Is there a way to get the raw argument or the correctly decoded string?
Switching the request encoding to latin-1 before accessing the parameter returned the correctly decoded string for me, when running your example locally.
request.encoding = 'latin-1'
path = request.GET.get('relative_h', None)
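This works because %E7 is the ISO-8859-1 (Latin-1) byte for 'ç'. Django decodes the query string with request.encoding, which falls back to DEFAULT_CHARSET (UTF-8 by default). In UTF-8, 'ç' is the two-byte sequence %C3%A7, so a lone 0xE7 byte is invalid and gets replaced with U+FFFD.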
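The two decodings can be compared with the standard library. The sketch below is not Django code. It shows three things: the escape is a single 0xE7 byte, UTF-8 turns that byte into U+FFFD while Latin-1 yields 'ç', and the same path uses a two-byte escape for 'ç' when percent-encoded as UTF-8. urllib.parse.unquote_to_bytes also addresses the request for the raw value. Inside a Django view, the undecoded query string should be available as request.META['QUERY_STRING'].

```python
from urllib.parse import quote, unquote, unquote_to_bytes

raw = "bank/fran%E7ais/essais"

# The raw bytes behind the escapes.
print(unquote_to_bytes(raw))  # b'bank/fran\xe7ais/essais'

# A lone 0xE7 is not valid UTF-8, so the default decoding substitutes U+FFFD.
print(unquote(raw) == "bank/fran\ufffdais/essais")  # True

# Latin-1 maps 0xE7 directly to "ç".
print(unquote(raw, encoding="latin-1"))  # bank/français/essais

# Percent-encoded as UTF-8, "ç" needs two bytes.
print(quote("bank/français/essais"))  # bank/fran%C3%A7ais/essais
```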

Django - Postgres - 'ascii' codec can't encode character u'\xa0' in position 0: ordinal not in range(128)

So I get the error above when I try to use the Admin console and view one of the models in my database. The rows in my database were scraped from a website so if I'm correct, I accidentally scraped the u'\xa0' character and Django does not like this. Correct me if I'm wrong.
Now to fix it I imagine I can just run a psql query to find any u'\xa0' characters and replace them with whatever I need (empty string in this case).
I thought maybe I could use the replace function from PostgreSQL:
UPDATE <table> SET <field> = replace(<field>, '\xa0', '')
but it doesn't appear to be working.
Any tips?
Error:
'ascii' codec can't encode character u'\xa0' in position 0: ordinal not in range(128)
Maybe you've implemented a __unicode__() method that returns a byte string (str) instead of a unicode object.
So you're doing this:
def __unicode__(self):
    return "%s" % self.something
Instead of:
def __unicode__(self):
    return u"%s" % self.something
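In Python 3, and therefore in Django 2.0 and later (which support only Python 3), the method to define is __str__. It returns text, so no implicit ASCII encoding step takes place. The following is a minimal sketch with a plain class standing in for a Django model. ScrapedItem is a hypothetical name.

```python
class ScrapedItem:
    """Plain class standing in for a Django model (hypothetical)."""

    def __init__(self, title: str) -> None:
        self.title = title

    def __str__(self) -> str:
        # Python 3: __str__ returns text, and U+00A0 is an ordinary character here.
        return f"{self.title}"


item = ScrapedItem("\xa0Widget")
print(repr(str(item)))  # '\xa0Widget'

# The original error only appears when such text is forced through the ASCII codec.
try:
    str(item).encode("ascii")
except UnicodeEncodeError as exc:
    print(exc)
```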
For anyone who runs into a similar problem, this is what I did. I exported my database to a CSV file so I could easily search through it with Ctrl+F. I then found the non-ASCII characters my database couldn't handle and adjusted them to suit my needs. Finally, I overwrote my database with the adjusted CSV.
My database information was scraped from the web so in the future I will make sure my database accepts the information that I scrape.
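Below is a sketch of cleaning the text before it is saved, which is what the paragraph above plans to do. clean_scraped is a hypothetical helper. It either deletes U+00A0 (the empty-string replacement the question asked for) or applies Unicode NFKC normalization, which maps the no-break space to an ordinary space.

```python
import unicodedata


def clean_scraped(text: str, drop_nbsp: bool = True) -> str:
    """Hypothetical helper that normalizes scraped text before saving it."""
    if drop_nbsp:
        # Remove NO-BREAK SPACE entirely.
        return text.replace("\u00a0", "")
    # NFKC replaces compatibility characters with their plain equivalents,
    # e.g. NO-BREAK SPACE -> SPACE.
    return unicodedata.normalize("NFKC", text)


print(repr(clean_scraped("\u00a0Price:\u00a010")))                   # 'Price:10'
print(repr(clean_scraped("\u00a0Price:\u00a010", drop_nbsp=False)))  # ' Price: 10'
```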

Latin1/UTF-8 Encoding Problems in AngularJS

I have a Python 2.7 Django + AngularJS app. There's an input field that feeds into the data model and the data is sent to the server using Angular's $http. When the input field contains the character "é", Django doesn't like it. When I use "★é" Django has no problem with it. It seems to me that the star character being outside the latin1 charset forces the encoding to utf-8, while when the only non-latin character is "é", Angular sends the data as latin1, which confuses my python code.
The error message from Django is:
UnicodeDecodeError: 'utf8' codec can't decode byte 0xe9 in position 0: invalid continuation byte
Telling simplejson.loads() on the server to read the data as ISO-8859-1 (Latin-1) worked fine when my input string contained only the é and no star. This suggests that the browser sends Latin-1 data unless non-Latin-1 characters, such as the star, force it to UTF-8.
Is there a way to tell Angular to always send data using utf-8?
The Angular code that sends the data to the server:
$http({
    url: $scope.dataUrl,
    method: 'POST',
    data: JSON.stringify({recipe: recipe}),
    headers: {'Content-Type': 'application/json'}
}).success(...).error(...);
The Django code that reads the data:
recipe = simplejson.loads(request.raw_post_data)['recipe']
I found one way that works, using the transformRequest config parameter.
transformRequest: function (data, headersGetter) {
    return encode_utf8(JSON.stringify(data));
}
function encode_utf8(s) {
    return unescape(encodeURIComponent(s));
}
I'm using the encode function found and explained at http://ecmanaut.blogspot.com/2006/07/encoding-decoding-utf8-in-javascript.html and the JSON library found at http://www.JSON.org/json2.js.
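On the server side, an alternative to changing the client is to decode the body yourself before parsing it. The sketch below is a hypothetical helper (load_recipe is not part of Django or simplejson). It tries UTF-8 and falls back to ISO-8859-1, matching the behaviour reported in the question. It takes raw bytes, which in newer Django versions come from request.body, the attribute that replaced raw_post_data. The fallback is a heuristic: a Latin-1 body whose bytes happen to form valid UTF-8 would be decoded as UTF-8.

```python
import json


def load_recipe(body: bytes) -> str:
    """Hypothetical helper: decode a JSON request body and return its 'recipe'."""
    try:
        text = body.decode("utf-8")
    except UnicodeDecodeError:
        # Latin-1 maps every byte to a code point, so this step cannot fail,
        # but it may silently mis-decode input in other encodings.
        text = body.decode("iso-8859-1")
    return json.loads(text)["recipe"]


utf8_body = json.dumps({"recipe": "é"}, ensure_ascii=False).encode("utf-8")
latin1_body = '{"recipe": "é"}'.encode("iso-8859-1")  # b'{"recipe": "\xe9"}'

print(load_recipe(utf8_body))    # é
print(load_recipe(latin1_body))  # é
```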

Django and UnicodeDecodeError

What I do:
I have an upload form where I upload .zip files with pictures. Every time a file name contains a non-ASCII character such as ä, ü or õ, I get a UnicodeDecodeError.
title = ' '.join([filename[:filename.rfind('.')], str(count)])
Error:
This line generates the title of the picture, and it is exactly the line that raises the error: 'utf8' codec can't decode byte 0x82 in position 2: invalid start byte. You passed in 'cr\x82ations' (<type 'str'>)
What I tried:
I also tried .decode('utf-8'), but I get the same result every time, no matter what I try.
I read about changing the default encoding from ASCII to UTF-8 in site.py, but I'm not sure it will help, and I'm pretty sure I don't want to do it.
Any help is appreciated.
Django has some useful utility methods which you can use.
See: https://docs.djangoproject.com/en/dev/ref/unicode/#conversion-functions
I imagine the code might look something like this:
from django.utils.encoding import smart_str
title = ' '.join([smart_str(filename[:filename.rfind('.')]), str(count)])
I also believe that calling .decode() first is the right approach; however, the code page you used ('utf-8') might be incorrect. Could you try '1252' or another one? Here is a list of standard encodings that might interest you: http://docs.python.org/library/codecs.html?highlight=arabic
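The failing byte also hints at which code page to try. In CP1252, 0x82 is a low quotation mark. In CP437, it is 'é'. CP437 is the legacy DOS code page that the ZIP format uses for file names unless the archive's UTF-8 flag is set, and it turns 'cr\x82ations' into 'créations'. Below is a minimal standard-library check, assuming the file name bytes come straight from the archive.

```python
name = b"cr\x82ations"

# UTF-8 rejects 0x82 as a start byte, matching the error in the question.
try:
    name.decode("utf-8")
except UnicodeDecodeError as exc:
    print(exc)

print(name.decode("cp437"))   # créations (0x82 is "é" in CP437)
print(name.decode("cp1252"))  # cr‚ations (0x82 is U+201A in CP1252)
```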
This fails because you are joining with a normal str object:
Instead of
' '.join(..)
use:
u' '.join(..)
Or make your life easier using:
from __future__ import unicode_literals
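In Python 3 every string literal is already text, so unicode_literals has no effect and the join does not mix types. The sketch below is a hypothetical make_title helper (not from the question). It also replaces rfind with os.path.splitext, because filename[:filename.rfind('.')] drops the last character of a name that has no extension.

```python
import os.path


def make_title(filename: str, count: int) -> str:
    """Hypothetical helper: build a picture title from a file name and a counter."""
    # splitext keeps the whole name when there is no extension and
    # strips only the last suffix ("archive.tar.gz" -> "archive.tar").
    stem, _ext = os.path.splitext(filename)
    return " ".join([stem, str(count)])


print(make_title("créations.jpg", 3))  # créations 3
print(make_title("äüõ", 1))            # äüõ 1
```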