Wrong encoding when retrieving get argument - django

I have a URL encoded with URL encoding, namely: /filebrowser/?cd=bank/fran%E7ais/essais
The problem is that if I retrieve the argument through :
path = request.GET.get('relative_h', None)
I get :
/filebrowser/?cd=bank/fran�ais/essais
instead of:
/filebrowser/?cd=bank/français/essais
or :
/filebrowser/?cd=bank/fran%E7ais/essais
Yet, %E7 does correspond to 'ç', as you can see there.
And since the %E7 is decoded with the replacement character, I can't even use urllib.parse.unquote to get my 'ç' back...
Is there a way to get the raw argument or the correctly decoded string?

Switching the request encoding to latin-1 before accessing the parameter returned the correctly decoded string for me, when running your example locally.
request.encoding = 'latin-1'
path = request.GET.get('relative_h', None)
However, I'm not able to tell you why that would be, since I would have assumed that the default encoding of utf-8 would have handled that particular character.
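A likely explanation, as far as I can tell: %E7 is the Latin-1 byte for 'ç', whereas UTF-8 encodes 'ç' as two bytes (%C3%A7). A lone 0xE7 byte is not valid UTF-8, so it gets swapped for the replacement character. Here is a minimal sketch with the standard library only (Python 3 assumed; it calls urllib.parse directly rather than going through Django's request object):

```python
from urllib.parse import quote, unquote

raw = "bank/fran%E7ais/essais"

# Default decoding is UTF-8 with errors='replace', so the lone 0xE7 byte
# becomes U+FFFD (the replacement character).
as_utf8 = unquote(raw)

# Telling unquote that the bytes are Latin-1 recovers the 'ç'.
as_latin1 = unquote(raw, encoding="latin-1")

# For comparison: how 'ç' is percent-encoded under each charset.
utf8_form = quote("ç")                       # UTF-8 is the default
latin1_form = quote("ç", encoding="latin-1")

print(repr(as_utf8))
print(as_latin1)
print(utf8_form, latin1_form)
```

So whatever generated the link was most likely encoding with Latin-1 (or a similar single-byte charset), which is why setting request.encoding = 'latin-1' helps.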

Related

Django 'ascii' codec can't encode characters despite encoding in UTF-8? What am I doing wrong?

I'm still in the process of learning Django. I have a bit of a problem with encoding Cyrillic strings. I have a text input. I append its value to the URL using JS and then get that value in my view (I know I should probably use a form for that, but that's not the issue).
So here's my code (it's not complete, but it shows the main idea I think).
JS/HTML
var notes = document.getElementById("notes").value;
...
window.location.href = 'http://my-site/example?notes='+notes
<input type="text" class="notes" name="notes" id="notes">
Django/Python
notes = request.GET.get('notes', 0)
try:
    notes = notes.encode('UTF-8')
except:
    pass
...
sql = 'INSERT INTO table(notes) VALUES(%s)' % str(notes)
The issue is, whenever I type a string in cyrillic I get this error message: 'ascii' codec can't encode characters at position... Also I know that I probably shouldn't pass strings like that to the query, but it's a personal project so... that would do for now. I've been stuck there for a while now. Any suggestions as to what's causing this would be appreciated.
request.GET.get("key") already returns a string, so why do you need to encode it?
Setting request.encoding = "utf-8" may work for you.
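To illustrate passing the string through untouched, here is a rough sketch that hands it to the database driver as a query parameter instead of formatting it into the SQL. Everything specific in it is an assumption on my part: it uses the standard library's sqlite3 with an in-memory database, a made-up table name (notes_table), and a hard-coded string standing in for request.GET.get('notes'). Placeholder syntax differs between drivers (sqlite3 uses ?, many others use %s).

```python
import sqlite3

# Stand-in for request.GET.get('notes', ''), which is already a str.
notes = "Привет, мир"

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE notes_table (notes TEXT)")

# No .encode() and no string formatting: the driver handles the value.
conn.execute("INSERT INTO notes_table (notes) VALUES (?)", (notes,))

stored = conn.execute("SELECT notes FROM notes_table").fetchone()[0]
print(stored == notes)
conn.close()
```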

Encoding automatically in Postman

I have a URI that ends in something like this:
...headfields=id,id^name
I was using encodeURIComponent (right-click on the URI) to replace that "^" with "%5E", and it works fine.
But my question is: can this be done automatically in Postman?
URL encoding is done automatically; you don't have to do it explicitly.
Note for query parameters: if you type a character with special meaning directly into the URL, Postman will not encode it; if you enter it in Params, it will.
Use case 1: typing in special characters
Use case 2: giving it in Params
You can also encode in a pre-request script:
pm.request.url=encodeURI(pm.variables.replaceIn(pm.request.url))
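If you want to double-check what the encoded value should look like outside Postman, a small Python sketch can do it. This is only a cross-check under my own assumptions (Python 3 standard library, with the parameter name and value taken from the question), not something Postman itself runs:

```python
from urllib.parse import quote, urlencode

value = "id,id^name"

# Keep the comma literal and percent-encode only the caret.
caret_only = quote(value, safe=",")

# urlencode is stricter and also encodes the comma.
full_query = urlencode({"headfields": value})

print(caret_only)
print(full_query)
```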

Decoding and encoding JSON in Django

I was following some Django REST framework tutorials and found some obscure code. This snippet is from the customised user model of a project that uses JWT for authentication.
As I commented in the snippet, I can't see why they first encode the data and then decode it again. I suspect this pattern is not specific to this tutorial but quite general. Could anyone explain it to me, please?
def _generate_jwt_token(self):
    """
    Generates a JSON Web Token that stores this user's ID and
    has an expiry date set to 60 days into the future.
    """
    dt = datetime.now() + timedelta(days=60)
    token = jwt.encode({  # first encode here
        'id': self.pk,
        'exp': int(dt.strftime('%s'))
    }, settings.SECRET_KEY, algorithm='HS256')
    return token.decode('utf-8')  # returns decoded object
“Encoding” usually refers to converting data to its binary representation (bytes).
JWT (JSON Web Token) encoding uses a specific data structure and cryptographic signing to allow secure, authenticated exchanges.
The steps to encode data as JWT are as follows :
The payload is converted to JSON and encoded using base64 (the URL-safe variant).
A header, specifying the token type (e.g. JWT) and the signature algorithm to use (e.g. HS256), is encoded similarly.
A signature is derived from your secret key and the two previous values.
The result is obtained by joining header, payload and signature with dots. The output is a binary string.
More information here.
Decoding it using UTF-8 transforms this binary string into a Unicode string (this applies to PyJWT versions before 2.0; from 2.0 on, jwt.encode already returns a str):
>>> encoded_bin = jwt.encode({'some': 'data'}, 'secret_sig', algorithm='HS256')
>>> type(encoded_bin)
<class 'bytes'>
>>> encoded_string = encoded_bin.decode('utf-8')
>>> type(encoded_string)
<class 'str'>
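The encoding steps listed above can be sketched with the standard library alone. Treat this as an illustration under assumptions, not as PyJWT's actual implementation: the helper names b64url and encode_hs256 are made up here, and the header field order is my own choice.

```python
import base64
import hashlib
import hmac
import json


def b64url(data: bytes) -> bytes:
    """URL-safe base64 without '=' padding, as used in JWTs."""
    return base64.urlsafe_b64encode(data).rstrip(b"=")


def encode_hs256(payload: dict, secret: str) -> bytes:
    header = {"typ": "JWT", "alg": "HS256"}
    # Steps 1 and 2: serialize header and payload to JSON, then base64url-encode.
    segments = [
        b64url(json.dumps(header, separators=(",", ":")).encode("utf-8")),
        b64url(json.dumps(payload, separators=(",", ":")).encode("utf-8")),
    ]
    signing_input = b".".join(segments)
    # Step 3: sign "header.payload" with HMAC-SHA256 and the secret key.
    signature = hmac.new(
        secret.encode("utf-8"), signing_input, hashlib.sha256
    ).digest()
    # Step 4: join the three parts with dots; the result is a bytes object.
    return signing_input + b"." + b64url(signature)


token = encode_hs256({"some": "data"}, "secret_sig")
print(type(token))                  # bytes
print(type(token.decode("utf-8")))  # str, after decoding
```

Because every segment is base64url text, the resulting bytes are plain ASCII, which is why decoding them as UTF-8 always succeeds.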
Notes:
It is not always possible to decode bytes to a string. Base64-encoding your data allows you to store any bytes as a text representation, but the encoded form requires more space (+33%) than its raw representation.
A binary string is prefixed by a b in your Python interpreter (e.g. b"a binary string").
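Both notes are easy to check with a few lines. The sketch below uses arbitrary sample bytes of my own choosing: they cannot be decoded as UTF-8, but they survive a round trip through base64 at the cost of the extra third in size.

```python
import base64

raw = bytes(range(256)) * 3  # 768 arbitrary bytes, including invalid UTF-8

try:
    raw.decode("utf-8")
    decodable = True
except UnicodeDecodeError:
    decodable = False

encoded = base64.b64encode(raw)   # still bytes, but ASCII-only
text = encoded.decode("ascii")    # always safe to turn into a str

print(decodable)
print(len(raw), len(encoded))     # 4 output characters per 3 input bytes
```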

How can I determine whether an email header is base64 encoded

Using the email.header package, I can do
the_text, the_charset = decode_header(inputText)[0]
to get the character set of the email header, where the inputText was retrieved by a command like
inputText = msg.get('From')
to use the From: header as an example.
In order to extract the header encoding for that header, do I have to do something like this?
the_header_encoding = email.charset.Charset(the_charset).header_encoding
That is, do I have to create an instance of the Charset class based on the name of the charset (and would that even work?), or is there a way to extract the header encoding more directly from the header itself?
An Encoded-Message header (RFC 2047 calls the encoded parts "encoded-words") can consist of one or more lines, and each line can use a different encoding, or no encoding at all.
You'll have to parse the type of encoding out yourself, one per line. Using a regular expression:
import re

quopri_entry = re.compile(r'=\?[\w-]+\?(?P<encoding>[QB])\?[^?]+?\?=', flags=re.I)
encodings = {'Q': 'quoted-printable', 'B': 'base64'}

def encoded_message_codecs(header):
    used = []
    for line in header.splitlines():
        entry = quopri_entry.search(line)
        if not entry:
            used.append(None)
            continue
        used.append(encodings.get(entry.group('encoding').upper(), 'unknown'))
    return used
This returns a list with one entry per line: 'quoted-printable', 'base64' or 'unknown', or None if no Encoded-Message was used on that line.
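As a quick check of the idea, here is a variant sketch run against a made-up header. The sample header, the address and the function name encodings_used are my own inventions for illustration. Unlike the function above, it reports every encoded part it finds rather than one per line, and it calls decode_header alongside to show that the decoded pairs carry the charset but not the Q/B marker.

```python
import re
from email.header import decode_header

ENCODED_WORD = re.compile(
    r'=\?[\w-]+\?(?P<encoding>[QB])\?[^?]+?\?=', flags=re.I
)
NAMES = {'Q': 'quoted-printable', 'B': 'base64'}


def encodings_used(header):
    """Return the encoding name of every encoded part found in the header."""
    return [
        NAMES[match.group('encoding').upper()]
        for match in ENCODED_WORD.finditer(header)
    ]


# Hypothetical From: header mixing base64 and quoted-printable parts.
sample = ('=?utf-8?B?0J/RgNC40LLQtdGC?= =?iso-8859-1?Q?Fran=E7ais?= '
          '<someone@example.com>')
found = encodings_used(sample)
print(found)

# decode_header yields (decoded bytes, charset) pairs only.
single = decode_header('=?utf-8?B?0J/RgNC40LLQtdGC?=')
print(single)
```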

Urldecode GET parameter in django

I have a GET parameter of value Krak%F3w. It should be decoded as Kraków. I tried to urlunquote it, but when I try to print it to the console, I get this:
UnicodeEncodeError at /someurl.html
'charmap' codec can't encode character u'\ufffd' in position 4: character maps to <undefined>
And this:
Unicode error hint
The string that could not be encoded/decoded was: Krak�w
The encoding seems to be iso-8859-2, so you need to decode it:
url=urllib.unquote(url).decode('iso-8859-2')
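The line above is Python 2 syntax. On Python 3, assuming the raw percent-encoded value is still available, the same decoding can be done in one call, since urllib.parse.unquote accepts an encoding argument. A minimal sketch:

```python
from urllib.parse import unquote

raw = "Krak%F3w"

# Interpret the percent-encoded byte 0xF3 as ISO-8859-2 rather than UTF-8.
city = unquote(raw, encoding="iso-8859-2")

# With the UTF-8 default, the byte is invalid and becomes U+FFFD.
broken = unquote(raw)

print(city)
print(repr(broken))
```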