Python GAE unicode literals not properly decoded after deployment - python-2.7

I am building an app running on GAE which receives input from users via webform:
myUnicodeString = cgi.escape(self.request.get('myForm'))
It all works fine locally but after deployment unicode literals are converted into strings of the form: "E2=80=9C no problems with ASCII strings"
Having read Nick's comment here advising against cgi.escape, I wondered whether it might be the culprit.
I have also tried adding
from __future__ import unicode_literals
after reading this post, but then the program throws an error (TypeError: character mapping must return integer, None or unicode), which is apparently triggered by webapp2_extras session.
Any ideas greatly appreciated!
UPDATE:
I have noticed that this decoding/encoding issue is related to text input fields submitted in the same form as a file uploaded to the blobstore. No problems occur when I save the same non-ASCII strings via separate forms or via ajax.
UPDATE2:
This is apparently the bug that causes the problem.
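The "E2=80=9C" fragments look like quoted-printable encoding of UTF-8 bytes (E2 80 9C is the UTF-8 encoding of a left double quotation mark). A minimal workaround sketch, assuming the affected fields really do arrive quoted-printable encoded; the helper name decode_qp_field is hypothetical and not part of GAE or webapp2:

```python
import quopri


def decode_qp_field(raw_value):
    """Decode a quoted-printable form value to text (assumes UTF-8 underneath)."""
    # quopri works on bytes; an affected field should contain only ASCII characters.
    if isinstance(raw_value, bytes):
        raw_bytes = raw_value
    else:
        raw_bytes = raw_value.encode("ascii")
    return quopri.decodestring(raw_bytes).decode("utf-8")


# Sample input modelled on the string in the question.
print(decode_qp_field("=E2=80=9Cno problems with ASCII strings=E2=80=9D"))
```

This should only be applied to fields known to be affected: a value that legitimately contains "=" followed by two hex digits would be altered by the decoding step.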

Related

Django - How to store emojis in postgres DB properly?

I'm running the latest version of Django on postgres. I'm trying to store emojis in my postgres DB in a way that a React Native app can properly render them. Below I have the initial emoji variables set up that will go into the table. I've copied and pasted the emojis from here. How do I store emojis in my postgres DB so that a React Native app can render them properly?
I tried following this blog, which suggests adding 'OPTIONS': {'charset': 'utf8mb4'} to DATABASES in settings.py, but I get this error: django.db.utils.ProgrammingError: invalid dsn: invalid connection option "charset". It seems this only works for MySQL DBs. How can I store emojis in a Django postgres DB?
As suggested in the comments, you need to put quotes around the emojis, since they're just characters. Note, though, that something like a flag is actually two characters, so that's something to be careful about. All your computer does is convert Unicode into a rendered emoji, and the rendering is platform dependent.
The emojis that you're using should be unicode supported. On your computer, they're definitely supported. For the most part, additional unicode support for new emojis is very quickly implemented once published on client machines. There should be no problem with emojis in strings. This is a nice video kinda explaining emojis by Tom Scott who keeps getting interviews about emojis: https://www.youtube.com/watch?v=sTzp76JXsoY
I'm not an expert so please correct me if I'm wrong.
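To illustrate the point about flags being two characters, here is a small sketch (written for Python 3, where len() counts code points). The specific emojis are just examples chosen for illustration:

```python
# A single-code-point emoji and a flag built from two regional indicator symbols.
smiley = "\U0001F604"            # smiling face
flag = "\U0001F1FA\U0001F1F8"    # regional indicators U + S, rendered as one flag

print(len(smiley))                 # number of code points in the smiley
print(len(flag))                   # number of code points in the flag
print(len(flag.encode("utf-8")))   # number of bytes the flag needs in UTF-8
```

This matters when sizing a CharField: max_length needs to allow for emojis that consist of more than one code point.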
In your models, use a CharField or a TextField to store emojis, and pass them as quoted strings (for example "😄", not a bare 😄). Your database must use UTF8 to support emojis. Connect to your database with a SQL shell and check the current encoding by running:
SHOW CLIENT_ENCODING;
If the output is not UTF8 run:
SET CLIENT_ENCODING='UTF8';
Now remove 'OPTIONS': {'charset': 'utf8mb4'} from your Django settings.
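For reference, a sketch of what the DATABASES setting might look like once the MySQL-only option is gone. The database name, user, password, host and port below are placeholders, not values from the question:

```python
# settings.py (sketch): PostgreSQL connection without the MySQL-only charset option.
DATABASES = {
    "default": {
        "ENGINE": "django.db.backends.postgresql",
        "NAME": "mydb",           # placeholder database name
        "USER": "myuser",         # placeholder user
        "PASSWORD": "change-me",  # placeholder password
        "HOST": "localhost",      # placeholder host
        "PORT": "5432",           # placeholder port
        # No "OPTIONS": {"charset": "utf8mb4"} entry here; that key is what
        # triggered the "invalid connection option" error in the question.
    }
}
```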

How to prevent unicode character corruption when using getPageContext().getRequest().getParameterValues()?

We have a scenario where a page submits multiple fields with the same name. To work around CF's default approach of putting these into a comma-delimited string, without changing it application-wide, we access field values in certain places as an array using getPageContext().getRequest().getParameterValues("#fieldname#").
The problem we are experiencing is that submitted unicode characters are being corrupted. For example, El celular que compré está averiado in a field array comes back as the string El celular que comprÃ© estÃ¡ averiado. If I dump getHTTPRequestData(), I can see that the properly url-encoded El+celular+que+compr%C3%A9+est%C3%A1+averiado is sent to the server.
Is the java string not being handled by CF correctly? Is there any way to resolve this issue on a non-application-wide basis, other than parsing getHTTPRequestData().content, which we really don't want to do?
The likely reason is that your web server is not using UTF-8 internally for its encoding of parameters. You don't normally see this when accessing variables through the url scope, because CF has already converted them for you; however, you can see the difference when looking at cgi.query_string or at getPageContext().getRequest().getParameterValues(...).
In your case it looks like you're seeing windows-1252 encoding. I had a similar issue around IIS7.5 - IIS8. Assuming you can't or don't want to risk trying to change your webserver configuration, this workaround should work for you:
webserverEncodedString = getPageContext().getRequest().getParameterValues(fieldname);
binaryValue = CharsetDecode(webserverEncodedString, "windows-1252");
utf8EncodedString = CharsetEncode(binaryValue, "utf-8");
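The same round trip can be illustrated outside ColdFusion. This is only a Python 3 sketch of the idea (re-encode the mis-decoded text as windows-1252 to recover the original bytes, then decode those bytes as UTF-8); the function name repair_mojibake is hypothetical:

```python
def repair_mojibake(text, wrong_encoding="windows-1252", right_encoding="utf-8"):
    """Undo a wrong decode: recover the original bytes, then decode them properly."""
    original_bytes = text.encode(wrong_encoding)
    return original_bytes.decode(right_encoding)


# UTF-8 bytes for the accented letters, mis-read as windows-1252 characters.
garbled = "El celular que compr\u00c3\u00a9 est\u00c3\u00a1 averiado"
print(repair_mojibake(garbled))
```

The repair only works when the text really was decoded with the wrong charset; applying it to correctly decoded text would raise an error or corrupt it.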

Django query returning non-unicode strings?

I'm completely baffled by a problem I found today: I have a PostgreSQL database with tables which are not managed by Django, and completely normal queries via QuerySet on these tables. However, I've started getting Unicode exceptions and when I went digging, I found that my QuerySets are returning non-Unicode strings!
Example code:
d = Document.objects.get(id=45787)
print repr(d.title), type(d.title)
The output of the above statement is a normal string (without the u prefix), followed by a <str> type identifier. What's more, this normal string contains UTF-8 data as expected, in raw byte form! If I call d.title.decode('utf-8'), I get valid Unicode strings!
Even more puzzling, some of the fields work correctly. This same table / model contains another field, html_filename of the same type (TextField) which is returned correctly, as a Unicode string!
I have no special options, the database data is correctly encoded, and I don't even know where to begin searching for a solution. This is Django 1.6.2.
Update:
Database Server encoding is UTF8, as usual, and the data is correctly encoded. This is on PostgreSQL 9.1 on Ubuntu.
Update 2:
I think I may have found the cause, but I don't know why it behaves this way: I thought the database fields were defined with the text type, as usual, but instead they are defined as citext (http://www.postgresql.org/docs/9.1/static/citext.html). Since the Django model is unmanaged, it looks like Django doesn't interpret the field type as being worthy of converting to Unicode. Any ideas how to force Django to do this?
Apparently, Django does not treat fields of type citext as textual, so it does not return them as Unicode strings.
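Until the field type is handled properly, one stopgap is to decode such values by hand, as the question already does with d.title.decode('utf-8'). A minimal sketch, assuming the raw values are UTF-8 bytes; the helper name ensure_text is hypothetical and not a Django API:

```python
def ensure_text(value, encoding="utf-8"):
    """Return value as a Unicode string, decoding it only if it is a byte string."""
    if isinstance(value, bytes):
        return value.decode(encoding)
    return value


# Byte string as it might come back from a citext column (UTF-8 encoded).
raw_title = b"Th\xc3\xa9\xc3\xa2tre"
print(repr(ensure_text(raw_title)))
print(repr(ensure_text(u"already text")))
```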

How do I fix Exception Type: UnicodeEncodeError

I am not sure why I am getting this error:
Exception Type: UnicodeEncodeError
Unicode error hint
The string that could not be encoded/decoded was: he Théâtre d
The full traceback is here: http://dpaste.com/686751/ (I put it in a dpaste due to its length)
I am really confused about this because it has worked flawlessly on our staging server for a year or so. The site is finally on the live server, and I copied the database over to it; now, if I edit anything or add a new page containing any French accents, I receive the above error. I've been googling for hours without much luck.
In my research I found some issues with DB collation, but I recreated the database as utf8_general_ci and converted the tables accordingly, and still no luck. Any ideas?
I should also note that the apps listed in the installed apps are ones we've developed and use for about 13 other large live web sites on the same server, with the same types of characters.
baffled
Jeff
In the model, add the u'' prefix to the string returned by __unicode__:
def __unicode__(self):
    return u"%s" % self.your_field
maybe the servers have different library versions?
afaik, the way to fix those errors is using the smart_unicode function in the unicode method in models, as mentioned here:
django unicode encode/decode errors
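For context, this kind of error can be reproduced outside Django. The sketch below (Python 3) only shows what a UnicodeEncodeError means in general, using the fragment from the error hint; it does not claim to identify the cause on the live server, and the helper name try_encode is hypothetical:

```python
title = u"he Th\u00e9\u00e2tre d"  # the fragment from the error hint


def try_encode(text, encoding):
    """Return the encoded bytes, or None if the codec cannot represent the text."""
    try:
        return text.encode(encoding)
    except UnicodeEncodeError:
        return None


print(try_encode(title, "ascii"))  # ASCII has no mapping for the accented letters
print(try_encode(title, "utf-8"))  # UTF-8 can represent every character
```

If an environment falls back to ASCII somewhere along the way (for example when printing or logging), the same text that works elsewhere can fail like this.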

Django: How can I determine why Django isn't displaying certain data?

I have a Django app that runs a tool and displays the results from the tool back to the user using a Django template. Sometimes Django does not display the results. It doesn't complain about anything, it just doesn't display the results. I'm guessing this is something to do with one or more of the characters in the results being illegal as far as Django is concerned. How can I get more information about what it is that Django doesn't like? Also, is there some method I can use to filter out "bad" characters? The results are normally just lots of text. They contain company confidential stuff, so I can't give an example unfortunately. I have DEBUG set to True and TEMPLATE_DEBUG set to DEBUG.
UPDATE:
I added some code to filter out all chars with a decimal value greater than 127 and it now works.
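The filter described in the update could look roughly like this. It is a sketch of the general idea (Python 3), not the code actually used in the app, and the function name strip_non_ascii is hypothetical:

```python
def strip_non_ascii(text):
    """Drop every character whose code point is greater than 127."""
    return "".join(ch for ch in text if ord(ch) <= 127)


print(strip_non_ascii("Th\u00e9\u00e2tre results"))
```

Note that this silently discards data, so it hides the symptom rather than fixing the underlying encoding problem.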
If you are using the development server, put in a breakpoint with pdb and see what is going on. Or print out the string that you think has "bad" characters. If you aren't using the development server you could use the Python logging module to log the string you are getting from the tool.
You might be leaping to conclusions about the data containing bad characters. It may be something else, and without debugging further it is hard to speculate.
You could try Django's built-in encoding helpers to convert the string to a safe encoding. Note that smart_str converts the value to a str; it does not strip characters:
from django.utils.encoding import smart_str
smart_str(your_string)