Django query returning non-unicode strings? - django

I'm completely baffled by a problem I found today: I have a PostgreSQL database with tables which are not managed by Django, and completely normal queries via QuerySet on these tables. However, I've started getting Unicode exceptions and when I went digging, I found that my QuerySets are returning non-Unicode strings!
Example code:
d = Document.objects.get(id=45787)
print repr(d.title), type(d.title)
The output of the above statement is a plain byte string (without the u prefix), followed by <type 'str'>. What's more, this string contains the expected UTF-8 data in raw byte form! If I call d.title.decode('utf-8'), I get valid Unicode strings!
Even more puzzling, some of the fields work correctly. This same table / model contains another field, html_filename of the same type (TextField) which is returned correctly, as a Unicode string!
I have no special options, the database data is correctly encoded, and I don't even know where to begin searching for a solution. This is Django 1.6.2.
Update:
Database Server encoding is UTF8, as usual, and the data is correctly encoded. This is on PostgreSQL 9.1 on Ubuntu.
Update 2:
I think I may have found the cause, but I don't know why it behaves this way: I thought the database fields were defined with the text type, as usual, but instead they are defined as citext (http://www.postgresql.org/docs/9.1/static/citext.html). Since the Django model is unmanaged, it looks like Django doesn't interpret the field type as being worthy of converting to Unicode. Any ideas how to force Django to do this?

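Apparently, Django will not treat fields of type citext as textual and return them as Unicode strings. Most likely this is because citext is an extension type: the PostgreSQL driver only decodes the column types it recognises as text, so citext values come back as raw byte strings.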
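Until the driver returns citext columns as text, one workaround is the one already used in the question: decode the byte string yourself. A minimal sketch, assuming the bytes are UTF-8 (matching the UTF8 server encoding mentioned above); ensure_text is a hypothetical helper name, and since bytes is an alias of str on Python 2, it should also work in the Python 2 / Django 1.6 setup from the question.

```python
def ensure_text(value, encoding="utf-8"):
    """Return value as text, decoding byte strings with the given encoding.

    Hypothetical helper: assumes byte strings are encoded as `encoding`
    (UTF-8 by default, matching a UTF8 PostgreSQL database).
    """
    if isinstance(value, bytes):
        # Raw bytes, as returned for the citext columns in the question.
        return value.decode(encoding)
    # Already text (or None / a non-string value): return it unchanged.
    return value


raw = b"Caf\xc3\xa9"           # UTF-8 bytes, like d.title in the question
print(repr(ensure_text(raw)))  # 'Café'
```

In a view this would be used as title = ensure_text(d.title), so the rest of the code only ever sees text.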

Related

Django - How to store emojis in postgres DB properly?

I'm running the latest version of Django on Postgres. I'm trying to store emojis in my Postgres DB in a way that a React Native app can render properly. Below I have the initial emoji variables set up that'll go into the table. I've copied and pasted the emojis from here. How do I store emojis in my Postgres DB so that a React Native app can render them properly?
I tried following this blog, which suggests adding 'OPTIONS': {'charset': 'utf8mb4'} to DATABASES in settings.py, but I get this error: django.db.utils.ProgrammingError: invalid dsn: invalid connection option "charset". It seems this only works for MySQL DBs. How can I store emojis in a Django Postgres DB?
As suggested in the comments, you need to put quotes around the emojis, since they're just characters. Be careful, though: some emojis, such as flags, are actually two characters. All your computer does is render the Unicode characters as a platform-dependent emoji.
The emojis that you're using should be supported by Unicode, and your computer definitely supports them. For the most part, client machines add support for newly published emojis very quickly. There should be no problem with emojis in strings. Tom Scott, who keeps getting interviewed about emojis, has a nice video explaining them: https://www.youtube.com/watch?v=sTzp76JXsoY
I'm not an expert so please correct me if I'm wrong.
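The point about flags is easy to check in Python 3 (used here only for illustration): a simple emoji such as 😄 is one code point, while a flag such as 🇺🇸 is made of two regional indicator symbols that the platform renders as a single flag.

```python
import unicodedata

smile = "\U0001F604"           # 😄 SMILING FACE WITH OPEN MOUTH AND SMILING EYES
flag = "\U0001F1FA\U0001F1F8"  # 🇺🇸, built from regional indicators U and S

print(len(smile))  # 1
print(len(flag))   # 2

# List the individual code points that make up the flag.
for ch in flag:
    print("U+%04X %s" % (ord(ch), unicodedata.name(ch)))
```

Both values are ordinary strings, so they can be stored as-is in a CharField or TextField.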
In your models, use a CharField or a TextField to store emojis; in your code they need to be written as quoted strings (for example "😄", not a bare 😄). Your database must use UTF8 to support emojis. Connect to your database with an SQL shell and check the current encoding by running:
SHOW CLIENT_ENCODING;
If the output is not UTF8 run:
SET CLIENT_ENCODING='UTF8';
Now remove 'OPTIONS': {'charset': 'utf8mb4'} from your Django settings.
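The reason utf8mb4 exists on MySQL is byte length: emojis lie outside the Basic Multilingual Plane and take four bytes in UTF-8, while MySQL's older utf8 character set stores at most three bytes per character. PostgreSQL's UTF8 encoding accepts four-byte sequences, which is why no such option is needed there. A quick check in Python 3 (utf8_length is just an illustrative helper):

```python
def utf8_length(text):
    """Number of bytes `text` occupies when encoded as UTF-8."""
    return len(text.encode("utf-8"))


# a, é, € and 😄 need 1, 2, 3 and 4 bytes respectively.
for sample in ["a", "\u00e9", "\u20ac", "\U0001F604"]:
    print(repr(sample), utf8_length(sample), "byte(s)")
```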

How can I read a dictionary from database without it being turned into a string by django?

My database contains a dictionary. When I read the dictionary from the database and try to do something with it it fails because the dictionary has been automatically converted into a string. Any way to avoid Django turning the dict into a string?
You can also use simplejson.loads() and simplejson.dumps() to deserialize and serialize the dictionary. It is a bit more work, but it ensures that you are not dependent on a particular database backend.
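A minimal sketch of that approach using the standard library json module, which offers the same dumps/loads pair as simplejson: serialize the dict to a string before saving it in an ordinary text column, and deserialize it after reading it back. The settings dict is made-up example data.

```python
import json

settings = {"theme": "dark", "retries": 3, "tags": ["a", "b"]}

# Before saving: dict -> JSON string, suitable for a TextField.
stored = json.dumps(settings)

# After loading from the database: JSON string -> dict again.
restored = json.loads(stored)

print(type(stored).__name__, type(restored).__name__)  # str dict
```

The trade-off is that the database only sees an opaque string, so you cannot query inside it, which is what the database-specific JSON fields add.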
There are JSON field options for MySQL and Postgres, but I don't think there's an equivalent for SQLite.
For MySQL JSONField: https://django-mysql.readthedocs.io/en/latest/model_fields/json_field.html
Similarly for Postgres:
https://docs.djangoproject.com/en/3.0/ref/contrib/postgres/fields/#jsonfield
There's built-in support for querying the contents of these fields, which is pretty neat. The docs show examples.
Solved the issue with help from the responses I got here.
When capturing and saving the JSON from the webhook (that's where the JSON comes from in my project), I had to take the strange step of deserialising and re-serialising the JSON before saving it to my database. This got rid of all the \r and \t characters that are passed in request.body but make the JSON invalid:
t = Transaction(data=json.dumps(json.loads(request.body)))
t.save()
To load the JSON from database into a python dictionary that I can then use in my code I used json.loads:
data = json.loads(t.data)
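What the json.dumps(json.loads(...)) round trip does is parse the payload and re-emit it in a compact form, so layout characters such as \r, \n and \t between tokens disappear. A small sketch with a made-up payload; on Python 3.6+, json.loads also accepts the bytes from request.body directly.

```python
import json

# Made-up webhook body containing Windows line endings and tabs.
body = b'{\r\n\t"amount": 10,\r\n\t"currency": "EUR"\r\n}'

normalized = json.dumps(json.loads(body))
print(normalized)  # {"amount": 10, "currency": "EUR"}

data = json.loads(normalized)  # back to a Python dict, as in json.loads(t.data)
print(data["currency"])        # EUR
```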

Django syncdb doesn't work after invoking inspectdb

I created a model from an existing PostgreSQL database using inspectdb. When I try to run syncdb in order to generate the authorization tables, it produces the following error:
CommandError: One or more models did not validate:
db.t1: "ip": CharFields require a "max_length" attribute that is a positive integer.
So I added max_length=255 to all CharFields, but it still doesn't work. The Django version is 1.5.1.
Anyone have an idea how to fix this?
Currently, inspectdb doesn't set max_length for PostgreSQL char fields declared without a length. FYI, a quote from the PostgreSQL docs:
The notations varchar(n) and char(n) are aliases for character varying(n) and character(n), respectively. character without length specifier is equivalent to character(1). If character varying is used without length specifier, the type accepts strings of any size. The latter is a PostgreSQL extension.
However, Django doesn't allow CharFields to be defined without a max_length parameter. There is an open ticket for it.
This Django snippet provides a custom CharField without a length limit; it should pass all of Django's validation.
Switching to a TextField could also help.
Hope that helps.
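The PostgreSQL rules quoted above can double as a guide for hand-editing the inspectdb output. The helper below is hypothetical, not part of Django: given a column type and an optional declared length, it returns the field declaration to paste into the model, using TextField for an unbounded varchar and max_length=1 for a bare character column.

```python
def django_field_for(pg_type, length=None):
    """Suggest a Django field declaration for a PostgreSQL character column.

    Hypothetical helper for hand-editing inspectdb output; it only covers
    the character types discussed above.
    """
    if pg_type in ("character varying", "varchar"):
        if length is None:
            # Unbounded varchar: CharField requires max_length, so use TextField.
            return "models.TextField()"
        return "models.CharField(max_length=%d)" % length
    if pg_type in ("character", "char"):
        # A bare character column is character(1).
        return "models.CharField(max_length=%d)" % (1 if length is None else length)
    if pg_type == "text":
        return "models.TextField()"
    raise ValueError("unsupported type: %s" % pg_type)


print(django_field_for("character varying"))       # models.TextField()
print(django_field_for("character varying", 255))  # models.CharField(max_length=255)
print(django_field_for("character"))               # models.CharField(max_length=1)
```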

django model.charfield - unicode or not unicode

I transferred my project to another computer and get an error while running a view.
I'm reading some information from a model and want to save it to XML using XMLGenerator.
On the one computer it works fine, type() of the model.charField() returns "unicode"
On the new computer it did not work, type() of the model.charField() returns "str"
The working computer has Python 2.7.2.
The non-working computer has Python 2.5.2.
So on the non-working computer I don't get unicode, which is what XMLGenerator can handle. I tried to work around the problem by calling .decode("utf-8") on the string returned by the model, and it worked.
But how can I know what encoding the string is in? My guess is that it has the same encoding as the database, but am I right?
regards Martin
Could you please check whether the MySQL collation settings are also the same on both machines?
From the Django docs:
"In many cases, this default will not be a problem. However, if you really want case-sensitive comparisons on a particular column or table, you would change the column or table to use the utf8_bin collation. The main thing to be aware of in this case is that if you are using MySQLdb 1.2.2, the database backend in Django will then return bytestrings (instead of unicode strings) for any character fields it receive from the database."
See the Django documentation on collation settings.
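On the question of how to know the encoding: a byte string does not record which encoding produced it, so you have to know the character set the database connection used. Decoding with the wrong codec does not necessarily raise an error; it can silently produce mojibake. A short Python 3 illustration:

```python
text = "caf\u00e9"                # café
utf8_bytes = text.encode("utf-8")  # what a UTF-8 connection would return

# Correct codec: the original text comes back.
print(utf8_bytes.decode("utf-8"))    # café

# Wrong codec: no error, but the result is garbled (mojibake).
print(utf8_bytes.decode("latin-1"))  # cafÃ©
```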
Let's say we have this:
a = unicode('a')
b = str('b')
A fast check is to do:
print type(a)
print type(b)
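If you want to validate them you can do:
if isinstance(a, str):
    print 'a is a byte string'
if isinstance(a, unicode):
    print 'a is a unicode string'
One way would be to convert the content explicitly:
c = a.encode('utf-8')  # unicode -> UTF-8 byte string; str(a) fails on non-ASCII text
d = b.decode('utf-8')  # byte string -> unicode; unicode(b) assumes ASCII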

TemplaVoila loses its mapping after converting the database to UTF-8

I'm using TemplaVoila 1.5.5 and TYPO3 4.5.5. I tried to switch a TYPO3 database from latin1_swedish_ci (ISO-8859-1) to utf8_general_ci. To do so, I wrote a PHP script which converted everything to binary and then everything to utf8_general_ci. Everything seemed to work except TemplaVoila (all other settings in TYPO3 were already prepared for UTF-8, but the database was not). I got the following message when opening the TYPO3 page:
Couldn't find a Data Structure set for table/row "pages:x". Please select a Data Structure and Template Object first.
When I looked at a template mapping, I got another message saying that no mapping is available. The mapping is stored as a BLOB in the templatemapping column of the tx_templavoila_tmplobj table. When converting to UTF-8, everything is gone. Because it's binary, I can't easily access and convert it.
How can I keep the mapping? I don't want to map everything new. What can I do?
There are two proposed solutions here, but I want to know if there are better ones. With Michael's solution, would I also have to map everything again?
What is the fastest way to restore the mapping?
I can't say if you'll be able to recover the data now that it's been converted, but if you're willing to run your conversion script I have had some success with the following approach:
Unserialize the data in the templatemapping field of the tx_templavoila_tmplobj table.
Convert the unserialized data array to your target encoding (there is a helper method t3lib_cs::convArray which you might be able to use for this purpose).
Serialize the converted data and save it back to the templatemapping field.
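Why converting the raw BLOB can break the mapping, and what the convert step does, can be sketched in Python (for illustration only; the real fix runs in PHP with unserialize(), t3lib_cs::convArray and serialize()). PHP's serialize() stores each string's length in bytes, for example s:4:"Käse"; in ISO-8859-1, but the same word is 5 bytes in UTF-8. One plausible failure mode is therefore that re-encoding the serialized blob leaves stale lengths that unserialize() rejects; converting the unserialized values and serializing again recomputes them. php_string and conv_array are hypothetical stand-ins, not TYPO3 code, and conv_array converts keys as well as values by assumption.

```python
def php_string(value):
    """Serialize a byte string the way PHP's serialize() does: s:<bytes>:"...";"""
    return b's:%d:"%s";' % (len(value), value)


def conv_array(data, from_enc="latin-1", to_enc="utf-8"):
    """Recursively re-encode every byte string in nested dicts/lists.

    Hypothetical stand-in for the conversion step; keys and values are
    both converted, other types are returned unchanged.
    """
    if isinstance(data, bytes):
        return data.decode(from_enc).encode(to_enc)
    if isinstance(data, dict):
        return {conv_array(k, from_enc, to_enc): conv_array(v, from_enc, to_enc)
                for k, v in data.items()}
    if isinstance(data, list):
        return [conv_array(item, from_enc, to_enc) for item in data]
    return data


word = "K\u00e4se"                         # Käse
print(php_string(word.encode("latin-1")))  # b's:4:"K\xe4se";'
print(php_string(word.encode("utf-8")))    # b's:5:"K\xc3\xa4se";'

mapping = {b"title": b"K\xe4se", b"fields": [b"Gr\xf6\xdfe", 3]}
converted = conv_array(mapping)
print(converted[b"title"].decode("utf-8"))  # Käse
```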
Fastest way: just change the field manually back to mediumtext. All mappings should be fine again. I know it's quick and dirty, but it worked for me.