Is there an easy way to dump UTF-8 data from a database?
I know this command:
manage.py dumpdata > mydata.json
But in the resulting mydata.json file, Unicode data looks like this:
"name": "\u4e1c\u6cf0\u9999\u6e2f\u4e94\u91d1\u6709\u9650\u516c\u53f8"
I would like to see a real Unicode string like 全球卫星定位系统 (Chinese).
After struggling with similar issues, I've just found that the XML formatter handles UTF-8 properly.
manage.py dumpdata --format=xml > output.xml
I had to transfer data from Django 0.96 to Django 1.3. After numerous tries with dump/load data, I've finally succeeded using xml. No side effects for now.
I hope this helps someone, as I landed on this thread while looking for a solution.
django-admin.py dumpdata yourapp can dump the data for that purpose.
Or if you use MySQL, you could use the mysqldump command to dump the whole database.
And this thread has many ways to dump data, including manual methods.
UPDATE: because OP edited the question.
To convert the JSON-escaped strings into human-readable text, you could use this (Python 2):
open("mydata-new.json","wb").write(open("mydata.json").read().decode("unicode_escape").encode("utf8"))
This solution, from Julian Polard's post, worked for me.
Basically, just add -Xutf8 right after py or python (before manage.py) when running this command:
python -Xutf8 manage.py dumpdata > data.json
Please upvote his answer as well if this worked for you ^_^
You need to either find the call to json.dump*() in the Django code and pass the additional option ensure_ascii=False and then encode the result after, or you need to use json.load*() to load the JSON and then dump it with that option.
Here I wrote a snippet for that.
Works for me!
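The second option above (load the JSON, then dump it again with ensure_ascii=False) can be sketched roughly as follows. This is only a minimal sketch using the standard json module; the function name unescape_json_file and the file names are made up for illustration and are not part of Django.

```python
import json


def unescape_json_file(src, dst, indent=2):
    """Re-dump a JSON file so non-ASCII text is written as real UTF-8.

    Hypothetical helper: `src` is the file produced by dumpdata,
    `dst` is the new, human-readable file.
    """
    # json.load turns every \uXXXX escape back into a real character.
    with open(src, "r", encoding="utf-8") as f:
        data = json.load(f)
    # ensure_ascii=False keeps those characters instead of escaping them again.
    with open(dst, "w", encoding="utf-8") as f:
        json.dump(data, f, ensure_ascii=False, indent=indent)


# Example call (assumed file names):
# unescape_json_file("mydata.json", "mydata-new.json")
```

Unlike decoding with unicode_escape, this round-trip goes through a real JSON parser, so escaped quotes and backslashes inside strings should stay valid in the output.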
You can create your own serializer which passes ensure_ascii=False argument to json.dumps function:
# serializers/json_no_uescape.py
from django.core.serializers.json import *

class Serializer(Serializer):
    def _init_options(self):
        super(Serializer, self)._init_options()
        # Write non-ASCII characters as-is instead of \uXXXX escapes.
        self.json_kwargs['ensure_ascii'] = False
Then register the new serializer (for example in your app's __init__.py file):
from django.core.serializers import register_serializer
register_serializer('json-no-uescape', 'serializers.json_no_uescape')
Then you can run:
manage.py dumpdata --format=json-no-uescape > output.json
The accepted answer by YOU is good, but keep in mind that Python 3 distinguishes between text and binary data, so both files must be opened in binary mode:
open("mydata-new.json","wb").write(open("mydata.json", "rb").read().decode("unicode_escape").encode("utf8"))
Otherwise, the error AttributeError: 'str' object has no attribute 'decode' will be raised.
I usually add the following lines to my Makefile:
.PHONY: dump
# make APP=core MODEL=Schema dump
dump:
	python manage.py dumpdata --indent=2 --natural-foreign --natural-primary ${APP}.${MODEL} | \
	python -c "import sys; sys.stdout.write(sys.stdin.read().encode().decode('unicode_escape'))" \
	> ${APP}/fixtures/${MODEL}.json
This works for the standard Django project structure; adjust the paths if your project is laid out differently.
This problem has been fixed for both JSON and YAML in Django 3.1.
Here's a new solution.
I just shared a repo on GitHub: django-dump-load-utf8.
However, I think this is a bug in Django, and I hope someone can merge my project into Django.
It's not a bad solution, but I think fixing the bug in Django itself would be better.
manage.py dumpdatautf8 --output data.json
manage.py loaddatautf8 data.json
import codecs
src = "/categories.json"
dst = "/categories-new.json"
source = codecs.open(src, 'r').read().decode('string-escape')
codecs.open(dst, "wb").write(source)
I encountered the same issue. After reading all the answers, I came up with a mix of Ali and darthwade's answers:
manage.py dumpdata app.category --indent=2 > categories.json
manage.py shell
import codecs
src = "/categories.json"
dst = "/categories-new.json"
source = codecs.open(src, "rb").read().decode('unicode-escape')
codecs.open(dst, "wb","utf-8").write(source)
In Python 3, I had to open the source file in binary mode and decode it as unicode-escape. I also passed utf-8 as the encoding when opening the destination file in write (binary) mode.
I hope it helps :)
Here is the solution from djangoproject.com
In Settings, there is a "Use Unicode UTF-8 for worldwide language support" box under "Language" - "Administrative Language Settings" - "Change system locale" - "Region Settings". If we apply that and reboot, then we get a sensible, modern default encoding from Python.
djangoproject.com
I am trying to add an external locale directory from the pycountry package.
Before initializing Flask Babel, I do the following:
import pycountry
app.config['BABEL_TRANSLATION_DIRECTORIES'] = 'translations;' + pycountry.LOCALES_DIR
But alas, this does not seem to be enough. For example, gettext('Germany') will not find the translation.
I think the problem might be how translations are structured in pycountry.
~/.local/lib/python3.5/site-packages/pycountry/locales/pt/LC_MESSAGES$ ls
iso15924.mo iso3166-3.mo iso4217.mo iso639-3.mo
iso3166-1.mo iso3166.mo iso639_3.mo
Do I need to specify I want, e.g., the iso3166 file? Please see the following reference.
Reference: pycountry locale documentation section
I also needed to load pycountry locale with flask babel.
To do that, I looked into flask-babel's get_translations() to see how it loads translations.
Anyway, I got something working by putting this somewhere in the app:
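import pycountry
from babel import support  # imports inferred from the names used below
from flask_babel import get_locale

def hack_country_gettext(string):
    # Load pycountry's iso3166 catalog for the current locale and look the string up in it.
    translations = support.Translations()
    catalog = translations.load(pycountry.LOCALES_DIR, [get_locale()], 'iso3166')
    translations.merge(catalog)
    return translations.ugettext(string)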
Then, instead of _('Germany'), use the hack function: hack_country_gettext('Germany').
I have a mysqldump from a customer with broken umlauts.
It is a backup, and I cannot get a new one.
E.g. instead of ü there is Ã¼, and instead of ö there is Ã¶.
To solve this, can I do a search and replace in Notepad? Or could a global search and replace damage tables other than tt_content or pages?
I solved this by export and import with different charset configuration.
Just import your existing MySQL dump on your local development server and try the export/import as follows.
Create a new mysql dump and try some settings like:
mysqldump --default-character-set=latin1 --skip-set-charset --skip-extended-insert --skip-add-drop-table --no-create-info -u [USERNAME] -p [DBNAME] > [MYSQLDUMNAME].sql
Import the new created mysql dump with settings like:
mysql --default-character-set=utf8 -u [USERNAME] -p [DBNAME] < [MYSQLDUMNAME].sql
You will need to run some tests to find the correct transformation (latin1, utf8).
If you have a mix of correct and incorrect characters in your MySQL dump, you will probably need to exclude the affected tables and import them separately, like this:
mysqldump --default-character-set=latin1 --skip-set-charset --skip-extended-insert --add-drop-table --ignore-table=[DBNAME].[TABLENAME] -u [USERNAME] -p [DBNAME] > [MYSQLDUMNAME].sql
Replace [USERNAME],[DBNAME],[TABLENAME],[MYSQLDUMNAME] with your values.
This is mostly caused by wrong encoding settings used to dump the backup (like communicating with the server in utf-8 when database is in cp-1252). If you can get the settings used to create it, you can import it on your local machine with the same settings correctly and create a new dump with correct settings to fix it.
You can attempt to fix it with search replace, but you will probably miss a lot of symbols, unless it is really small dump and you can actually check it completely by hand afterwards.
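To illustrate why the characters look the way they do: ü stored as UTF-8 is the two bytes C3 BC, and reading those bytes as cp1252 gives Ã¼. A rough Python sketch of reversing that for a single string is shown below. The function name repair_mojibake is made up for this example, and cp1252 as the wrongly applied encoding is an assumption; it is not a replacement for re-importing the dump with the right charset settings.

```python
def repair_mojibake(text, wrong_encoding="cp1252"):
    """Undo UTF-8 text that was mistakenly decoded as `wrong_encoding`.

    Hypothetical helper: returns the input unchanged when it does not
    look double-encoded, so already-correct strings are left alone.
    """
    try:
        # Turn the characters back into the original bytes, then read
        # those bytes as the UTF-8 they really were.
        return text.encode(wrong_encoding).decode("utf-8")
    except (UnicodeEncodeError, UnicodeDecodeError):
        return text


print(repair_mojibake("fÃ¼r"))   # double-encoded input
print(repair_mojibake("für"))    # already correct input, left as-is
```

Because correct strings are passed through untouched, this kind of per-value check is safer than a blind global search and replace, but it still needs to be verified against the real data.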
Look at the following TYPO3Wiki entry. It describes some methods for converting the data to UTF-8:
https://wiki.typo3.org/UTF-8_support#Possibility_2
I want to use R from a Django app, and I am now in a huge mess.
I have installed rpy2 for that.
I am able to run everything from the Python IDE, e.g.:
import rpy2.rinterface as rinterface
rinterface.initr()
or
import rpy2.robjects as something
But when I open Python from cmd or run it from a Django file, I get the error "R_USER not defined".
I am able to write separate .py files and access R, but not from Django or the Python shell.
Please help me out!
Or please tell me what else I can use to call R functions from Python.
Just create a new system variable R_USER in Environment Variables, with its value being the current user name, and the problem should go away.
Note that this applies to the Windows platform only.
Otherwise you wouldn't get the R_USER exception in the first place.
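If changing the system-wide environment is not an option, the same variable can probably be set for the current process only, before rpy2 is imported. This is an untested sketch, not something the rpy2 documentation is quoted on here; the helper name ensure_r_user is made up, and it assumes rpy2 reads R_USER from the process environment at import time.

```python
import getpass
import os


def ensure_r_user(username=None):
    """Define R_USER for this process if it is not set yet.

    Hypothetical helper: `username` defaults to the current login name.
    An existing R_USER value is never overwritten.
    """
    if "R_USER" not in os.environ:
        os.environ["R_USER"] = username or getpass.getuser()
    return os.environ["R_USER"]


# Call this before the first rpy2 import, e.g. at the top of settings.py:
# ensure_r_user()
# import rpy2.robjects as robjects
```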
Each time I add some strings to a Django project, I run "django-admin.py makemessages --all" to generate .PO files for all locales.
The problem is that even if I only added 5 new strings, the makemessages command marks 50 strings as fuzzy in the .PO files, which creates a lot of extra work for our locale maintainers.
This also makes the entire i18n unusable until they manually revise those fuzzy strings.
Removing fuzzy is exactly what I am doing... check this out.
http://code.djangoproject.com/ticket/10852
Sounds like we need an extra sh script that automatically removes all the fuzzy flags from the .po files.
You can use the gettext command line tools to do this now:
msgattrib --clear-fuzzy --empty -o /path/to/output.po /path/to/input.po
The Django management commands just call these tools directly, so you must have gettext installed. makemessages uses msgattrib to clear the obsolete strings by setting the output file to the same path as the input, so I suspect you can do the same with the command above to remove fuzzy strings.
From the msgattrib man page:
--clear-fuzzy
    set all messages non-'fuzzy'
--empty
    when removing 'fuzzy', also set msgstr empty
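The command above could be applied to every .po file of a project with a small wrapper. The following is only a sketch: the function names and the locale directory layout are assumptions, and writing the output to the same path as the input follows the suspicion voiced above rather than anything verified here. Note that --empty also blanks the msgstr of every formerly fuzzy entry.

```python
import subprocess
from pathlib import Path


def clear_fuzzy_command(po_file):
    """Build the msgattrib call for one file (output path == input path)."""
    po = str(po_file)
    return ["msgattrib", "--clear-fuzzy", "--empty", "-o", po, po]


def clear_fuzzy_in_tree(locale_dir, run=subprocess.run):
    """Run msgattrib on every .po file below `locale_dir`.

    `run` is injectable so the calls can be inspected without gettext
    installed; by default the real command is executed.
    """
    processed = []
    for po_file in sorted(Path(locale_dir).rglob("*.po")):
        run(clear_fuzzy_command(po_file), check=True)
        processed.append(po_file)
    return processed


# Example call (assumed directory name):
# clear_fuzzy_in_tree("locale")
```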
Where should DATETIME_FORMAT be placed for it to have effect
on the display of date-time in the Django admin site
(Django’s automatic admin interface)?
Documentation for DATETIME_FORMAT, on page
http://docs.djangoproject.com/en/1.0/ref/settings/, says:
"The default formatting to use for datetime fields on
Django admin change-list pages -- and, possibly, by
other parts of the system."
Update 1: DATETIME_FORMAT is broken (the value of it is
ignored), despite the documentation. Many years ago it
worked, but since then the Django implementations have been
broken wrt. this feature. It seems the Django community
can't decide how to fix it (but in the meantime I think they
should remove DATETIME_FORMAT from the documentation or add
a note about this problem to it).
I have put these lines into file "settings.py" of the
website/project (not the app), but it does not seem to have
any effect (after restarting the development server):
DATETIME_FORMAT = 'Y-m-d H:i:sO'
DATE_FORMAT = 'Y-m-d'
As an example "June 29, 2009, 7:30 p.m." is displayed when
using Django admin site.
Django version is 1.0.2 final and Python version is 2.6.2
(64 bit). Platform: Windows XP 64 bit.
Stack Overflow question European date input in Django Admin seems to be about the exact opposite problem (and thus an apparent
contradiction).
The full path to file "settings.py" is
"D:\dproj\MSQall\website\GoogleCodeHost\settings.py". I now
start the development server this way (in a Windows command
line window):
cd D:\dproj\MSQall\website\GoogleCodeHost
set DJANGO_SETTINGS_MODULE=GoogleCodeHost.settings
python manage.py runserver 6800
There is no difference. Besides these are positively read
from file "settings.py":
DATABASE_NAME
INSTALLED_APPS
TEMPLATE_DIRS
MIDDLEWARE_CLASSES
"django-admin.py startproject XYZ" does not create file
"settings.py" containing DATETIME_FORMAT or DATE_FORMAT.
Perhaps there is a reason for that?
The sequence "d:", "cd D:\dproj\MSQall\website\GoogleCodeHost",
"python manage.py
shell", "from django.conf import settings",
"settings.DATE_FORMAT", "settings.DATETIME_FORMAT" outputs
(as expected):
'Y-m-d'
'Y-m-d H:i:sO'
So the content of file "settings.py" is being read, but does
not take effect in the Django Admin interface.
With:
USE_L10N = False
DATETIME_FORMAT takes effect. When USE_L10N = True, the locale's format overrides DATETIME_FORMAT and DATE_FORMAT, as documented at: https://docs.djangoproject.com/en/1.9/ref/settings/#date-format
As Ciro Santilli said, the localized format overrides DATETIME_FORMAT in settings when USE_L10N = True. But you can still override DATETIME_FORMAT and other date/time formats by creating custom format files as described in the Django documentation.
See detailed answer here.
You can override DATE_FORMAT, DATETIME_FORMAT, TIME_FORMAT and other date/time formats when USE_L10N = True by creating custom format files as described in Django documentation.
In summary:
Set FORMAT_MODULE_PATH = 'yourproject.formats' in settings.py
Create the directory structure yourproject/formats/en (replacing en with the corresponding ISO 639-1 locale code if you are using a locale other than English) and add __init__.py files to all directories to make it a valid Python package
Add formats.py to the leaf directory, containing the format definitions you want to override, e.g. DATE_FORMAT = 'j. F Y'.
Example from an actual project here.
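As a rough illustration of the last step, a formats.py for the en locale might look like the sketch below. The path and the TIME_FORMAT value are assumptions for this example; the two other values are simply the formats the question wanted to use.

```python
# yourproject/formats/en/formats.py  (assumed path, following the steps above)

# ISO-like date, e.g. 2009-06-29
DATE_FORMAT = "Y-m-d"

# ISO-like date and time with UTC offset, e.g. 2009-06-29 19:30:00+0200
DATETIME_FORMAT = "Y-m-d H:i:sO"

# 24-hour time, e.g. 19:30:00 (illustrative value, not from the question)
TIME_FORMAT = "H:i:s"
```

These values are only picked up if FORMAT_MODULE_PATH in settings.py points at the package that contains the locale directories.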
This solves the particular problem that cannot be solved
with DATETIME_FORMAT (as it is ignored in the current Django
implementations despite the documentation). It is dirty too
and is similar to ayaz's answer, but less global: it will
only affect the admin site list view.
Right after the line
(date_format, datetime_format,time_format) = get_date_formats()
in file (Django is usually in folder Lib/site-packages in
the Python installation)
django/contrib/admin/templatetags/admin_list.py
overwrite the value of datetime_format (for a
models.DateTimeField in the model):
datetime_format = 'Y-m-d H:i:sO'
And for date-only fields:
date_format = 'Y-m-d'
Restarting the web server (e.g. the development server) or
logging out of the admin interface is NOT necessary for
this change to take effect. A simple refresh in the web browser
is all that is required.
The two setting directives should be defined in settings.py. Could you ensure that the same settings.py that you are editing is being read when you start the development server?
You could always drop to the Python interactive shell by running python manage.py shell, and run these commands to check whether the date/time format values are getting through fine:
from django.conf import settings
settings.DATE_FORMAT
settings.DATETIME_FORMAT
Ok, I forgot to look it up, but ticket #2203 deals with this. Unfortunately, the ticket remains in pending state.
I remember that for a project that used a certain trunk revision of the 0.97 branch of Django, I worked around that by overwriting the date_format and datetime_format values in the get_date_formats() function inside django/utils/translation/trans_real.py. It was dirty, but I had already been using a custom Django of sorts for that project, so I didn't see any harm in hacking it a trifle more.