Using UTF-8 encoded JSON fixture file in Django


I'm trying to write a JSON initial data fixture that will be loaded after every call to syncdb.
I placed an initial_data.json file in my mysite/myapp/fixtures directory:
[
  {
    "model": "myapp.Person",
    "pk": 1,
    "fields": {
      "first_name": "Tom",
      "last_name": "Yam"
    }
  }
]
Everything works when the file is encoded in ASCII, but when I save it as UTF-8 (I need non-ASCII characters), I get the following error:
Problem installing fixture 'initial_data.json': Traceback (most recent call last):
File "D:\Tom\DjangoEnv\Lib\site-packages\django\core\management\commands\loaddata.py", line 190, in handle
for obj in objects:
File "D:\Tom\DjangoEnv\Lib\site-packages\django\core\serializers\json.py", line 47, in Deserializer
raise DeserializationError(e)
DeserializationError: No JSON object could be decoded
According to the Django documentation, I need to set ensure_ascii=False when working with non-ASCII data and JSON serializers, but I can't figure out how to do it, since it's being called from the syncdb function.
Any ideas how to use a UTF-8 encoded JSON file as a fixture?

loaddata does not pass the ensure_ascii option to the serializer, so you have two options:
Convert the data to ASCII, with non-ASCII characters written as JSON \uXXXX escapes, before loading it, e.g.:
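import codecs
import json
# Decode the UTF-8 fixture ('utf-8-sig' also skips a byte-order mark if one is present)
with codecs.open('/tmp/tst.txt', 'r', 'utf-8-sig') as src:
    data = json.load(src)
# Re-write it with every non-ASCII character as a \uXXXX escape (ensure_ascii=True)
with open('/tmp/tst-encoded.txt', 'w') as dst:
    json.dump(data, dst, ensure_ascii=True, indent=4)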
Or write your own management command that passes ensure_ascii.
Hope this helps.
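To check that the first option yields something loaddata can read, here is a small sketch in Python 3 syntax; the fixture content and the non-ASCII surname "Yám" are made up for illustration. json.dumps with ensure_ascii=True (its default) writes every non-ASCII character as a \uXXXX escape, so the output is pure ASCII, and json.loads turns the escapes back into the original characters:

```python
import json

# Hypothetical fixture with a non-ASCII value, mirroring initial_data.json.
fixture = [
    {
        "model": "myapp.Person",
        "pk": 1,
        "fields": {"first_name": "Tom", "last_name": "Y\u00e1m"},  # "Yám"
    }
]

# ensure_ascii=True writes the escape \u00e1 instead of the raw character.
escaped = json.dumps(fixture, ensure_ascii=True, indent=2)

# The escaped text contains only ASCII characters ...
is_ascii = all(ord(ch) < 128 for ch in escaped)

# ... and decoding it restores the original data.
restored = json.loads(escaped)
print(is_ascii, restored == fixture)  # True True
```

Writing `escaped` to the fixture file in place of the original gives loaddata an ASCII-only file carrying the same data.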

Related

scheduler produces empty files

I'm using pythonanywhere for a simple scheduled task.
I want to download data from a link once a day and save it as CSV files. Later, once I have a decent time series, I'll figure out how I actually want to manage the data. It's not much data, so I don't need anything fancy like a database.
My script takes the data from the google sheets link, adds a log column and a time column, then writes a csv with the date in the filename.
It works exactly as I want it to when I run it manually in pythonanywhere, but the scheduler is just creating empty csv files albeit with the correct name.
Any ideas what's up? I don't understand the log file. Surely the error should happen when it is run manually?
script:
import pandas as pd
import time
import datetime
def write_today(df):
    date = time.strftime("%Y-%m-%d")
    df.to_csv('Properties_'+date+'.csv')
url = 'https://docs.google.com/spreadsheets/d/19h2GmLN-2CLgk79gVxcazxtKqS6rwW36YA-qvuzEpG4/export?format=xlsx'
df = pd.read_excel(url, header=1).rename(columns={'Unnamed: 1':'code'})
source = pd.read_excel(url).columns[0]
df['source'] = source
df['time'] = datetime.datetime.now()
write_today(df)
the scheduler is set up as so:
log file:
Traceback (most recent call last):
File "/home/abmoore/load_data.py", line 24, in <module>
write_today(df)
File "/home/abmoore/load_data.py", line 16, in write_today
df.to_csv('Properties_'+date+'.csv')
File "/usr/local/lib/python2.7/dist-packages/pandas/core/frame.py", line 1344, in to_csv
formatter.save()
File "/usr/local/lib/python2.7/dist-packages/pandas/formats/format.py", line 1551, in save
self._save()
File "/usr/local/lib/python2.7/dist-packages/pandas/formats/format.py", line 1638, in _save
self._save_header()
File "/usr/local/lib/python2.7/dist-packages/pandas/formats/format.py", line 1634, in _save_header
writer.writerow(encoded_labels)
UnicodeEncodeError: 'ascii' codec can't encode character u'\xa3' in position 0: ordinal not in range(128)

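Your problem there is the UnicodeEncodeError: you have some non-ASCII data in your spreadsheet (u'\xa3' is the pound sign, £), and on Python 2 the pandas to_csv function defaults to ASCII encoding. The traceback shows the scheduled task runs under Python 2.7 (/usr/local/lib/python2.7); if your manual runs use Python 3, where to_csv defaults to UTF-8, that would explain why they succeed. Because the file is opened before the header row fails to encode, the scheduler leaves behind an empty CSV with the correct name. Try specifying UTF-8 explicitly:
def write_today(df):
    filename = 'Properties_{date}.csv'.format(date=time.strftime("%Y-%m-%d"))
    df.to_csv(filename, encoding='utf8')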
http://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.to_csv.html
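A minimal sketch of the fix, assuming pandas is installed and using Python 3 syntax. The sample DataFrame and the extra `directory` parameter are hypothetical additions so the example can write into a temporary folder; the file is then read back to confirm the pound sign survives as UTF-8:

```python
import os
import tempfile
import time

import pandas as pd


def write_today(df, directory="."):
    """Write df to Properties_YYYY-MM-DD.csv inside `directory` as UTF-8."""
    filename = "Properties_{date}.csv".format(date=time.strftime("%Y-%m-%d"))
    path = os.path.join(directory, filename)
    # An explicit encoding replaces the ASCII default that pandas used on
    # Python 2, which cannot represent characters such as the pound sign.
    df.to_csv(path, encoding="utf-8")
    return path


# Hypothetical data with a pound sign in a column label, like the spreadsheet header.
df = pd.DataFrame({"\xa3 price": [250, 300], "code": ["A1", "B2"]})
path = write_today(df, tempfile.mkdtemp())

with open(path, "rb") as fh:
    raw = fh.read()  # raw bytes, to check the UTF-8 encoding of the pound sign
round_trip = pd.read_csv(path, encoding="utf-8", index_col=0)
print(list(round_trip.columns))
```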

Error translating Tornado template with gettext

I've got this site running on Tornado and its template engine, and I want to internationalize it, so I thought of using gettext to help me with that.
Since my site is already in Portuguese, my message.po (template) file has all msgids in Portuguese as well (example):
#: base.html:30 base.html:51
msgid "Início"
msgstr ""
It was generated with xgettext:
xgettext -i *.html -L Python --from-code UTF-8
Then I used Poedit to generate the translation file en_US.po and compiled it to en_US.mo.
Stored in my translation folder:
translation/en_US/LC_MESSAGES/site.mo
So far, so good.
I've created a really simple RequestHandler that would render and return the translated site.
import os
import logging
from tornado.web import RequestHandler
import tornado.locale as locale
LOG = logging.getLogger(__name__)
class SiteHandler(RequestHandler):
    def initialize(self):
        locale.load_gettext_translations(os.path.join(os.path.dirname(__file__), '../translations'), "site")
    def get(self, page):
        LOG.debug("PAGE REQUESTED: %s", page)
        self.render("site/%s.html" % page)
As far as I know that should work perfectly, but somehow I've encountered some issues:
1 - How do I tell Tornado that my template has its text in Portuguese so it won't go looking for a pt locale which I don't have?
2 - When asking for the site with en_US locale, it loads ok but when Tornado is going to translate, it throws an encoding exception.
TypeError: not all arguments converted during string formatting
ERROR:views.site:Could not load template
Traceback (most recent call last):
File "/Users/ademarizu/Dev/git/new_plugin/site/src/main/py/views/site.py", line 20, in get
self.render("site/%s.html" %page)
File "/Users/ademarizu/Dev/virtualEnvs/execute/lib/python2.7/site-packages/tornado/web.py", line 664, in render
html = self.render_string(template_name, **kwargs)
File "/Users/ademarizu/Dev/virtualEnvs/execute/lib/python2.7/site-packages/tornado/web.py", line 771, in render_string
return t.generate(**namespace)
File "/Users/ademarizu/Dev/virtualEnvs/execute/lib/python2.7/site-packages/tornado/template.py", line 278, in generate
return execute()
File "site/home_html.generated.py", line 11, in _tt_execute
_tt_tmp = _("Início") # site/base.html:30
File "/Users/ademarizu/Dev/virtualEnvs/execute/lib/python2.7/site-packages/tornado/locale.py", line 446, in translate
return self.gettext(message)
File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/gettext.py", line 406, in ugettext
return self._fallback.ugettext(message)
File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/gettext.py", line 407, in ugettext
return unicode(message)
UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in position 2: ordinal not in range(128)
Any help?
Ah, I'm running Python 2.7, by the way!

1 - How do I tell Tornado that my template has its text in Portuguese so it won't go looking for a pt locale which I don't have?
This is what the set_default_locale method is for. Call tornado.locale.set_default_locale('pt') (or pt_BR, etc) once at startup to tell tornado that your template source is in Portuguese.
2 - When asking for the site with en_US locale, it loads ok but when Tornado is going to translate, it throws an encoding exception.
Remember that in Python 2, string literals containing non-ASCII characters need to be marked as unicode: instead of _("Início"), use _(u"Início").
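The error in the traceback can be reproduced with the standard library alone. This sketch uses Python 3 syntax and no Tornado: it shows that the UTF-8 bytes of "Início" fail to decode as ASCII at position 2 with byte 0xc3, exactly as in the traceback, while decoding them as UTF-8 (the equivalent of writing a u"" literal in Python 2) works. With no catalog loaded, gettext's NullTranslations simply returns the message unchanged:

```python
import gettext

# "Início" as the UTF-8 bytes that a Python 2 byte-string literal would contain.
raw = "In\u00edcio".encode("utf-8")  # b'In\xc3\xadcio'

try:
    raw.decode("ascii")  # what Python 2's unicode(message) effectively does
except UnicodeDecodeError as exc:
    failed_pos = exc.start
    failed_byte = raw[exc.start]
    print(hex(failed_byte), failed_pos)  # 0xc3 2

# Decoding with the right codec avoids the error.
text = raw.decode("utf-8")

# Without a translation catalog, gettext falls back to the original msgid.
fallback = gettext.NullTranslations().gettext(text)
print(fallback == "In\u00edcio")  # True
```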

couchbase python sdk ascii exception

First of all, this is the exception
Traceback (most recent call last):
File "C:\Python27\lib\site-packages\scrapy\middleware.py", line 62, in _process_chain
return process_chain(self.methods[methodname], obj, *args)
File "C:\Python27\lib\site-packages\scrapy\utils\defer.py", line 65, in process_chain
d.callback(input)
File "C:\Python27\lib\site-packages\twisted\internet\defer.py", line 383, in callback
self._startRunCallbacks(result)
File "C:\Python27\lib\site-packages\twisted\internet\defer.py", line 491, in _startRunCallbacks
self._runCallbacks()
--- <exception caught here> ---
File "C:\Python27\lib\site-packages\twisted\internet\defer.py", line 578, in _runCallbacks
current.result = callback(current.result, *args, **kw)
File "D:\ScrapyProjects\General_Spider_code_version_4\General_Spider_code_version_4\pipelines.py", line 14, in process_item
connection.set(fileName, dict(item)) #write the item to the couchbase database
File "C:\Python27\lib\site-packages\couchbase-1.2.5-py2.7-win-amd64.egg\couchbase\connection.py", line 331, in set
persist_to, replicate_to)
File "C:\Python27\lib\site-packages\couchbase-1.2.5-py2.7-win-amd64.egg\couchbase\_bootstrap.py", line 99, in _json_encode_wrapper
return json.dumps(*args, ensure_ascii=False, separators=(',', ':'))
File "C:\Python27\lib\json\__init__.py", line 250, in dumps
sort_keys=sort_keys, **kw).encode(obj)
File "C:\Python27\lib\json\encoder.py", line 210, in encode
return ''.join(chunks)
couchbase.exceptions.ValueFormatError: <Couldn't encode value, inner_cause='ascii' codec can't decode byte 0xe2 in position 5: ordinal not in range(128), C Source=(src\convert.c,131), OBJ={'bathrooms': 1.0, 'furnished': 'No', 'ad_title': 'Large Studio For Rent in IMPZ just 45K/4chqs(KK)', 'agent_fees': -1, 'size': 550.0, 'category': 'Apartment', 'company_rera_number': '12913', 'agent_company': 'AL ANAS REAL ESTATE BROKER', 'ded_licence_number': '700590', 'source': 'dubizzleproperty', 'location': 'UAE \xe2\x80\xaa>\xe2\x80\xaa Dubai \xe2\x80\xaa>\xe2\x80\xaa IMPZ International Media Production Zone ; 3.1 km from Meadows Town Centre \xc2\xa0', 'image_links': [u'http://87421a79fde09fda7e57-79445249ccb41a60f7b99c8ef6df8604.r12.cf3.rackcdn.com/4_async/2015/2/18/73ff34e2a38c7b104401c9e5c54b03628971053f/main.jpeg', u'http://87421a79fde09fda7e57-79445249ccb41a60f7b99c8ef6df8604.r12.cf3.rackcdn.com/4_async/2015/2/18/24ec831f6b4afb47fecc1c3e0991cf3090c90c24/main.jpeg', u'http://87421a79fde09fda7e57-79445249ccb41a60f7b99c8ef6df8604.r12.cf3.rackcdn.com/4_async/2015/2/18/77fee11394090aaea2d668cfe2754b92d6e36264/main.jpeg', u'http://87421a79fde09fda7e57-79445249ccb41a60f7b99c8ef6df8604.r12.cf3.rackcdn.com/4_async/2015/2/18/5d4113319ccbabcdd65b0ffe7302da59b374b5fe/main.jpeg', u'http://87421a79fde09fda7e57-79445249ccb41a60f7b99c8ef6df8604.r12.cf3.rackcdn.com/4_async/2015/2/18/8070689f309759d5860e97aa35d3f0eac425dc1d/main.jpeg', u'http://87421a79fde09fda7e57-79445249ccb41a60f7b99c8ef6df8604.r12.cf3.rackcdn.com/4_async/2015/2/18/8e86702847e69d485d147629dd2e48e1ad831e63/main.jpeg'], 'latitude': -1, 'description': 'Central A/C & Heating , Balcony , Shared Pool , Built in Wardrobes , Walk-in Closet , Shared Gym , Security , Built in Kitchen Appliances', 'bedrooms': 'Studio', 'rent_is_paid': 'Quarterly', 'action': 'Rent', 'link': 
'http://dubai.dubizzle.com/property-for-rent/residential/apartmentflat/2015/2/18/large-studio-for-rent-in-impz-just-45k4chq-2/?back=ZHViYWkuZHViaXp6bGUuY29tL3Byb3BlcnR5LWZvci1yZW50L3Jlc2lkZW50aWFsL2FwYXJ0bWVudGZsYXQv&pos=1', 'longitude': -1, 'property_reference': '', 'yearly_cost': 45000.0, 'agent_mobile': -1, 'posting_date': '2015-02-19'}>
I am trying to store a dictionary in Couchbase. I am using this code
connection.set(fileName, dict(item))
to convert the item to a dictionary. As you can see from the error message, I have Unicode values, which according to the Couchbase Python SDK should be fine. Could you help me, please?

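Your values are not unicode. Keep in mind that a str object containing valid UTF-8 byte sequences (such as '\xe2\x80\xaa' in your location field) is not automatically "Unicode" in Python parlance. You need to make sure the strings are actual unicode objects, for example by calling .decode('utf-8') on them.
This seems to work with the plain json.dumps() function (without any arguments), whereas the Python client passes ensure_ascii=False by default to reduce the data size (JSON itself can be UTF-8 encoded and is not limited to ASCII). With ensure_ascii=False, Python 2's json module leaves byte strings undecoded, so joining them with the unicode values (such as your image_links) forces an implicit ASCII decode, which is the ''.join(chunks) failure in your traceback.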
Thus, a workaround may be to set your own encoding function for JSON which does not pass the ensure_ascii parameter; like so:
import json
import couchbase
couchbase.set_json_converters(json.dumps, json.loads)
This workaround is not recommended, though, as it may inflate your document size slightly.
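The first suggestion, making every byte string a real unicode object before calling connection.set, could be done with a small recursive helper. This is a sketch under assumptions: `decode_values` is a hypothetical name, it assumes all byte strings are UTF-8, and it is written in Python 3 syntax, where byte strings are `bytes` and text is `str` (on Python 2 the check would be against `str`, producing `unicode`). Tuples come back as lists, which JSON would do anyway:

```python
import json


def decode_values(obj, encoding="utf-8"):
    """Recursively decode byte strings inside dicts/lists into text (hypothetical helper)."""
    if isinstance(obj, bytes):
        return obj.decode(encoding)
    if isinstance(obj, dict):
        return {decode_values(k, encoding): decode_values(v, encoding) for k, v in obj.items()}
    if isinstance(obj, (list, tuple)):
        return [decode_values(v, encoding) for v in obj]
    return obj  # numbers, None and text are left unchanged


# Hypothetical item shaped like the scraped data in the question.
item = {
    "location": b"UAE \xe2\x80\xaa>\xe2\x80\xaa Dubai",
    "bathrooms": 1.0,
    "image_links": [b"http://example.com/main.jpeg"],
}

clean = decode_values(item)
# Only text reaches the encoder now, so ensure_ascii=False is safe.
encoded = json.dumps(clean, ensure_ascii=False, separators=(",", ":"))
print(encoded)
```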

Different behaviour for io in pickle with string content

When working with pickled data, I encountered different behavior between io.open and __builtin__.open. Consider the following simple example:
import pickle
payload = 'foo'
fn = 'test.pickle'
pickle.dump(payload, open(fn, 'w'))
a = pickle.load(open(fn, 'r'))
This works as expected. But running this code here:
import pickle
import io
payload = 'foo'
fn = 'test.pickle'
pickle.dump(payload, io.open(fn, 'w'))
a = pickle.load(io.open(fn, 'r'))
gives the following Traceback:
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "D:\WinPython-32bit-2.7.8.1\python-2.7.8\lib\site-packages\spyderlib\widgets\externalshell\sitecustomize.py", line 580, in runfile
execfile(filename, namespace)
File "D:/**.py", line 15, in <module>
pickle.dump(payload, io.open(fn, 'w'))
File "D:\WinPython-32bit-2.7.8.1\python-2.7.8\lib\pickle.py", line 1370, in dump
Pickler(file, protocol).dump(obj)
File "D:\WinPython-32bit-2.7.8.1\python-2.7.8\lib\pickle.py", line 224, in dump
self.save(obj)
File "D:\WinPython-32bit-2.7.8.1\python-2.7.8\lib\pickle.py", line 286, in save
f(self, obj) # Call unbound method with explicit self
File "D:\WinPython-32bit-2.7.8.1\python-2.7.8\lib\pickle.py", line 488, in save_string
self.write(STRING + repr(obj) + '\n')
TypeError: must be unicode, not str
As I want to be future-compatible, how can I circumvent this misbehavior? Or what else am I doing wrong here?
I stumbled over this when dumping dictionaries with keys of type string.
My python version is:
'2.7.8 (default, Jun 30 2014, 16:03:49) [MSC v.1500 32 bit (Intel)]'

The difference is not surprising, because io.open() explicitly deals with Unicode strings when using text mode. The documentation is quite clear about this:
Note: Since this module has been designed primarily for Python 3.x, you have to be aware that all uses of “bytes” in this document refer to the str type (of which bytes is an alias), and all uses of “text” refer to the unicode type. Furthermore, those two types are not interchangeable in the io APIs.
and
Python distinguishes between files opened in binary and text modes, even when the underlying operating system doesn’t. Files opened in binary mode (including 'b' in the mode argument) return contents as bytes objects without any decoding. In text mode (the default, or when 't' is included in the mode argument), the contents of the file are returned as unicode strings, the bytes having been first decoded using a platform-dependent encoding or using the specified encoding if given.
You need to open the files in binary mode. The fact that it worked with the built-in open() at all is more luck than wisdom; if your pickles contained \n and/or \r bytes, loading them could well fail, since text mode on Windows translates line endings. The default pickle protocol in Python 2 (protocol 0) happens to be text-based, but its output should still be treated as binary.
In all cases, use binary mode when writing or reading pickle data:
pickle.dump(payload, open(fn, 'wb'))
a = pickle.load(open(fn, 'rb'))
or
pickle.dump(payload, io.open(fn, 'wb'))
a = pickle.load(io.open(fn, 'rb'))
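A slightly fuller version of the binary-mode fix, using with blocks so each file is closed deterministically before the next step. The payload (a dict with string keys, as in the question) and the temporary path are illustrative; the code uses Python 3 syntax but should also run unchanged on Python 2.7:

```python
import io
import os
import pickle
import tempfile

# Hypothetical payload: a dict with string keys, as mentioned in the question.
payload = {"foo": "bar", "answer": 42}
fn = os.path.join(tempfile.mkdtemp(), "test.pickle")

# Binary mode on both sides, so io.open never tries to encode or decode text.
with io.open(fn, "wb") as f:
    pickle.dump(payload, f)

with io.open(fn, "rb") as f:
    loaded = pickle.load(f)

print(loaded == payload)  # True
```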

json proper encoding and decoding not working with flask, python

I'm taking a JSON object as text input from an HTML form and trying to work with it. But when I try the following code, I get an error (the page does not render) when encoding and decoding the JSON.
@app.route('/', methods=['POST'])
def my_form_post():
    text = request.form['text']
    #getting text-input as text = {'a':'1','b':'2'}
    json_input = json.dumps(text)
    ordered_json = json.loads(text, object_pairs_hook=ordereddict.OrderedDict)
    print ordered_json
    processed_text = htmlConvertor(ordered_json)
    #rep(jso)
    return render_template("my-form.html",processed_text=processed_text)
But when I try to do the same with a local JSON variable jso, everything works fine. When I provide the same input through the HTML form, it gives an error, and I can't even see the error; the page only displays Internal Server Error:
The server encountered an internal error and was unable to complete your request. Either the server is overloaded or there is an error in the application.
@app.route('/', methods=['POST'])
def my_form_post():
    jso = '''{
        "glossary": {
            "title": "example glossary",
            "GlossDiv": {
                "title": "S",
                "GlossList": {
                    "GlossEntry": {
                        "ID": "SGML",
                        "SortAs": "SGML",
                        "GlossTerm": "Standard Generalized Markup Language",
                        "Acronym": "SGML",
                        "Abbrev": "ISO 8879:1986",
                        "GlossDef": {
                            "para": "A meta-markup language, used to create markup languages such as DocBook.",
                            "GlossSeeAlso": ["GML", "XML"]
                        },
                        "GlossSee": "markup"
                    }
                }
            }
        }
    }'''
    json_input = json.dumps(jso)
    ordered_json = json.loads(jso, object_pairs_hook=ordereddict.OrderedDict)
    print ordered_json
    processed_text = htmlConvertor(ordered_json)
    #rep(jso)
    return render_template("my-form.html",processed_text=processed_text)
UPDATE:
Everything is working fine now, except for integers.
For example:
{"name":"yo","price":"250"}
works perfectly but
{"name":"yo","price":250}
does not.
What's the solution for that? Is there a specific fix, or would I have to check for integers in Python and convert them to strings before applying any JSON-related functions?

Not sure if this is your problem, but {'a':'1','b':'2'} is not a valid JSON object because of the single quotes:
>>> json.loads("{'a':'1','b':'2'}")
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/usr/lib64/python2.6/json/__init__.py", line 307, in loads
return _default_decoder.decode(s)
File "/usr/lib64/python2.6/json/decoder.py", line 319, in decode
obj, end = self.raw_decode(s, idx=_w(s, 0).end())
File "/usr/lib64/python2.6/json/decoder.py", line 336, in raw_decode
obj, end = self._scanner.iterscan(s, **kw).next()
File "/usr/lib64/python2.6/json/scanner.py", line 55, in iterscan
rval, next_pos = action(m, context)
File "/usr/lib64/python2.6/json/decoder.py", line 171, in JSONObject
raise ValueError(errmsg("Expecting property name", s, end))
ValueError: Expecting property name: line 1 column 1 (char 1)
If instead you use double quotes everything works fine:
>>> json.loads("{\"a\":\"1\",\"b\":\"2\"}")
{u'a': u'1', u'b': u'2'}
Also note that to get stack traces instead of HTTP 500 errors when there is an exception, you have to start your Flask server as follows:
app.run(debug = True)
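The parsing step can also be isolated and tested outside Flask. This is a sketch under assumptions: `parse_form_json` is a hypothetical helper, it uses `collections.OrderedDict` from the standard library (available since Python 2.7) instead of the `ordereddict` backport, and returning None is meant to let the view show an error message rather than letting the exception become a 500. It also shows that the parser itself handles integer values such as 250, as long as the text is valid JSON:

```python
import json
from collections import OrderedDict


def parse_form_json(text):
    """Parse JSON text into an OrderedDict, or return None if it is not valid JSON."""
    try:
        return json.loads(text, object_pairs_hook=OrderedDict)
    except ValueError:  # json.JSONDecodeError is a subclass of ValueError in Python 3
        return None


good = parse_form_json('{"name": "yo", "price": 250}')
bad = parse_form_json("{'a':'1','b':'2'}")  # single quotes are not valid JSON
print(list(good.items()))  # [('name', 'yo'), ('price', 250)]
print(bad)  # None
```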
