Windows line breaks in plain text rendered using Django template - django

I am attempting to export data to a plain text file using Django 1.10 (Python 3.5) views/templates. This text file must be OS agnostic in that Windows users ought to have no trouble viewing the file. Unfortunately, when Django renders my template which has \r\n (Windows friendly) line breaks in the file, the line breaks are magically converted into \n (Mac/Linux friendly) line breaks. What gives?
Here's how I'm attempting to render the plain text file:
from django.http import HttpResponse
from django.template import loader, Context

def myview(request):
    my_data = get_my_data()
    response = HttpResponse(content_type='text/plain')
    response['Content-Disposition'] = 'attachment; filename="export.txt"'
    template = loader.get_template('export.txt')  # <- this file has \r\n line breaks
    context = Context({'data': my_data})
    response.write(template.render(context))
    return response
Upon downloading the exported file using Chrome or Edge in Windows and opening in Notepad, the line breaks aren't respected, and upon viewing the file in Notepad++ (and showing EOL characters), only the \n character is there! Any help would be greatly appreciated :)

I have faced the same issue; unfortunately, this seems to happen somewhere in Django's template loader/engine. If you instantiate a Template directly, without going through get_template() (or render_to_string() and similar functions, which make the same calls), this does not happen. Here is a simplified version of my test case:
>>> from django import template
>>> t = template.Template('{% for line in dataset %}{{ line.field1 }}\t{{ line.field2 }}\r\n{{ line.field3 }}\t{{ line.field4 }}\r\n{% endfor %}')
>>> # dataset must be a list of rows; looping over a bare dict would yield its keys
>>> dataset = [{'field1': 'F1', 'field2': 'F2', 'field3': 'F3', 'field4': 'F4'}]
>>> c = template.Context({'dataset': dataset})
>>> t.render(c)
This outputs the template correctly with \r\n in place:
u'F1\tF2\r\nF3\tF4\r\n'
However if I load the same template from a file, using get_template() like you do, it strips them:
>>> from django.template import loader
>>> t = loader.get_template('test.txt')
>>> t.render(c)
u'F1\tF2\nF3\tF4\n'
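One plausible explanation, offered here as an assumption rather than something traced through Django's source, is that the file-based loader reads the template in text mode, and Python's text mode translates \r\n to \n on read unless newline='' is passed. The sketch below only demonstrates that Python behaviour with a temporary file; the helper name read_both_ways is made up for illustration and is not part of Django.

```python
import io
import os
import tempfile


def read_both_ways(raw_bytes):
    """Write raw_bytes to a temp file, then read it back in two text modes."""
    fd, path = tempfile.mkstemp(suffix=".txt")
    try:
        with os.fdopen(fd, "wb") as handle:
            handle.write(raw_bytes)
        # Default text mode: universal newlines, so \r\n is translated to \n.
        with io.open(path, "r", encoding="utf-8") as handle:
            translated = handle.read()
        # newline="" turns the translation off and keeps \r\n as written.
        with io.open(path, "r", encoding="utf-8", newline="") as handle:
            untouched = handle.read()
    finally:
        os.remove(path)
    return translated, untouched


translated, untouched = read_both_ways(b"F1\tF2\r\nF3\tF4\r\n")
print(repr(translated))
print(repr(untouched))
```

If this is indeed the cause, the \r\n sequences are already gone by the time the template engine sees the source, which would match the difference between the two sessions above.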
I tried wading through the Django code to identify where this occurs but, being short on time, I ended up with a quick-and-dirty fix instead:
>>> t = loader.get_template('test.txt')
>>> out = t.render(c).replace('\n', '\r\n')
Beware: this also replaces any \n in the actual data fields, so use it with caution.
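A slightly more defensive variant of the same workaround is sketched below. It normalizes every kind of line break in one pass, so text that already contains \r\n is not turned into \r\r\n. The function name to_crlf is hypothetical, and the caveat about line breaks inside data fields still applies.

```python
import re


def to_crlf(text):
    """Normalize \\r\\n, bare \\r and bare \\n to \\r\\n in a single pass."""
    # \r\n is listed first so an existing CRLF pair is matched as one unit.
    return re.sub(r"\r\n|\r|\n", "\r\n", text)


print(repr(to_crlf("F1\tF2\nF3\tF4\r\n")))
```

In the view this would be applied to the rendered string before it is written to the response, for example response.write(to_crlf(template.render(context))).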

Related

Reading multiple files in a directory with pyyaml

I'm trying to read all YAML files in a directory, but I am having trouble. I am using Python 2.7 (and cannot switch to 3), and all of my files are UTF-8 (and I need to keep them that way).
import os
import yaml
import codecs

def yaml_reader(filepath):
    with codecs.open(filepath, "r", encoding='utf-8') as file_descriptor:
        data = yaml.load_all(file_descriptor)
    return data

def yaml_dump(filepath, data):
    with open(filepath, 'w') as file_descriptor:
        yaml.dump(data, file_descriptor)

if __name__ == "__main__":
    filepath = os.listdir(os.getcwd())
    data = yaml_reader(filepath)
    print data
When I run this code, python gives me the message:
TypeError: coercing to Unicode: need string or buffer, list found.
I want this program to show the content of the files. Can anyone help me?
I guess the issue is with filepath.
os.listdir(os.getcwd()) returns a list of all the entries in the directory, so you are passing a list to codecs.open() instead of a single filename.
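As a minimal sketch of the point above, the directory listing can be turned into individual paths before anything is opened. The helper name list_yaml_files and the .yaml/.yml filter are assumptions made for illustration; the original question does not say how the files are named.

```python
import os


def list_yaml_files(directory):
    """Return full paths of the regular files in directory that look like YAML."""
    paths = []
    for name in sorted(os.listdir(directory)):
        full_path = os.path.join(directory, name)
        # os.listdir() returns bare names, including sub-directories.
        if os.path.isfile(full_path) and name.endswith((".yaml", ".yml")):
            paths.append(full_path)
    return paths


for path in list_yaml_files(os.getcwd()):
    print(path)
```

Each returned path is a single string, which is what open() or codecs.open() expects.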
There are multiple problems with your code, apart from the fact that it is invalid Python the way you formatted it. Properly indented, the reader looks like this:
def yaml_reader(filepath):
    with codecs.open(filepath, "r", encoding='utf-8') as file_descriptor:
        data = yaml.load_all(file_descriptor)
    return data
However, it is not necessary to do the decoding yourself; PyYAML is perfectly capable of processing UTF-8:
def yaml_reader(filepath):
    with open(filepath, "rb") as file_descriptor:
        # load_all() is lazy, so consume it while the file is still open
        data = list(yaml.load_all(file_descriptor))
    return data
I hope you realise you're trying to load multiple documents: load_all() yields documents one by one, so after wrapping it in list() you always get a list in data, even if your file contains only one document.
Then the line:
filepath = os.listdir(os.getcwd())
gives you a list of files, so you need to do:
filepath = os.listdir(os.getcwd())[0]
or decide in some other way, which of the files you want to open. If you want to combine all files (assuming they are YAML) in one big YAML file, you need to do:
if __name__ == "__main__":
    data = []
    for filepath in os.listdir(os.getcwd()):
        data.extend(yaml_reader(filepath))
    print data
And your dump routine would need to change to:
def yaml_dump(filepath, data):
    with open(filepath, 'wb') as file_descriptor:
        yaml.dump(data, file_descriptor, allow_unicode=True, encoding='utf-8')
However, this all brings you to the biggest problem: you are using PyYAML, which will mangle your YAML, dropping flow style, comments, anchor names, special int/float representations, quotes around scalars, etc. Apart from that, PyYAML has not been updated to support YAML 1.2 documents (which has been the standard since 2009). I recommend you switch to ruamel.yaml (disclaimer: I am the author of that package), which supports YAML 1.2 and leaves comments etc. in place.
And even if you are bound to Python 2, you should use Python 3-like syntax, e.g. for print, which you can get with from __future__ imports.
So I recommend you do:
pip install pathlib2 ruamel.yaml
and then use:
from __future__ import absolute_import, unicode_literals, print_function
from pathlib2 import Path  # on Python 2 the pathlib backport is imported as pathlib2
from ruamel.yaml import YAML

if __name__ == "__main__":
    data = []
    yaml = YAML()
    yaml.preserve_quotes = True
    for filepath in Path('.').glob('*.yaml'):
        data.extend(yaml.load_all(filepath))
    print(data)
    yaml.dump(data, Path('your_output.yaml'))

PyYAML shows "ScannerError: mapping values are not allowed here" in my unittest

I am trying to test a number of Python 2.7 classes using unittest.
Here is the exception:
ScannerError: mapping values are not allowed here
in "<unicode string>", line 3, column 32:
... file1_with_path: '../../testdata/concat1.csv'
Here is the example the error message relates to:
class TestConcatTransform(unittest.TestCase):
    def setUp(self):
        filename1 = os.path.dirname(os.path.realpath(__file__)) + '/../../testdata/concat1.pkl'
        self.df1 = pd.read_pickle(filename1)
        filename2 = os.path.dirname(os.path.realpath(__file__)) + '/../../testdata/concat2.pkl'
        self.df2 = pd.read_pickle(filename2)
        self.yamlconfig = u'''
        --- !ConcatTransform
        file1_with_path: '../../testdata/concat1.csv'
        file2_with_path: '../../testdata/concat2.csv'
        skip_header_lines: [0]
        duplicates: ['%allcolumns']
        outtype: 'dataframe'
        client: 'testdata'
        addcolumn: []
        '''
        self.testconcat = yaml.load(self.yamlconfig)
What is the problem?
Something not clear to me is that the directory structure I have is:
app
app/etl
app/tests
The ConcatTransform is in app/etl/concattransform.py and TestConcatTransform is in app/tests. I import ConcatTransform into the TestConcatTransform unittest with this import:
from app.etl import concattransform
How does PyYAML associate that class with the one defined in yamlconfig?
A YAML document can start with a document start marker ---, but that has to be at the beginning of a line, and yours is indented eight positions on the second line of the input. That causes the --- to be interpreted as the beginning of a multi-line plain (i.e. non-quoted) scalar, and within such a scalar you cannot have ": " (a colon followed by a space); that sequence is only allowed in quoted scalars. And if your document does not have a mapping or sequence at the root level, as yours doesn't, the whole document can only consist of a single scalar.
If you want to keep your sources nicely indented like you have now, I recommend you use dedent from textwrap.
The following runs without error:
import ruamel.yaml
from textwrap import dedent

yaml_config = dedent(u'''\
    --- !ConcatTransform
    file1_with_path: '../../testdata/concat1.csv'
    file2_with_path: '../../testdata/concat2.csv'
    skip_header_lines: [0]
    duplicates: ['%allcolumns']
    outtype: 'dataframe'
    client: 'testdata'
    addcolumn: []
''')

yaml = ruamel.yaml.YAML()
data = yaml.load(yaml_config)
You should get into the habit of putting a backslash (\) directly after the opening triple quotes, so that your YAML document does not start with an empty line. Had you done that, the error would have pointed at line 2 instead of line 3.
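The effect of the backslash and of dedent can be checked with the standard library alone, without involving a YAML parser. The following is a small sketch using a shortened, two-line version of the configuration; the variable names are illustrative only.

```python
from textwrap import dedent

# As in the question: no backslash, and the content is indented in the source.
without_backslash = u'''
        --- !ConcatTransform
        client: 'testdata'
        '''

# With the backslash and dedent: no leading empty line, no leading spaces.
with_backslash = dedent(u'''\
    --- !ConcatTransform
    client: 'testdata'
''')

print(repr(without_backslash.splitlines()[:2]))
print(repr(with_backslash.splitlines()[:2]))
```

In the first string the document start marker sits on the second line behind eight spaces; in the second it is at the very beginning of the text, which is where a YAML parser expects it.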
During loading, the YAML parser encounters the tag !ConcatTransform. A constructor for the object is probably registered with the PyYAML loader during the import, associating that tag with the class by means of PyYAML's add_constructor.
Unfortunately, they registered their constructor with the default, non-safe loader. That is not necessary: they could have registered it with the SafeLoader, and thereby not forced users to risk problems with uncontrolled input.

jinja2 ignoring last new line?

I'm creating a script that generates specific files using jinja2 as the template engine. It creates the file I expect, except for the last line. In the template I have specified an empty last line, but when the file is created it does not have that line.
Template looks like this:
# -*- coding: utf-8 -*-
from openerp import fields, models, api
class {{ class_name }}(models.{{ model_type }}):
    """{{ class_docstring }}"""
    _{{ def_type }} = '{{ model }}'
# Here is actually empty line. Note comment does not exist on template. It is just empty line.
So in total there are 10 lines defined in this template, but a file created from it has only 9 lines (the last line is not created).
Is this expected behavior, or should it create that last line as I expect?
Here are the data and methods that handle rendering:
import os
from jinja2 import Environment, FileSystemLoader

PATH = os.path.dirname(os.path.abspath(__file__))
TEMPLATE_ENVIRONMENT = Environment(
    autoescape=True,
    loader=FileSystemLoader(os.path.join(PATH, 'templates')),
    trim_blocks=False)
...
...
@staticmethod
def render_template(t, context):
    # For now it only supports standard templates.
    template_filename = TEMPLATE_FILES_MAPPING[t]
    return TEMPLATE_ENVIRONMENT.get_template(template_filename).render(
        context)
The keep_trailing_newline option may be what you're looking for:
By default, Jinja2 also removes trailing newlines. To keep single
trailing newlines, configure Jinja to keep_trailing_newline.
You can add it to the environment:
TEMPLATE_ENVIRONMENT = Environment(
    ...
    keep_trailing_newline=True)
Another option is to finish the template with two newlines and let jinja2 strip one of them:
File contents
File contents
...
# Actual empty line (without this comment) which is kept by jinja2
# Actual empty line (without this comment) which is stripped by jinja2

python3 convert str to bytes-like obj without use encode

I wrote an HTTP server to serve HTML files on Python 2.7 and Python 3.5.
def do_GET(self):
    ...
    # if resource is an API endpoint
    data = json.dumps({'message': ['thanks for your answer']})
    # if resource is a file name
    with open(resource, 'rb') as f:
        data = f.read()
    self.send_response(response)
    self.send_header('Access-Control-Allow-Origin', '*')
    self.end_headers()
    self.wfile.write(data)  # this line raises TypeError: a bytes-like object is required, not 'str'
The code works in Python 2.7, but in Python 3 it raises the error above.
I could use bytearray(data, 'utf-8') to convert the str to bytes, but then the HTML shown in the browser is changed.
My questions:
How can I support both Python 2 and Python 3 without using the 2to3 tool and without changing the file's encoding?
Is there a better way to read a file and send its content to the client that works the same way in Python 2 and Python 3?
Thanks in advance.
You just have to open your file in binary mode, not in text mode:
with open(resource, "rb") as f:
    data = f.read()
Then data is a bytes object in Python 3 and a str in Python 2, and it works for both versions.
As a positive side effect, this code also works on a Windows box; opened in text mode, binary files such as images would be corrupted there by the line-ending conversion.
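Binary mode covers the file branch, but the json.dumps() branch in the question still produces text on Python 3. One way to handle both branches, sketched here with a hypothetical helper named to_bytes and assuming UTF-8 is the encoding wanted on the wire, is to encode only when the value is not bytes already:

```python
import json


def to_bytes(data, encoding="utf-8"):
    """Return data unchanged if it is already bytes, otherwise encode it."""
    # On Python 2, str is bytes, so plain str values pass through untouched.
    if isinstance(data, bytes):
        return data
    return data.encode(encoding)


payload = to_bytes(json.dumps({"message": ["thanks for your answer"]}))
print(type(payload))
```

In the handler this would be used as self.wfile.write(to_bytes(data)), so file contents read in binary mode go out byte for byte and only text is encoded.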

Error translating Tornado template with gettext

I've got a site running on top of Tornado and its template engine that I want to internationalize, so I thought of using gettext to help me with that.
Since my site is already in Portuguese, my message.po (template) file has all its msgids in Portuguese as well (example):
#: base.html:30 base.html:51
msgid "Início"
msgstr ""
It was generated with xgettext:
xgettext -i *.html -L Python --from-code UTF-8
Later I used Poedit to generate the translation file en_US.po and later compile it as en_US.mo.
Stored in my translation folder:
translation/en_US/LC_MESSAGES/site.mo
So far, so good.
I've created a really simple RequestHandler that would render and return the translated site.
import os
import logging
from tornado.web import RequestHandler
import tornado.locale as locale

LOG = logging.getLogger(__name__)

class SiteHandler(RequestHandler):
    def initialize(self):
        locale.load_gettext_translations(os.path.join(os.path.dirname(__file__), '../translations'), "site")

    def get(self, page):
        LOG.debug("PAGE REQUESTED: %s", page)
        self.render("site/%s.html" % page)
As far as I know that should work perfectly, but somehow I've encountered some issues:
1 - How do I tell Tornado that my template has its text in Portuguese so it won't go looking for a pt locale which I don't have?
2 - When asking for the site with en_US locale, it loads ok but when Tornado is going to translate, it throws an encoding exception.
TypeError: not all arguments converted during string formatting
ERROR:views.site:Could not load template
Traceback (most recent call last):
  File "/Users/ademarizu/Dev/git/new_plugin/site/src/main/py/views/site.py", line 20, in get
    self.render("site/%s.html" %page)
  File "/Users/ademarizu/Dev/virtualEnvs/execute/lib/python2.7/site-packages/tornado/web.py", line 664, in render
    html = self.render_string(template_name, **kwargs)
  File "/Users/ademarizu/Dev/virtualEnvs/execute/lib/python2.7/site-packages/tornado/web.py", line 771, in render_string
    return t.generate(**namespace)
  File "/Users/ademarizu/Dev/virtualEnvs/execute/lib/python2.7/site-packages/tornado/template.py", line 278, in generate
    return execute()
  File "site/home_html.generated.py", line 11, in _tt_execute
    _tt_tmp = _("Início") # site/base.html:30
  File "/Users/ademarizu/Dev/virtualEnvs/execute/lib/python2.7/site-packages/tornado/locale.py", line 446, in translate
    return self.gettext(message)
  File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/gettext.py", line 406, in ugettext
    return self._fallback.ugettext(message)
  File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/gettext.py", line 407, in ugettext
    return unicode(message)
UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in position 2: ordinal not in range(128)
Any help?
Ah, I'm running python 2.7 btw!
1 - How do I tell Tornado that my template has its text in Portuguese so it won't go looking for a pt locale which I don't have?
This is what the set_default_locale method is for. Call tornado.locale.set_default_locale('pt') (or pt_BR, etc) once at startup to tell tornado that your template source is in Portuguese.
2 - When asking for the site with en_US locale, it loads ok but when Tornado is going to translate, it throws an encoding exception.
Remember that in Python 2, strings containing non-ASCII characters need to be marked as unicode. Instead of _("Início"), use _(u"Início").
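The numbers in the error message line up with this explanation. As a small sketch (the helper name first_non_ascii is made up for illustration), encoding "Início" as UTF-8 gives the bytes a Python 2 byte string literal would hold in a UTF-8 source file, and decoding those bytes as ASCII fails at exactly the byte the traceback reports:

```python
from __future__ import print_function, unicode_literals

text = "In\u00edcio"        # the unicode string "Início"
raw = text.encode("utf-8")  # the bytes behind a plain Python 2 "Início" literal


def first_non_ascii(raw_bytes):
    """Return (position, byte) of the first byte that ASCII cannot decode."""
    try:
        raw_bytes.decode("ascii")
    except UnicodeDecodeError as exc:
        return exc.start, raw_bytes[exc.start:exc.start + 1]
    return None


position, offending = first_non_ascii(raw)
print(position, repr(offending))
```

Decoding the same bytes as UTF-8 succeeds, which is why passing a unicode object to the translation function avoids the implicit ASCII decode in gettext.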