Support German translation in CSV export in Django

I have a German translation on my website and I want to export the data as CSV. Unfortunately it prints "â€Ernährungssouveränität" instead of the German word "Ernährungssouveränität".
What can I do?
I tried utf-8, but it didn't work:
response = HttpResponse(content_type="text/csv", charset="utf-8")
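A minimal sketch of one common fix, assuming the file is being opened in Excel: without a byte order mark, Excel tends to fall back to a legacy code page instead of UTF-8, which produces exactly this kind of mojibake. The view below is a placeholder, not the original one:
import codecs
import csv

from django.http import HttpResponse

def export_csv(request):
    response = HttpResponse(content_type="text/csv", charset="utf-8")
    response["Content-Disposition"] = 'attachment; filename="export.csv"'
    response.write(codecs.BOM_UTF8)                 # lets Excel detect UTF-8
    writer = csv.writer(response)
    writer.writerow(["Ernährungssouveränität"])     # placeholder row
    return response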

Related

Flask's send_file truncating Burmese characters

I've created a PDF file from a template, edited it via pdfrw, and I want to return the output PDF to the user. The problem is that the users of my form will be from Myanmar (using Burmese characters). The output PDF file displays the Burmese characters correctly, but it seems that Flask's send_file is truncating/not displaying the fields with Burmese values. I tried entering characters in Japanese, Korean, and Chinese, and so far these display correctly. I also tried Hindi (Indian characters), with the same truncated result as Burmese.
pdf_file = create_pdf(form, 'claim-form')
return send_file(os.path.realpath(pdf_file.name), download_name='python.pdf',
                 as_attachment=True, mimetype="Content-Type: application/pdf; charset=UTF-8")
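Two hedged guesses about what may be going on, sketched below with placeholder names: the mimetype argument should be a bare media type rather than a whole header, and pdfrw-filled forms often need the NeedAppearances flag set so PDF viewers regenerate field appearances for scripts the template's appearance streams do not cover:
import os
from flask import send_file
from pdfrw import PdfDict, PdfObject, PdfReader, PdfWriter

def fill_and_send(filled_pdf_path):
    pdf = PdfReader(filled_pdf_path)
    if pdf.Root.AcroForm is not None:
        # ask viewers to rebuild field appearances so Burmese glyphs render
        pdf.Root.AcroForm.update(PdfDict(NeedAppearances=PdfObject('true')))
    PdfWriter().write(filled_pdf_path, pdf)
    # mimetype is just the media type; no "Content-Type:" prefix or charset
    return send_file(os.path.realpath(filled_pdf_path), download_name='python.pdf',
                     as_attachment=True, mimetype='application/pdf')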

Scraping Chinese characters in Python

I learnt how to scrape websites from https://automatetheboringstuff.com. I wanted to scrape http://www.piaotian.net/html/3/3028/1473227.html, whose content is in Chinese, and write its contents into a .txt file. However, the .txt file contains random symbols, which I assume is an encoding/decoding problem.
I've read the thread "how to decode and encode web page with python?" and figured the encodings for my site are "gb2312" and "windows-1252". I tried decoding with those two encodings but failed.
Can someone kindly explain the problem with my code? I'm very new to programming, so please point out my misconceptions as well!
Also, when I remove "html.parser" from the code, the .txt file turns out to be empty instead of at least containing symbols. Why is this the case?
import bs4, requests, sys

reload(sys)
sys.setdefaultencoding("utf-8")

novel = requests.get("http://www.piaotian.net/html/3/3028/1473227.html")
novel.raise_for_status()

novelSoup = bs4.BeautifulSoup(novel.text, "html.parser")
content = novelSoup.select("br")

novelFile = open("novel.txt", "w")
for i in range(len(content)):
    novelFile.write(str(content[i].getText()))

novel = requests.get("http://www.piaotian.net/html/3/3028/1473227.html")
novel.raise_for_status()
novel.encoding = "GBK"
novelSoup = bs4.BeautifulSoup(novel.text, "html.parser")
Output:
<br>
一元宗,坐落在青峰山上,绵延极长,现在是盛夏时节,天空之中,太阳慢慢落了下去,夕阳将影子拉的很长。<br/>
<br/>
一片不是很大的小湖泊边上,一个约莫着十七八岁的青衣少年坐在湖边,抓起湖边的一块石头扔出,顿时在湖边打出几朵浪花。<br/>
<br/>
叶希文有些茫然,他没想到,他居然穿越了,原本叶希文只是二十一世纪的地球上一个普通的大学生罢了,一个月了,他才后知后觉的反应过来,这不是有人和他进行恶作剧,而是,他真的穿越了。<br/>
Requests will automatically decode content from the server. Most
unicode charsets are seamlessly decoded.
When you make a request, Requests makes educated guesses about the
encoding of the response based on the HTTP headers. The text encoding
guessed by Requests is used when you access r.text. You can find out
what encoding Requests is using, and change it, using the r.encoding
property:
>>> r.encoding
'utf-8'
>>> r.encoding = 'ISO-8859-1'
If you change the encoding, Requests will use the new value of
r.encoding whenever you call r.text.
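A Python 3 sketch of how this might be put together for the page above; the "GBK" override comes from the question, while the id="content" selector is only a guess at the page's markup:
import bs4
import requests

novel = requests.get("http://www.piaotian.net/html/3/3028/1473227.html")
novel.raise_for_status()
novel.encoding = "GBK"                      # override the guessed encoding before reading .text

soup = bs4.BeautifulSoup(novel.text, "html.parser")
container = soup.find(id="content")         # hypothetical id of the chapter container
text = container.get_text("\n") if container else soup.get_text("\n")

with open("novel.txt", "w", encoding="utf-8") as novel_file:
    novel_file.write(text)                  # write the decoded text back out as UTF-8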

Django CSV output in Excel

Hi, I have a simple view which returns a CSV file of a queryset generated from a MySQL DB using utf-8 encoding:
def export_csv(request):
    ...
    response = HttpResponse(mimetype='text/csv')
    response['Content-Disposition'] = 'attachment; filename=search_results.csv'
    writer = csv.writer(response, dialect=csv.excel)
    for item in query_set:
        writer.writerow(smart_str(item))
    return response

    return render(request, 'search_results.html', context)
This works fine as a CSV file, and can be opened in text editors, LibreOffice etc. without problem.
However, I need to supply a file which can be opened in MS Excel on Windows without errors. If I have strings with Latin characters in the queryset, such as 'Española', then the output in Excel is 'EspaÃ±ola'.
I tried this blog post but it didn't help. I also know about the xlwt package, but I am curious whether there is a way of correcting the output using the CSV method I have at the moment.
Any help much appreciated.
It looks like there is not a uniform solution for all versions of Excel.
Your best bet might be to go with openpyxl, but this is rather complicated and requires separate handling of downloads for Excel users, which is not optimal.
Try adding a byte order mark (0xEF, 0xBB, 0xBF) at the beginning of the file. See microsoft-excel-mangles-diacritics-in-csv-files.
There is another similar post.
You might give python-unicodecsv a go. It replaces the Python csv module, which doesn't handle Unicode too gracefully.
Put the unicodecsv folder somewhere you can import it, or install it via setup.py.
Import it into your view file, e.g.:
import unicodecsv as csv
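A minimal sketch of how the drop-in replacement might then be used inside the view above (unicodecsv's writer accepts an encoding keyword):
writer = csv.writer(response, encoding='utf-8')
writer.writerow([u'Española'])   # a row with non-ASCII text, written out as UTF-8 bytes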
I found out there are 3 things to do for Excel to open unicode csv files properly:
Use utf-16-le charset
Insert utf-16 byte order mark to the beginning of exported file
Use tabs instead of commas in csv
So this should make it work in Python 3.7 and Django 2.2:
import codecs
...

def export_csv(request):
    ...
    response = HttpResponse(content_type='text/csv', charset='utf-16-le')
    response['Content-Disposition'] = 'attachment; filename=search_results.csv'
    response.write(codecs.BOM_UTF16_LE)
    writer = csv.writer(response, dialect='excel-tab')
    for item in query_set:
        writer.writerow(smart_str(item))
    return response

Django escapejs and simplejson

I'm trying to encode a Python list into JSON using simplejson.dumps:
In [30]: s1 = ['test', '<script>']
In [31]: simplejson.dumps(s1)
Out[31]: '["test", "<script>"]'
Works fine.
But I want to escape the strings first (using escapejs from Django) before calling simplejson.dumps:
In [35]: s_esc
Out[35]: [u'test', u'\\u003Cscript\\u003E']
In [36]: print simplejson.dumps(s_esc)
["test", "\\u003Cscript\\u003E"]
My problem is: I want the escaped string to be: ["test", "\u003Cscript\u003E"] instead of ["test", "\\u003Cscript\\u003E"]
I can use replace:
In [37]: print simplejson.dumps(s_esc).replace('\\\\', '\\')
["test", "\u003Cscript\u003E"]
But is this a good approach? I just want to escape the strings first before encoding them to json. So there will be no syntax errors when I use them in template.
Thanks. :)
simplejson 2.1.0 and later include a JSONEncoderForHTML encoder that does exactly what you want. To use it in your example:
>>> s1 = ['test', '<script>']
>>> simplejson.dumps(s1, cls=simplejson.encoder.JSONEncoderForHTML)
'["test", "\\u003cscript\\u003e"]'
I ran into this recently where I didn't have control over the code that was generating the data structures, so I couldn't escape the strings as they were being assembled. JSONEncoderForHTML solved the problem neatly at the point of output.
Of course, you'll need simplejson 2.1.0 or later. (Django used to come with an older version, and Django 1.5 deprecated django.utils.simplejson entirely.) If you can't upgrade for some reason, the JSONEncoderForHTML code is relatively small and could probably be pulled into earlier code or used with Python 2.6+'s json package, though I haven't tried this myself.
You're doing the operations in the wrong order. You should dump your data to a JSON string, then escape that string. You can do the escaping with the addslashes Django filter.
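A minimal sketch of that ordering, using escapejs (the filter from the question) at output time rather than addslashes; the payload name is a placeholder:
import simplejson

payload = simplejson.dumps(['test', '<script>'])   # dump to a JSON string first
# then escape only when emitting it into the page, e.g. in the Django template:
#   <script>var data = JSON.parse("{{ payload|escapejs }}");</script>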

How can I add a Bengali font in a Django web application

I want to create a quiz web application using Django where the quiz questions will be in Bengali. How can I render the Bengali text on my site?
If anyone could tell me, I would be grateful.
Django is not involved in font rendering in the browser. Embedding a particular font via CSS is still problematic today. You may find answers at How to embed fonts in HTML?.
In Django, you specify only the character codes. You may have to use Unicode escape sequences in strings, unless you can enter Bengali characters directly. Bengali characters reside in the Unicode range U+0981 to U+09FA. I assume that your target audience will have glyphs installed for those characters, so there may be no need to provide an embedded font at all.
You can use the following script to display a list of the defined Bengali Unicode characters:
import sys
import unicodedata as ucd

try:
    chr = unichr
except NameError:
    # Python 3
    unicode = str

def filter_bengali():
    for i in range(256, sys.maxunicode + 1):
        try:
            name = ucd.name(chr(i))
            if 'BENGALI' in name:
                print(unicode('U+{0:04X} {1:<60} {2}').format(i, name, chr(i)))
        except ValueError:
            pass

if __name__ == '__main__':
    filter_bengali()
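As a small illustration of the escape-sequence point above (the variable names are placeholders): the same Bengali word can be written with escapes or entered directly, provided the source file is saved as UTF-8.
# -*- coding: utf-8 -*-
escaped = u'\u09AC\u09BE\u0982\u09B2\u09BE'   # Bengali word built from escape sequences
direct = u'বাংলা'                             # the same word entered directly
assert escaped == direct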