My Django code works in Chrome and Firefox, but in IE the web page displays unreadable characters. These are my settings:
DEFAULT_CHARSET = 'utf8'
FILE_CHARSET = 'utf8'
The template files are saved as UTF-8, but besides English they contain text in another language, and that non-English part is not readable.
Should I change some Django setting? Most visitors to my website may use IE, so this is a big problem. Any suggestions?
Did you add this meta tag to your base HTML template?
<meta http-equiv="Content-Type" content="text/html; charset=utf-8"/>
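Besides the meta tag, the charset label itself may be worth a look. Python accepts `utf8` as an alias for UTF-8, so the settings above don't raise errors on the server, but Django copies `DEFAULT_CHARSET` into the `Content-Type` response header as written, so browsers receive `charset=utf8` instead of the standard `utf-8` label. A minimal Python sketch of that difference; `content_type_header` is a hypothetical helper that only mimics how Django builds its default header:

```python
import codecs

# Python normalizes the "utf8" alias to its canonical "utf-8" codec,
# which is why DEFAULT_CHARSET = 'utf8' works on the server side.
print(codecs.lookup("utf8").name)


def content_type_header(charset):
    # Hypothetical helper: mimics Django's default header, which inserts
    # the DEFAULT_CHARSET string verbatim (no normalization).
    return f"text/html; charset={charset}"


print(content_type_header("utf8"))   # header sent with DEFAULT_CHARSET = 'utf8'
print(content_type_header("utf-8"))  # header sent with Django's default 'utf-8'
```

Using Django's default, `DEFAULT_CHARSET = 'utf-8'`, sends the canonical label; whether that alone fixes IE in this case is an assumption worth testing.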
Related
I am creating a ColdFusion page with some Japanese characters. I included the following in the top of the page.
<meta http-equiv="Content-Type" content="text/html; charset=UTF-8" />
If I explicitly include Japanese characters in the output, they look fine. However, if I output them using, say:
<cfoutput>#variables.TitleInJapanese#</cfoutput>
The output is garbled as though the encoding is not recognized. I have tried <cfcontent> and <cfprocessingdirective> tags to no avail.
If I open the .cfm source file, the Japanese characters that are assigned to the variables look as they should in my text editor. It's the content that is generated using <cfoutput> that is giving me trouble. Any suggestions would be welcome. Thanks!
Correction: the page I have created does not display any Japanese characters, whether written directly or output from variables. However, other files containing Japanese characters that are pulled into the page with <cfinclude> render just fine.
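Because the included files render correctly while the page that assigns the variables does not, it may be worth comparing how the files are actually saved on disk: for example, whether one starts with a UTF-8 byte-order mark (BOM) and the other does not, or whether one is not valid UTF-8 at all. A minimal Python sketch of that check; `describe_encoding` is a hypothetical helper, not part of ColdFusion:

```python
import codecs


def describe_encoding(raw):
    """Report whether raw file bytes start with a UTF-8 BOM and decode as UTF-8."""
    try:
        raw.decode("utf-8")
        valid_utf8 = True
    except UnicodeDecodeError:
        valid_utf8 = False
    return {"utf8_bom": raw.startswith(codecs.BOM_UTF8), "valid_utf8": valid_utf8}


# Usage: compare the page that assigns the variables with an included file
# (the file names below are hypothetical).
# for name in ("page.cfm", "included.cfm"):
#     with open(name, "rb") as f:
#         print(name, describe_encoding(f.read()))

print(describe_encoding(codecs.BOM_UTF8 + "日本語".encode("utf-8")))
print(describe_encoding("日本語".encode("shift_jis")))
```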
I am trying to parse a Russian website using lxml. However, I have a problem displaying Russian characters that I am unable to solve myself.
Take this HTML fragment, for example:
Квест в реальности «Карты, деньги, два стола»
I parse it with this:
title = root.xpath('//*[@id="event-id-41600"]/div[3]/div[2]/a/text()')[0].encode('utf-8').strip()
and this is what I get:
├É┬Ü├É┬▓├É┬Á├Ĺ┬ü├Ĺ┬é ├É┬▓ ├Ĺ┬Ç├É┬Á├É┬░├É┬╗├Ĺ┬î├É┬Ż├É┬ż├Ĺ┬ü├Ĺ┬é├É┬Ş ├é┬ź├É┬Ü├É┬░├Ĺ┬Ç├Ĺ┬é├Ĺ┬ő, ├É┬┤├É┬Á├É┬Ż├Ĺ┬î├É┬│├É┬Ş, ├É┬┤├É┬▓├É┬░ ├Ĺ┬ü├Ĺ┬é├É┬ż├É┬╗├É┬░├é┬╗
In the database, however, instead of Cyrillic I see this:
ÐвеÑÑ Ð² ÑеалÑноÑÑи «ÐаÑÑÑ, денÑги, два ÑÑола»
For reference, this snippet:
title = item.xpath('div[3]/div[2]/a')[0]
print etree.tostring(title)
prints this:
ÐвеÑÑ Ð² ÑеалÑноÑÑи «ÐаÑÑÑ, денÑги, два ÑÑола»
I'm not sure whether this is database-related or something to do with Python encoding. Any help appreciated :)
Thanks in advance.
EDIT: I am using MySQL and the Django ORM.
Django settings:
DATABASE_OPTIONS = {
"charset": "utf8_general_ci",
"init_command": "SET storage_engine=INNODB"
}
Webpage:
<!DOCTYPE html>
<html lang="en" prefix="og: http://ogp.me/ns#" class="">
<head>
<title>Интересные события в Москве в январе - феврале 2016</title>
<meta charset="utf-8">
The Cyrillic code page does not exist or is not set up on your server, so you can't view Russian characters in the terminal, even in UTF-8. Python itself still handles Unicode properly.
With this command:
title = root.xpath('//*[@id="event-id-41600"]/div[3]/div[2]/a/text()')[0].encode('utf-8').strip()
you get a unicode string and encode it to bytes (str in Python 2), then save those bytes in the database.
When you load the string back from the database, Python decodes it with the default code page (probably Latin-1), and you get this:
ÐвеÑÑ Ð² ÑеалÑноÑÑи «ÐаÑÑÑ, денÑги, два ÑÑола»
So you should store the unicode string in the database (don't call encode):
title = root.xpath('//*[@id="event-id-41600"]/div[3]/div[2]/a/text()')[0].strip()
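Storing unicode values only round-trips correctly if the MySQL connection and the tables themselves also use a UTF-8 character set. Note that `utf8_general_ci` in the question's settings is a MySQL collation name, whereas the driver's `"charset"` option expects a character-set name such as `utf8` or `utf8mb4`. A sketch in the newer `DATABASES` settings format, with a hypothetical database name and credentials:

```python
# Django settings sketch (newer DATABASES format; DATABASE_OPTIONS is the
# pre-1.2 equivalent). The name and credentials below are hypothetical.
DATABASES = {
    "default": {
        "ENGINE": "django.db.backends.mysql",
        "NAME": "events",          # hypothetical
        "USER": "events_user",     # hypothetical
        "PASSWORD": "change-me",   # hypothetical
        "OPTIONS": {
            # A character-set name, not a collation such as "utf8_general_ci".
            # "utf8" also covers Cyrillic; "utf8mb4" covers all of Unicode.
            "charset": "utf8mb4",
        },
    }
}
```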
P.S. I don't understand how encode('Latin-1') helps (as suggested in the comments), but the problem is solved :)
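The P.S. has a plausible explanation, assuming the stored bytes really were UTF-8 and were decoded as Latin-1 as described above: Latin-1 maps every byte value from 0 to 255 to exactly one character, so re-encoding the garbled text with Latin-1 gives back the original UTF-8 bytes unchanged. A minimal Python 3 sketch; both function names are hypothetical:

```python
def utf8_misread_as_latin1(text):
    # Simulates the bug: UTF-8 bytes decoded with the Latin-1 code page.
    return text.encode("utf-8").decode("latin-1")


def repair_latin1_mojibake(garbled):
    # Latin-1 maps each character U+0000..U+00FF back to exactly one byte,
    # which recovers the original UTF-8 bytes; those are then decoded properly.
    return garbled.encode("latin-1").decode("utf-8")


title = "Квест в реальности"
garbled = utf8_misread_as_latin1(title)
print(garbled[0])                                # Ð
print(repair_latin1_mojibake(garbled) == title)  # True
```

If the text was instead decoded as Windows-1252, this round trip can fail, because a few byte values (such as 0x81) are undefined in that code page.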
I'm using a globalized Kendo template in which I globalize the title of the button:
-bunch of redundant code deleted-
class="btn-ico del" title="@Resources.AdminResources.DeleteStr">
This works fine in English, Italian, Japanese and Polish; however, in German the word for delete happens to contain an umlaut (Löschen), and I get the following error:
Uncaught Error: Invalid template:'
This is how the browser renders it:
class="btn-ico del" title="L&#246;schen"
By default I have
<meta http-equiv="Content-Type" content="text/html; charset=utf-8" />
set in my template; changing it to the ISO-8859-1 encoding did not work.
As a temporary workaround I changed Löschen to Loeschen, but that is not elegant.
The character gets HTML-encoded as a numeric entity containing a hash (#), which Kendo interprets as template syntax. Just replace all hashes with escaped hashes.
In C# I would do: .Replace("#", "\\#")
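The collision is easy to reproduce outside Kendo: converting ö to a decimal character reference produces `&#246;`, and it is the `#` in that reference that the template engine trips over. A minimal Python sketch; `xmlcharrefreplace` is a standard Python codec error handler that yields the same kind of entity, and both helper names are hypothetical:

```python
def html_numeric_entities(text):
    # Non-ASCII characters become decimal references, e.g. "ö" -> "&#246;".
    return text.encode("ascii", "xmlcharrefreplace").decode("ascii")


def escape_kendo_hashes(text):
    # Kendo templates treat "#" as a delimiter; "\#" stands for a literal hash.
    return text.replace("#", "\\#")


encoded = html_numeric_entities("Löschen")
print(encoded)                       # L&#246;schen
print(escape_kendo_hashes(encoded))  # L&\#246;schen
```

Escaping `#` as `\#` follows the answer above; it is meant to be applied to the resource string before it is embedded in the template.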
ASCII decoding error
Text = "Hanuman (Sanskrit: हनुमान्, Hanumān), a Hindu deity who was an ardent devotee of Rama according to Hindus legends, and a central character in the Indian epic Ramayana."
I saved the text into an NVARCHAR column of a MySQL table, and it inserts successfully.
When I retrieve this data in the console, it displays correctly. But when I retrieve it via Django and display it in a template, it shows up as garbled characters.
Displaying as "Hanuman (Sanskrit: हनà¥à¤®à¤¾à¤¨à¥, HanumÄn), is a Hindu deity who is an ardent devotee of Rama, a central character in the Indian epic Ramayana."
I guess you are missing the Content-Type meta tag in your template:
<meta http-equiv="Content-Type" content="text/html; charset=UTF-8" />
I'm interested in using pugixml to parse HTML documents, but HTML has some optional closing tags. Here is an example: <meta http-equiv="Content-Type" content="text/html; charset=US-ASCII">
pugixml stops reading the HTML as soon as it encounters a tag that is not closed, but in HTML a missing closing tag does not necessarily mean there is a start/end tag mismatch.
A simple test, parsing pugixml's own HTML documentation, fails because the meta tag on the second line is never closed: http://pugixml.googlecode.com/svn/tags/latest/docs/quickstart.html
<html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=US-ASCII">
<title>pugixml 1.0</title>
<link rel="stylesheet" href="pugixml.css" type="text/css">
<meta name="generator" content="DocBook XSL Stylesheets V1.75.2">
<link rel="home" href="quickstart.html" title="pugixml 1.0">
</head>
<!--- etc... -->
A lot of HTML documents in the wild would fail if I tried to parse them with pugixml. Is there a way to avoid that? If there is no way to "fix" that, is there another HTML parsing tool that's about as fast as pugixml?
Update
It would also be great if the HTML parser supported XPath.
I ended up taking pugixml, converting it into an HTML parser, and creating a GitHub project for it: https://github.com/rofldev/pugihtml
For now it's not fully compliant with the HTML specification, but it does a decent enough job at parsing HTML that I can use it. I'm working on making it compliant.
One way to address this is to pre-process the HTML and convert it to XHTML; it would then "officially" be XML and usable with XML tools. If you want to go that route, see this question: How to convert HTML to XHTML?
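A minimal sketch of that pre-processing step using only the Python standard library: `html.parser` tokenizes the HTML, void elements such as `<meta>` and `<link>` are re-emitted self-closed, and the result is parsed as XML. This only handles void elements; other optional end tags (for example `</p>` or `</li>`) and mismatched nesting are not repaired, so it is not a general HTML-to-XHTML converter. `XhtmlWriter` and `html_to_xhtml` are hypothetical names:

```python
import xml.etree.ElementTree as ET
from html.parser import HTMLParser
from xml.sax.saxutils import escape, quoteattr

# HTML void elements never take a closing tag.
VOID_ELEMENTS = {"area", "base", "br", "col", "embed", "hr", "img", "input",
                 "link", "meta", "param", "source", "track", "wbr"}


class XhtmlWriter(HTMLParser):
    """Re-serialize HTML so that every void element is self-closed."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.parts = []

    def _attrs(self, attrs):
        # quoteattr adds the surrounding quotes and escapes &, < and >.
        return "".join(f" {name}={quoteattr(value or '')}" for name, value in attrs)

    def handle_starttag(self, tag, attrs):
        close = "/" if tag in VOID_ELEMENTS else ""
        self.parts.append(f"<{tag}{self._attrs(attrs)}{close}>")

    def handle_startendtag(self, tag, attrs):
        self.parts.append(f"<{tag}{self._attrs(attrs)}/>")

    def handle_endtag(self, tag):
        # Stray end tags for void elements (e.g. </meta>) are dropped.
        if tag not in VOID_ELEMENTS:
            self.parts.append(f"</{tag}>")

    def handle_data(self, data):
        self.parts.append(escape(data))
    # Comments and the doctype are ignored (HTMLParser's defaults do nothing).


def html_to_xhtml(html_text):
    writer = XhtmlWriter()
    writer.feed(html_text)
    writer.close()
    return "".join(writer.parts)


sample = """<html><head>
<meta http-equiv="Content-Type" content="text/html; charset=US-ASCII">
<title>pugixml 1.0</title>
</head></html>"""
root = ET.fromstring(html_to_xhtml(sample))
print(root.find("head/title").text)
```

The output string could then be handed to pugixml or another XML tool; Python's `ElementTree` also supports a limited XPath subset in `find` and `findall`.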