Ascii decoding error
Text = "Hanuman (Sanskrit: हनुमान्, Hanumān), a Hindu deity who was an ardent devotee of Rama according to Hindus legends, and a central character in the Indian epic Ramayana."
I saved the text into a MySQL table, in an nvarchar column, and the insert succeeds.
When I retrieve this data in the console, it displays correctly. But when I retrieve it via Django and display it in a template, it shows up as garbled characters.
Displaying as "Hanuman (Sanskrit: हनà¥à¤®à¤¾à¤¨à¥, HanumÄn), is a Hindu deity who is an ardent devotee of Rama, a central character in the Indian epic Ramayana."
I guess you are missing the content type meta tag in your template:
<meta http-equiv="Content-Type" content="text/html; charset=UTF-8" />
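To illustrate why a missing or wrong charset declaration produces output like this, here is a small Python 3 sketch. It is not the asker's Django code; `misdecode` is a made-up helper, and Latin-1 is only an assumed stand-in for whatever charset the browser falls back to. It encodes text as UTF-8 and then decodes the same bytes with the wrong charset, which is roughly what happens when the page does not say it is UTF-8.

```python
TEXT = "Hanumān (हनुमान्)"


def misdecode(text, wrong_encoding="latin-1"):
    """Encode text as UTF-8, then decode those bytes with the wrong charset.

    This mimics a reader that was never told the bytes are UTF-8.
    """
    return text.encode("utf-8").decode(wrong_encoding)


garbled = misdecode(TEXT)

# The garbled version has one character per UTF-8 byte, so it is longer.
print(len(TEXT), len(garbled))

# Plain ASCII survives unchanged, which is why only the Sanskrit part breaks.
print(misdecode("Hanuman") == "Hanuman")

# Decoding with the charset that was actually used gives the text back.
print(TEXT.encode("utf-8").decode("utf-8") == TEXT)
```

The data in the database is fine in this scenario; only the interpretation of the bytes on the way to the screen is wrong.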
I have an HTML file which is structured like this:
<!doctype html public "-//w3c//dtd html 4.0transitional//en">
<html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1">
<meta name="Author" content="ERA">
<LINK REL=STYLESHEET TYPE="text/css" HREF="Style_Sheets/ERA_Internet_Printer.css">
</head>
<body>
<pre>
<font face="courier new" size=-4> 14V-IG-TEST-DATA - SERVC - EXEC# 4515
[11| Blubb,abcons, Port: 18 For: abcons
For period : GE 08/04/18 AND LE 11/04/18 OR GE 11/04/18 AND LE 11/05/18
01:45:40 11-04-18 - Page # 1
Serial#........................ 564561215
Make Desc...................... VW
Carline........................ MUX
Year........................... 2015
Cust# ........................ 512
License#....................... 78365HH
Open RO........................ R25625
EOR............................ EOR
Serial#........................ 2151512315
Make Desc...................... VOLKSWAGEN
Carline........................ VOLKSWAGEN
Year........................... 2017
Cust# ........................ 552
License#....................... DPA2151
Open RO........................ T52165
EOR............................ EOR
2 records listed.
</pre>
</body>
</html>
I want to extract the information from the file as "Key.......... Value" pairs.
So I've created a custom classifier in AWS Glue with Grok to extract it.
The custom classifier's Grok pattern is configured as follows:
%{KEY:mykey}%{GREEDYDATA:myvalue}
with the custom Pattern:
KEY ([a-zA-Z# 1-9]+\.+ )
Every online Grok debugger (like https://grokdebug.herokuapp.com/) extracts the information from the data with this configuration. But when I start the crawler in Glue with the custom classifier, it doesn't find any tables or structures.
What am I doing wrong?
I think you're running into the problem I answered here: https://github.com/aws-samples/aws-glue-samples/issues/4
There's a buried sentence in the AWS documentation that states: "To reclassify data to correct an incorrect classifier, create a new crawler with the updated classifier."
Simply updating the classifier and re-running the crawler will not use the updated classifier.
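Separately from the crawler issue, the pattern itself can be sanity-checked locally. Below is a rough Python translation of the Grok pattern from the question, assuming GREEDYDATA expands to `.*` as in the stock Grok pattern set. `parse_line` is a hypothetical helper, and Python's `re` module is not the engine Glue uses, so this only approximates what the classifier would match.

```python
import re

# Approximate translation of %{KEY:mykey}%{GREEDYDATA:myvalue}
# with the custom pattern KEY ([a-zA-Z# 1-9]+\.+ ).
LINE_RE = re.compile(r"(?P<mykey>[a-zA-Z# 1-9]+\.+ )(?P<myvalue>.*)")


def parse_line(line):
    """Return (key, value) for a 'Key...... Value' line, or None."""
    match = LINE_RE.match(line)
    if match is None:
        return None
    # Drop the dot leader and the spaces around it from the key.
    key = match.group("mykey").rstrip(". ")
    return key, match.group("myvalue").strip()


sample = [
    "Serial#........................ 564561215",
    "Make Desc...................... VW",
    "Cust# ........................ 512",
    "<pre>",
]
for line in sample:
    print(parse_line(line))
```

Lines such as `<pre>` or the report header do not match and come back as `None`, so only the dotted key/value lines yield pairs.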
I am creating a ColdFusion page with some Japanese characters. I included the following at the top of the page.
<meta http-equiv="Content-Type" content="text/html; charset=UTF-8" />
If I explicitly include Japanese characters in the output, they look fine. However, if I output them using, say:
<cfoutput>#variables.TitleInJapanese#</cfoutput>
The output is garbled as though the encoding is not recognized. I have tried <cfcontent> and <cfprocessingdirective> tags to no avail.
If I open the .cfm source file, the Japanese characters that are assigned to the variables look as they should in my text editor. It's the content that is generated using <cfoutput> that is giving me trouble. Any suggestions would be welcome. Thanks!
Correction: The page I have created will not display any Japanese characters, explicit or referenced. However, other files using <cfinclude> within the page that have Japanese characters render just fine.
I am trying to parse a Russian website using lxml. However, I have an issue with displaying Russian characters that I am unable to solve myself.
Let's take this html piece for example:
Квест в реальности «Карты, деньги, два стола»
I am using this piece to parse it:
title = root.xpath('//*[@id="event-id-41600"]/div[3]/div[2]/a/text()')[0].encode('utf-8').strip()
and this is what I get:
├É┬Ü├É┬▓├É┬Á├Ĺ┬ü├Ĺ┬é ├É┬▓ ├Ĺ┬Ç├É┬Á├É┬░├É┬╗├Ĺ┬î├É┬Ż├É┬ż├Ĺ┬ü├Ĺ┬é├É┬Ş ├é┬ź├É┬Ü├É┬░├Ĺ┬Ç├Ĺ┬é├Ĺ┬ő, ├É┬┤├É┬Á├É┬Ż├Ĺ┬î├É┬│├É┬Ş, ├É┬┤├É┬▓├É┬░ ├Ĺ┬ü├Ĺ┬é├É┬ż├É┬╗├É┬░├é┬╗
In the database, however, instead of Cyrillic I see this:
ÐвеÑÑ Ð² ÑеалÑноÑÑи «ÐаÑÑÑ, денÑги, два ÑÑола»
Oh, and by the way, for reference, this piece:
title = item.xpath('div[3]/div[2]/a')[0]
print etree.tostring(title)
returns this:
ÐвеÑÑ Ð² ÑеалÑноÑÑи «ÐаÑÑÑ, денÑги, два ÑÑола»
Not sure if it is database related or something to do with Python encoding. Any help appreciated :)
Thanks in advance.
EDIT: I am using MySQL and the Django ORM.
Django settings:
DATABASE_OPTIONS = {
"charset": "utf8_general_ci",
"init_command": "SET storage_engine=INNODB"
}
Webpage :
<!DOCTYPE html>
<html lang="en" prefix="og: http://ogp.me/ns#" class="">
<head>
<title>Интересные события в Москве в январе - феврале 2016</title>
<meta charset="utf-8">
A Cyrillic code page does not exist or is not set up on your server, so you can't view Russian characters in the terminal, even in UTF-8. But Python still handles Unicode properly.
With this command:
title = root.xpath('//*[@id="event-id-41600"]/div[3]/div[2]/a/text()')[0].encode('utf-8').strip()
you get a unicode string and encode it to bytes (str in Python 2), and then save those bytes in the database.
When you load the string from the database, Python uses the default code page (probably Latin-1) and you get this:
ÐвеÑÑ Ð² ÑеалÑноÑÑи «ÐаÑÑÑ, денÑги, два ÑÑола»
So you should store the unicode string in the database (don't use encode):
title = root.xpath('//*[@id="event-id-41600"]/div[3]/div[2]/a/text()')[0].strip()
P.S. I don't understand how encode('Latin-1') helps (from the comments), but the problem is solved :)
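A guess at why encode('Latin-1') helped: if UTF-8 bytes were read back as Latin-1, encoding the garbled text as Latin-1 recovers the original bytes, which can then be decoded as UTF-8. The sketch below is Python 3 (the question uses Python 2), and both function names are hypothetical; it simulates the mistake and its reversal without touching a real database.

```python
TITLE = "Квест в реальности «Карты, деньги, два стола»"


def store_and_reload_wrong(text):
    """Simulate saving UTF-8 bytes and reading them back as Latin-1."""
    return text.encode("utf-8").decode("latin-1")


def repair_mojibake(garbled):
    """Undo the mistake: recover the raw bytes, then decode them as UTF-8."""
    return garbled.encode("latin-1").decode("utf-8")


garbled = store_and_reload_wrong(TITLE)
print(garbled == TITLE)                   # the reloaded text no longer matches
print(repair_mojibake(garbled) == TITLE)  # reversing the steps restores it
```

This only repairs text that was mangled in exactly this way. Storing the unicode string in the first place, as suggested above, avoids the need for any repair.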
I'm using a globalized Kendo template in which I globalize the title of the button:
-bunch of redundant code deleted-
class="btn-ico del" title="@Resources.AdminResources.DeleteStr">
This works fine in English, Italian, Japanese and Polish; however, in German the word for delete happens to have an umlaut (Löschen) and I get the following error:
Uncaught Error: Invalid template:'
This is how the browser renders it:
class="btn-ico del" title="L&';246;schen"
By default I have
<meta http-equiv="Content-Type" content="text/html; charset=utf-8" />
set in my template; changing it to the iso-8859-1 encoding did not work.
Temporarily I changed Löschen to Loeschen but that is not elegant.
Kendo encodes the character and places a hash (#) in the output; just replace all hashes with escaped hashes.
In C# I would do: .Replace("#", "\\#")
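The same idea, sketched in Python purely for illustration (the project in the question is C#). Both helper names are made up, and the numeric-entity encoder is only an assumption about how the umlaut ends up as `&#246;` before it reaches the template.

```python
def html_encode_non_ascii(text):
    """Replace each non-ASCII character with a decimal HTML entity."""
    return "".join(ch if ord(ch) < 128 else "&#%d;" % ord(ch) for ch in text)


def escape_kendo_hashes(text):
    """Prefix every hash with a backslash so it is not read as template syntax."""
    return text.replace("#", "\\#")


encoded = html_encode_non_ascii("Löschen")
print(encoded)                       # the entity introduces a '#'
print(escape_kendo_hashes(encoded))  # the '#' is now backslash-escaped
```

English, Italian and Polish labels presumably worked only because they happened to contain no character that gets encoded this way.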
My Django code works in Chrome and Firefox, but in IE the webpage displays unreadable characters. These are my settings:
DEFAULT_CHARSET = 'utf8'
FILE_CHARSET = 'utf8'
and the template files are saved as UTF-8, but my template file contains another language besides English. That non-English part is not readable.
Should I change some Django setting? Most of the visitors to my website may use IE, so this is a big problem. Any suggestions?
Did you add this meta tag to your base HTML?
<meta http-equiv="Content-Type" content="text/html; charset=utf-8"/>
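One way to check whether the rendered page really carries such a declaration is to scan the HTML for it. The sketch below uses only Python's standard `html.parser`; `CharsetFinder` and `find_declared_charset` are hypothetical names, and it looks only at meta tags, not at the HTTP Content-Type header that Django also sends.

```python
from html.parser import HTMLParser


class CharsetFinder(HTMLParser):
    """Remember the first charset declared in a <meta> tag."""

    def __init__(self):
        super().__init__()
        self.charset = None

    def handle_starttag(self, tag, attrs):
        if tag != "meta" or self.charset is not None:
            return
        attrs = dict(attrs)
        if attrs.get("charset"):
            # HTML5 form: <meta charset="utf-8">
            self.charset = attrs["charset"].strip().lower()
        elif (attrs.get("http-equiv") or "").lower() == "content-type":
            # Older form: content="text/html; charset=utf-8"
            for part in (attrs.get("content") or "").split(";"):
                name, _, value = part.strip().partition("=")
                if name.strip().lower() == "charset" and value:
                    self.charset = value.strip().lower()


def find_declared_charset(html):
    """Return the charset declared via <meta>, or None if there is none."""
    finder = CharsetFinder()
    finder.feed(html)
    return finder.charset


page = '<head><meta http-equiv="Content-Type" content="text/html; charset=utf-8"/></head>'
print(find_declared_charset(page))
```

If this returns `None` for the page that IE garbles, adding the meta tag to the base template is the first thing to try.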