I want to write some Turkish characters to a PDF with ReportLab.
I used the following code to do this.
c = Canvas("test.pdf")
data="ğçİöşü"
p = Paragraph(data.decode('utf-8'), style=styNormal)
But it doesn't show my data in the PDF.
Output:
■ç■ö■ü
As explained in this answer to a similar question, you need to use a font which supports your characters.
In short, try this:
from reportlab.pdfbase import pdfmetrics
from reportlab.pdfbase.ttfonts import TTFont
pdfmetrics.registerFont(TTFont('Verdana', 'Verdana.ttf'))
c.setFont("Verdana", 8)
Make sure your file is UTF-8 encoded. I'd also recommend making sure the data variable is a Unicode string by writing:
data = u"ğçİöşü"
Related
I am having trouble converting '\xc3\xd8\xe8\xa7\xc3\xb4\xd' (which is Thai text) to a readable format. I get this value from a smart card; it worked on Windows but does not work on Linux.
If I print in my Python console, I get:
����ô
I tried to follow some hints from Google, but I am unable to accomplish my goal.
Any suggestion is appreciated.
Your text does not seem to be Unicode text. Instead, it looks like it is in one of the Thai encodings. Hence, you must know the encoding before printing the text.
For example, if we assume your data is encoded in TIS-620 (and the last character is \xd2 instead of \xd), then it will be "รุ่งรดา".
To work with non-Unicode strings in Python, you may try myString.decode("tis-620") or even sys.setdefaultencoding("tis-620").
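The decoding step described above can be sketched as follows. This is a minimal Python 3 sketch, not the asker's code: it assumes, as the answer does, that the data is TIS-620 and that the truncated last escape was meant to be \xd2. The helper name decode_thai is hypothetical.

```python
# Bytes from the question, with the truncated final escape ASSUMED to be \xd2.
raw = b"\xc3\xd8\xe8\xa7\xc3\xb4\xd2"


def decode_thai(data, encoding="tis-620"):
    """Decode legacy Thai bytes into a Unicode string (hypothetical helper)."""
    return data.decode(encoding)


text = decode_thai(raw)
print(text)
```

In Python 3 the bytes object is decoded explicitly, so there is no need to change the interpreter's default encoding.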
Currently I am using libharu to create a PDF file. The file contains some Japanese characters, which are stored as UTF-8 first.
After that, I use HPDF_UseJPEncodings(m_pdf), HPDF_UseJPFonts(m_pdf) and m_fontStandard = HPDF_GetFont(m_pdf, "MS-Mincho", "90msp-RKSJ-H") to encode them.
However, 90msp-RKSJ-H is a CMap and is not meant for UTF-8. Does anyone know how to convert UTF-8 to the encoding expected by 90msp-RKSJ-H?
Thank you
Why don't you consult libharu's support group on Google Groups?
Here's the link to the code you want.
https://groups.google.com/forum/?fromgroups=#!topic/libharu/YzXoH_K3OAI
I hope it serves your purpose.
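For illustration only, the re-encoding step the question asks about can be sketched in Python. This is not the code behind the link above. It assumes that 90msp-RKSJ-H expects Microsoft's Shift-JIS variant (code page 932); the function name utf8_to_cp932 is hypothetical.

```python
def utf8_to_cp932(utf8_bytes):
    """Re-encode UTF-8 bytes as code page 932 (ASSUMED target for 90msp-RKSJ-H)."""
    text = utf8_bytes.decode("utf-8")   # UTF-8 bytes -> Unicode string
    return text.encode("cp932")         # Unicode string -> Shift-JIS (CP932) bytes


sample_utf8 = "日本語".encode("utf-8")
sample_sjis = utf8_to_cp932(sample_utf8)
print(len(sample_utf8), len(sample_sjis))
```

In a C program using libharu, the same two-step conversion would typically be done with a library such as iconv before passing the text to the drawing functions.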
If I have a string in the form:

What is the best regex I can use to parse these elements into an array (so I can write out the correct image)?
Update: I understand base64 encoding, but the question is actually how to parse these kinds of embedded icons in web pages, since I don't know whether people are using e.g. base62 or other image strings, or even other formats, to embed images. I also see examples in pages where the identifier is image/x-icon but the string actually contains a PNG.
Update: just some giveback to share the code where I used this: http://plugins.svn.wordpress.org/wp-favicons/trunk/filters/search/filter_extract_from_page.php
Though I still have some questions, e.g. whether only base64 is used, but time will tell in practice.
Can you see the base64 at the beginning? You don't need regex. You need to decode this base64 string into a byte stream and then save it as an image.
I have now saved the following text into a file icon.txt:
iVBORw0KGgoAAAANSUhEUgAAABAAAAAQCAYAAAAf8/9hAAAABmJLR0QAAAAAAAD5Q7t
/AAAA2UlEQVQ4y8WSvQvCMBDFX2rFUvuFSAUFBQfBwUXQVfFfFpzdRV2c7O5UKmihX9E6RZo2pXbyTbmX3C+5uwD
/FskG+76WsvX65n
/3Lm0pdU214HOAbHIWwvzeYPL1p4cT4QCi5DIxEINIdWt+Hs9cXAtg3UOkIJAUpT5ADiho8kbD0NG0LB6Q76xIevwCpW+0bBvj7Y5wgCpI148RBxTmYo7Z1RGPkSk
/kc4jgme0oHoJlmFUOC+8lUEMN0ASvyBpGha++IXCJrJyKJGhjIalyZVyNqufP9j
/9AH0S0vqrU+YMgAAAABJRU5ErkJggg==
And processed:
base64 -d icon.txt > icon.png
and it shows a red heart icon, 16x16 pixels.
This is the way you can decode it in the command line. Most programming languages offer good libraries to decode it directly in your program.
EDIT: If you use PHP, then have a look at base64_decode().
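As a rough illustration of decoding directly in a program, here is a minimal Python sketch that splits a data URI into its declared MIME type and decoded bytes, then checks the PNG signature instead of trusting the label (the question mentions image/x-icon strings that actually contain a PNG). The function names and the sample URI are hypothetical; the sample payload is just the 8-byte PNG signature, not a real icon.

```python
import base64
from urllib.parse import unquote_to_bytes

PNG_MAGIC = b"\x89PNG\r\n\x1a\n"  # first 8 bytes of every PNG file


def decode_data_uri(uri):
    """Return (declared_mime, raw_bytes) for a data: URI (hypothetical helper)."""
    if not uri.startswith("data:"):
        raise ValueError("not a data URI")
    header, _, payload = uri[5:].partition(",")
    parts = header.split(";")
    mime = parts[0] or "text/plain"
    if "base64" in parts[1:]:
        data = base64.b64decode(payload)
    else:
        # Without ;base64 the payload is percent-encoded.
        data = unquote_to_bytes(payload)
    return mime, data


def looks_like_png(data):
    """Check the actual content rather than the declared MIME type."""
    return data.startswith(PNG_MAGIC)


# Hypothetical sample: labelled as an icon, but the bytes are a PNG signature.
sample = "data:image/x-icon;base64," + base64.b64encode(PNG_MAGIC).decode("ascii")
mime, data = decode_data_uri(sample)
print(mime, looks_like_png(data))
```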
I'm writing unit tests for a model with an attribute that's interpreted as markdown. I'd like to test that if the markdown is invalid, then the object is invalid - but it's such a forgiving syntax that everything I've tried so far turns out to be valid markdown! What's an example of some invalid markdown?
I haven't used Markdown extensively, but I was under the impression that it is impossible to write "invalid" Markdown, only Markdown that won't do what you want it to. That is, instead of throwing an error when it doesn't know what to do, it just treats the input as plain text.
On a different path, one could probably write a script to try to identify things that the user probably didn't intend. For example, if someone entered **test*, they probably intended *test* or **test**.
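A script of that kind could start from something like the following. This is a deliberately naive Python sketch, not a Markdown parser: it only pairs up consecutive runs of asterisks and flags pairs whose lengths differ, so it will also flag legitimate uses such as list bullets. The function name is hypothetical.

```python
import re


def suspicious_emphasis(text):
    """Return the asterisk-run pairs that do not match in length (naive heuristic)."""
    runs = [len(m.group()) for m in re.finditer(r"\*+", text)]
    problems = []
    # Treat runs as (opening, closing) pairs in order of appearance.
    for i in range(0, len(runs), 2):
        pair = runs[i:i + 2]
        if len(pair) < 2 or pair[0] != pair[1]:
            problems.append(pair)
    return problems


print(suspicious_emphasis("**test*"))
print(suspicious_emphasis("**test**"))
```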
All strings are valid markdown.
If all text is markdown and vice versa, then I suppose one example of invalid markdown would be invalid text in the encoding that you are using, i.e. invalid UTF-8, invalid ASCII or invalid ISO-8859-1.
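If invalid encoding is the failure case worth testing, a check along these lines may be enough. This is a minimal Python sketch assuming the attribute arrives as raw bytes that are supposed to be UTF-8; the function name is hypothetical.

```python
def is_valid_utf8(data):
    """Return True if the bytes decode cleanly as UTF-8."""
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True


print(is_valid_utf8("ğçİöşü".encode("utf-8")))
print(is_valid_utf8(b"\xff\xfe"))
```

A lone lead byte such as b"\xc3" also fails the check, which makes it a convenient fixture for a unit test.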
I'm using libcurl to fetch some HTML pages.
The HTML pages contain some character references like: סלקום
When I read this using libxml2 I'm getting: ׳₪׳¨׳˜׳ ׳¨
Is it the ISO-8859-1 encoding?
If so, how do I convert it to UTF-8 to get the correct word?
Thanks
EDIT: I got the solution, MSalters was right, libxml2 does use UTF-8.
I added this to eclipse.ini
-Dfile.encoding=utf-8
and finally I got Hebrew characters on my Eclipse console.
Thanks
Have you seen the libxml2 page on i18n? It explains how libxml2 solves these problems.
You will get a ס from libxml2. However, you said that you get something like ׳₪׳¨׳˜׳ ׳¨. Why do you think that you got that? You get an xmlChar*. How did you convert that pointer into the string above? Did you perhaps use a debugger? Does that debugger know how to render an xmlChar*? My bet is that the xmlChar* is correct, but you used a debugger that cannot render the Unicode in an xmlChar*.
To answer your last question, an xmlChar* is already UTF-8 and needs no further conversion.
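The rendering problem suspected above can be reproduced with a short Python sketch. The sample word and the choice of Windows-1255 as the viewer's wrong encoding are assumptions made for illustration; the thread does not state which encoding the console used.

```python
# Hypothetical sample word; the viewer encoding (cp1255) is an ASSUMPTION.
original = "פרטנר"

utf8_bytes = original.encode("utf-8")   # what a UTF-8 buffer would hold
mojibake = utf8_bytes.decode("cp1255")  # what a Windows-1255 viewer would show

# The data is intact: reversing the wrong decoding recovers the word.
recovered = mojibake.encode("cp1255").decode("utf-8")
print(recovered == original)
```

Each Hebrew letter is two bytes in UTF-8 starting with 0xD7, so the misread text shows the same character in every other position while the underlying bytes remain correct.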
No. Those entities correspond to the decimal value of the Unicode code point of your characters. See this page for example.
You can therefore store your Unicode values as integers and use an algorithm to transform those integers into a UTF-8 multibyte character. See the UTF-8 specification for this.
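The integer-to-UTF-8 transformation mentioned above can be sketched as follows. This is a minimal Python version of the standard bit layout; the function name is hypothetical, and the sample value 1505 assumes the first character reference in the page was the decimal code point of ס (U+05E1).

```python
def encode_utf8(code_point):
    """Encode one Unicode code point as UTF-8 bytes (hypothetical helper)."""
    if code_point < 0 or code_point > 0x10FFFF or 0xD800 <= code_point <= 0xDFFF:
        raise ValueError("not a valid Unicode scalar value")
    if code_point < 0x80:        # 1 byte:  0xxxxxxx
        return bytes([code_point])
    if code_point < 0x800:       # 2 bytes: 110xxxxx 10xxxxxx
        return bytes([0xC0 | (code_point >> 6),
                      0x80 | (code_point & 0x3F)])
    if code_point < 0x10000:     # 3 bytes: 1110xxxx 10xxxxxx 10xxxxxx
        return bytes([0xE0 | (code_point >> 12),
                      0x80 | ((code_point >> 6) & 0x3F),
                      0x80 | (code_point & 0x3F)])
    # 4 bytes: 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx
    return bytes([0xF0 | (code_point >> 18),
                  0x80 | ((code_point >> 12) & 0x3F),
                  0x80 | ((code_point >> 6) & 0x3F),
                  0x80 | (code_point & 0x3F)])


samekh = encode_utf8(1505)  # ASSUMED decimal reference for U+05E1
print(samekh)
```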
This answer was given on the assumption that the encoded text is returned as UTF-16, which, as it turns out, isn't the case.
I would guess the encoding is UTF-16 or UCS-2. Specify this as the input encoding for iconv. There might also be an endianness issue; have a look here.
The C-style way would be (error checking omitted for clarity):
iconv_t ic = iconv_open("UTF-8", "UCS-2"); /* iconv_open(tocode, fromcode) */
iconv(ic, &myUCS2_Text, &inputSize, &myUTF8_Text, &outputSize); /* char* buffers, size_t byte counts */
iconv_close(ic);
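For comparison, the same UTF-16 to UTF-8 conversion can be sketched in Python. This is only an illustration of the idea, not a replacement for the C code; the function name is hypothetical. The plain "utf-16" codec relies on a byte order mark, so the endianness issue mentioned above is handled by naming the byte order explicitly when no BOM is present.

```python
def utf16_to_utf8(data, encoding="utf-16"):
    """Convert UTF-16 bytes to UTF-8 bytes (hypothetical helper).

    Pass "utf-16-le" or "utf-16-be" when the input carries no byte order mark.
    """
    return data.decode(encoding).encode("utf-8")


word = "סלקום"
with_bom = word.encode("utf-16")         # BOM + native byte order
big_endian = word.encode("utf-16-be")    # no BOM, explicit byte order

print(utf16_to_utf8(with_bom) == utf16_to_utf8(big_endian, "utf-16-be"))
```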