I have to write German text on a PDF created with libharu. I assign the German text to a string variable (i.e. std::string TestString = "VariableGesamtlänge";) and then put that text into the PDF. My simple code is as follows:
//-----UTF8 Encoding
HPDF_UseUTFEncodings(pdf);
HPDF_SetCurrentEncoder(pdf, "UTF-8");
const char *fontname = HPDF_LoadTTFontFromFile(pdf, "FreeSans.ttf", HPDF_TRUE);
HPDF_Font font = HPDF_GetFont(pdf, fontname, "UTF-8");
HPDF_Page_SetFontAndSize(page, font, 24);
std::string TestString = "VariableGesamtlänge";
DrawText(page, font, TestString.c_str(), y);
Problem: I get two square boxes instead of ä. I am using VS2010.
'ä' is not an ASCII character. It may be stored as a single character (in which case, which one?), or it may be stored as multiple characters (in which case, which ones?).
You have told the HPDF functions that you are going to pass text around as UTF-8 (which is an entirely sensible choice). This means 'ä' is represented by 0xC3 0xA4.
The source file is almost certainly encoded in 8-bit text, using (probably) code-page 1252. So 'ä' will be the single character 0xE4. You either need to tell the compiler to store strings as UTF-8, or it may be possible to re-encode the source files in UTF-8.
Your final option is to store the text in a (UTF-8) file, and read it from there.
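Another workaround on VS2010 (which predates C++11's u8"" literals) is to spell out the UTF-8 bytes of 'ä' in the literal yourself, so the source file's own encoding no longer matters. A minimal sketch, where DrawText is the helper from the question:
// 'ä' written as its two UTF-8 bytes, 0xC3 0xA4, so the bytes in the
// executable are UTF-8 regardless of how the source file is encoded.
std::string TestString = "VariableGesamtl\xC3\xA4nge";
DrawText(page, font, TestString.c_str(), y);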
I have code that saves a log as a text file.
It usually works well, but I found a case where it doesn't work:
{Id": "testman", "ip": "192.168.1.1", "target": "?뚯뒪??exe", "desc": "?덈뀞諛⑷??뚯슂"}
My code is simple logic that saves the log string as a text file.
My code works well when the log is in English, but there is a problem when the log is in Korean.
After checking through various experiments, I confirmed that the Korean text would not be a problem if the file could be saved in UTF-8 format.
I think that if Korean text is included in the log string, C++ saves the file in ANSI format by default.
This is my C++ code:
string logFilePath = {path};
string log = "{\"Id\": \"testman\", \"ip\": \"192.168.1.1\", \"target\": \"테스트.exe\", \"desc\": \"안녕방가워요\"}";
ofstream output(logFilePath, ios::app);
output << log << endl;
output.close();
Is there a way to save log files as UTF-8, or is there some other good way?
Please give me some advice.
You could set UTF-8 in File->Advanced Save Options.
If you do not find it, you could add Advanced Save Options in Tools->Customize->Commands->Add Command..->File.
TL;DR: write 0xEF 0xBB 0xBF (the 3-byte UTF-8 BOM) at the beginning of the file before writing out your string.
One of the hints that text-viewer software uses to determine whether a file should be shown in a Unicode format is something called the Byte Order Mark (or BOM for short). It is basically a series of bytes at the beginning of a stream of text that specifies the encoding and endianness of the text. For UTF-8 it is these three bytes: 0xEF 0xBB 0xBF.
You can experiment with this by opening Notepad, writing a single character, and saving the file in the ANSI format. Then look at the size of the file in bytes: it will be 1 byte. Now open the file, save it as UTF-8, and look at the size of the file again: it will be 4 bytes, that is, three bytes for the BOM and one byte for the single character you put in there. You can confirm this by viewing both files in a hex editor.
That being said, you may need to insert these bytes into your files before writing your string to them. So why UTF-8, you may ask? Well, it depends on the encoding of the original string (your std::string log), which in this case is a string literal written in a source file whose encoding is (most likely) UTF-8. Therefore the bytes that make up the string are produced according to this encoding and are put into your executable.
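A minimal sketch of that, reusing the logFilePath and log variables from the question. Since the question appends to the file, the BOM should be written only once, when the file is first created:
#include <fstream>
#include <string>
void AppendUtf8Log(const std::string& logFilePath, const std::string& log)
{
    // Only a brand-new file should receive the 3-byte UTF-8 BOM
    // at its very beginning; an existing log already has one.
    bool isNew = !std::ifstream(logFilePath).good();
    std::ofstream output(logFilePath, std::ios::app | std::ios::binary);
    if (isNew)
        output << "\xEF\xBB\xBF"; // UTF-8 BOM: 0xEF 0xBB 0xBF
    output << log << '\n';
}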
Note that a std::string can contain a Unicode string; it just can't make sense of it. For example, it reports its length wrong (it counts bytes, not characters). But it can be used to carry a Unicode string around fine.
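A quick illustration of that length behaviour, using 'ä' from the first question (0xC3 0xA4 in UTF-8):
#include <string>
std::string s = "\xC3\xA4"; // one character 'ä', two UTF-8 bytes
// s.size() returns 2 (bytes), not 1 (characters)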
I am facing one little problem. I am from a country whose language uses an extended character set (specifically Latin Extended-A, due to characters like š, č, ť, ý, á, ...).
I have an INI file containing these characters and I would like to read them into my program. Unfortunately, it is not working with GetPrivateProfileStringW or ...A.
Here is part of the source code. I hope it will help someone find a solution, because I am getting a little desperate. :-)
SOURCE CODE:
char pcMyExtendedString[200];
GetPrivateProfileStringA(
"CATEGORY_NAME",
"SECTION_NAME",
"error",
pcMyExtendedString,
200,
PATH_TO_INI_FILE
);
INI FILE:
[CATEGORY_NAME]
SECTION_NAME= ľščťžýáíé
Characters ý, á, í, é are read correctly; they are from the Latin-1 Supplement character set. Their hex values are correct (0xFD, 0xE1, 0xED, ...).
Characters ľ, š, č, ť, ž are read incorrectly; they are from the Latin Extended-A character set. Their hex values are incorrect (0xBE, 0x9A, 0xE8, ...). Expected are values like 0x013E, 0x0161, 0x010D, ...
How could this be done? Is it possible, or should I avoid these characters altogether?
GetPrivateProfileString doesn't do any character conversion. If the call succeeds, it gives you exactly what is in the file.
Since you want Unicode characters, your file is probably in UTF-8 or UTF-16. If your file is UTF-8, you should be able to read it with GetPrivateProfileStringA, but it will give you a char array containing the UTF-8 bytes (that is, not 0x013E, because 0x013E is not UTF-8).
If your file is UTF-16, then GetPrivateProfileStringW should work, and give you the UTF-16 codes (0x013E, 0x0161, 0x010D, ...) in a wchar_t array.
Edit: Actually your file is encoded in Windows-1250. This is a single-byte encoding, so GetPrivateProfileStringA works fine, and you can convert the result to UTF-16 if you want by using MultiByteToWideChar with 1250 as the code page parameter.
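A minimal sketch of that conversion, reusing the names from the question (buffer sizes are illustrative, and PATH_TO_INI_FILE is the question's placeholder):
#include <windows.h>
char raw[200];
GetPrivateProfileStringA(
    "CATEGORY_NAME",
    "SECTION_NAME",
    "error",
    raw,
    sizeof(raw),
    PATH_TO_INI_FILE
);
// Re-interpret the Windows-1250 bytes as UTF-16: 0xBE -> 0x013E (ľ),
// 0x9A -> 0x0161 (š), 0xE8 -> 0x010D (č), and so on.
wchar_t wide[200];
MultiByteToWideChar(1250, 0, raw, -1, wide, 200);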
Try saving the file in the UTF-8 (code page 65001) encoding; most likely your file is currently in Western European (Windows), code page 1252.
I read German text from an SQLite database with C++ (the text looks good in the database viewer). But when I display it in a dialog with SetDlgItemText, the text looks like this (see the picture).
CString strWarning(pStmt->GetColumnCString(nCol));
SetDlgItemText(IDC_WARNING_MESSAGE, strWarning);
Your string looks like it's encoded as UTF-8, which the narrow (ANSI) Windows API functions don't interpret correctly.
You'll need to convert it to UTF-16 and ensure that you're calling the wide version of SetDlgItemText, either by changing your project's character set option to Use Unicode Character Set or specifying SetDlgItemTextW.
You can convert your string from UTF-8 to UTF-16 with the MultiByteToWideChar function.
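A minimal sketch of that conversion, assuming an MFC dialog built with the multi-byte character set (so the CString from SQLite holds the raw UTF-8 bytes); error handling omitted:
// Convert the UTF-8 bytes to UTF-16, then call the wide API explicitly.
CStringA utf8(pStmt->GetColumnCString(nCol));
int len = MultiByteToWideChar(CP_UTF8, 0, utf8, -1, NULL, 0);
CStringW wide;
MultiByteToWideChar(CP_UTF8, 0, utf8, -1, wide.GetBuffer(len), len);
wide.ReleaseBuffer();
::SetDlgItemTextW(GetSafeHwnd(), IDC_WARNING_MESSAGE, wide);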
I have just started to get a feel of Dicom standard. I am trying to write a small program, that would read a dicom file and dump the information to a text file. I have a dataset that has the patient names in Chinese. How can I read and store these names?
Currently, I am reading the names as a char* from the DICOM file, converting this char* to a wchar_t* using code page 950 for Chinese, and writing it to a text file. Instead of seeing Chinese characters, I see *, ?, and % in my text file. What am I missing?
I am working in C++ on Windows.
If the text file contains UTF-16, have you included a BOM?
There may be multiple issues at hand.
First, do you know the character encoding of the Chinese name, e.g. Big5 or GB*? See http://en.wikipedia.org/wiki/Chinese_character_encoding
Second, do you know the encoding of your output text file? If it is ASCII, then you probably won't ever be able to view the Chinese characters, in which case I would suggest changing it to a Unicode encoding (e.g. UTF-8).
Then, when you read the Chinese name, convert the raw bytes and write out the result. For example, if the DICOM stores it in Big5, and your text file is UTF-8, you will need a Big5->UTF-8 converter.
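A sketch of such a converter on Windows, going through UTF-16 as an intermediate step (950 is the Big5 code page; error handling omitted):
#include <windows.h>
#include <string>
std::string Big5ToUtf8(const char* big5)
{
    // Big5 bytes -> UTF-16 via code page 950, then UTF-16 -> UTF-8.
    int wlen = MultiByteToWideChar(950, 0, big5, -1, NULL, 0);
    std::wstring wide(wlen, L'\0');
    MultiByteToWideChar(950, 0, big5, -1, &wide[0], wlen);
    int ulen = WideCharToMultiByte(CP_UTF8, 0, wide.c_str(), -1, NULL, 0, NULL, NULL);
    std::string utf8(ulen, '\0');
    WideCharToMultiByte(CP_UTF8, 0, wide.c_str(), -1, &utf8[0], ulen, NULL, NULL);
    utf8.resize(ulen > 0 ? ulen - 1 : 0); // drop the trailing '\0'
    return utf8;
}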
The problem falls into two steps:
Problem Step 1. Access 97 db containing XML strings that are encoded in UTF-8.
The problem boils down to this: the Access 97 db contains XML strings that are encoded in UTF-8. So I created a patch tool for separate conversion of the XML strings from UTF-8 to Unicode. In order to convert a UTF-8 string to Unicode, I have used the function
MultiByteToWideChar(CP_UTF8, 0, PChar(OriginalName), -1, @newName, Size); (where newName is an array declared as "newName : Array[0..2048] of WideChar;").
This function works well in most cases; I have checked it with Spanish and Arabic characters. But when I work with Greek and Chinese characters, it chokes.
For some Greek characters like "Ευγ. ΚαÏαβιά" (as stored in Access 97), the resulting new string contains null characters in between, and when it is stored into a wide string the characters get clipped.
For some Chinese characters like "?¢»?µ?" (as stored in Access 97), the result is totally absurd, like "?¢»?µ?".
Problem Step 2. Access 97 db text strings: the application GUI takes Unicode input, which is saved in Access 97.
First I checked with Arabic and Spanish characters; it seemed then that no explicit character encoding was required. But again the problem comes with Greek and Chinese characters.
I tried the same function mentioned above for the text conversion (is it correct???); the result was again disappointing. The Spanish characters, which are OK without conversion, get their Unicode characters either lost or converted to regular ASCII letters.
The Greek and Chinese characters show behaviour similar to that mentioned in step 1.
Please guide me. Am I taking the right approach? Is there some other way around this?
Well, right now I am confused and full of questions :)
There is no special requirement for working with Greek characters. The real problem is that the characters were stored in an encoding that Access doesn't recognize in the first place. When the application stored the UTF-8 values in the database, it tried to convert every single byte to the equivalent byte in the database's code page. Every character that had no correspondence in that encoding was replaced with '?'. That may mean that the Greek text is OK, while the Chinese text may be gone.
In order to convert the data to something readable, you have to know the code page it is stored in. Using that, you can get the actual bytes and then convert them to Unicode.