In a C++ program, I am trying to read data from an MSSQL database using OLE DB. The column I am reading is a VARCHAR type, and the data in it is imported from a multi-value database. Sometimes the data in the column contains a delimiter: a value marker (0xFD). I convert the data read from the table to a char * like this:
retcode = WideCharToMultiByte(CP_UTF8, 0, (WCHAR*)pDBColumnAccess[nCol].pData, -1, (char *)pReadBuf, pDBColumnAccess[nCol].cbDataLen, NULL, NULL);
Everything is fine if the data does not contain the delimiter. But when the value marker (0xFD) is present, it is replaced by junk characters in the converted data.
Shouldn't I do a conversion to char * in the case of VARCHAR? Is it enough to just copy the data as-is, without any conversion?
WideCharToMultiByte converts from UTF-16, and there is no such thing as a 0xFD character in UTF-16: all characters are encoded as at least 2 bytes. Did you actually mean 0x00FD (or even 0xFD00)?
Also, UTF-8 (your "target" encoding, since you specified CP_UTF8) does not guarantee that every character is encoded in a single byte.
According to UTF Converter:
UTF-16 00FD converts to UTF-8 C3 BD.
UTF-16 FD00 converts to UTF-8 EF B4 80.
Is that what you are getting?
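As a quick check, here is a minimal sketch (my own, not the asker's code) that feeds a UTF-16 string containing U+00FD through the same call; the buffer size and output handling are illustrative assumptions:

#include <windows.h>
#include <cstdio>

int main() {
    const wchar_t src[] = L"\u00FD";  // UTF-16 string holding the single code unit 0x00FD
    char out[8] = {0};
    // -1 marks the source as NUL-terminated; the return value counts the NUL too
    int n = WideCharToMultiByte(CP_UTF8, 0, src, -1, out, (int)sizeof(out), NULL, NULL);
    for (int i = 0; i + 1 < n; ++i)
        std::printf("%02X ", (unsigned char)out[i]);  // prints: C3 BD
    return 0;
}

If the source buffer is not actually valid UTF-16, WideCharToMultiByte will produce replacement or junk bytes, which may be what you are seeing.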
I have a web server developed in C++. In this web server, the data is received from the client side and stored in the database.
Some of this data is in Persian, which is converted to Unicode UTF-8 format.
For example: the data string is "سلام" on the client side.
When I get the data in the web server, it looks like
"%D8%B3%D9%84%D8%A7%D9%85"
I want to convert this UTF-8 code to a C++ string. How can I do this conversion?
Your string is not UTF-8 encoded as-is; it uses percent-encoding, the same scheme used for HTTP URL query parameters.
% indicates that the next two characters encode a single byte in hex. You will need to scan for %, and when you encounter one, interpret the next two characters as a hexadecimal-encoded byte. Otherwise you just copy the characters/bytes over.
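A minimal decoding sketch along those lines (the function name percentDecode is mine, and it assumes well-formed input where every % is followed by two hex digits):

#include <string>
#include <cstdlib>

std::string percentDecode(const std::string& in) {
    std::string out;
    out.reserve(in.size());
    for (std::string::size_type i = 0; i < in.size(); ++i) {
        if (in[i] == '%' && i + 2 < in.size()) {
            // interpret the next two characters as one hexadecimal-encoded byte
            char hex[3] = { in[i + 1], in[i + 2], '\0' };
            out += static_cast<char>(std::strtoul(hex, NULL, 16));
            i += 2;
        } else {
            out += in[i];  // ordinary character, copy as-is
        }
    }
    return out;  // raw bytes, here the UTF-8 encoding of the original text
}

Decoding "%D8%B3%D9%84%D8%A7%D9%85" this way yields the raw UTF-8 bytes of "سلام", which you can keep in a std::string directly.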
Is there any way to detect std::string encoding?
My problem: I have external web services which give data in different encodings. I also have a library which parses that data and stores it in std::string. Then I want to display the data in a Qt GUI. The problem is that the std::string can have different encodings: some strings can be converted using QString::fromAscii(), some with QString::fromUtf8().
I haven't looked into it deeply, but I did use some Qt 3.3 in the past.
ASCII vs Unicode + UTF-8
UTF-8 is 8-bit, ASCII 7-bit. I guess you can try looking at the byte values of the string array.
http://doc.qt.digia.com/3.3/qstring.html#ascii and http://doc.qt.digia.com/3.3/qstring.html#utf8
It seems ascii() returns an 8-bit ASCII representation of the string, but I think it should still only have values from 0 to 127 or so. You must compare more characters in the string.
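Building on that idea, a rough heuristic sketch (mine, using the functions named in the question): treat the string as ASCII only if every byte is below 0x80, otherwise hand it to fromUtf8(). This cannot detect arbitrary encodings; it only separates plain 7-bit ASCII from everything else:

#include <string>
#include <QString>

QString toQString(const std::string& s) {
    for (std::string::size_type i = 0; i < s.size(); ++i) {
        if (static_cast<unsigned char>(s[i]) > 127)       // non-ASCII byte found
            return QString::fromUtf8(s.c_str(), (int)s.size());
    }
    return QString::fromAscii(s.c_str(), (int)s.size());  // pure 7-bit ASCII
}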
I'm trying to insert characters above the ASCII range of 128 using a C++ program (the characters are Ø and Å). It works fine for ASCII characters below 128.
The data type used in the database is VARCHAR2.
Those characters are inserted into the DB as question marks (????).
If I set the field value in the DB to those characters through Toad and then read it with the application, they come back as question marks (????).
Can someone please give me example code showing how to insert strings which contain those characters (values above 128)?
I think the problem is a data type conversion. (At the application level, before the insert into the DB, those characters display correctly. Also, if I set the field value through Toad and read from the DB, they come back as question marks. Since I can set the field value in the DB through Toad, the column itself can hold those characters.)
I'm using the following Define and Bind calls in my application:
OCIDefineByPos(p_sql, &p_dfn, p_DBCon->p_err, iPos,
               (dvoid*)p->un_DataArray.pzValue, (sword)iSize, SQLT_STR,
               (dvoid*)p->un_FlagArray.pssValue, 0, 0, OCI_DEFAULT);

OCIBindByName(p_sql, &p_bnd, p_DBCon->p_err, (text*)zName, -1,
              (dvoid*)zValue, iSize, SQLT_STR, 0, 0, 0, 0, 0, OCI_DEFAULT);
Can someone help me?
Or, if you have a sample program that can insert ASCII values up to 256, please share it with me.
WE8MSWIN1252 corresponds to "MS Windows Code Page 1252, 8-bit West European".
You have to convert your strings to Windows Code Page 1252 before inserting them into the db.
For instance, on Windows, if your strings are in UTF-8, convert them to UTF-16 with MultiByteToWideChar and then back to code page 1252 using WideCharToMultiByte.
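A minimal sketch of that round trip, assuming the Win32 conversion APIs with error handling omitted (the helper name is mine):

#include <windows.h>
#include <string>

std::string utf8ToCp1252(const std::string& utf8) {
    // first hop: UTF-8 to UTF-16
    int wlen = MultiByteToWideChar(CP_UTF8, 0, utf8.c_str(), -1, NULL, 0);
    std::wstring wide(wlen, L'\0');
    MultiByteToWideChar(CP_UTF8, 0, utf8.c_str(), -1, &wide[0], wlen);

    // second hop: UTF-16 to code page 1252 (WE8MSWIN1252 on the Oracle side)
    int len = WideCharToMultiByte(1252, 0, wide.c_str(), -1, NULL, 0, NULL, NULL);
    std::string cp1252(len, '\0');
    WideCharToMultiByte(1252, 0, wide.c_str(), -1, &cp1252[0], len, NULL, NULL);
    cp1252.resize(len - 1);  // drop the trailing NUL written by the API
    return cp1252;
}

Ø and Å both exist in code page 1252, so they survive this conversion; characters outside code page 1252 would be replaced.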
This may be because there ARE no ASCII characters above 128: ASCII is a 7-bit encoding.
In order to add non-ASCII characters (there are no Ø or Å in ASCII), you'll need a different encoding to put them in. Most sane applications nowadays use UTF-8.
The problem is categorized in two steps:
Problem Step 1. An Access 97 db containing XML strings that are encoded in UTF-8.
The problem boils down to this: the Access 97 db contains XML strings that are encoded in UTF-8, so I created a patch tool to convert the XML strings from UTF-8 to Unicode separately. To convert a UTF-8 string to Unicode, I have used the function
MultiByteToWideChar(CP_UTF8, 0, PChar(OriginalName), -1, @newName, Size); (where newName is declared as "newName : Array[0..2048] of WideChar;").
This function works well in most cases; I have checked it with Spanish and Arabic characters. But when I work with Greek and Chinese characters it chokes.
For some Greek characters like "Ευγ. ΚαÏαβιά" (as stored in Access 97), the resulting new string contains null characters in between, and when it is stored to a wide string the characters get clipped.
For some Chinese characters like "?¢»?µ?" (as stored in Access 97), the result is totally absurd, like "?¢»?µ?".
Problem Step 2. Access 97 db text strings; the application GUI takes Unicode input and saves it to Access 97.
First I checked with Arabic and Spanish characters, and it seemed then that no explicit character encoding was required. But again the problem comes with Greek and Chinese characters.
I tried the same function mentioned above for the text conversion (is that correct???), and the result was again disappointing. The Spanish characters, which are OK without conversion, have their Unicode characters either lost or converted to plain ASCII letters.
The Greek and Chinese characters show behaviour similar to that mentioned in step 1.
Please guide me. Am I taking the right approach? Is there some other way around???
Well, right now I am confused and full of questions :)
There is no special requirement for working with Greek characters. The real problem is that the characters were stored in an encoding that Access doesn't recognize in the first place. When the application stored the UTF-8 values in the database, it tried to convert every single byte to the equivalent byte in the database's codepage; every character that had no correspondence in that encoding was replaced with '?'. That may mean that the Greek text is OK, while the Chinese text may be gone.
In order to convert the data to something readable, you have to know the codepage it is stored in. Using that, you can get the actual bytes and then convert them to Unicode.
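A hedged sketch of that recovery path (the helper name is mine, and codepage 1252 below is only an example; substitute the codepage your data was actually stored in). Note that bytes already replaced with '?' during the original store are lost for good:

#include <windows.h>
#include <string>

std::wstring recoverUtf8(const std::wstring& stored, UINT dbCodePage /* e.g. 1252 */) {
    // step 1: map the mislabeled text back to its raw bytes
    int blen = WideCharToMultiByte(dbCodePage, 0, stored.c_str(), -1, NULL, 0, NULL, NULL);
    std::string raw(blen, '\0');
    WideCharToMultiByte(dbCodePage, 0, stored.c_str(), -1, &raw[0], blen, NULL, NULL);

    // step 2: reinterpret those bytes as the UTF-8 they originally were
    int wlen = MultiByteToWideChar(CP_UTF8, 0, raw.c_str(), -1, NULL, 0);
    std::wstring out(wlen, L'\0');
    MultiByteToWideChar(CP_UTF8, 0, raw.c_str(), -1, &out[0], wlen);
    out.resize(wlen - 1);  // drop the trailing NUL
    return out;
}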
I have an encoded character buffer array of size 512 in C, and a database field of type varchar in MySQL. Is it possible to store the encoded character buffer in the varchar?
I have tried this, but the problem I face is that it only stores a limited part of the buffer in the database and ignores the rest. What is the actual problem, and how do I solve it?
It is not clear what you mean by encoded.
If you mean that you have an arbitrary string of byte values, then varchar is a bad fit because it will attempt to trim trailing spaces. A better choice in such cases is to use varbinary fields.
If the string you are inserting contains control characters, you might be best converting it into a hex string and inserting it as follows:
create table xx (
  v varbinary(512) not null
);
insert into xx values (0x68656C6C6F20776F726C64);
This will prevent any component in the tool chain from choking on NUL characters and so forth.
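A small C++ sketch of building that hex literal from a raw buffer (the helper name is mine):

#include <cstddef>
#include <cstdio>
#include <string>

std::string toHexLiteral(const unsigned char* buf, std::size_t len) {
    std::string lit = "0x";
    char byte[3];
    for (std::size_t i = 0; i < len; ++i) {
        std::snprintf(byte, sizeof(byte), "%02X", buf[i]);  // two hex digits per byte
        lit += byte;
    }
    return lit;  // e.g. "0x68656C6C6F20776F726C64" for "hello world"
}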
What size is the varchar declared as in your table?
Often varchar fields are set to 255 bytes, not characters. Starting with MySQL 5.0.3 you can have longer varchar fields.
Sounds like you need a varchar(512) field; is that what you have?
See http://dev.mysql.com/doc/refman/5.0/en/char.html