How to detect character encoding between two operating systems? - C++

I'm writing an application for BlackBerry 10 (a Z10) using plain Qt, not Cascades. When I read a string from the virtual keyboard using the keyPressEvent() function, I convert the first character of QKeyEvent::text() to a plain char. On the device, if I do qDebug() << (int) converted_char, characters like 'r' are represented by 114, but on my laptop it is 40 (I believe).
Hence, if I send a char from the phone to the laptop over TCP, the correct character is never printed. How can I fix this, and what is going on?
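One likely fix (a sketch, not from the thread; the widget and socket names are placeholders) is to stop sending raw char values and agree on an explicit encoding such as UTF-8 on both ends, which Qt makes straightforward with QString::toUtf8() and QString::fromUtf8():

#include <QKeyEvent>
#include <QTcpSocket>

// Sketch: encode the key text as UTF-8 before sending, and decode it the same
// way on the laptop, so both machines agree on the byte values.
void MyWidget::keyPressEvent(QKeyEvent *event)   // MyWidget and m_socket are hypothetical
{
    QByteArray utf8 = event->text().toUtf8();    // explicit, platform-independent bytes
    if (!utf8.isEmpty())
        m_socket->write(utf8);                   // QTcpSocket inherits write() from QIODevice
}

// On the receiving side:
// QString received = QString::fromUtf8(socket->readAll());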

Related

Write to the input of process strings received by socket

I have an application on the Windows platform that receives remote commands from applications running on the Linux platform.
The Linux applications have difficulty accessing directories or files that contain accented characters: they send the command to access such files/directories and the response is always "directory/file not found".
I think the two applications are using different code pages. I say this because I previously had problems in the Linux applications: directories and files with accented words showed up with strange symbols in std::cout, and after I added SetConsoleOutputCP(CP_UTF8) to the Windows application the problem was solved and the paths containing accents became readable. Does this mean that the Linux application uses code page 65001? In any case, the problem when sending strings containing the paths to the directories/files still persists: whenever the Linux application tries to access paths containing accented words, it fails.
I'll try to show how the two applications communicate.
Windows Side:
In short, this is the part where the client receives the message from the Linux application and then writes what was received to the process. When paths containing accented characters are written here, the application reports in its output that it cannot find them.
BYTE buffer[4096];
DWORD BytesWritten;
int ret = SSL_read(stI->ssl, (char*)buffer, sizeof(buffer));
if (ret <= 0)
    break;
if (!WriteFile(stI->hStdIn, buffer, ret, &BytesWritten, NULL))
    break;
And then it reads the output of the process and sends the content to the Linux application.
BYTE buffer[4096];
DWORD BytesAvailable, BytesRead;
// BytesAvailable is assumed to be set earlier (that code is not shown here).
if (!ReadFile(stI->hStdOut, buffer, min(sizeof(buffer), BytesAvailable), &BytesRead, NULL))
    break;
ret = SSL_write(stI->ssl, (char*)buffer, BytesAvailable);
if (ret <= 0)
    break;
Linux Side:
This part is very basic: the application reads user input and then sends it to the Windows application.
std::string inputBuffer;
ZH->console_input(inputBuffer, 33); // This function only controls the input and output of data with termios.
inputBuffer += '\n'; // To simulate an Enter in the Windows application
// Sends the typed path to the Windows application
SSL_write(session_data.ssl, inputBuffer.c_str(), inputBuffer.size());
The part that receives the data is basically the same as in the Windows application: it receives the data into a char buffer and then prints it on the screen with std::cout.
The only difference is that the socket is set to NONBLOCK and I use the select function.
Any suggestions on how to solve this problem?
Your best bet is to use proper Unicode encodings consistently. Windows tends to use UTF-16 (2 bytes per code unit, 4 for characters outside the Basic Multilingual Plane); Linux, on the other hand, uses UTF-8, which encodes ASCII characters as a single byte and non-ASCII characters as multi-byte sequences. If you do a proper conversion between Windows UTF-16 and UTF-8, things should work correctly.
C++11 and Boost do provide some Unicode support, but for gold standard support, take a look at ICU.
Sockets, however, just transmit bytes, so they have nothing to do with Unicode conversions.
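For the Windows side, a rough sketch of that conversion (assuming the bytes arriving from the Linux peer are UTF-8; the helper names are mine, not from the thread) using the Win32 conversion functions:

#include <windows.h>
#include <string>

// Convert UTF-8 bytes received from the socket into a UTF-16 string.
std::wstring Utf8ToUtf16(const char* data, int len)
{
    int wlen = MultiByteToWideChar(CP_UTF8, 0, data, len, NULL, 0);
    std::wstring out(wlen, L'\0');
    MultiByteToWideChar(CP_UTF8, 0, data, len, &out[0], wlen);
    return out;
}

// Convert a UTF-16 string back to UTF-8 before sending it over the socket.
std::string Utf16ToUtf8(const std::wstring& ws)
{
    int len = WideCharToMultiByte(CP_UTF8, 0, ws.c_str(), (int)ws.size(), NULL, 0, NULL, NULL);
    std::string out(len, '\0');
    WideCharToMultiByte(CP_UTF8, 0, ws.c_str(), (int)ws.size(), &out[0], len, NULL, NULL);
    return out;
}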

Can I decode € (euro sign) as a char and not as a wstring/wchar?

Let me try to explain my problem. I have to receive a message from a server (programmed in Delphi) and do some things with that message on the client side (which is the side I program, in C++).
Let's say the message is "Hello €". That means I have to work with std::wstring, since € (the euro sign) needs 2 bytes instead of 1. Knowing that, I have done all my work with wstrings, and if I hard-code the message it works fine. Now I have to receive the real one from the server, and here comes the problem.
The person on the server side is sending that message as a string. He uses an EncodeString() function in Delphi and says he is not going to change it. So my question is: if I decode that string into a std::string in C++ and then convert it into a wstring, will it work? Or will I have problems and end up with a different message in my string variable instead of "Hello €"?
If so (if I can receive that string with no problem), then I have another problem. The function I have to use to decode the string is void DecodeString(char *buffer, int length);
So normally, if you receive text, you do something like:
char Text[255];
DecodeString(Text, length); // length is a number decoded before
So... can I decode it with no problem and end up with the "Hello €" message in Text? After that I'll just need to convert it to get the wstring.
Thank you
EDIT:
I'll add another example. If I know that the server is always going to send me text at most 30 characters long, on the server they do something like:
EncodeByte(lengthText);
EncodeString(text);
and in the client you do:
int length;
char myText[30];
DecodeByte(length);
DecodeString(myText,length);
and then you can work with myText as a string later on.
Hope that helps a little more. I'm sorry for not having more information, but I'm new to this work and I don't know much more about the server.
EDIT 2
Trying to summarize... The thing is that I have to receive a message and do something with it, and with the tool I mentioned I have to decode it. Since DecodeString() needs a char buffer and I need a wstring, I just need a way to take the data received from the server, decode it with DecodeString(), and get it into a wstring. I don't really know if that is possible, and if it is, I'm not sure how to do it or what types of variables to use.
EDIT 3
Finally! I know what code pages are being used. It seems that the client uses the ANSI ones and the server doesn't, so I'll have to ask the person who does that part to change it to the ANSI ones. Thanks everybody for helping me with my big, big ignorance about the existence of code pages.
Since you're using wstring, I guess that you are on Windows (wstring isn't popular on *nix).
If so, you need the Delphi app to send you UTF-16, which you can use in the wstring constructor. Example:
char input[] = "\xac\x20\x00"; // UTF-16LE bytes for the euro sign (U+20AC), plus an extra zero byte so the wide string is null-terminated
wchar_t* input2 = reinterpret_cast<wchar_t*>(input);
wstring ws(input2);
If you're on Linux/Mac, etc., you need to receive UTF-32.
This method is far from perfect, though. There can be pitfalls and edge cases for code points beyond 0xFFFF (some Chinese characters, etc.). Supporting that properly probably requires a PhD.
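If the Delphi side ends up sending a single-byte ANSI string instead (which is what the asker's EDIT 3 suggests), the char-to-wstring step on Windows could look roughly like the sketch below. CP_ACP assumes the client's ANSI code page matches the one the server encodes with, which the thread does not confirm.

#include <windows.h>
#include <string>

// Sketch: widen the buffer filled by DecodeString() using the local ANSI code page.
std::wstring AnsiToWide(const char* text, int length)
{
    int wlen = MultiByteToWideChar(CP_ACP, 0, text, length, NULL, 0);
    std::wstring out(wlen, L'\0');
    MultiByteToWideChar(CP_ACP, 0, text, length, &out[0], wlen);
    return out;
}

// Usage after the decode shown in the question:
// char Text[255];
// DecodeString(Text, length);
// std::wstring message = AnsiToWide(Text, length);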

Windows usage of char * functions with UTF-16

I am porting an application from Linux to Windows.
On Linux I use the libmagic library, which I would not be glad to get rid of on Windows.
The problem is that I need to pass the name of a file, held in UTF-16 encoding, to this function:
int magic_load(magic_t cookie, const char *filename);
Unfortunately it accepts only const char *filename. My first idea was to convert the UTF-16 string to the local encoding, but there are problems: the string can contain, for example, Chinese symbols while the local encoding is Russian.
As a result we would get garbage in the output and the program would not achieve its aim.
Converting to UTF-8 doesn't help either, because this is Windows and Windows holds file names in UTF-16.
But I somehow need to make that function able to open a file with a Unicode name.
I have come up with only one very, very bad solution:
1. I have a filename.
2. I can copy the file with the Unicode name to a file with an ASCII name like "1.mp3".
3. Open it with the libmagic functions and get what I want.
4. Remove the temporary file.
But I understand how bad this solution is and how it could make my application slower, so I wonder: perhaps there are better ways to do it?
Thanks in advance for any tips, 'cause I'm really confused with it.
Use 8.3 file names to access the files.
In addition to long file names up to 255 characters in length, Windows also generates an MS-DOS-compatible (short) file name in 8.3 format.
http://support.microsoft.com/kb/142982
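A rough sketch of that suggestion (not from the original answer): GetShortPathNameW retrieves the 8.3 alias of the UTF-16 path, which is normally plain ASCII, so it can be narrowed and handed to the const char* API. Note that short-name generation can be disabled on a volume, in which case no short name exists.

#include <windows.h>
#include <string>
#include <magic.h>

// Sketch: obtain the 8.3 (short) form of a UTF-16 path and pass it to libmagic.
int load_via_short_name(magic_t cookie, const std::wstring& longPath)
{
    wchar_t shortPath[MAX_PATH];
    DWORD n = GetShortPathNameW(longPath.c_str(), shortPath, MAX_PATH);
    if (n == 0 || n >= MAX_PATH)
        return -1; // no short name available (8.3 generation may be disabled)

    // The short form is expected to be ASCII-only, so a narrow copy is safe here.
    char narrow[MAX_PATH];
    WideCharToMultiByte(CP_ACP, 0, shortPath, -1, narrow, MAX_PATH, NULL, NULL);

    return magic_load(cookie, narrow); // the const char* function from the question
}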

Linux send unicode character to active application

Ok, so I'm trying to develop an app using C++ and Qt4 for Linux that will map certain key sequences to special Unicode characters. Also, I'm trying to make it bilingual, so the special Unicode character sent depends on the selected language. Example: AltGr+s will send ß or ș, depending on whether German or Romanian is selected. On Windows, I achieved this using AutoHotKey. However, I couldn't get IronAHK to work on Linux, so I have written myself a nice Qt application for it, using Qxt to register "global" shortcuts. I have tried this snippet:
void mainWnd::sendKeypress( unsigned int keycode )
{
    Display *display = QX11Info::display();
    Window curr_focus;
    int revert_to;

    // Find the window that currently has the input focus (not otherwise used here).
    XGetInputFocus( display, &curr_focus, &revert_to );

    // Fake a key press followed by a key release for the given keycode.
    XTestFakeKeyEvent( display, keycode, true, 0 );
    XTestFakeKeyEvent( display, keycode, false, 1 );
    XFlush( display );
}
copied from another application (where it works), but here it seems to do nothing. Also, there might be a problem with the fact that the characters I'm trying to send aren't found on the US 101-key layout that I currently use on my laptop (and as the layout in the OS).
So my question is: how do I make the app send a Unicode character to whichever app has focus, inserting a special character (sort of like KCharMap)? Remember, these are special characters which are not found on a normal US keyboard. Thanks in advance.
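One common workaround for characters that are not in the active layout (sketched below under assumptions; this is an editorial sketch, not an answer from the thread) is to temporarily bind the wanted Unicode keysym to a spare keycode with XChangeKeyboardMapping and only then fake the press with XTest:

#include <X11/Xlib.h>
#include <X11/extensions/XTest.h>

// Sketch: bind a Unicode keysym to an unused keycode, then fake a press/release
// so the focused application receives a character not present in the layout.
// A real implementation should restore the original mapping afterwards.
void send_unicode_char(Display *display, unsigned long codepoint)
{
    // X11 keysyms: Latin-1 code points map directly; others use 0x01000000 + code point.
    KeySym sym = (codepoint < 0x100) ? codepoint : (0x01000000 | codepoint);

    // Look for a keycode with no symbols bound to it.
    int min_kc, max_kc, per;
    XDisplayKeycodes(display, &min_kc, &max_kc);
    KeySym *map = XGetKeyboardMapping(display, min_kc, max_kc - min_kc + 1, &per);
    int spare = 0;
    for (int kc = min_kc; kc <= max_kc && !spare; ++kc) {
        bool unused = true;
        for (int i = 0; i < per; ++i)
            if (map[(kc - min_kc) * per + i] != NoSymbol) { unused = false; break; }
        if (unused) spare = kc;
    }
    XFree(map);
    if (!spare)
        return;

    // Temporarily map the spare keycode to our symbol, then fake the key press.
    XChangeKeyboardMapping(display, spare, 1, &sym, 1);
    XSync(display, False);
    XTestFakeKeyEvent(display, spare, True, 0);
    XTestFakeKeyEvent(display, spare, False, 0);
    XSync(display, False);
}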

Can't read unicode (japanese) from a file

Hi, I have a file containing Japanese text, saved as a Unicode file.
I need to read from the file and display the information on the standard output.
I am using Visual Studio 2008.
int main()
{
    wstring line;
    wifstream myfile("D:\sample.txt"); //file containing japanese characters, saved as unicode file
    //myfile.imbue(locale("Japanese_Japan"));
    if(!myfile)
        cout<<"While opening a file an error is encountered"<<endl;
    else
        cout << "File is successfully opened" << endl;
    //wcout.imbue (locale("Japanese_Japan"));
    while ( myfile.good() )
    {
        getline(myfile,line);
        wcout << line << endl;
    }
    myfile.close();
    system("PAUSE");
    return 0;
}
This program generates some random output and I don't see any Japanese text on the screen.
Oh boy. Welcome to the Fun, Fun world of character encodings.
The first thing you need to know is that your console is not Unicode on Windows. The only way you'll ever see Japanese characters in a console application is if you set your non-Unicode (ANSI) locale to Japanese. That will also make backslashes look like yen symbols and break paths containing European accented characters for programs using the ANSI Windows API (which was supposed to have been deprecated when Windows XP came around, but people still use it to this day...).
So first thing you'll want to do is build a GUI program instead. But I'll leave that as an exercise to the interested reader.
Second, there are a lot of ways to represent text. You first need to figure out the encoding in use. Is it UTF-8? UTF-16 (and if so, little or big endian?) Shift-JIS? EUC-JP? You can only use a wstream to read directly if the file is in little-endian UTF-16. And even then you need to futz with its internal buffer. Anything other than UTF-16 and you'll get unreadable junk. And this is all only the case on Windows as well! Other OSes may have a different wstream representation. It's best not to use wstreams at all, really.
So, let's assume it's not UTF-16 (for full generality). In this case you must read it as a char stream - not using a wstream. You must then convert this character string into UTF-16 (assuming you're using windows! Other OSes tend to use UTF-8 char*s). On windows this can be done with MultiByteToWideChar. Make sure you pass in the right code page value, and CP_ACP or CP_OEMCP are almost always the wrong answer.
Now, you may be wondering how to determine which code page (i.e., character encoding) is correct. The short answer is you don't. There is no prima facie way of looking at a text string and saying which encoding it is. Sure, there may be hints; for example, if you see a byte order mark, chances are it's whatever variant of Unicode makes that mark. But in general, you have to be told by the user, or make an attempt to guess and rely on the user to correct you if you're wrong, or you have to select a fixed character set and not attempt to support any others.
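A minimal sketch of that approach follows; the CP_UTF8 code page is an assumption (if the file is Shift-JIS, 932 would be the right value), and since the console cannot show the result reliably, a message box stands in for real output:

#include <windows.h>
#include <fstream>
#include <iterator>
#include <string>

int main()
{
    // Read the file as raw bytes; no wide stream involved.
    std::ifstream in("D:\\sample.txt", std::ios::binary);
    std::string bytes((std::istreambuf_iterator<char>(in)),
                       std::istreambuf_iterator<char>());

    // Convert to UTF-16 using the encoding the file actually uses.
    int wlen = MultiByteToWideChar(CP_UTF8, 0, bytes.data(), (int)bytes.size(), NULL, 0);
    std::wstring text(wlen, L'\0');
    MultiByteToWideChar(CP_UTF8, 0, bytes.data(), (int)bytes.size(), &text[0], wlen);

    // A message box at least proves the conversion worked.
    MessageBoxW(NULL, text.c_str(), L"File contents", MB_OK);
    return 0;
}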
Someone here had the same problem with Russian characters (he's using basic_ifstream<wchar_t>, which should be the same as wifstream according to this page). In the comments of that question they also link to this, which should help you further.
If I understood everything correctly, it seems that wifstream reads the characters correctly but your program tries to convert them to whatever locale your program is running in.
Two errors:
std::wifstream(L"D:\\sample.txt");
And do not mix cout and wcout.
Also check that your file is encoded as UTF-16, little-endian. If it is not, you will have trouble reading it this way.
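If the file does turn out to be UTF-16LE, one way to tell the stream so explicitly is the C++11 <codecvt> facet shown below. This is a sketch only: <codecvt> is not available in Visual Studio 2008 and was deprecated in C++17, so treat it as an option for newer toolchains.

#include <fstream>
#include <locale>
#include <codecvt>
#include <string>

int main()
{
    std::wifstream in("D:\\sample.txt", std::ios::binary);

    // Interpret the bytes as UTF-16, little-endian, skipping a BOM if present.
    in.imbue(std::locale(in.getloc(),
        new std::codecvt_utf16<wchar_t, 0x10ffff,
            std::codecvt_mode(std::little_endian | std::consume_header)>));

    std::wstring line;
    while (std::getline(in, line))
    {
        // line now holds the wide characters; printing them to the Windows
        // console is a separate problem, as discussed above.
    }
    return 0;
}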
wfstream uses wfilebuf for the actual reading and writing of the data. wfilebuf defaults to using a char buffer internally which means that the text in the file is assumed narrow, and converted to wide before you see it. Since the text was actually wide, you get a mess.
The solution is to replace the wfilebuf buffer with a wide one.
You probably also need to open the file as binary.
const size_t bufsize = 128;
wchar_t buffer[bufsize];
wifstream myfile("D:\\sample.txt", ios::binary);
myfile.rdbuf()->pubsetbuf(buffer, bufsize);
Make sure the buffer outlives the stream object!
See details here: http://msdn.microsoft.com/en-us/library/tzf8k3z8(v=VS.80).aspx