Inputting a string containing Greek characters in Linux - C++

I have a function which returns a string.
I have to define that string with Greek characters inside the function itself and return it.
I am working on a Linux platform and my code is in C++.
My function is as follows:
string gen_string()
{
    string str = "αγρω";
    return str;
}
But I am not able to enter the input.
When I try to copy and paste the Greek characters I want, they appear as garbage characters.
Can someone please help me with this?
Thanks in advance.
EDIT:
Thanks for all your responses.
It's not about using wstring or string.
When I paste the string into vim to give it as input, it appears as something like this:
▒~^▒~T▒~A▒~A201604¸▒~B▒žMDF_F▒~S123▒~T▒~B▒▒~B▒
I also tried putting the text in a file and opening that file in vim, but it's still the same.

std::string is only for ASCII characters, I believe.
You have international, likely Unicode, characters. Consider using std::wstring, which stores "wide" characters.
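For what it's worth, here is a minimal sketch of that approach, assuming the source file is saved as UTF-8 and the program runs under a UTF-8 locale on a UTF-8 terminal (gen_string is just the asker's function adapted to a wide string):

#include <clocale>
#include <iostream>
#include <locale>
#include <string>

std::wstring gen_string()
{
    return L"αγρω";                       // wide string literal with the Greek text
}

int main()
{
    std::setlocale(LC_ALL, "");           // pick up the environment's (UTF-8) locale
    std::wcout.imbue(std::locale(""));    // make wcout encode wchar_t for that locale
    std::wcout << gen_string() << L"\n";
    return 0;
}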

If you mean copying from some text into the terminal input, then how to do this depends on the terminal. If it's a GNOME terminal, you need to specify UTF-8 in the locale settings, though I'm not sure whether that alone would get you the Greek alphabet.
The locale command will list the current locale settings from locale.conf. You likely want to change the LANG setting. A way to do this system-wide is
localectl set-locale LANG=en_country_code.UTF-8
Change country_code; it's US for the United States, but I don't know what the Greek code is. You may need to be root. To change it just for yourself, modify
~/.config/locale.conf
(or $XDG_CONFIG_HOME/locale.conf or $HOME/.config/locale.conf), whichever gets you to the locale.conf file; on most systems they all point to the same place.
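If it helps to confirm what your program actually ends up with, here is a minimal sketch that queries the locale the process picks up from those settings (nothing in it is from the original question, it's only an illustration):

#include <clocale>
#include <cstdio>

int main()
{
    const char *loc = std::setlocale(LC_ALL, "");   // adopt the LANG / locale.conf setting
    std::printf("active locale: %s\n", loc ? loc : "(unset or invalid)");
    return 0;
}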

Related

Reading input from file with Chinese Characters that got mangled

I'm getting stuck trying to convert an input string in a char* to a Chinese character encoding. An application accepts a Chinese string input, e.g. "啊说到", and when it is written into a file it turns into "°¡Ëµµ½". I'm able to take this input and feed it to _mbstowcs_s_l(), but the solution needs to be locale-independent, so I'm forced to use either mbstowcs() or WideCharToMultiByte(). It looks like both would only work for me if the input had already gone through an MBCS-to-UTF-8 conversion, which in our case it hasn't.
The project is using the Multi-byte Character Set, and I'm struggling to understand what is going on. One other thing: the input comes from a different application, which stores it into a file.
The application that accepted the Chinese input is an MFC app set to the Multi-byte Character Set, and the OS regional setting was Chinese Simplified. The UI accepts the input into a CString, which is copied to a char*. This is the part where I don't know what's going on with the encoding: that application stores the string into a file, then we read it with the other application into a char*, and that's when the characters appear as "°¡Ëµµ½".
The question is, how can I turn this encoded char* "°¡Ëµµ½" back into its Chinese form "啊说到", without setting the locale in _mbstowcs_s_l()? The problem is that we could be reading strings from other regional settings, and the application wouldn't know what character map to use unless we tell it.
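As a rough illustration of the kind of conversion being discussed (this is not from the thread; it assumes the bytes were produced on a system whose ANSI code page was 936, i.e. Chinese Simplified/GBK, and the function name is made up):

#include <windows.h>
#include <string>

std::wstring gbk_bytes_to_utf16(const char *bytes)
{
    // First call: ask how many UTF-16 code units are needed. A length of -1
    // means "treat the input as NUL-terminated and include the terminator".
    int len = MultiByteToWideChar(936, 0, bytes, -1, nullptr, 0);
    if (len <= 0)
        return std::wstring();

    std::wstring out(len, L'\0');
    MultiByteToWideChar(936, 0, bytes, -1, &out[0], len);
    out.resize(len - 1);   // drop the terminating NUL
    return out;
}

The point is that the code page has to come from somewhere external (stored alongside the data, for example), because the bytes alone cannot identify which encoding they are in.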

Convert \xc3\xd8\xe8\xa7\xc3\xb4\xd to human readable format

I am having trouble converting '\xc3\xd8\xe8\xa7\xc3\xb4\xd' (which is Thai text) to a readable format. I get this value from a smart card, and it basically worked on Windows but not on Linux.
If I print in my Python console, I get:
����ô
I tried to follow some Google hints, but I was unable to accomplish my goal.
Any suggestion is appreciated.
Your text does not seem to be Unicode text. Instead, it looks like it is in one of the Thai encodings. Hence, you must know the encoding before printing the text.
For example, if we assume your data is encoded in TIS-620 (and the last character is \xd2 instead of \xd) then it will be "รุ่งรดา".
To work with non-Unicode strings in Python, you may try myString.decode("tis-620") or even sys.setdefaultencoding("tis-620").
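Since the asker is on Linux and the rest of this page is C++, the same conversion can also be sketched with iconv(3); this is only an illustration, assuming the data really is TIS-620, and tis620_to_utf8 is a made-up name:

#include <iconv.h>
#include <string>
#include <vector>

std::string tis620_to_utf8(const std::string &in)
{
    iconv_t cd = iconv_open("UTF-8", "TIS-620");
    if (cd == (iconv_t)-1)
        return std::string();                    // conversion pair not supported

    std::vector<char> buf(in.size() * 4 + 4);    // UTF-8 output can be longer
    char *inptr    = const_cast<char *>(in.data());
    size_t inleft  = in.size();
    char *outptr   = buf.data();
    size_t outleft = buf.size();

    iconv(cd, &inptr, &inleft, &outptr, &outleft);
    iconv_close(cd);

    return std::string(buf.data(), buf.size() - outleft);
}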

Convert path to \\

Okay, after two days of searching the web and MSDN, I haven't found any real solution to this problem, so I'm going to ask here in the hope that I've overlooked something.
I have an open-file dialog, and after I get the location of the selected file, it gives me the string in the following form: C:\file.exe. For the next part of my program I need C:\\file.exe. Is there any Microsoft function that can solve this problem, or some workaround?
ofn.lpstrFile = fileName;
char fileNameStr[sizeof(fileName) + 1] = "";
if (GetOpenFileName(&ofn))
    strcpy(fileNameStr, fileName);
DeleteFile(fileName); // doesn't work, invalid path
I've posted only this part of the code, because everything else works fine and isn't relevant to this problem. Any assistance is greatly appreciated, as I've been going mad for the last two days.
You are confusing the requirement in C and C++ to escape backslash characters in string literals with what Windows requires.
Windows allows double backslashes in paths in only two circumstances:
Paths that begin with "\\?\"
Paths that refer to share names such as "\\myserver\foo"
Therefore, "C:\\file.exe" is never a valid path.
The problem here is that Microsoft made the (disastrous) decision decades ago to use backslashes as path separators rather than forward slashes like UNIX uses. That decision has been haunting Windows programmers since the early 1980s because C and C++ use the backslash as an escape character in string literals (and only in literals).
So in C or C++, if you type something like DeleteFile("c:\file.exe"), what DeleteFile will see is "c:" followed by an unprintable control character and then "ile.exe". That's because the compiler sees the backslash and interprets it to mean the next character isn't what it appears to be. In this case the next character is an f, and "\f" is the escape sequence for the form-feed control character (0x0C), which isn't valid in a file name.
So how do you create the path "c:\file.exe" in a C/C++ program? You have two choices:
"c:/file.exe"
"c:\\file.exe"
The first choice works because in the Win32 API (and only the API, not the command line), forward slashes are accepted as path separators. The second choice works because the first backslash tells the compiler to treat the next character specially; "\\" is the escape sequence for a single literal backslash, so the string actually contains the single backslash that Windows expects.
The library Boost.Filesystem "provides portable facilities to query and manipulate paths, files, and directories".
In short, you should not use raw strings as file or path names. Use boost::filesystem::path instead. You can still initialize it from a string or char*, and you can convert it back to std::string, but all manipulations and decorations will be done correctly by the class.
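A minimal sketch of what that looks like, assuming Boost is available (nothing here is from the original question):

#include <boost/filesystem.hpp>
#include <iostream>

int main()
{
    // The backslash is doubled only because this is a C++ string literal;
    // the path object itself holds a single backslash.
    boost::filesystem::path p("C:\\file.exe");

    std::cout << p.string() << "\n";       // prints C:\file.exe
    if (boost::filesystem::exists(p))
        boost::filesystem::remove(p);      // no extra escaping needed here
    return 0;
}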
I'm guessing you mean converting "C:\file.exe" to "C:\\file.exe":
std::string output_string;
for (auto character : input_string)
{
    if (character == '\\')
    {
        output_string.push_back(character);   // emit an extra backslash
    }
    output_string.push_back(character);
}
Please note it is actually looking for a single backslash to double up; the double backslash used in the code is just the escape sequence for one.

what locale does wstring support?

In my program I used wstring to print out the text I needed, but it gave me random characters (presumably due to a different encoding scheme). For example, I have this block of code:
wstring text;
text.append(L"Some text");
Then I use DirectX to render it on screen. I used to use wchar_t, but I heard it has portability problems, so I switched to wstring. wchar_t worked fine, but as far as I can tell it only took English characters (the printout just totally ignored any non-English character entered), which was fine, until I switched to wstring: now I only get random characters that look like Chinese and Korean mixed together.
Interestingly, my computer's locale for non-Unicode text is Chinese. Based on what I saw, I suspected it would render Chinese characters correctly, so I tried, and it does display them, but with a square in front (which is still incorrect). I then guessed the encoding might depend on the language locale, so I switched the locale to English (US) (I use Windows 8) and restarted; my Chinese test characters in the source file turned into random junk (the file is not saved in a Unicode format, since all the text is English). I then tried with English characters again, but no luck: the display looks exactly the same and seems to have nothing to do with the locale. I don't understand why it doesn't display correctly and looks like Asian characters (even with the English locale).
Is there some conversion that should be done, or should I save my file in a different encoding? The main thing is that I want English characters, which are the default, to display correctly.
In the absence of code that demonstrates your problem, I will give you a correspondingly general answer.
You are trying to display English characters but see Chinese characters. That is what happens when you pass 8-bit ANSI text to an API that expects UTF-16 text. Look for somewhere in your program where you cast from char* to wchar_t*.
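To illustrate why such a cast produces CJK-looking output, here is a tiny sketch (not the asker's code; it assumes a Windows build where wchar_t is a 16-bit UTF-16 code unit):

#include <cstdio>

int main()
{
    alignas(wchar_t) const char narrow[] = "ab";   // bytes 0x61 0x62

    // Wrong: the cast merely pairs the 8-bit bytes into 16-bit code units.
    // On a little-endian machine "ab" becomes the single code unit 0x6261,
    // which lies in the CJK Unified Ideographs block, hence the "Chinese"
    // looking text. A real conversion (mbstowcs, MultiByteToWideChar, ...)
    // is needed instead of a cast.
    const wchar_t *broken = reinterpret_cast<const wchar_t *>(narrow);
    std::printf("first code unit: 0x%04X\n", (unsigned)broken[0]);
    return 0;
}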
First of all, what type of file are you trying to store the text in? Normal .txt files are saved as ANSI by default (so does Excel). So when you try to print a Unicode character to an ANSI file, it will come out as junk. Two ways of overcoming this problem are:
open the file in UTF-8 or UTF-16 mode and then write
convert the Unicode text to ANSI before writing to the file. If you are using Windows, MSDN documents specific APIs for Unicode-to-ANSI conversion and vice versa. If you are using Linux, Google for Unicode-to-ANSI conversion; there are lots of solutions out there.
Hope this helps!
std::wstring does not have any locale/internationalisation support at all. It is just a container for storing sequences of wchar_t.
The problem with wchar_t is that its encoding is unspecified. It might be Unicode UTF-16, or Unicode UTF-32, or Shift-JIS, or something completely different. There is no way to tell from within a program.
You will have the best chances of getting things to work if you ensure that the encoding of your source code is the same as the encoding used by the locale under which the program will run.
But, the use of third-party libraries (like DirectX) can place additional constraints due to possible limitations in what encodings those libraries expect and support.
Bug solved: it turned out to be a CASTING problem (not a rendering problem, as previously stated).
The buggy text is an intermediate product of an internal conversion step using wstringstream (which I forgot to mention). The code is as follows:
wstringstream wss;
wstring text;
textToGenerate.append(L"some text");
wss << timer->getTime();
text.append(wss.str());
Right after this step the debugger shows the text as a bunch of random junk, but later it somehow converts back to something readable. The real problem appears at the rendering stage using DirectX: I had left in a cast to LPCWSTR, which results in the incorrect rendering.
old:
LPCWSTR lpcwstrText = (LPCWSTR)textToDraw->getText();
new:
LPCWSTR lpcwstrText = (*textToDraw->getText()).c_str();
Changing that solved the problem.
So, this was caused by a bad cast, as some kind people pointed out when correcting my statement.

Remove a sub string in a Korean string in C++

I have a Korean string: "태권소녀 1". Now I want to remove a substring, " 1" (a space and the '1' character). How can I do that in C++?
With an English string it works fine, but I cannot do it with Korean yet.
Thank you so much if you can give me some ideas.
thestring.erase(thestring.find(" 1"),2);
assuming it's there. This is not the code to use verbatim; it's a hint about what to look up in the documentation.
The problem you have is probably determining the size in bytes of a particular substring of characters. It depends on the encoding, but generally you may want to look at the family of functions with mb in their names (which stands for multibyte).
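A minimal sketch of the hint above, assuming the string is held as UTF-8 bytes in a std::string (remove_suffix_1 is a made-up name). The substring " 1" happens to be pure ASCII, so its byte length (2) equals its character count; for a non-ASCII substring you would erase by the byte length of its UTF-8 encoding instead:

#include <string>

std::string remove_suffix_1(std::string s)
{
    const std::string needle = " 1";
    std::string::size_type pos = s.find(needle);   // byte-wise search
    if (pos != std::string::npos)
        s.erase(pos, needle.size());               // erase by byte count
    return s;
}

// remove_suffix_1("태권소녀 1") == "태권소녀"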