How to store unicode character in wstring on linux? - c++

#include <iostream>
using namespace std;
int main() {
std::wstring str = L"\u00A2";
std::wcout << str;
return 0;
}
Why doesn't this work, and how can I fix it?

It doesn't work because in the default C locale, there is no character which corresponds to U+00A2.
If you're using a standard Ubuntu install, your user locale most likely uses a larger character set than US-ASCII, quite possibly Unicode encoded as UTF-8. So you just need to switch to the locale specified in the environment, as follows:
/* clocale is needed for std::setlocale */
#include <clocale>
#include <iostream>
#include <string>
int main() {
/* An empty locale name selects the user's locale from the
* environment (LC_ALL, LC_CTYPE, LANG, ...).
*/
std::setlocale (LC_ALL, "");
std::wstring str = L"\u00A2";
std::wcout << str;
return 0;
}
If you use std::string instead of std::wstring and std::cout instead of std::wcout, then you don't need the setlocale because no translation is needed (provided the console expects UTF-8).
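A minimal sketch of that std::string / std::cout variant follows. The names kCentUtf8 and print_cent are made up for illustration, and the bytes are written as hex escapes on the assumption that you want the result to be independent of the source-file encoding. Whether the cent sign actually shows up still depends on the terminal expecting UTF-8.

```cpp
#include <iostream>

// U+00A2 (cent sign) spelled out as its two UTF-8 bytes, so the literal does
// not depend on the source or execution character set. (Hypothetical name.)
constexpr char kCentUtf8[] = "\xC2\xA2";

// Writes the bytes unchanged: no setlocale call and no conversion is involved.
inline void print_cent(std::ostream& out) {
    out << kCentUtf8 << '\n';
}
```

Calling print_cent(std::cout) from main sends the two bytes straight to the terminal.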

Related

How to use utf-8 and/or utf-32 in c++ without wchar without stuff like ICU (gcc/cl compilers)? - c++

I have tried many things but they don't seem to work.
#include <iostream>
#include <locale>
using namespace std;
int main() {
setlocale(LC_ALL, "en-US.utf8");
std::string uni = u8"ユチエフー8";
std::cout << uni;
std::cin.get();
return 0;
}
Please note that I want utf-8/utf-32 to work not in just this situation but most of the situations without using things like wide characters.

Create file in arabic name in c++

I want to create a file with an Arabic name in C++. Below is the program that I tried.
#include <iostream>
#include <fstream>
#include <string>
int main() {
const char *path="D:\\user\\c++\\files\\فثسف.txt";
std::ofstream file(path); //open in constructor
std::string data("Hello World");
file << data;
return 0;
}
But the file is created with junk characters in its name: ÙثسÙ.txt.
I am using Windows and the g++ compiler.
The default encoding for string literals can be specified with the -fexec-charset compiler option for gcc / g++.
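In C++11 and later, you can also use the u8, u, and U string prefixes to request UTF-8, UTF-16, and UTF-32 encodings (since C++20 a u8 literal has type const char8_t[], so the first line below compiles only as C++11 through C++17):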
const char * utf8literal = u8"This is an unicode UTF8 string! 剝Ц";
const char16_t * utf16literal = u"This is an unicode UTF16 string! 剝Ц";
const char32_t * utf32literal = U"This is an unicode UTF32 string! 剝Ц";
These prefixes can upset functions that aren't expecting these specific string types, though; in general it may be better to set the compiler option.
There's a great writeup about this topic on this blog post: http://cppwhispers.blogspot.com/2012/11/unicode-and-your-application-3-of-n.html
I hope this helps. :)
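To make the difference between the three prefixes concrete, here is a small sketch that counts the code units each literal occupies. The helper code_units and the constant names are invented for this example, and the characters are written as universal character names so the result does not depend on the source-file encoding.

```cpp
#include <cstddef>

// Number of code units in a string literal, excluding the terminating null.
// Works for char, char8_t (C++20), char16_t and char32_t literals alike.
template <typename CharT, std::size_t N>
constexpr std::size_t code_units(const CharT (&)[N]) {
    return N - 1;
}

// U+00A2 CENT SIGN: two UTF-8 bytes, one UTF-16 unit, one UTF-32 unit.
constexpr std::size_t kCentUtf8Units  = code_units(u8"\u00A2");
constexpr std::size_t kCentUtf16Units = code_units(u"\u00A2");
constexpr std::size_t kCentUtf32Units = code_units(U"\u00A2");

// U+1F600 lies outside the Basic Multilingual Plane, so UTF-16 needs a
// surrogate pair and UTF-8 needs four bytes.
constexpr std::size_t kFaceUtf8Units  = code_units(u8"\U0001F600");
constexpr std::size_t kFaceUtf16Units = code_units(u"\U0001F600");
constexpr std::size_t kFaceUtf32Units = code_units(U"\U0001F600");
```

The counts are fixed by the prefix, which is why prefixed literals keep their encoding whatever -fexec-charset is set to.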
Use UTF8:
#include <iostream>
#include <fstream>
#include <filesystem>
int main()
{
namespace fs = std::filesystem;
fs::path path { u8"فثسف.txt" };
std::ofstream file { path };
file << "Hello World";
return 0;
}
Using the <filesystem> library may require additional linker options with older toolchains: GCC releases before 9 require linking with -lstdc++fs, and LLVM libc++ releases before 9 require -lc++fs. Later releases need no extra library.

Is it possible to unify std::wstring behavior in MSVC and GCC?

Here is a little code that reads a line from a UTF-8 file:
#include <iostream>
#include <io.h>
#include <fcntl.h>
#include <locale>
#include <fstream>
#include <codecvt>
int main()
{
_setmode(_fileno(stdout), _O_U8TEXT);
auto inputFileStream = std::wifstream("input.txt");
const auto utf8Locale = std::locale(std::locale(), new std::codecvt_utf8<wchar_t>());
inputFileStream.imbue(utf8Locale);
std::wstring line;
std::getline(inputFileStream, line);
std::wcout << line << std::endl;
inputFileStream.close();
return 0;
}
When I build it with the Visual Studio Visual C++ compiler, I get the following result:
test τεστ тест
as expected.
But when I use MinGW with the GCC compiler, I get
琀攀猀琀 쐃딃쌃쐃 䈄㔄䄄䈄
As you understand, it's not the expected result.
Is there a simple way to fix the output for GCC so that it matches the expected string?
OR
Is there a simple way to use UTF-8 with both MSVC and GCC?
Answer (thanks to Igor Tandetnik and Remy Lebeau):
It seems we must specify the endianness explicitly, because MSVC and GCC have different defaults. So
new std::codecvt_utf8<wchar_t, 0x10ffff, std::little_endian>()
should be used.
Fixed code:
#include <iostream>
#include <io.h>
#include <fcntl.h>
#include <locale>
#include <fstream>
#include <codecvt>
int main()
{
_setmode(_fileno(stdout), _O_U8TEXT);
auto inputFileStream = std::wifstream("input.txt");
const auto utf8Locale = std::locale(std::locale(), new std::codecvt_utf8<wchar_t, 0x10ffff, std::little_endian>());
inputFileStream.imbue(utf8Locale);
std::wstring line;
std::getline(inputFileStream, line);
std::wcout << line << std::endl;
inputFileStream.close();
return 0;
}
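The garbled MinGW output is consistent with each 16-bit code unit having its two bytes swapped: 't' is U+0074, and the first garbled character 琀 is U+7400. The sketch below only illustrates that observation; swap_bytes is a made-up helper, not part of the codecvt API.

```cpp
#include <cstdint>

// Swaps the two bytes of a 16-bit code unit, i.e. reinterprets a big-endian
// unit as little-endian or the other way around. (Hypothetical helper.)
constexpr std::uint16_t swap_bytes(std::uint16_t unit) {
    return static_cast<std::uint16_t>((unit << 8) | (unit >> 8));
}

// 't' (U+0074) read with the wrong byte order becomes U+7400.
constexpr std::uint16_t kSwappedT = swap_bytes(0x0074);
```

Swapping twice gives back the original unit, so a mismatch of this kind loses no information and can be undone by choosing the right endianness mode.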
For your second question, one option is to limit UTF-16 and the std::w-prefixed types to the places where you must exchange UTF-16 strings with the operating system: receiving arguments in wmain, opening a file with _wfopen, calling a Windows API function, and so on. Everywhere else, store, read and return UTF-8 strings using the char type (char*, std::string, etc.). Conversion between UTF-8 and UTF-16 can be done with MultiByteToWideChar and WideCharToMultiByte, bypassing the awkward C++ encoding API.
The place where this does not work well is console input/output. You can output UTF-8 to the console if the user runs chcp 65001 and selects a TrueType font. At least on Windows 7, you also have to make sure not to split a character between two write calls, otherwise it will not print correctly. This implies you cannot use std::cout, because msvcrt calls putc for every byte separately; use puts, fprintf, etc. instead. I have heard that this was fixed in Windows 10, but cannot confirm it. Reading UTF-8 from the console through the file API does not work as far as I know; if you want that, you need to detect that stdin is attached to a console and use the console API instead.
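If you buffer output yourself, one way to avoid splitting a character between two write calls is to cut each chunk at a UTF-8 sequence boundary. The following is a portable sketch of that idea, assuming C++17 and otherwise well-formed input; sequence_length and complete_prefix are names invented here, not Windows or CRT functions.

```cpp
#include <cstddef>
#include <string_view>

// Expected length of a UTF-8 sequence, judged from its lead byte.
constexpr std::size_t sequence_length(unsigned char lead) {
    return lead < 0x80            ? 1
         : (lead & 0xE0) == 0xC0  ? 2
         : (lead & 0xF0) == 0xE0  ? 3
         : (lead & 0xF8) == 0xF0  ? 4
         : 1;  // invalid lead byte: treat it as a single unit
}

// Length of the longest prefix of `bytes` that does not end in the middle of
// a UTF-8 sequence. Bytes past that length belong to the next write call.
constexpr std::size_t complete_prefix(std::string_view bytes) {
    std::size_t pos = bytes.size();
    // Look at no more than the last four bytes, searching for a lead byte.
    for (int steps = 0; pos > 0 && steps < 4; ++steps) {
        --pos;
        const unsigned char b = static_cast<unsigned char>(bytes[pos]);
        if ((b & 0xC0) != 0x80) {  // not a continuation byte (10xxxxxx)
            return pos + sequence_length(b) <= bytes.size() ? bytes.size() : pos;
        }
    }
    return bytes.size();  // no lead byte found: leave the input untouched
}
```

A caller would write the first complete_prefix(chunk) bytes in one call and carry the remainder over to the front of the next chunk.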

FindFirstFile() shows an address

I used the function FindFirstFile(), but I get only a memory address instead of a file name.
#include <stdafx.h>
#include <iostream>
#include <windows.h>
using namespace std;
int main()
{
WIN32_FIND_DATA File_Data;
FindFirstFile(TEXT("C:\\Users\\user\\Desktop\\temp\\*.tmp"), &File_Data);
cout<<File_Data.cFileName;
cin.get();
return 0;
}
Can anybody help me?
You are probably compiling with the Unicode character set, which means that the Windows API names resolve to their wide-character versions (wchar_t rather than char). cFileName is then a wchar_t array, and cout prints it as a pointer. Try the wide output stream (wcout) instead:
wcout<<File_Data.cFileName;

clang: converting const char16_t* (UTF-16) to wstring (UCS-4)

I'm trying to convert UTF-16 encoded strings to UCS-4
If I understand correctly, C++11 provides this conversion through codecvt_utf16.
My code is something like:
#include <iostream>
#include <locale>
#include <memory>
#include <codecvt>
#include <string>
using namespace std;
int main()
{
u16string s;
s.push_back('h');
s.push_back('e');
s.push_back('l');
s.push_back('l');
s.push_back('o');
wstring_convert<codecvt_utf16<wchar_t>, wchar_t> conv;
wstring ws = conv.from_bytes(reinterpret_cast<const char*> (s.c_str()));
wcout << ws << endl;
return 0;
}
Note: the explicit push_back calls work around the fact that my version of clang (Xcode 4.2) doesn't support Unicode string literals.
When the code runs, it terminates with an exception. Am I doing something illegal here? I thought it should work because the const char* that I pass to wstring_convert is UTF-16 encoded, right? I also considered endianness as the cause, but I have checked that it's not the case.
Two errors:
1) The from_bytes() overload that takes a single const char* expects a null-terminated byte string, but the second byte of your buffer is already '\0'.
2) Your system is likely little-endian, so you need to convert from UTF-16LE to UCS-4:
#include <iostream>
#include <locale>
#include <memory>
#include <codecvt>
#include <string>
using namespace std;
int main()
{
u16string s;
s.push_back('h');
s.push_back('e');
s.push_back('l');
s.push_back('l');
s.push_back('o');
wstring_convert<codecvt_utf16<wchar_t, 0x10ffff, little_endian>,
wchar_t> conv;
wstring ws = conv.from_bytes(
reinterpret_cast<const char*> (&s[0]),
reinterpret_cast<const char*> (&s[0] + s.size()));
wcout << ws << endl;
return 0;
}
Tested with Visual Studio 2010 SP1 on Windows and clang++/libc++-svn on Linux.
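For readers who want to see what the UTF-16 to UCS-4 conversion actually does, here is a hand-rolled sketch that decodes surrogate pairs directly instead of going through wstring_convert (which is deprecated since C++17). Decoded, decode_at and utf16_to_utf32 are names invented for this example, and replacing unpaired surrogates with U+FFFD is an assumption of this sketch, not something the answer above prescribes.

```cpp
#include <cstddef>
#include <string>

struct Decoded {
    char32_t code_point;
    std::size_t units;  // UTF-16 code units consumed: 1 or 2
};

constexpr bool is_high_surrogate(char16_t u) { return u >= 0xD800 && u <= 0xDBFF; }
constexpr bool is_low_surrogate(char16_t u)  { return u >= 0xDC00 && u <= 0xDFFF; }

// Decodes the code point that starts at index i of a UTF-16 buffer.
constexpr Decoded decode_at(const char16_t* s, std::size_t size, std::size_t i) {
    if (is_high_surrogate(s[i]) && i + 1 < size && is_low_surrogate(s[i + 1])) {
        // Combine the two 10-bit halves and add the 0x10000 offset.
        return Decoded{static_cast<char32_t>(
                           0x10000 + ((static_cast<char32_t>(s[i]) - 0xD800) << 10) +
                           (static_cast<char32_t>(s[i + 1]) - 0xDC00)),
                       2};
    }
    if (is_high_surrogate(s[i]) || is_low_surrogate(s[i])) {
        return Decoded{0xFFFD, 1};  // unpaired surrogate: replacement character
    }
    return Decoded{s[i], 1};
}

// Converts a whole UTF-16 string to UTF-32, one code point at a time.
inline std::u32string utf16_to_utf32(const std::u16string& in) {
    std::u32string out;
    for (std::size_t i = 0; i < in.size();) {
        const Decoded d = decode_at(in.data(), in.size(), i);
        out.push_back(d.code_point);
        i += d.units;
    }
    return out;
}
```

Because this works on char16_t values rather than raw bytes, the endianness question from the answer above does not arise here; it only matters when the UTF-16 data arrives as a byte stream.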