In my C++17 project, I have a std::string which is known to contain UTF-8 encoded data. Is there any way to force fmt to treat its data as UTF-8 such that this works as expected?
fmt::print("{:-^11}", "あいう");
// should print "----あいう----", currently prints "-あいう-"
UTF-8 handling in {fmt} was recently improved and your example now works with the master branch:
#include <fmt/core.h>
int main() {
fmt::print("{:-^11}", "あいう");
}
prints
----あいう----
Pass field width as the next argument and calculate it yourself:
#include <fmt/format.h>
#include <cstring>
int main() {
fmt::print("{:-^{}}", "あいう", 8 + std::strlen("あいう"));
}
I am integrating a library that requires a little-endian length prefix. A message consists of the little-endian length followed by a custom serialized object. How do I convert a 4-byte char array into an int? The little-endian value tells me the size of the serialized object to read.
So if I receive "\x00\x00\x00H\x00", I would like to be able to get the decimal value out.
my library looks like
char buffer_size[size_desc]
m_socket->receive(boost::asio::buffer(buffer, size_desc));
int converted_int = some_function(buffer); <-- not sure what to do here
char buffer_obj[converted_int];
m_socket->receive(boost::asio::buffer(buffer, size_desc));
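For a simple solution, you could use a couple of tricks.
Reverse the bytes and reinterpret them as an integer:
#include <algorithm>
#include <cstdint>
#include <cstring>
#include <iostream>
int main()
{
    char buff[4] = {3, 2, 1, 0};
    std::int32_t value;
    // memcpy avoids the alignment and strict-aliasing problems of
    // *reinterpret_cast<int*>(buff); the result depends on host byte order.
    std::memcpy(&value, buff, sizeof value);
    std::cout << value << "\n";
    // Reverse the byte order and read the value again.
    std::reverse(buff, buff + 4);
    std::memcpy(&value, buff, sizeof value);
    std::cout << value << "\n";
    return 0;
}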
Boost also comes with an endianness library:
https://www.boost.org/doc/libs/1_74_0/libs/endian/doc/html/endian.html#buffers
You can use the built in types, like:
big_int32_t
little_int16_t
I want to create a file with an Arabic name in C++. Below is the program that I tried.
#include <iostream>
#include <fstream>
#include <string>
int main() {
const char *path="D:\\user\\c++\\files\\فثسف.txt";
std::ofstream file(path); //open in constructor
std::string data("Hello World");
file << data;
return 0;
}
But the file gets created with junk characters: ÙثسÙ.txt.
I am using Windows and the g++ compiler.
The default encoding for string literals can be specified with the -fexec-charset compiler option for gcc / g++.
In C++11 and later, you can also use the u8, u, and U prefixes on string literals to specify UTF-8, UTF-16, and UTF-32 encodings:
const char * utf8literal = u8"This is an unicode UTF8 string! 剝Ц";
const char16_t * utf16literal = u"This is an unicode UTF16 string! 剝Ц";
const char32_t * utf32literal = U"This is an unicode UTF32 string! 剝Ц";
Using the above prefixes can upset functions that aren't expecting these specific string types, though; in general it may be better to set the compiler option.
There's a great writeup about this topic on this blog post: http://cppwhispers.blogspot.com/2012/11/unicode-and-your-application-3-of-n.html
I hope this helps. :)
Use UTF8:
#include <iostream>
#include <fstream>
#include <filesystem>
int main()
{
namespace fs = std::filesystem;
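fs::path path = fs::u8path(u8"فثسف.txt"); // C++17: u8path treats the bytes as UTF-8; in C++20, pass the u8 literal to fs::path directly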
std::ofstream file { path };
file << "Hello World";
return 0;
}
Using the <filesystem> library may require additional compiler or linker options: GCC before 9.1 requires linking with -lstdc++fs, and LLVM libc++ before 9.0 requires linking with -lc++fs.
Here is a small program that reads a line from a UTF-8 file:
#include <iostream>
#include <io.h>
#include <fcntl.h>
#include <locale>
#include <fstream>
#include <codecvt>
int main()
{
_setmode(_fileno(stdout), _O_U8TEXT);
auto inputFileStream = std::wifstream("input.txt");
const auto utf8Locale = std::locale(std::locale(), new std::codecvt_utf8<wchar_t>());
inputFileStream.imbue(utf8Locale);
std::wstring line;
std::getline(inputFileStream, line);
std::wcout << line << std::endl;
inputFileStream.close();
return 0;
}
When I build it with the Visual C++ compiler in Visual Studio, I get the following result:
test τεστ тест
as expected.
But when I use MinGW with the GCC compiler, I get
琀攀猀琀 쐃딃쌃쐃 䈄㔄䄄䈄
As you can see, that is not the expected result.
Is there a simple way to fix the output for GCC so that it prints the expected string?
OR
Is there a simple way to use UTF-8 with both MSVC and GCC?
Answer (thanks to Igor Tandetnik and Remy Lebeau):
It seems we must specify the endianness explicitly, because MSVC and GCC have different defaults. So
new std::codecvt_utf8<wchar_t, 0x10ffff, std::little_endian>()
should be used.
Fixed code:
#include <iostream>
#include <io.h>
#include <fcntl.h>
#include <locale>
#include <fstream>
#include <codecvt>
int main()
{
_setmode(_fileno(stdout), _O_U8TEXT);
auto inputFileStream = std::wifstream("input.txt");
const auto utf8Locale = std::locale(std::locale(), new std::codecvt_utf8<wchar_t, 0x10ffff, std::little_endian>());
inputFileStream.imbue(utf8Locale);
std::wstring line;
std::getline(inputFileStream, line);
std::wcout << line << std::endl;
inputFileStream.close();
return 0;
}
For your second question, one option is to limit the use of UTF-16 and the std::w-prefixed facilities to the cases where you need to exchange UTF-16-encoded strings with the operating system. This happens when you receive arguments in wmain, open a file with _wfopen, call a Windows API function, etc. Otherwise, you would store, read from the user, and return to the user UTF-8 strings using the char type (char*, std::string, etc.). Conversion between UTF-8 and UTF-16 can be done with MultiByteToWideChar and WideCharToMultiByte, bypassing the clumsy C++ encoding API.
The place where this does not work well is console input/output. Overall, you can output UTF-8 to the console if the user sets chcp 65001 and a TrueType font. At least in Windows 7, you will also have to make sure not to split a character between two write calls, otherwise it will not print correctly. This also implies that you cannot use std::cout, because msvcrt will call putc for every byte separately, and you'll need to use puts, fprintf, etc. instead. I heard that this was fixed in Windows 10, but cannot confirm.
Reading UTF-8 from the console with the file API does not work as far as I know; if you want that, you'd need to detect that stdin is attached to a console and use the console API instead.
I am iterating through all the files in a folder and just want their names in a string. I want to get a string from a std::filesystem::path. How do I do that?
My code:
#include <string>
#include <iostream>
#include <filesystem>
namespace fs = std::experimental::filesystem;
int main()
{
std::string path = "C:/Users/user1/Desktop";
for (auto & p : fs::directory_iterator(path))
std::string fileName = p.path;
}
However I get the following error:
non-standard syntax; use '&' to create a pointer to a member.
To convert a std::filesystem::path to a natively-encoded string (whose type is std::filesystem::path::value_type), use the string() method. Note the other *string() methods, which enable you to obtain strings of a specific encoding (e.g. u8string() for a UTF-8 string).
C++17 example:
#include <filesystem>
#include <string>
namespace fs = std::filesystem;
int main()
{
fs::path path{fs::u8path(u8"愛.txt")};
std::string path_string{path.u8string()};
}
C++20 example (better language and library UTF-8 support):
#include <filesystem>
#include <string>
namespace fs = std::filesystem;
int main()
{
fs::path path{u8"愛.txt"};
std::u8string path_string{path.u8string()};
}
The examples given in the accepted answer, using the UTF-8 operations, are fine and a good guideline. There is just one error in the introductory explanation given in the answer, which Windows/MSVC developers should be aware of:
The string() method does not return the natively encoded string (which would be a std::wstring on Windows); it always returns a std::string. It also tries to convert the path to the local encoding, which is not always possible: if the path contains a Unicode character that is not representable in the current code page, the method throws an exception!
If you actually want the behavior that is described in the answer (the method returns the native string, i.e., std::string on Linux and std::wstring on Windows), you would have to use the native() method or the implicit conversion based on std::filesystem::path::operator string_type(), but as @tambre correctly pointed out in the examples, you should consider using the UTF-8 versions throughout.
In C++17 and above, you can use .generic_string() to convert the path to a string in the generic format, with / as the directory separator: https://en.cppreference.com/w/cpp/filesystem/path/generic_string.
The following is an example that gets the current working directory and converts it into a string.
#include <string>
#include <filesystem>
using std::filesystem::current_path;
int main()
{
    // The types must be qualified: only current_path was brought into scope above.
    std::filesystem::path directoryPath = current_path();
    std::string stringpath = directoryPath.generic_string();
}
I need to print a csv file with numbers.
When the file is written, the numbers have dots as the decimal separator, but I need commas.
Here is an example.
If I print the number to the terminal using the locale method, I get a comma, but in the file the same number has a dot. I do not understand why.
What should I do?
#include <iostream>
#include <locale>
#include <string> // std::string, std::to_string
#include <fstream>
using namespace std;
int main()
{
double x = 2.87;
std::setlocale(LC_NUMERIC, "de_DE");
std::cout.imbue(std::locale(""));
std::cout << x << std::endl;
ofstream outputfile ("out.csv");
if (outputfile.is_open())
{
outputfile <<to_string(x)<<"\n\n";
}
return 0;
}
Thanks in advance.
Locales are system-specific. You probably just made a typo; try "de-DE", which will probably work (at least it does on my Windows).
However, if your program is not inherently German-centric, then abusing the German locale just for the side effect of getting a specific decimal point character is bad programming style, I think.
Here is an alternative solution using std::numpunct::do_decimal_point:
#include <string>
#include <fstream>
#include <locale>
struct Comma final : std::numpunct<char>
{
char do_decimal_point() const override { return ','; }
};
int main()
{
std::ofstream os("out.csv");
os.imbue(std::locale(std::locale::classic(), new Comma));
double d = 2.87;
os << d << '\n'; // prints 2,87 into the file
}
This code specifically states that it just wants the standard C++ formatting with only the decimal point character replaced with ','. It makes no reference to specific countries or languages, or system-dependent properties.
Your issue is that std::to_string() uses the C locale libraries. It appears that "de_DE" is not a valid locale on your machine (or Coliru for that matter), leading to the default C locale being used, with . as the decimal point. The solution is to use "de_DE.UTF-8". As an aside, using "" for std::locale will not always produce commas; instead, it will depend on the locale set for your machine.