What is the purpose of imbue in C++? - c++

I'm working with some code today, and I saw:
extern std::locale g_classicLocale;
class StringStream : public virtual std::ostringstream
{
public:
StringStream() { imbue(g_classicLocale); }
virtual ~StringStream() {};
};
That is where I came across imbue. What is the purpose of the imbue function in C++? What does it do? Are there any potential problems in using imbue (thread safety, memory allocation)?

imbue is inherited by std::ostringstream from std::basic_ios (whose version calls std::ios_base::imbue and also imbues the stream buffer), and it sets the locale of the stream to the specified locale.
This affects the way the stream prints (and reads) certain things; for instance, setting a French locale will cause the decimal point . to be replaced by ,.

C++ streams perform their conversions to and from (numeric) types according to a locale, which is an object that summarizes all the localization information needed (decimal separator, date format, ...).
By default a stream uses the global locale that is current when the stream is constructed, but you can give a stream its own locale with the imbue function, which is what your code does here. I suppose it's setting the default C locale to produce text that does not depend on the current locale (this is useful e.g. for serialization purposes).
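To make the difference concrete, here is a minimal sketch. The names comma_decimal, format_with_global and format_classic are made up for illustration, and the custom facet is only a stand-in for a locale such as French, so the example does not depend on which locales are installed.

```cpp
#include <locale>
#include <sstream>
#include <string>

// Stand-in for a locale such as French: only the decimal point is changed.
struct comma_decimal : std::numpunct<char> {
protected:
    char do_decimal_point() const override { return ','; }
};

// Uses whatever the global locale is when the stream is constructed.
std::string format_with_global(double value) {
    std::ostringstream out;
    out << value;
    return out.str();
}

// Imbues the classic "C" locale, so the result ignores the global locale.
std::string format_classic(double value) {
    std::ostringstream out;
    out.imbue(std::locale::classic());
    out << value;
    return out.str();
}
```

If the global locale is replaced with std::locale(std::locale::classic(), new comma_decimal), format_with_global(1.5) should give 1,5 while format_classic(1.5) still gives 1.5.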

Related

C++17 to_string() converts floats with comma

This is happening inside a big project, so I cannot really post a minimal reproducible example, but I'll try asking anyway. I'm building a set of benchmark applications integrated with a framework we're working on, and in one of them the conversion we need (float -> string) with to_string produces a comma as the decimal separator.
| Monitored values:
| [ my_time_monitor.average = 61720,000000 ]
This is the function responsible:
std::string operating_point_parser::operator()(const int32_t num_threads, const float exec_time_ms) const {
return "{\"compute\":[{\"knobs\":{\"num_threads\":" + std::to_string(num_threads) + "},\"metrics\":{\"exec_time_ms\":[" + std::to_string(exec_time_ms) + ",0]}}]}";
}
Since as I said the same exact function is being called by other applications which don't show this unexpected behavior, my guess is that some internal compilation flags are messing around.
set( CMAKE_CXX_FLAGS "${CMAKE_CXX_FLAGS} -std=c++17 -DENABLE_THREADS -DHAVE_RANDOM -DHAVE_UNISTD_H -DHAVE_SYS_FILE_H -DHAVE_SYS_MMAN_H -DHAVE_CONFIG_H -DVIPSDATASET_PATH=\"\\\"${CMAKE_CURRENT_SOURCE_DIR}/dataset/orion_18000x18000.v\\\"\"" )
set( CMAKE_C_FLAGS "${CMAKE_C_FLAGS} -DHAVE_RANDOM -DHAVE_UNISTD_H -DIM_PREFIX=\"\\\"${CMAKE_CURRENT_SOURCE_DIR}/dataset\\\"\" -DIM_EXEEXT=\"\\\"\\\"\" -DIM_LIBDIR=\"\\\"${CMAKE_INSTALL_PREFIX}/lib\\\"\" -DGETTEXT_PACKAGE=\"\\\"vips7\"\\\" -DHAVE_SYS_FILE_H -DHAVE_SYS_MMAN_H -DHAVE_CONFIG_H")
If you want to take a look at the full application code here's the link. The operating_point_parser::operator() is called inside margot::compute::push_custom_monitor_values().
As stated in the Notes of https://en.cppreference.com/w/cpp/string/basic_string/to_string, to_string relies on the current locale for formatting purposes:
std::to_string relies on the current locale for formatting purposes,
and therefore concurrent calls to std::to_string from multiple threads
may result in partial serialization of calls. C++17 provides
std::to_chars as a higher-performance locale-independent alternative.
So, if you want to have dots instead of commas, you have to adjust the current locale.
Or, instead of changing the global locale with std::locale::global(...), you could use a stringstream and imbue() the locale on that stream only, for example:
std::ostringstream ss;
ss.imbue(std::locale::classic()); // or whichever locale you want
ss << exec_time_ms; // write what you need
std::string s = ss.str(); // get the formatted string
std::to_string uses the currently active locale for formatting.
You can set the active locale from a C locale name like this:
const char* locale = "C";
std::locale::global(std::locale(locale));
Meaning of locale name is specified in the C standard (quote from C11 draft):
7.11.1.1 The setlocale function
A value of "C" for locale specifies the minimal environment for C translation; a value of "" for locale specifies the locale-specific native environment. Other implementation-defined strings may be passed as the second argument to setlocale.
That value is going to be formatted inside a JSON string
In this case, and more generally whenever you wish to format in a style that shouldn't depend on the global locale, you should avoid std::to_string.
What would you recommend since you advised on avoiding it?
Anything that doesn't use locale, or lets you specify the locale to use instead of using the global locale. For example:
std::format("{}", 0.42); // doesn't use locale
std::format(std::locale("C"), "{}", 0.42); // use specific locale
Another example is a stringstream with imbued locale as suggested in the other answer.
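The std::to_chars alternative mentioned in the cppreference note could look roughly like this. It is only a sketch: to_string_no_locale is a made-up name, the fixed notation with six decimals is chosen to mimic what std::to_string prints, and floating-point std::to_chars needs a reasonably recent standard library.

```cpp
#include <charconv>
#include <string>
#include <system_error>

// Hypothetical helper: formats like "%.6f" without consulting any locale.
std::string to_string_no_locale(double value) {
    // A double in fixed notation needs at most a few hundred characters.
    char buffer[512];
    auto result = std::to_chars(buffer, buffer + sizeof buffer, value,
                                std::chars_format::fixed, 6);
    if (result.ec != std::errc()) {
        return {}; // conversion failed (buffer too small)
    }
    return std::string(buffer, result.ptr);
}
```

Calling it with 61720.0f should yield 61720.000000 regardless of the global locale, which is what the JSON string in the question needs.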

how to detect and handle unsupported locales in algorithms?

I have a function with the following signature:
template <typename Container>
void write_cards_as_xml(const Container& cards, std::ostream& os);
Internally it calls:
boost::property_tree::ptree root;
...
boost::property_tree::write_xml(os, root);
The write_xml function does not know anything about encodings. By default it assumes UTF-8 but does not do any conversions; that is up to the locale of os. I'm not sure how to handle unsupported non-UTF-8 locales. Can I detect if it is not UTF-8? Should I throw if not? Should I temporarily replace the locale with my preferred encoding? I'm using Boost.Locale.
The Standard library has no platform independent way to detect if a locale is UTF-8. There's only a name method which returns a platform dependent name. Even if it is a POSIX name there's no guarantee that the encoding is part of the locale's name.
Boost.Locale offers an additional facet called boost::locale::info holding detailed information about the current locale.
https://www.boost.org/doc/libs/1_70_0/libs/locale/doc/html/locale_information.html
You can obtain the info like this:
std::use_facet<boost::locale::info>(some_locale).utf8()
If there is no info facet, std::use_facet throws std::bad_cast. In this case it's not a Boost locale and you're out of luck. Throwing is a reasonable behavior in this case. You could catch the bad_cast and throw a more informative exception instead. If there's an info facet, you can inspect the return value of utf8(). If it returns false, the current locale is not compatible and you should throw, too. Otherwise your algorithm can run without problems.
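The control flow described above can be sketched with the standard library alone. Note the assumption: encoding_info below is a hypothetical facet standing in for boost::locale::info so that the example builds without Boost, and require_utf8 is a made-up name. With Boost.Locale you would query boost::locale::info instead.

```cpp
#include <locale>
#include <stdexcept>
#include <typeinfo>

// Hypothetical facet standing in for boost::locale::info.
class encoding_info : public std::locale::facet {
public:
    static std::locale::id id; // every facet type needs its own id
    explicit encoding_info(bool is_utf8) : utf8_(is_utf8) {}
    bool utf8() const { return utf8_; }
private:
    bool utf8_;
};
std::locale::id encoding_info::id;

// Throws unless the locale positively reports UTF-8.
void require_utf8(const std::locale& loc) {
    bool is_utf8 = false;
    try {
        is_utf8 = std::use_facet<encoding_info>(loc).utf8();
    } catch (const std::bad_cast&) {
        // The facet is missing, so the encoding cannot be determined.
        throw std::runtime_error("locale carries no encoding information");
    }
    if (!is_utf8) {
        throw std::runtime_error("locale encoding is not UTF-8");
    }
}
```

A function like write_cards_as_xml could call require_utf8(os.getloc()) before writing, so that an unsupported locale fails early with a readable message.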

Change narrow string encoding or missing std::filesystem::path::imbue

I'm on Windows and I'm constructing std::filesystem::path from std::string. According to constructor reference (emphasis mine):
If the source character type is char, the encoding of the source is assumed to be the native narrow encoding (so no conversion takes place on POSIX systems)
If I understand correctly, this means string content will be treated as encoded in ANSI under Windows. To treat it as encoded in UTF-8, I need to use std::filesystem::u8path() function. See the demo: http://rextester.com/PXRH65151
I want constructor of path to treat contents of narrow string as UTF-8 encoded. For boost::filesystem::path I could use imbue() method to do this:
boost::filesystem::path::imbue(std::locale(std::locale(), new std::codecvt_utf8_utf16<wchar_t>()));
However, I do not see such method in std::filesystem::path. Is there a way to achieve this behavior for std::filesystem::path? Or do I need to spit u8path all over the place?
My solution to this problem is to fully alias std::filesystem in a different namespace named std::u8filesystem, with classes and methods that treat std::string as UTF-8 encoded. Each class inherits from its counterpart of the same name in std::filesystem, without adding any fields or virtual methods, to offer full API/ABI interoperability. Full proof of concept code here, tested only on Windows so far and far from complete. The following snippet shows the core of the helper:
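std::wstring U8ToW(const std::string &string); // UTF-8 to UTF-16 helper, defined elsewhere
namespace std
{
namespace u8filesystem
{
#ifdef _WIN32
// Same layout as filesystem::path: no added fields or virtual methods.
class path : public filesystem::path
{
public:
    path(const std::string &string)
        : filesystem::path(U8ToW(string))
    {
    }
    // Hides filesystem::path::string() so callers always get UTF-8.
    // C++17 only: from C++20 on, u8string() returns std::u8string.
    std::string string() const
    {
        return filesystem::path::u8string();
    }
};
#else
using namespace filesystem;
#endif
}
}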
For the sake of performance, path does not have a global way to define locale conversions. Since C++ before C++20 does not have a specific type for UTF-8 strings, the system assumes any char strings are in the native narrow encoding. So if you want to use UTF-8 strings, you have to spell it out explicitly, either by providing an appropriate conversion locale to the constructor or by using u8path.
C++20 gave us char8_t, which is always presumed to be UTF-8. So if you consistently use char8_t-based strings (like std::u8string), path's implicit conversion will pick up on it and work appropriately.
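If the strings arrive as plain std::string, one way to keep the "this is UTF-8" decision in a single place is a small helper. This is a sketch only: path_from_utf8 is a made-up name, and it simply picks u8path or the char8_t constructor depending on whether the compiler supports char8_t.

```cpp
#include <filesystem>
#include <string>

// Hypothetical helper: builds a path from a std::string known to hold UTF-8.
inline std::filesystem::path path_from_utf8(const std::string& utf8) {
#if defined(__cpp_char8_t)
    // C++20 and later: char8_t input is always treated as UTF-8.
    return std::filesystem::path(std::u8string(utf8.begin(), utf8.end()));
#else
    // C++17: u8path is the UTF-8 entry point (deprecated from C++20 on).
    return std::filesystem::u8path(utf8);
#endif
}
```

Call sites then read path_from_utf8(name) instead of repeating u8path everywhere, and the helper is the only place to touch when the code base moves to std::u8string.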

Force encoding when writing txt file with ofstream

I am writing a txt file using ofstream. For various reasons the file should use a local encoding and not UTF-8.
The machine that processes the file has a different localization than the target locale.
Is there a way to force the encoding when writing a file?
regards,
Ilan
You can call std::ios::imbue on your ofstream object to modify the locale. This won't affect the global locale.
std::ofstream os("output.txt");
std::locale mylocale("");
os.imbue(mylocale);
os << 1.5f << std::endl;
os.close();
Pay attention to the argument of the std::locale constructor: it is implementation-dependent. For example, the German locale could be:
std::locale mylocale("de_DE");
or
std::locale mylocale("German");
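Because the accepted names differ between platforms, and std::locale throws std::runtime_error for a name it does not recognize, it can help to try several spellings in turn. A minimal sketch, where first_available_locale is a made-up helper and falling back to the classic "C" locale is just one possible policy:

```cpp
#include <initializer_list>
#include <locale>
#include <stdexcept>

// Hypothetical helper: returns the first locale the platform accepts,
// or the classic "C" locale if none of the names is supported.
std::locale first_available_locale(std::initializer_list<const char*> names) {
    for (const char* name : names) {
        try {
            return std::locale(name);
        } catch (const std::runtime_error&) {
            // Name not supported on this platform; try the next one.
        }
    }
    return std::locale::classic();
}
```

Usage would be something like os.imbue(first_available_locale({"de_DE", "German"}));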
Well, given that it's Windows, you'd not have UTF8 anyway. But exactly what are you writing? Usually, you have a std::string in memory and write that to disk. The only difference is that \n in memory is translated to CR/LF (\r\n) on disk. That's the same translation everywhere.
You might encounter a situation where you're writing a std::wstring. In that case, it's determined by the locale. The default locale is the C locale, aka std::locale("C") or std::locale::classic(). The local encoding (which you seem to want) is std::locale("").
Other locales exist; see here

Currency formatting with c++

Is there an obvious way to perform currency formatting in C++ ?
For example: 1978879 would become 1'978'879
Thanks
Short answer:
int value = 1978879;
std::cout.imbue(std::locale(""));
std::cout << value << std::endl;
Locales are responsible for formatting. Any stream can be imbued with a locale; by default they use the global locale, which by default is the "C" locale which doesn't use any thousands separators. By creating a locale instance with the empty string as the parameter we use the user's locale, which in your case will likely be Swiss.
You can also specify an explicit locale name, but the names are different depending on your platform (Linux/Windows), and not all systems support all locales.
If you want to get a string, the easiest way is probably to use a stringstream (from the <sstream> header):
std::ostringstream stream;
stream.imbue(std::locale(""));
stream << value;
std::string stringValue = stream.str();
You can also use the locale's facets directly, but that's more complicated.
You could also set the global locale, which will be used by all streams (unless they're specifically imbued with a different locale):
std::locale::global(std::locale(""));
Take a look at the standard C++ localization library. It's not that straightforward, but you can probably achieve that through the num_put/numpunct facets.
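For the facet route, a custom numpunct is enough to get the 1'978'879 style without relying on a Swiss locale being installed. This is a sketch: apostrophe_grouping and group_thousands are made-up names, and it only handles integer grouping, not currency symbols.

```cpp
#include <locale>
#include <sstream>
#include <string>

// Custom punctuation: apostrophe as separator, digits in groups of three.
struct apostrophe_grouping : std::numpunct<char> {
protected:
    char do_thousands_sep() const override { return '\''; }
    std::string do_grouping() const override { return "\3"; }
};

// Hypothetical helper: formats an integer with the grouping above.
std::string group_thousands(long long value) {
    std::ostringstream out;
    // The locale takes ownership of the facet and deletes it when done.
    out.imbue(std::locale(std::locale::classic(), new apostrophe_grouping));
    out << value;
    return out.str();
}
```

group_thousands(1978879) gives 1'978'879, and the same locale object could be imbued into std::cout instead of a stringstream.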