Force encoding when writing txt file with ofstream - c++

I'm writing a txt file using ofstream. For various reasons the file should use the local encoding and not UTF-8.
The machine that processes the file has a different locale than the target locale.
Is there a way to force the encoding when writing a file?
Regards,
Ilan

You can call std::ios::imbue on your ofstream object to modify the locale. This won't affect the global locale.
std::ofstream os("output.txt");
std::locale mylocale("");
os.imbue(mylocale);
os << 1.5f << std::endl;
os.close();
Pay attention to the argument of the std::locale constructor: locale names are implementation-dependent. For example, the German locale could be:
std::locale mylocale("de_DE");
or
std::locale mylocale("German");
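Because std::locale throws std::runtime_error when it doesn't recognize a name, one way to cope with platform-specific names is to try several candidates and fall back to the classic locale. This is only a sketch: first_available_locale and write_localized are hypothetical helpers, and the candidate names are assumptions that may not exist on a given machine. Also note that for a narrow std::ofstream the imbued locale mainly changes number formatting (decimal separator, grouping); char data is written through unchanged, because the standard codecvt<char, char, mbstate_t> facet performs no conversion.

```cpp
#include <fstream>
#include <initializer_list>
#include <locale>
#include <stdexcept>
#include <string>

// Hypothetical helper: returns the first locale name this implementation
// accepts, or the classic "C" locale if none of them is available.
std::locale first_available_locale(std::initializer_list<const char*> names) {
    for (const char* name : names) {
        try {
            return std::locale(name);
        } catch (const std::runtime_error&) {
            // Unknown name on this platform: try the next candidate.
        }
    }
    return std::locale::classic();
}

// Writes a number using a German locale if one can be found.
// The candidate names below are assumptions; availability varies by system.
void write_localized(const std::string& path, double value) {
    std::ofstream os(path);
    os.imbue(first_available_locale({"de_DE.UTF-8", "de_DE", "German"}));
    os << value << '\n';
}
```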

Well, given that it's Windows, you wouldn't have UTF-8 anyway. But what exactly are you writing? Usually, you have a std::string in memory and write that to disk. The only difference is that, in text mode, \n in memory is translated to CR/LF (\r\n) on disk. That translation is the same regardless of locale.
You might encounter a situation where you're writing a std::wstring. In that case, the bytes written to disk are determined by the stream's locale. The default locale is the C locale, aka std::locale("C") or std::locale::classic(). The local encoding (which you seem to want) is std::locale("").
Other named locales exist, but which names are available depends on the platform.
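For a std::wofstream, the encoding on disk comes from the codecvt facet of the stream's locale. If the target locale isn't installed on the processing machine, one option is to supply that facet yourself. The sketch below assumes ISO-8859-1 (Latin-1) as the "local" encoding; Latin1Codecvt and write_latin1 are hypothetical names. A real Windows code page such as 1252 differs from Latin-1 in the 0x80 to 0x9F range, so treat this as a pattern rather than a drop-in replacement.

```cpp
#include <algorithm>
#include <cstddef>
#include <cwchar>
#include <fstream>
#include <locale>
#include <string>

// Hypothetical facet: writes wchar_t text as ISO-8859-1 (Latin-1) bytes,
// independent of the machine's locale. Characters above U+00FF become '?'.
class Latin1Codecvt : public std::codecvt<wchar_t, char, std::mbstate_t> {
protected:
    result do_out(std::mbstate_t&, const wchar_t* from, const wchar_t* from_end,
                  const wchar_t*& from_next, char* to, char* to_end,
                  char*& to_next) const override {
        for (; from != from_end && to != to_end; ++from, ++to) {
            const unsigned long c = static_cast<unsigned long>(*from);
            *to = c <= 0xFF ? static_cast<char>(c) : '?';
        }
        from_next = from;
        to_next = to;
        return from == from_end ? ok : partial;
    }
    result do_in(std::mbstate_t&, const char* from, const char* from_end,
                 const char*& from_next, wchar_t* to, wchar_t* to_end,
                 wchar_t*& to_next) const override {
        for (; from != from_end && to != to_end; ++from, ++to)
            *to = static_cast<wchar_t>(static_cast<unsigned char>(*from));
        from_next = from;
        to_next = to;
        return from == from_end ? ok : partial;
    }
    // Latin-1 is stateless: nothing to flush at the end of the output.
    result do_unshift(std::mbstate_t&, char* to, char*, char*& to_next) const override {
        to_next = to;
        return noconv;
    }
    int do_encoding() const noexcept override { return 1; }  // one byte per char
    bool do_always_noconv() const noexcept override { return false; }
    int do_max_length() const noexcept override { return 1; }
    int do_length(std::mbstate_t&, const char* from, const char* from_end,
                  std::size_t max) const override {
        return static_cast<int>(std::min(max, static_cast<std::size_t>(from_end - from)));
    }
};

void write_latin1(const std::string& path, const std::wstring& text) {
    std::wofstream os;
    // Imbue before open() so the file buffer uses this facet from the start.
    os.imbue(std::locale(os.getloc(), new Latin1Codecvt));
    os.open(path);
    os << text;
}
```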

Related

C++ UTF-8/ASCII to UTF-16 in MFC

How can I convert a (text) file from UTF-8/ASCII to UTF-16 before displaying it in an MFC program?
MFC (in a Unicode build) uses 16 bits per character, while most text files on Windows use UTF-8 or ASCII.
The simple answer is MultiByteToWideChar, with WideCharToMultiByte for the reverse conversion. There are also CW2A and CA2W, which are a little simpler to use.
However, I would strongly recommend against using these functions directly. You have the pain of handling character buffers manually, with the risk of creating memory corruption or security holes.
It's much better to use a library based on std::string and/or iterators, for example utf8cpp. This one has the advantage of being small, header-only and cross-platform.
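To illustrate what a string-based conversion looks like (without buffers to size by hand), here is a small portable sketch. It is a hand-written decoder, not utf8cpp's API and not a Win32 call; utf8_to_utf16 is a hypothetical name. On Windows wchar_t is 16 bits, so the resulting code units could be copied into a std::wstring or CString.

```cpp
#include <cstddef>
#include <stdexcept>
#include <string>

// Hypothetical helper (not the utf8cpp API): decodes UTF-8 into UTF-16
// code units. Malformed input throws std::runtime_error rather than guessing.
std::u16string utf8_to_utf16(const std::string& in) {
    // Smallest code point that may legally use 1, 2, 3 or 4 bytes;
    // anything below that is an overlong encoding.
    static const char32_t kMinCodePoint[] = {0x0, 0x80, 0x800, 0x10000};
    std::u16string out;
    std::size_t i = 0;
    while (i < in.size()) {
        const unsigned char lead = static_cast<unsigned char>(in[i]);
        char32_t cp;
        std::size_t extra;  // number of continuation bytes
        if (lead < 0x80)                { cp = lead;        extra = 0; }
        else if ((lead & 0xE0) == 0xC0) { cp = lead & 0x1F; extra = 1; }
        else if ((lead & 0xF0) == 0xE0) { cp = lead & 0x0F; extra = 2; }
        else if ((lead & 0xF8) == 0xF0) { cp = lead & 0x07; extra = 3; }
        else throw std::runtime_error("invalid UTF-8 lead byte");

        if (in.size() - i <= extra)
            throw std::runtime_error("truncated UTF-8 sequence");
        for (std::size_t k = 1; k <= extra; ++k) {
            const unsigned char cont = static_cast<unsigned char>(in[i + k]);
            if ((cont & 0xC0) != 0x80)
                throw std::runtime_error("invalid UTF-8 continuation byte");
            cp = (cp << 6) | (cont & 0x3F);
        }
        i += extra + 1;

        if (cp < kMinCodePoint[extra] || cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF))
            throw std::runtime_error("invalid code point");

        if (cp < 0x10000) {
            out.push_back(static_cast<char16_t>(cp));
        } else {
            // Supplementary-plane code points become a surrogate pair.
            cp -= 0x10000;
            out.push_back(static_cast<char16_t>(0xD800 + (cp >> 10)));
            out.push_back(static_cast<char16_t>(0xDC00 + (cp & 0x3FF)));
        }
    }
    return out;
}
```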
Actually, you can do it very simply, using the CStdioFile and CString classes provided by MFC. The MFC library is a very powerful and comprehensive one (albeit with some major oddities, and even bugs); but if you're already using it, use it to its fullest extent:
...
const wchar_t* inpPath = L"<path>\\InpFile.txt"; // These values are given just...
const wchar_t* outPath = L"<path>\\outFile.txt"; // ... for illustrative purposes!
CStdioFile inpFile(inpPath, CFile::modeRead | CFile::typeText);
CStdioFile outFile(outPath, CFile::modeWrite | CFile::modeCreate | CFile::typeText
| CFile::typeUnicode); // Note the Unicode flag - will create UTF-16LE file!
CString textBuff;
while (inpFile.ReadString(textBuff)) {
outFile.WriteString(textBuff);
outFile.WriteString(L"\n");
}
inpFile.Close();
outFile.Close();
...
Of course, you will need to change the code (a bit) if you want the input and output files to have the same path, but that wouldn't mean changing the basic premise!
With this approach, there is no need to call any conversion functions yourself: MFC does it for you when it reads into or writes from its (Unicode) CString object!
Note: Compiled and tested with MSVC (VS-2019), 64-bit, in Unicode mode.
EDIT: Maybe I misunderstood your question! If you don't want to actually convert the file, but just display its contents, then remove all references to outFile from my code and just work with each textBuff object you read. The CString class takes care of all the required ASCII/UTF-8/UTF-16LE conversions.

Using a filesystem::path, how do you open a file in a cross-platform way?

Let's say you have used the new std::filesystem (or std::experimental::filesystem) code to hunt down a file. You have a path variable that contains the full pathname to this file.
How do you open that file?
That may sound silly, but consider the obvious answer:
std::filesystem::path my_path = ...;
std::ifstream stream(my_path.c_str(), std::ios::binary);
This is not guaranteed to work. Why? Because on Windows for example, path::string_type is std::wstring. So path::c_str will return a const wchar_t*. And std::ifstream can only take paths with a const char* type.
Now it turns out that this code will actually function in VS. Why? Because Visual Studio has a library extension that does permit this to work. But that's non-standard behavior and therefore not portable. For example, I have no idea if GCC on Windows provides the same feature.
You could try this:
std::filesystem::path my_path = ...;
std::ifstream stream(my_path.string().c_str(), std::ios::binary);
Only Windows confounds us again. Because if my_path happened to contain Unicode characters, then now you're reliant on setting the Windows ANSI locale stuff correctly. And even that won't necessarily save you if the path happens to have characters from multiple languages that cannot exist in the same ANSI locale.
Boost Filesystem actually had a similar problem. But they extended their version of iostreams to support paths directly.
Am I missing something here? Did the committee add a cross-platform filesystem library without adding a cross-platform way to open files in it?
Bo Persson pointed out that this is the subject of a standard library defect report. The defect has been resolved: C++17 requires implementations where path::value_type is not char to have their file stream types accept const filesystem::path::value_type* in addition to the usual const char* versions.
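Assuming a C++17 standard library, std::ifstream and std::ofstream also have constructors taking a const std::filesystem::path&, so the path can be passed as-is with no c_str() or string() round-trip through the ANSI code page. A minimal sketch; write_text and read_text are hypothetical helper names:

```cpp
#include <filesystem>
#include <fstream>
#include <iterator>
#include <string>

namespace fs = std::filesystem;

// The stream constructors take the path object directly (C++17).
void write_text(const fs::path& p, const std::string& text) {
    std::ofstream out(p, std::ios::binary);
    out << text;
}

std::string read_text(const fs::path& p) {
    std::ifstream in(p, std::ios::binary);
    return std::string(std::istreambuf_iterator<char>(in),
                       std::istreambuf_iterator<char>());
}
```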

Using ifstream when filename contains wide characters

Using C++Builder XE5 (bcc32) in Windows 7.
I am trying to open a file whose filename contains a wide character. The actual filename I'm testing with is C:\bΛx\foo.txt. The non-ASCII character there is U+039B (Λ).
I have this filename stored correctly in a std::wstring. However, trying:
std::ifstream f( filename.c_str() );
fails to open the file.
Of course, in Standard C++ fopen only takes char *. However, the Dinkumware C++ RTL implementation has an overload accepting wchar_t *. Unfortunately the implementation of that overload in ...\Embarcadero\RAD Studio\12.0\source\cpprtl\Source\dinkumware\source\fiopen.cpp does not call _wfopen. Instead it uses wcstombs to convert the string to UTF-8 and then calls fopen.
Checking the source for fopen, it calls the narrow version of an underlying function ___topen which ultimately passes the UTF-8 string to CreateFile.
When I inspect the attempt to open the file using Sysinternals Process Monitor, it shows that it did attempt to open the file with a UTF-8 file string, and the operating system rejected this with the result NAME COLLISION.
If I open the file using _wfopen( filename.c_str(), L"r" ) then all is well and I can read the file using C I/O functions, but I can't use C++ iostreams of course.
Is there any way to use std::ifstream to open a file with U+039B or other such characters in the filename?
Note that using std::wifstream doesn't work either (it still tries to open the UTF-8 version of the filename).
You wrote: "If I open the file using _wfopen( filename.c_str(), L"r" ) then all is well and I can read the file using C I/O functions, but I can't use C++ iostreams of course."
I don't see that "of course". Your problem is reduced to making an iostreams streambuf from a FILE*. Howard Hinnant has answered elsewhere that the Standard provides no method for this, but implementing a streambuf-derived class on top of FILE* is pretty straightforward. He even mentions some code that he feels would be a good starting point.
Note that this only makes sense for a text file. iostreams and binary files do not get along; there's a character encoding layer and ios_base::binary does not turn that off.
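A read-only streambuf over a FILE* can be quite small. The following is a minimal sketch, not Howard Hinnant's code; FileInBuf and read_first_line are hypothetical names. On Windows the FILE* would come from _wfopen as in the question; the sketch itself uses only standard C I/O so it compiles anywhere. It supports reading only (no output, no seeking), and because it reads ahead in blocks, the FILE* position ends up past whatever the istream has consumed.

```cpp
#include <cstddef>
#include <cstdio>
#include <istream>
#include <streambuf>
#include <string>

// Minimal read-only streambuf on top of a FILE* (hypothetical class name).
// The caller keeps ownership of the FILE* and must fclose() it afterwards.
class FileInBuf : public std::streambuf {
public:
    explicit FileInBuf(std::FILE* file) : file_(file) {}

protected:
    // Called when the get area is exhausted: refill it from the FILE*.
    int_type underflow() override {
        if (gptr() < egptr())
            return traits_type::to_int_type(*gptr());
        const std::size_t n = std::fread(buffer_, 1, sizeof buffer_, file_);
        if (n == 0)
            return traits_type::eof();
        setg(buffer_, buffer_, buffer_ + n);
        return traits_type::to_int_type(*gptr());
    }

private:
    std::FILE* file_;
    char buffer_[4096];
};

// Example use: wrap an already-open FILE* in a std::istream.
std::string read_first_line(std::FILE* file) {
    FileInBuf buf(file);
    std::istream in(&buf);
    std::string line;
    std::getline(in, line);
    return line;
}
```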

What is the purpose of imbue in C++?

I'm working with some code today, and I saw:
extern std::locale g_classicLocale;
class StringStream : public virtual std::ostringstream
{
public:
StringStream() { imbue(g_classicLocale); }
virtual ~StringStream() {};
};
Then I came across imbue. What is the purpose of the imbue function in C++? What does it do? Are there any potential problems in using imbue (thread safety, memory allocation)?
imbue is a member that std::ostringstream inherits from std::basic_ios (which builds on std::ios_base::imbue); it sets the locale of the stream to the specified locale.
This affects the way the stream prints (and reads) certain things; for instance, setting a French locale will cause the decimal point . to be replaced by ,.
C++ streams perform their conversions to and from (numeric) types according to a locale, which is an object that summarizes all the localization information needed (decimal separator, date format, ...).
By default, a stream uses the global locale that was current when the stream was constructed, but you can give a stream its own locale with the imbue function, which is what your code does here. Presumably it sets the classic "C" locale so that the output doesn't depend on the current global locale (this is useful e.g. for serialization).
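To see the effect without depending on which named locales are installed, the sketch below builds a locale from a hand-written numpunct facet whose decimal separator is ',' (standing in for something like a French locale). CommaDecimal, ClassicStringStream and the two format functions are hypothetical names; ClassicStringStream mirrors the StringStream class from the question but imbues std::locale::classic() instead of the unknown g_classicLocale.

```cpp
#include <locale>
#include <sstream>
#include <string>

// Hypothetical facet standing in for a locale whose decimal separator is ','.
struct CommaDecimal : std::numpunct<char> {
protected:
    char do_decimal_point() const override { return ','; }
};

// Same idea as the StringStream class in the question: this stream always
// formats with the classic "C" locale, whatever the global locale is.
class ClassicStringStream : public std::ostringstream {
public:
    ClassicStringStream() { imbue(std::locale::classic()); }
};

// A plain ostringstream copies the global locale when it is constructed.
std::string format_with_global_locale(double value) {
    std::ostringstream os;
    os << value;
    return os.str();
}

std::string format_with_classic_locale(double value) {
    ClassicStringStream os;
    os << value;
    return os.str();
}
```

With the global locale switched to one containing CommaDecimal, the first function yields "1,5" for 1.5 while the second still yields "1.5".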

Currency formatting with c++

Is there an obvious way to perform currency formatting in C++ ?
For example: 1978879 would become 1'978'879
Thanks
Short answer:
int value = 1978879;
std::cout.imbue(std::locale(""));
std::cout << value << std::endl;
Locales are responsible for formatting. Any stream can be imbued with a locale; by default they use the global locale, which by default is the "C" locale which doesn't use any thousands separators. By creating a locale instance with the empty string as the parameter we use the user's locale, which in your case will likely be Swiss.
You can also specify an explicit locale name, but the names are different depending on your platform (Linux/Windows), and not all systems support all locales.
If you want to get a string, the easiest way is probably to use a stringstream (from the <sstream> header):
std::ostringstream stream;
stream.imbue(std::locale(""));
stream << value;
std::string stringValue = stream.str();
You can also use the locale's facets directly, but that's more complicated.
You could also set the global locale, which will be used by all streams (unless they're specifically imbued with a different locale):
std::locale::global(std::locale(""));
Take a look at the standard C++ localization library. It's not that straightforward, but you can probably achieve that through the num_put/numpunct facets.
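Following the facet route, a custom numpunct can produce exactly the 1'978'879 format regardless of which locales the platform provides. This is a sketch with hypothetical names (ApostropheGrouping, group_digits); it only groups digits and does not add a currency symbol, which would need std::moneypunct and std::put_money instead.

```cpp
#include <locale>
#include <sstream>
#include <string>

// Hypothetical numpunct facet: groups digits in threes with an apostrophe.
struct ApostropheGrouping : std::numpunct<char> {
protected:
    char do_thousands_sep() const override { return '\''; }
    std::string do_grouping() const override { return "\3"; }
};

std::string group_digits(long long value) {
    std::ostringstream os;
    // With the default refs argument of 0, the locale owns and deletes the facet.
    os.imbue(std::locale(std::locale::classic(), new ApostropheGrouping));
    os << value;
    return os.str();
}
```

For example, group_digits(1978879) returns "1'978'879".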