I have this task:
1. In the current directory, create the file subMape.dat.
2. Write into it the names of all folders stored in C:\Program Files.
3. Display on the screen the data that was written to subMape.dat.
#include <iostream>
#include <windows.h>
using namespace std;
int main() {
    WIN32_FIND_DATA findFileData;
    DWORD bytesWritten = 0;
    HANDLE f;
    HANDLE c = CreateFileW(L"subMape.txt", GENERIC_READ | GENERIC_WRITE, NULL, NULL, CREATE_ALWAYS, FILE_ATTRIBUTE_NORMAL, NULL);
    //TCHAR lpBuffer[32];
    DWORD nNumberOfBytesToRead = 32;
    //DWORD lpNumberOfBytesRead;
    DWORD lengthSum = 0;
    if (c) {
        cout << "CreateFile() succeeded!\n";
        if(f = FindFirstFile(L"C:\\Program Files\\*", &findFileData)){
            if(f != INVALID_HANDLE_VALUE) {
                while (FindNextFile(f, &findFileData)){
                    lengthSum += bytesWritten;
                    WriteFile(c, findFileData.cFileName, (DWORD)wcslen(findFileData.cFileName), &bytesWritten, NULL);
                }
            }
            FindClose(f);
        }
        else {
            cout << "FindFirstFile() failed :(\n";
        }
    }
    else {
        cout << "CreateFile() failed :(\n";
    }
    cout << lengthSum << endl;
    //SetFilePointer(c, lengthSum, NULL, FILE_BEGIN);
    //ReadFile(c, lpBuffer, lengthSum, &lpNumberOfBytesRead, NULL);
    //wprintf(lpBuffer);
    CloseHandle(c);
    return 0;
}
I'm using UNICODE. When it writes findFileData.cFileName, the file contains a string where the characters are separated by what look like spaces. For example, the folder name "New Folder" (length 10) ends up in the file as something like "N e w  F" (still 10 bytes, but only the first half of the name). What am I doing wrong?
Your text file viewer or editor just isn't smart enough to figure out that you've written a UTF-16 encoded text file. Most text editors need help; write the BOM to the file:
cout << "CreateFile() succeeded!\n";
wchar_t bom = L'\xfeff';
WriteFile(c, &bom, sizeof(bom), &bytesWritten, NULL);
You need to use something like WideCharToMultiByte() to convert the UTF-16 string to ANSI (or UTF-8).
The reason you see a "space" is that the program you are using to view the file treats it as one byte per character. With Unicode on Windows each character takes two bytes, and for ASCII text the second byte is '\0'.
You need to choose how you want to encode the data in the file.
The easiest option is UTF-16LE, since this is the native encoding on Windows; then you only need to prepend a byte order mark to the beginning of the file. This encoding has an advantage over UTF-8 in that it is easy to distinguish from extended-ASCII encodings, thanks to the zero bytes it contains. Its drawbacks are that you need the BOM and that it occupies more disk space uncompressed.
UTF-8 has the advantage of being more compact. It is also fully compatible with pure ASCII and favoured by the programming community.
If you do not need to handle extended ASCII in any context, encode your data in UTF-8. If you do, use UTF-16LE.
Those who argue that a text that passes UTF-8 validation is encoded in UTF-8 are right if the whole text is available, but wrong if it is not:
Consider an alphabetical list of Swedish names. If I only check the first part of the list and it is Latin-1 (ISO/IEC 8859-1), it will also pass the UTF-8 test.
Then at the end comes "Örjansson", which breaks down into mojibake; in fact, the Latin-1 byte for 'Ö' starts an invalid UTF-8 sequence. If, on the other hand, the list is UTF-16LE, every letter used fits in one byte followed by a zero byte, so the zero bytes make me fully confident it is neither UTF-8 nor Latin-1.
You should know that on Windows the "native" Unicode format is UTF-16, which is used by the W-style functions (CreateFileW). With that in mind, writing the file should give you valid UTF-16 text, but your editor may not recognize it. To make sure your program works, use a text editor where you can specify the encoding by hand (you know what it needs to be) in case it isn't auto-detected; Notepad++ is a good choice for this.
As others already mentioned, writing the BOM is very helpful for text editors and ensures that your file will be read correctly.
You can use WideCharToMultiByte to convert the UTF-16 into UTF-8 for even more compatibility.
Also, why did you call CreateFileW directly but not FindFirstFileW? Do you have UNICODE defined in your project? If you do, the compiler resolves CreateFile into CreateFileW for you.
Also, here
WriteFile(c, findFileData.cFileName, (DWORD)wcslen(findFileData.cFileName), &bytesWritten, NULL);
wcslen gives the number of characters, which is not the same as the data size in bytes for wide text; it should be something like
wcslen(findFileData.cFileName)*sizeof(wchar_t)
When dealing with UTF-16 files, it is important to write a byte-order mark and to write the data with lengths in bytes not characters. wcslen returns the string length in characters but a character is two bytes when using wide strings. Here's a fixed version. It explicitly calls the wide version of the Win32 APIs so will work whether UNICODE/_UNICODE are defined or not.
#include <iostream>
#include <windows.h>
using namespace std;
int main()
{
    WIN32_FIND_DATAW findFileData; // Use the wide version explicitly
    DWORD bytesWritten = 0;
    HANDLE f;
    HANDLE c = CreateFileW(L"subMape.txt", GENERIC_READ | GENERIC_WRITE, NULL, NULL, CREATE_ALWAYS, FILE_ATTRIBUTE_NORMAL, NULL);
    DWORD lengthSum = 0;
    if(c != INVALID_HANDLE_VALUE) {
        cout << "CreateFile() succeeded!\n";
        // Write a byte-order mark...make sure the length is bytes, not characters.
        WriteFile(c, L"\uFEFF", sizeof(wchar_t), &bytesWritten, NULL);
        lengthSum += bytesWritten;
        f = FindFirstFileW(L"C:\\Program Files\\*", &findFileData);
        if(f != INVALID_HANDLE_VALUE) {
            while(FindNextFileW(f, &findFileData)) {
                // Write the filename...length in bytes
                WriteFile(c, findFileData.cFileName, (DWORD)wcslen(findFileData.cFileName) * sizeof(wchar_t), &bytesWritten, NULL);
                // Add the length *after* writing...
                lengthSum += bytesWritten;
                // Add a carriage return/line feed to make Notepad happy.
                WriteFile(c, L"\r\n", sizeof(wchar_t) * 2, &bytesWritten, NULL);
                lengthSum += bytesWritten;
            }
            FindClose(f); // This should be inside the FindFirstFile-succeeded block.
        }
        else {
            cout << "FindFirstFile() failed :(\n";
        }
        // These should be inside the CreateFile-succeeded block.
        CloseHandle(c);
        cout << lengthSum << endl;
    }
    else {
        cout << "CreateFile() failed :(\n";
    }
    return 0;
}
I'm using this function to add my program to startup, but it doesn't work and I don't know why: weird characters and words are showing up in Startup applications. What am I doing wrong?
Instead, this is being added to startup: U㫅萹㐀蠀渐晁Ɉ U㫆萺㝈耀 U㫆萺㝈耀 and C, which has no file location and also no details.
HKEY NewVal;
char loggeduser[UNLEN + 1];
std::ostringstream fileinhome;
GetUserNameA(loggeduser, &len);
fileinhome << "C:\\Users\\" << loggeduser << "\\AppData\\Roaming\\snetwork\\snetwork.exe";
std::string fp = fileinhome.str();
const char* file = fp.c_str();
if (RegOpenKey(HKEY_CURRENT_USER, _T("Software\\Microsoft\\Windows\\CurrentVersion\\Run"), &NewVal) != ERROR_SUCCESS)
{
    return;
}
if (RegSetValueEx(NewVal, _T("CLI-Social-Network"), 0, REG_SZ, (LPBYTE)file, sizeof(file)) != ERROR_SUCCESS)
{
    return;
}
else {
    // std::cout << "Program added to Startup.\n";
    // Do nothing, Program was added to Startup
}
RegCloseKey(NewVal);
A possibility: You have UNICODE and/or _UNICODE defined, so RegSetValueEx is actually RegSetValueExW. Therefore, this function passes Unicode data into the buffer file. But file is an ASCII buffer, so the otherwise-valid Unicode data is incorrectly parsed as ASCII, leading to the strange output.
To fix, use std::wstring and W functions explicitly.
Unicode considerations aside, you can't use a const char * as a buffer for receiving data. You must allocate sufficient memory for the buffer first.
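For illustration, here is a sketch of the Unicode-clean version, keeping the key and value names from the question. Note two fixes: wide strings with the W functions throughout, and a byte count that covers the whole string including the terminator (sizeof(file) in the original only measured the size of a pointer). Error handling is kept minimal; treat this as a sketch, not the definitive fix.

```cpp
#include <windows.h>
#include <lmcons.h>   // UNLEN
#include <string>

// Sketch: register the program for startup using the wide-character API.
void addToStartup() {
    wchar_t user[UNLEN + 1];
    DWORD len = UNLEN + 1;            // in/out: buffer size in characters
    if (!GetUserNameW(user, &len))
        return;

    std::wstring path = L"C:\\Users\\";
    path += user;
    path += L"\\AppData\\Roaming\\snetwork\\snetwork.exe";

    HKEY key;
    if (RegOpenKeyExW(HKEY_CURRENT_USER,
                      L"Software\\Microsoft\\Windows\\CurrentVersion\\Run",
                      0, KEY_SET_VALUE, &key) != ERROR_SUCCESS)
        return;

    // cbData is in BYTES and must include the terminating L'\0'.
    RegSetValueExW(key, L"CLI-Social-Network", 0, REG_SZ,
                   reinterpret_cast<const BYTE*>(path.c_str()),
                   static_cast<DWORD>((path.size() + 1) * sizeof(wchar_t)));
    RegCloseKey(key);
}
```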
I need to read a file asynchronously:
string read(string path) {
    DWORD readenByte;
    int t;
    char* buffer = new char[512];
    HANDLE hEvent = CreateEvent(NULL, FALSE, FALSE, "read");
    OVERLAPPED overlap;
    overlap.hEvent = hEvent;
    HANDLE hFile = CreateFile(path.c_str(), GENERIC_READ, FILE_SHARE_READ, NULL, OPEN_EXISTING, FILE_ATTRIBUTE_NORMAL, NULL);
    if(!hFile) {
        Debug::error(GetLastError(), "fileAsync.cpp::read - ");
    }
    t = ReadFile(hFile, buffer, MAX_READ - 1, &readenByte, &overlap);
    if(!t) {
        Debug::error(GetLastError(), "fileAsync.cpp::read - ");
    }
    t = WaitForSingleObject(hEvent, 5000);
    if(t == WAIT_TIMEOUT) {
        Debug::error("fail to read - timeout, fileAsync.cpp::read");
    }
    buffer[readenByte] = '\0';
    string str = buffer;
    return str;
}
I get an error at ReadFile - 38: reached the end of the file.
How do I read a file asynchronously in C++ using the WinAPI?
There are several bugs in your code that need to be addressed, some cause failure, others catastrophic failure.
The first bug leads to the error code you get: You have an uninitialized OVERLAPPED structure, instructing the following ReadFile call to read from the random file position stored in the Offset and OffsetHigh members. To fix this, initialize the data: OVERLAPPED overlap = {0};.
Next, you aren't opening the file for asynchronous access. To subsequently read asynchronously from a file, you need to call CreateFile passing FILE_FLAG_OVERLAPPED for dwFlagsAndAttributes. If you don't you're off to hunting a bug for months (see What happens if you forget to pass an OVERLAPPED structure on an asynchronous handle?).
The documentation for ReadFile explains, that lpNumberOfBytesRead parameter is not used for asynchronous I/O, and you should pass NULL instead. This should be immediately obvious, since an asynchronous ReadFile call returns, before the number of bytes transferred is known. To get the size of the transferred payload, call GetOverlappedResult once the asynchronous I/O has finished.
The next bug only causes a memory leak. You are dynamically allocating buffer, but never call delete[] buffer;. Either delete the buffer, or allocate a buffer with automatic storage duration (char buffer[MAX_READ] = {0};), or use a C++ container (e.g. std::vector<char> buffer(MAX_READ);).
Another bug is, where you try to construct a std::string from your buffer: The constructor you chose cannot deal with what would be an embedded NUL character. It'll just truncate whatever you have. You'd need to call a std::string constructor taking an explicit length argument. But even then, you may wind up with garbage, if the character encoding of the file and std::string do not agree.
Finally, issuing an asynchronous read followed by WaitForSingleObject is essentially a synchronous read and doesn't buy you anything. I'm assuming this is just for testing and not your final code. Just keep in mind when finishing this up that the OVERLAPPED structure needs to stay alive for as long as the asynchronous read operation is in flight.
Additional recommendations, that do not immediately address bugs:
You are passing a std::string to your read function, which is used in the CreateFile call. Windows uses UTF-16LE encoding throughout, which maps to wchar_t/std::wstring when using Visual Studio (and likely other Windows compilers as well). Passing a std::string/const char* has two immediate drawbacks:
Calling the ANSI API causes character strings to be converted from MBCS to UTF-16 (and vice versa). This both needlessly wastes resources and fails in subtle ways, because it relies on the current locale.
Not every Unicode code point can be expressed in an MBCS encoding. This means that some files cannot be opened when using MBCS character strings.
Use the Unicode API (CreateFileW) and UTF-16 character strings (std::wstring/wchar_t) throughout. You can also define the preprocessor symbols UNICODE (for the Windows API) and _UNICODE (for the CRT) at the compiler's command line, to not accidentally call into any ANSI APIs.
You are creating an event object that is only ever accessed through its HANDLE value, not by its name. You can pass NULL as the lpName argument to CreateEvent. This prevents potential name clashes, which is all the more important with a name as generic as "read".
1) You need to include the flag FILE_FLAG_OVERLAPPED in the 6th argument (dwFlagsAndAttributes) of the call to CreateFile. That is why most likely the overlapped read fails.
2) What is the value of MAX_READ? I hope it's less than 513 otherwise if the file is bigger than 512 bytes bad things will happen.
3) When the overlapped structure pointer is not NULL, ReadFile typically returns FALSE with error code 997 (ERROR_IO_PENDING). That is expected for a pending asynchronous read, so you cannot treat a FALSE return value t as a failure by itself.
4) In the case of asynchronous operation the ReadFile function does not store the bytes read in the pointer you pass in the call, you must query the overlapped result yourself after the operation is completed.
Here is a small working snippet, I hope you can build up from that:
#include <Windows.h>
#include <iostream>
#include <sstream>
#include <stdexcept> // std::runtime_error

class COverlappedCompletionEvent : public OVERLAPPED
{
public:
    COverlappedCompletionEvent() : m_hEvent(NULL)
    {
        m_hEvent = CreateEvent(NULL, FALSE, FALSE, NULL);
        if (m_hEvent == NULL)
        {
            auto nError = GetLastError();
            std::stringstream ErrorStream;
            ErrorStream << "CreateEvent() failed with " << nError;
            throw std::runtime_error(ErrorStream.str());
        }
        ZeroMemory(this, sizeof(OVERLAPPED));
        hEvent = m_hEvent;
    }

    ~COverlappedCompletionEvent()
    {
        if (m_hEvent != NULL)
        {
            CloseHandle(m_hEvent);
        }
    }

private:
    HANDLE m_hEvent;
};

int main(int argc, char** argv)
{
    try
    {
        if (argc != 2)
        {
            std::stringstream ErrorStream;
            ErrorStream << "usage: " << argv[0] << " <filename>";
            throw std::runtime_error(ErrorStream.str());
        }
        COverlappedCompletionEvent OverlappedCompletionEvent;
        char pBuffer[512];
        auto hFile = CreateFileA(argv[1], GENERIC_READ, FILE_SHARE_READ, NULL, OPEN_EXISTING, FILE_ATTRIBUTE_NORMAL | FILE_FLAG_OVERLAPPED, NULL);
        if (hFile == INVALID_HANDLE_VALUE) // CreateFile reports failure with INVALID_HANDLE_VALUE, not NULL
        {
            auto nError = GetLastError();
            std::stringstream ErrorStream;
            ErrorStream << "CreateFileA() failed with " << nError;
            throw std::runtime_error(ErrorStream.str());
        }
        if (ReadFile(hFile, pBuffer, sizeof(pBuffer), nullptr, &OverlappedCompletionEvent) == FALSE)
        {
            auto nError = GetLastError();
            if (nError != ERROR_IO_PENDING)
            {
                std::stringstream ErrorStream;
                ErrorStream << "ReadFile() failed with " << nError;
                throw std::runtime_error(ErrorStream.str());
            }
        }
        ::WaitForSingleObject(OverlappedCompletionEvent.hEvent, INFINITE);
        DWORD nBytesRead = 0;
        if (GetOverlappedResult(hFile, &OverlappedCompletionEvent, &nBytesRead, FALSE))
        {
            std::cout << "Read " << nBytesRead << " bytes" << std::endl;
        }
        CloseHandle(hFile);
    }
    catch (const std::exception& Exception)
    {
        std::cout << Exception.what() << std::endl;
        return EXIT_FAILURE;
    }
    return EXIT_SUCCESS;
}
As I'm currently working on a program for a TeamSpeak server, I need to retrieve the names of the currently online users, which I'm doing with sockets; that's working fine so far. In my UI I'm displaying all clients in a ListBox, which is basically working. Nevertheless, I'm having problems with wrongly displayed characters and symbols in the ListBox.
I'm using the following code:
//...
auto getClientList() -> void {
    i = 0;
    queryString.str("");
    queryString.clear();
    queryString << clientlist << " \n";
    send(sock, queryString.str().c_str(), strlen(queryString.str().c_str()), NULL);
    TeamSpeak::getAnswer(1);
    while(p_1 != -1){
        p_1 = lastLog.find(L"client_nickname=", sPos + 1);
        if(p_1 != -1){
            sPos = p_1;
            p_2 = lastLog.find(L" ", p_1);
            temporary = lastLog.substr(p_1 + 16, p_2 - (p_1 + 16));
            users[i].assign(temporary.begin(), temporary.end());
            SendMessage(hwnd_2, LB_ADDSTRING, (WPARAM)NULL, (LPARAM)(LPTSTR)(users[i].c_str()));
            i++;
        }
        else{
            sPos = 0;
            p_1 = 0;
            break;
        }
    }
    TeamSpeak::getAnswer(0);
}
//...
I've already checked lastLog, temporary and users[i] (by writing them to a file), and none of them shows an encoding problem with characters or symbols (for example Andrè). If I add a string directly, SendMessage(hwnd_2, LB_ADDSTRING, (WPARAM)NULL, (LPARAM)(LPTSTR)L"Andrè"), it is displayed correctly in the ListBox. What might be the issue here: a problem with my code, or something else?
Update 1: I recently continued working on this problem, testing with the word Olè! received from the socket. The result I got is the following: O (79) | l (108) | � (-61) | � (-88) | ! (33). How can I convert this char array to a wstring containing the correct characters?
Solution: As @isanae mentioned in his post, the std::wstring_convert template did the trick for me, thank you very much!
Many things can go wrong in this code, and you don't show much of it. What's particularly lacking is the definition of all those variables.
Assuming that users[i] contains meaningful data, you also don't say how it is encoded. Is it ASCII? UTF-8? UTF-16? The fact that you can output it to a file and read it with an editor doesn't mean anything, as most editors are able to guess at encoding.
If it really is UTF-16 (the native encoding on Windows), then I see no reason for this code not to work. One way to check would be to break into the debugger and look at the individual bytes in users[i]. If you see every character with a value less than 128 followed by a 0, then it's probably UTF-16.
If it is not UTF-16, then you'll need to convert it. There are a variety of ways to do this, but MultiByteToWideChar may be the easiest. Make sure you set the codepage to same encoding used by the sender. It may be CP_UTF8, or an actual codepage.
Note also that hardcoding a string with non-ASCII characters doesn't help you much either, as you'd first have to find out the encoding of the file itself. I know some versions of Visual C++ will convert your source file to UTF-16 if it encounters non-ASCII characters, which may be what happened to you.
O (79) | l (108) | � (-61) | � (-88) | ! (33).
How can I convert this char array to a wstring containing the correct characters?
This is a UTF-8 string. It has to be converted to UTF-16 so Windows can use it.
This is a portable, C++11 solution on implementations where sizeof(wchar_t) == 2. If this is not the case, then char16_t and std::u16string may be used, but the most recent version of Visual C++ as of this writing (2015 RC) doesn't implement std::codecvt for char16_t and char32_t.
#include <string>
#include <codecvt>

std::wstring utf8_to_utf16(const std::string& s)
{
    static_assert(sizeof(wchar_t) == 2, "wchar_t needs to be 2 bytes");
    std::wstring_convert<std::codecvt_utf8_utf16<wchar_t>> conv;
    return conv.from_bytes(s);
}

std::string utf16_to_utf8(const std::wstring& s)
{
    static_assert(sizeof(wchar_t) == 2, "wchar_t needs to be 2 bytes");
    std::wstring_convert<std::codecvt_utf8_utf16<wchar_t>> conv;
    return conv.to_bytes(s);
}
Windows-only:
#include <string>
#include <cassert>
#include <memory>
#include <codecvt>
#include <Windows.h>

std::wstring utf8_to_utf16(const std::string& s)
{
    // getting the required size in characters (not bytes) of the
    // output buffer
    const int size = ::MultiByteToWideChar(
        CP_UTF8, 0, s.c_str(), static_cast<int>(s.size()),
        nullptr, 0);

    // error handling
    assert(size != 0);

    // creating a buffer with enough characters in it
    std::unique_ptr<wchar_t[]> buffer(new wchar_t[size]);

    // converting from utf8 to utf16
    const int written = ::MultiByteToWideChar(
        CP_UTF8, 0, s.c_str(), static_cast<int>(s.size()),
        buffer.get(), size);

    // error handling
    assert(written != 0);

    return std::wstring(buffer.get(), buffer.get() + written);
}

std::string utf16_to_utf8(const std::wstring& ws)
{
    // getting the required size in bytes of the output buffer
    const int size = ::WideCharToMultiByte(
        CP_UTF8, 0, ws.c_str(), static_cast<int>(ws.size()),
        nullptr, 0, nullptr, nullptr);

    // error handling
    assert(size != 0);

    // creating a buffer with enough characters in it
    std::unique_ptr<char[]> buffer(new char[size]);

    // converting from utf16 to utf8
    const int written = ::WideCharToMultiByte(
        CP_UTF8, 0, ws.c_str(), static_cast<int>(ws.size()),
        buffer.get(), size, nullptr, nullptr);

    // error handling
    assert(written != 0);

    return std::string(buffer.get(), buffer.get() + written);
}
Test:
// utf-8 string
const std::string s = {79, 108, -61, -88, 33};
::MessageBoxW(0, utf8_to_utf16(s).c_str(), L"", MB_OK);
I'm trying to write a universal text editor which can open and display both ANSI and Unicode text in an EditControl. Do I need to call ReadFile() repeatedly if I determine that the text is ANSI? I can't figure out how to perform this task. My attempt below does not work; it displays '?' characters in the EditControl.
LARGE_INTEGER fSize;
GetFileSizeEx(hFile, &fSize);
int bufferLen = fSize.QuadPart / sizeof(TCHAR) + 1;
TCHAR* buffer = new TCHAR[bufferLen];
buffer[0] = _T('\0');
DWORD wasRead = 0;
ReadFile(hFile, buffer, fSize.QuadPart, &wasRead, NULL);
buffer[wasRead / sizeof(TCHAR)] = _T('\0');
if (!IsTextUnicode(buffer, bufferLen, NULL))
{
    CHAR* ansiBuffer = new CHAR[bufferLen];
    ansiBuffer[0] = '\0';
    WideCharToMultiByte(CP_ACP, 0, buffer, bufferLen, ansiBuffer, bufferLen, NULL, NULL);
    SetWindowTextA(edit, ansiBuffer);
    delete[] ansiBuffer;
}
else
    SetWindowText(edit, buffer);
CloseHandle(hFile);
delete[] buffer;
There are a few buffer length errors and oddities, but here's your big problem. You call WideCharToMultiByte incorrectly. That is meant to receive UTF-16 encoded text as input. But when IsTextUnicode returns false that means that the buffer is not UTF-16 encoded.
The following is basically what you need:
if(!IsTextUnicode(buffer,bufferLen*sizeof(TCHAR),NULL))
SetWindowTextA(edit,(char*)buffer);
Note that I've fixed the length parameter to IsTextUnicode.
For what it is worth, I think I'd read into a buffer of char. That would remove the need for sizeof(TCHAR). In fact, I'd stop using TCHAR altogether. This program should be Unicode all the way; TCHAR is what you use when you compile for both the NT and 9x variants of Windows, and you aren't compiling for 9x anymore, I imagine.
So I'd probably code it like this:
char* buffer = new char[filesize+2];//+2 for UTF-16 null terminator
DWORD wasRead = 0;
ReadFile(hFile, buffer, filesize, &wasRead, NULL);
//add error checking for ReadFile, including that wasRead == filesize
buffer[filesize] = '\0';
buffer[filesize+1] = '\0';
if (IsTextUnicode(buffer, filesize, NULL))
SetWindowText(edit, (wchar_t*)buffer);
else
SetWindowTextA(edit, buffer);
delete[] buffer;
Note also that this code makes no allowance for the possibility of receiving UTF-8 encoded text. If you want to handle that, you'd need to take your char buffer and send it through MultiByteToWideChar using CP_UTF8.
A requirement for my software is that the encoding of a file which contains exported data shall be UTF8. But when I write the data to the file the encoding is always ANSI. (I use Notepad++ to check this.)
What I'm currently doing is trying to convert the file manually by reading it, converting it to UTF8 and writing the text to a new file.
line is a std::string
inputFile is an std::ifstream
pOutputFile is a FILE*
// ...
if( inputFile.is_open() )
{
    while( inputFile.good() )
    {
        getline(inputFile, line);

        //1
        DWORD dwCount = MultiByteToWideChar( CP_ACP, 0, line.c_str(), -1, NULL, 0 );
        wchar_t *pwcharText;
        pwcharText = new wchar_t[ dwCount ];

        //2
        MultiByteToWideChar( CP_ACP, 0, line.c_str(), -1, pwcharText, dwCount );

        //3
        dwCount = WideCharToMultiByte( CP_UTF8, 0, pwcharText, -1, NULL, 0, NULL, NULL );
        char *pText;
        pText = new char[ dwCount ];

        //4
        WideCharToMultiByte( CP_UTF8, 0, pwcharText, -1, pText, dwCount, NULL, NULL );

        fprintf(pOutputFile, pText);
        fprintf(pOutputFile, "\n");

        delete[] pwcharText;
        delete[] pText;
    }
}
// ...
Unfortunately the encoding is still ANSI. I searched for a while for a solution, but I always encounter the same approach via MultiByteToWideChar and WideCharToMultiByte, and it doesn't seem to work. What am I missing here?
I also looked here on SO for a solution, but most UTF-8 questions deal with C# and PHP.
On Windows in VC++ 2010 it is possible (not yet implemented in GCC, as far as I know) using the C++11 localization facet std::codecvt_utf8_utf16. The sample code from cppreference.com has all the basic information you need to read/write a UTF-8 file.
std::wstring wFromFile = L"𤭢teststring";
std::wofstream fileOut("textOut.txt");
fileOut.imbue(std::locale(fileOut.getloc(), new std::codecvt_utf8_utf16<wchar_t>));
fileOut<<wFromFile;
This writes the file as UTF-8 instead of ANSI (checked in Notepad). Hope this is what you need.
On Windows, files don't have encodings. Each application will assume an encoding based on its own rules. The best you can do is put a byte-order mark at the front of the file and hope it's recognized.
AFAIK, fprintf() performs character conversions, so there is no guarantee that passing UTF-8 encoded data to it will actually write UTF-8 to the file. (Passing the data as the format string is also unsafe if it ever contains '%'.) Since you already converted the data yourself, use fwrite() instead so the UTF-8 data is written as-is, e.g.:
DWORD dwCount = MultiByteToWideChar( CP_ACP, 0, line.c_str(), line.length(), NULL, 0 );
if (dwCount == 0) continue;
std::vector<WCHAR> utf16Text(dwCount);
MultiByteToWideChar( CP_ACP, 0, line.c_str(), line.length(), &utf16Text[0], dwCount );
dwCount = WideCharToMultiByte( CP_UTF8, 0, &utf16Text[0], utf16Text.size(), NULL, 0, NULL, NULL );
if (dwCount == 0) continue;
std::vector<CHAR> utf8Text(dwCount);
WideCharToMultiByte( CP_UTF8, 0, &utf16Text[0], utf16Text.size(), &utf8Text[0], dwCount, NULL, NULL );
fwrite(&utf8Text[0], sizeof(CHAR), dwCount, pOutputFile);
fprintf(pOutputFile, "\n");
The type char has no clue of any encoding, all it can do is store 8 bits. Therefore any text file is just a sequence of bytes and the user must guess the underlying encoding. A file starting with a BOM indicates UTF 8, but using a BOM is not recommended any more. The type wchar_t in contrast is in Windows always interpreted as UTF 16.
So let's say you have a file encoded in UTF 8 with just one line: "Confucius says: Smile. 孔子说:微笑!😊." The following code snippet appends this text once more, then reads the first line and displays it in a MessageBoxW and MessageBoxA. Note that MessageBoxW shows the correct text while MessageBoxA shows some junk because it assumes my local codepage 1252 for the char* string.
Note that I have used the handy CA2W class instead of MultiByteToWideChar. Be careful, the CP_Whatever argument is optional and if omitted the local codepage is used.
#include <iostream>
#include <fstream>
#include <filesystem>
#include <atlbase.h>

int main(int argc, char** argv)
{
    std::fstream afile;
    std::string line1A = u8"Confucius says: Smile. 孔子说:微笑! 😊";
    std::wstring line1W;

    afile.open("Test.txt", std::ios::out | std::ios::app);
    if (!afile.is_open())
        return 0;
    afile << "\n" << line1A;
    afile.close();

    afile.open("Test.txt", std::ios::in);
    std::getline(afile, line1A);
    line1W = CA2W(line1A.c_str(), CP_UTF8);
    MessageBoxW(nullptr, line1W.c_str(), L"Smile", 0);
    MessageBoxA(nullptr, line1A.c_str(), "Smile", 0);
    afile.close();
    return 0;
}