I need to read a file asynchronously.
string read(string path) {
    DWORD readenByte;
    int t;
    char* buffer = new char[512];
    HANDLE hEvent = CreateEvent(NULL, FALSE, FALSE, "read");
    OVERLAPPED overlap;
    overlap.hEvent = hEvent;
    HANDLE hFile = CreateFile(path.c_str(), GENERIC_READ, FILE_SHARE_READ, NULL, OPEN_EXISTING, FILE_ATTRIBUTE_NORMAL, NULL);
    if(!hFile) {
        Debug::error(GetLastError(), "fileAsync.cpp::read - ");
    }
    t = ReadFile(hFile, buffer, MAX_READ - 1, &readenByte, &overlap);
    if(!t) {
        Debug::error(GetLastError(), "fileAsync.cpp::read - ");
    }
    t = WaitForSingleObject(hEvent, 5000);
    if(t == WAIT_TIMEOUT) {
        Debug::error("fail to read - timeout, fileAsync.cpp::read");
    }
    buffer[readenByte] = '\0';
    string str = buffer;
    return str;
}
I get error 38 (ERROR_HANDLE_EOF: "Reached the end of the file") from ReadFile.
How do I read a file asynchronously in C++ using the Win32 API?
There are several bugs in your code that need to be addressed; some cause failure, others catastrophic failure.
The first bug leads to the error code you get: You have an uninitialized OVERLAPPED structure, instructing the following ReadFile call to read from the random file position stored in the Offset and OffsetHigh members. To fix this, initialize the data: OVERLAPPED overlap = {0};.
Next, you aren't opening the file for asynchronous access. To subsequently read asynchronously from a file, you need to call CreateFile passing FILE_FLAG_OVERLAPPED for dwFlagsAndAttributes. If you don't, you're in for months of bug hunting (see What happens if you forget to pass an OVERLAPPED structure on an asynchronous handle?).
The documentation for ReadFile explains that the lpNumberOfBytesRead parameter is not used for asynchronous I/O, and that you should pass NULL instead. This should be immediately obvious, since an asynchronous ReadFile call returns before the number of bytes transferred is known. To get the size of the transferred payload, call GetOverlappedResult once the asynchronous I/O has finished.
The next bug only causes a memory leak. You are dynamically allocating buffer, but never call delete[] buffer;. Either delete the buffer, or allocate a buffer with automatic storage duration (char buffer[MAX_READ] = {0};), or use a C++ container (e.g. std::vector<char> buffer(MAX_READ);).
Another bug is where you try to construct a std::string from your buffer: the constructor you chose cannot deal with embedded NUL characters; it just truncates at the first one. You'd need to call a std::string constructor taking an explicit length argument. But even then, you may wind up with garbage if the character encoding of the file and of std::string do not agree.
Finally, issuing an asynchronous read followed by WaitForSingleObject is essentially a synchronous read, and doesn't buy you anything. I'm assuming this is just for testing, and not your final code. Just keep in mind when finishing this up that the OVERLAPPED structure needs to stay alive for as long as the asynchronous read operation is in flight.
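Putting those fixes together, here is a minimal sketch of what the corrected function could look like (error handling mostly elided, and still effectively synchronous because of the wait, as noted above):

#include <windows.h>
#include <string>

std::string read(const std::string& path)
{
    // Zero-initialize, so Offset/OffsetHigh start at the beginning of the file.
    OVERLAPPED overlap = {0};
    overlap.hEvent = CreateEvent(NULL, FALSE, FALSE, NULL); // unnamed event

    // FILE_FLAG_OVERLAPPED is required for asynchronous access.
    HANDLE hFile = CreateFileA(path.c_str(), GENERIC_READ, FILE_SHARE_READ, NULL,
                               OPEN_EXISTING,
                               FILE_ATTRIBUTE_NORMAL | FILE_FLAG_OVERLAPPED, NULL);

    char buffer[512];
    // lpNumberOfBytesRead must be NULL for asynchronous I/O.
    if (!ReadFile(hFile, buffer, sizeof(buffer), NULL, &overlap)
        && GetLastError() != ERROR_IO_PENDING)
    {
        /* report GetLastError() */
    }

    // The OVERLAPPED structure must stay alive while the I/O is in flight;
    // here it lives on the stack until after the wait, which is sufficient.
    // On a timeout you would have to CancelIoEx before letting it go away.
    WaitForSingleObject(overlap.hEvent, 5000);

    DWORD bytesRead = 0;
    GetOverlappedResult(hFile, &overlap, &bytesRead, FALSE);

    CloseHandle(hFile);
    CloseHandle(overlap.hEvent);
    return std::string(buffer, bytesRead); // explicit length constructor
}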
Additional recommendations that do not immediately address bugs:
You are passing a std::string to your read function, which is then used in the CreateFile call. Windows uses UTF-16LE encoding throughout, which maps to wchar_t/std::wstring when using Visual Studio (and likely other Windows compilers as well). Passing a std::string/const char* has two immediate drawbacks:
Calling the ANSI API causes character strings to be converted from MBCS to UTF-16 (and vice versa). This both needlessly wastes resources and fails in very subtle ways, since the conversion relies on the current locale.
Not every Unicode code point can be expressed using MBCS encoding. This means that some files cannot be opened when using the MBCS character encoding.
Use the Unicode API (CreateFileW) and UTF-16 character strings (std::wstring/wchar_t) throughout. You can also define the preprocessor symbols UNICODE (for the Windows API) and _UNICODE (for the CRT) at the compiler's command line, to not accidentally call into any ANSI APIs.
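For example, a minimal sketch of the preprocessor-symbol route (the file name test.txt is just a placeholder):

// With these defined project-wide (or via /DUNICODE /D_UNICODE on the
// compiler command line), the TCHAR-based macros resolve to the W functions.
#define UNICODE
#define _UNICODE
#include <windows.h>

int main()
{
    // CreateFile now expands to CreateFileW and takes wide strings.
    HANDLE h = CreateFile(L"test.txt", GENERIC_READ, FILE_SHARE_READ, NULL,
                          OPEN_EXISTING, FILE_ATTRIBUTE_NORMAL, NULL);
    if (h != INVALID_HANDLE_VALUE)
        CloseHandle(h);
    return 0;
}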
You are creating an event object that is only ever accessed through its HANDLE value, not by its name. You can pass NULL as the lpName argument to CreateEvent. This prevents potential name clashes, which is all the more important with a name as generic as "read".
1) You need to include the flag FILE_FLAG_OVERLAPPED in the 6th argument (dwFlagsAndAttributes) of the call to CreateFile. That is most likely why the overlapped read fails.
2) What is the value of MAX_READ? I hope it's less than 513; otherwise, if the file is bigger than 512 bytes, bad things will happen.
3) ReadFile with a non-NULL overlapped structure pointer will give you error code 997 (ERROR_IO_PENDING), which is expected; thus you cannot treat a FALSE return from ReadFile as failure by itself.
4) For an asynchronous operation, ReadFile does not store the number of bytes read through the pointer you pass in the call; you must query the overlapped result yourself after the operation completes.
Here is a small working snippet; I hope you can build up from it:
#include <Windows.h>
#include <iostream>
#include <sstream>
#include <stdexcept>
#include <cstdlib>

class COverlappedCompletionEvent : public OVERLAPPED
{
public:
    COverlappedCompletionEvent() : m_hEvent(NULL)
    {
        m_hEvent = CreateEvent(NULL, FALSE, FALSE, NULL);
        if (m_hEvent == NULL)
        {
            auto nError = GetLastError();
            std::stringstream ErrorStream;
            ErrorStream << "CreateEvent() failed with " << nError;
            throw std::runtime_error(ErrorStream.str());
        }
        // Zero only the OVERLAPPED base portion; m_hEvent lives after it.
        ZeroMemory(this, sizeof(OVERLAPPED));
        hEvent = m_hEvent;
    }

    ~COverlappedCompletionEvent()
    {
        if (m_hEvent != NULL)
        {
            CloseHandle(m_hEvent);
        }
    }

private:
    HANDLE m_hEvent;
};

int main(int argc, char** argv)
{
    try
    {
        if (argc != 2)
        {
            std::stringstream ErrorStream;
            ErrorStream << "usage: " << argv[0] << " <filename>";
            throw std::runtime_error(ErrorStream.str());
        }

        COverlappedCompletionEvent OverlappedCompletionEvent;
        char pBuffer[512];
        auto hFile = CreateFileA(argv[1], GENERIC_READ, FILE_SHARE_READ, NULL,
                                 OPEN_EXISTING, FILE_ATTRIBUTE_NORMAL | FILE_FLAG_OVERLAPPED, NULL);
        if (hFile == INVALID_HANDLE_VALUE) // CreateFile signals failure with INVALID_HANDLE_VALUE, not NULL
        {
            auto nError = GetLastError();
            std::stringstream ErrorStream;
            ErrorStream << "CreateFileA() failed with " << nError;
            throw std::runtime_error(ErrorStream.str());
        }

        if (ReadFile(hFile, pBuffer, sizeof(pBuffer), nullptr, &OverlappedCompletionEvent) == FALSE)
        {
            auto nError = GetLastError();
            if (nError != ERROR_IO_PENDING)
            {
                std::stringstream ErrorStream;
                ErrorStream << "ReadFile() failed with " << nError;
                throw std::runtime_error(ErrorStream.str());
            }
        }

        ::WaitForSingleObject(OverlappedCompletionEvent.hEvent, INFINITE);
        DWORD nBytesRead = 0;
        if (GetOverlappedResult(hFile, &OverlappedCompletionEvent, &nBytesRead, FALSE))
        {
            std::cout << "Read " << nBytesRead << " bytes" << std::endl;
        }
        CloseHandle(hFile);
    }
    catch (const std::exception& Exception)
    {
        std::cout << Exception.what() << std::endl;
        return EXIT_FAILURE;
    }
    return EXIT_SUCCESS;
}
In some code I use the Win32 RegGetValue() API to read a string from the registry.
I call the aforementioned API twice:
The purpose of the first call is to get the proper size to allocate a destination buffer for the string.
The second call reads the string from the registry into that buffer.
What is odd is that I found that RegGetValue() returns different size values between the two calls.
In particular, the size value returned in the second call is two bytes (equivalent to one wchar_t) less than the first call.
It's worth noting that the size returned by the second call is the one consistent with the actual string length (including the terminating NUL).
But I don't understand why the first call returns a size two bytes (one wchar_t) bigger than that.
Compilable Win32 C++ repro code is attached below.
Repro Source Code
#include <windows.h>
#include <iostream>
#include <string>
#include <vector>

using namespace std;

void PrintSize(const char* const message, const DWORD sizeBytes)
{
    cout << message << ": " << sizeBytes << " bytes ("
         << (sizeBytes/sizeof(wchar_t)) << " wchar_t's)\n";
}

int main()
{
    const HKEY key = HKEY_LOCAL_MACHINE;
    const wchar_t* const subKey = L"SOFTWARE\\Microsoft\\Windows\\CurrentVersion";
    const wchar_t* const valueName = L"CommonFilesDir";

    //
    // Get string size
    //
    DWORD keyType = 0;
    DWORD dataSize = 0;
    const DWORD flags = RRF_RT_REG_SZ;
    LONG result = ::RegGetValue(
        key,
        subKey,
        valueName,
        flags,
        &keyType,
        nullptr,
        &dataSize);
    if (result != ERROR_SUCCESS)
    {
        cout << "Error: " << result << '\n';
        return 1;
    }
    PrintSize("1st call size", dataSize);
    const DWORD dataSize1 = dataSize; // store for later use

    //
    // Allocate buffer and read string into it
    //
    vector<wchar_t> buffer(dataSize / sizeof(wchar_t));
    result = ::RegGetValue(
        key,
        subKey,
        valueName,
        flags,
        nullptr,
        &buffer[0],
        &dataSize);
    if (result != ERROR_SUCCESS)
    {
        cout << "Error: " << result << '\n';
        return 1;
    }
    PrintSize("2nd call size", dataSize);

    const wstring text(buffer.data());
    cout << "Read string:\n";
    wcout << text << '\n';
    wcout << wstring(dataSize/sizeof(wchar_t), L'*') << " <-- 2nd call size\n";
    wcout << wstring(dataSize1/sizeof(wchar_t), L'-') << " <-- 1st call size\n";
}
Operating System: Windows 7 64-bit with SP1
EDIT
Some confusion seems to have arisen from the particular registry key I happened to read in the sample repro code.
So, let me clarify that I read that key from the registry just as a test. This is not production code, and I'm not interested in that particular key. Feel free to add a simple test key to the registry with some test string value.
Sorry for the confusion.
RegGetValue() is safer than RegQueryValueEx() because it artificially adds a null terminator to the output of a string value if it does not already have a null terminator.
The first call returns the data size plus room for an extra null terminator, in case the actual data is not already null terminated. I suspect RegGetValue() does not look at the real data at this stage; it just does an unconditional data size + sizeof(wchar_t) to be safe.
(36 * sizeof(wchar_t)) + (1 * sizeof(wchar_t)) = 74
The second call returns the real size of the actual data that was read. That size would include the extra null terminator only if one had to be artificially added. In this case, your data has 35 characters in the path, and a real null terminator present (which well-behaved apps are supposed to do), thus the extra null terminator did not need to be added.
((35+1) * sizeof(wchar_t)) + (0 * sizeof(wchar_t)) = 72
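Incidentally, if you want the two-call pattern to be robust against the value changing between the calls (another process could write to it in between), you can loop while RegGetValue() reports ERROR_MORE_DATA. A minimal sketch (the helper name is mine):

#include <windows.h>
#include <string>
#include <vector>

// Sketch: read a REG_SZ value, retrying while the buffer is too small.
std::wstring ReadRegString(HKEY key, const wchar_t* subKey, const wchar_t* valueName)
{
    DWORD dataSize = 0;
    LONG result = ::RegGetValueW(key, subKey, valueName, RRF_RT_REG_SZ,
                                 nullptr, nullptr, &dataSize);
    if (result != ERROR_SUCCESS)
        return L"";

    std::vector<wchar_t> buffer(dataSize / sizeof(wchar_t));
    for (;;)
    {
        result = ::RegGetValueW(key, subKey, valueName, RRF_RT_REG_SZ,
                                nullptr, buffer.data(), &dataSize);
        if (result == ERROR_MORE_DATA)
        {
            buffer.resize(dataSize / sizeof(wchar_t)); // value grew; retry
            continue;
        }
        if (result != ERROR_SUCCESS)
            return L"";
        return std::wstring(buffer.data()); // RegGetValue guarantees NUL termination
    }
}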
Now, with that said, you really should not be reading the Registry directly to get the CommonFilesDir path (or any other system path) in the first place. You should be using SHGetFolderPath(CSIDL_PROGRAM_FILES_COMMON) or SHGetKnownFolderPath(FOLDERID_ProgramFilesCommon) instead. Let the Shell deal with the Registry for you. This is consistent across Windows versions, as Registry settings are subject to being moved around from one version to another, and it accounts for per-user paths vs. system-global paths. These are the main reasons why the CSIDL API was introduced in the first place.
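For instance, a minimal sketch of the SHGetKnownFolderPath() route (the helper name is mine; link against Shell32.lib and Ole32.lib):

#include <windows.h>
#include <shlobj.h>        // SHGetKnownFolderPath
#include <knownfolders.h>  // FOLDERID_ProgramFilesCommon
#include <string>

// Sketch: ask the Shell for the Common Files directory instead of the registry.
std::wstring GetCommonFilesDir()
{
    PWSTR rawPath = nullptr;
    std::wstring path;
    if (SUCCEEDED(SHGetKnownFolderPath(FOLDERID_ProgramFilesCommon, 0, NULL, &rawPath)))
    {
        path = rawPath;
    }
    CoTaskMemFree(rawPath); // documented as a no-op for nullptr
    return path;
}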
I have this task:
1. In the current directory, create the file subMape.dat.
2. Write into it the names of all folders stored in the C:\Program Files folder.
3. Display on the screen the data that was written to subMape.dat.
#include <iostream>
#include <windows.h>
using namespace std;

int main() {
    WIN32_FIND_DATA findFileData;
    DWORD bytesWritten = 0;
    HANDLE f;
    HANDLE c = CreateFileW(L"subMape.txt", GENERIC_READ | GENERIC_WRITE, NULL, NULL, CREATE_ALWAYS, FILE_ATTRIBUTE_NORMAL, NULL);
    //TCHAR lpBuffer[32];
    DWORD nNumberOfBytesToRead = 32;
    //DWORD lpNumberOfBytesRead;
    DWORD lengthSum = 0;
    if (c) {
        cout << "CreateFile() succeeded!\n";
        if(f = FindFirstFile(L"C:\\Program Files\\*", &findFileData)){
            if(f != INVALID_HANDLE_VALUE) {
                while (FindNextFile(f, &findFileData)){
                    lengthSum += bytesWritten;
                    WriteFile(c, findFileData.cFileName, (DWORD)wcslen(findFileData.cFileName), &bytesWritten, NULL);
                }
            }
            FindClose(f);
        }
        else {
            cout << "FindFirstFile() failed :(\n";
        }
    }
    else {
        cout << "CreateFile() failed :(\n";
    }
    cout << lengthSum << endl;
    //SetFilePointer(c, lengthSum, NULL, FILE_BEGIN);
    //ReadFile(c, lpBuffer, lengthSum, &lpNumberOfBytesRead, NULL);
    //wprintf(lpBuffer);
    CloseHandle(c);
    return 0;
}
I'm using UNICODE. When it writes findFileData.cFileName, it writes a string whose characters are split with spaces. For example, the folder name "New Folder" (strlen = 10) will be written into the file as "N e w T o" (strlen = 10). What should I do?
Your text file viewer or editor just isn't smart enough to figure out that you've written a UTF-16 encoded text file. Most text editors need help; write the BOM to the file:
cout << "CreateFile() succeeded!\n";
wchar_t bom = L'\xfeff';
WriteFile(c, &bom, sizeof(bom), &bytesWritten, NULL);
You need to use something like WideCharToMultiByte() to convert the UNICODE string to ANSI (or UTF8).
The reason you see "space" is that the program you are using to list the file treats it as one byte per character. When using Unicode on Windows you get two bytes per character, and for ASCII text the second byte is '\0'.
You need to choose how you want to encode the data in the file.
The easiest thing you can do is use UTF-16LE, since this is the native encoding on Windows. Then you only need to prepend a byte order mark (BOM) to the beginning of the file. This encoding has an advantage over UTF-8 in that it is easy to distinguish from extended ASCII encodings, due to the observed zero bytes. Its drawback is that you need the BOM and it occupies more disk space uncompressed.
UTF-8 has the advantage of being more compact. It is also fully compatible with pure ASCII and favoured by the programming community.
If you do not need to use extended ASCII in any context, you should encode your data in UTF-8. If you do, use UTF-16LE.
Those who argue that a text that passes UTF-8 validation is encoded in UTF-8 are right if the whole text is available, but wrong if it is not:
Consider an alphabetical list of Swedish names. If I only check the first part of the list and it is Latin-1 (ISO/IEC 8859-1), it will also pass the UTF-8 test.
Then at the end comes "Örjansson", which breaks down into mojibake; in fact, 'Ö' will be an invalid UTF-8 bit sequence. On the other hand, since all the letters used fit in one byte, UTF-16LE text would show the telltale zero bytes, so I can be fully confident such a text is neither UTF-8 nor Latin-1.
You should know that on Windows the "native" Unicode format is UTF-16, which is used by the W-style functions (CreateFileW). With that in mind, writing the file should give you valid UTF-16 text, but an editor may not recognize that. To make sure your program works, use a text editor where you can specify the encoding by hand (you know what it needs to be) in case it isn't recognized; Notepad++ is a good choice for this.
As others already mentioned, writing the BOM is very helpful for text editors and ensures that your file will be read correctly.
You can use WideCharToMultiByte to convert the UTF-16 into UTF-8 for even more compatibility.
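For example, a minimal sketch of such a conversion (the helper name is mine):

#include <windows.h>
#include <string>

// Sketch: convert a UTF-16 string to UTF-8 with WideCharToMultiByte.
std::string ToUtf8(const std::wstring& wide)
{
    if (wide.empty())
        return std::string();
    // First call: query the required buffer size in bytes.
    int bytes = WideCharToMultiByte(CP_UTF8, 0, wide.c_str(), (int)wide.size(),
                                    NULL, 0, NULL, NULL);
    std::string utf8(bytes, '\0');
    // Second call: perform the actual conversion.
    WideCharToMultiByte(CP_UTF8, 0, wide.c_str(), (int)wide.size(),
                        &utf8[0], bytes, NULL, NULL);
    return utf8;
}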
And why did you use CreateFileW directly but not FindFirstFileW? Do you have UNICODE defined in your project? If you do, the compiler resolves CreateFile into CreateFileW for you.
Also, here
WriteFile(c, findFileData.cFileName, (DWORD)wcslen(findFileData.cFileName), &bytesWritten, NULL);
wcslen gives the number of characters, which is not the same as the data size for non-ANSI text; it should be something like
wcslen(findFileData.cFileName) * sizeof(wchar_t)
When dealing with UTF-16 files, it is important to write a byte-order mark and to write the data with lengths in bytes, not characters. wcslen returns the string length in characters, but a character is two bytes when using wide strings. Here's a fixed version. It explicitly calls the wide versions of the Win32 APIs, so it works whether UNICODE/_UNICODE are defined or not.
#include <iostream>
#include <windows.h>
using namespace std;

int main()
{
    WIN32_FIND_DATAW findFileData; // Use the wide version explicitly
    DWORD bytesWritten = 0;
    HANDLE f;
    HANDLE c = CreateFileW(L"subMape.txt", GENERIC_READ | GENERIC_WRITE, 0, NULL, CREATE_ALWAYS, FILE_ATTRIBUTE_NORMAL, NULL);
    DWORD lengthSum = 0;
    if(c != INVALID_HANDLE_VALUE) {
        cout << "CreateFile() succeeded!\n";
        // Write a byte-order mark...make sure length is bytes, not characters.
        WriteFile(c, L"\uFEFF", sizeof(wchar_t), &bytesWritten, NULL);
        lengthSum += bytesWritten;
        f = FindFirstFileW(L"C:\\Program Files\\*", &findFileData);
        if(f != INVALID_HANDLE_VALUE) {
            while(FindNextFileW(f, &findFileData)) {
                // Write filename...length in bytes
                WriteFile(c, findFileData.cFileName, (DWORD)wcslen(findFileData.cFileName) * sizeof(wchar_t), &bytesWritten, NULL);
                // Add the length *after* writing...
                lengthSum += bytesWritten;
                // Add a carriage return/line feed to make Notepad happy.
                WriteFile(c, L"\r\n", sizeof(wchar_t) * 2, &bytesWritten, NULL);
                lengthSum += bytesWritten;
            }
            FindClose(f); // This should be inside the FindFirstFile-succeeded block.
        }
        else {
            cout << "FindFirstFile() failed :(\n";
        }
        // These should be inside the CreateFile-succeeded block.
        CloseHandle(c);
        cout << lengthSum << endl;
    }
    else {
        cout << "CreateFile() failed :(\n";
    }
    return 0;
}
I'm having trouble with the After Effects SDK.
Basically I'm looping through all of the footage project items and trying to get the footage path from them. Here's the code I have inside the loop.
AEGP_ItemType itemType = NULL;
ERR(suites.ItemSuite6()->AEGP_GetNextProjItem(projH, itemH, &itemH));
if (itemH == NULL) {
    break;
}
ERR(suites.ItemSuite6()->AEGP_GetItemType(itemH, &itemType));
if (itemType == AEGP_ItemType_FOOTAGE) {
    numFootage++;
    AEGP_FootageH footageH;
    ERR(suites.FootageSuite5()->AEGP_GetMainFootageFromItem(itemH, &footageH));
    A_char newItemName[AEGP_MAX_ITEM_NAME_SIZE] = {""};
    wchar_t footagePath[AEGP_MAX_PATH_SIZE];
    ERR(suites.ItemSuite6()->AEGP_GetItemName(itemH, newItemName));
    AEGP_MemHandle pathH = NULL;
    ERR(suites.FootageSuite5()->AEGP_GetFootagePath(footageH, 0, AEGP_FOOTAGE_MAIN_FILE_INDEX, &pathH));
    ERR(suites.MemorySuite1()->AEGP_LockMemHandle(pathH, reinterpret_cast<void**>(&footagePath)));
    std::wstring_convert<std::codecvt_utf8<wchar_t>> converter;
    const std::string utf8_string = converter.to_bytes(footagePath);
    std::ofstream tempFile;
    tempFile.open("C:\\temp\\log1.txt");
    tempFile << utf8_string;
    tempFile.close();
    ERR(suites.MemorySuite1()->AEGP_UnlockMemHandle(pathH));
    ERR(suites.MemorySuite1()->AEGP_FreeMemHandle(pathH));
}
I'm getting the footagePath
I then convert the UTF-16 (wchar_t) pointer to a UTF-8 string
Then I write that UTF-8 string to a temp file and it always outputs the following.
펐㛻
Can I please have some guidance on this? Thanks!
I was able to figure out the answer.
http://forums.adobe.com/message/5112560#5112560
This is what was wrong.
It was because the executing code was in a loop and I wasn't allocating strings with the new operator.
This was the line that needed a new on it.
wchar_t footagePath[AEGP_MAX_PATH_SIZE];
Another piece of information that would have been useful to know is that not ALL footage items have paths.
If they don't have a path, it will return an empty string.
This is the code I ended up with.
if (itemType == AEGP_ItemType_FOOTAGE) {
    A_char* newItemName = new A_char[AEGP_MAX_ITEM_NAME_SIZE];
    ERR(suites.ItemSuite6()->AEGP_GetItemName(newItemH, newItemName));
    AEGP_MemHandle nameH = NULL;
    AEGP_FootageH footageH = NULL;
    char* footagePathStr = NULL; // allocated below once the required length is known
    ERR(suites.FootageSuite5()->AEGP_GetMainFootageFromItem(newItemH, &footageH));
    if (footageH) {
        suites.FootageSuite5()->AEGP_GetFootagePath(footageH, 0, AEGP_FOOTAGE_MAIN_FILE_INDEX, &nameH);
        if (nameH) {
            tries++;
            AEGP_MemSize size = 0;
            A_UTF16Char* nameP = NULL;
            suites.MemorySuite1()->AEGP_GetMemHandleSize(nameH, &size);
            suites.MemorySuite1()->AEGP_LockMemHandle(nameH, (void**)&nameP);
            std::string output;
            int len = WideCharToMultiByte(CP_OEMCP, 0, (LPCWSTR)nameP, -1, NULL, 0, NULL, NULL);
            if (len > 1) {
                footagePathStr = new char[len];
                int len2 = WideCharToMultiByte(CP_OEMCP, 0, (LPCWSTR)nameP, -1, footagePathStr, len, NULL, NULL);
                ERR(suites.MemorySuite1()->AEGP_UnlockMemHandle(nameH));
                suites.MemorySuite1()->AEGP_FreeMemHandle(nameH);
            }
        }
    }
}
You already had the data as the smarter std::wstring; why did you convert it to a byte array and then force it into a plain std::string? In general, you should avoid converting strings via byte arrays. My knowledge of the C++ standard library is a few years off now, but the problem may be that the std::string class simply still has no UTF-8 support.
Do you really need to store it as UTF-8? If it is just for logging, try using std::wofstream (the wide one), remove the conversion and the std::string completely, and just write the std::wstring directly to the stream instead.
Also, it is completely possible that everything went correctly, and it is just your file viewer that goes haywire. Examine your log file with a hex editor and check whether the beginning of the file contains Unicode format markers like FF FE (the UTF-16LE byte order mark):
if it has some, and you wrote the data in an encoding that does not match what the markers indicate, then that's the problem;
if it has none, then try adding the correct markers. Maybe your file viewer simply did not notice it is Unicode-of-that-type and misread the file. Unicode markers help readers decode the data properly.
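If you'd rather check programmatically than with a hex editor, here is a small sketch that dumps the first bytes of the log file (the path is the one from the question):

#include <cstdio>

// Sketch: print the first bytes of the file so you can spot a BOM
// (FF FE = UTF-16LE, FE FF = UTF-16BE, EF BB BF = UTF-8).
int main()
{
    unsigned char head[4] = {0};
    FILE* f = std::fopen("C:\\temp\\log1.txt", "rb");
    if (!f)
        return 1;
    size_t n = std::fread(head, 1, sizeof(head), f);
    std::fclose(f);
    for (size_t i = 0; i < n; ++i)
        std::printf("%02X ", head[i]);
    std::printf("\n");
    return 0;
}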
How can I compare a wstring, such as L"Hello", to a string? If I need to have the same type, how can I convert them into the same type?
Since you asked, here's my standard conversion functions from string to wide string, implemented using C++ std::string and std::wstring classes.
First off, make sure to start your program with setlocale:
#include <clocale>
int main()
{
std::setlocale(LC_CTYPE, ""); // before any string operations
}
Now for the functions. First off, getting a wide string from a narrow string:
#include <string>
#include <vector>
#include <iostream>   // std::cout is used in the error paths below
#include <cassert>
#include <cstdlib>
#include <cwchar>
#include <cerrno>

// Dummy overload
std::wstring get_wstring(const std::wstring & s)
{
    return s;
}

// Real worker
std::wstring get_wstring(const std::string & s)
{
    const char * cs = s.c_str();
    const size_t wn = std::mbsrtowcs(NULL, &cs, 0, NULL);

    if (wn == size_t(-1))
    {
        std::cout << "Error in mbsrtowcs(): " << errno << std::endl;
        return L"";
    }

    std::vector<wchar_t> buf(wn + 1);
    const size_t wn_again = std::mbsrtowcs(buf.data(), &cs, wn + 1, NULL);

    if (wn_again == size_t(-1))
    {
        std::cout << "Error in mbsrtowcs(): " << errno << std::endl;
        return L"";
    }

    assert(cs == NULL); // successful conversion
    return std::wstring(buf.data(), wn);
}
And going back, making a narrow string from a wide string. I call the narrow string "locale string", because it is in a platform-dependent encoding depending on the current locale:
// Dummy
std::string get_locale_string(const std::string & s)
{
    return s;
}

// Real worker
std::string get_locale_string(const std::wstring & s)
{
    const wchar_t * cs = s.c_str();
    const size_t wn = std::wcsrtombs(NULL, &cs, 0, NULL);

    if (wn == size_t(-1))
    {
        std::cout << "Error in wcsrtombs(): " << errno << std::endl;
        return "";
    }

    std::vector<char> buf(wn + 1);
    const size_t wn_again = std::wcsrtombs(buf.data(), &cs, wn + 1, NULL);

    if (wn_again == size_t(-1))
    {
        std::cout << "Error in wcsrtombs(): " << errno << std::endl;
        return "";
    }

    assert(cs == NULL); // successful conversion
    return std::string(buf.data(), wn);
}
Some notes:
If you don't have std::vector::data(), you can say &buf[0] instead.
I've found that the r-style conversion functions mbsrtowcs and wcsrtombs don't work properly on Windows. There, you can use mbstowcs and wcstombs instead: mbstowcs(buf.data(), cs, wn + 1);, wcstombs(buf.data(), cs, wn + 1);
In response to your question: if you want to compare two strings, you can convert both of them to wide strings and then compare those. If you are reading a file from disk in a known encoding, you should use iconv() to convert the file from your known encoding to WCHAR and then compare with the wide string.
Beware, though, that complex Unicode text may have multiple different representations as code point sequences which you may want to consider equal. If that is a possibility, you need to use a higher-level Unicode processing library (such as ICU) and normalize your strings to some common, comparable form.
You should convert the char string to a wchar_t string using mbstowcs, and then compare the resulting strings. Notice that mbstowcs works on char*/wchar_t*, so you'll probably need to do something like this:
std::wstring StringToWstring(const std::string & source)
{
    std::wstring target(source.size()+1, L' ');
    std::size_t newLength = std::mbstowcs(&target[0], source.c_str(), target.size());
    target.resize(newLength);
    return target;
}
I'm not entirely sure that usage of &target[0] is standard-conforming; if someone has a good answer to that, please tell me in the comments. Also, there's an implicit assumption that the converted string won't be longer (in number of wchar_ts) than the number of chars in the original string, a logical assumption that, still, I'm not sure is covered by the standard.
On the other hand, it seems there's no way to ask mbstowcs for the size of the needed buffer, so either you go this way, or you use (better written and better defined) code from Unicode libraries (be it Windows APIs or libraries like iconv).
Still, keep in mind that comparing Unicode strings without using special functions is slippery ground; two equivalent strings may compare as different when compared bitwise.
Long story short: this should work, and I think it's the most you can do with just the standard library, but it's very implementation-dependent in how Unicode is handled, and I wouldn't trust it much. In general, it's better to stick with a single encoding inside your application and avoid this kind of conversion unless absolutely necessary; if you are working with definite encodings, use APIs that are less implementation-dependent.
Think twice before doing this — you might not want to compare them in the first place. If you are sure you do and you are using Windows, then convert string to wstring with MultiByteToWideChar, then compare with CompareStringEx.
If you are not using Windows, then the analogous functions are mbstowcs and wcscmp. The standard wide character C++ functions are often not portable under Windows; for instance mbstowcs is deprecated.
The cross-platform way to work with Unicode is to use the ICU library.
Take care to use special functions for Unicode string comparison, don't do it manually. Two Unicode strings could have different characters, yet still be the same.
wstring ConvertToUnicode(const string & str)
{
UINT codePage = CP_ACP;
DWORD flags = 0;
int resultSize = MultiByteToWideChar
( codePage // CodePage
, flags // dwFlags
, str.c_str() // lpMultiByteStr
, str.length() // cbMultiByte
, NULL // lpWideCharStr
, 0 // cchWideChar
);
vector<wchar_t> result(resultSize + 1);
MultiByteToWideChar
( codePage // CodePage
, flags // dwFlags
, str.c_str() // lpMultiByteStr
, str.length() // cbMultiByte
, &result[0] // lpWideCharStr
, resultSize // cchWideChar
);
return &result[0];
}
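To close the loop on the original question, here is a hedged sketch of how the pieces fit together, reusing ConvertToUnicode() from above and comparing with CompareStringEx() (the helper name EqualStrings is mine):

#include <windows.h>
#include <iostream>
#include <string>

bool EqualStrings(const std::wstring& a, const std::string& b)
{
    const std::wstring wb = ConvertToUnicode(b); // defined above
    // CompareStringEx performs a proper linguistic comparison;
    // it returns CSTR_EQUAL when the strings match.
    const int result = CompareStringEx(LOCALE_NAME_INVARIANT, 0,
                                       a.c_str(), (int)a.size(),
                                       wb.c_str(), (int)wb.size(),
                                       NULL, NULL, 0);
    return result == CSTR_EQUAL;
}

int main()
{
    std::cout << std::boolalpha << EqualStrings(L"Hello", "Hello") << '\n'; // true
}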