In C++, I am trying to display the codepoint of a wchar_t retrieved from std::wcin in MessageBoxW().
My source file is encoded in UTF-8.
If I declare my wchar_t in the source of my program, and give it an initial value, I get the display of the Unicode character and its codepoint in MessageBoxW().
However, if I retrieve the wchar_t from std::wcin, the Unicode character entered is not interpreted correctly.
Can you tell me what my error is?
I compile my code with MinGW GCC version 6.3 32-bit.
Do I need to use a particular C++ option, or C++ version?
Here is the code that works:
#include <Windows.h>
#include <stdio.h>
int main()
{
wchar_t c = L'−';
wchar_t *c1 = &c;
wchar_t buff[1024];
swprintf(buff, L"The code point of %c is %d.", c1, c);
MessageBoxW(NULL, buff, L"", MB_OK);
}
Here is the code that produces an erroneous character, although it compiles without any error:
#include <Windows.h>
#include <iostream>
#include <stdio.h>
int main()
{
wchar_t c;
std::wcout << "Enter a wchar";
std::wcin >> c;
wchar_t *c1 = &c;
wchar_t buff[1024];
swprintf(buff, L"The code point of %c is %d.", c1, c);
MessageBoxW(NULL, buff, L"", MB_OK);
}
Finally, I got it!
Many thanks to @RemyLebeau and @n.m. for their help.
I only needed the last part of the code given by @RemyLebeau.
Here is the code, which works very well here with any typed character.
P.S. It is still missing a check on the length of the string entered by the user, who must enter only one character (see the sketch after the code below).
Any idea of correction or improvement would be much appreciated.
#include <stdio.h>
#include <Windows.h>
#include <iostream>
int main()
{
std::wcout << "Enter a wchar";
std::wstring s;
wchar_t buffer[4] = {};
DWORD numRead = 0;
if (ReadConsoleW(GetStdHandle(STD_INPUT_HANDLE), buffer, 4, &numRead, NULL))
{
s.append(buffer, numRead);
}
wchar_t buff[1024];
const wchar_t* c = s.c_str();
swprintf(buff, L"The codepoint of %s is %u.", c, *c);
MessageBoxW(NULL, buff, L"", MB_OK);
}
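One possible way to add that missing check, as a sketch: ReadConsoleW also stores the Enter key, so the input ends with CR LF; after stripping it, exactly one UTF-16 unit should remain (a character outside the BMP would arrive as a surrogate pair of two units). This would go right after the ReadConsoleW block:
if (s.size() >= 2 && s.compare(s.size() - 2, 2, L"\r\n") == 0)
    s.resize(s.size() - 2); // strip the trailing CR LF added by the Enter key
if (s.size() != 1)
{
    MessageBoxW(NULL, L"Please enter exactly one character.", L"", MB_OK);
    return 1;
}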
Best regards.
I know similar questions have been asked before, but none of them helped in my case.
Basically I want dstPath = %AppData% + "CURRENT EXE NAME",
but the problem is with the different string types and string concatenation.
SIMPLIFIED CODE :-
#include <stdio.h>
#include <string>
#include <filesystem>
#include <Shlwapi.h>
#include <Windows.h>
using namespace std;
int main()
{
TCHAR selfPath[MAX_PATH];
TCHAR dstPath[MAX_PATH];
if (GetModuleFileName(NULL, selfPath, MAX_PATH) == 0) // Getting exe File Location
printf("Error : %ul\n", GetLastError());
filesystem::path p(selfPath);
dstPath = strcat(getenv("APPDATA"), p.filename().string().c_str()); // Here Comes The Error
printf("Src : %s\n", selfPath);
printf("Dst : %s\n", dstPath);
return 0;
}
COMPILER COMMAND :-
g++ -Os -s -o ./builds/gcc-rat-x64.exe ./source/rat.cpp -std=c++17 -m64 -lshlwapi
COMPILER ERROR :-
error: incompatible types in assignment of 'char*' to 'TCHAR [260]' {aka 'char [260]'}
80 | dstPath = strcat(getenv("APPDATA"), p.filename().string().c_str());
You cannot assign to arrays. You should use strcpy() to copy C-style strings.
strcpy(dstPath, getenv("APPDATA"));
strcat(dstPath, p.filename().string().c_str());
Or the concatenation can be done in one line via snprintf():
snprintf(dstPath, sizeof(dstPath), "%s%s", getenv("APPDATA"), p.filename().string().c_str());
Finally, TCHAR and GetModuleFileName can refer to the UNICODE version of the API, depending on the compilation options. Explicitly using the ANSI version (char and GetModuleFileNameA) is safer when working with std::string and other APIs that require strings consisting of char.
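For example, a minimal sketch of that ANSI-explicit approach (error checks omitted for brevity; getenv() can return NULL, which a real program should test; a "\\" separator is also added between the two parts):
char selfPath[MAX_PATH];
GetModuleFileNameA(NULL, selfPath, MAX_PATH); // explicitly the ANSI version
std::filesystem::path p(selfPath);
std::string dst = std::string(getenv("APPDATA")) + "\\" + p.filename().string();
printf("Dst : %s\n", dst.c_str());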
You are trying to use strcat to concatenate two strings and store the result in another one, but it does not work that way. The call strcat(str1, str2) appends the content of str2 to the end of str1. It also returns a pointer to str1, but I don't normally use it.
What you are trying to do should be done in three steps:
Make sure that dstPath contains an empty string
Concatenate to dstPath the value of the environment variable APPDATA
Concatenate to dstPath the value of filename
Something like this:
dstPath[0] = '\0';
strcat(dstPath, getenv("APPDATA"));
strcat(dstPath, p.filename().string().c_str());
You should also add checks not to overflow dstPath...
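For instance, a minimal sketch of such a check, reusing dstPath and p from above (getenv() can also return NULL, so that is tested as well):
const char* appData = getenv("APPDATA");
std::string name = p.filename().string();
if (appData && strlen(appData) + name.size() + 1 <= MAX_PATH) // +1 for the terminator
{
    dstPath[0] = '\0';
    strcat(dstPath, appData);
    strcat(dstPath, name.c_str());
}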
First off, you are mixing TCHAR and char APIs in a way you should not be. You really should not be using TCHAR at all in modern code. But, if you are going to use TCHAR, then at least use TCHAR-based functions/macros, like _tprintf() instead of printf(), _tcscat() instead of strcat(), etc.
The compiler error is because you are trying to assign the char* pointer returned by strcat() to your dstPath TCHAR[] array. You can't assign a pointer to an array like that. You should strcpy() the result of getenv() into dstPath first, and then strcat() your filename onto the end of it, eg:
#include <string>
#include <filesystem>
#include <Windows.h>
#include <Shlwapi.h>
#include <stdio.h>
#include <tchar.h>
TCHAR* _tgetenv(const TCHAR *varname)
{
#ifdef _UNICODE
return _wgetenv(varname);
#else
return getenv(varname);
#endif
}
std::basic_string<TCHAR> path2TStr(const std::filesystem::path &p)
{
#ifdef _UNICODE
return p.wstring();
#else
return p.string();
#endif
}
int main()
{
TCHAR selfPath[MAX_PATH];
TCHAR dstPath[MAX_PATH];
if (GetModuleFileName(NULL, selfPath, MAX_PATH) == 0) // Getting exe File Location
{
printf("Error : %ul\n", GetLastError());
return 0;
}
std::filesystem::path p(selfPath);
_tcscpy(dstPath, _tgetenv(_T("APPDATA")));
_tcscat(dstPath, path2TStr(p.filename()).c_str());
_tprintf(_T("Src : %s\n"), selfPath);
_tprintf(_T("Dst : %s\n"), dstPath);
return 0;
}
However, you really should be using SHGetFolderPath(CSIDL_APPDATA) or SHGetKnownFolderPath(FOLDERID_RoamingAppData) instead of using getenv("APPDATA").
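For reference, a minimal sketch of the SHGetKnownFolderPath() variant (declared in <Shlobj.h>; it allocates the result buffer itself, which must be freed with CoTaskMemFree()):
PWSTR wPath = NULL;
if (SUCCEEDED(SHGetKnownFolderPath(FOLDERID_RoamingAppData, 0, NULL, &wPath)))
{
    std::filesystem::path appData(wPath); // copy the path before freeing the buffer
    CoTaskMemFree(wPath);
    // use appData ...
}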
And since you are using the <filesystem> library anyway, you really should just use std::filesystem::path for all of your path handling. It has operator/= and operator/ to concatenate path segments, and an operator<< for printing paths to a std::ostream, like std::cout. Don't use strcat() for concatenating path segments, it won't handle directory separators correctly, at least.
Try this instead:
#include <iostream>
#include <string>
#include <filesystem>
#include <stdexcept>
#include <Windows.h>
#include <Shlobj.h>
std::filesystem::path getSelfPath()
{
WCHAR wPath[MAX_PATH] = {};
if (!GetModuleFileNameW(NULL, wPath, MAX_PATH)) // Getting exe File Location
{
DWORD err = GetLastError();
throw std::runtime_error("Error : " + std::to_string(err));
}
return wPath;
}
std::filesystem::path getAppDataPath()
{
WCHAR wPath[MAX_PATH] = {};
HRESULT hRes = SHGetFolderPathW(NULL, CSIDL_APPDATA, NULL, SHGFP_TYPE_CURRENT, wPath); // Getting APPDATA Folder Location
if (hRes != S_OK)
throw std::runtime_error("Error : " + std::to_string(hRes));
return wPath;
}
int main()
{
try
{
auto selfPath = getSelfPath();
auto dstPath = getAppDataPath() / selfPath.filename();
std::cout << "Src : " << selfPath << "\n";
std::cout << "Dst : " << dstPath << "\n";
}
catch (const std::exception &e)
{
std::cerr << e.what() << "\n";
}
return 0;
}
#include <iostream>
#include <Windows.h>
#include <locale>
#include <string>
#include <codecvt>
typedef wchar_t* LPWSTR, *PWSTR;
template <typename Facet>
struct deletable_facet : Facet
{
using Facet::Facet;
};
int main(int argc, char *argv[])
{
std::cout << argv[0] << std::endl;
std::wstring_convert<std::codecvt_utf8_utf16<wchar_t>> converter;
//std::wcout << converter.from_bytes(argv[0]) << std::endl; // range error
std::wstring_convert<deletable_facet<std::codecvt<wchar_t, char, std::mbstate_t>>> conv;
std::wstring ns = conv.from_bytes(argv[0]);
std::wcout << ns << std::endl;
wchar_t filename[MAX_PATH];
//GetModuleFileName(NULL,filename,MAX_PATH); // cant convert wstring_t* to char*
GetModuleFileNameW(NULL,filename,MAX_PATH);
std::wcout << filename << std::endl;
getchar();
return 0;
}
Output:
C:\Users\luka\Desktop\ⁿ?icΣ\unicode.exe
C:\Users\luka\Desktop\ⁿ?icΣ\unicode.exe
C:\Users\luka\Desktop\ⁿ
Actual name of the folder is üлicä
I've been trying many different ways for about 2 hours now, and as far as I've seen, people suggested GetModuleFileName, but as you can see that returns a conversion error (typedef wchar_t* LPWSTR, *PWSTR; isn't fixing it).
So is there any way to get the current folder path in Unicode, and to get the rest of the input arguments as Unicode (non-Latin characters)?
The usage for GetModuleFileName is correct. You should see the expected result with MessageBoxW(0, filename, 0, 0);
The problem is in printing L"üлicä" on Windows console.
Try printing "üлicä" on the console:
int main(int argc, char *argv[])
{
DWORD count;
std::wstring str = GetCommandLineW() + (std::wstring)L"\n";
WriteConsoleW(GetStdHandle(STD_OUTPUT_HANDLE), str.c_str(), str.size(), &count, 0);
MessageBoxW(0, str.c_str(), 0, 0);
wchar_t filename[MAX_PATH];
GetModuleFileNameW(0, filename, MAX_PATH);
WriteConsoleW(GetStdHandle(STD_OUTPUT_HANDLE), filename, wcslen(filename), &count, 0);
return 0;
}
In Visual Studio you can also use _setmode to enable usage of std::wcout/std::wcin
You also have the optional entry point wmain(int argc, wchar_t *argv[]), which provides argv in UTF-16 encoding.
The main entry point provides argv in ANSI encoding (not UTF-8 encoding). ANSI can lose information, unlike Unicode.
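A minimal sketch of that wmain entry point (note: MinGW-w64 builds may need the -municode linker option for wmain; MSVC links it as-is):
#include <Windows.h>
int wmain(int argc, wchar_t* argv[])
{
    // argv[] is already UTF-16 here, so no conversion step is needed
    MessageBoxW(NULL, argv[0], L"argv[0]", MB_OK);
    return 0;
}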
This is probably related not to the program but to the console. I suggest you try outputting into a file and checking whether the encoding is correct.
You can do that using freopen:
int main(int argc, char *argv[]){
freopen("output-file-name.txt", "w", stdout);
/*rest of code*/
}
If the problem persists, try using Visual Studio along with _setmode(..., _O_U16TEXT) just before using wcout, as described here: https://stackoverflow.com/a/9051543/9541897
Here's an example that works with Windows. You'll have to find the right compiler/linker settings to support wmain on MinGW, but it will work. _setmode enables writing Unicode directly to the terminal, and should work as long as the font supports the characters. In my example I use some Chinese, which my font supports:
#include <Windows.h>
#include <iostream>
#include "fcntl.h"
#include "io.h"
int wmain(int argc, wchar_t* argv[])
{
_setmode(_fileno(stdout), _O_U16TEXT);
std::wcout << argv[0] << std::endl;
wchar_t filename[MAX_PATH];
GetModuleFileNameW(NULL,filename,MAX_PATH);
std::wcout << filename << std::endl;
return 0;
}
Output:
马克.exe
C:\üлicä\马克.exe
Why are you typedefing LPWSTR and PWSTR manually? windows.h already handles that for you.
In any case, as @n.m. said in the comments, the arguments for main() are NOT encoded in UTF-8 on Windows, so converting non-ASCII characters using a UTF-8->UTF-16 converter will not produce the correct output. Use the Win32 MultiByteToWideChar() function instead to convert the arguments, using CP_ACP as the codepage to convert from. Or, use wmain() instead, which provides arguments as wchar_t* instead of as char*.
That will get you the data you want. Then, you just need to deal with the issue of Unicode output to the console. As other answers point out, the Windows console does not support UTF-16 output via std::wcout by default, so you have to jump through some additional hoops to make it work correctly (there are many other questions on StackOverflow about that issue).
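A minimal sketch of that CP_ACP conversion, using the usual two-call pattern (first call measures, second call converts; error handling omitted):
int len = MultiByteToWideChar(CP_ACP, 0, argv[0], -1, NULL, 0); // length incl. terminator
std::wstring wide(len, L'\0');
MultiByteToWideChar(CP_ACP, 0, argv[0], -1, &wide[0], len);
wide.resize(len - 1); // drop the terminating null that the -1 length counted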
First off, I googled the hell out of this, then searched the forums. In my ignorance of how the TEXT() function operates, I cannot find an efficient way to search for an answer.
I am writing a piece of code to search for a file that relies on inputting the directory you want to search. However, when I pass anything but a literal value to the function, like displayContent(_TEXT("c:\\")), the software does not execute properly. It does not search for anything. Inserting breakpoints doesn't tell much, as the software closes anyway.
I would like to pass a variable to the displayContent function by placing TEXT(variable) inside its argument, like displayContent(_TEXT(*ptrDir)), but that does not compile. Furthermore, when I simply place ptrDir inside the argument of displayContent, the software compiles but does not execute properly: it asks for the directory to search but does not actually search it.
What's happening here? There has to be a way to pass displayContent a variable containing a string received from the user.
#include "stdafx.h"
#include <iostream>
#include <windows.h>
#include <tchar.h>
#include "Strsafe.h"
#include <string>
using namespace std;
typedef wchar_t WCHAR;
#define CONST const
typedef CONST WCHAR* LPCWSTR;
int displayContent(LPCWSTR lpszPath, int level = 0) {
wcout << lpszPath << endl;
getchar();
getchar();
WIN32_FIND_DATA ptrFileData;
HANDLE hFile = NULL;
BOOL bGetNext = TRUE;
wchar_t lpszNewPath[MAX_PATH];
if (lstrlen(lpszPath) > MAX_PATH)
return -1;
StringCchCopy(lpszNewPath, MAX_PATH, lpszPath);
StringCchCat(lpszNewPath, MAX_PATH, _TEXT("*.*"));
hFile = FindFirstFile(lpszNewPath, &ptrFileData);
while (bGetNext)
{
for (int i = 0; i < level; i++)
wcout << "-";
if (ptrFileData.dwFileAttributes == FILE_ATTRIBUTE_DIRECTORY
&& lstrlen(ptrFileData.cFileName) > 2)
{
wchar_t lpszFirstTimePath[MAX_PATH];
StringCchCopy(lpszFirstTimePath, MAX_PATH, lpszPath);
StringCchCat(lpszFirstTimePath, MAX_PATH, ptrFileData.cFileName);
StringCchCat(lpszFirstTimePath, MAX_PATH, _TEXT("\\"));
wcout << ">" << ptrFileData.cFileName << endl;
displayContent(lpszFirstTimePath, level + 2);
}
else
{
wcout << ">" << ptrFileData.cFileName << endl;
}
bGetNext = FindNextFile(hFile, &ptrFileData);
}
FindClose(hFile);
return 0;
}
int main(int argc, char* argv[])
{
WCHAR directory;
LPCWSTR ptrDir;
ptrDir = &directory;
cout << "Enter directory you wish to search: " << endl;
//cin >> directory;
directory = 'c:\\' ;
ptrDir = &directory;
displayContent(_TEXT(*ptrDir));
getchar();
getchar();
return 0;
}
The _TEXT (and equivalently, _T) macro is strictly for literals (string literals or character literals). It expands to L for a Unicode build, and to nothing for a narrow-character build. So, for a string like (say) "hello", you'll get L"hello" for a Unicode build and "hello" for a narrow-character build. This gives you a wide literal in a Unicode build and a narrow literal otherwise.
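To illustrate (someVariable is hypothetical): _TEXT pastes an L prefix onto its argument token, so it only makes sense for literals:
LPCTSTR ok = _TEXT("c:\\");           // expands to L"c:\\" in a Unicode build
// LPCTSTR bad = _TEXT(someVariable); // expands to LsomeVariable: an undeclared name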
If you have a string in a variable, you can convert between wide and narrow characters with the MultiByteToWideChar and WideCharToMultiByte functions.
In this case, doing a conversion on the contents of a variable isn't really needed though. After eliminating some unnecessary complexity, and using a few standard library types where they make sense, I end up with code something like this:
#include <iostream>
#include <tchar.h>
#include <string>
#define UNICODE
#include <windows.h>
int displayContent(std::wstring const &path, int level = 0) {
WIN32_FIND_DATA FileData;
if (path.length() > MAX_PATH)
return -1;
std::wstring new_path = path + L"\\*.*";
HANDLE hFile = FindFirstFile(new_path.c_str(), &FileData);
do {
if ((FileData.dwFileAttributes & FILE_ATTRIBUTE_DIRECTORY) && (FileData.cFileName[0] == L'.'))
continue;
std::wcout << std::wstring(level, L'-') << L">" << FileData.cFileName << L"\n";
if (FileData.dwFileAttributes & FILE_ATTRIBUTE_DIRECTORY)
displayContent(path + L"\\" + FileData.cFileName, level + 2);
} while (FindNextFile(hFile, &FileData));
FindClose(hFile);
return 0;
}
int main(int argc, char* argv[]) {
wchar_t current_dir[MAX_PATH];
GetCurrentDirectory(MAX_PATH, current_dir); // buffer size in characters, not bytes
displayContent(current_dir);
return 0;
}
[Note: I've also changed it to start from the current directory instead of always starting at the root of the C drive, but if you want to change it back, that's pretty trivial. In fact, it simplifies the code a bit more.]
I'm reading a UTF-8 encoded Unicode text file and outputting it to the console, but the displayed characters are not the same as in the text editor I used to create the file. Here is my code:
#define UNICODE
#include <windows.h>
#include <iostream>
#include <fstream>
#include <string>
#include "pugixml.hpp"
using std::ifstream;
using std::ios;
using std::string;
using std::wstring;
int main( int argc, char * argv[] )
{
ifstream oFile;
try
{
string sContent;
oFile.open ( "../config-sample.xml", ios::in );
if( oFile.is_open() )
{
wchar_t wsBuffer[128];
while( oFile.good() )
{
oFile >> sContent;
mbstowcs( wsBuffer, sContent.c_str(), sizeof( wsBuffer ) );
//wprintf( wsBuffer );// Same result as wcout.
wcout << wsBuffer;
}
Sleep(100000);
}
else
{
throw L"Failed to open file";
}
}
catch( const wchar_t * pwsMsg )
{
::MessageBox( NULL, pwsMsg, L"Error", MB_OK | MB_TOPMOST | MB_SETFOREGROUND );
}
if( oFile.is_open() )
{
oFile.close();
}
return 0;
}
There must be something I don't get about encoding.
The problem is that mbstowcs doesn't actually use UTF-8. It uses an older style of "multibyte codepoints", which is not compatible with UTF-8 (although it is technically possible [I believe] to define a UTF-8 codepage, there is no such thing in Windows).
If you want to convert UTF-8 to UTF-16, you can use MultiByteToWideChar, with a codepage of CP_UTF8.
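A minimal sketch of that conversion, reusing sContent (the std::string read from the file in the question); error handling omitted:
int len = MultiByteToWideChar(CP_UTF8, 0, sContent.c_str(), -1, NULL, 0);
std::wstring wide(len, L'\0');
MultiByteToWideChar(CP_UTF8, 0, sContent.c_str(), -1, &wide[0], len);
wide.resize(len - 1); // drop the terminating null counted by the -1 length
std::wcout << wide;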
Wide strings don't mean UTF-8. In fact, it's quite the opposite: UTF-8 means Unicode Transformation Format (8 bits); it's a way to represent Unicode over 8-bit characters, so your normal chars. You should read it into normal strings (not wide strings).
Wide strings use wchar_t, which on Windows is 16 bits. The OS uses UTF-16 for its "wide" functions.
On Windows, UTF-8 strings can be converted to UTF-16 using MultiByteToWideChar.
I made a C++ container that holds up to six 8-bit chars, storing them in a std::vector, converting to and from wchar_t, and appending to a std::string.
Check it out here:
View UTF-8_String structures on Github
#include "UTF-8_String.h" //header from github link above
iBS::u8str raw_v;
iBS::readu8file("TestUTF-8File.txt",raw_v);
std::cout<<raw_v.str()<<std::endl;
Here is the function that converts a wchar_t to a uint32_t in the u8char struct found in the header above.
#include <cwchar>
u8char& operator=(wchar_t& wc)
{
char temp[6];
std::mbstate_t state{}; // value-initialise the conversion state before use
int ret = std::wcrtomb((&temp[0]), wc, &state);
ref.resize(ret);
for (short i=0; i<ret; ++i)
ref[i]=temp[i];
return *this;
};
I find wifstream works very well; even the Visual Studio debugger shows the UTF-8 words correctly (I'm reading traditional Chinese words), from this post:
#include <sstream>
#include <fstream>
#include <codecvt>
std::wstring readFile(const char* filename)
{
std::wifstream wif(filename);
wif.imbue(std::locale(std::locale::empty(), new std::codecvt_utf8<wchar_t>));
std::wstringstream wss;
wss << wif.rdbuf();
return wss.str();
}
// usage
std::wstring wstr2;
wstr2 = readFile("C:\\yourUtf8File.txt");
wcout << wstr2;
I have a wide-character string (std::wstring) in my code, and I need to search for a wide character in it.
I use find() function for it:
wcin >> str;
wcout << ((str.find(L'ф') != wstring::npos)? L"EXIST":L"NONE");
L'ф' is a Cyrillic letter.
But find() in this call always returns npos. With Latin letters, find() works correctly.
Is it a problem with this function?
Or am I doing something incorrectly?
UPD
I use MinGW and save source in UTF-8.
I also set locale with setlocale(LC_ALL, "");.
Code such as wcout << L'ф'; works correctly.
But this:
wchar_t w;
wcin >> w;
wcout << w;
works incorrectly.
It is strange. Earlier I had no problems with the encoding when using setlocale().
The encoding of your source file and the execution environment's encoding may be wildly different. C++ makes no guarantees about any of this. You can check this by outputting the hexadecimal value of your character literal:
std::wcout << std::hex << static_cast<unsigned>(L'ф');
Before C++11, you could use non-ASCII characters in source code by using their hex values:
"\x05" "five"
C++11 adds the ability to specify their Unicode value, which in your case would be
L"\u03A6"
If you're going full C++11 (and your environment ensures these are encoded in UTF-*), you can use any of char, char16_t, or char32_t, and do:
const char* phi_utf8 = "\u03A6";
const char16_t* phi_utf16 = u"\u03A6";
const char32_t* phi_utf32 = U"\u03A6";
You must set the encoding of the console.
This works:
#include <iostream>
#include <string>
#include <io.h>
#include <fcntl.h>
#include <stdio.h>
using namespace std;
int main()
{
_setmode(_fileno(stdout), _O_U16TEXT);
_setmode(_fileno(stdin), _O_U16TEXT);
wstring str;
wcin >> str;
wcout << ((str.find(L'ф') != wstring::npos)? L"EXIST":L"NONE");
system("pause");
return 0;
}
std::wstring::find() works fine. But you have to read the input string correctly.
The following code runs fine on Windows console (the input Unicode string is read using ReadConsoleW() Win32 API):
#include <exception>
#include <iostream>
#include <sstream>
#include <stdexcept>
#include <string>
#include <windows.h>
using namespace std;
class Win32Error : public runtime_error
{
public:
Win32Error(const char* message, DWORD error)
: runtime_error(message)
, m_error(error)
{}
DWORD Error() const
{
return m_error;
}
private:
DWORD m_error;
};
void ThrowLastWin32(const char* message)
{
const DWORD error = GetLastError();
throw Win32Error(message, error);
}
void Test()
{
const HANDLE hStdIn = GetStdHandle(STD_INPUT_HANDLE);
if (hStdIn == INVALID_HANDLE_VALUE)
ThrowLastWin32("GetStdHandle failed.");
static const int kBufferLen = 200;
wchar_t buffer[kBufferLen];
DWORD numRead = 0;
if (! ReadConsoleW(hStdIn, buffer, kBufferLen, &numRead, nullptr))
ThrowLastWin32("ReadConsoleW failed.");
const wstring str(buffer, numRead - 2); // drop the trailing CR LF that ReadConsoleW stores
static const wchar_t kEf = 0x0444;
wcout << ((str.find(kEf) != wstring::npos) ? L"EXIST" : L"NONE");
}
int main()
{
static const int kExitOk = 0;
static const int kExitError = 1;
try
{
Test();
return kExitOk;
}
catch(const Win32Error& e)
{
cerr << "\n*** ERROR: " << e.what() << '\n';
cerr << " (GetLastError returned " << e.Error() << ")\n";
return kExitError;
}
catch(const exception& e)
{
cerr << "\n*** ERROR: " << e.what() << '\n';
return kExitError;
}
}
Output:
C:\TEMP>test.exe
abc
NONE
C:\TEMP>test.exe
abcфabc
EXIST
That's probably an encoding issue. wcin works with an encoding different from your compiler's/source code's. Try entering the ф in the console/wcin -- it will work. Try printing the ф via wcout -- it will show a different character or no character at all.
There is no platform-independent way to circumvent this, but if you are on Windows, you can manually change the console encoding, either with the chcp command-line command or programmatically with SetConsoleCP() (input) and SetConsoleOutputCP() (output).
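For example, a minimal sketch of the programmatic route (UTF-8 is codepage 65001, exposed by the headers as CP_UTF8):
SetConsoleCP(CP_UTF8);       // input codepage
SetConsoleOutputCP(CP_UTF8); // output codepage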
You could also change your source file's/compiler's encoding. How this is done depends on your editor/compiler. If you are using MSVC, this answer might help you: https://stackoverflow.com/a/1660901/2128694