printf works differently for Unicode and multibyte - C++

I have a service that writes a DWORD into a named pipe. It is then read by another process (both are built as Unicode).
When I receive the DWORD, convert it to a displayable string (TCHAR, char, wchar_t, etc.) and print it with printf in the command prompt, I get inconsistent results with newlines:
HANDLE hOutput = GetStdHandle(STD_OUTPUT_HANDLE);
TCHAR szBuffer[SIZEOF_BUFFER];
DWORD dwRead;
for (;;)
{
    if (!ReadFile(hRemoteOutPipe, szBuffer, SIZEOF_BUFFER, &dwRead, NULL) ||
        dwRead == 0)
    {
        DWORD dwErr = GetLastError();
        if (dwErr == ERROR_NO_DATA)
            break;
    }
    szBuffer[dwRead / sizeof(TCHAR)] = _T('\0');
    // Send it to our stdout
    printf("%s", szBuffer);
    fflush(stdout);
}
CloseHandle(hRemoteOutPipe);
hRemoteOutPipe = INVALID_HANDLE_VALUE;
::ExitThread(0);
The printf works fine for the multibyte build, but not for the Unicode build. Kindly help me out.

First off: TCHAR was an idea that made sense in 1995, not so much in 2022; it gets you exactly these weird errors. Secondly, you're tagging this as C++, but you're not using std::cout. The root cause of the problem is that printf doesn't understand TCHAR; you'd need to use _tprintf.

TCHAR is a preprocessor macro that maps to either wchar_t or char depending on whether UNICODE is defined or not, respectively.
The %s placeholder expects a char* string in printf(), whereas it expects a wchar_t* string in wprintf().
Since you are using TCHAR strings, the output will not be what you expect when TCHAR is wchar_t, as that is a type mismatch in printf(). You would need to use _tprintf() from <tchar.h> instead, which is a preprocessor macro that maps to either wprintf() or printf() depending on whether _UNICODE is defined (it is normally defined alongside UNICODE).
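For illustration, a minimal sketch of the matching approach (assuming UNICODE and _UNICODE are defined together, as they normally are) might look like this:
#include <windows.h>
#include <tchar.h>
#include <stdio.h>

int main()
{
    // _T() produces a wide literal when _UNICODE is defined and a narrow one otherwise,
    // and _tprintf maps to wprintf or printf to match, so %s never mismatches the string type.
    TCHAR szBuffer[] = _T("data read from the pipe");
    _tprintf(_T("%s\n"), szBuffer);
    return 0;
}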

Related

Is there a way to get an application's path automatically, add it to the registry, and run it at Windows startup in C++?

I'm developing an application and would like to know if there's a way to get its executable path automatically and run it at Windows startup by adding it to the registry.
This is my function so far:
void Open(){
    HKEY hKey;
    WCHAR path[MAX_PATH]; //to store the directory
    DWORD size = GetModuleFileNameW(NULL, path, MAX_PATH);
    const char* StartName = "MyApplication";
    LONG lnRes = RegOpenKeyEx( HKEY_CURRENT_USER,
                               "SOFTWARE\\Microsoft\\Windows\\CurrentVersion\\Run",
                               0, KEY_WRITE,
                               &hKey);
    if( ERROR_SUCCESS == lnRes )
    {
        lnRes = RegSetValueEx( hKey,
                               StartName,
                               0,
                               REG_SZ,
                               (LPBYTE)path,
                               size );
    }
    RegCloseKey(hKey);
}
I'm using GetModuleFileName to get the path, but it returns the path with single backslashes, and in the registry only the "D" drive is recognized. For example: D:\Usuario\Desktop\log\mariobros.exe
https://prnt.sc/vondsi (here's a screenshot of my registry)
I suspect the problem is that for a backslash to be recognized it needs to be written as a double backslash, i.e. that it should be: D:\\Usuario\\Desktop\\log\\mariobros.exe
Does anyone know what I could do here?
Thanks in advance.
You are clearly compiling with UNICODE undefined in your project, which means RegOpenKeyEx() and RegSetValueEx() are actually calling the ANSI functions RegOpenKeyExA() and RegSetValueExA(), respectively (as evidenced by you being able to pass char* strings to them without compiler errors).
But you are retrieving the file path as a UTF-16 string and passing it as-is to RegSetValueExA(), so RegSetValueExA() misinterprets your UTF-16 string as an ANSI string and re-encodes each of its bytes individually as a Unicode character. Since UTF-16 characters in the ASCII range have a zero high byte, you end up with embedded nul characters written to the Registry.
Since you are using a Unicode function to retrieve the file path, and because the Registry internally stores strings in Unicode form only, you should use the Registry's Unicode functions to match that same encoding.
Also, note that the return value of GetModuleFileName(A|W) does not include the null terminator in the output string's length, but RegSetValueEx(A|W) expects the cbSize parameter to include enough bytes for a null terminator for REG_(EXPAND_|MULTI_)SZ value types.
Try this:
void Open()
{
    WCHAR path[MAX_PATH]; //to store the directory
    DWORD size = GetModuleFileNameW(NULL, path, MAX_PATH);
    if ((size > 0) && (size < MAX_PATH))
    {
        HKEY hKey;
        LONG lnRes = RegOpenKeyExW(HKEY_CURRENT_USER,
                                   L"SOFTWARE\\Microsoft\\Windows\\CurrentVersion\\Run",
                                   0, KEY_SET_VALUE,
                                   &hKey);
        if( ERROR_SUCCESS == lnRes )
        {
            lnRes = RegSetValueExW(hKey,
                                   L"MyApplication",
                                   0,
                                   REG_SZ,
                                   (LPBYTE)path,
                                   (size + 1) * sizeof(WCHAR) );
            RegCloseKey(hKey);
        }
    }
}
This looks like you are passing a wide string while promising that it is a narrow string (evident from your C-style cast).
The second byte of a wide string holding ASCII text is 0, and that terminates your narrow string.
Suggestion: use wide strings exclusively when dealing with the Win API.
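To see why only the drive letter survives, here is a small illustrative sketch (the path literal is just an example) showing what an ANSI API sees when handed a UTF-16 buffer:
#include <windows.h>
#include <string.h>
#include <stdio.h>

int main()
{
    // The UTF-16 bytes of the path start with 'D', 0x00, ':', 0x00, ...
    // so any code that reads the buffer as char* stops after the first character.
    const wchar_t widePath[] = L"D:\\Usuario\\Desktop\\log\\mariobros.exe";
    const char* asNarrow = (const char*)widePath; // same reinterpretation as (LPBYTE)path
    printf("narrow view: \"%s\" (length %zu)\n", asNarrow, strlen(asNarrow));
    // prints: narrow view: "D" (length 1)
    return 0;
}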

Correct extraction of int from an edit window in WinAPI for logic testing

I have an edit box with the ES_NUMBER style, and on a button press I am attempting to check whether the value of the edit box is between 2 and 15 (inclusive).
Having checked on StackOverflow, I found the strong recommendation to use strtol() as opposed to atoi(), but neither has allowed me to perform the necessary check. Please see the current code below.
char buff[1024];
GetWindowText(hWndNoOfTeams, (LPWSTR)buff, 1024);
int i;
i = strtol(buff,NULL,10);
if ((i > 1)&&(i < 16)){
    MessageBox(hWnd, (LPCWSTR)buff, L"MSGBOX", MB_OK);
}else{
    MessageBox(hWnd, L"The number of teams must be greater than 1 and less than 16.", L"MSGBOX", MB_OK);
};
The test works correctly for 0 through 9; beyond that, however, it always presents the second message box. I suspect the issue lies in how the integer is extracted from the string, as for all values the buff array contains the correct value.
Apologies if I have missed something that ought to be glaringly obvious.
GetWindowText(hWndNoOfTeams, (LPWSTR)buff, 1024);
Never cast anything to LPWSTR or LPSTR or anything related to that. The compiler is telling you that GetWindowText() expects a wide-character string, which is an array of WCHARs, not an array of chars. The wide-character APIs are selected by default, since all new Windows programs should be Unicode-aware.
There are separate conversion routines for wide strings, such as wcstol().
Look up the UNICODE and _UNICODE macros and Unicode handling on Windows for more information.
You are using the Unicode version of the Win32 API functions, so you need to use a Unicode character buffer and the Unicode version of strtol() to match:
WCHAR buff[1024] = {0};
GetWindowText(hWndNoOfTeams, buff, 1024);
int i = wcstol(buff, NULL, 10);
if ((i > 1) && (i < 16)) {
    MessageBox(hWnd, buff, L"MSGBOX", MB_OK);
} else {
    MessageBox(hWnd, L"The number of teams must be greater than 1 and less than 16.", L"MSGBOX", MB_OK);
}

Why does the obtained MachineGuid look like Korean instead of a GUID?

I created a simple function:
std::wstring GetRegKey(const std::string& location, const std::string& name){
    const int valueLength = 10240;
    auto platformFlag = KEY_WOW64_64KEY;
    HKEY key;
    TCHAR value[valueLength];
    DWORD bufLen = valueLength*sizeof(TCHAR);
    long ret;
    ret = RegOpenKeyExA(HKEY_LOCAL_MACHINE, location.c_str(), 0, KEY_READ | platformFlag, &key);
    if( ret != ERROR_SUCCESS ){
        return std::wstring();
    }
    ret = RegQueryValueExA(key, name.c_str(), NULL, NULL, (LPBYTE) value, &bufLen);
    RegCloseKey(key);
    if ( (ret != ERROR_SUCCESS) || (bufLen > valueLength*sizeof(TCHAR)) ){
        return std::wstring();
    }
    std::wstring stringValue(value, (size_t)bufLen - 1);
    size_t i = stringValue.length();
    while( i > 0 && stringValue[i-1] == '\0' ){
        --i;
    }
    return stringValue;
}
And I call it like auto result = GetRegKey("SOFTWARE\\Microsoft\\Cryptography", "MachineGuid");
yet the resulting string looks like
㤴ㄷ㤵戰㌭㉣ⴱ㔴㍥㤭慣ⴹ㍥摢㘵〴㉡ㄵ\0009ca9-e3bd5640a251
not like what RegEdit shows:
4971590b-3c21-45e3-9ca9-e3bd5640a251
So what should be done to get a correct representation of MachineGuid in C++?
RegQueryValueExA is an ANSI wrapper around the Unicode version and has been since Windows NT. It not only converts the lpValueName argument to an LPCWSTR internally, it also converts the string data retrieved from the registry to ANSI before returning it through lpData.
MSDN has the following to say:
If the data has the REG_SZ, REG_MULTI_SZ or REG_EXPAND_SZ type, and
the ANSI version of this function is used (either by explicitly
calling RegQueryValueExA or by not defining UNICODE before including
the Windows.h file), this function converts the stored Unicode string
to an ANSI string before copying it to the buffer pointed to by
lpData.
Your problem is that you are filling lpData, which holds TCHARs (wchar_t in a Unicode build), with an ANSI string.
The garbled string you see is the result of every two ANSI bytes being read as a single wchar_t, which explains the Asian characters. The portion that looks like the end of the GUID appears because the print function blew past the terminating nul (it is only one byte wide) and kept printing what is probably a portion of the buffer that RegQueryValueExA used before converting to ANSI.
To solve the problem, either stick entirely to Unicode, or to ANSI (if you are brave enough to continue using ANSI in the year 2014), or be very careful about your conversions. I would change GetRegKey to accept wstrings and use RegQueryValueExW instead, but that is a matter of preference and what sort of code you plan on using this in.
(Also, I would recommend you have someone review this code since there are a number of oddities in the error checking, and a hard coded buffer size.)
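For reference, a minimal sketch of that wide-string variant (the name GetRegKeyW and the 512-character buffer are illustrative choices, not part of the original code):
#include <windows.h>
#include <string>

std::wstring GetRegKeyW(const std::wstring& location, const std::wstring& name)
{
    HKEY key;
    if (RegOpenKeyExW(HKEY_LOCAL_MACHINE, location.c_str(), 0,
                      KEY_READ | KEY_WOW64_64KEY, &key) != ERROR_SUCCESS)
        return std::wstring();

    WCHAR value[512];
    DWORD bufLen = sizeof(value); // size in bytes, not characters
    DWORD type = 0;
    LONG ret = RegQueryValueExW(key, name.c_str(), NULL, &type, (LPBYTE)value, &bufLen);
    RegCloseKey(key);
    if (ret != ERROR_SUCCESS || type != REG_SZ)
        return std::wstring();

    // bufLen comes back in bytes and usually includes the terminating null; trim it.
    std::wstring result(value, bufLen / sizeof(WCHAR));
    while (!result.empty() && result.back() == L'\0')
        result.pop_back();
    return result;
}

// Called as: auto guid = GetRegKeyW(L"SOFTWARE\\Microsoft\\Cryptography", L"MachineGuid");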

C++ string encoding UTF8 / unicode

I am trying to send the character "Т" (not a normal capital T; Unicode decimal value 1058) from C++ to VB.
However, with the method below, Message is returned to VB and appears as "Т", which is that character's UTF-8 bytes displayed as ANSI.
#if defined(_MSC_VER) && _MSC_VER > 1310
# define utf8(str) ConvertToUTF8(L##str)
const char * ConvertToUTF8(const wchar_t * pStr) {
    static char szBuf[1024];
    WideCharToMultiByte(CP_UTF8, 0, pStr, -1, szBuf, sizeof(szBuf), NULL, NULL);
    return szBuf;
}
#else
# define utf8(str) str
#endif

BSTR _stdcall chatTest()
{
    BSTR Message;
    CString temp("temp test");
    temp += utf8("\u0422");
    int len = temp.GetLength();
    Message = SysAllocStringByteLen((LPCTSTR)temp, len+1);
    return Message;
}
If I just do temp += ("\u0422"); without the utf8 function, it sends the data as "?", and it actually is a question mark (sometimes Unicode characters show up as question marks in VB but still have the correct Unicode decimal value; that is not the case here, it really changes it to a question mark).
In VB, if I output the String variable holding Message (when it is "Т") to a text file, it appears as "Т".
So as far as I can tell it is UTF-8 in C++, then somehow gets converted to ANSI in VB (or before it is sent?), and then when written to a file it is changed back to UTF-8?
I just need to keep the "Т" intact when sending from C++ to VB. I know VB strings can hold that character, because from another source within VB I am able to store it (it appears as a "?", but has the proper Unicode decimal value).
Any help is greatly appreciated.
Thanks
A BSTR is not UTF-8, it's UTF-16 which is what you get with the L"" prefix. Take out the UTF-8 conversion and use CStringW. And use LPCWSTR instead of LPCTSTR.
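A minimal sketch of what that change might look like (assuming ATL/MFC's CStringW is available, as in the original code):
#include <windows.h>
#include <atlstr.h> // CStringW

BSTR _stdcall chatTest()
{
    CStringW temp(L"temp test");
    temp += L"\u0422"; // U+0422 CYRILLIC CAPITAL LETTER TE, kept as UTF-16
    // SysAllocString copies the null-terminated UTF-16 string into a new BSTR,
    // which is the form VB expects to receive.
    return SysAllocString(temp);
}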

MultiByteToWideChar or WideCharToMultiByte and txt files

I'm trying to write a universal text editor which can open and display both ANSI and Unicode text in an edit control. Do I need to call ReadFile() repeatedly if I determine that the text is ANSI? I can't figure out how to perform this task. My attempt below does not work; it displays '?' characters in the edit control.
LARGE_INTEGER fSize;
GetFileSizeEx(hFile, &fSize);
int bufferLen = fSize.QuadPart/sizeof(TCHAR) + 1;
TCHAR* buffer = new TCHAR[bufferLen];
buffer[0] = _T('\0');
DWORD wasRead = 0;
ReadFile(hFile, buffer, fSize.QuadPart, &wasRead, NULL);
buffer[wasRead/sizeof(TCHAR)] = _T('\0');
if(!IsTextUnicode(buffer, bufferLen, NULL))
{
    CHAR* ansiBuffer = new CHAR[bufferLen];
    ansiBuffer[0] = '\0';
    WideCharToMultiByte(CP_ACP, 0, buffer, bufferLen, ansiBuffer, bufferLen, NULL, NULL);
    SetWindowTextA(edit, ansiBuffer);
    delete[] ansiBuffer;
}
else
    SetWindowText(edit, buffer);
CloseHandle(hFile);
delete[] buffer;
There are a few buffer-length errors and oddities, but here's your big problem: you are calling WideCharToMultiByte incorrectly. It is meant to receive UTF-16 encoded text as input, but when IsTextUnicode returns false, that means the buffer is not UTF-16 encoded.
The following is basically what you need:
if (!IsTextUnicode(buffer, bufferLen*sizeof(TCHAR), NULL))
    SetWindowTextA(edit, (char*)buffer);
Note that I've fixed the length parameter to IsTextUnicode.
For what it is worth, I think I'd read into a buffer of char. That would remove the need for the sizeof(TCHAR). In fact, I'd stop using TCHAR altogether. This program should be Unicode all the way through - TCHAR is what you use when you compile for both the NT and 9x variants of Windows, and you aren't compiling for 9x anymore, I imagine.
So I'd probably code it like this:
char* buffer = new char[filesize+2]; // +2 for UTF-16 null terminator
DWORD wasRead = 0;
ReadFile(hFile, buffer, filesize, &wasRead, NULL);
// add error checking for ReadFile, including that wasRead == filesize
buffer[filesize] = '\0';
buffer[filesize+1] = '\0';
if (IsTextUnicode(buffer, filesize, NULL))
    SetWindowText(edit, (wchar_t*)buffer);
else
    SetWindowTextA(edit, buffer);
delete[] buffer;
Note also that this code makes no allowance for the possibility of receiving UTF-8 encoded text. If you want to handle that, you'd need to take your char buffer and send it through MultiByteToWideChar using CP_UTF8.
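As a rough sketch of that UTF-8 branch (assuming buffer and filesize are the same variables as above and the file carries no BOM):
int wideLen = MultiByteToWideChar(CP_UTF8, 0, buffer, filesize, NULL, 0);
if (wideLen > 0)
{
    // First call measured the required length; second call performs the conversion.
    wchar_t* wide = new wchar_t[wideLen + 1];
    MultiByteToWideChar(CP_UTF8, 0, buffer, filesize, wide, wideLen);
    wide[wideLen] = L'\0';
    SetWindowTextW(edit, wide);
    delete[] wide;
}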