C++ char conversion

I have googled this for 2 hours now and can't find an answer to my problem: I need to read a registry REG_SZ value and pass it to a char*.
char host_val[1024];
DWORD hostVal_size = 1024;
char* hostName;
DWORD dwType = REG_SZ;
RegOpenKeyEx(/* no problem here */);
if( RegQueryValueEx( hKey, TEXT("HostName"), 0, &dwType, (LPBYTE)&host_val, &hostVal_size ) == ERROR_SUCCESS )
{
    //hostName = host_val;
}
How should I do this conversion, hostName = host_val?

The resulting host_val is a possibly non-null-terminated string (see "Remarks"), so you should copy it to a newly allocated string with memcpy, and ensure it's null-terminated:
hostName = new char[hostVal_size + 1];
// host_val may or may not be null-terminated
memcpy(hostName, host_val, hostVal_size);
hostName[hostVal_size] = '\0';
You will need to delete[] the hostName later.
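The same idea can be sketched without a manual new/delete pair by copying into a std::string. This is only a portable illustration, not the answer's code: to_terminated_string is a hypothetical helper, and data/size stand in for host_val and the byte count returned in hostVal_size (assuming an ANSI build, where one byte is one char).

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: copies `size` bytes from a buffer that may or may not
// end with '\0'. std::string keeps its own terminator, so c_str() is always safe.
std::string to_terminated_string(const char* data, std::size_t size)
{
    std::string result(data, size);
    // The reported size often includes the stored terminator; drop trailing
    // '\0' characters so length() counts only the visible characters.
    while (!result.empty() && result.back() == '\0')
        result.pop_back();
    return result;
}
```

The returned object owns its memory, so nothing has to be deleted later, and result.c_str() can be passed wherever a const char* is expected.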

Use the ANSI version of the function, RegQueryValueExA; that way you don't need to convert.

If you're compiling with Unicode, you're copying a Unicode string (that is possibly NOT terminated) into a narrow char buffer. The first character in the Unicode string is stored as the bytes 0x31 0x00 (given the endianness of your machine, which is likely little-endian, and the fact that you said the IP address is 192....)
That value stuffed into the char[] array will read back as a null-terminated string one character long. You have two options.
Use RegQueryValueExA, everything else stays the same, or
Change your char[] array to a wchar_t[] array, do what you're currently doing, then convert to narrow using WideCharToMultiByte (docs are in the SDK).
For obvious reasons, I'd take the former of those two options.
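The truncation described above can be reproduced without the registry at all. The sketch below is an illustration under assumptions: the byte array is a hand-written UTF-16LE encoding of "192" (not data read from any API), and narrow_length_of_wide_bytes is a hypothetical name.

```cpp
#include <cstddef>
#include <cstring>

// Bytes of the UTF-16LE text "192" plus a 16-bit terminator, spelled out
// explicitly so the example does not depend on the host's endianness.
const char utf16le_bytes[] = {'1', 0, '9', 0, '2', 0, 0, 0};

// Length that a narrow-string function sees when handed those bytes:
// it stops at the zero byte that is the high half of the first character.
std::size_t narrow_length_of_wide_bytes()
{
    return std::strlen(utf16le_bytes);
}
```

The function returns 1, which is why a wide string viewed through a char* appears to contain only its first character.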

Related

Why do GetServiceDisplayNameW() and GetServiceDisplayNameA() return different required buffer sizes in characters?

Here is a sample code (only sample code to understand it easily, no error handling, no close handles, and so on):
SC_HANDLE hSCManager = ::OpenSCManager(nullptr, nullptr, 0);
DWORD buffSize = 0;
::GetServiceDisplayName(hSCManager, m_serviceName, nullptr, &buffSize);
LPTSTR buff = new TCHAR[++buffSize];
VERIFY(::GetServiceDisplayName(hSCManager, m_serviceName, buff, &buffSize));
My sample service has the display name of "notepad starter" (15 characters).
Switching between build configurations, GetServiceDisplayName() returns a buffer size of 30 under ANSI (GetServiceDisplayNameA), and 15 under UNICODE (GetServiceDisplayNameW).
Documentation for this API says it returns the buffer size in characters excluding the null terminator (not well documented, but I'm expecting the buffer size to include the null terminator in the second call).
Why is it returning different buffer sizes in different build configurations?

First, GetServiceDisplayName takes a handle to the service control manager database (hSCManager) as its first parameter, not a handle to a service (hService), so you do not need to open the service for this task. You also do not need SC_MANAGER_ALL_ACCESS here; 0 is enough.
Your main error comes next. You allocate the buffer with new TCHAR[buffSize + 1], so it holds buffSize + 1 characters. That is correct, because GetServiceDisplayName returns the size of the service's display name excluding the null-terminating character, so one extra character is needed for the terminating 0.
The error is on the next line, in &buffSize. The last parameter, lpcchBuffer, must contain the size of the buffer in characters, that is, exactly the size you allocated. But you allocated buffSize + 1 characters, not buffSize. So the code should be:
if (SC_HANDLE hSCManager = OpenSCManagerW(nullptr, nullptr, 0))
{
    DWORD cch = 0;
    if (!GetServiceDisplayNameW(hSCManager, m_serviceName, nullptr, &cch))
    {
        if (GetLastError() == ERROR_INSUFFICIENT_BUFFER)
        {
            PWSTR buff =(PWSTR)alloca(++cch*sizeof(WCHAR));
            if (GetServiceDisplayNameW(hSCManager, m_serviceName, buff, &cch))
            {
                DbgPrint("%S\n", buff);
            }
        }
    }
    CloseServiceHandle(hSCManager);
}
So in your code you must replace buffSize + 1 with ++buffSize.
As for the ANSI version, GetServiceDisplayNameA, there really is an error in the API implementation: if the buffer size in characters is not big enough, it returns how many bytes the Unicode service name requires, excluding the null-terminating character. If the buffer is big enough, it does not update lpcchBuffer at all. This is yet another argument for never using the A versions of the API and always using the W versions.

I think the correct answer came 6 months later (I only saw it 3 years later) from Raymond Chen:
Why is it reporting a required buffer size larger than what it
actually needs?
Because character set conversion is hard.
When you call the GetServiceDisplayNameA function (ANSI version),
it forwards the call to GetServiceDisplayNameW function (Unicode
version). If the Unicode version says, “Sorry, that buffer is too
small; it needs to be big enough to hold N Unicode characters,” the
ANSI version doesn’t know how many ANSI characters that translates to.
A single Unicode character could expand to as many as two ANSI
characters in the case where the ANSI code page is DBCS. The
GetServiceDisplayNameA function plays it safe and takes the
worst-case scenario that the service display name consists completely
of Unicode characters which require two ANSI characters to represent.
That’s why it over-reports the buffer size.
devblogs.microsoft.com/oldnewthing/20180606-00/?p=98925
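The arithmetic behind the over-reported size can be written down in one line. This is a sketch of the worst-case rule the quote describes, not the actual Windows implementation; worst_case_ansi_chars is a hypothetical name.

```cpp
#include <cstddef>

// Worst case from the quote: in a DBCS ANSI code page every Unicode
// character may expand to two ANSI characters. (Not a Windows API.)
constexpr std::size_t worst_case_ansi_chars(std::size_t unicode_chars)
{
    return unicode_chars * 2;
}
```

For the 15-character display name "notepad starter" from the question, this gives 30, matching the value reported by the ANSI build.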

Set Registry Value to a Wide Character String (WCHAR) in C++

I'm trying to add a wide character string to registry in C++. The problem is that the RegSetValueEx() function does not support wide chars, it only supports BYTE type (BYTE = unsigned char).
WCHAR myPath[] = "C:\\éâäà\\éâäà.exe"
RegSetValueExA(HKEY_CURRENT_USER, "MyProgram", 0, REG_SZ, myPath, sizeof(myPath)); // error: cannot convert argument 5 from WCHAR* to BYTE*
And please don't tell me I should convert WCHAR to BYTE because characters such as é and â can't be stored as 8 bit characters.
I'm sure this is possible because I tried opening regedit and adding a new key with value C:\\éâäà\\éâäà.exe and it worked. I wonder how other programs can add themselves to startup on a Russian or Chinese computer.
Is there another way to do so? Or is there a way to format wide character path using wildcards?
Edit: The Unicode version of the function RegSetValueExW() only changes the type of the second argument.

You are calling RegSetValueExA() when you should be calling RegSetValueExW() instead. But in either case, RegSetValueEx() writes bytes, not characters, that is why the lpData parameter is declared as BYTE*. Simply type-cast your character array. The REG_SZ value in the dwType parameter will let RegSetValueEx() know that the bytes represent a Unicode string. And make sure to include the null terminator in the value that you pass to the cbData parameter, per the documentation:
cbData [in]
The size of the information pointed to by the lpData parameter, in bytes. If the data is of type REG_SZ, REG_EXPAND_SZ, or REG_MULTI_SZ, cbData must include the size of the terminating null character or characters.
For example:
WCHAR myPath[] = L"C:\\éâäà\\éâäà.exe";
RegSetValueExW(HKEY_CURRENT_USER, L"MyProgram", 0, REG_SZ, (LPBYTE)myPath, sizeof(myPath));
Or:
LPCWSTR myPath = L"C:\\éâäà\\éâäà.exe";
RegSetValueExW(HKEY_CURRENT_USER, L"MyProgram", 0, REG_SZ, (const BYTE*)myPath, (lstrlenW(myPath) + 1) * sizeof(WCHAR));
That being said, you should not be writing values to the root of HKEY_CURRENT_USER itself. You should be writing to a subkey instead, eg:
WCHAR myPath[] = L"C:\\éâäà\\éâäà.exe";
HKEY hKey;
if (RegCreateKeyExW(HKEY_CURRENT_USER, L"Software\\MyProgram", 0, NULL, REG_OPTION_NON_VOLATILE, KEY_SET_VALUE, NULL, &hKey, NULL) == ERROR_SUCCESS)
{
    RegSetValueExW(hKey, L"MyValue", 0, REG_SZ, (LPBYTE)myPath, sizeof(myPath));
    RegCloseKey(hKey);
}
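The cbData rule (bytes, not characters, and including the terminator) is easy to get wrong, so here is a small portable sketch of the calculation. It assumes char16_t as a stand-in for the 2-byte WCHAR so it compiles outside Windows; reg_sz_byte_size is a hypothetical helper, not part of the registry API.

```cpp
#include <cstddef>
#include <string>

// Byte count to pass as cbData for a REG_SZ value: every character plus the
// terminating null, each sizeof(char16_t) bytes wide (2 bytes, like WCHAR).
std::size_t reg_sz_byte_size(const char16_t* text)
{
    return (std::char_traits<char16_t>::length(text) + 1) * sizeof(char16_t);
}
```

For a 3-character string this yields 8 bytes: three characters plus the terminator, two bytes each. It is the same quantity as (lstrlenW(myPath) + 1) * sizeof(WCHAR) in the answer.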

It seems to me you're trying to use the narrow/non-wide-char version of that function, which will only support ASCII. How about trying RegSetValueExW? Maybe you should also look up how the Windows API tries to support ASCII and UNICODE as transparently as possible.

Edit: The Unicode version of the function RegSetValueExW() only changes the type of the second argument.
No it does not.
REG_SZ: A null-terminated string. This will be either a Unicode or an ANSI string, depending on whether you use the Unicode or ANSI functions.
From here:
https://learn.microsoft.com/en-us/windows/win32/sysinfo/registry-value-types

How to check the contents of a LPTSTR string?

I'm trying to understand why a segmentation fault (SIGSEGV) occurs during the execution of this piece of code. The error occurs when testing the condition in the while statement, not on the first iteration but on the second.
LPTSTR arrayStr[STR_COUNT];
LPTSTR inputStr;
LPTSTR str;
// calls a function from external library
// in order to set the inputStr string
set_input_str(param1, (char*)&inputStr, param3);
str = inputStr;
while( *str != '\0' )
{
    if( debug )
        printf("String[%d]: %s\n", i, (char*)str);
    arrayStr[i] = str;
    str = str + strlen((char*)str) + 1;
    i++;
}
After reading this answer, I have done some research on the internet and found this article, so I tried to modify the above code, using this piece of code read in this article (see below). However, this change did not solve the problem.
for (LPTSTR pszz = pszzStart; *pszz; pszz += lstrlen(pszz) + 1) {
... do something with pszz ...
}
As assumed in this answer, it seems that the code expects double null terminated arrays of string. Therefore, I wonder how I could check the contents of the inputStr string, in order to check if it actually contains only one null terminator char.
NOTE: the number of characters in the string printed from printf instruction is twice the value returned by the lstrlen(str) function call at the first iteration.

OK, now that you've included the rest of the code it is clear that it is indeed meant to parse a set of consecutive strings. The problem is that you're mixing narrow and wide string types. All you need to do to fix it is change the variable definitions (and remove the casts):
char *arrayStr[STR_COUNT];
char *inputStr;
char *str;
// calls a function from external library
// in order to set the inputStr string
set_input_str(param1, &inputStr, param3);
str = inputStr;
while( *str != '\0' )
{
    if( debug )
        printf("String[%d]: %s\n", i, str);
    arrayStr[i] = str;
    str = str + strlen(str) + 1;
    i++;
}
Specifically, the issue was occurring on this line:
while( *str != '\0' )
since you hadn't cast str to char * the comparison was looking for a wide nul rather than a narrow nul.

str = str + strlen(str) + 1;
You go out of bounds, change to
str = str + 1;
or simply:
str++;

Of course, you are inconsistently using LPTSTR and strlen, the latter assuming TCHAR = char.
In any case, strlen returns the length of the string, which is the number of characters it contains not including the nul character.
Your arithmetic is out by one but you know you have to add one to the length of the string when you allocate the buffer.
Here however you are starting at position 0 and adding the length which means you are at position len which is the length of the string. Now the string runs from offset 0 to offset len - 1 and offset len holds the null character. Offset len + 1 is out of bounds.
Sometimes you might get away with reading it, if there is extra padding, but it is undefined behaviour and here you got a segfault.

This looks to me like code that expects double null terminated arrays of strings. I suspect that you are passing a single null terminated string.
So you are using something like this:
const char* inputStr = "blah";
but the code expects two null terminators. Such as:
const char* inputStr = "blah\0";
or perhaps an input value with multiple strings:
const char* inputStr = "foo\0bar\0";
Note that these final two strings are indeed double null terminated. Although only one null terminator is written explicitly at the end of the string, the compiler adds another one implicitly.
Your question edit throws a new spanner in the works. The cast in
strlen((char*)str)
is massively dubious. If you need to cast then the cast must be wrong. One wonders what LPTSTR expands to for you. Presumably it expands to wchar_t* since you added that cast to make the code compile. And if so, then the cast does no good. You are lying to the compiler (str is not char*) and lying to the compiler never ends well.
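For reference, walking a double null terminated block with consistently narrow types can be sketched as follows. This is a minimal portable illustration, assuming the block really does end with an empty string; split_multi_string is a hypothetical helper, not part of the external library in the question.

```cpp
#include <string>
#include <vector>

// Walks a block of consecutive '\0'-terminated strings that ends with an
// empty string (two '\0' in a row) and collects each piece.
std::vector<std::string> split_multi_string(const char* block)
{
    std::vector<std::string> parts;
    const char* p = block;
    while (*p != '\0')                   // an empty string marks the end of the block
    {
        parts.emplace_back(p);           // copies up to the next '\0'
        p += parts.back().size() + 1;    // skip the string and its terminator
    }
    return parts;
}
```

Called with "foo\0bar\0", it returns the two strings "foo" and "bar". If the input has only a single terminator, the loop reads past the end of the data, which is the failure mode discussed above.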

The reason for the segmentation fault is already given by Alter's answer. However, I'd like to add that the usual style of parsing a C-style string is more elegant and less verbose
while (char ch = *str++)
{
    // other instructions
    // ...
}
The scope of ch is limited to the body of the loop.
Aside: Either tag the question as C or C++ but not both, they're different languages.

Why does the obtained MachineGuid look like Korean instead of a GUID?

I created a simple function:
std::wstring GetRegKey(const std::string& location, const std::string& name){
    const int valueLength = 10240;
    auto platformFlag = KEY_WOW64_64KEY;
    HKEY key;
    TCHAR value[valueLength];
    DWORD bufLen = valueLength*sizeof(TCHAR);
    long ret;
    ret = RegOpenKeyExA(HKEY_LOCAL_MACHINE, location.c_str(), 0, KEY_READ | platformFlag, &key);
    if( ret != ERROR_SUCCESS ){
        return std::wstring();
    }
    ret = RegQueryValueExA(key, name.c_str(), NULL, NULL, (LPBYTE) value, &bufLen);
    RegCloseKey(key);
    if ( (ret != ERROR_SUCCESS) || (bufLen > valueLength*sizeof(TCHAR)) ){
        return std::wstring();
    }
    std::wstring stringValue(value, (size_t)bufLen - 1);
    size_t i = stringValue.length();
    while( i > 0 && stringValue[i-1] == '\0' ){
        --i;
    }
    return stringValue;
}
And I call it like auto result = GetRegKey("SOFTWARE\\Microsoft\\Cryptography", "MachineGuid");
yet the string looks like
㤴ㄷ㤵戰㌭㉣ⴱ㔴㍥㤭慣ⴹ㍥摢㘵〴㉡ㄵ\0009ca9-e3bd5640a251
not like what RegEdit shows
4971590b-3c21-45e3-9ca9-e3bd5640a251
So I wonder what shall be done to get a correct representation of MachineGuid in C++?

RegQueryValueExA has been an ANSI wrapper around the Unicode version since Windows NT. On a Unicode version of Windows, it not only converts lpValueName to an LPCWSTR, but also converts the lpData retrieved from the registry from Unicode to an ANSI string before returning.
MSDN has the following to say:
If the data has the REG_SZ, REG_MULTI_SZ or REG_EXPAND_SZ type, and
the ANSI version of this function is used (either by explicitly
calling RegQueryValueExA or by not defining UNICODE before including
the Windows.h file), this function converts the stored Unicode string
to an ANSI string before copying it to the buffer pointed to by
lpData.
Your problem is that you are populating lpData, a buffer of TCHARs (WCHAR in a Unicode build), with an ANSI string.
The garbled string that you see is a result of 2 ANSI chars being used to populate a single wchar_t. That explains the Asian characters. The portion that looks like the end of the GUID is because the print function blew past the terminating null since it was only one byte and began printing what is probably a portion of the buffer that was used by RegQueryValueExA before converting to ANSI.
To solve the problem, either stick entirely to Unicode, or to ANSI (if you are brave enough to continue using ANSI in the year 2014), or be very careful about your conversions. I would change GetRegKey to accept wstrings and use RegQueryValueExW instead, but that is a matter of preference and what sort of code you plan on using this in.
(Also, I would recommend you have someone review this code since there are a number of oddities in the error checking, and a hard coded buffer size.)
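The "two ANSI chars in one wchar_t" effect can be checked by hand. The sketch below only illustrates the byte packing on a little-endian machine and does not call any registry function; pack_as_utf16le is a hypothetical name.

```cpp
#include <cstdint>

// Combines two consecutive ANSI bytes the way a little-endian machine does
// when the same memory is read as a single 16-bit wchar_t.
constexpr std::uint16_t pack_as_utf16le(unsigned char first, unsigned char second)
{
    return static_cast<std::uint16_t>(first | (second << 8));
}
```

The GUID starts with "4971": the pair '4','9' packs to 0x3934, which lies in the CJK Unified Ideographs Extension A block, and '7','1' packs to 0x3137, which lies in the Hangul Compatibility Jamo block. That is why the output begins with Asian-looking characters instead of digits.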

C++ WCHAR manipulations

I'm developing a tiny Win32 app in C++.
I studied C++ fundamentals a long time ago, so now I'm completely confused by character strings in C++. Back then there was no WCHAR or TCHAR, only char and String.
After a little investigation I decided not to use TCHAR.
My issue is very simple, I think, but I can't find a clear guide on how to manipulate strings in C++. After coding in PHP for the last few years I expected string manipulation to be simple, and I was wrong!
Simply put, all I need is to put new data into a character string.
WCHAR* cs = L"\0";
swprintf( cs, "NEW DATA" );
This was my first attempt. While debugging my app I found that swprintf puts only the first 2 chars into my cs variable. I resolved the problem this way:
WCHAR cs[1000];
swprintf( cs, "NEW DATA" );
But in general this trick could fail, because in my case the new data is not a constant value but another variable that could potentially be longer than 1000 chars. My code looks like this:
WCHAR cs[1000];
WCHAR* nd1;
WCHAR* nd2;
wcscpy(nd1, L"Some value");
wcscpy(nd2, L"Another value"); // Actually these vars stores the path for user selected folder
swprintf( cs, "The paths are %s and %s", nd1, nd2);
In this case it is possible that the total character count of nd1 and nd2 is greater than 1000 chars, so critical data will be lost.
The question is: how can I copy all the data I need into a WCHAR string declared as WCHAR* wchar_var; without losing anything?
P.S. Since I'm Russian the question may be unclear. Let me know if so, and I'll try to explain my issue more clearly and in more detail.

In modern Windows programming, it's OK to just ignore TCHAR and instead use wchar_t (WCHAR) and Unicode UTF-16.
(TCHAR is a model of the past, when you wanted to have a single code base, and produce both ANSI/MBCS and Unicode builds changing some preprocessor switches like _UNICODE and UNICODE.)
In any case, you should use C++ and convenient string classes to simplify your code. You can use ATL::CString (which corresponds to CStringW in Unicode builds, which are the default since VS2005), or STL's std::wstring.
Using CString, you can do:
CString str1 = L"Some value";
CString str2 = L"Another value";
CString cs;
cs.Format(L"The paths are %s and %s", str1.GetString(), str2.GetString());
CString also provides proper overloads of operator+ to concatenate strings (so you don't have to calculate the total length of the resulting string, dynamically allocate a buffer for the destination string or check existing buffer size, call wcscpy, wcscat, don't forget to release the buffer, etc.)
And you can simply pass instances of CString to Win32 APIs expecting const wchar_t* (LPCWSTR/PCWSTR) parameters, since CString offers an implicit conversion operator to const wchar_t*.

When you're using a WCHAR*, you are invoking undefined behavior because you have a pointer but have not made it point to anything valid. You need to find out how long the resulting string will be and dynamically allocate space for the string. For example:
const WCHAR* fmt = L"The paths are %ls and %ls";
WCHAR* nd1 = new WCHAR[wcslen(L"Some value") + 1]; // +1 for the null terminator
WCHAR* nd2 = new WCHAR[wcslen(L"Another value") + 1];
// Fill the inputs first so that their lengths can be measured below
wcscpy(nd1, L"Some value");
wcscpy(nd2, L"Another value"); // Actually these vars store the path of the user-selected folder
// The format text is longer than its expansion needs (each %ls is replaced), so this is a safe upper bound
size_t csLen = wcslen(fmt) + wcslen(nd1) + wcslen(nd2) + 1;
WCHAR* cs = new WCHAR[csLen];
swprintf(cs, csLen, fmt, nd1, nd2);
delete[] nd1;
delete[] nd2;
delete[] cs;
But this is very ugly and error-prone. As noted, you should be using std::wstring instead, something like this:
std::wstring cs;
std::wstring nd1;
std::wstring nd2;
nd1 = L"Some value";
nd2 = L"Another value";
cs = std::wstring(L"The paths are ") + nd1 + L" and " + nd2;
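Wrapped in a function, the std::wstring approach handles inputs of any length, which addresses the 1000-character concern from the question. This is a small sketch; describe_paths is a hypothetical name chosen for illustration.

```cpp
#include <string>

// Builds the message with std::wstring, which grows as needed, so there is
// no fixed 1000-character buffer to overflow.
std::wstring describe_paths(const std::wstring& first, const std::wstring& second)
{
    std::wstring message = L"The paths are ";
    message += first;
    message += L" and ";
    message += second;
    return message;
}
```

When a Win32 API expects a const wchar_t*, the result can be passed as describe_paths(a, b).c_str(), as long as the string object outlives the call.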

I suggest using the ATL CStringW class instead of raw WCHAR; it's much handier. CString is a wrapper for a dynamically allocated C string. It manages the string length and the allocated memory buffer after each operation, so you don't have to.
Typical usage:
#include <atlstr.h>
CStringW s;
s.Format(L"The paths are %s and %s", L"Some value", L"Another value");
const WCHAR* wstr = s.GetString(); // To pass to some API that need WCHAR
or
#include <atlstr.h>
CStringW s(L"The paths are ");
s += L"Some value";
s += L" and ";
s += L"Another value";
const WCHAR* wstr = s.GetString(); // To pass to some API that need WCHAR
