Why wstring can accept WCHAR[] while string doesn't accept UCHAR[]


I am trying to print the value returned by NtQueryValueKey, which is UCHAR Data[1]. I have tried printf, cout, and string(Data, DataLength); the first two print only one character and the last one throws an exception. If I change the data type to WCHAR Data[1] and use wstring(Data), it is accepted without any complaint, and wprintf also prints the value normally.
Edit: I meant NtQueryValueKey using the KEY_VALUE_PARTIAL_INFORMATION, I am using VS 2015 btw...

You must have mixed something up. You did not specify what value from the KEY_NAME_INFORMATION enumeration you are using for the second parameter to specify the data type, but a quick look at MSDN shows that all of the structures contain WCHAR Name[1]; or something similar as the last member (which I guess is the one you are interested in). Can you elaborate and provide the link or other means of documentation that states you actually need to use UCHAR ?

WCHAR is an alias for wchar_t. std::wstring operates with wchar_t elements. A WCHAR[] can decay to a wchar_t*, and thus can be assigned directly to a std::wstring.
UCHAR is an alias for unsigned char. std::string operates with char elements instead. A UCHAR[]/UCHAR* cannot be assigned directly to a std::string without a type-cast to char*, as char and unsigned char are distinct data types.
unsigned char is commonly used to represent 8-bit bytes (it is the same data type used for BYTE).
NtQueryKey() returns strings as UTF-16LE encoded bytes using WCHAR[] character arrays, not UCHAR[] byte arrays. So your code is declaring things wrong if you are using UCHAR[] to begin with. But even so, you can use UCHAR if you pay attention to the encoding and byte length, and use appropriate type-casts.
Any associated Length value reported by NtQueryKey() is expressed in bytes, not characters. sizeof(UCHAR) is 1 and sizeof(WCHAR) is 2, so every 2 UCHARs represent 1 WCHAR. The strings are also not null-terminated, so you have to take the Length into account when printing or converting.
In Latin-based languages, most commonly used Unicode characters will be <= U+00FF, and thus every other UCHAR in UTF-16LE will usually be 0. That is interpreted as a null terminator when UTF-16 is printed with printf() or std::cout. You need to use wprintf() or std::wcout instead.
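To see why printf() and std::cout stop after one character, here is a minimal sketch. The sample bytes and the helper name narrow_visible_length are made up for illustration; the only point is that a narrow-string routine treats the first 0x00 byte as the terminator.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// UTF-16LE bytes for "Hi": each ASCII character is followed by a 0x00 high byte.
// (Hypothetical sample data, not read from a real registry value.)
static const unsigned char kSampleUtf16le[] = { 'H', 0x00, 'i', 0x00, 0x00, 0x00 };

// Number of characters a narrow-string routine such as printf("%s") or
// std::cout would see before it hits the first 0x00 byte.
std::size_t narrow_visible_length(const unsigned char* bytes) {
    assert(bytes != nullptr);
    return std::strlen(reinterpret_cast<const char*>(bytes));
}
```

Calling narrow_visible_length(kSampleUtf16le) yields 1, which matches the "only 1 character" symptom in the question.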
Converting Data to a std::string is a valid operation and should not be raising an exception:
std::string((char*)Data, DataLength)
Provided that:
- Data is a valid pointer.
- DataLength is an accurate byte count.
The only ways this could raise an exception are:
- Data is not pointing at valid memory.
- The value of DataLength is more than the actual number of bytes allocated for Data.
- Available memory is too low to allocate std::string's internal buffer.
- Memory is corrupted.
Assigning Data by itself to a std::wstring without taking DataLength into account is not a valid operation because the strings are not null-terminated. You must specify the length:
std::wstring(Data, DataLength / sizeof(WCHAR))
If Data is UCHAR then use a type-cast:
std::wstring((WCHAR*)Data, DataLength / sizeof(WCHAR))
When printing Data directly with wprintf(), you must pass DataLength as an input parameter:
wprintf(L"%.*s", (int)(DataLength / sizeof(WCHAR)), Data); // the "*" precision argument must be an int
When printing Data directly with std::wcout, you should use write() instead of operator<< so you can pass DataLength as an input parameter:
std::wcout.write(Data, DataLength / sizeof(WCHAR));
If Data is UCHAR then use a type-cast:
std::wcout.write((WCHAR*)Data, DataLength / sizeof(WCHAR));
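As a portable illustration of the length-aware conversion above, the following sketch decodes a byte buffer of UTF-16LE data by hand. It is an assumption-laden stand-in, not the Windows code itself: char16_t and std::u16string replace WCHAR and std::wstring so that the example behaves the same on platforms where wchar_t is not 16 bits wide, and the function name utf16le_to_u16string is hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Decode `byteLength` bytes of UTF-16LE into a std::u16string.
// The length is given in bytes, as with the registry APIs discussed above,
// and no null terminator is required in the input.
std::u16string utf16le_to_u16string(const unsigned char* data, std::size_t byteLength) {
    assert(data != nullptr || byteLength == 0);
    std::u16string out;
    out.reserve(byteLength / 2);
    // Combine each low/high byte pair; an odd trailing byte is ignored.
    for (std::size_t i = 0; i + 1 < byteLength; i += 2) {
        out.push_back(static_cast<char16_t>(data[i] | (data[i + 1] << 8)));
    }
    return out;
}
```

Building the code units from individual bytes also sidesteps the alignment concerns that a plain pointer cast can raise when the byte buffer is not suitably aligned.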

Related

cast to pointer from integer of different size when converting uint64_t to bytes

I wanted to write a uint64_t to a char* array in network byte order to send it as a UDP datagram with sendto. A uint64_t has 8 bytes, so I convert them as follows:
void strcat_number(uint64_t v, char* datagram) {
uint64_t net_order = htobe64(v);
for (uint8_t i=0; i<8 ;++i) {
strcat(datagram, (const char*)((uint8_t*)&net_order)[i]);
}
}
which gives me
warning: cast to pointer from integer of different size [-Wint-to-pointer-cast]
strcat(datagram, (const char*)((uint8_t*)&net_order)[i]);
How can I get rid of this warning, or do the conversion in a simpler or clearer way?

((uint8_t*)&net_order)
this is a pointer to net_order cast to a uint8_t pointer
((uint8_t*)&net_order)[i]
this is the i-th byte of the underlying representation of net_order.
(const char*)((uint8_t*)&net_order)[i]
this is the same as above, but forcibly cast to a const char *. The byte value is being reinterpreted as an address, so the result is not a valid pointer, and that is what the compiler is warning you about; dereferencing it (as strcat does) is undefined behavior and will almost surely result in a crash.
Notice that, even if you somehow managed to make this kludge work, strcat is still the wrong function, as it deals with NUL-terminated strings, while here you are trying to put binary data inside your buffer, and binary data can naturally contain embedded NULs. strcat will append at the first NUL (and stop at the first NUL in the second parameter) instead of at the "real" end.
If you are building a buffer of binary data you have to use straight memcpy, and most importantly you cannot use string-related functions that rely on the final NUL to know where the string ends, but you have to keep track explicitly of how many bytes you used (i.e. the current position in the datagram).
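A minimal sketch of that approach, assuming a caller-owned buffer and an explicit write position. The helper name append_u64_be is hypothetical, and it uses shifts in place of htobe64 plus memcpy so that it does not depend on the host's endianness or on a platform-specific header; the effect on the buffer is the same.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Write `v` into `buf` starting at `offset` in network (big-endian) byte
// order and return the offset just past the written bytes.
// The caller must guarantee that buf has at least offset + 8 bytes.
std::size_t append_u64_be(unsigned char* buf, std::size_t offset, std::uint64_t v) {
    assert(buf != nullptr);
    for (int i = 0; i < 8; ++i) {
        // Most significant byte first, independent of host endianness.
        buf[offset + i] = static_cast<unsigned char>(v >> (56 - 8 * i));
    }
    return offset + 8;
}
```

The returned offset is what replaces strcat's search for a terminating NUL: each call continues where the previous one stopped, and the final offset is the length to pass to sendto.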

Null bytes in char* in QByteArray with QDataStream

I discovered that the char* data in a QByteArray has null bytes. Code:
QByteArray arr;
QDataStream stream(&arr, QIODevice::WriteOnly);
stream << "hello";
Look at debugger variable view:
I don't understand why I have three empty bytes at the beginning. I know that byte [3] is the string length. Can I remove the last byte? I know it's a null-terminated string, but for my application I must have raw bytes (with one byte at the beginning to store the length).
Even weirder to me is what happens when I use QString:
QString str = "hello";
[rest of code same as above]
stream << str;
It doesn't have a null at the end, so I think maybe the null bytes before each char indicate that the next byte is a char?
Just two questions:
Why so many null bytes?
How can I remove them, including the last null byte?

I don't understand why I have three empty bytes at the beginning.
It's a fixed-size, uint32_t (4-byte) header. It's four bytes so that it can specify data lengths as long as (2^32-1) bytes. If it was only a single byte, then it would only be able to describe strings up to 255 bytes long, because that's the largest integer value that can fit into a single byte.
Can I remove the last byte? I know it's a null-terminated string, but for my application I must have raw bytes (with one byte at the beginning to store the length).
Sure, as long as the code that will later parse the data array is not depending on the presence of a trailing NUL byte to work correctly.
Even weirder to me is what happens when I use QString [...] it doesn't have a null at the end, so I think maybe the null bytes before each char indicate that the next byte is a char?
Per the Qt serialization documentation page, a QString is serialized as:
- If the string is null: 0xFFFFFFFF (quint32)
- Otherwise: The string length in bytes (quint32) followed by the data in UTF-16.
If you don't like that format, instead of serializing the QString directly, you could do something like
stream << str.toUtf8();
instead, and that way the data in your QByteArray would be in a simpler format (UTF-8).
Why so many null bytes?
They are used in fixed-size header fields when the length-values being encoded are small; or to indicate the end of NUL-terminated C strings.
How can I remove them, including the last null byte?
You could add the string in your preferred format (no NUL terminator but with a single length header-byte) like this:
const char * hello = "hello";
char slen = strlen(hello);
stream.writeRawData(&slen, 1);
stream.writeRawData(hello, slen);
... but if you have the choice, I highly recommend just keeping the NUL-terminator bytes at the end of the strings, for these reasons:
- A single preceding length-byte will limit your strings to 255 bytes long (or less), which is an unnecessary restriction that will likely haunt you in the future.
- Avoiding the NUL-terminator byte doesn't actually save any space, because you've added a string-length byte to compensate.
- If the NUL-terminator byte is there, you can simply pass a pointer to the first byte of the string directly to any code that expects a C-style string, and it will be able to use the string immediately (without any data-conversion steps). If you rely on a different convention instead, you'll end up having to make a copy of the entire string before you can pass it to that code, just so that you can append a NUL byte to the end of the string so that the C-string-expecting code can use it. That will be CPU-inefficient and error-prone.
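The single-length-byte layout and its 255-byte limit can be sketched without Qt. This is only an illustration of the byte format: the function name encode_short_string is hypothetical, and a std::vector stands in for the QByteArray.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <string>
#include <vector>

// Encode `s` as one length byte followed by the raw bytes, with no NUL
// terminator. Throws if the length does not fit in the single length byte.
std::vector<std::uint8_t> encode_short_string(const std::string& s) {
    if (s.size() > 255) {
        throw std::length_error("string longer than 255 bytes");
    }
    std::vector<std::uint8_t> out;
    out.reserve(s.size() + 1);
    out.push_back(static_cast<std::uint8_t>(s.size()));  // length header
    out.insert(out.end(), s.begin(), s.end());           // payload bytes
    assert(out.size() == s.size() + 1);
    return out;
}
```

For "hello" this produces six bytes: 0x05 followed by the five letters, which is the same size as "hello" plus its NUL terminator.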

RegKeyValue returning nonsense data

char value[255];
DWORD BufferSize = 8192;
RegGetValue(HKEY_LOCAL_MACHINE, L"SOFTWARE\\Microsoft\\Windows NT\\CurrentVersion", L"ProductName", RRF_RT_ANY, NULL, &value, &BufferSize);
cout << value;
After RegKeyValue() runs, it appears that value is
value 0x0034f50c "ÌÌÌÌÌÌÌÌÌÌÌÌÌÌÌÌÌÌÌÌÌÌÌÌÌÌÌÌÌÌÌÌÌÌÌÌÌÌÌÌÌÌÌÌÌÌÌÌÌÌÌÌÌÌÌÌÌÌÌÌÌÌÌÌÌÌÌÌÌÌÌÌÌÌÌÌÌÌÌÌÌÌÌÌÌÌÌÌÌÌÌÌÌÌÌÌÌÌÌÌÌÌÌÌÌÌÌÌÌÌÌÌÌÌÌÌÌÌÌÌÌÌÌÌÌÌÌÌÌÌÌÌÌÌÌÌÌÌÌÌÌÌÌÌÌÌÌÌÌÌÌÌÌÌÌÌÌÌÌÌÌÌÌÌÌÌÌÌÌÌÌÌÌÌÌÌÌÌÌÌÌÌÌÌÌÌÌÌÌÌÌÌÌÌÌÌÌÌÌÌÌÌÌÌÌÌÌÌÌÌÌÌÌÌÌÌÌÌÌÌÌÌÌÌÌÌÌÌÌÌÌÌÌÌÌÌÌÌÌÌÌÌÌÌÌÌÌÌÌÌÌÌÌÌÌ... char[255]
What's going on here?
Note: RegKeyValue() returns 0

There are two issues here.
Make sure the return value of RegGetValue is ERROR_SUCCESS. If it is not, the routine failed. Also, you can check to see what was written into BufferSize, as RegGetValue specifies the number of bytes written.
You're passing in a buffer defined as char value[255];, then specifying its length as 8192. This can cause a buffer overrun.

You didn't check the return value of RegGetValue. Most likely the call failed and the buffer value was never assigned anything. Always check return values.
From the code we can see, I note that you are lying about the buffer size. You say that it is 8192 bytes. But you only allocated 255 bytes. You are also calling the Unicode version of the API, but passing in a char buffer. If you are expecting string data then you need to supply a buffer of wide characters. The Unicode version of this API will return string data as UTF-16 encoded text.
Once you get all that sorted you next need to check what type is stored in that value. You are passing NULL for the type parameter. Pass a pointer to a variable and find out whether or not a string really is stored there. You will also need to read how many bytes are read and set the null-terminator in your buffer accordingly.

Cannot convert CString to BYTE array

I need to convert a CString to a BYTE array. I don't know why, but nothing that I found on the internet works :(
For example, I have
CString str = _T("string");
I've tried the following:
1)
BYTE *pbBuffer = (BYTE*)(LPCTSTR)str;
2)
BYTE *pbBuffer = new BYTE[str.GetLength()+1];
memcpy(pbBuffer, (VOID*)(LPCTSTR)StrRegID, str.GetLength());
3)
BYTE *pbBuffer = (BYTE*)str.GetString();
And pbBuffer always contains just the first letter of str
DWORD dwBufferLen = strlen((char *)pbBuffer)+1;
is 2
But if I use const string:
BYTE *pbBuffer = (BYTE*)"string";
pbBuffer contains whole string
Where is my mistake?

Your CString is Unicode (two bytes per character) and you try to interpret it as ANSI (one byte per character). This leads to results you don't expect.
Instead of casting the underlying buffer into char* you need to convert the data. Use WideCharToMultiByte() for that.

You are probably compiling with Unicode. This means that your CString contains wchar_t instead of char. Converting a wchar_t pointer to a char pointer causes you to interpret the second byte of the first wchar_t as a string terminator (since that byte is 0 for the most common characters).
When using Visual Studio you should always use _T() to declare string literals and TCHAR as your character type. In your case:
BYTE* pBuffer = (BYTE*)(LPCTSTR)str;
You get the buffer, but every other byte is most probably zero.
Use a CStringA if you need an ANSI string. (But then skip the _T() when initializing it)

How to retrieve SID's byte array

How can I convert a PSID type into a byte array that contains the byte value of the SID?
Something like:
PSID pSid;
byte sidBytes[68];//Max. length of SID in bytes is 68
if(GetAccountSid(
NULL, // default lookup logic
AccountName,// account to obtain SID
&pSid // buffer to allocate to contain resultant SID
))
{
ConvertPSIDToByteArray(pSid, sidBytes);
}
--how should I write the function ConvertPSIDToByteArray?

Use GetLengthSid() to get the number of bytes you'll need. Then memcpy() from the PSID.

I think the function you might be looking for is ConvertSidToStringSid. The general idea is to convert the PSID to an LPTSTR, which in a Unicode build is a wchar_t*. You can then convert this to a multi-byte char array with a standard function such as wcstombs, which will give you the SID's string form in bytes. Alternatively, you can operate on the wchar_t data directly and just write that out; there are functions for handling that. In either case, the result will be UTF-16 LE encoded, and if you need a different encoding you'll have to do a conversion.
