Check length char[] before converting to wstring() - c++

I have an API function that takes a pointer to a char array. The calling code is out of my control. The array is dynamic, but I still need to do some checking.
extern "C" int __stdcall calcW2(LPWSTR foo)
If somebody makes a call with
char foo[5000];
LPSTR lpfoo2 = foo;
calcW2(lpfoo2 );
I understand that I need to make some checks. I can test for nullptr, but what if I want to check the length, to confirm that the char array has some validity? What is the safest way to do that for a string of 0 to 2500 chars? Do I need to check for anything more?
if(foo != nullptr)
{
//Size checking
//size_t newsize = strlen(SerialNumber) + 1 not good?
std::wstring test(foo);
}

You missed one important point. The function signature says LPWSTR not LPSTR. This means that the function expects (or should expect) to receive wchar_t[] not char[]. See https://msdn.microsoft.com/en-us/library/cc230355.aspx.
I mean:
extern "C" int __stdcall calcW2(LPWSTR foo) <--- LP-W-STR
char foo[5000];
LPSTR lpfoo2 = foo; <--- LP-STR
calcW2(lpfoo2 ); <--- LP-STR passed into LP-W-STR ??
That should not compile: the argument types do not match.
If you change the array to wchar_t[] and it starts to fail to compile, then most probably you have some _UNICODE #defines set wrong. In WINAPI and similar, many functions have dual definitions. When "UNICODE" flag is set, they take LPWSTR, but when the flag is cleared, the headers switch them to taking LPSTR. So if you see that it should be LPWSTR and you want it to be LPWSTR and it insists on being LPSTR, then you either messed up the function names, or UNICODE flag (or the header you have is simply incorrect).
char and wchar_t are different types. Simplifying, char is single-byte and, on Windows, wchar_t is two bytes. Both use '\0' as the end-of-string marker, but for wchar_t the terminator is two zero bytes, since each character takes two bytes. Also, plain ASCII data in a wchar_t[] is not laid out as a|b|c|d|e|f; on a little-endian machine it is a|0|b|0|c|0|d|0|e|0|f|0. That is why strlen cannot work properly on 16-bit encoded data: it takes the zero byte inside the first character as the end of the string. Forcibly packing wchar_t data into a char[] is plainly wrong, or at least highly misleading and error-prone.
That's why you should use wcslen instead, which takes wchar_t* instead of char*.
This is a general rule. For any function working on char (strlen, strcat, strcmp, ...) you should be able to find a matching wide-character function (wcslen, wcscat, wcscmp, ...). The names sometimes carry underscores or other prefixes, so search the docs. Don't mix up the character types. They are not just byte arrays: each carries its own semantics, and when types are named differently, there is usually a reason.
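A minimal sketch of the length check the question asks about, assuming the caller really passes a null-terminated wchar_t string and that 2500 characters is the agreed upper limit. The names to_checked_wstring and kMaxChars are hypothetical, and the loop is a hand-written stand-in for a bounded wcslen so that the sketch stays portable:

```cpp
#include <cstddef>
#include <string>

// Hypothetical limit taken from the question: 0 to 2500 characters.
const std::size_t kMaxChars = 2500;

// Copies text into out if it is non-null and terminated within kMaxChars
// characters. Reads at most kMaxChars + 1 elements of text.
bool to_checked_wstring(const wchar_t* text, std::wstring& out) {
    if (text == nullptr) {
        return false;
    }
    std::size_t len = 0;
    while (len <= kMaxChars && text[len] != L'\0') {
        ++len;
    }
    if (len > kMaxChars) {
        return false;  // no terminator found inside the allowed range
    }
    out.assign(text, len);
    return true;
}
```

No check can prove that the pointer refers to valid memory. The bound only guarantees that the function stops reading after kMaxChars + 1 elements instead of running on until it happens to find a zero.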

Related

Only first character is assigned converting LPCTSTR to char*

I'm completely new to C++. In my program there's a function which has to take a LPCTSTR as a parameter. I want to convert it into a char*. What I tried is as follows,
char* GetChar(LPCTSTR var){
char* id = (char*)var;
.....
}
But while debugging I noticed that only first letter of var is assigned to id.
What have I done wrong?
(I tried various answers in StackOverflow about converting LPCTSTR to char* before coming to this solution. None of them worked for me.)
UPDATE
What I want is for the full string pointed to by var to be treated as a char*.
It is much more useful to pick one character type (wchar_t or char) and stick to it throughout your application; trying to support both through TCHAR may cause you some headaches. To be fair, today you can safely use wchar_t (or WCHAR, since the types you are using suggest that you are using the Windows headers).
The problem you have arises because casting a pointer has no effect on the data it points to. Since wchar_t is typically 2 bytes in size (on Windows) while char is 1 byte, storing a value that fits in a char in a wchar_t leaves the second byte of the wchar_t set to \0. When you then print a null-terminated string of wchar_t as if it were a string of char, the printing function reaches that \0 byte right after the first character and assumes it is the end of the string. The \0 terminator of a wchar_t string is itself 2 bytes long.
For example, the string
LPCWSTR test = L"Hi!";
is stored in memory as:
48 00 69 00 21 00 00 00
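The layout above can be reproduced with a small portable sketch. It assumes a little-endian machine and uses char16_t as a stand-in for the 16-bit wchar_t found on Windows; the function names raw_bytes and narrow_view_length are hypothetical:

```cpp
#include <cstddef>
#include <cstring>
#include <vector>

// Returns the raw bytes of a UTF-16 string, including its terminator.
std::vector<unsigned char> raw_bytes(const char16_t* text) {
    std::size_t units = 0;
    while (text[units] != u'\0') {
        ++units;
    }
    const unsigned char* first = reinterpret_cast<const unsigned char*>(text);
    return std::vector<unsigned char>(first, first + (units + 1) * sizeof(char16_t));
}

// Length that a char-based routine such as strlen would report
// when handed the same memory.
std::size_t narrow_view_length(const char16_t* text) {
    return std::strlen(reinterpret_cast<const char*>(text));
}
```

On such a machine, raw_bytes(u"Hi!") yields the eight bytes shown above, and narrow_view_length(u"Hi!") stops at the zero byte that follows 0x48.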
If you want to convert between the wchar_t version of a string and the char version, there are functions that do the conversion. Since you are probably using the Windows headers (judging from the LPCTSTR define), those functions are WideCharToMultiByte and MultiByteToWideChar.
You may now start to think: I am not using wchar_t! I am using TCHAR!
Typically TCHAR is defined in the following way:
#ifdef UNICODE
typedef WCHAR TCHAR;
#else
typedef char TCHAR;
#endif
So you could do similar handling in your conversion code:
template<int N>
bool GetChar(LPCTSTR var, char (&out)[N]){
#ifdef UNICODE
return WideCharToMultiByte (CP_ACP, 0, var, -1, out, N, NULL, NULL) != 0;
#else
return strcpy_s (out, var) == 0;
#endif
}
Note that GetChar returns true if the conversion succeeds and false otherwise.
Your code has told the compiler to convert var (which is a pointer) into a pointer to a character and then assign that converted value to id. The only thing it converts is the pointer value. It doesn't make any changes to the thing var points to, copy it, or convert it. So you haven't done anything to the string var points to.
It's not clear what you're trying to do. But your code doesn't really do anything but convert a pointer value without changing or affecting the thing pointed to in any way.
When you convert a LPCTSTR (a long pointer to a const tchar string) to a char*, you get a char* that points to a CTSTR (a const tchar string). What use is that? What sense does that make?
Most probably LPCTSTR is const wchar_t*. If you cast it to char* (which also drops const, so writing through the result would be undefined behaviour, since var could point to a string literal), the second byte of *var is zero for ASCII text (wchar_t is 16 bits under Visual Studio), and that byte is treated as '\0', which indicates the end of the string. So in the end you get only one char.
To convert LPCTSTR to char* you can use wcstombs, for example; see: Convert const wchar_t* to const char*.
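A minimal sketch of such a conversion using the standard std::wcsrtombs (the restartable sibling of wcstombs), assuming the current C locale can represent every character in the input. The function name narrow is hypothetical:

```cpp
#include <cstddef>
#include <cwchar>
#include <string>
#include <vector>

// Converts a wide string to a narrow one in the current C locale.
// Returns false for a null input or an unconvertible character.
bool narrow(const wchar_t* wide, std::string& out) {
    if (wide == nullptr) {
        return false;
    }
    std::mbstate_t state = std::mbstate_t();
    const wchar_t* src = wide;
    // First pass: a null destination only measures the required size.
    const std::size_t needed = std::wcsrtombs(nullptr, &src, 0, &state);
    if (needed == static_cast<std::size_t>(-1)) {
        return false;
    }
    std::vector<char> buffer(needed + 1);
    state = std::mbstate_t();
    src = wide;
    // Second pass: convert into a buffer that has room for the terminator.
    std::wcsrtombs(buffer.data(), &src, buffer.size(), &state);
    out.assign(buffer.data(), needed);
    return true;
}
```

Measuring first avoids the fixed-size buffer, at the cost of converting the string twice.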
Here's an easy solution I found based on other answers given here.
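char* GetChar(LPCTSTR var){
// static so the returned pointer stays valid after the call (not thread-safe)
static char id[30];
int i = 0;
// stop before overflowing id; the cast mangles any non-ASCII character
while (var[i] != '\0' && i < 29)
{
id[i] = (char)var[i];
i++;
}
id[i] = '\0';
return id;
}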
UPDATE
As mentioned in comments this is not a good way to solve this problem. But if someone has the same problem and cannot understand any other solution, this will help a bit.
Therefore I won't remove this answer.

how to convert char array to LPCTSTR

I have a function like this, which takes a variable number of arguments, constructs the string, and passes it to another function to print the log.
logverbose( const char * format, ... )
{
char buffer[1024];
va_list args;
va_start (args, format);
vsprintf (buffer,format, args);
va_end (args);
LOGWriteEntry( "HERE I NEED TO PASS buffer AS LPCTSTR SO HOW TO CONVERT buffer to LPCTSTR??");
}
Instead of using buffer[1024], is there any other way, since a log entry can be much bigger or much smaller? I am writing all of this in C++, so please let me know if there is a better way to do this.
You can probably just pass it:
LOGWriteEntry (buffer);
If you are using ancient memory models with windows, you might have to explicitly cast it:
LOGWriteEntry ((LPCTSTR) buffer);
Correction:
LPCTSTR is a Long Pointer to a Const TCHAR STRing. (I overlooked the TCHAR in my first answer.)
In a Unicode build you'll have to use the MultiByteToWideChar function to copy buffer to another buffer and pass that to the function:
wchar_t buf2 [1024];
// the last argument is the destination size in characters, not bytes
MultiByteToWideChar (CP_ACP, 0, buffer, -1, buf2, sizeof buf2 / sizeof buf2[0]);
LOGWriteEntry (buf2);
A good way to proceed might be from among these alternatives:
Design the logverbose function to use TCHAR rather than char; or
Find out if the logging API provides a char version of LOGWriteEntry, and use that alternative.
If no char version of LOGWriteEntry exists, extend that API by writing one. Perhaps it can be written as a cut-and-paste clone of LOGWriteEntry, with all TCHAR use replaced by char, and lower-level functions replaced by their ASCII equivalents. For example, if LOGWriteEntry happens to call the Windows API function ReportEvent, your LOGWriteEntryA version could call ReportEventA.
Really, in modern applications, you should just forget about char and just use wchar_t everywhere (compatible with Microsoft's WCHAR). Under Unicode builds, TCHAR becomes WCHAR. Even if you don't provide a translated version of your program (all UI elements and help text is English), a program which uses wide characters can at least input, process and output international text, so it is "halfway there".
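On the question's other point, the fixed buffer[1024]: a minimal sketch of sizing the buffer to the message, assuming a C++11 (or later) library whose std::vsnprintf reports the required length when given a null buffer. The function name format_message is hypothetical:

```cpp
#include <cstdarg>
#include <cstddef>
#include <cstdio>
#include <string>
#include <vector>

// Formats printf-style arguments into a string of exactly the needed size.
std::string format_message(const char* format, ...) {
    va_list args;
    va_start(args, format);

    // A va_list can only be walked once, so measure with a copy.
    va_list args_copy;
    va_copy(args_copy, args);
    const int needed = std::vsnprintf(nullptr, 0, format, args_copy);
    va_end(args_copy);

    if (needed < 0) {  // encoding or format error
        va_end(args);
        return std::string();
    }

    std::vector<char> buffer(static_cast<std::size_t>(needed) + 1);
    std::vsnprintf(buffer.data(), buffer.size(), format, args);
    va_end(args);
    return std::string(buffer.data(), static_cast<std::size_t>(needed));
}
```

The resulting std::string can then be converted or passed on as one of the answers above describes; unlike vsprintf into a fixed array, a long message cannot overrun the buffer.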

retrieving string from LPVOID

Can someone explain and help me out here please.
Let's say I have a function like this, where lpData holds a pointer to the data I want.
void foo(LPVOID lpData) {
}
What is the proper way to retrieve this? This works, but I get weird characters at the end:
void foo(LPVOID lpData) {
LPVOID *lpDataP = (LPVOID *)lpData;
char *charData = (char*)lpDataP;
//i log charData....
}
I would prefer to use strings, but I don't understand how to retrieve the data; I just get a null pointer error when I try to use string. lpData holds a pointer, right? (But my function takes lpData, not *lpData.) So it isn't working? Am I doing this all wrong?
string *datastring = reinterpret_cast<std::string *>(lpData);
is what I'm trying.
This works, but I get weird characters at the end
That means that your string isn't null-terminated—that is, it doesn't have a NUL byte (0) marking the end of the string.
C strings have to be null-terminated.* When you log a C string (char *), it keeps logging characters until it finds a NUL. If there wasn't one on the end of the string, it'll keep going through random memory until it finds one (or until you hit a page fault and crash). This is bad. And there's no way to fix it; once you lose the length, there's no way to get it back.
However, an unterminated string along with its length can be useful. Many functions can take the length alongside the char *, as an extra argument (e.g., the string constructor) or otherwise (e.g., width specifiers in printf format strings).
So, if you take the length, and only call functions that also take the length—or just make a null-terminated copy and use that—you're fine. So:
void foo(LPVOID lpData, int cchData) {
string sData(static_cast<const char *>(lpData), cchData);
// now do stuff with sData
}
Meanwhile, casting from LPVOID (aka void *, aka pointer-to-anything) to LPVOID * (aka void **, aka pointer to pointer-to-anything) to then cast to char * (pointer-to-characters) is wrong (and should be giving you a compiler warning in the second cast; if you're getting warnings and ignoring them, don't do that!). Also, it's generally better to use modern casts instead of C-style casts, and it's always better to be const-correct when there's no down side; it just makes things more explicit to the reader and safer in the face of future maintenance.
Finally:
string *datastring = reinterpret_cast<std::string *>(lpData);
This is almost certainly wrong.** The LPVOID is just pointing at a bunch of characters. You're saying you want to interpret those characters as if they were a string object. But a string object is some header information (maybe a length and capacity, etc.) plus a pointer to a bunch of characters. Treating one as the other is going to lead to garbage or crashes.***
* Yes, you're using C++, not C, but a char * is a "C string".
** If you actually have a string object that you've kept alive somewhere, and you stashed a pointer to that object in an LPVOID and have now retrieved it (e.g., with SetWindowLongPtr/GetWindowLongPtr), then a cast from LPVOID to string * would make sense. But I doubt that's what you're doing. (If you are, then you don't need the reinterpret_cast. The whole point of void * is that it's not interpreted, so there's nothing to reinterpret from. Just use static_cast.)
*** Or, worst of all, it may appear to work, but then lead to hard-to-follow crashes or corruption. Some standard C++ libraries use a special allocator to put the header right before the characters and return a pointer to the first character, so that a string can be used anywhere a char * can. Inside the string class, every method has to fudge the this pointer backward; for example, instead of just saying m_length it has to do something like static_cast<_string_header *>(this)[-1]->m_length. But the other way around doesn't work—if you just have a bunch of characters, not a string object, that fudge is going to read whatever bytes happened to be allocated right before the characters and try to interpret them as an integer, so you may end up thinking you have a string of length 0, or 182423742341241243.
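The two situations described in this answer can be sketched side by side. This is only an illustration with hypothetical function names; const void* and void* stand in for the Windows LPCVOID and LPVOID typedefs so that the sketch stays portable:

```cpp
#include <cstddef>
#include <string>

// Case 1: the pointer refers to raw characters that may lack a terminator,
// so the length has to travel with it.
std::string copy_chars(const void* data, std::size_t length) {
    return std::string(static_cast<const char*>(data), length);
}

// Case 2: the pointer really is the address of a live std::string object
// that the caller stored earlier; static_cast recovers it.
std::string& as_string(void* data) {
    return *static_cast<std::string*>(data);
}
```

Which of the two applies depends entirely on what the caller put behind the pointer. Using as_string on raw characters is the mistake the footnotes warn about.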
There are at least two ways:
void foo(LPVOID lpData)
{
char *charData = (char*)lpData;
//i log charData....
}
or
void foo(LPVOID lpData)
{
char *charData = static_cast<char*>(lpData);
//i log charData....
}

how to convert or cast CString to LPWSTR?

I tried to use this code:
USES_CONVERSION;
LPWSTR temp = A2W(selectedFileName);
but when I check the temp variable, just get the first character
thanks in advance
If I recall correctly, CString is typedef'd to either CStringA or CStringW, depending on whether you're building Unicode or not.
LPWSTR is a "Long Pointer to a Wide STRing" -- aka: wchar_t*
If you want to pass a CString to a function that takes LPWSTR, you can do:
some_function(LPWSTR str);
// if building in unicode:
some_function(selectedFileName);
// if building in ansi:
some_function(CA2W(selectedFileName));
// The better way, especially if you're building in both string types:
some_function(CT2W(selectedFileName));
HOWEVER LPWSTR is non-const access to a string. Are you using a function that tries to modify the string? If so, you want to use an actual buffer, not a CString.
Also, when you "check" temp, what do you mean? Did you try cout << temp? Because that won't work (cout has no overload for wide strings, so you will not see the text, and tools that read the data as narrow text show just the first character):
char uses one byte per character. wchar_t uses two bytes per character on Windows. For plain English text, the wide string holds the same byte values as the original string, but each character is padded with a zero byte. Since the null terminator is also a zero, anything that reads the data as narrow text (a poor debugger, or a narrow-character output routine) will show only the first character.
If you want to print a wide string to standard out, use wcout.
In short: you cannot. If you need a non-const pointer to the underlying character buffer of a CString object, you need to call GetBuffer.
If you need a const pointer, you can simply use static_cast<LPCWSTR>(selectedFileName).
I know this is a decently old question, but I had this same question and none of the previous answers worked for me.
This, however, did work for my unicode build:
LPWSTR temp = (LPWSTR)(LPCWSTR)selectedFileName;
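LPWSTR is a "Long Pointer to a Wide String". It is like wchar_t*.
CString strTmp = "temp";
wchar_t* szTmp;
// Unicode build assumed: strTmp converts implicitly to LPCWSTR
szTmp = new WCHAR[wcslen(strTmp) + 1];
wcscpy_s(szTmp, wcslen(strTmp) + 1, strTmp);
// release with delete[] szTmp when done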
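A portable sketch of the same idea (a separate, writable, null-terminated copy) without manual new and delete[]. It assumes the text is already available as a std::wstring, and the function name make_writable_copy is hypothetical:

```cpp
#include <string>
#include <vector>

// Returns a modifiable, null-terminated copy of text. The vector owns the
// memory, so nothing has to be freed by hand.
std::vector<wchar_t> make_writable_copy(const std::wstring& text) {
    std::vector<wchar_t> buffer(text.begin(), text.end());
    buffer.push_back(L'\0');
    return buffer;
}
```

On Windows, where LPWSTR is wchar_t*, the result of buffer.data() could be passed to a function that expects a modifiable wide string, for as long as the vector stays alive.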

CreateFileMapping() name

I'm creating a DLL that shares memory between different applications.
The code that creates the shared memory looks like this:
#define NAME_SIZE 4
HANDLE hSharedFile;
create(char[NAME_SIZE] name)
{
hSharedFile = CreateFileMapping(INVALID_HANDLE_VALUE, NULL, PAGE_READWRITE, 0, 1024, (LPCSTR)name);
(...) //Other stuff that maps the view of the file etc.
}
It does not work. However if I replace name with a string it works:
SharedFile = CreateFileMapping(INVALID_HANDLE_VALUE, NULL, PAGE_READWRITE, 0, 1024, (LPCSTR)"MY_TEST_NAME");
How can I get this to work with the char array?
I have a Java background, where you would just use String all the time. What is an LPCSTR? And does this relate to whether my MS VC++ project uses the Unicode or Multi-Byte character set?
I suppose you should increase the NAME_SIZE value.
Do not forget that the array must hold at least the number of chars + 1, to make room for the \0 char that marks the end of the string.
LPCSTR is a pointer to a constant null-terminated string of 8-bit Windows (ANSI) characters and defined as follows:
LPCSTR defined as typedef __nullterminated CONST CHAR *LPCSTR;
For example even if you have "Hello world" constant and it has 11 characters it will take 12 bytes in the memory.
If you are passing a string constant as an array you must add '\0' to the end like {'T','E','S','T', '\0'}
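The terminator rule can be enforced when the name is copied into a fixed-size array. A minimal sketch, with a hypothetical helper named copy_name that refuses any name that would not leave room for the trailing '\0':

```cpp
#include <cstddef>
#include <cstring>

// Copies name into dest only if the characters AND the terminator fit.
template <std::size_t N>
bool copy_name(char (&dest)[N], const char* name) {
    const std::size_t length = std::strlen(name);
    if (length + 1 > N) {
        return false;  // no room for the terminating '\0'
    }
    std::memcpy(dest, name, length + 1);  // copies the terminator too
    return true;
}
```

With an array of size 4, as in the question's NAME_SIZE, a four-character name such as "TEST" is rejected; an array of size 5 accepts it.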
If you look at the documentation, you'll find that most Win32 functions take an LPCTSTR, which represents a string of TCHAR. Depending on whether you use Unicode (the default) or ANSI, TCHAR will expand to either wchar_t or char. Also, LPCWSTR and LPCSTR explicitly represent Unicode and ANSI strings respectively.
When you're developing for Win32, in most cases, it's best to follow suit and use LPCTSTR wherever you need strings, instead of explicit char arrays/pointers. Also, use the TEXT("...") macro to create the correct kind of string literals instead of just "...".
In your case, though, I doubt this is causing the problem, since both your examples use only LPCSTR. You have also defined NAME_SIZE to be 4. Could it be that your array is too small to hold the string you want?