How do I convert a `CString` into a `CHAR *`? - c++

I have the following c++ code:
#include "stdafx.h"
#include <atlstr.h>
int _tmain(int argc, _TCHAR* argv[])
{
CString OFST_PATH;
TCHAR DIR_PATH[MAX_PATH];
GetCurrentDirectory(MAX_PATH, DIR_PATH);
OFST_PATH.Format(DIR_PATH);
CHAR *pOFST_PATH = (LPSTR)(LPCTSTR)OFST_PATH;
return 0;
}
I want to understand why the value of pOFST_PATH at the end of the program is "c". What did the (LPSTR)(LPCTSTR) cast of OFST_PATH do to the whole path that was stored in it?
When debugging, the watch window shows the full path in OFST_PATH but only "c" in pOFST_PATH.

CString and LPCTSTR are both based on TCHAR, which is wchar_t when UNICODE is defined (which it is, in your case, as I can tell by the value of argv in your debugger). When you do this:
(LPCTSTR)OFST_PATH
That works, because CString has a conversion operator to LPCTSTR. But with UNICODE defined, LPCTSTR is LPCWSTR, a.k.a. wchar_t const*. It points to an array of UTF-16 characters. The first character in that array is L'c' (the wide-character version of 'c'). On x86/x64, which is little-endian, the bytes of L'c' look like this in memory: 0x63 0x00. That's the ASCII code for the letter 'c', followed by a zero. So the conversion of your CString to LPCTSTR is valid. Your next conversion, however:
(LPSTR)(LPCTSTR)OFST_PATH
That's not valid. LPSTR is char*, so you are treating a wchar_t const* as if it were a char*. Your debugger assumes that when it sees a char*, it is looking at a null-terminated narrow character string. As shown above, the bytes of the first character are the ASCII value for the letter 'c' followed by a zero, so the debugger sees a null-terminated string consisting of just the letter 'c'.
The moral of the story is: don't use C-style casts unless you understand what they do and whether they are appropriate.
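A portable sketch of the same effect, assuming a little-endian CPU (x86/x64) and using char16_t as a stand-in for Windows' 16-bit wchar_t (wchar_t is 32 bits on Linux and macOS, so it would not reproduce the Windows layout there). The helper name LengthSeenAsNarrow is hypothetical:

```cpp
#include <cstddef>
#include <cstring>

// Returns what strlen() reports when a UTF-16 string is viewed through a
// char pointer. char16_t stands in for Windows' 16-bit wchar_t.
std::size_t LengthSeenAsNarrow(const char16_t* wide) {
    // Like (LPSTR)(LPCTSTR)OFST_PATH, this changes only the pointer type;
    // the bytes in memory are not converted.
    const char* narrow = reinterpret_cast<const char*>(wide);
    // On a little-endian CPU, u'c' is stored as 0x63 0x00, so strlen()
    // stops at the second byte.
    return std::strlen(narrow);
}
```

For a path such as u"c:\\projects", the function returns 1, which matches the single "c" the debugger shows. Reading the bytes through a char pointer is itself allowed; what goes wrong is only the interpretation of those bytes.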

Related

Converting a TCHAR to wstring

TCHAR path[_MAX_PATH+1];
std::wstring ws(&path[0], sizeof(path)/sizeof(path[0]));
or
TCHAR path[_MAX_PATH];
std::wstring ws(&path[0]);
When converting a TCHAR array to a wstring, are both of these correct?
I'm asking just for clarification; I'm not sure whether I'm converting it correctly.
The code is problematic in several ways.
First, std::wstring is a string of wchar_t (aka WCHAR) while TCHAR may be either CHAR or WCHAR, depending on configuration. So either use WCHAR and std::wstring, or TCHAR and std::basic_string<TCHAR> (remembering that std::wstring is just a typedef for std::basic_string<WCHAR>).
Second, there is a problem with the string length. This snippet:
WCHAR path[_MAX_PATH+1];
std::wstring ws(&path[0], sizeof(path)/sizeof(path[0]));
will create a string of length exactly _MAX_PATH + 1, plus a terminating null (and likely with embedded nulls, which C++ strings allow). That is likely not what you want.
The other one:
WCHAR path[_MAX_PATH+1];
...
std::wstring ws(&path[0]);
expects that path holds a null-terminated string by the time ws is constructed, and copies it into ws. If path is not null-terminated, undefined behavior results (usually either garbage in ws or an access violation).
If your path is either null-terminated or holds a string of exactly _MAX_PATH characters, I suggest using it like this:
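A small, portable sketch of the difference between the two constructors, using a hypothetical 8-element buffer instead of _MAX_PATH so the sizes are easy to check:

```cpp
#include <cstddef>
#include <string>

// Pointer + count constructor: copies every element of the array,
// including the zero-filled tail after "abc".
std::size_t SizeFromWholeArray() {
    wchar_t path[8] = L"abc";  // remaining 5 elements are zero-initialized
    std::wstring ws(&path[0], sizeof(path) / sizeof(path[0]));
    return ws.size();  // embedded nulls are part of the string
}

// Pointer-only constructor: copies up to the first L'\0'.
std::size_t SizeFromNullTerminated() {
    wchar_t path[8] = L"abc";
    std::wstring ws(&path[0]);
    return ws.size();
}
```

SizeFromWholeArray() returns 8 and SizeFromNullTerminated() returns 3.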
WCHAR path[_MAX_PATH+1];
... // fill up to _MAX_PATH characters
path[_MAX_PATH] = L'\0'; // ensure it is null-terminated
std::wstring ws(path); // construct from a null-terminated string
Or if you know the actual length, just pass it:
WCHAR path[_MAX_PATH];
size_t length = fill_that_path(path);
std::wstring ws(path, length); // length shouldn’t include the null terminator, if any
See the std::basic_string constructor docs (it's the same for string and wstring except for the character type).
It depends on the content of path. If it is an arbitrary char array that can contain null characters, then you should use the first version, which explicitly gives the size. But if it contains a null-terminated string (and only unused values after the first null), then you should use the second one, which stops at the terminating null character.
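If you cannot guarantee either case, one option (a sketch, not something the answer above proposes) is to search for the first null within the array bounds with std::find, so the copy never reads past the end of the array. FromBuffer is a hypothetical name:

```cpp
#include <algorithm>
#include <cstddef>
#include <string>

// Builds a wstring from a fixed-size buffer that may or may not be
// null-terminated: copies up to the first L'\0', or all N elements if
// there is none. Never reads past the end of the array.
template <std::size_t N>
std::wstring FromBuffer(const wchar_t (&buf)[N]) {
    const wchar_t* end = std::find(buf, buf + N, L'\0');
    return std::wstring(buf, end);
}
```

Because N is deduced from the array type, this only works on real arrays, not on a pointer that has already decayed.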

Only first character is assigned converting LPCTSTR to char*

I'm completely new to C++. In my program there's a function that has to take an LPCTSTR as a parameter. I want to convert it into a char*. What I tried is as follows:
char* GetChar(LPCTSTR var){
char* id = (char*)var;
.....
}
But while debugging I noticed that only first letter of var is assigned to id.
What have I done wrong?
(I tried various answers in StackOverflow about converting LPCTSTR to char* before coming to this solution. None of them worked for me.)
UPDATE
What I want is for the full string pointed to by var to be treated as a char*.
It is much more useful to pick one character set (wchar_t or char) and stick to it in your application, since trying to use TCHAR to support both may cause you some headaches. Today you can safely use wchar_t (or WCHAR, since from the types you are using, I suspect that you are using the Windows headers).
The problem you have is that casting a pointer has no effect on the data it points to. Typically (on Windows) wchar_t is 2 bytes in size, while char is 1 byte. When a character that fits in a char is stored in a wchar_t, the second byte of the wchar_t is set to \0. So when you treat a null-terminated string of wchar_ts as a string of chars, the reading or printing function reaches that \0 byte right after the first character and assumes it is the end of the string. (The real terminating \0 of a wchar_t string is 2 bytes long.)
For example, the string
LPCWSTR test = L"Hi!";
is stored in memory as:
48 00 69 00 21 00 00 00
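A portable way to check that layout yourself, assuming a little-endian machine and using char16_t in place of Windows' 2-byte wchar_t; RawBytes is a hypothetical helper:

```cpp
#include <cstddef>
#include <cstring>
#include <vector>

// Copies the raw bytes of a null-terminated UTF-16 string,
// including the 2-byte terminator.
std::vector<unsigned char> RawBytes(const char16_t* s) {
    std::size_t units = 0;
    while (s[units] != u'\0') ++units;
    ++units;  // include the terminating null unit
    std::vector<unsigned char> bytes(units * sizeof(char16_t));
    std::memcpy(bytes.data(), s, bytes.size());
    return bytes;
}
```

On a little-endian machine, RawBytes(u"Hi!") yields exactly the eight bytes listed above.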
If you want to convert the wchar_t version of a string to the char version, or vice versa, there are functions that do the conversion. Since you are probably using the Windows headers (judging by the LPCTSTR define), those functions are WideCharToMultiByte and MultiByteToWideChar.
You may now start to think: I am not using wchar_t! I am using TCHAR!
Typically TCHAR is defined in the following way:
#ifdef UNICODE
typedef WCHAR TCHAR;
#else
typedef char TCHAR;
#endif
So you could do similar handling in your conversion code:
template<int N>
bool GetChar(LPCTSTR var, char (&out)[N]){
#ifdef UNICODE
return WideCharToMultiByte (CP_ACP, 0, var, -1, out, N, NULL, NULL) != 0;
#else
return strcpy_s (out, var) == 0;
#endif
}
Note: GetChar returns true if the conversion succeeds and false otherwise.
Your code told the compiler to convert var (which is a pointer) into a pointer to char and then assign that converted value to id. The only thing converted is the pointer value. Nothing is done to the data var points to: it is not changed, copied, or converted. So you haven't done anything to the string var points to.
It's not clear what you're trying to do. But your code doesn't really do anything but convert a pointer value without changing or affecting the thing pointed to in any way.
When you convert an LPCTSTR (a long pointer to a const TCHAR string) to a char*, you get a char* that still points to the same const TCHAR string. What use is that? What sense does that make?
Most probably LPCTSTR is const wchar_t* in your build. If you cast it to char* (which also casts away const; writing through that pointer would be undefined behaviour if var points to a string literal), the second byte in memory of *var, its high-order byte (wchar_t is 16 bits under Visual Studio), is zero, so it is treated as '\0', which marks the end of the string. So in the end you get only one char.
To convert LPCTSTR to char*, you can use wcstombs, for example; see "Convert const wchar_t* to const char*".
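A minimal sketch of the wcstombs approach, assuming the input is a wide string (that is, LPCTSTR is const wchar_t* in a Unicode build). It uses the conversion rules of the current C locale, so characters that locale cannot represent make it fail. ToNarrow is a hypothetical name, and MSVC may warn that wcstombs is deprecated in favour of wcstombs_s:

```cpp
#include <cstddef>
#include <cstdlib>
#include <string>
#include <vector>

// Converts a null-terminated wide string to a narrow multibyte string
// using the current C locale (the "C" locale unless setlocale was called).
// Returns an empty string if some character cannot be represented.
std::string ToNarrow(const wchar_t* wide) {
    // First call with a null destination: returns the number of bytes
    // needed, excluding the terminating null.
    std::size_t needed = std::wcstombs(nullptr, wide, 0);
    if (needed == static_cast<std::size_t>(-1)) {
        return std::string();  // unconvertible character
    }
    std::vector<char> buffer(needed + 1);  // + 1 for the terminator
    std::wcstombs(buffer.data(), wide, buffer.size());
    return std::string(buffer.data(), needed);
}
```

Keep the returned std::string alive for as long as you use the const char* from its c_str(); returning a std::string avoids handing back a pointer into a buffer that nobody owns.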
Here's an easy solution I found based on other answers given here.
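std::string GetChar(LPCTSTR var){
std::string id; // owns its storage, so nothing dangles after return
int i = 0;
while (var[i] != '\0')
{
id += (char)var[i]; // keeps only the low byte: correct for ASCII text only
i++;
}
return id;
}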
UPDATE
As mentioned in comments this is not a good way to solve this problem. But if someone has the same problem and cannot understand any other solution, this will help a bit.
Therefore I won't remove this answer.

Check length char[] before converting to wstring()

I have an API function that takes a pointer to a char array. The calling function is out of my control. The array is dynamic, but I still need some checking:
extern "C" int __stdcall calcW2(LPWSTR foo)
If somebody make a call with
char foo[5000];
LPSTR lpfoo2 = foo;
calcW2(lpfoo2 );
I understand that I need to make some checks. I can test for nullptr, but I also want to check the length, so that the char array has some validity. How is that best done, in the safest way, for a string of 0 to 2500 chars? Do I need to check for anything more?
if(foo != nullptr)
{
//Size checking
//size_t newsize = strlen(SerialNumber) + 1 not good?
std::wstring test(foo);
}
You missed one important point. The function signature says LPWSTR not LPSTR. This means that the function expects (or should expect) to receive wchar_t[] not char[]. See https://msdn.microsoft.com/en-us/library/cc230355.aspx.
I mean:
extern "C" int __stdcall calcW2(LPWSTR foo) <--- LP-W-STR
char foo[5000];
LPSTR lpfoo2 = foo; <--- LP-STR
calcW2(lpfoo2 ); <--- LP-STR passed into LP-W-STR ??
that should not compile. Argument types are wrong.
If you change the array to wchar_t[] and it starts to fail to compile, then most probably you have some _UNICODE #defines set wrong. In WINAPI and similar, many functions have dual definitions. When "UNICODE" flag is set, they take LPWSTR, but when the flag is cleared, the headers switch them to taking LPSTR. So if you see that it should be LPWSTR and you want it to be LPWSTR and it insists on being LPSTR, then you either messed up the function names, or UNICODE flag (or the header you have is simply incorrect).
char and wchar_t are different. Simplifying, char is "single-byte" and wchar_t is "two-byte" (on Windows). Both use '\0' as the end-of-string marker, but in wchar_t that's actually two zero bytes, since it's two bytes per character. Also, in wchar_t[], plain ASCII data isn't laid out like a|b|c|d|e|f; on little-endian Windows it's a|0|b|0|c|0|d|0|e|0|f|0. That's why strlen cannot work on 16-bit encoded data properly: it picks up the zero byte of the first character as the end of the string. Having wchar_t data forcibly packed into char[] is plainly wrong, or at least highly misleading and error-prone.
That's why you should use wcslen instead, which takes a wchar_t* instead of a char*.
This is a general 'rule'. For any function working on char (strlen, strcat, strcmp, ...) you should be able to find the corresponding wide function (wcslen, wcscat, wcscmp, ...). Sometimes there are underscores in the names; search the docs. Don't mix up char types. They're not just byte arrays: they carry semantics, and usually if types are named differently, there's a reason for that.
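A sketch of the length check the question asks about, assuming the parameter really is a wide string (LPWSTR) and a limit such as 2500 characters. It stops reading after maxChars + 1 elements, but C++ cannot tell how large the caller's buffer really is, so this still trusts the caller to pass either a terminated string or a buffer at least that long. HasBoundedLength is a hypothetical name:

```cpp
#include <cstddef>

// Returns true if s is non-null and a terminator is found within the first
// maxChars + 1 elements, i.e. the string has at most maxChars characters.
// On success, stores the length in *outLen (if outLen is non-null).
bool HasBoundedLength(const wchar_t* s, std::size_t maxChars, std::size_t* outLen) {
    if (s == nullptr) return false;
    for (std::size_t i = 0; i <= maxChars; ++i) {
        if (s[i] == L'\0') {
            if (outLen != nullptr) *outLen = i;
            return true;
        }
    }
    return false;  // no terminator within the limit: too long or unterminated
}
```

Inside calcW2 this would be called as HasBoundedLength(foo, 2500, &len) before constructing a std::wstring from foo.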

Different char type in windows programming

Recently, I came across some tasks involving chars/strings on the Windows platform. I see that there are different char types like char, TCHAR, WCHAR, LPSTR, LPWSTR, LPCTSTR. Can someone give me some information about them, and how to use them like the regular char and char *? I am confused about these types.
They are documented on MSDN. Here's a few:
TCHAR: A WCHAR if UNICODE is defined, a CHAR otherwise.
WCHAR: A 16-bit Unicode character.
CHAR: An 8-bit Windows (ANSI) character.
LPTSTR: An LPWSTR if UNICODE is defined, an LPSTR otherwise.
LPSTR: A pointer to a null-terminated string of 8-bit Windows (ANSI) characters.
LPWSTR: A pointer to a null-terminated string of 16-bit Unicode characters.
LPCTSTR: An LPCWSTR if UNICODE is defined, an LPCSTR otherwise.
LPCWSTR: A pointer to a constant null-terminated string of 16-bit Unicode characters.
LPCSTR: A pointer to a constant null-terminated string of 8-bit Windows (ANSI) characters.
Note that some of these types map to something different depending on whether UNICODE has been #define'd. By default, they resolve to the ANSI versions:
#include <windows.h>
// LPCTSTR resolves to LPCSTR
When you #define UNICODE before #include <windows.h>, they resolve to the Unicode versions.
#define UNICODE
#include <windows.h>
// LPCTSTR resolves to LPCWSTR
They are in reality typedefs to some fundamental types in the C and C++ language. For example:
typedef char CHAR;
typedef wchar_t WCHAR;
On compilers like Visual C++, there's really no difference between an LPCSTR and a const char* or a LPCWSTR and a const wchar_t* . This might differ between compilers however, which is why these data types exist in the first place!
It's sort of like the Windows API equivalent of <cstdint> or <stdint.h>. The Windows API has bindings in other languages, and having data types with a known size is useful, if not required.
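To see how the UNICODE switch works without <windows.h>, here is an illustrative re-creation of the mechanism with hypothetical names (MY_TCHAR, MY_TEXT, MY_LPCTSTR, my_tcslen). The real headers use TCHAR, TEXT/_T, LPCTSTR and _tcslen, and their exact definitions differ:

```cpp
#include <cstddef>
#include <cstring>
#include <cwchar>

// Hypothetical stand-ins for TCHAR, TEXT() and _tcslen.
#ifdef UNICODE
typedef wchar_t MY_TCHAR;
#define MY_TEXT(s) L##s  // pastes L in front: MY_TEXT("x") becomes L"x"
inline std::size_t my_tcslen(const MY_TCHAR* s) { return std::wcslen(s); }
#else
typedef char MY_TCHAR;
#define MY_TEXT(s) s     // plain narrow literal
inline std::size_t my_tcslen(const MY_TCHAR* s) { return std::strlen(s); }
#endif

typedef const MY_TCHAR* MY_LPCTSTR;  // the "LPCTSTR" of this sketch
```

Compiled without UNICODE, MY_TCHAR is char and MY_TEXT("Example.") is a plain narrow literal; define UNICODE before the block and the same source uses wchar_t and L"Example." instead.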
char is the standard 8-bit character type.
On Windows, wchar_t is a 16-bit type holding UTF-16 code units, the native character type of the Windows NT family. WCHAR is another name for it.
TCHAR can be either one, depending on your compiler settings. Most of the time in a modern program it's wchar_t.
The P and LP prefixes denote pointers to the different types. The L is legacy (it stands for Long pointer, a distinction from 16-bit Windows) and became meaningless with 32-bit Windows; you still see it quite a bit, though.
The C after the prefix stands for const.
TCHAR, LPTSTR and LPCTSTR are all generalized macros that resolve to either regular character strings or wide character strings depending on whether or not UNICODE is defined. CHAR, LPSTR and LPCSTR are regular character strings. WCHAR, LPWSTR and LPCWSTR are wide character strings. TCHAR, CHAR and WCHAR represent a single character. LPTSTR, LPSTR and LPWSTR are "Long Pointer to STRing". LPCTSTR, LPCSTR and LPCWSTR are constant string pointers.
Let me try to shed some light (I've blogged this on my site at https://www.dima.to/blog/?p=190 in case you want to check it out):
#include "stdafx.h"
#include "Windows.h"
int _tmain(int argc, _TCHAR* argv[])
{
/* Quick Tutorial on Strings in Microsoft Visual C++
The Unicode Character Set and Multibyte Character Set options in MSVC++ provide a project with two flavours of string encodings. They will use different encodings for characters in your project. Here are the two main character types in MSVC++ that you should be concerned about:
1. char <-- char characters use an 8-bit character encoding (8 bits = 1 byte) according to MSDN.
2. wchar_t <-- wchar_t uses a 16-bit character encoding (16 bits = 2 bytes) according to MSDN.
From above, we can see that the size of each character in our strings will change depending on our chosen character set.
WARNING: Do NOT assume that any given character you append to either a Multibyte or Unicode string will always take up a single-byte or double-byte space defined by char or wchar_t! That is up to the encoding used. Sometimes, several units need to be combined to form the character the user wants in their string. For example: Multibyte character strings take up a byte per character inside of the string, but that does not mean that a given byte will always produce the character you desire at a particular location, because even multibyte characters may take up more than a single byte. MSDN says it may take up TWO character spaces to produce a single multibyte-encoded character: "A multibyte-character string may contain a mixture of single-byte and double-byte characters. A two-byte multibyte character has a lead byte and a trail byte."
WARNING: Do NOT assume that Unicode contains every character for every language. For more information, please see http://stackoverflow.com/questions/5290182/how-many-bytes-takes-one-unicode-character.
Note: The ASCII Character Set is a subset of both Multibyte and Unicode Character Sets (in other words, both of these flavours encompass ASCII characters).
Note: You should always use Unicode for new development, according to MSDN. For more information, please see http://msdn.microsoft.com/en-us/library/ey142t48.aspx.
*/
// Strings that are Multibyte.
LPSTR a; // Regular Multibyte string (synonymous with char *).
LPCSTR b; // Constant Multibyte string (synonymous with const char *).
// Strings that are Unicode.
LPWSTR c; // Regular Unicode string (synonymous with wchar_t *).
LPCWSTR d; // Constant Unicode string (synonymous with const wchar_t *).
// Strings that take on either Multibyte or Unicode depending on project settings.
LPTSTR e; // Multibyte or Unicode string (can be either char * or wchar_t *).
LPCTSTR f; // Constant Multibyte or Unicode string (can be either const char * or const wchar_t *).
/* From above, it is safe to assume that the pattern is as follows:
LP: Specifies a long pointer type (this is synonymous with prefixing this type with a *).
W: Specifies that the type is of the Unicode Character Set.
C: Specifies that the type is constant.
T: Specifies that the type has a variable encoding.
STR: Specifies that the type is a string type.
*/
// String literal macros (literals are constant, so they are assigned to the constant pointer types):
f = _T("Example."); // A string literal that is either Multibyte or Unicode depending on project settings.
f = TEXT("Example."); // Either Multibyte or Unicode depending on project settings (same as _T).
d = L"Example."; // A Unicode string literal.
b = "Example."; // A Multibyte string literal.
return 0;
}

MFC: How do I convert DWORD and BYTE to LPCTSTR in order to display them in a MessageBox

I'm using VS2005 with the "Use Unicode Character Set" option:
typedef unsigned char BYTE;
typedef unsigned long DWORD;
BYTE m_bGeraet[0xFF];
DWORD m_dwAdresse[0xFF];
How do I make this code work?
MessageBox (m_bGeraet[0], _T("Display Content"));
MessageBox (m_dwAdresse[0], _T("Display Content"));
It looks like you might need some help with the C language itself, and I recommend you find a beginner's book on C that is not about Windows programming.
MessageBox() only displays C-style strings: arrays of characters (TCHAR, which is wchar_t in your Unicode build) that contain a character with value 0. This zero character is the NUL character, and such strings are said to be "NUL-terminated" or "zero-terminated." Only the characters before the NUL are displayed when the string is printed, or copied when the string is concatenated. However, if there is no NUL character in the array, the string is not properly terminated, and an attempt to display it could lead to a crash or to "garbage" being displayed, as in: "Can I have a beer?#BT&I10)aaX?".
The lpszText and lpszCaption arguments to MessageBox() expect LPCTSTR values, which are pointers to this type of string.
If you attempt to pass a BYTE or DWORD instead, the code will not compile, because there is no implicit conversion from an integer to a pointer. If you force it with a cast, the value will be treated as an address, MessageBox() will attempt to read memory at that address, and an access violation will most likely occur.
One solution to this problem is to allocate a buffer of type TCHAR and use _stprintf_s to transcribe your data values to string representations.
For example:
TCHAR output_buffer[1024];
_stprintf_s(output_buffer, _countof(output_buffer), _T("Geraet = 0x%02X"), m_bGeraet[0]);
MessageBox(output_buffer, _T("Message from me:"), MB_OK); // CWnd::MessageBox inside an MFC window class
This would display a message box reading something like "Geraet = 0x35".
If you have to keep a narrow char buffer, convert it to a wide string before displaying it in a Unicode build, for example with mbstowcs.
// Easy way for a BYTE:
CString sTemp;
sTemp.Format(_T("my byte = %d"), bySomeVal);
MessageBox(sTemp);
// For a DWORD:
sTemp.Format(_T("Dword is %lu"), dwSomeVal);
MessageBox(sTemp);
If you are using MessageBox, I would suggest something like AfxMessageBox instead.
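For a Unicode build, the numbers have to be formatted as wide text. A portable sketch with std::swprintf, using unsigned char and unsigned long to stand in for BYTE and DWORD; FormatByte and FormatDword are hypothetical names (as is the "Adresse" label), and in MFC the resulting string's c_str() could be passed to MessageBox or AfxMessageBox:

```cpp
#include <cstdio>
#include <cwchar>
#include <string>

// Formats a BYTE-sized value as wide hexadecimal text.
std::wstring FormatByte(unsigned char value) {
    wchar_t buf[32];
    std::swprintf(buf, sizeof(buf) / sizeof(buf[0]), L"Geraet = 0x%02X",
                  static_cast<unsigned>(value));
    return buf;
}

// Formats a DWORD-sized value as wide decimal text.
std::wstring FormatDword(unsigned long value) {
    wchar_t buf[32];
    std::swprintf(buf, sizeof(buf) / sizeof(buf[0]), L"Adresse = %lu", value);
    return buf;
}
```

FormatByte(0x35) produces L"Geraet = 0x35". Passing the buffer size to swprintf keeps the output bounded, which is the same idea as _stprintf_s with _countof.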