Difference between char* and wchar_t* - c++

I am new to MFC. I am trying to write a simple MFC application and I'm getting confused in some places. For example, SetWindowText has two APIs, SetWindowTextA and SetWindowTextW; one takes a char * and the other accepts a wchar_t *.
What is the use of char * and wchar_t *?

char is used for the so-called ANSI family of functions (the function name typically ends with A). These interpret text in the system's current code page, which is often loosely, and inaccurately, described as ASCII.
wchar_t is used for the newer Unicode (or wide) family of functions (the function name typically ends with W), which use the UTF-16 encoding. It is very similar to UCS-2, but not identical: a character above U+FFFF does not fit in one 16-bit unit and is encoded as a surrogate pair of two wchar_t values, which can be very confusing.
Converting one to the other is not a simple task. You will need to use something like MultiByteToWideChar, which requires you to know and provide the code page of the input ANSI string.
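The surrogate-pair rule mentioned above can be sketched in portable C++. This is a minimal illustration, not part of the Windows API: the function name encode_utf16 is made up for this example, and it assumes the input is a valid Unicode code point (not itself a surrogate, and no larger than U+10FFFF).

```cpp
#include <vector>

// Hypothetical helper: encode one Unicode code point as UTF-16 code units.
// Assumes cp is a valid code point (not a surrogate, <= 0x10FFFF).
std::vector<char16_t> encode_utf16(char32_t cp) {
    std::vector<char16_t> out;
    if (cp < 0x10000) {
        // Fits in a single 16-bit unit (this is the part UCS-2 could express).
        out.push_back(static_cast<char16_t>(cp));
    } else {
        // Needs two units: a high surrogate followed by a low surrogate.
        cp -= 0x10000;
        out.push_back(static_cast<char16_t>(0xD800 + (cp >> 10)));
        out.push_back(static_cast<char16_t>(0xDC00 + (cp & 0x3FF)));
    }
    return out;
}
```

For example, encode_utf16(0x41) yields one unit, while encode_utf16(0x1F600) yields the two units 0xD83D and 0xDE00. That is why the number of wchar_t elements in a Windows string is not always the number of characters.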

On Windows, APIs that take char * use the current code page whereas wchar_t * APIs use UTF-16. As a result, you should always use wchar_t on Windows. A recommended way to do this is to:
// Be sure to define this BEFORE including <windows.h>
#define UNICODE 1
#include <windows.h>
When UNICODE is defined, APIs like SetWindowText will be aliased to SetWindowTextW and can therefore be used safely. Without UNICODE, SetWindowText will be aliased to SetWindowTextA and therefore cannot be used without first converting to the current code page.
However, there's no good reason to use wchar_t when you are not calling Windows APIs, since its portable functionality is not useful, and its useful functionality is not portable (wchar_t is UTF-16 only on Windows, on most other platforms it is UTF-32, what a total mess.)
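To see the portability problem for yourself, a small sketch like the following reports how wide wchar_t is on the platform it is compiled for. The helper names wchar_bits and utf16_units are invented for this example; char16_t and the u"..." literal prefix are standard C++11 and are one way to get 16-bit units regardless of platform.

```cpp
#include <climits>
#include <cstddef>
#include <string>

// Hypothetical helper: width of wchar_t in bits on the current platform.
// Typically 16 with Visual C++ on Windows and 32 on Linux and macOS.
constexpr std::size_t wchar_bits() {
    return sizeof(wchar_t) * CHAR_BIT;
}

// Hypothetical helper: number of 16-bit units in a char16_t string.
// A std::u16string uses 16-bit units regardless of how wide wchar_t is.
inline std::size_t utf16_units(const std::u16string& s) {
    return s.size();
}
```

Code that has to run on more than one platform can use char16_t or plain UTF-8 in char internally and convert to wchar_t only at the point where a Windows API is called.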

SetWindowTextA takes char*, which is a pointer to ANSI strings.
SetWindowTextW takes wchar_t*, which is a pointer to "wide" strings (Unicode).
SetWindowText has been defined (#define) to either of these in header Windows.h based on the type of application you are building. If you are building a UNICODE build then your code will automatically use SetWindowTextW.
SetWindowTextA is there primarily to support legacy code, which needs to be built as SBCS (Single byte character set).
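The aliasing mechanism can be imitated without windows.h. The sketch below is only an analogy: DescribeTextA, DescribeTextW, DescribeText, MY_TEXT and MY_UNICODE are all made-up names standing in for SetWindowTextA, SetWindowTextW, SetWindowText, TEXT and UNICODE.

```cpp
#include <string>

// Hypothetical stand-ins for an A/W function pair such as SetWindowTextA/W.
inline std::string DescribeTextA(const char*)    { return "narrow (char)"; }
inline std::string DescribeTextW(const wchar_t*) { return "wide (wchar_t)"; }

// The header picks one of the two at preprocessing time, in the same way
// that windows.h is described as doing for the real functions.
#ifdef MY_UNICODE
  #define DescribeText DescribeTextW
  #define MY_TEXT(s) L##s   // wide literal
#else
  #define DescribeText DescribeTextA
  #define MY_TEXT(s) s      // narrow literal
#endif

// Calling code is written once against the generic name.
inline std::string which_variant() {
    return DescribeText(MY_TEXT("hello"));
}
```

Compiled as is, which_variant() reports the narrow variant; compiled with MY_UNICODE defined before this code, the same source line calls the wide variant instead.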

char* : It means that this is a pointer to data of type char.
Example
// Regular char
char aChar = 'a';
// Pointer to char
char* aPointer = new char;
*aPointer = 'a';
// Pointer to an array of 10 chars
char* anArray = new char[ 10 ];
*anArray = 'a';
anArray[ 1 ] = 'b';
// An actual array of 10 chars (not a pointer)
char anotherArray[ 10 ];
*anotherArray = 'a';
anotherArray[ 1 ] = 'b';
wchar_t* : wchar_t is defined such that any locale's char encoding can be converted to a wchar_t representation where every wchar_t represents exactly one codepoint.

Related

What is the difference and the relationship of char and CString [duplicate]

Can someone explain me the difference and the relationship between the char * and CString?... Thanks.
There are few important differences.
char * is a pointer to char. Generally you can't say if it a single char, or a beginning of a string, and what is the length. All those things are dictated by program logic and some conventions, i.e. standard C functions, like to use const char * as inputs. You need to manage memory allocated for strings manually.
CString is a typedef rather than a class of its own. Depending on your program's compilation options, it resolves to either the CStringA or the CStringW class. There are differences and similarities.
The difference is that CStringA operates on non-Unicode data (similar to char*), and CStringW is a Unicode string (similar to wchar_t*).
Both classes, however, are equivalent in the aspect of string manipulation and storage management. They are closer to the standard C++ std::string and std::wstring classes.
Apart from that, both CStringA and CStringW provide the capability to convert strings to and from Unicode form.
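Since CString needs MFC/ATL, the contrast between a raw char * and a managing string class is sketched here with std::string, which the answer above names as the closest standard equivalent. The function names join_manual and join_managed are invented for this illustration.

```cpp
#include <cstddef>
#include <cstring>
#include <string>

// Raw char*: the caller has to know the length, allocate room for the
// terminator, and remember to delete[] the result.
inline char* join_manual(const char* a, const char* b) {
    const std::size_t len = std::strlen(a) + std::strlen(b);
    char* out = new char[len + 1];   // +1 for the terminating '\0'
    std::strcpy(out, a);
    std::strcat(out, b);
    return out;                      // ownership passes to the caller
}

// String class: length tracking and storage are handled by the class.
inline std::string join_managed(const std::string& a, const std::string& b) {
    return a + b;
}
```

With the managed version nothing has to be freed by hand, which is the same convenience CStringA and CStringW provide in MFC code.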
a CString will be an array of char and a char* will be a pointer into the array of char with which you can iterate the characters of the string.
Actually from MSDN:
CString is based on the TCHAR data type. If the symbol _UNICODE is defined for your program, TCHAR is defined as type wchar_t, a 16-bit character type; otherwise, it is defined as char, the normal 8-bit character type. Under Unicode, then, CString objects are composed of 16-bit characters. Without Unicode, they are composed of 8-bit char type.
CString is a class packed with a lot of functionality (see MSDN).
char * is just a regular c++ data type.
CString is used mostly in MFC applications.
CString is a sequence of TCHARs rather than chars. The main difference is that if UNICODE is defined, a CString will be a sequence of wchar_t. Depending on that macro, CString is typedef-ed to either CStringA or CStringW. Another major difference is that CString is a class while char* is simply a pointer to a character.
Depending on the type of TCHAR, CString can be either CStringA or CStringW.
That said, CString is a wrapper over an array of chars, that enables you to easily treat that array of chars as a string, and operate on it in manners relevant to the string type.
For the relationship between them, here is something that illustrates it easily. You can convert between char * and CString like this:
CString str = "abc"; // const char[4] (or char *) to CString
and
const char * p = str.GetString(); // CString to const char * (in a non-Unicode build)
A CString is a class and provides lots of functionality that a char * doesn't. A char * is just a pointer to a char or to an array of chars.
A CString contains a buffer that is roughly the same as a char * : LPTSTR GetBuffer( int nMinBufLength );
For the difference between LPTSTR and char * go here and here
CString is a wrapper class around a char* to provide some useful additional functions and to hide the memory allocation/deallocation from the user.
There is not much difference in performance terms so if you are using MFC classes, you might as well use a CString.

CreateFileMapping() name

I'm creating a DLL that shares memory between different applications.
The code that creates the shared memory looks like this:
#define NAME_SIZE 4
HANDLE hSharedFile;
void create(char name[NAME_SIZE])
{
hSharedFile = CreateFileMapping(INVALID_HANDLE_VALUE, NULL, PAGE_READWRITE, 0, 1024, (LPCSTR)name);
(...) //Other stuff that maps the view of the file etc.
}
It does not work. However if I replace name with a string it works:
hSharedFile = CreateFileMapping(INVALID_HANDLE_VALUE, NULL, PAGE_READWRITE, 0, 1024, (LPCSTR)"MY_TEST_NAME");
How can I get this to work with the char array?
I have a Java background, where you would just use String all the time. What is an LPCSTR? And does this relate to whether my MS VC++ project uses the Unicode or the Multi-Byte character set?
I suppose you should increase NAME_SIZE value.
Do not forget that the array must hold at least the number of chars + 1, to leave room for the \0 char that marks the end of the string.
LPCSTR is a pointer to a constant null-terminated string of 8-bit Windows (ANSI) characters, defined as follows:
typedef __nullterminated CONST CHAR *LPCSTR;
For example even if you have "Hello world" constant and it has 11 characters it will take 12 bytes in the memory.
If you are passing a string constant as an array you must add '\0' to the end like {'T','E','S','T', '\0'}
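The size arithmetic above can be checked with a few lines of portable C++. This is only a sketch; the names literal_bytes, test_name and fits are made up for the example.

```cpp
#include <cstddef>
#include <cstring>

// A string literal occupies its visible characters plus one '\0'.
constexpr std::size_t literal_bytes = sizeof("Hello world");

// An array built element by element needs the terminator added by hand.
const char test_name[] = {'T', 'E', 'S', 'T', '\0'};

// Hypothetical helper: does text, including its terminator, fit in a
// buffer of buffer_size chars?
inline bool fits(const char* text, std::size_t buffer_size) {
    return std::strlen(text) + 1 <= buffer_size;
}
```

Here literal_bytes is 12 for the 11 visible characters, and fits("MY_TEST_NAME", 4) is false because that name needs 13 chars, which is why a NAME_SIZE of 4 cannot hold it.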
If you look at the documentation, you'll find that most Win32 functions take an LPCTSTR, which represents a string of TCHAR. Depending on whether you build for Unicode (the default in recent Visual C++ projects) or ANSI, TCHAR will expand to either wchar_t or char. Also, LPCWSTR and LPCSTR explicitly represent Unicode and ANSI strings respectively.
When you're developing for Win32, in most cases, it's best to follow suit and use LPCTSTR wherever you need strings, instead of explicit char arrays/pointers. Also, use the TEXT("...") macro to create the correct kind of string literals instead of just "...".
In your case though, I doubt this is causing a problem, since both your examples use only LPCSTR. You have also defined NAME_SIZE to be 4, could it be that your array is too small to hold the string you want?

Different char type in windows programming

Recently I ran into some tasks involving chars and strings on the Windows platform. I see that there are different char types like char, TCHAR, WCHAR, LPSTR, LPWSTR, LPCTSTR. Can someone give me some information about them, and how to use them like the regular char and char *? I am confused about these types.
Best Regards,
They are documented on MSDN. Here's a few:
TCHAR: A WCHAR if UNICODE is defined, a CHAR otherwise.
WCHAR: A 16-bit Unicode character.
CHAR: An 8-bit Windows (ANSI) character.
LPTSTR: An LPWSTR if UNICODE is defined, an LPSTR otherwise.
LPSTR: A pointer to a null-terminated string of 8-bit Windows (ANSI) characters.
LPWSTR: A pointer to a null-terminated string of 16-bit Unicode characters.
LPCTSTR: An LPCWSTR if UNICODE is defined, an LPCSTR otherwise.
LPCWSTR: A pointer to a constant null-terminated string of 16-bit Unicode characters.
LPCSTR: A pointer to a constant null-terminated string of 8-bit Windows (ANSI) characters.
Note that some of these types map to something different depending on whether UNICODE has been #define'd. By default, they resolve to the ANSI versions:
#include <windows.h>
// LPCTSTR resolves to LPCSTR
When you #define UNICODE before #include <windows.h>, they resolve to the Unicode versions.
#define UNICODE
#include <windows.h>
// LPCTSTR resolves to LPCWSTR
They are in reality typedefs to some fundamental types in the C and C++ language. For example:
typedef char CHAR;
typedef wchar_t WCHAR;
On compilers like Visual C++, there's really no difference between an LPCSTR and a const char* or a LPCWSTR and a const wchar_t* . This might differ between compilers however, which is why these data types exist in the first place!
It's sort of like the Windows API equivalent of <cstdint> or <stdint.h>. The Windows API has bindings in other languages, and having data types with a known size is useful, if not required.
char is the standard 8-bit character type.
wchar_t is, on Windows, a 16-bit character type that holds UTF-16 code units; it has been in use since about Windows 95. WCHAR is another name for it.
TCHAR can be either one, depending on your compiler settings. Most of the time in a modern program it's wchar_t.
The P and LP prefixes are pointers to the different types. The L is legacy (stands for Long pointer), and became obsolete with Windows 95; you still see it quite a bit though.
The C after the prefix stands for const.
TCHAR, LPTSTR and LPCTSTR are all generalized macros that will be either regular character strings or wide character strings depending on whether or not the UNICODE define is set. CHAR, LPSTR and LPCSTR are regular character strings. WCHAR, LPWSTR and LPCWSTR are wide character strings. TCHAR, CHAR and WCHAR represent a single character. LPTSTR, LPSTR and LPWSTR are "Long Pointer to STRing". LPCTSTR, LPCSTR and LPCWSTR are constant string pointers.
Let me try to shed some light (I've blogged this on my site at https://www.dima.to/blog/?p=190 in case you want to check it out):
#include "stdafx.h"
#include "Windows.h"
int _tmain(int argc, _TCHAR* argv[])
{
/* Quick Tutorial on Strings in Microsoft Visual C++
The Unicode Character Set and Multibyte Character Set options in MSVC++ provide a project with two flavours of string encodings. They will use different encodings for characters in your project. Here are the two main character types in MSVC++ that you should be concerned about:
1. char <-- char characters use an 8-bit character encoding (8 bits = 1 byte) according to MSDN.
2. wchar_t <-- wchar_t uses a 16-bit character encoding (16 bits = 2 bytes) according to MSDN.
From above, we can see that the size of each character in our strings will change depending on our chosen character set.
WARNING: Do NOT assume that any given character you append to either a Multibyte or Unicode string will always take up exactly one char or one wchar_t! That is up to the encoding used. Sometimes, several units need to be combined to define the character that the user wants in their string. For example, a Multibyte string is stored as a sequence of bytes, but that does not mean that a given byte always holds a complete character, because a multibyte character may take up more than a single byte. MSDN says it may take up TWO bytes to produce a single multibyte-encoded character: "A multibyte-character string may contain a mixture of single-byte and double-byte characters. A two-byte multibyte character has a lead byte and a trail byte."
WARNING: Do NOT assume that Unicode contains every character for every language. For more information, please see http://stackoverflow.com/questions/5290182/how-many-bytes-takes-one-unicode-character.
Note: The ASCII Character Set is a subset of both Multibyte and Unicode Character Sets (in other words, both of these flavours encompass ASCII characters).
Note: You should always use Unicode for new development, according to MSDN. For more information, please see http://msdn.microsoft.com/en-us/library/ey142t48.aspx.
*/
// Strings that are Multibyte.
LPSTR a; // Regular Multibyte string (synonymous with char *).
LPCSTR b; // Constant Multibyte string (synonymous with const char *).
// Strings that are Unicode.
LPWSTR c; // Regular Unicode string (synonymous with wchar_t *).
LPCWSTR d; // Constant Unicode string (synonymous with const wchar_t *).
// Strings that take on either Multibyte or Unicode depending on project settings.
LPTSTR e; // Multibyte or Unicode string (can be either char * or wchar_t *).
LPCTSTR f; // Constant Multibyte or Unicode string (can be either const char * or const wchar_t *).
/* From above, it is safe to assume that the pattern is as follows:
LP: Specifies a long pointer type (this is synonymous with prefixing this type with a *).
W: Specifies that the type is of the Unicode Character Set.
C: Specifies that the type is constant.
T: Specifies that the type has a variable encoding.
STR: Specifies that the type is a string type.
*/
// String literals (literals are const, so they are assigned to the const pointer types):
f = _T("Example."); // Multibyte or Unicode literal depending on project settings (_T comes from <tchar.h>).
f = TEXT("Example."); // Multibyte or Unicode literal depending on project settings (same as _T).
d = L"Example."; // Unicode (wide) literal.
b = "Example."; // Multibyte (narrow) literal.
return 0;
}

how to convert char * to uchar16 in JNI C++

here's what I am trying to do:
typedef uint16_t uchar16_t;
uchar16_t buf[32];
// buf will contain timezone information like GMT-6, Eastern Daylight Time, etc
char * str = "Test";
for (int i = 0; i <= strlen(str); i++)
buf[i] = str[i];
I guess that's not correct since uchar16_t would contain 2 bytes and str contains 1 byte.
What is it that I am supposed to do ?
Strlen? buf[32]? Trying to destroy the universe?
You want to use a wstringstream.
std::wstringstream lols;
lols << "Test";
std::wstring cakes;
lols >> cakes;
Edit (in reply to a comment):
You shouldn't use strlen because any decent string system allows embedded zeros, and strlen is seriously slow. In addition, you didn't resize your buffer as needed, so if you had a string of size > 31 you would get a buffer overflow. In addition, you would have to (if you did dynamically size your buffer) manually free it afterwards. Both of these things are serious failings of the C string system. My example code makes your standard library writer do all the work and avoid all these problems for you.
That's actually OK if your string will always be ASCII. To do it correctly, the portable function is mbstowcs which assumes you're converting from the default locale or if you're on Windows then there's API functions that let you specify the source code page explicitly.
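A minimal sketch of the mbstowcs route follows. The wrapper name widen_with_locale is made up. It converts according to whatever locale is currently active, so a real program would normally call setlocale first; it also relies on mbstowcs accepting a null destination to report the required length, which POSIX and Visual C++ document but which should be treated as an assumption on other platforms.

```cpp
#include <cstddef>
#include <cstdlib>
#include <string>
#include <vector>

// Hypothetical wrapper around mbstowcs. Converts a narrow string to a wide
// string using the current C locale. Returns an empty string on failure.
inline std::wstring widen_with_locale(const char* narrow) {
    // First call: ask how many wide characters are needed (terminator excluded).
    const std::size_t needed = std::mbstowcs(nullptr, narrow, 0);
    if (needed == static_cast<std::size_t>(-1)) {
        return std::wstring();   // invalid multibyte sequence for this locale
    }
    std::vector<wchar_t> buffer(needed + 1);
    // Second call: perform the conversion into the correctly sized buffer.
    std::mbstowcs(buffer.data(), narrow, buffer.size());
    return std::wstring(buffer.data(), needed);
}
```

Because the result is a std::wstring, the buffer size follows the input and there is no fixed 32-element limit to overflow.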
Your code will work, as long as str is ASCII; calling strlen() in the loop condition is probably a bad idea, though. It might be easier to just use swprintf() if it's available on your system:
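wchar_t buf[32]; // swprintf writes wchar_t, so the buffer must be wchar_t
const char *str = "Test";
swprintf(buf, sizeof buf / sizeof buf[0], L"%hs", str); // %hs means a narrow string in Visual C++; standard C uses %s here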
Have a look here.
Also, is there a good reason you are defining your own type?
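If the input really is ASCII only, the loop from the question can be kept with the two weaknesses mentioned in the answers addressed: the length is computed once instead of in the loop condition, and the destination size is checked. The function name widen_ascii is invented for this sketch, and rejecting non-ASCII bytes is a design choice made here, not something the original code did.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>

typedef std::uint16_t uchar16_t;

// Hypothetical helper: copy an ASCII string into a 16-bit buffer.
// Returns false if the text (plus terminator) does not fit or if a byte
// is outside the ASCII range, in which case a real conversion is needed.
inline bool widen_ascii(const char* src, uchar16_t* dst, std::size_t dst_count) {
    const std::size_t len = std::strlen(src);   // computed once, not per iteration
    if (len + 1 > dst_count) {
        return false;                            // would overflow the buffer
    }
    for (std::size_t i = 0; i < len; ++i) {
        const unsigned char c = static_cast<unsigned char>(src[i]);
        if (c > 0x7F) {
            return false;                        // not ASCII
        }
        dst[i] = c;
    }
    dst[len] = 0;                                // terminator
    return true;
}
```

A call such as widen_ascii("Test", buf, 32) with uchar16_t buf[32] fills the first four elements and writes the terminator into buf[4].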
If you have a (narrow) char string, you cannot convert it to a wchar_t string by setting your locale to "C" and then passing the string through mbstowcs(). That's because the "C" locale specifies a particular character encoding, and that encoding might not match the encoding of the execution character set, so mbstowcs() might map the characters to something unexpected, or could even fail (if the execution character set happened to use encodings that were incompatible with the encoding structure for the C locale character set).
Thus, in order to convert a char string into a wider string, you have to copy the chars one by one into an array of wchar_t. If you need to work with Unicode or UTF-16 or whatever after that, then wcstombs() is what you should look at.

CStringT to char[]

I'm trying to make changes to some legacy code. I need to fill a char[] ext with a file extension gotten using filename.Right(3). Problem is that I don't know how to convert from a CStringT to a char[].
There has to be a really easy solution that I'm just not realizing...
TIA.
If you have access to ATL, which I imagine you do if you're using CString, then you can look into the ATL conversion classes like CT2CA.
CString fileExt = _T ("txt");
CT2CA fileExtA (fileExt);
If a conversion needs to be performed (as when compiling for Unicode), then CT2CA allocates some internal memory and performs the conversion, destroying the memory in its destructor. If compiling for ANSI, no conversion needs to be performed, so it just hangs on to a pointer to the original string. It also provides an implicit conversion to const char * so you can use it like any C-style string.
This makes conversions really easy, with the caveat that if you need to hang on to the string after the CT2CA goes out of scope, then you need to copy the string into a buffer under your control (not just store a pointer to it). Otherwise, the CT2CA cleans up the converted buffer and you have a dangling reference.
Well you can always do this even in unicode
char str[4];
strcpy( str, CStringA( cString.Right( 3 ) ).GetString() );
If you know you AREN'T using unicode then you could just do
char str[4];
strcpy( str, cString.Right( 3 ).GetString() );
All the original code block does is transfer the last 3 characters into a non-Unicode string (CStringA is always non-Unicode, CStringW is always Unicode, and CStringT depends on whether the UNICODE define is set) and then get the string as a simple char string.
First use CStringA to make sure you're getting char and not wchar_t. Then just cast it to (const char *) to get a pointer to the string, and use strcpy or something similar to copy to your destination.
If you're completely sure that you'll always be copying 3 characters, you could just do it the simple way.
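// The casts keep this compiling in a Unicode build; they are only safe for ASCII extensions.
ext[0] = static_cast<char>(filename[filename.GetLength()-3]);
ext[1] = static_cast<char>(filename[filename.GetLength()-2]);
ext[2] = static_cast<char>(filename[filename.GetLength()-1]);
ext[3] = 0;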
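The same "last three characters into a char[4]" idea is sketched below in portable C++, with std::string standing in for CStringA so that it can be tried without MFC. The function name copy_extension is made up, and the length check is an addition that the snippets above leave to the caller.

```cpp
#include <cstring>
#include <string>

// Hypothetical helper: copy the last three characters of filename into
// ext, including the terminator. Returns false if filename is too short.
inline bool copy_extension(const std::string& filename, char (&ext)[4]) {
    if (filename.size() < 3) {
        return false;                 // nothing sensible to copy
    }
    const std::string tail = filename.substr(filename.size() - 3);
    std::memcpy(ext, tail.c_str(), 4);   // 3 characters plus '\0'
    return true;
}
```

Taking the destination as a reference to char[4] lets the compiler reject a buffer of the wrong size, which guards against the overflow that an unchecked strcpy would allow.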
I believe this is what you are looking for:
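CStringA theString( "This is a test" ); // CStringA, so the characters are char even in a Unicode build
char* mychar = new char[theString.GetLength()+1];
strcpy(mychar, theString); // remember to delete[] mychar when done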
If I remember my old school MS C++.
You do not specify where is the CStringT type from. It could be anything, including your own implementation of string handling class. Assuming it is CStringT from MFC/ATL library available in Visual C++, you have a few options:
It's not been said if you compile with or without Unicode, so presenting using TCHAR not char:
CStringT<TCHAR, StrTraitMFC<TCHAR, ChTraitsCRT<TCHAR> > > file(TEXT("test.txt"));
TCHAR* file1 = new TCHAR[file.GetLength() + 1];
_tcscpy(file1, file);
If you use a CStringT specialised for ANSI strings, then
CStringA fileA(file); // converts if file holds wide characters
std::string file1(fileA.GetString());
char const* pfile = file1.c_str(); // to copy to char[] buffer