wchar_t pointer - c++

What's wrong with this:
wchar_t * t = new wchar_t;
t = "Tony";
I thought I could use a wchar_t pointer as a string...

Your code has two issues.
First, "Tony" is a narrow string literal (an array of const char), which cannot be assigned to a wchar_t*. L"Tony" is the appropriate wide string literal.
Second, you allocate a single wchar_t via new, then immediately lose track of it by reassigning the pointer to Tony. This results in a memory leak.

A pointer just points to a single value. This is important.
All you've done is allocate room for a single wchar_t and point at it. Then you try to set the pointer to point at a string (remember, just at the first character), but the string type is incorrect.
What you have is a string of char; it should be L"Tony". Even then, all you're doing here is leaking your previous memory allocation, because the pointer now holds a new value.
Rather you want to allocate enough room to hold the entire string, then copy the string into that allocated memory. This is terrible practice, though; never do anything that makes you need to explicitly free memory.
Just use std::wstring and move on. std::wstring t = L"Tony";. It handles all the details, and you don't need to worry about cleaning anything up.
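As a rough sketch of the std::wstring suggestion above (make_name and raw_length are hypothetical helper names, not anything from the question), the string owns its characters and c_str() only lends out a pointer:

```cpp
#include <cassert>
#include <cstddef>
#include <cwchar>
#include <string>

// Hypothetical helper: builds an owning wide string from a wide literal.
std::wstring make_name() {
    std::wstring t = L"Tony";  // the characters are copied into storage owned by t
    return t;                  // no new/delete anywhere; t cleans up after itself
}

// Hypothetical helper: measures the string through the raw pointer from c_str().
std::size_t raw_length(const std::wstring& s) {
    const wchar_t* p = s.c_str();     // valid only while s is alive and unmodified
    assert(p[s.size()] == L'\0');     // c_str() is always null-terminated
    return std::wcslen(p);
}
```

Calling raw_length(make_name()) is fine because the temporary string lives until the end of the full expression; storing the pointer for later use would not be.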

Since you are a C# developer, I will point out a few things C++ does differently.
This allocates a new wchar_t and assigns its address to t:
wchar_t* t = new wchar_t;
This is an array of constant char:
"Tony"
To get a constant wchar_t array, prefix it with L:
L"Tony"
This reassigns t to point to the constant L"Tony" instead of your old wchar_t, and causes a memory leak since your wchar_t will never be released:
t = L"Tony";
This creates a string of wide chars (wchar_t) to hold a copy of L"Tony":
std::wstring t = L"Tony";
I think the last line is what you want. If you need access to the wchar_t pointer, use t.c_str(). Note that C++ strings are mutable and are copied on each assignment.
The C way to do this would be:
const wchar_t* t = L"Tony";
This does not create a copy; it only makes the pointer point to the const wchar_t array.

What this does is first assign a pointer to a newly allocated wchar_t into t, and then try to assign a non-wide string into t.
Can you use std::wstring instead? That will handle all your memory management needs for you.

You can; it's just that "Tony" is a hardcoded string literal, and those are narrow (char, "ANSI") strings by default. If you want to tell the compiler that you're typing a wide string, prefix it with L, e.g. t = L"Tony".
You have other problems with your code: your allocation creates a single wchar_t (2 bytes on Windows, typically 4 bytes on Linux), and then you point the original variable at the constant string, thus leaking that allocation.
If you want to create a buffer of wide characters and place data into it, you want to do:
wchar_t* t = new wchar_t[500];
wcscpy(t, L"Tony");
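A minimal sketch of the buffer approach, sized from the source string instead of a fixed 500; duplicate_wide is a hypothetical name, and the caller is assumed to take ownership of the result:

```cpp
#include <cassert>
#include <cstddef>
#include <cwchar>

// Hypothetical helper: returns a heap-allocated, modifiable copy of a wide string.
// The caller owns the result and must release it with delete[].
wchar_t* duplicate_wide(const wchar_t* source) {
    assert(source != nullptr);
    const std::size_t length = std::wcslen(source);
    wchar_t* copy = new wchar_t[length + 1];  // +1 for the terminating L'\0'
    std::wcscpy(copy, source);                // copies the characters and the terminator
    return copy;
}
```

Because the buffer came from new[], it has to be released with delete[] (not delete), which is exactly the bookkeeping that std::wstring would do for you.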

This is completely wrong.
There's no need to allocate a single wchar_t, make t point to it, and then overwrite the pointer t, leaking the allocation forever.
Also, "Tony" has a different type. Use:
const wchar_t *t = L"Tony";
IMHO it is better not to use wchars at all. See https://stackoverflow.com/questions/1049947/should-utf-16-be-considered-harmful

Related

How does the std::string constructor handle char[] of fixed size?

How does the string constructor handle char[] of a fixed size when the actual sequence of characters in that char[] could be smaller than the maximum size?
char foo[64];//can hold up to 64
const char* bar = "0123456789"; //Much less than 64 chars, terminated with '\0'
strcpy(foo,bar); //Copy shorter into longer
std::string banz(foo);//Make a large string
In this example will the size of the banz objects string be based on the original char* length or the char[] that it is copied into?
First you have to remember (or know) that char strings in C++ are really called null-terminated byte strings. That null-terminated bit is a special character ('\0') that tells the end of the string.
The second thing you have to remember (or know) is that arrays naturally decay to pointers to the array's first element. In the case of foo from your example, when you use foo the compiler really does &foo[0].
Finally, if we look at e.g. this std::string constructor reference you will see that there is an overload (number 5) that accepts a const CharT* (with CharT being a char for normal char strings).
Putting it all together, with
std::string banz(foo);
you pass a pointer to the first character of foo, and the std::string constructor will treat it as a null-terminated byte string. And from finding the null-terminator it knows the length of the string. The actual size of the array is irrelevant and not used.
If you want to set the size of the std::string object, you need to explicitly do it by passing a length argument (variant 4 in the constructor reference):
std::string banz(foo, sizeof foo);
This will ignore the null-terminator and set the length of banz to the size of the array. Note that the null-terminator will still be stored in the string, so if you pass a pointer (as retrieved by e.g. the c_str function) to a function which expects a null-terminated string, the string will seem shorter than its size. Also note that the data after the null-terminator will be uninitialized and have indeterminate contents. You must initialize that data before you use it, otherwise you will have undefined behavior (and in C++ even reading indeterminate data is UB).
As mentioned in a comment from MSalters, the UB from reading uninitialized and indeterminate data also goes for the construction of the banz object using an explicit size. It will typically work and not lead to any problems, but it does break the rules set out in the C++ specification.
Fixing it is easy though:
char foo[64] = { 0 };//can hold up to 64
The above will initialize all of the array to zero. The following strcpy call will not touch the data of the array beyond the terminator, and as such the remainder of the array will be initialized.
The constructor called is one that takes a const char* as an argument. That constructor attempts to copy the character data pointed to by that pointer, until the first NUL terminator is reached. If there is no such NUL terminator then the behaviour of the constructor is undefined.
Your foo type is converted to a char* by pointer decay, then an implicit conversion to const char* occurs at the calling site.
Perhaps there could have been a templatised std::string constructor taking a const char[N] as an argument, which would have allowed the insertion of more than one NUL character (the std::string class after all does support that), but it was not introduced and to do so now would be a breaking change. Using the iterator-range constructor
std::string banz{std::begin(foo), std::end(foo)};
will also copy the entire array foo.
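To make the difference between the two constructors concrete, here is a small sketch using the same foo and banz names as the question; the two function names are made up for illustration:

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <string>

// Length when the constructor stops at the first '\0'.
std::size_t length_until_terminator() {
    char foo[64] = {0};                 // zero-initialized, so no indeterminate bytes
    std::strcpy(foo, "0123456789");
    std::string banz(foo);              // const char* overload: reads up to the '\0'
    return banz.size();
}

// Length when the whole array is copied, embedded '\0' characters included.
std::size_t length_of_whole_array() {
    char foo[64] = {0};
    std::strcpy(foo, "0123456789");
    std::string banz(foo, sizeof foo);  // pointer + count overload: copies all 64 chars
    assert(std::strlen(banz.c_str()) == 10);  // C functions still stop at the first '\0'
    return banz.size();
}
```

The first function reports 10 and the second 64, even though both strings look identical when printed through c_str().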

Pointer stores strings?

I recently started learning C++ and came across the concept of a pointer (which is a variable that stores the address of another variable). However, I also came across char* str = "Hello" and I became confused. It looks like the string "Hello" is being assigned to the pointer str (which I thought could only store addresses). So can a pointer also store a string?
For future reference you should only use the language tag of the language you're using. C and C++ are two very different languages, and in this case there is a difference.
First the common part: Literal strings like "Hello" are stored by the compiler as arrays. In the case of "Hello" it's an array of six char elements, including the string null terminator.
Now for the part that's different: In C++ such string literal arrays are constant, they can not be modified. Therefore it's an error to have a non-const pointer to such an array. In C the string literal arrays are not constant, but they are still not modifiable, they are in essence read-only. But it's still allowed to have a non-const pointer to them.
And finally for your question: As with all arrays, using them makes them decay into a pointer to their first element, and that is basically what happens here. You make your variable str point to the first element in the string literal array.
A little simplified it can be seen like this (in C):
char anonymous_literal_array[] = "Hello";
...
char *str = &anonymous_literal_array[0]; // Make str point to first element in array
The pointer will store the address of the start of the string, therefore the first character. In this case "Hello" is an immutable literal. (Check the difference: Immutable vs constant)
More precisely, a pointer cannot store a string (or any other data) itself; it holds an address at which data of the pointed-to type lives.
Since char* is a pointer to char, it points at exactly one char.
In this example, the pointer is the address of the first character in the string. This is inherited from C, where a "string" is an array of characters terminated by a null character ('\0'). In C and C++, arrays and pointers are closely related. When you do your own memory management, you often create an array with a pointer to the first element of the array. That is exactly what is going on here with the array holding the string literal "Hello".
In C and C++, strings are stored as arrays of characters. A string literal like "Hello" evaluates to the start of a read-only character array, with static storage duration, which holds this string.
A char* variable is a pointer to a single byte(char) in memory. The most common way of handling strings is called a c-style string where the char* is a pointer to the first character in the string and is followed by the rest of the characters in memory. The c-string will always end in a '\0' or null character to signify that you've reached the end of the string ( 'H', 'e', 'l', 'l', 'o', '\0' ).
The "Hello" is called a string literal. What happens in memory is at the very beginning of your program, before anything else is run, the program allocates and sets the memory for the "Hello" string where the other static constants are located. When you write char* str = "Hello"; The compiler knows you're using a string literal and sets str to the location of the first character of that string literal.
But be careful though. All string literals are stored in a portion of memory that you cannot write to. If you try to modify that string, you might get memory errors. To make sure this doesn't happen, when dealing with c-strings, you should always write const char* str = "Hello"; That way the compiler will never allow you to modify that memory.
To have a modifiable string, you will need to allocate and manage the memory yourself. I would suggest using std::string, or have some fun and make your own string class that handles the memory.
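A small sketch of what "pointing at the first character" means in practice; first_char and count_chars are hypothetical helpers written only for this illustration:

```cpp
#include <cassert>
#include <cstddef>

// Returns the single char the pointer actually refers to.
char first_char(const char* str) {
    assert(str != nullptr);
    return *str;
}

// Counts characters by walking forward from the pointer until the '\0' terminator.
std::size_t count_chars(const char* str) {
    assert(str != nullptr);
    std::size_t n = 0;
    while (str[n] != '\0') {
        ++n;
    }
    return n;
}
```

With const char* str = "Hello";, first_char(str) gives 'H' and count_chars(str) gives 5, while sizeof("Hello") is 6 because the literal array also contains the terminator.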

Understanding C-strings & string literals in C++

I have a few questions I would like to ask about string literals and C-strings.
So if I have something like this:
char cstr[] = "c-string";
As I understand it, the string literal is created in memory with a terminating null byte, say for example starting at address 0xA0 and ending at 0xA9, and from there the address is returned and/or casted to type char [ ] which then points to the address.
It is then legal to perform this:
for (int i = 0; i < (sizeof(cstr)/sizeof(char)); ++i)
    cstr[i] = 97+i;
So in this sense, are string literals able to be modified as long as they are casted to the type char [ ] ?
But with regular pointers, I've come to understand that when they are pointed to a string literal in memory, they cannot modify the contents because most compilers mark that allocated memory as "Read-Only" in some lower bound address space for constants.
char * p = "const cstring";
*p = 'A'; // illegal memory write
I guess what I'm trying to understand is why aren't char * types allowed to point to string literals like arrays do and modify their constants? Why do the string literals not get casted into char *'s like they do to char [ ]'s? If I have the wrong idea here or am completely off, feel free to correct me.
The bit that you're missing is a little compiler magic where this:
char cstr[] = "c-string";
Actually executes like this:
char *cstr = alloca(strlen("c-string")+1);
memcpy(cstr,"c-string",strlen("c-string")+1);
You don't see that bit, but it's more or less what the code compiles to.
char cstr[] = "something"; is declaring an automatic array initialized to the bytes 's', 'o', 'm', ...
char * cstr = "something";, on the other hand, is declaring a character pointer initialized to the address of the literal "something".
In the first case you are creating an actual array of characters, whose size is determined by the size of the literal you are initializing it with (8+1 bytes). The cstr variable is allocated memory on the stack, and the contents of the string literal (which in the code is located somewhere else, possibly in a read-only part of the memory) is copied into this variable.
In the second case, the local variable p is allocated memory on the stack as well, but its contents will be the address of the string literal you are initializing it with.
Thus, since the string literal may be located in read-only memory, it is in general not safe to try to change it via the p pointer (you may get away with it, or you may not). On the other hand, you can do whatever you like with the cstr array, because that is your local copy that just happens to have been initialized from the literal.
(Just one note: the cstr variable is of type array of char, and in most contexts this converts to a pointer to the first element of that array. An exception to this is e.g. the sizeof operator: it computes the size of the whole array, not just of a pointer to the first element.)
char cstr[] = "c-string";
This copies "c-string" into a char array on the stack. It is legal to write to this memory.
char * p = "const cstring";
*p = 'A'; // illegal memory write
Literal strings like "c-string" and "const cstring" live in the data segment of your binary. This area is read-only. Above p points to memory in this area and it is illegal to write to that location. Since C++11 this is enforced more strongly than before, in that you must make it const char* p instead.
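The array-versus-pointer distinction can be sketched as follows. The function names are hypothetical, and unlike the loop in the question this version deliberately leaves the terminating '\0' in place:

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Writes into a local array that was initialized (copied) from a literal.
std::string overwrite_array_copy() {
    char cstr[] = "c-string";  // array of 9 chars: 8 characters plus '\0'
    static_assert(sizeof(cstr) == 9, "8 characters plus the terminating '\\0'");
    for (std::size_t i = 0; i + 1 < sizeof(cstr); ++i) {
        cstr[i] = static_cast<char>('a' + i);  // modifies the local copy only
    }
    assert(cstr[sizeof(cstr) - 1] == '\0');    // terminator was not overwritten
    return std::string(cstr);
}

// Reading through a pointer to the literal is fine; writing through it is not.
char read_literal_first_char() {
    const char* p = "const cstring";  // const makes the compiler reject *p = 'A'
    return *p;
}
```

Here overwrite_array_copy() yields "abcdefgh", and the literal "c-string" itself is never touched.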

wstringstream to LPWSTR

I have built up a string using wstringstream and need to assign it to a member of a struct of type LPWSTR. I try to use my_stringstream.str().c_str() but get the following compile time error:
cannot convert from 'const wchar_t *' to 'LPWSTR'
How can I do this? I have tried many different combinations of casts with either more compile time errors or random jargon when I try and display the string in a GUI.
Your function expects a pointer to modifiable data, i.e. wchar_t*, but the standard string class only exposes a pointer to constant. Assuming that your function may actually write to the memory, we need to provide it with a valid pointer.
An easy way to obtain a modifiable buffer is, as always, a vector:
std::vector<wchar_t> buf(mystring.begin(), mystring.end());
buf.push_back(0); // because your consumer expects null-termination
crazy_function(buf.data());
crazy_function(&buf[0]); // old-style
// need a string again?
std::wstring newstr(buf.data()); // or &buf[0]
LPWSTR is typedef'd as wchar_t*. You're trying to convert a const wchar_t* to a wchar_t*. You can't do that implicitly.
You can get around this by using const_cast, but only if you are certain the function won't modify the memory:
std::wstring contents = my_stringstream.str();
LPWSTR str = const_cast<LPWSTR>(contents.c_str());
Note that you do not want to do const_cast<LPWSTR>(my_stringstream.str().c_str()) (unless you are passing that directly to a function) because that will create a temporary string object, get its pointer, convert it to an LPWSTR, and then the temporary string you get from str() will be destroyed at the end of that line, leaving your LPWSTR pointing to a deallocated block of memory.
If the function you are passing the LPWSTR to is modifying the string, see Kerrek's answer.
If you are absolutely sure that the content of the string will not be modified, you can cast the const away via a const_cast (a static_cast cannot remove const); a situation where this can be acceptable, for example, is if you are using some struct to provide data to a function, but the same struct is also used for retrieving it, so that member is LPWSTR instead of just an LPCWSTR.
If, on the contrary, the function to which you'll pass the struct needs to modify the string, you have two alternatives.
The safest one is to allocate a separate copy of the string as a raw dynamic array of WCHAR and copy there the content of the wstring. You will probably want to wrap the result of the new with a smart pointer, unless you're transferring the ownership of the string (and in that case you'll probably have to use some special allocation function).
You can also pass a pointer to the internal buffer of the string using &YourString[0], but (1) before C++11 the standard did not guarantee that this works (since C++11 the storage is required to be contiguous), and (2) it works fine only if the function won't change the length of your string by writing a L'\0' before the end of it; in that case you should also re-adjust the actual length of the string.
Both in the last and in the first case you must be sure that the function you're passing the struct to does not expect the pointed-to buffer to live longer than the scope of your wstring (careful: mystream.str() is a temporary that dies on the very line you use it, so you have to assign it to a new variable to give it a broader scope).
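The "separate copy wrapped in a smart pointer" option could look roughly like this. Message is a hypothetical stand-in for the real struct, and wchar_t* is used in place of LPWSTR so the sketch does not depend on the Windows headers:

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <sstream>
#include <string>

// Hypothetical stand-in for the API struct; on Windows the member would be LPWSTR.
struct Message {
    wchar_t* text;
};

// Returns an owning, modifiable, null-terminated copy of the stream contents.
std::unique_ptr<wchar_t[]> copy_to_buffer(const std::wstringstream& stream) {
    const std::wstring contents = stream.str();    // named copy, not a dying temporary
    std::unique_ptr<wchar_t[]> buffer(new wchar_t[contents.size() + 1]);
    contents.copy(buffer.get(), contents.size());  // copy() does not add a terminator
    buffer[contents.size()] = L'\0';
    return buffer;
}

// Points the struct at the buffer; the buffer must outlive every use of message.text.
void attach(Message& message, wchar_t* buffer) {
    assert(buffer != nullptr);
    message.text = buffer;
}
```

The unique_ptr has to stay alive for as long as the struct is in use; if the API takes ownership of the string instead, this approach does not apply and the API's own allocation function would be needed.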
wstringstream b;
..
wchar_t z[100];
b.read(z,100);
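where the string length is known to be less than 101.
It works without unsetf(std::ios_base::skipws) and all that, and without ZeroMemory on the wchar_t array. Note that read() does not append a terminating L'\0', so you need to terminate z yourself (b.gcount() tells you how many characters were read) before treating it as a string.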
The reason for this is rather simple: LPWSTR expands to wchar_t*. As the pointer to the stream contents is a pointer to const, it's not possible to convert it without casting the const away using const_cast<LPWSTR>(my_stringstream.str().c_str()). However, I'd advise against this, as you might simply screw this up and/or modify something different that way. Only do it if you're sure the string won't be modified or the modification won't matter.
The easiest (and most secure) solution is to create your own copy of the string provided by the wstringstream in a buffer and refer to this one in the struct. Just don't forget to free the memory later on.
std::wstring temp(my_stringstream.str());
lpwstrVar = &temp[0];

std::string vs. char*

does std::string store data differently than a char* on either stack or heap or is it just derived from char* into a class?
char*
Is the size of one pointer for your CPU architecture.
May be a value returned from malloc or calloc or new or new[].
If so, must be passed to free or delete or delete[] when you're done.
If so, the characters are stored on the heap.
May result from the decay ("decomposition") of a char[ N ] (constant N) array or string literal.
Generically, no way to tell if a char* argument points to stack, heap, or global space.
Is not a class type. It participates in expressions but has no member functions.
Nevertheless implements the RandomAccessIterator interface for use with <algorithm> and such.
std::string
Is the size of several pointers, often three.
Constructs itself when created: no need for new or delete.
Owns a copy of the string, if the string may be altered.
Can copy this string from a char*.
By default, internally uses new[] much as you would to obtain a char*.
Provides for implicit conversion which makes transparent the construction from a char* or literal.
Is a class type. Defines other operators for expressions such as concatenation.
Defines c_str() which returns a const char* for temporary use.
Implements std::string::iterator type with begin() and end().
string::iterator is flexible: an implementation may make it a range-checked super-safe debugging helper or simply a super-efficient char* at the flip of a switch.
If you mean, does it store contiguously, then the answer is that before C++11 it was not required, but all known (to me, anyway) implementations did so; since C++11 the standard requires contiguous storage. This is most likely to support the c_str() and data() member requirements, which is to return a contiguous string (null-terminated in the case of c_str()).
As far as where the memory is stored, it's usually on the heap. But some implementations employ the "Short String Optimization", whereby short string contents are stored within a small internal buffer. So, in the case that the string object is on the stack, it's possible that the stored contents are also on the stack. But this should make no difference to how you use it, since once the object is destroyed, the memory storing the string data is invalidated in either case.
(btw, here's an article on a similar technique applied generally, which explains the optimization.)
These solve different problems. char* (or char const*) points to a C style string which isn't necessarily owned by the one storing the char* pointer. In C, because of the lack of a string type, necessarily you often use char* as "the string type".
std::string owns the string data it points to. So if you need to store a string somewhere in your class, chances are good you want to use std::string or your library's string class instead of char*.
On contiguity of the storage of std::string, other people already answered.
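The ownership difference can be shown with a short sketch; string_owns_its_copy is a hypothetical name chosen for this example:

```cpp
#include <cassert>
#include <cstring>
#include <string>

// Returns true if the std::string kept its own copy after the source buffer changed.
bool string_owns_its_copy() {
    char buffer[] = "hello";
    const char* view = buffer;   // non-owning: just the address of buffer[0]
    std::string owned(buffer);   // owning: the characters are copied into the string

    buffer[0] = 'j';             // change the original storage

    assert(std::strcmp(view, "jello") == 0);  // the pointer sees the change
    return owned == "hello";                  // the string's copy is unaffected
}
```

Whether the string's characters end up on the heap or in an internal small-string buffer is an implementation detail; either way the string owns them and releases them in its destructor.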