TCHAR pointer initialization - c++

I am trying to understand the following code:
const TCHAR * portName = "COM15";
I understand that a TCHAR is either a Char (in ANSI) or a wChar (in Unicode), basically a 1 byte or 2 byte container that represents something.
Now, if I declare a pointer to a const TCHAR called portName, portName is then a pointer. When I use the "=" sign, I am giving that pointer a value, and it seems irrational to me that "COM15" would be the address. I assume that line of code is giving me a pointer to the location of the beginning of the "COM15" string of characters, correct?
So what is actually happening in that line of code?
Is a string of characters ("COM15") being created and the "=" sign actually means that the location of the beginning of that string is being given to portName?

"Is a string of characters ("COM15") being created and the "=" sign actually means that the location of the beginning of that string is being given to portName?"
Yes, exactly. But other than it sounds from your question as you might have expected, this happens when the program is compiled, and not at run time. Also the const keyword prohibits changing that pointer at runtime later.

This how C works:
When you say char * str1 in C, you are allocating a pointer in the memory. When you write str1 = "Hello";, you are creating a string literal in memory and making the pointer point to it.
When you create another string literal "new string" and assign it to str1, all you are doing is changing where the pointer points.

Related

Why do I get a deprecated conversion warning with string literals in one case but not another?

I am learning C++. In the program shown here, as far as I know, str1 and str2 store the addresses of first characters of each of the relevant strings:
#include <iostream>
using namespace std;
int main()
{
char str1[]="hello";
char *str2="world";
cout<<str1<<endl;
cout<<str2<<endl;
}
However, str1is not giving any warnings, while with str2 I get this warning:
warning: deprecated conversion from string constant to 'char*' [-Wwrite-strings]
char *str2="world";
What's different between these two declarations that causes the warning in the second case but not the first?
When you write
char str1[] = "hello";
you are saying "please make me an array of chars that holds the string "hello", and please choose the size of the array str1 to be the size of the string initializing it." This means that str1 ends up storing its own unique copy of the string "hello". The ultimate type of str1 is char[6] - five for hello and one for the null terminator.
When you write
char *str2 = "world";
you are saying "please make me a pointer of type char * that points to the string literal "world"." The string literal "world" has type const char[6] - it's an array of six characters (five for hello and one for the null terminator), and importantly those characters are const and can't be modified. Since you're pointing at that array with a char * pointer, you're losing the const modifier, which means that you now (unsafely) have a non-const pointer to a const bit of data.
The reason that things are different here is that in the first case, you are getting a copy of the string "hello", so the fact that your array isn't const isn't a problem. In the second case, you are not getting a copy of "hello" and are instead getting a pointer to it, and since you're getting a pointer to it there's a concern that modifying it could be a real problem.
Stated differently, in the first case, you're getting an honest-to-goodness array of six characters that have a copy of hello in them, so there's no problem if you then decide to go and mutate those characters. In the second case, you're getting a pointer to an array of six characters that you're not supposed to modify, but you're using a pointer that permits you to mutate things.
So why is it that "world" is a const char[6]? As an optimization on many systems, the compiler will only put one copy of "world" into the program and have all copies of the literal "world" point to the exact same string in memory. This is great, as long as you don't change the contents of that string. The C++ language enforces this by saying that those characters are const, so mutating them leads to undefined behavior. On some systems, that undefined behavior leads to things like "whoa, my string literal has the wrong value in it!," and in others it might just segfault.
The problem is that you are trying to convert a string literal (with type const char*) to char*.

confusion about char pointer in c++

I'm new in c++ language and I am trying to understand the pointers concept.
I have a basic question regarding the char pointer,
What I know is that the pointer is a variable that stores an address value,
so when I write sth like this:
char * ptr = "hello";
From my basic knowledge, I think that after = there should be an address to be assigned to the pointer, but here we assign "hello" which is set of chars.
So what does that mean ?
Is the pointer ptr points to an address that stores "hello"? or does it store the hello itself?
Im so confused, hope you guys can help me..
Thanks in advance.
ptr holds the address to where the literal "hello" is stored at. In this case, it points to a string literal. It's an immutable array of characters located in static (most commonly read-only) memory.
You can make ptr point to something else by re-assigning it, but before you do, modifying the contents is illegal. (its type is actually const char*, the conversion to char* is deprecated (and even illegal in C++11) for C compatibility.
Because of this guarantee, the compiler is free to optimize for space, so
char * ptr = "hello";
char * ptr1 = "hello";
might yield two equal pointers. (i.e. ptr == ptr1)
The pointer is pointing to the address where "hello" is stored. More precisely it is pointing the 'h' in the "hello".
"hello" is a string literal: a static array of characters. Like all arrays, it can be converted to a pointer to its first element, if it's used in a context that requires a pointer.
However, the array is constant, so assigning it to char* (rather than const char*) is a very bad idea. You'll get undefined behaviour (typically an access violation) if you try to use that pointer to modify the string.
The compiler will "find somewhere" that it can put the string "hello", and the ptr will have the address of that "somewhere".
When you create a new char* by assigning it a string literal, what happens is char* gets assigned the address of the literal. So the actual value of char* might be 0x87F2F1A6 (some hex-address value). The char* points to the start (in this case the first char) of the string. In C and C++, all strings are terminated with a /0, this is how the system knows it has reached the end of the String.
char* text = "Hello!" can be thought of as the following:
At program start, you create an array of chars, 7 in length:
{'H','e','l','l','o','!','\0'}. The last one is the null character and shows that there aren't any more characters after it. [It's more efficient than keeping a count associated with the string... A count would take up perhaps 4 bytes for a 32-bit integer, while the null character is just a single byte, or two bytes if you're using Unicode strings. Plus it's less confusing to have a single array ending in the null character than to have to manage an array of characters and a counting variable at the same time.]
The difference between creating an array and making a string constant is that an array is editable and a string constant (or 'string literal') is not. Trying to set a value in a string literal causes problems: they are read-only.
Then, whenever you call the statement char* text = "Hello!", you take the address of that initial array and stick it into the variable text. Note that if you have something like this...
char* text1 = "Hello!";
char* text2 = "Hello!";
char* text3 = "Hello!";
...then it's quite possible that you're creating three separate arrays of {'H','e','l','l','o','!','\0'}, so it would be more efficient to do this...
char* _text = "Hello!";
char* text1 = _text;
char* text2 = _text;
char* text3 = _text;
Most compilers are smart enough to only initialize one string constant automatically, but some will only do that if you manually turn on certain optimization features.
Another note: from my experience, using delete [] on a pointer to a string literal doesn't cause issues, but it's unnecessary since as far as I know it doesn't actually delete it.

C++ Concatenation Oops

So I've been going back and forth from C++, C# and Java lately and well writing some C++ code I did something like this.
string LongString = "Long String";
char firstChar = LongString.at(0);
And then tried using a method that looks like this,
void MethodA(string str)
{
//some code
cout << str;
//some more code }
Here's how I implemented it.
MethodA("1. "+ firstChar );
though perfectly valid in C# and Java this did something weird in C++.
I expected something like
//1. L
but it gave me part of some other string literal later in the program.
what did I actually do?
I should note I've fixed the mistake so that it prints what I expect but I'm really interested in what I mistakenly did.
Thanks ahead of time.
C++ does not define addition on string literals as concatenation. Instead, a string literal decays to a pointer to its first element; a single character is interpreted as a numeric value so the result is a pointer offset from one location in the program's read-only memory segment to another.
To get addition as concatenation, use std::string:
MethodA(std::string() + "1. " + firstChar);
MethodA(std::string("1. ")+ firstChar );
since "1. " is const char[4] and has no concat methods)
The problem is that "1. " is a string literal (array of characters), that will decay into a pointer. The character itself is a char that can be promoted to an int, and addition of a const char* and an int is defined as calculating a new pointer by offsetting the original pointer by that many positions.
Your code in C++ is calling MethodA with the result of adding (int)firstChar (ASCII value of the character) to the string literal "1. ", which if the value of firstChar is greater than 4 (which it probably is) will be undefined behavior.
MethodA("1. "+ firstChar ); //your code
doesn't do what you want it to do. It is a pointer arithmetic : it just adds an integral value (which is firstChar) to the address of string-literal "1. ", then the result (which is of char const* type) is passed to the function, where it converts into string type. Based on the value of firstChar, it could invoked undefined behavior. In fact, in your case, it does invoke undefined behavior, because the resulting pointer points to beyond the string-literal.
Write this:
MethodA(string("1. ")+ firstChar ); //my code
String literals in C++ are not instances of std::string, but rather constant arrays of chars. So by adding a char to it an implicit cast to character pointer which is then incremented by the numerical value of the character, whick happened to point to another string literal stored in .data section.

Basic c-style string memory allocation

I am working on a project with existing code which uses mainly C++ but with c-style strings. Take the following:
#include <iostream>
int main(int argc, char *argv[])
{
char* myString = "this is a test";
myString = "this is a very very very very very very very very very very very long string";
cout << myString << endl;
return 0;
}
This compiles and runs fine with the output being the long string.
However I don't understand WHY it works. My understanding is that
char* myString
is a pointer to an area of memory big enough to hold the string literal "this is a test". If that's the case, then how am I able to then store a much longer string in the same location? I expected it to crash when doing this due to trying to cram a long string into a space set aside for the shorter one.
Obviously there's a basic misunderstanding of what's going on here so I appreciate any help understanding this.
You're not changing the content of the memory, you're changing the value of the pointer to point to a different area of memory which holds "this is a very very very very very very very very very very very long string".
Note that char* myString only allocates enough bytes for the pointer (usually 4 or 8 bytes). When you do char* myString = "this is a test";, what actually happened was that before your program even started, the compiler allocated space in the executable image and put "this is a test" in that memory. Then when you do char* myString = "this is a test"; what it actually does is just allocate enough bytes for the pointer, and make the pointer point to that memory it had already allocated at compile time, in the executable.
So if you like diagrams:
char* myString = "this is a test";
(allocate memory for myString)
---> "this is a test"
/
myString---
"this is a very very very very very very very very very very very long string"
Then
myString = "this is a very very very very very very very very very very very long string";
"this is a test"
myString---
\
---> "this is a very very very very very very very very very very very long string"
There are two strings in the memory. First is "this is a test" and lets say it begins at the address 0x1000. The second is "this is a very very ... test" and it begins at the address 0x1200.
By
char* myString = "this is a test";
you crate a variable called myString and assign address 0x1000 to it. Then, by
myString = "this is a very very ... test";
you assign 0x1200. By
cout << myString << endl;
you just print the string beginning at 0x1200.
You have two string literals of type const char[n]. These can be assigned to a variable of type char*, which is nothing more than a pointer to a char. Whenever you declare a variable of type pointer-to-T you are only declaring the pointer, and not the memory to which it points.
The compiler reserves memory for both literals and you just take your pointer variable and point it at those literals one after the other. String literals are read-only and their allocation is taken care of by the compiler. Typically they are stored in the executable image in protected read-only memory. A string literal typically has a lifetime equal to that of the program itself.
Now, it would be UB if you attempted to modify the contents of a literal, but you don't. To help prevent yourself from attempting modifications in error you would be wise to declare your variable as const char*.
During program execution, a block of memory containing "this is a test" is allocated, and the address of the first character in that block of memory is assigned to the myString variable. In the next line, a separate block of memory containing "this is a very very..." is allocated, and the address of the first character in that block of memory is now assigned to the myString variable, replacing the address it used to store with the new address to the "very very long" string.
just for illustration, let's say the first block of memory looks like this:
[t][h][i][s][ ][i][s][ ][a][ ][t][e][s][t]
and let's just say the address of this first 't' character in this sequence/array of characters is 0x100.
so after the first assignment of the myString variable, the myString variable contains the address 0x100, which points to the first letter of "this is a test".
then, a totally different block of memory contains:
[t][h][i][s][ ][i][s][ ][a][ ][v][e][r][r][y]...
and let's just say that the address of this first 't' character is 0x200.
so after the second assignment of the myString variable, the myString variable NOW contains the address 0x200, which points to the first letter of "this is a very very very...".
Since myString is just a pointer to a character (hence: "char *" is it's type), it only stores the address of a character; it has no concern for how big the array is supposed to be, it doesn't even know that it is pointing to an "array", only that it is storing the address of a character...
for example, you could legally do this:
char myChar = 'C';
/* assign the address of the location in
memory in which 'C' is stored to
the myString variable. */
myString = &myChar;
Hopefully that was clear enough. If so, upvote/accept answer. If not, please comment so that I may clarify.
string literals do not require allocation - they are stored as-is and can be used directly. Essentially myString was a pointer to one string literal, and was changed to point to another string literal.
char* means a pointer to a block of memory that holds a character.
C style string functions get a pointer to the start of a string. They assume there's a sequence of characters that end with a 0-null character (\n).
So what the << operator actually does is loop from that first character position until it finds a null character.

need help changing single character in char*

I'm getting back into c++ and have the hang of pointers and whatnot, however, I was hoping I could get some help understanding why this code segment gives a bus error.
char * str1 = "Hello World";
*str1 = '5';
ERROR: Bus error :(
And more generally, I am wondering how to change the value of a single character in a cstring. Because my understanding is that *str = '5' should change the value that str points to from 'H' to '5'. So if I were to print out str it would read: "5ello World".
In an attempt to understand I wrote this code snippet too, which works as expected;
char test2[] = "Hello World";
char *testpa2 = &test2[0];
*testpa2 = '5';
This gives the desired output. So then what is the difference between testpa2 and str1? Don't they both point to the start of a series of null-terminated characters?
When you say char *str = "Hello World"; you are making a pointer to a literal string which is not changeable. It should be required to assign the literal to a const char* instead, but for historical reasons this is not the case (oops).
When you say char str[] = "Hello World;" you are making an array which is initialized to (and sized by) a string known at compile time. This is OK to modify.
Not so simple. :-)
The first one creates a pointer to the given string literal, which is allowed to be placed in read-only memory.
The second one creates an array (on the stack, usually, and thus read-write) that is initialised to the contents of the given string literal.
In the first example you try to modify a string literal, this results in undefined behavior.
As per the language standard in 2.13.4.2
Whether all string literals are
distinct (that is, are stored in
nonoverlapping objects) is
implementation-defined. The effect of
attempting to modify a string literal
is undefined.
In your second example you used string-literal initialization, defined in 8.5.2.1
A char array (whether plain char,
signed char, or unsigned char) can be
initialized by a string- literal
(optionally enclosed in braces); a
wchar_t array can be initialized by a
wide string-literal (option- ally
enclosed in braces); successive
characters of the string-literal
initialize the members of the
array.