Basic c-style string memory allocation - c++

I am working on a project with existing code which uses mainly C++ but with c-style strings. Take the following:
#include <iostream>
int main(int argc, char *argv[])
{
char* myString = "this is a test";
myString = "this is a very very very very very very very very very very very long string";
cout << myString << endl;
return 0;
}
This compiles and runs fine with the output being the long string.
However I don't understand WHY it works. My understanding is that
char* myString
is a pointer to an area of memory big enough to hold the string literal "this is a test". If that's the case, then how am I able to then store a much longer string in the same location? I expected it to crash when doing this due to trying to cram a long string into a space set aside for the shorter one.
Obviously there's a basic misunderstanding of what's going on here so I appreciate any help understanding this.

You're not changing the content of the memory, you're changing the value of the pointer to point to a different area of memory which holds "this is a very very very very very very very very very very very long string".
Note that char* myString only allocates enough bytes for the pointer (usually 4 or 8 bytes). When you do char* myString = "this is a test";, what actually happened was that before your program even started, the compiler allocated space in the executable image and put "this is a test" in that memory. Then when you do char* myString = "this is a test"; what it actually does is just allocate enough bytes for the pointer, and make the pointer point to that memory it had already allocated at compile time, in the executable.
So if you like diagrams:
char* myString = "this is a test";
(allocate memory for myString)
---> "this is a test"
/
myString---
"this is a very very very very very very very very very very very long string"
Then
myString = "this is a very very very very very very very very very very very long string";
"this is a test"
myString---
\
---> "this is a very very very very very very very very very very very long string"

There are two strings in the memory. First is "this is a test" and lets say it begins at the address 0x1000. The second is "this is a very very ... test" and it begins at the address 0x1200.
By
char* myString = "this is a test";
you crate a variable called myString and assign address 0x1000 to it. Then, by
myString = "this is a very very ... test";
you assign 0x1200. By
cout << myString << endl;
you just print the string beginning at 0x1200.

You have two string literals of type const char[n]. These can be assigned to a variable of type char*, which is nothing more than a pointer to a char. Whenever you declare a variable of type pointer-to-T you are only declaring the pointer, and not the memory to which it points.
The compiler reserves memory for both literals and you just take your pointer variable and point it at those literals one after the other. String literals are read-only and their allocation is taken care of by the compiler. Typically they are stored in the executable image in protected read-only memory. A string literal typically has a lifetime equal to that of the program itself.
Now, it would be UB if you attempted to modify the contents of a literal, but you don't. To help prevent yourself from attempting modifications in error you would be wise to declare your variable as const char*.

During program execution, a block of memory containing "this is a test" is allocated, and the address of the first character in that block of memory is assigned to the myString variable. In the next line, a separate block of memory containing "this is a very very..." is allocated, and the address of the first character in that block of memory is now assigned to the myString variable, replacing the address it used to store with the new address to the "very very long" string.
just for illustration, let's say the first block of memory looks like this:
[t][h][i][s][ ][i][s][ ][a][ ][t][e][s][t]
and let's just say the address of this first 't' character in this sequence/array of characters is 0x100.
so after the first assignment of the myString variable, the myString variable contains the address 0x100, which points to the first letter of "this is a test".
then, a totally different block of memory contains:
[t][h][i][s][ ][i][s][ ][a][ ][v][e][r][r][y]...
and let's just say that the address of this first 't' character is 0x200.
so after the second assignment of the myString variable, the myString variable NOW contains the address 0x200, which points to the first letter of "this is a very very very...".
Since myString is just a pointer to a character (hence: "char *" is it's type), it only stores the address of a character; it has no concern for how big the array is supposed to be, it doesn't even know that it is pointing to an "array", only that it is storing the address of a character...
for example, you could legally do this:
char myChar = 'C';
/* assign the address of the location in
memory in which 'C' is stored to
the myString variable. */
myString = &myChar;
Hopefully that was clear enough. If so, upvote/accept answer. If not, please comment so that I may clarify.

string literals do not require allocation - they are stored as-is and can be used directly. Essentially myString was a pointer to one string literal, and was changed to point to another string literal.

char* means a pointer to a block of memory that holds a character.
C style string functions get a pointer to the start of a string. They assume there's a sequence of characters that end with a 0-null character (\n).
So what the << operator actually does is loop from that first character position until it finds a null character.

Related

Is it possible for separately initialized string variables to overlap?

If I initialize several string(character array) variables in the following ways:
const char* myString1 = "string content 1";
const char* myString2 = "string content 2";
Since const char* is simply a pointer a specific char object, it does not contain any size or range information of the character array it is pointing to.
So, is it possible for two string literals to overlap each other? (The newly allocated overlap the old one)
By overlap, I mean the following behaviour;
// Continue from the code block above
std::cout << myString1 << std::endl;
std::cout << myString2 << std::endl;
It outputs
string costring content 2
string content 2
So the start of myString2 is somewhere in the middle of myString1. Because const char* does not "protect"("possess") a range of memory locations but only that one it points to, I do not see how C++ can prevent other string literals from "landing" on the memory locations of the older ones.
How does C++/compiler avoid such problem?
If I change const char* to const char[], is it still the same?
Yes, string literals are allowed to overlap in general. From lex.string#9
... Whether all string-literals are distinct (that is, are stored in nonoverlapping objects) and whether successive evaluations of a string-literal yield the same or a different object is unspecified.
So it's up to the compiler to make a decision as to whether any string literals overlap in memory. You can write a program to check whether the string literals overlap, but since it's unspecified whether this happens, you may get different results every time you run the program.
A string is required to end with a null character having a value of 0, and can't have such a character in the middle. So the only case where this is even possible is when two strings are equal from the start of one to the end of both. That is not the case in the example you gave, so those two particular strings would never overlap.
Edit: sorry, I didn't mean to mislead anybody. It's actually easy to put a null character in the middle of a string with \0. But most string handling functions, particularly those in the standard library, will treat that as the end of a string - so your strings will get truncated. Not very practical. Because of that the compiler won't try to construct such a string unless you explicitly ask it to.
The compiler knows the size of each string, because it can "see" it in your code.
Additionally, they are not allocated the same way, that you would allocate them at run-time. Instead, if the strings are constant and defined globally, they are most likely located in the .text section of the object file, not on the heap.
And since the compiler knows the size of a constant string at compile-time, it can simply put its value in the free space of the .text section. The specifics depend on the compiler you use, but be assured the people who wrote are smart enough to avoid this issue.
If you define these strings inside some function instead, the compiler can choose between the first option and allocating space on the stack.
As for the const char[], most compilers will treat it the same way as const char*.
Two string literals will not likely overlap unless they are the same. In that case though the pointers will be pointing to the same thing. (This isn't guaranteed by the standard though, but I believe any modern compiler should make this happen.)
const char *a = "Hello there."
const char *b = "Hello there."
cout << (a == b);
// prints "1" which means they point to the same thing
The const char * can share a string though.
const char *a = "Hello there.";
const char *b = a + 6;
cout << a;
// prints "Hello there."
cout << b;
// prints "there."
I think to answer your second question an explanation of c-style strings is useful.
A const char * is just a pointer to a string of characters. The const means that the characters themselves are immutable. (They are stored as part of the executable itself and you wouldn't want your program to change itself like this. You can use the strings command on unix to see all the strings in an executable easily i.e. strings a.out. You will see many more strings than what you coded as many exist as part of the standard library other required things for an executable.)
So how does it know to just print the string and then stop at the end? Well a c-style string is required to end with a null byte (\0). The complier implicitly puts it there when you declare a string. So "string content 1" is actually "string content 1\0".
const char *a = "Hello\0 there.";
cout << a;
// prints "Hello"
For the most part const char *a and const char a[] are the same.
// These are valid and equivalent
const char *a = "Hello";
const char b[] = "there."
// This is valid
const char *c = b + 3; // *c = "re."
// This, however, is not valid
const char d[] = b + 3;

Access violation at modifying char* by dereferencing [duplicate]

I read this on wikipedia
int main(void)
{
char *s = "hello world";
*s = 'H';
}
When the program containing this code is compiled, the string "hello world" is placed in the section of the program executable file marked as read-only; when loaded, the operating system places it with other strings and constant data in a read-only segment of memory. When executed, a variable, s, is set to point to the string's location, and an attempt is made to write an H character through the variable into the memory, causing a segmentation fault**
i don't know why the string is placed in read only segment.please someone could explain this.
String literals are stored in read-only memory, that's just how it works. Your code uses a pointer initialized to point at the memory where a string literal is stored, and thus you can't validly modify that memory.
To get a string in modifiable memory, do this:
char s[] = "hello world";
then you're fine, since now you're just using the constant string to initialize a non-constant array.
There is a big difference between:
char * s = "Hello world";
and
char s[] = "Hello world";
In the first case, s is a pointer to something that you can't change. It's stored in read-only memory (typically, in the code section of your application).
In the latter case, you allocate an array in read-write memory (typically plain RAM), that you can modify.
When you do: char *s = "hello world"; then s is a pointer that points to a memory that is in the code part, so you can't change it.
When you do: char s[] = "Hello World"; then s is an array of chars
that are on the stack, so you can change it.
If you don't want the string to be changed during the program, it is better to do: char
const *s = ....;. Then, when you try to change the string, your program will not crash with segmentation fault, it will arise a compiler error (which is much better).
first have a good understanding of pointers, I will give u a short demo:
First let us analyze your code line by line. Lets start from main onwards
char *s = "Some_string";
first of all, you are declaring a pointer to a char variable, now *s is a address in memory, and C will kick you if you try to change its memory value, thats illegal, so u better declare a character array, then assign s to its address, then change s.
Hope you get, it. For further reference and detailed understanding, refer KN King: C programming A Modern Approach
Per the language definition, string literals have to be stored in such a way that their lifetime extends over the lifetime of the program, and that they are visible over the entire program.
Exactly what this means in terms of where the string gets stored is up to the implementation; the language definition does not mandate that string literals are stored in read-only memory, and not all implementations do so. It only says that attempting to modify the contents of a string literal results in undefined behavior, meaning the implementation is free to do whatever it wants.

TCHAR pointer initialization

I am trying to understand the following code:
const TCHAR * portName = "COM15";
I understand that a TCHAR is either a Char (in ANSI) or a wChar (in Unicode), basically a 1 byte or 2 byte container that represents something.
Now, if I declare a pointer to a const TCHAR called portName, portName is then a pointer. When I use the "=" sign, I am giving that pointer a value, and it seems irrational to me that "COM15" would be the address. I assume that line of code is giving me a pointer to the location of the beginning of the "COM15" string of characters, correct?
So what is actually happening in that line of code?
Is a string of characters ("COM15") being created and the "=" sign actually means that the location of the beginning of that string is being given to portName?
"Is a string of characters ("COM15") being created and the "=" sign actually means that the location of the beginning of that string is being given to portName?"
Yes, exactly. But other than it sounds from your question as you might have expected, this happens when the program is compiled, and not at run time. Also the const keyword prohibits changing that pointer at runtime later.
This how C works:
When you say char * str1 in C, you are allocating a pointer in the memory. When you write str1 = "Hello";, you are creating a string literal in memory and making the pointer point to it.
When you create another string literal "new string" and assign it to str1, all you are doing is changing where the pointer points.

confusion about char pointer in c++

I'm new in c++ language and I am trying to understand the pointers concept.
I have a basic question regarding the char pointer,
What I know is that the pointer is a variable that stores an address value,
so when I write sth like this:
char * ptr = "hello";
From my basic knowledge, I think that after = there should be an address to be assigned to the pointer, but here we assign "hello" which is set of chars.
So what does that mean ?
Is the pointer ptr points to an address that stores "hello"? or does it store the hello itself?
Im so confused, hope you guys can help me..
Thanks in advance.
ptr holds the address to where the literal "hello" is stored at. In this case, it points to a string literal. It's an immutable array of characters located in static (most commonly read-only) memory.
You can make ptr point to something else by re-assigning it, but before you do, modifying the contents is illegal. (its type is actually const char*, the conversion to char* is deprecated (and even illegal in C++11) for C compatibility.
Because of this guarantee, the compiler is free to optimize for space, so
char * ptr = "hello";
char * ptr1 = "hello";
might yield two equal pointers. (i.e. ptr == ptr1)
The pointer is pointing to the address where "hello" is stored. More precisely it is pointing the 'h' in the "hello".
"hello" is a string literal: a static array of characters. Like all arrays, it can be converted to a pointer to its first element, if it's used in a context that requires a pointer.
However, the array is constant, so assigning it to char* (rather than const char*) is a very bad idea. You'll get undefined behaviour (typically an access violation) if you try to use that pointer to modify the string.
The compiler will "find somewhere" that it can put the string "hello", and the ptr will have the address of that "somewhere".
When you create a new char* by assigning it a string literal, what happens is char* gets assigned the address of the literal. So the actual value of char* might be 0x87F2F1A6 (some hex-address value). The char* points to the start (in this case the first char) of the string. In C and C++, all strings are terminated with a /0, this is how the system knows it has reached the end of the String.
char* text = "Hello!" can be thought of as the following:
At program start, you create an array of chars, 7 in length:
{'H','e','l','l','o','!','\0'}. The last one is the null character and shows that there aren't any more characters after it. [It's more efficient than keeping a count associated with the string... A count would take up perhaps 4 bytes for a 32-bit integer, while the null character is just a single byte, or two bytes if you're using Unicode strings. Plus it's less confusing to have a single array ending in the null character than to have to manage an array of characters and a counting variable at the same time.]
The difference between creating an array and making a string constant is that an array is editable and a string constant (or 'string literal') is not. Trying to set a value in a string literal causes problems: they are read-only.
Then, whenever you call the statement char* text = "Hello!", you take the address of that initial array and stick it into the variable text. Note that if you have something like this...
char* text1 = "Hello!";
char* text2 = "Hello!";
char* text3 = "Hello!";
...then it's quite possible that you're creating three separate arrays of {'H','e','l','l','o','!','\0'}, so it would be more efficient to do this...
char* _text = "Hello!";
char* text1 = _text;
char* text2 = _text;
char* text3 = _text;
Most compilers are smart enough to only initialize one string constant automatically, but some will only do that if you manually turn on certain optimization features.
Another note: from my experience, using delete [] on a pointer to a string literal doesn't cause issues, but it's unnecessary since as far as I know it doesn't actually delete it.

Dynamic Memory Allocation

I have a small confusion in the dynamic memory allocation concept.
If we declare a pointer say a char pointer, we need to allocate adequate memory space.
char* str = (char*)malloc(20*sizeof(char));
str = "This is a string";
But this will also work.
char* str = "This is a string";
So in which case we have to allocate memory space?
In first sample you have memory leak
char* str = (char*)malloc(20*sizeof(char));
str = "This is a string"; // memory leak
Allocated address will be replaced with new.
New address is an address for "This is a string".
And you should change second sample.
const char* str = "This is a string";
Because of "This is a string" is write protected area.
The presumably C++98 code snippet
char* str = (char*)malloc(20*sizeof(char));
str = "This is a string";
does the following: (1) allocates 20 bytes, storing the pointer to that memory block in str, and (2) stores a pointer to a literal string, in str. You now have no way to refer to the earlier allocated block, and so cannot deallocate it. You have leaked memory.
Note that since str has been declared as char*, the compiler cannot practically detect if you try to use to modify the literal. Happily, in C++0x this will not compile. I really like that rule change!
The code snippet
char* str = "This is a string";
stores a pointer to a string literal in a char* variable named str, just as in the first example, and just as that example it won't compile with a C++0x compiler.
Instead of this sillyness, use for example std::string from the standard library, and sprinkle const liberally throughout your code.
Cheers & hth.,
In the first example, you dynamically allocated memory off the heap. It can be modified, and it must be freed. In the second example, the compiler statically allocated memory, and it cannot be modified, and must not be freed. You must use a const char*, not a char*, for string literals to reflect this and ensure safe usage.
Assigning to a char* variable makes it point to something else, so why did you allocate the memory in the first place if you immediately forget about it? That's a memory leak. You probably meant this:
char* str = (char*)malloc(20*sizeof(char));
strcpy(str, "This is a string");
// ...
free(str);
This will copy the second string to the first.
Since this is tagged C++, you should use a std::string:
#include <string>
std::string str = "This is a string";
No manual memory allocation and release needed, and assignment does what you think it does.
String literals are a special case in the language. Let's look closer at your code to understand this better:
First, you allocate a buffer in memory, and assign the address of that memory to str:
char* str = (char*)malloc(20*sizeof(char));
Then, you assign a string literal to str. This will overwrite what str held previously, so you will lose your dynamically allocated buffer, incidentally causing a memory leak. If you wanted to modify the allocated buffer, you would need at some point to dereference str, as in str[0] = 'A'; str[1] = '\0';.
str = "This is a string";
So, what is the value of str now? The compiler puts all string literals in static memory, so the lifetime of every string literal in the program equals the lifetime of the entire program. This statement is compiled to a simple assignment similar to str = (char*)0x1234, where 0x1234 is supposed to be the address at which the compiler has put the string literal.
That explains why this works well:
char* str = "This is a string";
Please also note that the static memory is not to be changed at runtime, so you should use const char* for this assignment.
So in which case we have to allocate memory space?
In many cases, for example when you need to modify the buffer. In other words; when you need to point to something that could not be a static string constant.
I want to add to Alexey Malistov's Answer by adding that you can avoid memory leak in your first example by copying "This is a string" to str as in the following code:
char* str = (char*)malloc(20*sizeof(char));
strcpy(str, "This is a string");
Please, note that by can I don't mean you have to. Its just adding to an answer to add value to this thread.
In the first example you're just doing things wrong. You allocate dynamic memory on the heap and let str point to it. Then you just let str point to a string literal and the allocated memory is leaked (you don't copy the string into the allocated memory, you just change the address str is pointing at, you would have to use strcpy in the first example).