memory leak, pointer to literal - c++

I've learned not to let pointers point to literals because it causes memory leaks. But when I assign a pointer to a literal, it still points to the same address as before:
unsigned maxlen = 20;
char* testpointer = new char[sizeof(char) * maxlen]; //Pointer points to RAM
cout << "&testpointer = " << &testpointer << endl;
strncpy(testpointer, "Happy Eastern", 13);
cout << "&testpointer = " << &testpointer << endl;
testpointer = "Merry Christmas"; // I know I shouldn't do this
cout << "&testpointer = " << &testpointer << endl;
I still get the same memory address each time:
&testpointer = 0x28fc60
&testpointer = 0x28fc60
&testpointer = 0x28fc60
Shouldn't the address change when I let the pointer point to a literal?
I thought the memory which I allocated with new should be in RAM while the literal should be in ROM, which should have a different address. Am I wrong?
Thank you, Philipp

Your cout instructions are printing the address of the variable called testpointer. That's some location in the stack frame of the current function. It has nothing to do with the value of testpointer, nor with the value pointed to by testpointer.
Also, either whoever told you that you should not let a pointer point to a literal was mad, or you did not understand what they said to you. There is absolutely no problem with letting a pointer point to a literal.

&testpointer is a pointer to the variable testpointer. It's where the variable itself is stored.
If you want to print where testpointer is pointing, you print its value (as a void* since otherwise operator<< will print it as a string):
std::cout << "testpointer = " << static_cast<void*>(testpointer) << '\n';
Also note that on modern computers there's really no ROM. The executable image is loaded from the disk into the virtual memory ("RAM") and that includes data such as constant string literals (which are really arrays of constant characters).
Also, you can have pointers to constant string literals, but they should really be pointers to const char since constant string literals are constant. The problem is with the reassignment of the variable. You would get the same problem with e.g.
unsigned maxlen = 20;
char* testpointer = new char[sizeof(char) * maxlen];
// ... stuff happens...
testpointer = new char[some_other_size];
If there's no delete[] before the second new[], then you have a memory leak.
Finally, a warning about your use of std::strncpy: it will not add the terminating '\0' at the end. That's because the size you supplied (the third argument) is less than or equal to the length of the source string, in which case the function does not add the terminator. So don't attempt to print the contents of the "string" or use it as a "string".

There are two misconceptions in your question.
One, you're printing the address of testpointer instead of its value, so it's obviously not changing. If you replace &testpointer with static_cast<void*>(testpointer), you will see the difference. Note that the cast is necessary, because << is overloaded for char* to print the characters instead of the pointer itself.
Two,
not to let pointer point to literals because it causes memory leaks
is simply not true. A leak happens if and only if you have some dynamically allocated memory and lose any reference to that memory; in other words, if you no longer have a pointer to that memory. In such case, you no longer have a way to deallocate that memory, and hence you leak it.
This happens in your program by doing this sequence of operations:
char* testpointer = new char[sizeof(char) * maxlen];
testpointer = "Merry Christmas";
Either one of them is fine on its own(1), but together, they cause a leak:
First, you allocate memory
Then, you forget its address (by pointing the pointer elsewhere).
Note that a literal being involved is irrelevant. This would be exactly the same leak:
char* testpointer = new char[sizeof(char) * maxlen];
testpointer = nullptr;
As would this:
char* testpointer = new char[sizeof(char) * maxlen];
testpointer = new char[sizeof(char) * maxlen];
(1) Except you're pointing a char * at a string literal, which is not allowed since C++11, because string literals are const. You'd need a const char * for that.

I've learned not to let pointer point to literals because it causes memory leaks.
Pointing to string literals does not cause memory leaks. It's fine to point to a string literal.
Incidentally, your program does leak memory.
But when I assign a pointer to a literal
Your program does not assign the pointer after initialization.
it still points to the same address than before:
You don't stream the address that the pointer points at. You use the address-of operator (&) on the pointer variable, so you stream the address where the pointer is stored. This wouldn't change even if you did assign the pointer.
Shouldn't the address change when I let the pointer point to a literal?
The address of the pointer variable wouldn't change. But the address it points to (i.e. the value of the pointer) would change. But you neither point to a string literal, nor do you observe the address that is pointed at.

Related

Initialization of pointers in c++

I need to clarify my concepts regarding the basics of pointer initialization in C++. As per my understanding, a pointer must be assigned an address before putting some value using the pointer.
int *p;
*p=10; //inappropriate
cout << *p <<"\n";
This would probably show the correct output (10), but it may cause issues in larger programs, since p initially holds a garbage address that could be anything and may later be used somewhere else in the program as well. So I believe this is incorrect; the correct way is:
int *p;
int x=10;
p=&x; //appropriate
cout << *p <<"\n";
My question is, if the above understanding is correct, then does the same apply on char* as well?:
const char *str="hello"; // inappropriate
cout << str << "\n";
//OR
const string str1= "hello";
const char str2[6] ="world";
const char *str=str1; //appropriate
const char *st=str2; //appropriate
cout << str << st << "\n";
Please advise.
Your understanding of strings is incorrect.
Let's take, for example, the very first line:
const char *str="hello";
This is actually correct. A string literal like "hello" is turned into a constant array by the compiler, and like all arrays it can decay to a pointer to its first element. So what you are doing is making str point to the first character of the array.
Then let's continue with
const string str1= "hello";
const char *str=str1;
This is actually wrong. A std::string object has no conversion operator defined to convert to a const char *. The compiler will give you an error for this. You need to use the c_str function to get a pointer to the contained string.
Lastly:
const char str2[6] ="world";
const char *st=str2; //appropriate
This is really no different than the first line when you declare and initialize str. This is, as you say, "appropriate".
About that first example with the "inappropriate" pointer:
int *p;
*p=10; //inappropriate
cout << *p <<"\n";
This is not only "inappropriate", this leads to undefined behavior and may actually crash your program. Also, the correct term is that the value of p is indeterminate.
When I declare a pointer
int *p;
I get an object p whose values are addresses. No ints are created anywhere. The thing you need to do is think of p as being an address rather than being an int.
At this point, this isn't particularly useful since you have no addresses you could assign to it other than nullptr. Well, technically that's not true: p itself has an address which you can get with &p and store it in an int**, or even do something horrible like p = reinterpret_cast<int*>(&p);, but let's ignore that.
To do something with ints, you need to create one. e.g. if you go on to declare
int x;
you now have an int object whose values are integers, and we could then assign its address to p with p = &x;, and then recover the object from p via *p.
Now, C style strings have weird semantics — the weirdest aspect being that C doesn't actually have strings at all: it's always working with arrays of char.
String literals, like "Hello!", are guaranteed to (act [1] like they) exist as an array of const char located at some address, and by C's odd conversion rules, this array automatically converts to a pointer to its first element. Thus,
const char *str = "hello";
stores the address of the h character in that character array. The declaration
const char str2[6] ="world";
works differently; this (acts [1] like it) creates a brand new array, and copies the contents of the string literal "world" into the new array.
As an aside, there is an obsolete and deprecated feature here for compatibility with legacy programs, but for some misguided reason people still use it in new programs these days so you should be aware of it and that it's 'wrong': you're allowed to break the type system and actually write
char *str = "hello";
This shouldn't work because "hello" is an array of const char, but the standard used to permit this specific usage (C++11 removed the conversion, although many compilers still accept it with a warning). You're still not actually allowed to modify the contents of the array, however.
[1] By the "as if" rule, the program only has to behave as if things happen as I describe, but if you peeked at the assembly code, the actual way things happen can be very different.

What is the right way to handle char* strings?

I have a third party library that is using char* (non-const) as placeholder for string values. What is the right and safe way to assign values to those datatypes? I have the following test benchmark that uses my own timer class to measure execution times:
#include "string.h"
#include <iostream>
#include <sj/timer_chrono.hpp>
using namespace std;
int main()
{
sj::timer_chrono sw;
int iterations = 1e7;
// first method gives compiler warning:
// conversion from string literal to 'char *' is deprecated [-Wdeprecated-writable-strings]
cout << "creating c-strings unsafe(?) way..." << endl;
sw.start();
for (int i = 0; i < iterations; ++i)
{
char* str = "teststring";
}
sw.stop();
cout << sw.elapsed_ns() / (double)iterations << " ns" << endl;
cout << "creating c-strings safe(?) way..." << endl;
sw.start();
for (int i = 0; i < iterations; ++i)
{
char* str = new char[strlen("teststr")];
strcpy(str, "teststring");
}
sw.stop();
cout << sw.elapsed_ns() / (double)iterations << " ns" << endl;
return 0;
}
Output:
creating c-strings unsafe(?) way...
1.9164 ns
creating c-strings safe(?) way...
31.7406 ns
While the "safe" way gets rid of the compiler warning, it makes the code about 15-20 times slower according to this benchmark (1.9 nanoseconds per iteration vs 31.7 nanoseconds per iteration). What is the correct way, and what is so dangerous about that "deprecated" way?
The C++ standard is clear:
An ordinary string literal has type “array of n const char” (section 2.14.5.8 in C++11).
and
The effect of attempting to modify a string literal is undefined (section 2.14.5.12 in C++11).
For a string known at compile time, the safe way of obtaining a non-const char* is this
char literal[] = "teststring";
you can then safely
char* ptr = literal;
If at compile time you don't know the string but know its length you can use an array:
char str[STR_LENGTH + 1];
If you don't know the length then you will need to use dynamic allocation. Make sure you deallocate the memory when the strings are no longer needed.
This will work only if the API doesn't take ownership of the char* you pass.
If it tries to deallocate the strings internally then it should say so in the documentation and inform you on the proper way to allocate the strings. You will need to match your allocation method with the one used internally by the API.
The
char literal[] = "test";
will create a local, 5-character array with automatic storage (meaning the variable will be destroyed when the execution leaves the scope in which the variable is declared) and initialize each character in the array with the characters 't', 'e', 's', 't' and '\0'.
You can later edit these characters: literal[2] = 'x';
If you write this:
char* str1 = "test";
char* str2 = "test";
then, depending on the compiler, str1 and str2 may be the same value (i.e., point to the same string).
("Whether all string literals are distinct (that is, are stored in nonoverlapping objects) is implementation-defined." in Section 2.14.5.12 of the C++ standard)
It may also be true that they are stored in a read-only section of memory and therefore any attempt to modify the string will result in an exception/crash.
They are also, in reality, of type const char[N] (decaying to const char*), so this line:
char* str = "test";
actually casts away the const-ness on the string, which is why the compiler will issue the warning.
The unsafe way is the way to go for all strings that are known at compile-time.
Your "safe" way leaks memory and is rather horrific.
Normally you'd have a sane C API which accepts const char *, so you could use a proper safe way in C++, i.e. std::string and its c_str() method.
If your C API assumes ownership of the string, your "safe way" has another flaw: you can't mix new[] and free(), passing memory allocated using the C++ new[] operator to a C API which expects to call free() on it is not allowed. If the C API doesn't want to call free() later on the string, it should be fine to use new[] on the C++ side.
Also, this is a strange mixture of C++ and C.
You seem to have a fundamental misunderstanding about C strings here.
cout << "creating c-strings unsafe(?) way..." << endl;
sw.start();
for (int i = 0; i < iterations; ++i)
{
char* str = "teststring";
}
Here, you're just assigning a pointer to a string literal constant. In C, string literals are of type char[N] (in C++, const char[N]), and you can assign a pointer to a string literal array because of array "decay". (However, it's deprecated to assign a non-const pointer to a string literal.)
But assigning a pointer to a string literal can't be what you want to do. Your API expects a non-const string. String literals are const.
What is the right and safe way to assign values to those [char* strings]?
There's no general answer to this question. Whenever you work with C strings (or pointers in general), you need to deal with the concept of ownership. C++ takes care of this for you automatically with std::string. Internally, std::string owns a pointer to a char array, but it manages the memory for you so you don't need to care about it. But when you use raw C-strings, you DO need to put thought into managing the memory.
How you manage the memory depends on what you're doing with your program. If you allocate a C-string with new[], then you need to deallocate it with delete[]. If you allocate it with malloc, then you must deallocate it with free(). A good solution for working with C-strings in C++ is to use a smart pointer which takes ownership of the allocated C string. (But you'll need to use a deleter that deallocates the memory with delete[]). Or you can just use std::vector<char>. As always, don't forget to allocate room for the terminating null char.
Also, the reason your 2nd loop is so much slower is because it allocates memory in each iteration, whereas the 1st loop simply assigns a pointer to a statically-allocated string literal.

Copy string from char pointer to char pointer

char * p_one = "this is my first char pointer";
char * p_two= "this is second";
strcpy(p_one ,p_two);
Consider the above code. It gives an access violation error.
So please help me understand:
1. Where is the current "this is my first char pointer" string stored in memory? Heap or stack?
2. Why do I need to allocate memory for p_one before calling strcpy, even though it is already storing the first string? Why can't the "this is second" string be copied to the same location?
3. If I allocate memory for p_one before calling strcpy, what happens to the "this is my first char pointer" string that p_one pointed to? Is it kept in memory?
4. How does strcpy know whether a specific pointer has allocated memory or not?
1. Implementation-defined (usually read-only) memory. [Ref 1]
2. You do not need to, as long as you don't modify the source string literal.
3. If you allocate memory to p_one, it will point to the newly allocated memory region. The string literal is no longer referenced by p_one, but it is guaranteed to stay alive throughout the lifetime of the program: string literals have static storage duration. [Ref 2]
4. It doesn't. It is the user's responsibility to ensure that.
Good Read:
[Ref 1]
What is the difference between char a[] = "string"; and char *p = "string";?
[Ref 2]
"life-time" of string literal in C
First off your compiler should be warning that the p_one and p_two are actually const char * because the compiler allocates the storage of this string at compile time.
The reason you cannot modify them is that, in theory, you could overwrite the memory after them; this kind of overwrite is what buffer overflow attacks exploit.
Also, the compiler could be smart and realize that you use this string in 10 places and that it is the same string, so it stores only one copy; modifying it from one place would change it everywhere, which destroys the logic of the other 9 places that use it.
Answering all the questions in order
The char pointer variable itself, being a local variable, is stored on the stack; the string literal it points to is stored in static (usually read-only) memory. Remember that when you allocate memory yourself, it must be large enough for the characters of the string plus the terminating '\0' character.
This would be one solution, according to code you have mentioned:
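#include <stdio.h>
#include <stdlib.h>
#include <string.h>
int main()
{
char * p_one = "this is my first char pointer";
char * p_two= "this is second";
size_t keylen=strlen(p_two);
p_one=(char *)malloc(keylen + 1); /* +1 for the terminating '\0' */
if (p_one == NULL) return 1; /* allocation failed */
strncpy(p_one ,p_two, keylen + 1); /* bounded by the destination size; copies the '\0' too */
printf("%s",p_one);
free(p_one); /* release the allocated block */
return 0;
}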
A char pointer only holds the address of some memory; it carries no information about how much memory is allocated there. strcpy therefore cannot check where the destination buffer ends. Hence it is better to use strncpy, with the destination size as the limit, in these conditions.
Yes it is allocating memory.
In C, it is bad practice to cast the result of malloc, as the cast can hide the diagnostic you would otherwise get for a missing #include <stdlib.h>; thanks Gewure
When you have a string literal in your code like that, you need to think of it as a temporary constant value. Sure, you assigned it to a char*, but that does not mean you are allowed to modify it. Nothing in the C specification says this is legal.
On the other hand, this is okay:
const size_t MAX_STR = 50;
char p_one[MAX_STR] = "this is my first char pointer";
const char *p_two = "this is second";
strcpy( p_one, p_two );

Basic c-style string memory allocation

I am working on a project with existing code which uses mainly C++ but with c-style strings. Take the following:
#include <iostream>
using namespace std;
int main(int argc, char *argv[])
{
char* myString = "this is a test";
myString = "this is a very very very very very very very very very very very long string";
cout << myString << endl;
return 0;
}
This compiles and runs fine with the output being the long string.
However I don't understand WHY it works. My understanding is that
char* myString
is a pointer to an area of memory big enough to hold the string literal "this is a test". If that's the case, then how am I able to then store a much longer string in the same location? I expected it to crash when doing this due to trying to cram a long string into a space set aside for the shorter one.
Obviously there's a basic misunderstanding of what's going on here so I appreciate any help understanding this.
You're not changing the content of the memory, you're changing the value of the pointer to point to a different area of memory which holds "this is a very very very very very very very very very very very long string".
Note that char* myString only allocates enough bytes for the pointer (usually 4 or 8 bytes). When you do char* myString = "this is a test";, what actually happened was that before your program even started, the compiler allocated space in the executable image and put "this is a test" in that memory. Then when you do char* myString = "this is a test"; what it actually does is just allocate enough bytes for the pointer, and make the pointer point to that memory it had already allocated at compile time, in the executable.
So if you like diagrams:
char* myString = "this is a test";
     (allocate memory for myString)
               ---> "this is a test"
             /
myString---
               "this is a very very very very very very very very very very very long string"
Then
myString = "this is a very very very very very very very very very very very long string";
               "this is a test"
myString---
             \
               ---> "this is a very very very very very very very very very very very long string"
There are two strings in memory. The first is "this is a test", and let's say it begins at address 0x1000. The second is "this is a very very ... long string", and it begins at address 0x1200.
By
char* myString = "this is a test";
you create a variable called myString and assign the address 0x1000 to it. Then, by
myString = "this is a very very ... long string";
you assign 0x1200. By
cout << myString << endl;
you just print the string beginning at 0x1200.
You have two string literals of type const char[n]. These can be assigned to a variable of type char*, which is nothing more than a pointer to a char. Whenever you declare a variable of type pointer-to-T you are only declaring the pointer, and not the memory to which it points.
The compiler reserves memory for both literals and you just take your pointer variable and point it at those literals one after the other. String literals are read-only and their allocation is taken care of by the compiler. Typically they are stored in the executable image in protected read-only memory. A string literal typically has a lifetime equal to that of the program itself.
Now, it would be UB if you attempted to modify the contents of a literal, but you don't. To help prevent yourself from attempting modifications in error you would be wise to declare your variable as const char*.
During program execution, a block of memory containing "this is a test" is allocated, and the address of the first character in that block of memory is assigned to the myString variable. In the next line, a separate block of memory containing "this is a very very..." is allocated, and the address of the first character in that block of memory is now assigned to the myString variable, replacing the address it used to store with the new address to the "very very long" string.
just for illustration, let's say the first block of memory looks like this:
[t][h][i][s][ ][i][s][ ][a][ ][t][e][s][t]
and let's just say the address of this first 't' character in this sequence/array of characters is 0x100.
so after the first assignment of the myString variable, the myString variable contains the address 0x100, which points to the first letter of "this is a test".
then, a totally different block of memory contains:
[t][h][i][s][ ][i][s][ ][a][ ][v][e][r][y]...
and let's just say that the address of this first 't' character is 0x200.
so after the second assignment of the myString variable, the myString variable NOW contains the address 0x200, which points to the first letter of "this is a very very very...".
Since myString is just a pointer to a character (hence "char *" is its type), it only stores the address of a character; it has no concern for how big the array is supposed to be, and it doesn't even know that it is pointing to an "array", only that it is storing the address of a character...
for example, you could legally do this:
char myChar = 'C';
/* assign the address of the location in
memory in which 'C' is stored to
the myString variable. */
myString = &myChar;
string literals do not require allocation - they are stored as-is and can be used directly. Essentially myString was a pointer to one string literal, and was changed to point to another string literal.
char* means a pointer to a block of memory that holds a character.
C-style string functions get a pointer to the start of a string. They assume there's a sequence of characters that ends with a null character ('\0').
So what the << operator actually does is loop from that first character position until it finds a null character.

Difference between using character pointers and character arrays

Basic question.
char new_str[]="";
char * newstr;
If I have to concatenate some data into it or use string functions like strcat/substr/strcpy, what's the difference between the two?
I understand I have to allocate memory to the char * approach (Line #2). I'm not really sure how though.
And const char * and string literals are the same?
I need to know more on this. Can someone point to some nice exhaustive content/material?
An excellent source to clear up the confusion is Peter van der Linden's Expert C Programming: Deep C Secrets. Arrays and pointers are not the same, and the difference is how they are addressed in memory.
With an array, char new_str[];, the compiler gives new_str a memory address that is fixed for the array's lifetime, e.g. 0x1234, so indexing new_str with [] is simple. For example, for new_str[4], the code takes the address where new_str resides, e.g. 0x1234, adds the index 4 to it (0x1234 + 0x4), and retrieves the value stored there.
Whereas, with a pointer, the compiler gives the symbol char *newstr an address, e.g. 0x9876, but at runtime that address is used for indirect addressing. Suppose newstr was malloc'd: newstr = malloc(10);. The address of newstr itself (0x9876) is known to the compiler, but what newstr points to is variable. Every time the code refers to newstr, it first fetches the contents of memory at 0x9876 (i.e. newstr), which is another memory address (the one malloc returned), e.g. 0x8765, and then fetches the data from that address.
char new_str[] and char *newstr can often be used interchangeably, since an array decays into a pointer to its zeroth element, which explains why you can write newstr[5] or *(newstr + 5). Notice how the pointer expression can also be used on the array, hence *(new_str + 1) = *newstr; or *(new_str + 1) = newstr[1];
In summary, the real difference between the two is how they are accessed in memory.
Get the book and read it and live it and breathe it. Its a brilliant book! :)
Please go through this article below:
Also, in the case of an array of char such as char new_str[], new_str always refers to the base of the array; it cannot itself be incremented. You can use subscripts to access the next char in the array, e.g. new_str[3].
But in the case of a pointer to char, the pointer can be incremented (newstr++) to reach the next character in the array.
Also I would suggest this article for more clarity.
This is a character array:
char buf [1000];
So, for example, this makes no sense:
buf = &some_other_buf;
This is because buf, though it has characteristics of type pointer, it is already pointing to the only place that makes sense for it.
char *ptr;
On the other hand, ptr is only a pointer, and may point somewhere. Most often, it's something like this:
ptr = buf; // #1: point to the beginning of buf, same as &buf[0]
or maybe this:
ptr = malloc (1000); // #2: allocate heap and point to it
or:
ptr = "abcdefghijklmn"; // #3: string constant
For all of these, *ptr can be written to, except in the third case: modifying a string constant is undefined behavior, and many compilation environments place string constants in unwritable memory.
*ptr++ = 'h'; // writes into #1: buf[0], #2: first byte of heap, or
// #3 overwrites "a"
strcpy (ptr, "ello"); // finishes writing hello and adds a NUL
The difference is that one is a pointer, the other is an array. You can, for instance, apply sizeof to the array to get its size in bytes. You may be interested in peeking here
If you're using C++ as your tags indicate, you really should be using the C++ strings, not the C char arrays.
The string type makes manipulating strings a lot easier.
If you're stuck with char arrays for some reason, the line:
char new_str[] = "";
allocates 1 byte of space and puts a null terminator character into it. It's subtly different from:
char *new_str = "";
since that may give you a reference to non-writable memory. The statement:
char *new_str;
on its own gives you a pointer but nothing that it points to. It can also have a random value if it's local to a function.
What people tend to do (in C rather than C++) is to do something like:
char *new_str = malloc (100); // (remember that this has to be freed) or
char new_str[100];
to get enough space.
If you use the str... functions, you're basically responsible for ensuring that you have enough space in the char array, lest you get all sorts of weird and wonderful practice at debugging code. If you use real C++ strings, a lot of the grunt work is done for you.
The type of the first is char[1], the second is char *. Different types.
Allocate memory for the latter with malloc in C, or new in C++.
char foo[] = "Bar"; // Allocates 4 bytes and fills them with
// 'B', 'a', 'r', '\0'.
The size here is implied from the initializer string.
The contents of foo are mutable. You can change foo[i] for example where i = 0..3.
OTOH if you do:
char *foo = "Bar";
The compiler now allocates a static string "Bar", typically in read-only memory, which must not be modified.
foo[i] = 'X'; // is now undefined.
char new_str[]="abcd";
This specifies an array of characters (a string) of size 5 bytes (one byte for each character plus one for the null terminator). So it stores the string 'abcd' in memory and we can access this string using the variable new_str.
char *new_str="abcd";
This specifies a string 'abcd' is stored somewhere in the memory and the pointer new_str points to the first character of that string.
To differentiate them in the memory allocation side:
// With char array, "hello" is allocated on stack
char s[] = "hello";
// With char pointer, "hello" is stored in the read-only data segment in C++'s memory layout.
char *s = "hello";
// To allocate a string on heap, malloc 6 bytes, due to a NUL byte in the end
char *s = malloc(6);
strcpy(s, "hello"); // copy into the heap block; s = "hello" would re-point s and leak the block
If you're in c++ why not use std::string for all your string needs? Especially anything dealing with concatenation. This will save you from a lot of problems.