Understanding pointers and local scope [duplicate] - c++

Suppose I have the following functions:
char* allocateMemory()
{
char str[20] = "Hello world.";
return str;
}
int* another()
{
int x = 5;
return &x;
}
int _tmain(int argc, _TCHAR* argv[])
{
char* pString = allocateMemory();
printf("%s\n", pString);
int* blah = another();
printf("%d %d \n", blah, *blah);
return 0;
}
The first printf prints random values, because str has local scope.
The second printf prints the expected values: blah is the address of x, and *blah is 5.
Why does local scope only affect allocateMemory, which deals with an array, and not another, which deals with an integer?
In other words, why does the first printf (on the returned char*) print random values, while the second one (on the returned int*) does not?

Accessing a local variable of a function after that function has returned is undefined behavior in both cases. These are some valid alternatives:
char* allocateMemory()
{
char* str= malloc(sizeof(char) * 20); //assuming C
strcpy(str, "Hello World.");
return str; //Valid
}
const char* allocateMemory()
{
return "Hello world."; //Valid Hello World is in read only location
}
int* another()
{
int *x = malloc(sizeof(int)); //assuming C
*x = 5;
return x; //Valid
}

char str[20] = "Hello world.";
str is local to the function allocateMemory() and is no longer valid once the function returns, so accessing it afterwards is undefined behavior.
int x = 5;
The same applies here also.
You can instead keep your data on the heap; returning a pointer to it is valid.
char *allocatememory()
{
char *p = malloc(20); /* Now the memory allocated is on heap and it is accessible even after the exit of this function */
return p;
}

Change the first function to:
char* allocateMemory()
{
static char str[20] = "Hello world.";
return str;
}
and see the difference.
And now explanation:
When you return the address of local data (a variable or an array, it does not matter: both are automatic variables), you risk losing the data or corrupting memory. It was just luck that the integer data was still correct after the second function call. If you return the address of a static variable, there is no such problem. You can also allocate memory for the data from the heap and return its address.
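The two safe options mentioned above can be sketched as follows. This is only an illustration: the names staticGreeting, heapGreeting and storageDemo are hypothetical, and std::unique_ptr is used here (instead of a raw pointer) so that the heap block is released automatically.

```cpp
#include <cassert>
#include <cstring>
#include <memory>

// Static storage duration: the array exists for the whole program run,
// so the returned pointer stays valid after the function returns.
char* staticGreeting()
{
    static char str[20] = "Hello world.";
    return str;
}

// Dynamic storage duration: the block lives until its owner releases it.
std::unique_ptr<char[]> heapGreeting()
{
    std::unique_ptr<char[]> str(new char[20]);
    std::strcpy(str.get(), "Hello world.");   // 13 bytes fit into 20
    return str;
}

// Hypothetical usage: both pointers refer to memory that is still alive.
void storageDemo()
{
    assert(std::strcmp(staticGreeting(), "Hello world.") == 0);
    std::unique_ptr<char[]> owned = heapGreeting();
    assert(std::strcmp(owned.get(), "Hello world.") == 0);
}
```

Note that every call to staticGreeting returns the same array, so a caller that modifies it changes what all later callers see.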

These are both, of course, UB, as the other answerers said. They also gave some good ways to do what you want to do in a proper fashion. But you were asking why does this actually happen in your case. To understand it, you need to understand what happens in the stack when you call a function. I'll try to provide a really simplified explanation.
When a function is called, a new stack frame is created on top of the stack. All the data in the function is put onto the stack frame. So, for the function
char* allocateMemory()
{
char str[20] = "Hello world.";
return str;
}
The stack frame for allocateMemory will contain, besides some other stuff, the 20 elements of the string (char array) str.
For this function:
int* another()
{
int x = 5;
return &x;
}
The stack frame for another will contain the contents of the variable x.
When a function returns, the stack pointer, which marks the top of the stack, drops back to where it was before the function was invoked. However, the memory is still there on the stack; it doesn't get erased, because erasing it would be a costly and pointless process. There is simply no longer anything protecting this memory from being overwritten: it has been marked "unneeded".
Now, what's the difference between your calls to printf? Well, when you call printf, it gets its own stack frame. It overwrites what was left of the previous called function's stack frame.
In the first case, you just pass pString to printf. Then printf overwrites the memory that once was the stack frame of allocateMemory, and the memory that was once str gets covered with stuff printf needs to work with string output, like iteration variables. Then it proceeds to try and get memory pointed to by the pointer you passed to it, pString... But it has just overwritten this memory, so it outputs what looks like garbage to you.
In the second case, you first got the value of the pointer blah, which resides in your local scope. Then you dereferenced it with *blah. Now comes the fun part: you've done the dereferencing before you've called another function which could overwrite the contents of the old stack frame. Which means the memory that was once the variable x in the function another is sort of still there, and by dereferencing the pointer blah, you get the value of x. And then you pass it to printf, but now, it doesn't matter that printf will overwrite another's stack frame: the values you passed to it are now sort of "safe". That's why the second call to printf outputs the values you expect.
I've heard of people who dislike using the heap so much that they use this "trick" in the following way: they form a stack array in a function and return a pointer to it, then, after the function returns, they copy its contents to an array in the caller's scope before calling any other function, and then proceed to use it. Never do this, for the sake of all the people who may read your code.
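If the goal is only to hand the data back to the caller, returning it by value avoids the dangling pointer entirely. A minimal sketch, assuming std::string is acceptable in place of a char array (the names greetingByValue, fiveByValue and byValueDemo are made up for this example):

```cpp
#include <cassert>
#include <string>

// The characters are copied into the returned std::string before the
// local array str is destroyed, so the caller owns an independent copy.
std::string greetingByValue()
{
    char str[20] = "Hello world.";
    return std::string(str);
}

// Returning the int itself copies the value; no address is involved.
int fiveByValue()
{
    int x = 5;
    return x;
}

// Hypothetical usage: the results do not depend on any old stack frame.
void byValueDemo()
{
    std::string s = greetingByValue();
    int n = fiveByValue();
    assert(s == "Hello world.");
    assert(n == 5);
}
```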

Related

Strcpy behavior with stack array c++

Here is my program :
#include <cstring>
const int SIZE =10;
int main()
{
char aName [SIZE]; // creates an array on the stack
std::strcpy(aName, "Mary");
return 0;
}
This program is obviously useless, I am just trying to understand the behavior of the strcpy function.
Here is its signature:
char * strcpy ( char * destination, const char * source )
so when I do :
std::strcpy(aName, "Mary");
I am passing by value the variable aName. I know that the aName (in the main) contains the address of the array.
So is this assertion correct : strcpy creates a local variable called destination that has as value the address of the array aName that I have created on the stack in the main function?
I am asking this because it is very confusing to me. Whenever I have encountered addresses it usually was to point to a memory allocated on the heap...
Thanks!
An address does not always point to memory allocated on the heap.
You can assign the address of a variable to a pointer like this:
int a=5;
int *myPtr= &a;
Now, myPtr is a pointer to int which points to the variable a, which is created on the stack and has the value 5.
Whenever you create a pointer and assign it the address of memory obtained with the new keyword, that memory is allocated on the heap. So, if I assign the value like this, it will be on the heap:
int *myPtr= new int[5];
So is this assertion correct : strcpy creates a local variable called destination that has as value the address of the array aName that I have created on the stack in the main function?
Yes.
Whenever I have encountered addresses it usually was to point to a memory allocated on the heap...
Yep, usually. But not always.
Pointers to non-dynamically-allocated things are fairly rare in C++, though in C it's more common as that's the only way to have "out arguments" (C does not have references).
strcpy is a function from C's standard library.
Maybe it would help to look at an example implementation of strcpy():
char* strcpy(char* d, const char* s)
{
char* tmp = d;
while (*tmp++ = *s++)
;
return d;
}
That's really all there is to it. Copy characters from the source to the destination until the source character is null (including the null). Return the pointer to the beginning of the destination. Done.
Pointers point to memory. It doesn't matter if that memory is "stack", "heap" or "static".
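To illustrate that last point, here is a small sketch that runs the same copy loop against destinations in automatic, static and dynamic storage. The function is renamed copyChars (a hypothetical name) so it does not clash with the real strcpy; staticBuffer and copyIntoEachKindOfStorage are likewise invented for the example.

```cpp
#include <cassert>
#include <cstring>
#include <memory>

// Same loop as the example implementation above.
char* copyChars(char* d, const char* s)
{
    char* tmp = d;
    while ((*tmp++ = *s++) != '\0')
        ;
    return d;
}

static char staticBuffer[8];   // static storage duration

void copyIntoEachKindOfStorage()
{
    char stackBuffer[8];                               // automatic storage
    std::unique_ptr<char[]> heapBuffer(new char[8]);   // dynamic storage

    // The function only ever sees a char*; it cannot tell the cases apart.
    copyChars(stackBuffer, "Mary");
    copyChars(staticBuffer, "Mary");
    copyChars(heapBuffer.get(), "Mary");

    assert(std::strcmp(stackBuffer, "Mary") == 0);
    assert(std::strcmp(staticBuffer, "Mary") == 0);
    assert(std::strcmp(heapBuffer.get(), "Mary") == 0);
}
```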
Function parameters are its local variables.
In this call
std::strcpy(aName, "Mary");
the two arrays (one created in main with automatic storage duration, the other the string literal, which has static storage duration) are implicitly converted to pointers to their first elements.
So you may imagine this call and the function definition the following way:
std::strcpy(aName, "Mary");
// …
char * strcpy ( /* char * destination, const char * source */ )
{
char *destination = aName;
const char *source = "Mary";
// …
return destination;
}
Or even like
char *p_to_aName = &aName[0];
const char *p_to_literal = &"Mary"[0];
std::strcpy( p_to_aName, p_to_literal );
// …
char * strcpy ( /* char * destination, const char * source */ )
{
char *destination = p_to_aName;
const char *source = p_to_literal;
// …
return destination;
}
That is, within the function its parameters are local variables of pointer types with automatic storage duration, initialized with pointers to the first characters of the passed character arrays.
So is this assertion correct : strcpy creates a local variable called destination that has as value the address of the array aName that I have created on the stack in the main function?
Yes. That is correct. Though I probably wouldn't call it a local variable. It is a parameter. Local variable usually means something like this:
int localVariable;
The word "parameter" is often associated with things like this:
int myFunction(int parameter) {
// use parameter some where...
}
The point is roughly the same though: it creates a variable that will go out of scope once the function exits.
I am asking this because it is very confusing to me. Whenever I have encountered addresses it usually was to point to a memory allocated on the heap...
Yes, this is the most common use case for them. But it isn't their only use. Pointers are addresses, and every variable has an address in memory regardless of whether it is allocated on the "heap" or "stack."
The use here is probably because pointers to char are commonly used to store strings, particularly on older compilers. Combined with the fact that arrays "decay" into pointers, it is probably easier to work with pointers. It is also certainly more backwards compatible to do it this way.
The function could have just as easily used an array, like this:
char * strcpy ( char destination[], const char source[] )
But I'm going to assume it is easier to work with pointers here instead (Note: I don't think you can return an array in C++, so I'm still using char *. However, even if you could, I would imagine it is still easier to work with pointers anyway, so I don't think it makes a lot of difference here.).
Another common use of pointers is using them as a way to sort of "pass by reference":
void foo(int * myX) {
*myX = 4;
}
int main() {
int x = 0;
foo(&x);
std::cout << x; // prints "4"
return 0;
}
However, in modern C++, actually passing by reference is preferred to this:
void foo(int & myX) {
myX = 4;
}
int main() {
int x = 0;
foo(x);
std::cout << x; // prints "4"
return 0;
}
But I bring it up as another example to help drive the point home: memory allocated on the heap isn't the only use of pointers, merely the most common one (though actually dynamically allocated memory has been mostly replaced in modern C++ by things like std::vector, but that is beside the point here).
I know that the aName (in the main) contains the address of the array.
You knew wrong. aName is an array. It contains the elements, not an address.
But when you use the name of the array as a value such as when passing it to strcpy, it is implicitly converted to a pointer to first element of the array (the value of a pointer is the memory address of the pointed object). Such implicit conversion is called decaying.
So is this assertion correct : strcpy creates a local variable called destination that has as value the address of the array aName that I have created on the stack in the main function?
This is correct enough. To clarify: It is a function argument rather than a local variable. But the distinction is not important here. Technically, it is the caller who is responsible for pushing the arguments onto the stack or storing them into registers, so it could be considered that main "creates" the variable.
Whenever I have encountered addresses it usually was to point to a memory allocated on the heap
Pointers are not uniquely associated with "heap". Pretty much any object can be pointed at, whether it has dynamic, static or automatic storage or even if it is a subobject.
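The difference between the array and the pointer it decays to can be checked directly. A rough sketch, in which addressSeenByCallee and decayDemo are hypothetical helpers standing in for strcpy and main:

```cpp
#include <cassert>
#include <cstddef>

const std::size_t SIZE = 10;

// The parameter is an ordinary pointer local to this function; it
// receives the address of the caller's first element.
const char* addressSeenByCallee(const char* destination)
{
    return destination;
}

void decayDemo()
{
    char aName[SIZE] = "Mary";

    // The array object holds its SIZE elements, not an address.
    static_assert(sizeof(aName) == SIZE, "aName stores the chars themselves");

    // Used as a value, the array converts to a pointer to its first element.
    const char* decayed = aName;
    assert(decayed == &aName[0]);

    // That same address is what the callee's parameter is initialized with.
    assert(addressSeenByCallee(aName) == &aName[0]);
}
```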

Can't iterate over an array stored in Smart Pointers

Suppose i have the following piece of code:
std::shared_ptr<char*> getString()
{
char hello[] = {'h','e','l','l','o'};
return std::make_shared<char*>(hello);
}
int main()
{
std::shared_ptr<char*> shared_str = getString();
std::cout<< (*shared_str)<<std::endl;//OK
std::cout<<(*shared_str)<<std::endl;//KO
return 0;
}
I don't know why I get just the first printing, while the second is in error. For the same reason I cannot iterate over such smart pointers like the following :
for(int i = 0; i < 5; i++)
std::cout<<(*shared_str)[i];
because also in this case, just the letter 'h' would be printed.
I am really confused about smart pointers, and I didn't find much help, since most of the explanations are about handling the lifetime of the referenced objects.
To summarize: the error happens because the "hello" array goes out of scope. make_shared allocates memory dynamically for a char* and stores the pointer "hello" inside it; however, the array itself dies as soon as the function getString() ends.
You have undefined behaviour in your code. This line:
return std::make_shared<char*>(hello);
stores hello (decayed to a pointer) in the shared pointer which you are returning, but hello is a local array which no longer exists after the function returns. Note also that the shared_ptr only manages the char* object created by make_shared, not the array that pointer refers to, so it cannot keep the array alive.
The easiest solution is to use std::string:
std::shared_ptr<std::string> getString()
{
char hello[] = {'h','e','l','l','o', '\0'};
return std::make_shared<std::string>(hello);
}
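With the std::string version, iterating over the characters works as the question intended. A short sketch building on the function above; collectCharacters is a hypothetical helper that stands in for the printing loop:

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <string>

std::shared_ptr<std::string> getString()
{
    char hello[] = {'h','e','l','l','o','\0'};
    // The std::string copies the characters, so it does not depend on
    // the local array once this function returns.
    return std::make_shared<std::string>(hello);
}

// Walks the shared string character by character, like the loop in the
// question, and gathers what it visits.
std::string collectCharacters(const std::shared_ptr<std::string>& shared_str)
{
    std::string result;
    for (std::size_t i = 0; i < shared_str->size(); ++i)
        result += (*shared_str)[i];
    return result;
}
```

Here the shared_ptr owns the std::string, and the std::string owns its characters, so everything stays alive for as long as some copy of the shared_ptr exists.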

Pass char pointer/array to a function

I am trying to understand char pointer in C more but one thing gets me.
Suppose I would like to pass a char pointer into a function and change the value that pointer represents. An example follows:
int Foo (char *(&Msg1), char* Msg2, char* Msg3){
char *MsgT = (char*)malloc(sizeof(char)*60);
strcpy(MsgT,"Foo - TEST");
Msg1 = MsgT; // Copy address to pointer
strcpy(Msg2,MsgT); // Copy string to char array
strcpy(Msg3,MsgT); // Copy string to char pointer
return 0;
}
int main() {
char* Msg1; // Initial char pointer
char Msg2[10]; // Initial char array
char* Msg3 = (char*)malloc(sizeof(char) * 10); // Preallocate pointer memory
Foo(Msg1, Msg2, Msg3);
printf("Msg1: %s\n",Msg1); // Method 1
printf("Msg2: %s\n",Msg2); // Method 2
printf("Msg3: %s\n",Msg3); // Method 3
free(Msg1);
free(Msg3);
return 0;
}
In the above example, I listed all working methods I know for passing char pointer to function. The one I don't understand is Method 1.
What is the meaning of char *(&Msg1) for the first argument that is passed to the function Foo?
Also, it seems that Method 2 and Method 3 are widely introduced by books and tutorials, and some of them even refer to those methods as the most correct ways to pass arrays/pointers. Method 1 looks very nice to me, especially when I write my API: users can easily pass a null pointer into the function without preallocating memory. The only downside may be a potential memory leak if users forget to free the memory block (same as Method 3). Is there any reason we should prefer Method 2 or 3 over Method 1?
int f(char* p) is the usual way in C to pass the pointer p to the function f when p already points to the memory location that you need (usually because there is a character array already allocated there as in your Method 2 or Method 3).
int f(char** p) is the usual way in C to pass the pointer p to the function f when you want f to be able to modify the pointer p for the caller of this function. Your Method 1 is an example of this; you want f to allocate new memory and use p to tell the caller where that memory is.
int f(char*& p) is C++, not C. Since this compiles for you, we know you are using a C++ compiler.
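For comparison, the char** form of Method 1 might look like the sketch below. It is written so that it also compiles as C++ (hence the cast on malloc); makeMessage and its return convention are assumptions made for the example, not part of the original code.

```cpp
#include <cassert>
#include <cstdlib>
#include <cstring>

// C-style out-parameter: the function allocates the block and reports it
// to the caller through *out. Returns 0 on success, -1 on failure.
int makeMessage(char** out)
{
    const char text[] = "Foo - TEST";
    assert(out != nullptr);

    char* block = static_cast<char*>(std::malloc(sizeof text));
    if (block == nullptr)
        return -1;

    std::memcpy(block, text, sizeof text);   // includes the terminating '\0'
    *out = block;                            // modifies the caller's pointer
    return 0;
}
```

The caller passes the address of its own pointer, for example makeMessage(&msg), and is responsible for calling free(msg) afterwards.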
Consider what happens when you take an argument of type int& (reference to int) :
void f(int &x) {
x++;
}
void g(int x) {
x++;
}
int main() {
int i = 5;
f(i);
assert(i == 6);
g(i);
assert(i == 6);
}
The same behaviour can be achieved by taking a pointer-to-int (int *x), and modifying it through (*x)++. The only difference in doing this is that the caller has to call f(&i), and that the caller can pass an invalid pointer to f. Thus, references are generally safer and should be preferred whenever possible.
Taking an argument of type char* (pointer-to-char) means that both the caller and the function see the same block of memory "through" that pointer. If the function modifies the memory pointed to by the char*, it will persist to the caller:
void f(char* p) {
(*p) = 'p';
p = NULL; //no effect outside the function
}
int main() {
char *s = new char[4];
strcpy(s, "die");
char *address = s; //the address which s points to
f(s);
assert(strcmp(s, "pie") == 0);
assert(s == address); //the 'value' of the variable s, meaning the actual address that is pointed to by it, has not changed
}
Taking an argument of type char*& ( reference-to-(pointer-to-char) ) is much the same as taking int&:
If the function modifies the memory pointed to by the pointer, the caller will see it as usual. However, if the function modifies the value of the pointer (its address), the caller will also see it.
void f(char* &p) {
(*p) = 'p';
p = NULL;
}
int main() {
char *s = new char[4];
strcpy(s, "die");
char *address = s; //the address which s points to
f(s);
assert(strcmp(address, "pie") == 0); //the block that s initially pointed to was modified
assert(s == NULL); //the 'value' of the variable s, meaning the actual address that is pointed to by it, was changed to NULL by the function
}
Again, you could take a char** (pointer-to-pointer-to-char), and modify f to use **p = 'p'; *p = NULL, and the caller would have to call f(&s), with the same implications.
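Note that an array cannot bind to a char*& parameter, i.e. if s was defined as char s[4], the call f(s) in the second example would generate a compiler error: the pointer produced by array decay is a temporary, and a non-const reference cannot bind to it. (An array can still be passed by reference through a reference-to-array parameter such as char (&p)[4].)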
Also note that this only works in C++, because C has no references, only pointers.
You would usually take char** or char*& when your function needs to return a pointer to a memory block it allocated. You see char** more often, because this practice is less common in C++ than in C, where references do not exist.
As for whether to use references or pointers, it is a highly-debated topic, as you will notice if you search google for "c++ pointer vs reference arguments".
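As a footnote to the remark about arrays: an array cannot be passed where a char*& is expected, but C++ does allow a parameter that is a reference to an array of a fixed size. A small sketch of both parameter kinds, with hypothetical function names:

```cpp
#include <cassert>
#include <cstring>

// Reference to an array of exactly 4 chars: no decay takes place and the
// size is part of the parameter type.
void firstToP(char (&s)[4])
{
    s[0] = 'p';
}

// Reference to a pointer: the function can reseat the caller's pointer.
void reseat(char*& p, char* target)
{
    p = target;
}

// Hypothetical usage showing what each call changes for the caller.
void referenceDemo()
{
    char s[4] = "die";
    firstToP(s);                  // modifies the caller's array in place
    assert(std::strcmp(s, "pie") == 0);

    char other[4] = "abc";
    char* p = s;
    reseat(p, other);             // the caller's pointer now points elsewhere
    assert(p == other);
}
```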

Passing char pointer to another function: blank value. c++ [duplicate]

I have the following very simple program.
int modify(char * v){
v = "123" ;
return 0;
}
int main (int argc, char** argv){
char *test = new char[10];
modify(test);
std::cout << test;
return 0;
}
I know I can just print out "123", but I deliberately wrote it that way to learn about how pointers work. However, "123" is not printed. How should I correctly pass the pointer?
You have to remember that in C++ arguments are by default passed by value, meaning that in the function you have copies, and modifying a copy will not change the original.
If you want to change the pointer to point to something else you need to pass it by reference. However, in this case it will cause other problems, as you then lose the original pointer.
So the solution is either to use std::string passed by reference, or to use strcpy to copy into the destination memory area (but if you use strcpy you have to take care not to write beyond the allocated memory).
Try this inside modify: strcpy(v, "123");
There are several problems with your code.
Your modify() function actually changes nothing:
int modify(char * v) {
v = "123"; // overwrites the parameter value copy on the stack
// with a char[] literal pointer
return 0;
}
You need to copy from the literal to the pointer:
int modify(char* v) {
strcpy(v,"123");
return 0;
}
You do not free the allocated memory, which may lead to memory leaks in other situations
int main (int argc, char** argv){
char *test = new char[10];
modify(test);
std::cout << test;
delete [] test; // <<< Note
return 0;
}
As Joachim Pileborg already mentioned, the most appropriate solution for C++ would be to use a std::string instead of char*.
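Both suggested fixes could look roughly like this. The capacity parameter and the name modifyBuffer are additions for the sake of the example, so that the copy can be checked against the size of the caller's buffer:

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <string>

// std::string passed by reference: the assignment is visible to the caller.
void modify(std::string& v)
{
    v = "123";
}

// Caller-provided buffer: copy the characters into it instead of
// reassigning the local pointer. Returns false if the buffer is too small.
bool modifyBuffer(char* v, std::size_t capacity)
{
    const char text[] = "123";
    assert(v != nullptr);
    if (capacity < sizeof text)
        return false;
    std::memcpy(v, text, sizeof text);   // 3 characters plus '\0'
    return true;
}
```

With the original main, calling modifyBuffer(test, 10) fills the allocated block, and std::cout << test then prints 123.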

pointer to preallocated memory as an input parameter and have the function fill it

Test code:
void modify_it(char * mystuff)
{
//last element is null i presume for c style strings here.
char test[7] = "123456";
//when i do this i thought i should be able to gain access to this
//bit of memory when the function is destroyed but that does not
//seem to be the case.
//static char test[] = "123123";
//this is also creating memory on stack and not the heap i reckon
//and gets destroyed once the function is done with.
//char * test = new char[7];
//this does the job as long as memory for mystuff has been
//allocated outside the function.
strcpy_s(mystuff,7,test);
//this does not work. I know with c style strings you can't just do
//string assignments they have to be actually copied. in this case
//I was using this in conjunction with static char test thinking
//by having it as static the memory would not get destroyed and i can
//then simply point mystuff to test and be done with it. i would later
//have address the memory cleanup in the main function.
//but anyway this never worked.
mystuff = test;
}
int main(void)
{
//allocate memory on heap where the pointer will point
char * mystuff = new char [7];
modify_it(mystuff);
std::string test_case(mystuff);
//this is the only way i know how to use cout by making it into a c++ string.
std::cout<<test_case.c_str();
delete [] mystuff;
return 0;
}
In the case of a static array in the function why would it not work?
In the case when I allocated memory using new in the function does it get created on the stack or heap?
In the case when I have a string which needs to be copied into a char * form. everything I see usually requires const char* instead of just char*.
I know I could use a reference to take care of this easily. Or char ** to send in the pointer and do it that way. But I just wanted to know if I could do it with just char *. Anyway your thoughts and comments plus any examples would be very helpful.
char * mystuff = new char [7];
delete mystuff;
delete mystuff is causing undefined behavior. You must delete[] what you new[].
The line mystuff = test; causes the variable mystuff to contain the address of the test array. However, this assignment is local to the function. The caller never sees the modified value of mystuff. This is generally true for C/C++: function parameters are passed by value, and local modifications to that value are invisible outside of the function. The only exception to this is if you use the & operator in the parameter list in C++, which causes the parameter to be passed by reference. Like so:
void modify_it(char* &str) { /* ... */ }
However, if you do this, your program still won't work correctly, and will probably crash. That's because the address of test is stack memory, and that memory will be overwritten when modify_it returns. You'll be giving the caller the address of invalid stack memory, which can only lead to bad things. The correct thing to do is one of the following:
/* function allocates, caller frees */
void modify_it(char* &str) {
str = new char[7]; // allocate enough memory for string
memcpy(str, test, 7); // memcpy takes (destination, source, byte count)
}
Or this:
/* caller allocates and frees */
void modify_it(char* str, size_t str_len) {
if (str_len < 7) { /* report an error. caller didn't allocate enough space. */ }
memcpy(str, test, 7); // memcpy takes (destination, source, byte count)
}
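Put together as compilable code, the two approaches might look like the sketch below. The source array is given the hypothetical name kSource and made a file-level constant so both functions can see it; the function names and the bool return value are also assumptions made for the example.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

static const char kSource[7] = "123456";   // 6 characters plus '\0'

/* function allocates, caller frees with delete[] */
void modifyItAllocate(char*& str)
{
    str = new char[sizeof kSource];
    std::memcpy(str, kSource, sizeof kSource);
}

/* caller allocates and frees; returns false if the buffer is too small */
bool modifyItFill(char* str, std::size_t str_len)
{
    assert(str != nullptr);
    if (str_len < sizeof kSource)
        return false;
    std::memcpy(str, kSource, sizeof kSource);
    return true;
}
```

In the first form the caller starts with an unallocated pointer and must delete[] the result; in the second the caller keeps ownership throughout, which is what the original main already does with new char[7].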