I read that in a function the local variables are put on the stack as they are defined, after the parameters have been put there first.
This is also mentioned here:
5. All function arguments are placed on the stack.
6. The instructions inside of the function begin executing.
7. Local variables are pushed onto the stack as they are defined.
So I expect that if the C++ code is like this:
#include "stdafx.h"
#include <iostream>
int main()
{
int a = 555;
int b = 666;
int *p = &a;
std::cout << *(p+1);
return 0;
}
and if an int here has 4 bytes, and we call x the address of the stack memory that holds the first byte of the int 555, then 'moving' another 4 bytes towards the top of the stack via *(p+1) should have us looking at the memory at address x + 4.
However, the output of this is -858993460, and it is always that, no matter what value int b has. Evidently it's some standard value. Of course I am accessing memory which I should not, since that is where I expected the variable b to be. It was just an experiment.
How come I neither get the expected value nor an illegal access error?
Where is my assumption wrong?
What could -858993460 represent?
What everyone else has said (i.e. "don't do that") is absolutely true. Don't do that. However, to actually answer your question, p+1 is most likely pointing at either a pointer to the caller's stack frame or the return address itself. The system-maintained stack pointer is decremented when you push something on it. This is implementation dependent, officially speaking, but every stack pointer I've ever seen (this is since the 16-bit era) has been like this. Thus, if as you say, local variables are pushed on the stack as they are initialized, &a should == &b + 1.
Perhaps an illustration is in order. Suppose I compile your code for 32 bit x86 with no optimizations, and the stack pointer esp is 20 (this is unlikely, for the record) before I call your function. This is what memory looks like right before the line where you invoke cout:
4: 12 (value of p)
8: 666 (value of b)
12: 555 (value of a)
16: -858993460 (return address)
p+1, since p is an int*, is 16. The memory at this location isn't read protected because it's needed to return to the calling function.
Note that this answer is academic; it's possible that the compiler's optimizations or differences between processors caused the unexpected result. However, I would not expect p+1 to == &b on any processor architecture with any calling convention I've ever seen because the stack usually grows downward.
Your assumptions are true in theory (from the CS point of view).
In practice there is no guarantee that pointer arithmetic done that way gives those results.
For example, your assumption "All function arguments are placed on the stack" is not true: where function arguments go is implementation-defined (depending on the architecture and calling convention, they may be in registers or on the stack), and the compiler is also free to keep local variables in registers if it sees fit.
Also, the assumption "int size is 4 bytes, so adding 4 to the pointer goes to b" is false. The compiler could have added padding between a and b to ensure memory alignment.
The conclusion here is: don't use low-level tricks; they depend on the implementation. If you have to do it anyway (regardless of our advice), you have to know how the compiler works and how it generates the code.
Related
Since the stack grows downwards, i.e. towards numerically smaller memory addresses, why is &i < &j true? Correct me if I'm wrong, but I'd imagine this was a design decision of C creators (that C++ maintains). But I wonder why though.
It is also strange that a heap-allocated object pin lies at numerically higher memory address than a stack variable and this also contradicts the fact that the heap lies at numerically smaller memory addresses than the stack (and increases upwards).
#include <iostream>
int main()
{
int i = 5; // stack allocated
int j = 2; // stack allocated
int *pi = &i; // stack allocated
int *pj = &j; // stack allocated
std::cout << std::boolalpha << '\n';
std::cout << ((&i < &j) && (pi < pj)) << '\n'; // true (parenthesized: << binds tighter than &&)
struct S
{
int in;
};
S *pin // stack allocated
= new S{10}; // heap allocated
std::cout << '\n' << (&(pin->in) > &i) << '\n'; // true
std::cout << ((void*)pin > (void*)pi) << '\n'; // true
}
Am I right so far, and if so, why did the C designers reverse this situation so that numerically smaller memory addresses appear higher (at least when you compare the pointers or use the addressof operator &)? Was this done just 'to make things work'?
Correct me if I'm wrong, but I'd imagine this was a design decision of C creators
It is not part of the design of the C language, nor C++. In fact, there is no such thing as "heap" or "stack" memory recognised by these standards.
It is an implementation detail. Each implementation of each language may do this differently.
Ordered comparisons between pointers to unrelated objects such as &i < &j or (void*)pin > (void*)pi have an unspecified result. Neither is guaranteed to be less or greater than the other.
For what it's worth, your example program outputs three counts of "false" on my system.
The compiler generated code that isn't allocating space for each individual variable in order, but allocating a block for those local variables, and thus can arrange them within that block however it chooses.
Usually, all the local variables of one function are allocated as one block, during function entry. Therefore you will only see the stack growing downward if you compare the address of a local variable allocated in an outer function with the address of a local variable allocated in an inner function.
It's really rather easy: such a stack is an implementation detail. The C and C++ language spec doesn't even need to refer to it. A conforming C or C++ implementation does not need to use a stack! And if it does use a stack, still the language spec doesn't guarantee that the addresses on it are allocated in any particular pattern.
Finally, the variables may be stored in registers, or as immediate values in the code text, and not in data memory. Then: taking the address of such a variable is a self-fulfilling prophecy: the language spec forces the value to a memory location, and the address of that is provided to you - this usually wrecks performance, so don't take addresses of things you don't need to know the address of.
A simple cross-platform example (it does the right thing on both gcc and msvc).
#ifdef _WIN32
#define __attribute__(a)
#else
#define __stdcall
#endif
#ifdef __cplusplus
extern "C" {
#endif
__attribute__((stdcall)) void __stdcall other(int);
void test(){
int x = 7;
other(x);
int z = 8;
other(z);
}
#ifdef __cplusplus
}
#endif
Any reasonable compiler won't put x or z in memory unnecessarily. They will either be kept in registers or pushed onto the stack as immediate values.
Here's x86-64 output from gcc 9.2 - note that no memory loads or stores of x or z are present, and there's tail call optimization!
gcc -m64 -Os
test:
push rax
mov edi, 7
call other
mov edi, 8
pop rdx
jmp other
On x86, we can force a stdcall calling convention that uses the stack to pass all parameters: even then, the values 7 and 8 are never in a stack location for a variable. Each is pushed directly onto the stack when other gets called, and it doesn't exist on the stack beforehand:
gcc -m32 -fomit-frame-pointer -Os
test:
sub esp, 24
push 7
call other
push 8
call other
add esp, 24
ret
I was just trying something and I was wondering how this could be. I have the following code:
int var1 = 132;
int var2 = 200;
int *secondvariable = &var2;
cout << *(secondvariable+2) << endl << sizeof(int) << endl;
I get the output
132
4
So how is it possible that the second int is only 2 addresses higher? I mean shouldn't it be 4 addresses? I'm currently under WIN10 x64.
Regards
With cout << *(secondvariable+2) you don't print a pointer, you print the value at secondvariable[2], which is invalid indexing and leads to undefined behavior.
If you want to print a pointer then drop the dereference and print secondvariable+2.
While you already are far in the field of undefined behaviour (see Some programmer dude's answer) due to indexing an array out of bounds (a single variable is considered an array of length 1 for such matters), some technical background:
Alignment! Compilers are allowed to place variables at addresses such that they can be accessed most efficiently. As you seem to have gotten valid output by adding 2*sizeof(int) to the second variable's address, you apparently have reached the first one by accident. Apparently, the compiler decided to leave a gap in between the two variables so that both can be aligned to addresses divisible by 8.
Be aware, though, that you don't have any guarantee for such alignment, different compilers might decide differently (or same compiler on another system), and alignment even might be changed via compiler flags.
On the other hand, arrays are guaranteed to occupy contiguous memory, so you would have gotten the expected result in the following example:
int array[2];
int* a0 = &array[0];
int* a1 = &array[1];
uintptr_t diff = reinterpret_cast<uintptr_t>(a1) - reinterpret_cast<uintptr_t>(a0); // pointer-to-integer needs reinterpret_cast
std::cout << diff;
The cast to uintptr_t (or alternatively to char*) ensures that you get the address difference in bytes, not in multiples of sizeof(int)...
This is not how C++ works.
You can't "navigate" your scope like this.
Such pointer antics have completely undefined behaviour and shall not be relied upon.
You are not punching holes in tape now, you are writing a description of a program's semantics, that gets converted by your compiler into something executable by a machine.
Code to these abstractions and everything will be fine.
#include <iostream>
using namespace std;
int main()
{
int a=50;
int b=50;
int *ptr = &b;
ptr++;
*ptr = 40;
cout<<"a= "<<a<<" b= "<<b<<endl;
cout<<"address a "<<&a<<" address b= "<<&b<<endl;
return 0;
}
The above code prints :
a= 50 b= 50
address a 0x7ffdd7b1b710 address b= 0x7ffdd7b1b714
Whereas when I remove the following line from the above code
cout<<"address a "<<&a<<" address b= "<<&b<<endl;
I get output as
a= 40 b= 50
My understanding was that the stack grows downwards, so the second output seems to be the correct one. I am not able to understand why the print statement would mess up the memory layout.
EDIT:
I forgot to mention, I am using 64 bit x86 machine, with OS as ubuntu 14.04 and gcc version 4.8.4
First of all, it's all undefined behavior. The C++ standard says that you can increment pointers only as long as you are in array boundaries (plus one element after), with some more exceptions for standard layout classes, but that's about it. So, in general, snooping around with pointers is uncharted territory.
Coming to your actual code: since you are never asking for its address, probably the compiler either just left a in a register, or even straight propagated it as a constant throughout the code. For this reason, a never touches the stack, and you cannot corrupt it using the pointer.
Notice anyhow that the compiler isn't required to push/pop variables on the stack in the order of their declaration - they are laid out in whatever order it sees fit, and they can even move within the stack frame (or be replaced) throughout the function - and a seemingly small change in the function may make the compiler completely alter the stack layout. So even comparing the addresses as you did says nothing about the direction of stack growth.
UB - You have taken a pointer to b and then moved it with ptr++, which means you are pointing at some unknown, unassigned memory; you then try to write to that memory region, which causes undefined behavior.
On VS 2008, stepping through it in the debugger produces a message that is quite self-explanatory.
Can someone help me get a better understanding of creating variables in C++? I'll state my understanding and then you can correct me.
int x;
Not sure what that does besides declare that x is an integer on the stack.
int x = 5;
Creates a new variable x on the stack and sets it equal to 5. So empty space was found on the stack and then used to house that variable.
int* px = new int;
Creates an anonymous variable on the heap. px is the memory address of the variable. Its value is 0 because, well, the bits are all off at that memory address.
int* px = new int;
*px = 5;
Same thing as before, except that the value of the integer at memory address px is set to 5. (Does this happen in 1 step? Or does the program create an integer with value 0 on the heap and then set it to 5?)
I know that everything I wrote above probably sounds naive, but I really am trying to understand this stuff.
Others have answered this question from the point of view of how the C++ standard works. My only additional comment there would be with global or static variables. So if you have
int bar ()
{
static int x;
return x;
}
then x doesn't live on the stack. It will be initialised to zero at the "start of time" (this is done in a function called crt0, at least with GCC: look up "BSS" segments for more information) and bar will return zero.
I'd massively recommend looking at the assembled code to see how a compiler actually treats what you write. For example, consider this tiny snippet:
int foo (int a)
{
int x, y;
x = 3;
y = a;
return x + y;
}
I made sure to use the values of x and y (by returning their sum) to ensure the compiler didn't just elide them completely. If you stick that code in a file called tmp.cc and then compile it with
$ g++ -O2 -c -o tmp.o tmp.cc
then ask for the disassembled code with objdump, you get:
$ objdump -d tmp.o
tmp.o: file format elf32-i386
Disassembly of section .text:
00000000 <_Z3fooi>:
0: 8b 44 24 04 mov 0x4(%esp),%eax
4: 83 c0 03 add $0x3,%eax
7: c3 ret
Whoah! What happened to x and y? Well, the point is that the C and C++ standards merely require the compiler to generate code that has the same behaviour as what your program asks for. In fact, this program loads 32 bits from the stack (this is the contents of a, a fact dictated by the ABI on my particular platform) and sticks it in the eax register. Then it adds three and returns. Another important fact about the ABI on my laptop (and probably yours too) is that the return value of a function sits in eax. Notice, the function didn't allocate any memory on the stack at all!
In fact, I also put bar (from above) in my tmp.cc. Here's the resulting code:
00000010 <_Z3barv>:
10: 31 c0 xor %eax,%eax
12: c3 ret
"Huh, what happened to x?", I hear you say :-) Well, the compiler spotted that nothing in the code required x to actually exist, and it always had the value zero. So the function basically got transformed into
int bar ()
{
return 0;
}
Magic!
When a new variable is created without an initializer, it does not have a defined value. It can be anything, pretty much depending on what was in that piece of stack or heap before. int x; will typically get you a compiler warning if you try to use the value without setting it to something first. E.g. int y = x; will cause a warning unless you give x an explicit value first.
Creating an int on the heap works pretty much the same way: int *p = new int; default-initializes the int, which for a built-in type does nothing, leaving the value of *p up to chance until you set it to something explicit. If you want to make sure your heap value is initialized, use int *p = new int(5); to say what value to store in the memory it allocates.
Unless you initialize an int variable to zero explicitly, it is pretty much never initialized for you unless it is a global, namespace, or class static.
In VS2010 specifically (other compilers may treat it differently), an int is not given a default value of 0. You can see this by trying to print out a non-initialized int. It does allocate memory with the size of an int, but it is not initialized (just junk).
In both of your cases, the memory is allocated FIRST, and then the value is set. If a value is not set, you have a non-initialized part of memory that will have "junk data" inside of it and you will get a compiler warning and possibly an error when running it.
Yes, it has an address in memory but there is no valid (known) data inside of it unless you specifically set it. It very well could be anything left over in memory that was available to be overwritten. Since it is unknown and not reliable, it is considered junk and useless, which is why compilers warn you about it.
Compilers WILL set static int and global int to 0.
EDIT: Due to Peter Schneider's comment.
I am an expert C# programmer, but I am very new to C++. I get the basic idea of pointers just fine, but I was playing around. You can get the actual integer value of a pointer by casting it as an int:
int i = 5;
int* iptr = &i;
int ptrValue = (int)iptr;
Which makes sense; it's a memory address. But I can move to the next pointer, and cast it as an int:
int i = 5;
int* iptr = &i;
int ptrValue = (int)iptr;
int* jptr = (int*)((int)iptr + 1);
int j = (int)*jptr;
and I get a seemingly random number (although this is not a good PRNG). What is this number? Is it another number used by the same process? Is it possibly from a different process? Is this bad practice, or disallowed? And if not, is there a use for this? It's kind of cool.
What is this number? Is it another number used by the same process? Is it possibly from a different process?
You cannot generally cast pointers to integers and back and expect them to be dereferencable. Integers are numbers. Pointers are pointers. They are totally different abstractions and are not compatible.
If integers are not large enough to be able to store the internal representation of pointers (which is likely the case; integers are usually 32 bits long and pointers are usually 64 bits long), or if you modify the integer before casting it back to a pointer, your program exhibits undefined behaviour and as such anything can happen.
See C++: Is it safe to cast pointer to int and later back to pointer again?
Is this bad practice, or disallowed?
Disallowed? Nah.
Bad practice? Terrible practice.
You move 4 or 8 bytes beyond the pointer to i and print out the number stored there, which might be another value stored in your program's memory. The value is unknown and this is undefined behavior. There is also a good chance that you get an error (meaning your program can blow up). [Ever heard of SIGSEGV? The segmentation violation problem.]
You are discovering that random places in memory contain "unknown" data. Not only that, but you may find yourself pointing to memory that your process does not have "rights" to so that even the act of reading the contents of an address can cause a segmentation fault.
In general, if you allocate some memory to a pointer (for example with malloc), you may take a look at these locations (which may have random data "from the last time" in them) and modify them. But accessing data that does not belong explicitly to a pointer's block of memory can result in all kinds of undefined behavior.
Incidentally, if you want to look at the "next" location, just do
NextValue = *(iptr + 1);
Don't do any casting - pointer arithmetic knows (in your case) exactly what the above means: "the contents of the next int location".
int i = 5;
int* iptr = &i;
int ptrValue = (int)iptr;
int* jptr = (int*)((int)iptr + 1);
int j = (int)*jptr;
You can cast a pointer to an integer and back again, and it will give you the same value, provided the integer type is wide enough to hold a pointer.
Is it possibly from a different process? No, it's not; you can't access the memory of another process except through ReadProcessMemory and WriteProcessMemory in the Win32 API.
You get a different number because you add 1 to the pointer; subtract 1 again and you will get the same value.
When you define an integer by
int i = 5;
it means you allocate space on your thread's stack and initialize it to 5. Then you get a pointer to this memory, which is actually a position in your current thread's stack.
When you increase your pointer by 1, it means you point to the next location in your thread's stack, and you read it again as an integer:
int* jptr = (int*)((int)iptr + 1);
int j = (int)*jptr;
Then you will get an integer from your thread's stack which is close to where you defined your int i.
Of course this is not recommended, unless you want to become a hacker and exploit stack overflows (here it means what it is, not the site name, ha!)
Using a pointer to point to a random address is very dangerous. You must not point to an address unless you know what you're doing. You could overwrite its content, or you may try to modify a constant in read-only memory, which leads to undefined behaviour...
Pointer arithmetic is useful, for example, when you want to retrieve the elements of an array. But you don't need to cast the pointer to an integer for that. You just point to the start of the array and increase your pointer by 1 to get the next element.
int arr[5] = {1, 2, 3, 4, 5};
int *p = arr;
printf("%d", *p); // this will print 1
p++; // pointer arithmetics
printf("%d", *p); // this will print 2
It's not "random". It just means that there are some data on the next address
Reading a 32-bit word from an address A will copy the 4 bytes at [A], [A+1], [A+2], [A+3] into a register. But if you dereference an int at [A+1] then the CPU will load the bytes from [A+1] to [A+4]. Since the value of [A+4] is unknown it may make you think that the number is "random"
Anyway this is EXTREMELY dangerous 💀 since
the pointer is misaligned. The program may appear to run fine because x86 allows unaligned accesses (with some performance penalty). But many other architectures prohibit unaligned accesses, and there your program will simply end with a fault. For more information read Purpose of memory alignment, Data Alignment: Reason for restriction on memory address being multiple of data type size
you may not be allowed to touch the next byte as it may be outside of your address space, is write-only, is used for another variable and you changed its value, or whatever other reasons. You'll also get a segfault in that case
the next byte may not be initialized and reading it will crash your application on some architectures
That's why the C and C++ standard state that reading memory outside an array invokes undefined behavior. See
How dangerous is it to access an array out of bounds?
Access array beyond the limit in C and C++
Is accessing a global array outside its bound undefined behavior?