I have an issue with the following code. I am trying to find out what is stored at a certain address and how long my static variable stays at this specific position. (I read that static variables are stored infinitely and was quite surprised, so I wanted to test whether this was true.)
The code defines a static variable. Its address on my system is 0x1000020c0; this is probably rather random, but it was consistently the case.
If I now want to find out what integer value is stored at this address, I first have to print the address with &number, which gives 0x1000020c0. Reinterpreting the address 0x1000020c0 as an int* gives 100 only if the address was printed before, or if I use &number in the reinterpret_cast.
Can someone explain why this is the case?
#include <iostream>

int main()
{
    static int number = 100;
    // std::cout << number << std::endl; // prints 100
    // prints 100 and the address (0x1000020c0 in my case):
    // std::cout << number << " " << &number << std::endl;
    // this does not work unless &number is printed previously:
    // std::cout << "Value is : " << *reinterpret_cast<int*>(0x1000020c0) << std::endl;
    // this does work and shows the correct value (100):
    std::cout << "Value is : " << *reinterpret_cast<int*>(&number) << std::endl;
}
In any given program, the object might or might not be stored at the address 0x1000020c0. There are no guarantees either way. The address of the object is decided at compile (or possibly at link) time. A change to the program can change the address.
If you never take the address of the local static object, and never modify it, the compiler may optimize the variable away, so that no memory is used. If the object doesn't exist at all, then it definitely doesn't exist at the memory location 0x1000020c0.
If you use the object in a way that requires the object to exist, it will be in some memory location. Taking the address of the object usually triggers such a requirement. This is strikingly similar to the observer effect in physics.
If you dereference a pointer which does not point to an object (of appropriate type), the behaviour is undefined.
when I recast/reinterpret the value that is at 0x1000020c0, it prints nothing
As I explained above, the object is not guaranteed to exist at the memory location 0x1000020c0.
even though the object was used, since I printed its value via std::cout << number;
Accessing the value of an object doesn't necessarily require the object to exist. The compiler may be able to prove that the value of the object is 100, so it can use that value as a constant and not store the static object at all.
Besides, even if the static object did exist, it wouldn't necessarily exist at the address 0x1000020c0, unless you take the address and observe it to be so.
As a consequence: Don't ever cast an arbitrary number to a pointer (unless you work on some embedded platform that has hardcoded memory mappings). Seeing that the address of an object in one program is 0x1000020c0, doesn't make 0x1000020c0 non-arbitrary in another program.
Assuming a specific address for number is not a good idea.
In C++, a static variable inside a function body is initialized the first time control passes through its declaration (a variable with a constant initializer, like number = 100 here, may be initialized even earlier). If control never reaches the declaration, a dynamic initializer is never run.
If possible, a compiler may optimize the static away and replace it with its value.
You're missing one very major statement here - the platform you're running this code on.
A static variable isn't stored "infinitely"; it's stored in that location for the duration of the program execution. If you're running on an embedded platform where your code jumps to main() at power-up, then you don't have anything else which will get in the way. If you're running on any other platform (Windows, Linux or whatever), that location becomes free when the program completes, and is immediately available for anything else to use.
It's possible that if you run your program, it completes, and you run it again, then maybe you'll get the same chunk of memory for your program to run in. In that case you'll get the same addresses for static variables. If something else has asked for a chunk of memory between your first run finishing and the next run starting (e.g. Chrome needed a bit more space for the pictures you were browsing), then your program won't be given the same chunk of memory and the variable won't be in the same place.
It gets more fun for DLLs. The same kind of rules apply, except they apply for the duration of the DLL being loaded instead of for the duration of program execution. A DLL could be loaded on startup and stay loaded all the way through, or it could be loaded and unloaded by applications as needed.
All this means that you're making some very strange assumptions. If you get the address of a static variable in your program, and then your program checks the contents of that address, you'll always get whatever's in that static variable. That's how static variables work. If you run your code twice, you'll be getting the address of the location for that static variable at your next run, as set up by your program when you run it that second time. In between runs of your program, that address is 100% free for anything else to use.
As others have already pointed out, after this you may also be seeing effects of compiler optimisation in the specific behaviour you're asking about. But the reason you're asking about this specific behaviour is that you seem to have misunderstood something fundamental to how static variables work.
I have been learning C++ for a while, but currently I know nearly nothing about assembly/machine language or how compilers and hardware work. Sorry if this question is really naive...
Consider the following very simple code:
#include <iostream>
using namespace std;
int main()
{
int x = 0;
cout << &x << endl;
}
In my current understanding, the first line in main() asks the compiler to reserve enough memory to hold an int, associate it with the identifier x, and put 0 into that memory location. And the second line in main() prints out the address of the start of that memory location.
I ran the above code twice (consecutively) and got different outputs, as follows:
0000002EECAFFB84 // first time
0000007F1FAFF854 // second time
It is also in my current understanding that (please correct me if I have any misunderstandings):
when I first run the program, the C++ compiler translates my source code directly into machine code (also called object code, the code that runs directly on hardware).
since I modified nothing between my first run and my second run, the compiler will NOT generate machine code again, and thus the machine code used in the second run is the same as in the first run.
If my above understandings are correct, then the memory address of x is not determined in the machine code generated by the compiler (otherwise the outputs would be the same), and there must be some intermediate mechanism (between executing the machine code generated by the compiler and creating the int on memory) which decides on the exact memory address where the int will reside.
Is this mechanism implemented directly by the hardware (e.g. the CPU)? Or is it done by the operating system (so can I say the OS is a kind of "virtual machine" for C++)? Who determines the exact memory address of x, and how?
May I also ask how exactly the compiler generates machine code for &x at compile time, so that a memory address which hasn't been determined yet can be retrieved successfully later at runtime?
when I first run the program, the C++ compiler translates my source code directly into machine code (also called object code, the code that runs directly on hardware)
The compiler doesn't generate machine code when you run the program. It generates the machine code when you compile it. The compiler is also a program. C++ code is textual data. The C++ language is a standard. The compiler implements the C++ standard by writing code that can read the textual data of your C++ program and understand what it should do according to the standard. It will then write a file called an executable containing the machine code.
When you launch the executable, the desktop (which is also an executable, but with higher privileges) uses a system call to ask the OS to create a new process that will run the code in that executable.
Assembly is also textual data. It is considered lower level than C++ because each line of code corresponds almost directly to a CPU instruction (one line of code = one instruction), though not always. Assembly remains textual data that an assembler understands and translates into individual CPU instructions.
I don't think machine code is normally called object code. Object code normally refers to code that isn't yet linked. It means that, for some symbols that you use in higher-level languages, the address to reach them isn't yet known. If the compiler cannot determine the address of a certain symbol (like a function), it will leave an unresolved symbol in the object code. For example, if you include a header and call a function from it, the header contains only a declaration. The actual definition of the function is either in another object file or in a library. When you link your object files together, the linker looks at the unresolved symbols and attempts to find them in the other object files you passed. If it doesn't find them, it reports an error. Compilation is done in these two steps to allow for parallel compilation. Basically, your source code file doesn't need any other file to be compiled. It just needs every symbol to be declared, so it can create an object file and leave unresolved symbols in it. Then the linker patches them. This speeds up compilation a lot because several threads can be used.
If my above understandings are correct, then the memory address of x is not determined in the machine code generated by the compiler (otherwise the outputs would be the same), and there must be some intermediate mechanism (between executing the machine code generated by the compiler and creating the int on memory) which decides on the exact memory address where the int will reside.
The memory address of x is not determined by the machine code, but its relative position within the stack is. The address of the current top of the stack is stored in a register called the stack pointer. The compiler doesn't know the address in advance, and it doesn't need to: it accesses the contents of the stack with a relative offset from the stack pointer register. This allows relative addressing for data local to your function.
When a function returns (for example, a function you called from main), the compiler has emitted an instruction that increments the stack pointer (the stack grows downward on most platforms, including x86). The data that was there is still the same, but the stack pointer now points above it, so when the code accesses the stack relative to the pointer, that data isn't in the way. The data is basically forgotten. If you call another function, that data will most likely be overwritten by what this function initializes (its variables).
For data outside functions (global data), the executable contains a section called the data section which has room for it. The global data will thus have reserved space for the whole execution of your program.
How does the compiler know where in memory the square root function will be before the program is executed? I thought the address would be different every time the program is executed, but this works:
constexpr double(*fp)(double) = &sqrt;
cout << fp(5.0);
Is it because the address is relative to another address in memory? I don't think so because the value of fp is large: 0x720E1B94.
At compile time, the compiler doesn't know the address of sqrt. However, you cannot do anything at compile time with a constexpr function pointer that would expose that pointer's numeric value. Therefore, a function pointer at compile time can be treated as an opaque value.
And since you can't change a constexpr variable after it has been initialized, every constexpr function pointer can be boiled down to the location of a specific function.
If you did something like this:
using fptr = float(*)(float);
constexpr fptr get_func(int x)
{
return x == 3 ? &sqrtf : &sinf;
}
constexpr fptr ptr = get_func(12);
The compiler can detect exactly which function get_func will return for any particular compile time value. So get_func(12) reduces down to &sinf. So whatever &sinf would compile to is exactly what get_func(12) would compile to.
The address value is assigned by the linker, so the compiler does not know the exact value.
cout << fp(5.0);
This works because it is evaluated at run time, after the exact address has been resolved.
In general, you cannot use the actual value (address) of a constexpr pointer, because it is not known at compile time.
Bjarne Stroustrup's The C++ Programming Language, 4th edition, mentions:
10.4.5 Address Constant Expressions
The address of a statically allocated object (§6.4.2), such as a global variable, is a constant. However, its value is assigned by the linker, rather than the compiler, so the compiler cannot know the value of such an address constant. That limits the range of constant expressions of pointer and reference type. For example:
constexpr const char* p1 = "asdf";
constexpr const char* p2 = p1; // OK
constexpr const char* p3 = p1+2; // error: the compiler does not know the value of p1
constexpr char c = p1[2]; // OK, c=='d'; the compiler knows the value pointed to by p1
How does the compiler know where in memory the square root will be before the program is executed?
The tool chain gets to decide where it puts the functions.
Is it because the address is relative to another address in memory?
If the produced program is either relocatable or position independent then yes, that's the case. If the program is neither, then the address can even be absolute.
Why would the exact same memory spots be available next time the program is run?
Because the memory space is virtual.
It's simple.
Consider how the compiler knows the address to call in this code:
puts("hey!");
The compiler has no idea of the location of puts, and it also doesn't add a runtime lookup for it (that would be rather bad for performance, though it is actually what virtual methods of classes need to do). The possibility of having a different version of the dynamic library at runtime (not to mention address space layout randomization, even if it is the exact same library file) means that the build-time linker doesn't know it either.
So it's up to the dynamic linker to fix up the address when it starts the compiled binary. This is called relocation.
Exactly the same thing happens with your constexpr: the compiler records every place in the code that uses this address in the relocation table, and then the dynamic linker does its job every time the program starts.
I've been looking a bit into Cheat Engine, which allows you to inspect and manipulate the memory of running processes on Windows: You scan for variables based on their value, then you can modify them, e.g. to cheat in a game.
In order to write a bot or something similar, you need to find a static address for the variable you want to change - i.e. one that stays the same if the process is restarted. The method for that goes roughly like this:
Look for the address of the variable you're interested in, searching by value
Look for code using that address, e.g. to find the address of the struct it belongs to (since struct offsets are fixed)
Look for another pointer pointing to that pointer until you find one with a static address (shows as green in Cheat Engine)
It seems to work just fine judging from the tutorials I've looked at, but I have trouble understanding why it works.
Don't all variables, including global static ones, get a pretty random address at runtime?
Bonus questions:
How can Cheat Engine tell if an address is static (i.e. will stay the same on restart)?
A tutorial referred to the fact that many older and some modern games (e.g. Call of Duty 4) use only static addresses. How is that possible?
I will answer the bonus questions first because they introduce some concepts you may need to know to understand the answer for the main question.
Answering the first bonus question is easy if you know how an executable file works: all global/static variables are inside the .data section, and the .exe stores the address offset of that section, so Cheat Engine just checks whether the variable is in this address range (from the start of this section to the start of the next one).
For the second question: it is possible to use only static addresses, but that is nearly impossible for a game, even an older one. What the tutorial creator was probably trying to say is that every variable he wanted actually had a static pointer pointing to it. Simply by creating a local variable, or even passing an argument to a function, you store values on the stack. That's why it is nearly impossible to have a "static-only" program. Even if you compile a program that doesn't actually do anything, it will probably have some data stored on the stack.
As for the main question: not all variables with a dynamic address are pointed to by a global variable. It depends entirely on the programmer. I can create a local variable and never assign its address to a global/static pointer in a C program, for example. The only certain way to find that address in this case is to actually know the code where the variable is first assigned a value on the stack.
Some variables have a dynamic address because they are just local variables, which are stored on the stack the first time they have a value assigned to them.
Some other variables have a static address because they are declared either as a global or a static variable to the compiler. These variables have a fixed address offset that is part of the .data section in the executable file.
The executable file has a fixed offset address for each section inside it, and the .data section is no exception.
But it is worth noting that the offset inside the executable itself is fixed. In the operating system things might be different (all random addresses), but that is the job of the OS: abstracting this kind of detail for you (in this case, by creating the executable's virtual address space). So static variables look static, but only inside the executable's memory space. In physical RAM they might be anywhere.
Finally, it is difficult to explain this fully, because you need to understand how executable files work. A good start would be to search for explanations of low-level programming topics such as stack frames, calling conventions, the Assembly language itself, how compilers use well-known techniques to manage functions (scopes in general), global/static/local/constant variables, and the memory system (sections, the stack, etc.), and maybe some research into PE (and even ELF) files.
As far as I understand it, variables declared static have a permanent offset within the program data. This means that when the program is loaded into RAM, the offset of the variable will always be the same. Because the beginning address of the program is known globally, finding a static variable based on offset, as you mentioned, should be a trivial task. Therefore, while a pointer to a static variable might be random in the scheme of things, its offset to the beginning of program memory should remain the same no matter when the program starts. So Cheat Engine (though I don't know the software) most likely stores the offset of the static variable, and then when the software starts, applies this logic to find that variable.
As to how it can tell it's a static variable in the first place... well, this is partially a guess, but when you declare a variable static in C, I'm assuming the compiler/linker puts some kind of flag so the OS knows that it's a static variable. It could also be that all static variables are stored in a certain way, or at a certain address offset, for all programs compiled for a certain target system. Again, not too sure about that, but from what I understand about memory management, that seems to make the most sense. With these assumptions, it's quite possible for a program to contain solely static variables. The difference is that memory is assigned statically at program runtime, as opposed to dynamically (as with a call to malloc() or similar). If the variables were stored dynamically, I'm sure there'd be a way to find them easily, so I don't think it matters to Cheat Engine whether a variable is static or not. However, as I'm assuming Cheat Engine wants to modify a game upon startup (just like the old GameSharks used to... ahh, miss those days), it's probably more reliable to modify variables that are static, instead of trying to locate pointers and disassemble the code, etc. etc.
If you're interested in learning more, I'd recommend checking out something like this tutorial over at OSDev!
Quoting from C++ Primer:
The address of an object defined outside of any function is a constant expression, and so may be used to initialize a constexpr pointer.
In fact, each time I compile and run the following piece of code:
#include <iostream>
using namespace std;
int a = 1;
int main()
{
constexpr int *p = &a;
cout << "p = " << p << endl;
}
I always get the output:
p = 0x601060
Now, how is that possible? How can the address of an object (global or not) be known at compile time and be assigned to a constexpr? What if that part of the memory is being used for something else when the program is executed?
I always assumed that memory is managed so that a free portion is allocated when a program is executed, regardless of which particular part of memory it is. However, since here we have a constexpr pointer, the program would always require a specific portion, which has to be free for the program to run. This doesn't make sense to me; could someone explain this behaviour please? Thanks.
EDIT: After reading your answers and a few articles online, I realized that I missed the whole concept of virtual memory... now it makes sense. It's quite surprising that neither C++ Primer nor Accelerated C++ mention this concept (maybe they will do it in later chapters, I'm still reading...).
However, quoting C++ Primer again:
A constant expression is an expression whose value cannot change and that can be evaluated at compile time.
Given that the linker has a major role in computing the fixed address of global objects, the book would have been more precise if it said "constant expression can be evaluated at link time", not "at compile time".
It's not actually true that the address of an object is known at compile time. What is known at compile time is the offset. When the program is compiled, the address is not emitted into the object file, but a marker to indicate the offset and the section.
To be simplistic about it, the linker then comes along, measures the size of each section, stitches them together and calculates the address of each marker in each object file now that it has a concrete 'base address' for each section.
Of course it's not quite that simple. A linker can also emit a map of the locations of all these adjusted values in its output, so that a loader or load-time linker can re-adjust them just prior to run time.
The point is, logically, for all intents and purposes, the address is a constant from the program's point of view. It's just that the constant isn't given a value until link/load time. When that value is available, every reference to that constant is overwritten by the linker/loader.
If your question is "why is it always the same address?", it's because your OS uses a standard virtual memory layout layered over the virtual memory manager (and, presumably, this executable was not built as a position-independent executable, so address space layout randomization does not move its data). Addresses in a process are not real memory addresses; they are logical memory addresses. The piece of silicon at that 'address' is mapped in by the virtual memory management circuitry. Thus each process can use the "same" address while actually using a different area of the memory chips.
I could go on about paging memory in and out, which is related, but it's a long topic. Further reading is encouraged.
It works because global variables are in static storage.
This is because the space for the global/static variable is allocated at compile time within the binary your compiler generates, in a region next to the program's machine code called the "data" segment. When the binary is copied and loaded into memory, the data segment becomes read-write.
This Wikipedia article includes a nice diagram of where the "data" segment fits into the virtual address space:
https://en.wikipedia.org/wiki/Data_segment
Automatic variables are not stored in the data segment because they may be instantiated as many times as their parent function is called. Moreover, they may be allocated at any depth of the stack. Thus it is not possible to know the address of an automatic variable at compile time in the general case.
This is not the case for global variables, which are clearly unique throughout the lifetime of the program. This allows the compiler to assign a fixed address for the variable which is separate from the stack.
These are some silly questions I want to ask; please help me understand them.
const int i=100; //1
///some code
long add=(long)&i; //2
Doubt: for the above code, will the compiler first go through the whole code to decide whether memory should be allocated or not? Or will it first store the variable in a read-only memory location and then also allocate storage at //2?
Doubt: why does taking the address of a variable force the compiler to store the variable in memory, even though ROM or registers also have addresses?
In your code example, add contains the address, not the value, of i. I believe you may have thought that i was not stored in normal memory unless/until you take its address. This is not the case.
const does not mean the value is stored in ROM. It is stored in normal memory (often the stack) just like any other variable. const means the compiler will go to some lengths to prevent you from modifying the value.
const is not, and was never intended to be, some sort of security mechanism. If you obtain the address of the memory and cast away const, nothing physically stops you from writing to it. However, modifying an object that was defined const is undefined behaviour, so this is almost always a bad idea.
I never wrote a compiler implementing this, but I think that it would be simple to just handle the variable as a normal variable but using the constant value where the variable value is used and using the address of the variable if the address is used.
If at the end of the scope of the variable no one took the address then I can just drop it instead of doing a real allocation because for all other uses the constant value has been used instead of compiling a variable loading operation.
Constant values (not the only use for const, but the one used here) are not 'stored in normal memory' (nor in ROM, of course). The compiler simply uses the value (100 in this case) wherever the code uses the variable.
Of course, if the value isn't stored anywhere, an address for the constant has no meaning.
Other uses of const are stored in 'normal memory', and you can take their address, but the result is a 'pointer to const value', so it's (in principle) unusable for modification of the value. A hard cast would of course change that, so they trigger a nasty compiler warning.
Also, remember that the C/C++ compiler operates entirely at compile time (by definition!), so it's not unusual for a use in a later part of the code to affect code generation for an earlier part.
A very obvious example is the declaration of stack variables: the compiler has to take into account all the variables declared at any given level to be able to generate the stack allocation at the block entry.
I am a little confused about what you are asking but looking at your code:
i = 100, with an address of 0x?????????????
add = whatever that address is, stored as a long int
There is no (dynamic) memory allocation in this code. The two local variables are created on stack. The address of i is taken and brutally cast into long, which is then assigned to the second variable.