The address of executable code is decided at link time, isn't it?
#include <stdio.h>
int main ()
{
    printf("%p", (void*)&main);
    return 0;
}
example output #1:
0x563ac3667139
example output #2:
0x55e3903a9139
On many modern systems, the linker only fixes each function's address relative to the base address of the module. When the module (exe, dll, or so) is loaded, Address Space Layout Randomization (ASLR) gives it a different base address on each run.
This is a security measure: function addresses become unpredictable. An attack that overflows a stack variable to overwrite a return address or a function pointer with the address of some other function (for malicious purposes) can't easily predict which value to write, because it varies from run to run.
The ability to relocate the base address also solves a practical problem. If a.dll and b.dll were independently compiled for the same base address, they cannot both be loaded there; relocating one of them resolves the conflict.
At the machine code level this works because most jumps and calls use an offset relative to the current instruction rather than an absolute address. The constructs that do need absolute addresses are either patched when the module is loaded or go through some form of "table" that is populated with the correct addresses.
See also Relocation (computing)
This is a security technique called address space layout randomization.
It deliberately moves things around on each execution, to make it more difficult for attackers to know where bits of data are in your process and hack them.
I have been learning C++ for a while, but currently I know nearly nothing about assembly/machine language and how compiler and hardware work. Sorry if this question is really naive...
Consider the following very simple code:
#include <iostream>
using namespace std;
int main()
{
    int x = 0;
    cout << &x << endl;
}
In my current understanding, the first line in main() asks the compiler to reserve enough memory to hold an int, associate it with the identifier x, and put 0 into that memory location. And the second line in main() prints out the address of the start of that memory location.
I run the above code twice (consecutively), and I got different outputs as follows:
0000002EECAFFB84 // first time
0000007F1FAFF854 // second time
It is also in my current understanding that (please correct me if I have any misunderstandings):
when I first run the program, the C++ compiler translates my source code directly into machine code (also called object code, the code that runs directly on hardware).
since I modified nothing between my first run and my second run, the compiler will NOT generate machine code again, and thus the machine code used in the second run is the same as in the first run.
If my above understandings are correct, then the memory address of x is not determined in the machine code generated by the compiler (otherwise the outputs would be the same), and there must be some intermediate mechanism (between executing the machine code generated by the compiler and creating the int on memory) which decides on the exact memory address where the int will reside.
Is such mechanism done directly by the hardware (e.g. CPU?)? Or is it done by the operating system (so can I say OS is a kinda "virtual machine" for C++?)? May I ask who determines the exact memory address of x and how?
May I also ask how exactly does the compiler generates machine code for &x at compile-time, so that the memory address which hasn't been determined yet can be ensured to be retrieved successfully at a later point in runtime?
when I first run the program, the C++ compiler translates my source code directly into machine code (also called object code, the code that runs directly on hardware)
The compiler doesn't generate machine code when you run the program. It generates the machine code when you compile it. The compiler is also a program. C++ code is textual data. The C++ language is a standard. The compiler implements the C++ standard by writing code that can read the textual data of your C++ program and understand what it should do according to the standard. It will then write a file called an executable containing the machine code.
When you launch the executable, the program that launches it (a shell or the desktop environment, which is itself just another executable) uses a system call to ask the OS to create a new process that runs the code in that executable.
Assembly is also textual data. It is considered lower level than C++ because a line of assembly usually, though not always, corresponds to a single CPU instruction. Assembly remains textual data that an assembler understands and translates into individual CPU instructions.
I don't think machine code is normally called object code. Object code normally refers to code that isn't yet linked. It means that, for some symbols that you use in higher level languages, the address to reach them isn't yet known. If the compiler cannot determine the address of a certain symbol (like a function), it leaves an unresolved symbol in the object code. For example, if you include a header and call a function declared in it, the header contains only a declaration. The actual definition of the function is either in another object file or in a library. When you link your object files together, the linker looks at the unresolved symbols and attempts to find them in the other object files and libraries you passed. If it doesn't find them, it reports an error.
Splitting the build into these two steps allows for separate and parallel compilation. Basically, a source file doesn't need the compiled form of any other file in order to be compiled. It just needs every symbol to be declared, so that the compiler can create an object file and leave unresolved symbols in it. Then the linker patches them. This speeds up compilation by a lot because several threads can be used.
If my above understandings are correct, then the memory address of x is not determined in the machine code generated by the compiler (otherwise the outputs would be the same), and there must be some intermediate mechanism (between executing the machine code generated by the compiler and creating the int on memory) which decides on the exact memory address where the int will reside.
The memory address of x is not determined by the machine code, but its position relative to the stack is. The current stack address is kept in a register called the stack pointer. The compiler doesn't know that address in advance, and it doesn't need to. It accesses the contents of the stack at a fixed offset from the stack pointer register. This allows relative addressing for data local to your function.
When a function ends (say, one you called from main), the compiler emits an instruction that moves the stack pointer back; on platforms where the stack grows downward, that is an increment. The data that was there is unchanged, but the stack pointer now points past it, so later stack accesses relative to the stack pointer no longer reach it. The data is basically forgotten. If you call another function, that data will most likely be overwritten by whatever the new function stores there (its variables).
For data outside functions (global data), the executable contains a section called the data section which has room for it. The global data will thus have reserved space for the whole execution of your program.
I am currently studying reverse engineering and the inner workings of memory. I made a simple program that can access another process's memory and read the specified address. This works without a problem and it is not the focus of my question.
I would like to understand more in-depth how exactly the application looks in the memory and I would like to know the general look of it (Scheme or whatever).
To be more specific I do not understand the following:
I have an application called TestApp.exe. It has 2 variables in the stack which I am trying to read using another app MemoryReader.exe
The handle to the process TestApp.exe has the address 0x12C.
My first question is what exactly starts at the address 0x12C. Is this the starting address of the entire exe, or some metadata for the exe? The module has the address 0x00250000, which is quite far away from the handle itself, making me wonder what is between those two. Please elaborate.
I have two variables with the address 0x012BD1F8 and 0x012BD1FC. Both are in the stack. Since they were initialized in the scope of the main right after each other, it makes sense that they are allocated in the stack right next to each other. As we can see the offset is exactly 4 bytes which fits.
However what I do not understand is why do the offsets from the base address (Module address) to the variables change each time I launch the program? Is the program not structured the same way and therefore should not it allocate the memory in exactly the same way except on a different address meaning that the offsets should stay the same? What am I missing? From what I understand there must be something dynamic that changes between the variable addresses and the base address of the module, my question is what it is and why? Please explain.
Thank you.
How can one get the virtual address of the data & code in a program?
One might say use %u or %p or something else.
printf("%u", &data);
printf("%p", &data);
I'm always confused; which one gives correct address? Both give addresses but what's the difference?
Is there any way we can say which part of memory a given virtual address belongs to? Can we identify that it's a stack address or a heap address or something else?
For (1): only printf("%p", (void*)&data) is correct for printing a pointer; the cast to void* is required (C11, 7.21.6.1p8). The behaviour of printf("%u", &data) is undefined because the format specifier is invalid for a pointer type. Note also that the address you see may well not have any correspondence to a physical address; many operating systems and runtimes place one or two levels of abstraction between physical addresses and the pointer values you see.
For (2), the printf call is also valid in C++.
For (3), neither the C nor the C++ standard (aside from a couple of standard library functions in the latter) have a notion of a stack or a heap, so, no, there is no portable way of telling.
The rest of your questions suggest that you are trying to use C++ to identify pedagogical concepts that have no real existence in running code.
How can one get the virtual address of the data & code in a program?
The linker assembles code and data into program segments. There can be multiple program segments containing either. However, the default is usually to have one of each. If you want to find that information, you need to create a linker MAP file as part of your program build and read that.
If you want to do that from within the code, you would need to write something that parses the contents of the executable file.
Your operating system may have system services that you can use to inspect pages to see (a) if they are valid and (b) what their attributes are. From that you could determine where code resides.
I'm always confused; which one gives correct address? Both give addresses but what's the difference?
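%p is correct for pointers. %u is correct for unsigned integers. On platforms where unsigned int and pointers have the same size and representation, the two often print the same value, but that is not guaranteed. On other platforms they differ (e.g., sizeof (int) != sizeof (int*), as on typical 64-bit systems, or the pointer has an unusual format, as on segmented Intel architectures).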
Use %p for pointers.
How can we get the same virtual address details in C++? (Since it is downward compatible with C, we can use same thing, but is there any other way?)
Is there any way we can say which part of memory a given virtual address belongs to? Can we identify that it's a stack address or a heap address or something else?
Memory is just memory. As a teaching tool, memory is often described in terms of data/heap/stack. That does not exist in reality. A heap and a stack are simply blocks of memory that are managed in different ways. A heap can be a stack.
You may not use %u specifier to print a pointer. That specifier is for unsigned int. %p is the correct format specifier for pointers. The difference is that using %u is technically undefined behaviour because the argument is of a different type than is required. Furthermore, you must cast &data to void*.
Indeed, printf is in the C library portion that is included in the C++ standard library, so you may use it. But if you prefer C++ streams, then you can use std::cout << &data;
That is not possible in standard C++ (nor in C).
Neither standard makes a distinction between virtual and physical memory. There is just memory. What the addresses of that memory represent is specified by the OS. If the program does not run on an OS that uses virtual memory, then the addresses may be physical.
I've been looking a bit into Cheat Engine, which allows you to inspect and manipulate the memory of running processes on Windows: You scan for variables based on their value, then you can modify them, e.g. to cheat in a game.
In order to write a bot or something similar, you need to find a static address for the variable you want to change - i.e. one that stays the same if the process is restarted. The method for that goes roughly like this:
Look for the address of the variable you're interested in, searching by value
Look for code using that address, e.g. to find the address of the struct it belongs to (since struct offsets are fixed)
Look for another pointer pointing to that pointer until you find one with a static address (shows as green in Cheat Engine)
It seems to work just fine judging from the tutorials I've looked at, but I have trouble understanding why it works.
Don't all variables, including global static ones, get a pretty random address at runtime?
Bonus questions:
How can Cheat Engine tell if an address is static (i.e. will stay the same on restart)?
A tutorial referred to the fact that many older and some modern games (e.g. Call of Duty 4) use only static addresses. How is that possible?
I will answer the bonus questions first because they introduce some concepts you may need to know to understand the answer for the main question.
Answering the first bonus question is easy if you know how an executable file is laid out: all the global/static variables live in the .data section, and the .exe records the address offset of that section, so Cheat Engine just checks whether the variable falls within this address range (from the start of this section to the start of the next one).
For the second question, it is possible to use only static addresses, but that is nearly impossible for a game, even an older one. What the tutorial creator was probably trying to say is that every variable he wanted had a static pointer pointing to it. But as soon as you create a local variable, or even pass an argument to a function, the values are stored on the stack. That's why it is nearly impossible to have a "static-only" program. Even a program that doesn't actually do anything will probably have some data stored on the stack.
As for the main question, not every variable with a dynamic address is pointed to by a global variable. It depends entirely on the programmer. I can create a local variable and never assign its address to a global/static pointer in a C program, for example. The only certain way to find that address in this case is to know the code at the point where the variable is first assigned a value on the stack.
Some variables have a dynamic address because they are just local variables, which are stored in the stack the first time they have a value assigned to them.
Some other variables have a static address because they are declared either as a global or a static variable to the compiler. These variables have a fixed address offset that is part of the .data section in the executable file.
The executable file has a fixed offset address for each section inside it, and the .data section is no exception.
But it is worth noting that only the offset inside the executable itself is fixed. In the operating system things might be different (all random addresses), but that is the job of an OS: abstracting this kind of thing for you (creating the executable's virtual address space in this case). So static variables are static only within the executable's own memory space. In physical RAM they might be anywhere.
Finally, it is difficult to try to explain this to you because you'll have to understand how executable files work. A good start would be to search for some explanations regarding low-level programming, like stack frame, calling conventions, the Assembly language itself and how compilers use some well-known techniques to manage functions (scopes in general), global/static/local/constant variables, and the memory system (sections, the stack, etc.), and maybe some research into PE (and even ELF) files.
As far as I understand it, variables declared static have a permanent offset within the program data. This means that when the program is loaded into RAM, the offset of the variable will always be the same. Because the beginning address of the program is known globally, finding a static variable based on offset, as you mentioned, should be a trivial task. Therefore, while a pointer to a static variable might be random in the scheme of things, its offset to the beginning of program memory should remain the same no matter when the program starts. So Cheat Engine (though I don't know the software) most likely stores the offset of the static variable, and then when the software starts, applies this logic to find that variable.
As to how it can tell it's a static variable in the first place... well, this is partially a guess, but when you declare a variable static in C, I'm assuming the compiler/linker puts some kind of flag so the OS knows that it's a static variable. It could also be that all static variables are stored in a certain way, or at a certain address offset, for all programs compiled for a certain target system. Again, not too sure about that, but from what I understand about memory management, that seems to make the most sense. With these assumptions, it's quite possible for a program to contain solely static variables. The difference is that the memory is assigned statically when the program is loaded, as opposed to dynamically (as with a call to malloc() or similar). If the variables were stored dynamically, I'm sure there'd be a way to find them easily, so I don't think it matters to Cheat Engine whether a variable is static or not. However, as I'm assuming Cheat Engine wants to modify a game upon startup (just like the old GameSharks used to... ahh, miss those days) it's probably more reliable to modify variables that are static, instead of trying to locate pointers and disassemble the code, etc. etc.
If you're interested in learning more, I'd recommend checking out something like this tutorial over at OSDev!
This is something that recently crossed my mind, quoting from wikipedia: "To initialize a function pointer, you must give it the address of a function in your program."
So, I can't make it point to an arbitrary memory address, but what if I overwrite the memory at the address of the function with a piece of data the same size as before and then invoke it via the pointer? If such data corresponds to an actual function and the two functions have matching signatures, the latter should be invoked instead of the first.
Is it theoretically possible ?
I apologize if this is impossible due to some very obvious reason that I should be aware of.
If you're writing something like a JIT, which generates native code on the fly, then yes you could do all of those things.
However, in order to generate native code you obviously need to know some implementation details of the system you're on, including how its function pointers work and what special measures need to be taken for executable code. For one example, on some systems after modifying memory containing code you need to flush the instruction cache before you can safely execute the new code. You can't do any of this portably using standard C or C++.
You might find when you come to overwrite the function, that you can only do it for functions that your program generated at runtime. Functions that are part of the running executable are liable to be marked write-protected by the OS.
The issue you may run into is Data Execution Prevention (DEP). It tries to keep you from executing data as code or allowing code to be written to like data. You can turn it off on Windows. Some compilers/OSes may also place code into const-like sections of memory that the OS/hardware protect. The standard says nothing about what should or should not work when you write an array of bytes to a memory location and then call a function that includes jmping to that location. It's all dependent on your hardware and your OS.
While the standard does not provide any guarantees as to what would happen if you make a function pointer that does not refer to a function, in real life, on your particular implementation and knowing the platform, you may be able to do that with raw data.
I have seen example programs that created a char array with the appropriate binary code and have it execute by doing careful casting of pointers. So in practice, and in a non-portable way you can achieve that behavior.
It is possible, with caveats given in other answers. You definitely do not want to overwrite memory at some existing function's address with custom code, though. Not only is executable memory typically not writable, but you have no guarantees as to how the compiler might have used that code. For all you know, the code may be shared by many functions that you think you're not modifying.
So, what you need to do is:
1. Allocate one or more memory pages from the system.
2. Write your custom machine code into them.
3. Mark the pages as non-writable and executable.
4. Run the code. There are two ways of doing it:
   a. Cast the address of the pages you got in #1 to a function pointer, and call the pointer.
   b. Execute the code in another thread, passing the pointer to the code directly to a system API or framework function that starts the thread.
Your question is confusingly worded.
You can reassign function pointers, and you can assign them to null. The same goes for member pointers. Unless you declare them const, you can reassign them, and yes, the new function will be called instead. The signatures must match exactly. In C++, consider using std::function instead.
You should not "overwrite the memory at the address of a function". It can probably be done in some way, but just do not. You would be writing into your program's code and are likely to break it badly.