What is the easiest way to determine the address of a global variable during the LLVM compilation process?
I want to determine the address of a global variable as well as modify its content.
Can someone please explain this passage by Levine from Linkers and Loaders, page 131?
the trickiest part of the symbol information is the location information. The location of a static variable doesn't change, but a local variable within a routine may be static, on the stack, in a register, or in optimized code, moved from place to place in different parts of the routine...
Does it mean that a static local variable changes its location in memory depending on the calls of the function it belongs to?
No, even a "static local variable" is a static variable. It does not change its location during the runtime of the program.
The "locality" of such a variable only limits its visibility: the compiler lets you refer to it by name only inside its local scope.
However, you can return its address to other scopes and access it through that pointer.
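A minimal C++ sketch of that answer (the names counter and reset_counter are made up for illustration): the static local keeps the same address on every call, and code outside the function can still reach it through the returned pointer.

```cpp
#include <cassert>

// A function-scope static: only counter() can name it, but it has static
// storage duration, so it has one fixed address for the whole run.
int *counter() {
    static int calls = 0;  // initialized once, not on every call
    ++calls;
    return &calls;         // handing out the address lets other scopes use it
}

// Code outside counter() reads and writes the same object through the pointer.
void reset_counter() {
    int *p = counter();
    assert(p != nullptr);
    *p = 0;
}
```

Every call returns the same pointer, so writing through it from anywhere changes the value counter() sees on its next call.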
I've been looking a bit into Cheat Engine, which allows you to inspect and manipulate the memory of running processes on Windows: You scan for variables based on their value, then you can modify them, e.g. to cheat in a game.
In order to write a bot or something similar, you need to find a static address for the variable you want to change - i.e. one that stays the same if the process is restarted. The method for that goes roughly like this:
Look for the address of the variable you're interested in, searching by value
Look for code using that address, e.g. to find the address of the struct it belongs to (since struct offsets are fixed)
Look for another pointer pointing to that pointer until you find one with a static address (shows as green in Cheat Engine)
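The pointer-chain steps above can be sketched in C++ as follows. This is a simplified model built on assumptions: Player, g_player and resolve_chain are hypothetical, and instead of reading another process's memory (which a real tool would do through OS APIs, e.g. ReadProcessMemory on Windows) it walks pointers inside its own process.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Hypothetical game data: the Player object is allocated at run time, so its
// address can differ from one run to the next...
struct Player {
    int id;
    int health;
};

// ...but this global pointer lives in the executable's static data
// (.data/.bss), so its address is fixed relative to the module's base.
Player *g_player = nullptr;

// Follow a pointer chain the way a memory scanner does: start at a static
// address, read the pointer stored there, add the next offset, and repeat.
// Simplification: this reads the current process's own memory.
std::uintptr_t resolve_chain(std::uintptr_t base, const std::size_t *offsets,
                             std::size_t n) {
    std::uintptr_t addr = base;
    for (std::size_t i = 0; i < n; ++i) {
        assert(addr != 0 && "null pointer in chain");
        std::uintptr_t next = 0;
        // Read the pointer value stored at addr.
        std::memcpy(&next, reinterpret_cast<const void *>(addr), sizeof next);
        addr = next + offsets[i];  // struct field offsets are fixed at compile time
    }
    return addr;
}
```

Only the start of the chain (&g_player) needs to be stable across restarts; every later address is recomputed by following the stored pointers.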
It seems to work just fine judging from the tutorials I've looked at, but I have trouble understanding why it works.
Don't all variables, including global static ones, get a pretty random address at runtime?
Bonus questions:
How can Cheat Engine tell if an address is static (i.e. will stay the same on restart)?
A tutorial referred to the fact that many older and some modern games (e.g. Call of Duty 4) use only static addresses. How is that possible?
I will answer the bonus questions first because they introduce some concepts you may need to know to understand the answer for the main question.
Answering the first bonus question is easy if you know how an executable file works: all the global/static variables live in the .data section (or .bss for zero-initialized ones), and the .exe's section table records the offset of each section, so Cheat Engine just checks whether the variable's address falls in that range (from the start of this section to the start of the next one).
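A sketch of that range check, under stated assumptions: the Section struct and find_static_section are hypothetical, and in practice the section offsets and the module base would come from the executable's section headers and the OS rather than being passed in by hand.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical, simplified view of one section of a loaded module.
struct Section {
    const char *name;    // e.g. ".data" or ".bss"
    std::uintptr_t rva;  // offset of the section from the module base
    std::size_t size;    // size of the section in bytes
};

// Returns the section that contains addr, or nullptr if addr lies outside
// all of them (stack, heap, another module, ...).
const Section *find_static_section(std::uintptr_t addr, std::uintptr_t module_base,
                                   const Section *secs, std::size_t n) {
    assert(secs != nullptr || n == 0);
    for (std::size_t i = 0; i < n; ++i) {
        const std::uintptr_t start = module_base + secs[i].rva;
        if (addr >= start && addr - start < secs[i].size)
            return &secs[i];
    }
    return nullptr;
}
```

If an address falls inside one of these sections, it can be recorded as "module base + offset", which is the form that survives a restart.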
For the second question: it is possible to use only static addresses, but that is nearly impossible for a game, even an older one. What the tutorial author was probably trying to say is that every variable he was interested in actually had a static pointer pointing to it. But simply creating a local variable, or even passing an argument to a function, stores values on the stack. That's why it is nearly impossible to have a "static-only" program. Even a compiled program that does nothing at all will probably store something on the stack.
As for the main question: not every variable with a dynamic address is pointed to by a global variable. It depends entirely on the programmer. In a C program, for example, I can create a local variable and never assign its address to a global/static pointer. In that case, the only certain way to find its address is to know the code that first assigns the variable a value on the stack.
Some variables have a dynamic address because they are just local variables, which are stored on the stack the first time they are assigned a value.
Some other variables have a static address because they are declared either as a global or a static variable to the compiler. These variables have a fixed address offset that is part of the .data section in the executable file.
The executable file has a fixed offset address for each section inside it, and the .data section is no exception.
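The difference between the two kinds of variables can be observed directly. In this C++ sketch (all names hypothetical), the global and the file-scope static always have the same address, while a local variable gets a new address in every active stack frame. Which section each global ends up in (.data, .bss, ...) is up to the compiler and linker; the comments only describe the usual case.

```cpp
#include <cassert>
#include <cstdint>

int g_score = 10;        // global with an initializer: typically in .data
static int s_lives = 3;  // file-scope static: also static storage

// A global's address is fixed once the module is loaded, so every call
// returns the same pointer.
int *score_address() { return &g_score; }
int *lives_address() { return &s_lives; }

// Each active call gets its own stack frame, so the same local declaration
// has a different address at every recursion depth.
void record_local_addresses(std::uintptr_t *out, int depth) {
    assert(out != nullptr && depth >= 0);
    volatile int local = depth;  // lives in this call's frame
    out[depth] = reinterpret_cast<std::uintptr_t>(&local);
    if (depth > 0)
        record_local_addresses(out, depth - 1);
    local = local + 1;           // keeps `local` alive across the nested call
}
```

Because all three frames are alive at the same time when the addresses are recorded, the three locals are distinct objects and their addresses must differ.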
But it is worth noting that only the offset inside the executable itself is fixed. Where the operating system actually places things can differ (for example, randomized base addresses), but that is the OS's job: abstracting this kind of detail away from you, in this case by creating the executable's virtual address space. So static variables really are static, but only relative to the executable's own memory space; in physical RAM they might be anywhere.
Finally, it is difficult to explain this fully without understanding how executable files work. A good start would be to search for explanations of low-level programming topics such as stack frames, calling conventions, the assembly language itself, how compilers use well-known techniques to manage functions (scopes in general), global/static/local/constant variables, and the memory system (sections, the stack, etc.), and maybe some research into PE (and even ELF) files.
As far as I understand it, variables declared static have a permanent offset within the program's data. This means that when the program is loaded into RAM, the offset of the variable will always be the same. Because the starting address of the program can be determined once it is loaded, finding a static variable based on its offset, as you mentioned, should be a trivial task. Therefore, while the absolute address of a static variable might be random in the scheme of things, its offset from the beginning of program memory should remain the same no matter when the program starts. So Cheat Engine (though I don't know the software) most likely stores the offset of the static variable and then, when the software starts, applies this logic to find that variable.
As to how it can tell it's a static variable in the first place... well, this is partially a guess, but when you declare a variable static in C, I'm assuming the compiler/linker records some kind of marker in the executable so that it's known to be static. It could also be that all static variables are stored in a certain way, or at a certain address offset, for all programs compiled for a certain target system. Again, I'm not too sure about that, but from what I understand about memory management, that seems to make the most sense. With these assumptions, it's quite possible for a program to contain only static variables. The difference is that their memory is reserved when the program is loaded, as opposed to dynamically at run time (as with a call to malloc() or similar). If the variables were stored dynamically, I'm sure there'd be a way to find them easily, so I don't think it matters to Cheat Engine whether a variable is static or not. However, since I'm assuming Cheat Engine wants to modify a game upon startup (just like the old GameSharks used to... ahh, miss those days), it's probably more reliable to modify variables that are static than to locate pointers, disassemble the code, etc.
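The offset logic described above amounts to simple arithmetic. This sketch uses a hypothetical SavedAddress record and made-up base addresses; finding the module's real base after a restart is left to the OS-specific part of a real tool.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// Hypothetical record of a "static" address, stored as an offset from the
// start of the module rather than as an absolute address, because the
// module's base may change between runs.
struct SavedAddress {
    std::string module;     // module name (illustrative only)
    std::uintptr_t offset;  // address minus module base at the time of saving
};

SavedAddress save_address(const std::string &module, std::uintptr_t module_base,
                          std::uintptr_t addr) {
    assert(addr >= module_base);
    return {module, addr - module_base};
}

// After a restart, look up the module's new base and add the offset back.
std::uintptr_t restore_address(const SavedAddress &saved, std::uintptr_t new_base) {
    return new_base + saved.offset;
}
```

For example, an address saved as 0x403010 with the module at 0x400000 becomes offset 0x3010, and with the module loaded at 0x10000000 in the next run it resolves to 0x10003010.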
If you're interested in learning more, I'd recommend checking out something like this tutorial over at OSDev!
Imagine you have a class A with a static field int mstatic.
Imagine that class has a method mymethod that modifies mstatic. When compiling mymethod, how can the address of mstatic be known? I know that for non-static fields, a pointer to the calling object (the famous "this") is implicitly passed to the method and used to find the addresses, but how is it done for static fields?
Static fields are allocated similarly to namespace-scope or global variables: there are basically one or two areas (variables needing zero initialisation may be separated from those needing non-zero initial values) sequentially populated with all such variables in the translation unit. If the variable is defined in another translation unit, the address is patched in during linking or loading. Note that the addresses are typically effectively hard-coded (a fixed address, perhaps relative to a specific data segment register). This is unlike stack variables, whose addresses may be relative to the stack register, which changes as functions are called and return (whereas data segment registers may keep the same value while the thread is running), and unlike heap-hosted variables, whose address is determined during malloc or new.
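A short C++ sketch of the class from the question (A, mstatic and mymethod come from the question; field and mstatic_address are added for illustration). The instance field is reached through this, while mstatic is referenced by name and resolved by the linker, so its address is the same no matter which object calls mymethod.

```cpp
#include <cassert>

class A {
public:
    static int mstatic;  // declaration: one object shared by all instances
    int field = 0;       // one copy per object, reached through `this`

    void mymethod() {
        // mstatic is not reached through `this`: the compiler emits a
        // reference to the symbol A::mstatic, and the linker (or loader)
        // supplies its address.
        ++mstatic;
        ++field;         // shorthand for ++this->field
    }

    static int *mstatic_address() { return &mstatic; }
};

int A::mstatic = 0;      // definition: static storage allocated exactly once
```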
I am trying to manually build a list of instructions where a particular variable is getting assigned a value in the LLVM IR.
For local variables in a function, I can easily get the right set of instructions by using the instruction iterator and checking the operands of a particular instruction. This approach doesn't seem to work for the global variables since there's no store instruction associated with them.
Is there some way to keep track of where the global variable is being defined without looking at the metadata field? If not, is there some way to create a dummy instruction which can be treated as a special marker for the initial definition of the global variables?
For local variables in a function, I can easily get the right set of instructions by using the instruction iterator and checking the operands of a particular instruction.
That's not entirely accurate. It's true as long as the variable is in memory (and assignment is done via store), but if it is promoted to registers you'll need to rely on llvm.dbg.value calls to track assignments into it.
This approach doesn't seem to work for the global variables since there's no store instruction associated with them.
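Assignments to globals also appear as stores, except for the initial assignment: the initial value is the global's initializer (see GlobalVariable::getInitializer()), not a store instruction.
Is there some way to keep track of where the global variable is being defined without looking at the metadata field?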
If by "where" you mean in which source line, you'll have to rely on the debug-info metadata.
I have a few global variables whose values I need to set. Should I set them in the main/WinMain function, or should I set each one the first time I use it?
Instead, how about not using global variables at all?
Pass the variables as function parameters to the functions that need them, or store pointers or references to them as members of classes that use them.
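One way to follow that advice, sketched in C++ with hypothetical names (Settings, mute, Renderer): the shared values live in one object that is passed by reference to a function and held by reference in a class.

```cpp
#include <cassert>
#include <string>

// Instead of several globals, keep the shared values in one object...
struct Settings {
    int volume = 50;
    std::string player_name = "player";
};

// ...pass it as a parameter to the functions that need it...
void mute(Settings &s) { s.volume = 0; }

// ...or store a reference to it in the classes that use it.
class Renderer {
public:
    explicit Renderer(const Settings &s) : settings_(s) {}
    std::string title() const { return "Hello, " + settings_.player_name; }

private:
    const Settings &settings_;  // non-owning: must not outlive the Settings
};
```

Holding a reference means a Renderer must not outlive the Settings it was given; storing a pointer or a copy are alternatives with different trade-offs.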
Is there a chance that you won't use a given global variable? Is calculating any of them expensive? If so, then you have an argument for lazy initialization. If they are quick to calculate or always going to be used, then initialize them at startup. There is no reason not to, and you will save yourself the headache of having to check for initialization every time you use one.
When the linker links your program together, global variables (also known as writable static data) are assigned to their own section of memory (the ELF .data section) and have a value pre-assigned to them. This means the compiler does not need to generate instructions to initialize them, as long as the initial values are compile-time constants. If you initialize them in the main function instead, the compiler will generate initialization instructions unless it is clever enough to optimize them out.
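Both options from this answer, sketched in C++ (names are hypothetical). The cheap value is initialized eagerly, while the expensive one uses a function-local static, which C++11 and later initialize exactly once, on first use and in a thread-safe way, so callers never check an "initialized" flag themselves.

```cpp
#include <cassert>
#include <string>

// Cheap and always needed: just initialize it up front.
const int g_max_players = 16;

int g_build_calls = 0;  // counts how often the "expensive" work runs

// Hypothetical stand-in for an expensive computation.
std::string build_config() {
    ++g_build_calls;
    return "config-v1";
}

// Lazy initialization: `value` is built on the first call only.
const std::string &config() {
    static const std::string value = build_config();
    return value;
}
```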
This is certainly true for ELF file formats, I am not sure about other executable formats.
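A C++ illustration of the distinction (names hypothetical): globals with constant initializers, or with none at all, need no startup code, while an initializer that calls a function may require code that runs before main. Note that in C a file-scope variable must have a constant initializer, so the last declaration would not compile as C.

```cpp
#include <cassert>

int g_lives = 3;  // constant initializer: the value is stored in .data, no code runs
int g_slots[4];   // no initializer: zero-filled (typically .bss), no code runs

// An initializer that calls a function is not a constant expression, so the
// compiler may have to emit code that runs before main (it is allowed to
// fold it into a static value when it can).
int compute_start() { return g_lives * 10; }
int g_start = compute_start();
```

Constant initialization always happens before any dynamic initialization, so g_start reliably sees g_lives == 3.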