How to control the location of a global variable in LLVM IR? - c++

I'm trying to modify LLVM so that it keeps certain constants and functions contiguous in memory.
In other words, I need to ensure that the machine code for certain functions is always preceded in memory by a constant of about 4 bytes. The function body itself must not be modified.
Could I achieve this simply through modifying the LLVM IR somehow?
If yes:
How would I state in the LLVM IR to keep a variable and a function contiguous in memory?
If no:
What part of the code generation process (i.e. which pass(es)) should I modify in order to achieve this? Any links to the projects/files I should look at would be helpful, since I'm not sure where to begin yet.

As far as I know, you can't do that just by modifying the IR; you'd have to write something to handle it yourself. It shouldn't be a pass, either: the task is too low-level and should run during target-specific code generation. Of course, you can piggyback on an existing target and modify just this aspect; you don't have to write a new target from scratch. I don't know exactly which location would be good for this, though.
I think a good way to pass this information from the IR level to the DAG during code generation would be metadata: attach metadata to either the function or the associated constant to link them with each other, then use that link later to emit them together. See this thread on llvm-dev for information on how to transfer the metadata.

Related

Stack/Frame pointer as external variable

I was writing some logging logic and wanted to add indentation. The easiest way to tell whether a function call has occurred or a function has returned is to look at the current address of the stack/frame. Let's suppose that the stack grows downward. Then if the stack address in the log() call is smaller than during the previous call, we can increase the indent, since some function call must have happened. I know there are functions like backtrace() that know how to dump it, or you can use some assembly. However, I remember reading about external variables that can be used to retrieve this information. Can someone name these variables or give a reference where I can find them (as far as I remember, it was in some computer systems book like "Computer Systems: A Programmer's Perspective")? Otherwise, what is the most convenient/fast way of getting this information?
Update: I have accidentally found the link I was referring to - Print out value of stack pointer
TLDR: There is no portable way to do what I have described...
This method is highly nonportable and will break under various transformations, but if you're just using it for debug logging it might be suitable.
The easiest way to get something resembling the current stack frame address is to take the address of any automatic-storage (local, non-static) variable. If you want a baseline to compare it against, save the address of some local in main or similar to a global variable. If your program is or might be multi-threaded, use a thread-local variable for this.
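A minimal sketch of that idea follows. The names g_stack_base, record_stack_base, stack_distance_from_base and log_line are hypothetical, and the result is only an approximation: inlining and other optimizations can change where locals live, so treat the number as a rough depth hint for debug logging, nothing more.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstdio>

// Baseline stack address for the current thread (hypothetical name).
// thread_local, so each thread compares against its own stack.
static thread_local std::uintptr_t g_stack_base = 0;

// Call once near the top of the thread, e.g. at the start of main.
void record_stack_base() {
    volatile char marker = 0;  // any automatic-storage variable will do
    g_stack_base = reinterpret_cast<std::uintptr_t>(&marker);
}

// Approximate distance, in bytes, between the current frame and the baseline.
// The absolute difference is used so the stack growth direction does not matter.
std::size_t stack_distance_from_base() {
    volatile char marker = 0;
    const std::uintptr_t here = reinterpret_cast<std::uintptr_t>(&marker);
    return static_cast<std::size_t>(here > g_stack_base ? here - g_stack_base
                                                        : g_stack_base - here);
}

// Example logger: prefixes each message with the approximate stack distance.
void log_line(const char* msg) {
    std::printf("[%zu] %s\n", stack_distance_from_base(), msg);
}
```

A larger number than in the previous log_line call suggests that the call happened deeper in the call tree; mapping byte distances to indentation levels is left to the caller.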

Tracking global definitions in LLVM

I am trying to manually build a list of instructions where a particular variable is getting assigned a value in the LLVM IR.
For local variables in a function, I can easily get the right set of instructions by using the instruction iterator and checking the operands of a particular instruction. This approach doesn't seem to work for global variables, since there's no store instruction associated with them.
Is there some way to keep track of where the global variable is being defined without looking at the metadata field? If not, is there some way to create a dummy instruction which can be treated as a special marker for the initial definition of the global variables?
"For local variables in a function, I can easily get the right set of instructions by using the instruction iterator and checking the operands of a particular instruction."
That's not entirely accurate. It's true as long as the variable is in memory (and assignment is done via store), but if it is promoted to registers you'll need to rely on llvm.dbg.value calls to track assignments into it.
"This approach doesn't seem to work for global variables, since there's no store instruction associated with them."
Assignments to globals also appear as stores, except for the initial assignment.
"Is there some way to keep track of where the global variable is being defined without looking at the metadata field?"
If by "where" you mean in which source line, you'll have to rely on the debug-info metadata.

Is it possible to use function pointers this way?

This is something that recently crossed my mind, quoting from Wikipedia: "To initialize a function pointer, you must give it the address of a function in your program."
So, I can't make it point to an arbitrary memory address, but what if I overwrite the memory at the address of the function with a piece of data the same size as before and then invoke it via the pointer? If that data corresponds to an actual function and the two functions have matching signatures, the new one should be invoked instead of the original.
Is it theoretically possible?
I apologize if this is impossible due to some very obvious reason that I should be aware of.
If you're writing something like a JIT, which generates native code on the fly, then yes you could do all of those things.
However, in order to generate native code you obviously need to know some implementation details of the system you're on, including how its function pointers work and what special measures need to be taken for executable code. For one example, on some systems after modifying memory containing code you need to flush the instruction cache before you can safely execute the new code. You can't do any of this portably using standard C or C++.
You might find when you come to overwrite the function, that you can only do it for functions that your program generated at runtime. Functions that are part of the running executable are liable to be marked write-protected by the OS.
The issue you may run into is Data Execution Prevention (DEP). It tries to keep you from executing data as code or writing to code as if it were data. You can turn it off on Windows. Some compilers/OSes may also place code into const-like sections of memory that the OS/hardware protects. The standard says nothing about what should or should not work when you write an array of bytes to a memory location and then call a function that involves jumping to that location. It all depends on your hardware and your OS.
While the standard does not provide any guarantees as to what would happen if you make a function pointer that does not refer to a function, in real life, with your particular implementation and knowledge of the platform, you may be able to do that with raw data.
I have seen example programs that created a char array containing the appropriate binary code and executed it through careful pointer casts. So in practice, in a non-portable way, you can achieve that behavior.
It is possible, with the caveats given in other answers. You definitely do not want to overwrite memory at some existing function's address with custom code, though. Not only is executable memory typically not writable, but you have no guarantees as to how the compiler might have used that code. For all you know, the code may be shared by many functions that you think you're not modifying.
So, what you need to do is:
1. Allocate one or more memory pages from the system.
2. Write your custom machine code into them.
3. Mark the pages as non-writable and executable.
4. Run the code. There are two ways of doing it:
   a. Cast the address of the pages you got in #1 to a function pointer, and call the pointer.
   b. Execute the code in another thread, by passing the pointer to the code directly to a system API or framework function that starts the thread.
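The steps above can be sketched as follows, under loud assumptions: this targets Linux on x86-64 only, uses the POSIX calls mmap, mprotect and munmap, and hard-codes six bytes of machine code meaning "mov eax, 42; ret". The function name run_generated_code is hypothetical. On any other platform the function just reports failure, and real code would also need the instruction-cache handling mentioned above on architectures that require it.

```cpp
#include <cstddef>
#include <cstring>

#if defined(__linux__) && defined(__x86_64__)
#include <sys/mman.h>
#endif

// Runs a tiny piece of generated code and stores its return value in `out`.
// Returns false if the platform is unsupported or a system call fails.
bool run_generated_code(int& out) {
#if defined(__linux__) && defined(__x86_64__)
    // x86-64 machine code for: mov eax, 42 ; ret
    static const unsigned char code[] = {0xB8, 0x2A, 0x00, 0x00, 0x00, 0xC3};
    const std::size_t size = 4096;  // assumption: one 4 KiB page is enough

    // 1. Allocate a page that is writable but not yet executable.
    void* mem = mmap(nullptr, size, PROT_READ | PROT_WRITE,
                     MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (mem == MAP_FAILED) return false;

    // 2. Write the machine code into it.
    std::memcpy(mem, code, sizeof code);

    // 3. Make the page non-writable and executable.
    if (mprotect(mem, size, PROT_READ | PROT_EXEC) != 0) {
        munmap(mem, size);
        return false;
    }

    // 4. Cast the page address to a function pointer and call it.
    //    This cast is not guaranteed by the C++ standard; POSIX systems support it.
    int (*fn)() = reinterpret_cast<int (*)()>(mem);
    out = fn();

    munmap(mem, size);
    return true;
#else
    (void)out;
    return false;  // unsupported platform in this sketch
#endif
}
```

Keeping the page writable only while it is being filled, and executable only afterwards, avoids ever holding memory that is both writable and executable, which some systems refuse to grant.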
Your question is confusingly worded.
You can reassign function pointers, and you can assign them to null. The same goes for member pointers. Unless you declare them const, you can reassign them, and yes, the new function will be called instead. The signatures must match exactly. Use std::function instead.
You cannot "overwrite the memory at the address of a function". You probably can do it somehow, but just don't. You'd be writing into your program's code and are likely to screw it up badly.
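Reassigning a pointer, as opposed to overwriting code, can be illustrated with a short sketch. The functions add and mul and the two demo functions are hypothetical names chosen for this example.

```cpp
#include <functional>

int add(int a, int b) { return a + b; }
int mul(int a, int b) { return a * b; }

// Plain function pointer: the signature of the target must match exactly.
int demo_function_pointer() {
    int (*op)(int, int) = add;
    int total = op(2, 3);   // calls add
    op = mul;               // reassign: later calls go to mul
    total += op(2, 3);      // calls mul
    op = nullptr;           // allowed, but calling it now would be undefined behavior
    return total;
}

// std::function: can also hold lambdas and other callables.
int demo_std_function() {
    std::function<int(int, int)> op = add;
    int total = op(2, 3);                         // calls add
    op = [](int a, int b) { return a - b; };      // reassign to a lambda
    total += op(2, 3);                            // calls the lambda
    return total;
}
```

In both cases only the pointer or wrapper changes; the machine code of add and mul is never touched.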

How to place a variable at a given absolute address in memory (with Visual C++)

How can I statically tell Visual C++ to place a global variable at a given absolute address in memory, like what __attribute__((at(address))) does?
It can be done, but I don't believe there is a predefined way to do it, so it will take some experimentation. That said, I don't see much benefit over simply creating the variable at run time, at the very start of user code execution.
First, specify the section/segment in which to place your variable using the MS-specific allocate specifier. Then run your application in a realistic scenario, and dump or debug it to see where your variable ends up. Watch out for relocations (there are some ways to try to enforce no relocation, but they are not guaranteed to be honored all the time). Another way is to use some code in your app like this one to find the address of the section you defined.
If for some reason you cannot get consistent behavior, you can use this utility to manipulate the virtual address of your object file. All in all, expect hurdles along the way, but I don't see why you wouldn't be able to get it to work for your specific scenario if you are persistent enough.
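A minimal sketch of the first step, assuming the "allocate specifier" refers to __declspec(allocate(...)) combined with #pragma section. The section name .fixed, the macro IN_FIXED_SECTION and the variable g_placed are hypothetical. This only puts the variable into a named section; the final address still depends on the image base and section layout chosen at link and load time, as described above. On compilers other than Visual C++ the macro expands to nothing so the sketch still compiles.

```cpp
#if defined(_MSC_VER)
// Declare a custom read/write section, then place variables into it.
#pragma section(".fixed", read, write)
#define IN_FIXED_SECTION __declspec(allocate(".fixed"))
#else
// Assumption: other compilers just get an ordinary global in this sketch.
#define IN_FIXED_SECTION
#endif

// Global variable placed in the custom section (when built with Visual C++).
IN_FIXED_SECTION int g_placed = 0x1234;

// Helper for inspecting where the variable ended up at run time.
const void* placed_address() { return &g_placed; }
```

Printing placed_address() across several runs is one simple way to check whether the address is stable in a given build configuration.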

Debugging - detect a function writing a memory location

I need to know if there is a way, with the Linux debugger gdb, to detect whether a function (any function) of a specific C++ class (represented by the file Chord.cc) accesses a specific memory location (let's say 0xffffbc).
That will help me a lot.
Thanks.
GDB watchpoints are what you're looking for:
Quote from that page:
You can use a watchpoint to stop execution whenever the value of an expression changes, without having to predict a particular place where this may happen. (This is sometimes called a data breakpoint.) The expression may be as simple as the value of a single variable, or as complex as many variables combined by operators. Examples include:
A reference to the value of a single variable.
An address cast to an appropriate data type. For example, `*(int *)0x12345678' will watch a 4-byte region at the specified address (assuming an int occupies 4 bytes).
You can then try to apply the techniques from this post to make it a conditional watchpoint, and see if you can find a way to restrict it to particular function calls from that class. You may also find this discussion relevant in that respect.