I noticed that LLVM IR allocates stack space at the very beginning of a function, regardless of where the variables are declared inside the function in the C source code. I want to know how these alloca instructions are ordered. My guess is function arguments first, then local variables. Are there any specific rules I can refer to?
No particular order. Also, different variables may use the same stack slot.
I am trying to manually build a list of instructions where a particular variable is getting assigned a value in the LLVM IR.
For local variables in a function, I can easily get the right set of instructions by using the instruction iterator and checking the operands of a particular instruction. This approach doesn't seem to work for global variables, since there's no store instruction associated with them.
Is there some way to keep track of where the global variable is being defined without looking at the metadata field? If not, is there some way to create a dummy instruction which can be treated as a special marker for the initial definition of the global variables?
For local variables in a function, I can easily get the right set of instructions by using the instruction iterator and checking the operands of a particular instruction.
That's not entirely accurate. It's true as long as the variable is in memory (and assignment is done via store), but if it is promoted to registers you'll need to rely on llvm.dbg.value calls to track assignments into it.
This approach doesn't seem to work for the global variables since there's no store instruction associated with them.
Assignments to globals also appear as stores - except for the initial assignment.
Is there some way to keep track of where the global variable is being defined without looking at the metadata field?
If by "where" you mean in which source line, you'll have to rely on the debug-info metadata.
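To make the point about globals concrete, here is a minimal sketch. The names `counter` and `set_counter` are hypothetical, and the IR fragments in the comments are what Clang typically emits, not something guaranteed across versions or optimization levels.

```cpp
#include <cassert>

// Initial value: in the IR this is part of the global's definition,
// typically something like `@counter = global i32 5`. No store is emitted.
int counter = 5;

// Later assignment: this shows up as a store instruction whose pointer
// operand is @counter, just like an assignment to an in-memory local.
void set_counter(int value) {
    assert(value >= 0);  // hypothetical precondition, only for the example
    counter = value;
}
```

A tool that collects assignments would therefore need to treat the initializer of the global as the first definition and then look for stores that use the global as their pointer operand.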
This is something that recently crossed my mind, quoting from Wikipedia: "To initialize a function pointer, you must give it the address of a function in your program."
So, I can't make it point to an arbitrary memory address, but what if I overwrite the memory at the address of the function with a piece of data of the same size, and then invoke it via the pointer? If that data corresponds to an actual function and the two functions have matching signatures, the latter should be invoked instead of the former.
Is it theoretically possible?
I apologize if this is impossible due to some very obvious reason that I should be aware of.
If you're writing something like a JIT, which generates native code on the fly, then yes you could do all of those things.
However, in order to generate native code you obviously need to know some implementation details of the system you're on, including how its function pointers work and what special measures need to be taken for executable code. For one example, on some systems after modifying memory containing code you need to flush the instruction cache before you can safely execute the new code. You can't do any of this portably using standard C or C++.
You might find when you come to overwrite the function, that you can only do it for functions that your program generated at runtime. Functions that are part of the running executable are liable to be marked write-protected by the OS.
One issue you may run into is Data Execution Prevention (DEP). It tries to keep you from executing data as code, or from writing to code as if it were data. You can turn it off on Windows. Some compilers and OSes may also place code into const-like sections of memory that the OS or hardware protects. The standard says nothing about what should or should not work when you write an array of bytes to a memory location and then call a function that jumps to that location. It all depends on your hardware and your OS.
While the standard does not provide any guarantees as to what would happen if you make a function pointer that does not refer to a function, in real life, on your particular implementation and knowing the platform, you may be able to do that with raw data.
I have seen example programs that created a char array with the appropriate binary code and executed it by carefully casting pointers. So in practice, and in a non-portable way, you can achieve that behavior.
It is possible, with the caveats given in other answers. You definitely do not want to overwrite memory at some existing function's address with custom code, though. Not only is executable memory typically not writable, but you have no guarantees as to how the compiler might have used that code. For all you know, the code may be shared by many functions that you think you're not modifying.
So, what you need to do is:
1. Allocate one or more memory pages from the system.
2. Write your custom machine code into them.
3. Mark the pages as non-writable and executable.
4. Run the code. There are two ways of doing it:
   - Cast the address of the pages you got in #1 to a function pointer, and call the pointer.
   - Execute the code in another thread. You're passing the pointer to the code directly to a system API or framework function that starts the thread.
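The steps above can be sketched roughly as follows. This is a non-portable illustration, assuming a POSIX system with `mmap`/`mprotect` (Linux in particular), a GCC or Clang compiler, and an x86-64 or AArch64 CPU. The helper name `make_answer_fn` and the hard-coded machine code bytes are made up for the example; on any other architecture the sketch simply gives up and returns a null pointer.

```cpp
#include <sys/mman.h>
#include <unistd.h>
#include <cassert>
#include <cstddef>
#include <cstring>

using IntFn = int (*)();

// Hand-assembled machine code for "return 42" (assumption: little-endian).
#if defined(__x86_64__)
const unsigned char kCode[] = {0xB8, 0x2A, 0x00, 0x00, 0x00,  // mov eax, 42
                               0xC3};                         // ret
const bool kSupported = true;
#elif defined(__aarch64__)
const unsigned char kCode[] = {0x40, 0x05, 0x80, 0x52,   // mov w0, #42
                               0xC0, 0x03, 0x5F, 0xD6};  // ret
const bool kSupported = true;
#else
const unsigned char kCode[] = {0};
const bool kSupported = false;
#endif

// Hypothetical helper: returns a callable pointer, or nullptr on failure.
// The page is never unmapped here; real code would keep it for munmap().
IntFn make_answer_fn() {
    if (!kSupported) return nullptr;
    const long page = sysconf(_SC_PAGESIZE);
    assert(page > 0 && sizeof kCode <= static_cast<std::size_t>(page));
    const std::size_t len = static_cast<std::size_t>(page);

    // 1. Allocate a page, writable but not yet executable.
    void* mem = mmap(nullptr, len, PROT_READ | PROT_WRITE,
                     MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (mem == MAP_FAILED) return nullptr;

    // 2. Write the machine code into it.
    std::memcpy(mem, kCode, sizeof kCode);

    // 3. Make the page non-writable and executable.
    if (mprotect(mem, len, PROT_READ | PROT_EXEC) != 0) {
        munmap(mem, len);
        return nullptr;
    }
    // Some CPUs need the instruction cache flushed after code is written.
    char* begin = static_cast<char*>(mem);
    __builtin___clear_cache(begin, begin + sizeof kCode);

    // 4. Cast the page address to a function pointer.
    return reinterpret_cast<IntFn>(mem);
}
```

A caller would check the result for null before calling it, since a hardened system may refuse to make the page executable.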
Your question is confusingly worded.
You can reassign function pointers, and you can set them to null. The same goes for member pointers. Unless you declare them const, you can reassign them, and yes, the new function will be called instead. The signatures must match exactly. Use std::function instead.
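As a small sketch of what reassigning looks like in practice (all function names here are invented for the example):

```cpp
#include <cassert>
#include <functional>

int add(int a, int b) { return a + b; }
int mul(int a, int b) { return a * b; }

// Calls through a plain function pointer; a null pointer must not be called.
int apply(int (*op)(int, int), int a, int b) {
    assert(op != nullptr);
    return op(a, b);
}

// Same idea with std::function, which can also hold lambdas and functors.
int apply_fn(const std::function<int(int, int)>& op, int a, int b) {
    assert(static_cast<bool>(op));  // an empty std::function must not be invoked
    return op(a, b);
}

int reassign_demo() {
    int (*op)(int, int) = add;      // op points at add
    int sum = apply(op, 3, 4);      // calls add
    op = mul;                       // reassigned: same call site now reaches mul
    int product = apply(op, 3, 4);  // calls mul
    return sum + product;
}
```

Nothing in the program's code is overwritten here; only the pointer value changes, which is why this is safe and portable.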
You cannot "overwrite the memory at the address of a function". You probably can indeed do it some way, but just do not. You're writing into your program code and are likely to screw it up badly.
I need to know if there is a way, with the Linux debugger GDB, to detect whether a function (any function) of a specific C++ class (implemented in the file Chord.cc) accesses a specific memory location (let's say 0xffffbc).
That will help me a lot.
Thanks.
GDB watchpoints are what you're looking for:
Quote from that page:
You can use a watchpoint to stop execution whenever the value of an
expression changes, without having to predict a particular place where
this may happen. (This is sometimes called a data breakpoint.) The
expression may be as simple as the value of a single variable, or as
complex as many variables combined by operators. Examples include:
A reference to the value of a single variable.
An address cast to an appropriate data type. For example, `*(int *)0x12345678' will watch a 4-byte region at the specified address (assuming an int occupies 4 bytes).
You can then try to apply the techniques from this post to make it a conditional watchpoint, and see if you can find a way to restrict it to particular function calls from that class. You may also find this discussion relevant in that respect.
I'm trying to modify LLVM so that it keeps certain constants and functions contiguous in memory.
In other words, I need to ensure that the machine code for certain functions is always preceded by some ~4-byte constant in memory. The function body itself must not be modified.
Could I achieve this simply through modifying the LLVM IR somehow?
If yes:
How would I state in the LLVM IR to keep a variable and a function contiguous in memory?
If no:
What part of the code generation process (i.e. which pass(es)) should I modify in order to achieve this? Any links to the projects/files I should look at would be helpful, since I'm not sure where to begin yet.
As far as I know, you can't do that by just modifying the IR; you'd have to write something to handle it yourself. It shouldn't be a pass, either: it's too low-level, and it should run during the target-specific code generation. You can piggyback on an existing target and just modify this aspect, of course; you don't have to write a new target from scratch. I don't know exactly which location would be good for this, though.
I think a good way to pass this information from the IR level to the DAG during code generation would be to use metadata: attach metadata to either the function or the associated constant that links them with each other, then later on use that link to emit them together. See this thread on llvm-dev for information on how to transfer the metadata.
I have a few global variables whose values I need to set. Should I set them in the main/WinMain function, or should I set each one the first time I use it?
Instead, how about not using global variables at all?
Pass the variables as function parameters to the functions that need them, or store pointers or references to them as members of classes that use them.
Is there a chance that you won't be using the global variable? Is calculating any of them expensive? If so, then you have an argument for lazy initialization. If they are quick to calculate or are always going to be used, then initialize them on startup. There is no reason not to, and you will save yourself the headache of having to check for initialization every time you use one.
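A minimal sketch of the two options, with made-up names (`g_max_connections`, `compute_table_sum`, `table_sum`). The lazy variant uses a function-local static, which C++11 and later initialize on first use and in a thread-safe way, so no manual "is it initialized yet" check is needed.

```cpp
#include <cassert>

// Cheap and always needed: just initialize it at startup.
int g_max_connections = 64;

// Counts how often the expensive computation runs (for illustration only).
int g_expensive_calls = 0;

// Stand-in for something costly to compute.
long compute_table_sum() {
    ++g_expensive_calls;
    long sum = 0;
    for (int i = 1; i <= 1000; ++i) sum += i;
    return sum;
}

// Expensive and possibly unused: compute it on first use only.
long table_sum() {
    static const long value = compute_table_sum();  // runs on the first call
    assert(g_expensive_calls == 1);  // the initializer must never run twice
    return value;
}
```

If `table_sum()` is never called, the expensive computation never happens; if it is called many times, the cost is paid once.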
When the linker links your program together, global variables (also known as writable static data) are assigned to their own section of memory (the ELF .data section) and have a value pre-assigned to them. This will mean that the compiler will not need to generate instructions to initialize them. If you initialize them in the main function, the compiler will generate initialization instructions unless it is clever enough to optimize them out.
This is certainly true for ELF file formats, I am not sure about other executable formats.