trace register value in llvm


In llvm, can one trace back to the instruction that defines the value for a particular register? For example, if I have an instruction as:
%add14 = add i32 %add7, %add5
Is there a way for me to trace back to the instruction where %add5 is defined?

First of all, there are no registers in LLVM IR: all those things with % in their names are just names of values. You don't store information inside those things; they are not variables or memory locations, they are just names. I recommend reading about SSA form, which helps explain this further.
In any case, what you need to do is invoke the getOperand(n) method on the instruction to get its nth operand - for example, getOperand(0) in your example will return the value named %add7. You can then check whether that value is indeed an instruction (as opposed to, say, a function argument) by checking its type (isa<Instruction>).
To emphasize - calling the getOperand method will give you the actual place in which the operand is defined, nothing else is required.
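The point that an operand is itself the defining value can be illustrated with a toy model in plain C++. This is only a sketch: `Value`, `Argument`, `Instruction` and `definingInstruction` below are simplified stand-ins written for this example, not LLVM's real classes, and `dynamic_cast` plays the role that `isa<Instruction>` / `dyn_cast<Instruction>` play in LLVM.

```cpp
#include <string>
#include <utility>
#include <vector>

// Toy stand-ins for llvm::Value and friends (hypothetical, for illustration only).
struct Value {
    std::string name;
    explicit Value(std::string n) : name(std::move(n)) {}
    virtual ~Value() = default;
};

// A value that is NOT produced by an instruction, e.g. a function argument.
struct Argument : Value {
    using Value::Value;
};

// An instruction is a value whose operands point directly at other values.
struct Instruction : Value {
    std::string opcode;
    std::vector<Value *> operands;
    Instruction(std::string n, std::string op, std::vector<Value *> ops)
        : Value(std::move(n)), opcode(std::move(op)), operands(std::move(ops)) {}
    Value *getOperand(unsigned i) const { return operands.at(i); }
};

// Returns the instruction that defines operand i, or nullptr when the
// operand is not an instruction (for example, an argument).
const Instruction *definingInstruction(const Instruction &inst, unsigned i) {
    return dynamic_cast<const Instruction *>(inst.getOperand(i));
}
```

With `%add14 = add i32 %add7, %add5` modelled as an `Instruction` whose operands are `&add7` and `&add5`, `definingInstruction(add14, 1)` hands back the `add5` object itself; no search through the function is needed. In real LLVM code the corresponding check would be `dyn_cast<Instruction>(I->getOperand(1))`.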

Related

[3.0]Question about how to use the store IR instruction to obtain the blockaddress

I am writing to enquire about a question.
When I read the IR generated from a C program, I found that taking the address of a label in C is translated into a store instruction in the IR:
store i8* blockaddress(@func_name, %label_name), i8** %val_name
I then read the official documentation. Here is how it describes blockaddress:
blockaddress(@function, %block)
The 'blockaddress' constant computes the address of the specified basic block in the specified function, and always has an i8* type. Taking the address of the entry block is illegal.
This value only has defined behavior when used as an operand to the 'indirectbr' instruction, or for comparisons against null. Pointer equality tests between labels addresses results in undefined behavior - though, again, comparison against null is ok, and no label is equal to the null pointer. This may be passed around as an opaque pointer sized value as long as the bits are not inspected. This allows ptrtoint and arithmetic to be performed on these values so long as the original value is reconstituted before the indirectbr instruction.
Finally, some targets may provide defined semantics when using the value as the operand to an inline assembly, but that is target specific.
So I want to figure out how such a store is built in the IR, that is, how the blockaddress constant ends up being stored through a pointer such as %5.
What should I do if I want to construct this store instruction from C++ in order to obtain the addresses of basic blocks?
I made some attempts, such as constructing an indirectbr:
irBuilder.SetInsertPoint(indirectbr_bb);
IndirectBrInst *indirect_br = IndirectBrInst::Create(BlockAddress::get(func, instr2_bb), 0, indirectbr_bb);
indirect_br->addDestination(instr1_bb);
indirect_br->addDestination(instr2_bb);
The IR program generated is as follows:
indirectbr_bb: ; preds = %dispatch_then_bb
indirectbr i8* blockaddress(@jit_func, %instr2_bb), [label %instr1_bb, label %instr2_bb]
In my tests this executes correctly. Therefore, I want to know how to construct a similar store instruction that stores the address of a basic block into an array.

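BlockAddress::get(BasicBlock *BB) returns a BlockAddress pointer. BlockAddress is a subclass of Constant, which in turn derives from Value.
In LLVM's C++ API every IR value, including constants, is a Value, so a BlockAddress can be used wherever a Constant or Value is expected, for example as an element of a constant array initializer.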
So we can do this:
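// Global array of 1024 i8* slots that will hold the block addresses.
ArrayType *arrayType = ArrayType::get(irBuilder.getInt8PtrTy(), 1024);
module->getOrInsertGlobal("label_array", arrayType);
GlobalVariable *label_array = module->getNamedGlobal("label_array");
std::vector<Constant *> array_elems;
array_elems.push_back(BlockAddress::get(func, ret_bb));
array_elems.push_back(BlockAddress::get(func, instr1_bb));
array_elems.push_back(BlockAddress::get(func, instr2_bb));
// ConstantArray::get expects one element per array slot, so pad the remaining slots with null pointers.
array_elems.resize(arrayType->getNumElements(), Constant::getNullValue(irBuilder.getInt8PtrTy()));
label_array->setInitializer(ConstantArray::get(arrayType, array_elems));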
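For orientation, the source-level construct that usually gives rise to such a table of block addresses is the GNU "labels as values" extension (`&&label` and `goto *ptr`), which GCC and Clang accept in both C and C++. The sketch below is an illustration only: the function name `dispatch` and the opcode numbering are made up for this example, and the label names simply mirror the ones used in the question.

```cpp
// Minimal computed-goto dispatcher using the GNU labels-as-values extension.
// Hypothetical opcodes: 0 -> ret_bb, 1 -> instr1_bb, 2 -> instr2_bb.
int dispatch(int op) {
    // Each &&label is the address of a label; in Clang's IR these typically
    // appear as blockaddress constants inside the array initializer.
    static void *label_array[] = { &&ret_bb, &&instr1_bb, &&instr2_bb };
    int result = 0;

    // Jump through the table; this is the construct that typically becomes
    // an indirectbr instruction in the IR.
    goto *label_array[op];

instr1_bb:
    result = 10;
    goto ret_bb;

instr2_bb:
    result = 20;
    goto ret_bb;

ret_bb:
    return result;
}
```

Calling `dispatch(1)` jumps to `instr1_bb` and returns 10, while `dispatch(0)` goes straight to `ret_bb` and returns 0. The caller is assumed to pass only indices that exist in the table.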

How to change returned value of function

There is a function in this program, that currently returns a 1. I would prefer for it to return a 0.
uregs[R_PC] is the program counter.
arg0 is the program counter offset from where we left the function (assembly, "ret").
From this I deduce: we can add the offset to the program counter, uregs[R_PC]+arg0, to find the address of the return value.
I have allocated a 32-bit "0", and I try to write 2 bytes of it into the address where the return value lives (our function expects to return a BOOL16, so we only need 2 bytes of 0):
sudo dtrace -p "$(getpid)" -w -n '
int *zero;
BEGIN { zero=alloca(4); *zero=0; }
pid$target::TextOutA:return {
copyout(zero, uregs[R_PC]+arg0, 2);
}'
Of course I get:
dtrace: error on enabled probe ID 2 (ID 320426: pid60498:gdi32.dll.so:TextOutA:return): invalid address (0x41f21c) in action #1 at DIF offset 60
uregs[R_PC] is presumably a userspace address. Probably copyout() wants a kernel address.
How do I translate the userspace address uregs[R_PC] to kernel-space? I know that with copyin() we can read data stored at user-space address, into kernel-space. But that doesn't give us the kernel address of that memory.
Alternatively: is there some other way to change the return value using DTrace?

DTrace is not the right tool for this. You should instead use a debugger like dbx, mdb or gdb.
In the meantime, I'll try to clarify some of the concepts that you've mentioned.
To begin, you may well see in the source code for a simple function that there is a single return. It is quite possible that the compiled result, i.e. the function's machine-specific implementation, also contains only a single point of exit. Typically, however, the implementation is likely to contain more than one exit point and it may be useful for a developer to know from which specific one a function returned. It is this information, described as an offset from the start of the function, that is given by a return probe's arg0. Your D script, then, is attempting to update part of the program or library itself; although the addition of arg0 makes the destination address somewhat random, the result is most likely still within the text section, which is read-only.
Secondly, in the common case, a function's implementation returns a value by storing it in a specific register; e.g. %rax on amd64. Thus overriding a return value would necessitate overriding a register value. This is impossible because DTrace's access to the user-land registers is read-only.
It is possible that a function is implemented in such a way that, as it returns, it recovers the return value from a specific memory location before writing it into the appropriate register. If this were the case then one could, indeed, modify the value in memory (given its location) just before it is accessed. However, this is going to work for only a subset of cases: the return value might equally be contained in another register or else simply expressed as a constant in the program text itself. In any case, it would be far more trouble than it's worth given the existence of more appropriate debugging tools.

C++ what happens to a value that is returned but is not stored?

Say I have a function that returns an int. I don't store the value from the function call. I presume that it is not stored in memory and fades into the aether, but I don't know.
Thank you.

An int return value will normally be stored in a register (e.g., EAX or RAX on 32-bit or 64-bit Intel, respectively).
It won't fade. It'll simply be overwritten when the compiler needs that register for some other purpose. If the function in question is expanded inline, the compiler may detect that the value isn't used, and elide the code to write or compute the value at all.
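A small sketch of the situation described in the question: the function still runs and still produces its value, and the caller just never copies it anywhere. The names `next_id`, `discard_result` and `call_count` are invented for this example; the counter exists only so the call's side effect can be observed.

```cpp
// Counts how many times next_id() has actually executed.
static int call_count = 0;

// Returns an int; the caller is free to ignore it.
int next_id() {
    ++call_count;
    return 41 + call_count;
}

void discard_result() {
    next_id();                     // value is produced, then simply ignored
    static_cast<void>(next_id());  // same thing, with the discard made explicit
}
```

After `discard_result()` runs, `call_count` is 2, which shows both calls executed even though neither result was stored. If the function had no side effects and were inlined, the compiler would be free to drop the unused computation entirely.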

LLVM: Replacing all instances of an address with a constant

I'm trying to replace all instances of an address with a constant.
I'm getting & testing the address of store with the following (i is an instruction)
// already know it's a store instruction at this point
llvm::Value *addy = i->getOperand(0);
if (llvm::ConstantInt *c = llvm::dyn_cast<llvm::ConstantInt>(addy)) {
// replace all uses of the address with the constant
// operand(1) will be the address the const would be stored at
i->getOperand(1)->replaceAllUsesWith(c);
}
I'd think this would work, but I'm getting the error that
"Assertion: New->getType()== getType() && replaceAllUses of value with new value of different type!" failed
and I'm not sure why... my understanding of replaceAllUsesWith is that it would replace every use of the address (i->getOperand(1)) with the constant?

The error message is pretty straightforward: the type of the new value is not identical to the type of the old value that you are replacing.
LLVM IR is strongly typed, and as you can see in the language reference, every instruction has a specific type it expects as each operand. For example, store requires that the address's type will always be a pointer to the type of the value being stored.
As a result, whenever you replace the usage of a value, you must ensure first that they both have the same type - replaceAllUsesWith actually has an assert to verify it, as you can see, and you failed it. It's also simple to see why: operand 1 of a store instruction is always of some pointer type, and a ConstantInt always represents something of some integer type, so surely they can never match.
What exactly are you trying to achieve? Perhaps you are thinking about replacing each load from that store's address with a use of the constant? In that case, you'll have to find all the loads that use that address yourself, and for each of them (for each of the loads, I mean, not of the addresses) perform replaceAllUsesWith with the constant. There are standard LLVM passes that can do those things for you, by the way - check out the pass list. I'm guessing mem2reg followed by some constant propagation pass will take care of this.

gdb: finding every jump to an address

I'm trying to understand a small binary using gdb, but there is something I can't find a way to achieve: how can I find the list of jumps that point to a specified address?
I have a small set of instructions in the disassembled code and I want to know where it is called from.
I first thought about searching for the corresponding instruction in .text, but since there are many kinds of jumps, and addresses can be relative, this can't work.
Is there a way to do that?
Alternatively, if I put a breakpoint on this address, is there a way to know the address of the previous instruction (in this case, the jump)?

If this is some subroutine being called from other places, then it must respect some ABI while it's called.
Depending on the CPU used, the return address (and therefore the place from which the routine was called) will be stored somewhere (on the stack or in a register). If you replace the original code with code that examines this, you can create a list of return addresses. Or, more simply, as you suggested, if you use gdb and put a breakpoint at that routine, you can see where it was called from by using the bt command.
If it was an actual jump (versus a "jump to subroutine") that led you there (which I doubt, if it's called from many places, unless it's a kind of longjmp/setjmp), then you will probably not be able to determine where it came from, unless the CPU you are using allows you to trace the execution in some way.
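The suggestion of having the routine itself record its return address can be sketched in C++ when source is available. This is an assumption-laden illustration: it relies on the GCC/Clang builtin `__builtin_return_address(0)` (other compilers need a different intrinsic), and the names `callers`, `traced_routine` and `call_site_a` are invented for the example.

```cpp
#include <vector>

// Return addresses collected by traced_routine, one entry per call.
std::vector<void *> callers;

// noinline keeps this a real call, so a return address actually exists.
__attribute__((noinline)) int traced_routine(int x) {
    // Address of the instruction the routine will return to, i.e. the spot
    // just after the call instruction in whoever called us.
    callers.push_back(__builtin_return_address(0));
    return x + 1;
}

// One example call site (hypothetical).
__attribute__((noinline)) int call_site_a(int x) {
    int r = traced_routine(x);
    return r;
}
```

After `call_site_a(1)` runs, `callers` holds one address; looking it up in gdb (for example with `info symbol <address>`) shows which function it falls in. As the answer notes, this only covers call-style transfers that follow the ABI; a plain jump leaves no return address to record.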
