Use of inserted instruction is necessary or not - llvm

In LLVM is it necessary that if we insert some instruction in LLVM IR through LLVM Pass ,than also we have to insert an instruction which will use the result of our previous inserted instruction or we have to store result of our inserted instruction into some variable already present in LLVM IR that is not useless.
for example cant i insert instruction
%result = add i32 4 3
and %result is not used in subsequent instructions.

You should be able to insert it but if an optimization pass runs after your pass it might be eliminated because it's unused and doesn't have side effects.

No, it's absolutely not necessary. If you insert the instruction properly (i.e. use the API correctly), it can be left unused.
As a matter of fact, unused values can be left around by various optimization passes as well. LLVM has other passes like DCE (dead code elimination) that will remove unused instructions.

Related

llvm no optimization for certain instructions

I am inserting some instructions into llvm IR to do some Integer Overflow(IO) checking. However, I am annoyed to find some of my instructions inserted get optimized by the optimizer.
For example, when there is
int result = data + 1;
My pass would see this operation as a potential overflow site and add(IR of course, I wrote C here to make my life easier):
int64_t result_64bit = (int64_t)data + 1
if (result_64bit > 2147483647) { report(); }
However, this instruction gets optimized out.
Here's what I tried:
add -O0 flag. It won't work. I have dumped IR after every pass and that happened on pass Remove redundant operation. (Not Redundant from my perspective...)
Move the pass to the end of optimization by setting EP_OptimizerLast flag.
Well, this is not optimal as some information gets lost. For example, I heavily rely on nsw/nuw flag to decide signs, some of these flags gets removed after optimizations.
So what I am asking is if there is some kind of instruction guard in LLVM that guards an instruction against been optimized or removed?

Using inline assembly with serialization instructions

We consider that we are using GCC (or GCC-compatible) compiler on a X86_64 architecture, and that eax, ebx, ecx, edx and level are variables (unsigned int or unsigned int*) for input and output of the instruction (like here).
asm("CPUID":::);
asm volatile("CPUID":::);
asm volatile("CPUID":::"memory");
asm volatile("CPUID":"=a"(eax),"=b"(ebx),"=c"(ecx),"=d"(edx)::"memory");
asm volatile("CPUID":"=a"(eax):"0"(level):"memory");
asm volatile("CPUID"::"a"(level):"memory"); // Not sure of this syntax
asm volatile("CPUID":"=a"(eax),"=b"(ebx),"=c"(ecx),"=d"(edx):"0"(level):"memory");
asm("CPUID":"=a"(eax),"=b"(ebx),"=c"(ecx),"=d"(edx):"0"(level):"memory");
asm volatile("CPUID":"=a"(eax),"=b"(ebx),"=c"(ecx),"=d"(edx):"0"(level));
I am not used to the inline assembly syntax, and I am wondering what would be the difference between all these calls, in a context where I just want to use CPUID as a serializing instruction (e.g. nothing will be done with the output of the instruction).
Can some of these calls lead to errors?
Which one(s) of these calls would be the most suited (given that I want the least overhead as possible, but at the same time the "strongest" serialization possible)?
First of all, lfence may be strongly serializing enough for your use-case, e.g. for rdtsc. If you care about performance, check and see if you can find evidence that lfence is strong enough (at least for your use-case). Possibly even using both mfence; lfence might be better than cpuid, if you want to e.g. drain the store buffer before an rdtsc.
But neither lfence nor mfence are serializing on the whole pipeline in the official technical-terminology meaning, which could matter for cross-modifying code - discarding instructions that might have been fetched before some stores from another core became visible.
2. Yes, all the ones that don't tell the compiler that the asm statement writes E[A-D]X are dangerous and will likely cause hard-to-debug weirdness. (i.e. you need to use (dummy) output operands or clobbers).
You need volatile, because you want the asm code to be executed for the side-effect of serialization, not to produce the outputs.
If you don't want to use the CPUID result for anything (e.g. do double duty by serializing and querying something), you should simply list the registers as clobbers, not outputs, so you don't need any C variables to hold the results.
// volatile is already implied because there are no output operands
// but it doesn't hurt to be explicit.
// Serialize and block compile-time reordering of loads/stores across this
asm volatile("CPUID"::: "eax","ebx","ecx","edx", "memory");
// the "eax" clobber covers RAX in x86-64 code, you don't need an #ifdef __i386__
I am wondering what would be the difference between all these calls
First of all, none of these are "calls". They're asm statements, and inline into the function where you use them. CPUID itself is not a "call" either, although I guess you could look at it as calling a microcode function built-in to the CPU. But by that logic, every instruction is a "call", e.g. mul rcx takes inputs in RAX and RCX, and returns in RDX:RAX.
The first three (and the later one with no outputs, just a level input) destroy RAX through RDX without telling the compiler. It will assume that those registers still hold whatever it was keeping in them. They're obviously unusable.
asm("CPUID":"=a"(eax),"=b"(ebx),"=c"(ecx),"=d"(edx):"0"(level):"memory"); (the one without volatile) will optimize away if you don't use any of the outputs. And if you do use them, it can still be hoisted out of loops. A non-volatile asm statement is treated by the optimizer as a pure function with no side effects. https://gcc.gnu.org/onlinedocs/gcc/Extended-Asm.html#index-asm-volatile
It has a memory clobber, but (I think) that doesn't stop it from optimizing away, it just means that if / when / where it does run, any variables it could possibly read / write are synced to memory, so memory contents match what the C abstract machine would have at that point. This may exclude locals that haven't had their address taken, though.
asm("" ::: "memory") is very similar to std::atomic_thread_fence(std::memory_order_seq_cst), but note that that asm statement has no outputs, and thus is implicitly volatile. That's why it isn't optimized away, not because of the "memory" clobber itself. A (volatile) asm statement with a memory clobber is a compiler barrier against reordering loads or stores across it.
The optimizer doesn't care at all what's inside the first string literal, only the constraints / clobbers, so asm volatile("anything" ::: register clobbers, "memory") is also a compile-time-only memory barrier. I assume this is what you want, to serialize some memory operations.
"0"(level) is a matching constraint for the first operand (the "=a"). You could equally have written "a"(level), because in this case the compiler doesn't have a choice of which register to select; the output constraint can only be satisfied by eax. You could also have used "+a"(eax) as the output operand, but then you'd have to set eax=level before the asm statement. Matching constraints instead of read-write operands are sometimes necessary for x87 stack stuff; I think that came up once in an SO question. But other than weird stuff like that, the advantage is being able to use different C variables for input and output, or not using a variable at all for the input. (e.g. a literal constant, or an lvalue (expression)).
Anyway, telling the compiler to provide an input will probably result in an extra instruction, e.g. level=0 would result in an xor-zeroing of eax. This would be a waste of an instruction if it didn't already need a zeroed register for anything earlier. Normally xor-zeroing an input would break a dependency on the previous value, but the whole point of CPUID here is that it's serializing, so it has to wait for all previous instructions to finish executing anyway. Making sure eax is ready early is pointless; if you don't care about the outputs, don't even tell the compiler your asm statement takes an input. Compilers make it difficult or impossible to use an undefined / uninitialized value with no overhead; sometimes leaving a C variable uninitialized will result in loading garbage from the stack, or zeroing a register, instead of just using a register without writing it first.

How to get "phi" instruction in llvm without optimization

When I use the command clang -emit-llvm -S test.c -o test.ll, there is no any "phi" instruction in the IR file. How can I get it?
I know that I can use the pass "-mem2reg" or "-gvn" to get "phi" instruction. But they would do some optimization. I just want to get "phi" without any optimization.
I'm not sure what you mean by "do some optimization" but it seems to me that mem2reg is exactly what you need. Here is how it's described in the documentation:
This file promotes memory references to be register references. It
promotes alloca instructions which only have loads and stores as uses.
An alloca is transformed by using dominator frontiers to place phi
nodes, then traversing the function in depth-first order to rewrite
loads and stores as appropriate. This is just the standard SSA
construction algorithm to construct “pruned” SSA form.
Clang itself does not produce optimized LLVM IR. It produces fairly straightforward IR wherein locals are kept in memory (using allocas). The optimizations are done by opt on LLVM IR level, and one of the most important optimizations is indeed mem2reg which makes sure that locals are represented in LLVM's SSA values instead of memory.

trace register value in llvm

In llvm, can one trace back to the instruction that defines the value for a particular register? For example, if I have an instruction as:
%add14 = add i32 %add7, %add5
Is here a way for me to trace back to the instruction where add5 is defined?
First of all, there are no registers in LLVM IR: all those things with % in their names are just names of values. You don't store information inside those things, they are not variables or memory locations, they are just names. I recommend reading about SSA form, which helps explains this further.
In any case, what you need to do is invoke the getOperand(n) method on the instruction to get its nth operand - for example, getOperand(0) in your example will return the value named %add7. You can then check whether that value is indeed an instruction (as opposed to, say, a function argument) by checking its type (isa<Instruction>).
To emphasize - calling the getOperand method will give you the actual place in which the operand is defined, nothing else is required.

Checking for backedges in an LLVM pass

I am writing an LLVM pass that modifies the intermediate code. I want to check each terminating instruction of a basic block to see if it has a back edge. To make it more clear, in the following example, I want to see if to reach labels land.lhs.true or if.end, a back jump is required.
entry:
%pa = alloca %struct.Vertex, align 4
.........
br i1 %cmp, label %land.lhs.true, label %if.end
Not sure what you mean by back edge or back jump here, as LLVM intermediate code has no explicit layout in memory. You should think of basic blocks within each function has having no explicit order and no explicit assignment to memory addresses. This is handled by the backend when emitting assembly code.