Write or Read Instructions in LLVM - llvm

I just wanted to make sure I understand getOperand() right. It seems like getOperand() returns operands in reverse order:
so if I have:
%1 = mul nsw i32 7, 2 ; The C source code is: a = 7; b = a*2
ret i32 %1 ; The C source code is: return a;
Correct me if I'm wrong:
In the first instruction, getOperand(0) gives me 'i32' (what is being read) and getOperand(1) 'nsw' (what is being written to).
In the second instruction, the only operand is i32 which is being read.
So I guess my question is, if the instruction is writing to something, is it the last operand?

The mul instruction is multiplication, so no, its operands do not correspond to those C expressions. You see this instruction instead of allocas and stores because Clang figured out your code is a constant expression and propagated it. And AFAIK there is nothing you can do to stop it: Clang performs constant propagation even with -O0.
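For that specific instruction, the operand breakdown looks like this (a sketch; note that the keywords in the instruction are not operands at all):

```llvm
%1 = mul nsw i32 7, 2
;        ^   ^   ^  ^
;        |   |   |  '-- getOperand(1): the constant 2
;        |   |   '----- getOperand(0): the constant 7
;        |   '--------- the result type, not an operand
;        '------------- the "no signed wrap" flag, not an operand
```

So getOperand() never hands you nsw or i32; it only returns the values the instruction computes with, in source order, and the result %1 is represented by the Instruction object itself.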

Related

Is it possible to tell the compiler that an object reachable through a pointer has changed? [duplicate]

Consider the following small function:
void foo(int* iptr) {
iptr[10] = 1;
__asm__ volatile ("nop"::"r"(iptr):);
iptr[10] = 2;
}
Using gcc, this compiles to:
foo:
nop
mov DWORD PTR [rdi+40], 2
ret
Note in particular, that the first write to iptr, iptr[10] = 1 doesn't occur at all: the inline asm nop is the first thing in the function, and only the final write of 2 appears (after the ASM call). Apparently the compiler decides that it only needs to provide an up-to-date version of the value of iptr itself, but not the memory it points to.
I can tell the compiler that memory must be up to date with a memory clobber, like so:
void foo(int* iptr) {
iptr[10] = 1;
__asm__ volatile ("nop"::"r"(iptr):"memory");
iptr[10] = 2;
}
which results in the expected code:
foo:
mov DWORD PTR [rdi+40], 1
nop
mov DWORD PTR [rdi+40], 2
ret
However, this is too strong a condition, since it tells the compiler that all memory has to be written. For example, in the following function:
void foo2(int* iptr, long* lptr) {
iptr[10] = 1;
lptr[20] = 100;
__asm__ volatile ("nop"::"r"(iptr):);
iptr[10] = 2;
lptr[20] = 200;
}
The desired behavior is to let the compiler optimize away the first write to lptr[20], but not the first write to iptr[10]. The "memory" clobber cannot achieve this because it means both writes have to occur:
foo2:
mov DWORD PTR [rdi+40], 1
mov QWORD PTR [rsi+160], 100 ; lptr[20] written unnecessarily
nop
mov DWORD PTR [rdi+40], 2
mov QWORD PTR [rsi+160], 200
ret
Is there some way to tell compilers accepting gcc extended asm syntax that the input to the asm includes the pointer and anything it can point to?
That's correct; asking for a pointer as input to inline asm does not imply that the pointed-to memory is also an input or output or both. With a register input and register output, for all gcc knows your asm just aligns a pointer by masking off the low bits, or adds a constant to it. (In which case you would want it to optimize away a dead store.)
The simple option is asm volatile and a "memory" clobber (see footnote 1).
The narrower, more specific way you're asking for is to use a "dummy" memory operand as well as the pointer in a register. Your asm template doesn't reference this operand (except maybe inside an asm comment to see what the compiler picked). It tells the compiler which memory you actually read, write, or read+write.
Dummy memory input: "m" (*(const int (*)[]) iptr)
or output: "=m" (*(int (*)[]) iptr). Or of course "+m" with the same syntax.
That syntax is casting to a pointer-to-array and dereferencing, so the actual input is a C array. (If you actually have an array, not a pointer, you don't need any casting and can just ask for it as a memory operand.)
If you leave the size unspecified with [], that tells GCC that any memory accessed relative to that pointer is an input, output, or in/out operand. If you use [10] or [some_variable], that tells the compiler the specific size. With runtime-variable sizes, gcc in practice misses the optimization that iptr[size+1] is not part of the input.
GCC documents this and therefore supports it. I think it's not a strict-aliasing violation if the array element type is the same type the pointer points to, or maybe if it's char.
(from the GCC manual)
An x86 example where the string memory argument is of unknown length.
asm("repne scasb"
: "=c" (count), "+D" (p)
: "m" (*(const char (*)[]) p), "0" (-1), "a" (0));
If you can avoid using an early-clobber on the pointer input operand, the dummy memory input operand will typically pick a simple addressing mode using that same register.
But if you do use an early-clobber for strict correctness of an asm loop, sometimes a dummy operand will make gcc waste instructions (and an extra register) on a base address for the memory operand. Check the asm output of the compiler.
Background:
This is a widespread bug in inline-asm examples, and it often goes undetected because the asm is wrapped in a function that doesn't inline into any callers that would tempt the compiler into reordering stores, merging them, or doing dead-store elimination.
GNU C inline asm syntax is designed around describing a single instruction to the compiler. The intent is that you tell the compiler about a memory input or memory output with a "m" or "=m" operand constraint, and it picks the addressing mode.
Writing whole loops in inline asm requires care to make sure the compiler really knows what's going on (or asm volatile plus a "memory" clobber), otherwise you risk breakage when changing the surrounding code, or enabling link-time optimization that allows for cross-file inlining.
See also Looping over arrays with inline assembly for using an asm statement as the loop body, still doing the loop logic in C. With actual (non-dummy) "m" and "=m" operands, the compiler can unroll the loop by using displacements in the addressing modes it chooses.
Footnote 1: A "memory" clobber gets the compiler to treat the asm like a non-inline function call (that could read or write any memory except for locals that escape analysis has proved have not escaped). The escape analysis includes input operands to the asm statement itself, but also any global or static variables that any earlier call could have stored pointers into. So usually local loop counters don't have to be spilled/reloaded around an asm statement with a "memory" clobber.
asm volatile is necessary to make sure the asm isn't optimized away even if its output operands are unused (because you require the undeclared side-effect of writing memory to happen).
Or, for memory that is only read by the asm, you need the asm to run again if the same input buffer contains different input data. Without volatile, the asm statement could be CSEd out of a loop. (A "memory" clobber does not make the optimizer treat all memory as an input when considering whether the asm statement even needs to run.)
asm with no output operands is implicitly volatile, but it's a good idea to make it explicit. (The GCC manual has a section on asm volatile).
e.g. asm("... sum an array ..." : "=r"(sum) : "r"(pointer), "r"(end_pointer) : "memory") has an output operand so is not implicitly volatile. If you used it like
arr[5] = 1;
total += asm_sum(arr, len);
memcpy(arr, foo, len);
total += asm_sum(arr, len);
Without volatile, the 2nd asm_sum could be optimized away, on the assumption that the same asm with the same input operands (pointer and length) will produce the same output. You need volatile for any asm that's not a pure function of its explicit input operands. If it isn't optimized away, the "memory" clobber will have the desired effect of requiring memory to be in sync.

RISC-V inline assembly struct optimized away [duplicate]


LLVM IR: Get LVALUE operand

I have following instruction:
%ptrA = getelementptr float, float addrspace(1)* %A, i32 %id
I can get the operands %A and %id using getOperand(0) and getOperand(1). I was wondering if getOperand will work on %ptrA? If yes, would it be getOperand(3)?
------------------------------------Edit----------------------------
So I changed my code as follows:
for (Instruction &I : instructions(F)){
if (cast<Operator>(I).getOpcode() == Instruction::GetElementPtr){
Value* AddrPointer = cast<Value>(I);
I keep getting error:
error: cannot convert ‘llvm::Value’ to ‘llvm::Value*’ in initialization
Value* AddrPointer = cast<Value>(I);
^
I see that there is some problem with type mismatch.
Thank you.
Your question lacks quite a bit of context, but I will assume you're working with an llvm::Instruction * representing that particular getelementptr instruction. No, getOperand() will not allow you to access %ptrA. In general, getOperand() only allows access to the instruction's operands, or arguments, but not its return value. In IR, %ptrA is not so much an operand of the instruction like in traditional assembly, but can be thought of more like the return value of the instruction.
The syntax for what you're trying to do is actually very convenient. An llvm::Instruction object itself represents its own return value. In fact, llvm::Instruction is a derived class of llvm::Value. You can use llvm::cast, with llvm::Value as the template argument and the result will actually be an llvm::Value * which represents the return value of getelementptr.
llvm::Instruction * instruc;
//next line assumes instruc has your getelementptr instruction
llvm::Value * returnval = llvm::cast<llvm::Value>(instruc);
//returnval now contains the result of the instruction
//note: cast<> on a reference yields a reference, which is what your error is
//about; with `Instruction &I` from instructions(F), pass its address instead:
//  llvm::Value * AddrPointer = llvm::cast<llvm::Value>(&I);
//you could potentially create new instructions with IRBuilder using returnval
//as an argument, and %ptrA would actually be passed as an operand to those instructions
Furthermore, many of the functions that actually create instructions (the llvm::IRBuilder::Create* functions, for instance) don't even return llvm::Instruction *s but rather llvm::Value *s. This is very convenient, because most of the time, if you need to feed the return value of an instruction into another instruction, you can simply pass the return value of whatever Create function you called into the next Create function, without needing to do any casting.

Optimization switches - what do they really do? [closed]

Closed 10 years ago.
Probably everyone uses some kind of optimization switches (in case of gcc, the most common one is -O2 I believe).
But what does gcc (and other compilers like VS, Clang) really do in presence of such options?
Of course there is no definite answer, since it depends very much on platform, compiler version, etc.
However, if possible, I would like to collect a set of "rules of thumb".
When should I think about some tricks to speed-up the code and when should I just leave the job to the compiler?
For example, how far will the compiler go in the following (slightly artificial) cases, at different optimization levels:
1) sin(3.141592)
// will it be evaluated at compile time or should I think of a look-up table to speed-up the calculations?
2) int a = 0; a = exp(18), cos(1.57), 2;
// will the compiler evaluate exp and cos, although not needed, as the value of the expression is equal 2?
3)
for (size_t i = 0; i < 10; ++i) {
int a = 10 + i;
}
// will the compiler skip the whole loop as it has no visible side-effects?
Maybe you can think of other examples.
If you want to know what a compiler does, your best bet is to have a look at the compiler documentation. For optimizations, you may look at LLVM's Analysis and Transform Passes, for example.
1) sin(3.141592) // will it be evaluated at compile time ?
Probably. There are very precise semantics for IEEE float computations. This might be surprising if you change the processor flags at runtime, by the way.
2) int a = 0; a = exp(18), cos(1.57), 2;
It depends:
whether the functions exp and cos are inline or not
if they are not, whether they are correctly annotated (so the compiler knows they have no side-effects)
For functions taken from your C or C++ Standard library, they should be correctly recognized/annotated.
As for the elimination of the computation:
-adce: Aggressive Dead Code Elimination
-dce: Dead Code Elimination
-die: Dead Instruction Elimination
-dse: Dead Store Elimination
compilers love finding code that is useless :)
3)
Similar to 2) actually. The result of the store is not used and the expression has no side-effect.
-loop-deletion: Delete dead loops
And finally: why not put the compiler to the test?
#include <math.h>
#include <stdio.h>
int main(int argc, char* argv[]) {
double d = sin(3.141592);
printf("%f", d);
int a = 0; a = (exp(18), cos(1.57), 2); /* need parentheses here */
printf("%d", a);
for (size_t i = 0; i < 10; ++i) {
int a = 10 + i;
}
return 0;
}
Clang already tries to be helpful during compilation:
12814_0.c:8:28: warning: expression result unused [-Wunused-value]
int a = 0; a = (exp(18), cos(1.57), 2);
^~~ ~~~~
12814_0.c:12:9: warning: unused variable 'a' [-Wunused-variable]
int a = 10 + i;
^
And the emitted code (LLVM IR):
@.str = private unnamed_addr constant [3 x i8] c"%f\00", align 1
@.str1 = private unnamed_addr constant [3 x i8] c"%d\00", align 1
define i32 @main(i32 %argc, i8** nocapture %argv) nounwind uwtable {
%1 = tail call i32 (i8*, ...)* @printf(i8* getelementptr inbounds ([3 x i8]* @.str, i64 0, i64 0), double 0x3EA5EE4B2791A46F) nounwind
%2 = tail call i32 (i8*, ...)* @printf(i8* getelementptr inbounds ([3 x i8]* @.str1, i64 0, i64 0), i32 2) nounwind
ret i32 0
}
We remark that:
as predicted the sin computation has been resolved at compile-time
as predicted the exp and cos have been stripped completely.
as predicted the loop has been stripped too.
If you want to delve deeper into compiler optimizations I would encourage you to:
learn to read IR (it's incredibly easy, really, much more so than assembly)
use the LLVM Try Out page to test your assumptions
The compiler has a number of optimization passes. Every optimization pass is responsible for a number of small optimizations. For example, you may have a pass that calculates arithmetic expressions at compile time (so that you can express 5MB as 5 * (1024*1024) without a penalty, for example). Another pass inlines functions. Another searches for unreachable code and kills it. And so on.
The developers of the compiler then decide which of these passes they want to execute in which order. For example, suppose you have this code:
int foo(int a, int b) {
return a + b;
}
void bar() {
if (foo(1, 2) > 5)
std::cout << "foo is large\n";
}
If you run dead-code elimination on this, nothing happens. Similarly, if you run expression reduction, nothing happens. But the inliner might decide that foo is small enough to be inlined, so it substitutes the call in bar with the function body, replacing arguments:
void bar() {
if (1 + 2 > 5)
std::cout << "foo is large\n";
}
If you run expression reduction now, it will first decide that 1 + 2 is 3, and then decide that 3 > 5 is false. So you get:
void bar() {
if (false)
std::cout << "foo is large\n";
}
And now the dead-code elimination will see an if(false) and kill it, so the result is:
void bar() {
}
But now bar is suddenly very tiny, when it was larger and more complicated before. So if you run the inliner again, it would be able to inline bar into its callers. That may expose yet more optimization opportunities, and so on.
For compiler developers, this is a trade-off between compile time and generated code quality. They decide on a sequence of optimizers to run, based on heuristics, testing, and experience. But since one size does not fit all, they expose some knobs to tweak this. The primary knob for gcc and clang is the -O option family. -O1 runs a short list of optimizers; -O3 runs a much longer list containing more expensive optimizers, and repeats passes more often.
Aside from deciding which optimizers run, the options may also tweak internal heuristics used by the various passes. The inliner, for example, usually has lots of parameters that decide when it's worth inlining a function. Pass -O3, and those parameters will lean more towards inlining functions whenever there is a chance of improved performance; pass -Os, and the parameters will cause only really tiny functions (or functions provably called exactly once) to be inlined, as anything else would increase executable size.
Compilers do all sorts of optimizations that you cannot even imagine, especially C++ compilers.
They do things like unrolling loops, inlining functions, eliminating dead code, replacing multiple instructions with just one, and so on.
A piece of advice I can give is: you can trust C/C++ compilers to perform a lot of optimizations.
Take a look at [1].
[1] http://en.wikipedia.org/wiki/Compiler_optimization

Would "if ... ASSERT" be removed in release build?

Sometimes I write code like
if (ptr)
ASSERT(ptr->member);
instead of
ASSERT(!ptr || ptr->member);
because it's more straightforward IMO. Would the redundant comparison remain in the release build?
I'd say that depends on your compiler.
In release mode, the ASSERT macro won't evaluate ptr->member and will resolve to a trivial expression that the compiler will optimize out, but the if statement and the associated comparison will remain as is.
However, if the compiler is smart enough to determine that the condition has no side effects, it might optimize the entire if statement away. Compiling to assembly (using the /FA option in MSVC) would give you a definite answer.
As long as the compiler is not stupid, yes, it will be trimmed.
Try writing this in the compiler:
if (x);
It gives you a warning that the statement has no effect, and as I said, if the compiler is not stupid, it will remove the code.
If you want to be sure, you could compile it with your compiler and see the assembly.
LLVM removes it when optimization is requested (by the user):
int main(int argc, char **argv) {
if (argc) {}
return 0;
}
Becomes:
define i32 @main(i32 %argc, i8** nocapture %argv) nounwind readnone {
ret i32 0
}