How to dyn_cast in LLVM to identify StoreInst? - c++

I'm trying to identify StoreInst. I read LLVM manual, and tried to use dyn_cast to do that. But the following program returns very weird result.
bool runOnFunction(Function &F) override{
for (const BasicBlock &BB : F){
for (const Instruction &I : BB){
const char *s = I.getOpcodeName();
std::string str(s);
errs()<<"at the instruction of "<<str<<"\n";
if (const StoreInst *SI = dyn_cast<StoreInst>(&I))
errs()<<"FOUND STORE\n";
}
}
return true;
}
The result is as followed. dyn_cast somehow returns true when the instruction is actually a CallInst. Anyone know why this happened? And how can I fix it? btw, I've installed some old versions of LLVM on the same machine, but think I compiled the pass under LLVM-7.0.0, and get the .ll file using clang-7.0.0 by clang-7 -O0 -S -emit-llvm HelloWorld.cpp. Would the previous installed versions affect this version? Thanks!!
at the instruction of call
at the instruction of call
at the instruction of ret
at the instruction of alloca
FOUND STORE
at the instruction of alloca
FOUND STORE
at the instruction of alloca
FOUND STORE
at the instruction of store
at the instruction of store
at the instruction of store
at the instruction of call
at the instruction of call
at the instruction of ret
at the instruction of call
at the instruction of ret

I suspect, since your pass is a FunctionPass, it runs in parallel on multiple functions, so your log messages are printed out of order.

Related

LLVM IR: Get LVALUE operand

I have following instruction:
%ptrA = getelementptr float, float addrspace(1)* %A, i32 %id
I can get the operands %A and %id using getOperand(0) and getOperand(1). I was wondering if getOperand will work on %ptrA? If yes, would it be getOperand(3)?
------------------------------------Edit----------------------------
So I changed my code as follows:
for (Instruction &I : instructions(F)){
if (cast<Operator>(I).getOpcode() == Instruction::GetElementPtr){
Value* AddrPointer = cast<Value>(I);
I keep getting error:
error: cannot convert ‘llvm::Value’ to ‘llvm::Value*’ in initialization
Value* AddrPointer = cast<Value>(I);
^
I see that there is some problem with type mismatch.
Thank you.
Your question lacks quite a bit of context, but I will assume you're working with an llvm::Instruction * representing that particular getelementptr instruction. No, getOperand() will not allow you to access %ptrA. In general, getOperand() only allows access to the instruction's operands, or arguments, but not its return value. In IR, %ptrA is not so much an operand of the instruction like in traditional assembly, but can be thought of more like the return value of the instruction.
The syntax for what you're trying to do is actually very convenient. An llvm::Instruction object itself represents its own return value. In fact, llvm::Instruction is a derived class of llvm::Value. You can use llvm::cast, with llvm::Value as the template argument and the result will actually be an llvm::Value * which represents the return value of getelementptr.
llvm::Instruction * instruc;
//next line assumes instruc has your getelementptr instruction
llvm::Value * returnval = llvm::cast<llvm::Value>(instruc);
//returnval now contains the result of the instruction
//you could potentially create new instructions with IRBuilder using returnval as an argument, and %ptrA would actually be passed as an operand to those instructions
Furthermore, many of the functions that actually create instructions (the llvm::IRBuilder::Create* instructions, for instance) don't even return llvm::Instruction *s but rather llvm::Value *s. This is very convenient, because most of the time if you need to feed the return value of an instruction into another instruction, you can simply pass the return value of whatever Create function you called into the next Create function, without needing to do any casting.

LLVM IR insertion

i am trying to insert a very simple instruction into my basic block by the code
Value *ten=ConstantInt::get(Type::getInt32Ty(con),10,true);
Instruction *newinst=new AllocaInst(Type::getInt32Ty(con),ten,"jst");
b->getInstList().push_back(newinst);
Instruction *add=BinaryOperator::Create(Instruction :: Add,ten,ten,"twenty");
b->getInstList().push_back(add);
it giving stack dump while i am running it on a very small file:
While deleting: i32 %
use still stuck around after Def is Destroyed : %twenty = add i32 10, 10
I'm rather new to LLVM, so I'll take any advice if this code makes no sense.
LLVM instruction constructors and Create factories accept either an Instruction to insert after, or a BasicBlock to insert at the end of. Do not use getInstList to do this.
Here are samples for AllocaInst:
AllocaInst (Type *Ty, Value *ArraySize=nullptr,
const Twine &Name="", Instruction *InsertBefore=nullptr)
AllocaInst (Type *Ty, Value *ArraySize,
const Twine &Name, BasicBlock *InsertAtEnd)

Unconventional Calls with Inline ASM

I'm working with a proprietary MCU that has a built-in library in metal (mask ROM). The compiler I'm using is clang, which uses GCC-like inline ASM. The issue I'm running into, is calling the library since the library does not have a consistent calling convention. While I found a solution, I've found that in some cases the compiler will make optimizations that clobber registers immediately before the call, I think there is just something wrong with how I'm doing things. Here is the code I'm using:
int EchoByte()
{
register int asmHex __asm__ ("R1") = Hex;
asm volatile("//Assert Input to R1 for MASKROM_EchoByte"
:
:"r"(asmHex)
:"%R1");
((volatile void (*)(void))(MASKROM_EchoByte))(); //MASKROM_EchoByte is a 16-bit integer with the memory location of the function
}
Now this has the obvious problem that while the variable "asmHex" is asserted to register R1, the actual call does not use it and therefore the compiler "doesn't know" that R1 is reserved at the time of the call. I used the following code to eliminate this case:
int EchoByte()
{
register int asmHex __asm__ ("R1") = Hex;
asm volatile("//Assert Input to R1 for MASKROM_EchoByte"
:
:"r"(asmHex)
:"%R1");
((volatile void (*)(void))(MASKROM_EchoByte))();
asm volatile("//Assert Input to R1 for MASKROM_EchoByte"
:
:"r"(asmHex)
:"%R1");
}
This seems really ugly to me, and like there should be a better way. Also I'm worried that the compiler may do some nonsense in between, since the call itself has no indication that it needs the asmHex variable. Unfortunately, ((volatile void (*)(int))(MASKROM_EchoByte))(asmHex) does not work as it will follow the C-convention, which puts arguments into R2+ (R1 is reserved for scratching)
Note that changing the Mask ROM library is unfortunately impossible, and there are too many frequently used routines to recreate them all in C/C++.
Cheers, and thanks.
EDIT: I should note that while I could call the function in the ASM block, the compiler has an optimization for functions that are call-less, and by calling in assembly it looks like there's no call. I could go this route if there is some way of indicating that the inline ASM contains a function call, but otherwise the return address will likely get clobbered. I haven't been able to find a way to do this in any case.
Per the comments above:
The most conventional answer is that you should implement a stub function in assembly (in a .s file) that simply performs the wacky call for you. In ARM, this would look something like
// void EchoByte(int hex);
_EchoByte:
push {lr}
mov r1, r0 // move our first parameter into r1
bl _MASKROM_EchoByte
pop pc
Implement one of these stubs per mask-ROM routine, and you're done.
What's that? You have 500 mask-ROM routines and don't want to cut-and-paste so much code? Then add a level of indirection:
// typedef void MASKROM_Routine(int r1, ...);
// void GeneralPurposeStub(MASKROM_Routine *f, int arg, ...);
_GeneralPurposeStub:
bx r0
Call this stub by using the syntax GeneralPurposeStub(&MASKROM_EchoByte, hex). It'll work for any mask-ROM entry point that expects a parameter in r1. Any really wacky entry points will still need their own hand-coded assembly stubs.
But if you really, really, really must do this via inline assembly in a C function, then (as #JasonD pointed out) all you need to do is add the link register lr to the clobber list.
void EchoByte(int hex)
{
register int r1 asm("r1") = hex;
asm volatile(
"bl _MASKROM_EchoByte"
:
: "r"(r1)
: "r1", "lr" // Compare the codegen with and without this "lr"!
);
}

Direct C function call using GCC's inline assembly

If you want to call a C/C++ function from inline assembly, you can do something like this:
void callee() {}
void caller()
{
asm("call *%0" : : "r"(callee));
}
GCC will then emit code which looks like this:
movl $callee, %eax
call *%eax
This can be problematic since the indirect call will destroy the pipeline on older CPUs.
Since the address of callee is eventually a constant, one can imagine that it would be possible to use the i constraint. Quoting from the GCC online docs:
`i'
An immediate integer operand (one with constant value) is allowed. This
includes symbolic constants whose
values will be known only at assembly
time or later.
If I try to use it like this:
asm("call %0" : : "i"(callee));
I get the following error from the assembler:
Error: suffix or operands invalid for `call'
This is because GCC emits the code
call $callee
Instead of
call callee
So my question is whether it is possible to make GCC output the correct call.
I got the answer from GCC's mailing list:
asm("call %P0" : : "i"(callee)); // FIXME: missing clobbers
Now I just need to find out what %P0 actually means because it seems to be an undocumented feature...
Edit: After looking at the GCC source code, it's not exactly clear what the code P in front of a constraint means. But, among other things, it prevents GCC from putting a $ in front of constant values. Which is exactly what I need in this case.
For this to be safe, you need to tell the compiler about all registers that the function call might modify, e.g. : "eax", "ecx", "edx", "xmm0", "xmm1", ..., "st(0)", "st(1)", ....
See Calling printf in extended inline ASM for a full x86-64 example of correctly and safely making a function call from inline asm.
Maybe I am missing something here, but
extern "C" void callee(void)
{
}
void caller(void)
{
asm("call callee\n");
}
should work fine. You need extern "C" so that the name won't be decorated based on C++ naming mangling rules.
If you're generating 32-bit code (e.g. -m32 gcc option), the following asm inline emits a direct call:
asm ("call %0" :: "m" (callee));
The trick is string literal concatenation. Before GCC starts trying to get any real meaning from your code it will concatenate adjacent string literals, so even though assembly strings aren't the same as other strings you use in your program they should be concatenated if you do:
#define ASM_CALL(X) asm("\t call " X "\n")
int main(void) {
ASM_CALL( "my_function" );
return 0;
}
Since you are using GCC you could also do
#define ASM_CALL(X) asm("\t call " #X "\n")
int main(void) {
ASM_CALL(my_function);
return 0;
}
If you don't already know you should be aware that calling things from inline assembly is very tricky. When the compiler generates its own calls to other functions it includes code to set up and restore things before and after the call. It doesn't know that it should be doing any of this for your call, though. You will have to either include that yourself (very tricky to get right and may break with a compiler upgrade or compilation flags) or ensure that your function is written in such a way that it does not appear to have changed any registers or condition of the stack (or variable on it).
edit this will only work for C function names -- not C++ as they are mangled.

Using bts assembly instruction with gcc compiler

I want to use the bts and bt x86 assembly instructions to speed up bit operations in my C++ code on the Mac. On Windows, the _bittestandset and _bittest intrinsics work well, and provide significant performance gains. On the Mac, the gcc compiler doesn't seem to support those, so I'm trying to do it directly in assembler instead.
Here's my C++ code (note that 'bit' can be >= 32):
typedef unsigned long LongWord;
#define DivLongWord(w) ((unsigned)w >> 5)
#define ModLongWord(w) ((unsigned)w & (32-1))
inline void SetBit(LongWord array[], const int bit)
{
array[DivLongWord(bit)] |= 1 << ModLongWord(bit);
}
inline bool TestBit(const LongWord array[], const int bit)
{
return (array[DivLongWord(bit)] & (1 << ModLongWord(bit))) != 0;
}
The following assembler code works, but is not optimal, as the compiler can't optimize register allocation:
inline void SetBit(LongWord* array, const int bit)
{
__asm {
mov eax, bit
mov ecx, array
bts [ecx], eax
}
}
Question: How do I get the compiler to fully optimize around the bts instruction? And how do I replace TestBit by a bt instruction?
BTS (and the other BT* insns) with a memory destination are slow. (>10 uops on Intel). You'll probably get faster code from doing the address math to find the right byte, and loading it into a register. Then you can do the BT / BTS with a register destination and store the result.
Or maybe shift a 1 to the right position and use OR with with a memory destination for SetBit, or AND with a memory source for TestBit. Of course, if you avoid inline asm, the compiler can inline TestBit and use TEST instead of AND, which is useful on some CPUs (since it can macro-fuse into a test-and-branch on more CPUs than AND).
This is in fact what gcc 5.2 generates from your C source (memory-dest OR or TEST). Looks optimal to me (fewer uops than a memory-dest bt). Actually, note that your code is broken because it assumes unsigned long is 32 bits, not CHAR_BIT * sizeof(unsigned_long). Using uint32_t, or char, would be a much better plan. Note the sign-extension of eax into rax with the cqde instruction, due to the badly-written C which uses 1 instead of 1UL.
Also note that inline asm can't return the flags as a result (except with a new-in-gcc v6 extension!), so using inline asm for TestBit would probably result in terrible code code like:
... ; inline asm
bt reg, reg
setc al ; end of inline asm
test al, al ; compiler-generated
jz bit_was_zero
Modern compilers can and do use BT when appropriate (with a register destination). End result: your C probably compiles to faster code than what you're suggesting doing with inline asm. It would be even faster after being bugfixed to be correct and 64bit-clean. If you were optimizing for code size, and willing to pay a significant speed penalty, forcing use of bts could work, but bt probably still won't work well (because the result goes into the flags).
inline void SetBit(*array, bit) {
asm("bts %1,%0" : "+m" (*array) : "r" (bit));
}
This version efficiently returns the carry flag (via the gcc-v6 extension mentioned by Peter in the top answer) for a subsequent test instruction. It only supports a register operand since use of a memory operand is very slow as he said:
int variable_test_and_set_bit64(unsigned long long &n, const unsigned long long bit) {
int oldbit;
asm("bts %2,%0"
: "+r" (n), "=#ccc" (oldbit)
: "r" (bit));
return oldbit;
}
Use in code is then like so. The wasSet variable is optimized away and the produced assembly will have bts followed immediately by jb instruction checking the carry flag.
unsigned long long flags = *(memoryaddress);
unsigned long long bitToTest = someOtherVariable;
int wasSet = variable_test_and_set_bit64(flags, bitToTest);
if(!wasSet) {
*(memoryaddress) = flags;
}
Although it seems a bit contrived, this does save me several instructions vs the "1ULL << bitToTest" version.
Another slightly indirect answer, GCC exposes a number of atomic operations starting with version 4.1.