LLVM IR Loading type name mangling - c++

I'm having a hard time fixing this problem with loading LLVM-IR modules.
First of all, here is the problem.
I have an LLVM-IR file containing this instruction:
%7 = getelementptr inbounds %struct.hoge, %struct.hoge* %6, i32 0, i32 0, !dbg !34
Now, when I load it using my tool, it turns to this:
%7 = getelementptr inbounds %struct.hoge.0, %struct.hoge.0* %6, i32 0, i32 0, !dbg !34
My goal is to get rid of this .0 at the end of all struct types.
Here is some code,
Module loading:
Module *FPR_::load(const path &input) const {
// Construct module
SMDiagnostic err;
unique_ptr<Module> module{parseIRFile(input.string(), err, ctx, true, "")};
...
Module printing:
ofstream ofs("hoge.ll");
string str;
raw_string_ostream rso(str);
module->print(rso, NULL, true, true);
ofs << str;
Now, let me tell you what I've done so far.
I've debugged my instrumentation tool and found out that this .0 gets added when the module is LOADED, not when it is PRINTED. I know this because my tool outputs some instructions for debugging purposes, and they already have the .0 suffix right after loading.
I've tried to load and print the same module multiple times and this is the result.
%7 = getelementptr inbounds %struct.hoge.0.0.0.0.0.0, %struct.hoge.0.0.0.0.0.0* %6, i32 0, i32 0, !dbg !34
Which would be funny if it weren't stopping me from developing my tool.
Verified that this does not happen with the opt tool.
[user@host]opt hoge.ll -o foo.ll
[user@host]diff hoge.ll foo.ll
[user@host]
Which means there must be something I'm missing that opt does.
Read opt's source code.
Some of the code looked relevant, and here is what I've ended up with.
LLVMContext ctx;
ctx.setDiscardValueNames(false);
ctx.enableDebugTypeODRUniquing();
FPR_::FPR_ FPR_(path(src_ll.getValue()), path(dep_ll.getValue()), ctx, options); // Module gets loaded in this constructor.
These context options are the only thing I found that opt seems to do differently. But no results so far.
Frankly, I'm out of ideas on how to fix this. So any help would be greatly appreciated.
And as a side note, I know that it is recommended to use the opt tool itself for static analysis or instrumentation.
But the problem is, the tool I am trying to make uses the static analysis for dynamic analysis as well. So I do not want my program to end when it outputs the instrumented LLVM-IR.
I could do some dirty hack and not return from the runOnModule method and just print the Module from there. But I do want to avoid that if possible.
Thank you in advance!

Short Answer:
Do not use the same LLVMContext to read modules if they are NOT supposed to be in the same program.
Long Answer:
By reading multiple Modules with the same LLVMContext, you imply that those Modules are supposed to be within the same program.
So when multiple Modules loaded into the same context define a struct type with the same name, the later ones get a numeric suffix (such as .0) appended in order to avoid a conflict.
Thank you for all the help in the comments. :)
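To make the renaming rule above concrete, here is a minimal toy model in plain C++. It does not use the LLVM API: ToyContext, registerStruct and loadModuleDefining are hypothetical names invented for this sketch, and the suffix scheme is a simplification of what LLVM does. It only illustrates why a shared context yields %struct.hoge.0 while separate contexts keep the original name.

```cpp
#include <cassert>
#include <string>
#include <unordered_set>

// Toy stand-in for a context that owns all named struct types.
// NOT the LLVM API; it only models "rename on name collision".
class ToyContext {
public:
    // Registers a struct type name and returns the name actually used.
    // If the name is already taken in this context, a ".<n>" suffix is
    // appended until the result is unique.
    std::string registerStruct(const std::string &name) {
        std::string candidate = name;
        while (!names_.insert(candidate).second) {
            candidate = name + "." + std::to_string(nextSuffix_++);
        }
        return candidate;
    }

private:
    std::unordered_set<std::string> names_; // names already in use
    unsigned nextSuffix_ = 0;               // counter for collision suffixes
};

// Simulates loading a module that defines a single struct type.
std::string loadModuleDefining(ToyContext &ctx, const std::string &structName) {
    return ctx.registerStruct(structName);
}
```

Loading two modules that both define struct.hoge into one ToyContext returns struct.hoge for the first and struct.hoge.0 for the second; giving each module its own ToyContext returns struct.hoge both times. In the real tool, the equivalent fix is to construct one LLVMContext per unrelated module and pass each to its own parseIRFile call.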

Related

LLVM, how to get variable name?

I've read many posts here and on other sites, but I still haven't found a clear answer.
Let's say I have an instruction and I can print it out with
errs() << inst << "\n";
and I got:
%9 = add nsw i32 %7, %8
I want to get the variable names behind %9, %7 and %8.
Is it possible? And if it is, how?
Thank you :)
From the question "How to save the variable name when use clang to generate llvm ir?":
clang -fno-discard-value-names <your-command-line>
does the job :)

LLVM traverse the IR and look for all calls to @llvm.dbg.declare

I am writing an LLVM pass that keeps track of the number of declared variables in an OpenCL kernel. To do so, I need to enable debugging information and access the @llvm.dbg.declare info.
I iterate through all the instructions of a function and I use the isa<CallInst> template to identify the calling instructions.
Now there are two cases here, I can have either
call void @llvm.dbg.declare(metadata float addrspace(1)** %4, metadata !20, metadata !DIExpression()), !dbg !21
or
%6 = call i32 @get_global_id(i32 0), !dbg !25
How can I check that a CallInst is a call to @llvm.dbg.declare, and how do I then extract the name of the declared variable (I suspect via a getOperand() method)?

Tracing global variables use in llvm

I am new at LLVM and trying to write a custom analysis pass. Below is a llvm-ir snippet from my module.
@my_string = common global i8* null, align 8
%tmp1 = load i8*, i8** @my_string, align 8
call void @copy_string(i8* %tmp1, i8* %tmp2, i8* %tmp3)
I want to check whether one of the arguments passed to a function is a global variable. For instance, in the example above, I want to check whether %tmp1 comes from a global variable.
Could you suggest the best way to achieve this?
Thanks in advance.

Using global variables in JIT yields garbage result

I'm using the llvm-c API and want to use the JIT. I've created the following module
; ModuleID = '_tmp'
@a = global i64 5
define i64 @__tempfunc() {
entry:
%a_val = load i64* @a
ret i64 %a_val
}
This output is generated by LLVMDumpModule just before I call LLVMRunFunction, which yields an LLVMGenericValueRef. However, converting the result to a 64-bit integer via LLVMGenericValueToInt(gv, true) gives 360287970189639680 or something similar, not 5. Converting via LLVMGenericValueToInt(gv, false) didn't help either.
How can I use global variables in a JIT situation? Is anything wrong with the IR?
Edit: Well, I figured out that it has to do with the data layout, since 360287970189639680 is actually 0x50...0. So I'd like to change the question to: "How do I set the correct data layout for a module?" I've tried LLVMSetDataLayout(mod, "x86_64-pc-linux"), which aborts my program.
The data layout format is described in http://llvm.org/docs/LangRef.html#data-layout, and it is certainly not a target triple. The simplest approach is to feed a dummy .c file to clang for your target, compile it with -S -emit-llvm, and copy the full data layout string from the output.

LLVM get operand and lvalue name of an instruction

For an LLVM IR instruction like %cmp7 = icmp eq i32 %6, %7, I want to get all three register/symbol names (i.e. %cmp7, %6 and %7).
I can get the name of %cmp7 by calling pi->getName(), where pi is an Instruction pointer. But when I try to get the operand names by calling pi->getOperand(0)->getName(), I get an empty string.
I tried isa<Instruction>(pi->getOperand(0)) to check whether the operand is an instruction, and it returned true, but pi->getOperand(0)->hasName() returned false. What puzzles me is why both pi and pi->getOperand(0) are instructions but only pi has a name.
Is there any way to get the operand names (the strings %6 and %7 here) through the API?
LLVM version I'm using is 3.4.2
Names are optional for LLVM instructions, and indeed the two operands of your icmp instruction don't have a name in this case, hence the empty string.
When you print an LLVM module to an .ll file, the writer assigns a %<num> label to each unnamed instruction to make the output human-readable. This happens only during printing; that string does not exist in the actual module.
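As a rough illustration of that last point, here is a toy model in plain C++ of labels being computed at print time rather than stored. It does not use the LLVM API: ToyValue and printedLabels are hypothetical names for this sketch, and the numbering is simplified (the real writer also numbers unnamed arguments and basic blocks).

```cpp
#include <cassert>
#include <string>
#include <vector>

// Toy value: the stored name may be empty, as with unnamed instructions.
struct ToyValue {
    std::string name; // empty means "unnamed"
};

// Computes the labels a printer would show, without modifying the values.
// Unnamed values receive sequential numbers in order of appearance.
std::vector<std::string> printedLabels(const std::vector<ToyValue> &values) {
    std::vector<std::string> labels;
    unsigned nextSlot = 0; // counter that exists only while printing
    for (const ToyValue &v : values) {
        if (v.name.empty())
            labels.push_back("%" + std::to_string(nextSlot++));
        else
            labels.push_back("%" + v.name);
    }
    return labels;
}
```

For three values where only the last one is named cmp7, printedLabels returns %0, %1 and %cmp7, while the stored names of the first two remain empty. In LLVM itself, Value::printAsOperand writes the label the printer would use into a raw_ostream, which is one way to obtain a string such as %6 for an unnamed operand.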