How to write a custom intermodular pass in LLVM? - llvm

I've written a standard Analysis pass in LLVM by extending the FunctionPass class. Everything seems to make sense.
Now what I'd like to do is write a couple of intermodular passes, that is, passes that allow me to analyze more than one module at a time. One such pass would construct a call graph of the entire application; the other is for an optimization idea involving function calls and their parameters.
I know about interprocedural passes in LLVM, via extending the ModulePass class, but that only allows analysis within a single module.
I know about link time optimization (LTO) in LLVM, but (a) I'm not quite clear if this is what I want and (b) I've found no examples or documentation on how to actually write an LTO pass.
How can I write an intermodular pass, i.e., a pass that has access to all the modules in an application, in LLVM?

I found one way to achieve my goal: write a simple program that uses llvm::parseBitcodeFile() to read in a bitcode file and create a Module object that can be traversed and analyzed. It's not ideal, because it's not a Pass that runs within the LLVM pass framework, but it does let me analyze multiple modules at once.
For future readers, here's what I did.
Create a simple tool to read in a bitcode file and produce a Module
// ReadBitcode.cpp
#include <iostream>
#include "llvm/IR/Module.h"
#include "llvm/Support/MemoryBuffer.h"
#include "llvm/Support/SourceMgr.h"
#include "llvm/IR/LLVMContext.h"
#include "llvm/Bitcode/ReaderWriter.h"
using namespace llvm;

int main(int argc, char *argv[])
{
    if (argc != 2)
    {
        std::cerr << "Usage: " << argv[0] << " bitcode_filename" << std::endl;
        return 1;
    }
    StringRef filename = argv[1];
    LLVMContext context;

    ErrorOr<std::unique_ptr<MemoryBuffer>> fileOrErr = MemoryBuffer::getFileOrSTDIN(filename);
    if (std::error_code ec = fileOrErr.getError())
    {
        std::cerr << "Error opening input file: " << ec.message() << std::endl;
        return 2;
    }

    ErrorOr<llvm::Module *> moduleOrErr = parseBitcodeFile(fileOrErr.get()->getMemBufferRef(), context);
    if (std::error_code ec = moduleOrErr.getError())  // check the parse result, not the file read again
    {
        std::cerr << "Error reading Module: " << ec.message() << std::endl;
        return 3;
    }

    Module *m = moduleOrErr.get();
    std::cout << "Successfully read Module:" << std::endl;
    std::cout << "  Name: " << m->getName().str() << std::endl;
    std::cout << "  Target triple: " << m->getTargetTriple() << std::endl;

    for (auto iter1 = m->getFunctionList().begin(); iter1 != m->getFunctionList().end(); iter1++)
    {
        Function &f = *iter1;
        std::cout << "  Function: " << f.getName().str() << std::endl;
        for (auto iter2 = f.getBasicBlockList().begin(); iter2 != f.getBasicBlockList().end(); iter2++)
        {
            BasicBlock &bb = *iter2;
            std::cout << "    BasicBlock: " << bb.getName().str() << std::endl;
            for (auto iter3 = bb.begin(); iter3 != bb.end(); iter3++)
            {
                Instruction &i = *iter3;
                std::cout << "      Instruction: " << i.getOpcodeName() << std::endl;
            }
        }
    }
    return 0;
}
Compile the tool
$ clang++ ReadBitcode.cpp -o reader `llvm-config --cxxflags --libs --ldflags --system-libs`
Create a bitcode file to analyze
$ cat foo.c
int my_fun(int arg1){
    int x = arg1;
    return x+1;
}

int main(){
    int a = 11;
    int b = 22;
    int c = 33;
    int d = 44;
    if (a > 10){
        b = c;
    } else {
        b = my_fun(d);
    }
    return b;
}
$ clang -emit-llvm -o foo.bc -c foo.c
Run the reader tool on the bitcode
$ ./reader foo.bc
Successfully read Module:
  Name: foo.bc
  Target triple: x86_64-pc-linux-gnu
  Function: my_fun
    BasicBlock:
      Instruction: alloca
      Instruction: alloca
      Instruction: store
      Instruction: load
      Instruction: store
      Instruction: load
      Instruction: add
      Instruction: ret
  Function: main
    BasicBlock:
      Instruction: alloca
      Instruction: alloca
      Instruction: alloca
      Instruction: alloca
      Instruction: alloca
      Instruction: store
      Instruction: store
      Instruction: store
      Instruction: store
      Instruction: store
      Instruction: load
      Instruction: icmp
      Instruction: br
    BasicBlock:
      Instruction: load
      Instruction: store
      Instruction: br
    BasicBlock:
      Instruction: load
      Instruction: call
      Instruction: store
      Instruction: br
    BasicBlock:
      Instruction: load
      Instruction: ret
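To get closer to whole-application analysis with this approach, the same pattern extends naturally to several bitcode files: parse each one into its own Module in a shared LLVMContext and hand the collection to your analysis. A minimal sketch, assuming the same LLVM API version as the reader above; analyzeModules is just a placeholder for whatever analysis you want to run:
// ReadManyBitcodes.cpp -- sketch only: parse every .bc given on the command
// line into its own Module, all sharing one LLVMContext.
#include <iostream>
#include <memory>
#include <vector>
#include "llvm/IR/Module.h"
#include "llvm/IR/LLVMContext.h"
#include "llvm/Support/MemoryBuffer.h"
#include "llvm/Bitcode/ReaderWriter.h"
using namespace llvm;

// Placeholder: run your interprocedural analysis over all modules here.
static void analyzeModules(const std::vector<Module *> &mods)
{
    for (Module *m : mods)
        std::cout << m->getName().str() << ": " << m->size() << " functions" << std::endl;
}

int main(int argc, char *argv[])
{
    LLVMContext context;
    std::vector<Module *> modules;
    for (int i = 1; i < argc; ++i)
    {
        auto fileOrErr = MemoryBuffer::getFileOrSTDIN(argv[i]);
        if (std::error_code ec = fileOrErr.getError())
        {
            std::cerr << argv[i] << ": " << ec.message() << std::endl;
            return 1;
        }
        ErrorOr<Module *> moduleOrErr =
            parseBitcodeFile(fileOrErr.get()->getMemBufferRef(), context);
        if (std::error_code ec = moduleOrErr.getError())
        {
            std::cerr << argv[i] << ": " << ec.message() << std::endl;
            return 1;
        }
        modules.push_back(moduleOrErr.get());
    }
    analyzeModules(modules);
    return 0;
}
If you want a single merged module instead of a collection, llvm-link (or the llvm::Linker API) can combine the bitcode files first, and then the single-module reader above already sees the whole program.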

This can be done using a ModulePass. Below is my code; if you need help building and running it, see the llvm-pass-skeleton tutorial referenced in the registration code below (http://adriansampson.net/blog/clangpass.html).
bar.c
int your_fun(int arg2) {
    int x = arg2;
    return x+2;
}
Skeleton.cpp
#include "llvm/Pass.h"
#include "llvm/IR/Module.h"
#include "llvm/Support/raw_ostream.h"
#include "llvm/IR/LegacyPassManager.h"
#include "llvm/Transforms/IPO/PassManagerBuilder.h"
using namespace llvm;
namespace {
struct SkeletonPass : public ModulePass {
static char ID;
SkeletonPass() : ModulePass(ID) {}
virtual bool runOnModule(Module &M) {
for (auto& F : M) {
errs() << "\tFunction: " << F.getName() << "\n";
for (auto& BB : F) {
errs() << "\t\tBasic Block: " << BB.getName() << "\n";
for (auto& I : BB) {
errs() << "\t\t\tInstruction: " << I.getOpcodeName() << "\n";
}
}
}
return false;
}
};
}
char SkeletonPass::ID = 0;
// Automatically enable the pass.
// http://adriansampson.net/blog/clangpass.html
static void registerSkeletonPass(const PassManagerBuilder &,
legacy::PassManagerBase &PM) {
PM.add(new SkeletonPass());
}
static RegisterStandardPasses RegisterMyPass(PassManagerBuilder::EP_ModuleOptimizerEarly,
registerSkeletonPass);
static RegisterStandardPasses RegisterMyPass1(PassManagerBuilder::EP_EnabledOnOptLevel0,
registerSkeletonPass);
Output:
| => clang -Xclang -load -Xclang build/skeleton/libSkeletonPass.so foo.c bar.c
Module: foo.c!
    Function: my_fun!
        Basicblock: entry!
            Instruction: alloca
            Instruction: alloca
            Instruction: store
            Instruction: load
            Instruction: store
            Instruction: load
            Instruction: add
            Instruction: ret
    Function: main!
        Basicblock: entry!
            Instruction: alloca
            Instruction: alloca
            Instruction: alloca
            Instruction: alloca
            Instruction: alloca
            Instruction: store
            Instruction: store
            Instruction: store
            Instruction: store
            Instruction: store
            Instruction: load
            Instruction: icmp
            Instruction: br
        Basicblock: if.then!
            Instruction: load
            Instruction: store
            Instruction: br
        Basicblock: if.else!
            Instruction: load
            Instruction: call
            Instruction: store
            Instruction: br
        Basicblock: if.end!
            Instruction: load
            Instruction: ret
Module: bar.c!
    Function: your_fun!
        Basicblock: entry!
            Instruction: alloca
            Instruction: alloca
            Instruction: store
            Instruction: load
            Instruction: store
            Instruction: load
            Instruction: add
            Instruction: ret
Output if you include a header file linking to bar.c, so that your_fun ends up in foo.c's module and is also called from it:
Module: foo.c!
    Function: your_fun!
        Basicblock: entry!
            Instruction: alloca
            Instruction: alloca
            Instruction: store
            Instruction: load
            Instruction: store
            Instruction: load
            Instruction: add
            Instruction: ret
    Function: my_fun!
        Basicblock: entry!
            Instruction: alloca
            Instruction: alloca
            Instruction: store
            Instruction: load
            Instruction: store
            Instruction: load
            Instruction: add
            Instruction: ret
    Function: main!
        Basicblock: entry!
            Instruction: alloca
            Instruction: alloca
            Instruction: alloca
            Instruction: alloca
            Instruction: alloca
            Instruction: store
            Instruction: store
            Instruction: store
            Instruction: store
            Instruction: store
            Instruction: load
            Instruction: icmp
            Instruction: br
        Basicblock: if.then!
            Instruction: load
            Instruction: store
            Instruction: br
        Basicblock: if.else!
            Instruction: load
            Instruction: call
            Instruction: store
            Instruction: load
            Instruction: call
            Instruction: store
            Instruction: br
        Basicblock: if.end!
            Instruction: load
            Instruction: ret
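Since the original goal was a call graph of the whole application, the same runOnModule skeleton can be extended to record call edges. A rough sketch of my own (not part of the tutorial above); it only handles direct calls via CallInst and ignores indirect calls and invoke instructions:
#include <map>
#include <set>
#include <string>
#include "llvm/Pass.h"
#include "llvm/IR/Module.h"
#include "llvm/IR/Instructions.h"
#include "llvm/Support/Casting.h"
#include "llvm/Support/raw_ostream.h"
using namespace llvm;

namespace {
struct CallGraphSketchPass : public ModulePass {
    static char ID;
    CallGraphSketchPass() : ModulePass(ID) {}
    virtual bool runOnModule(Module &M) {
        // caller name -> set of callee names (direct calls only)
        std::map<std::string, std::set<std::string>> edges;
        for (auto &F : M)
            for (auto &BB : F)
                for (auto &I : BB)
                    if (auto *call = dyn_cast<CallInst>(&I))
                        if (Function *callee = call->getCalledFunction())
                            edges[F.getName().str()].insert(callee->getName().str());
        for (auto &kv : edges)
            for (auto &callee : kv.second)
                errs() << kv.first << " -> " << callee << "\n";
        return false; // analysis only, the IR is not modified
    }
};
}
char CallGraphSketchPass::ID = 0;
Register it exactly like SkeletonPass above. Note, though, that when clang is invoked as shown the pass still runs once per translation unit (one graph for foo.c, one for bar.c); to see edges across both files in a single graph you need the header trick above, llvm-link, or LTO (see the next answer).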

In LTO, all the modules are combined, so you can see the whole-program IR in one module.
You write an ordinary ModulePass and add it to the list of LTO passes in the populateLTOPassManager function in PassManagerBuilder.cpp. Here is the doc for PassManagerBuilder:
http://llvm.org/docs/doxygen/html/classllvm_1_1PassManagerBuilder.html
When you do this, your pass will be executed along with the other LTO passes.
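If you would rather not patch PassManagerBuilder.cpp, recent legacy-pass-manager releases (roughly LLVM 9 through 14) also expose LTO extension points, so a loadable plugin can register a pass at the point where the merged module is optimized. A sketch under that assumption (the pass and file names are mine):
// LTOSkeleton.cpp -- sketch, assuming an LLVM with the legacy pass manager
// that provides EP_FullLinkTimeOptimizationLast. The pass sees the single
// module produced by the LTO merge.
#include "llvm/Pass.h"
#include "llvm/IR/Module.h"
#include "llvm/IR/LegacyPassManager.h"
#include "llvm/Support/raw_ostream.h"
#include "llvm/Transforms/IPO/PassManagerBuilder.h"
using namespace llvm;

namespace {
struct LTOSkeletonPass : public ModulePass {
    static char ID;
    LTOSkeletonPass() : ModulePass(ID) {}
    bool runOnModule(Module &M) override {
        // At this point the linker has merged every translation unit into M.
        errs() << "LTO module " << M.getName() << " has " << M.size()
               << " functions\n";
        return false;
    }
};
}
char LTOSkeletonPass::ID = 0;

static RegisterStandardPasses
    RegisterAtLTO(PassManagerBuilder::EP_FullLinkTimeOptimizationLast,
                  [](const PassManagerBuilder &, legacy::PassManagerBase &PM) {
                      PM.add(new LTOSkeletonPass());
                  });
This only takes effect if the plugin is loaded into the process that actually drives LTO (the linker plugin or libLTO), which is toolchain-specific; building the pass into LLVM as described above avoids that complication.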

Related

C++ lambda "__closure" address in gdb

I understand that there's a hidden closure class in C++11 and above for lambdas. But why is the value of __closure 0x0 in gdb when calling a lambda? What does that mean?
// a.cpp
#include <iostream>
using std::cout;

int main() {
    auto l1 = []() {
        cout << "lambda 1\n";
    };
    auto l2 = []() {
        cout << "lambda 2\n";
    };
    l1();
    l2();

    using F = void (*)();
    F lv = l2;
    lv();
}
In gdb:
Breakpoint 1, <lambda()>::operator()(void) const (
__closure=0x7fffffffd83e) at a.cpp:7
7 cout << "lambda 1\n";
(gdb) c
Continuing.
lambda 1
Breakpoint 2, <lambda()>::operator()(void) const (
__closure=0x7fffffffd83f) at a.cpp:10
10 cout << "lambda 2\n";
(gdb) c
Continuing.
lambda 2
Breakpoint 2, <lambda()>::operator()(void) const (__closure=0x0)
at a.cpp:10
10 cout << "lambda 2\n";
(gdb) c
Continuing.
lambda 2
[Inferior 1 (process 27152) exited normally]
g++ & gdb:
$ g++ --version
g++ (GCC) 8.2.0
$ g++ -g -O0 -std=c++17 a.cpp -o a
$ gdb --version
GNU gdb (GDB) 8.1.1

Why doesn't tcmalloc print the name of a function provided via dlopen?

I have the following small project:
main.cpp
#include <iostream>
#include <cstddef>
#include <dlfcn.h>

int main()
{
    void* handle = dlopen("./shared_libs/libshared.so", RTLD_LAZY);
    if (NULL == handle)
    {
        std::cerr << "Cannot open library: " << dlerror() << '\n';
        return -1;
    }

    typedef int (*foo_t)(const std::size_t);
    foo_t foo = reinterpret_cast<foo_t>(dlsym(handle, "foo"));
    const char* dlsym_error = dlerror();
    if (dlsym_error)
    {
        std::cerr << "Cannot load symbol 'foo': " << dlsym_error << '\n';
        dlclose(handle);
        return -2;
    }

    std::cout << "call foo" << std::endl;
    foo(10);
    dlclose(handle);
    return 0;
}
shared.cpp:
#include <cstddef>
#include <iostream>

extern "C"
{
    int foo(const std::size_t size)
    {
        int b = size / size;
        int* a = new int[size];    // intentionally leaked
        std::cout << "leaky code here" << std::endl;
        return b;                  // added: foo is declared to return int
    }
}
and Makefile:
all:
g++ -fPIC -g -c shared.cpp
g++ -shared -o shared_libs/libshared.so -g shared.o
g++ -L shared_libs/ -g main.cpp -ldl
I use tcmalloc to debug this test program, which dynamically loads libshared.so, resolves foo, and executes it. Run command:
LD_PRELOAD=/usr/local/lib/libtcmalloc.so HEAPCHECK=normal ./a.out
The 1 largest leaks:
Using local file ./a.out.
Leak of 40 bytes in 1 objects allocated from:
# 7fe3460bd9ba 0x00007fe3460bd9ba
# 400b43 main
# 7fe346c33ec5 __libc_start_main
# 400999 _start
# 0 _init
Why do I get the address 0x00007fe3460bd9ba instead of the line in the foo function?
Please help.
P.S. I tried to use gdb with LD_PRELOAD=.../tcmalloc.so, but I get:
"Someone is ptrace()ing us; will turn itself off Turning perftools heap leak checking off"
Try removing the dlclose call.
It's a known issue that the heap checker and profilers can't handle unloaded shared objects.
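A minimal way to apply that while keeping the cleanup for normal runs is a small helper (my own suggestion, not part of the original answer; HEAPCHECK is the environment variable used in the run command above):
#include <cstdlib>
#include <dlfcn.h>

// Close a dlopen() handle only when gperftools' heap checker is not active,
// so the library stays mapped and its allocation sites can be symbolized.
static void closeUnlessHeapChecking(void* handle)
{
    if (handle != NULL && std::getenv("HEAPCHECK") == NULL)
        dlclose(handle);
}
Then call closeUnlessHeapChecking(handle) in main.cpp instead of dlclose(handle).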

LLVM ParseIR Segfault

I'm trying to compile a function ("fun") to LLVM IR and create a Module using the ParseIR function. The program segfaults at the call to ParseIR. I'm using LLVM 3.5 and the code is below.
#include <cstdio>
#include <iostream>
#include <sstream>
#include <string>
#include "llvm/ADT/StringRef.h"
#include "llvm/Bitcode/ReaderWriter.h"
#include "llvm/ExecutionEngine/ExecutionEngine.h"
#include "llvm/ExecutionEngine/MCJIT.h"
#include "llvm/IR/Instructions.h"
#include "llvm/IR/LLVMContext.h"
#include "llvm/IR/Module.h"
#include "llvm/IRReader/IRReader.h"
#include "llvm/Support/MemoryBuffer.h"
#include "llvm/Support/raw_os_ostream.h"

using std::cout;
using std::endl;
using std::ostringstream;
using std::string;

using llvm::getGlobalContext;
using llvm::LLVMContext;
using llvm::MemoryBuffer;
using llvm::Module;
using llvm::ParseIR;
using llvm::SMDiagnostic;
using llvm::StringRef;

int main() {
    string fun = "int fun(int x) {return x;}\n";
    string cmd = "echo '" + fun + "' |"
        + " clang++ -cc1 -xc++ -O0 -std=c++1y -fno-use-cxa-atexit"
        + " -I/usr/local/lib/clang/3.5.0/include/"
        + " -I/usr/include/c++/4.9/"
        + " -I/usr/include/x86_64-linux-gnu/c++/4.9/bits/"
        + " -I/usr/include/x86_64-linux-gnu -I/usr/include/"
        + " -I/usr/include/x86_64-linux-gnu/c++/4.9/"
        + " -I/usr/lib/gcc/x86_64-linux-gnu/4.9/include/"
        + " -I/usr/include/c++/4.9/backward/"
        + " -I/usr/lib/gcc/x86_64-linux-gnu/4.9/include-fixed/"
        + " -stdlib=libstdc++ -S -emit-llvm"
        + " -o /dev/stdout 2> /dev/stdout";
    FILE *file = popen(cmd.c_str(), "r");
    ostringstream llvm;
    char line[1024];
    while (fgets(line, 1024, file))
        llvm << line;
    pclose(file);

    LLVMInitializeNativeTarget();
    LLVMInitializeNativeAsmPrinter();
    LLVMContext &ctx = getGlobalContext();
    SMDiagnostic *err;
    StringRef ref = StringRef(llvm.str().c_str());
    MemoryBuffer *buff = MemoryBuffer::getMemBuffer(ref);

    cout << "***** C++ *****\n" << fun << "\n"
         << "***** LLVM *****\n" << llvm.str() << endl;
    // segfault
    Module *mod = ParseIR(buff, *err, ctx);
    return 0;
}
I compiled and ran the above code using the following command:
g++ -std=c++14 fun.cpp -o fun `llvm-config --cxxflags --ldflags --libs --system-libs`
./fun
***** C++ *****
int fun(int x) {return x;}
***** LLVM *****
; ModuleID = '-'
target datalayout = "e-m:e-i64:64-f80:128-n8:16:32:64-S128"
target triple = "x86_64-unknown-linux-gnu"
; Function Attrs: nounwind
define i32 @_Z3funi(i32 %x) #0 {
%1 = alloca i32, align 4
store i32 %x, i32* %1, align 4
%2 = load i32* %1, align 4
ret i32 %2
}
attributes #0 = { nounwind "less-precise-fpmad"="false" "no-frame-pointer-elim"="false" "no-infs-fp-math"="false" "no-nans-fp-math"="false" "no-realign-stack" "stack-protector-buffer-size"="8" "unsafe-fp-math"="false" "use-soft-float"="false" }
!llvm.ident = !{!0}
!0 = metadata !{metadata !"clang version 3.5.1 (branches/release_35 225591)"}
Segmentation fault (core dumped)
You do not initialize your SMDiagnostic* but dereference it in the call to ParseIR. That is why your program segfaults. See the comments in the code below for a fix.
LLVMInitializeNativeTarget();
LLVMInitializeNativeAsmPrinter();
LLVMContext &ctx = getGlobalContext();
SMDiagnostic err;                                             // create an SMDiagnostic instance
MemoryBuffer* buff = MemoryBuffer::getMemBuffer(llvm.str());

cout << "***** C++ *****\n" << fun << "\n"
     << "***** LLVM *****\n" << llvm.str() << endl;
Module *mod = ParseIR(buff, err, ctx);                        // use err directly
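It is also worth checking the result, since ParseIR returns a null Module on malformed input. A small addition of mine (not part of the original answer); SMDiagnostic::print comes from llvm/Support/SourceMgr.h and llvm::errs() from llvm/Support/raw_ostream.h, both already pulled in by the includes above:
if (!mod) {
    // Report why parsing failed instead of silently continuing with a null Module.
    err.print("fun", llvm::errs());
    return 1;
}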

Lua shared object loading with C++ segfaults

For a project I'm writing I need to write a custom Lua module loading system, and I've done it before on my Raspberry Pi, but not on my Mac. The problem is that as soon as I try to access the lua_State in the shared object, the program segfaults.
main.cpp
#include <lua.hpp>
#include <dlfcn.h>
#include <iostream>

typedef void Register(lua_State*);

int main(){
    lua_State* L = luaL_newstate();
    void* lib = dlopen("module.so", RTLD_NOW);
    if(!lib){
        std::cerr << "Error opening module \"" << "\": " << dlerror() << std::endl;
        return 1;    // was a bare 'return;', which doesn't compile in a function returning int
    }
    Register* loadFunc = (Register*)dlsym(lib, "RegisterModule");
    if(!loadFunc){
        std::cerr << "Error loading symbols from module \"" << "\": " << dlerror() << std::endl;
        return 1;
    }
    loadFunc(L);
    for(;;){}
    return 1;
}
module.cpp
#include <lua.hpp>
#include <iostream>

static int Foo(lua_State* L){
    std::cout << "Hello World!" << std::endl;
    return 0;    // added: a lua_CFunction must return the number of results it pushed
}

extern "C" void RegisterModule(lua_State* L){
    lua_pushcfunction(L, Foo);
    lua_setglobal(L, "Foo");
}
Makefile
lua = -L /usr/lib/lua5.2 -I /usr/include/lua5.2 -llua
luaHeaders = -I /usr/include/lua5.2

all: main module.so
	rm -f main.o

main: main.o
	clang++ main.o -o main $(lua) -ldl

main.o: main.cpp
	clang++ -c main.cpp $(luaHeaders)

module.so: module.cpp
	clang++ -fPIC -shared module.cpp -o module.so $(lua)
My setup is:
Mac OS X 10.9 Mavericks, and Elementary OS Luna
Lua 5.2
Clang
Output from the debugger (lldb)
Process 19943 stopped
* thread #2: tid = 0x23ec1c, 0x0000000100295c31 myModule.so`luaH_newkey + 913, stop reason = EXC_BAD_ACCESS (code=2, address=0x100073db0)
frame #0: 0x0000000100295c31 myModule.so`luaH_newkey + 913
myModule.so`luaH_newkey + 913:
-> 0x100295c31: movq %rax, 16(%r12)
0x100295c36: movl 8(%rbx), %eax
0x100295c39: movl %eax, 24(%r12)
0x100295c3e: testb $64, 8(%rbx)

Cancelling pthread_cond_wait() hangs with PRIO_INHERIT mutex

Update, 4/10 2012:
Fixed by libc patch
I have a problem canceling threads that are blocked in pthread_cond_wait and use mutexes with the PTHREAD_PRIO_INHERIT attribute set. This only happens on certain platforms, though.
The following minimal example demonstrates this: (compile with g++ <filename>.cpp -lpthread)
#include <pthread.h>
#include <iostream>
pthread_mutex_t mutex;
pthread_cond_t cond;
void clean(void *arg) {
    std::cout << "clean: Unlocking mutex..." << std::endl;
    pthread_mutex_unlock((pthread_mutex_t*)arg);
    std::cout << "clean: Mutex unlocked..." << std::endl;
}
void *threadFunc(void *arg) {
    int ret = 0;
    pthread_mutexattr_t mutexAttr;
    ret = pthread_mutexattr_init(&mutexAttr); std::cout << "ret = " << ret << std::endl;
    // Comment out the following line, and everything works
    ret = pthread_mutexattr_setprotocol(&mutexAttr, PTHREAD_PRIO_INHERIT); std::cout << "ret = " << ret << std::endl;
    ret = pthread_mutex_init(&mutex, &mutexAttr); std::cout << "ret = " << ret << std::endl;
    ret = pthread_cond_init(&cond, 0); std::cout << "ret = " << ret << std::endl;
    std::cout << "threadFunc: Init done, entering wait..." << std::endl;
    pthread_cleanup_push(clean, (void *) &mutex);
    ret = pthread_mutex_lock(&mutex); std::cout << "ret = " << ret << std::endl;
    while(1) {
        ret = pthread_cond_wait(&cond, &mutex); std::cout << "ret = " << ret << std::endl;
    }
    pthread_cleanup_pop(1);
    return 0;
}
int main() {
    pthread_t thread;
    int ret = 0;
    ret = pthread_create(&thread, 0, threadFunc, 0); std::cout << "ret = " << ret << std::endl;
    std::cout << "main: Thread created, waiting a bit..." << std::endl;
    sleep(2);
    std::cout << "main: Cancelling threadFunc..." << std::endl;
    ret = pthread_cancel(thread); std::cout << "ret = " << ret << std::endl;
    std::cout << "main: Joining threadFunc..." << std::endl;
    ret = pthread_join(thread, NULL); std::cout << "ret = " << ret << std::endl;
    std::cout << "main: Joined threadFunc, done!" << std::endl;
    return 0;
}
Every time I run it, main() hangs on pthread_join(). A gdb backtrace shows the following:
Thread 2 (Thread 0xb7d15b70 (LWP 257)):
#0 0xb7fde430 in __kernel_vsyscall ()
#1 0xb7fcf362 in __lll_lock_wait () at ../nptl/sysdeps/unix/sysv/linux/i386/i686/../i486/lowlevellock.S:142
#2 0xb7fcc9f9 in __condvar_w_cleanup () at ../nptl/sysdeps/unix/sysv/linux/i386/i686/../i486/pthread_cond_wait.S:434
#3 0x08048fbe in threadFunc (arg=0x0) at /home/pthread_cond_wait.cpp:22
#4 0xb7fc8ca0 in start_thread (arg=0xb7d15b70) at pthread_create.c:301
#5 0xb7de73ae in clone () at ../sysdeps/unix/sysv/linux/i386/clone.S:130
Thread 1 (Thread 0xb7d166d0 (LWP 254)):
#0 0xb7fde430 in __kernel_vsyscall ()
#1 0xb7fc9d64 in pthread_join (threadid=3083950960, thread_return=0x0) at pthread_join.c:89
#2 0x0804914a in main () at /home/pthread_cond_wait.cpp:41
If PTHREAD_PRIO_INHERIT isn't set on the mutex, everything works as it should, and the program exits cleanly.
Platforms with problems:
Embedded AMD Fusion board, running a PTXDist based 32-bit Linux 3.2.9-rt16 (with RTpatch 16). We are using the newest OSELAS i686 cross toolchain (2011.11.1), using gcc 4.6.2, glibc 2.14.1, binutils 2.21.1a, kernel 2.6.39.
Same board with the 2011.03.1 toolchain also (gcc 4.5.2 / glibc 2.13 / binutils 2.18 / kernel 2.6.36).
Platforms with no problems:
Our own ARM-board, also running a PTXDist Linux (32-bit 2.6.29.6-rt23), using OSELAS arm-v4t cross toolchain (1.99.3) with gcc 4.3.2 / glibc 2.8 / binutils 2.18 / kernel 2.6.27.
My laptop (Intel Core i7), running 64-bit Ubuntu 11.04 (virtualized / kernel 2.6.38.15-generic), gcc 4.5.2 / eglibc 2.13-0ubuntu13.1 / binutils 2.21.0.20110327.
I have been looking around the net for solutions, and have come across a few patches that I've tried, but without any effect:
Making the condition variables priority inheritance aware.
Handling EAGAIN from FUTEX_WAIT_REQUEUE_PI
Are we doing something wrong in our code, which just happens to work on certain platforms, or is this a bug in the underlying systems? If anyone has any idea about where to look, or knows of any patches or similar to try out, I'd be happy to hear about it.
Thanks!
Updates:
libc-help mailing list discussion
glibc bug report
This has been fixed by a libc patch.
I've confirmed that it works on my own problematic platform (our custom AMD Fusion board), with the patch applied to glibc-2.14.1.
Thanks go out to Siddhesh Poyarekar for the fix!