Create debug location for function calls in LLVM function pass - llvm

I have created an optimization (function) pass that instruments specific instructions and creates function calls before target instructions. It works fine, but I cannot enable debug symbols (-g) due to not having a debug location for my custom function calls.
i8* %381 = call i8* @my_function(i64* %375)
inlinable function call in a function with debug info must have a !dbg location
How can I create a debug location for a custom function call (e.g., my_function) in an LLVM optimization pass?

That limitation only applies to inlinable function calls. If your function doesn't need to be inlined, you can mark it noinline, for example with my_function->addAttribute(AttributeList::FunctionIndex, Attribute::NoInline); (or the shorter my_function->addFnAttr(Attribute::NoInline);), and avoid the problem.

Related

Linking to external functions in dynamic libraries with LLVM

In my project, I am emitting LLVM IR which makes calls to external functions in dynamic libraries.
I declare my external functions like:
declare %"my_type"* @"my_function"()
In the external library, functions are declared like:
extern "C" {
my_type* my_function();
}
When I compile the IR and run it, the process immediately crashes. The same behavior happens if I declare and call a nonsense function that I know doesn't exist, so I assume what's happening is that the external function is not being found/linked. (I don't think that the function itself is crashing).
I am using Python's llvmlite library for this task, and within the same process where I JIT and invoke my LLVM IR, I have another python library imported which requires the external dynamic library; so I assume that library is loaded and in-memory.
The procedure I'm using to compile and execute my LLVM code is basically the same as what's in this document, except that the IR declares and invokes an external function. I have tried invoking cos(), as in the Kaleidoscope tutorial, and this succeeds, so I am not sure what is different about my own library functions.
I have tried adding an underscore to the beginning of the function name, but I get the same result. (Do I need to add the underscore in the LLVM function declaration?)
How do I verify my assumption that the process is crashing because the named function isn't found?
How do I diagnose why the function isn't being found?
What do I need to do in order to make use of external functions in a dynamic library, from LLVM code?
Edit: It seems there is indeed trouble getting the function pointers to my external function. If I try to merely print the function address by replacing my calls with %"foo" = ptrtoint %"my_type"* ()* @"my_function" to i64 and return/print the result, it still segfaults. Merely trying to obtain the pointer is enough to cause a crash! Why is this, and how do I fix it?
Edit: Also forgot to mention— this is on Ubuntu (in a Docker container, on OSX).
I figured it out— I was missing that I need to call llvmlite.binding.load_library_permanently(filename) in order for the external symbols to become available. Even though the library is already in memory, the function still needs to be invoked. (This corresponds to the LLVM native function llvm::sys::DynamicLibrary::LoadLibraryPermanently()).
From this answer, it appears calling the above function with nullptr will import all the symbols available to the process.
Oddly, on OSX, I found that the external symbols were available even though load_library_permanently() had not been explicitly called— I'm not sure why this is (perhaps the OSX build of llvmlite itself happened to call the function with nullptr, as above?).
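A minimal sketch of how one might check whether a symbol is visible before JIT-compiling, assuming a POSIX system where ctypes.CDLL(None) gives a handle to the symbols visible process-wide. The helper names (symbol_visible_in_process, make_symbols_available) are hypothetical; load_library_permanently and address_of_symbol are llvmlite.binding functions, imported lazily so the first helper runs without llvmlite installed.

```python
import ctypes


def symbol_visible_in_process(name):
    """Return True if `name` resolves through the process-wide symbol table."""
    # Assumption: POSIX. CDLL(None) corresponds to dlopen(NULL), which sees the
    # main program, libraries loaded at startup, and libraries opened with
    # global visibility. A library pulled in privately by another Python
    # extension may be in memory and still not show up here.
    process = ctypes.CDLL(None)
    try:
        getattr(process, name)
    except AttributeError:
        return False
    return True


def make_symbols_available(library_path, symbol_name):
    """Load a shared library for the JIT; return the symbol's address or None."""
    # Imported lazily because llvmlite is a third-party package.
    import llvmlite.binding as llvm

    llvm.load_library_permanently(library_path)
    # address_of_symbol returns an integer address, or None if not found.
    return llvm.address_of_symbol(symbol_name)


if __name__ == "__main__":
    print(symbol_visible_in_process("printf"))
    print(symbol_visible_in_process("my_function"))
```

If the first helper reports False for a function the IR declares, that is a hint (not proof) that the JIT will not resolve it either until the library is loaded explicitly, which is what the second helper does.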

LLVM Pass: to change the function call's argument values

As part of my project, based on some analysis, I have to change a function call's arguments. I am doing it at the LLVM IR level. Something like this:
doWork("work",functionBefore)
Based on my results, my LLVM pass should be able to replace the function pointer passed to the call, like this:
doWork("work",functionAfter)
Assume both functionBefore and functionAfter have the same return type.
Is it possible to change the arguments using an LLVM pass?
Or should I delete the instruction and recreate the one I need?
Please give some suggestions or directions on how to do this.
The LLVM IR for the call would be something like this:
invoke void @_Z7processNSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEEPFvS4_E(%"class.std::__cxx11::basic_string"* nonnull %1, void (%"class.std::__cxx11::basic_string"*)* nonnull @_Z9functionBNSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEE) to label %7 unwind label %13

Clang does not inline calls having pointer casts (indirect function calls)

I was trying to inline functions in llvm using this command:
opt -inline -inline-threshold=1000000 a.bc -o a.inline.bc
The (indirect) function calls involving pointer casts could not be inlined. For example:
%call4 = call i32 (...)* bitcast (i32 (%struct.token_type*)* @print_token to i32 (...)*)(%struct.token_type* %5)
But function calls like the one below are being inlined:
%call49 = call i32 @special(i32 %43)
Can I inline all the function calls, regardless of whether they are direct or indirect?
Thanks!
You can't inline something if you don't know what it is, and a function pointer that is assigned at run time cannot be known at any point during the build. If the pointer is defined in such a way that it can be reassigned, the call through it cannot be inlined. The calling code itself could be inlined, but calls through function pointers can't be.
There may be some scenarios that could be inlined and that LLVM is overly cautious about, but that would probably be an issue for the LLVM dev list.
You also haven't given a concrete example, so someone wiser than me can't tell whether inlining should be possible in your scenario.

Automatically wrap C/C++ function at compile-time with annotation

In my C/C++ code I want to annotate different functions and methods so that additional code gets added at compile-time (or link-time). The added wrapping code should be able to inspect context (function being called, thread information, etc.), read/write input variables and modify return values and output variables.
How can I do that with GCC and/or Clang?
Take a look at instrumentation functions in GCC. From man gcc:
-finstrument-functions
Generate instrumentation calls for entry and exit to functions. Just after function entry and just before function exit, the following profiling functions will be called with the address of the current function and its call site. (On some platforms, "__builtin_return_address" does not work beyond the current function, so the call site information may not be available to the profiling functions otherwise.)
void __cyg_profile_func_enter (void *this_fn, void *call_site);
void __cyg_profile_func_exit (void *this_fn, void *call_site);
The first argument is the address of the start of the current function, which may be looked up exactly in the symbol table.
This instrumentation is also done for functions expanded inline in other functions. The profiling calls will indicate where, conceptually, the inline function is entered and exited. This means that addressable versions of such functions must be available. If all your uses of a function are expanded inline, this may mean an additional expansion of code size. If you use extern inline in your C code, an addressable version of such functions must be provided. (This is normally the case anyways, but if you get lucky and the optimizer always expands the functions inline, you might have gotten away without providing static copies.)
A function may be given the attribute "no_instrument_function", in which case this instrumentation will not be done. This can be used, for example, for the profiling functions listed above, high-priority interrupt routines, and any functions from which the profiling functions cannot safely be called (perhaps signal handlers, if the profiling routines generate output or allocate memory).

Detouring and using a _thiscall as a hook (GCC calling convention)

I've recently been working on detouring functions (only in Linux) and so far I've had great success. I was developing my own detouring class until I found this. I modernized the code a bit and converted it to C++ (as a class of course). That code is just like any other detour implementation, it replaces the original function address with a JMP to my own specified 'hook' function. It also creates a 'trampoline' for the original function.
Everything works flawlessly, but I'd like to make one simple adjustment. I program in pure C++, I use no global functions and everything is enclosed in classes (just like Java/C#). The problem is that this detouring method breaks my pattern. The 'hook' function needs to be a static/non-class function.
What I want to do is to implement support for _thiscall hooks (which should be pretty simple with the GCC _thiscall convention). I've had no success modifying this code to work with _thiscall hooks. What I want as an end result is something just as simple as this; PatchAddress(void * target, void * hook, void * class);. I'm not asking anyone to do this for me, but I would like to know how to solve/approach my problem?
From what I know, I should only need to increase the 'patch' size (i.e. it's now 5 bytes, and I should require an additional 5 bytes?), and then before I use the JMP call (to my hook function), I push my 'this' pointer to the stack (which should be as if I called it as a member function). To illustrate:
push 'my class pointer'
jmp <my hook function>
Instead of just having the 'jmp' call directly/only. Is that the correct approach or is there something else beneath that needs to be taken into account (note: I do not care about support for VC++ _thiscall)?
NOTE: here is my implementation of the above-mentioned code: header : source, uses libudis86
I tried several different methods and among these were JIT compile (using libjit) which proved successful but the method did not provide enough performance for it to be usable. Instead I turned to libffi, which is used for calling functions dynamically at run-time. The libffi library had a closure API (ffi_prep_closure_loc) which enabled me to supply my 'this' pointer to each closure generated. So I used a static callback function and converted the void pointer to my object type and from there I could call any non-static function I wished!
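The closure idea can be seen in miniature from Python, whose ctypes module builds its callback objects on libffi closures. This is only an illustrative sketch under stated assumptions, not the C++ detour code described above: the Hook class and its compare method are hypothetical, libc's qsort stands in for any code that expects a plain C function pointer, and a POSIX system is assumed.

```python
import ctypes


class Hook:
    """Hypothetical object whose non-static method serves as a C callback."""

    def __init__(self):
        self.calls = 0

    def compare(self, a, b):
        # `self` is available here even though C only sees a bare function
        # pointer: the closure created below carries the bound object.
        self.calls += 1
        return a[0] - b[0]


# Assumption: POSIX, where CDLL(None) exposes libc's qsort.
libc = ctypes.CDLL(None)

# C signature of the comparator: int (*)(const int *, const int *)
CMPFUNC = ctypes.CFUNCTYPE(
    ctypes.c_int, ctypes.POINTER(ctypes.c_int), ctypes.POINTER(ctypes.c_int)
)
libc.qsort.argtypes = [ctypes.c_void_p, ctypes.c_size_t, ctypes.c_size_t, CMPFUNC]
libc.qsort.restype = None

hook = Hook()
# Wrapping the bound method yields a callable C function pointer.
# Keep a reference to it for as long as C code may call it.
c_callback = CMPFUNC(hook.compare)

values = (ctypes.c_int * 5)(5, 1, 7, 33, 99)
libc.qsort(values, len(values), ctypes.sizeof(ctypes.c_int), c_callback)
```

In C or C++ with libffi directly, the object pointer would instead travel through the closure's user-data argument and be cast back inside a static callback, as described above.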