How can I disassemble the result of LLVM MCJIT compilation? - llvm

I have a program I wrote which uses LLVM 3.5 as a JIT compiler, which I'm trying to update to use MCJIT in LLVM 3.7. I have it mostly working, but I'm struggling to reproduce one debug-only feature I implemented with LLVM 3.5.
I would like to be able to see the host machine code (e.g. x86, x64 or ARM, not LLVM IR) generated by the JIT process; in debug builds I log this out as my program is running. With LLVM 3.5 I was able to do this by invoking ExecutionEngine::runJITOnFunction() to fill in a llvm::MachineCodeInfo object, which gave me the start address and size of the generated code. I could then disassemble that code.
I can't seem to find any equivalent in MCJIT. I can get the start address of the function (e.g. via getPointerToFunction()) but not the size.
I have seen Disassemble Memory but, apart from not having that much detail in the answers, it seems to be more about how to disassemble a given sequence of bytes. I know how to do that; my question is: how can I get hold of the sequence of bytes in the first place?
If it would help to make this more concrete, please reinterpret this question as: "How can I extend the example Kaleidoscope JIT to show the machine code (x86, ARM, etc) it produces, not just the LLVM IR?"
Thanks.

You have at least two options here.
Supply your own memory manager. This is well documented and is done in many projects that use MCJIT, but for the sake of completeness, here's the code:
class MCJITMemoryManager : public llvm::RTDyldMemoryManager {
public:
  static std::unique_ptr<MCJITMemoryManager> Create();

  MCJITMemoryManager();
  virtual ~MCJITMemoryManager();

  // Allocate a memory block of (at least) the given size suitable for
  // executable code. The section_id is a unique identifier assigned by the
  // MCJIT engine, and optionally recorded by the memory manager to access a
  // loaded section.
  uint8_t* allocateCodeSection(uintptr_t size, unsigned alignment,
                               unsigned section_id,
                               llvm::StringRef section_name) override;

  // Allocate a memory block of (at least) the given size suitable for data.
  // The section_id is a unique identifier assigned by the JIT engine, and
  // optionally recorded by the memory manager to access a loaded section.
  uint8_t* allocateDataSection(uintptr_t size, unsigned alignment,
                               unsigned section_id, llvm::StringRef section_name,
                               bool is_readonly) override;

  ...
};
Pass a memory manager instance to EngineBuilder:
std::unique_ptr<MCJITMemoryManager> manager = MCJITMemoryManager::Create();
llvm::ExecutionEngine* raw = llvm::EngineBuilder(std::move(module))
                                 .setMCJITMemoryManager(std::move(manager))
                                 ...
                                 .create();
Now, via these callbacks, you have control over the memory where the code is emitted, and the size is passed directly to your method. Simply remember the address and size of the buffer you allocated for the code section; you can then stop the program in gdb and disassemble that memory, dump it somewhere, or even run it through LLVM's own disassembler.
Alternatively, just run llc on your LLVM IR with the appropriate options (optimization level, etc.), e.g. llc -O2 -filetype=asm input.ll -o -. As I see it, MCJIT is so named for a reason, and that reason is that it reuses the existing MC code generation modules — the same ones llc drives.

Include the header llvm/Object/SymbolSize.h and use the function llvm::object::computeSymbolSizes(ObjectFile&). You will need to get hold of an ObjectFile instance somehow.
To get that instance, here is what you could do:
Declare a class that is called to convert a Module to an ObjectFile, something like:
class ModuleToObjectFileCompiler {
  ...
  // Compile a Module to an ObjectFile.
  llvm::object::OwningBinary<llvm::object::ObjectFile>
  operator() (llvm::Module&);
};
To implement the operator() of ModuleToObjectFileCompiler, take a look at llvm/ExecutionEngine/Orc/CompileUtils.h where the class SimpleCompiler is defined.
Provide an instance of ModuleToObjectFileCompiler to an instance of llvm::orc::IRCompileLayer, for instance:
new llvm::orc::IRCompileLayer<
        llvm::orc::ObjectLinkingLayer<
            llvm::orc::DoNothingOnNotifyLoaded> >
    (_object_layer, _module_to_object_file);
The operator() of ModuleToObjectFileCompiler receives the instance of ObjectFile which you can provide to computeSymbolSizes(). Then check the returned std::vector to find out the sizes in bytes of all symbols defined in that Module. Save the information for the symbols you are interested in. And that's all.

Related

ELF INIT section code to prepopulate objects used at runtime

I'm fairly new to c++ and am really interested in learning more. Have been reading quite a bit. Recently discovered the init/fini elf sections.
I started to wonder if & how one would use the init section to prepopulate objects that would be used at runtime. Say for example you wanted
to add performance measurements to your code, recording the time, filename, linenumber, and maybe some ID (monotonic increasing int for ex) or name.
You would place for example:
PROBE(0,"EventProcessing",__FILE__,__LINE__)
...... //process event
PROBE(1,"EventProcessing",__FILE__,__LINE__)
......//different processing on same event
PROBE(2,"EventProcessing",__FILE__,__LINE__)
The PROBE could be some macro that populates a struct containing this data (maybe on an array/list, etc using the id as an indexer).
Would it be possible to have code in the init section that could prepopulate all of this data for each PROBE (except for the time of course), so only the time would need to be retrieved/copied at runtime?
As far as I know, __attribute__((constructor)) cannot be applied to member functions?
My initial idea was to create some kind of linked list, with each node pointing to a probe, that code in the init section could iterate over, populating the id, file, line, etc. But that idea assumed I could run a member function in the "init" section, which does not seem possible. Any tips appreciated!
As far as I understand it, you do not actually need an ELF constructor here. Instead, you could emit descriptors for your probes using extended asm statements (using data, instead of code). This also involves switching to a dedicated ELF section for the probe descriptors, say __probes.
The linker will concatenate all the probes into an array and generate the special symbols __start___probes and __stop___probes, which you can use from your program to access these probes. See the last paragraph in Input Section Example.
Systemtap implements something quite similar for its userspace probes:
User Space Probe Implementation
Adding User Space Probing to an Application (heapsort example)
Similar constructs are also used within the Linux kernel for its self-patching mechanism.
There's a pretty simple way to have code run on module load time: Use the constructor of a global variable:
struct RunMeSomeCode
{
    RunMeSomeCode()
    {
        // your code goes here
    }
} do_it;
The .init/.fini sections basically exist to implement global constructors/destructors as part of the ABI on some platforms. Other platforms may use different mechanisms, such as _start and _init functions or .init_array/.fini_array and .preinit_array. There are lots of subtle differences between all these methods, and which one to use for what is a question that can really only be answered by the documentation of your target platform. Not all platforms use ELF to begin with…
The main point to understand is that things like the .init/.fini sections in an ELF binary happen way below the level of C++ as a language. A C++ compiler may use these things to implement certain behavior on a certain target platform. On a different platform, a C++ compiler will probably have to use different mechanisms to implement that same behavior. Many compilers will give you tools in the form of language extensions like __attribute__ or #pragma to control such platform-specific details. But those generally only make sense, and will only work, with that particular compiler on that particular platform.
You don't need a member function (which gets a this pointer passed as an arg); instead you can simply create constructor-like functions that reference a global array, like
#define PROBE(id, stuff, more_stuff) \
    __attribute__((constructor)) void \
    probeinit##id(){ probes[id] = {id, stuff, 0/*to be written later*/, more_stuff}; }
The trick is having this macro work in the middle of another function. GNU C / C++ allows nested functions, but IDK if you can make them constructors.
You don't want to declare a static int dummy##id = something, because then you're adding overhead to the function you profile. (gcc has to emit a thread-safe run-once locking mechanism.)
Really what you'd like is some kind of separate pass over the source that identifies all the PROBE macros and collects up their args to declare
struct probe global_probes[] = {
    {0, "EventName", 0 /*placeholder*/, filename, linenum},
    {1, "EventName", 0 /*placeholder*/, filename, linenum},
    ...
};
I'm not confident you can make that happen with CPP macros; I don't think it's possible to #define PROBE such that every time it expands, it redefines another macro to tack on more stuff.
But you could easily do that with an awk/perl/python / your fave scripting language program that scans your program and constructs a .c that declares an array with static storage.
Or better (for a single-threaded program): keep the runtime timestamps in one array, and the names and other metadata in a separate array, so the cache footprint of the probes is smaller. (In a multi-threaded program, stores to the same cache line from different threads cause false sharing, which creates cache-line ping-pong.)
So you'd have #define PROBE(id, evname, blah blah) do { probe_times[id] = now(); }while(0)
and leave the handling of the later args to your separate preprocessing.

gcc - how to auto instrument every basic block

GCC has an auto-instrumentation option for function entry/exit.
-finstrument-functions
Generate instrumentation calls for entry and exit to functions. Just after function entry and just before function exit, the following profiling functions will be called with the address of the current function and its call site. (On some platforms, __builtin_return_address does not work beyond the current function, so the call site information may not otherwise be available to the profiling functions.)
void __cyg_profile_func_enter (void *this_fn, void *call_site);
void __cyg_profile_func_exit  (void *this_fn, void *call_site);
I would like to have something like this for every "basic block" so that I can log, dynamically, execution of every branch.
How would I do this?
There is a fuzzer called American Fuzzy Lop which solves a very similar problem: instrumenting jumps between basic blocks to gather edge coverage (if basic blocks are vertices, which jumps, i.e. edges, were encountered during execution). It may be worth looking at its sources. It has three approaches:
afl-gcc, a wrapper for gcc that substitutes the assembler (as) with a wrapper that rewrites the assembly code, keying off basic-block labels and jump instructions
a plugin for the Clang compiler
a patch for QEMU, for instrumenting already-compiled code
Another, and probably the simplest, option is the DynamoRIO dynamic instrumentation system. Unlike QEMU, it is specifically designed for implementing custom instrumentation, either by rewriting machine code by hand or by simply inserting calls (which may even be inlined automatically in some cases, if I read the documentation correctly). If you think dynamic instrumentation is very hard, look at their examples: they are only about 100-200 lines each. You should still read their documentation at least here and for the functions used, since it contains important points; for example, DR constructs dynamic basic blocks, which are distinct from a compiler's classic basic blocks. With dynamic instrumentation you can even instrument the system libraries your program uses. In case that is not what you want, you can restrict instrumentation to the main module with something like
static module_data_t *traced_module;

// in dr_client_main
traced_module = dr_get_main_module();

// in the basic block event handler
void *app_pc = dr_fragment_app_pc(tag);
if (!dr_module_contains_addr(traced_module, app_pc)) {
    return DR_EMIT_DEFAULT;
}

Can I get the address of a singleton during compile or link time from gcc?

I am working on an embedded project and wonder whether it is possible to get the address of a singleton class at compile or link time.
To create my singleton, I use the following code and would be interested in the address of instance.
class A
{
public:
    static A& get()
    {
        static A instance;
        return instance;
    }
};
What I want to do is, of course, change the value from outside using a debug probe, but without a real debug session.
Best regards
Andreas
Without significant knowledge of exactly what development tools, hardware architecture, etc. you are using, it's very hard to say exactly what you should do, but it's typically possible to assign certain variables to a specific data segment (or functions to a specific code segment) and then, in the linking phase, assign a specific address to that segment.
For example you can use the gcc section attribute:
int init_data __attribute__ ((section ("INITDATA")));
or
MyObj obj __attribute__((section ("BATTERY_BACKED")));
and then use the same section name in a linker script that places it to the "right" address.
Most (reasonable) embedded toolchains will support this in some manner, but exactly how it is done varies quite a lot.
Another option is to use placement new:
MyObj *obj = new ((void *)0x11220000) MyObj(args);
Usually debug probes only see physical addresses, while user applications operate on virtual addresses, which can change each time the application is loaded, so no linker trick will work. You didn't say which OS you use, but I guess it's Linux. If so, you can do something like this: reserve yourself a scratchpad memory area whose physical address you know and which is not used by the OS. For example, if your SoC has embedded static memory, use that; if not, ask your local Linux expert how to reserve a page of RAM in the kernel memory configuration.
Then look at this article to understand how to map a physical address into the virtual memory space of your application:
how to access kernel space from user space(in linux)?
After getting the virtual address of the scratchpad area, your application can read/write there whatever it wants. The debug probe will be able to read/write the same area through its physical address.
You can use placement-new with a buffer whose address is available at compile or link time.
#include <new>

extern unsigned char placeA[];

class A {
public:
    static A& get()
    {
        static A *p_instance;
        if (!p_instance) {
            p_instance = new (placeA) A();
        }
        return *p_instance;
    }
};

unsigned char placeA[sizeof(A)] __attribute__ ((aligned (__BIGGEST_ALIGNMENT__)));
Not exactly sure if this is what you're trying to do, but using "-S" with gcc will stop everything after the compile stage, so you can dive into the generated assembly and see where your variables end up. Here is the relevant man page excerpt:
If you only want some of the stages of compilation, you can use -x (or filename suffixes) to tell gcc where to start, and one of the options -c, -S, or -E to say where gcc is to stop. Note that some combinations (for example, -x cpp-output -E) instruct gcc to do nothing at all.

-c  Compile or assemble the source files, but do not link. The linking stage simply is not done. The ultimate output is in the form of an object file for each source file. By default, the object file name for a source file is made by replacing the suffix .c, .i, .s, etc., with .o. Unrecognized input files, not requiring compilation or assembly, are ignored.

-S  Stop after the stage of compilation proper; do not assemble. The output is in the form of an assembler code file for each non-assembler input file specified. By default, the assembler file name for a source file is made by replacing the suffix .c, .i, etc., with .s. Input files that don't require compilation are ignored.

-E  Stop after the preprocessing stage; do not run the compiler proper. The output is in the form of preprocessed source code, which is sent to the standard output. Input files which don't require preprocessing are ignored.

STM32 C++ operator new (CoIDE)

I'm new to ARM programming. I'm using CoIDE and trying to write an application that reads PWM from 8 channels, in C++.
My problem is using operator new; if I write:
RxPort rxPort = RxPort(RCC_AHB1Periph_GPIOA, GPIOA, GPIO_Pin_6, GPIO_PinSource6, GPIO_AF_TIM3, RCC_APB1Periph_TIM3, TIM3, TIM_Channel_1, TIM_IT_CC1, TIM3_IRQn);
it works fine, but if I write:
RxPort* rxPort1 = new RxPort;
rxPort1->setTimerParameters(RCC_APB1Periph_TIM3, TIM3, TIM_Channel_1, TIM_IT_CC1, TIM3_IRQn);
rxPort1->setGPIOParameters(RCC_AHB1Periph_GPIOA, GPIOA, GPIO_Pin_6, GPIO_PinSource6, GPIO_AF_TIM3);
rxPort1->init();
program goes to:
static void Default_Handler(void)
{
    /* Go into an infinite loop. */
    while (1)
    {
    }
}
after first line.
I've found one topic on my.st.com here, and tried to add "--specs=nano.specs" to "Misc Controls" in "Link" and "Compile" section, but nothing changes.
To support new/delete and malloc/free in GCC with the newlib C library, you must implement the _sbrk_r() syscall stub and allocate an area of memory for the heap. Typically the latter is done via the linker script, but you can also simply allocate a large static array. A smart linker script, however, can be written so that the heap automatically uses all memory left over after static object and system stack allocation.
An example _sbrk_r() implementation (as well as the other syscall stubs for supporting library features such as stream I/O) can be found on Bill Gatliff's site. If you are using CoOS or any other multitasking OS or executive, and intend to allocate from multiple threads, you will also need to implement __malloc_lock() and __malloc_unlock().
Your code ended up in Default_Handler because new is required to throw an exception when it fails, and you had no explicit try/catch block. If you would rather have malloc()-style semantics and simply get a null pointer back on failure, you can use new (std::nothrow).
Apparently your active GCC toolchain's newlib stubs don't support low-level dynamic memory allocation (malloc(), free(), etc.). Using new or delete in C++ may then end up in a default 'exception' handler at run time.
The details depend on the newlib stubs provided with your configuration. Note that you can override the stub functions with your own implementations.
You'll find some useful additional hints in this article: Building GCC 4.7.1 ARM cross toolchain on Suse 12.2

Does an arbitrary instruction pointer reside in a specific function?

I have a very difficult problem I'm trying to solve: Let's say I have an arbitrary instruction pointer. I need to find out if that instruction pointer resides in a specific function (let's call it "Foo").
One approach to this would be to try to find the start and ending bounds of the function and see if the IP resides in it. The starting bound is easy to find:
void *start = &Foo;
The problem is, I don't know how to get the ending address of the function (or how "long" the function is, in bytes of assembly).
Does anyone have any ideas how you would get the "length" of a function, or a completely different way of doing this?
Let's assume that there is no SEH or C++ exception handling in the function. Also note that I am on a win32 platform, and have full access to the win32 api.
This won't work. You're presuming functions are contiguous in memory and that one address maps to one function. The optimizer has a lot of leeway here and can move code from functions around the image.
If you have PDB files, you can use something like the dbghelp or DIA API's to figure this out. For instance, SymFromAddr. There may be some ambiguity here as a single address can map to multiple functions.
I've seen code that tries to do this before with something like:
#pragma optimize("", off)
void Foo()
{
}
void FooEnd()
{
}
#pragma optimize("", on)
And then FooEnd-Foo was used to compute the length of function Foo. This approach is incredibly error prone and still makes a lot of assumptions about exactly how the code is generated.
Look at the *.map file which can optionally be generated by the linker when it links the program, or at the program's debug (*.pdb) file.
OK, I haven't done assembly in about 15 years. Back then, I didn't do very much. Also, it was 680x0 asm. BUT...
Don't you just need to put a label before and after the function, take their addresses, subtract them for the function length, and then just compare the IP? I've seen the former done. The latter seems obvious.
If you're doing this in C, look first for debugging support --- ChrisW is spot on with map files, but also see if your C compiler's standard library provides anything for this low-level stuff; most compilers provide tools for analysing the stack etc., for instance, even though it's not standard. Otherwise, try just using inline assembly, or wrapping the C function with an assembly file and an empty wrapper function with those labels.
The simplest solution is to maintain a state variable:
volatile int FOO_is_running = 0;

int Foo( int par ){
    FOO_is_running = 1;
    /* do the work */
    FOO_is_running = 0;
    return 0;
}
Here's how I do it, but it's using gcc/gdb.
$ gdb ImageWithSymbols
gdb> info line * 0xYourEIPhere
Edit: Formatting is giving me fits. Time for another beer.