What're all the LLVM layers for? - c++

I'm playing with LLVM 3.7 and wanted to use the new ORC stuff. But I've been going at this for a few hours now and still don't get what each layer is for, when to use them, how to compose them, or at the very least the minimum set of things I need in place.
I've been through the Kaleidoscope tutorial, but it doesn't explain what the constituent parts are; it just says put this here and this here (plus the parsing etc. distracts from the core LLVM bits). While that's great for getting started, it leaves a lot of gaps. There are lots of docs on various parts of LLVM, like http://llvm.org/releases/3.7.0/docs/ProgrammersManual.html, but so many that it's actually bordering on overwhelming, and I can't find anything that explains how all the pieces fit together. Even more confusing, there seem to be multiple APIs for doing the same thing; I'm thinking of MCJIT and the newer ORC API. I saw Lang Hames's post explaining ORC, but a fair few things seem to have changed since the patch he posted in that link.
So for a specific question, how do all these layers fit together?
When I previously used LLVM I could link to C functions fairly easily. Using the "How to use JIT" example as a base, I tried linking to an externed function extern "C" double doIt, but I end up with LLVM ERROR: Tried to execute an unknown external function: doIt.
Having a look at this ORC example, it seems I need to configure where it searches for symbols. But to be honest, while I'm still swinging at this, it's largely guesswork. Here's what I've got:
#include "llvm/ADT/STLExtras.h"
#include "llvm/ExecutionEngine/GenericValue.h"
#include "llvm/ExecutionEngine/Interpreter.h"
#include "llvm/IR/Constants.h"
#include "llvm/IR/DerivedTypes.h"
#include "llvm/IR/IRBuilder.h"
#include "llvm/IR/Instructions.h"
#include "llvm/IR/LLVMContext.h"
#include "llvm/IR/Module.h"
#include "llvm/Support/ManagedStatic.h"
#include "llvm/Support/TargetSelect.h"
#include "llvm/Support/raw_ostream.h"
#include "std.hpp"
using namespace llvm;
int main() {
  InitializeNativeTarget();
  LLVMContext Context;

  // Create some module to put our function into it.
  std::unique_ptr<Module> Owner = make_unique<Module>("test", Context);
  Module *M = Owner.get();

  // Create the add1 function entry and insert this entry into module M. The
  // function will have a return type of "int" and take an argument of "int".
  // The '0' terminates the list of argument types.
  Function *Add1F = cast<Function>(M->getOrInsertFunction(
      "add1", Type::getInt32Ty(Context), Type::getInt32Ty(Context),
      (Type *)0));

  // Add a basic block to the function. As before, it automatically inserts
  // because of the last argument.
  BasicBlock *BB = BasicBlock::Create(Context, "EntryBlock", Add1F);

  // Create a basic block builder with default parameters. The builder will
  // automatically append instructions to the basic block `BB'.
  IRBuilder<> builder(BB);

  // Get pointers to the constant `1'.
  Value *One = builder.getInt32(1);

  // Get pointers to the integer argument of the add1 function...
  assert(Add1F->arg_begin() != Add1F->arg_end()); // Make sure there's an arg
  Argument *ArgX = Add1F->arg_begin();            // Get the arg
  ArgX->setName("AnArg"); // Give it a nice symbolic name for fun.

  // Create the add instruction, inserting it into the end of BB.
  Value *Add = builder.CreateAdd(One, ArgX);

  // Create the return instruction and add it to the basic block.
  builder.CreateRet(Add);

  // Now, function add1 is ready.

  // Now we're going to create function `foo', which returns an int and takes
  // no arguments.
  Function *FooF = cast<Function>(M->getOrInsertFunction(
      "foo", Type::getInt32Ty(Context), (Type *)0));

  // Add a basic block to the FooF function.
  BB = BasicBlock::Create(Context, "EntryBlock", FooF);

  // Tell the basic block builder to attach itself to the new basic block.
  builder.SetInsertPoint(BB);

  // Get pointer to the constant `10'.
  Value *Ten = builder.getInt32(10);

  // Pass Ten to the call to Add1F.
  CallInst *Add1CallRes = builder.CreateCall(Add1F, Ten);
  Add1CallRes->setTailCall(true);

  // Create the return instruction and add it to the basic block.
  builder.CreateRet(Add1CallRes);

  // Declare the external function `doIt', taking and returning a double
  // (using the same Context as the module).
  std::vector<Type *> args;
  args.push_back(Type::getDoubleTy(Context));
  FunctionType *FT = FunctionType::get(Type::getDoubleTy(Context), args, false);
  Function *F = Function::Create(FT, Function::ExternalLinkage, "doIt",
                                 Owner.get());

  // Now we create the JIT.
  ExecutionEngine *EE = EngineBuilder(std::move(Owner)).create();

  outs() << "We just constructed this LLVM module:\n\n" << *M;
  outs() << "\n\nRunning foo: ";
  outs().flush();

  // Call the `foo' function with no arguments:
  std::vector<GenericValue> noargs;
  GenericValue gv = EE->runFunction(FooF, noargs);
  auto ax = EE->runFunction(F, noargs);

  // Import result of execution:
  outs() << "Result: " << gv.IntVal << "\n";
  outs() << "Result 2: " << ax.IntVal << "\n";

  delete EE;
  llvm_shutdown();
  return 0;
}
doIt is declared in std.hpp.

Your question is very vague, but maybe I can help a bit. This code sample is a simple JIT built with Orc - it's well commented so it should be easy to follow.
Put simply, Orc builds on top of the same building blocks used by MCJIT (MC for compiling LLVM modules down to object files, RuntimeDyld for the dynamic linking at runtime), but provides more flexibility with its concept of layers. It can thus support things like "lazy" JIT compilation, which MCJIT doesn't support. This is important for the LLVM community because the "old JIT" that was removed not very long ago supported these things. Orc JIT lets us gain back these advanced JIT capabilities while still building on top of MC and thus not duplicating the code emission logic.
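To make the layer idea concrete, here is a minimal sketch of the 3.7-era stack: an object-linking layer at the bottom (RuntimeDyld), an IR-compiling layer on top of it (MC via SimpleCompiler), and lazy/compile-on-demand layers stacking above it in the same way. The class, header and helper names below are from memory of the 3.7 Orc headers and were renamed in later releases, so treat this as a sketch to check against your installation rather than copy-paste code (it also assumes the includes already in your snippet, e.g. Module.h and STLExtras.h):
#include "llvm/ExecutionEngine/RuntimeDyld.h"
#include "llvm/ExecutionEngine/RTDyldMemoryManager.h"
#include "llvm/ExecutionEngine/SectionMemoryManager.h"
#include "llvm/ExecutionEngine/Orc/ObjectLinkingLayer.h"
#include "llvm/ExecutionEngine/Orc/IRCompileLayer.h"
#include "llvm/ExecutionEngine/Orc/CompileUtils.h"   // SimpleCompiler
#include "llvm/ExecutionEngine/Orc/LambdaResolver.h"
#include "llvm/Target/TargetMachine.h"
using namespace llvm;
using namespace llvm::orc;
void runWithOrc(std::unique_ptr<Module> M) {
  // Bottom layer: takes finished object files and links them with RuntimeDyld.
  ObjectLinkingLayer<> ObjectLayer;
  // Next layer up: compiles IR modules down to object files with MC, then
  // hands them to the object layer. LazyEmittingLayer / CompileOnDemandLayer
  // would stack on top of this one in exactly the same way.
  std::unique_ptr<TargetMachine> TM(EngineBuilder().selectTarget());
  IRCompileLayer<decltype(ObjectLayer)> CompileLayer(ObjectLayer,
                                                     SimpleCompiler(*TM));
  // The symbol resolver is the piece that decides where unresolved names
  // (like your doIt) come from. (Check the lambda order expected by
  // createLambdaResolver in your headers; it changed between releases.)
  auto Resolver = createLambdaResolver(
      // External names: look them up in the host process.
      [](const std::string &Name) {
        if (auto Addr = RTDyldMemoryManager::getSymbolAddressInProcess(Name))
          return RuntimeDyld::SymbolInfo(Addr, JITSymbolFlags::Exported);
        return RuntimeDyld::SymbolInfo(nullptr);
      },
      // Names expected in the same "logical dylib": none in this sketch.
      [](const std::string &Name) { return RuntimeDyld::SymbolInfo(nullptr); });
  // Adding a module set: you supply the memory manager and the resolver.
  std::vector<std::unique_ptr<Module>> Ms;
  Ms.push_back(std::move(M));
  CompileLayer.addModuleSet(std::move(Ms),
                            make_unique<SectionMemoryManager>(),
                            std::move(Resolver));
  // Look a symbol up through the top layer; the lower layers do the real work.
  // (Names may need mangling according to the module's DataLayout, e.g. a
  // leading underscore on Darwin.)
  if (auto Sym = CompileLayer.findSymbol("foo", true)) {
    auto FooPtr = (int (*)())(intptr_t)Sym.getAddress();
    outs() << "foo() returned " << FooPtr() << "\n";
  }
}
MCJIT bakes this same pipeline (compile with MC, then link with RuntimeDyld) into one monolithic ExecutionEngine; Orc just exposes the stages as layers you can compose and intercept.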
To get better answers, I suggest you ask more specific questions.
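On the specific doIt failure: that error string is printed by the interpreter, which EngineBuilder most likely fell back to because only Interpreter.h is included (so MCJIT never gets linked in). Here is a hedged sketch of the usual fix, assuming doIt really does have C linkage somewhere in the host process (the body below is purely illustrative; yours lives in std.hpp):
#include "llvm/ExecutionEngine/MCJIT.h"      // forces MCJIT to be linked in
#include "llvm/Support/DynamicLibrary.h"
extern "C" double doIt(double X) { return X + 1.0; }   // illustrative only
// ... in main(), before EngineBuilder(std::move(Owner)).create():
InitializeNativeTarget();
InitializeNativeTargetAsmPrinter();          // MCJIT needs the native asm printer
// Either link the executable with -rdynamic, or register the symbol by hand:
sys::DynamicLibrary::AddSymbol("doIt", reinterpret_cast<void *>(&doIt));
// Note: MCJIT's runFunction only supports a few main()-like prototypes, so
// for double(double) fetch the address and call through a typed pointer:
// auto DoItPtr = (double (*)(double))EE->getFunctionAddress("doIt");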

Related

Why does my llvm function jit-evaluate to 0?

I am playing with llvm (and antlr), working vaguely along the lines of the Kaleidoscope tutorial. I successfully created LLVM-IR code from basic arithmetic expressions, both at top level and as function definitions, which corresponds to the tutorial chapters up to 3.
Now I would like to incrementally add JIT support, starting with the top-level arithmetic expressions. Here is my problem:
Basic comparison makes it seem as if I follow the same sequence of function calls as the tutorial, only with a simpler code organization
The generated IR code looks good
The function definition is apparently found, since otherwise the code would exit (I verified this by intentionally looking for a wrongly spelled function name)
However the call of the function pointer created by JIT evaluation always returns zero.
These snippets (excerpt) are executed as part of the antlr visitor of the main/entry-node of my grammar:
//Top node main -- top level expression
antlrcpp::Any visitMain(ExprParser::MainContext *ctx)
{
    llvm::InitializeNativeTarget();
    llvm::InitializeNativeTargetAsmPrinter();
    llvm::InitializeNativeTargetAsmParser();

    TheJIT = ExitOnErr( llvm::orc::KaleidoscopeJIT::Create() );
    InitializeModuleAndPassManager();

    // ... Code which visits the child nodes ...
}
InitializeModuleAndPassManager() is the same as in the tutorial:
static void InitializeModuleAndPassManager()
{
    // Open a new context and module.
    TheContext = std::make_unique<llvm::LLVMContext>();
    TheModule = std::make_unique<llvm::Module>("commandline", *TheContext);
    TheModule->setDataLayout(TheJIT->getDataLayout());

    // Create a new builder for the module.
    Builder = std::make_unique<llvm::IRBuilder<>>(*TheContext);

    // Create a new pass manager attached to it.
    TheFPM = std::make_unique<llvm::legacy::FunctionPassManager>(TheModule.get());
    // Do simple "peephole" optimizations and bit-twiddling optzns.
    TheFPM->add(llvm::createInstructionCombiningPass());
    // Reassociate expressions.
    TheFPM->add(llvm::createReassociatePass());
    // Eliminate Common SubExpressions.
    TheFPM->add(llvm::createGVNPass());
    // Simplify the control flow graph (deleting unreachable blocks, etc).
    TheFPM->add(llvm::createCFGSimplificationPass());

    TheFPM->doInitialization();
}
This is the function which handles the top-level expression and which is also supposed to do JIT evaluation:
//Bare expression without function definition -- create anonymous function
antlrcpp::Any visitBareExpr(ExprParser::BareExprContext *ctx)
{
    string fName = "__anon_expr";
    llvm::FunctionType *FT = llvm::FunctionType::get(llvm::Type::getDoubleTy(*TheContext), false);
    llvm::Function *F = llvm::Function::Create(FT, llvm::Function::ExternalLinkage, fName, TheModule.get());
    llvm::BasicBlock *BB = llvm::BasicBlock::Create(*TheContext, "entry", F);
    Builder->SetInsertPoint(BB);

    llvm::Value* Expression = visit(ctx->expr()).as<llvm::Value*>();
    Builder->CreateRet(Expression);
    llvm::verifyFunction(*F);
    //TheFPM->run(*F); // commented out because I wanted to try JIT before optimization;
    // it causes a compile error right now because I probably lack some related code.
    // However, I do not assume that a missing optimization run causes the problem I have.
    F->print(llvm::errs());

    // Create a ResourceTracker to track JIT'd memory allocated to our
    // anonymous expression -- that way we can free it after executing.
    auto RT = TheJIT->getMainJITDylib().createResourceTracker();

    auto TSM = llvm::orc::ThreadSafeModule(std::move(TheModule), std::move(TheContext));
    ExitOnErr(TheJIT->addModule(std::move(TSM), RT));
    InitializeModuleAndPassManager();

    // Search the JIT for the __anon_expr symbol.
    auto ExprSymbol = ExitOnErr(TheJIT->lookup("__anon_expr"));

    // Get the symbol's address and cast it to the right type (takes no
    // arguments, returns a double) so we can call it as a native function.
    double (*FP)() = (double (*)())(intptr_t)ExprSymbol.getAddress();
    double ret = FP();
    fprintf(stderr, "Evaluated to %f\n", ret);

    // Delete the anonymous expression module from the JIT.
    ExitOnErr(RT->remove());

    return F;
}
Now this is what happens as an example:
[robert#robert-ux330uak test4_expr_llvm_2]$ ./testmain '3*4'
define double @__anon_expr() {
entry:
ret float 1.200000e+01
}
Evaluated to 0.000000
I would be thankful for any ideas about what I might be doing wrong.
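One detail worth checking, visible in the dump above: the function is declared to return double but its body returns a float constant (ret float 1.200000e+01), which the IR verifier rejects. verifyFunction only reports this through its return value and an optional stream, so the bare call in visitBareExpr silently swallows it. A small sketch of making it loud (the nullptr return is a placeholder for however your visitor signals failure):
#include "llvm/IR/Verifier.h"
#include "llvm/Support/raw_ostream.h"

// verifyFunction returns true when the function is broken; print the report
// instead of handing invalid IR to the JIT.
if (llvm::verifyFunction(*F, &llvm::errs())) {
    llvm::errs() << "invalid IR generated for " << F->getName() << "\n";
    return nullptr;   // placeholder: propagate the failure however you prefer
}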

What obvious thing am I overlooking in this failing C++ class instantiation?

I'm writing code for an Arduino-based retirement countdown clock (a gift for a coworker), and have grouped some code into a simple class. This chunk of code gives me the error "error: 'rd' does not name a type" when I compile it in the current (1.6.7) Arduino IDE:
#include "RetirementDisplay.h"
RetirementDisplay* rd;
rd = new RetirementDisplay(&update_lcd);
Oddly, this code compiles without error:
#include "RetirementDisplay.h"
RetirementDisplay* rd = new RetirementDisplay(&update_lcd);
But then when I try to use a member function (like rd->add_screen()) of the newly created rd object, I get the same "'rd' does not name a type" error, which seems completely illogical to me. My C++ is super rusty, though, so I assume there's probably something I'm overlooking here.
The update_lcd method is defined earlier in the same file, and does take two String arguments, so I don't think it's upset about that.
The contents of RetirementDisplay.h are similarly simple; it's just a linked list and a couple of convenience functions to bundle related functionality together while tracking what "screen" (just a couple of printf formats) should currently be active. Don't judge me on my sketchy naming convention; this was supposed to be a quick project. ;)
#ifndef RetirementDisplay_h
#define RetirementDisplay_h

#include "RetirementScreen.h"

class RetirementDisplay {
  protected:
    RetirementScreen* head;
    RetirementScreen* current;
    void (*updater)(String, String);
  public:
    RetirementDisplay( void(*)(String,String) );
    void add_screen(RetirementScreen*);
    void update();
    void next();
    void prev();
};

#endif
It looks like this line is intended to be an assignment statement:
rd = new RetirementDisplay(&update_lcd);
but statements must be inside functions, like this:
#include "RetirementDisplay.h"
void myfunction()
{
    RetirementDisplay* rd;
    rd = new RetirementDisplay(&update_lcd);
}
However, variable declarations can be outside functions, so that is why you don't get an error on this line:
RetirementDisplay* rd = new RetirementDisplay(&update_lcd);
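In an Arduino sketch the usual arrangement is to keep the pointer at file scope and do the construction and member calls inside setup()/loop(); update_lcd and the class come from the question, everything else here is illustrative:
#include "RetirementDisplay.h"

RetirementDisplay* rd;    // a declaration at file scope is fine

void setup() {
    // Statements, including member calls like rd->add_screen(...), must live
    // inside a function body:
    rd = new RetirementDisplay(&update_lcd);
}

void loop() {
    rd->update();
}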

Getting a function name (__func__) from a class T and a pointer to member function void(T::*pmf)()

Is it possible to write some f() template function that takes a type T and a pointer to member function of signature void(T::*pmf)() as (template and/or function) arguments and returns a const char* that points to the member function's __func__ variable (or to the mangled function name)?
EDIT: I've been asked to explain my use-case. I am trying to write a unit-test library (I know there is the Boost Test library for this purpose), and my aim is not to use any macros at all:
struct my_test_case : public unit_test::test {
    void some_test()
    {
        assert_test(false, "test failed.");
    }
};
My test suite runner will call my_test_case::some_test(), and if its assertion fails, I want it to log:
ASSERTION FAILED (&my_test_case::some_test()): test failed.
I can use <typeinfo> to get the name of the class but the pointer-to-member-function is just an offset, which gives no clue to the user about the test function being called.
It seems like what you are trying to achieve is to get the name of the calling function in assert_test(). With gcc you can use backtrace to do that. Here is a naive example:
#include <iostream>
#include <execinfo.h>
#include <cxxabi.h>

namespace unit_test
{
    struct test {};
}

std::string get_my_caller()
{
    std::string caller("???");

    void *bt[3];    // backtrace
    char **bts;     // backtrace symbols
    size_t size = sizeof(bt)/sizeof(*bt);
    int ret = -4;

    /* get backtrace symbols */
    size = backtrace(bt, size);
    bts = backtrace_symbols(bt, size);

    if (size >= 3) {
        caller = bts[2];

        /* demangle function name */
        char *name;
        size_t pos = caller.find('(') + 1;
        size_t len = caller.find('+') - pos;
        name = abi::__cxa_demangle(caller.substr(pos, len).c_str(), NULL, NULL, &ret);
        if (ret == 0)
            caller = name;
        free(name);
    }

    free(bts);
    return caller;
}

void assert_test(bool expression, const std::string& message)
{
    if (!expression)
        std::cout << "ASSERTION FAILED " << get_my_caller() << ": " << message << std::endl;
}

struct my_test_case : public unit_test::test
{
    void some_test()
    {
        assert_test(false, "test failed.");
    }
};

int main()
{
    my_test_case tc;
    tc.some_test();
    return 0;
}
Compiled with:
g++ -std=c++11 -rdynamic main.cpp -o main
Output:
ASSERTION FAILED my_test_case::some_test(): test failed.
Note: This is a gcc (linux, ...) solution, which might be difficult to port to other platforms!
TL;DR: It is not possible to do this in a reasonably portable way, other than using macros. Using debug symbols is a really hard solution that will introduce maintenance and architecture problems in the future, and a bad one at that.
The names of functions, in any form, are not guaranteed to be stored in the binary [or anywhere else for that matter]. Static free functions certainly don't have to expose their name to the rest of the world, and there is no real need for virtual member functions to have their names exposed either (except when the vtable is formed in A.c and the member function is in B.c).
It is also entirely permissible for the linker to remove ALL names of functions and variables. Names MAY be used by shared libraries to find functions not present in the binary, but the "ordinal" way can avoid that too, if the system is using that method.
I can't see any other solution than making assert_test a macro - and this is actually a GOOD use-case of macros. [Well, you could of course pass __func__ as an argument, but that's certainly NOT better than using macros in this limited case.]
Something like:
#define assert_test(x, y) do_assert_test(x, y, __func__)
and then implement do_assert_test to do what your original assert_test would do [less the impossible bit of figuring out the name of the function].
If it's unit tests, and you can be sure that you will always do this with debug symbols, you could solve it in a very non-portable way by building with debug symbols and then using the debug interface to find the name of the function you are currently in. The reason I say it's non-portable is that the debug API for a given OS is not standard - Windows does it one way, Linux another, and I'm not sure how it works in MacOS - and to make matters worse, my quick search on the subject seems to indicate that reading debug symbols doesn't have an API as such - there is a debug API that allows you to inspect the current process and figure out where you are, what the registers contain, etc, but not to find out what the name of the function is. So that's definitely a harder solution than "convince whoever needs to be convinced that this is a valid use of a macro".
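For completeness, a minimal self-contained sketch of that macro route (the do_assert_test name mirrors the snippet above; note that __func__ yields just the function name, e.g. some_test, not the class-qualified my_test_case::some_test):
#include <iostream>
#include <string>

// The real work happens here; the caller's name arrives as an extra argument.
void do_assert_test(bool expression, const std::string &message,
                    const char *caller)
{
    if (!expression)
        std::cout << "ASSERTION FAILED (" << caller << "): " << message << std::endl;
}

// Expanded inside the test function, so __func__ names that test function.
#define assert_test(x, y) do_assert_test((x), (y), __func__)

struct my_test_case {
    void some_test()
    {
        assert_test(false, "test failed.");   // prints: ASSERTION FAILED (some_test): test failed.
    }
};

int main()
{
    my_test_case tc;
    tc.some_test();
    return 0;
}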

Perfect hash function for strings known in advance

I have 4000 strings and I want to create a perfect hash table with these strings. The strings are known in advance, so my first idea was to use a series of if statements:
if (name=="aaa")
return 1;
else if (name=="bbb")
return 2;
.
.
.
// 4000th `if' statement
However, this would be very inefficient. Is there a better way?
gperf is a tool that does exactly that:
GNU gperf is a perfect hash function generator. For a given list of strings, it produces a hash function and hash table, in form of C or C++ code, for looking up a value depending on the input string. The hash function is perfect, which means that the hash table has no collisions, and the hash table lookup needs a single string comparison only.
According to the documentation, gperf is used to generate the reserved keyword recogniser for lexers in GNU C, GNU C++, GNU Java, GNU Pascal, GNU Modula 3, and GNU indent.
The way it works is described in GPERF: A Perfect Hash Function Generator by Douglas C. Schmidt.
Better late than never: I believe this now finally answers the OP's question.
Simply use https://github.com/serge-sans-paille/frozen -- a compile-time (constexpr) library of immutable containers for C++ (using "perfect hash" under the hood).
In my tests, it performed on par with GNU's famous gperf perfect-hash C code generator.
In your pseudo-code's terms:
#include <frozen/unordered_map.h>
#include <frozen/string.h>
constexpr frozen::unordered_map<frozen::string, int, 2> olaf = {
    {"aaa", 1},
    {"bbb", 2},
    .
    .
    .
    // 4000th element
};

return olaf.at(name);
This will respond in O(1) time rather than the OP's O(n) (O(n) assuming the compiler wouldn't optimize your if chain away, which it might).
Since the question is still unanswered and I'm about to add the same functionality to my HFT platform, I'll share my inventory of Perfect Hash algorithms in C++. It is harder than I thought to find an open, flexible and bug-free implementation, so I'm sharing the ones I haven't dropped yet:
The CMPH library, with a collection of papers and such algorithms -- https://git.code.sf.net/p/cmph/git
BBHash, one more implementation from a paper's author -- https://github.com/rizkg/BBHash
Ademakov's -- another implementation from the paper above -- https://github.com/ademakov/PHF
wahern/phf -- I'm currently inspecting this one and trying to solve some allocation bugs it has when dealing with C++ strings on huge key sets -- https://github.com/wahern/phf.git
emphf -- seems unmaintained -- https://github.com/ot/emphf.git
I believe @NPE's answer is very reasonable, and I doubt it is too much for your application as you seem to imply.
Consider the following example: suppose you have your "engine" logic (that is: your application's functionality) contained in a file called engine.hpp:
// this is engine.hpp
#pragma once
#include <iostream>
void standalone() {
    std::cout << "called standalone" << std::endl;
}

struct Foo {
    static void first() {
        std::cout << "called Foo::first()" << std::endl;
    }

    static void second() {
        std::cout << "called Foo::second()" << std::endl;
    }
};
// other functions...
and suppose you want to dispatch the different functions based on the map:
"standalone" dispatches void standalone()
"first" dispatches Foo::first()
"second" dispatches Foo::second()
# other dispatch rules...
You can do that using the following gperf input file (I called it "lookups.gperf"):
%{
#include "engine.hpp"
struct CommandMap {
const char *name;
void (*dispatch) (void);
};
%}
%ignore-case
%language=C++
%define class-name Commands
%define lookup-function-name Lookup
struct CommandMap
%%
standalone, standalone
first, Foo::first
second, Foo::second
Then you can use gperf to create a lookups.hpp file using a simple command:
gperf -tCG lookups.gperf > lookups.hpp
Once I have that in place, the following main subroutine will dispatch commands based on what I type:
#include <iostream>
#include "engine.hpp" // this is my application engine
#include "lookups.hpp" // this is gperf's output
int main() {
    std::string command;
    while (std::cin >> command) {
        auto match = Commands::Lookup(command.c_str(), command.size());
        if (match) {
            match->dispatch();
        } else {
            std::cerr << "invalid command" << std::endl;
        }
    }
}
Compile it:
g++ main.cpp -std=c++11
and run it:
$ ./a.out
standalone
called standalone
first
called Foo::first()
Second
called Foo::second()
SECOND
called Foo::second()
first
called Foo::first()
frst
invalid command
Notice that once you have generated lookups.hpp your application has no dependency whatsoever on gperf.
Disclaimer: I took inspiration for this example from this site.

Error apparently raised by not yet executed code

I'm learning c++ by writing a program to convert MIDI files to Lilypond source files.
My program is composed by two main parts:
a MIDI file parser that creates an object called MidiFile.
a converter that takes a MidiFile object and converts it to a Lilypond source.
Today I started coding the converter, and while I was testing it a strange error occurred: the program dies after an exception is thrown, more specifically a HeaderError, which means that the header chunk in the MIDI file is not as expected. That wouldn't seem so strange, except that this error shows up only if I add a line of code after the buggy code! I'll add the main() function to better explain myself.
#include <iostream>
#include "midiToLyConverter.hpp"
int main(){
    // a queue to store notes that have not yet been shut down
    using MidiToLyConverter::Converter::NoteQueue;
    // representation of a note
    using MidiToLyConverter::Converter::Note;
    // the converter class
    using MidiToLyConverter::Converter::Converter;
    // the midifile class
    using Midi::MidiFile;
    // representation of a midi track
    using Midi::MidiTrack;
    // representation of a midi event
    using Midi::MidiEvents::Event;

    Parser::Parser parser = Parser::Parser();         // parser class
    parser.buildMidiFile();                           // builds the midi file from a .mid
    Midi::MidiFile* midiFile = parser.getMidiFile();  // gets the MidiFile object

    // iterates over all the tracks in the MidiFile
    while(midiFile->hasNext()){
        std::cout << "==========\n";
        MidiTrack* track = midiFile->nextTrack();

        // iterates over all events in a track
        while(track->hasNext()){
            Event* event = track->nextEvent();
            if (event->getEventType() == Midi::MidiEvents::NOTE_ON ||
                event->getEventType() == Midi::MidiEvents::NOTE_OFF)
                // print the event if it's a note on or off
                event->print();
        }
    }
    return 0;
}
With my main() like this, everything works properly. But if I add something between buildMidiFile and the while loop, the function buildMidiFile throws the exception!
Even if it's a completely unrelated instruction!
#include <iostream>
#include "midiToLyConverter.hpp"
int main(){
    using MidiToLyConverter::Converter::NoteQueue;
    using MidiToLyConverter::Converter::Note;
    using MidiToLyConverter::Converter::Converter;
    using Midi::MidiFile;
    using Midi::MidiTrack;
    using Midi::MidiEvents::Event;

    Parser::Parser parser = Parser::Parser();         // parser class
    parser.buildMidiFile();                           // THE EXCEPTION IS THROWN HERE
    Midi::MidiFile* midiFile = parser.getMidiFile();  // gets the MidiFile object

    // adding this causes the exception to be thrown by the function
    // buildMidiFile() called 5 lines above!
    std::vector<bool>* vec = new std::vector<bool>();

    // iterates over all the tracks in the MidiFile
    while(midiFile->hasNext()){
        std::cout << "==========\n";
        MidiTrack* track = midiFile->nextTrack();

        // iterates over all events in a track
        while(track->hasNext()){
            Event* event = track->nextEvent();
            if (event->getEventType() == Midi::MidiEvents::NOTE_ON ||
                event->getEventType() == Midi::MidiEvents::NOTE_OFF)
                // print the event if it's a note on or off
                event->print();
        }
    }
    return 0;
}
I can't explain to myself how this is possible, so if anyone has ideas or advice, any help would be greatly appreciated :) If it's helpful I can post the source code of the other classes and/or functions.
Solved! As pointed out in the comments to the question, it was a problem caused by some sort of memory corruption. As suggested, I used a memory checker (valgrind) and found out that it was a really stupid error: I simply forgot to initialize a variable in a for loop, something like
for (int i; i < limit ; i++)
and this led to that strange error :-) Initializing i to 0 solved the problem, and now the program works with the Parser object placed either on the stack or on the heap.
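For reference, the difference in code (a trivial sketch; limit stands in for the real loop bound):
// Broken: i is never given a starting value, so reading it is undefined
// behavior and the loop may run a garbage number of times (or not at all).
for (int i; i < limit; i++) { /* ... */ }

// Fixed: initialize i, as described above.
for (int i = 0; i < limit; i++) { /* ... */ }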
So I suggest that others running into similar problems use a memory checker to inspect the memory usage of their program. Using valgrind is really simple:
valgrind --leak-check=yes yourProgram arg1 arg2
where arg1 and arg2 are whatever arguments your program requires (if any).
Furthermore, if you compile your program with the -g flag (at least with g++; I don't know about other compilers), valgrind will also tell you at which line of code the memory error occurred.
Thanks to everybody for the help!
Regards
Matteo