What's LLVMBlockAddress()'s first argument?

I'm playing a bit with the LLVM C API and I'm stuck on LLVMBuildIndirectBr(), or more exactly on LLVMBlockAddress(), because I have no idea what its first argument is and, more importantly, how to create it. It's an LLVMValueRef which is supposed to represent 'the function', but the documentation I found doesn't say more.

According to its code, this function is just a C wrapper for BlockAddress::get(). So, the first argument is the Function that contains the BB, I presume.
There is no C API for the BlockAddress::get() overload that takes only the BB argument, so you have to call LLVMGetBasicBlockParent() on that BB first to obtain a reference to the Function it belongs to, and then pass it as the first parameter to LLVMBlockAddress().
As a rule of thumb in such situations, try to figure out which "native" C++ method you are using, and then look for its documentation.

Apparently the function value is created via LLVMFunctionType() and LLVMAddFunction().

try:
#include <stdio.h>
#include <stdlib.h>
#include <stdint.h>
int main(int arg_count, char **args)
{
label0:
if (arg_count < 2)
{
uint32_t addr1 = &&label1 - &&label0;
uint32_t addr2 = &&label2 - &&label0;
printf("label1=%d label2=%d\n", addr1, addr2);
return 0;
}
else
{
uint32_t addr = strtol(args[1], NULL, 0);
printf("goto address = %d\n", addr);
void *indirect_addr = &&label0 + addr;
goto *indirect_addr;
}
label1:
printf("label1\n");
return 1;
label2:
printf("label2\n");
return 2;
}
clang ind_br.c -emit-llvm -S
see the output
blockaddress(@main, %10) = LLVMBlockAddress
indirectbr i8* %36, [label %29, label %10, label %31, label %10, label %10] = LLVMBuildIndirectBr + LLVMAddDestination
So the address argument of LLVMBuildIndirectBr() (the one after the builder) is, or is computed from, the result of an LLVMBlockAddress() call.
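For comparison, a more common way to get an indirectbr out of source code is a table of label addresses. The sketch below assumes the GNU labels-as-values extension (accepted by GCC and Clang, not standard C++); the function name dispatch and its return values are made up for illustration. When compiled with clang -emit-llvm -S, each &&label is expected to appear as a blockaddress constant and the goto * as one indirectbr that lists the possible destinations.

```cpp
#include <cassert>

// Hypothetical example: jump to one of three labels through a table of
// label addresses (GNU "labels as values" extension, not standard C++).
int dispatch(int which) {
    // Each &&label yields the address of a label inside this function.
    static void* const targets[] = { &&first, &&second, &&third };
    assert(which >= 0 && which < 3 && "index must select a table entry");
    goto *targets[which];  // computed goto through the table
first:
    return 10;
second:
    return 20;
third:
    return 30;
}
```

Calling dispatch(1) jumps straight to the second label and returns 20.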

Related

Adding an Object File to JIT and calling it from IR code

I created a modified version of HowToUseJIT.cpp (llvm version 11.x) that uses the IRBuilder class to build a function that calls an external function defined in a shared object file.
This example works fine (on my system) when the external has an int argument and return value, but it fails when the argument and return value are double.
The Source for the int case is included below. In addition, the source has instructions, at the top, for transforming it to the double case.
What is wrong with the double version of this example ?
/*
This file is a modified version of the llvm 11.x example HowToUseJIT.cpp:
The file callee.c contains the following text:
int callee(int arg)
{ return arg + 1; }
The shared library callee.so is created from callee.c as follows:
clang -shared callee.c -o callee.so
This example calls the function callee from a function that is generated using
the IRBuilder class. It links callee by loading callee.so into its LLJIT.
This works on my system where the program output is
add1(42) = 43
which is correct.
If I change the type of the function callee from "int (*)(int)" to
"double (*)(double)", the program output is
add1(42) = 4.200000e+01
which is incorrect.
I use the following command to change callee.c so that it uses double:
sed -i callee.c \
-e 's|int callee(int arg)|double callee(double arg)|' \
-e 's|return arg + 1;|return arg + 1.0;|'
I use the following command to change this file so that it should properly
link to the double version of callee:
sed -i add_obj2jit.cpp \
-e '30,$s|"int"|"double"|' \
-e '30,$s|getInt32Ty|getDoubleTy|g' \
-e '/getAddress/s|int|double|g' \
-e 's|int Result = Add1(42);|double Result = Add1(42.0);|
What is wrong with the double version of this example ?
*/
#include "llvm/ExecutionEngine/Orc/LLJIT.h"
#include "llvm/IR/Function.h"
#include "llvm/IR/IRBuilder.h"
#include "llvm/IR/Module.h"
#include "llvm/Support/InitLLVM.h"
#include "llvm/Support/TargetSelect.h"
#include "llvm/Support/raw_ostream.h"
using namespace llvm;
using namespace llvm::orc;
ExitOnError ExitOnErr;
// --------------------------------------------------------------------------
void add_obj2jit(LLJIT* jit, const std::string filename)
{ // load object file into memory_buffer
ErrorOr< std::unique_ptr<MemoryBuffer> > error_or_buffer =
MemoryBuffer::getFile(filename);
std::error_code std_error_code = error_or_buffer.getError();
if( std_error_code )
{ std::string msg = "add_obj2jit: " + filename + "\n";
msg += std_error_code.message();
std::fprintf(stderr, "%s\n", msg.c_str() );
std::exit( std_error_code.value() );
}
std::unique_ptr<MemoryBuffer> memory_buffer(
std::move( error_or_buffer.get() )
);
// move object file into jit
Error error = jit->addObjectFile( std::move(memory_buffer) );
if( error )
{ std::fprintf(stderr, "Can't load object file %s", filename.c_str());
std::exit(1);
}
}
// --------------------------------------------------------------------------
ThreadSafeModule createDemoModule() {
auto Context = std::make_unique<LLVMContext>();
auto M = std::make_unique<Module>("test", *Context);
// functiont_t
// function has a return type of "int" and take an argument of "int".
FunctionType* function_t = FunctionType::get(
Type::getInt32Ty(*Context), {Type::getInt32Ty(*Context)}, false
);
// declare the callee function
AttributeList empty_attributes;
FunctionCallee callee = M->getOrInsertFunction(
"callee", function_t, empty_attributes
);
// Create the add1 function entry and insert this entry into module M.
Function *Add1F = Function::Create(
function_t, Function::ExternalLinkage, "add1", M.get()
);
// Add a basic block to the function. As before, it automatically inserts
// because of the last argument.
BasicBlock *BB = BasicBlock::Create(*Context, "EntryBlock", Add1F);
// Create a basic block builder with default parameters. The builder will
// automatically append instructions to the basic block `BB'.
IRBuilder<> builder(BB);
// Get pointers to the integer argument of the add1 function...
assert(Add1F->arg_begin() +1 == Add1F->arg_end()); // Make sure there's an arg
Argument *ArgX = &*Add1F->arg_begin(); // Get the arg
ArgX->setName("AnArg"); // Give it a nice symbolic name for fun.
// Create the call instruction, inserting it into the end of BB.
Value *Add = builder.CreateCall( callee, {ArgX}, "Add=callee(ArgX)" );
// Create the return instruction and add it to the basic block
builder.CreateRet(Add);
return ThreadSafeModule(std::move(M), std::move(Context));
}
// --------------------------------------------------------------------------
int main(int argc, char *argv[]) {
// Initialize LLVM.
InitLLVM X(argc, argv);
InitializeNativeTarget();
InitializeNativeTargetAsmPrinter();
cl::ParseCommandLineOptions(argc, argv, "add_obj2jit");
ExitOnErr.setBanner(std::string(argv[0]) + ": ");
// Create an LLJIT instance.
auto J = ExitOnErr(LLJITBuilder().create());
auto M = createDemoModule();
ExitOnErr(J->addIRModule(std::move(M)));
add_obj2jit(J.get(), "callee.so");
// Look up the JIT'd function, cast it to a function pointer, then call it.
auto Add1Sym = ExitOnErr(J->lookup("add1"));
int (*Add1)(int) = (int (*)(int))Add1Sym.getAddress();
int Result = Add1(42);
outs() << "add1(42) = " << Result << "\n";
// return error number
if( Result != 43 )
return 1;
return 0;
}
Andrea:
Thanks for asking to see the IR output. Changing the example code line
// llvm::outs() << *M;
to the line
llvm::outs() << *M;
generates this output.
Looking at the output, it was clear to me that the second sed command had failed.
This was because it was missing a single quote at the end.
When I fixed this, the double case worked. Here is the output, including the IR, for the int case:
; ModuleID = 'test'
source_filename = "test"
declare i32 @callee(i32)
define i32 @add1(i32 %AnArg) {
EntryBlock:
%0 = call i32 @callee(i32 %AnArg)
ret i32 %0
}
add1(42) = 43
Here is the output for the double case:
; ModuleID = 'test'
source_filename = "test"
declare double @callee(double)
define double @add1(double %AnArg) {
EntryBlock:
%0 = call double @callee(double %AnArg)
ret double %0
}
add1(42) = 4.300000e+01

Create a function with unique function pointer in runtime

When calling WinAPI functions that take callbacks as arguments, there's usually a special parameter to pass some arbitrary data to the callback. In case there's no such thing (e.g. SetWinEventHook), the only way to tell which of the API calls resulted in a given callback invocation is to have distinct callbacks. When all the cases in which the API is called are known at compile time, we can always create a class template with a static method and instantiate it with different template arguments at the different call sites. That's a hell of a lot of work, and I don't like doing so.
How do I create callback functions at runtime so that they have different function pointers?
I saw a solution (sorry, in Russian) with runtime assembly generation, but it wasn't portable across x86/x64 architectures.
You can use the closure API of libffi. It allows you to create trampolines each with a different address. I implemented a wrapping class here, though that's not finished yet (only supports int arguments and return type, you can specialize detail::type to support more than just int). A more heavyweight alternative is LLVM, though if you're dealing only with C types, libffi will do the job fine.
I've come up with this solution which should be portable (but I haven't tested it):
#define ID_PATTERN 0x11223344
#define SIZE_OF_BLUEPRINT 128 // needs to be adapted if uniqueCallbackBlueprint is complex...
typedef int (__cdecl * UNIQUE_CALLBACK)(int arg);
/* blueprint for unique callback function */
int uniqueCallbackBlueprint(int arg)
{
int id = ID_PATTERN;
printf("%x: Hello unique callback (arg=%d)...\n", id, arg);
return (id);
}
/* create a new unique callback */
UNIQUE_CALLBACK createUniqueCallback(int id)
{
UNIQUE_CALLBACK result = NULL;
char *pUniqueCallback;
char *pFunction;
int pattern = ID_PATTERN;
char *pPattern;
char *startOfId;
int i;
int patterns = 0;
pUniqueCallback = malloc(SIZE_OF_BLUEPRINT);
if (pUniqueCallback != NULL)
{
pFunction = (char *)uniqueCallbackBlueprint;
#if defined(_DEBUG)
pFunction += 0x256; // variable offset depending on debug information????
#endif /* _DEBUG */
memcpy(pUniqueCallback, pFunction, SIZE_OF_BLUEPRINT);
result = (UNIQUE_CALLBACK)pUniqueCallback;
/* replace ID_PATTERN with requested id */
pPattern = (char *)&pattern;
startOfId = NULL;
for (i = 0; i < SIZE_OF_BLUEPRINT; i++)
{
if (pUniqueCallback[i] == *pPattern)
{
if (pPattern == (char *)&pattern)
startOfId = &(pUniqueCallback[i]);
if (pPattern == ((char *)&pattern) + sizeof(int) - 1)
{
pPattern = (char *)&id;
for (i = 0; i < sizeof(int); i++)
{
*startOfId++ = *pPattern++;
}
patterns++;
break;
}
pPattern++;
}
else
{
pPattern = (char *)&pattern;
startOfId = NULL;
}
}
printf("%d pattern(s) replaced\n", patterns);
if (patterns == 0)
{
free(pUniqueCallback);
result = NULL;
}
}
return (result);
}
Usage is as follows:
int main(void)
{
UNIQUE_CALLBACK callback;
int id;
int i;
id = uniqueCallbackBlueprint(5);
printf(" -> id = %x\n", id);
callback = createUniqueCallback(0x4711);
if (callback != NULL)
{
id = callback(25);
printf(" -> id = %x\n", id);
}
id = uniqueCallbackBlueprint(15);
printf(" -> id = %x\n", id);
getch();
return (0);
}
I've noticed an interesting behavior when compiling with debug information (Visual Studio). The address obtained by pFunction = (char *)uniqueCallbackBlueprint; is off by a variable number of bytes. The difference can be obtained using the debugger, which displays the correct address. This offset changes from build to build, and I assume it has something to do with the debug information. This is no problem for the release build, so maybe this should be put into a library which is built as "release".
Another thing to consider is the byte alignment of pUniqueCallback, which may be an issue, but aligning the beginning of the function to a 64-bit boundary is not hard to add to this code. Memory returned by malloc is also not guaranteed to be executable (on Windows with DEP enabled it is not), so the buffer may need to come from VirtualAlloc with PAGE_EXECUTE_READWRITE instead.
Within uniqueCallbackBlueprint you can implement anything you want (remember to update SIZE_OF_BLUEPRINT so you don't miss the tail of your function). The function is compiled once and the generated code is re-used at runtime. The initial value of id is replaced when creating the unique function, so the copied function can process it.
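For comparison, the compile-time approach mentioned in the question (a class template with a static method) can be sketched as below. All names here (RawCallback, CallbackSlot, thunk, bind) are hypothetical, and int(int) merely stands in for a real callback type that has no user-data parameter. Each Id yields a distinct function, so every slot has its own function pointer, but the set of slots is fixed at compile time.

```cpp
#include <cassert>
#include <functional>
#include <utility>

// Hypothetical callback signature standing in for an API callback
// that offers no user-data parameter.
using RawCallback = int (*)(int);

// Each distinct Id instantiates a distinct static function, so every
// slot has its own function pointer and its own stored handler.
template <int Id>
struct CallbackSlot {
    // Per-slot storage for the handler that receives (Id, arg).
    static std::function<int(int, int)>& handler() {
        static std::function<int(int, int)> h;
        return h;
    }
    // The function whose address is handed to the API.
    static int thunk(int arg) {
        assert(handler() && "slot used before bind()");
        return handler()(Id, arg);
    }
    // Store the handler and return this slot's unique pointer.
    static RawCallback bind(std::function<int(int, int)> h) {
        handler() = std::move(h);
        return &thunk;
    }
};
```

CallbackSlot<1>::bind(...) and CallbackSlot<2>::bind(...) return different pointers, and the handler learns through its first parameter which slot was invoked. This avoids generating code at runtime, at the cost of needing one instantiation per call site.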

LLVM IR Function with an array parameter

I want to generate LLVM IR code from two basic c++ functions which are like below.
int newFun2(int x){
int z = x + x;
return z;
}
int newFun(int *y){
int first = y[3]; //How to define it using the LLVM API?
int num = newFun2(first);
return num;
}
My problem is to get an index of the array parameter using the LLVM API. Any ideas ?
Thank you so much
EDITED
This is my code using the API:
llvm::LLVMContext &context = llvm::getGlobalContext();
llvm::Module *module = new llvm::Module("AST", context);
llvm::IRBuilder<> builder(context);
//newFun2
llvm::FunctionType *newFunc2Type = llvm::FunctionType::get(builder.getInt32Ty(), builder.getInt32Ty(), false);
llvm::Function *newFunc2 = llvm::Function::Create(newFunc2Type, llvm::Function::ExternalLinkage, "newFun2", module);
llvm::Function::arg_iterator argsFun2 = newFunc2->arg_begin();
llvm::Value* x = argsFun2++;
x->setName("x");
llvm::BasicBlock* block = llvm::BasicBlock::Create(context, "entry", newFunc2);
llvm::IRBuilder<> builder2(block);
llvm::Value* tmp = builder2.CreateBinOp(llvm::Instruction::Add,
x, x, "tmp");
builder2.CreateRet(tmp);
//newFun
llvm::FunctionType *newFuncType = llvm::FunctionType::get(builder.getInt32Ty(), builder.getInt32Ty()->getPointerTo(), false);
llvm::Function *newFunc = llvm::Function::Create(newFuncType, llvm::Function::ExternalLinkage, "newFun", module);
llvm::BasicBlock* block2 = llvm::BasicBlock::Create(context, "entry", newFunc);
llvm::IRBuilder<> builder3(block2);
module->dump();
And this is the LLVM IR that is generated :
; ModuleID = 'AST'
define i32 @newFun2(i32 %x) {
entry:
%tmp = add i32 %x, %x
ret i32 %tmp
}
define i32 @newFun(i32*) {
entry:
}
I am stuck on the body of newFun because of the array access.
I think that you first need to understand what the IR should look like. You can find out by reading the language reference or by using Clang to compile the C code into IR and looking at the result.
In any case, the way to access an element at a given index is either with extractvalue (which only accepts constant indices and operates on aggregate values, not on pointers) or with a gep. Both of these have corresponding constructors / factory methods and IRBuilder methods to construct them, for example
builder.CreateExtractValue(y, 3);
Creating a gep is a little more complicated; I recommend taking a look at the gep guide. Since y in the question is a pointer, the gep route followed by a load is the one that applies there.
However, a good way to see how to call the LLVM API to create the desired IR is to use llc (one of the LLVM command-line tools) to generate a source file with those calls itself from an IR file. This relies on llc's C++ backend, which only older LLVM releases ship. See these two related questions:
Possible to auto-generate llvm c++ api code from LLVM-IR?
Generate LLVM C++ API code as backend
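To make the split between gep and load concrete, here is a plain C++ mirror of what the IR for newFun has to do: compute the address of element 3, load from it, then call newFun2. The helper names gepInt32 and loadInt32 are hypothetical and are not LLVM functions. On the API side the corresponding IRBuilder methods are CreateGEP, CreateLoad and CreateCall, whose exact signatures depend on the LLVM version.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Mirrors a gep on an i32 pointer: pure address arithmetic,
// no memory is read here.
const unsigned char* gepInt32(const int* base, std::size_t index) {
    return reinterpret_cast<const unsigned char*>(base) + index * sizeof(int);
}

// Mirrors a load of an i32: this is the step that reads memory.
int loadInt32(const unsigned char* address) {
    int value;
    std::memcpy(&value, address, sizeof value);
    return value;
}

int newFun2(int x) {
    return x + x;
}

// Same steps the generated body of newFun needs: gep, load, call, ret.
int newFun(const int* y) {
    const unsigned char* elementAddress = gepInt32(y, 3);  // gep
    int first = loadInt32(elementAddress);                 // load
    return newFun2(first);                                 // call + ret
}
```

The point of the sketch is the ordering: the index only ever feeds the address computation, and the value comes from a separate load.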

libclang get primitive value

How can I get the value of a primitive literal using libclang?
For example, if I have a CXCursor of cursor kind CXCursor_IntegerLiteral, how can I extract the literal value?
UPDATE:
I've run into so many problems using libclang. I highly recommend avoiding it entirely and instead use the C++ interface clang provides. The C++ interface is highly useable and very well documented: http://clang.llvm.org/doxygen/annotated.html
The only purpose I see of libclang now is to generate the ASTUnit object for you as with the following code (it's not exactly easy otherwise):
ASTUnit * astUnit;
{
index = clang_createIndex(0, 0);
tu = clang_parseTranslationUnit(
index, 0,
clangArgs, nClangArgs,
0, 0, CXTranslationUnit_None
);
astUnit = static_cast<ASTUnit *>(tu->TUData);
}
Now you might say that libclang is stable and the C++ interface isn't. That hardly matters, as the time you spend figuring out the AST with libclang and creating kludges with it wastes so much of your time anyway. I'd just as soon spend a few hours fixing up code that does not compile after a version upgrade (if even needed).
Instead of reparsing the original, you already have all the information you need inside the translation unit :
if (kind == CXCursor_IntegerLiteral)
{
CXSourceRange range = clang_getCursorExtent(cursor);
CXToken *tokens = 0;
unsigned int nTokens = 0;
clang_tokenize(tu, range, &tokens, &nTokens);
for (unsigned int i = 0; i < nTokens; i++)
{
CXString spelling = clang_getTokenSpelling(tu, tokens[i]);
printf("token = %s\n", clang_getCString(spelling));
clang_disposeString(spelling);
}
clang_disposeTokens(tu, tokens, nTokens);
}
You will see that the first token is the integer itself; the next one is not relevant (e.g. it's ; for int i = 42;).
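If the numeric value is needed rather than the text, the first token's spelling can be converted by hand. The following is a minimal sketch and not part of libclang: parseIntegerLiteral is a hypothetical helper that assumes the value fits in unsigned long long and handles decimal, octal, hex and binary spellings with the usual u/l suffixes and digit separators.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdlib>
#include <string>

// Hypothetical helper: convert the spelling of a C/C++ integer literal
// token (for example from clang_getTokenSpelling) to its value.
unsigned long long parseIntegerLiteral(std::string spelling) {
    // Drop C++14 digit separators such as 1'000.
    spelling.erase(std::remove(spelling.begin(), spelling.end(), '\''),
                   spelling.end());
    // Drop integer suffixes (u, l, ul, ll, ull in any case).
    while (!spelling.empty()) {
        const char c = spelling.back();
        if (c != 'u' && c != 'U' && c != 'l' && c != 'L') break;
        spelling.pop_back();
    }
    assert(!spelling.empty() && "expected at least one digit");
    // Binary literals (0b...) are handled explicitly here.
    if (spelling.size() > 2 && spelling[0] == '0' &&
        (spelling[1] == 'b' || spelling[1] == 'B'))
        return std::strtoull(spelling.c_str() + 2, nullptr, 2);
    // Base 0 lets strtoull pick hex (0x...), octal (leading 0) or decimal.
    return std::strtoull(spelling.c_str(), nullptr, 0);
}
```

In the loop above this would be called on clang_getCString(spelling) for the first token only, before the string is disposed.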
If you have access to a CXCursor, you can make use of the clang_Cursor_Evaluate function, for example:
CXChildVisitResult var_decl_visitor(
CXCursor cursor, CXCursor parent, CXClientData data) {
auto kind = clang_getCursorKind(cursor);
switch (kind) {
case CXCursor_IntegerLiteral: {
auto res = clang_Cursor_Evaluate(cursor);
auto value = clang_EvalResult_getAsInt(res);
clang_EvalResult_dispose(res);
std::cout << "IntegerLiteral " << value << std::endl;
break;
}
default:
break;
}
return CXChildVisit_Recurse;
}
Outputs:
IntegerLiteral 42
I found a way to do this by referring to the original files:
std::string getCursorText (CXCursor cur) {
CXSourceRange range = clang_getCursorExtent(cur);
CXSourceLocation begin = clang_getRangeStart(range);
CXSourceLocation end = clang_getRangeEnd(range);
CXFile cxFile;
unsigned int beginOff;
unsigned int endOff;
clang_getExpansionLocation(begin, &cxFile, 0, 0, &beginOff);
clang_getExpansionLocation(end, 0, 0, 0, &endOff);
ClangString filename = clang_getFileName(cxFile);
unsigned int textSize = endOff - beginOff;
FILE * file = fopen(filename.c_str(), "r");
if (file == 0) {
exit(ExitCode::CANT_OPEN_FILE);
}
fseek(file, beginOff, SEEK_SET);
char buff[4096];
char * pBuff = buff;
if (textSize + 1 > sizeof(buff)) {
pBuff = new char[textSize + 1];
}
pBuff[textSize] = '\0';
fread(pBuff, 1, textSize, file);
std::string res(pBuff);
if (pBuff != buff) {
delete [] pBuff;
}
fclose(file);
return res;
}
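The file-reading half of that function can also be written without manual buffer management. This is a sketch under assumptions, not a drop-in replacement: readFileRange is a hypothetical name, the offsets are taken to be byte offsets as returned by clang_getExpansionLocation, and the file is opened in binary mode so that offsets are not shifted by newline translation.

```cpp
#include <cassert>
#include <cstddef>
#include <fstream>
#include <string>

// Hypothetical helper: read the bytes in [beginOff, endOff) from a file.
// Returns an empty string if the file cannot be opened, and a shorter
// string if the file ends before endOff.
std::string readFileRange(const std::string& path,
                          std::size_t beginOff, std::size_t endOff) {
    assert(beginOff <= endOff && "range must not be reversed");
    std::ifstream in(path, std::ios::binary);
    if (!in) return std::string();
    in.seekg(static_cast<std::streamoff>(beginOff));
    std::string text(endOff - beginOff, '\0');
    in.read(&text[0], static_cast<std::streamsize>(text.size()));
    // Keep only the bytes that were actually read.
    text.resize(static_cast<std::size_t>(in.gcount()));
    return text;
}
```

With this helper, getCursorText would reduce to computing beginOff and endOff and returning readFileRange(filename, beginOff, endOff).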
You can actually use a combination of libclang and the C++ interface.
The libclang CXCursor type contains a data field which contains references to the underlying AST nodes.
I was able to successfully access the IntegerLiteral value by casting data[1] to the IntegerLiteral type.
I'm implementing this in Nim so I will provide Nim code, but you can likely do the same in C++.
let literal = cast[clang.IntegerLiteral](cursor.data[1])
echo literal.getValue().getLimitedValue()
The IntegerLiteral type is wrapped like so:
type
APIntObj* {.importcpp: "llvm::APInt", header: "llvm/ADT/APInt.h".} = object
# https://github.com/llvm-mirror/llvm/blob/master/include/llvm/ADT/APInt.h
APInt* = ptr APIntObj
IntegerLiteralObj* {.importcpp: "clang::IntegerLiteral", header: "clang/AST/Expr.h".} = object
IntegerLiteral* = ptr IntegerLiteralObj
proc getValue*(i: IntegerLiteral): APIntObj {.importcpp: "#.getValue()".}
# This is implemented by the superclass: https://clang.llvm.org/doxygen/classclang_1_1APIntStorage.html
proc getLimitedValue*(a: APInt | APIntObj): culonglong {.importcpp: "#.getLimitedValue()".}
Hope this helps someone :)

LLVM JIT segfaults. What am I doing wrong?

It is probably something basic, because I am just starting to learn LLVM.
The following creates a factorial function and tries to JIT and execute it (I know the generated function is correct because I was able to statically compile and execute it).
But I get a segmentation fault upon execution of the function (in EE->runFunction(TheF, Args)).
#include "llvm/Module.h"
#include "llvm/Function.h"
#include "llvm/PassManager.h"
#include "llvm/CallingConv.h"
#include "llvm/Analysis/Verifier.h"
#include "llvm/Assembly/PrintModulePass.h"
#include "llvm/Support/IRBuilder.h"
#include "llvm/Support/raw_ostream.h"
#include "llvm/ExecutionEngine/JIT.h"
#include "llvm/ExecutionEngine/GenericValue.h"
using namespace llvm;
Module* makeLLVMModule() {
// Module Construction
LLVMContext& ctx = getGlobalContext();
Module* mod = new Module("test", ctx);
Constant* c = mod->getOrInsertFunction("fact64",
/*ret type*/ IntegerType::get(ctx,64),
IntegerType::get(ctx,64),
/*varargs terminated with null*/ NULL);
Function* fact64 = cast<Function>(c);
fact64->setCallingConv(CallingConv::C);
/* Arg names */
Function::arg_iterator args = fact64->arg_begin();
Value* x = args++;
x->setName("x");
/* Body */
BasicBlock* block = BasicBlock::Create(ctx, "entry", fact64);
BasicBlock* xLessThan2Block= BasicBlock::Create(ctx, "xlst2_block", fact64);
BasicBlock* elseBlock = BasicBlock::Create(ctx, "else_block", fact64);
IRBuilder<> builder(block);
Value *One = ConstantInt::get(Type::getInt64Ty(ctx), 1);
Value *Two = ConstantInt::get(Type::getInt64Ty(ctx), 2);
Value* xLessThan2 = builder.CreateICmpULT(x, Two, "tmp");
//builder.CreateCondBr(xLessThan2, xLessThan2Block, cond_false_2);
builder.CreateCondBr(xLessThan2, xLessThan2Block, elseBlock);
/* Recursion */
builder.SetInsertPoint(elseBlock);
Value* xMinus1 = builder.CreateSub(x, One, "tmp");
std::vector<Value*> args1;
args1.push_back(xMinus1);
Value* recur_1 = builder.CreateCall(fact64, args1.begin(), args1.end(), "tmp");
Value* retVal = builder.CreateBinOp(Instruction::Mul, x, recur_1, "tmp");
builder.CreateRet(retVal);
/* x<2 */
builder.SetInsertPoint(xLessThan2Block);
builder.CreateRet(One);
return mod;
}
int main(int argc, char**argv) {
long long x;
if(argc > 1)
x = atol(argv[1]);
else
x = 4;
Module* Mod = makeLLVMModule();
verifyModule(*Mod, PrintMessageAction);
PassManager PM;
PM.add(createPrintModulePass(&outs()));
PM.run(*Mod);
// Now we are going to create the JIT
ExecutionEngine *EE = EngineBuilder(Mod).create();
// Call the function with argument x:
std::vector<GenericValue> Args(1);
Args[0].IntVal = APInt(64, x);
Function* TheF = cast<Function>(Mod->getFunction("fact64")) ;
/* The following CRASHES.. */
GenericValue GV = EE->runFunction(TheF, Args);
outs() << "Result: " << GV.IntVal << "\n";
delete Mod;
return 0;
}
Edit:
The correct way to enable JIT (see the accepted answer below):
1. #include "llvm/ExecutionEngine/JIT.h"
2. InitializeNativeTarget();
I would bet that the ExecutionEngine pointer is null. You are missing a call to InitializeNativeTarget; the documentation says:
InitializeNativeTarget - The main program should call this function to initialize the native target corresponding to the host. This is useful for JIT applications to ensure that the target gets linked in correctly.
Since there is no JIT compiler available without calling InitializeNativeTarget, EngineBuilder selects the interpreter (if available). Probably not what you wanted. You may want to look at my previous post on this subject.
#include "llvm/ExecutionEngine/Interpreter.h"
Including that header (llvm/ExecutionEngine/Interpreter.h) forces a static initialisation of the JIT. Not the best design decision, but at least it works.