llvm load big structures - llvm

I'm new to LLVM IR and was wondering whether the load instruction can be used to load relatively big structures. The documentation mentions a restriction to first-class types; however, the sample code below compiled fine. Will it also behave correctly?
%MyStruct = type { i32, i32, i64, i64 }
define void @my_func(%MyStruct*) local_unnamed_addr {
%2 = load %MyStruct, %MyStruct* %0
ret void
}

Related

How to manipulate function pointers at compile time, and make calls to them, in LLVM IR

Simply put, I am trying to create an LLVM function pass that obfuscates a function's execution.
The idea is that we:
Clone the function
Hollow out the original function
Get a pointer to the clone function
Encrypt the pointer at compile-time (as simple as an addition or subtraction at this stage)
Create a set of instructions inside the original function that decrypt the pointer at run-time, before calling it.
The IR below is a result of the pass running on a test C program's 'main' function. The function's original code (clone function) is irrelevant.
main_clone_ptr's creation:
cloneFnPtr = m.getOrInsertGlobal("main_clone_ptr", PointerType::get(cloneFn->getFunctionType(), 0));
cloneFnPtr->setInitializer(cloneFn);
The IR:
; Function Attrs: noinline nounwind optnone uwtable
define dso_local i32 @main(i32 noundef %0, i8** noundef %1) #0
{
%3 = add i32 (i32, i8**)** @main_clone_ptr, i32 -149 // Decrypt the pointer
br label %callfn
callfn:
%5 = load i32 (i32, i8**)*, i32 (i32, i8**)** %3, align 8 // Load the address
%6 = call i32 %5() // Call
br label %end
end:
ret i32 %6
}
The issue is that main_clone_ptr still holds the original address of main, as I haven't figured out a way to modify the address at compile time. So, to my question: is this even possible? If so, how do I get that address and modify it when I need to?
I've tried something like this:
Constant* initializer = cloneFnPtr->getInitializer();
const ConstantInt* constInt = cast<ConstantInt>(initializer);
// Get value
uint64_t constIntValue = constInt->getZExtValue();
// Encrypt the value
constIntValue -= modifierVal; // constIntValue - (-149)
// Set the value back
cloneFnPtr->setInitializer(ConstantInt::get(builder.getInt64Ty(), constIntValue));
However, this doesn't seem to work at all. I have a feeling I may be doing multiple things wrong here, but I'm not sure, as I'm not well versed in LLVM.
The address of a function is only finally resolved through a multi-stage process that involves the linker at build time and the dynamic linker at application load time. So no, in general you cannot do it this way at the LLVM IR level, simply because the final function address is not yet a constant at this stage.
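Since the real address only exists once the program is loaded, one alternative is to do the encoding at start-up and the decoding just before the call. The following is a minimal plain C++ sketch of that idea, not the asker's pass and not LLVM API code. The names kKey, encode, decode, main_clone_enc and call_obfuscated are hypothetical, and the body of main_clone is a stand-in.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical key; the question used an offset of -149.
constexpr std::intptr_t kKey = -149;

using MainFn = int (*)(int, char **);

// Stand-in for the cloned function; the real body is irrelevant here.
int main_clone(int argc, char **) { return argc + 41; }

// Converting a function pointer to an integer is only conditionally
// supported by the standard, but works on mainstream platforms.
std::uintptr_t encode(MainFn fn) {
    return reinterpret_cast<std::uintptr_t>(fn) +
           static_cast<std::uintptr_t>(kKey);
}

MainFn decode(std::uintptr_t enc) {
    return reinterpret_cast<MainFn>(enc - static_cast<std::uintptr_t>(kKey));
}

// Dynamic initializer: evaluated at start-up, when the loader has
// already fixed the final address of main_clone.
const std::uintptr_t main_clone_enc = encode(main_clone);

// What the hollowed-out original function would do: decode, then call.
int call_obfuscated(int argc, char **argv) {
    MainFn fn = decode(main_clone_enc);
    return fn(argc, argv);
}
```

Calling call_obfuscated(1, nullptr) forwards to main_clone through the decoded pointer. In a pass, the equivalent would be to emit the encoding step into code that runs at start-up rather than into a constant initializer.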

Extract temporary from LLVM callInst

From the following example call:
call void %4(%class.EtherAppReq* %2, i64 %5)
I want to extract the temporary %4 to pass it as an argument to another function. To do this, I need it as a Value object. How can I do it?
Value *target = call->getCalledValue();
Value *args[] = {point, target};
Builder.CreateCall(func, args);
It caused a segmentation fault because of target.
Use CallInst::getCalledValue() to get a pointer to %4.
What I did was try getCalledFunction() first; if that returns NULL, call getCalledValue() and stripPointerCasts(); if that still fails, bail out or skip this call.

create type for Eigen::Matrix in llvm

I'm trying to create a type, using the LLVM C++ API, for an Eigen::Matrix<complex<double>, Dynamic, 1> parameter.
Does anyone know how to do this?
I wrote some simple sample C++ code, dumped the LLVM IR for it, and found the following lines at the beginning:
%"class.Eigen::Matrix" = type { %"class.Eigen::PlainObjectBase" }
%"class.Eigen::PlainObjectBase" = type { %"class.Eigen::DenseStorage" }
%"class.Eigen::DenseStorage" = type { i32*, i64 }
%"class.Eigen::DenseBase" = type { i8 }
%"class.Eigen::DenseCoeffsBase.0" = type { i8 }
%"struct.Eigen::EigenBase" = type { i8 }
%"class.Eigen::MatrixBase" = type { i8 }
%"struct.Eigen::internal::special_scalar_op_base" = type { i8 }
So I guess what I need is a type { type { type { i32*, i64 } } }?
Thanks!
The precise layout of a type is specified by the C++ ABI, so you may or may not be lucky when defining it by hand.
Use LLVM's cpp backend to generate the C++ API code which will generate the given IR. This is the easiest way.
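To see what the nesting in the IR above amounts to, here is a small plain C++ sketch that mirrors the three nested struct types and compares their sizes. The struct names are hypothetical stand-ins, not Eigen's classes, and the result reflects the ABI of the compiler used, which is exactly why hand-written layouts are fragile.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Mirrors %"class.Eigen::DenseStorage" = type { i32*, i64 }
struct DenseStorageSketch {
    std::int32_t *data;
    std::int64_t rows;
};

// Mirrors %"class.Eigen::PlainObjectBase" = type { DenseStorage }
struct PlainObjectBaseSketch {
    DenseStorageSketch storage;
};

// Mirrors %"class.Eigen::Matrix" = type { PlainObjectBase }
struct MatrixSketch {
    PlainObjectBaseSketch base;
};

// True when the single-member wrappers add no bytes of their own,
// which is what mainstream ABIs do for structs like these.
bool wrappers_add_no_bytes() {
    return sizeof(MatrixSketch) == sizeof(DenseStorageSketch) &&
           offsetof(MatrixSketch, base) == 0 &&
           offsetof(PlainObjectBaseSketch, storage) == 0;
}
```

If wrappers_add_no_bytes() holds on the target, the nested type and the flat { i32*, i64 } occupy the same bytes, but the nested form is still what a front end emits and is the safer one to reproduce.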

*Value is not being generated into the LLVM code

I am attempting to write a compiler and use LLVM to generate intermediate code. Unfortunately, the LLVM documentation is not great and is even somewhat confusing.
At the moment I have the lexer, grammar and AST implemented. I was also following some examples found on the Internet. My current AST works as follows: it has an abstract base class Tree*, from which other trees inherit (for example, one for variable definitions, one for statement lists, one for binary expressions, etc.).
I am trying to implement the variable definition, so for the input
class Test{
int main()
{
int x;
}
}
I want LLVM output to be:
; ModuleID = "Test"
define i32 @main() {
entry:
%x = alloca i32
ret i32 0
}
However, although the %x = alloca i32 value does reach the point where the main function is created, the actual output is missing the %x = alloca i32 instruction. So, the output I'm getting is as follows:
; ModuleID = "Test"
define i32 @main() {
entry:
ret i32 0
}
My Codegen() for variable declarations is shown below (the symbol table is just a list for now; I am trying to keep things as simple as possible at the moment):
llvm::Value *decafStmtList::Codegen() {
string name = SyandTy.back(); // Just a name of a variable
string type = SyandTy.front(); // and its type in string format
Type* typeVal = getLLVMType(decafType(str2DecafType(type))); // get LLVM::*Type representation
llvm::AllocaInst *Alloca = Builder.CreateAlloca(typeVal, 0, name.c_str());
Value *V = Alloca;
return Alloca;//Builder.CreateLoad(V, name.c_str());
}
The part where I generate @main is as follows:
Note: I have commented out the print_int function (I will use it later to print things, but for now I don't need it). If I uncomment it, TheFunction does not pass the verifier (verifyFunction): it complains about the module being broken and parameters not matching the signature.
Function *gen_main_def(llvm::Value *RetVal, Function *print_int) {
if (RetVal == 0) {
throw runtime_error("something went horribly wrong\n");
}
// create the top-level definition for main
FunctionType *FT = FunctionType::get(IntegerType::get(getGlobalContext(), 32), false);
Function *TheFunction = Function::Create(FT, Function::ExternalLinkage, "main", TheModule);
if (TheFunction == 0) {
throw runtime_error("empty function block");
}
// Create a new basic block which contains a sequence of LLVM instructions
BasicBlock *BB = BasicBlock::Create(getGlobalContext(), "entry", TheFunction);
// All subsequent calls to IRBuilder will place instructions in this location
Builder.SetInsertPoint(BB);
/*
Function *CalleeF = TheModule->getFunction(print_int->getName());
if (CalleeF == 0) {
throw runtime_error("could not find the function print_int\n");
}*/
// print the value of the expression and we are done
// Value *CallF = Builder.CreateCall(CalleeF, RetVal, "calltmp");
// Finish off the function.
// return 0 from main, which is EXIT_SUCCESS
Builder.CreateRet(ConstantInt::get(getGlobalContext(), APInt(32, 0)));
return TheFunction;
}
If someone knows why my Alloca object is not being generated, please help me out - any hints will be greatly appreciated.
Thank you
EDIT:
Codegen is called from the grammar:
start: program
program: extern_list decafclass
{
ProgramAST *prog = new ProgramAST((decafStmtList *)$1, (ClassAST *)$2);
if (printAST) {
cout << getString(prog) << endl;
}
Value *RetVal = prog->Codegen();
delete $1; // get rid of abstract syntax tree
delete $2; // get rid of abstract syntax tree
// we create an implicit print_int function call to print
// out the value of the expression.
Function *print_int = gen_print_int_def();
Function *TheFunction = gen_main_def(RetVal, print_int);
verifyFunction(*TheFunction);
}
EDIT: I figured it out. Basically, CreateAlloca has to be called after the basic block has been created and set as the builder's insertion point when generating main.
There are two weird things here:
All you do is call Builder.CreateRet... I don't see how there could be any code in main unless you call something that creates the corresponding instructions. In particular, you never seem to call the Codegen part.
You pass a size of zero to CreateAlloca. I think the size should be one for a single variable.
Also, make sure that you don't call any LLVM optimization passes after generating your code. Those passes would optimize the value away (it's never used, thus dead code).

Handling Apache Thrift list/map Return Types in C++

First off, I'll say I'm not the most competent C++ programmer, but I'm learning, and enjoying the power of Thrift.
I've implemented a Thrift service with some basic functions that return void, i32, and list. I'm using a Python client controlled by a Django web app to make RPC calls, and it works pretty well. The generated code is pretty straightforward, except for list returns:
.thrift description file:
namespace cpp Remote
enum N_PROTO {
N_TCP,
N_UDP,
N_ANY
}
service Rcon {
i32 ping()
i32 KillFlows()
i32 RestartDispatch()
i32 PrintActiveFlows()
i32 PrintActiveListeners(1:i32 proto)
list<string> ListAllFlows()
}
The generated signatures from Rcon.h:
int32_t ping();
int32_t KillFlows();
int32_t RestartDispatch();
int32_t PrintActiveFlows();
int32_t PrintActiveListeners(const int32_t proto);
int64_t ListenerBytesReceived(const int32_t id);
void ListAllFlows(std::vector<std::string> & _return);
As you see, the ListAllFlows() function generated takes a reference to a vector of strings. I guess I expect it to return a vector of strings as laid out in the .thrift description.
I'm wondering if I am meant to provide the function a vector of strings to modify and then Thrift will handle returning it to my client despite the function returning void.
I can find absolutely no resources or example usages of Thrift list<> types in C++. Any guidance would be appreciated.
Ok, I've figured it out. It's pretty simple really.
void ListAllFlows(std::vector<std::string> & _return)
{
for(int x = 0; x < 5; x++)
{
_return.push_back(std::string("hi"));
}
}
Then the Python client just calls it as it looks in the .thrift file:
result = client.ListAllFlows()
print result # ['hi', 'hi', 'hi', 'hi', 'hi']
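The out-parameter convention can be tried without Thrift installed. The sketch below is plain C++ with hypothetical names (RconIfSketch, RconHandlerSketch, invoke_list_all_flows); it only imitates the shape of the generated interface and assumes that the generated server code creates the result container, hands it to the handler by reference, and then sends it back.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical stand-in for the generated service interface.
class RconIfSketch {
public:
    virtual ~RconIfSketch() = default;
    // Same shape as the generated signature: void return, out-parameter.
    virtual void ListAllFlows(std::vector<std::string> &_return) = 0;
};

// The handler fills the vector it is given instead of returning one.
class RconHandlerSketch : public RconIfSketch {
public:
    void ListAllFlows(std::vector<std::string> &_return) override {
        for (int x = 0; x < 5; x++) {
            _return.push_back("hi");
        }
    }
};

// Imitates the caller side: create the result, let the handler fill it,
// then pass it on (a real server would serialize it for the client).
std::vector<std::string> invoke_list_all_flows(RconIfSketch &handler) {
    std::vector<std::string> result;
    handler.ListAllFlows(result);
    return result;
}
```

With a RconHandlerSketch instance h, invoke_list_all_flows(h) yields the five strings the handler pushed, which matches what the Python client prints above.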
Returning a container class directly works, but it has a few drawbacks, which I outlined in more detail in a separate answer.