Outputting an input's constant char array from an LLVM pass

Hi all,
I want to know how an LLVM pass can output a constant char array defined in the input
source. Here's an example of what I want to do.
Test input source
#include <stdio.h>
char* msg = "hello, world\n";
void msg_out(char * in) {
printf("msg: %s \n", in);
}
int main() {
...
msg_out(msg);
...
}
LLVM pass snippet
...
if (const CallInst* ci = dyn_cast<CallInst>(val)) {
  const Function* func = ci->getCalledFunction();  // null for indirect calls
  if (func && func->getName() == "msg_out") {
    errs() << *ci->getOperand(0) << "\n";  // *: print the IR, not the pointer's address
  }
}
...
With this source, the above pass prints the following.
output
i8* getelementptr inbounds ([8 x i8]* @10, i32 0, i32 0)
However, what I want to implement instead is to:
identify whether the 1st argument is a constant character array, and
if so, print out "hello, world\n".
Can anyone let me know how to implement this?
Thanks a lot for your help in advance!
/Kangkook

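First of all, the first argument isn't a constant character array; it's a pointer to one, hence the getelementptr (gep). In any case, the proper way to do this is to look through the gep to its pointer operand, verify that it is a global variable, and then get its initializer. In your case the gep is a constant expression; matching it with GEPOperator (from llvm/IR/Operator.h) handles both constant-expression and instruction geps, whereas GetElementPtrConstantExpr is an internal class that passes cannot include. It should look like this: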
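Value* op0 = ci->getOperand(0);
// Look through the gep (constant expression or instruction) to the pointer it indexes.
// With opaque pointers (LLVM 15+) there is usually no gep and op0 is already the global.
if (GEPOperator* gep = dyn_cast<GEPOperator>(op0))
  op0 = gep->getPointerOperand();
if (GlobalVariable* global = dyn_cast<GlobalVariable>(op0)) {
  // Declarations have no initializer; getInitializer() would assert on them.
  if (global->hasInitializer()) {
    if (ConstantDataArray* array = dyn_cast<ConstantDataArray>(global->getInitializer())) {
      if (array->isCString()) return array->getAsCString();  // StringRef without the trailing NUL
    }
  }
}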
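Why the isCString() check matters: it accepts only an array whose last element is a NUL and that contains no other NUL, and getAsCString() then returns the bytes before that terminator. The sketch below models that rule on a plain byte vector; asCString is a hypothetical helper for illustration, not part of the LLVM API.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical model (not LLVM API) of ConstantDataArray::isCString() plus
// getAsCString(): succeeds only if the last byte is the sole NUL.
bool asCString(const std::vector<char>& bytes, std::string* out) {
  if (bytes.empty() || bytes.back() != '\0')
    return false;                      // no terminating NUL
  for (std::size_t k = 0; k + 1 < bytes.size(); ++k)
    if (bytes[k] == '\0')
      return false;                    // embedded NUL: not a single C string
  out->assign(bytes.begin(), bytes.end() - 1);  // drop the terminator
  return true;
}
```

For the question's global, the initializer bytes are "hello, world\n" followed by one NUL, so the check passes and the text without the NUL is returned.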

Related

Writing LLVM int/string input

I am trying to generate LLVM IR from an AST.
To display integer output I added:
Constant *CalleeF = TheModule->getOrInsertFunction("printf", FunctionType::get(IntegerType::getInt32Ty(Context), PointerType::get(Type::getInt8Ty(Context), 0), true));
And in the code generation for the print statement I wrote:
Value* PrintStmt::codegen(){
  Value* V = nullptr;
  for (unsigned int i = 0, e = outs.size(); i != e; ++i){
    vector<Value *> ArgsV;  // fresh argument list for each printf call
    Value* to_print = outs[i]->codegen();
    Value* val = nullptr;
    if(outs[i]->type=="int"){
      val=Builder.CreateGlobalStringPtr("%d");
    }
    // (format strings for other types omitted)
    ArgsV.push_back(val);
    ArgsV.push_back(to_print);
    V = Builder.CreateCall(CalleeF, ArgsV, "printfCall");
  }
  return V;
}
What similar code should I write to get input from the user, i.e. for a scanf call?
For a scanf call you could first declare its prototype:
llvm::FunctionType *readFnType = llvm::FunctionType::get(builder.getInt32Ty(), true);
llvm::Function* readfn = llvm::Function::Create(readFnType, llvm::GlobalValue::ExternalLinkage, "scanf", TheModule);
And call it like so:
std::vector<Value*> ArgsV;   // holds codegen IR for each argument
std::string StringFormat;    // holds string formatting for all arguments
for(auto& arg : Args){
  if(auto v = arg->codegen()){
    if(v->getType()->isDoubleTy())
      StringFormat += "%lf ";
    else if(v->getType()->isIntegerTy(32))
      StringFormat += "%d ";
    else
      return nullptr;  // no conversion specifier for this type
    // scanf needs the variable's address, not its value
    ArgsV.push_back(symbolTable[arg->name]);
  }else return nullptr;
}
ArgsV.insert(ArgsV.begin(), builder.CreateGlobalStringPtr(StringFormat));
return builder.CreateCall(TheModule->getFunction("scanf"), ArgsV, "scanfCall");
Here symbolTable is a map from variable/argument names to a Value* holding the variable's stack-allocated address (its alloca). Recall that scanf takes the address of the variable to be written to, which explains the symbol table lookup.
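To make the address passing concrete, this is roughly what the generated call does at run time for one integer and one double argument. It is a sketch in plain C++: sscanf on a fixed string stands in for scanf so the result can be checked, and readIntDouble is a hypothetical name.

```cpp
#include <cassert>
#include <cstdio>
#include <string>

// Hypothetical runtime equivalent of the generated scanf call for the
// argument types (i32, double).
bool readIntDouble(const char* input, int* i, double* d) {
  std::string format;
  format += "%d ";   // integer argument
  format += "%lf ";  // double argument: %lf, because scanf writes a double
  // i and d are addresses, playing the role of the allocas in symbolTable.
  return std::sscanf(input, format.c_str(), i, d) == 2;  // 2 items converted
}
```

Checking the return value against the number of arguments is what tells you whether every variable was actually written.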
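It is also worth mentioning that scanf is easy to use unsafely: a %s conversion without a field width can overflow the destination buffer, and a failed conversion leaves the variable unchanged unless you check the return value. Consider reading a line with fgets and parsing it afterwards instead; do not use gets, which cannot limit its input and was removed in C11.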
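A minimal sketch of the fgets route, assuming a single integer per line: fgets bounds the read by the buffer size and strtol reports where parsing stopped. readInt is a hypothetical helper; it takes a FILE* so it can be used with stdin or with any other stream.

```cpp
#include <cassert>
#include <cerrno>
#include <cstdio>
#include <cstdlib>

// Hypothetical helper: reads one line (at most 63 characters) and parses a
// base-10 integer. Returns false on EOF, no digits, overflow or trailing text.
bool readInt(std::FILE* in, long* out) {
  char line[64];
  if (std::fgets(line, sizeof line, in) == nullptr)
    return false;                        // EOF or read error
  errno = 0;
  char* end = nullptr;
  long value = std::strtol(line, &end, 10);
  if (end == line || errno == ERANGE)
    return false;                        // no digits, or out of range
  while (*end == ' ' || *end == '\t' || *end == '\n')
    ++end;                               // allow trailing whitespace
  if (*end != '\0')
    return false;                        // trailing garbage such as "12abc"
  *out = value;
  return true;
}
```

Called as readInt(stdin, &value), it plays the role of a scanf("%ld", &value) call but rejects malformed lines explicitly.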

Traversal of LLVM Operands

Using a ModulePass, my goal is to traverse an SSA graph upwards: going from one statement (instruction) with 0..2 operands (most opcodes fall under that), I want to find out two things:
Is an operand metadata / a constant (easy: just try casting to Constant) or a variable?
If it is a variable, get me the statement where it is defined (as LLVM IR is in SSA form, this is a well-formed query), so I can recursively continue this traversal.
As an example, assume the following LLVM IR:
define i32 @mul_add(i32 %x, i32 %y, i32 %z) {
entry:
%tmp = mul i32 %x, %y
%tmp2 = add i32 %tmp, %z
ret i32 %tmp2
}
I will be starting with the return statement. Now I want to find out what I'm returning:
I will get myself the operands of the return statement
I detect that the operand is a variable called %tmp2
I will get myself the operands of the statement where %tmp2 is defined
I will first traverse the first operand, %tmp
(...)
%x is a function parameter and therefore probably the end of my traversal (or is it?)
(... continue with the other branches in this DFS ...)
How would I use the C++ API to implement those steps?
The solution is simple, and I will describe it as generically as possible.
Every operand of an llvm::Instruction is an llvm::Value, and llvm::Instruction is itself a subclass of Value. This allows recursion via the following methods:
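// Both methods and this set are members of the pass; clear() the set before each new traversal.
llvm::SmallPtrSet<llvm::Instruction *, 32> visited;

bool runOnInstruction(llvm::Instruction *instruction) {
  // Visit each instruction once; PHI nodes in loops make the use-def graph cyclic.
  if (!visited.insert(instruction).second)
    return false;
  bool flag = false;
  for (llvm::Use &operand : instruction->operands()) {
    printOperandStats(operand.get());  // user-defined helper, not shown
    // bitwise | so that every operand is visited (no short-circuit)
    flag = runOnOperand(operand.get()) | flag;
  }
  return flag;
}

bool runOnOperand(llvm::Value *operand) {
  operand->printAsOperand(llvm::errs(), true);
  // ... do something else ...
  // Only instructions have operands to recurse into; arguments,
  // constants and globals end this branch of the DFS.
  if (auto *instruction = llvm::dyn_cast<llvm::Instruction>(operand))
    return runOnInstruction(instruction);
  return false;
}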
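This is a DFS, as required by the question: every operand that is an Instruction is analyzed recursively, while operands that are not (constants, function arguments, globals, basic blocks) end that branch. Because PHI nodes in loops can make the use-def graph cyclic, the traversal has to remember which instructions it has already visited in order to terminate. The boolean return value is used as usual for LLVM passes: true means the IR was modified.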
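The same traversal can be tried outside LLVM on a toy use-def graph. In this sketch Node is a hypothetical stand-in for llvm::Value (arguments and constants are leaves, instructions list their operands) and the seen set plays the role of the visited set; walkLoop builds a loop in which a PHI-like node and an add use each other.

```cpp
#include <cassert>
#include <set>
#include <string>
#include <vector>

// Hypothetical stand-in for llvm::Value.
struct Node {
  std::string name;
  bool isInstruction;           // false for arguments and constants
  std::vector<Node*> operands;  // values this instruction uses
};

// DFS up the use-def chain, recording each value once.
void visit(Node* n, std::set<Node*>& seen, std::vector<std::string>& order) {
  if (!seen.insert(n).second) return;  // already visited: stops PHI cycles
  order.push_back(n->name);
  if (!n->isInstruction) return;       // argument or constant ends the branch
  for (Node* op : n->operands) visit(op, seen, order);
}

std::string walk(Node* root) {
  std::set<Node*> seen;
  std::vector<std::string> order;
  visit(root, seen, order);
  std::string joined;
  for (const std::string& s : order) joined += (joined.empty() ? "" : " ") + s;
  return joined;
}

// @mul_add: %tmp = mul %x, %y; %tmp2 = add %tmp, %z; ret %tmp2
std::string walkMulAdd() {
  Node x{"%x", false, {}}, y{"%y", false, {}}, z{"%z", false, {}};
  Node tmp{"%tmp", true, {&x, &y}};
  Node tmp2{"%tmp2", true, {&tmp, &z}};
  Node ret{"ret", true, {&tmp2}};
  return walk(&ret);
}

// %i = phi [0, %entry], [%inc, %loop]; %inc = add %i, 1
std::string walkLoop() {
  Node zero{"0", false, {}}, one{"1", false, {}};
  Node phi{"%i", true, {}};
  Node inc{"%inc", true, {&phi, &one}};
  phi.operands = {&zero, &inc};  // closes the cycle
  return walk(&inc);
}
```

walkMulAdd() yields "ret %tmp2 %tmp %x %y %z", the order described in the question, and walkLoop() yields "%inc %i 0 1": it terminates because %inc is not entered a second time.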

LLVM How to get return value of an instruction

I have a program which allocates memory on the stack like this:
%x = alloca i32, align 4
In my pass I want to get the actual memory pointer that points to this allocated memory at runtime. This should be %x. How do I get the pointer in my pass?
Instruction* I;
if (AllocaInst* AI = dyn_cast<AllocaInst>(I)) {
//How to get %x?
}
You can work with an Instruction* as a Value* (Instruction inherits from Value); when you do, you are working with the result / return value of that instruction, so inside the if above, AI itself is %x. I have adapted some code from my LLVM pass to demonstrate allocating space using alloca and then storing into that location. Notice that the results of the instructions can be passed directly to other instructions, as they are values.
// M is the module
// ci is the current instruction
LLVMContext &ctx = M.getContext();
Type* int32Ty = Type::getInt32Ty(ctx);
Type* int8Ty = Type::getInt8Ty(ctx);
Type* voidPtrTy = int8Ty->getPointerTo();
// Get an identifier for rand()
Constant* getRand = M.getOrInsertFunction("rand", FunctionType::get(int32Ty, false));  // LLVM 9+ returns a FunctionCallee instead
// Construct the struct and allocate space
Type* strTy[] = {int32Ty, voidPtrTy};
Type* t = StructType::create(strTy);
Instruction* nArg = new AllocaInst(t, "Wrapper Struct", ci);
// Add Store insts here
Value* gepArgs[2] = {ConstantInt::get(int32Ty, 0), ConstantInt::get(int32Ty, 0)};
Instruction* prand = GetElementPtrInst::Create(t, nArg, ArrayRef<Value*>(gepArgs, 2), "RandPtr", ci);  // t: the type being indexed
// Get a random number
Instruction* tRand = CallInst::Create(getRand, "", ci);
// Store the random number into the struct
Instruction* stPRand = new StoreInst(tRand, prand, ci);
If you want to store to or load from %x, just use a store or load instruction with AI as the pointer operand.
If you want the numeric value of your pointer, use the ptrtoint instruction.
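For intuition, the IR built above corresponds roughly to the following C++. This is a sketch, and WrapperStruct, buildWrapper and addressOf are hypothetical names: the alloca is a local variable, the gep with indices 0, 0 is the address of the first field, and ptrtoint corresponds to converting a pointer to an integer type such as std::uintptr_t.

```cpp
#include <cassert>
#include <cstdint>
#include <cstdlib>

// Hypothetical C++ counterpart of the IR built above.
struct WrapperStruct {  // StructType {i32, i8*}
  std::int32_t value;
  void* ptr;
};

std::int32_t buildWrapper() {
  WrapperStruct w;                   // alloca: uninitialized stack slot
  std::int32_t* randPtr = &w.value;  // getelementptr w, 0, 0
  std::int32_t r = std::rand();      // call i32 @rand()
  *randPtr = r;                      // store r into the first field
  return w.value;                    // load it back to observe the store
}

// ptrtoint: the numeric value of a pointer.
std::uintptr_t addressOf(const void* p) {
  return reinterpret_cast<std::uintptr_t>(p);
}
```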

How strtok_r function return values?

I am doing component testing of some C code. I have read up on how the strtok_r function works, but I am not getting the return value that I want to pass to the strncmp function. My code contains the strtok_r and strncmp calls below:
typedef struct BufferN {
uint32_t v;
uint32_t m;
} My_Buffer;
char subsystemstr[64] = { '\0' };
My_Buffer buffer;
char *p_system;
char *p_subsystem;
(void) GetString(&buffer, subsystemstr, sizeof(subsystemstr));
p_system = strtok_r (subsystemstr, ":", &p_subsystem);
for (i = 0u; i < 100; i++)
{
if (strncmp(p_system, "all", 64) == 0)
{
/*Some Code Statement*/
}
}
Since the array subsystemstr is initialized to '\0', I modify it with the help of the function GetString as below:
strncpy(subsystemstr, "all:", 64);
When I print subsystemstr, I see the updated array:
["all:", '\0' <repeats 59 times>]
but when I print p_system (the return value of strtok_r), I get
[0x388870 ""]
I am confused about how this works. What I want is p_system = "all", so that strncmp returns 0.
Please suggest.
I suspect a misunderstanding of what
p p_system
actually does (it prints the address held in p_system).
In gdb, to look at the characters it points to, the command would be
p *p_system
(which shows only the first character), or, using the built-in printf command,
printf "%s", p_system
or, using the C function,
call printf("%s", p_system)
or
call (void)puts(p_system)
or, if you do not mind also seeing some address values,
x /s p_system
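Running the same calls outside the debugger is a quick way to check the tokenizing itself. This sketch assumes a POSIX system (strtok_r is POSIX, not ISO C++), and firstToken is a hypothetical helper that copies its input into a 64-byte buffer like subsystemstr; strtok_r returns NULL when the buffer holds no token, which the helper maps to an empty string.

```cpp
#include <cassert>
#include <cstring>
#include <string>

// Hypothetical helper: returns the text before the first ':' the way the
// question's code computes p_system, or "" if strtok_r returns NULL.
std::string firstToken(const char* input) {
  char buf[64] = {'\0'};
  std::strncpy(buf, input, sizeof(buf) - 1);  // always leaves a final NUL
  char* save = nullptr;
  char* tok = strtok_r(buf, ":", &save);      // POSIX, declared in <string.h>
  return tok != nullptr ? std::string(tok) : std::string();
}
```

With "all:" as input the token is "all", so the strncmp(..., "all", 64) comparison returns 0.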

Python ctypes: initializing c_char_p()

I wrote a simple C++ program to illustrate my problem:
#include <cstring>
extern "C"{
int test(int, char*);
}
int test(int i, char* var){
if (i == 1){
strcpy(var,"hi");
}
return 1;
}
I compile this into a .so. From Python I call:
from ctypes import *
libso = CDLL("Debug/libctypesTest.so")
func = libso.test
func.restype = c_int
for i in xrange(5):
    charP = c_char_p('bye')
    func(i,charP)
    print charP.value
When I run this, my output is:
bye
hi
hi
hi
hi
I expected:
bye
hi
bye
bye
bye
What am I missing?
Thanks.
The string which you initialized with the characters "bye", and whose address you keep taking and assigning to charP, does not get re-initialized after the first time.
Follow the advice in the ctypes documentation:
You should be careful, however, not to pass them to functions expecting pointers to mutable memory. If you need mutable memory blocks, ctypes has a create_string_buffer function which creates these in various ways.
A "pointer to mutable memory" is exactly what your C function expects, and so you should use the create_string_buffer function to create that buffer, as the docs explain.
I am guessing Python is reusing the same buffer for all 5 passes: once you set it to "hi", you never set it back to "bye". You can do something like this:
#include <cstring>
extern "C"{
int test(int, char*);
}
int test(int i, char* var){
if (i == 1){
strcpy(var,"hi");
} else {
strcpy(var, "bye");
}
return 1;
}
But be careful: strcpy is just asking for a buffer overflow.
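To avoid the overflow risk, the function can take the buffer size and write with snprintf, which never writes more than that many bytes, terminator included. test_n is a hypothetical variant of test with an extra length parameter; from Python it would be called with a buffer from ctypes.create_string_buffer and that buffer's length, as the ctypes documentation quoted above recommends.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdio>
#include <cstring>

extern "C" {
int test_n(int i, char* var, std::size_t len);
}

// Hypothetical bounded variant of test(): like the answer's version it
// writes "hi" for i == 1 and "bye" otherwise, but never more than len
// bytes (terminating NUL included), truncating if the buffer is small.
int test_n(int i, char* var, std::size_t len) {
  if (var == nullptr || len == 0)
    return 0;  // nothing can be written
  std::snprintf(var, len, "%s", i == 1 ? "hi" : "bye");
  return 1;
}
```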