EDITED:
Me and my colleague are very new to clang and llvm .
I have three functions..
function 1{}
function 2{}
function 3{}
Is there anyway to swap the functions to
function 3{}
function 2{}
function 1{}
using clang libtooling / rewriter and print out the function name and also the parameter inside the function ?
You can parse the AST first using an ASTConsumer. You get one function's AST at a time, you might want to store it globally somewhere and then you can add them to the clang's REWRITTER API and finally dump the buffer back to a file.
This is an example of editing some AST nodes and writing back to file. In your case, you won't edit the AST's but just rearrange the buffer push calls to rearrange the functions.
In VisitFunctionDecl:
bool VisitFunctionDecl(FunctionDecl *f) {
// Only function definitions (with bodies), not declarations.
if (f->hasBody()) {
//store in a global array/vector
}
return true;
}
In main after ParseAST and before writing to file, you will do a rearrangeFunctionDecls.
Related
PROBLEM:
I currently have a traditional module instrumentation pass that
inserts new function calls into a given IR according to some logic
(inserted functions are external from a small lib that is later linked
to given program). Running experiments, my overhead is from
the cost of executing a function call to the library function.
What I am trying to do:
I would like to inline these function bodies into the IR of
the given program to get rid of this bottleneck. I assume an intrinsic
would be a clean way of doing this, since an intrinsic function would
be expanded to its function body when being lowered to ASM (please
correct me if my understanding is incorrect here, this is my first
time working with intrinsics/LTO).
Current Status:
My original library call definition:
void register_my_mem(void *user_vaddr){
... C code ...
}
So far:
I have created a def in: llvm-project/llvm/include/llvm/IR/IntrinsicsX86.td
let TargetPrefix = "x86" in {
def int_x86_register_mem : GCCBuiltin<"__builtin_register_my_mem">,
Intrinsic<[], [llvm_anyint_ty], []>;
}
Added another def in:
otwm/llvm-project/clang/include/clang/Basic/BuiltinsX86.def
TARGET_BUILTIN(__builtin_register_my_mem, "vv*", "", "")
Added my library source (*.c, *.h) to the compiler-rt/lib/test_lib
and added to CMakeLists.txt
Replaced the function insertion with trying to insert the intrinsic
instead in: llvm/lib/Transforms/Instrumentation/myModulePass.cpp
WAS:
FunctionCallee sm_func =
curr_inst->getModule()->getOrInsertFunction("register_my_mem",
func_type);
ArrayRef<Value*> args = {
builder.CreatePointerCast(sm_arg_val, currType->getPointerTo())
};
builder.CreateCall(sm_func, args);
NEW:
Intrinsic::ID aREGISTER(Intrinsic::x86_register_my_mem);
Function *sm_func = Intrinsic::getDeclaration(currFunc->getParent(),
aREGISTER, func_type);
ArrayRef<Value*> args = {
builder.CreatePointerCast(sm_arg_val, currType->getPointerTo())
};
builder.CreateCall(sm_func, args);
Questions:
If my logic for inserting the intrinsic functions shouldnt be a
module pass, where do i put it?
Am I confusing LTO with intrinsics?
Do I put my library function definitions into the following files as mentioned in
http://lists.llvm.org/pipermail/llvm-dev/2017-June/114322.html as for example EmitRegisterMyMem()?
clang/lib/CodeGen/CodeGenFunction.cpp - define llvm::Instrinsic::ID
clang/lib/CodeGen/CodeGenFunction.h - declare llvm::Intrinsic::ID
My LLVM compiles, so it is semantically correct, but currently when
trying to insert this function call, LLVM segfaults saying "Not a valid type for function argument!"
I'm seeing multiple issues here.
Indeed, you're confusing LTO with intrinsics. Intrinsics are special "functions" that are either expanded into special instructions by a backend or lowered to library function calls. This is certainly not something you're going to achieve. You don't need an intrinsic at all, you'd just need to inline the function call in question: either by hands (from your module pass) or via LTO, indeed.
The particular error comes because you're declaring your intrinsic as receiving an integer argument (and this is how the declaration would look like), but:
asking the declaration of variadic intrinsic with invalid type (I'd assume your func_type is a non-integer type)
passing pointer argument
Hope this makes an issue clear.
See also: https://llvm.org/docs/LinkTimeOptimization.html
Thanks you for clearing up the issue #Anton Korobeynikov.
After reading your explanation, I also believe that I have to use LTO to accomplish what I am trying to do. I especially found this link very useful: https://llvm.org/docs/LinkTimeOptimization.html. It seems that I am now on a right path.
I am writing a LLVM pass which prints function name only if it is user-defined (which are defined by the user in the source file).
I cannot find any way to distinguish the user-defined function from the initialization function (or static constructors). I tried checking if the function is just declared or defined, but it does not work as there some init functions are defined (like __cxx_global_var_init).
At pass-time, I know of no way to accomplish what you're trying to do.
That said, Clang provides a way to determine this during initial compilation. See: clang::SourceManager::isInSystemHeader(). You would have to write a Clang plugin or a libTooling-based program to take advantage of this as the information is gone once opt is executed. Here is a contrived example of how to do so using an AST visitor:
bool VisitFunctionDecl(clang::FunctionDecl* funcDecl)
{
if (sourceManager.isInSystemHeader(funcDecl->getLocStart()))
{
return true;
}
}
This question already has answers here:
Listing Unused Symbols
(2 answers)
Closed 7 years ago.
How do I detect function definitions which are never getting called and delete them from the file and then save it?
Suppose I have only 1 CPP file as of now, which has a main() function and many other function definitions (function definition can also be inside main() ). If I were to write a program to parse this CPP file and check whether a function is getting called or not and delete if it is not getting called then what is(are) the way(s) to do it?
There are few ways that come to mind:
I would find out line numbers of beginning and end of main(). I can do it by maintaining a stack of opening and closing braces { and }.
Anything after main would be function definition. Then I can parse for function definitions. To do this I can parse it the following way:
< string >< open paren >< comma separated string(s) for arguments >< closing paren >
Once I have all the names of such functions as described in (2), I can make a map with its names as key and value as a bool, indicating whether a function is getting called once or not.
Finally parse the file once again to check for any calls for functions with their name as in this map. The function call can be from within main or from some other function. The value for the key (i.e. the function name) could be flagged according to whether a function is getting called or not.
I feel I have complicated my logic and it could be done in a smarter way. With the above logic it would be hard to find all the corner cases (there would be many). Also, there could be function pointers to make parsing logic difficult. If that's not enough, the function pointers could be typedefed too.
How do I go about designing my program? Are a map (to maintain filenames) and stack (to maintain braces) the right data structures or is there anything else more suitable to deal with it?
Note: I am not looking for any tool to do this. Nor do I want to use any library (if it exists to make things easy).
I think you should not try to build a C++ parser from scratch, becuse of other said in comments that is really hard. IMHO, you'd better start from CLang libraries, than can do the low-level parsing for you and work directly with the abstract syntax tree.
You could even use crange as an example of how to use them to produce a cross reference table.
Alternatively, you could directly use GNU global, because its gtags command directly generates definition and reference databases that you have to analyse.
IMHO those two ways would be simpler than creating a C++ parser from scratch.
The simplest approach for doing it yourself I can think of is:
Write a minimal parser that can identify functions. It just needs to detect the start and ending line of a function.
Programmatically comment out the first function, save to a temp file.
Try to compile the file by invoking the complier.
Check if there are compile errors, if yes, the function is called, if not, it is unused.
Continue with the next function.
This is a comment, rather than an answer, but I post it here because it's too long for a comment space.
There are lots of issues you should consider. First of all, you should not assume that main() is a first function in a source file.
Even if it is, there should be some functions header declarations before the main() so that the compiler can recognize their invocation in main.
Next, function's opening and closing brace needn't be in separate lines, they also needn't be the only characters in their lines. Generally, almost whole C++ code can be put in a single line!
Furthermore, functions can differ with parameters' types while having the same name (overloading), so you can't recognize which function is called if you don't parse the whole code down to the parameters' types. And even more: you will have to perform type lists matching with standard convertions/casts, possibly considering inline constructors calls. Of course you should not forget default parameters. Google for resolving overloaded function call, for example see an outline here
Additionally, there may be chains of unused functions. For example if a() calls b() and b() calls c() and d(), but a() itself is not called, then the whole four is unused, even though there exist 'calls' to b(), c() and d().
There is also a possibility that functions are called through a pointer, in which case you may be unable to find a call. Example:
int (*testfun)(int) = whattotest ? TestFun1 : TestFun2; // no call
int testResult = testfun(paramToTest); // unknown function called
Finally the code can be pretty obfuscated with #defineās.
Conclusion: you'll probably have to write your own C++ compiler (except the machine code generator) to achieve your goal.
This is a very rough idea and I doubt it's very efficient but maybe it can help you get started. First traverse the file once, picking out any function names (I'm not entirely sure how you would do this). But once you have those names, traverse the file again, looking for the function name anywhere in the file, inside main and other functions too. If you find more than 1 instance it means that the function is being called and should be kept.
I'm trying to experiment with llvm right now. I'd like to use languages that can be compiled to llvm bitcode for scripting. I've managed so far to load an llvm bitcode module and call a function defined in it from my 'internal' c++ code. I've next tried to expose a c++ function from my internal code to the jit'd code - so far in this effort I haven't managed to get anything but SEGFAULT.
My code is as follows. I've tried to create a Function and a global mapping in my execution engine that points to a function I'd like to call.
extern "C" void externGuy()
{
cout << "I'm the extern guy" << endl;
}
void ExposeFunction()
{
std::vector<Type*> NoArgs(0);
FunctionType* FT = FunctionType::get(Type::getVoidTy(getGlobalContext()), NoArgs, false);
Function* fnc = Function::Create(FT, Function::ExternalLinkage, "externGuy", StartModule);
JIT->addGlobalMapping(fnc, (void*)externGuy);
}
// ... Create module, create execution engine
ExposeFunction();
Is the problem that I can't add a function to the module after its been loaded from bitcode file?
Update:
I've refactored my code so that it reads like so instead:
// ... Create module, create execution engine
std::vector<Type*> NoArgs(0);
FunctionType* FT = FunctionType::get(Type::getVoidTy(getGlobalContext()), NoArgs, false);
Function* fnc = Function::Create(FT, Function::ExternalLinkage, "externGuy", m);
fnc->dump();
JIT->addGlobalMapping(fnc, (void*)externGuy);
So instead of segfault I get:
Program used external function 'externGuy' which could not be resolved
Also, the result of dump() prints:
declare void #externGuy1()
If I change my c++ script bitcode thing to call externGuy1() instead of externGuy() it will suggest to me that I meant to use the externGuy. The addGlobalMapping just doesn't seem to be working for me. I'm not sure what I'm missing here. I also added -fPIC to my compilation command like I saw suggested in another question - I'm honestly not sure if its helped anything but no harm in trying.
I finally got this to work. My suspicion is to maybe creating the function, as well as defining it in the script is causing perhaps more than 1 declaration of the functions name and the mapping just wasn't working out. What I did was define the function in the script and then use getFunction to use for mapping instead. Using the dump() method on the function output:
declare void #externGuy() #1
Which is why I think that had something to do with the mapping not working originally. I also remember the Kaleidoscope tutorial saying that getPointerToFunction would do the JIT compiling when its called if it isn't already done so I thought that I would have to do the mapping before this call.
So altogether to get this whole thing working it was as follows:
// Get the function and map it
Function* extrn = m->getFunction("externGuy");
extrn->dump();
JIT->addGlobalMapping(extrn, (void*)&::externGuy);
// Get a pointer to the jit compiled function
Function* mane = m->getFunction("hello");
void* fptr = JIT->getPointerToFunction(mane);
// Make a call to the jit compiled function which contains a call to externGuy
void (*FP)() = (void(*)())(intptr_t)fptr;
FP();
In my interpreter I have built-in functions available in the language like print exit input, etc.
These functions can obviously be accessed from inside the language. The interpreter then looks for the corresponding function with the right name in a vector and calls it via a pointer stored with its name.
So I gather all these functions in files like io.cpp, string.cpp, arithmetic.cpp. But I have to add every function to the function list in the interpreter in order for it to be found.
So in these function files I have things like:
void print( arg )
{
cout << arg.ToString;
}
I'd add this print function to the interpreter function list with:
interpreter.AddFunc( "print", print );
But where should I call the interpreter.AddFunc?
I can't just put it there below the print function as it has to be in a function according to the C++ syntax.
Where and how should all the functions be added to the list?
In each module (io, string, etc.), define a method that registers the module with the interpreter, e.g.:
void IOModule::Register(Interpreter &interpreter) {
interpreter.AddFunc( "print", print );
//...
}
This can also be a normal function if your module is not implemented in a class.
Then in your application's main initialization, call the register method of all modules.
This approach helps keep things modular: The main application initialization needs to know which modules exist, but the details of which functions are exported are left to the module itself.
The simplest is to keep a map of function names to function pointers and load that at program startup. You already have the functions linked into the interpreter executable, so they are accessible at the time main() is called.
You can also come up with a scheme where functions are defiled in the dynamic libraries (.dll or .so depending on the platform) and some configuration file maps function names to libraries/entry points.
Is everything included in every interpreter? If so, I would recommend adding it either in a constructor (assuming the interpreter is an object) or an init method.
If not, you may want to consider adding an "include" type directive in your language. Then you do it when you encounter the include directive.