I need to find all occurrences of a function call in a C++ file using python, and extract the arguments for each call.
I'm playing with the pygccxml package, and extracting the arguments given a string with the function call is extremely easy:
from pygccxml.declarations import call_invocation
def test_is_call_invocation(call):
if call_invocation.is_call_invocation(call):
print call_invocation.name(call)
for arg in call_invocation.args(call):
print " ",arg
else:
print "not a function invocation"
What I couldn't find is a way of getting the calls parsing a file:
from pygccxml import parser
from pygccxml import declarations
decls = parser.parse( ['main.cpp'] )
# ...
Is there a way to find the calls to a certain function using the pygccxml package?
Or maybe that package is an overkill for what I'm trying to do :) and there's a much simpler way? Finding the function calls with a regular expression is, I'm afraid, much trickier than it might look at a first sight...
XML-GCC can't do that, because it only reports the data types (and function signatures). It ignores the function bodies. To see that, create a.cc:
void foo()
{}
void bar()
{
foo();
}
and then run gccxml a.cc -fxml=a.xml. Look at the generated a.xml, to see that the only mentioning of foo (or its id) is in the declaration of foo.
An alternative might be available in codeviz (http://www.csn.ul.ie/~mel/projects/codeviz/). It consists of a patch to gcc 3.4.6 that generates call dependency information - plus some perl scripts that generate graphviz input; the latter you can safely ignore.
As yet another alternative (which doesn't need gcc modifications) you could copy the approach from egypt (http://www.gson.org/egypt/); this parses GCC RTL dumps. It should work with any recent GCC, however, it might be that you don't get calls to inline functions.
In any case, with these approaches, you won't get "calls" to macros, but that might be actually the better choice.
Related
PROBLEM:
I currently have a traditional module instrumentation pass that
inserts new function calls into a given IR according to some logic
(inserted functions are external from a small lib that is later linked
to given program). Running experiments, my overhead is from
the cost of executing a function call to the library function.
What I am trying to do:
I would like to inline these function bodies into the IR of
the given program to get rid of this bottleneck. I assume an intrinsic
would be a clean way of doing this, since an intrinsic function would
be expanded to its function body when being lowered to ASM (please
correct me if my understanding is incorrect here, this is my first
time working with intrinsics/LTO).
Current Status:
My original library call definition:
void register_my_mem(void *user_vaddr){
... C code ...
}
So far:
I have created a def in: llvm-project/llvm/include/llvm/IR/IntrinsicsX86.td
let TargetPrefix = "x86" in {
def int_x86_register_mem : GCCBuiltin<"__builtin_register_my_mem">,
Intrinsic<[], [llvm_anyint_ty], []>;
}
Added another def in:
otwm/llvm-project/clang/include/clang/Basic/BuiltinsX86.def
TARGET_BUILTIN(__builtin_register_my_mem, "vv*", "", "")
Added my library source (*.c, *.h) to the compiler-rt/lib/test_lib
and added to CMakeLists.txt
Replaced the function insertion with trying to insert the intrinsic
instead in: llvm/lib/Transforms/Instrumentation/myModulePass.cpp
WAS:
FunctionCallee sm_func =
curr_inst->getModule()->getOrInsertFunction("register_my_mem",
func_type);
ArrayRef<Value*> args = {
builder.CreatePointerCast(sm_arg_val, currType->getPointerTo())
};
builder.CreateCall(sm_func, args);
NEW:
Intrinsic::ID aREGISTER(Intrinsic::x86_register_my_mem);
Function *sm_func = Intrinsic::getDeclaration(currFunc->getParent(),
aREGISTER, func_type);
ArrayRef<Value*> args = {
builder.CreatePointerCast(sm_arg_val, currType->getPointerTo())
};
builder.CreateCall(sm_func, args);
Questions:
If my logic for inserting the intrinsic functions shouldnt be a
module pass, where do i put it?
Am I confusing LTO with intrinsics?
Do I put my library function definitions into the following files as mentioned in
http://lists.llvm.org/pipermail/llvm-dev/2017-June/114322.html as for example EmitRegisterMyMem()?
clang/lib/CodeGen/CodeGenFunction.cpp - define llvm::Instrinsic::ID
clang/lib/CodeGen/CodeGenFunction.h - declare llvm::Intrinsic::ID
My LLVM compiles, so it is semantically correct, but currently when
trying to insert this function call, LLVM segfaults saying "Not a valid type for function argument!"
I'm seeing multiple issues here.
Indeed, you're confusing LTO with intrinsics. Intrinsics are special "functions" that are either expanded into special instructions by a backend or lowered to library function calls. This is certainly not something you're going to achieve. You don't need an intrinsic at all, you'd just need to inline the function call in question: either by hands (from your module pass) or via LTO, indeed.
The particular error comes because you're declaring your intrinsic as receiving an integer argument (and this is how the declaration would look like), but:
asking the declaration of variadic intrinsic with invalid type (I'd assume your func_type is a non-integer type)
passing pointer argument
Hope this makes an issue clear.
See also: https://llvm.org/docs/LinkTimeOptimization.html
Thanks you for clearing up the issue #Anton Korobeynikov.
After reading your explanation, I also believe that I have to use LTO to accomplish what I am trying to do. I especially found this link very useful: https://llvm.org/docs/LinkTimeOptimization.html. It seems that I am now on a right path.
This question already has answers here:
Listing Unused Symbols
(2 answers)
Closed 7 years ago.
How do I detect function definitions which are never getting called and delete them from the file and then save it?
Suppose I have only 1 CPP file as of now, which has a main() function and many other function definitions (function definition can also be inside main() ). If I were to write a program to parse this CPP file and check whether a function is getting called or not and delete if it is not getting called then what is(are) the way(s) to do it?
There are few ways that come to mind:
I would find out line numbers of beginning and end of main(). I can do it by maintaining a stack of opening and closing braces { and }.
Anything after main would be function definition. Then I can parse for function definitions. To do this I can parse it the following way:
< string >< open paren >< comma separated string(s) for arguments >< closing paren >
Once I have all the names of such functions as described in (2), I can make a map with its names as key and value as a bool, indicating whether a function is getting called once or not.
Finally parse the file once again to check for any calls for functions with their name as in this map. The function call can be from within main or from some other function. The value for the key (i.e. the function name) could be flagged according to whether a function is getting called or not.
I feel I have complicated my logic and it could be done in a smarter way. With the above logic it would be hard to find all the corner cases (there would be many). Also, there could be function pointers to make parsing logic difficult. If that's not enough, the function pointers could be typedefed too.
How do I go about designing my program? Are a map (to maintain filenames) and stack (to maintain braces) the right data structures or is there anything else more suitable to deal with it?
Note: I am not looking for any tool to do this. Nor do I want to use any library (if it exists to make things easy).
I think you should not try to build a C++ parser from scratch, becuse of other said in comments that is really hard. IMHO, you'd better start from CLang libraries, than can do the low-level parsing for you and work directly with the abstract syntax tree.
You could even use crange as an example of how to use them to produce a cross reference table.
Alternatively, you could directly use GNU global, because its gtags command directly generates definition and reference databases that you have to analyse.
IMHO those two ways would be simpler than creating a C++ parser from scratch.
The simplest approach for doing it yourself I can think of is:
Write a minimal parser that can identify functions. It just needs to detect the start and ending line of a function.
Programmatically comment out the first function, save to a temp file.
Try to compile the file by invoking the complier.
Check if there are compile errors, if yes, the function is called, if not, it is unused.
Continue with the next function.
This is a comment, rather than an answer, but I post it here because it's too long for a comment space.
There are lots of issues you should consider. First of all, you should not assume that main() is a first function in a source file.
Even if it is, there should be some functions header declarations before the main() so that the compiler can recognize their invocation in main.
Next, function's opening and closing brace needn't be in separate lines, they also needn't be the only characters in their lines. Generally, almost whole C++ code can be put in a single line!
Furthermore, functions can differ with parameters' types while having the same name (overloading), so you can't recognize which function is called if you don't parse the whole code down to the parameters' types. And even more: you will have to perform type lists matching with standard convertions/casts, possibly considering inline constructors calls. Of course you should not forget default parameters. Google for resolving overloaded function call, for example see an outline here
Additionally, there may be chains of unused functions. For example if a() calls b() and b() calls c() and d(), but a() itself is not called, then the whole four is unused, even though there exist 'calls' to b(), c() and d().
There is also a possibility that functions are called through a pointer, in which case you may be unable to find a call. Example:
int (*testfun)(int) = whattotest ? TestFun1 : TestFun2; // no call
int testResult = testfun(paramToTest); // unknown function called
Finally the code can be pretty obfuscated with #defineās.
Conclusion: you'll probably have to write your own C++ compiler (except the machine code generator) to achieve your goal.
This is a very rough idea and I doubt it's very efficient but maybe it can help you get started. First traverse the file once, picking out any function names (I'm not entirely sure how you would do this). But once you have those names, traverse the file again, looking for the function name anywhere in the file, inside main and other functions too. If you find more than 1 instance it means that the function is being called and should be kept.
Based on the Kaleidoscope and Kaleidoscope with MCJIT tutorials, I have code to create a Module and function and call it using MCJIT. The function needs a prototype:
auto ft = llvm::FunctionType::get(llvm::Type::getInt32Ty(Context), argTypes, false);
However, the example only covers Double as parameters and return values (the above uses an int). To do anything advanced, you need to pass things like classes and containers.
How do you use existing C++ classes in the module?
Sure, you can link to any library you want, but you need to declare function prototypes to use them. If the library API has classes, how do you declare them?
What I want is something like this:
auto ft = llvm::FunctionType::get(llvm::Type::getStructTy("class.std::string"), argTypes, false);
where class.std::string has been imported from string.h.
The LLVM API only has primitive types. You can define structs to represent the classes, but this is way too hard to do manually (and not portable).
A way to do it might be to compile the class to bitcode and read it into a module, but I want to avoid temporary files if possible. Also I'm not sure how to extract the type from the module but it should be possible. I tried this on a header file of one of my classes (I renamed the header file to a cpp file otherwise clang would make into a .gch precompiled header) and the result was just a constant... maybe it was optimised out? I tried it on the cpp file and it resulted in 36000 lines of code...
Then I found this page. Instead of using the LLVM API, I should use the Clang API because Clang, as a compiler, can compile the code into a Module. Then I can use the LLVM API with the imported Modules. Is this the right way to go? Any working source code is appreciated because it took forever just to get function calling working (the tutorials are out of date and documentation is scarce).
The way I would do it is to compile the class to LLVM IR, and then link the two modules. Then, there's two options to extract the type from the module:
First, you can use the llvm::TypeFinder. The way you use it is by creating it, and then calling run() on it with the module as an argument. This code snippet will print out all of the types in the module:
llvm::TypeFinder type_finder;
type_finder.run(module, true);
for (auto t : type_finder) {
std::cout << t->getName().str() << std::endl;
}
Alternatively, it's possible to use Module's getIdentifiedStructTypes() method and iterate over the resulting vector in the same way as above.
Is there a way out to call a function directly from the what the user inputs ?
For example : If the user inputs greet the function named greet is called.
I don't want any cases or comparison for the call to generate.
#include <iostream>
#include<string>
using namespace std;
void nameOfTheFunction(); // prototype
int main() {
string nameOfTheFunction;
getline(cin,nameOfTheFunction); // enter the name of Function
string newString = nameOfTheFunction + "()"; // !!!
cout << newString;
// now call the function nameOfTheFunction
}
void nameOfTheFunction() {
cout << "hello";
}
And is there a concept of generating the function at run time ?
You mean run time function generation ??
NO.
But you can use a map if you already know which all strings a user might give as input (i.e you are limiting the inputs).
For the above you can probably use std::map < std::string, boost::function <... > >
Check boost::function HERE
In short, no this isn't possible. Names in C++ get turned into memory offsets (addresses), and then the names are discarded**. At runtime C++ has no knowledge of the function or method names it's actually running.
** If debug symbols are compiled in, then the symbols are there, but impractical to get access to.
Generating a function at runtime has a lot of drawbacks (if it is possible at all) and there is generally no good reason to do it in a language like C++. You should leave that to scripting languages (like Perl or Python), many offer a eval() function that can interpret a string like script code and execute it.
If you really, really need to do have something like eval() in a compiled language such as C++, you have a few options:
Define your own scripting language and write a parser/interpreter for it (lots of work)
Define a very simple imperative or math language that can be easily parsed and evaluated using well-known design patterns (like Interpreter)
Use an existing scripting language that can be easily integrated into your code through a library (example: Lua)
Stuff the strings of code you want to execute at runtime through an external interpreter or compiler and execute them through the operating system or load them into your program using dlopen/LoadLibrary/etc.
(3.) is probably the easiest and best approach. If you want to keep external dependencies to a minimum or if you need direct access to functionality and state inside your main program, I suggest you should go for (2.) Note that you can have callbacks into your own code in that case, so calling native functions from the script is not a problem. See here for a tutorial
If you can opt for a language like Java or C#, there's also the option to use the compiler built into the runtime itself. Have a look here for how to do this in Java
I just want to ask your ideas regarding this matter. For a certain important reason, I must extract/acquire all function names of functions that were called inside a "main()" function of a C source file (ex: main.c).
Example source code:
int main()
{
int a = functionA(); // functionA must be extracted
int b = functionB(); // functionB must be extracted
}
As you know, the only thing that I can use as a marker/sign to identify these function calls are it's parenthesis "()". I've already considered several factors in implementing this function name extraction. These are:
1. functions may have parameters. Ex: functionA(100)
2. Loop operators. Ex: while()
3. Other operators. Ex: if(), else if()
4. Other operator between function calls with no spaces. Ex: functionA()+functionB()
As of this moment I know what you're saying, this is a pain in the $$$... So please share your thoughts and ideas... and bear with me on this one...
Note: this is in C++ language...
You can write a Small C++ parser by combining FLEX (or LEX) and BISON (or YACC).
Take C++'s grammar
Generate a C++ program parser with the mentioned tools
Make that program count the funcion calls you are mentioning
Maybe a little bit too complicated for what you need to do, but it should certainly work. And LEX/YACC are amazing tools!
One option is to write your own C tokenizer (simple: just be careful enough to skip over strings, character constants and comments), and to write a simple parser, which counts the number of {s open, and finds instances of identifier + ( within. However, this won't be 100% correct. The disadvantage of this option is that it's cumbersome to implement preprocessor directives (e.g. #include and #define): there can be a function called from a macro (e.g. getchar) defined in an #include file.
An option that works for 100% is compiling your .c file to an assembly file, e.g. gcc -S file.c, and finding the call instructions in the file.S. A similar option is compiling your .c file to an object file, e.g, gcc -c file.c, generating a disassembly dump with objdump -d file.o, and searching for call instructions.
Another option is finding a parser using Clang / LLVM.
gnu cflow might be helpful