I am writing a LLVM pass which prints function name only if it is user-defined (which are defined by the user in the source file).
I cannot find any way to distinguish the user-defined function from the initialization function (or static constructors). I tried checking if the function is just declared or defined, but it does not work as there some init functions are defined (like __cxx_global_var_init).
At pass-time, I know of no way to accomplish what you're trying to do.
That said, Clang provides a way to determine this during initial compilation. See: clang::SourceManager::isInSystemHeader(). You would have to write a Clang plugin or a libTooling-based program to take advantage of this as the information is gone once opt is executed. Here is a contrived example of how to do so using an AST visitor:
bool VisitFunctionDecl(clang::FunctionDecl* funcDecl)
{
if (sourceManager.isInSystemHeader(funcDecl->getLocStart()))
{
return true;
}
}
Related
PROBLEM:
I currently have a traditional module instrumentation pass that
inserts new function calls into a given IR according to some logic
(inserted functions are external from a small lib that is later linked
to given program). Running experiments, my overhead is from
the cost of executing a function call to the library function.
What I am trying to do:
I would like to inline these function bodies into the IR of
the given program to get rid of this bottleneck. I assume an intrinsic
would be a clean way of doing this, since an intrinsic function would
be expanded to its function body when being lowered to ASM (please
correct me if my understanding is incorrect here, this is my first
time working with intrinsics/LTO).
Current Status:
My original library call definition:
void register_my_mem(void *user_vaddr){
... C code ...
}
So far:
I have created a def in: llvm-project/llvm/include/llvm/IR/IntrinsicsX86.td
let TargetPrefix = "x86" in {
def int_x86_register_mem : GCCBuiltin<"__builtin_register_my_mem">,
Intrinsic<[], [llvm_anyint_ty], []>;
}
Added another def in:
otwm/llvm-project/clang/include/clang/Basic/BuiltinsX86.def
TARGET_BUILTIN(__builtin_register_my_mem, "vv*", "", "")
Added my library source (*.c, *.h) to the compiler-rt/lib/test_lib
and added to CMakeLists.txt
Replaced the function insertion with trying to insert the intrinsic
instead in: llvm/lib/Transforms/Instrumentation/myModulePass.cpp
WAS:
FunctionCallee sm_func =
curr_inst->getModule()->getOrInsertFunction("register_my_mem",
func_type);
ArrayRef<Value*> args = {
builder.CreatePointerCast(sm_arg_val, currType->getPointerTo())
};
builder.CreateCall(sm_func, args);
NEW:
Intrinsic::ID aREGISTER(Intrinsic::x86_register_my_mem);
Function *sm_func = Intrinsic::getDeclaration(currFunc->getParent(),
aREGISTER, func_type);
ArrayRef<Value*> args = {
builder.CreatePointerCast(sm_arg_val, currType->getPointerTo())
};
builder.CreateCall(sm_func, args);
Questions:
If my logic for inserting the intrinsic functions shouldnt be a
module pass, where do i put it?
Am I confusing LTO with intrinsics?
Do I put my library function definitions into the following files as mentioned in
http://lists.llvm.org/pipermail/llvm-dev/2017-June/114322.html as for example EmitRegisterMyMem()?
clang/lib/CodeGen/CodeGenFunction.cpp - define llvm::Instrinsic::ID
clang/lib/CodeGen/CodeGenFunction.h - declare llvm::Intrinsic::ID
My LLVM compiles, so it is semantically correct, but currently when
trying to insert this function call, LLVM segfaults saying "Not a valid type for function argument!"
I'm seeing multiple issues here.
Indeed, you're confusing LTO with intrinsics. Intrinsics are special "functions" that are either expanded into special instructions by a backend or lowered to library function calls. This is certainly not something you're going to achieve. You don't need an intrinsic at all, you'd just need to inline the function call in question: either by hands (from your module pass) or via LTO, indeed.
The particular error comes because you're declaring your intrinsic as receiving an integer argument (and this is how the declaration would look like), but:
asking the declaration of variadic intrinsic with invalid type (I'd assume your func_type is a non-integer type)
passing pointer argument
Hope this makes an issue clear.
See also: https://llvm.org/docs/LinkTimeOptimization.html
Thanks you for clearing up the issue #Anton Korobeynikov.
After reading your explanation, I also believe that I have to use LTO to accomplish what I am trying to do. I especially found this link very useful: https://llvm.org/docs/LinkTimeOptimization.html. It seems that I am now on a right path.
I am trying to find all places in a large and old code base where certain constructors or functions are called. Specifically, these are certain constructors and member functions in the std::string class (that is, basic_string<char>). For example, suppose there is a line of code:
std::string foo(fiddle->faddle(k, 9).snark);
In this example, it is not obvious looking at this that snark may be a char *, which is what I'm interested in.
Attempts To Solve This So Far
I've looked into some of the dump features of gcc, and generated some of them, but I haven't been able to find any that tell me that the given line of code will generate a call to the string constructor taking a const char *. I've also compiled some code with -s to save the generated equivalent assembly code. But this suffers from two things: the function names are "mangled," so it's impossible to know what is being called in C++ terms; and there are no line numbers of any sort, so even finding the equivalent place in the source file would be tough.
Motivation and Background
In my project, we're porting a large, old code base from HP-UX (and their aCC C++ compiler) to RedHat Linux and gcc/g++ v.4.8.5. The HP tool chain allowed one to initialize a string with a NULL pointer, treating it as an empty string. The Gnu tools' generated code fails with some flavor of a null dereference error. So we need to find all of the potential cases of this, and remedy them. (For example, by adding code to check for NULL and using a pointer to a "" string instead.)
So if anyone out there has had to deal with the base problem and can offer other suggestions, those, too, would be welcomed.
Have you considered using static analysis?
Clang has one called clang analyzer that is extensible.
You can write a custom plugin that checks for this particular behavior by implementing a clang ast visitor that looks for string variable declarations and checks for setting it to null.
There is a manual for that here.
See also: https://github.com/facebook/facebook-clang-plugins/blob/master/analyzer/DanglingDelegateFactFinder.cpp
First I'd create a header like this:
#include <string>
class dbg_string : public std::string {
public:
using std::string::string;
dbg_string(const char*) = delete;
};
#define string dbg_string
Then modify your makefile and add "-include dbg_string.h" to cflags to force include on each source file without modification.
You could also check how is NULL defined on your platform and add specific overload for it (eg. dbg_string(int)).
You can try CppDepend and its CQLinq a powerful code query language to detect where some contructors/methods/fields/types are used.
from m in Methods where m.IsUsing ("CClassView.CClassView()") select new { m, m.NbLinesOfCode }
EDITED:
Me and my colleague are very new to clang and llvm .
I have three functions..
function 1{}
function 2{}
function 3{}
Is there anyway to swap the functions to
function 3{}
function 2{}
function 1{}
using clang libtooling / rewriter and print out the function name and also the parameter inside the function ?
You can parse the AST first using an ASTConsumer. You get one function's AST at a time, you might want to store it globally somewhere and then you can add them to the clang's REWRITTER API and finally dump the buffer back to a file.
This is an example of editing some AST nodes and writing back to file. In your case, you won't edit the AST's but just rearrange the buffer push calls to rearrange the functions.
In VisitFunctionDecl:
bool VisitFunctionDecl(FunctionDecl *f) {
// Only function definitions (with bodies), not declarations.
if (f->hasBody()) {
//store in a global array/vector
}
return true;
}
In main after ParseAST and before writing to file, you will do a rearrangeFunctionDecls.
This question already has answers here:
Listing Unused Symbols
(2 answers)
Closed 7 years ago.
How do I detect function definitions which are never getting called and delete them from the file and then save it?
Suppose I have only 1 CPP file as of now, which has a main() function and many other function definitions (function definition can also be inside main() ). If I were to write a program to parse this CPP file and check whether a function is getting called or not and delete if it is not getting called then what is(are) the way(s) to do it?
There are few ways that come to mind:
I would find out line numbers of beginning and end of main(). I can do it by maintaining a stack of opening and closing braces { and }.
Anything after main would be function definition. Then I can parse for function definitions. To do this I can parse it the following way:
< string >< open paren >< comma separated string(s) for arguments >< closing paren >
Once I have all the names of such functions as described in (2), I can make a map with its names as key and value as a bool, indicating whether a function is getting called once or not.
Finally parse the file once again to check for any calls for functions with their name as in this map. The function call can be from within main or from some other function. The value for the key (i.e. the function name) could be flagged according to whether a function is getting called or not.
I feel I have complicated my logic and it could be done in a smarter way. With the above logic it would be hard to find all the corner cases (there would be many). Also, there could be function pointers to make parsing logic difficult. If that's not enough, the function pointers could be typedefed too.
How do I go about designing my program? Are a map (to maintain filenames) and stack (to maintain braces) the right data structures or is there anything else more suitable to deal with it?
Note: I am not looking for any tool to do this. Nor do I want to use any library (if it exists to make things easy).
I think you should not try to build a C++ parser from scratch, becuse of other said in comments that is really hard. IMHO, you'd better start from CLang libraries, than can do the low-level parsing for you and work directly with the abstract syntax tree.
You could even use crange as an example of how to use them to produce a cross reference table.
Alternatively, you could directly use GNU global, because its gtags command directly generates definition and reference databases that you have to analyse.
IMHO those two ways would be simpler than creating a C++ parser from scratch.
The simplest approach for doing it yourself I can think of is:
Write a minimal parser that can identify functions. It just needs to detect the start and ending line of a function.
Programmatically comment out the first function, save to a temp file.
Try to compile the file by invoking the complier.
Check if there are compile errors, if yes, the function is called, if not, it is unused.
Continue with the next function.
This is a comment, rather than an answer, but I post it here because it's too long for a comment space.
There are lots of issues you should consider. First of all, you should not assume that main() is a first function in a source file.
Even if it is, there should be some functions header declarations before the main() so that the compiler can recognize their invocation in main.
Next, function's opening and closing brace needn't be in separate lines, they also needn't be the only characters in their lines. Generally, almost whole C++ code can be put in a single line!
Furthermore, functions can differ with parameters' types while having the same name (overloading), so you can't recognize which function is called if you don't parse the whole code down to the parameters' types. And even more: you will have to perform type lists matching with standard convertions/casts, possibly considering inline constructors calls. Of course you should not forget default parameters. Google for resolving overloaded function call, for example see an outline here
Additionally, there may be chains of unused functions. For example if a() calls b() and b() calls c() and d(), but a() itself is not called, then the whole four is unused, even though there exist 'calls' to b(), c() and d().
There is also a possibility that functions are called through a pointer, in which case you may be unable to find a call. Example:
int (*testfun)(int) = whattotest ? TestFun1 : TestFun2; // no call
int testResult = testfun(paramToTest); // unknown function called
Finally the code can be pretty obfuscated with #defineās.
Conclusion: you'll probably have to write your own C++ compiler (except the machine code generator) to achieve your goal.
This is a very rough idea and I doubt it's very efficient but maybe it can help you get started. First traverse the file once, picking out any function names (I'm not entirely sure how you would do this). But once you have those names, traverse the file again, looking for the function name anywhere in the file, inside main and other functions too. If you find more than 1 instance it means that the function is being called and should be kept.
I am unable to use lldb to invoke simple, non-templated functions that take string arguments. Is there any way to get lldb to understand the C++ datatype "string", which is a commonly used datatype in C++ programs?
The sample source code here just creates a simple class with a few constructors, and then calls them (includes of "iostream" and "string" omitted):
using namespace std;
struct lldbtest{
int bar=5;
lldbtest(){bar=6;}
lldbtest(int foo){bar=foo;}
lldbtest(string fum){bar=7;}
};
int main(){
string name="fum";
lldbtest x,y(3);
cout<<x.bar<<y.bar<<endl;
return 0;
}
When compiled on Mac Maverick with
g++ -g -std=c++11 -o testconstructor testconstructor.cpp
the program runs and prints the expected output of "63".
However, when a breakpoint is set in main just before the return statement, and attempt to invoke the constructor fails with a cryptic error message:
p lldbtest(string("hey there"))
error: call to a function 'lldbtest::lldbtest(std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> >)' ('_ZN8lldbtestC1ENSt3__112basic_stringIcNS0_11char_traitsIcEENS0_9allocatorIcEEEE') that is not present in the target
error: The expression could not be prepared to run in the target
Possibly relevant as well, the command:
p lldbtest(name)
prints nothing at all.
Also, calling the constructor with a string literal also failed, the standard way:
p lldbtest("foo")
gives a similar long error:
error: call to a function
'lldbtest::lldbtest(std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> >)' ('_ZN8lldbtestC1ENSt3__112basic_stringIcNS0_11char_traitsIcEENS0_9allocatorIcEEEE') that is not present in the targeterror: The expression could not be prepared to run in the target
Is there any way to get lldb to understand and use the C++ "string" datatype? I have a number of functions taking string arguments and need a way to invoke these functions from the debugger. On a Mac.
THE PROBLEM
This is due to a subtle problem with your code, that boils down to the following wording from the C++ Standard:
7.1.2p3-4 Function specifiers [dcl.fct.spec]
A function defined within a class definition is an inline function.
...
An inline function shall be defined in every translation unit in which it is odr-used, and shall have exactly the same definition in every case (3.2).
Your constructor, lldbtest(std::string) is defined within the body of lldbtest which means that it will implicitly be inline, which further means that the compiler will not generate any code for it, unless it is used in the translation unit.
Since the definition must be present in every translation unit that potentially calls it we can imagine the compiler saying; "heck, I don't need to do this.. if someone else uses it, they will generate the code".
lldb will look for a function definition which doesn't exist, since gcc didn't generate one; because you didn't use it.
THE SOLUTION
If we change the definition of lldbtest to the following I bet it will work as you intended:
struct lldbtest{
int bar=5;
lldbtest();
lldbtest(int foo);
lldbtest(string fum);
};
lldbtest::lldbtest() { bar=6; }
lldbtest::lldbtest(int) { bar=7; }
lldbtest::lldbtest(string) { bar=8; }
But.. what about p lldbtest(name)?
The command p in lldb is used to print* information, but it can also be used to evaluate expressions.
lldbtest(name) will not call the constructor of lldbtest with a variable called name, it's equivalent of declaring a variable called name of type lldbtest; ie. lldbtest name is sementically equivalent.
Going to answer the asked question here instead of addressing the problem with the op's code. Especially since this took me a while to figure out.
Use a string in a function invocation in lldb in C++
(This post helped greatly, and is a good read: Dancing in The Debugger)