Question
(Can I get clang or perhaps some other optimizing tool shipped with LLVM to identify unused virtual functions in a C++ program, to mark them for dead code elimination? I guess not.)
If there is no such functionality shipped with LLVM, how would one go about implementing a thing like this? What's the most appropriate layer to achieve this, and where can I find examples on which I could build this?
Thoughts
My first thought was an optimizer working on LLVM bitcode or IR. After all, a lot of optimizers are written for that representation. Simple dead code elimination is easy enough: any function which is neither called nor has its address taken and stored somewhere is dead code and can be omitted from the final binary. But a virtual function has its address taken and stored in the virtual function table of the corresponding class. In order to identify whether that function has a chance of getting called, an optimizer would not only have to identify all virtual function calls, but also identify the type hierarchy to map these virtual function calls to all possible implementations.
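The reasoning above is essentially class hierarchy analysis. Below is a minimal C++ sketch of that analysis. It assumes a whole-program view is already available as plain data. The ClassInfo, VirtualCall and liveVirtuals names are hypothetical illustrations, not part of LLVM or Clang. A real tool would have to extract this information from the AST or IR. It would also have to handle function pointers, exported classes and vtables visible outside the program.

```cpp
#include <map>
#include <set>
#include <string>
#include <vector>

// Hypothetical, simplified model (not an LLVM/Clang API).
// For each class: its direct bases and its complete vtable, i.e. for every
// virtual method the function a call would dispatch to (inherited
// implementations must be listed as well).
struct ClassInfo {
    std::vector<std::string> bases;
    std::map<std::string, std::string> vtable;  // method name -> implementation
};

using Hierarchy = std::map<std::string, ClassInfo>;

// A virtual call site: static type of the object expression + method name.
struct VirtualCall {
    std::string staticType;
    std::string method;
};

// True if `cls` is `base` or derives from it, directly or indirectly.
bool derivesFrom(const Hierarchy& h, const std::string& cls, const std::string& base) {
    if (cls == base) return true;
    auto it = h.find(cls);
    if (it == h.end()) return false;
    for (const std::string& b : it->second.bases)
        if (derivesFrom(h, b, base)) return true;
    return false;
}

// A call through a pointer/reference of static type T may reach the
// implementation in T or in any class derived from T. Every vtable entry
// not collected here is a candidate for elimination.
std::set<std::string> liveVirtuals(const Hierarchy& h, const std::vector<VirtualCall>& calls) {
    std::set<std::string> live;
    for (const VirtualCall& call : calls) {
        for (const auto& entry : h) {
            if (!derivesFrom(h, entry.first, call.staticType)) continue;
            auto impl = entry.second.vtable.find(call.method);
            if (impl != entry.second.vtable.end()) live.insert(impl->second);
        }
    }
    return live;
}
```

Take the foo/bar example further down, where main only calls a() and the destructor through a foo*. Encoding the destructor slot with a made-up key such as "~", only foo::b and bar::b stay out of the live set.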
This makes things look quite hard to tackle at the bitcode level. It might be better to handle this somewhere closer to the front end, at a stage where more type information is available, and where calls to a virtual function might be more readily associated with implementations of these functions. Perhaps the VirtualCallChecker could serve as a starting point.
One problem is probably the fact that while it's possible to combine the bitcode of several objects into a single unit for link time optimization, one hardly ever compiles all the source code of a moderately sized project as a single translation unit. So the association between virtual function calls and implementations might have to be somehow maintained till that stage. I don't know if any kind of custom annotation is possible with LLVM; I have seen no indication of this in the language specification.
But I'm having a bit of trouble with the language specification in any case. The only references to virtual in there are the virtuality and virtualIndex properties of MDSubprogram, but so far I have found no information at all about their semantics: no documentation, and no useful places inside the LLVM source code. I might be looking at the wrong documentation for my use case.
Cross references
eliminate unused virtual functions asked about pretty much the same thing in the context of GCC, but I'm specifically looking for an LLVM solution here. There used to be a -fvtable-gc switch in GCC, but apparently it was too buggy and got dropped, and clang doesn't support it either.
Example:
struct foo {
virtual ~foo() { }
virtual int a() { return 12345001; }
virtual int b() { return 12345002; }
};
struct bar : public foo {
virtual ~bar() { }
virtual int a() { return 12345003; }
virtual int b() { return 12345004; }
};
int main(int argc, char** argv) {
foo* p = (argc & 1 ? new foo() : new bar());
int res = p->a();
delete p;
return res;
}
How can I write a tool to automatically get rid of foo::b() and bar::b() in the generated code?
clang++ -fuse-ld=gold -O3 -flto with clang 3.5.1 wasn't enough, as an objdump -d -C of the resulting executable showed.
Question focus changed
Originally I had been asking not only about how to use clang or LLVM to this effect, but possibly for third party tools to achieve the same if clang and LLVM were not up to the task. Questions asking for tools are frowned upon here, though, so by now the focus has shifted from finding a tool to writing one. I guess chances for finding one are slim in any case, since a web search revealed no hints in that direction.
Related
In my C++ program I have a class in which several methods perform the same routines, such as opening streams for reading/writing files, parsing files, determining MIME types, etc. The same routines are also used in the constructor. To make the methods more compact and avoid typing the same code multiple times, I split these routine operations into private methods used only inside the class. However, some of these private methods depend on the results of others, so calling them in the wrong order could lead to pretty bad consequences.
Just a stupid example:
class Example
{
public:
Example(int x);
~Example() {}
//...
//...
protected:
private:
int a;
int b;
bool c;
void foo_();
void bar_();
//...
//...
};
Example::Example(int x) : a(x)
{
foo_();
bar_();
}
void Example::foo_()
{
if (a == 0)
{
b = 10;
}
else
{
b = a * 2;
}
}
void Example::bar_()
{
if (b == 0)
{
c = false;
}
else
{
c = true;
}
}
As can be seen from the above example, calling bar_() before foo_() in the constructor will lead to undefined behavior because b has not yet been initialized. But should I bother about such nuances if I am certain that I am using these private methods correctly inside the class, and that they can never be used outside the class?
Not to mention that what you did is the recommended way! Whenever a function performs multiple distinct operations, the standard approach is to split it into multiple functions. In your case, users of the class do not need those functions, so making them private was the best you could do! As for the part where "I need to call them in a specific order", it's entirely fine for code to require calls in a particular order. It's only logical to call bar_ after foo_ if the former depends on the result of the latter. It's not much different from having to allocate memory for int* p before using it as an array. Still, as @Basil and many others have explained, be sure to document your code properly.
calling bar_() before foo_() in constructor will lead in undefined behavior because b has not been yet initialized
As a rule of thumb, I always explicitly initialize all member fields in a constructor (in particular those having a scalar type like pointers or numbers, e.g. your a,b,c inside class Example). Advantage: the behavior of your program is more reproducible. Disadvantage: the compiled code might run useless initialization (but clever optimizing compilers would remove them).
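Here is a minimal sketch of that rule applied to the Example class from the question. Every member gets a defined value in the constructor's member initializer list before foo_() and bar_() run. Calling them in the wrong order would then give a wrong but well-defined result instead of undefined behavior. The getB()/getC() accessors are added only so the result can be inspected. Members are always initialized in declaration order, whatever order they appear in within the initializer list.

```cpp
class Example
{
public:
    // Every scalar member gets a defined value before the helpers run.
    Example(int x) : a(x), b(0), c(false)
    {
        foo_();  // computes b from a
        bar_();  // computes c from b, so it must run after foo_()
    }
    int getB() const { return b; }   // accessors for illustration only
    bool getC() const { return c; }

private:
    int a;
    int b;
    bool c;

    void foo_() { b = (a == 0) ? 10 : a * 2; }
    void bar_() { c = (b != 0); }
};
```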
If you compile with GCC, use it as g++ -Wall -Wextra -g. It usually gives you useful warnings.
For a large C++ project, consider documenting your coding rules (in a separate written document, on paper, distributed to all developers in your team) and checking some of them with your own GCC plugin. See also the DECODER project and the Bismon static source code analyzer, and the Clang static analyzer (all of GCC, Bismon and Clang analyzer are open source, you can improve their source code).
In some cases some C++ code is generated. See GNU bison, ANTLR, RefPerSys, FLTK, Qt as examples of software projects generating C++ code or providing code generators emitting C++ code. On x86/64 PCs, you could generate machine code at runtime with ASMJIT or libgccjit, and call that code through function pointers (on Linux see also dlopen(3), dlsym(3) and the C++ dlopen minihowto...). If your software project has C++ code generators (e.g. using GPP), you can ensure that the generated code respects some of your coding conventions and invariants. Be however aware of Rice's theorem.
If you debug with GDB, read about its watch command and watchpoints.
I am also aware of the C++ rule of five.
Let's say I have a function like:
#include <algorithm>
template<typename It, typename Cmp>
void mysort( It begin, It end, Cmp cmp )
{
std::sort( begin, end, cmp );
}
When I compile this using -finstrument-functions-after-inlining with clang++ (output of clang++ --version):
clang version 11.0.0 (...)
Target: x86_64-unknown-linux-gnu
Thread model: posix
InstalledDir: ...
The instrumentation blows up the execution time, because my entry and exit functions are called for every call of
void std::__introsort_loop<...>(...)
void std::__move_median_to_first<...>(...)
I'm sorting a really big array, so my program doesn't finish: without instrumentation it takes around 10 seconds, with instrumentation I've cancelled it at 10 minutes.
I've tried adding __attribute__((no_instrument_function)) to mysort (and the function that calls mysort), but this doesn't seem to have an effect as far as these standard library calls are concerned.
Does anyone know if it is possible to ignore function instrumentation for the internals of a standard library function like std::sort? Ideally, I would only have mysort instrumented, so a single entry and a single exit!
I see that clang++ sadly does not yet support anything like GCC's -finstrument-functions-exclude-function-list or -finstrument-functions-exclude-file-list, but g++ does not support -finstrument-functions-after-inlining, which I would ideally have, so I'm stuck!
EDIT: After experimenting more, it appears the effect on execution time is actually smaller than described above, so this isn't the end of the world. The problem still remains, however, because most people doing function instrumentation in clang will only care about the application code, not about functions pulled in from (for example) the standard library.
EDIT2: To further highlight the problem now that I've got it running in a reasonable time frame: the resulting trace that I produce from the instrumented code with those two standard library functions is 15GB. When I hard code my tracing to ignore the two function addresses, the resulting trace is 3.7MB!
I've run into the same problem. It looks like support for these flags was once proposed, but never merged into the main branch.
https://reviews.llvm.org/D37622
This is not a direct answer, since the tool doesn't support what you want to do, but I think I have a decent workaround: a "skip list" of sorts. I would guess that the part of the profiling hooks (__cyg_profile_func_enter and __cyg_profile_func_exit) contributing most to your execution time is the printing. If you can find a way to short-circuit the hooks for the functions you don't care about, that should help, even if it's not ideal. At the very least it will limit the size of the output file.
Something like
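#include <stddef.h>
#include <stdint.h>
// Link-time addresses of the functions to skip (placeholders: take the
// real ones from the symbol table), assuming 64-bit addresses
static const uintptr_t skipAddrs[] = {
    0x123456789abcdef, 0x2468ace2468ace24
};
// https://stackoverflow.com/a/37539/12940429
static const size_t arrSize = sizeof(skipAddrs) / sizeof(skipAddrs[0]);
// The hooks, and anything they call, must not be instrumented themselves,
// otherwise each hook call would trigger another hook call.
// If this file is compiled as C++, wrap the two hooks in extern "C".
__attribute__((no_instrument_function))
static int shouldSkip(void *fn) {
    for (size_t idx = 0; idx < arrSize; idx++) {
        if ((uintptr_t) fn == skipAddrs[idx]) {
            return 1;
        }
    }
    return 0;
}
__attribute__((no_instrument_function))
void __cyg_profile_func_enter(void *this_fn, void *call_site) {
    if (shouldSkip(this_fn)) {
        return;
    }
    // ... record/print the function entry here ...
}
__attribute__((no_instrument_function))
void __cyg_profile_func_exit(void *this_fn, void *call_site) {
    if (shouldSkip(this_fn)) {
        return;
    }
    // ... record/print the function exit here ...
}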
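I use something like objdump -t binaryFile to examine the symbol table and find the address of each function. Note that these are link-time addresses: for a position-independent executable (PIE), the runtime value of this_fn also includes the load address, so either link with -no-pie or subtract the load base before comparing.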
If you specifically want to ignore library calls, something that might work is examining the symbol table of your object file(s) before linking against libraries, then ignoring all the ones that appear new in the final binary.
All this should be possible with things like grep, awk, or python.
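As one way to automate this, here is a hedged C++ sketch. It reads objdump -t output and collects the addresses of function symbols (flag F) whose mangled name contains a given substring. The collectSkipAddrs name is hypothetical. The parsing rule (first field is the address, last field is the symbol name) is an assumption based on the usual GNU objdump layout, so check it against your own output.

```cpp
#include <cstdint>
#include <istream>
#include <sstream>
#include <string>
#include <vector>

// Scan `objdump -t` output and return the addresses of function symbols
// whose (mangled) name contains one of `patterns`.
std::vector<std::uintptr_t> collectSkipAddrs(std::istream& objdumpOutput,
                                             const std::vector<std::string>& patterns) {
    std::vector<std::uintptr_t> addrs;
    std::string line;
    while (std::getline(objdumpOutput, line)) {
        std::istringstream fields(line);
        std::vector<std::string> tok;
        for (std::string t; fields >> t;) tok.push_back(t);
        // Expected shape: <address> <flags...> F <section> <size> <name>
        if (tok.size() < 4) continue;
        bool isFunction = false;
        for (const std::string& t : tok)
            if (t == "F") isFunction = true;
        if (!isFunction) continue;
        if (tok[0].find_first_not_of("0123456789abcdefABCDEF") != std::string::npos) continue;
        const std::string& name = tok.back();
        for (const std::string& p : patterns) {
            if (name.find(p) != std::string::npos) {
                addrs.push_back(static_cast<std::uintptr_t>(std::stoull(tok[0], nullptr, 16)));
                break;
            }
        }
    }
    return addrs;
}
```

Matching on mangled names (for example the substring __introsort_loop) avoids having to parse demangled names, which can contain spaces. The resulting addresses can then be pasted into a skip array like the one above.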
You have to add __attribute__((no_instrument_function)) to the functions that should not be instrumented. Unfortunately, this is not easy to do for C/C++ standard library functions, because it would require annotating every one of them in the library headers.
There are some hacks you can do, like redefining existing macros from libc++'s include/__config so that they add this attribute as well (this only helps when building against libc++, e.g. with -stdlib=libc++), e.g.,
-D'_LIBCPP_INLINE_VISIBILITY=__attribute__((no_instrument_function,internal_linkage))'
Make sure to keep the attributes of the existing macro definition and just append no_instrument_function to them, to avoid unexpected errors.
I wonder if there is some theory or tool available to transform a piece of code that contains function calls into code where every function call has been replaced by the body of the called function.
like
void fun();
void fun2();
int main()
{
fun();
}
void fun()
{
int i;
fun2();
}
void fun2()
{
int j;
}
into
int main()
{
int i;
int j;
}
I know there is a lot to take care of, like local variable names, recursive calls, external function calls, etc.
I also know that it may not be useful at all, but does something like this exist, even in theory?
Should I call it an advanced preprocessor unit? :)
The compiler can usually tell when this is a good idea, and already does inlining automatically when it judges it worthwhile. You can also suggest that a function be inlined by putting the inline keyword before it (note that this doesn't actually force inlining; the compiler might decide not to inline it). It's generally not a good idea to do this manually, as modern compilers tend to figure out the best inlining decisions on their own. This article explains inline functions really well; I found it very helpful.
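A short sketch of the difference, assuming GCC or Clang (always_inline is a compiler extension, not standard C++). In C++ the inline keyword mainly permits a definition to appear in several translation units; as an inlining request it is only a hint. The function names are made up for illustration. Compiling with g++ -O2 -S and reading the assembly shows what was actually inlined.

```cpp
// `inline` is only a hint as far as inlining goes; the compiler decides.
inline int square(int x) { return x * x; }

// GCC/Clang extension: request inlining even where the compiler's
// heuristics would not choose it (e.g. at -O0).
inline __attribute__((always_inline)) int cube(int x) { return x * square(x); }

int sum_of_powers(int x) { return square(x) + cube(x); }
```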
Edit 1:
There are several reasons why one might want to do the inlining you speak of. If you feel like your code is divided into many different functions, reducing its clarity and making it overly verbose, you could try a refactoring tool, such as the one provided by the VAssist X Visual Studio plugin. Though this plugin doesn't really do what you suggest (I can't think of a tool that does), it can help you move functions/methods around with ease, allowing you to clean up your code.
I just read about the LLVM project and that it can be used to do static analysis on C/C++ code using Clang, the front end of LLVM. I wanted to know if it is possible to extract all accesses to memory (variables, local as well as global) in the source code using LLVM.
Is there any built-in library in LLVM which I could use to extract this information?
If not, please suggest how to write functions to do the same (existing source code, references, tutorials, examples...).
What I have thought of is to first convert the source code into LLVM bitcode and then instrument it to do the analysis, but I don't know exactly how to do it.
I tried to figure out myself which IR I should use for my purpose (Clang's Abstract Syntax Tree (AST) or LLVM's SSA Intermediate Representation (IR)), but couldn't really decide.
Here is what I'm trying to do.
Given any C/C++ program (like the one below), I am trying to insert calls to some function before and after every instruction that reads from or writes to memory. For example, consider the following C++ program (Account.cpp):
#include <stdio.h>
class Account {
int balance;
public:
Account(int b) {
balance = b;
}
int read() {
int r;
r = balance;
return r;
}
void deposit(int n) {
balance = balance + n;
}
void withdraw(int n) {
int r = read();
balance = r - n;
}
};
int main () {
Account* a = new Account(10);
a->deposit(1);
a->withdraw(2);
delete a;
}
So after the instrumentation my program should look like:
#include <stdio.h>
class Account {
int balance;
public:
Account(int b) {
balance = b;
}
int read() {
int r;
foo();
r = balance;
foo();
return r;
}
void deposit(int n) {
foo();
balance = balance + n;
foo();
}
void withdraw(int n) {
foo();
int r = read();
foo();
foo();
balance = r - n;
foo();
}
};
int main () {
Account* a = new Account(10);
a->deposit(1);
a->withdraw(2);
delete a;
}
where foo() may be any function, like getting the current system time or incrementing a counter, and so on. I understand that to insert calls like the above I will have to first get the IR and then run an instrumentation pass on it that inserts such calls, but I don't really know how to achieve that. Please suggest, with examples, how to go about it.
Also, I understand that once I compile the program into the IR, it would be really difficult to get a 1:1 mapping between my original program and the instrumented IR. So, is it possible to reflect the changes made in the IR (because of instrumentation) back into the original program?
To get started with LLVM passes and learn how to write one myself, I looked at an example of a pass that adds run-time checks to LLVM IR loads and stores, SAFECode's load/store instrumentation pass (http://llvm.org/viewvc/llvm-project/safecode/trunk/include/safecode/LoadStoreChecks.h?view=markup and http://llvm.org/viewvc/llvm-project/safecode/trunk/lib/InsertPoolChecks/LoadStoreChecks.cpp?view=markup). But I couldn't figure out how to run this pass. Please give me the steps to run this pass on some program, say the above Account.cpp.
First off, you have to decide whether you want to work with clang or LLVM. They operate on very different data structures, each with its own advantages and disadvantages.
From your sparse description of your problem, I'll recommend going for optimization passes in LLVM. Working with the IR will make it much easier to sanitize, analyze and inject code because that's what it was designed to do. The downside is that your project will be dependent on LLVM which may or may not be a problem for you. You could output the result using the C backend but that won't be usable by a human.
Another important downside when working with optimization passes is that you also lose all symbols from the original source code. Even if the Value class (more on that later) has a getName method, you should never rely on it to contain anything meaningful. It's meant to help you debug your passes and nothing else.
You will also have to have a basic understanding of compilers. For example, it's a bit of a requirement to know about basic blocks and static single assignment form. Fortunately they're not very difficult concepts to learn or understand (the Wikipedia articles should be adequate).
Before you can start coding, you first have to do some reading, so here are a few links to get you started:
Architecture Overview: A quick architectural overview of LLVM. Will give you a good idea of what you're working with and whether LLVM is the right tool for you.
Documentation Head: Where you can find all the links below and more. Refer to this if I missed anything.
LLVM's IR reference: This is the full description of the LLVM IR which is what you'll be manipulating. The language is relatively simple so there isn't too much to learn.
Programmer's manual: A quick overview of basic stuff you'll need to know when working with LLVM.
Writing Passes: Everything you need to know to write transformation or analysis passes.
LLVM Passes: A comprehensive list of all the passes provided by LLVM that you can and should use. These can really help clean up the code and make it easier to analyze. For example, when working with loops, the lcssa, loop-simplify and indvars passes will save your life.
Value Inheritance Tree: This is the doxygen page for the Value class. The important bit here is the inheritance tree that you can follow to get the documentation for all the instructions defined in the IR reference page. Just ignore the ungodly monstrosity that they call the collaboration diagram.
Type Inheritance Tree: Same as above but for types.
Once you understand all that then it's cake. To find memory accesses? Search for store and load instructions. To instrument? Just create what you need using the proper subclass of the Value class and insert it before or after the store and load instruction. Because your question is a bit too broad, I can't really help you more than this. (See correction below)
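Here is the idea in plain C++, without depending on LLVM headers. It is a toy model: a hypothetical Instr struct stands in for IR instructions. A hypothetical instrumentMemoryOps function inserts a call to a hook before and after every load and store. In a real LLVM pass the same loop would test for llvm::LoadInst and llvm::StoreInst and create the calls with llvm::IRBuilder. This sketch only shows the shape of the transformation, and it handles explicit loads and stores only.

```cpp
#include <string>
#include <vector>

// Toy stand-in for an IR instruction (hypothetical; not LLVM's API).
struct Instr {
    std::string opcode;    // e.g. "load", "store", "add", "call"
    std::string operands;  // free-form text, for display only
};

// Insert a call to `hook` immediately before and after every load/store.
std::vector<Instr> instrumentMemoryOps(const std::vector<Instr>& body,
                                       const std::string& hook) {
    std::vector<Instr> out;
    out.reserve(body.size() * 3);
    for (const Instr& in : body) {
        const bool isMem = in.opcode == "load" || in.opcode == "store";
        if (isMem) out.push_back({"call", hook});
        out.push_back(in);
        if (isMem) out.push_back({"call", hook});
    }
    return out;
}
```

Consider a simplified lowering of deposit() (load balance, add, store balance). The result has seven instructions, with foo wrapped around the load and the store separately. That is finer-grained than the per-statement placement in the desired output in the question.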
By the way, I had to do something similar a few weeks ago. In about 2-3 weeks I was able to learn all I needed about LLVM, create an analysis pass to find memory accesses (and more) within a loop and instrument them with a transformation pass I created. There was no fancy algorithms involved (except the ones provided by LLVM) and everything was pretty straightforward. Moral of the story is that LLVM is easy to learn and work with.
Correction: I made an error when I said that all you have to do is search for load and store instructions.
Load and store instructions only give you the accesses that are made through pointers (to the heap, to globals, or to stack slots created with alloca). In order to get all memory accesses you also have to look at values which can end up representing a memory location on the stack: a local variable that has been promoted to an SSA register (e.g. by mem2reg) no longer appears as a load or store, yet it may still be spilled to the stack. Whether such a value is written to the stack or kept in a register is determined during register allocation in the backend, meaning that it's platform dependent and shouldn't be relied on.
Now, unless you provide more information about what kind of memory accesses you're looking for, in what context, and how you intend to instrument them, I can't help you much more than this.
Since there has been no answer to your question after two days, I will offer this one, which is slightly but not completely off-topic.
As an alternative to LLVM, for static analysis of C programs, you may consider writing a Frama-C plug-in.
The existing plug-in that computes a list of inputs for a C function needs to visit every lvalue in the function's body. This is implemented in file src/inout/inputs.ml. The implementation is short (the complexity is in other plug-ins that provide their results to this one, e.g. resolving pointers) and can be used as a skeleton for your own plug-in.
A visitor for the Abstract Syntax Tree is provided by the framework. In order to do something special for lvalues, you simply define the corresponding method. The heart of the inputs plug-in is the method definition:
method vlval lv = ...
Here is an example of what the inputs plug-in does:
int a, b, c, d, *p;
main(){
p = &a;
b = c + *p;
}
The inputs of main() are computed thus:
$ frama-c -input t.c
...
[inout] Inputs for function main:
a; c; p;
More information about writing Frama-C plug-ins in general can be found here.
I have programmed in both Java and C, and now I am trying to get my hands dirty with C++.
Given this code:
class Booth {
private :
int tickets_sold;
public :
int get_tickets_sold();
void set_tickets_sold();
};
In Java, wherever I needed the value of tickets_sold, I would call the getter repeatedly.
For example:
if (obj.get_tickets_sold() > 50 && obj.get_tickets_sold() < 75){
//do something
}
In C I would just get the value of the particular variable in the structure:
if( obj_t->tickets_sold > 50 && obj_t->tickets_sold < 75){
//do something
}
So when using structures in C, I save the two getter calls that I would otherwise make in Java. (I am not even sure whether those are actual calls or whether Java somehow inlines them.)
My question is: if I use the same technique in C++ that I used in Java, will those two calls to getter member functions cost me, or will the compiler know to inline the code (thus eliminating the function call overhead altogether)?
Alternatively, am I better off using:
int num_tickets = 0;
if ( (num_tickets = obj.get_tickets_sold()) > 50 && num_tickets < 75){
//do something
}
I want to write tight code and avoid unnecessary function calls; I would care about this in Java because, well, we all know why. But I also want my code to be readable and to use the private and public keywords to correctly reflect what is to be done.
Unless your program is too slow, it doesn't really matter. In 99.9999% of code, the overhead of a function call is insignificant. Write the clearest, easiest to maintain, easiest to understand code that you can and only start tweaking for performance after you know where your performance hot spots are, if you have any at all.
That said, modern C++ compilers (and some linkers) can and will inline functions, especially simple functions like this one.
If you're just learning the language, you really shouldn't worry about this. Consider it fast enough until proven otherwise. That said, there are a lot of misleading or incomplete answers here, so for the record I'll flesh out a few of the subtler implications. Consider your class:
class Booth
{
public:
int get_tickets_sold();
void set_tickets_sold();
private:
int tickets_sold;
};
The implementation (known as a definition) of the get and set functions is not yet specified. If you'd specified function bodies inside the class definition, then the compiler would consider you to have implicitly requested that they be inlined (but may ignore that if they're excessively large). If you specify them later using the inline keyword, that has exactly the same effect. Summarily...
class Booth
{
public:
int get_tickets_sold() { return tickets_sold; }
...
...and...
class Booth
{
public:
int get_tickets_sold();
...
};
inline int Booth::get_tickets_sold() { return tickets_sold; }
...are equivalent (at least in terms of what the Standard encourages us to expect, but individual compiler heuristics may vary - inlining is a request that the compiler's free to ignore).
If the function bodies are specified later without the inline keyword, then the compiler is under no obligation to inline them, but may still choose to do so. It's much more likely to do so if they appear in the same translation unit (i.e. in the .cc/.cpp/.c++/etc. "implementation" file you're compiling or some header directly or indirectly included by it). If the implementation is only available at link time then the functions may not be inlined at all, but it depends on the way your particular compiler and linker interact and cooperate. It is not simply a matter of enabling optimisation and expecting magic. To prove this, consider the following code:
// inline.h:
void f();
// inline.cc:
#include <cstdio>
void f() { printf("f()\n"); }
// inline_app.cc:
#include "inline.h"
int main() { f(); }
Building this:
g++ -O4 -c inline.cc
g++ -O4 -o inline_app inline_app.cc inline.o
Investigating the inlining:
$ gdb inline_app
...
(gdb) break main
Breakpoint 1 at 0x80483f3
(gdb) break f
Breakpoint 2 at 0x8048416
(gdb) run
Starting program: /home/delroton/dev/inline_app
Breakpoint 1, 0x080483f3 in main ()
(gdb) next
Single stepping until exit from function main,
which has no line number information.
Breakpoint 2, 0x08048416 in f ()
(gdb) step
Single stepping until exit from function _Z1fv,
which has no line number information.
f()
0x080483fb in main ()
(gdb)
Notice the execution went from 0x080483f3 in main() to 0x08048416 in f() then back to 0x080483fb in main()... clearly not inlined. This illustrates that inlining can't be expected just because a function's implementation is trivial.
Notice that this example is with static linking of object files. Clearly, if you use library files you may actually want to avoid inlining of the functions specifically so that you can update the library without having to recompile the client code. It's even more useful for shared libraries where the linking is done implicitly at load time anyway.
Very often, classes providing trivial functions use the two forms of expected-inlined function definitions (i.e. inside class or with inline keyword) if those functions can be expected to be called inside any performance-critical loops, but the countering consideration is that by inlining a function you force client code to be recompiled (relatively slow, possibly no automated trigger) and relinked (fast, for shared libraries happens on next execution), rather than just relinked, in order to pick up changes to the function implementation.
These kinds of considerations are annoying, but deliberate management of these tradeoffs is what allows enterprise use of C and C++ to scale to tens and hundreds of millions of lines and thousands of individual projects, all sharing various libraries over decades.
One other small detail: as a ballpark figure, an out-of-line get/set function is typically about an order of magnitude (10x) slower than the equivalent inlined code. That will obviously vary with CPU, compiler, optimisation level, variable type, cache hits/misses etc..
No, repetitive calls to member functions will not hurt.
If it's just a getter function, it will almost certainly be inlined by the C++ compiler (at least with release/optimized builds) and the Java Virtual Machine may "figure out" that a certain function is being called frequently and optimize for that. So there's pretty much no performance penalty for using functions in general.
You should always code for readability first. Of course, that's not to say that you should completely ignore performance outright, but if performance is unacceptable then you can always profile your code and see where the slowest parts are.
Also, by restricting access to the tickets_sold variable behind getter functions, you can pretty much guarantee that the only code that can modify tickets_sold is the member functions of Booth. This allows you to enforce invariants in program behavior.
For example, tickets_sold is obviously not going to be a negative value. That is an invariant of the structure. You can enforce that invariant by making tickets_sold private and making sure your member functions do not violate that invariant. The Booth class makes tickets_sold available as a "read-only data member" via a getter function to everyone else and still preserves the invariant.
Making it a public variable means that anybody can go and trample over the data in tickets_sold, which completely destroys your ability to enforce any invariants on it. For example, someone could write a negative number into tickets_sold, which is of course nonsensical.
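Here is a minimal sketch of that invariant in code. It assumes set_tickets_sold takes the new value as an int; the question's declaration has no parameter. It uses a default member initializer so the object starts in a valid state. Throwing std::invalid_argument is one choice among several; returning a bool would also work.

```cpp
#include <stdexcept>

// Booth keeps the invariant "tickets_sold is never negative".
class Booth
{
public:
    int get_tickets_sold() const { return tickets_sold; }

    // The only way to change tickets_sold, so the check cannot be bypassed.
    void set_tickets_sold(int n)
    {
        if (n < 0) {
            throw std::invalid_argument("tickets_sold cannot be negative");
        }
        tickets_sold = n;
    }

private:
    int tickets_sold = 0;  // starts out valid
};
```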
The compiler is very likely to inline function calls like this.
class Booth {
public:
int get_tickets_sold() const { return tickets_sold; }
private:
int tickets_sold;
};
Your compiler should inline get_tickets_sold, I would be very surprised if it didn't. If not, you either need to use a new compiler or turn on optimizations.
Any compiler worth its salt will easily optimize the getters into direct member access. The only times that won't happen are when you have optimization explicitly disabled (e.g. for a debug build) or if you're using a brain-dead compiler (in which case, you should seriously consider ditching it for a real compiler).
The compiler will very likely do the work for you, but in general, for things like this I would approach it more from the C perspective rather than the Java perspective unless you want to make the member access a const reference. However, when dealing with integers, there's usually little value in using a const reference over a copy (at least in 32 bit environments since both are 4 bytes), so your example isn't really a good one here... Perhaps this may illustrate why you would use a getter/setter in C++:
#include <string>
class StringHolder
{
public:
    const std::string& get_string() const { return my_string; }
    void set_string(const std::string& val) { if (!val.empty()) { my_string = val; } }
private:
    std::string my_string;
};
That prevents modification except through the setter which would then allow you to perform extra logic. However, in a simple class such as this, the value of this model is nil, you've just made the coder who is calling it type more and haven't really added any value. For such a class, I wouldn't have a getter/setter model.