Can C++ compilers optimize away a class?

Let's say I have a class that's something like this:
class View
{
public:
    View(DataContainer &c)
        : _c(c)
    {
    }

    inline Elem getElemForCoords(double x, double y)
    {
        int idx = /* some computation here... */;
        return _c.data[idx];
    }

private:
    DataContainer &_c;
};
If I have a function using this class, is the compiler allowed to optimize it away entirely and just inline the data access?
Is the same still true if View::_c happens to be a std::shared_ptr?

If I have a function using this class, is the compiler allowed to
optimize it away entirely and just inline the data access?
Is the same still true if View::_c happens to be a std::shared_ptr?
Absolutely: yes, and yes, as long as the transformation doesn't violate the as-if rule (as Pentadecagon already pointed out). The standard allows this optimization; whether it actually happens is the much more interesting question. For this code:
#include <memory>
#include <vector>

template <class DataContainer>
class View {
public:
    View(DataContainer& c) : c(c) { }
    int getElemForCoords(double x, double y) {
        int idx = x*y; // some dumb computation
        return c->at(idx);
    }
private:
    DataContainer& c;
};

template <class DataContainer>
View<DataContainer> make_view(DataContainer& c) {
    return View<DataContainer>(c);
}

int main(int argc, char* argv[]) {
    auto ptr2vec = std::make_shared<std::vector<int>>(2);
    auto view = make_view(ptr2vec);
    return view.getElemForCoords(1, argc);
}
I have verified, by inspecting the assembly code (g++ -std=c++11 -O3 -S -fwhole-program optaway.cpp), that the View class adds zero overhead; the generated code is as if the class were not there at all.
Some unsolicited advice.
Inspect the assembly code of your programs; you will learn a lot and start worrying about the right things. shared_ptr is a heavyweight object (compared to, for example, unique_ptr), partly because of all the multi-threading machinery under the hood. If you look at the assembly code, you will worry much more about the overhead of the shared pointer and much less about element access. ;)
The inline in your code is just noise; that function is implicitly inline anyway. Please don't clutter your code with the inline keyword; the optimizer is free to treat it as whitespace anyway. Use link-time optimization instead (-flto with gcc). GCC and Clang are surprisingly smart compilers that generate good code.
Profile your code instead of guessing and doing premature optimization. Perf is a great tool.
Want speed? Measure. (by Howard Hinnant)
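To make the zero-overhead claim concrete, here is a hypothetical side-by-side sketch (function names are mine, and the View is simplified to reference a plain vector): one function goes through the abstraction, the other indexes the container directly. With -O2 you can compare the assembly of both with -S or on Compiler Explorer; they typically come out identical.

```cpp
#include <vector>

// Simplified View holding a non-owning reference to a vector.
class View {
public:
    explicit View(std::vector<int>& c) : c(c) {}
    int getElemForCoords(double x, double y) {
        int idx = static_cast<int>(x * y); // same dumb computation as above
        return c[idx];
    }
private:
    std::vector<int>& c;
};

// Access through the abstraction...
int via_view(std::vector<int>& v, double x, double y) {
    return View(v).getElemForCoords(x, y);
}

// ...and direct access. Compare the assembly of both functions.
int direct(std::vector<int>& v, double x, double y) {
    return v[static_cast<int>(x * y)];
}
```

Both functions perform the same work; the View object itself never materializes in the optimized output.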

In general, compilers don't optimize away classes; they optimize functions.
The compiler may decide to take the body of a simple inlined function and paste it where the function is invoked, rather than emitting a standalone, addressable function. This optimization depends on the compiler's optimization level.
The compiler and linker may decide to drop functions that are never used, whether they are class methods or free-standing functions.
Think of the class as a stencil for describing an object: the stencil isn't any good without an instance. An exception is a public static function within the class (static methods don't require object instances). The class itself is usually just kept in the compiler's dictionary.
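As a minimal illustration of that exception (the class and function names here are my own example), a public static member function can be called without any instance ever being constructed:

```cpp
class MathUtil {
public:
    // Static method: no object instance required; the call is resolved
    // at compile time like an ordinary free function.
    static int square(int x) { return x * x; }
};

// Usage: MathUtil::square(7) -- no MathUtil object exists anywhere.
```

The class acts purely as a namespace here; nothing about it needs to survive into the generated code except the function body itself.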

Related

Traits vs virtual overhead

I've come across an Alexandrescu tutorial about traits and I have some reflections to share. This is the code:
// Example 6: Reference counting traits
//
class Widget; // forward declaration (Widget is defined elsewhere in the tutorial)

template <class T>
class RefCountingTraits
{
public: // members of a class default to private, so this is needed
    static void Refer(T* p)
    {
        p->IncRef(); // assume RefCounted interface
    }
    static void Unrefer(T* p)
    {
        p->DecRef(); // assume RefCounted interface
    }
};

template <>
class RefCountingTraits<Widget>
{
public:
    static void Refer(Widget* p)
    {
        p->AddReference(); // use Widget interface
    }
    static void Unrefer(Widget* p)
    {
        // use Widget interface
        if (p->RemoveReference() == 0)
            delete p;
    }
};
How much overhead do we have in this case compared to a standard virtual member function? We are not accessing the object directly in this case either: we are still going through a pointer. Is the compiler able to optimize it in the same way?
At typical production optimisation levels (-O2 or /O2) you can expect all the code you've shown to be inlined and the parts without side effects optimised away. That leaves the actual calls to IncRef or AddReference, and the check-and-deletion.
If virtual functions had been used, and if the reference-counting code is trivial (e.g. not thread-safe), it might have been about an order of magnitude slower due to the dispatch-table lookup and out-of-line function call, but that will vary a bit with compiler, exact optimisation settings, CPU, calling conventions, etc.
As always, when you have to care, profile and experiment.
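A hypothetical usage sketch (the Counted class and its refCount accessor are mine, standing in for the assumed RefCounted interface): because the traits call is resolved at compile time, Refer and Unrefer below are ordinary non-virtual calls that the optimizer can inline exactly as if you had written p->IncRef() by hand; there is no dispatch-table lookup anywhere.

```cpp
// Stand-in for a type implementing the assumed RefCounted interface.
class Counted {
public:
    void IncRef() { ++refs; }
    void DecRef() { if (--refs == 0) delete this; }
    int refCount() const { return refs; }
private:
    int refs = 1; // starts owned by its creator
};

// Primary traits template, as in the tutorial: forwards to the
// RefCounted interface through plain static (non-virtual) calls.
template <class T>
struct RefCountingTraits {
    static void Refer(T* p)   { p->IncRef(); }
    static void Unrefer(T* p) { p->DecRef(); }
};
```

Since everything is statically bound, a compiler at -O2 will typically collapse RefCountingTraits<Counted>::Refer(p) into a single increment of the counter.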

Will C++ optimize out empty/non-virtual/void method calls?

Example code:
class DummyLock {
public:
    void lock() {}
    void unlock() {}
};
...
template <class T>
class List {
    T _lock;
    ...
public:
    void append(void* smth) {
        _lock.lock();
        ...
        _lock.unlock();
    }
};
...
List<DummyLock> l;
l.append(...);
So, will these method calls be optimized out if the lock type is a template parameter? If not, what is the best approach to writing a template list that takes its policies as template arguments (as in Andrei Alexandrescu's C++ book)?
Assuming inlining is enabled (so "some optimisation turned on"), then yes, any decent compiler should reduce this sort of thing to zero instructions. Particularly in a template, since templates require [in nearly all current compilers, at least] the compiler to "see" the source of the object. In a non-templated situation, it's possible to come up with a scenario where the empty lock code is declared out of line, so the compiler can't know that the function is empty.
(The void *smth in your append looks scary, though - I hope you intend to make that a second template type in your real implementation.)
As always when it comes to "does the compiler do this", if it's really important, you need to check that YOUR compiler does what you expect in this particular case. clang++ -S or g++ -S would for example show if there are calls made or not within your append function.
Yes, any real-world C++ compiler (i.e. gcc, clang, VC++) will output no code for empty inline functions when optimization is turned on.
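A sketch of the policy idea (my own minimal version, with the element type and lock policy as separate template parameters and a hypothetical std::mutex-based policy alongside DummyLock): with List<int, DummyLock> the empty lock()/unlock() bodies inline away to nothing, while List<int, MutexLock> emits real locking, from the exact same append code.

```cpp
#include <mutex>
#include <vector>

class DummyLock {
public:
    void lock() {}   // empty body: inlines to zero instructions
    void unlock() {}
};

class MutexLock {
public:
    void lock()   { m.lock(); }
    void unlock() { m.unlock(); }
private:
    std::mutex m;
};

template <class T, class LockPolicy>
class List {
public:
    void append(const T& value) {
        _lock.lock();            // no-op when LockPolicy is DummyLock
        _items.push_back(value);
        _lock.unlock();
    }
    std::size_t size() const { return _items.size(); }
private:
    LockPolicy _lock;
    std::vector<T> _items;
};
```

The thread-safety decision becomes a compile-time choice at the point of instantiation, with no runtime cost for the single-threaded case.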

Does a getter have zero cost?

I have a simple class:
class A {
public:
    int get() const;
private:
    void do_something();
    int value;
};

int A::get() const {
    return value;
}
The getter function is simple and straightforward. Getters are meant to be used, so in do_something I should use get() in order to access value. My question is: will the compiler optimize out the getter, so that it is equivalent to accessing the data directly? Or would I still gain performance by accessing it directly (which would imply worse design)?
void A::do_something()
{
    x = get();
    // or...
    x = value;
}
When the method is not virtual, compilers can optimize it. Good compilers (with link-time optimization) can optimize it even if the method is not inline and is defined in a separate .cpp file. Not-so-good ones can only do so if it's declared inside the class definition, or in the header file with the inline keyword. For virtual methods it depends, but most likely not.
The compiler will almost certainly inline such a trivial getter, if it's got access to the definition.
If the getter is defined as an inline function (either implicitly by defining it inside the class, or explicitly with the inline keyword), the compiler will usually inline it, and there will be no overhead in calling it.
It is, however, common for debug builds to disable inlining, which is perfectly valid since compilers are not required to inline anything.
Well, using get is usually a better design because it hides the actual logic involved in getting the value (today it's a field, tomorrow it may require more complex logic). As for performance, while accessing the value itself will always be at least as fast as using get, the compiler will most likely inline the call anyway.
First, you would not be able to manipulate the value inside your object unless you return a reference rather than a value:
int& get();
Now it returns a reference and the value can be altered. But IMHO this is not quite clean; you should instead define a setter and use it to write back the altered value:
int get() const;
void set(int);
...
void A::do_something()
{
    x = get();
    // ... alter x ...
    set(x);
}
The performance of the setter depends on your compiler. Most modern compilers are able to inline simple getters/setters, so there should not be any performance loss.
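Putting the snippets above together, here is a minimal runnable sketch (my own completion; the in-class definitions make both members implicitly inline) where do_something reads through the getter and writes through the setter. With optimization on, both calls typically compile down to direct member access:

```cpp
class A {
public:
    int get() const { return value; } // in-class definition: implicitly inline
    void set(int v) { value = v; }
    void do_something() {
        int x = get(); // typically inlined to a direct read of 'value'
        set(x + 1);    // typically inlined to a direct write
    }
private:
    int value = 0;
};
```

So the encapsulated version and direct field access usually generate identical code, and the design stays open to adding logic in get/set later.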

LTO, Devirtualization, and Virtual Tables

Comparing virtual functions in C++ and virtual tables in C, do compilers in general (and for sufficiently large projects) do as good a job at devirtualization?
Naively, it seems like virtual functions in C++ have slightly more semantics, thus may be easier to devirtualize.
Update: Mooing Duck mentioned inlining devirtualized functions. A quick check shows missed optimizations with virtual tables:
#include <stdio.h>

struct vtab {
    int (*f)();
};

struct obj {
    struct vtab *vtab;
    int data;
};

int f()
{
    return 5;
}

int main()
{
    struct vtab vtab = {f};
    struct obj obj = {&vtab, 10};
    printf("%d\n", obj.vtab->f());
}
My GCC will not inline f, although it is called directly, i.e., devirtualized. The equivalent in C++,
#include <cstdio>

class A
{
public:
    virtual int f() = 0;
};

class B : public A
{
public:
    int f() { return 5; }
};

int main()
{
    B b;
    printf("%d\n", b.f());
}
does inline f. So there is a first difference between C and C++, although I don't think the added semantics of the C++ version are relevant in this case.
Update 2: In order to devirtualize in C, the compiler has to prove that the function pointer in the virtual table has a certain value. In order to devirtualize in C++, the compiler has to prove that the object is an instance of a particular class. It would seem that the proof is harder in the first case. However, virtual tables are typically modified in only very few places, and most importantly: just because it looks harder doesn't mean that compilers aren't as good at it (otherwise you might as well argue that xoring two integers is generally faster than adding them).
The difference is that in C++, the compiler can guarantee that the virtual table address never changes. In C, it's just another pointer and you could wreak any kind of havoc with it.
However, virtual tables are typically modified in only very few places
The compiler doesn't know that in C. In C++, it can assume that it never changes.
I tried to summarize in http://hubicka.blogspot.ca/2014/01/devirtualization-in-c-part-2-low-level.html why generic optimizations have a hard time devirtualizing. Your test case gets inlined for me with GCC 4.8.1, but in a slightly less trivial test case where you pass a pointer to your "object" out of main, it will not.
The reason is that to prove that the virtual-table pointer in obj and the virtual table itself did not change, the alias-analysis module has to track all possible places that can point to them. In non-trivial code where you pass things outside of the current compilation unit, this is often a lost game.
C++ gives you more information about when the type of an object may change and when it is known. GCC makes use of this and will make a lot more use of it in the next release. (I will write about that soon, too.)
Yes, if it is possible for the compiler to deduce the exact type of a virtualized type, it can "devirtualize" (or even inline!) the call. A compiler can only do this if it can guarantee that no matter what, this is the function needed.
The major concern is basically threading. In the C++ example, the guarantees hold even in a threaded environment. In C, that can't be guaranteed, because the object could be grabbed by another thread/process, and overwritten (deliberately or otherwise), so the function is never "devirtualized" or called directly. In C the lookup will always be there.
#include <iostream>

struct A {
    virtual void func() { std::cout << "A"; }
};

struct B : A {
    virtual void func() { std::cout << "B"; }
};

int main() {
    B b;
    b.func(); // this will inline in optimized builds.
}
It depends on what you are comparing compiler inlining to. Compared to link-time, profile-guided, or just-in-time optimizations, compilers have less information to use. With less information, the compile-time optimizations will be more conservative (and do less inlining overall).
A compiler will still generally be pretty decent at inlining virtual functions as it is equivalent to inlining function pointer calls (say, when you pass a free function to an STL algorithm function like sort or for_each).
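For instance (my own illustration of that last point), passing a free function to an STL algorithm poses the same inlining problem as a devirtualized call: the call is through a pointer, but when the comparator's body is visible in the same translation unit, compilers commonly inline it inside std::sort.

```cpp
#include <algorithm>
#include <vector>

// Free function passed by pointer: inlinable because the definition
// is visible at the call site.
bool descending(int a, int b) { return a > b; }

std::vector<int> sortDesc(std::vector<int> v) {
    // The indirect call to 'descending' inside std::sort is a candidate
    // for inlining, just like a devirtualized virtual call.
    std::sort(v.begin(), v.end(), descending);
    return v;
}
```

(With a lambda instead of a function pointer, the comparator's type carries its body, which makes inlining even more reliable.)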

How does const after a function optimize the program?

I've seen some methods like this:
void SomeClass::someMethod() const;
What does this const declaration do, and how can it help optimize a program?
Edit
I see that the first part of this question has been asked before... BUT, it still doesn't answer the second part: how would this optimize the program?
If the compiler knows that the fields of a class instance are not modified across a const member function call, it doesn't have to reload any fields that it may have kept in registers before the const function call.
This is sort of referred to in the C++ FAQ, in the discussion of const_cast.
It tells the compiler that the method has no effect on the class's state; you can't assign to anything in it. Have a look at C++ FAQ Lite 18.10.
The asm code generated for the method will be the same whether the const is there or not. const is a compile-time construct, not a runtime one, so if there are any performance gains, it's because the optimizer can use the const as a hint for things like inlining or determining side effects. In short, the optimizer might be able to help out a bit, but if the method is straightforward to begin with, I doubt the generated code would differ, const or no const.
Here's an easy optimization I use (rather than hit-and-miss things like const) which takes a second but pays off: organize your class variables so that they fall on cache-line boundaries a little better, and put your most-accessed variables together. To do it, just put your ints, doubles, floats, etc. together at the top of your class variable declarations, and your odd-sized variables at the bottom, like so:
int foo;
int bar;
double baz;
SomeObject obj;
char ch[14];
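The effect of member ordering can be checked with sizeof. A hypothetical illustration (the struct names are mine, and exact sizes are implementation-defined, so they vary by ABI):

```cpp
// Poor ordering: on a typical 64-bit ABI, padding is inserted after
// each char so the following double is 8-byte aligned.
struct Sprawled {
    char   a; // 1 byte + ~7 bytes padding
    double b; // 8 bytes
    char   c; // 1 byte + ~7 bytes padding
    double d; // 8 bytes
};

// Same members, ordered large-to-small: most padding disappears.
struct Packed {
    double b;
    double d;
    char   a;
    char   c; // only trailing padding remains
};
```

On common 64-bit ABIs, Sprawled occupies 32 bytes and Packed 24, for identical member data.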
It allows you to call the class member function on const objects:
class SomeClass
{
public:
    void foo();
    void bar() const;
};

SomeClass a;
const SomeClass b;

a.foo(); // ok
a.bar(); // ok

b.foo(); // ERROR -- foo() is not const
b.bar(); // ok -- bar() is const
There's also the volatile qualifier for use with volatile objects, and you can also make functions const volatile for use on const volatile objects, but those two are exceedingly rare.
It prevents someMethod from altering any member variable of an object of that class.
My first thought regarding optimization is that since the const indicates that the instance's state hasn't changed, the compiler possibly has more freedom to reorder nearby calls to methods on that instance.