Optimizing away function calls - c++

Is it conceivable that a C++ compiler would optimize away a call to a class member function that only sets member variables? Example:
class A
{
private:
int foo;
public:
void bar(int foo_in)
{
foo = foo_in;
}
};
So if I did this
A test;
test.bar(5);
could a compiler optimize this to directly access the member and set it like so?
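(Presumably the intended "like so" is the direct store the compiler would emit after inlining; conceptually something like the sketch below, where the member is made public purely for illustration:)
// Conceptual equivalent after inlining: the call collapses into a single
// store into the object, as if the member were assigned directly.
struct A_public {      // hypothetical variant with foo public, for illustration only
    int foo;
};

int main() {
    A_public test;
    test.foo = 5;      // what test.bar(5) boils down to once the call is inlined
    return test.foo;
}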

Yes, it is called inlining.
Moreover, C++ is designed specifically to support such optimizations, or to make them easier for the compiler to perform, even in quite complex cases involving inheritance and templates.
Some would say this is a distinctive feature of C++ compared to other high-level languages: its "high level" facilities (above all generic programming with templates) were designed with such optimizations in mind. It is also one of the reasons C++ is considered efficient in terms of performance.
This is also why I would expect any reputable compiler to do a decent job of inlining.
From what I've read, this is also the reason it is hard to get some of the fancy features of other high-level languages, such as the reflection mechanisms known from Java or Python: because C++ is designed to allow pretty much everything to be inlined, it is hard to introspect the optimized code.
Edit:
Because you said you are writing OpenGL code, where the performance of setters and getters and such optimizations do matter, I decided to elaborate a bit and show a more interesting example where you can rely on the inlining mechanism.
You can write interfaces that avoid the virtual mechanism by using templates instead. For example:
//This is a stripped-down interface for matrices of physical objects
//that have a Hamiltonian and to which you can apply an external field and a temperature.
//Vector, Matrix and VectorD are domain-specific types assumed to exist elsewhere.
template< class Object >
class Iface {
protected:
    Object& t;
public:
    Iface(Object& obj) : t(obj) {}
    Vector get_eigen_vals()    { return t.get_eigen_vals(); }
    Matrix get_eigen_vectors() { return t.get_eigen_vectors(); }
    void set_H(VectorD vect)   { t.set_H(vect); }
    void set_temp(double temp) { t.set_temp(temp); }
};
If you declare an interface like this, you can wrap an object in it and pass the interface instance to your functions and algorithms, and still have everything inlined, because it works on a reference to Object. A good compiler optimizes the whole Iface wrapper away.
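For instance, an algorithm can be written against the wrapper and still be fully inlined. In the sketch below, thermalize and MyHamiltonian are hypothetical names, and VectorD is the same (assumed) domain type as above:
// Hypothetical algorithm written against the wrapper: every call goes
// straight through the stored reference, so there is no virtual dispatch
// and a good optimizer can inline the whole chain.
template<class Object>
void thermalize(Iface<Object> system, double temp, VectorD field) {
    system.set_temp(temp);   // resolves at compile time to Object::set_temp
    system.set_H(field);     // likewise; the Iface object itself can vanish entirely
}

// Usage (MyHamiltonian being any concrete class with the same member functions):
//   MyHamiltonian h;
//   thermalize(Iface<MyHamiltonian>(h), 0.01, external_field);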

To answer the question a little bit more generally than just inlining:
There is something in the standard known as the as-if rule. It says that the compiler is allowed to make any change to your program as long as it doesn't affect the observable behaviour. There are even exceptions that allow it to change things that technically do change the observable behaviour (copy elision being the best-known example).
It can elide function calls and even complete classes. It can do basically whatever it wants as long as it doesn't break anything.
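A toy illustration of how far this can go (a compiler is permitted, though never required, to do this):
// Under the as-if rule, an optimizing compiler is free to reduce all of
// this to the equivalent of "return 5;" -- the object, the member, and
// both calls disappear because no observable behaviour depends on them.
class A {
    int foo;
public:
    void bar(int foo_in) { foo = foo_in; }
    int  get() const     { return foo; }
};

int answer() {
    A test;
    test.bar(5);
    return test.get();
}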

Yes, the compiler can optimize this call away.
This is actually a very simple case of inlining.
The compiler is allowed to do much more than that (it can unroll loops, optimize out local variables, replace calculations with constants, etc.).
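For example, a typical optimizer will evaluate the loop below entirely at compile time (again, permitted rather than guaranteed):
// Most optimizers fold this whole function to the equivalent of
// "return 55;" -- no loop and no local variable in the generated code.
int sum_to_ten() {
    int total = 0;
    for (int i = 1; i <= 10; ++i)
        total += i;
    return total;
}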

Related

Is it a good idea to change pre-C++11 setter functions to modern templated ones with forwarding references?

In my classes I have lots of setXXX functions which look something like this:
void setName(const std::string& newName) {
name = newName;
}
This is what most programmers would do before C++11.
I learned (Item 25 of Effective Modern C++) that using forwarding would make these functions more efficient, so they would look as follows:
template<typename T>
void setName(T&& newName) {
name = std::forward<T>(newName);
}
My question is whether it is a good idea to convert all such pre-C++11 setter functions to templates with a forwarding-reference parameter. My current understanding is that it will always be beneficial; at least I don't see any drawbacks.
Perfect forwarding might sound interesting, though I haven't used it for setters (and I do work on performance-critical software).
From my experience, I would say that it makes your code more complex to understand without much benefits. Let me explain:
Complexity
Every caller of this set-method now needs to check the type of the member before knowing what to pass. Let's assume a method setFile:
template<typename T>
void setFile(T &&file) { m_file = std::forward<T>(file); }
Should we call this method with a string (the filename), a c-style file-handle, a file-stream, your library specific file-wrapper ...? From seeing this method, it is not clear which to use. Especially not if you are new to the codebase.
Even if your class is so small that it fits on a single terminal screen, code completion can't provide you with the information that you need.
Please note that compilation errors also become more complex if you use the wrong type.
Gain
This brings us to the question: what do you gain? From now on, you use perfect forwarding; in other words, you can assign a const char * to a std::string without the overhead of an extra temporary. However, how much does this gain? If your code is performance critical, the relevant overload most likely already exists. And if the code is not performance critical, it won't matter that you lose a few CPU cycles.
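For comparison, one conventional way to get most of that gain without a template is simply to add an rvalue overload next to the existing setter (Widget here is a hypothetical class used only for illustration):
#include <string>
#include <utility>

class Widget {
    std::string name;
public:
    void setName(const std::string& newName) { name = newName; }         // copies
    void setName(std::string&& newName) { name = std::move(newName); }   // moves
};
With this pair, a const char * argument still creates one std::string temporary, but it is then moved rather than copied into place, which is usually close enough to the perfectly forwarded version.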
On top of that, your compiler (Clang, GCC, MSVC ...) is an optimizing compiler. In other words, it allows you to write the code you want without having to worry about performance. (Don't get me wrong, if the code is critical, please worry, though trust the many compiler writers and researchers to handle the obvious)
So, if you have this setName(const std::string &), make sure it is implemented in the header so that the compiler can reason about it and can try to optimize it away as much as possible. (For your own classes, make sure all constructors, destructor and assign/cast functions are visible to the compiler)
Conclusion
I doubt it is useful to spend your time 'upgrading' your code. If it would bring that much gain, a clang-based utility would already exist. I would even bet that you would gain more performance by spending the same time profiling your application and fixing the low-hanging fruit.

C++ Low Latency Design: Function Dispatch vs. CRTP for Factory implementation

As part of a system design, we need to implement a factory pattern. In combination with the Factory pattern, we are also using CRTP, to provide a base set of functionality which can then be customized by the Derived classes.
Sample code below:
class FactoryInterface{
public:
virtual void doX() = 0;
};
//force all derived classes to implement custom_X_impl
template< typename Derived, typename Base = FactoryInterface>
class CRTP : public Base
{
public:
void doX(){
// do common processing..... then
static_cast<Derived*>(this)->custom_X_impl();
}
};
class Derived: public CRTP<Derived>
{
public:
void custom_X_impl(){
//do custom stuff
}
};
Although this design is convoluted, it does provide a few benefits: all the calls after the initial virtual function call can be inlined, and the derived class's custom_X_impl call is also made efficiently.
I wrote a comparison program measuring the same behaviour (tight loop, repeated calls) implemented with function pointers and with virtual functions. This design came out on top with gcc 4.8 at -O2 and -O3.
However, a C++ guru told me yesterday that any virtual function call in a large running program can take a variable amount of time, considering cache misses, and that I can achieve potentially better performance using C-style function table lookups and gcc hot-listing of functions. However, I still see 2x the cost in my sample program mentioned above.
My questions are as below:
1. Is the guru's assertion true? For either answers, are there any links I can refer.
2. Is there any low latency implementation which I can refer, has a base class invoking a custom function in a derived class, using function pointers?
3. Any suggestions on improving the design?
Any other feedback is always welcome.
Your guru refers to the hot attribute of the gcc compiler. The effect of this attribute is:
The function is optimized more aggressively and on many targets it is
placed into a special subsection of the text section so all hot
functions appear close together, improving locality.
So yes, in a very large code base, a hot-listed function may remain in cache, ready to be executed without delay, because it avoids cache misses.
You can perfectly use this attribute for member functions:
struct X {
__attribute__((hot)) void test() { std::cout << "hello, world!\n"; }
};
But...
When you use virtual functions, the compiler generally generates a vtable that is shared between all objects of the class. This table is a table of pointers to functions. And indeed -- your guru is right -- nothing guarantees that this table remains in cached memory.
But, if you manually create a "C-style" table of function pointers, the problem is EXACTLY THE SAME. While the function may remain in cache, nothing ensures that your function table remains in cache as well.
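To make the comparison concrete, here is a minimal sketch of what such a "C-style" table looks like (all names hypothetical):
// Hand-rolled dispatch: one function pointer per operation, one shared
// static table per "type". The table is data, and just like a vtable it
// can be evicted from cache between calls.
struct WidgetOps {
    void (*doX)(void* self);
};

struct MyWidget {
    const WidgetOps* ops;   // plays the role of the vptr
    int state;
};

static void my_doX(void* self) { static_cast<MyWidget*>(self)->state += 1; }

static const WidgetOps my_widget_ops = { my_doX };

// Call site: w.ops->doX(&w); -- two dependent loads (the ops pointer,
// then the function pointer) before the indirect call, which is exactly
// the same memory-access pattern as a virtual call.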
The main difference between the two approaches is that:
in the case of virtual functions, the compiler knows that the virtual function is a hot spot, and could decide to make sure to keep the vtable in cache as well (I don't know if gcc can do this or if there are plans to do so).
in the case of the manual function pointer table, your compiler will not easily deduce that the table belongs to a hot spot. So this attempt at manual optimization might very well backfire.
My opinion: never try to optimize yourself what a compiler can do much better.
Conclusion
Trust your benchmarks. And trust your OS: if your function or your data is frequently accessed, there is a good chance that a modern OS will take this into account in its virtual memory management, whatever the compiler generates.

Mimicking C# 'new' (hiding a virtual method) in a C++ code generator

I'm developing a system which takes a set of compiled .NET assemblies and emits C++ code which can then be compiled to any platform having a C++ compiler. Of course, this involves some extensive trickery due to various things .NET does that C++ doesn't.
One such situation is the ability to hide virtual methods, such as the following in C#:
class A
{
virtual void MyMethod()
{ ... }
}
class B : A
{
override void MyMethod()
{ ... }
}
class C : B
{
new virtual void MyMethod()
{ ... }
}
class D : C
{
override void MyMethod()
{ ... }
}
I came up with a solution to this that seemed clever and did work, as in the following example:
namespace impdetails
{
template<class by_type>
struct redef {};
}
struct A
{
virtual void MyMethod( void );
};
struct B : A
{
virtual void MyMethod( void );
};
struct C : B
{
virtual void MyMethod( impdetails::redef<C> );
};
struct D : C
{
virtual void MyMethod( impdetails::redef<D> );
};
This does of course require that all the call sites for C::MyMethod and D::MyMethod construct and pass the dummy object, as in this example:
C *c_d = &d;
c_d->MyMethod( impdetails::redef<C>() );
I'm not worried about this extra source code overhead; the output of this system is mainly not intended for human consumption.
Unfortunately, it turns out this actually causes runtime overhead. Intuitively, one would expect that because impdetails::redef<> is empty, it would take no space and passing it would involve no code.
However, the C++ standard, for reasons I understand but don't totally agree with, mandates that objects cannot have zero size. This leaves us with a situation where the compiler actually emits code to create and pass the object.
In fact, at least on VC2008, I found that it even went to the trouble of zeroing the dummy byte, even in release builds! I'm not sure why that was necessary, but it makes me even more not want to do it this way.
If all else fails I could always change the actual name of the function, such as perhaps having MyMethod, MyMethod$1, and MyMethod$2. However, this causes more problems. For instance, $ is actually not legal in C++ identifiers (although compilers I've tested will allow it.) A totally acceptable identifier in the output program could also be an identifier in the input program, which suggests a more complex approach would be needed, making this a less attractive option.
It also so turns out that there are other situations in this project where it would be nice to be able to modify method signatures using arbitrary type arguments similar to how I'm passing a type to impdetails::redef<>.
Is there any other clever way to get around this, or am I stuck between adding overhead at every call site or mangling names?
After considering some other aspects of the system as well such as interfaces in .NET, I am starting to think maybe it's better - perhaps even more-or-less necessary - to not even use the C++ virtual calling mechanism at all. The more I consider, the messier using that mechanism is getting.
In this approach, each user object class would have a separate struct for the vtable (perhaps kept in a separate namespace like vtabletype::). The generated class would have a pointer member that would be initialized through some trickery to point to a static instance of the vtable. Virtual calls would explicitly use a member pointer from that vtable.
If done properly this should have the same performance as the compiler's own implementation would. I've confirmed it does on VC2008. (By contrast, just using straight C, which is what I was planning on earlier, would likely not perform as well, since compilers often optimize this into a register.)
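A rough sketch of what the generator might emit under this scheme (all names hypothetical):
// Generated per-class vtable type, kept in its own namespace as suggested.
namespace vtabletype {
    struct Animal_vtable {
        void (*MyMethod)(void* self);
    };
}

struct Animal {
    const vtabletype::Animal_vtable* vtable;   // explicit "vptr"
    int data;
};

static void Animal_MyMethod_impl(void* self) { static_cast<Animal*>(self)->data = 1; }

// One static table instance per class, written into the object at allocation time.
static const vtabletype::Animal_vtable Animal_vtable_instance = { &Animal_MyMethod_impl };

inline void Animal_construct(Animal* obj) {
    obj->vtable = &Animal_vtable_instance;
    obj->data = 0;
}

// "Virtual" call site, as the generator would emit it:
//   obj->vtable->MyMethod(obj);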
It would be hellish to write code like this manually, but of course this isn't a concern for a generator. This approach does have some advantages in this application:
Because it's a much more explicit approach, one can be more sure that it's doing exactly what .NET specifies it should be doing with respect to newslot as well as selection of interface implementations.
It might be more efficient (depending on some internal details) than a more traditional C++ approach to interfaces, which would tend to invoke multiple inheritance.
In .NET, objects are considered to be fully constructed when their .ctor runs. This impacts how virtual functions behave. With explicit knowledge of the vtables, this could be achieved by writing it in during allocation. (Although putting the .ctor code into a normal member function is another option.)
It might avoid redundant data when implementing reflection.
It provides better control and knowledge of object layout, which could be useful for the garbage collector.
On the downside, it totally loses the C++ compiler's overloading feature with regard to the vtable entries: those entries are data members, not functions, so there is no overloading. In this case it would be tempting to just number the members (say _0, _1...) This may not be so bad when debugging, since once the pointer is followed, you'll see an actual, properly-named member function anyway.
I think I may end up doing it this way but by all means I'd like to hear if there are better options, as this is admittedly a rather complex approach (and problem.)

To inline or not to inline

I've been writing a few classes lately; and I was wondering whether it's bad practice, bad for performance, breaks encapsulation or whether there's anything else inherently bad with actually defining some of the smaller member functions inside a header (I did try Google!). Here's an example I have of a header I've written with a lot of this:
class Scheduler {
public:
typedef std::list<BSubsystem*> SubsystemList;
// Make sure the pointer to entityManager is zero on init
// so that we can check if one has been attached in Tick()
Scheduler() : entityManager(0) { }
// Attaches a manager to the scheduler - used by Tick()
void AttachEntityManager( EntityManager &em )
{ entityManager = &em; }
// Detaches the entityManager from a scheduler.
void DetachEntityManager()
{ entityManager = 0; }
// Adds a subsystem to the scheduler; executed on Tick()
void AddSubsystem( BSubsystem* s )
{ subsystemList.push_back(s); }
// Removes the subsystem of a type given
void RemoveSubsystem( const SubsystemTypeID& );
// Executes all subsystems
void Tick();
// Destroys subsystems that are in subsystemList
virtual ~Scheduler();
private:
// Holds a list of all subsystems
SubsystemList subsystemList;
// Holds the entity manager (if attached)
EntityManager *entityManager;
};
So, is there anything that's really wrong with inlining functions like this, or is it acceptable?
(Also, I'm not sure if this'd be more suited towards the 'code review' site)
Inlining increases coupling, and increases "noise" in the class definition, making the class harder to read and understand. As a general rule, inlining should be considered as an optimization measure, and only used when the profiler says it's necessary.
There are a few exceptions: I'll always inline the virtual destructor of an abstract base class if all of the other functions are pure virtual; it seems silly to have a separate source file just for an empty destructor, and if all of the other functions are pure virtual, and there are no data members, the destructor isn't going to change without something else changing. And I'll occasionally provide inlined constructors for "structures"—classes in which all data members are public, and there are no other functions. I'm also less rigorous about avoiding inline in classes which are defined in a source file, rather than a header—the coupling issues obviously don't apply in that case.
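For instance, the abstract-base-class exception mentioned above might look like this (Subsystem is a hypothetical interface):
// Every other member is pure virtual and there are no data members, so
// the empty virtual destructor can safely live in the header.
class Subsystem {
public:
    virtual void Tick() = 0;
    virtual ~Subsystem() {}   // inlined on purpose; no separate .cpp needed
};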
All of your member functions are one-liners, so in my opinion that's acceptable. Note that inline functions may actually decrease code size (!!), because optimizing compilers increase the size of out-of-line functions, for example to align them to block boundaries.
To make your code more readable, I would suggest using inline definitions as follows:
class Scheduler
{
...
void DetachEntityManager();
...
};
inline void Scheduler::DetachEntityManager()
{
entityManager = 0;
}
In my opinion that's more readable.
I think inlining (if I understood you right, you mean the habit of writing trivial code right into the header file, and not the compiler behaviour) aids readability by two factors:
It distinguishes trivial methods from non-trivial ones.
It makes the effect of trivial methods available at a glance, being self-documenting code.
From a design POV, it doesn't really matter. You are not going to change your inlined method without changing the subsystemList member, and a recompile is necessary in both cases. Inlining does not affect encapsulation, since the method is still a method with a public interface.
So, if the method is a dumb one-liner without a need for lengthy documentation or a conceivable need of change that does not encompass an interface change, I'd advise to go for inlining.
It will increase executable size, and on some occasions this will lead to worse performance.
Keep in mind that an inline method requires its source code to be visible to whoever uses it (i.e. code in the header). This means that a small change in the implementation of your inlined methods will cause a recompilation of everything that uses the header where the inline method was defined.
On the other hand, it is a small performance increase; it's good for short methods that are called really frequently, since it saves you the typical overhead of a function call.
Inline methods are fine if you know where to use them and don't spam them.
Edit:
Regarding style and encapsulation, using inline methods prevents you from using things like the pointer-to-implementation idiom, forward declarations, etc., since your code is in the header.
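For reference, the pointer-to-implementation idiom mentioned above looks roughly like this (hypothetical Widget class); it is the opposite trade-off, moving everything out of the header:
// Widget.h -- the header exposes no data members and no inline bodies,
// so implementation changes never force clients to recompile.
#include <memory>

class Widget {
public:
    Widget();
    ~Widget();                    // must be defined where Impl is complete
    void Frobnicate();
private:
    struct Impl;                  // defined only in Widget.cpp
    std::unique_ptr<Impl> impl;
};

// Widget.cpp would contain, for example:
//   struct Widget::Impl { int state = 0; };
//   Widget::Widget() : impl(std::make_unique<Impl>()) {}
//   Widget::~Widget() = default;
//   void Widget::Frobnicate() { ++impl->state; }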
Inlining has three "drawbacks" at least:
inline functions are at odds with the virtual keyword (I mean conceptually, IMO, either you want a piece of code to be substituted for the function call, or you want the function call to be virtual, i.e. polymorphic; anyway, see also this for more details as to when it could make sense practically);
your binary code will be larger;
if you include the inline method in the class definition, you reveal implementation detail.
Apart from that it is plainly ok to inline methods, although it is also true that modern compilers are already sufficiently smart to inline methods on their own when it makes sense for performance. So, in a sense I think it is better to leave it to the compiler altogether...
Methods defined inside the class body are implicitly inline. Also, inline is a suggestion and not a command; compilers are generally smart enough to judge whether to inline a function or not.
You can refer to this similar question.
In fact, you can write all your functions in the header file; if a function is too large, the compiler will simply not inline it. Just write the function body where you think it fits best and let the compiler decide. The inline keyword is often ignored as well; if you really insist on inlining a function, use __forceinline or something similar (I think that one is MSVC-specific).

Do the accessors affect the performance of an application?

I was wondering if the use of accessors can significantly affect performance of an application. Let's say we have a class Point and there are two private fields. We can get access to these fields by calling public functions such as GetX().
class Point
{
public:
Point(void);
double GetX();
double GetY();
void SetX(double x);
void SetY(double y);
~Point(void);
private:
double x,y;
};
However, if we need to get the value of field x many times (e.g. if we process images), wouldn't this construction affect the performance of the application? Maybe it would be faster just to make fields x and y public?
First and foremost, this is probably premature optimization, and in the general case accessors are not the source of application-level bottlenecks. However, they're not magic pixie dust. It's generally not the case that accessors will hurt performance. There are a few things to consider:
If the implementation is inline or if you have a toolchain that supports link-time optimization, it's likely that there will be 0 impact. Here's an example that lets you get absolutely the same performance on a compiler that doesn't suck.
class Point {
public: double GetX() const;
private: double x;
};
inline double Point::GetX() const { return x; }
If the implementation is out-of-line, then you have the added cost of a function call. If, as you say, the function is being called many times, then at least the code is more or less guaranteed to be in the cache, but the relative % of overhead may be high: the work to perform the function call is higher than the work of moving a double around, and there's a pointer indirection because the function actually uses this as a parameter.
If the implementation is both out-of-line and part of a relocatable library (Linux *.so or Windows *.dll), there's an additional indirection that occurs in order to manage the relocation.
Both of the latter costs are reduced on x86-64 hardware relative to x86 32-bit; so much so that you should just not worry about it. I can't speak about other architectures.
Penultimately, if you have many trivial objects with trivial getters and setters, and if you have no profile-guided optimization or link-time optimization, there may be caching effects due to large numbers of tiny functions. It's likely that each function requires a minimum of one cache line, and the functions are not going to be naturally organized in a way that groups commonly-used sections together. This cost is something you should probably ignore unless you're writing a very large-scale C++ project or core component, such as the KDE base system.
Ultimately, don't worry about it.
Such methods should always be inlined by the compiler and the performance of that will be identical to making them public. You can use the inline keyword to help the compiler along, but that's just a hint. If it's really critical that you avoid function call overhead, read the generated assembly. If they're getting inlined you're ok. Otherwise you might want to consider loosening their visibility.
In a typical case, no, there will not be a difference in performance (unless you've fairly specifically told the compiler not to inline any functions). If you allow it to inline functions, however, chances are that it'll generate identical assembly language for both.
That should not, however, be seen as an excuse for ruining your design by including these abominations. First of all, a class should generally provide high level operations, so (for example) you could have a move_relative and move_absolute, so instead of something like this:
Point whatever;
whatever.SetX(whatever.GetX()+3);
whatever.SetY(whatever.GetY()+4);
...you'd do something like this:
Point whatever;
whatever.move_relative(3, 4);
There are times, however, that exposing something as data really does make sense and work well. If/when you are going to do that, C++ already provides a good way to encapsulate access to the data: a class. It also provides a predefined name for SetXXX and GetXXX -- they're operator= and operator T respectively. The right way to do this is something like this:
template <class T>
class encapsulate {
    T value;
public:
    encapsulate() : value() {}   // allows Point below to be default-constructed
    encapsulate(T const &t) : value(t) {}
    encapsulate &operator=(encapsulate const &t) { value = t.value; return *this; }
    operator T() const { return value; }
};
Using this, your Point class looks like:
struct Point {
encapsulate<double> x, y;
};
With this, the data you want to be public looks and acts as if it is. At the same time, you retain full control over getting/setting the values by changing the encapsulate to something that does whatever you need done.
Point whatever;
whatever.x = whatever.x + 3;
whatever.y = whatever.y + 4;
Though I haven't bothered to in the demo template above, it's fairly easy to support the normal compound assignment operators (+=, -=, *=, /=, etc.) as well. Depending on the situation, it's often useful to eliminate many of these though. Just for example, adding/subtracting to an X/Y coordinate often makes sense -- but multiplication and division frequently won't, so you can just add += and -=, and if somebody accidentally types in /= or |= (for just a couple of examples), their code simply won't compile.
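A sketch of that, repeating the template above with only the coordinate-appropriate operators added:
template <class T>
class encapsulate {
    T value;
public:
    encapsulate() : value() {}
    encapsulate(T const &t) : value(t) {}
    encapsulate &operator=(encapsulate const &t) { value = t.value; return *this; }
    operator T() const { return value; }

    // The only compound operators that make sense for a coordinate.
    // Anything else (*=, /=, |=, ...) simply fails to compile at the call site.
    encapsulate &operator+=(T const &t) { value += t; return *this; }
    encapsulate &operator-=(T const &t) { value -= t; return *this; }
};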
This also provides better enforcement of whatever constraints you need on the data. With private data and an accessor/mutator, other code in the class can (and almost inevitably will) modify the data in ways you didn't want. With a class dedicated to nothing but enforcing the correct constraints, that issue is virtually eliminated. Instead, code both inside and outside the class does a simple assignment (or uses the value, as the case may be) and it's routed through operator=/operator T automatically -- code inside the class can't bypass whatever checking is needed.
Since you're (apparently) concerned with efficiency, I'll add that this won't normally have any run-time cost either. In fact, being a template gives it a slight advantage in that regard. Where code in a normal function could (even if only by accident) be rewritten in a way that prevented inline expansion, using a template eliminates that -- if you try to rewrite it in a way that otherwise wouldn't generate inline code, with a template it won't compile at all.
As long as you define the functions in the header so the compiler can inline them there should be no difference at all. But even if they aren't inlined you still shouldn't make them public unless profiling indicates that it's a significant bottleneck and that making the variables public improves the problem. Making variables public decreases encapsulation and maintainability. For a bit more on public variables, see my answer on What good are public variables then?
The short answer is yes, this will affect the performance. Whether you will notice the difference or not is another matter that depends on how much code you have in the accessors, among other things.
The more important questions, though, is do you need what you gain from using accessors? If you make the fields public, then you lose control over their values. Do you want to allow x or y to be NaN? or +-infinity? Making them public would make such cases possible.
If you decide later that a double is not acceptable for your point class (maybe you need more precision or the precision isn't necessary), then accessing the fields directly would cause trouble. While this change might also require changes in the accessors, the setters should be fine with overloaded methods. And you may still be fine with a public representation of a double whereas the internal representation is not a double (although this is not so likely with a Point class, I imagine).
There are other cases where you might want to have side effects on accessors and setters as well that making the fields public would circumvent. Maybe you want to create events for when your point changes, but if the fields are public, then your class won't know when the values change.
ADDED
OK, so glossing over the details with my "yes" so that I could get to the non-performance issues that I felt were more important apparently wasn't appreciated.
In many cases, the yes is probably as correct as it will be imperceptible. True, using inline and a kick-ass compiler may very well end up with the same code (assuming an accessor like double GetX() { return x; }), but there are a lot of ifs there. Compilers will only inline things that end up in the same object file (often created from a single code file). So you also need a kick-ass linker to optimize the references in other object files (by the time you get to the linker, the inline hint may not even still remain in the code). So some, but not necessarily all, of the code may end up being identical, but that would be something you can confirm only after the fact and isn't useful.
If you're concerned about image processing, then it might be worth allowing for friend classes so that an image class that you write has direct access to the fields, but again, I don't think that even in that case the accessors would add much to your runtime.