Is object creation optimized out? - c++

If I have a class with two constructors like this:
class Test {
public:
    Test(int x) {
        _num = x;
    }
    Test() {
        _num = 0;
    }
private:
    int _num;
};
I want to create a stack object based on a condition like this:
Test test;
if (someCondition() == 23) {
    test = Test(42);
}
Will I have the overhead of creating a Test object two times, calling both constructors, in this case? Or will this be optimized out in general? Is this considered good practice?
Toy examples in Compiler Explorer are heavily optimized with inlining, with no apparent constructor call left. So it's not really clear to me.

Write code to express intent.
You do not want to call the constructor twice. Don't call the constructor twice:
Test test = (someCondition() == 23) ? Test(42) : Test();
Can the compiler optimize your code to have only one Test constructed?
Yes. Compiler optimizations must follow the so-called "as-if rule". In a nutshell: whether your code creates two instances, one, or a hundred has no observable effect here. The compiler is allowed to notice that and produce a program in which only one instance (or none at all) is created.
Will the compiler optimize your code to have only one Test constructed?
You cannot tell unless you try (or you are a compiler writer and your brain is capable of doing the job of a compiler... then you can know without trying). That's why it is important to write code to express intent. The code you write is not instructions for the CPU, but instructions for the compiler to generate instructions for the CPU.
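As an aside, when the initialization logic is too complex for a single ternary, the same one-initialization intent can be expressed with an immediately invoked lambda. A minimal sketch, assuming the question's someCondition():
#include <iostream>

struct Test {
    Test(int x) : _num(x) {}
    Test() : _num(0) {}
    int num() const { return _num; }
private:
    int _num;
};

int someCondition() { return 23; } // stand-in for the question's function

int main() {
    // The lambda runs exactly once and its result initializes 'test'
    // directly: one constructor call, and 'test' can even be const.
    const Test test = []() -> Test {
        if (someCondition() == 23)
            return Test(42);
        return Test();
    }();
    std::cout << test.num() << '\n';
}
This also lets the variable be const, which documents that it is never reassigned afterwards.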

Related

Check if the function has been called in gtest

In the gtest framework, is there any way to check whether a function has been called? (without gmock, use gtest only)
for example:
class a
{
public:
    void dd() {...};
    void cc() {...};
    void bb() {...};
    void aa()
    {
        bb(cc(dd()));
    }
};

int main()
{
    a dut;
    dut.aa();
}
I do not care about the function inputs or even the correctness of the output.
I just want to know whether the function (e.g. aa()) has been triggered.
Is there any solution? Many thanks in advance!
without gmock, use gtest only
That's a very strict restriction. In general, you can't tell if a function was called. Gmock gets around this by generating mock functions that record calls and arguments, and that can fake behavior based on runtime parameters.
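For contrast, this is roughly what the gmock route looks like; a minimal sketch in modern gmock syntax, assuming aa() can be made virtual (MockA is a name made up here):
#include <gmock/gmock.h>
#include <gtest/gtest.h>

class a
{
public:
    virtual ~a() = default;
    virtual void aa() {}
};

class MockA : public a
{
public:
    MOCK_METHOD(void, aa, (), (override));
};

TEST(FuncCalled, GmockExpectation)
{
    MockA mock;
    EXPECT_CALL(mock, aa()).Times(1); // the test fails if aa() is never called
    mock.aa();
}
The question rules this out, so the two gtest-only options follow.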
Without this, you only have two options:
Observe a side effect of the function in question.
This is straightforward, but brittle: if you know there is an observable side effect of the function, you can check that:
class a
{
public:
    a() : aa_flag(false) {}
    void aa()
    {
        aa_flag = true;
    }
    bool aa_flag;
};

TEST(FuncCalled, CheckSideEffectFlag)
{
    a dut;
    dut.aa();
    EXPECT_TRUE(dut.aa_flag);
}
You don't need to restrict yourself to flags that a function sets. Log messages and other side effects are also workable.
class a
{
public:
    void aa()
    {
        LOG_INFO("aa called"); // LOG_INFO: placeholder for your logging framework
    }
};

TEST(FuncCalled, CheckSideEffectLog)
{
    a dut;
    dut.aa();
    EXPECT_TRUE(LogContains("aa called")); // LogContains: likewise, a helper you provide
}
As mentioned above, this is a brittle solution, because the side effect you check may change in the future for unrelated reasons. However, sometimes this is good enough.
Hotpatch the function to trace calls
This is nasty and I can't provide a complete example, because the way to do this depends on your compiler and target. Basically, you instruct the compiler to emit each function starting with a few no-op (not necessarily NOP) instructions. This allows you to take the address of the function and patch those instructions into a jump to a small trampoline and back. The trampoline can register the return address, and from that you can tell whether the function was called: aa() was called iff its return address was registered.
You will need OS-specific calls to hotpatch your code and some knowledge of the CPU instructions you are running on. You obviously lose portability. I also don't know your requirements, but this probably isn't worth the trouble.
All in all, your best bet is gmock if you want to stay within the boundaries of the standard. Virtual functions are the standard way of doing dynamic dispatch (which is essentially what you would be doing by hand if you were hotpatching your image).
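If making aa() virtual is acceptable but gmock still isn't, a hand-rolled test double gets you the same check with plain gtest; a minimal sketch (RecordingA and aa_called are names made up here):
#include <gtest/gtest.h>

class a
{
public:
    virtual ~a() = default;
    virtual void aa() { /* real work */ }
};

// Hand-rolled "mock": overrides aa() to record the call, then
// delegates to the real implementation.
class RecordingA : public a
{
public:
    void aa() override
    {
        aa_called = true;
        a::aa();
    }
    bool aa_called = false;
};

TEST(FuncCalled, HandRolledDouble)
{
    RecordingA dut;
    dut.aa();
    EXPECT_TRUE(dut.aa_called);
}
This is essentially what gmock generates for you, minus the cardinality and argument matching.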

Can C++ compilers optimize away a class?

Let's say I have a class that's something like this:
class View
{
public:
    View(DataContainer &c)
        : _c(c)
    {
    }
    inline Elem getElemForCoords(double x, double y)
    {
        int idx = /* some computation here... */;
        return _c.data[idx];
    }
private:
    DataContainer& _c;
};
If I have a function using this class, is the compiler allowed to optimize it away entirely and just inline the data access?
Is the same still true if View::_c happens to be a std::shared_ptr?
If I have a function using this class, is the compiler allowed to optimize it away entirely and just inline the data access?
Is the same still true if View::_c happens to be a std::shared_ptr?
Absolutely, yes, and yes; as long as it doesn't violate the as-if rule (as already pointed out by Pentadecagon). Whether this optimization really happens is a much more interesting question; it is allowed by the standard. For this code:
#include <memory>
#include <vector>

template <class DataContainer>
class View {
public:
    View(DataContainer& c) : c(c) { }
    int getElemForCoords(double x, double y) {
        int idx = x*y; // some dumb computation
        return c->at(idx);
    }
private:
    DataContainer& c;
};

template <class DataContainer>
View<DataContainer> make_view(DataContainer& c) {
    return View<DataContainer>(c);
}

int main(int argc, char* argv[]) {
    auto ptr2vec = std::make_shared<std::vector<int>>(2);
    auto view = make_view(ptr2vec);
    return view.getElemForCoords(1, argc);
}
I have verified, by inspecting the assembly code (g++ -std=c++11 -O3 -S -fwhole-program optaway.cpp), that the View class is compiled away as if it were not there; it adds zero overhead.
Some unsolicited advice.
Inspect the assembly code of your programs; you will learn a lot and start worrying about the right things. shared_ptr is a heavy-weight object (compared to, for example, unique_ptr), partly because of all the multi-threading machinery under the hood. If you look at the assembly code, you will worry much more about the overhead of the shared pointer and less about element access. ;) (See the size comparison sketched below.)
The inline in your code is just noise; that function is implicitly inline anyway, because it is defined inside the class. Please don't litter your code with the inline keyword; as an optimization hint, the optimizer is free to ignore it. Use link time optimization instead (-flto with gcc). GCC and Clang are surprisingly smart compilers and generate good code.
Profile your code instead of guessing and doing premature optimization. Perf is a great tool.
Want speed? Measure. (by Howard Hinnant)
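To make the heavy-weight point concrete, a quick size comparison; a sketch only, since the exact sizes are implementation-defined (and size is only part of it: copying a shared_ptr also touches an atomic reference count):
#include <cstdio>
#include <memory>

int main() {
    // On typical implementations a shared_ptr is two pointers wide
    // (object pointer + control-block pointer); a unique_ptr with the
    // default deleter is just one.
    std::printf("raw:    %zu\n", sizeof(int*));
    std::printf("unique: %zu\n", sizeof(std::unique_ptr<int>));
    std::printf("shared: %zu\n", sizeof(std::shared_ptr<int>));
}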
In general, compilers don't optimize away classes; they optimize functions.
The compiler may decide to take the content of simple inlined functions and paste it where the function is invoked, rather than emitting a standalone function with its own address. This optimization depends on the compiler's optimization level.
The compiler and linker may decide to drop functions that are not used, whether they are class methods or free standing.
Think of the class as a stencil for describing an object: the stencil isn't any good without an instance. An exception is a public static function within the class (static methods don't require object instances). The class itself is just a compile-time description, kept in the compiler's symbol tables; it emits no code of its own.

LTO, Devirtualization, and Virtual Tables

Comparing virtual functions in C++ and virtual tables in C, do compilers in general (and for sufficiently large projects) do as good a job at devirtualization?
Naively, it seems like virtual functions in C++ have slightly more semantics, thus may be easier to devirtualize.
Update: Mooing Duck mentioned inlining devirtualized functions. A quick check shows missed optimizations with virtual tables:
#include <stdio.h>

struct vtab {
    int (*f)();
};

struct obj {
    struct vtab *vtab;
    int data;
};

int f()
{
    return 5;
}

int main()
{
    struct vtab vtab = {f};
    struct obj obj = {&vtab, 10};
    printf("%d\n", obj.vtab->f());
}
My GCC will not inline f, although it is called directly, i.e., devirtualized. The equivalent in C++,
#include <cstdio>

class A
{
public:
    virtual int f() = 0;
};

class B : public A
{
public:
    int f() {return 5;}
};

int main()
{
    B b;
    printf("%d\n", b.f());
}
even inlines f. So there's a first difference between C and C++, although I don't think that the added semantics in the C++ version are relevant in this case.
Update 2: In order to devirtualize in C, the compiler has to prove that the function pointer in the virtual table has a certain value. In order to devirtualize in C++, the compiler has to prove that the object is an instance of a particular class. It would seem that the proof is harder in the first case. However, virtual tables are typically modified in only very few places, and most importantly: just because it looks harder doesn't mean that compilers aren't as good at it (otherwise you might argue that xoring is generally faster than adding two integers).
The difference is that in C++, the compiler can guarantee that the virtual table address never changes. In C, it's just another pointer and you could wreak any kind of havoc with it.
However, virtual tables are typically modified in only very few places
The compiler doesn't know that in C. In C++, it can assume that it never changes.
I tried to summarize in http://hubicka.blogspot.ca/2014/01/devirtualization-in-c-part-2-low-level.html why generic optimizations have a hard time devirtualizing. Your testcase gets inlined for me with GCC 4.8.1, but in a slightly less trivial testcase, where you pass a pointer to your "object" out of main, it will not be.
The reason is that to prove that the virtual table pointer in obj and the virtual table itself did not change, the alias analysis module has to track all possible places that can point to them. In non-trivial code, where you pass things outside of the current compilation unit, this is often a lost game.
C++ gives you more information on when the type of an object may change and when it is known. GCC makes use of it and will make a lot more use of it in the next release. (I will write on that soon, too.)
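A sketch of the kind of "less trivial" testcase meant here; external() is a made-up function, assumed to be defined in another translation unit (so this only links once you provide it there). Once the object escapes, alias analysis can no longer prove the function pointer unchanged, so the call through the C-style vtable typically stays indirect:
#include <stdio.h>

struct vtab { int (*f)(); };
struct obj { struct vtab *vtab; int data; };

int f() { return 5; }

void external(struct obj *o); /* defined elsewhere; the compiler can't see it */

int main()
{
    struct vtab vtab = {f};
    struct obj obj = {&vtab, 10};
    external(&obj);                /* obj escapes: external() could repoint
                                      obj.vtab or overwrite vtab.f ... */
    printf("%d\n", obj.vtab->f()); /* ... so this call usually stays indirect */
    return 0;
}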
Yes, if it is possible for the compiler to deduce the exact dynamic type of an object, it can "devirtualize" (or even inline!) the call. A compiler can only do this if it can guarantee that no matter what, this is the function that will be called.
The major concern is basically threading. In the C++ example, the guarantees hold even in a threaded environment. In C, that can't be guaranteed, because the object could be grabbed by another thread/process, and overwritten (deliberately or otherwise), so the function is never "devirtualized" or called directly. In C the lookup will always be there.
#include <iostream>

struct A {
    virtual void func() { std::cout << "A"; }
};

struct B : A {
    virtual void func() { std::cout << "B"; }
};

int main() {
    B b;
    b.func(); // this will inline in optimized builds.
}
It depends on what you are comparing compiler inlining to. Compared to link time or profile guided or just in time optimizations, compilers have less information to use. With less information, the compile time optimizations will be more conservative (and do less inlining overall).
A compiler will still generally be pretty decent at inlining virtual functions as it is equivalent to inlining function pointer calls (say, when you pass a free function to an STL algorithm function like sort or for_each).
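For illustration, the function-pointer case alluded to above; a minimal sketch (byValue is a name made up here), and whether the comparator actually gets inlined into the sort is compiler- and flag-dependent, though it commonly does at -O2:
#include <algorithm>
#include <cstdio>
#include <vector>

// A free function passed by pointer; optimizers routinely propagate
// the constant pointer into std::sort and inline the call.
bool byValue(int a, int b) { return a < b; }

int main() {
    std::vector<int> v{3, 1, 2};
    std::sort(v.begin(), v.end(), byValue);
    for (int x : v) std::printf("%d ", x);
}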

Do you consider multiple initialization steps "poor form"?

I'm writing a physics simulation (Ising model) in C++ that operates on square lattices. The heart of my program is my Ising class with a constructor that calls for the row and column dimensions of the lattice. I have two other methods to set other parameters of the system (temperature & initial state) that must get called before evolving the system! So, for instance, a sample program might look like this
int main() {
    Ising system(30, 30);
    system.set_state(up);
    system.set_temperature(2);
    for(int t = 0; t < 1000; t++) {
        system.step();
    }
    return 0;
}
If the system.set_*() methods aren't called prior to system.step(), system.step() throws an exception alerting the user to the problem. I implemented it this way to simplify my constructor; is this bad practice?
It is recommended to put all mandatory parameters in the constructor whenever possible (there are exceptions of course, but these should be rare - I have seen one real-world example so far). This way you make your class both easier and safer to use.
Note also that by simplifying your constructor you make the client code more complicated instead, which IMO is a bad tradeoff. The constructor is written only once, but caller code may potentially need to be written many times more (increasing both the sheer amount of code to be written and the chance of errors).
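Applied to the question's class, a minimal sketch of what that looks like; the State enum and the member names are assumptions here, and the lattice logic stays whatever it was:
enum State { up, down }; // assumed: the question passes 'up' to set_state

class Ising {
public:
    // All mandatory parameters up front: the object is ready to
    // step() as soon as it exists, and step() needs no guard/throw.
    Ising(int rows, int cols, State initial, double temperature)
        : m_rows(rows), m_cols(cols), m_state(initial), m_temperature(temperature) {}

    void step() { /* evolve the lattice (elided) */ }

private:
    int m_rows, m_cols;
    State m_state;
    double m_temperature;
};

int main() {
    Ising system(30, 30, up, 2.0);
    for (int t = 0; t < 1000; t++)
        system.step();
    return 0;
}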
Not at all, IMO. I face the same thing when loading data from external files. When the objects are created (i.e., their respective ctors are called), the data is still unavailable and can only be retrieved at a later stage. So I split the initialisation into different stages:
constructor
initialisation (called by the framework engine when an object is activated for the first time)
activation (called each time an object is activated).
This is very specific to the framework I'm developing, but there is no way to deal with everything using just the constructor.
However, if you know the variables at the moment the ctor is called, it's better not to complicate the code. It's a possible source of headaches for anyone using your code.
IMO this is poor form if all of these initialization steps must be invoked every time. One of the goals of well-designed software is to minimize the opportunities to screw up, and having multiple methods which must be invoked before an object is "usable" simply makes it harder to get right. If these calls were optional then having them as separate methods would be fine.
Share and enjoy.
The entire point in a class is to present some kind of abstraction. As a user of a class, I should be able to assume that it behaves like the abstraction it models.
And part of that is that the class must always be valid. Once the object has been created (by calling the constructor), the class must be in a meaningful, valid state. It should be ready to use. If it isn't, then it is no longer a good abstraction.
If the initialization methods must be called in a specific order, then I would wrap the calls to them in a single method of their own, as this indicates that the methods are not atomic on their own, so the 'knowledge' of how they should be called is held in one place.
Well that's my opinion, anyway!
I'd say that setting the initial conditions should be separate from the constructor if you plan to initialize and run more than one transient on the same lattice.
If you run a transient and stop, then it's possible to move setting the initial conditions inside the constructor, but it means that you have to pass in the parameter values in order to do this.
I fully agree with the idea that an object should be 100% ready to be used after its constructor is called, but I think that's separate from the physics of setting the initial temperature field. The object could be fully usable, yet have every node in the problem at the same temperature of absolute zero. A uniform temperature field in an insulated body isn't of much interest from a heat transfer point of view.
As another commentator pointed out, having to call a bunch of initialisation functions is poor form. I would wrap this up in a class:
enum State { up, down }; // 'up' comes from the question's code

class SimulationInfo
{
private:
    int m_x;
    int m_y;
    State m_state;
    int m_temperature;
public:
    SimulationInfo() : m_x(30), m_y(30), m_state(up), m_temperature(2) { } // default ctor
    // custom constructors here!
    // properties
    int x() const { return m_x; };
    int y() const { return m_y; };
    State state() const { return m_state; };
    int temperature() const { return m_temperature; };
}; // eo class SimulationInfo

class Simulation
{
private:
    Ising m_system;
public:
    Simulation(const SimulationInfo& _info) : m_system(_info.x(), _info.y())
    {
        m_system.set_state(_info.state());
        m_system.set_temperature(_info.temperature());
    } // eo ctor

    void simulate(int _steps)
    {
        for(int step(0); step < _steps; ++step)
            m_system.step();
    } // eo simulate
}; // eo class Simulation
There are other ways, but this makes things infinitely more usable from a default setup:
SimulationInfo si; // accept all defaults
Simulation sim(si);
sim.simulate(1000);

using virtual functions versus static_cast from base to derived

I am trying to understand which implementation below is "faster". Assume that one compiles this code with and without the -DVIRTUAL flag.
I assume that compiling without -DVIRTUAL will be faster because:
a] There is no vtable used
b] The compiler might be able to optimize the assembly instructions because it "knows" exactly which call will be made given the various options (there are only a finite number of options).
My question is PURELY related to speed, not pretty code.
a] Am I correct in my analysis above?
b] Will the branch predictor / compiler combination be intelligent enough to optimize for a given branch of the switch statement? Note that type is a const int.
c] Are there any other factors that I am missing?
Thanks!
#include <iostream>

class Base
{
public:
    Base(int t) : type(t) {}
    ~Base() {}

    const int type;

#ifdef VIRTUAL
    virtual void fn1()=0;
#else
    void fn2();
#endif
};

class Derived1 : public Base
{
public:
    Derived1() : Base(1) { }
    ~Derived1() {}
    void fn1() { std::cout << "in Derived1()" << std::endl; }
};

class Derived2 : public Base
{
public:
    Derived2() : Base(2) { }
    ~Derived2() { }
    void fn1() { std::cout << "in Derived2()" << std::endl; }
};

#ifndef VIRTUAL
void Base::fn2()
{
    switch(type)
    {
    case 1:
        (static_cast<Derived1* const>(this))->fn1();
        break;
    case 2:
        (static_cast<Derived2* const>(this))->fn1();
        break;
    default:
        break;
    }
}
#endif

int main()
{
    Base *test = new Derived1();
#ifdef VIRTUAL
    test->fn1();
#else
    test->fn2();
#endif
    return 0;
}
I think you misunderstand the VTable. The VTable is simply a jump table (in most implementations, though AFAIK the spec does not guarantee this!). In fact, you could go as far as saying it's a giant switch statement. As such, I'd wager the speed would be exactly the same with both your methods.
If anything, I'd imagine the VTable method would be slightly faster, as the compiler can make better decisions to optimise for cache alignment and so forth...
Have you measured the performance to see if there's even any difference at all?
I suppose not, because then you wouldn't be asking here. It's the only reasonable response though.
Assuming that you are not prematurely micro-optimizing pointlessly, and you have profiled your code and found this to be a problem that needs solving, the best way to figure out the answer to your question is to compile both in release with full optimizations and examine the generated machine code.
It's impossible to answer without specifying compiler and compiler options.
I see no particular reason why your non-virtual code should necessarily be any faster to make the call than the virtual code. In fact, the switch might well be slower than a vtable, since a call using a vtable will load an address and jump to it, whereas the switch will load an integer and do a little bit of thinking. Either one of them could be faster. For obvious reasons, a virtual call is not specified by the standard to be "slower than any other thing you invent to replace it".
I think it's reasonably unlikely that a randomly-chosen compiler will actually inline the call in the virtual case, but it's certainly allowed to (under the as-if rule), since the dynamic type of *test could be determined by data-flow analysis or similar. I think it's reasonably likely that with optimization enabled a randomly-chosen compiler will inline everything in the non-virtual case. But then, you've given a small example with very short functions all in one TU, so inlining is especially easy.
It depends on the platform and the compiler. A switch statement can be implemented as a test and branch or a jump table (i.e., an indirect branch). A virtual function is usually implemented as an indirect branch. If your compiler turns the switch statement into a jump table, the two approaches differ by one additional dereference. If that is the case and this particular usage happens infrequently enough (or thrashes the cache enough) then you might see a difference due to an extra cache miss.
On the other hand, if the switch statement is simply a test and branch, you might see a much bigger performance difference on some in-order CPUs that flush the instruction cache on an indirect branch (or require a high latency between setting the destination of an indirect branch and jumping to it).
If you are really concerned with the overhead of virtual function dispatch, say, for an inner loop over a heterogenous collection of objects, you might want to reconsider where you perform the dynamic dispatch. It doesn't have to be per object; it could also be per known groupings of objects with the same type.
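A sketch of that last idea, per-type batching instead of per-object dispatch (the type names here are made up): the dispatch decision is made once per group, and each inner call is direct and easily inlined.
#include <cstdio>
#include <vector>

// Hypothetical concrete types; in the virtual version these would share
// a base class, and one loop over Base* would dispatch per element.
struct Derived1 { int fn1() { return 1; } };
struct Derived2 { int fn1() { return 2; } };

int main() {
    // Group objects by concrete type up front.
    std::vector<Derived1> ones(1000);
    std::vector<Derived2> twos(1000);
    long sum = 0;
    for (auto& d : ones) sum += d.fn1(); // direct, inlinable calls
    for (auto& d : twos) sum += d.fn1();
    std::printf("%ld\n", sum);
}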
It is not necessarily true that avoiding vtables will be faster - to be sure, you should measure yourself.
Note that:
The static_cast version may introduce a branch (though likely not, if it gets optimized to a jump table),
The vtable version on all implementations I know will result in a jump table.
See a pattern here?
Generally, you'd prefer a constant-time table lookup over branching code, so the virtual function method seems to be better.