I know that dynamic_cast has a significant cost, but when I run the following code, the virtual function call loop almost always reports the larger time. Have I had it wrong all this time?
EDIT: The problem was that my compiler was in debug mode. After switching to release mode, the virtual function call loop runs 5 to 7 times faster than the dynamic_cast loop.
#include <ctime>
#include <iostream>
#include <vector>
using namespace std;
struct A {
virtual void foo() {}
};
struct B : public A {
virtual void foo() override {}
};
struct C : public B {
virtual void foo() override {}
};
int main()
{
vector<A *> vec;
for (int i = 0; i < 100000; ++i)
if (i % 2)
vec.push_back(new C());
else
vec.push_back(new B());
clock_t begin = clock();
for (auto iter : vec)
if (dynamic_cast<C*>(iter))
;
clock_t end = clock();
cout << (static_cast<double>(end) - begin) / CLOCKS_PER_SEC << endl;
begin = clock();
for (auto iter : vec)
iter->foo();
end = clock();
cout << (static_cast<double>(end) - begin) / CLOCKS_PER_SEC << endl;
return 0;
}
Since you are not doing anything with the result of the dynamic_cast in the lines
for (auto iter : vec)
if (dynamic_cast<C*>(iter))
;
the compiler might be optimizing away most of that code if not all of it.
If you do something useful with the result of the dynamic_cast, you might see a difference. You could try:
for (auto iter : vec)
{
if (C* cptr = dynamic_cast<C*>(iter))
{
cptr->foo();
}
if (B* bptr = dynamic_cast<B*>(iter))
{
bptr->foo();
}
}
That will most likely make a difference.
See http://ideone.com/BvqoqU for a sample run.
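A self-contained sketch of the same advice, assuming the question's A/B/C hierarchy but with foo() changed to return an int (an assumption; the question's foo() returns void). The helper names countCasts and callAll are hypothetical. Each loop returns a value that the caller is expected to print, which makes it much harder for the optimizer to discard the casts or the calls, although it does not rule out other optimizations such as inlining.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

struct A {
    virtual ~A() = default;
    virtual int foo() { return 0; }   // returns a value (assumption) so calls are observable
};
struct B : A {
    int foo() override { return 1; }
};
struct C : B {
    int foo() override { return 2; }
};

// Counts how many elements really are C objects; the count is the "used" result.
std::size_t countCasts(const std::vector<A*>& vec)
{
    std::size_t hits = 0;
    for (A* p : vec)
        if (dynamic_cast<C*>(p))
            ++hits;
    return hits;
}

// Sums the virtual call results so the calls cannot be dropped as dead code.
long callAll(const std::vector<A*>& vec)
{
    long sum = 0;
    for (A* p : vec)
        sum += p->foo();
    return sum;
}
```

Printing both return values next to the two timings is one simple way to keep both loops honest.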
Have you been wrong until now?
We probably cannot tell from your code. The optimizer is clever, and it is sometimes quite challenging to 'defeat' or 'deceive' it.
In the following, I use 'assert()' to try to control the optimizer's enthusiasm. Also note that 'time(0)' is a fast function on Ubuntu 15.10. I believe the compiler does not yet know what the combination will do, and thus will not remove it, providing a more reliable/repeatable measurement.
I think I like these results better, and perhaps these indicate that dynamic cast is slower than virtual function invocation.
Environment:
on an older Dell, using Ubuntu 15.10, 64 bit, and -O3
~$ g++-5 --version
g++-5 (Ubuntu 5.2.1-23ubuntu1~15.10) 5.2.1 20151028
Results (dynamic cast followed by virtual function):
void T523_t::testStruct()
0.443445
0.184873
void T523_t::testClass()
252,495 us
184,961 us
FINI 2914399 us
Code:
#include <chrono>
// 'compressed' chrono access --------------vvvvvvv
typedef std::chrono::high_resolution_clock HRClk_t; // std-chrono-hi-res-clk
typedef HRClk_t::time_point Time_t; // std-chrono-hi-res-clk-time-point
typedef std::chrono::milliseconds MS_t; // std-chrono-milliseconds
typedef std::chrono::microseconds US_t; // std-chrono-microseconds
typedef std::chrono::nanoseconds NS_t; // std-chrono-nanoseconds
using namespace std::chrono_literals; // support suffixes like 100ms, 2s, 30us
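#include <iostream>
#include <iomanip>
#include <vector>
#include <cassert>
#include <clocale>   // setlocale
#include <cstdint>   // int32_t
#include <ctime>     // clock, time
#include <string>    // std::string, std::to_string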
// original ////////////////////////////////////////////////////////////////////
struct A {
virtual ~A() = default; // warning: ‘struct A’ has virtual functions and
// accessible non-virtual destructor [-Wnon-virtual-dtor]
virtual void foo() { assert(time(0)); }
};
struct B : public A {
virtual void foo() override { assert(time(0)); }
};
struct C : public B {
virtual void foo() override { assert(time(0)); }
};
// with class ////////////////////////////////////////////////////////////////////////////
// If your C++ code has no class ... why bother?
class A_t {
public:
virtual ~A_t() = default; // warning: ‘struct A’ has virtual functions and
// accessible non-virtual destructor [-Wnon-virtual-dtor]
virtual void foo() { assert(time(0)); }
};
class B_t : public A_t {
public:
virtual void foo() override { assert(time(0)); }
};
class C_t : public B_t {
public:
virtual void foo() override { assert(time(0)); }
};
class T523_t
{
public:
T523_t() = default;
~T523_t() = default;
int exec()
{
testStruct();
testClass();
return(0);
}
private: // methods
std::string digiComma(std::string s)
{ //vvvvv--sSize must be signed int of sufficient size
int32_t sSize = static_cast<int32_t>(s.size());
if (sSize > 3)
for (int32_t indx = (sSize - 3); indx > 0; indx -= 3)
s.insert(static_cast<size_t>(indx), 1, ',');
return(s);
}
void testStruct()
{
using std::vector;
using std::cout; using std::endl;
std::cout << "\n\n " << __PRETTY_FUNCTION__ << std::endl;
vector<A *> vec;
for (int i = 0; i < 10000000; ++i)
if (i % 2)
vec.push_back(new C());
else
vec.push_back(new B());
clock_t begin = clock();
int i=0;
for (auto iter : vec)
{
if(i % 2) (assert(dynamic_cast<C*>(iter))); // if (dynamic_cast<C*>(iter)) {};
else (assert(dynamic_cast<B*>(iter)));
++i; // advance the index as testClass() does; without it every element is checked as B*
}
clock_t end = clock();
cout << "\n " << std::setw(8)
<< ((static_cast<double>(end) - static_cast<double>(begin))
/ CLOCKS_PER_SEC) << endl; //^^^^^^^^^^^^^^^^^^^^^^^^^^
// warning: conversion to ‘double’ from ‘clock_t {aka long int}’ may alter its value [-Wconversion]
begin = clock();
for (auto iter : vec)
iter->foo();
end = clock();
cout << "\n " << std::setw(8)
<< ((static_cast<double>(end) - static_cast<double>(begin))
/ CLOCKS_PER_SEC) << endl; //^^^^^^^^^^^^^^^^^^^^^^^^^^
// warning: conversion to ‘double’ from ‘clock_t {aka long int}’ may alter its value [-Wconversion]
}
void testClass()
{
std::cout << "\n\n " << __PRETTY_FUNCTION__ << std::endl;
std::vector<A_t *> APtrVec;
for (int i = 0; i < 10000000; ++i)
{
if (i % 2) APtrVec.push_back(new C_t());
else APtrVec.push_back(new B_t());
}
{
Time_t start_us = HRClk_t::now();
int i = 0;
for (auto Aptr : APtrVec)
{
if(i % 2) assert(dynamic_cast<C_t*>(Aptr)); // check for nullptr
else assert(dynamic_cast<B_t*>(Aptr)); // check for nullptr
++i;
}
auto duration_us = std::chrono::duration_cast<US_t>(HRClk_t::now() - start_us);
std::cout << "\n " << std::setw(8)
<< digiComma(std::to_string(duration_us.count()))
<< " us" << std::endl;
}
{
Time_t start_us = HRClk_t::now();
for (auto Aptr : APtrVec) {
Aptr->foo();
}
auto duration_us = std::chrono::duration_cast<US_t>(HRClk_t::now() - start_us);
std::cout << "\n " << std::setw(8)
<< digiComma(std::to_string(duration_us.count()))
<< " us" << std::endl;
}
}
}; // class T523_t
int main(int argc, char* argv[])
{
std::cout << "\nargc: " << argc << std::endl;
for (int i = 0; i < argc; i += 1) std::cout << argv[i] << " ";
std::cout << std::endl;
setlocale(LC_ALL, "");
std::ios::sync_with_stdio(false);
{ time_t t0 = std::time(nullptr); while(t0 == time(nullptr)) { /**/ }; }
Time_t start_us = HRClk_t::now();
int retVal = -1;
{
T523_t t523;
retVal = t523.exec();
}
auto duration_us = std::chrono::duration_cast<US_t>(HRClk_t::now() - start_us);
std::cout << "\n FINI " << (std::to_string(duration_us.count()))
<< " us" << std::endl;
return(retVal);
}
update 2017-08-31
I suspect many of you will object to performing the dynamic cast without using the result. Here is one possible approach, replacing the for-auto loop in the testClass() method:
for (auto Aptr : APtrVec)
{
if(i % 2) { C_t* c = dynamic_cast<C_t*>(Aptr); assert(c); c->foo(); }
else { B_t* b = dynamic_cast<B_t*>(Aptr); assert(b); b->foo(); }
++i;
}
With results
void T523_t::testStruct()
0.443445
0.184873
void T523_t::testClass()
322,431 us
191,285 us
FINI 4156941 us
end update
I am using 4 threads to create a few objects using a thread_local memory pool.
I use std::vector<std::future<int>> and std::async(std::launch::async, function) to dispatch the threads, and std::for_each with t.get() to get their values back. Here's the code:
struct GameObject
{
int x_, y_, z_;
int m_cost;
GameObject() = default;
GameObject(int x, int y, int z, int cost)
: x_(x), y_(y), z_(z), m_cost(cost)
{}
};
struct Elf : GameObject
{
Elf() = default;
Elf(int x, int y, int z, int cost)
: GameObject(x, y, z, cost)
{
std::cout << "Elf created" << '\n';
}
~Elf() noexcept
{
std::cout << "Elf destroyed" << '\n';
}
std::string m_cry = "\nA hymn for Gandalf\n";
};
struct Dwarf : GameObject
{
Dwarf() = default;
Dwarf(int x, int y, int z, int cost)
: GameObject(x, y, z, cost)
{
std::cout << "dwarf created" << '\n';
}
~Dwarf() noexcept
{
std::cout << "dwarf destroyed" << '\n';
}
std::string m_cry = "\nFind more cheer in a graveyard\n";
};
int elvenFunc()
{
thread_local ObjectPool<Elf> elvenPool{ 229 };
for (int i = 0; i < elvenPool.getSize(); ++i)
{
Elf* elf = elvenPool.construct(i, i + 1, i + 2, 100);
std::cout << elf->m_cry << '\n';
elvenPool.destroy(elf);
}
thread_local std::promise<int> pr;
pr.set_value(rand());
return 1024;
}
int dwarvenFunc()
{
thread_local ObjectPool<Dwarf> dwarvenPool{ 256 };
for (int i = 0; i < dwarvenPool.getSize(); ++i)
{
Dwarf* dwarf = dwarvenPool.construct(i - 1, i - 2, i - 3, 100);
std::cout << dwarf->m_cry << '\n';
dwarvenPool.destroy(dwarf);
}
thread_local std::promise<int> pr;
pr.set_value(rand());
return 2048;
}
int main()
{
std::ios_base::sync_with_stdio(false);
srand(time(0));
std::vector<std::future<int>> vec{ 4 };
vec.emplace_back(std::async(std::launch::async, elvenFunc));
vec.emplace_back(std::async(std::launch::async, elvenFunc));
vec.emplace_back(std::async(std::launch::async, dwarvenFunc));
vec.emplace_back(std::async(std::launch::async, dwarvenFunc));
int term = 0;
try
{
std::for_each(std::execution::par, vec.begin(), vec.end(), [&term](std::future<int>& t)
{
auto ret = t.get();
std::cout << "thread brought me " << ret << '\n';
term += ret;
});
}
catch (const std::exception& ex)
{
std::cout << ex.what() << '\n';
}
std::cout << "Final word = " << term << '\n';
}
(construct and destroy allocate and deallocate internally.) I get much of the expected output in the terminal, but at some point abort gets called and the program doesn't complete normally. I can't find out why. I believe that calling t.get() on a future returned by std::async also implicitly joins the thread, right?
Using C++17 and Visual Studio 2017. What am I doing wrong?
You have undefined behaviour: you call get on an invalid future. The first 4 elements of your vector of futures are empty (default-constructed) futures, and get may be called only if future::valid returns true.
What do you think this line
std::vector<std::future<int>> vec{ 4 };
does?
Default constructor. Constructs a std::future with no shared state.
After construction, valid() == false.
And here you can read what happens when future::get is called:
The behavior is undefined if valid() is false before the call to this
function.
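A minimal sketch of one fix, using hypothetical stand-ins elvenWork and dwarvenWork because ObjectPool is not shown in the question. The vector starts empty (reserve only sets the capacity, unlike vec{ 4 }, which creates four invalid futures), and the results are summed in a plain loop: in the original, std::for_each with std::execution::par may run the lambda on several threads at once, so term += ret would also be a data race.

```cpp
#include <cassert>
#include <future>
#include <vector>

// Hypothetical stand-ins for elvenFunc/dwarvenFunc (ObjectPool is not shown).
int elvenWork() { return 1024; }
int dwarvenWork() { return 2048; }

int runWorkers()
{
    std::vector<std::future<int>> vec;   // empty: no default-constructed (invalid) futures
    vec.reserve(4);                      // capacity only; size() is still 0
    vec.push_back(std::async(std::launch::async, elvenWork));
    vec.push_back(std::async(std::launch::async, elvenWork));
    vec.push_back(std::async(std::launch::async, dwarvenWork));
    vec.push_back(std::async(std::launch::async, dwarvenWork));

    int term = 0;
    for (std::future<int>& f : vec)      // sequential: no race on term
    {
        if (f.valid())                   // defensive: get() on an invalid future is UB
            term += f.get();             // blocks until that task has produced its result
    }
    return term;
}
```

Because get() waits for each task's result, every task has finished by the time the loop returns.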
I am trying to implement the observer design pattern in C++ as below
#include <iostream>
#include <vector>
using namespace std;
class observer
{
public:
observer() = default;
~observer() = default;
virtual void notify() = 0;
};
class subject
{
vector <observer *> vec;
public:
subject() = default;
~subject() = default;
void _register(observer *obj)
{
vec.push_back(obj);
}
void unregister(observer *obj)
{
int i;
for(i = 0; i < vec.size(); i++)
{
if(vec[i] == obj)
{
cout << "found elem. unregistering" << endl;
vec.erase(vec.begin() + i);
break;
}
}
if(i == vec.size())
{
cout << "elem not found to unregister" << endl;
}
}
void notify()
{
vector <observer *>::iterator it = vec.begin();
while(it != vec.end())
{
(*it)->notify();
it ++;
}
}
};
class obsone : public observer
{
void notify()
{
cout << "in obsone notify" << endl;
}
};
class obstwo : public observer
{
void notify()
{
cout << "in obstwo notify" << endl;
}
};
int main()
{
subject sub;
obsone *one = new obsone();
obstwo *two = new obstwo();
sub._register(one);
sub._register(two);
sub.notify();
sub.unregister(one);
sub.notify();
//delete two;
//sub.notify();
return 0;
}
I am registering the objects with the subject explicitly. Is this the correct way to do it, or should registration go through the observer class only? Are there any problems with the above approach?
Here's an example of doing the callbacks with lambdas and function objects in the callback collection.
The details can vary greatly! So, this code is not “the” way, but just your code rewritten in one specific way, out of a myriad possibilities. But it hopefully shows the general idea in modern C++.
#include <iostream>
#include <functional> // std::function
#include <stdint.h> // uint64_t
#include <unordered_map> // std::unordered_map
#include <utility> // std::move
#include <vector> // std::vector
using namespace std;
namespace my
{
using Callback = function<void()>;
template< class Key, class Value > using Map_ = unordered_map<Key, Value>;
class Subject
{
public:
enum Id: uint64_t {};
private:
Map_<uint64_t, Callback> m_callbacks;
static auto id_value()
-> uint64_t&
{
static uint64_t the_id;
return the_id;
}
public:
auto add_listener( Callback cb )
-> Id
{
const auto id = Id( ++id_value() );
m_callbacks.emplace( id, move( cb ) );
return id;
}
auto remove_listener( const Id id )
-> bool
{
const auto it = m_callbacks.find( id );
if( it == m_callbacks.end() )
{
return false;
}
m_callbacks.erase( it );
return true;
}
void notify_all() const
{
for( const auto& pair : m_callbacks )
{
pair.second();
}
}
};
}
struct Observer_1
{
void notify() { cout << "Observer_1::notify() called." << endl; }
};
struct Observer_2
{
void notify() { cout << "Observer_2::notify() called." << endl; }
};
auto main()
-> int
{
my::Subject subject;
Observer_1 one;
Observer_2 two;
using Id = my::Subject::Id;
const Id listener_id_1 = subject.add_listener( [&]{ one.notify(); } );
const Id listener_id_2 = subject.add_listener( [&]{ two.notify(); } );
cout << "After adding two listeners:" << endl;
subject.notify_all();
cout << endl;
subject.remove_listener( listener_id_1 )
and (cout << "Removed listener 1." << endl)
or (cout << "Did not find registration of listener 1." << endl);
cout << endl;
cout << "After removing or attempting to remove listener 1:" << endl;
subject.notify_all();
}
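If you prefer to keep the question's pointer-based design instead of callbacks, a few of its problems can be addressed directly. This is a minimal sketch, not the approach of the answer above: observer gets a virtual destructor so deleting through an observer* is well defined, unregister uses std::find (which also removes the signed/unsigned comparison of int i with vec.size()) and reports whether anything was removed, and counting_observer is a hypothetical observer used only for testing. The subject still stores raw, non-owning pointers, so an observer must be unregistered before it is destroyed; that is exactly what the commented-out delete two; in the question would expose.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

class observer
{
public:
    virtual ~observer() = default;   // virtual, so delete through observer* is well defined
    virtual void notify() = 0;
};

class subject
{
    std::vector<observer*> vec;      // non-owning; observers must unregister before dying
public:
    void _register(observer* obj) { vec.push_back(obj); }

    // Returns true if obj was found and removed.
    bool unregister(observer* obj)
    {
        auto it = std::find(vec.begin(), vec.end(), obj);
        if (it == vec.end())
            return false;
        vec.erase(it);
        return true;
    }

    void notify()
    {
        for (observer* o : vec)
            o->notify();
    }
};

// Hypothetical observer that only counts how often it was notified.
class counting_observer : public observer
{
public:
    int count = 0;
    void notify() override { ++count; }
};
```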
I have been looking for a way to get around the slowness of the dynamic cast type checking. Before you start saying I should redesign everything, let me inform you that the design was decided on 5 years ago. I can't fix all 400,000 lines of code that came after (I wish I could), but I can make some changes. I have run this little test on type identification:
#include <iostream>
#include <typeinfo>
#include <stdint.h>
#include <ctime>
using namespace std;
#define ADD_TYPE_ID \
static intptr_t type() { return reinterpret_cast<intptr_t>(&type); }\
virtual intptr_t getType() { return type(); }
struct Base
{
ADD_TYPE_ID;
};
template <typename T>
struct Derived : public Base
{
ADD_TYPE_ID;
};
int main()
{
Base* b = new Derived<int>();
cout << "Correct Type: " << (b->getType() == Derived<int>::type()) << endl; // true
cout << "Template Type: " << (b->getType() == Derived<float>::type()) << endl; // false
cout << "Base Type: " << (b->getType() == Base::type()) << endl; // false
clock_t begin = clock();
{
for (size_t i = 0; i < 100000000; i++)
{
if (b->getType() == Derived<int>::type())
Derived <int>* d = static_cast<Derived<int>*> (b);
}
}
clock_t end = clock();
double elapsed = double(end - begin) / CLOCKS_PER_SEC;
cout << "Type elapsed: " << elapsed << endl;
begin = clock();
{
for (size_t i = 0; i < 100000000; i++)
{
Derived<int>* d = dynamic_cast<Derived<int>*>(b);
if (d);
}
}
end = clock();
elapsed = double(end - begin) / CLOCKS_PER_SEC;
cout << "Type elapsed: " << elapsed << endl;
begin = clock();
{
for (size_t i = 0; i < 100000000; i++)
{
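Derived<int>* d = dynamic_cast<Derived<int>*>(b);
// note: typeid(d) is the static pointer type Derived<int>*, so this test is always true;
// typeid(*b) == typeid(Derived<int>) would compare the dynamic type instead
if ( typeid(d) == typeid(Derived<int>*) )
static_cast<Derived<int>*> (b);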
}
}
end = clock();
elapsed = double(end - begin) / CLOCKS_PER_SEC;
cout << "Type elapsed: " << elapsed << endl;
return 0;
}
It seems that using the class id (the first timed solution above) would be the fastest way to do type checking at runtime.
Will this cause any problems with threading? Is there a better way to check for types at runtime (with not much re-factoring)?
Edit: I should also add that this needs to work with the TI compilers, which currently support only up to C++03.
First off, note that there's a big difference between dynamic_cast and RTTI: The cast tells you whether you can treat a base object as some further derived, but not necessarily most-derived object. RTTI tells you the precise most-derived type. Naturally the former is more powerful and more expensive.
So then, there are two natural ways you can select on types if you have a polymorphic hierarchy. They're different; use the one that actually applies.
void method1(Base * p)
{
if (Derived * q = dynamic_cast<Derived *>(p))
{
// use q
}
}
void method2(Base * p)
{
if (typeid(*p) == typeid(Derived))
{
auto * q = static_cast<Derived *>(p);
// use q
}
}
Note also that method 2 is not generally available if the base class is a virtual base. Neither method applies if your classes are not polymorphic.
In a quick test I found method 2 to be significantly faster than your manual ID-based solution, which in turn is faster than the dynamic cast solution (method 1).
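A small sketch of the semantic difference described above; MoreDerived is a hypothetical extra level added only for illustration. dynamic_cast accepts any object whose type is Derived or derived from it, while the typeid comparison accepts only an object whose most-derived type is exactly Derived.

```cpp
#include <cassert>
#include <typeinfo>

struct Base { virtual ~Base() {} };
struct Derived : Base {};
struct MoreDerived : Derived {};   // hypothetical further-derived class

// Method 1: true for Derived and for anything derived from Derived.
bool viaDynamicCast(Base* p) { return dynamic_cast<Derived*>(p) != 0; }

// Method 2: true only when the most-derived type is exactly Derived.
bool viaTypeid(Base* p) { return typeid(*p) == typeid(Derived); }
```

So the two methods are not interchangeable: for a MoreDerived object, method 1 succeeds and method 2 does not.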
How about comparing the classes' virtual function tables?
Quick and dirty proof of concept:
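// Assumes the vptr is the first word of the object (the usual single-inheritance layout, not guaranteed by the standard)
void* instance_vtbl(void* c)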
{
return *(void**)c;
}
template<typename C>
void* class_vtbl()
{
static C c;
return instance_vtbl(&c);
}
// ...
begin = clock();
{
for (size_t i = 0; i < 100000000; i++)
{
if (instance_vtbl(b) == class_vtbl<Derived<int>>())
Derived <int>* d = static_cast<Derived<int>*> (b);
}
}
end = clock();
elapsed = double(end - begin) / CLOCKS_PER_SEC;
cout << "Type elapsed: " << elapsed << endl;
With Visual C++'s /Ox switch, this appears 3x faster than the type/getType trick.
Given this type of code
class A {
public:
virtual ~A() {} // dynamic_cast from A* requires A to be polymorphic
};
class B : public A {
};
A * a;
B * b = dynamic_cast<B*> (a);
if( b != 0 ) { /* do something B specific */ }
The polymorphic (right?) way to fix it is something like this
class A {
public:
virtual void specific() { /* do nothing */ }
};
class B : public A {
public:
virtual void specific() { /* do something B specific */ }
};
A * a;
if( a != 0 ) a->specific();
When MSVC 2005 first came out, dynamic_cast<> for 64-bit code was much slower than for 32-bit code. We wanted a quick and easy fix. This is what our code looks like. It probably violates all kinds of good design rules, but the conversion to remove dynamic_cast<> can be automated with a script.
class dbbResultEph; // forward declaration: dbbMsgEph refers to it below
class dbbMsgEph {
public:
virtual dbbResultEph * CastResultEph() { return 0; }
virtual const dbbResultEph * CastResultEph() const { return 0; }
};
class dbbResultEph : public dbbMsgEph {
public:
virtual dbbResultEph * CastResultEph() { return this; }
virtual const dbbResultEph * CastResultEph() const { return this; }
static dbbResultEph * Cast( dbbMsgEph * );
static const dbbResultEph * Cast( const dbbMsgEph * );
};
dbbResultEph *
dbbResultEph::Cast( dbbMsgEph * arg )
{
if( arg == 0 ) return 0;
return arg->CastResultEph();
}
const dbbResultEph *
dbbResultEph::Cast( const dbbMsgEph * arg )
{
if( arg == 0 ) return 0;
return arg->CastResultEph();
}
When we used to have
dbbMsgEph * pMsg;
dbbResultEph * pResult = dynamic_cast<dbbResultEph *> (pMsg);
we changed it to
dbbResultEph * pResult = dbbResultEph::Cast (pMsg);
using a simple sed(1) script. And virtual function calls are pretty efficient.
// In release mode (VS2008) this comparison is true:
cout << "Base Type: " << (b->getType() == Base::type()) << endl;
I guess it's because of optimization, so I changed the implementation of Derived::type():
template <typename T>
struct Derived : public Base
{
static intptr_t type()
{
cout << "different type()" << endl;
return reinterpret_cast<intptr_t>(&type);
}
virtual intptr_t getType() { return type(); }
};
Then the results differ. So how should I deal with this if I use this method?
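A likely explanation for Base::type() and Derived<int>::type() comparing equal in a VS2008 release build is the linker's identical COMDAT folding (/OPT:ICF), which can merge functions with identical machine code so they share one address; printing inside one of them makes the bodies differ, which matches the observation above. A minimal sketch that avoids relying on function addresses takes the address of a function-local static object instead, since distinct objects are required to have distinct addresses. The macro body below is an assumption, not the original code, and it sticks to C++03 (no nullptr or override) because of the TI compiler constraint. Because tag is a zero-initialized char, it is initialized before any code runs, so concurrent calls to type() should not introduce a race.

```cpp
#include <cassert>

// Each class gets its own static tag object; its address serves as the type id.
#define ADD_TYPE_ID \
    static const void* type() { static char tag; return &tag; } \
    virtual const void* getType() const { return type(); }

struct Base
{
    virtual ~Base() {}
    ADD_TYPE_ID
};

template <typename T>
struct Derived : public Base
{
    ADD_TYPE_ID   // one tag per instantiation, e.g. Derived<int> vs Derived<float>
};
```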
What would be a genuinely acceptable example of usage for a derived cast? I have always thought they are only used when implementing "hacks", but if this is not the case, could someone give an acceptable example of when to use one?
@user997112
[edit at bottom]
Hello. Below we use a collection of random polymorphic pointers with a common ancestor
through the common interface.
Additional work is done with one particular derived class;
we need dynamic_cast or typeid to identify it.
The main function has the call,
then come the class declarations,
then the dynamic cast is at the end.
Deleting the objects created with new is not shown.
#include <iostream>
#include <algorithm>
#include <random>
#include <exception>
#include <typeinfo>
#include <vector>
using namespace std;
int dynamic_test();
int main()
{
cout << "Hello world!" << endl;
dynamic_test();
return 0;
}
class basex {
public:
virtual ~basex() {};
virtual void work() const = 0;
};
class next1x : public basex {
public:
void work() const override {cout << "1";/*secret*/}
};
class next2x : public basex {
public:
void work() const override {cout << "2";/*secret*/}
};
class next3x : public basex {
public:
void work() const override {cout << "3";/*secret*/}
};
std::vector<basex *> secret_class_picker()
{
//pick classes with common base at random
std::random_device rd;
std::uniform_int_distribution<int> ud(1,3);
std::mt19937 mt(rd());
std::vector<int> random_v;
for (int i = 0; i < 22; ++i)
random_v.push_back( ud(mt) );
cout << "Random" << endl;
for ( auto bq : random_v) //inspecting for human reader
cout << bq << " ";
std::vector<basex *> v;
basex * bptr;
for (auto bq : random_v) {
switch(bq)
{
default: throw std::exception(); break;
case 1: bptr = new next1x; break;
case 2: bptr = new next2x; break;
case 3: bptr = new next3x; break;
}
v.push_back(bptr);
}
cout << "Objects Created " << v.size() << endl;
return v;
}
//this function demands a more derived class
int special_work(const next3x *)
{
//elided
cout <<"[!]";
return 0;
}
int dynamic_test()
{
std::vector<basex *> v = secret_class_picker();//delete these pointers later
cout <<"Working with random polymorphic pointers"<<endl;
for (const auto bq : v)
{
bq->work();//polymorphic
next3x * ptr = dynamic_cast<next3x *>(bq);
if (nullptr != ptr) special_work(ptr); //reserved for particular type
}
return 0;
}
Alternative:
int dynamic_static_typeid()
{
std::vector<basex *> v = secret_class_picker();
cout <<"Working with random polymorphic pointers"<<endl;
int k(0);
for (const auto bq : v)
{
bool flipflop = (k % 2) == 0;
bq->work();//polymorphic
//cout << "[*]"<< typeid(*bq).name();//dereference
if (flipflop) {
next3x * dc_ptr = dynamic_cast<next3x *>(bq);//not constant time in general
if (nullptr != dc_ptr) {
special_work(dc_ptr); //reserved for particular type
++k;
}
}
else {
if (typeid(next3x) == typeid(*bq)){//constant time
auto sc_ptr = static_cast<next3x *>(bq);//constant time
special_work(sc_ptr);
++k; cout <<"[sc]";
}
}
cout << endl;
}
return 0;
}
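The example above leaves deleting the objects to the reader. A minimal sketch of the same selection with owning pointers, so nothing has to be deleted by hand; it assumes C++14 for std::make_unique, make_objects and count_special are hypothetical names, and work() returns an int here (an assumption) instead of printing.

```cpp
#include <cassert>
#include <memory>
#include <vector>

class basex {
public:
    virtual ~basex() {}
    virtual int work() const = 0;
};
class next1x : public basex { public: int work() const override { return 1; } };
class next2x : public basex { public: int work() const override { return 2; } };
class next3x : public basex { public: int work() const override { return 3; } };

// Builds objects from a list of kinds (1, 2 or 3); the vector owns them.
std::vector<std::unique_ptr<basex>> make_objects(const std::vector<int>& kinds)
{
    std::vector<std::unique_ptr<basex>> v;
    for (int k : kinds) {
        if (k == 1) v.push_back(std::make_unique<next1x>());
        else if (k == 2) v.push_back(std::make_unique<next2x>());
        else v.push_back(std::make_unique<next3x>());
    }
    return v;
}

// Counts the objects that would receive the next3x-only special work.
int count_special(const std::vector<std::unique_ptr<basex>>& v)
{
    int n = 0;
    for (const auto& p : v) {
        p->work();                                          // polymorphic call through the base
        if (dynamic_cast<const next3x*>(p.get()) != nullptr)
            ++n;                                            // special_work(...) would go here
    }
    return n;
}
```

When the vector goes out of scope, every object is destroyed through basex's virtual destructor.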
I read that using a policy class for a function that will be called in a tight loop is much faster than using a polymorphic function. However, I set up this demo and the timing indicates that it is exactly the opposite. The policy version takes 2-3x longer than the polymorphic version.
#include <iostream>
#include <boost/timer.hpp>
// Policy version
template < typename operation_policy>
class DoOperationPolicy : public operation_policy
{
using operation_policy::Operation;
public:
void Run(const float a, const float b)
{
Operation(a,b);
}
};
class OperationPolicy_Add
{
protected:
float Operation(const float a, const float b)
{
return a + b;
}
};
// Polymorphic version
class DoOperation
{
public:
virtual float Run(const float a, const float b)= 0;
};
class OperationAdd : public DoOperation
{
public:
float Run(const float a, const float b)
{
return a + b;
}
};
int main()
{
boost::timer timer;
unsigned int numberOfIterations = 1e7;
DoOperationPolicy<OperationPolicy_Add> policy_operation;
for(unsigned int i = 0; i < numberOfIterations; ++i)
{
policy_operation.Run(1,2);
}
std::cout << timer.elapsed() << " seconds." << std::endl;
timer.restart();
DoOperation* polymorphic_operation = new OperationAdd;
for(unsigned int i = 0; i < numberOfIterations; ++i)
{
polymorphic_operation->Run(1,2);
}
std::cout << timer.elapsed() << " seconds." << std::endl;
}
Is there something wrong with the demo? Or is just incorrect that the policy should be faster?
Your benchmark is meaningless (sorry).
Making real benchmarks is hard, unfortunately, as compilers are very clever.
Things to look for here:
devirtualization: the polymorphic call is expected to be slower because it is supposed to be virtual, but here the compiler can realize that polymorphic_operation is necessarily an OperationAdd and thus call OperationAdd::Run directly without runtime dispatch
inlining: since the compiler has access to the methods' bodies, it can inline them and avoid the function calls altogether.
"dead store removal": values that are not used need not be stored, and the computations that lead to them and do not provoke side-effects can be avoided entirely.
Indeed, your entire benchmark code can be optimized to:
int main()
{
boost::timer timer;
std::cout << timer.elapsed() << " seconds." << std::endl;
timer.restart();
DoOperation* polymorphic_operation = new OperationAdd;
std::cout << timer.elapsed() << " seconds." << std::endl;
}
Which is when you realize that you are not timing what you'd like to...
In order to make your benchmark meaningful you need to:
prevent devirtualization
force side-effects
To prevent devirtualization, just declare a DoOperation& Get() function, and then in another cpp file: DoOperation& Get() { static OperationAdd O; return O; }.
To force side-effects (only necessary if the methods are inlined): return the value and accumulate it, then display it.
In action using this program:
// test2.cpp
namespace so8746025 {
class DoOperation
{
public:
virtual float Run(const float a, const float b) = 0;
};
class OperationAdd : public DoOperation
{
public:
float Run(const float a, const float b)
{
return a + b;
}
};
class OperationAddOutOfLine: public DoOperation
{
public:
float Run(const float a, const float b);
};
float OperationAddOutOfLine::Run(const float a, const float b)
{
return a + b;
}
DoOperation& GetInline() {
static OperationAdd O;
return O;
}
DoOperation& GetOutOfLine() {
static OperationAddOutOfLine O;
return O;
}
} // namespace so8746025
// test.cpp
#include <iostream>
#include <boost/timer.hpp>
namespace so8746025 {
// Policy version
template < typename operation_policy>
struct DoOperationPolicy
{
float Run(const float a, const float b)
{
return operation_policy::Operation(a,b);
}
};
struct OperationPolicy_Add
{
static float Operation(const float a, const float b)
{
return a + b;
}
};
// Polymorphic version
class DoOperation
{
public:
virtual float Run(const float a, const float b) = 0;
};
class OperationAdd : public DoOperation
{
public:
float Run(const float a, const float b)
{
return a + b;
}
};
class OperationAddOutOfLine: public DoOperation
{
public:
float Run(const float a, const float b);
};
DoOperation& GetInline();
DoOperation& GetOutOfLine();
} // namespace so8746025
using namespace so8746025;
int main()
{
unsigned int numberOfIterations = 1e8;
DoOperationPolicy<OperationPolicy_Add> policy;
OperationAdd stackInline;
DoOperation& virtualInline = GetInline();
OperationAddOutOfLine stackOutOfLine;
DoOperation& virtualOutOfLine = GetOutOfLine();
boost::timer timer;
float result = 0;
for(unsigned int i = 0; i < numberOfIterations; ++i) {
result += policy.Run(1,2);
}
std::cout << "Policy: " << timer.elapsed() << " seconds (" << result << ")" << std::endl;
timer.restart();
result = 0;
for(unsigned int i = 0; i < numberOfIterations; ++i)
{
result += stackInline.Run(1,2);
}
std::cout << "Stack Inline: " << timer.elapsed() << " seconds (" << result << ")" << std::endl;
timer.restart();
result = 0;
for(unsigned int i = 0; i < numberOfIterations; ++i)
{
result += virtualInline.Run(1,2);
}
std::cout << "Virtual Inline: " << timer.elapsed() << " seconds (" << result << ")" << std::endl;
timer.restart();
result = 0;
for(unsigned int i = 0; i < numberOfIterations; ++i)
{
result += stackOutOfLine.Run(1,2);
}
std::cout << "Stack Out Of Line: " << timer.elapsed() << " seconds (" << result << ")" << std::endl;
timer.restart();
result = 0;
for(unsigned int i = 0; i < numberOfIterations; ++i)
{
result += virtualOutOfLine.Run(1,2);
}
std::cout << "Virtual Out Of Line: " << timer.elapsed() << " seconds (" << result << ")" << std::endl;
}
We get:
$ gcc --version
gcc (GCC) 4.3.2
$ ./testR
Policy: 0.17 seconds (6.71089e+07)
Stack Inline: 0.17 seconds (6.71089e+07)
Virtual Inline: 0.52 seconds (6.71089e+07)
Stack Out Of Line: 0.6 seconds (6.71089e+07)
Virtual Out Of Line: 0.59 seconds (6.71089e+07)
Note the subtle difference between devirtualization + inline and the absence of devirtualization.
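All of the printed sums are 6.71089e+07 even though 1e8 iterations of adding 3 should give 3e8. That number is 2^26 = 67108864: at that magnitude adjacent float values are 8 apart, so adding 3.0f rounds back to the same value and the accumulator stops growing. The timings are unaffected, but the printed value is not the mathematical sum. A small sketch that reproduces this, assuming IEEE-754 single-precision arithmetic evaluated in float (as on x86-64 with SSE); sumOfThrees is a hypothetical helper.

```cpp
#include <cassert>

// Repeatedly adds 3.0f to a float accumulator, as the benchmark loops do with Run(1,2).
float sumOfThrees(unsigned long iterations)
{
    float result = 0;
    for (unsigned long i = 0; i < iterations; ++i)
        result += 3.0f;   // exact below 2^24, rounded above, stuck at 2^26
    return result;
}
```

Accumulating into a double, or checking the sum against the expected count, would make such a saturation visible.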
FWIW I made it
a policy, as opposed to a mixin
return the value
use a volatile to avoid optimizing away of the loop and unrelated optimization of the loop (like, reducing load/stores due to loop unrolling and vectorization on targets that support it).
compare with a direct, static function call
use way more iterations
compile with -O3 on gcc
Timings are:
DoDirect: 3.4 seconds.
Policy: 3.41 seconds.
Polymorphic: 3.4 seconds.
Ergo: there is no difference, mainly because GCC is able to statically determine that the DoOperation* points to an OperationAdd, so there is no vtable lookup inside the loop :)
IMPORTANT
If you wanted to benchmark the real-life performance of this exact loop, rather than the function invocation overhead, drop the volatile. The timings then become
DoDirect: 6.71089e+07 in 1.12 seconds.
Policy: 6.71089e+07 in 1.15 seconds.
Polymorphic: 6.71089e+07 in 3.38 seconds.
As you can see, without volatile, the compiler is able to optimize some load-store cycles away; I assume it might be doing loop unrolling+register allocation there (however I haven't inspected the machine code). The point is, that the loop as a whole can be optimized much more with the 'policy' approach than with the dynamic dispatch (i.e. the virtual method)
CODE
#include <iostream>
#include <boost/timer.hpp>
// Direct version
struct DoDirect {
static float Run(const float a, const float b) { return a + b; }
};
// Policy version
template <typename operation_policy>
struct DoOperationPolicy {
float Run(const float a, const float b) const {
return operation_policy::Operation(a,b);
}
};
struct OperationPolicy_Add {
static float Operation(const float a, const float b) {
return a + b;
}
};
// Polymorphic version
struct DoOperation {
virtual float Run(const float a, const float b) const = 0;
};
struct OperationAdd : public DoOperation {
float Run(const float a, const float b) const { return a + b; }
};
int main(int argc, const char *argv[])
{
boost::timer timer;
const unsigned long numberOfIterations = 1<<30ul;
volatile float result = 0;
for(unsigned long i = 0; i < numberOfIterations; ++i) {
result += DoDirect::Run(1,2);
}
std::cout << "DoDirect: " << result << " in " << timer.elapsed() << " seconds." << std::endl;
timer.restart();
result = 0;
DoOperationPolicy<OperationPolicy_Add> policy_operation;
for(unsigned long i = 0; i < numberOfIterations; ++i) {
result += policy_operation.Run(1,2);
}
std::cout << "Policy: " << result << " in " << timer.elapsed() << " seconds." << std::endl;
timer.restart();
result = 0;
DoOperation* polymorphic_operation = new OperationAdd;
for(unsigned long i = 0; i < numberOfIterations; ++i) {
result += polymorphic_operation->Run(1,2);
}
std::cout << "Polymorphic: " << result << " in " << timer.elapsed() << " seconds." << std::endl;
}
Turn on optimisation. The policy-based variant profits highly from that because most intermediate steps are completely optimised out, while the polymorphic version cannot skip for example the dereferencing of the object.
You have to turn on optimization, and make sure that
both code parts actually do the same thing (which they currently do not, your policy-variant does not return the result)
the result is used for something, so that the compiler does not discard the code path altogether (just sum the results and print them somewhere should be enough)
I had to change your policy code to return the computed value:
float Run(const float a, const float b)
{
return Operation(a,b);
}
Secondly, I had to store the returned value to ensure that the loop wouldn't be optimized away:
int main()
{
unsigned int numberOfIterations = 1e9;
float answer = 0.0;
boost::timer timer;
DoOperationPolicy<OperationPolicy_Add> policy_operation;
for(unsigned int i = 0; i < numberOfIterations; ++i)
{
answer += policy_operation.Run(1,2);
}
std::cout << "Policy got " << answer << " in " << timer.elapsed() << " seconds" << std::endl;
answer = 0.0;
timer.restart();
DoOperation* polymorphic_operation = new OperationAdd;
for(unsigned int i = 0; i < numberOfIterations; ++i)
{
answer += polymorphic_operation->Run(1,2);
}
std::cout << "Polymo got " << answer << " in " << timer.elapsed() << " seconds" << std::endl;
return 0;
}
Without optimizations on g++ 4.1.2:
Policy got 6.71089e+07 in 13.75 seconds
Polymo got 6.71089e+07 in 7.52 seconds
With -O3 on g++ 4.1.2:
Policy got 6.71089e+07 in 1.18 seconds
Polymo got 6.71089e+07 in 3.23 seconds
So the policy is definitely faster once optimizations are turned on.