Policy vs polymorphic speed - C++

I read that using a policy class for a function that will be called in a tight loop is much faster than using a polymorphic function. However, I set up this demo and the timing indicates that it is exactly the opposite!? The policy version takes between 2-3x longer than the polymorphic version.
#include <iostream>
#include <boost/timer.hpp>

// Policy version
template <typename operation_policy>
class DoOperationPolicy : public operation_policy
{
    using operation_policy::Operation;
public:
    void Run(const float a, const float b)
    {
        Operation(a,b);
    }
};

class OperationPolicy_Add
{
protected:
    float Operation(const float a, const float b)
    {
        return a + b;
    }
};

// Polymorphic version
class DoOperation
{
public:
    virtual float Run(const float a, const float b) = 0;
};

class OperationAdd : public DoOperation
{
public:
    float Run(const float a, const float b)
    {
        return a + b;
    }
};

int main()
{
    boost::timer timer;

    unsigned int numberOfIterations = 1e7;
    DoOperationPolicy<OperationPolicy_Add> policy_operation;
    for(unsigned int i = 0; i < numberOfIterations; ++i)
    {
        policy_operation.Run(1,2);
    }
    std::cout << timer.elapsed() << " seconds." << std::endl;

    timer.restart();
    DoOperation* polymorphic_operation = new OperationAdd;
    for(unsigned int i = 0; i < numberOfIterations; ++i)
    {
        polymorphic_operation->Run(1,2);
    }
    std::cout << timer.elapsed() << " seconds." << std::endl;
}
Is there something wrong with the demo? Or is it just incorrect that the policy should be faster?

Your benchmark is meaningless (sorry).
Making real benchmarks is hard, unfortunately, as compilers are very clever.
Things to look for here:
devirtualization: the polymorphic call is expected to be slower because it is supposed to be virtual, but here the compiler can realize that polymorphic_operation is necessarily an OperationAdd and thus call OperationAdd::Run directly, without invoking runtime dispatch
inlining: since the compiler has access to the method bodies, it can inline them and avoid the function calls altogether.
"dead store removal": values that are not used need not be stored, and the computations that lead to them and provoke no side effects can be avoided entirely.
Indeed, your entire benchmark code can be optimized to:
int main()
{
    boost::timer timer;
    std::cout << timer.elapsed() << " seconds." << std::endl;

    timer.restart();
    DoOperation* polymorphic_operation = new OperationAdd;
    std::cout << timer.elapsed() << " seconds." << std::endl;
}
Which is when you realize that you are not timing what you'd like to...
In order to make your benchmark meaningful you need to:
prevent devirtualization
force side-effects
To prevent devirtualization, just declare a DoOperation& Get() function, and then in another cpp file: DoOperation& Get() { static OperationAdd O; return O; }.
To force side-effects (only necessary if the methods are inlined): return the value and accumulate it, then display it.
In action using this program:
// test2.cpp
namespace so8746025 {
class DoOperation
{
public:
virtual float Run(const float a, const float b) = 0;
};
class OperationAdd : public DoOperation
{
public:
float Run(const float a, const float b)
{
return a + b;
}
};
class OperationAddOutOfLine: public DoOperation
{
public:
float Run(const float a, const float b);
};
float OperationAddOutOfLine::Run(const float a, const float b)
{
return a + b;
}
DoOperation& GetInline() {
static OperationAdd O;
return O;
}
DoOperation& GetOutOfLine() {
static OperationAddOutOfLine O;
return O;
}
} // namespace so8746025
// test.cpp
#include <iostream>
#include <boost/timer.hpp>
namespace so8746025 {
// Policy version
template < typename operation_policy>
struct DoOperationPolicy
{
float Run(const float a, const float b)
{
return operation_policy::Operation(a,b);
}
};
struct OperationPolicy_Add
{
static float Operation(const float a, const float b)
{
return a + b;
}
};
// Polymorphic version
class DoOperation
{
public:
virtual float Run(const float a, const float b) = 0;
};
class OperationAdd : public DoOperation
{
public:
float Run(const float a, const float b)
{
return a + b;
}
};
class OperationAddOutOfLine: public DoOperation
{
public:
float Run(const float a, const float b);
};
DoOperation& GetInline();
DoOperation& GetOutOfLine();
} // namespace so8746025
using namespace so8746025;
int main()
{
unsigned int numberOfIterations = 1e8;
DoOperationPolicy<OperationPolicy_Add> policy;
OperationAdd stackInline;
DoOperation& virtualInline = GetInline();
OperationAddOutOfLine stackOutOfLine;
DoOperation& virtualOutOfLine = GetOutOfLine();
boost::timer timer;
float result = 0;
for(unsigned int i = 0; i < numberOfIterations; ++i) {
result += policy.Run(1,2);
}
std::cout << "Policy: " << timer.elapsed() << " seconds (" << result << ")" << std::endl;
timer.restart();
result = 0;
for(unsigned int i = 0; i < numberOfIterations; ++i)
{
result += stackInline.Run(1,2);
}
std::cout << "Stack Inline: " << timer.elapsed() << " seconds (" << result << ")" << std::endl;
timer.restart();
result = 0;
for(unsigned int i = 0; i < numberOfIterations; ++i)
{
result += virtualInline.Run(1,2);
}
std::cout << "Virtual Inline: " << timer.elapsed() << " seconds (" << result << ")" << std::endl;
timer.restart();
result = 0;
for(unsigned int i = 0; i < numberOfIterations; ++i)
{
result += stackOutOfLine.Run(1,2);
}
std::cout << "Stack Out Of Line: " << timer.elapsed() << " seconds (" << result << ")" << std::endl;
timer.restart();
result = 0;
for(unsigned int i = 0; i < numberOfIterations; ++i)
{
result += virtualOutOfLine.Run(1,2);
}
std::cout << "Virtual Out Of Line: " << timer.elapsed() << " seconds (" << result << ")" << std::endl;
}
We get:
$ gcc --version
gcc (GCC) 4.3.2
$ ./testR
Policy: 0.17 seconds (6.71089e+07)
Stack Inline: 0.17 seconds (6.71089e+07)
Virtual Inline: 0.52 seconds (6.71089e+07)
Stack Out Of Line: 0.6 seconds (6.71089e+07)
Virtual Out Of Line: 0.59 seconds (6.71089e+07)
Note the subtle difference between devirtualization + inline and the absence of devirtualization.

FWIW I made it
a policy, as opposed to a mixin
return the value
use a volatile to keep the loop from being optimized away and to prevent unrelated optimizations of the loop (like reducing loads/stores due to loop unrolling and vectorization on targets that support it)
compare with a direct, static function call
use way more iterations
compile with -O3 on gcc
Timings are:
DoDirect: 3.4 seconds.
Policy: 3.41 seconds.
Polymorphic: 3.4 seconds.
Ergo: there is no difference. Mainly because GCC is able to statically analyze the type behind the DoOperation* to be OperationAdd - there is no vtable lookup inside the loop :)
IMPORTANT
If you want to benchmark the real-life performance of this exact loop, instead of the function invocation overhead, drop the volatile. The timings then become
DoDirect: 6.71089e+07 in 1.12 seconds.
Policy: 6.71089e+07 in 1.15 seconds.
Polymorphic: 6.71089e+07 in 3.38 seconds.
As you can see, without volatile, the compiler is able to optimize some load-store cycles away; I assume it is doing loop unrolling plus register allocation there (however, I haven't inspected the machine code). The point is that the loop as a whole can be optimized much more with the 'policy' approach than with dynamic dispatch (i.e. the virtual method).
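If you want to see what the optimizer actually did, one way (a sketch; the output varies by compiler and target, and the file names are illustrative) is to dump the generated assembly and compare the three loops:

g++ -O3 -S bench.cpp -o bench.s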
CODE
#include <iostream>
#include <boost/timer.hpp>
// Direct version
struct DoDirect {
static float Run(const float a, const float b) { return a + b; }
};
// Policy version
template <typename operation_policy>
struct DoOperationPolicy {
float Run(const float a, const float b) const {
return operation_policy::Operation(a,b);
}
};
struct OperationPolicy_Add {
static float Operation(const float a, const float b) {
return a + b;
}
};
// Polymorphic version
struct DoOperation {
virtual float Run(const float a, const float b) const = 0;
};
struct OperationAdd : public DoOperation {
float Run(const float a, const float b) const { return a + b; }
};
int main(int argc, const char *argv[])
{
boost::timer timer;
const unsigned long numberOfIterations = 1<<30ul;
volatile float result = 0;
for(unsigned long i = 0; i < numberOfIterations; ++i) {
result += DoDirect::Run(1,2);
}
std::cout << "DoDirect: " << result << " in " << timer.elapsed() << " seconds." << std::endl;
timer.restart();
result = 0; // reset so the Policy loop starts from the same state as DoDirect
DoOperationPolicy<OperationPolicy_Add> policy_operation;
for(unsigned long i = 0; i < numberOfIterations; ++i) {
result += policy_operation.Run(1,2);
}
std::cout << "Policy: " << result << " in " << timer.elapsed() << " seconds." << std::endl;
timer.restart();
result = 0;
DoOperation* polymorphic_operation = new OperationAdd;
for(unsigned long i = 0; i < numberOfIterations; ++i) {
result += polymorphic_operation->Run(1,2);
}
std::cout << "Polymorphic: " << result << " in " << timer.elapsed() << " seconds." << std::endl;
}

Turn on optimisation. The policy-based variant benefits greatly from that because most intermediate steps are completely optimised out, while the polymorphic version cannot skip, for example, the dereferencing of the object.
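For example (assuming GCC or Clang; the file name is illustrative):

g++ -O3 -o demo demo.cpp

MSVC users would use /O2 instead.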

You have to turn on optimization, and make sure that
both code parts actually do the same thing (which they currently do not: your policy variant does not return the result)
the result is used for something, so that the compiler does not discard the code path altogether (just sum the results and print them somewhere should be enough)

I had to change your policy code to return the computed value:
float Run(const float a, const float b)
{
    return Operation(a,b);
}
Secondly, I had to store the returned value to ensure that the loop wouldn't be optimized away:
int main()
{
    unsigned int numberOfIterations = 1e9;
    float answer = 0.0;
    boost::timer timer;

    DoOperationPolicy<OperationPolicy_Add> policy_operation;
    for(unsigned int i = 0; i < numberOfIterations; ++i)
    {
        answer += policy_operation.Run(1,2);
    }
    std::cout << "Policy got " << answer << " in " << timer.elapsed() << " seconds" << std::endl;

    answer = 0.0;
    timer.restart();
    DoOperation* polymorphic_operation = new OperationAdd;
    for(unsigned int i = 0; i < numberOfIterations; ++i)
    {
        answer += polymorphic_operation->Run(1,2);
    }
    std::cout << "Polymo got " << answer << " in " << timer.elapsed() << " seconds" << std::endl;
    return 0;
}
Without optimizations on g++ 4.1.2:
Policy got 6.71089e+07 in 13.75 seconds
Polymo got 6.71089e+07 in 7.52 seconds
With -O3 on g++ 4.1.2:
Policy got 6.71089e+07 in 1.18 seconds
Polymo got 6.71089e+07 in 3.23 seconds
So the policy is definitely faster once optimizations are turned on.

Related

Abort() when starting up a few threads with std::vector<std::future<int>> and std::async

I am using 4 threads to create a few objects using a thread_local memory pool.
I am using std::vector<std::future<int>> and std::async(std::launch::async, function); to dispatch the threads, and std::for_each with t.get() to collect their values. Here's the code:
struct GameObject
{
int x_, y_, z_;
int m_cost;
GameObject() = default;
GameObject(int x, int y, int z, int cost)
: x_(x), y_(y), z_(z), m_cost(cost)
{}
};
struct Elf : GameObject
{
Elf() = default;
Elf(int x, int y, int z, int cost)
: GameObject(x, y, z, cost)
{
std::cout << "Elf created" << '\n';
}
~Elf() noexcept
{
std::cout << "Elf destroyed" << '\n';
}
std::string m_cry = "\nA hymn for Gandalf\n";
};
struct Dwarf : GameObject
{
Dwarf() = default;
Dwarf(int x, int y, int z, int cost)
: GameObject(x, y, z, cost)
{
std::cout << "dwarf created" << '\n';
}
~Dwarf() noexcept
{
std::cout << "dwarf destroyed" << '\n';
}
std::string m_cry = "\nFind more cheer in a graveyard\n";
};
int elvenFunc()
{
thread_local ObjectPool<Elf> elvenPool{ 229 };
for (int i = 0; i < elvenPool.getSize(); ++i)
{
Elf* elf = elvenPool.construct(i, i + 1, i + 2, 100);
std::cout << elf->m_cry << '\n';
elvenPool.destroy(elf);
}
thread_local std::promise<int> pr;
pr.set_value(rand());
return 1024;
}
int dwarvenFunc()
{
thread_local ObjectPool<Dwarf> dwarvenPool{ 256 };
for (int i = 0; i < dwarvenPool.getSize(); ++i)
{
Dwarf* dwarf = dwarvenPool.construct(i - 1, i - 2, i - 3, 100);
std::cout << dwarf->m_cry << '\n';
dwarvenPool.destroy(dwarf);
}
thread_local std::promise<int> pr;
pr.set_value(rand());
return 2048;
}
int main()
{
std::ios_base::sync_with_stdio(false);
srand(time(0));
std::vector<std::future<int>> vec{ 4 };
vec.emplace_back(std::async(std::launch::async, elvenFunc));
vec.emplace_back(std::async(std::launch::async, elvenFunc));
vec.emplace_back(std::async(std::launch::async, dwarvenFunc));
vec.emplace_back(std::async(std::launch::async, dwarvenFunc));
int term = 0;
try
{
std::for_each(std::execution::par, vec.begin(), vec.end(), [&term](std::future<int>& t)
{
auto ret = t.get();
std::cout << "thread brought me " << ret << '\n';
term += ret;
});
}
catch (const std::exception& ex)
{
std::cout << ex.what() << '\n';
}
std::cout << "Final word = " << term << '\n';
}
(the construct and destroy calls allocate and deallocate internally.) I get a lot of the expected output from the terminal, but somewhere along the line abort gets called and the program doesn't complete normally. I can't find out why. I believe that the t.get() call on a future whose task was started with std::async automatically joins the thread too, right?
Using C++17 and Visual Studio 2017. What am I doing wrong?
You have undefined behaviour. You call get on a non-valid future: your vector of futures has its first 4 items as empty futures. get can be called only if future::valid() returns true.
What do you think this line
std::vector<std::future<int>> vec{ 4 };
does?
Default constructor. Constructs a std::future with no shared state.
After construction, valid() == false.
And here you can read what happens when future::get is called:
The behavior is undefined if valid() is false before the call to this
function.
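A minimal fix, assuming the intent was just to avoid reallocation (a sketch, untested): reserve capacity instead of constructing four empty futures:

std::vector<std::future<int>> vec;
vec.reserve(4); // capacity only - no default-constructed (invalid) futures
vec.emplace_back(std::async(std::launch::async, elvenFunc));
vec.emplace_back(std::async(std::launch::async, elvenFunc));
vec.emplace_back(std::async(std::launch::async, dwarvenFunc));
vec.emplace_back(std::async(std::launch::async, dwarvenFunc));

Alternatively, skip the first four invalid elements, or check valid() before calling get.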

Is virtual function call slower than dynamic_cast?

I know that dynamic_cast has a serious cost, but when I try the following code, I almost always get a bigger value from the virtual function call loop. Have I been wrong all this time?
EDIT: The problem was that my compiler had been in debug mode. When I switched to release mode, virtual function call loop runs 5 to 7 times faster than dynamic_cast loop.
struct A {
virtual void foo() {}
};
struct B : public A {
virtual void foo() override {}
};
struct C : public B {
virtual void foo() override {}
};
int main()
{
vector<A *> vec;
for (int i = 0; i < 100000; ++i)
if (i % 2)
vec.push_back(new C());
else
vec.push_back(new B());
clock_t begin = clock();
for (auto iter : vec)
if (dynamic_cast<C*>(iter))
;
clock_t end = clock();
cout << (static_cast<double>(end) - begin) / CLOCKS_PER_SEC << endl;
begin = clock();
for (auto iter : vec)
iter->foo();
end = clock();
cout << (static_cast<double>(end) - begin) / CLOCKS_PER_SEC << endl;
return 0;
}
Since you are not doing anything with the result of the dynamic_cast in the lines
for (auto iter : vec)
if (dynamic_cast<C*>(iter))
;
the compiler might be optimizing away most of that code if not all of it.
If you do something useful with the result of the dynamic_cast, you might see a difference. You could try:
for (auto iter : vec)
{
if (C* cptr = dynamic_cast<C*>(iter))
{
cptr->foo();
}
if (B* bptr = dynamic_cast<B*>(iter))
{
bptr->foo();
}
}
That will most likely make a difference.
See http://ideone.com/BvqoqU for a sample run.
Have I been wrong all this time?
We probably cannot tell from your code. The optimizer is clever, and it is sometimes quite challenging to 'defeat' or 'deceive' it.
In the following, I use 'assert()' to try to control the optimizer's enthusiasm. Also note that 'time(0)' is a fast function on Ubuntu 15.10. I believe the compiler does not yet know what the combination will do, and thus will not remove it, providing a more reliable/repeatable measurement.
I think I like these results better, and perhaps these indicate that dynamic cast is slower than virtual function invocation.
Environment:
on an older Dell, using Ubuntu 15.10, 64 bit, and -O3
~$ g++-5 --version
g++-5 (Ubuntu 5.2.1-23ubuntu1~15.10) 5.2.1 20151028
Results (dynamic cast followed by virtual function):
void T523_t::testStruct()
0.443445
0.184873
void T523_t::testClass()
252,495 us
184,961 us
FINI 2914399 us
Code:
#include <chrono>
// 'compressed' chrono access --------------vvvvvvv
typedef std::chrono::high_resolution_clock HRClk_t; // std-chrono-hi-res-clk
typedef HRClk_t::time_point Time_t; // std-chrono-hi-res-clk-time-point
typedef std::chrono::milliseconds MS_t; // std-chrono-milliseconds
typedef std::chrono::microseconds US_t; // std-chrono-microseconds
typedef std::chrono::nanoseconds NS_t; // std-chrono-nanoseconds
using namespace std::chrono_literals; // support suffixes like 100ms, 2s, 30us
#include <iostream>
#include <iomanip>
#include <vector>
#include <cassert>
// original ////////////////////////////////////////////////////////////////////
struct A {
virtual ~A() = default; // warning: ‘struct A’ has virtual functions and
// accessible non-virtual destructor [-Wnon-virtual-dtor]
virtual void foo() { assert(time(0)); }
};
struct B : public A {
virtual void foo() override { assert(time(0)); }
};
struct C : public B {
virtual void foo() override { assert(time(0)); }
};
// with class ////////////////////////////////////////////////////////////////////////////
// If your C++ code has no class ... why bother?
class A_t {
public:
virtual ~A_t() = default; // warning: ‘struct A’ has virtual functions and
// accessible non-virtual destructor [-Wnon-virtual-dtor]
virtual void foo() { assert(time(0)); }
};
class B_t : public A_t {
public:
virtual void foo() override { assert(time(0)); }
};
class C_t : public B_t {
public:
virtual void foo() override { assert(time(0)); }
};
class T523_t
{
public:
T523_t() = default;
~T523_t() = default;
int exec()
{
testStruct();
testClass();
return(0);
}
private: // methods
std::string digiComma(std::string s)
{ //vvvvv--sSize must be signed int of sufficient size
int32_t sSize = static_cast<int32_t>(s.size());
if (sSize > 3)
for (int32_t indx = (sSize - 3); indx > 0; indx -= 3)
s.insert(static_cast<size_t>(indx), 1, ',');
return(s);
}
void testStruct()
{
using std::vector;
using std::cout; using std::endl;
std::cout << "\n\n " << __PRETTY_FUNCTION__ << std::endl;
vector<A *> vec;
for (int i = 0; i < 10000000; ++i)
if (i % 2)
vec.push_back(new C());
else
vec.push_back(new B());
clock_t begin = clock();
int i = 0;
for (auto iter : vec)
{
if(i % 2) assert(dynamic_cast<C*>(iter)); // if (dynamic_cast<C*>(iter)) {};
else assert(dynamic_cast<B*>(iter));
++i; // without this, i stays 0 and every element is tested as B* only
}
clock_t end = clock();
cout << "\n " << std::setw(8)
<< ((static_cast<double>(end) - static_cast<double>(begin))
/ CLOCKS_PER_SEC) << endl; //^^^^^^^^^^^^^^^^^^^^^^^^^^
// warning: conversion to ‘double’ from ‘clock_t {aka long int}’ may alter its value [-Wconversion]
begin = clock();
for (auto iter : vec)
iter->foo();
end = clock();
cout << "\n " << std::setw(8)
<< ((static_cast<double>(end) - static_cast<double>(begin))
/ CLOCKS_PER_SEC) << endl; //^^^^^^^^^^^^^^^^^^^^^^^^^^
// warning: conversion to ‘double’ from ‘clock_t {aka long int}’ may alter its value [-Wconversion]
}
void testClass()
{
std::cout << "\n\n " << __PRETTY_FUNCTION__ << std::endl;
std::vector<A_t *> APtrVec;
for (int i = 0; i < 10000000; ++i)
{
if (i % 2) APtrVec.push_back(new C_t());
else APtrVec.push_back(new B_t());
}
{
Time_t start_us = HRClk_t::now();
int i = 0;
for (auto Aptr : APtrVec)
{
if(i % 2) assert(dynamic_cast<C_t*>(Aptr)); // check for nullptr
else assert(dynamic_cast<B_t*>(Aptr)); // check for nullptr
++i;
}
auto duration_us = std::chrono::duration_cast<US_t>(HRClk_t::now() - start_us);
std::cout << "\n " << std::setw(8)
<< digiComma(std::to_string(duration_us.count()))
<< " us" << std::endl;
}
{
Time_t start_us = HRClk_t::now();
for (auto Aptr : APtrVec) {
Aptr->foo();
}
auto duration_us = std::chrono::duration_cast<US_t>(HRClk_t::now() - start_us);
std::cout << "\n " << std::setw(8)
<< digiComma(std::to_string(duration_us.count()))
<< " us" << std::endl;
}
}
}; // class T523_t
int main(int argc, char* argv[])
{
std::cout << "\nargc: " << argc << std::endl;
for (int i = 0; i < argc; i += 1) std::cout << argv[i] << " ";
std::cout << std::endl;
setlocale(LC_ALL, "");
std::ios::sync_with_stdio(false);
{ time_t t0 = std::time(nullptr); while(t0 == time(nullptr)) { /**/ }; }
Time_t start_us = HRClk_t::now();
int retVal = -1;
{
T523_t t523;
retVal = t523.exec();
}
auto duration_us = std::chrono::duration_cast<US_t>(HRClk_t::now() - start_us);
std::cout << "\n FINI " << (std::to_string(duration_us.count()))
<< " us" << std::endl;
return(retVal);
}
update 2017-08-31
I suspect many of you will object to performing the dynamic cast without using the result. Here is one possible approach, replacing the for-auto loop in the testClass() method:
for (auto Aptr : APtrVec)
{
if(i % 2) { C_t* c = dynamic_cast<C_t*>(Aptr); assert(c); c->foo(); }
else { B_t* b = dynamic_cast<B_t*>(Aptr); assert(b); b->foo(); }
++i;
}
With results
void T523_t::testStruct()
0.443445
0.184873
void T523_t::testClass()
322,431 us
191,285 us
FINI 4156941 us
end update

Circumventing RTTI on legacy code

I have been looking for a way to get around the slowness of the dynamic cast type checking. Before you start saying I should redesign everything, let me inform you that the design was decided on 5 years ago. I can't fix all 400,000 lines of code that came after (I wish I could), but I can make some changes. I have run this little test on type identification:
#include <iostream>
#include <typeinfo>
#include <stdint.h>
#include <ctime>
using namespace std;
#define ADD_TYPE_ID \
static intptr_t type() { return reinterpret_cast<intptr_t>(&type); }\
virtual intptr_t getType() { return type(); }
struct Base
{
ADD_TYPE_ID;
};
template <typename T>
struct Derived : public Base
{
ADD_TYPE_ID;
};
int main()
{
Base* b = new Derived<int>();
cout << "Correct Type: " << (b->getType() == Derived<int>::type()) << endl; // true
cout << "Template Type: " << (b->getType() == Derived<float>::type()) << endl; // false
cout << "Base Type: " << (b->getType() == Base::type()) << endl; // false
clock_t begin = clock();
{
for (size_t i = 0; i < 100000000; i++)
{
if (b->getType() == Derived<int>::type())
Derived <int>* d = static_cast<Derived<int>*> (b);
}
}
clock_t end = clock();
double elapsed = double(end - begin) / CLOCKS_PER_SEC;
cout << "Type elapsed: " << elapsed << endl;
begin = clock();
{
for (size_t i = 0; i < 100000000; i++)
{
Derived<int>* d = dynamic_cast<Derived<int>*>(b);
if (d);
}
}
end = clock();
elapsed = double(end - begin) / CLOCKS_PER_SEC;
cout << "Type elapsed: " << elapsed << endl;
begin = clock();
{
for (size_t i = 0; i < 100000000; i++)
{
Derived<int>* d = dynamic_cast<Derived<int>*>(b);
if ( typeid(d) == typeid(Derived<int>*) )
static_cast<Derived<int>*> (b);
}
}
end = clock();
elapsed = double(end - begin) / CLOCKS_PER_SEC;
cout << "Type elapsed: " << elapsed << endl;
return 0;
}
It seems that using the class id (the first timed solution above) would be the fastest way to do type-checking at runtime.
Will this cause any problems with threading? Is there a better way to check for types at runtime (with not much re-factoring)?
Edit: Might I also add that this needs to work with the TI compilers, which currently only support up to C++03.
First off, note that there's a big difference between dynamic_cast and RTTI: The cast tells you whether you can treat a base object as some further derived, but not necessarily most-derived object. RTTI tells you the precise most-derived type. Naturally the former is more powerful and more expensive.
So then, there are two natural ways you can select on types if you have a polymorphic hierarchy. They're different; use the one that actually applies.
void method1(Base * p)
{
if (Derived * q = dynamic_cast<Derived *>(p))
{
// use q
}
}
void method2(Base * p)
{
if (typeid(*p) == typeid(Derived))
{
auto * q = static_cast<Derived *>(p);
// use q
}
}
Note also that method 2 is not generally available if the base class is a virtual base. Neither method applies if your classes are not polymorphic.
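To illustrate the virtual-base restriction (a minimal example of my own, not from the original question):

struct VBase { virtual ~VBase() {} };
struct VDerived : virtual VBase { };

void select(VBase * p)
{
    // VDerived * q = static_cast<VDerived *>(p); // ill-formed: you cannot
    // static_cast from a virtual base, so method 2's typeid + static_cast
    // combination is unavailable here; only dynamic_cast works:
    if (VDerived * q = dynamic_cast<VDerived *>(p))
    {
        // use q
    }
}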
In a quick test I found method 2 to be significantly faster than your manual ID-based solution, which in turn is faster than the dynamic cast solution (method 1).
How about comparing the classes' virtual function tables?
Quick and dirty proof of concept:
void* instance_vtbl(void* c)
{
return *(void**)c;
}
template<typename C>
void* class_vtbl()
{
static C c;
return instance_vtbl(&c);
}
// ...
begin = clock();
{
for (size_t i = 0; i < 100000000; i++)
{
if (instance_vtbl(b) == class_vtbl<Derived<int>>())
Derived <int>* d = static_cast<Derived<int>*> (b);
}
}
end = clock();
elapsed = double(end - begin) / CLOCKS_PER_SEC;
cout << "Type elapsed: " << elapsed << endl;
With Visual C++'s /Ox switch, this appears 3x faster than the type/getType trick.
Given this type of code
class A {
};
class B : public A {
};

A * a;
B * b = dynamic_cast<B*>(a);
if( b != 0 ) // do something B specific

The polymorphic (right?) way to fix it is something like this
class A {
public:
    virtual void specific() { /* do nothing */ }
};
class B : public A {
public:
    virtual void specific() { /* do something B specific */ }
};

A * a;
if( a != 0 ) a->specific();
When MSVC 2005 first came out, dynamic_cast<> for 64-bit code was much slower than for 32-bit code. We wanted a quick and easy fix. This is what our code looks like. It probably violates all kinds of good design rules, but the conversion to remove dynamic_cast<> can be automated with a script.
class dbbMsgEph {
public:
virtual dbbResultEph * CastResultEph() { return 0; }
virtual const dbbResultEph * CastResultEph() const { return 0; }
};
class dbbResultEph : public dbbMsgEph {
public:
virtual dbbResultEph * CastResultEph() { return this; }
virtual const dbbResultEph * CastResultEph() const { return this; }
static dbbResultEph * Cast( dbbMsgEph * );
static const dbbResultEph * Cast( const dbbMsgEph * );
};
dbbResultEph *
dbbResultEph::Cast( dbbMsgEph * arg )
{
if( arg == 0 ) return 0;
return arg->CastResultEph();
}
const dbbResultEph *
dbbResultEph::Cast( const dbbMsgEph * arg )
{
if( arg == 0 ) return 0;
return arg->CastResultEph();
}
When we used to have
dbbMsgEph * pMsg;
dbbResultEph * pResult = dynamic_cast<dbbResultEph *> (pMsg);
we changed it to
dbbResultEph * pResult = dbbResultEph::Cast (pMsg);
using a simple sed(1) script. And virtual function calls are pretty efficient.
// in release mode (VS2008) this is true:
cout << "Base Type: " << (b->getType() == Base::type()) << endl;
I guess it's because of optimization (presumably the linker folding identical functions, so the addresses compare equal). So I changed the implementation of Derived::type():
template <typename T>
struct Derived : public Base
{
    static intptr_t type()
    {
        cout << "different type()" << endl;
        return reinterpret_cast<intptr_t>(&type);
    }
    virtual intptr_t getType() { return type(); }
};
Then it's different. So how do you deal with this if you use this method???
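One possible workaround (a sketch; I have not tested it against MSVC's /OPT:ICF): key the ID off the address of a per-class writable static object rather than the function itself. Identical-code folding merges functions whose bodies are byte-identical, but it should not merge writable data:

#define ADD_TYPE_ID \
    static intptr_t type() \
    { \
        /* one writable static byte per class; addresses stay distinct */ \
        static char id; \
        return reinterpret_cast<intptr_t>(&id); \
    } \
    virtual intptr_t getType() { return type(); }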

cyclic negative number generation in C++

I have a requirement as follows.
I have to generate incrementing negative numbers from -1 to -100, used as unique ids for requests. It should go like this: -1, -2, -3, ... -100, -1, -2, and so on. How can I do this effectively? I am not supposed to use Boost; the C++ STL is fine. I would prefer to write a simple function like int GetNextID() that generates the ID. A sample program showing how to do this effectively would be appreciated.
Thanks for your time and help
int ID = -1;
auto getnext = [=]() mutable {
    if (ID == -101) ID = -1; // wrap only after -100 has been handed out
    return ID--;
};
Fairly basic stuff here, really. If you have to ask somebody on the Interwebs to write this program for you, you should really consider finding some educational material in C++.
I love the functor solution:
template <int limit> class NegativeNumber
{
public:
    NegativeNumber() : current(0) {}
    int operator()()
    {
        return -(1 + (current++ % limit));
    }
private:
    int current;
};
Then, you can define any generator with any limit and use it:
NegativeNumber<5> five;
NegativeNumber<2> two;

for (int x = 0; x < 20; ++x)
    std::cout << "limit five: " << five() << "\tlimit two: " << two() << '\n';
You can also pass the generator as a parameter to another function, with each functor keeping its own state:
void f5(NegativeNumber<5> &n)
{
    std::cout << "limit five: " << n() << '\n';
}

void f2(NegativeNumber<2> &n)
{
    std::cout << "limit two: " << n() << '\n';
}

f5(five);
f2(two);
If you don't like the template solution to declare the limit, there's also the no-template version:
class NegativeNumberNoTemplate
{
public:
    NegativeNumberNoTemplate(int limit) : m_limit(limit), current(0) {}
    int operator()()
    {
        return -(1 + (current++ % m_limit));
    }
private:
    const int m_limit;
    int current;
};
Using it as an argument to a function works the same way, and its internal state is transferred as well:
void f(NegativeNumberNoTemplate &n)
{
    std::cout << "no template: " << n() << '\n';
}

NegativeNumberNoTemplate notemplate(3);
f(notemplate);
I hope you don't want to use these with threading; they're not thread safe ;)
Here you have all the examples; hope it helps.
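If thread safety ever became a requirement, one sketch (my addition, not part of the original answer) is to make the counter atomic, so each caller gets a distinct ticket:

#include <atomic>

class AtomicNegativeNumber
{
public:
    explicit AtomicNegativeNumber(unsigned limit) : m_limit(limit), current(0) {}
    int operator()()
    {
        // fetch_add is atomic, so concurrent callers receive distinct
        // tickets; the modulo maps each ticket into the -1..-limit cycle.
        // (The counter wrapping at its maximum causes one irregular step,
        // which is acceptable for an illustration.)
        return -(1 + static_cast<int>(current.fetch_add(1) % m_limit));
    }
private:
    const unsigned m_limit;
    std::atomic<unsigned> current;
};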
Something like.... (haven't compiled)
class myClass
{
    int number = 0;
public:
    int GetValue()
    {
        return -(number = (number % 100) + 1); // cycles -1, -2, ... -100, -1
    }
};
Even a simple problem like this can lead to several approaches, both in the algorithmic solution and in the concrete usage of the programming language.
This was my first solution using C++03. I preferred to switch the sign after computing the value.
#include <iostream>
int GetNextID() {
// This variable is private to this function. Be careful of not calling it
// from multiple threads!
static int current_value = 0;
const int MAX_CYCLE_VALUE = 100;
return - (current_value++ % MAX_CYCLE_VALUE) - 1;
}
int main()
{
const int TOTAL_GETS = 500;
for (int i = 0; i < TOTAL_GETS; ++i)
std::cout << GetNextID() << std::endl;
}
A different solution, taking into account that integer modulo in C++ takes the sign of the dividend (!), as noted on Wikipedia:
#include <iostream>
int GetNextID() {
// This variable is private to this function. Be careful of not calling it
// from multiple threads!
static int current_value = 0;
const int MAX_CYCLE_VALUE = 10;
return (current_value-- % MAX_CYCLE_VALUE) - 1;
}
int main()
{
const int TOTAL_GETS = 50;
for (int i = 0; i < TOTAL_GETS; ++i)
std::cout << GetNextID() << std::endl;
}

Caching a single overridden computation C++11

I'd like advice on a way to cache a computation that is shared by two derived classes. As an illustration, I have two types of normalized vectors, L1 and L2, each of which defines its own normalization constant (note: against good practice, I'm inheriting from std::vector here as a quick illustration; believe it or not, my real problem is not about L1 and L2 vectors!):
#include <vector>
#include <iostream>
#include <iterator>
#include <math.h>
struct NormalizedVector : public std::vector<double> {
NormalizedVector(std::initializer_list<double> init_list):
std::vector<double>(init_list) { }
double get_value(int i) const {
return (*this)[i] / get_normalization_constant();
}
virtual double get_normalization_constant() const = 0;
};
struct L1Vector : public NormalizedVector {
L1Vector(std::initializer_list<double> init_list):
NormalizedVector(init_list) { }
double get_normalization_constant() const {
double tot = 0.0;
for (int k=0; k<size(); ++k)
tot += (*this)[k];
return tot;
}
};
struct L2Vector : public NormalizedVector {
L2Vector(std::initializer_list<double> init_list):
NormalizedVector(init_list) { }
double get_normalization_constant() const {
double tot = 0.0;
for (int k=0; k<size(); ++k) {
double val = (*this)[k];
tot += val * val;
}
return sqrt(tot);
}
};
int main() {
L1Vector vec{0.25, 0.5, 1.0};
std::cout << "L1 ";
for (int k=0; k<vec.size(); ++k)
std::cout << vec.get_value(k) << " ";
std::cout << std::endl;
std::cout << "L2 ";
L2Vector vec2{0.25, 0.5, 1.0};
for (int k=0; k<vec2.size(); ++k)
std::cout << vec2.get_value(k) << " ";
std::cout << std::endl;
return 0;
}
This code is unnecessarily slow for large vectors because it calls get_normalization_constant() repeatedly, even though the constant doesn't change after construction (assuming modifiers like push_back have been appropriately disabled).
If I was only considering one form of normalization, I would simply use a double value to cache this result on construction:
struct NormalizedVector : public std::vector<double> {
NormalizedVector(std::initializer_list<double> init_list):
std::vector<double>(init_list) {
normalization_constant = get_normalization_constant();
}
double get_value(int i) const {
return (*this)[i] / normalization_constant;
}
virtual double get_normalization_constant() const = 0;
double normalization_constant;
};
However, this understandably doesn't work: the NormalizedVector constructor tries to call a pure virtual function (the derived class's vtable is not yet in place during base-class construction).
Option 1:
Derived classes must manually call the normalization_constant = get_normalization_constant(); function in their constructors.
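A sketch of Option 1 (my illustration; it assumes the base constructor no longer initializes the constant itself):

struct L1Vector : public NormalizedVector {
    L1Vector(std::initializer_list<double> init_list)
        : NormalizedVector(init_list)
    {
        // Safe here: by the time this body runs, the L1Vector vtable is
        // active, so the virtual call dispatches to the derived override.
        normalization_constant = get_normalization_constant();
    }
    double get_normalization_constant() const
    {
        double tot = 0.0;
        for (size_t k = 0; k < size(); ++k)
            tot += (*this)[k];
        return tot;
    }
};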
Option 2:
Objects define a virtual function for initializing the constant:
void init_normalization_constant() {
    normalization_constant = get_normalization_constant();
}
Objects are then constructed by a factory:
struct NormalizedVector : public std::vector<double> {
NormalizedVector(std::initializer_list<double> init_list):
std::vector<double>(init_list) {
// init_normalization_constant();
}
double get_value(int i) const {
return (*this)[i] / normalization_constant;
}
virtual double get_normalization_constant() const = 0;
virtual void init_normalization_constant() {
normalization_constant = get_normalization_constant();
}
double normalization_constant;
};
// ...
// same code for derived types here
// ...
template <typename TYPE>
struct Factory {
template <typename ...ARGTYPES>
static TYPE construct_and_init(ARGTYPES...args) {
TYPE result(args...);
result.init_normalization_constant();
return result;
}
};
int main() {
L1Vector vec = Factory<L1Vector>::construct_and_init<std::initializer_list<double> >({0.25, 0.5, 1.0});
std::cout << "L1 ";
for (int k=0; k<vec.size(); ++k)
std::cout << vec.get_value(k) << " ";
std::cout << std::endl;
return 0;
}
Option 3:
Use an actual cache: get_normalization_constant is defined as a new type, CacheFunctor; the first time CacheFunctor is called, it saves the return value.
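A minimal sketch of such a CacheFunctor (the name and the std::function-based design are my assumptions, not from the question):

#include <functional>

class CacheFunctor {
public:
    explicit CacheFunctor(std::function<double()> fn)
        : fn_(fn), cached_(false), value_(0.0) {}

    double operator()() const {
        if (!cached_) {
            value_ = fn_();   // computed once, on first use after construction
            cached_ = true;
        }
        return value_;
    }

private:
    std::function<double()> fn_;
    mutable bool cached_;
    mutable double value_;
};

A member such as CacheFunctor norm_{ [this]{ return get_normalization_constant(); } }; then lets get_value divide by norm_() without recomputing.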
In Python, this works as originally coded, because the virtual table is always present, even in __init__ of a base class. In C++ this is much trickier.
I'd really appreciate the help; this comes up a lot for me. I feel like I'm getting the hang of good object oriented design in C++, but not always when it comes to making very efficient code (especially in the case of this sort of simple caching).
I suggest the non-virtual interface pattern. This pattern excels when you want a method to provide both common and unique functionality. (In this case, the caching is common and the computation is unique.)
http://en.wikibooks.org/wiki/More_C%2B%2B_Idioms/Non-Virtual_Interface
// UNTESTED
struct NormalizedVector : public std::vector<double> {
    ...
    double normalization_constant;
    bool cached;

    virtual double do_get_normalization_constant() = 0;

    double get_normalization_constant() {
        if(!cached) {
            cached = true;
            normalization_constant = do_get_normalization_constant();
        }
        return normalization_constant;
    }
};
P.S. You really ought not to derive publicly from std::vector.
P.P.S. Invalidating the cache is as simple as setting cached to false.
Complete Solution
#include <vector>
#include <iostream>
#include <iterator>
#include <cmath>
#include <algorithm>
struct NormalizedVector : private std::vector<double> {
private:
typedef std::vector<double> Base;
protected:
using Base::operator[];
using Base::begin;
using Base::end;
public:
using Base::size;
NormalizedVector(std::initializer_list<double> init_list):
std::vector<double>(init_list) { }
double get_value(int i) const {
return (*this)[i] / get_normalization_constant();
}
virtual double do_get_normalization_constant() const = 0;
mutable bool normalization_constant_valid;
mutable double normalization_constant;
double get_normalization_constant() const {
if(!normalization_constant_valid) {
normalization_constant = do_get_normalization_constant();
normalization_constant_valid = true;
}
return normalization_constant;
}
void push_back(const double& value) {
normalization_constant_valid = false;
Base::push_back(value);
}
virtual ~NormalizedVector() {}
};
struct L1Vector : public NormalizedVector {
L1Vector(std::initializer_list<double> init_list):
NormalizedVector(init_list) { get_normalization_constant(); }
double do_get_normalization_constant() const {
return std::accumulate(begin(), end(), 0.0);
}
};
struct L2Vector : public NormalizedVector {
L2Vector(std::initializer_list<double> init_list):
NormalizedVector(init_list) { get_normalization_constant(); }
double do_get_normalization_constant() const {
return std::sqrt(
std::accumulate(begin(), end(), 0.0,
[](double a, double b) { return a + b * b; } ) );
}
};
std::ostream&
operator<<(std::ostream& os, NormalizedVector& vec) {
for (int k=0; k<vec.size(); ++k)
os << vec.get_value(k) << " ";
return os;
}
int main() {
L1Vector vec{0.25, 0.5, 1.0};
std::cout << "L1 " << vec << "\n";
vec.push_back(2.0);
std::cout << "L1 " << vec << "\n";
L2Vector vec2{0.25, 0.5, 1.0};
std::cout << "L2 " << vec2 << "\n";
vec2.push_back(2.0);
std::cout << "L2 " << vec2 << "\n";
return 0;
}
A quick and dirty solution is to use a static local variable.
double get_normalization_constant() const {
    static double tot = 0.0;
    if( tot == 0.0 )
        for (int k=0; k<size(); ++k)
            tot += (*this)[k];
    return tot;
}
In this case, it will only be computed once, and each subsequent call returns the cached value.
NOTE:
This double tot is shared by all objects of the same type. Don't use this approach if you will create many objects of type L1Vector.