I am designing a series of vector classes in C++ that support SSE (SIMD). The operators have been overloaded for convenience. Example of a class:
class vector2 {
public:
//...code
friend const vector2 operator+ (const vector2 & lhs, const vector2 & rhs);
//...code
protected:
float x, y;
};
So far the method checks whether the CPU has the SSE (SIMD) feature, using a class I created called PROCESSOR, which performs this check once at run-time when the program starts. Example of the method:
const vector2 operator+ (const vector2 & lhs, const vector2 & rhs) {
vector2 temp;
if(PROCESSOR.SSE) {
_asm { //... The "SSE WAY"
}
} else {
// The "NORMAL WAY"
}
return temp;
}
So if SSE is available it runs the "SSE" way; otherwise it runs the "normal" way. However, it is very inefficient to check whether SSE is available every time this operator is called. Is there a way to implement two versions of a method and call only the appropriate one? Since my PROCESSOR class only does the SSE check once, is there a way to set up my vector class to do the same?
To help you avoid code duplication you can create two vector classes, one for SSE and one for non-SSE. Then you can template your calling algorithms.
class vector_base { float x,y; } ;
class vector_sse : public vector_base { vector_sse operator+(...){...} };
class vector_nonsse : public vector_base { vector_nonsse operator+(...){...} };
template< typename VECTOR >
void do_something() {
for( /*lots*/) {
VECTOR v = ...;
VECTOR w = ...;
foo(v+w);
}
}
int main() {
if(PROCESSOR.SSE) { do_something<vector_sse>(); }
else { do_something<vector_nonsse>(); }
}
If you're likely to use classes other than vector (like matrix, etc.) in an SSE manner, you might do better by tagging your types instead, in which case the code looks like this:
class vector_base { float x,y; } ;
struct SSE_tag;
struct NONSSE_tag;
template<typename T>
class vector;
template<>
class vector<SSE_tag> : public vector_base { vector<SSE_tag> operator+(...){...} };
template<>
class vector<NONSSE_tag> : public vector_base { vector<NONSSE_tag> operator+(...){...} };
template< typename TAG >
void do_something() {
for( /*lots*/) {
vector<TAG> v = ...;
vector<TAG> w = ...;
matrix<TAG> m = ...;
foo(v+(m*w));
}
}
int main() {
if(PROCESSOR.SSE) { do_something<SSE_tag>(); }
else { do_something<NONSSE_tag>(); }
}
Split the function into two parts, one for SSE and one not. Create a function pointer and initialize it with the appropriate version of the function. You can make an inline function that calls the pointer if that makes your code look cleaner.
Unfortunately you'll still pay the price of an indirection for the function call. Whether this is faster than testing a boolean flag can only be determined by benchmarking.
The best way to deal with this problem is to make sure the amount of data being processed with each call is enough to make the overhead insignificant.
I have a bit of a puzzle. I have a class template graph whose template parameter is a vertex class, which can be either symmetric or asymmetric, compressed or raw, and I only know which at runtime.
So if I want to read the graph of the appropriate type from disk, run Bellman-Ford on it and then free the memory, I need to repeat the template instantiation in all four branches of the conditionals, like so:
#include "graph.h"
int main(){
// parse cmd-line args, to get `compressed` `symmetric`
// TODO get rid of conditionals.
if (compressed) {
if (symmetric) {
graph<compressedSymmetricVertex> G =
readCompressedGraph<compressedSymmetricVertex>(iFile, symmetric,mmap);
bellman_ford(G,P);
} else {
graph<compressedAsymmetricVertex> G =
readCompressedGraph<compressedAsymmetricVertex>(iFile,symmetric,mmap);
bellman_ford(G,P);
if(G.transposed) G.transpose();
G.del();
}
} else {
if (symmetric) {
graph<symmetricVertex> G =
readGraph<symmetricVertex>(iFile,compressed,symmetric,binary,mmap);
bellman_ford(G,P);
G.del();
} else {
graph<asymmetricVertex> G =
readGraph<asymmetricVertex>(iFile,compressed,symmetric,binary,mmap);
bellman_ford(G,P);
if(G.transposed) G.transpose();
G.del();
}
}
return 0;
}
QUESTION: How can I move everything except the calls to the readGraph functions out of the conditionals, given the following restrictions?
I cannot modify the graph template. Otherwise I would have simply moved the Vertex type into a union.
I cannot use std::variant because graph<T> is not default constructible.
Call overhead is an issue. If there are subtyping-polymorphism-based solutions that don't involve making compressedAsymmetricVertex a subtype of vertex, I'm all ears.
Edit: Here is a sample header graph.h:
#pragma once
template <typename T>
struct graph{ T Data; graph(int a): Data(a) {} };
template <typename T>
graph<T> readGraph(char*, bool, bool, bool, bool); // file, compressed, symmetric, binary, mmap
template <typename T>
graph<T> readCompressedGraph(char*, bool, bool); // file, symmetric, mmap
class compressedAsymmetricVertex {};
class compressedSymmetricVertex {};
class symmetricVertex{};
class asymmetricVertex {};
Since you did not spell out all the types, and did not explain what is going on with the binary parameter, I can only give an approximate solution; refine it according to your exact needs. It should be along these lines:
class GraphWorker
{
public:
GraphWorker(bool compressed, bool symmetric)
: m_compressed(compressed), m_symmetric(symmetric)
{}
virtual void work(const PType & P, const char * iFile, bool binary, bool mmap ) const = 0;
protected:
const bool m_compressed;
const bool m_symmetric;
};
template <class GraphType>
class ConcreteGraphWorker : public GraphWorker
{
public:
ConcreteGraphWorker(bool compressed, bool symmetric)
: GraphWorker(compressed, symmetric)
{}
void work(const PType & P, const char * iFile, bool binary, bool mmap) const override
{
graph<GraphType> G =
readGraph<GraphType>(iFile, m_compressed, m_symmetric,
binary, mmap);
bellman_ford(G,P);
G.del();
}
};
static const std::unique_ptr<GraphWorker> workers[2][2] = {
{
std::make_unique<ConcreteGraphWorker<asymmetricVertex>>(false, false),
std::make_unique<ConcreteGraphWorker<symmetricVertex>>(false, true),
},
{
std::make_unique<ConcreteGraphWorker<compressedAsymmetricVertex>>(true, false),
std::make_unique<ConcreteGraphWorker<compressedSymmetricVertex>>(true, true),
}
};
int main()
{
workers[compressed][symmetric]->work(P, iFile, binary, mmap);
}
Some comments: it is better to avoid bool altogether and use dedicated enumeration types. That means that instead of my two-dimensional array, you would use something like:
std::map<std::pair<Compression, Symmetry>, std::unique_ptr<GraphWorker>> workers;
But since there could be other unknown dependencies, I have decided to stick with the confusing bool variables. Also, having workers as a static variable has its drawbacks, and since I don't know your other requirements I did not know what to do with it. Another issue is the protected Boolean variables in the base class. Usually, I'd go with accessors instead.
I'm not sure all this jumping through hoops, just to avoid a couple of conditionals, is worth it. This is much longer and trickier than the original code, and unless there are more than 4 options, or the code in work() is much longer, I'd recommend sticking with the conditionals.
Edit: I have just realized that using lambda functions is arguably clearer (that is open to debate). Here it is:
int main()
{
using workerType = std::function<void(PType & P, const char *, bool, bool)>;
auto makeWorker = [](bool compressed, bool symmetric, auto *nullGraph)
{
auto worker = [=](PType & P, const char *iFile, bool binary, bool mmap)
{
// decltype(*nullGraph) is a reference, std::decay_t fixes that.
using GraphType = std::decay_t<decltype(*nullGraph)>;
auto G = readGraph<GraphType>(iFile, compressed, symmetric,
binary, mmap);
bellman_ford(G,P);
G.del();
};
return workerType(worker);
};
workerType workers[2][2] {
{
makeWorker(false, false, (asymmetricVertex*)nullptr),
makeWorker(false, true, (symmetricVertex*)nullptr)
},
{
makeWorker(true, false, (compressedAsymmetricVertex*)nullptr),
makeWorker(true, true, (compressedSymmetricVertex*)nullptr)
}
};
workers[compressed][symmetric](P, iFile, binary, mmap);
}
The simple baseline is that whenever you want to cross from "type only known at runtime" to "type must be known at compile-time" (i.e. templates), you will need a series of such conditionals. If you cannot modify graph at all, then you will be stuck with needing four different G variables (and branches) whenever you want to handle a G object in a non-templated function, as all the graph template variants are unrelated types and cannot be treated uniformly (std::variant aside).
One solution would be to do this transition exactly once, right after reading in compressed and symmetric, and stay fully templated from there:
template<class VertexT>
graph<VertexT> readTypedGraph()
{
if constexpr (isCompressed<VertexT>::value)
return readCompressedGraph<VertexT>(/*...*/);
else
return readGraph<VertexT>(/*...*/);
}
template<class VertexT>
void main_T()
{
// From now on you are fully compile-time type-informed.
graph<VertexT> G = readTypedGraph<VertexT>();
bellman_ford(G);
transposeGraphIfTransposed(G);
G.del();
}
// non-template main
int main()
{
// Read parameters.
bool compressed = true;
bool symmetric = false;
// Switch to fully-templated code.
if (compressed)
if (symmetric)
main_T<compressedSymmetricVertex>();
else
main_T<compressedAsymmetricVertex>();
// else
// etc.
return 0;
}
You will probably have to write a lot of meta-functions (such as isCompressed) but can otherwise code as normal (albeit your IDE won't help you as much). You're not locked down in any way.
My problem is pretty simple: I want to use lambdas in the same way I might use a functor as a 'comparator'. Let me explain a little better. I have two big structs, each with its own implementation of operator<, and I also have a useless class (this is just the name of the class in the context of this question) which uses the two structs. Everything looks like this:
struct be_less
{
//A lot of stuff
int val;
be_less(int p_v):val(p_v){}
bool operator<(const be_less& p_other) const
{
return val < p_other.val;
}
};
struct be_more
{
//A lot of stuff
int val;
be_more(int p_v):val(p_v){}
bool operator<(const be_more& p_other) const
{
return val > p_other.val;
}
};
class useless
{
priority_queue<be_less> less_q;
priority_queue<be_more> more_q;
public:
useless(const vector<int>& p_data)
{
for(auto elem:p_data)
{
less_q.emplace(elem);
more_q.emplace(elem);
}
}
};
I would like to remove the duplication in the two structs. The simplest idea is to make the struct a template and provide two functors to do the comparison job:
template<typename Comp>
struct be_all
{
//Lot of stuff, better do not duplicate
int val;
be_all(int p_v):val{p_v}{}
bool operator<(const be_all<Comp>& p_other) const
{
return Comp()(val,p_other.val);
}
};
class comp_less
{
public:
bool operator()(int p_first,
int p_second)
{
return p_first < p_second;
}
};
class comp_more
{
public:
bool operator()(int p_first,
int p_second)
{
return p_first > p_second;
}
};
typedef be_all<comp_less> all_less;
typedef be_all<comp_more> all_more;
class useless
{
priority_queue<all_less> less_q;
priority_queue<all_more> more_q;
public:
useless(const vector<int>& p_data)
{
for(auto elem:p_data)
{
less_q.emplace(elem);
more_q.emplace(elem);
}
}
};
This works pretty well: now I certainly don't have any duplication in the struct code, at the price of two additional function objects. Please note that I'm greatly simplifying the implementation of operator<; the hypothetical real code does much more than just compare two ints.
Then I wondered how to do the same thing using lambdas (just as an experiment). The only working solution I was able to implement is:
template<typename Comp>
struct be_all
{
int val;
function<bool(int,int)> Comparator;
be_all(Comp p_comp,int p_v):
Comparator(move(p_comp)),
val{p_v}
{}
bool operator<(const be_all& p_other) const
{
return Comparator(val, p_other.val);
}
};
auto be_less = [](int p_first,
int p_second)
{
return p_first < p_second;
};
auto be_more = [](int p_first,
int p_second)
{
return p_first > p_second;
};
typedef be_all<decltype(be_less)> all_less;
typedef be_all<decltype(be_more)> all_more;
class useless
{
priority_queue<all_less> less_q;
priority_queue<all_more> more_q;
public:
useless(const vector<int>& p_data)
{
for(auto elem:p_data)
{
less_q.emplace(be_less,elem);
more_q.emplace(be_more,elem);
}
}
};
This implementation not only adds a new member to the data-holding struct, it also performs very poorly. I prepared a small test in which I create one instance of each useless class shown here, each time feeding the constructor a vector of 2 million integers. The results are the following:
Takes 48ms to execute the constructor of the first useless class
Takes 228ms to create the second useless class (functor)
Takes 557ms to create the third useless class (lambdas)
Clearly the price I pay for removing the duplication is very high, and in the original code the duplication is still there. Note how bad the performance of the third implementation is: more than ten times slower than the original one. I believed the third implementation was slower than the second because of the additional parameter in the constructor of be_all... but:
Actually there is also a fourth case, where I still use the lambda but get rid of the Comparator member and the additional parameter in be_all. The code is the following:
template<typename Comp>
struct be_all
{
int val;
be_all(int p_v):val{p_v}
{}
bool operator<(const be_all& p_other) const
{
return Comp(val, p_other.val);
}
};
bool be_less = [](int p_first,
int p_second)
{
return p_first < p_second;
};
bool be_more = [](int p_first,
int p_second)
{
return p_first > p_second;
};
typedef be_all<decltype(be_less)> all_less;
typedef be_all<decltype(be_more)> all_more;
class useless
{
priority_queue<all_less> less_q;
priority_queue<all_more> more_q;
public:
useless(const vector<int>& p_data)
{
for(auto elem:p_data)
{
less_q.emplace(elem);
more_q.emplace(elem);
}
}
};
If I remove auto from the lambda declarations and use bool instead, the code builds even though operator< uses Comp(val, p_other.val).
What is very strange to me is that this fourth implementation (lambda without the Comparator member) is even slower than the others. In the end, the average timings I was able to record are the following:
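48 ms (original duplicated structs)
228 ms (functors)
557 ms (lambdas stored in a std::function member)
698 ms (lambdas without the Comparator member)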
Why are the functors so much faster than the lambdas in this scenario? I expected lambdas to perform at least as well as an ordinary functor; can anyone comment? And is there any technical reason why the fourth implementation is slower than the third?
PS:
The compiler I'm using is g++ 4.8.2 with -O3. In my test I create one instance of each useless class and use chrono to measure the time required:
namespace benchmark
{
template<typename T>
long run()
{
auto start=chrono::high_resolution_clock::now();
T t(data::plenty_of_data);
auto stop=chrono::high_resolution_clock::now();
return chrono::duration_cast<chrono::milliseconds>(stop-start).count();
}
}
and:
cout<<"Bad code: "<<benchmark::run<bad_code::useless>()<<"ms\n";
cout<<"Bad code2: "<<benchmark::run<bad_code2::useless>()<<"ms\n";
cout<<"Bad code3: "<<benchmark::run<bad_code3::useless>()<<"ms\n";
cout<<"Bad code4: "<<benchmark::run<bad_code4::useless>()<<"ms\n";
The set of input integers is the same for all tests; plenty_of_data is a vector of 2 million integers.
Thanks for your time
You are not comparing the runtime of a lambda and a functor. Instead, the numbers indicate the difference in using a functor and an std::function. And std::function<R(Args...)>, for example, can store any Callable satisfying the signature R(Args...). It does this through type-erasure. So, the difference you see comes from the overhead of a virtual call in std::function::operator().
For example, the libc++ implementation (3.5) has a base class template<class _Fp, class _Alloc, class _Rp, class ..._ArgTypes> __base with a virtual operator(). std::function stores a __base<...>*. Whenever you create a std::function with a callable F, an object of type template<class F, class _Alloc, class R, class ...Args> class __func is created, which inherits from __base<...> and overrides the virtual operator().
I want to design a custom compare functor for std::set, which uses cached values of the enclosing class (in which the set is defined).
I know that in C++, there is no direct access from the nested class to the enclosing class and that you need to store a pointer in the nested class (as several questions/answers on SO already nicely explained).
But my question is: how do you get such a pointer (pModel in my code skeleton) into a comparison functor?
My code skeleton:
using namespace std;
class Face;
class Model
{
public:
// ...
map<Face, double> areaCached;
double area(Face f)
{
if (areaCached.find(f) == areaCached.end())
{
double calculatedValue; // perform very expensive calculation
areaCached[f] = calculatedValue;
}
return areaCached[f];
}
struct CompareByArea
{
// how can I import the pModel pointer here?
bool operator() (const Face f1, const Face f2) const
{
return pModel->area(f1) < pModel->area(f2);
}
};
set<Face, CompareByArea> sortedFaces;
};
The associative containers take a comparison object as a constructor parameter. That is, you'd add a pointer to your comparison object, add a constructor that sets this pointer, and then construct your set accordingly:
class Model {
struct CompareByArea {
Model* model;
CompareByArea(Model* model): model(model) {}
bool operator()(Face const& f1, Face const& f2) const {
return model->area(f1) < model->area(f2);
}
};
std::set<Face, CompareByArea> sortedFaces;
// ...
public:
Model(): sortedFaces(CompareByArea(this)) {}
// ...
};
Using this in the constructor's initializer list may trigger warnings about using this before the object is fully constructed, but as long as the constructor of CompareByArea does not use the pointer to access the Model, there is no issue.
I give you an example with references, just because I prefer them to pointers.
struct CompareByArea
{
CompareByArea(Model& aModel):model(aModel) {}
bool operator() (const Face& f1, const Face& f2) const
{
return model.area(f1) < model.area(f2);
}
Model& model;
};
And you should favor references over pointers in C++; they are easier to read and understand.
Note: I know similar questions to this have been asked on SO before, but I did not find them helpful or very clear.
Second note: For the scope of this project/assignment, I'm trying to avoid third party libraries, such as Boost.
I am trying to see if there is a way I can have a single vector hold multiple types, in each of its indices. For example, say I have the following code sample:
vector<something magical to hold various types> vec;
int x = 3;
string hi = "Hello World";
MyStruct s = {3, "Hi", 4.01};
vec.push_back(x);
vec.push_back(hi);
vec.push_back(s);
I've heard vector<void*> could work, but then memory management gets tricky, and there is always the possibility that nearby memory could be unintentionally overwritten if a value inserted at a certain index is larger than expected.
In my actual application, I know what possible types may be inserted into a vector, but these types do not all derive from the same super class, and there is no guarantee that all of these types will be pushed onto the vector or in what order.
Is there a way that I can safely accomplish the objective I demonstrated in my code sample?
Thank you for your time.
The objects held by a std::vector<T> need to be of a homogeneous type. If you need to put objects of different types into one vector, you need to somehow erase their type and make them all look alike. You could use the moral equivalent of boost::any or boost::variant<...>. The idea of boost::any is to encapsulate a type hierarchy: it stores a pointer to a base class that actually points to a templatized derived class. A very rough and incomplete outline looks something like this:
#include <algorithm>
#include <utility> // std::swap (C++11 and later)
#include <iostream>
class any
{
private:
struct base {
virtual ~base() {}
virtual base* clone() const = 0;
};
template <typename T>
struct data: base {
data(T const& value): value_(value) {}
base* clone() const { return new data<T>(*this); }
T value_;
};
base* ptr_;
public:
template <typename T> any(T const& value): ptr_(new data<T>(value)) {}
any(any const& other): ptr_(other.ptr_->clone()) {}
any& operator= (any const& other) {
any(other).swap(*this);
return *this;
}
~any() { delete this->ptr_; }
void swap(any& other) { std::swap(this->ptr_, other.ptr_); }
template <typename T>
T& get() {
return dynamic_cast<data<T>&>(*this->ptr_).value_;
}
};
int main()
{
any a0(17);
any a1(3.14);
try { a0.get<double>(); } catch (...) {}
a0 = a1;
std::cout << a0.get<double>() << "\n";
}
As suggested you can use various forms of unions, variants, etc. Depending on what you want to do with your stored objects, external polymorphism could do exactly what you want, if you can define all necessary operations in a base class interface.
Here's an example if all we want to do is print the objects to the console:
#include <iostream>
#include <string>
#include <vector>
#include <memory>
class any_type
{
public:
virtual ~any_type() {}
virtual void print() = 0;
};
template <class T>
class concrete_type : public any_type
{
public:
concrete_type(const T& value) : value_(value)
{}
virtual void print()
{
std::cout << value_ << '\n';
}
private:
T value_;
};
int main()
{
std::vector<std::unique_ptr<any_type>> v(2);
v[0].reset(new concrete_type<int>(99));
v[1].reset(new concrete_type<std::string>("Bottles of Beer"));
for(size_t x = 0; x < 2; ++x)
{
v[x]->print();
}
return 0;
}
In order to do that, you'll definitely need a wrapper class to somehow conceal the type information of your objects from the vector.
It's probably also good to have this class throw an exception when you try to get Type-A back when you have previously stored a Type-B into it.
Here is part of the Holder class from one of my projects. You can probably start from here.
Note: because this uses unrestricted unions, it requires C++11 or later. More information about this can be found here: What are Unrestricted Unions proposed in C++11?
class Holder {
public:
enum Type {
BOOL,
INT,
STRING,
// Other types you want to store into vector.
};
template<typename T>
Holder (Type type, T val);
~Holder () {
// You want to properly destroy
// union members below that have non-trivial constructors
}
operator bool () const {
if (type_ != BOOL) {
throw SomeException();
}
return impl_.bool_;
}
// Do the same for other operators
// Or maybe use templates?
private:
union Impl {
bool bool_;
int int_;
string string_;
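Impl() { new(&string_) string; }
~Impl() {} // needed: string_ makes the implicit destructor deleted; ~Holder destroys string_ explicitly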
} impl_;
Type type_;
// Other stuff.
};
I am currently working on a game "engine" that needs to move values between a 3D engine, a physics engine and a scripting language. Since I need to apply vectors from the physics engine to 3D objects very often and want to be able to control both the 3D and the physics objects through the scripting system, I need a mechanism to convert a vector of one type (e.g. vector3d<float>) to a vector of the other type (e.g. btVector3). Unfortunately I can make no assumptions about how the classes/structs are laid out, so a simple reinterpret_cast probably won't do.
So the question is: Is there some sort of 'static'/non-member casting method to achieve basically this:
vector3d<float> operator vector3d<float>(btVector3 vector) {
// convert and return
}
btVector3 operator btVector3(vector3d<float> vector) {
// convert and return
}
Right now this won't compile since casting operators need to be member methods.
(error C2801: 'operator foo' must be a non-static member)
I would suggest writing them as a pair of free functions (i.e. don't worry about making them 'operators'):
vector3d<float> vector3dFromBt(const btVector3& src) {
// convert and return
}
btVector3 btVectorFrom3d(const vector3d<float>& src) {
// convert and return
}
void f(void)
{
vector3d<float> one;
// ...populate...
btVector3 two(btVectorFrom3d(one));
// ...
vector3d<float> three(vector3dFromBt(two));
}
You could also use a templated wrapper class like:
template<class V>
class vector_cast {};
template<>
class vector_cast<vector3d<float>> {
const vector3d<float>& v;
public:
vector_cast(const vector3d<float>& v) : v(v) {}
operator vector3d<float> () const {
return vector3d<float>(v);
}
operator btVector3 () const {
// convert and return
}
};
template<>
class vector_cast<btVector3> {
const btVector3& v;
public:
vector_cast(const btVector3& v) : v(v) {};
operator btVector3 () const {
return btVector3(v);
}
operator vector3d<float> () const {
// convert and return
}
};
Usage:
void set_origin(btVector3 v);
// in your code:
vector3d<float> v;
// do some fancy computations
set_origin(vector_cast<vector3d<float>>(v));
// --- OR the other way round --- //
void set_velocity(vector3d<float> v);
// in your code:
btVector3 v;
// do some other computations
set_velocity(vector_cast<btVector3>(v));
Your statement in the question is correct. The type conversion operator needs to be a non-static member. If you really want conversion-type semantics, you could extend each of those classes for use in your application code:
// header:
struct ConvertibleVector3d;
struct ConvertibleBtVector : public btVector3
{
operator ConvertibleVector3d() const;
};
struct ConvertibleVector3d : public vector3d<float>
{
operator ConvertibleBtVector() const;
};
//impl:
ConvertibleBtVector::operator ConvertibleVector3d() const
{
ConvertibleVector3d retVal;
// convert this into retVal...
return retVal;
}
ConvertibleVector3d::operator ConvertibleBtVector() const
{
ConvertibleBtVector retVal;
// convert this into retVal...
return retVal;
}
void f(void)
{
ConvertibleVector3d one;
// ...populate...
ConvertibleBtVector two(one);
// ...
ConvertibleVector3d three;
three = two;
}
The names are a bit verbose but hopefully the intent is clear.
The public inheritance means you should be able to use instances of these classes in the same way as the base class, except they will be assignable and constructible from each other. Of course this couples the two classes, but that may be acceptable since it sounds like that's what your application intends to do anyway.