Is there a nice way of including certain members only in the debug build of a program?
I have an indexed data structure with a large number of instances; each instance carries a status flag indicating that some contents of the data structure have changed but the index hasn't been updated yet.
The status flags are only used to check that every use of the index calls the update functionality after the data has changed. For performance and storage reasons (there are lots of instances, and the data structure might change many times before update is called), I would like to keep this data only in the debug build.
There are basically two types of operations on these flags:
Setting/Resetting the flag
Asserting that the flag is not set, i.e. that certain parts of the index are still valid.
Are there any nicer ways of achieving this than sprinkling my code with #ifndef NDEBUG statements?
Note: In my special use case, the performance hit might not be that large, but I'm still looking for a general way to approach this, since there are probably much more complex use cases for the same idea.
You can reduce the amount of #ifdef-ing by providing a base class with debug facilities:
#include <string>

class t_MyDebugHelper
{
#ifdef NDEBUG // release build: stubs that compile to nothing
public:
    void Set_Something(int value)
    {
        (void) value; // not used
    }

    void Verify_Something(void)
    {}
#else // debug build: real state and checks
private:
    ::std::string m_some_info;
    int m_some_value;

public:
    void Set_Something(int value)
    {
        m_some_value = value;
    }

    void Verify_Something(void)
    {
        // implementation
    }
#endif
};
class t_MyClass : public t_MyDebugHelper
{
public:
    void SomeMethod(void)
    {
        t_MyDebugHelper::Verify_Something();
        t_MyDebugHelper::Set_Something(42);
        ...
    }
};
This method does not let you get rid of the ifdefs completely, but it keeps them out of the main code logic. In a release build all the debug helper functions become no-ops, and t_MyDebugHelper won't increase the target class size thanks to the empty base class optimization. If the debug helper needs access to t_MyClass methods, CRTP could be applied (a sketch follows).
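For illustration, a minimal CRTP sketch of that last idea; the IsConsistent() invariant check and the class names are hypothetical, not from the original answer:

#include <cassert>

template <typename Derived>
class t_MyDebugHelperCrtp
{
public:
#ifdef NDEBUG
    void Verify_Something() {}
#else
    void Verify_Something()
    {
        // Safe downcast: Derived inherits from t_MyDebugHelperCrtp<Derived>.
        Derived const& self = static_cast<Derived const&>(*this);
        assert(self.IsConsistent()); // hypothetical invariant check on the derived class
    }
#endif
};

class t_MyClass : public t_MyDebugHelperCrtp<t_MyClass>
{
    friend class t_MyDebugHelperCrtp<t_MyClass>;
    bool IsConsistent() const { return true; } // hypothetical

public:
    void SomeMethod() { Verify_Something(); }
};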
I have a C++20 program where the configuration is passed externally via JSON. According to the “Clean Architecture” I would like to transfer the information into a self-defined structure as soon as possible. The usage of JSON is only to be apparent in the “outer ring” and not spread through my whole program. So I want my own Config struct. But I am not sure how to write the constructor in a way that is safe against missing initializations, avoids redundancy and also separates the external library from my core entities.
One way of separation would be to define the structure without a constructor:
struct Config {
bool flag;
int number;
};
And then in a different file I can write a factory function that depends on the JSON library.
Config make_config(json const &json_config) {
return {.flag = json_config["flag"], .number = json_config["number"]};
}
This is somewhat safe to write, because one can directly see how the struct field names correspond to the JSON fields. Also I don't have much redundancy. But I don't really notice if fields are left uninitialized.
Another way would be to have an explicit constructor. Clang-tidy would warn me if I forget to initialize a field:
struct Config {
Config(bool const flag, int const number) : flag(flag), number(number) {}
bool flag;
int number;
};
And then the factory would use the constructor:
Config make_config(json const &json_config) {
return Config(json_config["flag"], json_config["number"]);
}
But now I have to spell out each field name five times, and in the factory function the correspondence is not clearly visible. Sure, the IDE will show parameter hints, but it feels brittle.
A really compact way of writing it would be to have a constructor that takes JSON, like this:
struct Config {
Config(json const &json_config)
: flag(json_config["flag"]), number(json_config["number"]) {}
bool flag;
int number;
};
That is really short, it would warn me about uninitialized fields, and the correspondence between fields and JSON is directly visible. But I need to include the JSON header in my Config.h file, which I really dislike. It also means that I need to recompile everything that uses the Config class if I change the way the configuration is loaded.
Admittedly, C++ is a language where a lot of boilerplate is needed. In theory I like the second variant best: it is the most encapsulated, the most separated one. But it is the worst to write and maintain. Given that in the realistic code the number of fields is significantly larger, I would sacrifice compilation time for less redundancy and more maintainability.
Is there some alternative way to organize this, or is the most separated variant also the one with the most boilerplate code?
I'd go with the constructor approach, however:
// header, possibly config.h
class json; // only forward-declare!
struct Config
{
Config(json const& json_config); // only declare!
bool flag;
int number;
};
// now have a separate source file config.cpp:
#include "config.h"
#include <json.h>
Config::Config(json const& json_config)
: flag(json_config["flag"]), number(json_config["number"])
{ }
A clean approach, and you avoid indirect inclusion of the json header. Sure, the constructor is split into a declaration and a definition, but that's the usual C++ way.
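One caveat, assuming the library in question is nlohmann/json: there, json is an alias for a class template, so a plain class json; forward declaration won't match. That library ships a dedicated forward-declaration header for exactly this purpose, so the header could look like this:

// config.h, using nlohmann/json's forward-declaration header
#include <nlohmann/json_fwd.hpp>

struct Config
{
    Config(nlohmann::json const& json_config); // declaration only

    bool flag;
    int number;
};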
I am writing a program that has the option to visualize the output of an algorithm I am working on - this is done by changing a const bool VISUALIZE_OUTPUT variable defined in a header file. In the main file, I want to have this kind of pattern:
if(VISUALIZE_OUTPUT) {
VisualizerObject vis_object;
}
...
if(VISUALIZE_OUTPUT) {
vis_object.initscene(objects_here);
}
...
if(VISUALIZE_OUTPUT) {
vis_object.drawScene(objects_here);
}
However, this clearly won't compile since vis_object goes out of scope. I don't want to declare the object before the condition since it is a big object and it needs to be available at multiple points in the code (I can't just have one conditional statement where everything is done).
What is the preferred way of doing this?
Declare the object on the heap and refer to it through a pointer (or unique_ptr)?
Declare the object on the heap and make a reference to it, since it won't ever change?
Some other alternative?
A reference will not be usable here, because at its declaration it must refer to an already existing object and live in a scope enclosing all your if(VISUALIZE_OUTPUT) blocks. Long story short, the object would have to be created unconditionally.
So IMHO a simple way would be to create it on the heap and use it through a pointer - do not forget to delete it when done. The good point is that the pointer can be initialized to nullptr, and so it can be unconditionally deleted.
But I think that the best way would be to encapsulate everything in an object created in the highest scope. This object would then contain methods to create, use internally, and finally destroy the actual vis_object. That way, if you do not need it, nothing is actually instantiated, but the main procedure is not cluttered with raw pointer handling. (A sketch follows below.)
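A minimal sketch of that encapsulation; VisualizerObject, initscene and drawScene are taken from the question, while the wrapper class and the Objects type are hypothetical:

#include <memory>

class Visualization
{
    std::unique_ptr<VisualizerObject> vis_; // stays empty when visualization is off

public:
    explicit Visualization(bool enabled)
    {
        if (enabled)
            vis_ = std::make_unique<VisualizerObject>();
    }

    void initScene(Objects& objects) // Objects stands in for "objects_here"
    {
        if (vis_)
            vis_->initscene(objects);
    }

    void drawScene(Objects& objects)
    {
        if (vis_)
            vis_->drawScene(objects);
    }
};

// Usage in main:
//   Visualization viz(VISUALIZE_OUTPUT);
//   viz.initScene(objects_here);
//   viz.drawScene(objects_here);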
I would use the Null Object pattern:
struct IVisualizerObject
{
virtual ~IVisualizerObject() = default;
virtual void initscene(Object&) = 0;
virtual void drawScene(Object&) = 0;
// ...
};
struct NullVisualizerObject : IVisualizerObject
{
void initscene(Object&) override { /* Empty */ }
void drawScene(Object&) override { /* Empty */}
// ...
};
struct VisualizerObject : IVisualizerObject
{
void initscene(Object& o) override { /*Implementation*/}
void drawScene(Object& o) override { /*Implementation*/}
// ...
};
And finally:
std::unique_ptr<IVisualizerObject> vis_object;
if (VISUALIZE_OUTPUT) {
vis_object = std::make_unique<VisualizerObject>();
} else {
vis_object = std::make_unique<NullVisualizerObject>();
}
// ...
vis_object->initscene(objects_here);
//...
vis_object->drawScene(objects_here);
I'll give a few options. All have upsides and downsides.
If it is NOT possible to modify VisualizerObject, the effect can be achieved with the preprocessor, as I noted in comments: the preprocessor does not respect scope, and the question specifically asks for an object lifetime that crosses scope boundaries.
#ifdef VISUALIZE_OUTPUT
VisualizerObject vis_object;
#endif
#ifdef VISUALIZE_OUTPUT
vis_object.initscene(objects_here);
#endif
The compiler will diagnose any usages of vis_object that are not wrapped in #ifdef/#endif.
The big criticism, of course, is that use of the preprocessor is considered poor practice in C++. The advantage is that the approach can be used even if it is not possible to modify the VisualizerObject class (e.g. because it is in a third-party library without source code provided).
However, this is the only option that has the feature requested by the OP of object lifetime crossing scope boundaries.
If it is possible to modify the VisualizerObject class, make it a template with two specialisations
template<bool visualise> struct VisualizerObject
{
    // implement all member functions required to do nothing and have no members
    VisualizerObject() {}
    void initscene(types_here) {}
};
template<> struct VisualizerObject<true> // heavyweight implementation with lots of members
{
    VisualizerObject() : heavy1(), heavy2() {}
    void initscene(types_here) { expensive_operations_here(); }
    HeavyWeight1 heavy1;
    HeavyWeight2 heavy2;
};
int main()
{
VisualizerObject<VISUALIZE_OUTPUT> vis_object;
...
vis_object.initscene(objects_here);
...
vis_object.drawScene(objects_here);
}
The above will work in all C++ versions. Essentially, it works by either instantiating a lightweight object with member functions that do nothing, or instantiating the heavyweight version.
It would also be possible to use the above approach to wrap a VisualizerObject.
template<bool visualise> struct VisualizerWrapper
{
    // implement all required member functions to do nothing
    // don't supply any members either
};
template<> struct VisualizerWrapper<true>
{
    VisualizerWrapper() : object() {}
    // implement all member functions as forwarders
    void initscene(types_here) { object.initscene(/* forward arguments */); }
    VisualizerObject object;
};
int main()
{
VisualizerWrapper<VISUALIZE_OUTPUT> vis_object;
...
vis_object.initscene(objects_here);
...
vis_object.drawScene(objects_here);
}
The disadvantage of both template approaches is maintenance: when adding a member function to one class (template specialisation), it is necessary to add a function with the same signature to the other. In large team settings, testing/building is likely to be done mostly with one setting of VISUALIZE_OUTPUT or the other, so it is easy for one version to fall out of alignment (a different interface) with the other. Problems with that (e.g. a failed build on changing the setting) are likely to emerge at inconvenient times, such as when there is a tight deadline to deliver a different version of the product.
Pedantically, the other downside of the template options is that they don't comply with the desired "kind of pattern" i.e. the if is not required in
if(VISUALIZE_OUTPUT)
{
vis_object.initscene(objects_here);
}
and object lifetimes do not cross scope boundaries.
I am the author of a library with a number of optimization algorithms, into which I have already invested quite a bit of profiling/tuning. I am currently writing front-end programs for this library.
At the moment, the library routines themselves are pretty much black boxes. Consider a method along the lines of bool fit(vector_t const& data, double targetError). They work and do what I want them to do, but for the front-ends a bit of runtime information would be nice. For example, it would be nice if information like "current error" or "number of iterations left" could be displayed. I do not want a simple if (verbose) cerr << "Info\n"; pattern, since the library should be equally usable in a GUI environment.
I deliberately write could, because I want to keep the impact of this information emission as low as possible. How can I define and implement an interface that
emits an object with run-time information whenever an observer is registered,
has minimal run-time costs with regard to the algorithm if an information object is emitted, and
has close to no run-time costs if no observer is registered?
Basically, the run-time costs of this optional introspection should be as low as possible, and close to zero if no introspection is wanted. How can this be achieved? Do libraries exist for this? (I guess the answer is yes, and probably hidden in the Boost project, but without knowing what to look for…)
Just move all costs to compile time!
The easiest approach is:
template<bool logging>
bool fit(vector_t const& data, double targetError)
{
// ... computations ...
if (logging) {
std::cout << "This is log entry\n";
}
// ... computations ...
}
Usage:
fit<true>(data, epsilon); // Will print logs.
fit<false>(data, epsilon); // Will not print logs with zero overhead.
Nowadays almost any compiler will optimize all such checks out while compiling.
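A side note beyond the original answer: since C++17 the same intent can be stated explicitly with if constexpr, which guarantees the branch is discarded at compile time rather than relying on the optimizer (same sketch as above, same assumptions about vector_t):

template<bool logging>
bool fit(vector_t const& data, double targetError)
{
    // ... computations ...
    if constexpr (logging) {
        std::cout << "This is a log entry\n"; // compiled only when logging == true
    }
    // ... computations ...
}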
A more flexible approach is to pass a logger as a template parameter:
class ConsoleLogger
{
public:
template<typename... Args>
static void log(Args... args) {
// Print args using compile-time recursion.
// Anyway, this prototype is only an example.
// You may define any logging interface you wish.
}
};
class FileLogger
{
// Some implementation of log() ...
};
class RemoteCloudLogger
{
// Some implementation of log() ...
};
class NullLogger
{
public: // must be public, or calls to NullLogger::log won't compile
    template<typename... Args>
    static void log(Args... args) {
        // Intentionally left blank. Any calls will be optimized out.
    }
};
template<typename Logger>
bool fit(vector_t const& data, double targetError)
{
// ... computations ...
Logger::log("Current error: ", currentError);
Logger::log("Iterations passed: ", i);
// ... computations ...
}
Usage:
fit<ConsoleLogger>(data, epsilon); // Will log to console.
fit<RemoteCloudLogger>(data, epsilon); // Will log to a remote cloud server.
fit<NullLogger>(data, epsilon); // Any calls to Logger::log will be optimized out, yielding zero overhead.
Obviously, you can write a logger class which will collect all logged information to a structure with introspection data.
For example, you might define fit's interface as follows:
template<typename Logger>
bool fit(vector_t const& data, double targetError, IntrospectionData& info)
{...}
and define only two logging classes: IntrospectionDataLogger and NullLogger, whose log methods accept a reference to the IntrospectionData structure. And again, the latter class contains empty methods that will be thrown away by your compiler. (A sketch follows.)
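A minimal sketch of such a logger pair; the IntrospectionData fields and the exact log signature are hypothetical, chosen just to show the shape:

struct IntrospectionData {
    double currentError = 0.0;
    int iterationsLeft = 0;
};

class IntrospectionDataLogger
{
public:
    static void log(IntrospectionData& info, double error, int left) {
        info.currentError = error;   // record values for the frontend to display
        info.iterationsLeft = left;
    }
};

class NullIntrospectionLogger
{
public:
    static void log(IntrospectionData&, double, int) {
        // Intentionally empty; the compiler removes these calls entirely.
    }
};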
You could allocate the struct on the stack and pass a reference to the observer. The observer can then do whatever is necessary with the struct, e.g. copy it, print it, display it in a GUI etc. Alternatively, if you want to track only one or two properties you might want to just pass them as separate arguments.
class Observer
{
    ...
public:
    virtual void observe(MyInfoStruct* info) = 0;
};
...
if (hasObservers()) // ideally inline
{
    MyInfoStruct info = {currentError, bla, bla};
    for (auto* observer : observers)
    {
        observer->observe(&info);
    }
}
That way there should be no overhead when there are no observers except for the if statement. The overhead for emitting the information should be dominated by the virtual calls to the observers.
Further optimizations:
You may want to outline the presumably cold observer iteration code into a separate function to improve code locality for the hot path.
If there are any frequent virtual calls in your code anyway, consider adding an "Observed" version of the interface, performing the observation work there, and omitting it completely from the original version. If the observed version can be injected once when an observer is placed, you may be able to avoid checking for observers redundantly. If you just want to track arguments to a function this is easy: the "Observed" version simply forwards to the original. However, if you want to track information while an algorithm is running, this may not be practical.
Depending on what you want to track, it may be possible to write this as a template:
struct NoObserverDispatch {
    bool hasObservers() { return false; }
    void observe(MyInfoStruct*) {}
};
struct GenericObserverDispatch {
    bool hasObservers() { return !observers.empty(); }
    void observe(MyInfoStruct* info) { for (auto& obs : observers) obs->observe(info); }
private:
    vector<unique_ptr<Observer>> observers;
};
template<typename ObserverDispatch>
class Fitter
{
ObserverDispatch obs;
virtual bool fit(vector_t const& data, double targetError)
{
...
if(obs.hasObservers()) { // will be optimized away unless needed
MyInfoStruct info = {currentError, bla, bla};
obs.observe(&info);
}
}
};
Obviously this assumes that you can replace the Fitter whenever an observer is placed. Another option would be to let the user pick the template instantiation, but that may be less clean and has the drawback that you will have to ship the code.
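For illustration, usage might look like this (a sketch; the choice between the two instantiations is made at compile time):

Fitter<NoObserverDispatch> fastFitter;         // observation code compiles away entirely
Fitter<GenericObserverDispatch> watchedFitter; // supports observers registered at runtime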
I have some classes implementing some computations which I have to optimize for different SIMD implementations, e.g. Altivec and SSE. I don't want to pollute the code with #ifdef ... #endif blocks for each method I have to optimize, so I tried a couple of other approaches, but unfortunately I'm not very satisfied with how it turned out, for reasons I'll try to clarify. So I'm looking for some advice on how I could improve what I have already done.
1. Different implementation files with crude includes
I have the same header file describing the class interface, with different "pseudo" implementation files for plain C++, Altivec and SSE, covering only the relevant methods:
// Algo.h
#ifndef ALGO_H_INCLUDED_
#define ALGO_H_INCLUDED_
class Algo
{
public:
Algo();
~Algo();
void process();
protected:
void computeSome();
void computeMore();
};
#endif
// Algo.cpp
#include "Algo.h"
Algo::Algo() { }
Algo::~Algo() { }
void Algo::process()
{
computeSome();
computeMore();
}
#if defined(ALTIVEC)
#include "Algo_Altivec.cpp"
#elif defined(SSE)
#include "Algo_SSE.cpp"
#else
#include "Algo_Scalar.cpp"
#endif
// Algo_Altivec.cpp
void Algo::computeSome()
{
}
void Algo::computeMore()
{
}
... same for the other implementation files
Pros:
the split is quite straightforward and easy to do
there is no "overhead" to objects of my class, by which I mean no extra inheritance, no additional member variables, etc.
much cleaner than #ifdef-ing all over the place
Cons:
I have three additional files to maintain; I could put the Scalar implementation in the Algo.cpp file and end up with just two, but the inclusion part would look and feel a bit dirtier
they are not compilable units per se and have to be excluded from the project structure
if I do not yet have a specific optimized implementation for, say, SSE, I would have to duplicate some code from the plain (Scalar) C++ implementation file
I cannot fall back to the plain C++ implementation if needed (is that even possible in the described scenario?)
I do not feel any structural cohesion in the approach
2. Different implementation files with private inheritance
// Algo.h
class Algo : private AlgoImpl
{
    ... as before
};
// AlgoImpl.h
#ifndef ALGOIMPL_H_INCLUDED_
#define ALGOIMPL_H_INCLUDED_
class AlgoImpl
{
protected:
AlgoImpl();
~AlgoImpl();
void computeSomeImpl();
void computeMoreImpl();
};
#endif
// Algo.cpp
...
void Algo::computeSome()
{
computeSomeImpl();
}
void Algo::computeMore()
{
computeMoreImpl();
}
// Algo_SSE.cpp
AlgoImpl::AlgoImpl()
{
}
AlgoImpl::~AlgoImpl()
{
}
void AlgoImpl::computeSomeImpl()
{
}
void AlgoImpl::computeMoreImpl()
{
}
Pros:
the split is quite straightforward and easy to do
much cleaner than #ifdef-ing all over the place
still no "overhead" to my class - EBCO should kick in
the semantics of the class are much cleaner, at least compared to the above: private inheritance really does mean "is implemented in terms of"
the different files are compilable units, can be included in the project and selected via the build system
Cons:
I have three additional files to maintain
if I do not yet have a specific optimized implementation for, say, SSE, I would have to duplicate some code from the plain (Scalar) C++ implementation file
I cannot fall back to the plain C++ implementation if needed
3. This is basically method 2, but with virtual functions in the AlgoImpl class. That would let me avoid duplicating the plain C++ implementation where needed, by providing it in the base class and overriding it in the derived one, although I would have to disable that behavior once I actually implement the optimized version. The virtual functions also bring some "overhead" to objects of my class.
4. A form of tag dispatching via enable_if<>
Pros:
the split is quite straightforward and easy to do
much cleaner than #ifdef-ing all over the place
still no "overhead" to my class
eliminates the need for different files for different implementations
Cons:
templates will be a bit more "cryptic" and seem to bring an unnecessary overhead (at least for some people in some contexts)
if I do not yet have a specific optimized implementation for, say, SSE, I would have to duplicate some code from the plain (Scalar) C++ implementation
I cannot fall back to the plain C++ implementation if needed
What I couldn't figure out yet, for any of the variants, is how to properly and cleanly fall back to the plain C++ implementation.
Also, I don't want to over-engineer things, and in that respect the first variant seems the most "KISS"-like, even considering its disadvantages.
You could use a policy-based approach with templates, the way the standard library does for allocators, comparators and the like. Each implementation has a policy class which defines computeSome() and computeMore(). Your Algo class takes a policy as a template parameter and defers to its implementation:
template <class policy_t>
class algo_with_policy_t {
policy_t policy_;
public:
algo_with_policy_t() { }
~algo_with_policy_t() { }
void process()
{
policy_.computeSome();
policy_.computeMore();
}
};
struct altivec_policy_t {
void computeSome();
void computeMore();
};
struct sse_policy_t {
void computeSome();
void computeMore();
};
struct scalar_policy_t {
void computeSome();
void computeMore();
};
// let user select exact implementation
typedef algo_with_policy_t<altivec_policy_t> algo_altivec_t;
typedef algo_with_policy_t<sse_policy_t> algo_sse_t;
typedef algo_with_policy_t<scalar_policy_t> algo_scalar_t;
// let user have default implementation
typedef
#if defined(ALTIVEC)
algo_altivec_t
#elif defined(SSE)
algo_sse_t
#else
algo_scalar_t
#endif
algo_default_t;
This lets you have all the different implementations defined within the same file (like solution 1) and compiled into the same program (unlike solution 1). It has no performance overheads (unlike virtual functions). You can either select the implementation at run time or get a default implementation chosen by the compile time configuration.
template <class algo_t>
void use_algo(algo_t algo)
{
algo.process();
}
void select_algo(bool use_scalar)
{
if (!use_scalar) {
use_algo(algo_default_t());
} else {
use_algo(algo_scalar_t());
}
}
As requested in the comments, here's a summary of what I did:
Set up policy_list helper template utility
This maintains a list of policies, and gives each a "runtime check" call before calling the first suitable implementation.
#include <cassert>
template <typename P, typename N=void>
struct policy_list {
static void apply() {
if (P::runtime_check()) {
P::impl();
}
else {
N::apply();
}
}
};
template <typename P>
struct policy_list<P,void> {
static void apply() {
assert(P::runtime_check());
P::impl();
}
};
Set up specific policies
These policies implement both a runtime test and an actual implementation of the algorithm in question. For my actual problem, impl took another template parameter specifying exactly what it was implementing; here, though, the example assumes there is only one thing to be implemented. The runtime tests are cached in a static bool, because for some of them (e.g. the Altivec one I used) the test was really slow. For others (e.g. the OpenCL one) the test is actually "is this function pointer NULL?" after one attempt at setting it with dlsym().
#include <iostream>
// runtime SSE detection (That's another question!)
extern bool have_sse();
struct sse_policy {
static void impl() {
std::cout << "SSE" << std::endl;
}
static bool runtime_check() {
static bool result = have_sse();
// have_sse lives in another TU and does some cpuid asm stuff
return result;
}
};
// Runtime OpenCL detection
extern bool have_opencl();
struct opencl_policy {
static void impl() {
std::cout << "OpenCL" << std::endl;
}
static bool runtime_check() {
static bool result = have_opencl();
// have_opencl lives in another TU and does some LoadLibrary or dlopen()
return result;
}
};
struct basic_policy {
static void impl() {
std::cout << "Standard C++ policy" << std::endl;
}
static bool runtime_check() { return true; } // All implementations do this
};
Set up a per-architecture policy_list
This trivial example sets one of two possible lists based on the ARCH_HAS_SSE preprocessor macro. You might generate this from your build script, use a series of typedefs, or hack in support for "holes" in the policy_list that might be void on some architectures, skipping straight to the next one without trying to check for support. GCC sets some preprocessor macros for you that might help, e.g. __SSE2__.
#ifdef ARCH_HAS_SSE
typedef policy_list<opencl_policy,
policy_list<sse_policy,
policy_list<basic_policy
> > > active_policy;
#else
typedef policy_list<opencl_policy,
policy_list<basic_policy
> > active_policy;
#endif
You can use this to compile multiple variants on the same platform too, e.g. an SSE and a no-SSE binary on x86.
Use the policy list
Fairly straightforward, call the apply() static method on the policy_list. Trust that it will call the impl() method on the first policy that passes the runtime test.
int main() {
active_policy::apply();
}
If you take the "per operation template" approach I mentioned earlier it might be something more like:
int main() {
Matrix m1, m2;
Vector v1;
active_policy::apply<matrix_mult_t>(m1, m2);
active_policy::apply<vector_mult_t>(m1, v1);
}
In that case you end up making your Matrix and Vector types aware of the policy_list so that they can decide how/where to store the data. You can also use heuristics here, e.g. "a small vector/matrix lives in main memory no matter what", and make runtime_check() or another function test the appropriateness of a particular approach for a specific instance.
I also had a custom allocator for containers, which always produced suitably aligned memory on any SSE/Altivec-enabled build, regardless of whether the specific machine had Altivec support. It was just easier that way, although it could be a typedef in a given policy, with the assumption that the highest-priority policy has the strictest allocator needs. (A sketch of such an allocator follows.)
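For illustration, a minimal sketch of an aligned allocator in modern (C++17) style; this is not the original author's code, and the 16-byte figure is just an assumption suitable for SSE/Altivec loads:

#include <cstddef>
#include <new>
#include <vector>

// Minimal C++17 allocator handing out memory aligned for SIMD loads.
template <typename T, std::size_t Alignment = 16>
struct aligned_allocator {
    using value_type = T;

    template <typename U>
    struct rebind { using other = aligned_allocator<U, Alignment>; };

    aligned_allocator() = default;
    template <typename U>
    aligned_allocator(aligned_allocator<U, Alignment> const&) noexcept {}

    T* allocate(std::size_t n) {
        // Aligned operator new is standard since C++17.
        return static_cast<T*>(::operator new(n * sizeof(T), std::align_val_t(Alignment)));
    }
    void deallocate(T* p, std::size_t) noexcept {
        ::operator delete(p, std::align_val_t(Alignment));
    }
};

template <typename T, typename U, std::size_t A>
bool operator==(aligned_allocator<T, A> const&, aligned_allocator<U, A> const&) { return true; }
template <typename T, typename U, std::size_t A>
bool operator!=(aligned_allocator<T, A> const&, aligned_allocator<U, A> const&) { return false; }

// usage: every element of v sits in 16-byte aligned storage
// std::vector<float, aligned_allocator<float>> v;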
Example have_altivec():
I've included a sample have_altivec() implementation for completeness, simply because it's the shortest and therefore most appropriate for posting here. The x86/x86_64 CPUID one is messy because you have to support the compiler-specific ways of writing inline ASM. The OpenCL one is messy because we check some of the implementation limits and extensions too.
#include <csetjmp>
#include <csignal>
#ifdef __APPLE__
#include <sys/sysctl.h>
#endif

#if HAVE_SETJMP && !(defined(__APPLE__) && defined(__MACH__))
jmp_buf jmpbuf;

void illegal_instruction(int sig) {
    // Bad in general - https://www.securecoding.cert.org/confluence/display/seccode/SIG32-C.+Do+not+call+longjmp%28%29+from+inside+a+signal+handler
    // But actually Ok on this platform in this scenario
    longjmp(jmpbuf, 1);
}
#endif

bool have_altivec()
{
    volatile sig_atomic_t altivec = 0;
#ifdef __APPLE__
    int selectors[2] = { CTL_HW, HW_VECTORUNIT };
    int hasVectorUnit = 0;
    size_t length = sizeof(hasVectorUnit);
    int error = sysctl(selectors, 2, &hasVectorUnit, &length, NULL, 0);
    if (0 == error)
        altivec = (hasVectorUnit != 0);
#elif HAVE_SETJMP
    // Probe by executing an Altivec instruction and catching SIGILL.
    void (*handler)(int sig);
    handler = signal(SIGILL, illegal_instruction);
    if (setjmp(jmpbuf) == 0) {
        asm volatile ("mtspr 256, %0\n\t" "vand %%v0, %%v0, %%v0"::"r" (-1));
        altivec = 1;
    }
    signal(SIGILL, handler);
#endif
    return altivec;
}
Conclusion
Basically you pay no penalty for platforms that can never support an implementation (the compiler generates no code for them), and only a small penalty (potentially just a test/jmp pair that the CPU predicts very well, if your compiler is half-decent at optimising) for platforms that could support something but don't. You pay no extra cost on platforms where the first-choice implementation runs. The details of the runtime tests vary with the technology in question.
If the virtual function overhead is acceptable, option 3 plus a few ifdefs seems a good compromise IMO. There are two variations you could consider: one with an abstract base class, and one with the plain C++ implementation as the base class.
Having the C++ implementation as the base class lets you gradually add the vector-optimized versions, falling back on the non-vectorized versions as you please; using an abstract interface would be a little cleaner to read.
Also, having separate C++ and vectorized versions of your class lets you easily write unit tests that
ensure the vectorized code gives the right result (it's easy to mess this up, and vector floating-point registers can have different precision than the FPU, causing different results)
compare the performance of the plain C++ code vs the vectorized code; it's often good to make sure the vectorization is actually doing you any good, since compilers can generate very tight C++ code that sometimes does as well as or better than vectorized code. (A correctness-test sketch follows below.)
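For example, a minimal sketch of such a correctness test, assuming the Algo_Impl classes shown below and a hypothetical run_and_get_result() helper that runs an implementation and reduces its output to one number:

#include <cassert>
#include <cmath>
#include <vector>

// Hypothetical helper: runs an implementation on the input and returns a scalar result.
double run_and_get_result(Algo_Impl& algo, std::vector<float> const& input);

void test_sse_matches_plain(std::vector<float> const& input)
{
    Algo_Impl plain;   // plain C++ base implementation
    Algo_Impl_SSE sse; // vectorized version

    double const plainResult = run_and_get_result(plain, input);
    double const sseResult = run_and_get_result(sse, input);

    // Allow a small tolerance: vector units may round differently than the FPU.
    assert(std::fabs(plainResult - sseResult) < 1e-6);
}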
Here's a version with the plain C++ implementation as the base class. Adding an abstract interface would just add a common base class to all three of these:
// Algo.h:
class Algo_Impl // Default plain C++ implementation
{
public:
    virtual ~Algo_Impl(); // virtual: instances are deleted through a base pointer
    virtual void ComputeSome();
    virtual void ComputeSomeMore();
    ...
};
// Algo_SSE.h:
class Algo_Impl_SSE : public Algo_Impl // SSE implementation
{
public:
    virtual void ComputeSome();
    virtual void ComputeSomeMore();
    ...
};
// Algo_Altivec.h:
class Algo_Impl_Altivec : public Algo_Impl // Altivec implementation
{
public:
    virtual void ComputeSome();
    virtual void ComputeSomeMore();
    ...
};
// Client.cpp:
Algo_Impl* myAlgo = 0;
#ifdef SSE
myAlgo = new Algo_Impl_SSE;
#elif defined(ALTIVEC)
myAlgo = new Algo_Impl_Altivec;
#else
myAlgo = new Algo_Impl; // plain C++ default
#endif
...
You may consider employing the Adapter pattern. There are a few types of adapters, and it's quite an extensible concept. Here is an interesting article, Structural Patterns: Adapter and Façade, that discusses a matter very similar to the one in your question - the Accelerate framework as an example of the Adapter pattern.
I think it is a good idea to discuss a solution at the level of design patterns without focusing on implementation details like the C++ language. Once you decide that the adapter is the right solution for you, you can look for variants specific to your implementation. For example, in the C++ world there is a known adapter variant called the generic adapter pattern. (A sketch follows.)
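For illustration only, a minimal sketch of an adapter that gives the rest of the program one stable interface over platform-specific kernels; all names here are hypothetical:

// Hypothetical platform-specific kernels (normally in separate translation units).
namespace sse    { void multiply(float const* a, float const* b, float* out, int n); }
namespace scalar { void multiply(float const* a, float const* b, float* out, int n); }

// Adapter: client code only ever sees this interface.
class MultiplyAdapter
{
public:
    void multiply(float const* a, float const* b, float* out, int n) const
    {
#if defined(SSE)
        sse::multiply(a, b, out, n);
#else
        scalar::multiply(a, b, out, n);
#endif
    }
};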
This isn't really a whole answer: just a variant on one of your existing options. In option 1 you've assumed that you include algo_altivec.cpp etc. into algo.cpp, but you don't have to do this. You could omit algo.cpp entirely and have your build system decide which of algo_altivec.cpp, algo_sse.cpp, etc. to build. You'd have to do something like this anyway whichever option you use, since each platform can't compile every implementation; my suggestion is only that, whichever option you choose, instead of having #if ALTIVEC_ENABLED everywhere in the source (where ALTIVEC_ENABLED is set from the build system), you let the build system decide directly whether to compile algo_altivec.cpp.
This is a bit trickier to achieve in MSVC than in make, scons, etc., but still possible. It's commonplace to switch in a whole directory rather than individual source files; that is, instead of algo_altivec.cpp and friends, you'd have platform/altivec/algo.cpp, platform/sse/algo.cpp, and so on. This way, when you have a second algorithm that needs platform-specific implementations, you can just add the extra source file to each directory.
Although my suggestion is mainly intended as a variant of option 1, you can combine it with any of your options to let you decide in the build system, and at runtime, which options to offer. In that case, though, you'll probably need implementation-specific header files too.
In order to hide the implementation details, you may just use an abstract interface with a static creator and provide three implementation classes:
// --------------------- Algo.h ---------------------
#pragma once
typedef boost::shared_ptr<class Algo> AlgoPtr;
class Algo
{
public:
    static AlgoPtr Create(std::string type);
    virtual ~Algo();
    void process();
protected:
    virtual void computeSome() = 0;
    virtual void computeMore() = 0;
};
// --------------------- Algo.cpp ---------------------
class PlainAlgo : public Algo { ... };
class AltivecAlgo : public Algo { ... };
class SSEAlgo : public Algo { ... };
AlgoPtr Algo::Create(std::string type) { /* Factory implementation */ }
Please note that since the PlainAlgo, AltivecAlgo and SSEAlgo classes are defined in Algo.cpp, they are only visible within this compilation unit, and the implementation details are therefore hidden from the outside world.
Here is how one can use your class then:
AlgoPtr algo = Algo::Create("SSE");
algo->process();
It seems to me that your first strategy, with separate C++ files and #including the specific implementation, is the simplest and cleanest. I would only add some comments to your Algo.cpp indicating which methods are in the #included files.
e.g.
// Algo.cpp
#include "Algo.h"
Algo::Algo() { }
Algo::~Algo() { }
void Algo::process()
{
computeSome();
computeMore();
}
// The following methods are implemented in separate,
// platform-specific files.
// void Algo::computeSome()
// void Algo::computeMore()
#if defined(ALTIVEC)
#include "Algo_Altivec.cpp"
#elif defined(SSE)
#include "Algo_SSE.cpp"
#else
#include "Algo_Scalar.cpp"
#endif
Policy-like templates (mixins) are fine until you hit the requirement to fall back to the default implementation. That is a runtime operation and should be handled by runtime polymorphism; the Strategy pattern handles it fine (see the sketch below).
There is one drawback to this approach: a Strategy-style algorithm cannot be inlined. Such inlining can provide a reasonable performance improvement in rare cases; if this is an issue, you'll need to put the Strategy boundary around the higher-level logic.
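A minimal Strategy sketch with a runtime fallback, assuming a detection function like the have_sse() shown in an earlier answer; the class names are hypothetical:

#include <memory>

extern bool have_sse(); // runtime CPU detection, as in the earlier answer

struct ComputeStrategy {
    virtual ~ComputeStrategy() = default;
    virtual void computeSome() = 0;
    virtual void computeMore() = 0;
};

struct ScalarStrategy : ComputeStrategy {
    void computeSome() override { /* plain C++ */ }
    void computeMore() override { /* plain C++ */ }
};

struct SseStrategy : ComputeStrategy {
    void computeSome() override { /* SSE */ }
    void computeMore() override { /* SSE */ }
};

// Pick the best available implementation once, falling back to scalar at runtime.
std::unique_ptr<ComputeStrategy> makeStrategy()
{
    if (have_sse())
        return std::make_unique<SseStrategy>();
    return std::make_unique<ScalarStrategy>();
}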
For debugging, I would like to add some counter variables to my class. But it would be nice to do it without changing the header and causing a lot of recompiling.
If I've understood the keyword correctly, the following two snippets would be practically identical, assuming of course that there is only one instance:
class FooA
{
public:
FooA() : count(0) {}
~FooA() {}
void update()
{
++count;
}
private:
int count;
};
vs.
class FooB
{
public:
FooB() {}
~FooB() {}
void update()
{
static int count = 0;
++count;
}
};
In FooA, count can be accessed anywhere within the class, but it also bloats the header, since the variable should be removed when it is no longer needed.
In FooB, the variable is only visible within the one function where it exists, and it is easy to remove later. The only drawback I can think of is that FooB's count is shared among all instances of the class, but that's not a problem in my case.
Is this correct use of the keyword? I assume that once count is created in FooB, it stays created and is not re-initialized to zero on every call to update.
Are there any other caveats or heads-ups I should be aware of?
Edit: after being notified that this would cause problems in multithreaded environments, I should clarify that my codebase is single-threaded.
Your assumptions about static function-local variables are correct. If you access the counter from multiple threads, however, it is no longer safe; consider using InterlockedIncrement().
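A side note: InterlockedIncrement() is Windows-specific. Since C++11, a portable alternative is a static std::atomic counter (a minimal sketch, reusing FooB from the question):

#include <atomic>

void FooB::update()
{
    static std::atomic<int> count{0}; // initialization of a local static is itself thread-safe in C++11
    ++count;                          // atomic increment
}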
What you really want for your long-term C++ toolbox is a thread-safe, general-purpose debug counters class that lets you drop it in anywhere, use it, and print it from anywhere else. If your code is performance-sensitive, you probably want it to automatically do nothing in non-debug builds.
The interface for such a class would probably look like:
#include <map>
#include <string>

class Counters {
public:
    // Counters singleton request pattern.
    // Counters::get()["my-counter"]++;
    static Counters& get() {
        if (!_counters) _counters = new Counters();
        return *_counters;
    }

    // Bad idea if you want to deal with multithreaded things.
    // If you do, either provide an Increment(int inc_by); function instead of this,
    // or return some sort of atomic counter instead of an int.
    int& operator[](const std::string& key) {
#ifndef NDEBUG // debug build: real counters
        return _counter_map[key];
#else          // opt build: every increment hits a dummy
        return _bogus;
#endif
    }

    // You have to deal with exposing iteration support.

private:
    Counters() {}

    // Kill copy construction and assignment.
    Counters(const Counters&);
    Counters& operator=(const Counters&);

    // Singleton member.
    static Counters* _counters;

    // Map to store the counters.
    std::map<std::string, int> _counter_map;

    // Bogus counter for opt builds.
    int _bogus;
};
Once you have this, you can drop it in at will wherever you want in your .cpp file by calling:
void Foo::update() {
// Leave this in permanently, it will automatically get killed in OPT.
Counters::get()["update-counter"]++;
}
And in your main, if you have built in iteration support, you do:
int main(...) {
...
for (Counters::const_iterator i = Counters::get().begin(); i != Counters::get().end(); ++i) {
cout << i->first << ": " << i->second;
}
...
}
Creating the counters class is somewhat heavyweight, but if you are doing a bunch of C++ coding, you may find it useful to write it once and then be able to just link it in as part of any lib.
The major problems with static variables occur when they are used together with multi-threading. If your app is single-threaded, what you are doing is quite correct.
What I usually do in this situation is to put count in an anonymous namespace in the source file for the class. This means that you can add/remove the variable at will, it can be used anywhere in the file, and there is no chance of a name conflict. It does have the drawback that the variable can only be used in functions in the source file, not in inlined functions in the header file, but I think that is what you want.
In file FooC.cpp
namespace {
int count=0;
}
void FooC::update()
{
++count;
}