How fast is dynamic_cast<> in C++?

... approximately compared to a typical std::string::operator==()? I give some more details below; I'm not sure if they are of any relevance. An answer with a complexity or an approximation is good enough. Thanks!
Details: I will use it inside a for loop over a list to find some specific instances. I estimate my average level of inheritance at 3.5 classes. The one I'm looking for has a parent class, a grandparent, and above that two "interfaces", i.e. two abstract classes with a couple of virtual void abc() = 0; members.
There is no sub-class to the one I'll be looking for.

It depends hugely on your compiler, your particular class hierarchy, the hardware, all sorts of factors. You really need to measure it directly inside your particular application. You can use rdtsc or (on Windows) QueryPerformanceCounter to get a relatively high-precision timer for that purpose. Be sure to time loops or sleds of several thousand dynamic_cast<>s, because even QPC only has a ¼μs resolution.
In our app, a dynamic_cast<> costs about 1 microsecond, and a string comparison about 3ns/character.
Both dynamic_cast<> and stricmp() are at the top of our profiles, which means the performance cost of using them is significant. (Frankly in our line of work it's unacceptable to have those functions so high on the profile and I've had to go and rewrite a bunch of someone else's code that uses them.)
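For illustration, a minimal rdtsc-based sled along those lines might look like the sketch below; the Base/Derived hierarchy and the helper name are hypothetical stand-ins, not our actual classes:
#include <cstdint>
#include <x86intrin.h> // __rdtsc on GCC/Clang; MSVC provides it via <intrin.h>

// Hypothetical stand-in hierarchy for whatever you actually cast.
struct Base { virtual ~Base() {} };
struct Derived : Base {};

std::uint64_t cycles_per_cast(Base *p, int n)
{
    std::uint64_t begin = __rdtsc();
    for (int i = 0; i < n; ++i)
    {
        Derived *d = dynamic_cast<Derived *>(p);
        asm volatile("" :: "r"(d)); // keep the result live so the loop isn't optimized away
    }
    return (__rdtsc() - begin) / n; // average cycles per cast
}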

The best answer is to measure; my guess would be that dynamic_cast is faster than comparing any but the shortest strings (or perhaps even faster than those).
That being said, trying to determine the type of an object is usually a sign of bad design: according to the Liskov substitution principle, you should treat the objects the same and let the virtual functions behave the correct way, without examining the type from the outside.
Edit: After re-reading your question I'll stick to "There is no sub-class to the one I'll be looking for."
In that case you can use typeid directly; I believe it should be faster than either of your options (although looking for a specific type is still a code smell in my opinion):
#include <iostream>
#include <typeinfo>

struct top {
    virtual ~top() {}
};
struct left : top { };
struct right : top { };

int main()
{
    left lft;
    top &tp = lft;
    std::cout << std::boolalpha << (typeid(lft) == typeid(left)) << std::endl;
    std::cout << std::boolalpha << (typeid(tp) == typeid(left)) << std::endl;
    std::cout << std::boolalpha << (typeid(tp) == typeid(right)) << std::endl;
}
Output:
true
true
false


Why is this variable returning 32766?

I wrote a very basic evolution algorithm. The way it's supposed to work is that the user types in the desired value, and the amount of generations to try to reach it. Then, the program will run through, taking the nearest value in an array to the goal and mutating it four times (while also leaving the original, in case it's right) to try and get closer to the goal. In theory, it should take roughly |n|/2 generations to reach the value, as mutations happen in either one or two points.
Here's the code to demonstrate what I mean:
#include <iostream>
using namespace std;

int gen [5] = {0, 0, 0, 0, 0}; int goal; int gens; int best; int i = 0; int fit;

int dif(int in) {
    return abs(gen[in] - goal);
}

void nextgen() {
    int fit [5] = {dif(1), dif(2), dif(3), dif(4), dif(5)};
    best = *max_element(fit, fit + 6);
    int gen [5] = {best - 2, best - 1, best, best + 1, best + 2};
}

int main() {
    cout << "Goal: "; cin >> goal; cout << "Gens: "; cin >> gens;
    while(i < gens) {
        nextgen(); cout << "Generation " << i + 1 << ": " << best << "\n";
        i = i + 1;
    }
}
It's pretty simple code. However, it seems that the int best bit of the output is returning 32766 every time, no matter what I do. Do you know what I've done wrong?
I've tried outputting the entire generation (which is even worse: a jumbled mess of non-user-friendly data that appears meaningless), I've reworked the code, I've added variables and functions to try and pin down exactly where the error is, and I watched the entire Code Aesthetic YouTube channel to make sure this looked good for you guys.
Looks like you're driving C++ without a license or safety belt. Joke aside, please keep trying and learning. But with C/C++ you should always enable compiler warnings. The godbolt link in the comment from user4581301 is really good; the compiler flags -Wall -Wextra -pedantic -O2 -fsanitize=address,undefined are all best practice. (I would add -Werror.)
Why you got 32766 specifically could be analyzed with a debugger, but it's not meaningful. A number close to 32768 (= 2^15) should set off all the warning bells (it could be an integer overflow). Your code is accessing uninitialized memory (among other issues), leading to what is called undefined behaviour. This means it may produce different output depending on your machine, compiler, optimization flags, OS, standard libraries, etc.; even adding a debug print could change what it does.
For optimization algorithms (like genetic algorithms) it's also super easy to fool yourself into thinking that your implementation is correct, because the optimization will find a way to avoid (or exploit) any bugs. I had one in my neural-network implementation that was accidentally accessing some data from the previous example, and it took several days until I even noticed there was a problem.
If you want to focus on the algorithms, I suggest starting with a different language (anything except C/C++/Assembly). My advice would be either Python (though it can be 50x slower, it's much easier to learn and write) or Rust (just as fast as C++ and just as complicated, but with no undefined behaviour). With Rust, every mistake in your code above would have given you either a warning by default, a compiler error, or a runtime error instead of wrong output. That said, C++ with the flags mentioned above does the same for your specific code.
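For reference, here is one possible corrected sketch of what the question's code seems to intend: in-bounds indices (0..4), min_element to pick the candidate closest to the goal, and writing the next generation back into the global array instead of into a shadowing local. The structure is mine; treat it as an illustration, not the one true fix.
#include <algorithm>
#include <cstdlib>
#include <iostream>
#include <iterator>

int gen[5] = {0, 0, 0, 0, 0};
int goal = 0;

// Distance of a candidate value from the goal.
int dif(int value) { return std::abs(value - goal); }

void nextgen() {
    // Valid indices are 0..4, and "nearest to the goal" means the *minimum* difference.
    int best = *std::min_element(std::begin(gen), std::end(gen),
                                 [](int a, int b) { return dif(a) < dif(b); });
    // Write the mutated generation back into the global array
    // (the original only initialized a local array that shadowed it).
    int next[5] = {best - 2, best - 1, best, best + 1, best + 2};
    std::copy(std::begin(next), std::end(next), std::begin(gen));
}

int main() {
    int gens = 0;
    std::cout << "Goal: "; std::cin >> goal;
    std::cout << "Gens: "; std::cin >> gens;
    for (int i = 0; i < gens; ++i) {
        nextgen();
        std::cout << "Generation " << i + 1 << ": " << gen[2] << "\n"; // gen[2] holds the unmutated best
    }
}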

What optimizations are enabled by non-type template parameters?

I found this example at cppreference.com, and it seems to be the de facto example used throughout Stack Overflow:
template<int N>
struct S {
    int a[N];
};
Surely, non-type templatization has more value than this example. What other optimizations does this syntax enable? Why was it created?
I am curious, because I have code that is dependent on the version of a separate library that is installed. I am working in an embedded environment, so optimization is important, but I would like to have readable code as well. That being said, I would like to use this style of templating to handle version differences (examples below). First, am I thinking of this correctly, and MOST IMPORTANTLY, does it provide a benefit or drawback over using an #ifdef statement?
Attempt 1:
template<int VERSION = 500>
void print (char *s);

template<int VERSION>
void print (char *s) {
    std::cout << "ERROR! Unsupported version: " << VERSION << "!" << std::endl;
}

template<>
void print<500> (char *s) {
    // print using 500 syntax
}

template<>
void print<600> (char *s) {
    // print using 600 syntax
}
Or, since the template parameter is constant at compile time, could a compiler consider the other branches of the if statement dead code, using syntax similar to:
Attempt 2:
template<int VERSION = 500>
void print (char *s) {
    if (VERSION == 500) {
        // print using 500 syntax
    } else if (VERSION == 600) {
        // print using 600 syntax
    } else {
        std::cout << "ERROR! Unsupported version: " << VERSION << "!" << std::endl;
    }
}
Would either attempt produce output comparable in size to this?
void print (char *s) {
#if defined(500)
    // print using 500 syntax
#elif defined(600)
    // print using 600 syntax
#else
    std::cout << "ERROR! Unsupported version: " << VERSION << "!" << std::endl;
#endif
}
If you can't tell I'm somewhat mystified by all this, and the deeper the explanation the better as far as I'm concerned.
Compilers find dead code elimination easy. That is the case where you have a chain of ifs depending (only) on a template parameter's value or type. All branches must contain valid code, but when compiled and optimized the dead branches evaporate.
A classic example is a per-pixel operation written with template parameters that control details of code flow. The body can be full of branches, yet the compiled output is branchless.
Similar techniques can be used to unroll loops (say, scanline loops). Care must be taken to understand the code-size multiplication that can result, especially if your compiler lacks ICF (aka comdat folding), which the gold gcc linker and MSVC (among others) have.
Fancier things can also be done, like manual jump tables.
You can do pure compile-time type checks with no runtime behaviour at all: stuff like dimensional analysis, or distinguishing between points and vectors in n-space.
Enums can be used to name types or switches. Pointers to functions to enable efficient inlining. Pointers to data to allow 'global' state that is mockable, or siloable, or decoupled from implementation. Pointers to strings to allow efficient readable names in code. Lists of integral values for myriads of purposes, like the indexes trick to unpack tuples. Complex operations on static data, like compile time sorting of data in multiple indexes, or checking integrity of static data with complex invariants.
I am sure I missed some.
An obvious optimization: when using an integer template parameter, the compiler has a constant rather than a variable:
int foo(size_t); // definition not visible
// vs
template<size_t N>
size_t foo() { return N*N; }
With the template, there's nothing to compute at runtime, and the result may be used as a constant, which can aid other optimizations. You can take this example further by declaring it constexpr, as 5gon12eder mentioned below.
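A sketch of that constexpr variant (C++11 or later; the names follow the snippet above):
#include <cstddef>

template<std::size_t N>
constexpr std::size_t foo() { return N * N; }

// The result is a genuine compile-time constant:
static_assert(foo<4>() == 16, "evaluated entirely at compile time");
int buffer[foo<8>()]; // usable wherever a constant expression is required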
Next example:
double foo(double, size_t); // definition not visible
// vs
template<size_t N>
double foo(double p) {
    double r(p);
    for (size_t i(0); i < N; ++i) {
        r *= p;
    }
    return r;
}
Ok. Now the number of iterations of the loop is known. The loop may be unrolled/optimized accordingly, which can be good for size, speed, and eliminating branches.
Also, building on your example, std::array<> exists. std::array<> can be much better than std::vector<> in some contexts, because std::vector<> uses heap allocations and non-local memory.
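To illustrate the difference (a sketch, not a benchmark):
#include <array>
#include <vector>

std::array<int, 16> a{}; // size is part of the type; storage is inline, no heap allocation
std::vector<int> v(16);  // size is a runtime value; elements live in a separate heap block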
There's also the possibility that some specializations will have different implementations. You can separate those and (potentially) reduce other referenced definitions.
Of course, templates can also work against you by unnecessarily duplicating code across your program.
Templates also produce longer symbol names.
Getting back to your version example: Yes, it's certainly possible that if VERSION is known at compilation, the code which is never executed can be deleted and you may also be able to reduce referenced functions. The primary difference will be that void print (char *s) will have a shorter name than the template (whose symbol name includes all template parameters). For one function, that's counting bytes. For complex programs with many functions and templates, that cost can go up quickly.
There is an enormous range of potential applications of non-typename template parameters. In his book The C++ Programming Language, Stroustrup gives an interesting example that sketches out a type-safe zero-overhead framework for dealing with physical quantities. Basically, the idea is that he writes a template that accepts integers denoting the powers of fundamental physical quantities such as length or mass and then defines arithmetic on them. In the resulting framework, you can add speed with speed or divide distance by time but you cannot add mass to time. Have a look at Boost.Units for an industry-strength implementation of this idea.
As for your second question: any reasonable compiler should be able to produce exactly the same machine code for
#define FOO

#ifdef FOO
do_foo();
#else
do_bar();
#endif

and

#define FOO_P 1

if (FOO_P)
    do_foo();
else
    do_bar();
except that the second version is much more readable and the compiler can catch errors in both branches simultaneously. Using a template is a third way to generate the same code but I doubt that it will improve readability.
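For completeness, a sketch of that third way; the condition is a compile-time constant in each instantiation, so the dead branch evaporates just as in the #if version (the do_foo/do_bar declarations are added here to keep it self-contained):
void do_foo();
void do_bar();

template<bool Foo>
void do_foo_or_bar()
{
    if (Foo)      // constant for each instantiation
        do_foo(); // only the taken branch survives optimization
    else
        do_bar();
}

// do_foo_or_bar<true>(); // compiles down to a plain call to do_foo()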

Performance of dynamic_cast?

Before reading the question:
This question is not about how useful it is to use dynamic_cast. It's just about its performance.
I've recently developed a design where dynamic_cast is used a lot.
When discussing it with co-workers, almost everyone says that dynamic_cast shouldn't be used because of its bad performance. (These are co-workers who have different backgrounds and in some cases do not know each other; I'm working in a huge company.)
I decided to test the performance of this method instead of just believing them.
The following code was used:
ptime firstValue( microsec_clock::local_time() );
ChildObject* castedObject = dynamic_cast<ChildObject*>(parentObject);
ptime secondValue( microsec_clock::local_time() );
time_duration diff = secondValue - firstValue;
std::cout << "Cast1 lasts:\t" << diff.fractional_seconds() << " microsec" << std::endl;
The above code uses methods from boost::date_time on Linux to get usable values.
I've done 3 dynamic_casts in one execution; the code for measuring them is the same.
The results of 1 execution were the following:
Cast1 lasts: 74 microsec
Cast2 lasts: 2 microsec
Cast3 lasts: 1 microsec
The first cast always took 74-111 microsec, the following casts in the same execution took 1-3 microsec.
So finally my questions:
Is dynamic_cast really performing badly?
According to the test results it's not. Is my test code correct?
Why do so many developers think that it is slow if it isn't?
Firstly, you need to measure the performance over a lot more than just a few iterations, as your results will be dominated by the resolution of the timer. Try e.g. 1 million+, in order to build up a representative picture. Also, this result is meaningless unless you compare it against something, i.e. doing the equivalent but without the dynamic casting.
Secondly, you need to ensure the compiler isn't giving you false results by optimising away multiple dynamic casts on the same pointer (so use a loop, but use a different input pointer each time).
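A minimal sketch of that kind of measurement, using std::chrono and a mixed population of pointers so the compiler can't fold the casts (the Base/Derived hierarchy is illustrative):
#include <chrono>
#include <cstddef>
#include <iostream>
#include <vector>

struct Base { virtual ~Base() {} };
struct Derived : Base {};

int main()
{
    // Alternate between base and derived objects so each cast sees a different input.
    std::vector<Base*> objects;
    for (int i = 0; i < 1000000; ++i)
        objects.push_back(i % 2 ? static_cast<Base*>(new Derived) : new Base);

    std::size_t hits = 0;
    auto start = std::chrono::steady_clock::now();
    for (Base* p : objects)
        if (dynamic_cast<Derived*>(p))
            ++hits;
    auto stop = std::chrono::steady_clock::now();

    std::cout << hits << " successful casts in "
              << std::chrono::duration_cast<std::chrono::milliseconds>(stop - start).count()
              << " ms\n";

    for (Base* p : objects)
        delete p;
}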
Dynamic casting will be slower, because it needs to access the RTTI (run-time type information) table for the object, and check that the cast is valid. Then, in order to use it properly, you will need to add error-handling code that checks whether the returned pointer is NULL. All of this takes up cycles.
I know you didn't want to talk about this, but "a design where dynamic_cast is used a lot" is probably an indicator that you're doing something wrong...
Performance is meaningless without comparing equivalent functionality. Most people say dynamic_cast is slow without comparing to equivalent behavior. Call them out on this. Put another way:
If 'works' isn't a requirement, I can write code that fails faster than yours.
There are various ways to implement dynamic_cast, and some are faster than others. Stroustrup published a paper about using primes to improve dynamic_cast, for example. Unfortunately it's unusual to control how your compiler implements the cast, but if performance really matters to you, then you do have control over which compiler you use.
However, not using dynamic_cast will always be faster than using it — but if you don't actually need dynamic_cast, then don't use it! If you do need dynamic lookup, then there will be some overhead, and you can then compare various strategies.
Here are a few benchmarks:
http://tinodidriksen.com/2010/04/14/cpp-dynamic-cast-performance/
http://www.nerdblog.com/2006/12/how-slow-is-dynamiccast.html
According to them, dynamic_cast is 5-30 times slower than reinterpret_cast, and the best alternative performs almost the same as reinterpret_cast.
I'll quote the conclusion from the first article:
dynamic_cast is slow for anything but casting to the base type; that particular cast is optimized out
the inheritance level has a big impact on dynamic_cast
member variable + reinterpret_cast is the fastest reliable way to determine type; however, that has a lot higher maintenance overhead when coding
Absolute numbers are on the order of 100 ns for a single cast. Values like 74 microseconds don't seem close to reality.
Your mileage may vary, to understate the situation.
The performance of dynamic_cast depends a great deal on what you are doing, and can even depend on what the names of the classes are (and comparing time relative to reinterpret_cast seems odd, since in most cases that takes zero instructions for practical purposes, as does e.g. a cast from unsigned to int).
I've been looking into how it works in clang/g++. Assuming that you are dynamic_casting from a B* to a D*, where B is a (direct or indirect) base of D, and disregarding multiple-base-class complications, it seems to work by calling a library function which does something like this:
// for dynamic_cast<D*>(p), where p is a B*:
type_info const *curr_typ = &typeid(*p);
while (1) {
    if (*curr_typ == typeid(D)) { return static_cast<D*>(p); } // success
    if (*curr_typ == typeid(B)) return nullptr;                // failed
    curr_typ = get_direct_base_type_of(*curr_typ);             // magic internal operation
}
So, yes, it's pretty fast when *p is actually a D; just one successful type_info compare.
The worst case is when the cast fails, and there are a lot of steps from D to B; in this case there are a lot of failed type comparisons.
How long does a type comparison take? On clang/g++ it does this:
bool compare_eq(type_info const &a, type_info const &b)
{
    if (&a == &b) return true; // same object
    return strcmp(a.name(), b.name()) == 0;
}
The strcmp is needed since it's possible to have two distinct string objects providing the type_info.name() for the same type (although I'm pretty sure this only happens when one is in a shared library, and the other is not in that library). But, in most cases, when types are actually equal, they reference the same type name string; thus most successful type comparisons are very fast.
The name() method just returns a pointer to a fixed string containing the mangled name of the class.
So there's another factor: if many of the classes on the way from D to B have names starting with MyAppNameSpace::AbstractSyntaxNode<, then the failing compares are going to take longer than usual; the strcmp won't fail until it reaches a difference in the mangled type names.
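You can see this for yourself by printing the mangled names; the namespace and class names below are just an illustration of how long a common prefix can get:
#include <iostream>
#include <typeinfo>

namespace MyAppNameSpace {
    template <class T> struct AbstractSyntaxNode {};
}

int main()
{
    // On GCC/Clang these print long mangled names sharing a long common prefix,
    // so a failing strcmp scans far before hitting a difference.
    std::cout << typeid(MyAppNameSpace::AbstractSyntaxNode<int>).name() << '\n'
              << typeid(MyAppNameSpace::AbstractSyntaxNode<long>).name() << '\n';
}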
And, of course, since the operation as a whole is traversing a bunch of linked data structures representing the type hierarchy, the time will depend on whether those things are fresh in the cache or not. So the same cast done repeatedly is likely to show an average time which doesn't necessarily represent the typical performance for that cast.
Sorry to say this, but your test is virtually useless for determining whether the cast is slow or not. Microsecond resolution is nowhere near good enough. We're talking about an operation that, even in the worst case scenario, shouldn't take more than, say, 100 clock ticks, or less than 50 nanoseconds on a typical PC.
There's no doubt that the dynamic cast will be slower than a static cast or a reinterpret cast, because, on the assembly level, the latter two will amount to an assignment (really fast, order of 1 clock tick), and the dynamic cast requires the code to go and inspect the object to determine its real type.
I can't say off-hand how slow it really is; that would probably vary from compiler to compiler, and I'd need to see the assembly code generated for that line of code. But, like I said, 50 nanoseconds per call is the upper limit of what I'd expect to be reasonable.
The question doesn't mention the alternative.
Prior to RTTI being widely available, or simply to avoid using RTTI, the traditional method is to use a virtual method to check the type of the class, and then static_cast as appropriate. This has the disadvantage that it doesn't work for multiple inheritance, but has the advantage that it doesn't have to spend time checking a multiple inheritance hierarchy either!
In my tests:
dynamic_cast runs at about 14.4953 nanoseconds.
Checking a virtual method and static_casting runs at about twice the speed, 6.55936 nanoseconds.
This is for testing with a 1:1 ratio of valid:invalid casts, using the following code with optimisations disabled. I used Windows for performance checking.
#include <iostream>
#include <windows.h>

struct BaseClass
{
    virtual ~BaseClass() {} // needed so deleting through a base pointer is well-defined
    virtual int GetClass() volatile
    { return 0; }
};

struct DerivedClass final : public BaseClass
{
    virtual int GetClass() volatile final override
    { return 1; }
};

volatile DerivedClass *ManualCast(volatile BaseClass *lp)
{
    if (lp->GetClass() == 1)
    {
        return static_cast<volatile DerivedClass *>(lp);
    }
    return nullptr;
}

LARGE_INTEGER perfFreq;
LARGE_INTEGER startTime;
LARGE_INTEGER endTime;

void PrintTime()
{
    float seconds = static_cast<float>(endTime.LowPart - startTime.LowPart) / static_cast<float>(perfFreq.LowPart);
    std::cout << "T=" << seconds << std::endl;
}

BaseClass *Make()
{
    return new BaseClass();
}

BaseClass *Make2()
{
    return new DerivedClass();
}

int main()
{
    volatile BaseClass *base = Make();
    volatile BaseClass *derived = Make2();
    int unused = 0;
    const int t = 1000000000;

    QueryPerformanceFrequency(&perfFreq);
    QueryPerformanceCounter(&startTime);
    for (int n = 0; n < t; ++n)
    {
        volatile DerivedClass *alpha = dynamic_cast<volatile DerivedClass *>(base);
        volatile DerivedClass *beta = dynamic_cast<volatile DerivedClass *>(derived);
        unused += alpha ? 1 : 0;
        unused += beta ? 1 : 0;
    }
    QueryPerformanceCounter(&endTime);
    PrintTime();

    QueryPerformanceCounter(&startTime);
    for (int n = 0; n < t; ++n)
    {
        volatile DerivedClass *alpha = ManualCast(base);
        volatile DerivedClass *beta = ManualCast(derived);
        unused += alpha ? 1 : 0;
        unused += beta ? 1 : 0;
    }
    QueryPerformanceCounter(&endTime);
    PrintTime();

    std::cout << unused;
    delete base;
    delete derived;
}

How far to go with a strongly typed language?

Let's say I am writing an API, and one of my functions takes a parameter that represents a channel, and will only ever be between the values 0 and 15. I could write it like this:
void Func(unsigned char channel)
{
    if (channel < 0 || channel > 15)
    { /* throw some exception */ }
    // do something
}
Or do I take advantage of C++ being a strongly typed language, and make myself a type:
class CChannel
{
public:
    CChannel(unsigned char value) : m_Value(value)
    {
        if (value < 0 || value > 15)
        { /* throw some exception */ }
    }
    operator unsigned char() { return m_Value; }
private:
    unsigned char m_Value;
};
My function now becomes this:
void Func(const CChannel &channel)
{
    // No input checking required
    // do something
}
But is this total overkill? I like the self-documentation and the guarantee it is what it says it is, but is it worth paying the construction and destruction of such an object, let alone all the additional typing? Please let me know your comments and alternatives.
If you want this simpler approach, generalize it so you can get more use out of it, instead of tailoring it to a specific thing. Then the question is not "should I make an entire new class for this specific thing?" but "should I use my utilities?"; the latter is always yes, and utilities are always helpful.
So make something like:
#include <stdexcept> // for std::out_of_range

template <typename T>
void check_range(const T& pX, const T& pMin, const T& pMax)
{
    if (pX < pMin || pX > pMax)
        throw std::out_of_range("check_range failed"); // or something else
}
Now you've already got this nice utility for checking ranges. Your code, even without the channel type, can already be made cleaner by using it. You can go further:
template <typename T, T Min, T Max>
class ranged_value
{
public:
    typedef T value_type;
    static const value_type minimum = Min;
    static const value_type maximum = Max;

    ranged_value(const value_type& pValue = value_type()) :
        mValue(pValue)
    {
        check_range(mValue, minimum, maximum);
    }

    const value_type& value(void) const
    {
        return mValue;
    }

    // arguably dangerous
    operator const value_type&(void) const
    {
        return mValue;
    }

private:
    value_type mValue;
};
Now you've got a nice utility, and can just do:
typedef ranged_value<unsigned char, 0, 15> channel;
void foo(const channel& pChannel);
And it's re-usable in other scenarios. Just stick it all in a "checked_ranges.hpp" file and use it whenever you need. It's never bad to make abstractions, and having utilities around isn't harmful.
Also, never worry about overhead. Creating a class simply consists of running the same code you would do anyway. Additionally, clean code is to be preferred over anything else; performance is a last concern. Once you're done, then you can get a profiler to measure (not guess) where the slow parts are.
Yes, the idea is worthwhile, but (IMO) writing a complete, separate class for each range of integers is kind of pointless. I've run into enough situations that call for limited range integers that I've written a template for the purpose:
#include <stdexcept> // for std::range_error

template <class T, T lower, T upper>
class bounded {
    T val;

    void assure_range(T v) {
        if (v < lower || upper <= v)
            throw std::range_error("Value out of range");
    }

public:
    bounded &operator=(T v) {
        assure_range(v);
        val = v;
        return *this;
    }

    bounded(T const &v = T()) {
        assure_range(v);
        val = v;
    }

    operator T() { return val; }
};
Using it would be something like:
bounded<unsigned, 0, 16> channel;
Of course, you can get more elaborate than this, but this simple one still handles about 90% of situations pretty well.
No, it is not overkill - you should always try to represent abstractions as classes. There are a zillion reasons for doing this and the overhead is minimal. I would call the class Channel though, not CChannel.
I can't believe nobody has mentioned enums so far. They won't give you bulletproof protection, but they're still better than a plain integer datatype.
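A sketch of the enum approach: a plain integer no longer converts implicitly, so Func(3) fails to compile, though an explicit cast like Channel(99) still slips through, which is why it isn't bulletproof:
enum Channel {
    Channel0,  Channel1,  Channel2,  Channel3,
    Channel4,  Channel5,  Channel6,  Channel7,
    Channel8,  Channel9,  Channel10, Channel11,
    Channel12, Channel13, Channel14, Channel15
};

void Func(Channel channel); // Func(Channel3) compiles; Func(3) does not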
Looks like overkill, especially the operator unsigned char() accessor. You're not encapsulating data; you're making obvious things more complicated and, probably, more error-prone.
Data types like your Channel are usually part of something more abstract.
So, if you use that type in your ChannelSwitcher class, you could put the commented typedef right in ChannelSwitcher's body (and, probably, your typedef is going to be public):
// Currently used channel type
typedef unsigned char Channel;
Whether you throw an exception when constructing your "CChannel" object or at the entrance to the method that requires the constraint makes little difference. In either case you're making runtime assertions, which means the type system really isn't doing you any good, is it?
If you want to know how far you can go with a strongly typed language, the answer is "very far, but not with C++." The kind of power you need to statically enforce a constraint like, "this method may only be invoked with a number between 0 and 15" requires something called dependent types--that is, types which depend on values.
To put the concept into pseudo-C++ syntax (pretending C++ had dependent types), you might write this:
void Func(unsigned char channel, IsBetween<0, channel, 15> proof) {
    ...
}
Note that IsBetween is parameterized by values rather than types. In order to call this function in your program now, you must provide to the compiler the second argument, proof, which must have the type IsBetween<0, channel, 15>. Which is to say, you have to prove at compile-time that channel is between 0 and 15! This idea of types which represent propositions, whose values are proofs of those propositions, is called the Curry-Howard Correspondence.
Of course, proving such things can be difficult. Depending on your problem domain, the cost/benefit ratio can easily tip in favor of just slapping runtime checks on your code.
Whether something is overkill or not often depends on lots of different factors. What might be overkill in one situation might not in another.
This case might not be overkill if you had lots of different functions that all accepted channels and all had to do the same range checking. The Channel class would avoid code duplication, and also improve readability of the functions (as would naming the class Channel instead of CChannel - Neil B. is right).
Sometimes when the range is small enough I will instead define an enum for the input.
If you add constants for the 16 different channels, and also a static method that fetches the channel for a given value (or throws an exception if out of range) then this can work without any additional overhead of object creation per method call.
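One way to read that suggestion (a sketch; the names and the factory shape are mine): the sixteen instances are created once, and callers receive a reference, so no per-call construction happens.
#include <stdexcept>

class Channel
{
public:
    // Fetch the shared instance for a value, or throw if out of range.
    static const Channel& FromValue(unsigned char v)
    {
        if (v > 15)
            throw std::out_of_range("channel out of range");
        static const Channel instances[16] = {
            Channel(0),  Channel(1),  Channel(2),  Channel(3),
            Channel(4),  Channel(5),  Channel(6),  Channel(7),
            Channel(8),  Channel(9),  Channel(10), Channel(11),
            Channel(12), Channel(13), Channel(14), Channel(15)};
        return instances[v];
    }
    unsigned char value() const { return m_Value; }
private:
    explicit Channel(unsigned char v) : m_Value(v) {}
    unsigned char m_Value;
};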
Without knowing how this code is going to be used, it's hard to say if it's overkill or not or pleasant to use. Try it out yourself - write a few test cases using both approaches of a char and a typesafe class - and see which you like. If you get sick of it after writing a few test cases, then it's probably best avoided, but if you find yourself liking the approach, then it might be a keeper.
If this is an API that's going to be used by many, then perhaps opening it up to some review might give you valuable feedback, since they presumably know the API domain quite well.
In my opinion, what you are proposing is not a big overhead, but personally I prefer to save the typing: just document that anything outside 0..15 is undefined, and use an assert() in the function to catch errors in debug builds. I don't think the added complexity offers much more protection for programmers who are already used to the C++ language, which contains a lot of undefined behaviour in its specs.
You have to make a choice. There is no silver bullet here.
Performance
From the performance perspective, the overhead isn't going to be much, if anything at all (unless you're counting CPU cycles), so most likely this shouldn't be the determining factor.
Simplicity/ease of use, etc.
Make the API simple and easy to understand/learn. You should know/decide whether numbers, enums, or a class would be easier for the API user.
Maintainability
- If you are very sure the channel type is going to be an integer for the foreseeable future, I would go without the abstraction (consider using enums).
- If you have a lot of use cases for bounded values, consider using the templates (see Jerry's answer).
- If you think Channel can potentially have methods, make it a class right now.
Coding effort
It's a one-time thing, so always think maintenance.
The channel example is a tough one:
At first it looks like a simple limited-range integer type, like you find in Pascal and Ada. C++ gives you no way to say this, but an enum is good enough.
If you look closer, could it be one of those design decisions that are likely to change? Could you start referring to "channel" by frequency? By call letters (WGBH, come in)? By network?
A lot depends on your plans. What's the main goal of the API? What's the cost model? Will channels be created very frequently (I suspect not)?
To get a slightly different perspective, let's look at the cost of screwing up:
You expose the rep as int. Clients write a lot of code, the interface is either respected or your library halts with an assertion failure. Creating channels is dirt cheap. But if you need to change the way you're doing things, you lose "backward bug-compatibility" and annoy authors of sloppy clients.
You keep it abstract. Everybody has to use the abstraction (not so bad), and everybody is future-proofed against changes in the API. Maintaining backwards compatibility is a piece of cake. But creating channels is more costly, and worse, the API has to state carefully when it is safe to destroy a channel and who is responsible for the decision and the destruction. The worst-case scenario is that creating/destroying channels leads to a big memory leak or other performance failure, in which case you fall back to the enum.
I'm a sloppy programmer, and if it were for my own work, I'd go with the enum and eat the cost if the design decision changes down the line. But if this API were to go out to a lot of other programmers as clients, I'd use the abstraction.
Evidently I'm a moral relativist.
An integer with values only ever between 0 and 15 is an unsigned 4-bit integer (or half-byte, a "nibble"). I imagine that if this channel-switching logic were implemented in hardware, the channel number might be represented as exactly that: a 4-bit register.
If C++ had that as a type you would be done right there:
void Func(unsigned nibble channel)
{
    // do something
}
Alas, it doesn't. You could relax the API specification to express that the channel number is given as an unsigned char, with the actual channel being computed using a modulo-16 operation:
void Func(unsigned char channel)
{
    channel &= 0x0f; // truncate
    // do something
}
Or, use a bitfield:
#include <iostream>

struct Channel {
    // 4-bit unsigned field
    unsigned int n : 4;
};

void Func(Channel channel)
{
    // do something with channel.n
}

int main()
{
    Channel channel = {9};
    std::cout << "channel is " << channel.n << '\n';
    Func(channel);
}
The latter might be less efficient.
I vote for your first approach, because it's simpler and easier to understand, maintain, and extend, and because it is more likely to map directly to other languages should your API have to be reimplemented/translated/ported/etc.
This is abstraction, my friend! It's always neater to work with objects.

Pure/const functions in C++

I'm thinking of using pure/const functions more heavily in my C++ code. (pure/const attribute in GCC)
However, I am curious how strict I should be about it and what could possibly break.
The most obvious case is debug output (in whatever form: it could be on cout, in some file, or in some custom debug class). I will probably have a lot of functions which don't have any side effects apart from this sort of debug output. Whether or not the debug output is made has absolutely no effect on the rest of my application.
Another case I'm thinking of is the use of some SmartPointer class which may do some extra stuff in global memory when in debug mode. If I use such an object in a pure/const function, it does have some slight side effects (in the sense that some memory will probably be different) which should not count as real side effects (in the sense that the behaviour is not different in any meaningful way).
Similarly for mutexes and other stuff: I can think of many complex cases where there are side effects (in the sense that some memory will be different, maybe some threads are created, some filesystem manipulation is done, etc.) but no computational difference (all those side effects could very well be left out, and I would even prefer that).
So, to summarize, I want to mark functions as pure/const which are not pure/const in a strict sense. An easy example:
#include <iostream>
using namespace std;

int foo(int) __attribute__((const));

int bar(int x) {
    int sum = 0;
    for(int i = 0; i < 100; ++i)
        sum += foo(x);
    return sum;
}

int foo_callcounter = 0;

int main() {
    cout << "bar 42 = " << bar(42) << endl;
    cout << "foo callcounter = " << foo_callcounter << endl;
}

int foo(int x) {
    cout << "DEBUG: foo(" << x << ")" << endl;
    foo_callcounter++;
    return x; // or whatever
}
Note that the function foo is not const in a strict sense. Though, it doesn't matter what foo_callcounter is in the end. It also doesn't matter if the debug statement is not made (in case the function is not called).
I would expect the output:
DEBUG: foo(42)
bar 42 = 4200
foo callcounter = 1
And without optimisation:
DEBUG: foo(42) (100 times)
bar 42 = 4200
foo callcounter = 100
Both cases are totally fine because what only matters for my usecase is the return value of bar(42).
How does it work out in practice? If I mark such functions as pure/const, could it break anything (considering that the code is all correct)?
Note that I know that some compilers might not support this attribute at all. (BTW, I am collecting them here.) I also know how to make use of these attributes in a way that the code stays portable (via #defines). Also, all compilers which are interesting to me support it in some way, so I don't care if my code runs slower with compilers which do not.
I also know that the optimised code probably will look different depending on the compiler and even the compiler version.
Very relevant is also this LWN article "Implications of pure and constant functions", especially the "Cheats" chapter. (Thanks ArtemGr for the hint.)
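For reference, the kind of #define portability wrapper alluded to above might look like this sketch (the macro names are mine):
// Expand to the GCC attributes where available, and to nothing elsewhere.
#if defined(__GNUC__)
#  define ATTR_PURE  __attribute__((pure))
#  define ATTR_CONST __attribute__((const))
#else
#  define ATTR_PURE
#  define ATTR_CONST
#endif

int lookup(int key) ATTR_PURE; // may read global memory, but never writes it
int square(int x) ATTR_CONST;  // depends on its arguments alone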
I'm thinking of using pure/const functions more heavily in my C++ code.
That’s a slippery slope. These attributes are non-standard and their benefit is restricted mostly to micro-optimizations.
That’s not a good trade-off. Write clean code instead, don’t apply such micro-optimizations unless you’ve profiled carefully and there’s no way around it. Or not at all.
Notice that in principle these attributes are quite nice because they state implied assumptions of the functions explicitly for both the compiler and the programmer. That’s good. However, there are other methods of making similar assumptions explicit (including documentation). But since these attributes are non-standard, they have no place in normal code. They should be restricted to very judicious use in performance-critical libraries where the author tries to emit best code for every compiler. That is, the writer is aware of the fact that only GCC can use these attributes, and has made different choices for other compilers.
You could definitely break the portability of your code. And why would you want to implement your own smart pointer, apart from the learning experience? Aren't there enough of them available in (near-)standard libraries?
I would expect the output:
I would expect the input:
int bar(int x) {
    return foo(x) * 100;
}
Your code actually looks strange to me. As a maintainer I would think that either foo actually has side effects, or, more likely, I would rewrite it immediately to the above function.
How does it work out in practice? If I mark such functions as pure/const, could it break anything (considering that the code is all correct)?
If the code is all correct then no. But the chances that your code is correct are small. If your code is incorrect then this feature can mask out bugs:
int foo(int x) {
    globalmutex.lock();
    // complicated calculation code
    return -1;
    // more complicated calculation
    globalmutex.unlock();
    return x;
}
Now given the bar from above:
int main() {
    cout << bar(-1);
}
This terminates with __attribute__((const)) but deadlocks otherwise.
It also highly depends on the implementation. For example:
void f() {
    for(;;)
    {
        globalmutex.unlock();
        cout << foo(42) << '\n';
        globalmutex.lock();
    }
}
Where should the compiler move the call to foo(42)? Is it allowed to optimize this code? Not in general! So unless the loop is really trivial, you get no benefit from this feature. But if your loop is trivial, you can easily optimize it yourself.
EDIT: as Albert requested a less obvious situation, here it comes:
For example, if you implement operator<< for an ostream, you use ostream::sentry, which locks the stream buffer. Suppose you call a pure/const f after you have released it or before you have locked it. Someone uses the operator via cout << YourType(), and f also uses cout << "debug info". According to you, the compiler is free to move the invocation of f into the critical section. Deadlock occurs.
I would examine the generated asm to see what difference they make. (My guess would be that switching from C++ streams to something else would yield more of a real benefit, see: http://typethinker.blogspot.com/2010/05/are-c-iostreams-really-slow.html )
I think nobody knows this (with the exception of gcc programmers), simply because you rely on undefined and undocumented behaviour, which can change from version to version. But how about something like this:
#ifdef NDEBUG
#define safe_pure __attribute__((pure))
#else
#define safe_pure
#endif
I know it's not exactly what you want, but now you can use the pure attribute without breaking the rules.
If you do want to know the answer, you may ask in the gcc forums (mailing list, whatever), they should be able to give you the exact answer.
Meaning of the code: when NDEBUG (the symbol used in assert macros) is defined, we don't debug and have no side effects, so we can use the pure attribute. When it is not defined, we have side effects (the debug output), so the macro expands to nothing and the pure attribute is not used.