Should you write your functions like this:
int foo()
{
if (bar)
return 1;
return 2;
}
Or like this?
int foo()
{
if (bar)
return 1;
else
return 2;
}
Is one way objectively better than the other, or is it a matter of personal preference? If one way is better, why?
There is no difference in performance, so here it's personal preference. I prefer the second one because it is clean and understandable. Experienced folks can grasp both forms right away, but when you put the first form in front of a new programmer, he will get confused.
I prefer neat code (taking both readability and performance into account). Since the first form certainly brings no performance improvement, I would choose the second.
I try to have one exit point from a function whenever possible. It makes the code more maintainable and debuggable. So I'd do something like this:
int foo()
{
int retVal;
if (bar) {
retVal = 1;
}
else {
retVal = 2;
}
return retVal;
}
Or this if you want to be more concise...
int foo()
{
int retVal = 2;
if (bar) {
retVal = 1;
}
return retVal;
}
There is no performance difference between any of the versions above, so it is totally your choice. The compiler is smart enough to figure out what you are trying to do. Personally I prefer the first one because it means less code. In C#, for what it's worth, the MSIL generated will be the same for both scenarios.
In some cases I end up declaring a variable without knowing its value first, like:
int a;
if (c1) {
a = 1;
} else if (c2) {
a = 2;
} else if (c3) {
a = -3;
}
do_something_with(a);
Is it standard professional practice to assign some clearly wrong value like -1000 anyway (making potential bugs more reproducible), or is it preferred not to add code that does nothing useful as long as there are no bugs? On one hand, it looks reasonable to remove randomness; on the other hand, magic and even "clearly wrong" numbers somehow do not look attractive.
In many cases it is possible to declare the variable when the value is first known, or to use a ternary operator, but here it would need to be nested, which is also rather clumsy.
Declaring it inside the block would take the variable out of scope prematurely.
Or would this case justify using std::optional<int> a and assert(a) later, to make sure we have the value?
EDIT: The bugs I am talking about would occur if all 3 conditions were suddenly false, which should "absolutely never happen".
As far as I know, the most popular and safest way is an immediately invoked lambda. Note that the if chain should be exhaustive (I added SOME_DEFAULT_VALUE as a placeholder). If you don't know what to put in the final else block, consider a few options:
using optional and returning an empty one in the else,
throwing an exception that describes the problem,
putting an assert there if logically this situation should never happen.
const int a = [&] {
if (c1) {
return 1;
} else if (c2) {
return 2;
} else if (c3) {
return -3;
} else {
return SOME_DEFAULT_VALUE;
}
}();
do_something_with(a);
If the initialization logic is duplicated somewhere, you can simply extract the lambda into a named function, as other answers suggest.
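The optional-plus-assert option from the list above could look roughly like the sketch below. It needs C++17, and pick_value, process and this do_something_with are hypothetical stand-ins, not code from the question.

```cpp
#include <cassert>
#include <optional>

// Hypothetical helper: returns an empty optional when no condition holds.
std::optional<int> pick_value(bool c1, bool c2, bool c3) {
    if (c1) return 1;
    if (c2) return 2;
    if (c3) return -3;
    return std::nullopt;  // the "should never happen" case, made explicit
}

// Hypothetical consumer standing in for do_something_with(a).
int do_something_with(int a) { return a * 10; }

int process(bool c1, bool c2, bool c3) {
    const std::optional<int> a = pick_value(c1, c2, c3);
    // Fails loudly in debug builds instead of using a magic number.
    assert(a.has_value() && "none of c1, c2, c3 was true");
    return do_something_with(*a);
}
```

Keep in mind that assert compiles to nothing when NDEBUG is defined, so in a release build the empty case would still need separate handling.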
In my opinion, throwing is the safest option. If you don't actually want this other value (it's just useless), assigning it anyway may lead to a really subtle bug that is hard to find. Therefore I would throw an exception when none of the conditions is met:
#include <stdexcept>

int get_init_value(bool c1, bool c2, bool c3) {
if (c1) { return 1; }
else if (c2) { return 2; }
else if (c3) { return -3; }
throw std::logic_error("none of the conditions to define the value was met");
}
That way we avoid getting some weird values that won't actually match our code but would compile anyway (debugging that may take a lot of time). I consider it way better than just assigning some clearly wrong value.
Opinion based answer!
I know the example is a simplification of a real, more complex one, but IMHO this kind of design issue seems to emerge more often nowadays, and people sometimes tend to over-complicate it.
Isn't the whole purpose of a variable to hold some value? So isn't having a default value for this variable also a feasible thing?
So what exactly is wrong with:
int a = -1000; // or some other value meant to be used for undefined
if (c1) {
a = 1;
} else if (c2) {
a = 2;
} else if (c3) {
a = -3;
}
do_something_with(a);
It is simple and readable... No lambdas, exceptions, or other stuff making the code unnecessarily complicated...
Or like:
int a;
if (c1) {
a = 1;
} else if (c2) {
a = 2;
} else if (c3) {
a = -3;
} else {
a = -1000; // default for unknown state
}
do_something_with(a);
You could introduce a constant, const int undefined = -1000;, and use that.
Or an enum, if c1, c2, c3 are states of some sort (which they most likely are)...
You could rearrange the code to eliminate the variable if it is not needed elsewhere.
if (c1) {
do_something_with(1);
} else if (c2) {
do_something_with(2);
} else if (c3) {
do_something_with(-3);
}
I would introduce a default value. I usually use the MAX value of the type for this.
The shortest way to do this is with the ternary operator, like this:
#include <climits>
int a = c1 ? 1 : c2 ? 2 : c3 ? -3 : INT_MAX;
do_something_with(a);
I understand your real code is much more complicated than the outline presented, but IMHO the main problem here is
should we do_something_with(a) at all if a is undefined,
rather than
what the initial value should be.
And the solution might be to explicitly add a status flag like a_is_defined alongside the actual parameter a, instead of using magic constants.
int a = 0;
bool a_is_defined = false;
When you set them both according to c... conditions and pass them to do_something() you'll be able to make a clear distinction between a specific if(a_is_defined) {...} path and a default (error handling?) else {...}.
Or even provide separate routines to explicitly handle both paths one level earlier: if(a_is_defined) do_something_with(a); else do_something_else();.
I am facing a situation where I need cppcheck to pass, but it gets tricky sometimes. What do you generally do in such circumstances?
For example:
#include<iostream>
using namespace std;
void fun1();
int fun2();
int main()
{
fun1();
}
void fun1()
{
int retVal;
if (-1 == (retVal = fun2()))
{
cout <<"Failure. fun2 returned a -1"<< endl;
}
}
int fun2()
{
return -1;
}
We often see code like the above. Running cppcheck on this file gives the following output:
cppcheck --suppress=redundantAssignment --enable='warning,style,performance,portability' --inline-suppr --language='c++' retval_neverused.cpp
Checking retval_neverused.cpp...
[retval_neverused.cpp:13]: (style) Variable 'retVal' is assigned a value that is never used.
I don't want to add some dummy line printing retVal just for the sake of cppcheck. In fact, it may be a situation where I throw an exception, and I don't want the exception to carry something as trivial as the value of retVal.
Cppcheck is kind of right, though. You don't need retVal at all. Just check the return value of fun2 directly: if( -1 == fun2() )
As an aside, assigning variables inside conditional expressions is really bad practice. It makes it a lot harder to catch typos where you meant to type == but actually typed =.
You could rewrite as:
const int retval = fun2();
if (retval == -1)
This technique is, IMHO, easier to debug because you can see, in a debugger, the value returned from fun2 before the if statement is executed.
With the function call inside the if expression, it is a little more complicated to see the function's return value while debugging.
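If C++17 is available, the "if with initializer" form is one more way to keep the assignment out of the comparison while limiting the variable's scope. This is only a sketch: fun2 is stubbed to always fail, fun1 returns a message instead of printing so the result is easy to check, and I make no claim about how any particular cppcheck version treats it.

```cpp
#include <string>

// Stand-in for the question's fun2(); always reports failure here.
int fun2() { return -1; }

// Returns a message instead of printing it.
std::string fun1() {
    // C++17 "if with initializer": retVal is scoped to the if statement,
    // and the assignment stays separate from the comparison.
    if (const int retVal = fun2(); retVal == -1) {
        return "Failure. fun2 returned " + std::to_string(retVal);
    }
    return "OK";
}
```

Here retVal is actually used in the message, so the variable has a reason to exist.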
One common way is with something like this:
#define OK_UNUSED(x) (void)(x)
void fun1()
{
int retVal;
if (-1 == (retVal = fun2()))
{
OK_UNUSED (retVal);
cout <<"Failure. fun2 returned a -1"<< endl;
}
}
This indicates to humans that retVal is intentionally unused and makes CppCheck think it's used, suppressing the warning.
Note that this macro should not be used if evaluating its parameter has consequences. In that case, you need something fancier like:
#define OK_UNUSED(x) if(false && (x)) ; else (void) 0
I am learning how C++ is compiled into assembly, and I find how exceptions work under the hood very interesting. If it's okay to have more than one execution path for exceptions, why not for normal functions?
For example, let's say you have a function that can return a pointer to class A or to something derived from A. The way you're supposed to handle it is with RTTI.
But why not, instead, have the called function, after computing the return value, jump back into the caller at the specific location that matches the return type? It would work like exceptions, where the execution flow can continue normally or, if something throws, land in one of your catch handlers.
Here is my code:
class A
{
public:
virtual int GetValue() { return 0; }
};
class B : public A
{
public:
int VarB;
int GetValue() override { return VarB; }
};
class C : public A
{
public:
int VarC;
int GetValue() override { return VarC; }
};
A* Foo(int i)
{
if(i == 1) return new B;
if(i == 2)return new C;
return new A;
}
int main()
{
A* a = Foo(2);
if(B* b = dynamic_cast<B*>(a))
{
b->VarB = 1;
}
else if(C* c = dynamic_cast<C*>(a)) // Line 36
{
c->VarC = 2;
}
else
{
assert(a->GetValue() == 0);
}
}
So instead of doing it with RTTI and dynamic_cast checks, why not have the Foo function just jump to the appropriate location in main? In this case Foo returns a pointer to C, so Foo should instead jump to line 36 directly.
What's wrong with this? Why aren't people doing this? Is there a performance reason? I would think this would be cheaper than RTTI.
Or is this just a language limitation, regardless of whether it's a good idea or not?
First of all, there are a million different ways of defining a language. C++ is defined the way it is defined; whether that is nice or not does not really matter. If you want to improve the language, you are free to write a proposal to the C++ committee. They will review it and maybe include it in a future standard. Sometimes this happens.
Second, although exceptions are dispatched under the hood, there is no strong reason to think this is more efficient than your handwritten code that uses RTTI. Exception dispatch still requires CPU cycles; there is no miracle there. The real difference is that with RTTI you need to write the code yourself, while the exception dispatch code is generated for you by the compiler.
You may want to call your function 10000 times and find out which runs faster: the RTTI-based code or exception dispatch.
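A rough way to run that comparison is sketched below. Everything here is a hypothetical stand-in for the question's code: foo_return and foo_throw replace Foo, the objects are static so allocation is not measured, and compare is just a convenience wrapper.

```cpp
#include <chrono>
#include <cstdio>

// Minimal polymorphic hierarchy standing in for the question's A, B and C.
struct A { virtual ~A() = default; };
struct B : A {};
struct C : A {};

// Static instances so the benchmark measures dispatch, not allocation.
A g_a; B g_b; C g_c;

// Returns a base pointer; the caller has to discover the dynamic type.
A* foo_return(int i) {
    if (i == 1) return &g_b;
    if (i == 2) return &g_c;
    return &g_a;
}

// Throws a typed pointer; the caller's catch handlers do the dispatch.
[[noreturn]] void foo_throw(int i) {
    if (i == 1) throw &g_b;
    if (i == 2) throw &g_c;
    throw &g_a;
}

int via_rtti(int i) {
    A* a = foo_return(i);
    if (dynamic_cast<B*>(a)) return 1;
    if (dynamic_cast<C*>(a)) return 2;
    return 0;
}

int via_throw(int i) {
    try {
        foo_throw(i);
    } catch (B*) {
        return 1;
    } catch (C*) {
        return 2;
    } catch (A*) {  // must come last: it would also match B* and C*
        return 0;
    }
}

// Runs f(k % 3) n times and returns the elapsed microseconds; the checksum
// keeps the calls from being optimized away entirely.
template <class F>
long long time_calls(F f, int n, long long& checksum) {
    const auto start = std::chrono::steady_clock::now();
    for (int k = 0; k < n; ++k) checksum += f(k % 3);
    const auto stop = std::chrono::steady_clock::now();
    return std::chrono::duration_cast<std::chrono::microseconds>(stop - start).count();
}

// Prints both timings for n calls; returns true if both variants agree.
bool compare(int n) {
    long long sum_rtti = 0, sum_throw = 0;
    const long long t_rtti = time_calls(via_rtti, n, sum_rtti);
    const long long t_throw = time_calls(via_throw, n, sum_throw);
    std::printf("rtti: %lld us, throw: %lld us\n", t_rtti, t_throw);
    return sum_rtti == sum_throw;
}
```

Calling compare(10000) prints both timings. On mainstream implementations I would expect the throwing variant to be considerably slower, but measure on your own toolchain and optimization level.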
Is there a better way for this "idiom"?
if(State s = loadSomething()) { } else return s;
In other words, I want to do something that may return either an error (with a message) or a success state, and if there was an error I want to return it. This can become very repetitive, so I want to shorten it. For example:
if(State s = loadFoobar(&loadPointer, &results)) { } else return s;
if(State s = loadBaz(&loadPointer, &results)) { } else return s;
if(State s = loadBuz(&loadPointer, &results)) { } else return s;
This must not use exceptions, which I would otherwise favor (they are unsuitable for this build). I could write a little class BooleanNegator<State> that stores the value and negates its boolean evaluation. But I want to avoid doing this ad hoc, and would prefer a boost/standard solution.
You could do:
for (State s = loadSomething(); !s; ) return s;
but I am not sure if it is more elegant, and it is definitely less readable...
I assume the context is something like
State SomeFunction()
{
if(State s = loadSomething()) { } else return s;
return do_something_else();
}
without throwing exceptions where do_something_else() does something of relevance to SomeFunction() and returns a State. Either way, the result of continuing within the function needs to result in a State being returned, as falling off the end will cause the caller to exhibit undefined behaviour.
In that case, I would simply restructure the function to
State SomeFunction()
{
if (State s = loadSomething())
return do_something_else();
else
return s;
}
Implicit assumptions are that State has some operator (e.g. operator bool()) that can be tested, that copying a State is possible (implied by the existence of a loadSomething() that returns one) and relatively inexpensive, and that two instances of State can exist at one time.
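To make those assumptions concrete, here is a minimal sketch of a State that satisfies them. The class layout, the message() accessor and the bool parameter on loadSomething are all hypothetical; the question never shows the real State.

```cpp
#include <string>
#include <utility>

// Hypothetical State: an empty error message means success.
class State {
public:
    State() = default;
    explicit State(std::string error) : error_(std::move(error)) {}
    // explicit still allows use in if(...) conditions.
    explicit operator bool() const { return error_.empty(); }
    const std::string& message() const { return error_; }
private:
    std::string error_;
};

// Stubs standing in for the real loaders.
State loadSomething(bool ok) { return ok ? State() : State("load failed"); }
State do_something_else() { return State(); }

State SomeFunction(bool ok) {
    if (State s = loadSomething(ok))
        return do_something_else();
    else
        return s;  // s is still in scope in the else branch
}
```

Making operator bool explicit avoids accidental conversions to int while keeping the if (State s = ...) form working.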
Aside from some smart/hacky uses of different keywords to get the same behavior, or adding more-or-less complex templates or macros to get an unless() keyword or to somehow inject the ! operator, I'd stick to just the basic things.
This is one of the places where I'd (probably) add extra "unnecessary" braces:
State someFunction()
{
// some other code
{ State s = loadSomething(); if(!s) return s; }
// some other code
}
However, in this exact case, I'd expand it to emphasise the return keyword, which can be easily overlooked when it's squashed into a one-liner. So, unless the one-liner is repeated many times and it's clear and obvious that there's a return inside, I'd probably write:
State someFunction()
{
// some other code
{
State s = loadSomething();
if(!s)
return s;
}
// some other code
}
It might look like this widens the scope of s, but it is actually equivalent to declaring State s in the if(), thanks to the extra braces, which explicitly limit the visibility of the local s.
However, some people just "hate" seeing { .. } not coupled with a keyword/class/function/etc, or even consider it unreadable because it "suggests that an if/while/etc keyword was accidentally deleted".
One more idea came to me after you added the repetitive example. You could try a trick known from scripting languages, where && and || may return non-bool values:
State s = loadFoobar(&loadPointer, &results);
s = s && loadBaz(&loadPointer, &results);
s = s && loadBuz(&loadPointer, &results);
if(!s) return s;
However, there's a problem: in contrast to scripting languages, in C++ such overloads of && and || lose their short-circuit semantics, which makes this attempt pointless.
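The loss of short-circuiting is easy to demonstrate. In the sketch below, State, failingLoad, otherLoad and the call counter are hypothetical; the point is only that an overloaded operator&& is an ordinary function call, so both operands are evaluated.

```cpp
// Hypothetical State where "true" means success.
struct State {
    bool ok;
    explicit operator bool() const { return ok; }
};

// Overloaded &&: keeps the first failure, otherwise takes the right-hand result.
State operator&&(const State& lhs, const State& rhs) {
    return lhs ? rhs : lhs;
}

int g_calls = 0;  // counts how many loaders actually ran

State failingLoad() { ++g_calls; return State{false}; }
State otherLoad()   { ++g_calls; return State{true}; }

// Uses the overloaded operator; returns the number of loaders executed.
int run_overloaded() {
    g_calls = 0;
    State s = failingLoad() && otherLoad();  // both operands are evaluated
    (void)s;
    return g_calls;
}

// Uses the built-in operator on bools; returns the number of loaders executed.
int run_builtin() {
    g_calls = 0;
    bool ok = static_cast<bool>(failingLoad()) && static_cast<bool>(otherLoad());
    (void)ok;
    return g_calls;  // otherLoad() is skipped by short-circuiting
}
```

run_overloaded() reports two loader calls while run_builtin() reports one, so the overloaded version would keep loading after the first error.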
However, as dyp pointed out, the obvious thing is that once the scope of s is widened, a simple if can be brought back. Its visibility can again be limited with extra {}:
{
State s;
if(!(s = loadFoobar(&loadPointer, &results))) return s;
if(!(s = loadBaz(&loadPointer, &results))) return s;
if(!(s = loadBuz(&loadPointer, &results))) return s;
}
I have a method where performance is really important (I know premature optimization is the root of all evil. I know I should profile my code, and I did. In this application every tenth of a second I save is a big win.) This method uses different heuristics to generate and return elements. The heuristics are used sequentially: the first heuristic is used until it can no longer return elements, then the second heuristic is used until it can no longer return elements, and so on until all heuristics have been used. On each call of the method I use a switch to move to the right heuristic. This is ugly, but works well. Here is some pseudocode:
class MyClass
{
private:
unsigned int m_step;
public:
MyClass() : m_step(0) {};
Elem GetElem()
{
// This switch statement will be optimized as a jump table by the compiler.
// Note that there are no break statements between the cases.
switch (m_step)
{
case 0:
if (UseHeuristic1())
{
m_step = 1; // Heuristic one is special: it never provides more than one element.
return theElem;
}
m_step = 1;
case 1:
DoSomeOneTimeInitialisationForHeuristic2();
m_step = 2;
case 2:
if (UseHeuristic2())
{
return theElem;
}
m_step = 3;
case 3:
if (UseHeuristic3())
{
return theElem;
}
m_step = 4; // But the method should not be called again
}
return someErrorCode;
};
};
As I said, this works and it's efficient since at each call, the execution jumps right where it should. If a heuristic can't provide an element, m_step is incremented (so the next time we don't try this heuristic again) and because there is no break statement, the next heuristic is tried. Also note that some steps (like step 1) never return an element, but are one time initialization for the next heuristic.
The reason initializations are not all done upfront is that they might never be needed. It is always possible (and common) for GetElem to not get called again after it returned an element, even if there are still elements it could return.
While this is an efficient implementation, I find it really ugly. The case statement is a hack; using it without break is also hackish; the method gets really long, even if each heuristic is encapsulated in its own method.
How should I refactor this code so it's more readable and elegant while keeping it as efficient as possible?
Wrap each heuristic in an iterator. Initialize it completely on the first call to hasNext(). Then collect all iterators in a list and use a super-iterator to iterate through all of them:
boolean hasNext () {
// Drop exhausted iterators from the front until one has an element.
while (!list.isEmpty()) {
if (list.get(0).hasNext()) return true;
list.remove (0);
}
return false;
}
Object next () {
return list.get (0).next ();
}
Note: In this case, a linked list might be a tiny bit faster than an ArrayList but you should still check this.
[EDIT] Changed "turn each" into "wrap each" to make my intentions more clear.
I don't think your code is so bad, but if you're doing this kind of thing a lot, and you want to hide the mechanisms so that the logic is clearer, you could look at Simon Tatham's coroutine macros. They're intended for C (using static variables) rather than C++ (using member variables), but it's trivial to change that.
The result should look something like this:
Elem GetElem()
{
crBegin;
if (UseHeuristic1())
{
crReturn(theElem);
}
DoSomeOneTimeInitialisationForHeuristic2();
while (UseHeuristic2())
{
crReturn(theElem);
}
while (UseHeuristic3())
{
crReturn(theElem);
}
crFinish;
return someErrorCode;
}
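The macros themselves are not shown above. One possible adaptation to a member variable is sketched below: the macro names follow the answer, but these definitions are my own adaptation (using m_state and __LINE__) rather than a verbatim copy of the originals, and the two heuristics are hypothetical stubs.

```cpp
// Adapted coroutine macros: the resume point lives in the member m_state.
#define crBegin     switch (m_state) { case 0:
#define crReturn(x) do { m_state = __LINE__; return x; case __LINE__:; } while (0)
#define crFinish    }

class MyClass {
public:
    // Yields 10, then 20 and 21, then -1 as the error code.
    int GetElem() {
        crBegin;
        if (UseHeuristic1()) {
            crReturn(theElem);
        }
        DoSomeOneTimeInitialisationForHeuristic2();
        while (UseHeuristic2()) {
            crReturn(theElem);
        }
        crFinish;
        return -1;  // stands in for someErrorCode
    }

private:
    int m_state = 0;  // 0 = start, otherwise the line of the last crReturn
    int theElem = 0;
    int m_left = 0;   // elements heuristic 2 can still produce

    // Hypothetical heuristic 1: always provides exactly one element.
    bool UseHeuristic1() { theElem = 10; return true; }
    void DoSomeOneTimeInitialisationForHeuristic2() { m_left = 2; }
    // Hypothetical heuristic 2: provides 20, then 21, then gives up.
    bool UseHeuristic2() {
        if (m_left == 0) return false;
        theElem = 20 + (2 - m_left);
        --m_left;
        return true;
    }
};
```

Each crReturn must sit on its own source line, because __LINE__ is used as the case label. Local variables with initializers cannot be declared between crBegin and crFinish, since the switch would jump past them.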
To my mind, if you do not need to modify this code much, e.g. to add new heuristics, then document it well and don't touch it.
However, if new heuristics are added and removed and you think this is an error-prone process, then you should consider refactoring it. The obvious choice would be to introduce the State design pattern. This replaces your switch statement with polymorphism, which might slow things down, but you would have to profile both to be sure.
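A minimal sketch of what the State pattern could look like here, assuming C++17. All names (HeuristicState, First, Second, Generator) and the produced values are hypothetical, and Elem is simplified to int.

```cpp
#include <memory>
#include <optional>

using Elem = int;  // simplification for this sketch

// One state per heuristic; each state knows which state follows it.
struct HeuristicState {
    virtual ~HeuristicState() = default;
    virtual std::optional<Elem> next() = 0;                   // empty: exhausted
    virtual std::unique_ptr<HeuristicState> successor() = 0;  // nullptr: no more states
};

// Hypothetical second heuristic: produces 20, then 21.
// Its one-time initialisation would go into its constructor, so it only
// runs if this state is ever reached.
struct Second : HeuristicState {
    int produced = 0;
    std::optional<Elem> next() override {
        if (produced < 2) return 20 + produced++;
        return std::nullopt;
    }
    std::unique_ptr<HeuristicState> successor() override { return nullptr; }
};

// Hypothetical first heuristic: produces a single element.
struct First : HeuristicState {
    bool done = false;
    std::optional<Elem> next() override {
        if (done) return std::nullopt;
        done = true;
        return 10;
    }
    std::unique_ptr<HeuristicState> successor() override {
        return std::make_unique<Second>();
    }
};

class Generator {
public:
    Generator() : state_(std::make_unique<First>()) {}

    // Returns the next element, or an empty optional once everything is used up.
    std::optional<Elem> GetElem() {
        while (state_) {
            if (auto e = state_->next()) return e;
            state_ = state_->successor();  // move on to the next heuristic
        }
        return std::nullopt;
    }

private:
    std::unique_ptr<HeuristicState> state_;
};
```

Adding a heuristic then means adding a class and changing one successor(), at the cost of a virtual call and a heap allocation per transition.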
It looks like there really isn't much to optimize in this code - probably most of the optimization can be done in the UseHeuristic functions. What's in them?
You can turn the control flow inside-out.
template <class Callback> // a callback that returns true when it's done
void Walk(Callback fn)
{
if (UseHeuristic1()) {
if (fn(theElem))
return;
}
DoSomeOneTimeInitialisationForHeuristic2();
while (UseHeuristic2()) {
if (fn(theElem))
return;
}
while (UseHeuristic3()) {
if (fn(theElem))
return;
}
}
This might earn you a few nanoseconds if the switch dispatch and the return statements are throwing the CPU off its stride, and the recipient is inlineable.
Of course, this kind of optimization is futile if the heuristics themselves are nontrivial. And much depends on what the caller looks like.
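For illustration, a caller of this inside-out version could look like the sketch below. The heuristics are hypothetical stubs (the third one is omitted for brevity), and take is a made-up helper that stops after a given number of elements.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical stand-ins for the question's heuristics.
int theElem = 0;
int g_left = 0;  // elements heuristic 2 can still produce

bool UseHeuristic1() { theElem = 10; return true; }
void DoSomeOneTimeInitialisationForHeuristic2() { g_left = 3; }
bool UseHeuristic2() {
    if (g_left == 0) return false;
    theElem = 20 + (3 - g_left);
    --g_left;
    return true;
}

// Same shape as the answer's Walk: fn returns true when it has seen enough.
template <class Callback>
void Walk(Callback fn) {
    if (UseHeuristic1()) {
        if (fn(theElem)) return;
    }
    DoSomeOneTimeInitialisationForHeuristic2();
    while (UseHeuristic2()) {
        if (fn(theElem)) return;
    }
}

// Collects elements until `limit` of them have been seen (limit >= 1).
std::vector<int> take(std::size_t limit) {
    std::vector<int> out;
    Walk([&](int e) {
        out.push_back(e);
        return out.size() >= limit;
    });
    return out;
}
```

The caller no longer pulls elements one at a time, so this only fits when the consuming code can be expressed as a callback.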
This is a micro-optimization, but there is no need to set the m_step value when you are not returning from GetElem. See the code below.
Larger optimizations definitely require simplifying the control flow (fewer jumps, fewer returns, fewer tests, fewer function calls), because a mispredicted jump stalls the processor pipeline (processors have branch prediction, but it's no silver bullet). You can try the solutions proposed by Aaron or Jason, and there are others (for instance, you can implement several get_elem functions and call them through a function pointer, but I'm quite sure it'll be slower).
If the problem allows it, it can also be efficient to compute several elements at once in the heuristics and use some cache, or to make it truly parallel, with some thread computing elements and this one merely a consumer waiting for results... There is no way to say more without some details on the context.
class MyClass
{
private:
unsigned int m_step;
public:
MyClass() : m_step(0) {};
Elem GetElem()
{
// This switch statement will be optimized as a jump table by the compiler.
// Note that there are no break statements between the cases.
switch (m_step)
{
case 0:
if (UseHeuristic1())
{
m_step = 1; // Heuristic one is special: it never provides more than one element.
return theElem;
}
case 1:
DoSomeOneTimeInitialisationForHeuristic2();
m_step = 2;
case 2:
if (UseHeuristic2())
{
return theElem;
}
case 3:
m_step = 4;
case 4:
if (UseHeuristic3())
{
return theElem;
}
m_step = 5; // But the method should not be called again
}
return someErrorCode;
};
};
What you really can do here is replace the conditional with the State pattern.
http://en.wikipedia.org/wiki/State_pattern
Maybe it would be less performant because of the virtual method call, maybe it would be more performant because of less state-maintaining code, but the code would definitely be much clearer and more maintainable, as always with patterns.
What could improve performance is eliminating DoSomeOneTimeInitialisationForHeuristic2(); by making it a separate state between 1 and 2.
Since each heuristic is represented by a function with an identical signature, you can make a table of function pointers and walk through it.
class MyClass
{
private:
typedef bool heuristic_function();
typedef heuristic_function * heuristic_function_ptr;
static heuristic_function_ptr heuristic_table[4];
unsigned int m_step;
public:
MyClass() : m_step(0) {};
Elem GetElem()
{
while (m_step < sizeof(heuristic_table)/sizeof(heuristic_table[0]))
{
if (heuristic_table[m_step]())
{
return theElem;
}
++m_step;
}
return someErrorCode;
};
};
MyClass::heuristic_function_ptr MyClass::heuristic_table[4] = { UseHeuristic1, DoSomeOneTimeInitialisationForHeuristic2, UseHeuristic2, UseHeuristic3 };
If the element code you are processing can be converted to an integral value, then you can construct a table of function pointers and index based on the element. The table would have one entry for each 'handled' element, and one for each known but unhandled element. For unknown elements, do a quick check before indexing the function pointer table.
Calling the element-processing function is fast.
Here's working sample code:
#include <cstdlib>
#include <iostream>
using namespace std;
typedef void (*ElementHandlerFn)(void);
void ProcessElement0()
{
cout << "Element 0" << endl;
}
void ProcessElement1()
{
cout << "Element 1" << endl;
}
void ProcessElement2()
{
cout << "Element 2" << endl;
}
void ProcessElement3()
{
cout << "Element 3" << endl;
}
void ProcessElement7()
{
cout << "Element 7" << endl;
}
void ProcessUnhandledElement()
{
cout << "> Unhandled Element <" << endl;
}
int main()
{
// construct a table of function pointers, one for each possible element (even unhandled elements)
// note: i am assuming that there are 10 possible elements -- 0, 1, 2 ... 9 --
// and that 5 of them (0, 1, 2, 3, 7) are 'handled'.
static const size_t MaxElement = 9;
ElementHandlerFn handlers[] =
{
ProcessElement0,
ProcessElement1,
ProcessElement2,
ProcessElement3,
ProcessUnhandledElement,
ProcessUnhandledElement,
ProcessUnhandledElement,
ProcessElement7,
ProcessUnhandledElement,
ProcessUnhandledElement
};
// mock up some elements to simulate input, including 'invalid' elements like 12
int testElements [] = {0, 1, 2, 3, 7, 4, 9, 12, 3, 3, 2, 7, 8 };
size_t numTestElements = sizeof(testElements)/sizeof(testElements[0]);
// process each test element
for( size_t ix = 0; ix < numTestElements; ++ix )
{
// for some robustness...
if( testElements[ix] > MaxElement )
cout << "Invalid Input!" << endl;
// otherwise process normally
else
handlers[testElements[ix]]();
}
return 0;
}
If it ain't broke don't fix it.
It looks pretty efficient as is. It doesn't look hard to understand either. Adding iterators etc. is probably going to make it harder to understand.
You are probably better off doing
Performance analysis. Is time really spent in this procedure at all, or is most of it in the functions that it calls? I can't see any significant time being spent here.
More unit tests, to prevent someone from breaking it if they have to modify it.
Additional comments in the code.