Can/do compilers simplify logical expressions involving functions? - c++

Some functions which calculate booleans:
bool a()
{
return trueorfalse;
}
bool b()
{
//...
}
bool c()
{
//...
}
This condition
//somewhere else
if((a()&&b()&&c()) || (a()&&b()&&!c()) )
{
doSomething();
}
can also be written as
if(a()&&b())
{
doSomething();
}
Will compilers usually optimize this away?
And what about pure boolean values:
if((a&&b&&c) || (a&&b&&!c))
{
doSomething();
}

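Since the functions may have side effects, the conditional cannot, in general, be "optimized" this way: all the functions have to be called (conditionally) in a well-defined order. Only if the compiler can see the function bodies and prove that skipping or merging calls cannot change the program's behavior may it simplify the expression.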
If you do want optimization, you can assign the result to variables first:
const bool ba = a(), bb = b(), bc = c();
if (ba && bb && bc || ba && bb && !bc) { /* ... */ } // probably optimized to "ba && bb"
It's possible that constexpr functions, introduced in C++11, allow for this optimization when they yield a constant expression, but I'm not sure.
You can condense this even further: in the following code, f() has to be called twice:
if (f() && false || f() && true)
{
// ...
}

No, they won't. The reason is that the optimization would be visible to the user, because it would change the observable side effects. For example, in your optimized version c() would never execute even though the user explicitly asked for it to be called. This can and will lead to bugs.

Since your premise is flawed, no, they won't.
(a()&&b()&&c()) || (a()&&b()&&!c()) definitely can't be rewritten as (a()&&b()) in general.
C (and C++) isn't a functional programming language (like Haskell).

But the problem is that, generally speaking, it can't be refactored that way!
If any of the functions has side effects that change the result of c(), then the second call could return a different result from the first one.
Not only that, but short-circuit evaluation could muddy things even further.

Very often in C, the return value of a function indicates whether the function executed successfully or not, for example when calling a graphics routine or converting a file. Think how often you use pointers to change something external to the function, or call another function that outputs something. As someone said, this isn't functional programming.
If the compiler is able to determine that foo() changes nothing and does nothing observable, then it may by all means simplify the expression, but I would NOT count on it.
Here is a very simple example:
#include <iostream>
bool foo()
{
std::cout << "this needs to be printed each time foo() is called, even though it's called in a logical expression\n";
return true;
}
int main()
{
if ((foo() && !foo()) || (foo() && !foo()))
return 0;
return 1;
}
Edit: any Boolean algebra on plain variables should be simplified by the compiler.

Related

How to deal with declaring a primitive-type variable whose initial value is not yet known (C++)?

In some cases I need to declare a variable before knowing its value, like:
int a;
if (c1) {
a = 1;
} else if (c2) {
a = 2;
} else if (c3) {
a = -3;
}
do_something_with(a);
Is it standard professional practice to assign some clearly wrong value like -1000 anyway (making potential bugs more reproducible), or is it preferred not to add code that does nothing useful as long as there are no bugs? On one hand, it looks reasonable to remove randomness; on the other hand, magic and even "clearly wrong" numbers somehow do not look attractive.
In many cases it is possible to declare the variable where the value is first known, or to use a ternary operator, but here the ternary would have to be nested, which is also rather clumsy.
Declaring the variable inside the block would make it go out of scope prematurely.
Or would this case justify using std::optional<int> a and assert(a) later, making sure we have a value?
EDIT: The bugs I am talking about would occur if all three conditions suddenly turned out false, which should "absolutely never happen".
As far as I know, the most popular and safest way is an immediately invoked lambda. Note that the if chain should be exhaustive (I added SOME_DEFAULT_VALUE as a placeholder). If you don't know what to put in the final else block, consider a few options:
using std::optional and returning std::nullopt in the else,
throwing an exception that describes the problem,
putting an assert there if logically this situation should never happen.
const int a = [&] {
if (c1) {
return 1;
} else if (c2) {
return 2;
} else if (c3) {
return -3;
} else {
return SOME_DEFAULT_VALUE;
}
}();
do_something_with(a);
If the initialization logic is duplicated elsewhere, you can simply extract the lambda into a named function, as other answers suggest.
In my opinion, the following is the safest option. If you don't want this other value (it's just useless), a default may lead to a really subtle bug that is hard to find. Therefore I would throw an exception when none of the conditions is met:
#include <stdexcept>
int get_init_value(bool c1, bool c2, bool c3) {
if (c1) { return 1; }
else if (c2) { return 2; }
else if (c3) { return -3; }
throw std::logic_error("none of the conditions that define the value was met");
}
That way we avoid weird values that don't actually match what our code expects but would compile anyway (debugging that may take a lot of time). I consider this way better than just assigning some clearly wrong value.
Opinion-based answer!
I know the example is a simplification of a real, more complex case, but IMHO this kind of design issue seems to come up more often nowadays, and people sometimes tend to over-complicate it.
Isn't it the whole purpose of a variable to hold some value? Thus isn't having a default value for this variable also a feasible thing?
So what exactly is wrong with:
int a = -1000; // or some other value meant to be used for "undefined"
if (c1) {
a = 1;
} else if (c2) {
a = 2;
} else if (c3) {
a = -3;
}
do_something_with(a);
It is simple and readable... No lambdas, exceptions, or other stuff making the code unnecessarily complicated...
Or like:
int a;
if (c1) {
a = 1;
} else if (c2) {
a = 2;
} else if (c3) {
a = -3;
} else {
a = -1000; // default for unknown state
}
do_something_with(a);
You could introduce a constant const int undefined = -1000; and use the constant.
Or an enum, if c1, c2, c3 are states of some sort (which they most likely are)...
You could rearrange the code to eliminate the variable if it is not needed elsewhere.
if (c1) {
do_something_with(1);
} else if (c2) {
do_something_with(2);
} else if (c3) {
do_something_with(-3);
}
I would introduce a default value. I usually use the maximum value of the type for this.
The shortest way to do this is with the ternary operator:
#include <climits>
int a = c1 ? 1 : c2 ? 2 : c3 ? -3 : INT_MAX;
do_something_with(a);
I understand your real code is much more complicated than the outline presented, but IMHO the main problem here is
should we do_something_with(a) at all if a is undefined,
rather than
what the initial value should be.
And the solution might be to explicitly add a status flag like a_is_defined alongside the actual parameter a, instead of using magic constants.
int a = 0;
bool a_is_defined = false;
When you set both of them according to the c... conditions and pass them to do_something_with(), you'll be able to make a clear distinction between a specific if(a_is_defined) {...} path and a default (error handling?) else {...} path.
Or even provide separate routines to handle both paths explicitly one level earlier: if(a_is_defined) do_something_with(a); else do_something_else();.

Jump as an alternative to RTTI

I am learning how C++ is compiled into assembly, and I found how exceptions work under the hood very interesting. If it's okay to have more than one execution path for exceptions, why not for normal functions?
For example, let's say you have a function that can return a pointer to class A or something derived from A. The way you're supposed to handle it is with RTTI.
But why not, instead, have the called function, after computing the return value, jump back into the caller at the specific location that matches the return type? Just as with exceptions: execution either continues normally or, if something throws, lands in one of your catch handlers.
Here is my code:
#include <cassert>
class A
{
public:
virtual int GetValue() { return 0; }
};
class B : public A
{
public:
int VarB;
int GetValue() override { return VarB; }
};
class C : public A
{
public:
int VarC;
int GetValue() override { return VarC; }
};
A* Foo(int i)
{
if(i == 1) return new B;
if(i == 2)return new C;
return new A;
}
int main()
{
A* a = Foo(2);
if(B* b = dynamic_cast<B*>(a))
{
b->VarB = 1;
}
else if(C* c = dynamic_cast<C*>(a)) // Line 36
{
c->VarC = 2;
}
else
{
assert(a->GetValue() == 0);
}
}
So instead of doing it with RTTI and dynamic_cast checks, why not have the Foo function jump directly to the appropriate location in main? In this case, since Foo returns a pointer to C, Foo should instead jump straight to the line marked "Line 36".
What's wrong with this? Why aren't people doing it? Is there a performance reason? I would think this would be cheaper than RTTI.
Or is this just a language limitation, regardless of whether it's a good idea or not?
First of all, there are a million different ways of defining a language. C++ is defined the way it is defined; whether that is nice or not does not really matter. If you want to improve the language, you are free to write a proposal to the C++ committee. They will review it and maybe include it in a future standard. Sometimes this happens.
Second, although exceptions are dispatched under the hood, there is no strong reason to think that this is more efficient than your handwritten code that uses RTTI. Exception dispatch still requires CPU cycles; there is no miracle there. The real difference is that with RTTI you write the dispatch code yourself, while the exception dispatch code is generated for you by the compiler.
You may want to call your function 10,000 times and find out which runs faster: RTTI-based code or exception dispatch.

Elegant way for "if(T t = ...) { } else return t;"?

Is there a better way for this "idiom"?
if(State s = loadSomething()) { } else return s;
In other words, I want to do something, which may return an error (with a message) or a success state, and if there was an error I want to return it. This can become very repetitive, so I want to shorten it. For example
if(State s = loadFoobar(&loadPointer, &results)) { } else return s;
if(State s = loadBaz(&loadPointer, &results)) { } else return s;
if(State s = loadBuz(&loadPointer, &results)) { } else return s;
This must not use exceptions, which I would otherwise favor (they are unsuitable for this build). I could write a little class BooleanNegator<State> that stores the value and negates its boolean evaluation, but I want to avoid doing this ad hoc and would prefer a Boost/standard solution.
You could do:
for (State s = loadSomething(); !s; ) return s;
but I am not sure if it is more elegant, and it is definitely less readable...
I assume the context is something like
State SomeFunction()
{
if(State s = loadSomething()) { } else return s;
return do_something_else();
}
with no exceptions thrown, where do_something_else() does something relevant to SomeFunction() and returns a State. Either way, continuing within the function must end with a State being returned, because falling off the end of a value-returning function is undefined behaviour.
In that case, I would simply restructure the function to
State SomeFunction()
{
if (State s = loadSomething())
return do_something_else();
else
return s;
}
Implicit assumptions are that State has some operator (e.g. operator bool()) that can be tested, that copying a State is possible (implied by the existence of a loadSomething() that returns one) and relatively inexpensive, and that two instances of State can exist at one time.
Aside from some smart/hacky uses of different keywords to get the same behavior, or adding more-or-less complex templates or macros to get an unless() keyword or to somehow inject the ! operator, I'd stick to the basics.
This is one of the places I'd (probably) inject extra "unnecessary" brackets:
State someFunction()
{
// some other code
{ State s = loadSomething(); if(!s) return s; }
// some other code
}
However, in this exact case, I'd expand it to emphasise the return keyword, which can easily be overlooked when it's squashed into a one-liner. So, unless the one-liner is repeated many times and it's clear and obvious that there's a return inside, I'd probably write:
State someFunction()
{
// some other code
{
State s = loadSomething();
if(!s)
return s;
}
// some other code
}
It might look like elevating the scope of s, but it is actually equivalent to declaring State s in the if(), thanks to the extra brackets, which explicitly limit the visibility of the local s.
However, some people just "hate" seeing { .. } not coupled with a keyword/class/function/etc., or even consider it unreadable because it "suggests that an if/while/etc. keyword was accidentally deleted".
One more idea came to me after you added the repetitive example. You could try a trick known from scripting languages, where && and || may return non-bool values:
State s = loadFoobar(&loadPointer, &results);
s = s && loadBaz(&loadPointer, &results); // continue only while s is a success
s = s && loadBuz(&loadPointer, &results);
if(!s) return s;
However, there's a problem: unlike in scripting languages, user-defined overloads of && and || in C++ lose their short-circuit semantics, which makes this attempt pointless.
However, as dyp pointed out, once the scope of s is elevated, a simple if can be used again. Its visibility can be limited again with extra {}:
{
State s;
if(!(s = loadFoobar(&loadPointer, &results))) return s;
if(!(s = loadBaz(&loadPointer, &results))) return s;
if(!(s = loadBuz(&loadPointer, &results))) return s;
}

C++ inline function & context specific optimization

I have read in Scott Meyers' Effective C++ book that:
When you inline a function you may enable the compiler to perform context specific optimizations on the body of function. Such optimization would be impossible for normal function calls.
Now the question is: what is context-specific optimization, and why is it necessary?
I don't think "context specific optimization" is a defined term, but I think it basically means the compiler can analyse the call site and the code around it and use this information to optimise the function.
Here's an example. It's contrived, of course, but it should demonstrate the idea:
Function:
#include <stdexcept>
int foo(int i)
{
if (i < 0) throw std::invalid_argument("");
return -i;
}
Call site:
int bar()
{
int i = 5;
return foo(i);
}
If foo is compiled separately, it must contain a comparison and exception-throwing code. If it's inlined in bar, the compiler sees this code:
int bar()
{
int i = 5;
if (i < 0) throw std::invalid_argument("");
return -i;
}
Any sane optimiser will evaluate this as
int bar()
{
return -5;
}
If the compiler chooses to inline a function, it replaces each call to that function with the body of the function. The caller's body then has more code to optimize, so this often leads to better code.
Imagine that:
bool callee(bool a){
if(a) return false;
else return true;
}
void caller(){
if(callee(true)){
//Do something
}
//Do something
}
Once inlined, the code will look like this (approximately):
void caller(){
bool a = true;
bool ret;
if(a) ret = false;
else ret = true;
if(ret){
//Do something
}
//Do something
}
Which may be optimized further too:
void caller(){
if(false){
//Do something
}
//Do something
}
And then to:
void caller(){
//Do something
}
The function is now much smaller, and you avoid the cost of the function call and, especially relevant to the question, the cost of branching.
Say the function is
void fun( bool b) { if(b) do_sth1(); else do_sth2(); }
and it is called in a context where the parameter is known to be false
bool param = false;
...
fun( param);
then the compiler may reduce the function body to
...
do_sth2();
I don't think that context specific optimization means something specific and you probably can't find exact definition.
A nice example is a classic getter for a class attribute. Without inlining, the program has to:
jump to the getter body
move the value into a register (eax on x86 under Windows with default Visual Studio settings)
jump back to the caller
move the value from eax to the local variable
With inlining, almost all of this work can be skipped, and the value can be moved directly into the local variable.
Optimizations depend strictly on the compiler, but a lot can happen (variable allocation may be skipped, code may get reordered, and so on). But you always save the call/jump, which is a relatively expensive instruction.

"missing return statement", but I know it is there

Assume I have the following function:
// Precondition: foo is '0' or 'MAGIC_NUMBER_4711'
// Returns: -1 if foo is '0'
// 1 if foo is 'MAGIC_NUMBER_4711'
int transmogrify(int foo) {
if (foo == 0) {
return -1;
} else if (foo == MAGIC_NUMBER_4711) {
return 1;
}
}
The compiler complains "missing return statement", but I know that foo never has any value other than 0 or MAGIC_NUMBER_4711; otherwise my function has no defined semantics.
What are preferable solutions to this?
Is this really an issue, i.e. what does the standard say?
Sometimes, your compiler is not able to deduce that your function actually has no missing return. In such cases, several solutions exist:
Assume the following simplified code (modern compilers will see that no path falls through here; this is just an example):
if (foo == 0) {
return bar;
} else {
return frob;
}
Restructure your code
if (foo == 0) {
return bar;
}
return frob;
This works well if you can interpret the if statement as a kind of firewall or precondition.
abort()
if (foo == 0) {
return bar;
} else {
return frob;
}
abort(); return -1; // unreachable
Return something appropriate after the abort(). The comment tells fellow programmers, and yourself, why it is there.
throw
#include <stdexcept>
if (foo == 0) {
return bar;
} else {
return frob;
}
throw std::runtime_error ("impossible");
Disadvantages of Single Function Exit Point
flow of control
Some fall back to one-return-per-function a.k.a. single-function-exit-point as a workaround. This might be seen as obsolete in C++ because you almost never know where the function will really exit:
void foo(int&);
int bar () {
int ret = -1;
foo (ret);
return ret;
}
Looks nice and looks like SFEP, but reverse engineering the 3rd party proprietary libfoo reveals:
void foo (int &) {
if (rand()%2) throw ":P";
}
This argument does not hold true if bar() is nothrow and so can only call nothrow functions.
complexity
Every mutable variable increases the complexity of your code and puts a higher burden on the cerebral capacity of your code's maintainer. It means more code and more state to test and verify, which in turn means you take up more of the maintainer's working memory, which in turn leaves less brain capacity for the important stuff.
missing default constructor
Some classes have no default constructor, and you would have to write really bogus code, if it is possible at all:
File mogrify() {
File f ("/dev/random"); // need bogus init because it requires readable stream
...
}
That's quite a hack just to get it declared.
In C89 and C99, the return statement is never required, even in a function whose return type is not void.
C99 only says:
(C99, 6.9.1p12) "If the } that terminates a function is reached, and the value of the function call is used by the caller, the behavior is undefined."
In C++11, the Standard says:
(C++11, 6.6.3p2) "Flowing off the end of a function is equivalent to a return with no value; this results in undefined behavior in a value-returning function"
Just because you can tell that the input will only have one of two values doesn't mean the compiler can, so it's expected that it will generate such a warning.
You have a couple options for helping the compiler figure this out.
You could use an enumerated type for which the two values are the only valid enumerated values. Then the compiler can tell immediately that one of the two branches has to execute and there's no missing return.
You could abort at the end of the function.
You could throw an appropriate exception at the end of the function.
Note that the latter two options are better than silencing the warning because it predictably shows you when the pre-conditions are violated rather than allowing undefined behavior. Since the function takes an int and not a class or enumerated type, it's only a matter of time before someone calls it with a value other than the two allowed values and you want to catch those as early in the development cycle as possible rather than pushing them off as undefined behavior because it violated the function's requirements.
Actually the compiler is doing exactly what it should.
int transmogrify(int foo) {
if (foo == 0) {
return -1;
} else if (foo == MAGIC_NUMBER_4711) {
return 1;
}
// you know you shouldn't get here, but the compiler has
// NO WAY of knowing that. In addition, you are putting
// great potential for the caller to create a nice bug.
// Why don't you catch the error using an ELSE clause?
else {
error( "transmogrify had invalid value %d", foo ) ;
return 0 ;
}
}