Concerning speed, if I need to calculate a large expression, say:
switch1*(large expression 1)+switch2*(large expression 2)
Depending on my input, switch1 can be 0 or 1, as can switch2 be. What would be the quickest for c++ to do, making an if statement or write it down as above?
So, essentially you are asking about Short-Circuit Evaluation, and you are asking whether C++ does it for arithmetic expressions besides boolean expressions.
It is kind of hard to prove a negative, but as far as I know, C++ only does short-circuit evaluation for logical expressions, so there is no equivalent for arithmetic expressions. Also, I can think of plenty of code that would horribly break if arithmetic expressions were being evaluated in a short-circuit fashion, so I don't think that could ever be so.
Theoretically, the compiler could generate code to avoid computing the expression if it could show that it had no side-effects. But in practice, it's unlikely to add code to check if the value is zero because it has no reason to think that it will be.
On the other hand, logical operations (|| and &&) are guaranteed to short-circuit in C++. So you should use them.
result = 0;
if(switch1) {
result += large_expresion_1();
}
if(switch2)
result += large_expression2();
}
But if you are optimising "large expressions" be sure to check whether it's actually quicker to calculate them both then add them in a branchless manner. eg something like
result = ((-(uint64_t)(switch1)) & large_expression_1) + ((-(uint64_t)(switch2)) & large_expression_2);
A bunch of such bithacks are documented here:
https://graphics.stanford.edu/~seander/bithacks.html
Benchmark and separate to that, read the generated assembly language to find out what the compiler is actually doing for (or to) you.
switch1 can be 0 or 1, as can switch2 be
If switch1 and switch2 can really only have values of 0 or 1 then it would be better for them to be booleans rather than integers.
With boolean switches, your statement becomes:
result = (switch1 ? (large expression 1) : 0)
+ (switch2 ? (large expression 2) : 0)
It is the case that in this form, the expressions will be computed even if their result won't be used. A simple and clear way to avoid wasted computation is the obvious one:
result = 0;
if(switch1) {
result += large expression 1;
}
if(switch2) {
result += large expression 2;
}
You could tidy this up by extracting methods, into which you pass the switches:
result = guardedLargeExpression1(switch1, otherparams1)
+ guardedLargeExpression2(switch2, otherparams2);
... with ...
int guardedLargeExpression1(bool switch, foo params) {
if(switch) {
return 0;
}
return large expression(...);
}
You could also do clever stuff with pointers to functions:
int guardedFunctionCall(bool switch, int *functionptr(foo), foo arg) {
if(switch) {
return 0;
}
return (*functionptr)(arg);
}
... which is approaching the kind of thing you'd do in Java when you lazily evaluate code using a Supplier.
Or, since you're in C++ not C, you can do something more OO and actually use the C++ equivalent of Supplier: What is the C++ equivalent of a java.util.function.Supplier?
If depends on the exact conditions. CPU, compiler, the exact expressions.
An if can slow down a program, if if becomes a conditional jump in the assembly code, and the condition cannot be predicted.
If the conditional cannot be predicted, and the "large expressions" are actually simple, it may be faster to do the "multiply-way".
However, if the expressions are slow to calculate, or the if can be branch predicted perfectly (or it doesn't compile to a conditional jump), then I think the if way will be faster.
All in all, you should try both solutions, and check which is faster.
Related
I was messing around with some c++ code after learning that you cannot increment booleans in the standard. I thought incrementing booleans would be useful because I like compacting functions if I can and
unsigned char b = 0; //type doesn't really matter assuming no overflow
while (...) {
if (b++) //do something
...}
is sometimes useful to have, but it would be nice to not have to worry about integer overflow. Since booleans cannot take on any value besides 0 or 1, I thought that perhaps this would work. Of course, you could just have a boolean assigned to 1 after you do your operation but that takes an extra line of code.
Anyway, I thought about how I might achieve this a different way and I came up with this.
bool b = 0;
while (...)
if (b & (b=1)) ...
This turned out to always evaluate to true, even on the first pass through. I thought - sure, its just doing the bitwise operator from right to left, except that when I swapped the order, it did the same thing.
So my question is, how is this expression being evaluated so that it simplifies to always being true?
As an aside, the way to do what I wanted is like this I guess:
bool b = 0;
while (...) if (!b && !(b=1)) //code executes every time except the first iteration
There's a way to get it to only execute once, also.
Or just do the actually readable and obvious solution.
bool b = 0;
while (...) {
if (b) {} else {b=1;}
}
This is what std::exchange is for:
if (std::exchange(b, true)) {
// b was already true
}
As mentioned, sequencing rules mean that b & (b=1) is undefined behaviour in both C and C++. You can't modify a variable and read from it "without proper separation" (e.g., as separate statements).
There is no "strange behavior" here. The & operator, in your case is a bitwise arithmetic operator. These operator are commutative, so both operand must be evaluated before computing the resulting value.
As mentioned earlier, this is considered as undefined behavior, as the standard does not prevent computing + loading into a register the left operand, before doing the same to the second, or doing so in reverse order.
If you are using C++14 or above, you could use std::exchange do achieve this, or re-implement it quite easily using the following :
bool exchange_bool(bool &var, bool &&val) {
bool ret = var;
var = val;
return ret;
}
If you end up re-implementing it, consider using template instead of hard-coded types.
As a proper answer to your question :
So my question is, how is this expression being evaluated so that it simplifies to always being true?
It does not simplify, if we strictly follow the standard, but I'd guess you are always using the same compiler, which end up producing the same output.
Side point: You should not "compact" functions. Most of the time the price of this is less readability. This might be fine if you're the only dev working on a project, but on bigger project, or if you come back to your code later, that could be an issue. Last but not least, "compact" function are more error prone, because of their lack of readability.
If you really want shorter function, you should consider using sub-function that will handle part of the function computation. This would enable you to write short function, to quickly fix or update behavior, without impacting clarity.
I don't see what meaning you can assign to a "bitwise increment" operator. If I am right, what you achieve is just setting the low-order bit, which is done by b|= 1 (following the C conventions, this could have been defined as a unary b||, but the || was assigned another purpose). But I don't feel that this is a useful bitwise operation, and it can increment only once. IMO the standard increment operator is more bitwise.
If your goal was in fact to "increment" a boolean variable, i.e. set it to true (prefix or postfix), b= b || true is a way. (There is no "logical OR assignement" b||= true, and even less a "logical increment" b||||.)
Often I convert some if statements into boolean expressions for code compactness. For instance, if I have something like
foo(int x)
{
if (x > 5) return 100 + 5;
return 100;
}
I'll do it like
foo(int x)
{
return 100 + (x > 5) * 5;
}
This is very simple so no problem, the thing is when I have multiple tests, I can greatly simplify them (at the expense of readability but that's a different issue).
So the question is if that (x > 5) evaluation is as onerous as explicitly branching with it.
In both cases the expression (x > 5) has to be checked if it evaluates to true . And as demonstrated already, both versions compile to the same assembly even without any optimization enabled.
However, the Philosophy section of C++ Core Guidelines has these two rules you would do well to pay heed to:
P.1: Express ideas directly in code
P.3: Express intent
Though these rules cannot be enforced in anyway, adhering to them will make you adopt the version with the if statement.
Doing so will make it less onerous for those who have to maintain the code after you and even yourself a few months later.
You seem to be conflating C++ language constructs with patterns in the assembly. It may have been viable to reason about code on this level given the compilers of the late eighties or early nineties. At this point, however, compilers apply a lot of optimizations and transformations whose correctness or utility is not even obvious to the average programmer. A very simple example is the common beginner's mistake of assuming the following equivalences:
std::uint16_t a = ...;
a *= 2; // a multiplication in assembly
a *= 17; // ditto
a /= 3; // a division in assembly
They may then be surprised to find out that their compiler of choice translates these into the assembly equivalent of e.g.:
a <<= 1u;
a = (a << 4u) + a; // or even (a << 4u) | a if a < 16
a *= 43691u;
Note that the last transformation is only allowed if a is known to be a multiple of the divisor, so you may not see this kind of optimization all too often. How does it even work? In mathematical terms, uint16_t can be thought of as the residue class ring Z/(2^16)Z, and in this ring, there exists a multiplicative inverse for any element that is coprime to 2^16 (i.e. not divisible by 2). If d (e.g. 3) is coprime to 2, it has such an inverse, and then dividing by d is simply equivalent to multiplying by the inverse of d if the remainder is known to be zero. (I won't go into how this inverse can be calculated here.)
Here is another surprising optimization:
long arithsum(long n)
{
long result = 0;
for (long i=0; i<=n; ++i)
result += i;
return result;
}
GCC with -O3 rather mundanely translates this into an unrolled loop of additions. My version (9.0.0svn-something) of Clang, however, will pull a Gauss on you if you do this, and translate this into something like:
long arithsum(long n)
{
return (n * (n+1)) >> 1;
}
Anyway, the same caveats apply to if/switch etc. – while these are control flow structures, and so you'd think they correspond to branching, this may not be so. Likewise, what appears to be a non-branching operation might be translated to a branching operation if the compiler has an optimization rule under which this seems beneficial, or even if it is just unable to translate its own AST or intermediate representation into machine code without use of branching (on the given architecture).
TL;DR: Before you try to outsmart your compiler, figure out which assembly the compiler produces for the straightforward / readable code in this first place. If this assembly is good, there is no point in making the code more subtle / less readable.
Assuming by onerous you mean 1/0. Sure it might work in C/C++ due to implicit typecasting but might not for other languages. If that's what you want to achieve why not use ternary operator (? :) which also makes the code more readable
foo(int x) {
return (x > 5) ? (100 + 5) : 100;
}
Also read this stackoverflow article -- bool to int conversion
Is there any difference in the performance of the following code snippets? Which one performs best and why?
int i = 1000000000;
while(i != 0) { i--; }
or
int i = 1000000000;
while(i) { i--; }
or
int i = 1000000000;
while(i > 0) { i--; }
I see a lot of people use the first example and wonder why. Easier to read?
They are all the same in this context and any decent compiler will generate equivalent code for all three.
In any case, trying to hand-optimize trivial things like this (integer comparisons) is pointless. Your compiler will figure it out and do a much better job during code-gen than you ever could. So just stop trying and instead just write the most readable code you can and then trust the compiler - in any case, none of this makes any performance difference.
Is there any difference in the performance of the following code snippets?
No.
First two are equivalent, and all three can be optimized to exactly same assembly.
I see a lot of people use the first example and wonder why. Easier to read?
It requires the reader to know fewer language rules than the second one. In particular, the second program requires the knowledge that conditional expression is converted to bool, and that the conversion from int has the same result as inequality with zero.
Note that if i were replaced with a floating point number, or if the decrement were modified to have more complexity (for example: decrement by 2), then the third option would be easiest to prove correct. With integers and single decrement, there is no difference.
While answering another question I got curious about this. I'm well aware that
if( __builtin_expect( !!a, 0 ) ) {
// not likely
} else {
// quite likely
}
will make the "quite likely" branch faster (in general) by doing something along the lines of hinting to the processor / changing the assembly code order / some kind of magic. (if anyone can clarify that magic that would also be great).
But does this work for a) inline ifs, b) variables and c) values other than 0 and 1? i.e. will
__builtin_expect( !!a, 0 ) ? /* unlikely */ : /* likely */;
or
int x = __builtin_expect( t / 10, 7 );
if( x == 7 ) {
// likely
} else {
// unlikely
}
or
if( __builtin_expect( a, 3 ) ) {
// likely
// uh-oh, what happens if a is 2?
} else {
// unlikely
}
have any effect? And does all of this depend on the architecture being targeted?
Did you read the GCC documentation?
Built-in Function: long __builtin_expect (long exp, long c)
You may use __builtin_expect to provide the compiler with branch
prediction information. In general, you should prefer to use actual
profile feedback for this (-fprofile-arcs), as programmers are
notoriously bad at predicting how their programs actually perform.
However, there are applications in which this data is hard to collect.
The return value is the value of exp, which should be an integral
expression. The semantics of the built-in are that it is expected that
exp == c. For example:
if (__builtin_expect (x, 0))
foo ();
indicates that we do not expect to call foo, since we expect x to be zero. Since you are limited to integral expressions for exp, you should use constructions such as
if (__builtin_expect (ptr != NULL, 1))
foo (*ptr);
when testing pointer or floating-point values.
To explain this a bit... __builtin_expect is specifically useful for communicating which branch you think the program's likely to take. You ask how the compiler can use that insight - well, consider this code:
if (x == 0)
return 10 * y;
else
return 39;
In machine code, the CPU can typically be asked to "goto" another line (which takes time, and depending on the CPU may prevent other execution optimisations - i.e. beneath the level of machine code - for example, see the Branches heading under http://en.wikipedia.org/wiki/Instruction_pipeline), or to call some other code, but there's not really an if/else concept where both true and false code are equal... you have to branch away to find the code for one or the other. The way that's done is basically, in pseudo-code:
test whether x is 0
if it was goto else_return_39
return 10 * y
else_return_39:
return 39
Given most CPUs are slower following the goto down to the else_return_39: label than to just fall through to return 10 * y, code for the "true" branch will be reached faster than for the false branch. Of course, the machine code could test whether x is not 0, put the "false" code (return 39) first and thereby reverse the performance characteristics.
This is what __builtin_expect controls - you can tell the compiler to put the true or the false branch where less branching is needed to reach it, thereby getting a tiny performance boost.
But does this work for a) inline ifs, b) variables and c) values other than 0 and 1?
a) Whether the surrounding function is inlined or not doesn't change the need for branching where the if statement appears (unless the optimiser sees the condition the if statement tests is always true or false and only one branch could never run). So, it's equally applicable to inlined code.
[ Your comment shows you were interested in conditional expressions - a ? b : c - I'm not sure - there's a disputed answer to that question at https://stackoverflow.com/questions/14784481/can-i-use-gccs-builtin-expect-with-ternary-operator-in-c that might prove insightful one way or the other, or the basis for further exploration ]
b) variables - you postulated:
int x = __builtin_expect( t / 10, 7 );
if( x == 7 ) {
That won't work - the compiler's not obliged to associate such expectations with variables and remember them the next time an if is seen. You can verify this (as I did for gcc 3.4.4) using gcc -S to produce assembly language output: the assembly doesn't change regardless of the expected value.
c) values other than 0 and 1
It works for integral (long) values, so yes. The last paragraph of the documentation quoted above address this, specifically:
you should use constructions such as
if (__builtin_expect (ptr != NULL, 1))
foo (*ptr);
when testing pointer or floating-point values.
Why? Well, if the pointer type is larger than long, then calling __builtin_conversion(long, long) would effectively slice off some of the less-significant bits for the test, and fail to incorporate the (high-order) rest in the test. Similarly, floating point values might be larger than a long, and the conversion not produce the result you expect. By using a boolean expression such as ptr != NULL (given true converts to 1 and false to 0) you're sure to get intended results.
But does this work for a) inline ifs, b) variables and c) values other than 0 and 1?
It works for an expression context that is used to determine branching.
So, a) Yes. b) No. c) Yes.
And does all of this depend on the architecture being targeted?
Yes!
It leverages architectures that use instruction pipelining, which allow a CPU to begin working on upcoming instructions before the current instruction has been completed.
(if anyone can clarify that magic that would also be great).
("Branch prediction" complicates this description, so I'm intentionally omitting it)
Any code resembling an if statement implies that an expression may result in the CPU jumping to a different location in the program. These jumps invalidate what's in the CPU's instruction pipeline.
__builtin_expect allows (without guarantee) gcc to try to assemble the code so the likely scenario involves fewer jumps than the alternate.
I've recently (only on SO actually) run into uses of the C/C++ comma operator. From what I can tell, it creates a sequence point on the line between the left and right hand side operators so that you have a predictable (defined) order of evaluation.
I'm a little confused about why this would be provided in the language as it seems like a patch that can be applied to code that shouldn't work in the first place. I find it hard to imagine a place it could be used that wasn't overly complex (and in need of refactoring).
Can someone explain the purpose of this language feature and where it may be used in real code (within reason), if ever?
It can be useful in the condition of while() loops:
while (update_thing(&foo), foo != 0) {
/* ... */
}
This avoids having to duplicate the update_thing() line while still maintaining the exit condition within the while() controlling expression, where you expect to find it. It also plays nicely with continue;.
It's also useful in writing complex macros that evaluate to a value.
The comma operator just separates expressions, so you can do multiple things instead of just one where only a single expression is required. It lets you do things like
(x) (y)
for (int i = 0, j = 0; ...; ++i, ++j)
Note that x is not the comma operator but y is.
You really don't have to think about it. It has some more arcane uses, but I don't believe they're ever absolutely necessary, so they're just curiosities.
Within for loop constructs it can make sense. Though I generally find them harder to read in this instance.
It's also really handy for angering your coworkers and people on SO.
bool guess() {
return true, false;
}
Playing Devil's Advocate, it might be reasonable to reverse the question:
Is it good practice to always use the semi-colon terminator?
Some points:
Replacing most semi-colons with commas would immediately make the structure of most C and C++ code clearer, and would eliminate some common errors.
This is more in the flavor of functional programming as opposed to imperative.
Javascript's 'automatic semicolon insertion' is one of its controversial syntactic features.
Whether this practice would increase 'common errors' is unknown, because nobody does this.
But of course if you did do this, you would likely annoy your fellow programmers, and become a pariah on SO.
Edit: See AndreyT's excellent 2009 answer to Uses of C comma operator. And Joel 2008 also talks a bit about the two parallel syntactic categories in C#/C/C++.
As a simple example, the structure of while (foo) a, b, c; is clear, but while (foo) a; b; c; is misleading in the absence of indentation or braces, or both.
Edit #2: As AndreyT states:
[The] C language (as well as C++) is historically a mix of two completely different programming styles, which one can refer to as "statement programming" and "expression programming".
But his assertion that "in practice statement programming produces much more readable code" [emphasis added] is patently false. Using his example, in your opinion, which of the following two lines is more readable?
a = rand(), ++a, b = rand(), c = a + b / 2, d = a < c - 5 ? a : b;
a = rand(); ++a; b = rand(); c = a + b / 2; if (a < c - 5) d = a; else d = b;
Answer: They are both unreadable. It is the white space which gives the readability--hurray for Python!. The first is shorter. But the semi-colon version does have more pixels of black space, or green space if you have a Hazeltine terminal--which may be the real issue here?
Everyone is saying that it is often used in a for loop, and that's true. However, I find it's more useful in the condition statement of the for loop. For example:
for (int x; x=get_x(), x!=sentinel; )
{
// use x
}
Rewriting this without the comma operator would require doing at least one of a few things that I'm not entirely comfortable with, such as declaring x outside the scope where it's used, or special casing the first call to get_x().
I'm also plotting ways I can utilize it with C++11 constexpr functions, since I guess they can only consist of single statements.
I think the only common example is the for loop:
for (int i = 0, j = 3; i < 10 ; ++i, ++j)
As mentioned in the c-faq:
Once in a while, you find yourself in a situation in which C expects a
single expression, but you have two things you want to say. The most
common (and in fact the only common) example is in a for loop,
specifically the first and third controlling expressions.
The only reasonable use I can think of is in the for construct
for (int count=0, bit=1; count<10; count=count+1, bit=bit<<1)
{
...
}
as it allows increment of multiple variables at the same time, still keeping the for construct structure (easy to read and understand for a trained eye).
In other cases I agree it's sort of a bad hack...
I also use the comma operator to glue together related operations:
void superclass::insert(item i) {
add(i), numInQ++, numLeft--;
}
The comma operator is useful for putting sequence in places where you can't insert a block of code. As pointed out this is handy in writing compact and readable loops. Additionally, it is useful in macro definitions. The following macro increments the number of warnings and if a boolean variable is set will also show the warning.
#define WARN if (++nwarnings, show_warnings) std::cerr
So that you may write (example 1):
if (warning_condition)
WARN << "some warning message.\n";
The comma operator is effectively a poor mans lambda function.
Though posted a few months after C++11 was ratified, I don't see any answers here pertaining to constexpr functions. This answer to a not-entirely-related question references a discussion on the comma operator and its usefulness in constant expressions, where the new constexpr keyword was mentioned specifically.
While C++14 did relax some of the restrictions on constexpr functions, it's still useful to note that the comma operator can grant you predictably ordered operations within a constexpr function, such as (from the aforementioned discussion):
template<typename T>
constexpr T my_array<T>::at(size_type n)
{
return (n < size() || throw "n too large"), (*this)[n];
}
Or even something like:
constexpr MyConstexprObject& operator+=(int value)
{
return (m_value += value), *this;
}
Whether this is useful is entirely up to the implementation, but these are just two quick examples of how the comma operator might be applied in a constexpr function.