Let's suppose that a = 2^k.
Is there any difference, in terms of performance or correctness, between int c = b % a and int c = b & (a-1)?
For two's complement int and a a power of two, b % a equals b & (a-1) if and only if b is non-negative or a multiple of a.
As a consequence, a compiler can replace b % a with b & (a-1) only if it knows b is non-negative or knows it is a multiple of a. (In the latter case, it should replace the expression with zero.) On typical current processors, an AND and a subtract instruction will be at least as fast as, and often faster than, a remainder (divide) instruction, so b & (a-1) is preferred. A programmer seeking performance should use it when they know the conditions are satisfied, unless they are sure the compiler will generate an AND for b % a anyway, or they also want the quotient b/a. (If the quotient is desired, the compiler must generate a divide instruction, and processors typically provide the remainder along with the quotient.)
Of course, the compiler can be assured that b is non-negative by making it an unsigned int. Ensuring the compiler knows that a is a power of two is more complicated, unless a is a constant.
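As a minimal sketch (the helper name and the assert are mine, not from the question), this is the situation where the compiler can see both facts, because the dividend is unsigned and the divisor is documented to be a power of two:

#include <cassert>

// Remainder by a power of two, computed with a mask instead of a divide.
// Assumes divisor != 0 and divisor is a power of two; the assert documents that.
inline unsigned mod_pow2(unsigned value, unsigned divisor) {
    assert(divisor != 0 && (divisor & (divisor - 1)) == 0);  // power-of-two check
    return value & (divisor - 1);  // same result as value % divisor for unsigned value
}

With an unsigned dividend and a constant power-of-two divisor, a decent optimizing compiler typically emits the same AND for value % divisor anyway, so the explicit mask mostly matters when the divisor is only known to be a power of two at run time.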
Related
Is there any C++ integer constant that is not evenly divisible by 1, i.e. one for which x % 1 is nonzero? There isn't one in the set of mathematical integers, but I was wondering if there was such a number for the C++ language.
NaN or INT_MAX or some other #define'd constants, maybe?
No. You did not use the tag "language-lawyer", so I won't refer to the standard but will instead just look at what typical compilers do. Using Compiler Explorer, I see that GCC 11.2 compiles the following C++ function to assembly that always just returns 0:
int foo(int x) {
    return x % 1;
}
Clang 13.0.1 does the same.
So the compiler developers have thought about this and determined that there are no integers that satisfy the condition you are looking for.
% is defined in expr.mul#4:
The binary / operator yields the quotient, and the binary % operator yields the remainder from the division of the first expression by the second.
If the second operand of / or % is zero the behavior is undefined.
For integral operands the / operator yields the algebraic quotient with any fractional part discarded; if the quotient a/b is representable in the type of the result, (a/b)*b + a%b is equal to a; otherwise, the behavior of both a/b and a%b is undefined.
Your question can be phrased as: are there integers a and k != 0 such that (a/1)*1 + k == a? However, since (a/1)*1 == a, only 0 can be added to it to get a. In other words, there is no integer a such that (a/1)*1 is not a.
No, a%1 == 0 for any integer a.
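A minimal compile-time check of the identity the quoted wording guarantees (the sample values are arbitrary):

// (a/b)*b + a%b == a must hold whenever a/b is representable,
// so with b == 1 the remainder can only ever be 0.
static_assert((7 / 1) * 1 + 7 % 1 == 7, "identity from expr.mul#4");
static_assert((-7) % 1 == 0 && 42 % 1 == 0 && 0 % 1 == 0, "a % 1 is always 0");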
https://en.cppreference.com/w/cpp/algorithm/reduce
It says that the behavior of an operation is not defined if the operation is not commutative, but why? We just divide the array into blocks and then merge the results. Is associativity alone not sufficient?
std::reduce requires both associativity and commutativity. Associativity is clearly needed for a parallel algorithm, since you want to perform the calculation on separate chunks and then combine them.
As for commutativity: According to a reddit post by MSVC STL developer Billy O'Neal, this is required in order to allow vectorization to SIMD instructions:
Commutativity is also necessary to enable vectorization, since the code you want for reduce to come out as something like:
vecRegister = load_contiguous(first);
while (a vector register sized chunk is left) {
    first += packSize;
    vecRegister = add_packed(load_contiguous(first), vecRegister);
}
// combine vecRegister's packed components
etc., which given ints and SSE registers and a * b * c * d * e * f * g * h gives something like (a * e) * (b * f) * (c * g) * (d * h).
Most other languages aren't doing explicit things to make vectorizing their reduction possible. And nothing says we can't add a noncommutative_reduce or something like that in the future if someone comes up with a compelling use case.
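A scalar sketch of that regrouping (my own illustration, not from the quoted post; no real SIMD intrinsics, just four simulated lanes) shows how the operands end up interleaved:

#include <array>
#include <cstdio>

int main() {
    // Eight operands a..h and a simulated 4-lane "register".
    std::array<int, 8> v{2, 3, 5, 7, 11, 13, 17, 19};
    std::array<int, 4> lanes{v[0], v[1], v[2], v[3]};  // load_contiguous(first)

    // One packed step: lane i accumulates v[4 + i], giving a*e, b*f, c*g, d*h.
    for (int i = 0; i < 4; ++i)
        lanes[i] *= v[4 + i];

    // Horizontal combine of the packed components: (a*e)*(b*f)*(c*g)*(d*h).
    long long product = 1;
    for (int x : lanes)
        product *= x;

    // Equals a*b*...*h only because integer * is commutative and associative.
    std::printf("%lld\n", product);
}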
The behavior is actually non-deterministic if the operation between the operands is not commutative (or not associative); "non-deterministic" is not the same as "undefined". Floating-point addition, for example, is commutative but not associative. This is one reason why a call to std::reduce may not be deterministic: the binary function is applied in an unspecified order and grouping.
Refer to this note in the standard:
Note: The difference between reduce and accumulate is that reduce applies binary_op in an unspecified order, which yields a nondeterministic result for non-associative or non-commutative binary_op such as floating-point addition. —end note ]
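For instance (my own minimal illustration, not part of the quoted note), regrouping alone is enough to change a floating-point sum:

#include <cstdio>

int main() {
    double x = (0.1 + 0.2) + 0.3;  // one grouping of the same three operands
    double y = 0.1 + (0.2 + 0.3);  // another grouping
    std::printf("%.17g\n%.17g\n%d\n", x, y, x == y);  // the two sums differ in the last bit
}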
The standard defines the generalized sum as follows: numeric.defns
Define GENERALIZED_NONCOMMUTATIVE_SUM(op, a1, ..., aN) as follows:
a1 when N is 1, otherwise
op(GENERALIZED_NONCOMMUTATIVE_SUM(op, a1, ..., aK), GENERALIZED_NONCOMMUTATIVE_SUM(op, aM, ..., aN)) for any K where 1 < K+1 = M ≤ N
Define GENERALIZED_SUM(op, a1, ..., aN) as GENERALIZED_NONCOMMUTATIVE_SUM(op, b1, ..., bN), where b1, ..., bN may be any permutation of a1, ..., aN.
So both the grouping of the sum and the order of the operands are unspecified. If the binary operation is not commutative or not associative, the result is therefore unspecified.
That is also explicitly stated here.
Regarding why: it gives the library vendors more freedom, so they may or may not implement it better. As an example of how an implementation can benefit from commutativity, consider the sum a+b+c+d+e: we first calculate a+b and c+d in parallel. Suppose a+b returns before c+d does (which can happen, since they run in parallel). Instead of waiting for the result of c+d, we can directly compute (a+b)+e and then add the result of c+d to it. In the end we have computed ((a+b)+e)+(c+d), which is a rearrangement of a+b+c+d+e.
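To see why that rearrangement needs commutativity and not just associativity, substitute an operation that is associative but not commutative, such as string concatenation; a minimal sketch of my own:

#include <iostream>
#include <string>

int main() {
    std::string a = "a", b = "b", c = "c", d = "d", e = "e";

    std::string in_order   = a + b + c + d + e;        // "abcde"
    std::string rearranged = ((a + b) + e) + (c + d);  // "abecd"

    // Associativity alone is not enough: the rearranged grouping moves e
    // ahead of c and d, which only yields the same answer for commutative ops.
    std::cout << in_order << '\n' << rearranged << '\n';
}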
Why does std::reduce need commutativity?
For speed.
If the operator is commutative, then you can rearrange the order of operations without affecting the results.
And if you can rearrange the order of operations, you can have different threads, or processes, or hardware accelerators or what-not working independently on some of the operations to be performed, and not care about the order in which they complete their partial sums, nor their internal ordering of operations, and then finally add up the partial sums whichever way is convenient.
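A minimal usage sketch (C++17; with GCC the parallel policy may additionally require linking a parallel runtime such as TBB), contrasting the deterministic left-to-right accumulate with reduce, whose grouping is unspecified:

#include <cstdio>
#include <execution>
#include <numeric>
#include <vector>

int main() {
    std::vector<double> v(1000, 0.1);

    // accumulate: strictly left-to-right, deterministic grouping.
    double sequential = std::accumulate(v.begin(), v.end(), 0.0);

    // reduce: unspecified order and grouping, so partial sums may run in parallel;
    // with floating point the result can differ slightly from accumulate's.
    double parallel = std::reduce(std::execution::par, v.begin(), v.end(), 0.0);

    std::printf("%.17g\n%.17g\n", sequential, parallel);
}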
Is it guaranteed that (2 ^ 32) == 34?
In C++20, yes.
Here's how [expr.xor] defines it:
Given the coefficients xi and yi of the base-2 representation ([basic.fundamental]) of the converted operands x and y, the coefficient ri of the base-2 representation of the result r is 1 if either (but not both) of xi and yi are 1, and 0 otherwise.
And [basic.fundamental] covers what a base-2 representation means:
Each value x of an unsigned integer type with width N has a unique representation x = x0·2^0 + x1·2^1 + … + xN−1·2^(N−1), where each coefficient xi is either 0 or 1; this is called the base-2 representation of x. The base-2 representation of a value of signed integer type is the base-2 representation of the congruent value of the corresponding unsigned integer type.
In short, it doesn't really matter how it's done "physically": the operation must satisfy the more abstract, arithmetic notion of base-2 (whether this matches the bits in memory or not; of course in reality it will) and so XOR is entirely well-defined.
However, this was not always the case. The wording was introduced by P1236R1, to make it crystal clear how integer operations behave and to abstract away the kind of wooly notion of a "bit".
In C++11, all we knew is that signed integers must follow "A positional representation for integers that uses the binary digits 0 and 1, in which the values represented by successive bits are additive, begin with 1, and are multiplied by successive integral power of 2, except perhaps for the bit with the highest position" (footnote 49; be advised that this is non-normative).
This gets us most of the way there, actually, but the specific wording in [expr.xor] wasn't there: all we knew is that "the result is the bitwise exclusive OR function of the operands". At this juncture, whether that refers to a sufficiently commonly understood operation is really up to you. You'll be hard-pressed to find a dissenting opinion on what this operation was permitted to do, mind you.
So:
In C++11, YMMV.
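For completeness, the parenthesized expression can be checked at compile time; this relies only on the base-2 arithmetic described above:

// 2 is ...000010 and 32 is ...100000 in base 2; no bit position is set in both,
// so XOR sets both groups of bits: 100010, i.e. 34.
static_assert((2 ^ 32) == 34, "XOR of disjoint bit patterns is their sum");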
Yes. Or at least for the unedited version of the question, when it was written as:
2 ^ 32 == 34
Given that the equality operator == has higher precedence than bitwise XOR ^, the expression is evaluated as:
2 ^ (32 == 34)
that is: 2 ^ 0
which is 2 and thus, used as a condition, true.
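Both readings can be checked at compile time (the parentheses below are only there to make the grouping explicit):

// Without parentheses, == binds tighter than ^, so the expression groups as 2 ^ (32 == 34).
static_assert((2 ^ 32 == 34) == (2 ^ 0), "32 == 34 is false, i.e. 0");
static_assert((2 ^ 32 == 34) == 2, "so the whole expression is 2, which is true as a condition");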
No matter how the values are represented internally, the result of 2 ^ 32 is 34. The ^ operator means a binary XOR and the result you must get if you do that operation correctly is independent of how you do the operation.
The same is true of 2 + 32. You can represent 2 and 32 in binary, in decimal, or any other way you want, but the result you get had better be the way you represent 34, whatever that is.
I don't know if the standard formally defines exclusive or, but it's a well-known operation with a consistent definition. The one thing that is explicitly left out of the standard is the mapping of integer numbers to bits. Your assertion would hold for the commonly used two's complement representation as well as the uncommon ones' complement.
I have the following function:
char f1( int a, unsigned b ) { return abs(a) <= b; }
For execution speed, I want to rewrite it as follows:
char f2( int a, unsigned b ) { return (unsigned)(a+b) <= 2*b; } // redundant cast
Or alternatively with this signature that could have subtle implications even for non-negative b:
char f3( int a, int b ) { return (unsigned)(a+b) <= 2*b; }
Both of these alternatives work under a simple test on one platform, but I need it to be portable. Assuming non-negative b and no risk of overflow, is this a valid optimization for typical hardware and C compilers? Is it also valid for C++?
Note: As C++ on gcc 4.8 x86_64 with -O3, f1() uses 6 machine instructions and f2() uses 4. The instructions for f3() are identical to those for f2(). Also of interest: if b is given as a literal, both functions compile to 3 instructions that directly map to the operations specified in f2().
Starting with the original code with signature
char f2( int a, unsigned b );
this contains the expression
a + b
Since one of these operands has a signed integer type and the other the corresponding unsigned integer type (so they have the same "integer conversion rank"), the "usual arithmetic conversions" (§ 6.3.1.8) require the operand with signed integer type to be converted to the unsigned type of the other operand.
Conversion to an unsigned integer type is well defined, even if the value in question cannot be represented by the new type:
[..] if the new type is unsigned, the value is converted by repeatedly adding or subtracting one more than the maximum value that can be represented in the new type until the value is in the range of the new type. 60
§ 6.3.1.3/2
Footnote 60 just says that the described arithmetic works with the mathematical value, not the typed one.
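A minimal illustration of that modular conversion rule (my own example; the concrete numbers in the comments assume a 32-bit unsigned int, but the assertions hold for any width):

#include <climits>

// -1 is brought into range by adding UINT_MAX + 1 once, so it becomes UINT_MAX.
static_assert((unsigned)-1 == UINT_MAX, "conversion is modulo UINT_MAX + 1");

// The same rule applied inside a mixed signed/unsigned addition: with 32-bit
// unsigned, -3 is first converted to UINT_MAX - 2 (4294967293), and the
// unsigned sum 4294967293 + 5 wraps back around to 2.
static_assert(-3 + 5u == 2u, "well-defined wraparound, not undefined behavior");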
Now, with the updated code
char f2_updated( int a, int b ); // called f3 in the question
things would look different. But since b is assumed to be non-negative (and INT_MAX <= UINT_MAX always holds), you can convert b to unsigned without changing its mathematical value. Thus you could write
char f2_updated( int a, int b ) {
return f2(a, (unsigned)b); // cast unnecessary but to make it clear
}
Looking again at f2, the expression 2*b further limits the allowed range of b to at most UINT_MAX/2 (otherwise the mathematical result would be wrong).
So as long as you stay within these bounds, everything is fine.
Note: Unsigned types do not overflow, they "wrap" according to modular arithmetic.
Quotes from N1570 (a C11 working draft)
A final remark:
IMO the only really reasonable choice to write this function is as
#include <stdbool.h>
#include <assert.h>
#include <limits.h>

bool abs_bounded(int value, unsigned bound) {
    assert(bound <= (UINT_MAX / 2));
    /* NOTE: Casting to unsigned makes the implicit conversion that
       otherwise would happen explicit. */
    return ((unsigned)value + bound) <= (2 * bound);
}
Using a signed type for the bound does not make much sense, because the absolute value of a number can never be less than or equal to a negative number; abs_bounded(value, something_negative) would always be false. If there's the possibility of a negative bound, then I'd catch this outside of this function (otherwise it does "too much"), like:
int some_bound;
// ...
if ((some_bound >= 0) && abs_bounded(my_value, some_bound)) {
    // yeeeha
}
As OP wants fast and portable code (and b is positive), it first makes sense to code safely:
// return abs(a) <= b;
inline bool f1_safe(int a, unsigned b ) {
    return (a >= 0 && a <= b) || (a < 0 && 0u - a <= b);
}
This works for all a,b (assuming UINT_MAX > INT_MAX). Next, compare alternatives using an optimized compile (let the compiler do what it does best).
The following slight variation on OP's code will work in C and C++, but risks portability issues unless "Assuming non-negative b and no risk of overflow" can be assured on all target machines.
bool f2(int a, unsigned b) { return a+b <= b*2; }
In the end, OP's goal of fast and portable code may yield code that works optimally on the selected platform but not on others; such is micro-optimization.
To determine if the 2 expressions are equivalent for your purpose, you must study the domain of definition:
abs(a) <= b is defined for all values of int a and unsigned b, with just one special case for a = INT_MIN. On two's complement architectures, abs(INT_MIN) is not defined, but it most likely evaluates to INT_MIN, which, converted to unsigned as required for the <= comparison with an unsigned value, yields the correct result.
(unsigned)(a+b) <= 2*b may produce a different result for b > UINT_MAX/2. For example, it will evaluate to false for a = 1 and b = UINT_MAX/2+1. There might be more cases where your alternate formula gives an incorrect result.
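A small check of that counterexample (the helper names are mine; they just restate the question's two conditions):

#include <climits>
#include <cstdio>
#include <cstdlib>

static bool check_abs(int a, unsigned b)   { return (unsigned)std::abs(a) <= b; }  // original condition
static bool check_trick(int a, unsigned b) { return (unsigned)(a + b) <= 2 * b; }  // rewritten condition

int main() {
    int a = 1;
    unsigned b = UINT_MAX / 2 + 1;
    // abs(1) <= b is clearly true, but 2*b wraps around to 0, so the trick says false.
    std::printf("%d %d\n", check_abs(a, b), check_trick(a, b));
}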
EDIT: OK, the question was edited... and b is now an int.
Note that a+b invokes undefined behavior in case of overflow, and the same goes for 2*b. So you are making the assumption that neither a+b nor 2*b overflows. Furthermore, if b is negative, your little trick does not work.
If a is in the range -INT_MAX/2..INT_MAX/2 and b in the range 0..INT_MAX/2, it seems to function as expected. The behavior is identical in C and C++.
Whether it is an optimization depends completely on the compiler, command line options, hardware capabilities, surrounding code, inlining, etc. You already address this part and tell us that you shave one or two instructions... Just remember that this kind of micro-optimization is not absolute. Even counting instructions does not necessarily help find the best performance. Did you perform some benchmarks to measure if this optimization is worthwhile? Is the difference even measurable?
Micro-optimizing such a piece of code is self-defeating: it makes the code less readable and potentially incorrect. b might not be negative in the current version, but if the next maintainer changes that, he/she might not see the potential implications.
Yes, this is portable to compliant platforms. The conversion from signed to unsigned is well defined:
Conversion between signed integer and unsigned integer
int to unsigned int conversion
Signed to unsigned conversion in C - is it always safe?
The description in the C spec is a bit contrived:
if the new type is unsigned, the value is converted by repeatedly
adding or subtracting one more than the maximum value that can be
represented in the new type until the value is in the range of the new
type.
The C++ spec addresses the same conversion in a more sensible way:
In a two's complement representation, this conversion is conceptual
and there is no change in the bit pattern
In the question, f2() and f3() achieve the same results in a slightly different way.
In f2() the presence of the unsigned operand causes a conversion of the signed operand, as required here for C++. The unsigned addition may or may not then result in a wrap-around past zero, which is also well defined [citation needed].
In f3() the addition occurs in the signed representation with no trickiness, and then the result is (explicitly) converted to unsigned. So this is slightly simpler than f2() (and also clearer).
In both cases, you end up with the same unsigned representation of the sum, which can then be compared (as unsigned) to 2*b. And the trick of treating a signed value as an unsigned type allows you to check a two-sided range with only a single comparison. Note also that this is a bit more flexible than using the abs() function, since the trick doesn't require that the range be centered around zero.
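The same single-comparison idea generalizes to any closed range [lo, hi], not only ranges centered at zero; here is a sketch of my own (the helper name is illustrative):

#include <cassert>

// True iff lo <= x <= hi, using a single comparison.
// Assumes lo <= hi; the subtraction is done in unsigned arithmetic, so a value
// below lo wraps around to something larger than hi - lo and fails the test.
inline bool in_range(int x, int lo, int hi) {
    assert(lo <= hi);
    return (unsigned)x - (unsigned)lo <= (unsigned)hi - (unsigned)lo;
}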
Commentary on the "usual arithmetic conversions"
I think this question demonstrated that using unsigned types is generally a bad idea. Look at the confusion it caused here.
It can be tempting to use unsigned for documentation purposes (or to take advantage of the shifted value range), but due to the conversion rules, this tends to be a mistake. In my opinion, the "usual arithmetic conversions" are not sensible if you assume that arithmetic is more likely to involve negative values than to overflow signed values.
I asked this follow-up question to clarify the point: mixed-sign integer math depends on variable size. One new thing that I have learned is that mixed-sign operations are not generally portable, because the conversion type depends on the operand sizes relative to that of int.
In summary: Using type declarations or casts to perform unsigned operations is a low-level coding style that should be approached with the requisite caution.
I've just read that order of evaluation and precedence of operators are different but related concepts in C++. But I'm still unclear on how they are different yet related.
int x = c + a * b; // 31
int y = (c + a) * b; // 36
What do the above statements have to do with order of evaluation? For example, when I write (c + a), am I changing the order of evaluation of the expression by changing its precedence?
The important part about order of evaluation is whether any of the components have side effects.
Suppose you have this:
int i = c() + a() * b();
Where a and b have side effects:
int global = 1;

int a() {
    return global++;
}

int b() {
    return ++global;
}

int c() {
    return global * 2;
}
The compiler can choose what order to call a(), b() and c() and then insert the results into the expression. At that point, precedence takes over and decides what order to apply the + and * operators.
In this example the most likely outcomes are either
The compiler will evaluate c() first, followed by a() and then b(), resulting in i = 2 + 1 * 3 = 5
The compiler will evaluate b() first, followed by a() and then c(), resulting in i = 6 + 2 * 2 = 10
But the compiler is free to choose whatever order it wants.
The short story is that precedence tells you the order in which operators are applied to arguments (* before +), whereas order of evaluation tells you in what order the arguments are resolved (a(), b(), c()). This is why they are "different but related".
"Order of evaluation" refers to when different subexpressions within the same expression are evaulated relative to each other.
For example in
3 * f(x) + 2 * g(x, y)
you have the usual precedence rules between multiplication and addition. But we have an order of evaluation question: will the first multiplication happen before the second or the second before the first? It matters because if f() has a side effect that changes y, the result of the whole expression will be different depending on the order of operations.
In your specific example, this order of evaluation scenario (in which the resulting value depends on order) does not arise.
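A minimal sketch of that situation (names invented for illustration), where the result legitimately depends on whether f has already run when g's argument is read:

#include <cstdio>

int y = 10;

int f(int x) { y = 0; return x; }        // side effect: clobbers the global y
int g(int x, int yy) { return x + yy; }

int main() {
    // Precedence fixes the grouping: (3 * f(2)) + (2 * g(4, y)).
    // Order of evaluation does not: y may be read before or after f sets it to 0,
    // so the value printed here may legitimately differ between compilers (34 or 14).
    int r = 3 * f(2) + 2 * g(4, y);
    std::printf("%d\n", r);
}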
As long as we are talking about built-in operators: no, you are not changing the order of evaluation by using the (). You have no control over the order of evaluation. In fact, there's no "order of evaluation" here at all.
The compiler is allowed to evaluate this expression in any way it desires, as long as the result is correct. It is not even required to use addition and multiplication operations to evaluate these expressions. The addition and multiplication only exist in the text of your program. The compiler is free to totally and completely ignore these specific operations. On some hardware platform, such expressions might be evaluated by a single atomic machine operation. For this reason, the notion of "order of evaluation" does not make any sense here. There's nothing there that you can apply the concept of "order" to.
The only thing you are changing by using () is the mathematical meaning of the expression. Let's say a, b and c are all 2. Then a + b * c must evaluate to 6, while (a + b) * c must evaluate to 8. That's it. This is the only thing that is guaranteed to you: that the results will be correct. How these results are obtained is totally unknown. The compiler might use absolutely anything, any method and any "order of evaluation", as long as the results are correct.
For another example, if you have two such expressions in your program following each other
int x = c + a * b;
int y = (c + a) * b;
the compiler is free to evaluate them as
int x = c + a * b;
int y = c * b + x - c;
which will also produce the correct result (assuming no overflow-related problems), in which case the actual evaluation schedule does not even remotely resemble what you wrote in your source code.
To put it briefly, assuming that the actual evaluation will bear any significant resemblance to the source code of your program is naive at best. Despite popular belief, built-in operators are not generally translated into their machine "counterparts".
The above applies to built-in operators, again. Once we start dealing with overloaded operators, things change drastically. Overloaded operators are indeed evaluated in full accordance with the semantic structure of the expression. There's some freedom there even with overloaded operators, but it is not as unrestricted as in the case of built-in operators.
The answer is: it may or may not.
How a, b and c are evaluated depends on how the compiler interprets this formula.
Consider the below example:
#include <limits.h>
#include <stdio.h>

int main(void)
{
    double a = 1 + UINT_MAX + 1.0;
    double b = 1 + 1.0 + UINT_MAX;

    printf("a=%g\n", a);
    printf("b=%g\n", b);

    return 0;
}
In terms of the math we know, a and b are computed from the same operands and so ought to have the same value. But is that true in the C(++) world? See the program's output.
I want to point to a link worth reading with regard to this question.
Rules 3 and 4 there mention sequence points, another concept worth remembering.