Modulo Multiplication Function: Multiplying two integers under a modulus - c++

I came across this modulo multiplication function in a code for the miller-rabin primality test. This is supposed to eliminate the integer overflow that occurs when calculating ( a * b ) % m.
I need some help in understanding what is going on here. Why does this work? and what is the significance of that number literal 0x8000000000000000ULL?
unsigned long long mul_mod(unsigned long long a, unsigned long long b, unsigned long long m) {
unsigned long long d = 0, mp2 = m >> 1;
if (a >= m) a %= m;
if (b >= m) b %= m;
for (int i = 0; i < 64; i++)
{
d = (d > mp2) ? (d << 1) - m : d << 1;
if (a & 0x8000000000000000ULL)
d += b;
if (d >= m) d -= m;
a <<= 1;
}
return d;
}

This code, which currently appears on the modular arithmetic Wikipedia page, only works for arguments of up to 63 bits -- see bottom.
Overview
One way to compute an ordinary multiplication a * b is to add left-shifted copies of b -- one for each 1-bit in a. This is similar to how most of us did long multiplication in school, but simplified: Since we only ever need to "multiply" each copy of b by 1 or 0, all we need to do is either add the shifted copy of b (when the corresponding bit of a is 1) or do nothing (when it's 0).
This code does something similar. However, to avoid overflow (mostly; see below), instead of shifting each copy of b and then adding it to the total, it adds an unshifted copy of b to the total, and relies on later left-shifts performed on the total to shift it into the correct place. You can think of these shifts "acting on" all the summands added to the total so far. For example, the first loop iteration checks whether the highest bit of a, namely bit 63, is 1 (that's what a & 0x8000000000000000ULL does), and if so adds an unshifted copy of b to the total; by the time the loop completes, the previous line of code will have shifted the total d left 1 bit 63 more times.
The main advantage of doing it this way is that we are always adding two numbers (namely b and d) that we already know are less than m, so handling the modulo wraparound is cheap: We know that b + d < 2 * m, so to ensure that our total so far remains less than m, it suffices to check whether b + d < m, and if not, subtract m. If we were to use the shift-then-add approach instead, we would need a % modulo operation per bit, which is as expensive as division -- and usually much more expensive than subtraction.
One of the properties of modulo arithmetic is that, whenever we want to perform a sequence of arithmetic operations modulo some number m, performing them all in usual arithmetic and taking the remainder modulo m at the end always yields the same result as taking remainders modulo m for each intermediate result (provided no overflows occur).
Code
Before the first line of the loop body, we have the invariants d < m and b < m.
The line
d = (d > mp2) ? (d << 1) - m : d << 1;
is a careful way of shifting the total d left by 1 bit, while keeping it in the range 0 .. m and avoiding overflow. Instead of first shifting it and then testing whether the result is m or greater, we test whether it is currently strictly above RoundDown(m/2) -- because if so, after doubling, it will surely be strictly above 2 * RoundDown(m/2) >= m - 1, and so require a subtraction of m to get back in range. Note that even though the (d << 1) in (d << 1) - m may overflow and lose the top bit of d, this does no harm as it does not affect the lowest 64 bits of the subtraction result, which are the only ones we are interested in. (Also note that if d == m/2 exactly, we wind up with d == m afterward, which is slightly out of range -- but changing the test from d > mp2 to d >= mp2 to fix this would break the case where m is odd and d == RoundDown(m/2), so we have to live with this. It doesn't matter, because it will be fixed up below.)
Why not simply write d <<= 1; if (d >= m) d -= m; instead? Suppose that, in infinite-precision arithmetic, d << 1 >= m, so we should perform the subtraction -- but the high bit of d is on and the rest of d << 1 is less than m: In this case, the initial shift will lose the high bit and the if will fail to execute.
Restriction to inputs of 63 bits or fewer
The above edge case can only occur when d's high bit is on, which can only occur when m's high bit is also on (since we maintain the invariant d < m). So it looks like the code is taking pains to work correctly even with very high values of m. Unfortunately, it turns out that it can still overflow elsewhere, resulting in incorrect answers for some inputs that set the top bit. For example, when a = 3, b = 0x7FFFFFFFFFFFFFFFULL and m = 0xFFFFFFFFFFFFFFFFULL, the correct answer should be 0x7FFFFFFFFFFFFFFEULL, but the code will return 0x7FFFFFFFFFFFFFFDULL (an easy way to see the correct answer is to rerun with the values of a and b swapped). Specifically, this behaviour occurs whenever the line d += b overflows and leaves the truncated d less than m, causing a subtraction to be erroneously skipped.
Provided this behaviour is documented (as it is on the Wikipedia page), this is just a limitation, not a bug.
Removing the restriction
If we replace the lines
if (a & 0x8000000000000000ULL)
d += b;
if (d >= m) d -= m;
with
unsigned long long x = -(a >> 63) & b;
if (d >= m - x) d -= m;
d += x;
the code will work for all inputs, including those with top bits set. The cryptic first line is just a conditional-free (and thus usually faster) way of writing
unsigned long long x = (a & 0x8000000000000000ULL) ? b : 0;
The test d >= m - x operates on d before it has been modified -- it's like the old d >= m test, but b (when the top bit of a is on) or 0 (otherwise) has been subtracted from both sides. This tests whether d would be m or larger once x is added to it. We know that the RHS m - x never underflows, because the largest x can be is b and we have established that b < m at the top of the function.

Related

Correctness of multiplication with overflow detection

The following C++ template detects overflows from multiplying two unsigned integers.
template<typename UInt> UInt safe_multiply(UInt a, UInt b) {
UInt x = a * b; // x := ab mod n, for n := 2^#bits > 0
if (a != 0 && x / a != b)
cerr << "Overflow for " << a << " * " << b << "." << endl;
return x;
}
Can you give a proof that this algorithm detects every potential overflow, regardless of how many bits UInt uses?
The case
cannot result in overflows, so we can consider
.
It seems that the correctness proof boils down to leading
to a contradiction, since x / a actually means .
When assuming
, this leads to the straightforward consequence
thus
which contradicts n > 0.
So it remains to show
or there must be another way.
If the last equation is true, WolframAlpha fails to confirm that (also with exponents).
However, it asserts that the original assumptions have no integer solutions, so the algorithms seems to be correct indeed.
But it doesn't provide an explanation. So why is it correct?
I am looking for the smallest possible explanation that is still mathematically profound, ideally that it fits in a single-line comment. Maybe I am missing something trivial, or the problem is not as easy as it looks.
On a side note, I used Codecogs Equation Editor for the LaTeX markup images, which apparently looks bad in dark mode, so consider switching to light mode or, if you know, please tell me how to use different images depending on the client settings. It is just \bg{white} vs. \bg{black} as part of the image URLs.
To be clear, I'll use the multiplication and division symbols (*, /) mathematically.
Also, for convenience let's name the set N = {0, 1, ..., n - 1}.
Let's clear up what unsigned multiplication is:
Unsigned multiplication for some magnitude, n, is a modular n operation on unsigned-n inputs (inputs that are in N) that results in an unsigned-n output (ie. also in N).
In other words, the result of unsigned multiplication, x, is x = a*b (mod n), and, additionally, we know that x,a,b are in N.
It's important to be able to expand many modular formulas like so: x = a*b - k*n, where k is an integer - but in our case x,a,b are in N so this implies that k is in N.
Now, let's mathematically say what we're trying to prove:
Given positive integers, a,n, and non-negative integers x,b, where x,a,b are in N, and x = a*b (mod n), then a*b >= n (overflow) implies floor(x/a) != b.
Proof:
If overflow (a*b >= n) then x >= n - k*n = (1 - k)*n (for k in N),
As x < n then (1 - k)*n < n, so k > 0.
This means x <= a*b - n.
So, floor(x/a) <= floor([a*b - n]/a) = floor(a*b/a - n/a) = b - floor(n/a) <= b - 1,
Abbreviated: floor(x/a) <= b - 1
Therefore floor(x/a) != b
The multiplication gives either the mathematically correct result, or it is off by some multiple of 2^64. Since you check for a=0, the division always gives the correct result for its input. But in the case of overflow, the input is off by 2^64 or more, so the test will fail as you hoped.
The last bit is that unsigned operations don’t have undefined behaviour except for division by zero, so your code is fine.

Avoiding overflow working modulo p

As part of a university assignment, I have to implement in C scalar multiplication on an elliptic curve modulo p = 2^255 - 19. Since all computations are made modulo p, it seems enough to work with the primitive type (unsigned long).
However, if a and b are two integers modulo p, there is a risk of overflow computing a*b. I am not sure how to avoid that. Is the following code correct ?
long a = ...;
long b = ...;
long c = (a * b) % p;
Or should I rather cast a and b first ?
long a = ...;
long b = ...;
long long a1 = (long long) a;
long long b1 = (long long) b;
long c = (long) ((a1 * b1) % p);
I was also thinking or working with long long all along.
The whole operation(multiplication) is being done keeping in mind the type of the operands. You multiplied two long variables and the result if greater than what long variable can hold, it will overflow.
((a%p)*(b%p))%p this gives one protection that it wraps around p but what is being said in earlier case would still hold - (a%p)*(b%p) still can overflow. (considering that a,b is of type long).
If you store the values of long in long long no need to cast. But yes the result will now overflow when the multiplication yields the value greater than what long long can hold.
To give you a clarification:-
long a,b;
..
long long p = (a*b)%m;
This won't help. The multiplication when done is long arithmetic. Doesn't matter where we store the end result. It depends on the type of the operands.
Now look at this
long c = (long) ((a1 * b1) % p); here the result will be two long long multiplication and will overflow based on max value long long can hold but still there is a chance of overflow when you assign it to long.
If p is 255 byte you can't realize what you want using built in types long or long long types using 32 or 64 bit system. Down the line when we have 512 bit system this would surely be possible. Also one thing to note is when p=2255-19 then there is hardly any practicality involved in doing modular arithmetic with it.
If sizeof long is equal to sizeof long long as in ILP64 and LP64 then using long and long long would give you no result as such. But if sizeof long long is greater than sizeof long it is useful in saving the operands in long long to prevent overflow of the multiplication.
Also another way around is to write your own big integer library(multiple precision integer library) or use one which is already there(maybe like this). The idea revolves around the fact that the larger types are realized using something as simple as char and then doing operation on it. This is an implementation issue and there are many implementation around this same theme.
With a 255+ bit integer requirement, standard operations and the C library are insufficient.
Follows in the general algorithm to write your own modular multiplication.
myint mod(myint a, myint m);
myint add(myint a, myint b); // this may overflow
int cmp(myint a, myint b);
int isodd(myint a);
myint halve(myint a);
// (a+b)%mod
myint addmodmax(myint a, myint b, myint m) {
myint sum = add(a,b);
if (cmp(sum,a) < 0) {
sum = add(mod(add(sum, 1),m), mod(myint_MAX,m)); // These additions do not overflow
}
return mod(sum, m);
}
// (a*b)%mod
myint mulmodmax(myint a, myint b, myint m) {
myint prod = 0;
while (cmp(b,0) > 0) {
if (isodd(b)) {
prod = addmodmax(prod, a, m);
}
b = halve(b);
a = addmodmax(a, a, m);
}
return prod;
}
I recently came to this same problem.
First of all I'm going to assume you mean 32-bit integers (after reading your comments), but I think this applies to Big Integers as well (because doing a naive multiplication means doubling the word size and is going to be slow as well).
Option 1
We use the following property:
Proposition. a*b mod m = (a - m)*(b - m) mod m
Proof.
(a - m)*(b - m) mod m =
(a*b - (a+b)*m + m^2) mod m =
(a*b mod m - ((a+b) + m)*m mod m) mod m =
(a*b mod m) mod m = a*b mod m
q.e.d.
Moreover, if a,b approx m, then (a - m)*(b - m) mod m = (a - m)*(b - m). You will need to address the case for when a,b > m, however I think the validity of (m - a)*(m - b) mod m = a*b mod m is a corollary of the above Proposition; and of course don't do this when the difference is very big (small modulus, big a or b; or vice versa) or it will overflow.
Option 2
From Wikipedia
uint64_t mul_mod(uint64_t a, uint64_t b, uint64_t m)
{
uint64_t d = 0, mp2 = m >> 1;
int i;
if (a >= m) a %= m;
if (b >= m) b %= m;
for (i = 0; i < 64; ++i)
{
d = (d > mp2) ? (d << 1) - m : d << 1;
if (a & 0x8000000000000000ULL)
d += b;
if (d >= m) d -= m;
a <<= 1;
}
return d;
}
And also, assuming long double and 32 or 64 bit integers (not arbitrary precision) you can exploit the machine priority on most significant bits of different types:
On computer architectures where an extended precision format with at least 64 bits of mantissa is available (such as the long double type of most x86 C compilers), the following routine is faster than any algorithmic solution, by employing the trick that, by hardware, floating-point multiplication results in the most significant bits of the product kept, while integer multiplication results in the least significant bits kept
And do:
uint64_t mul_mod(uint64_t a, uint64_t b, uint64_t m)
{
long double x;
uint64_t c;
int64_t r;
if (a >= m) a %= m;
if (b >= m) b %= m;
x = a;
c = x * b / m;
r = (int64_t)(a * b - c * m) % (int64_t)m;
return r < 0 ? r + m : r;
}
These are guaranteed to not overflow.

Can XorShift return zero?

I've been reading about the XorShift PRNG especially the paper here
A guy here states that
The number lies in the range [1, 2**64). Note that it will NEVER be 0.
Looking at the code that makes sense:
uint64_t x;
uint64_t next(void) {
x ^= x >> 12; // a
x ^= x << 25; // b
x ^= x >> 27; // c
return x * UINT64_C(2685821657736338717);
}
If x would be zero than every next number would be zero too. But wouldn't that make it less useful? The usual use-pattern would be something like min + rand() % (max - min) or converting the 64 bits to 32 bits if you only need an int. But if 0 is never returned than that might be a serious problem. Also the bits are not 0 or 1 with the same probability as obviously 0 is missing so zeroes or slightly less likely. I even can't find any mention of that on Wikipedia so am I missing something?
So what is a good/appropriate way to generate random, equally distributed numbers from XorShift64* in a given range?
Short answer: No it cannot return zero.
According the Numeric Recipes "it produces a full period of 2^64-1 [...] the missing value is zero".
The essence is that those shift values have been chosen carefully to make very long sequences (full possible one w/o zero) and hence one can be sure that every number is produced. Zero is indeed the fixpoint of this generator, hence it produces 2 sequences: Zero and the other containing every other number.
So IMO for a sufficiently small range max-min it is enough to make a function (next() - 1) % (max - min) + min or even omitting the subtraction altogether as zero will be returned by the modulo.
If one wants better quality equal distribution one should use the 'usual' method by using next() as a base generator with a range of [1, 2^64)
I am nearly sure that there is an x, for which the xorshift operation returns 0.
Proof:
First, we have these equations:
a = x ^ (x >> 12);
b = a ^ (a << 25);
c = b ^ (b >> 27);
Substituting them:
b = (x ^ x >> 12) ^ ((x ^ x >> 12) << 25);
c = b ^ (b >> 27) = ((x ^ x >> 12) ^ ((x ^ x >> 12) << 25)) ^ (((x ^ x >> 12) ^ ((x ^ x >> 12) << 25)) >> 27);
As you can see, although c is a complex equation, it is perfectly abelian.
It means, you can express the bits of c as fully boolean expressions of the bits of x.
Thus, you can simply construct an equation system for the bits b0, b1, b2, ... so:
(Note: the coefficients are only examples, I didn't calculate them, but so would it look):
c0 = x1 ^ !x32 ^ x47 ...
c1 = x23 ^ x45 ^ !x61 ...
...
c63 = !x13 ^ ...
From that point, you have 64 equations and 64 unknowns. You can simply solve it with Gauss-elimination, you will always have a single unique solution.
Except some rare cases, i.e. if the determinant of the coefficients of the equation system is zero, but it is very unlikely in the size of such a big matrix.
Even if it happens, it would mean that you have an information loss in every iteration, i.e. you can't get all of the 2^64 possible values of x, only some of them.
Now consider the much more probable possibility, that the coefficient matrix is non-zero. In this case, for all the possible 2^64 values of x, you have all possible 2^64 values of c, and these are all different.
Thus, you can get zero.
Extension: actually you get zero for zero... sorry, the proof is more useful to show that it is not so simple as it seems for the first spot. The important part is that you can express the bits of c as a boolean function of the bits of x.
There is another problem with this random number generator. And this is that even if you somehow modify the function to not have such problem (for example, by adding 1 in every iteration):
You still can't guarantee that it won't get into a short loop *for any possible values of x. What if there is a 5 length loop for value 345234523452345? Can you prove for all possible initial values? I can't.
Actually, having a really pseudorandom iteration function, your system will likely loop after 2^32 iterations. It has a nearly trivial combinatoric reason, but "unfortunately this margin is small to contain it" ;-)
So:
If a 2^32 loop length is for your PRNG okay, then use a proven iteration function collected from somewhere on the net.
If it isn't, upgrade the bit length to at least 2^128. It will result a roughly 2^64 loop length which is not so bad.
If you still want a 64-bit output, then use 128-bit numeric internally, but return (x>>64) ^ (x&(2^64-1)) (i.e. xor-ing the upper and lower half of the internal state x).

Is ((a + (b & 255)) & 255) the same as ((a + b) & 255)?

I was browsing some C++ code, and found something like this:
(a + (b & 255)) & 255
The double AND annoyed me, so I thought of:
(a + b) & 255
(a and b are 32-bit unsigned integers)
I quickly wrote a test script (JS) to confirm my theory:
for (var i = 0; i < 100; i++) {
var a = Math.ceil(Math.random() * 0xFFFF),
b = Math.ceil(Math.random() * 0xFFFF);
var expr1 = (a + (b & 255)) & 255,
expr2 = (a + b) & 255;
if (expr1 != expr2) {
console.log("Numbers " + a + " and " + b + " mismatch!");
break;
}
}
While the script confirmed my hypothesis (both operations are equal), I still don't trust it, because 1) random and 2) I'm not a mathematician, I have no idea what am I doing.
Also, sorry for the Lisp-y title. Feel free to edit it.
They are the same. Here's a proof:
First note the identity (A + B) mod C = (A mod C + B mod C) mod C
Let's restate the problem by regarding a & 255 as standing in for a % 256. This is true since a is unsigned.
So (a + (b & 255)) & 255 is (a + (b % 256)) % 256
This is the same as (a % 256 + b % 256 % 256) % 256 (I've applied the identity stated above: note that mod and % are equivalent for unsigned types.)
This simplifies to (a % 256 + b % 256) % 256 which becomes (a + b) % 256 (reapplying the identity). You can then put the bitwise operator back to give
(a + b) & 255
completing the proof.
Lemma: a & 255 == a % 256 for unsigned a.
Unsigned a can be rewritten as m * 0x100 + b some unsigned m,b, 0 <= b < 0xff, 0 <= m <= 0xffffff. It follows from both definitions that a & 255 == b == a % 256.
Additionally, we need:
the distributive property: (a + b) mod n = [(a mod n) + (b mod n)] mod n
the definition of unsigned addition, mathematically: (a + b) ==> (a + b) % (2 ^ 32)
Thus:
(a + (b & 255)) & 255 = ((a + (b & 255)) % (2^32)) & 255 // def'n of addition
= ((a + (b % 256)) % (2^32)) % 256 // lemma
= (a + (b % 256)) % 256 // because 256 divides (2^32)
= ((a % 256) + (b % 256 % 256)) % 256 // Distributive
= ((a % 256) + (b % 256)) % 256 // a mod n mod n = a mod n
= (a + b) % 256 // Distributive again
= (a + b) & 255 // lemma
So yes, it is true. For 32-bit unsigned integers.
What about other integer types?
For 64-bit unsigned integers, all of the above applies just as well, just substituting 2^64 for 2^32.
For 8- and 16-bit unsigned integers, addition involves promotion to int. This int will definitely neither overflow or be negative in any of these operations, so they all remain valid.
For signed integers, if either a+b or a+(b&255) overflow, it's undefined behavior. So the equality can't hold — there are cases where (a+b)&255 is undefined behavior but (a+(b&255))&255 isn't.
In positional addition, subtraction and multiplication of unsigned numbers to produce unsigned results, more significant digits of the input don't affect less-significant digits of the result. This applies to binary arithmetic as much as it does to decimal arithmetic. It also applies to "twos complement" signed arithmetic, but not to sign-magnitude signed arithmetic.
However we have to be careful when taking rules from binary arithmetic and applying them to C (I beleive C++ has the same rules as C on this stuff but i'm not 100% sure) because C arithmetic has some arcane rules that can trip us up. Unsigned arithmetic in C follows simple binary wraparound rules but signed arithmetic overflow is undefined behaviour. Worse under some circumstances C will automatically "promote" an unsigned type to (signed) int.
Undefined behaviour in C can be especially insiduous. A dumb compiler (or a compiler on a low optimisation level) is likely to do what you expect based on your understanding of binary arithmetic while an optimising compiler may break your code in strange ways.
So getting back to the formula in the question the equivilence depends on the operand types.
If they are unsigned integers whose size is greater than or equal to the size of int then the overflow behaviour of the addition operator is well-defined as simple binary wraparound. Whether or not we mask off the high 24 bits of one operand before the addition operation has no impact on the low bits of the result.
If they are unsigned integers whose size is less than int then they will be promoted to (signed) int. Overflow of signed integers is undefined behaviour but at least on every platform I have encountered the difference in size between different integer types is large enough that a single addition of two promoted values will not cause overflow. So again we can fall back to the simply binary arithmetic argument to deem the statements equivalent.
If they are signed integers whose size is less than int then again overflow can't happen and on twos-complement implementations we can rely on the standard binary arithmetic argument to say they are equivilent. On sign-magnitude or ones complement implementations they would not be equivilent.
OTOH if a and b were signed integers whose size was greater than or equal to the size of int then even on twos complement implementations there are cases where one statement would be well-defined while the other would be undefined behaviour.
Yes, (a + b) & 255 is fine.
Remember addition in school? You add numbers digit by digit, and add a carry value to the next column of digits. There is no way for a later (more significant) column of digits to influence an already processed column. Because of this, it does not make a difference if you zero-out the digits only in the result, or also first in an argument.
The above is not always true, the C++ standard allows an implementation that would break this.
Such a Deathstation 9000 :-) would have to use a 33-bit int, if the OP meant unsigned short with "32-bit unsigned integers". If unsigned int was meant, the DS9K would have to use a 32-bit int, and a 32-bit unsigned int with a padding bit. (The unsigned integers are required to have the same size as their signed counterparts as per §3.9.1/3, and padding bits are allowed in §3.9.1/1.) Other combinations of sizes and padding bits would work too.
As far as I can tell, this is the only way to break it, because:
The integer representation must use a "purely binary" encoding scheme (§3.9.1/7 and the footnote), all bits except padding bits and the sign bit must contribute a value of 2n
int promotion is allowed only if int can represent all the values of the source type (§4.5/1), so int must have at least 32 bits contributing to the value, plus a sign bit.
the int can not have more value bits (not counting the sign bit) than 32, because else an addition can not overflow.
You already have the smart answer: unsigned arithmetic is modulo arithmetic and therefore the results will hold, you can prove it mathematically...
One cool thing about computers, though, is that computers are fast. Indeed, they are so fast that enumerating all valid combinations of 32 bits is possible in a reasonable amount of time (don't try with 64 bits).
So, in your case, I personally like to just throw it at a computer; it takes me less time to convince myself that the program is correct than it takes to convince myself than the mathematical proof is correct and that I didn't oversee a detail in the specification1:
#include <iostream>
#include <limits>
int main() {
std::uint64_t const MAX = std::uint64_t(1) << 32;
for (std::uint64_t i = 0; i < MAX; ++i) {
for (std::uint64_t j = 0; j < MAX; ++j) {
std::uint32_t const a = static_cast<std::uint32_t>(i);
std::uint32_t const b = static_cast<std::uint32_t>(j);
auto const champion = (a + (b & 255)) & 255;
auto const challenger = (a + b) & 255;
if (champion == challenger) { continue; }
std::cout << "a: " << a << ", b: " << b << ", champion: " << champion << ", challenger: " << challenger << "\n";
return 1;
}
}
std::cout << "Equality holds\n";
return 0;
}
This enumerates through all possible values of a and b in the 32-bits space and checks whether the equality holds, or not. If it does not, it prints the case which didn't work, which you can use as a sanity check.
And, according to Clang: Equality holds.
Furthermore, given that the arithmetic rules are bit-width agnostic (above int bit-width), this equality will hold for any unsigned integer type of 32 bits or more, including 64 bits and 128 bits.
Note: How can a compiler enumerates all 64-bits patterns in a reasonable time frame? It cannot. The loops were optimized out. Otherwise we would all have died before execution terminated.
I initially only proved it for 16-bits unsigned integers; unfortunately C++ is an insane language where small integers (smaller bitwidths than int) are first converted to int.
#include <iostream>
int main() {
unsigned const MAX = 65536;
for (unsigned i = 0; i < MAX; ++i) {
for (unsigned j = 0; j < MAX; ++j) {
std::uint16_t const a = static_cast<std::uint16_t>(i);
std::uint16_t const b = static_cast<std::uint16_t>(j);
auto const champion = (a + (b & 255)) & 255;
auto const challenger = (a + b) & 255;
if (champion == challenger) { continue; }
std::cout << "a: " << a << ", b: " << b << ", champion: "
<< champion << ", challenger: " << challenger << "\n";
return 1;
}
}
std::cout << "Equality holds\n";
return 0;
}
And once again, according to Clang: Equality holds.
Well, there you go :)
1 Of course, if a program ever inadvertently triggers Undefined Behavior, it would not prove much.
The quick answer is: both expressions are equivalent
since a and b are 32-bit unsigned integers, the result is the same even in case of overflow. unsigned arithmetic guarantees this: a result that cannot be represented by the resulting unsigned integer type is reduced modulo the number that is one greater than the largest value that can be represented by the resulting type.
The long answer is: there are no known platforms where these expressions would differ, but the Standard does not guarantee it, because of the rules of integral promotion.
If the type of a and b (unsigned 32 bit integers) has a higher rank than int, the computation is performed as unsigned, modulo 232, and it yields the same defined result for both expressions for all values of a and b.
Conversely, If the type of a and b is smaller than int, both are promoted to int and the computation is performed using signed arithmetic, where overflow invokes undefined behavior.
If int has at least 33 value bits, neither of the above expressions can overflow, so the result is perfectly defined and has the same value for both expressions.
If int has exactly 32 value bits, the computation can overflow for both expressions, for example values a=0xFFFFFFFF and b=1 would cause an overflow in both expressions. In order to avoid this, you would need to write ((a & 255) + (b & 255)) & 255.
The good news is there are no such platforms1.
1 More precisely, no such real platform exists, but one could configure a DS9K to exhibit such behavior and still conform to the C Standard.
Identical assuming no overflow. Neither version is truly immune to overflow but the double and version is more resistant to it. I am not aware of a system where an overflow in this case is a problem but I can see the author doing this in case there is one.
Yes you can prove it with arithmetic, but there is a more intuitive answer.
When adding, every bit only influences those more significant than itself; never those less significant.
Therefore, whatever you do to the higher bits before the addition won't change the result, as long as you only keep bits less significant than the lowest bit modified.
The proof is trivial and left as an exercise for the reader
But to actually legitimize this as an answer, your first line of code says take the last 8 bits of b** (all higher bits of b set to zero) and add this to a and then take only the last 8 bits of the result setting all higher bits to zero.
The second line says add a and b and take the last 8 bits with all higher bits zero.
Only the last 8 bits are significant in the result. Therefore only the last 8 bits are significant in the input(s).
** last 8 bits = 8 LSB
Also it is interesting to note that the output would be equivalent to
char a = something;
char b = something;
return (unsigned int)(a + b);
As above, only the 8 LSB are significant, but the result is an unsigned int with all other bits zero. The a + b will overflow, producing the expected result.

Modular Exponentiation over a Power of 2

So I've been doing some work recently with the modpow function. One of the forms I needed was Modular Exponentiation when the Modulus is a Power of 2. So I got the code up and running. Great, no problems. Then I read that one trick you can make to get it faster is, instead of using the regular exponent, takes it's modulus over the totient of the modulus.
Now when the modulus is a power of two, the answer is simply the power of 2 less than the current one. Well, that's simple enough. So I coded it, and it worked..... sometimes.
For some reason there are some values that aren't working, and I just can't figure out what it is.
uint32 modpow2x(uint32 B, uint32 X, uint32 M)
{
uint32 D;
M--;
B &= M;
X &= (M >> 1);
D = 1;
if ((X & 1) == 1)
{
D = B;
}
while ((X >>= 1) != 0)
{
B = (B * B) & M;
if ((X & 1) == 1)
{
D = (D * B) & M;
}
}
return D;
}
And this is one set of numbers that it doesn't work for.
Base = 593803430
Exponent = 3448538912
Modulus = 8
And no, there is no check in this function to determine if the Modulus is a power of 2. The reason is that this is an internal function and I already know that only Powers of 2 will be passed to it. However, I have already double checked to make sure that no non-powers of 2 are getting though.
Thanks for any help you guys can give!
It's true that if x is relatively prime to n (x and n have no common factors), then x^a = x^(phi(a)) (mod n), where phi is Euler's totient function. That's because then x belongs to the multiplicative group of (Z/nZ), which has order phi(a).
But, for x not relatively prime to n, this is no longer true. In your example, the base does have a common factor with your modulus, namely 2. So the trick will not work here. If you wanted to, though, you could write some extra code to deal with this case -- maybe find the largest power of 2 that x is divisible by, say 2^k. Then divide x by 2^k, run your original code, shift its output left by k*e, where e is your exponent, and reduce modulo M. Of course, if k isn't zero, this would usually result in an answer of zero.