Assume a, b, c, and d are declared double (or float). Are the following expressions always true?
! ( (a >= b) && (c <= d) ) || ( (a-c) >= (b-d) )
! ( (a > b) && (c <= d) ) || ( (a-c) > (b-d) )
! ( (a >= b) && (c < d) ) || ( (a-c) > (b-d) )
Is there any guarantee from IEEE 754 or the current C or C++ standards? And will any compiler optimize this to simply true at compile time? I am interested mostly in normal values, not so much in subnormal or special values.
It seems to me this should depend mostly on round-off errors during subtraction.
For the 3rd to produce false it should be sufficient to take large equal a and b and small unequal c and d, e.g. a=1e30, b=1e30, c=1e-31, d=1e-30.
EDIT: Ok, for the 2nd to produce false, by analogy to the 3rd, it should be sufficient to take small unequal a and b and large equal c and d, e.g. a=1e-30, b=1e-31, c=1e30, d = 1e30.
No idea about a counterexample for the 1st expression...
Serge Rogatch gave counterexamples to your second and third expressions.
The first one, !(a >= b && c <= d) || a-c >= b-d, is always true in IEEE 754 arithmetic, provided a, b, c, and d are all finite. Subtraction of finite numbers cannot produce a NaN. Thus a counterexample must satisfy a >= b && c <= d && a-c < b-d. However, a >= b implies that a-c >= b-c, whatever c is, and c <= d implies that b-c >= b-d, whatever b is; both implications hold because rounded subtraction is monotonic. Transitivity of >= takes care of the rest.
If you relax the condition that a, b, c, and d must all be finite, you can take a = c = d = 1.0/0.0 with any non-NaN b for a counterexample: a >= b and c <= d then hold, while a-c is Inf - Inf = NaN, so a-c >= b-d is false. All counterexamples are of essentially this form: a-c or b-d must come out as NaN, which requires subtracting infinities of the same sign.
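To see these on a concrete machine, here is a quick test harness (my own throwaway code, assuming IEEE 754 doubles; the inputs are just the counterexamples above):
#include <cstdio>

// Evaluate the three expressions for the given a, b, c, d.
static void check(double a, double b, double c, double d) {
    bool e1 = !((a >= b) && (c <= d)) || ((a - c) >= (b - d));
    bool e2 = !((a > b) && (c <= d)) || ((a - c) > (b - d));
    bool e3 = !((a >= b) && (c < d)) || ((a - c) > (b - d));
    std::printf("e1=%d e2=%d e3=%d\n", e1, e2, e3);
}

int main() {
    check(1e-30, 1e-31, 1e30, 1e30); // expect e2 == 0 (false)
    check(1e30, 1e30, 1e-31, 1e-30); // expect e3 == 0 (false)
    return 0;
}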
Related
I came across this modulo multiplication function in code for the Miller-Rabin primality test. It is supposed to eliminate the integer overflow that occurs when calculating (a * b) % m.
I need some help understanding what is going on here. Why does this work? And what is the significance of the number literal 0x8000000000000000ULL?
unsigned long long mul_mod(unsigned long long a, unsigned long long b, unsigned long long m) {
    unsigned long long d = 0, mp2 = m >> 1;
    if (a >= m) a %= m;
    if (b >= m) b %= m;
    for (int i = 0; i < 64; i++)
    {
        d = (d > mp2) ? (d << 1) - m : d << 1;
        if (a & 0x8000000000000000ULL)
            d += b;
        if (d >= m) d -= m;
        a <<= 1;
    }
    return d;
}
This code, which currently appears on the modular arithmetic Wikipedia page, only works for arguments of up to 63 bits -- see bottom.
Overview
One way to compute an ordinary multiplication a * b is to add left-shifted copies of b -- one for each 1-bit in a. This is similar to how most of us did long multiplication in school, but simplified: Since we only ever need to "multiply" each copy of b by 1 or 0, all we need to do is either add the shifted copy of b (when the corresponding bit of a is 1) or do nothing (when it's 0).
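To make the analogy concrete, an ordinary (non-modular) shift-and-add multiply might look like the following sketch. This is my illustration, not part of the original code, and it simply overflows for large inputs:
unsigned long long mul_shift_add(unsigned long long a, unsigned long long b) {
    unsigned long long total = 0;
    while (a != 0) {
        if (a & 1)      // the corresponding bit of a is 1...
            total += b; // ...so add the copy of b, already shifted into place
        b <<= 1;        // the next copy of b sits one bit further left
        a >>= 1;
    }
    return total;
}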
This code does something similar. However, to avoid overflow (mostly; see below), instead of shifting each copy of b and then adding it to the total, it adds an unshifted copy of b to the total, and relies on later left-shifts performed on the total to shift it into the correct place. You can think of these shifts "acting on" all the summands added to the total so far. For example, the first loop iteration checks whether the highest bit of a, namely bit 63, is 1 (that's what a & 0x8000000000000000ULL does), and if so adds an unshifted copy of b to the total; by the time the loop completes, the previous line of code will have shifted the total d left 1 bit 63 more times.
The main advantage of doing it this way is that we are always adding two numbers (namely b and d) that we already know are less than m, so handling the modulo wraparound is cheap: We know that b + d < 2 * m, so to ensure that our total so far remains less than m, it suffices to check whether b + d < m, and if not, subtract m. If we were to use the shift-then-add approach instead, we would need a % modulo operation per bit, which is as expensive as division -- and usually much more expensive than subtraction.
One of the properties of modulo arithmetic is that, whenever we want to perform a sequence of arithmetic operations modulo some number m, performing them all in usual arithmetic and taking the remainder modulo m at the end always yields the same result as taking remainders modulo m for each intermediate result (provided no overflows occur).
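For example, with m = 7: computing (8 * 9) % 7 directly gives 72 % 7 = 2, and reducing early gives ((8 % 7) * (9 % 7)) % 7 = (1 * 2) % 7 = 2 as well.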
Code
Before the first line of the loop body, we have the invariants d < m and b < m.
The line
d = (d > mp2) ? (d << 1) - m : d << 1;
is a careful way of shifting the total d left by 1 bit, while keeping it in the range 0 .. m and avoiding overflow. Instead of first shifting it and then testing whether the result is m or greater, we test whether it is currently strictly above RoundDown(m/2) -- because if so, after doubling, it will surely be strictly above 2 * RoundDown(m/2) >= m - 1, and so require a subtraction of m to get back in range. Note that even though the (d << 1) in (d << 1) - m may overflow and lose the top bit of d, this does no harm as it does not affect the lowest 64 bits of the subtraction result, which are the only ones we are interested in. (Also note that if d == m/2 exactly, we wind up with d == m afterward, which is slightly out of range -- but changing the test from d > mp2 to d >= mp2 to fix this would break the case where m is odd and d == RoundDown(m/2), so we have to live with this. It doesn't matter, because it will be fixed up below.)
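To see the shift line at work with small numbers (my example, not from the original code): take m = 13, so mp2 = 6. If d = 7, the test 7 > 6 passes and d becomes (7 << 1) - 13 = 1, which is indeed 14 mod 13. For the edge case, take m = 12 (so mp2 = 6) and d = 6: the test 6 > 6 fails, d doubles to 12 == m, and we rely on the later fix-up.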
Why not simply write d <<= 1; if (d >= m) d -= m; instead? Suppose that, in infinite-precision arithmetic, d << 1 >= m, so we should perform the subtraction -- but the high bit of d is on and the rest of d << 1 is less than m: In this case, the initial shift will lose the high bit and the if will fail to execute.
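A small-register analogy (again mine) makes the failure visible: pretend registers were 4 bits wide and take m = 13, d = 9. The true d << 1 is 18 >= 13, so a subtraction is due; but the 4-bit shift truncates 18 to 2, the test 2 >= 13 fails, and we would end up with 2 instead of the correct 18 - 13 = 5.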
Restriction to inputs of 63 bits or fewer
The above edge case can only occur when d's high bit is on, which can only occur when m's high bit is also on (since we maintain the invariant d < m). So it looks like the code is taking pains to work correctly even with very high values of m. Unfortunately, it turns out that it can still overflow elsewhere, resulting in incorrect answers for some inputs that set the top bit. For example, when a = 3, b = 0x7FFFFFFFFFFFFFFFULL and m = 0xFFFFFFFFFFFFFFFFULL, the correct answer should be 0x7FFFFFFFFFFFFFFEULL, but the code will return 0x7FFFFFFFFFFFFFFDULL (an easy way to see the correct answer is to rerun with the values of a and b swapped). Specifically, this behaviour occurs whenever the line d += b overflows and leaves the truncated d less than m, causing a subtraction to be erroneously skipped.
Provided this behaviour is documented (as it is on the Wikipedia page), this is just a limitation, not a bug.
Removing the restriction
If we replace the lines
if (a & 0x8000000000000000ULL)
    d += b;
if (d >= m) d -= m;
with
unsigned long long x = -(a >> 63) & b;
if (d >= m - x) d -= m;
d += x;
the code will work for all inputs, including those with top bits set. The cryptic first line is just a conditional-free (and thus usually faster) way of writing
unsigned long long x = (a & 0x8000000000000000ULL) ? b : 0;
(Since a >> 63 is either 0 or 1, -(a >> 63) is either 0 or a mask of 64 one-bits, so the AND yields either 0 or b.) The test d >= m - x operates on d before it has been modified -- it's like the old d >= m test, but with b (when the top bit of a is on) or 0 (otherwise) subtracted from both sides. It tests whether d would be m or larger once x is added to it. We know that the RHS m - x never underflows, because the largest x can be is b, and we established b < m at the top of the function.
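Splicing the replacement into the original gives the following full version (my own assembly of the pieces above -- treat it as a sketch rather than tested code):
unsigned long long mul_mod(unsigned long long a, unsigned long long b, unsigned long long m) {
    unsigned long long d = 0, mp2 = m >> 1;
    if (a >= m) a %= m;
    if (b >= m) b %= m;
    for (int i = 0; i < 64; i++)
    {
        // Shift the total left 1 bit, keeping it in range (see above).
        d = (d > mp2) ? (d << 1) - m : d << 1;
        // x is b when the top bit of a is set, 0 otherwise.
        unsigned long long x = -(a >> 63) & b;
        if (d >= m - x) d -= m;
        d += x;
        a <<= 1;
    }
    return d;
}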
If a = 2 and b = 4, where a OR b = 6 and (a | b) AND NOT b = a, then is a bitwise AND NOT equivalent to subtraction when the value is a set of flags which is known to include the flag being removed?
Is it the same for addition as well?
Note that this is in a situation where the flags are known to exist in the set. No addition or subtraction would occur if the flag is not present.
If I understood correctly what you're asking, yes. So:
If (a & b) == 0, then (a | b) == (a + b), and
If (a | b) == a, then (a & ~b) == (a - b)
As a sort of proof, note that addition can be written as a + b == (a ^ b) + ((a & b) << 1) (which does all the sums without carry, then adds the carries separately). So if a & b is zero, the carries disappear and the sum becomes just a ^ b, which in turn equals a | b. A similar thing happens with the subtraction, where we know there are no borrows.
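A throwaway check (mine, not part of the answer) that exercises both identities with the question's values:
#include <cassert>

int main() {
    unsigned a = 2, b = 4;             // disjoint flag bits, as in the question
    unsigned flags = a | b;            // == 6
    assert(flags == a + b);            // a & b == 0, so OR acts like addition
    assert((flags & ~b) == flags - b); // b's bits are all set in flags,
    assert((flags & ~b) == a);         // so AND NOT acts like subtraction
    return 0;
}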
Only if you're ORing with values whose 1-bits are definitely not in the first operand, or ANDing with the bit-negations of values whose 1-bits definitely are in the first operand.
Is there a simple way to test for a range (A, B)?
if (X < A && X < B) ...
But it seems that +Inf and NaN also land inside the range.
Your condition is not an interval (diapason). It is functionally equivalent to
X < std::min(A, B)
You only have an upper bound, no lower bound at all.
Exactly how NaN and +Inf behave depends on the floating point representation, which is not specified by the C++ standard but is CPU-specific.
If we assume the commonly used IEEE-754, then neither X=+Inf, nor X=NaN can satisfy the condition for any values of A and B.
This is how you check that a floating point number is between a lower and an upper bound (but equal to neither):
X > low && X < high
or
low < X && X < high
Again, if we assume IEEE-754, then neither X=+Inf, nor X=NaN can satisfy this condition for any values of low and high. But, since IEEE-754 might not be guaranteed, the behaviour of such numbers is not specified. You might need to be explicit to support exotic hardware:
low < X && X < high && std::isfinite(X)
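Wrapped up as a function (the name strictly_between is mine):
#include <cmath>

// True iff X lies strictly between low and high. The std::isfinite test
// also rejects NaN and infinities on hardware where the comparisons
// alone might not.
bool strictly_between(double X, double low, double high) {
    return std::isfinite(X) && low < X && X < high;
}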
Where can I find an implementation or library that computes the remainder of an integer Euclidean division, 0 <= r < |n|?
In the C++98 and C++03 versions of the C++ language, the built-in division (the / and % operators) might be Euclidean and might be non-Euclidean -- it is implementation-defined. However, most implementations truncate the quotient towards zero, which is unfortunately non-Euclidean.
In most implementations -5 / 3 = -1 and -5 % 3 = -2. In Euclidean division -5 / 3 = -2 and -5 % 3 = 1.
C++11 requires integer division to be non-Euclidean: it requires an implementation that truncates towards zero.
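You can see the truncating behaviour directly (a quick demonstration of my own):
#include <cstdio>

int main() {
    std::printf("%d %d\n", -5 / 3, -5 % 3); // prints "-1 -2" under C++11 rules
    return 0;
}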
The issue, as you can see, arises with negative numbers only. So you can easily implement Euclidean division yourself by using operator % and post-correcting negative remainders:
#include <cassert>
#include <cstdlib>

int euclidean_remainder(int a, int b)
{
    assert(b != 0);
    int r = a % b;
    // operator % may give a negative remainder; shift it into [0, |b|)
    return r >= 0 ? r : r + std::abs(b);
}
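For example, euclidean_remainder(-5, 3) and euclidean_remainder(-5, -3) both return 1, whereas -5 % 3 is -2 on a truncating implementation.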
Try (x%m + m)%m if the result must be positive.
Write your own function around this, or any of the variants, and don't get hung up on a library - you've spent more time asking than you would to just do it. Start your own library (toolbox) for simple functions you need.
It's a simple operator. %.
5 % 4 is 1, etc.
Edit:
As has been pointed out, depending on your implementation this isn't necessarily a Euclidean mod.
#define EUCMOD(a, b) (a < 0 ? (((a % b) + b) % b) : (a % b))
I really liked Brandon's answer, but I started having some strange bugs.
After some testing I found that the expansion of the EUCMOD macro was messing with the precedence of operations.
So I'd suggest using it as a function instead of a macro
int eucmod(const int a, const int b)
{
    return (a < 0 ? (((a % b) + b) % b) : (a % b));
}
Or adding a few parentheses:
#define EUCMOD(a,b) ((a) < 0 ? ((((a) % (b)) + (b)) % (b)) : ((a) % (b)))
I'm writing a full double to float function for the Arduino (irrelevant, but I couldn't find any "proper" ones) and I do this check:
if (d < 0) {
    d *= -1;
    bin += "-";
}
I know that because of floating point imprecision, double equality is finicky. So is it safe to do that? Or should I stick to this (which I use in later parts of my code anyway)?
int compareNums(double x, double y) {
    if (fabs(x - y) <= EPSILON) {  // fabs, not the integer abs(), for doubles
        return 0;
    } else if (x > y) {
        return 1;
    } else {
        return -1;
    }
}
And a couple quick questions: does it matter if I do d < 0 or d < 0.0?
I'm multiplying a double d by 10 until it has no fractional part, so I do a check similar to d == (int) d. I'm wondering what's a good epsilon to use (I used the one from http://msdn.microsoft.com/en-us/library/6x7575x3(v=vs.80).aspx), since I don't want to end up with an infinite loop. According to the article, 0.000000119209 is the smallest distinguishable difference for floats, or something like that.
Thanks
d < 0 is valid (though I'd prefer to write d < 0.0). In the first case the zero will be "promoted" to double before the comparison.
And comparing double to zero with < or > is perfectly valid, and does not require an "epsilon".
bin += "-"; is nonsensical.
In general comparing floats/doubles with "==" is invalid and should never be done (except for some special cases such as checking for zero or infinity). Some languages do not even allow "==" (or equivalent) between floats.
d == (int) d is more nonsense.
See my answer to this question:
How dangerous is it to compare floating point values?
Specifically, the recommendations that you should not be using absolute epsilons and should not be using floating point whatsoever until you've thoroughly read and understood What Every Computer Scientist Should Know About Floating-Point Arithmetic.
As for this specific piece of code in your question, where it seems your goal is to print a textual representation of the number, simply testing < 0 is correct. And it does not matter whether you write 0 or 0.0.