How to write this floating point code in a portable way?

I am working on a cryptocurrency and there is a calculation that nodes must make:
average /= total;
double ratio = average/DESIRED_BLOCK_TIME_SEC;
int delta = -round(log2(ratio));
It is required that every node gets exactly the same result no matter what architecture or stdlib the system uses. My understanding is that log2 might have different implementations that yield very slightly different results, or that flags like -ffast-math could affect the output.
Is there a simple way to convert the above calculation to something that is verifiably portable across different architectures (fixed point?), or am I overthinking the precision that is needed (given that I round the answer at the end)?
EDIT: Average is a long and total is an int... so average ends up rounded to the closest second.
DESIRED_BLOCK_TIME_SEC is #defined as 30.0 (it's a float).

For this kind of calculation to be exact, one must either calculate all the divisions and logarithms exactly, or one can work backwards.
-round(log2(x)) == round(log2(1/x)), meaning that one of the divisions can be turned around to get (1/x) >= 1.
round(log2(x)) == floor(log2(x * sqrt(2))) == binary_log((int)(x*sqrt(2))).
One minor detail here is whether (double)sqrt(2) rounds down or up. If it rounds up, there might exist one or more values for which x * sqrt(2) == 2^n + epsilon (after rounding), whereas if it rounds down, we would get 2^n - epsilon. One would give the integer value n, the other n-1. Which is correct?
Naturally, the correct candidate is the one whose ratio to the theoretical midpoint x * sqrt(2) is smaller:
x * sqrt(2) / 2^(n-1) < 2^n / (x * sqrt(2)) -- multiply by x*sqrt(2)
x^2 * 2 / 2^(n-1) < 2^n -- multiply by 2^(n-1)
x^2 * 2 < 2^(2*n-1)
In order for this comparison to be exact, x^2 or pow(x,2) must be exact as well on the boundary, and it matters what range the original values are in. A similar analysis can and should be done after expanding x = a/b, so that the inexactness of the division can be mitigated at the cost of possible overflow in the multiplication...
Then again, I wonder how all the other similar applications handle the corner cases, which may not even exist; those could be brute-force searched, assuming that average and total are small enough integers, as in the sketch below.
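A minimal sketch of such a search (my own illustration; the 1e-9 margin is an arbitrary choice): it scans integer averages and flags those whose log2 lands within a whisker of a rounding boundary, where different log2 implementations could disagree.

#include <cmath>
#include <cstdio>

int main() {
    for (long average = 1; average <= 1000000; ++average) {
        double l = std::log2(average / 30.0);
        // Distance from the nearest half-integer, where round() flips.
        double frac = std::fabs(l - std::floor(l) - 0.5);
        if (frac < 1e-9)
            std::printf("suspect average: %ld (log2 = %.17g)\n", average, l);
    }
    return 0;
}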
EDIT
Because average is an integer, it makes sense to tabulate those exact integer values which lie on the boundaries of -round(log2(average/30.0)).
From octave: d = -round(log2((1:1000000)/30.0)); find(d(2:end) ~= d(1:end-1))
1 2 3 6 11 22 43 85 170 340 679 1358 2716
5431 10862 21723 43445 86890 173779 347558 695115
All the averages in [1, 2) -> 5
All the averages in [2, 3) -> 4
All the averages in [3, 6) -> 3
...
All the averages in [43445, 86890) -> -11
int a = find_lower_bound(average, table); // linear or binary search
return 5 - a;
No floating-point arithmetic needed.
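A sketch of the lookup in C++ (the boundary values are the octave output above; implementing find_lower_bound with std::upper_bound is my assumption about the intent):

#include <algorithm>
#include <cstdint>
#include <iterator>

// Boundaries where -round(log2(average/30.0)) changes, per the octave run above.
static const int64_t kBoundaries[] = {
    1, 2, 3, 6, 11, 22, 43, 85, 170, 340, 679, 1358, 2716,
    5431, 10862, 21723, 43445, 86890, 173779, 347558, 695115
};

// Assumes average >= 1; averages in [1, 2) map to 5, [2, 3) to 4, and so on.
int delta_from_table(int64_t average) {
    const int64_t* first = std::begin(kBoundaries);
    const int64_t* last = std::end(kBoundaries);
    int a = (int)(std::upper_bound(first, last, average) - first) - 1;
    return 5 - a;
}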


Should I use multiplication or division for recurring floats?

It is common knowledge that division takes many more clock cycles to compute than multiplication. (Refer to the discussion here: Floating point division vs floating point multiplication.)
I already use x * 0.5 instead of x / 2 and x * 0.125 instead of x / 8 in my C++ code, but I was wondering how far I should take this.
For decimals that recur when inverted (i.e. 1 / num is a recurring decimal), I use division instead of multiplication (e.g. x / 2.2 instead of x * 0.45454545454).
My question is: in loops that iterate a considerably large number of times, should I replace divisors with their recurring multiplicative counterparts (i.e. x * 0.45454545454 instead of x / 2.2), or will this bring an even greater loss of precision?
Edit: I did some profiling. I turned on full optimization in Visual Studio and used the Windows QueryPerformanceCounter() function to get the timings.
int main() {
    init();
    int x;
    float value = 100002030.0;

    start();
    for (x = 0; x < 100000000; x++)
        value /= 2.2;
    printf("Div: %fms, value: %f", getElapsedMilliseconds(), value);

    value = 100002030.0;
    restart();
    for (x = 0; x < 100000000; x++)
        value *= 0.45454545454;
    printf("\nMult: %fms, value: %f", getElapsedMilliseconds(), value);

    scanf_s("");
}
The results are: Div: 426.907185ms, value: 0.000000
Mult: 289.616415ms, value: 0.000000
Division took almost twice as long as multiplication, even with optimizations. Performance benefits are guaranteed, but will they reduce precision?
For decimals that recur when inverted (i.e. 1 / num is a recurring decimal), I use division instead of multiplication (e.g. x / 2.2 instead of x * 0.45454545454).
It is also common knowledge that 22/10 is not representable exactly in binary floating-point, so all you are achieving, instead of multiplying by a slightly inaccurate value, is dividing by a slightly inaccurate value.
In fact, if the intent is to divide by 22/10 or some other real value that isn't necessarily exactly representable in binary floating-point, then about half the time the multiplication is more accurate than the division, because it happens by coincidence that the relative error for 1/X is less than the relative error for X.
Another remark is that your micro-benchmark runs into subnormal numbers, where the timings are not representative of the usual operations on normal floating-point numbers, and after a short while value is zero, which again means the timings are not representative of the reality of multiplying and dividing normal numbers. And as Mark Ransom says, you should at least make the operands the same for both measurements: as currently written, all the multiplications take a zero operand and result in zero. Also, since 2.2 and 0.45454545454 both have type double, your benchmark is measuring double-precision multiplication and division; if you are willing to implement a single-precision division by a double-precision multiplication, this need not involve any loss of accuracy (but you would have to provide more digits for 1/2.2).
But don't let yourself be fooled into trying to fix the micro-benchmark. You don't need it, because there is no trade-off when X is no more exactly representable than 1/X. There is no reason not to use multiplication.
Note: you should explicitly multiply by 1 / X yourself; since the two operations / X and * (1 / X) are very slightly different, the compiler is not able to do the replacement itself. On the other hand, you don't need to replace / 2 by * 0.5, because any compiler worth its salt should do that for you.
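As a sketch of that advice (names are illustrative), compute the reciprocal once and multiply inside the loop:

#include <cstddef>

// Scale n doubles by 1/2.2: one division up front, multiplications in the loop.
// x * (1.0 / 2.2) can differ from x / 2.2 in the last bit, which is exactly why
// the compiler will not make this transformation for you.
void scale_by_inverse(double* data, std::size_t n) {
    const double inv = 1.0 / 2.2;   // correctly rounded double nearest to 1/2.2
    for (std::size_t i = 0; i < n; ++i)
        data[i] *= inv;
}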
You will get different answers when multiplying by a reciprocal versus dividing, but in practice it typically does not matter, and the performance gain is worthwhile. At most, the error will be 1 ULP for reciprocal multiplication versus ½ ULP for division. But do
a = b * (1.f / 7.f);
rather than
a = b * 0.142857f;
because the former will generate the most accurate (½ ULP) representation for 1/7.

Fast integer solution of x(x-1)/2 = c

Given a non-negative integer c, I need an efficient algorithm to find the largest integer x such that
x*(x-1)/2 <= c
Equivalently, I need an efficient and reliably accurate algorithm to compute:
x = floor((1 + sqrt(1 + 8*c))/2) (1)
For the sake of definiteness I tagged this question C++, so the answer should be a function written in that language. You can assume that c is an unsigned 32-bit int.
Also, if you can prove that (1) (or an equivalent expression involving floating-point arithmetic) always gives the right result, that's a valid answer too, since floating-point on modern processors can be faster than integer algorithms.
If you're willing to assume IEEE doubles with correct rounding for all operations including square root, then the expression that you wrote (plus a cast to double) gives the right answer on all inputs.
Here's an informal proof. Since c is a 32-bit unsigned integer being converted to a floating-point type with a 53-bit significand, 1 + 8*(double)c is exact, and sqrt(1 + 8*(double)c) is correctly rounded. 1 + sqrt(1 + 8*(double)c) is then accurate to within one ulp: the sqrt term is less than 2**((32 + 3)/2) = 2**17.5, so its unit in the last place is less than 1, and the addition of 1 contributes at most half an ulp more. Thus (1 + sqrt(1 + 8*(double)c))/2 is accurate to within one ulp, since division by 2 is exact.
The last piece of business is the floor. The problem cases here are when (1 + sqrt(1 + 8*(double)c))/2 is rounded up to an integer. This happens if and only if sqrt(...) rounds up to an odd integer. Since the argument of sqrt is an integer, the worst cases look like sqrt(z**2 - 1) for positive odd integers z, and we bound
z - sqrt(z**2 - 1) = z * (1 - sqrt(1 - 1/z**2)) >= 1/(2*z)
by Taylor expansion. Since z is less than 2**17.5, the gap to the nearest integer is at least 1/2**18.5 on a result of magnitude less than 2**17.5, which means that this error cannot result from a correctly rounded sqrt.
Adopting Yakk's simplification, we can write
(uint32_t)(0.5 + sqrt(0.25 + 2.0*c))
without further checking.
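Wrapped up as a function, this is a direct transcription of that expression; the assumptions from the proof (IEEE-754 doubles, correctly rounded sqrt) carry over:

#include <cmath>
#include <cstdint>

// Largest x such that x*(x-1)/2 <= c, for any 32-bit unsigned c.
uint32_t triangular_root(uint32_t c) {
    return (uint32_t)(0.5 + std::sqrt(0.25 + 2.0 * (double)c));
}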
If we start with the quadratic formula, we quickly reach x = 1/2 + sqrt(1/4 + 2c); flooring that amounts to rounding sqrt(1/4 + 2c) to an integer, rounding up at fractions of 1/2 or higher.
Now, if you do that calculation in floating point, there can be inaccuracies.
There are two approaches to dealing with these inaccuracies. The first would be to carefully determine how big they are and whether the calculated value is close enough to a half for them to matter. If they don't matter, simply return the value. If they do, we can still bound the answer to one of two values, test those two values in integer math, and return.
However, we can do away with that careful bit, and note that sqrt(1/4 + 2c) is going to have an error less than 0.5 when the values are 32 bits and we use doubles. (We cannot make this guarantee with floats, as by 2^31 a float cannot represent the +0.5 without rounding.)
In essence, we use the quadratic formula to reduce it to two possibilities, and then test those two.
#include <cassert>
#include <cmath>
#include <cstdint>

uint64_t eval(uint64_t x) {
    return x * (x - 1) / 2;
}

unsigned solve(unsigned c) {
    double test = std::sqrt(0.25 + 2. * c);
    if (eval((uint64_t)(test + 1.)) <= c)
        return (unsigned)(test + 1.);
    assert(eval((uint64_t)test) <= c);
    return (unsigned)test;
}
Note that converting a positive double to an integral type rounds towards 0. You can insert floors if you want.
This may be a bit tangential to your question, but what caught my attention is the specific formula: you are trying to find the triangular root of T(n-1) (where T(n) is the nth triangular number).
I.e.:
T(n) = n * (n + 1) / 2
and
T(n) - n = T(n-1) = n * (n - 1) / 2
From the nifty trick described here, for T(n) we have:
n = int(sqrt(2 * c))
Looking for n such that T(n-1) ≤ c in this case doesn't change the definition of n, for the same reason as in the original question.
Computationally, this saves a few operations, so it's theoretically faster than the exact solution (1). In reality, it's probably about the same.
Neither this solution nor the one presented by David is as "exact" as your (1), though.
[Plot: floor((1 + sqrt(1 + 8*c))/2) (blue) vs int(sqrt(2*c)) (red) vs exact (white line)]
[Plot: floor((1 + sqrt(1 + 8*c))/2) (blue) vs int(sqrt(0.25 + 2*c) + 0.5) (red) vs exact (white line)]
My real point is that triangular numbers are a fun set of numbers that are connected to squares, Pascal's triangle, Fibonacci numbers, et al.
As such there are loads of identities around them which might be used to rearrange the problem in a way that didn't require a square root.
Of particular interest may be that T(n) + T(n-1) = n^2.
I'm assuming you know that you're working with a triangular number, but if you didn't realize that, searching for triangular roots yields a few questions, such as this one, on the same topic.

Accurate percentage in C++

Given 2 numbers, where A <= B, say for example A = 9 and B = 10, I am trying to get the percentage of how much smaller A is compared to B. I need to have the percentage as an int, e.g. if the result is 10.00%, the int should be 1000.
Here is my code:
int A = 9;
int B = 10;
int percentage = (((1 - (double)A/B) / 0.01)) * 100;
My code returns 999 instead of 1000. Some precision related to the usage of double is lost.
Is there a way to avoid losing precision in my case?
Seems the formula you're looking for is
int result = 10000 - (A*10000+B/2)/B;
The idea is to do all computations in integers and to delay the division.
To do the rounding, half of the denominator is added before performing the division (otherwise the division truncates, which here means rounding the result up because of the 100% - x form).
For example, with A=9 and B=11 the percentage is 18.181818..., which rounds to 18.18; the computation without the rounding correction would give 1819 instead of the expected result 1818.
Note that the computation is done all in integers, so there is a risk of overflow for large values of A and B. For example, if int is 32-bit, then A can only be up to around 200000 before risking an overflow when computing A*10000.
Using A*10000LL instead of A*10000 in the formula trades a little speed to raise the limit to a much larger value.
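Put together as a self-contained function (the name and the assert are mine):

#include <cassert>

// Percentage of how much smaller A is than B, scaled by 100
// (10.00% -> 1000), rounded to nearest, in pure integer arithmetic.
int percent_smaller(int A, int B) {
    assert(B > 0 && A >= 0 && A <= B);
    return 10000 - (int)((A * 10000LL + B / 2) / B);
}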
Of course there may be precision loss in floating-point numbers. Either use fixed-point arithmetic as #6502 answered, or add a bias to the result to get the intended answer.
You would do better with:
assert(B != 0);
int percentage = ((A<0) == (B<0) ? 0.5 : -0.5) + (((1 - (double)A/B) / 0.01)) * 100;
Because of precision loss, the result of (((1 - (double)A/B) / 0.01)) * 100 may be slightly less or slightly more than intended. If you add the extra 0.5, it is guaranteed to be slightly more than intended. Now when you cast this value to an integer, you get the intended answer (the floor or ceiling, depending on whether the fractional part of the exact result was above or below 0.5).
I tried
float floatpercent = (((1 - (double)A/B) / 0.01)) * 100;
int percentage = (int) floatpercent;
cout << percentage;
displays 1000
I suspect precision loss in the automatic cast to int as the root problem in your code.
[I alluded to this in a comment to the original question, but I thought I'd post it as an answer.]
The core problem is that the form of expression you're using amplifies the unavoidable floating-point loss of precision when representing simple decimal fractions.
Your expression (with casts stripped out for now, using standard precedence to also avoid some parens)
((1 - A/B) / 0.01) * 100
is quite a complicated way of representing what you want, although it's algebraically correct. Unfortunately, floating-point numbers can only precisely represent numbers like 1/2, 1/4, 1/8, etc., their multiples, and sums of those. In particular, neither 9/10 nor 1/10 nor 1/100 has a precise representation.
The above expression introduces these errors twice: first in the calculation of A/B, and then in the division by 0.01. These two imprecise values are then divided, which further amplifies the inherent error.
The most direct way to write what you meant (again without needed casts) is
((B-A) / B) * 10000
This produces the correct answer and is considerably easier to read, I would suggest, than the original. The fully correct C form is
((B - A) / (double)B) * 10000
I've tested this and it works reliably. As others have noted, it's generally better to work with doubles instead of floats, as their extra precision makes them less prone (but not immune) to this sort of difficulty.
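For instance, this quick check (my own) prints 1000 for the original inputs:

#include <cstdio>

int main() {
    int A = 9, B = 10;
    int percentage = (int)(((B - A) / (double)B) * 10000);
    std::printf("%d\n", percentage);   // prints 1000
    return 0;
}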

What is correct by common sense: (int) blabla * 255.99999999999997 or round(blabla*255)?

Recently I found this interesting thing in the WebKit sources, related to color conversions (HSL to RGB):
http://osxr.org/android/source/external/webkit/Source/WebCore/platform/graphics/Color.cpp#0111
const double scaleFactor = nextafter(256.0, 0.0); // i.e. something like 255.99999999999997
// .. some code skipped
return makeRGBA(static_cast<int>(calcSomethingFrom0To1(blablabla) * scaleFactor),
Same I found here: http://www.filewatcher.com/p/kdegraphics-4.6.0.tar.bz2.5101406/kdegraphics-4.6.0/kolourpaint/imagelib/effects/kpEffectHSV.cpp.html
(int)(value * 255.999999)
Is it correct to use such a technique at all? Why not use something straightforward like round(blabla * 255)?
Is this a feature of C/C++? As far as I can see, strictly speaking it does not always return the same result as rounding: it differs in 27 cases out of 100. See the spreadsheet at https://docs.google.com/spreadsheets/d/1AbGnRgSp_5FCKAeNrELPJ5j9zON9HLiHoHC870PwdMc/edit?usp=sharing
Could somebody please explain? I think it must be something basic.
Normally we want to map a real value x in the (closed) interval [0,1] to an integer value j in the range [0 ...255].
And we want to do it in a "fair" way, so that, if the reals are uniformly distributed in the range, the discrete values will be approximately equiprobable: each of the 256 discrete values should get "the same share" (1/256) from the [0,1] interval. That is, we want a mapping like this:
[0 , 1/256) -> 0
[1/256, 2/256) -> 1
...
[254/256, 255/256) -> 254
[255/256, 1] -> 255
We are not much concerned about the transition points [*], but we do want to cover the full range [0,1]. How to accomplish that?
If we simply do j = (int)(x * 255): the value 255 would almost never appear (only when x=1), and the rest of the values 0...254 would each get a share of 1/255 of the interval. This would be unfair, regardless of the rounding behaviour at the limit points.
If we instead do j = (int)(x * 256): this partition would be fair, except for a single problem: we would get the value 256 (out of range!) when x=1. [**]
That's why j = (int)(x * 255.9999...) (where 255.9999... is actually the largest double less than 256) will do.
An alternative implementation (also reasonable, almost equivalent) would be
j = (int)(x * 256);
if (j == 256) j = 255;
// j = x == 1.0 ? 255 : (int)(x * 256); // alternative
but this would be more clumsy and probably less efficient.
round() does not help here. For example, j = (int)round(x * 255) would give a 1/255 share to the integers j=1...254 and half that value to the extreme points j=0, j=255.
[*] I mean: we are not extremely interested in what happens in the 'small' neighbourhood of, say, 3/256: rounding might give 2 or 3, it doesn't matter. But we are interested in the extrema: we want to get 0 and 255 for x=0 and x=1, respectively.
[**] The IEEE floating point standard guarantees that there's no rounding ambiguity here: integers admit an exact floating point representation, the product will be exact, and the casting will give always 256. Further, we are guaranteed that 1.0 * z = z.
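A small demonstration (my own sketch, not from the answer) that counts how many evenly spaced x in [0,1] land in the end buckets under each scheme; the scaleFactor cast gives 0 and 255 full-sized shares, while round() gives them about half a share:

#include <cmath>
#include <cstdio>

int main() {
    const double scaleFactor = std::nextafter(256.0, 0.0); // largest double < 256
    const int N = 100000;
    int lowCast = 0, highCast = 0, lowRound = 0, highRound = 0;
    for (int i = 0; i <= N; ++i) {
        double x = (double)i / N;
        int jc = (int)(x * scaleFactor);
        int jr = (int)std::round(x * 255.0);
        lowCast  += (jc == 0);   highCast  += (jc == 255);
        lowRound += (jr == 0);   highRound += (jr == 255);
    }
    std::printf("cast:  0 -> %d, 255 -> %d\n", lowCast, highCast);
    std::printf("round: 0 -> %d, 255 -> %d\n", lowRound, highRound);
    return 0;
}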
In general, I'd say (int)(blabla * 255.99999999999997) is more correct than using round().
Why?
Because with round(), 0 and 255 only get "half" the range that 1-254 do. If you round(), then values in [0, 0.00196078431) map to 0, while values in [0.00196078431, 0.00588235293) map to 1. This means that 1 is twice as likely to occur as 0, which is, strictly speaking, an unfair bias.
If, instead, one multiplies by 255.99999999999997 and then floors (which is what casting to an integer does, since it truncates), then each integer from 0 to 255 is equally likely.
Your spreadsheet might show this better if it counted in fractional percentages (i.e. if it counted by 0.01% instead of 1% each time). I've made a simple spreadsheet to show this. If you look at that spreadsheet, you'll see that 0 is unfairly biased against when round()ing, but with the other method things are fair and equal.
Casting to int has the same effect as the floor function (i.e. it truncates). When you call round it, well, rounds to the nearest integer.
They do different things, so choose the one you need.

How to efficiently find the largest integer closest to the mean of two integers in increments of 100,000?

Let's say I am given integers x and y (satisfying x <= y, with a ones digit of 0, so they are, in particular, divisible by two). Then I know that their average avg = (x+y)/2 is an integer as well. I would like to find this midpoint rounded up to a resolution of 100. In other words, if my two inputs are 75200 and 75300, then the avg is 75250, and rounding up to the nearest 100 (but without exceeding or equaling the bigger number) forces the answer to be 75200.
How can I implement this logic without first dividing everything by 100 and using the following floating point arithmetic:
x + std::floor((y - x) * .5 * 100 + .5)*0.01
In other words, how can I do the above without floating point values but obtain the same behavior at the resolution of 100 instead of 0.01?
To compute the average you can do
avg = (x + y) / 2
(BTW, integer addition and division by 2 are very cheap operations even on small microcontrollers.)
To round this to the nearest multiple of 100 (corresponding to your floating-point example) you can do
result = ((avg + 50) / 100) * 100
as integer division rounds down to the nearest integer (for non-negative values). By changing the 50 to 0 you always round down, while changing it to 99 always rounds up.
Edit: Note that this method for rounding doesn't work for negative numbers. Since integer division rounds towards zero, in that case you'll need to subtract 50 to round to nearest, subtract 99 to always round down, and subtract 0 to always round up.
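As a sketch, the whole recipe in one function (the name is mine; assumes non-negative inputs of the same parity, as in the question):

#include <cstdint>

// Midpoint of x and y rounded to the nearest multiple of 100,
// in pure integer arithmetic (non-negative x <= y, same parity).
int64_t midpoint_nearest_100(int64_t x, int64_t y) {
    int64_t avg = x + (y - x) / 2;     // exact, overflow-safe average
    return ((avg + 50) / 100) * 100;   // +0 rounds down, +99 rounds up
}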
Your problematic example requires strong conditions:
the difference between x and y must be no greater than 100
y % 100 must be 0
So for most cases, a simple rounded average is perfect for you:
avg100 = avg - (avg % 100) + 100
The tricky part is fixing the remaining error without a condition, if you want to avoid branches or slow operations.
For this, the best way is to use a multiplication, and split the expression into two:
avg100 = avg - (avg % 100)
avg100 += 100 * !!(y - avg100)
For most cases, y is greater than avg100, and in that case the !! operator returns 1. In the rare case when they are equal, it returns 0 and leaves the value unchanged.
(I don't know if the compiler will really generate branch-free code for the '!!' operator, but I don't have a better idea, and if it is possible, I think it will. If not, this code is still short and easy to understand.)
Also, you can calculate the average using the following expression:
avg = y - (y-x)/2
Or even change the division into a bit shift for optimization.
This doesn't require both of the numbers to be even, just that they have the same parity.
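That shift variant as a one-liner (a sketch; valid here since y - x is non-negative and even):

#include <cstdint>

// Average via subtraction and shift, avoiding overflow in x + y.
int64_t avg_shift(int64_t x, int64_t y) {
    return y - ((y - x) >> 1);
}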