Stop double division before decimals (low precision, fast division; getting only the 'quotient')

Stop double division before decimals (low precision, fast division; getting only the 'quotient') - c++

Basically a performance related question:
I want to get only the integer quotient from a double division, i.e. for example, for a division 88.3/12.7 = 6.9527559055118110236220472440945, I only want to get '6' as a result.
A possible implementation would be of course: floor(x/y), but here, first the performance-intensive double division is done and afterwards floor throws away most of the 'work' the double division did.
So basically I want a division with doubles which 'stops' before calculating all these decimal points and just gives me the correct integer result of the division, without rounding or truncating the initial double arguments. Does anyone know an elegant implementation for this (I searched for this topic but didn't find much)?
Another implementation I can imagine is:
int(x*1000)/int(y*1000)
Where instead of 1000, the needed 'precision' can be used. A very simple implementation would be also simply subtracting y from x until the result is smaller than zero. But yeah, I was wondering what would be the best way to do it.
Also, doing simply int(x)/int(y) is no option since it could easily result in wrong results.
By the way, I know this is probably again one of these 'micro-optimization' questions which deal with a matter that does not really matter on new machines, but well, I still am kinda curious on the subject! :-)

There is no way to stop earlier, and using integer division is potentially slower.
For example, on Skylake:
idiv r/m32 L: 26-27 T: 6
divsd xmm, xmm L: 13-14 T: 4
(source)
So the double division is twice as fast and has a significantly better throughput. That is before you factor in the extra multiplications and extra cast.
On older µarchs, 32 bit integer division often has lower latency numbers listed than double division, but they varied more (division used to be more serial), with (for floats) round divisors being faster yet for integer division it's small results that are faster. This difference in characteristics can make it swing either way, depending on what you're dividing by what.
As you can see it's dangerous in this case to optimize without a specific target in mind, but I imagine newer machines are a more likely target than older machines, which means the double division is more or less the best you can do anyway (unless other optimizations apply). Dividing single precision floats is faster by itself but incurs a conversion cost, it actually ends up losing (5+10) if you add them up.

Related

controlling overflow and loss in precision while multiplying doubles

ques:
I have a large number of floating point numbers (~10,000 numbers) , each having 6 digits after decimal. Now, the multiplication of all these numbers would yield about 60,000 digits. But the double range is for 15 digits only. The output product has to have 6 digits of precision after decimal.
my approach:
I thought of multiplying these numbers by 10^6 and then multiplying them and later dividing them by 10^12.
I also thought of multiplying these numbers using arrays to store their digits and later converting them to decimal. But this also appears cumbersome and may not yield correct result.
Is there an alternate easier way to do this?

I thought of multiplying these numbers by 10^6 and then multiplying them and later dividing them by 10^12.
This would only achieve further loss of accuracy. In floating-point, large numbers are represented approximately just like small numbers are. Making your numbers bigger only means you are doing 19999 multiplications (and one division) instead of 9999 multiplications; it does not magically give you more significant digits.
This manipulation would only be useful if it prevented the partial product to reach into subnormal territory (and in this case, multiplying by a power of two would be recommended to avoid loss of accuracy due to the multiplication). There is no indication in your question that this happens, no example data set, no code, so it is only possible to provide the generic explanation below:
Floating-point multiplication is very well behaved when it does not underflow or overflow. At the first order, you can assume that relative inaccuracies add up, so that multiplying 10000 values produces a result that's 9999 machine epsilons away from the mathematical result in relative terms(*).
The solution to your problem as stated (no code, no data set) is to use a wider floating-point type for the intermediate multiplications. This solves both the problems of underflow or overflow and leaves you with a relative accuracy on the end result such that once rounded to the original floating-point type, the product is wrong by at most one ULP.
Depending on your programming language, such a wider floating-point type may be available as long double. For 10000 multiplications, the 80-bit “extended double” format, widely available in x86 processors, would improve things dramatically and you would barely see any performance difference, as long as your compiler does map this 80-bit format to a floating-point type. Otherwise, you would have to use a software implementation such as MPFR's arbitrary-precision floating-point format or the double-double format.
(*) In reality, relative inaccuracies compound, so that the real bound on the relative error is more like (1 + ε)9999 - 1 where ε is the machine epsilon. Also, in reality, relative errors often cancel each other, so that you can expect the actual relative error to grow like the square root of the theoretical maximum error.

IEEE-754 floating point: Divide first or multiply first for best precision?

What's better if I want to preserve as much precision as possible in a calculation with IEEE-754 floating point values:
a = b * c / d
or
a = b / d * c
Is there a difference? If there is, does it depend on the magnitudes of the input values? And, if magnitude matters, how is the best ordering determined when general magnitudes of the values are known?

It depends on the magnitude of the values. Obviously if one divides by zero, all bets are off, but if a multiplication or division results in a denormal subsequent operations can lose precision.
You may find it useful to study Goldberg's seminal paper What Every Computer Scientist Should Know About Floating-Point Arithmetic which will explain things far better than any answer you're likely to receive here. (Goldberg was one of the original authors of IEEE-754.)

Assuming that none of the operations would yield an overflow or an underflow, and your input values have uniformly distributed significands, then this is equivalent. Well, I suppose that to have a rigorous proof, one should do an exhaustive test (probably not possible in practice for double precision since there are 2^156 inputs), but if there is a difference in the average error, then it is tiny. I could try in low precisions with Sipe.
In any case, in the absence of overflow/underflow, only the exact values of the significands matter, not the exponents.
However if the result a is added to (or subtracted from) another expression and not reused, then starting with the division may be more interesting since you can group the multiplication with the following addition by using a FMA (thus with a single rounding).

Integer division, or float multiplication?

If one has to calculate a fraction of a given int value, say:
int j = 78;
int i = 5* j / 4;
Is this faster than doing:
int i = 1.25*j; // ?
If it is, is there a conversion factor one could use to decide which to use, as in how many int divisions can be done in the same time a one float multiplication?
Edit: I think the comments make it clear that the floating point math will be slower, but the question is, by how much? If I need to replace each float multiplication by N int divisions, for what N will this not be worth it anymore?

You've said all the values are dynamic, which makes a difference. For the specific values 5 * j / 4, the integer operations are going to be blindingly fast, because pretty much the worst case is that the compiler optimises them to two shifts and one addition, plus some messing around to cope with the possibility that j is negative. If the CPU can do better (single-cycle integer multiplication or whatever) then the compiler typically knows about it. The limits of compilers' abilities to optimize this kind of thing basically come when you're compiling for a wide family of CPUs (generating lowest-common-denominator ARM code, for example), where the compiler doesn't really know much about the hardware and therefore can't always make good choices.
I suppose that if a and b are fixed for a while (but not known at compile time), then it's possible that computing k = double(a) / b once and then int(k * x) for many different values of x, might be faster than computing a * x / b for many different values of x. I wouldn't count on it.
If all the values vary each time, then it seems unlikely that the floating-point division to compute the 1.25, followed by floating-point multiplication, is going to be any faster than the integer multiplication followed by integer division. But you never know, test it.
It's not really possible to give simple relative timings for this on modern processors, it really depends a lot on the surrounding code. The main costs in your code often aren't the "actual" ops: it's "invisible" stuff like instruction pipelines stalling on dependencies, or spilling registers to stack, or function call overhead. Whether or not the function that does this work can be inlined might easily make more difference than how the function actually does it. As far as definitive statements of performance are concerned you can basically test real code or shut up. But the chances are that if your values start as integers, doing integer ops on them is going to be faster than converting to double and doing a similar number of double ops.

It is impossible to answer this question out of context. Additionally 5*j/4 does not generally produce the same result as (int) (1.25*j), due to properties of integer and floating-point arithmetic, including rounding and overflow.
If your program is doing mostly integer operations, then the conversion of j to floating point, multiplication by 1.25, and conversion back to integer might be free because it uses floating-point units that are not otherwise engaged.
Alternatively, on some processors, the operating system might mark the floating-point state to be invalid, so that the first time a process uses it, there is an exception, the operating system saves the floating-point registers (which contain values from another process), restores or initializes the registers for your process, and returns from the exception. This would take a great deal of time, relative to normal instruction execution.
The answer also depends on characteristics of the specific processor model the program is executing on, as well as the operating system, how the compiler translates the source into assembly, and possibly even what other processes on the system are doing.
Also, the performance difference between 5*j/4 and (int) (1.25*j) is most often too small to be noticeable in a program unless it or operations like it are repeated a great many times. (And, if they are, there may be huge benefits to vectorizing the code, that is, using the Single Instruction Multiple Data [SIMD] features of many modern processors to perform several operations at once.)

In your case, 5*j/4 would be much faster than 1.25*j because division by powers of 2 can be easily manipulated by a right shift, and 5*j can be done by a single instruction on many architectures such as LEA on x86, or ADD with shift on ARM. Most others would require at most 2 instructions like j + (j >> 2) but that way it's still probably faster than a floating-point multiplication. Moreover by doing int i = 1.25*j you need 2 conversions from int to double and back, and 2 cross-domain data movements which is generally very costly
In other cases when the fraction is not representable in binary floating-point (like 3*j/10) then using int multiply/divide would be more correct (because 0.3 isn't exactly 0.3 in floating-point), and most probably faster (because the compiler can optimize out division by a constant by converting it to a multiplication)
In cases that i and j are of a floating-point type, multiplying by another floating-point value might be faster. Because moving values between float and int domains takes time and conversion between int and float also takes time as I said above
An important difference is that 5*j/4 will overflow if j is too large, but 1.25*j doesn't
That said, there's no general answer for the questions "which is faster" and "how much faster", as it depends on a specific architecture and in a specific context. You must measure on your system and decide. But if an expression is done repeatedly to a lot of values then it's time to move to SIMD
See also
Why is int * float faster than int / int?
Should I use multiplication or division?
Floating point division vs floating point multiplication

How can I get consistent program behavior when using floats?

I am writing a simulation program that proceeds in discrete steps. The simulation consists of many nodes, each of which has a floating-point value associated with it that is re-calculated on every step. The result can be positive, negative or zero.
In the case where the result is zero or less something happens. So far this seems straightforward - I can just do something like this for each node:
if (value <= 0.0f) something_happens();
A problem has arisen, however, after some recent changes I made to the program in which I re-arranged the order in which certain calculations are done. In a perfect world the values would still come out the same after this re-arrangement, but because of the imprecision of floating point representation they come out very slightly different. Since the calculations for each step depend on the results of the previous step, these slight variations in the results can accumulate into larger variations as the simulation proceeds.
Here's a simple example program that demonstrates the phenomena I'm describing:
float f1 = 0.000001f, f2 = 0.000002f;
f1 += 0.000004f; // This part happens first here
f1 += (f2 * 0.000003f);
printf("%.16f\n", f1);
f1 = 0.000001f, f2 = 0.000002f;
f1 += (f2 * 0.000003f);
f1 += 0.000004f; // This time this happens second
printf("%.16f\n", f1);
The output of this program is
0.0000050000057854
0.0000050000062402
even though addition is commutative so both results should be the same. Note: I understand perfectly well why this is happening - that's not the issue. The problem is that these variations can mean that sometimes a value that used to come out negative on step N, triggering something_happens(), now may come out negative a step or two earlier or later, which can lead to very different overall simulation results because something_happens() has a large effect.
What I want to know is whether there is a good way to decide when something_happens() should be triggered that is not going to be affected by the tiny variations in calculation results that result from re-ordering operations so that the behavior of newer versions of my program will be consistent with the older versions.
The only solution I've so far been able to think of is to use some value epsilon like this:
if (value < epsilon) something_happens();
but because the tiny variations in the results accumulate over time I need to make epsilon quite large (relatively speaking) to ensure that the variations don't result in something_happens() being triggered on a different step. Is there a better way?
I've read this excellent article on floating point comparison, but I don't see how any of the comparison methods described could help me in this situation.
Note: Using integer values instead is not an option.
Edit the possibility of using doubles instead of floats has been raised. This wouldn't solve my problem since the variations would still be there, they'd just be of a smaller magnitude.

I've worked with simulation models for 2 years and the epsilon approach is the sanest way to compare your floats.

Generally, using suitable epsilon values is the way to go if you need to use floating point numbers. Here are a few things which may help:
If your values are in a known range you and you don't need divisions you may be able to scale the problem and use exact operations on integers. In general, the conditions don't apply.
A variation is to use rational numbers to do exact computations. This still has restrictions on the operations available and it typically has severe performance implications: you trade performance for accuracy.
The rounding mode can be changed. This can be use to compute an interval rather than an individual value (possibly with 3 values resulting from round up, round down, and round closest). Again, it won't work for everything but you may get an error estimate out of this.
Keeping track of the value and a number of operations (possible multiple counters) may also be used to estimate the current size of the error.
To possibly experiment with different numeric representations (float, double, interval, etc.) you might want to implement your simulation as templates parameterized for the numeric type.
There are many books written on estimating and minimizing errors when using floating point arithmetic. This is the topic of numerical mathematics.
Most cases I'm aware of experiment briefly with some of the methods mentioned above and conclude that the model is imprecise anyway and don't bother with the effort. Also, doing something else than using float may yield better result but is just too slow, even using double due to the doubled memory footprint and the smaller opportunity of using SIMD operations.

I recommend that you single step - preferably in assembly mode - through the calculations while doing the same arithmetic on a calculator. You should be able to determine which calculation orderings yield results of lesser quality than you expect and which that work. You will learn from this and probably write better-ordered calculations in the future.
In the end - given the examples of numbers you use - you will probably need to accept the fact that you won't be able to do equality comparisons.
As to the epsilon approach you usually need one epsilon for every possible exponent. For the single-precision floating point format you would need 256 single precision floating point values as the exponent is 8 bits wide. Some exponents will be the result of exceptions but for simplicity it is better to have a 256 member vector than to do a lot of testing as well.
One way to do this could be to determine your base epsilon in the case where the exponent is 0 i e the value to be compared against is in the range 1.0 <= x < 2.0. Preferably the epsilon should be chosen to be base 2 adapted i e a value that can be exactly represented in a single precision floating point format - that way you know exactly what you are testing against and won't have to think about rounding problems in the epsilon as well. For exponent -1 you would use your base epsilon divided by two, for -2 divided by 4 and so on. As you approach the lowest and the highest parts of the exponent range you gradually run out of precision - bit by bit - so you need to be aware that extreme values can cause the epsilon method to fail.

If it absolutely has to be floats then using an epsilon value may help but may not eliminate all problems. I would recommend using doubles for the spots in the code you know for sure will have variation.
Another way is to use floats to emulate doubles, there are many techniques out there and the most basic one is to use 2 floats and do a little bit of math to save most of the number in one float and the remainder in the other (saw a great guide on this, if I find it I'll link it).

Certainly you should be using doubles instead of floats. This will probably reduce the number of flipped nodes significantly.
Generally, using an epsilon threshold is only useful when you are comparing two floating-point number for equality, not when you are comparing them to see which is bigger. So (for most models, at least) using epsilon won't gain you anything at all -- it will just change the set of flipped nodes, it wont make that set smaller. If your model itself is chaotic, then it's chaotic.

Preventing Rounding Errors

I was just reading about rounding errors in C++. So, if I'm making a math intense program (or any important calculations) should I just drop floats all together and use only doubles or is there an easier way to prevent rounding errors?

Obligatory lecture: What Every Programmer Should Know About Floating-Point Arithmetic.
Also, try reading IEEE Floating Point standard.
You'll always get rounding errors. Unless you use an infinite arbitrary precision library, like gmplib. You have to decide if your application really needs this kind of effort.
Or, you could use integer arithmetic, converting to floats only when needed. This is still hard to do, you have to decide if it's worth it.
Lastly, you can use float or double taking care not to make assumption about values at the limit of representation's precision. I'd wish this Valgrind plugin was implemented (grep for float)...

The rounding errors are normally very insignificant, even using floats. Mathematically-intense programs like games, which do very large numbers of floating-point computations, often still use single-precision.

This might work if your highest number is less than 10 billion and you're using C++ double precision.
if ( ceil(10000*(x + 0.00001)) > ceil(100000*(x - 0.00001))) {
x = ceil(10000*(x + 0.00004)) / 10000;
}
This should allow at least the last digit to be off +/- 9. I'm assuming dividing by 1000 will always just move a decimal place. If not, then maybe it could be done in binary.
You would have to apply it after every operation that is not +, -, *, or a comparison. For example, you can't do two divisions in the same formula because you'd have to apply it to each division.
If that doesn't work, you could work in integers by scaling the numbers up and always use integer division. If you need advanced functions maybe there is a package that does deterministic integer math. Integer division is required in a lot of financial settings because of round off error being subject to exploit like in the movie "The Office".

We Keep Coding

c++ django amazon-web-services regex python-2.7 google-cloud-platform list unit-testing opengl ember.js