How do I handle values close to zero in C++?

I am trying to code an iterative function which takes an initial
double t = /*formula 1*/;
and then computes
for (auto i = 0; i < bigNumber; ++i)
{
temp = /*formula 2*/;
t = t*temp;
}
This works fine, except in the cases where the initial t is so small that C++ automatically sets it equal to zero (it is NOT actually supposed to be zero).
Then of course t will forever remain zero since we multiply it by itself, and that's the problem.
I tried solving this by setting t equal to some very small, but non-zero, number in case C++ had set it to zero, but this doesn't work, because then, I end up with the opposite problem, as t eventually blows up, once we have iterated it enough times.
How do I solve this problem?
Possibly worth mentioning:
The first formula (formula 1) involves stuff like exp(-verybignumber) and the second formula involves stuff like pow(i, -1), meaning it becomes very small with higher iterations.

Floating-point arithmetic isn't trivial, as you just discovered. This is not really related to C++, but to the IEEE 754 standard.
One of the things you need to ensure is that you stay within the normal numbers. That is, ensure your values throughout your computation do not get too small or too large.
In some cases, this is easy and maybe rescaling the input data is enough. In other cases, maybe you have to rethink your equations (steps) to avoid this.
Sometimes you can simply get away using a bigger type, e.g. long double or even __float128 (quad, check libquadmath).
Other solutions are to employ arbitrary-precision numbers (use a library like GMP and MPFR; do not attempt to do it yourself as a beginner) or even symbolic computation. It all depends on what performance you require.
Note that there are many other pitfalls when dealing with floating-point arithmetic.
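One way to "rethink your equations" for a product like the one in the question is to carry log(t) through the loop instead of t itself, so that a value such as exp(-800) is stored as -800 rather than underflowing to zero. The following is only a sketch under assumptions: log_initial and log_factor are hypothetical stand-ins for formula 1 and formula 2 (which the question does not spell out), and every factor is assumed to be positive.

```cpp
#include <cmath>

// Hypothetical stand-in for formula 1: t0 = exp(-big), so log(t0) = -big.
double log_initial(double big) { return -big; }

// Hypothetical stand-in for formula 2: a step factor (1 + 1/i), taken as a log.
double log_factor(int i) { return std::log1p(1.0 / i); }

// Accumulate log(t) = log(t0) + sum of log(factor_i) instead of multiplying.
double log_product(double big, int steps) {
    double log_t = log_initial(big);
    for (int i = 1; i <= steps; ++i) {
        log_t += log_factor(i);
    }
    return log_t;  // apply std::exp only if the final value is representable
}
```

With these stand-ins, std::exp(-800.0) is already 0.0 as a double, while log_product(800.0, 999) stays an ordinary finite number close to log(1000) - 800.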

Related

Comparing Floating Point Nos - Google Test Framework

While going through this post on SO, I saw that the user @skrebbel stated that the Google testing framework does a good and fast job of comparing floats and doubles. So I wrote the following code to check this, and apparently I am missing something, since I was expecting to enter the "almost equal" section. This is my code:
float left = 0.1234567;
float right= 0.1234566;
const FloatingPoint<float> lhs(left), rhs(right);
if (lhs.AlmostEquals(rhs))
{
std::cout << "EQUAL"; //Shouldnt it have entered here ?
}
Any suggestions would be appreciated.
You can use
ASSERT_NEAR(val1, val2, abs_error);
where abs_error is the acceptable difference that you choose yourself (say, 0.0000001) if the default tolerance is too small. See https://github.com/google/googletest/blob/master/googletest/docs/advanced.md#floating-point-comparison
Your left and right are not “almost equal” because they are too far apart, farther than the default tolerance of AlmostEquals. The code in one of the answers in the question you linked to shows a tolerance of 4 ULP, but your numbers are 14 ULP apart (using IEEE 754 32-bit binary and correctly rounding software). (An ULP is the minimum increment of the floating-point value. It is small for floating-point numbers of small magnitude and large for large numbers, so it is approximately relative to the magnitude of the numbers.)
You should never perform any floating-point comparison without understanding what errors may be in the values you are comparing and what comparison you are performing.
People often misstate that you cannot test floating-point values for equality. This is false; executing a == b is a perfect operation. It returns true if and only if a is equal to b (that is, a and b are numbers with exactly the same value). The actual problem is that they are trying to calculate a correct function given incorrect input. == is a function: It takes two inputs and returns a value. Obviously, if you give any function incorrect inputs, it may return an incorrect result. So the problem here is not floating-point comparison; it is incorrect inputs. You cannot generally calculate a sum, a product, a square root, a logarithm, or any other function correctly given incorrect input. Therefore, when using floating-point, you must design an algorithm to work with approximate values (or, in special cases, use great care to ensure no errors are introduced).
Often people try to work around errors in their floating-point values by accepting as equal numbers that are slightly different. This decreases false negatives (indications of inequality due to prior computing errors) at the expense of increasing false positives (indications of equality caused by lax acceptance). Whether this exchange of one kind of error for another is acceptable depends on the application. There is no general solution, which is why functions like AlmostEquals are generally bad.
The errors in floating-point values are the results of preceding operations and values. These errors can range from zero to infinity, depending on circumstances. Because of this, one should never simply accept the default tolerance of a function such as AlmostEquals. Instead, one should calculate the tolerance, which is specific to their applications, needs, and computations, and use that calculated tolerance (or not use a comparison at all).
Another problem is that functions such as AlmostEquals are often written using tolerances that are specified relative to the values being compared. However, the errors in the values may have been affected by intermediate values of vastly different magnitude, so the final error might be a function of data that is not present in the values being compared.
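A small sketch of that last point: the error in a result can come from an intermediate value far larger than anything visible at the end. The function name is hypothetical, and the example assumes float is IEEE 754 single precision.

```cpp
// Returns (big + small) - big, computed in single precision.
float add_then_subtract(float big, float small) {
    volatile float sum = big + small;  // volatile: force rounding to float here
    return sum - big;
}
```

add_then_subtract(1e8f, 1.0f) yields 0.0f although the exact answer is 1: the spacing between floats near 1e8 is 8, so the 1 is lost in the intermediate sum. A tolerance relative to the final values (0 and 1) says nothing about an error that was determined by the magnitude 1e8.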
“Approximate” floating-point comparisons may be acceptable in code that is testing other code because most bugs are likely to cause large errors, so a lax acceptance of equality will allow good code to continue but will report bugs in most bad code. However, even in this situation, you must set the expected result and the permitted error tolerance appropriately. The AlmostEquals code appears to hard-code the error tolerance.
(Not sure if this 100% applies to the original question but this is what I came for when I stumbled upon it)
There also exist ASSERT_FLOAT_EQ and EXPECT_FLOAT_EQ (or the corresponding versions for double) which you can use if you don't want to worry about tolerable errors yourself.
Docs: https://github.com/google/googletest/blob/master/docs/reference/assertions.md#floating-point-comparison-floating-point

How can I get consistent program behavior when using floats?

I am writing a simulation program that proceeds in discrete steps. The simulation consists of many nodes, each of which has a floating-point value associated with it that is re-calculated on every step. The result can be positive, negative or zero.
In the case where the result is zero or less something happens. So far this seems straightforward - I can just do something like this for each node:
if (value <= 0.0f) something_happens();
A problem has arisen, however, after some recent changes I made to the program in which I re-arranged the order in which certain calculations are done. In a perfect world the values would still come out the same after this re-arrangement, but because of the imprecision of floating point representation they come out very slightly different. Since the calculations for each step depend on the results of the previous step, these slight variations in the results can accumulate into larger variations as the simulation proceeds.
Here's a simple example program that demonstrates the phenomena I'm describing:
float f1 = 0.000001f, f2 = 0.000002f;
f1 += 0.000004f; // This part happens first here
f1 += (f2 * 0.000003f);
printf("%.16f\n", f1);
f1 = 0.000001f, f2 = 0.000002f;
f1 += (f2 * 0.000003f);
f1 += 0.000004f; // This time this happens second
printf("%.16f\n", f1);
The output of this program is
0.0000050000057854
0.0000050000062402
even though, mathematically, addition is commutative and associative, so both results should be the same. Note: I understand perfectly well why this is happening - that's not the issue. The problem is that these variations can mean that sometimes a value that used to come out negative on step N, triggering something_happens(), now may come out negative a step or two earlier or later, which can lead to very different overall simulation results because something_happens() has a large effect.
What I want to know is whether there is a good way to decide when something_happens() should be triggered that is not going to be affected by the tiny variations in calculation results that result from re-ordering operations so that the behavior of newer versions of my program will be consistent with the older versions.
The only solution I've so far been able to think of is to use some value epsilon like this:
if (value < epsilon) something_happens();
but because the tiny variations in the results accumulate over time I need to make epsilon quite large (relatively speaking) to ensure that the variations don't result in something_happens() being triggered on a different step. Is there a better way?
I've read this excellent article on floating point comparison, but I don't see how any of the comparison methods described could help me in this situation.
Note: Using integer values instead is not an option.
Edit the possibility of using doubles instead of floats has been raised. This wouldn't solve my problem since the variations would still be there, they'd just be of a smaller magnitude.
I've worked with simulation models for 2 years and the epsilon approach is the sanest way to compare your floats.
Generally, using suitable epsilon values is the way to go if you need to use floating point numbers. Here are a few things which may help:
If your values are in a known range and you don't need divisions, you may be able to scale the problem and use exact operations on integers. In general, these conditions don't apply.
A variation is to use rational numbers to do exact computations. This still has restrictions on the operations available and it typically has severe performance implications: you trade performance for accuracy.
The rounding mode can be changed. This can be used to compute an interval rather than an individual value (possibly with 3 values resulting from round up, round down, and round closest). Again, it won't work for everything but you may get an error estimate out of this.
Keeping track of the value and a number of operations (possible multiple counters) may also be used to estimate the current size of the error.
To possibly experiment with different numeric representations (float, double, interval, etc.) you might want to implement your simulation as templates parameterized for the numeric type.
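The rounding-mode idea from the list above can be sketched with <cfenv>. This is an illustration under assumptions: Interval and add_interval are hypothetical names, the platform is assumed to support FE_DOWNWARD and FE_UPWARD, and the volatile variables are a workaround because not every compiler honors #pragma STDC FENV_ACCESS ON.

```cpp
#include <cfenv>

struct Interval { float lo; float hi; };

// Adds a and b twice, once rounding down and once rounding up, so the exact
// sum lies somewhere in [lo, hi].
Interval add_interval(float a, float b) {
    volatile float x = a, y = b;      // volatile: keep the additions at run time
    const int saved = std::fegetround();
    std::fesetround(FE_DOWNWARD);
    volatile float lo = x + y;        // volatile store: finish before next call
    std::fesetround(FE_UPWARD);
    volatile float hi = x + y;
    std::fesetround(saved);           // restore the caller's rounding mode
    return {lo, hi};
}
```

For an exactly representable sum such as 1.0f + 2.0f both bounds coincide; for an inexact one such as 0.1f + 0.2f the width of the interval gives an error estimate for that single operation.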
There are many books written on estimating and minimizing errors when using floating point arithmetic. This is the topic of numerical mathematics.
Most projects I'm aware of experiment briefly with some of the methods mentioned above, conclude that the model is imprecise anyway, and don't bother with the effort. Also, doing something other than using float may yield better results but is just too slow, even using double, due to the doubled memory footprint and the smaller opportunity for using SIMD operations.
I recommend that you single step - preferably in assembly mode - through the calculations while doing the same arithmetic on a calculator. You should be able to determine which calculation orderings yield results of lesser quality than you expect and which ones work. You will learn from this and probably write better-ordered calculations in the future.
In the end - given the examples of numbers you use - you will probably need to accept the fact that you won't be able to do equality comparisons.
As to the epsilon approach you usually need one epsilon for every possible exponent. For the single-precision floating point format you would need 256 single precision floating point values as the exponent is 8 bits wide. Some exponents will be the result of exceptions but for simplicity it is better to have a 256 member vector than to do a lot of testing as well.
One way to do this could be to determine your base epsilon in the case where the exponent is 0, i.e. the value to be compared against is in the range 1.0 <= x < 2.0. Preferably the epsilon should be chosen to be base 2 adapted, i.e. a value that can be exactly represented in a single precision floating point format - that way you know exactly what you are testing against and won't have to think about rounding problems in the epsilon as well. For exponent -1 you would use your base epsilon divided by two, for -2 divided by 4 and so on. As you approach the lowest and the highest parts of the exponent range you gradually run out of precision - bit by bit - so you need to be aware that extreme values can cause the epsilon method to fail.
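The scaling rule just described (base epsilon for exponent 0, halved for exponent -1, and so on) can be sketched as follows. Note that this computes the scaled epsilon on the fly with std::ilogb and std::ldexp rather than from the 256-entry table suggested above; scaled_epsilon is a hypothetical name, and x is assumed to be finite and non-zero.

```cpp
#include <cmath>

// base_eps is the tolerance chosen for 1.0 <= |x| < 2.0 (binary exponent 0).
// Assumes x is finite and non-zero.
float scaled_epsilon(float x, float base_eps) {
    const int exponent = std::ilogb(x);     // unbiased binary exponent of x
    return std::ldexp(base_eps, exponent);  // base_eps * 2^exponent
}
```

With a power-of-two base epsilon such as std::ldexp(1.0f, -20), the scaled values stay exactly representable: 0.75f gets half the base epsilon and 5.0f gets four times it.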
If it absolutely has to be floats then using an epsilon value may help but may not eliminate all problems. I would recommend using doubles for the spots in the code you know for sure will have variation.
Another way is to use floats to emulate doubles, there are many techniques out there and the most basic one is to use 2 floats and do a little bit of math to save most of the number in one float and the remainder in the other (saw a great guide on this, if I find it I'll link it).
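The basic building block of the "two floats" approach is an error-free addition that returns the rounded sum together with the part that was rounded away. Below is a sketch of Knuth's TwoSum; FloatPair and two_sum are hypothetical names, and the code assumes float arithmetic is evaluated in single precision without value-changing optimizations (no excess precision, no -ffast-math).

```cpp
struct FloatPair { float hi; float lo; };

// Knuth's TwoSum: hi is the rounded sum and lo is the rounding error, so that
// hi + lo equals a + b exactly (barring overflow).
FloatPair two_sum(float a, float b) {
    const float hi = a + b;
    const float b_virtual = hi - a;          // the part of b that made it into hi
    const float a_virtual = hi - b_virtual;  // the part of a that made it into hi
    const float lo = (a - a_virtual) + (b - b_virtual);
    return {hi, lo};
}
```

For example, two_sum(1.0f, 1e-8f) gives hi == 1.0f and lo == 1e-8f: the small term that a plain float addition would drop is kept in the second float.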
Certainly you should be using doubles instead of floats. This will probably reduce the number of flipped nodes significantly.
Generally, using an epsilon threshold is only useful when you are comparing two floating-point numbers for equality, not when you are comparing them to see which is bigger. So (for most models, at least) using epsilon won't gain you anything at all -- it will just change the set of flipped nodes, it won't make that set smaller. If your model itself is chaotic, then it's chaotic.

Strategy for dealing with floating point inaccuracy

Is there a general best practice strategy for dealing with floating point inaccuracy?
The project that I'm working on tried to solve them by wrapping everything in a Unit class which holds the floating point value and overloads the operators. Numbers are considered equal if they are "close enough," and comparisons like > or < are done by comparing with a slightly lower or higher value.
I understand the desire to encapsulate the logic of handling such floating point errors. But given that this project has had two different implementations (one based on the ratio of the numbers being compared and one based on the absolute difference) and I've been asked to look at the code because it's not doing the right thing, the strategy seems to be a bad one.
So what is the best strategy to make sure you handle all of the floating point inaccuracy in a program?
You want to keep data as dumb as possible, generally. Behavior and the data are two concerns that should be kept separate.
The best way is to not have unit classes at all, in my opinion. If you have to have them, then avoid overloading operators unless it has to work one way all the time. Usually it doesn't, even if you think it does. As mentioned in the comments, it breaks strict weak ordering for instance.
I believe the sane way to handle it is to create some concrete comparators that aren't tied to anything else.
struct RatioCompare {
bool operator()(float lhs, float rhs) const;
};
struct EpsilonCompare {
bool operator()(float lhs, float rhs) const;
};
People writing algorithms can then use these in their containers or algorithms. This allows code reuse without demanding that anyone uses a specific strategy.
std::sort(prices.begin(), prices.end(), EpsilonCompare());
std::sort(prices.begin(), prices.end(), RatioCompare());
Usually people trying to overload operators to avoid these things will offer complaints about "good defaults", etc. If the compiler tells you immediately that there isn't a default, it's easy to fix. If a customer tells you that something isn't right somewhere in your million lines of price calculations, that is a little harder to track down. This can be especially dangerous if someone changed the default behavior at some point.
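The answer above only declares the two comparators. One possible way to fill them in is sketched here; the member names and the default tolerance of 1e-5f are assumptions, not values from the original. Both treat values within the tolerance as equivalent, and because that equivalence is not transitive, passing them to std::sort is only safe when the data contains no chains of near-ties.

```cpp
#include <algorithm>
#include <cmath>

// "Less than" only when rhs exceeds lhs by more than a fixed absolute tolerance.
struct EpsilonCompare {
    float epsilon = 1e-5f;  // hypothetical default, choose per application
    bool operator()(float lhs, float rhs) const { return lhs < rhs - epsilon; }
};

// "Less than" only when the gap exceeds a fraction of the larger magnitude.
struct RatioCompare {
    float ratio = 1e-5f;    // hypothetical default, choose per application
    bool operator()(float lhs, float rhs) const {
        const float scale = std::max(std::fabs(lhs), std::fabs(rhs));
        return lhs < rhs - ratio * scale;
    }
};
```

Because neither struct is tied to a value class, the call site states which tolerance policy it wants, which is the point the answer makes about avoiding hidden defaults.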
Check comparing floating point numbers and this post on DaniWeb and this on SO.
Both techniques are not good. See this article.
Google Test is a framework for writing C++ tests on a variety of platforms.
gtest.h contains the AlmostEquals function.
// Returns true iff this number is at most kMaxUlps ULP's away from
// rhs. In particular, this function:
//
// - returns false if either number is (or both are) NAN.
// - treats really large numbers as almost equal to infinity.
// - thinks +0.0 and -0.0 are 0 ULP's apart.
bool AlmostEquals(const FloatingPoint& rhs) const {
// The IEEE standard says that any comparison operation involving
// a NAN must return false.
if (is_nan() || rhs.is_nan()) return false;
return DistanceBetweenSignAndMagnitudeNumbers(u_.bits_, rhs.u_.bits_)
<= kMaxUlps;
}
Google implementation is good, fast and platform-independent.
A small documentation is here.
To me floating point errors are essentially those which on an x86 would lead to a floating point exception (assuming the coprocessor has that interrupt enabled). A special case is the "inexact" exception, i.e. when the result was not exactly representable in the floating point format (such as when dividing 1 by 3). Newbies not yet at home in the floating-point world will expect exact results and will consider this case an error.
As I see it there are several strategies available.
Early data checking such that bad values are identified and handled when they enter the software. This lessens the need for testing during the floating operations themselves which should improve performance.
Late data checking such that bad values are identified immediately before they are used in actual floating point operations. Should lead to lower performance.
Debugging with floating point exception interrupts enabled. This is probably the fastest way to gain a deeper understanding of floating point issues during the development process.
to name just a few.
When I wrote a proprietary database engine over twenty years ago using an 80286 with an 80287 coprocessor I chose a form of late data checking and using x87 primitive operations. Since floating point operations were relatively slow I wanted to avoid doing floating point comparisons every time I loaded a value (some of which would cause exceptions). To achieve this my floating point (double precision) values were unions with unsigned integers such that I would test the floating point values using x86 operations before the x87 operations would be called upon. This was cumbersome but the integer operations were fast and when the floating point operations came into action the floating point value in question would be ready in the cache.
A typical C sequence (floating point division of two matrices) looked something like this:
// calculate source and destination pointers
type1=npx_load(src1pointer);
if (type1!=UNKNOWN) /* x87 stack contains negative, zero or positive value */
{
type2=npx_load(src2pointer);
if (!(type2==POSITIVE_NOT_0 || type2==NEGATIVE))
{
if (type2==ZERO) npx_pop();
npx_pop(); /* remove src1 value from stack since there won't be a division */
type1=UNKNOWN;
}
else npx_divide();
}
if (type1==UNKNOWN) npx_load_0(); /* x87 stack is empty so load zero */
npx_store(dstpointer); /* store either zero (from prev statement) or quotient as result */
npx_load would load the value onto the top of the x87 stack, provided it was valid. Otherwise the top of the stack would be empty. npx_pop simply removes the value currently at the top of the x87 stack. BTW "npx" is an abbreviation for "Numeric Processor eXtension" as it was sometimes called.
The method chosen was my way of handling floating-point issues stemming from my own frustrating experiences at trying to get the coprocessor solution to behave in a predictable manner in an application.
For sure this solution led to overhead but a pure
*dstpointer = *src1pointer / *src2pointer;
was out of the question since it didn't contain any error handling. The extra cost of this error handling was more than made up for by how the pointers to the values were prepared. Also, the 99% case (both values valid) is quite fast so if the extra handling for the other cases is slower, so what?

Need pow(-1,1.2) to be 1

I am using math.h with GCC and GSL. I was wondering how to get this to evaluate?
I was hoping that the pow function would recognize pow(-1,1.2) as ((-1)^6)^(1/5). But it doesn't.
Does anybody know of a c++ library that will recognize these? Perhaps somebody has a decomposition routine they could share.
Mathematically, pow(-1, 1.2) is simply not defined in the real numbers. There are no real powers of negative numbers with fractional exponents, and I hope there is no library that will simply return some arbitrary value for such an expression. Otherwise, would you also expect things like
pow(-1, 0.5) = ((-1)^2)^(1/4) = 1
which obviously isn't desirable?
Moreover, the floating point number 1.2 isn't even exactly equal to 6/5. The closest double precision number to 1.2 is
1.1999999999999999555910790149937383830547332763671875
Given this, what result would you expect now for pow(-1, 1.2)?
If you want to raise negative numbers to powers, especially fractional powers, use complex exponentiation: cpow() from <complex.h> in C, or std::pow on std::complex values from <complex> in C++.
It seems like you're looking for pow(abs(x), y).
Explanation: you seem to be thinking in terms of
x^y = (x^N)^(y/N)
If we choose N = 2, then you have
(x^2)^(y/2) = ((x^2)^(1/2))^y
But
(x^2)^(1/2) = |x|
Substituting gives
|x|^y
This is a stretch, because the above manipulations only work for non-negative x, but you're the one who chose to use that assumption.
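In code, the difference between the real-valued pow and the |x|^y reading looks like this. pow_abs is a hypothetical helper name; the NaN result for a negative base with a non-integer exponent is what IEEE 754 conforming implementations of std::pow return.

```cpp
#include <cmath>

// Evaluate |x|^y, the quantity the derivation above arrives at.
double pow_abs(double x, double y) {
    return std::pow(std::fabs(x), y);
}
```

std::pow(-1.0, 1.2) reports a domain error and yields NaN, whereas pow_abs(-1.0, 1.2) is exactly 1.0.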
Sounds like you want to perform a complex power (cpow()) and then take the magnitude (abs()) of that after.
>>> abs(cmath.exp(1.2*cmath.log(-1)))
1.0
>>> abs(cmath.exp(1.2*cmath.log(-293.2834)))
913.57662451612202
pow(a,b) is often thought of, defined as, and implemented as exp(log(a)*b) where log(a) is the natural logarithm of a. log(a) is not defined for a<=0 in the real numbers. So you need to write a function with a special case for negative a and integer b and/or b=1/(some_integer). It's easy to special-case for integer b, but for b=1/(some_integer) it's prone to round-off problems, like Sven Marnach pointed out.
Maybe for your domain pow(-a,b) should always be -pow(a,b)? But then you'd just implement such a function, so I assume the question warrants more explanation.
Like duskwuff suggested, a much more robust and "mathematical" solution is to use complex functions log and exp, but it's much more "complex" (excuse my pun) than it seems on the surface (even though there's cpow function). And it'll be much slower if you have to compute a lot of pow()s.
Now there's an important catch with complex numbers that may or may not be relevant to your problem domain: when done right, the result of pow(a,b) is not one, but often a few complex numbers, but in the cases you care about, one of them will be complex number with nearly-zero imaginary part (it'll be non-zero due to roundoff errors) which you can simply ignore and/or not compute in your code.
To demonstrate it, consider what pow(-1,.5) is. It's a number X such that X^2==-1. Guess what? There are 2 such numbers: i and -i. Generally, pow(-1, 1/N) has exactly N solutions, although you're interested in only one of them.
If the imaginary part of all results of pow(a,b) is significant, it means you are passing wrong values. For single-precision floating point values in the range you describe, 1e-6*max(abs(a),abs(b)) would be a good starting point for defining the "significant enough" threshold. The extreme "wrong values" would be pow(-1,0.5) which would return 0 + 1i (0 in real part, 1 in imaginary part). Here the imaginary part is huge relative to the input and real part, so you know you screwed up your input values.
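Note, however, that a single-return-result implementation such as cpow() returns the principal value, which is not necessarily the one with the near-zero imaginary part: cpow(-1,0.3333) returns roughly 0.5+0.866i, not something like -1+0.000001i. So if you want the real root, you have to select it yourself (for an odd root of a negative a, that is -pow(-a,b)) rather than just taking the real part of what cpow() gives you.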
Use std::complex. Without that, the roots of unity don't make much sense. With it they make a whole lot of sense.
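A minimal sketch of the std::complex route, using only std::pow and std::abs from <complex>; complex_pow is a hypothetical wrapper name. std::pow on a complex base returns the principal value, so the magnitude behaves as in the cmath session above, but the result for a negative base is generally not a real number.

```cpp
#include <complex>

// Principal value of x^y for a real (possibly negative) x and a real y.
std::complex<double> complex_pow(double x, double y) {
    return std::pow(std::complex<double>(x, 0.0), y);
}
```

std::abs(complex_pow(-1.0, 1.2)) is 1 up to rounding, while complex_pow(-1.0, 1.0 / 3.0) is approximately 0.5 + 0.866i, the principal cube root rather than -1.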

Integer division algorithm

I was thinking about an algorithm in division of large numbers: dividing with remainder bigint C by bigint D, where we know the representation of C in base b, and D is of form b^k-1. It's probably the easiest to show it on an example. Let's try dividing C=21979182173 by D=999.
We write the number as sets of three digits: 21 979 182 173
We take sums (modulo 999) of consecutive sets, starting from the left: 21 001 183 356
We add 1 to those sets preceding the ones where we "went over 999": 22 001 183 356
Indeed, 21979182173/999=22001183 and remainder 356.
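The worked example can be sketched in code for the specific case b^k = 1000, D = 999. This is an illustration, not the poster's implementation: divide_by_999 is a hypothetical name, and the dividend is assumed to be already split into three-digit groups, most significant first. It relies on the identity r*1000 + c = r*999 + (r + c), so each quotient group is the previous running sum plus a carry of 0 or 1.

```cpp
#include <utility>
#include <vector>

// chunks: base-1000 digits of the dividend, most significant first, each in
// [0, 999]. Returns the base-1000 digits of the quotient and the remainder.
std::pair<std::vector<int>, int> divide_by_999(const std::vector<int>& chunks) {
    const int D = 999;
    std::vector<int> quotient;
    quotient.reserve(chunks.size());
    int running = 0;                          // running sum modulo 999
    for (int c : chunks) {
        const int sum = running + c;          // at most 998 + 999, so one carry
        const int carry = sum >= D ? 1 : 0;   // did we reach or pass 999?
        quotient.push_back(running + carry);  // previous sum, bumped on carry
        running = sum - carry * D;
    }
    return {quotient, running};
}
```

For the groups 21 979 182 173 this returns the quotient groups 0 22 1 183 (that is, 22001183) and the remainder 356, in a single pass over the groups.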
I've calculated the complexity and, if I'm not mistaken, the algorithm should work in O(n), n being the number of digits of C in base b representation. I've also done a very crude and unoptimized version of the algorithm (only for b=10) in C++, tested it against GMP's general integer division algorithm and it really does seem to fare better than GMP. I couldn't find anything like this implemented anywhere I looked, so I had to resort to testing it against general division.
I found several articles which discuss what seem to be quite similar matters, but none of them concentrate on actual implementations, especially in bases different than 2. I suppose that's because of the way numbers are internally stored, although the mentioned algorithm seems useful for, say, b=10, even taking that into account. I also tried contacting some other people, but, again, to no avail.
Thus, my question would be: is there an article or a book or something where the aforementioned algorithm is described, possibly discussing the implementations? If not, would it make sense for me to try and implement and test such an algorithm in, say, C/C++ or is this algorithm somehow inherently bad?
Also, I'm not a programmer and while I'm reasonably OK at programming, I admittedly don't have much knowledge of computer "internals". Thus, pardon my ignorance - it's highly possible there are one or more very stupid things in this post. Sorry once again.
Thanks a lot!
Further clarification of points raised in the comments/answers:
Thanks, everyone - as I didn't want to comment on all the great answers and advice with the same thing, I'd just like to address one point a lot of you touched on.
I am fully aware that working in bases 2^n is, generally speaking, clearly the most efficient way of doing things. Pretty much all bigint libraries use 2^32 or whatever. However, what if (and, I emphasize, it would be useful only for this particular algorithm!) we implement bigints as an array of digits in base b? Of course, we require b here to be "reasonable": b=10, the most natural case, seems reasonable enough. I know it's more or less inefficient both considering memory and time, taking into account how numbers are internally stored, but I have been able to, if my (basic and possibly somehow flawed) tests are correct, produce results faster than GMP's general division, which would give sense to implementing such an algorithm.
Ninefingers notices I'd have to use in that case an expensive modulo operation. I hope not: I can see if old+new crossed, say, 999, just by looking at the number of digits of old+new+1. If it has 4 digits, we're done. Even more, since old<999 and new<=999, we know that if old+new+1 has 4 digits (it can't have more), then, (old+new)%999 equals deleting the leftmost digit of (old+new+1), which I presume we can do cheaply.
Of course, I'm not disputing the obvious limitations of this algorithm, nor do I claim it can't be improved - it can only divide by a certain class of numbers and we have to know a priori the representation of the dividend in base b. However, for b=10, for instance, the latter seems natural.
Now, say we have implemented bignums as I outlined above. Say C=(a_1a_2...a_n) in base b and D=b^k-1. The algorithm (which could be probably much more optimized) would go like this. I hope there aren't many typos.
if k>n, we're obviously done
add a zero (i.e. a_0=0) at the beginning of C (just in case we try to divide, say, 9999 with 99)
l=n%k (mod for "regular" integers - shouldn't be too expensive)
old=(a_0...a_l) (the first set of digits, possibly with fewer than k digits)
for (i=l+1; i < n; i=i+k) (We will have floor(n/k) or so iterations)
    new=(a_i...a_(i+k-1))
    new=new+old (this is bigint addition, thus O(k))
    aux=new+1 (again, bigint addition - O(k) - which I'm not happy about)
    if aux has more than k digits
        delete first digit of aux
        old=old+1 (bigint addition once again)
        fill old with zeroes at the beginning so it has as many digits as it should
        (a_(i-k)...a_(i-1))=old (if i=l+1, (a_0...a_l)=old)
        new=aux
    fill new with zeroes at the beginning so it has as many digits as it should
    (a_i...a_(i+k-1))=new
    old=new
quot=(a_0...a_(n-k+1))
rem=new
There, thanks for discussing this with me - as I said, this does seem to me to be an interesting "special case" algorithm to try to implement, test and discuss, if nobody sees any fatal flaws in it. If it's something not widely discussed so far, even better. Please, let me know what you think. Sorry about the long post.
Also, just a few more personal comments:
@Ninefingers: I actually have some (very basic!) knowledge of how GMP works, what it does and of general bigint division algorithms, so I was able to understand much of your argument. I'm also aware GMP is highly optimized and in a way customizes itself for different platforms, so I'm certainly not trying to "beat it" in general - that seems about as fruitful as attacking a tank with a pointed stick. However, that's not the idea of this algorithm - it works in very special cases (which GMP does not appear to cover). On an unrelated note, are you sure general divisions are done in O(n)? The most I've seen done is M(n). (And that can, if I understand correctly, in practice (Schönhage–Strassen etc.) not reach O(n). Fürer's algorithm, which still doesn't reach O(n), is, if I'm correct, almost purely theoretical.)
@Avi Berger: This doesn't actually seem to be exactly the same as "casting out nines", although the idea is similar. However, the aforementioned algorithm should work all the time, if I'm not mistaken.
Your algorithm is a variation of a base 10 algorithm known as "casting out nines". Your example is using base 1000 and "casting out" 999's (one less than the base). This used to be taught in elementary school as way to do a quick check on hand calculations. I had a high school math teacher who was horrified to learn that it wasn't being taught anymore and filled us in on it.
Casting out 999's in base 1000 won't work as a general division algorithm. It will generate values that are congruent modulo 999 to the actual quotient and remainder - not the actual values. Your algorithm is a bit different and I haven't checked if it works, but it is based on effectively using base 1000 and the divisor being 1 less than the base. If you wanted to try it for dividing by 47, you would have to convert to a base 48 number system first.
Google "casting out nines" for more information.
Edit: I originally read your post a bit too quickly, and you do know of this as a working algorithm. As @Ninefingers and @Karl Bielefeldt have stated more clearly than me in their comments, what you aren't including in your performance estimate is the conversion into a base appropriate for the particular divisor at hand.
I feel the need to add to this based on my comment. This isn't an answer, but an explanation as to the background.
A bignum library uses what are called limbs - search for mp_limb_t in the gmp source, which are usually a fixed-size integer field.
When you do something like addition, one way (albeit inefficient) to approach it is to do this:
doublelimb r = limb_a + limb_b + carryfrompreviousiteration
This double-sized limb catches the overflow of limb_a + limb_b in the case that the sum is bigger than the limb size. So if we're using uint32_t as our limb type and the total reaches 2^32 or more, the overflow can be caught.
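A minimal sketch of that double-limb approach, assuming 32-bit limbs stored least significant first (the function name and signature are illustrative, not GMP's API):

```c
#include <stddef.h>
#include <stdint.h>

/* Adds two n-limb numbers (least significant limb first) into r.
   Returns the final carry-out (0 or 1). */
static uint32_t add_limbs(uint32_t *r, const uint32_t *a,
                          const uint32_t *b, size_t n)
{
    uint32_t carry = 0;
    for (size_t i = 0; i < n; i++) {
        /* Widen before adding so the sum cannot overflow. */
        uint64_t t = (uint64_t)a[i] + b[i] + carry;
        r[i] = (uint32_t)t;           /* low 32 bits stay in this limb */
        carry = (uint32_t)(t >> 32);  /* high part carries into the next limb */
    }
    return carry;
}
```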
Why do we need this? Well, what you typically do is loop through all the limbs - you've done this yourself in dividing your integer up and going through each one - but we do it least significant limb (LSL) first, just as you'd do arithmetic by hand.
This might seem inefficient, but this is just the C way of doing things. To really break out the big guns, x86 has adc as an instruction - add with carry. What this does is an arithmetic add on your fields, and it sets the carry bit if the result overflows the size of the register. The next time you do adc, the processor factors in the carry bit too. In subtraction it's called the borrow flag.
This also applies to shift operations. As such, this feature of the processor is crucial to what makes bignums fast. So the fact is, there's electronic circuitry in the chip for doing this stuff - doing it in software is always going to be slower.
Without going into too much detail, operations are built up from this ability to add, shift, subtract etc. They're crucial. Oh and you use the full width of your processor's register per limb if you're doing it right.
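For comparison, here is a sketch of the same addition with full-width 64-bit limbs and no wider type available, so the carry has to be detected by comparison. This is plain portable C written for illustration (names are hypothetical). On x86 a hand-written loop would use adc instead, and whether a compiler turns this into adc is not guaranteed.

```c
#include <stddef.h>
#include <stdint.h>

/* Adds two n-limb numbers with 64-bit limbs (least significant first).
   Returns the final carry-out (0 or 1). */
static uint64_t add_limbs64(uint64_t *r, const uint64_t *a,
                            const uint64_t *b, size_t n)
{
    uint64_t carry = 0;
    for (size_t i = 0; i < n; i++) {
        uint64_t s = a[i] + carry;
        uint64_t c1 = (s < carry);  /* wrapped while adding the carry-in */
        s += b[i];
        uint64_t c2 = (s < b[i]);   /* wrapped while adding b[i] */
        r[i] = s;
        carry = c1 | c2;            /* at most one of the two can be set */
    }
    return carry;
}
```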
Second point - conversion between bases. You cannot take a value in the middle of a number and change its base, because you can't account for the overflow from the digit beneath it in your original base, and that number can't account for the overflow from the digit beneath... and so on. In short, every time you want to change base, you need to convert the entire bignum from the original base to your new base and back again. So you have to walk the bignum (all the limbs) three times at least. Or, alternatively, detect overflows expensively in all other operations... remember, now you need to do modulo operations to work out if you overflowed, whereas before the processor was doing it for us.
I should also like to add that whilst what you've got is probably quick for this case, bear in mind that as a bignum library gmp does a fair bit of work for you, like memory management. If you're using mpz_ you're using an abstraction above what I've described here, for starters. Finally, gmp uses hand optimised assembly with unrolled loops for just about every platform you've ever heard of, plus more. There's a very good reason it ships with Mathematica, Maple et al.
Now, just for reference, some reading material.
Modern Computer Arithmetic is a Knuth-like work for arbitrary precision libraries.
Donald Knuth, Seminumerical Algorithms (The Art of Computer Programming Volume II).
William Hart's blog on implementing algorithms for bsdnt, in which he discusses various division algorithms. If you're interested in bignum libraries, this is an excellent resource. I considered myself a good programmer until I started following this sort of stuff...
To sum it up for you: division assembly instructions suck, so people generally compute inverses and multiply instead, as you do when defining division in modular arithmetic. The various techniques that exist (see MCA) are mostly O(n).
Edit: Ok, not all of the techniques are O(n). Most of the techniques called div1 (dividing by something not bigger than a limb) are O(n). When you go bigger you end up with O(n^2) complexity; this is hard to avoid.
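To make the O(n) claim for the single-limb case concrete, here is a schoolbook sketch of dividing by one limb, assuming 32-bit limbs stored least significant first. It uses a hardware division per limb for clarity. Real libraries typically replace that with a multiplication by a precomputed inverse, as mentioned above, and the function name here is made up.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Divides the n-limb number x (base 2^32, least significant limb first)
   in place by the single limb d and returns the remainder.
   One pass over the limbs, so O(n). */
static uint32_t div_by_limb(uint32_t *x, size_t n, uint32_t d)
{
    assert(d != 0);
    uint64_t rem = 0;
    for (size_t i = n; i-- > 0; ) {         /* most significant limb first */
        uint64_t cur = (rem << 32) | x[i];  /* rem < d, so cur fits in 64 bits */
        x[i] = (uint32_t)(cur / d);         /* quotient limb is below 2^32 */
        rem = cur % d;
    }
    return (uint32_t)rem;
}
```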
Now, could you implement bigints as an array of digits? Well yes, of course you could. However, consider what that means just for addition:
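/* You wouldn't allocate just before the add; this is only to
   show you the declarations. num_limbs and somebase (the base of
   one digit, e.g. 1000) are assumed to be defined elsewhere.
*/
uint32_t* x = malloc(num_limbs*sizeof(uint32_t));
uint32_t* y = malloc(num_limbs*sizeof(uint32_t));
uint32_t* a = malloc(num_limbs*sizeof(uint32_t));
uint32_t m = 0; /* carry into the current limb; must persist across iterations */
for ( size_t i = 0; i < num_limbs; i++ )
{
    /* widen first so the sum is computed in 64 bits */
    uint64_t t = (uint64_t)x[i] + y[i] + m;
    /* now we need to work out if that overflowed at all */
    m = (uint32_t)(t / somebase);    /* expensive division: the overflow (carry) */
    a[i] = (uint32_t)(t % somebase); /* what remains in this limb */
}
/* frees somewhere */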
That's a rough sketch of what you're looking at for addition via your scheme. So you have to run the conversion between bases. So you're going to need a conversion to your representation for the base, then back when you're done, because this form is just really slow everywhere else. We're not talking about the difference between O(n) and O(n^2) here, but we are talking about an expensive division instruction per limb or an expensive conversion every time you want to divide. See this.
Next up, how do you expand your division for general case division? By that, I mean when you want to divide those two numbers x and y from the above code. You can't, is the answer, without resorting to bignum-based facilities, which are expensive. See Knuth. Taking modulo a number greater than your size doesn't work.
Let me explain. Try 21979182173 mod 1099. Let's assume here for simplicity's sake that the biggest size field we can have is three digits. This is a contrived example, but the biggest field size I know of uses 128 bits using gcc extensions. Anyway, the point is, you:
21 979 182 173
Split your number into limbs. Then you take modulo and sum:
21 1000 1182 1355
It doesn't work. This is where Avi is correct, because this is a form of casting out nines, or an adaption thereof, but it doesn't work here because our fields have overflowed for a start - you're using the modulo to ensure each field stays within its limb/field size.
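The numbers above can be checked with a small sketch (my own illustration with hypothetical function names, using a plain 64-bit integer rather than a bignum). The digit-sum shortcut agrees with the true remainder for 999, because 1000 is congruent to 1 modulo 999, and disagrees for 1099.

```c
#include <stdint.h>

/* Sums the base-1000 digits ("limbs") of n, as in the example above. */
static uint64_t base1000_digit_sum(uint64_t n)
{
    uint64_t sum = 0;
    while (n > 0) {
        sum += n % 1000;
        n /= 1000;
    }
    return sum;
}

/* Nonzero when the digit-sum shortcut gives the same residue as n % m. */
static int shortcut_matches(uint64_t n, uint64_t m)
{
    return base1000_digit_sum(n) % m == n % m;
}
```

For 21979182173 the digit sum is 1355. Modulo 999 both routes give 356, while modulo 1099 the shortcut gives 256 and the true remainder is 928.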
So what's the solution? Split your number up into a series of appropriately sized bignums? And start using bignum functions to calculate everything you need to? This is going to be much slower than any existing way of manipulating the fields directly.
Now perhaps you're only proposing this case for dividing by a limb, not a bignum, in which case it can work, but Hensel division and precomputed inverses etc. do so too, without the conversion requirement. I have no idea if this algorithm would be faster than, say, Hensel division; it would be an interesting comparison; the problem comes with a common representation across the bignum library. The representation chosen in existing bignum libraries is for the reasons I've expanded on - it makes sense at the assembly level, where it was first done.
As a side note: you don't have to use uint32_t to represent your limbs. Ideally you use the size of the registers of the system (say uint64_t) so that you can take advantage of assembly-optimised versions. So on a 64-bit system add rax, rbx only sets the carry flag (CF) if the result does not fit in 64 bits, i.e. reaches 2^64.
tl;dr version: the problem isn't your algorithm or idea; it's the problem of converting between bases, since the representation you need for your algorithm isn't the most efficient way to do it in add/sub/mul etc. To paraphrase Knuth: This shows you the difference between mathematical elegance and computational efficiency.
If you need to frequently divide by the same divisor, using it (or a power of it) as your base makes division as cheap as bit-shifting is for base 2 binary integers.
You could use base 999 if you want; there's nothing special about using a power-of-10 base except that it makes conversion to decimal integer very cheap. (You can work one limb at a time instead of having to do a full division over the whole integer. It's like the difference between converting a binary integer to decimal vs. turning every 4 bits into a hex digit. Binary -> hex can start with the most significant bits, but converting to non-power-of-2 bases has to be LSB-first using division.)
For example, to compute the first 1000 decimal digits of Fibonacci(10^9) for a code-golf question with a performance requirement, my 105 bytes of x86 machine code answer used the same algorithm as this Python answer: the usual a+=b; b+=a Fibonacci iteration, but divide by (a power of) 10 every time a gets too large.
Fibonacci grows faster than carry propagates, so discarding the low decimal digits occasionally doesn't change the high digits long-term. (You keep a few extra beyond the precision you want).
Dividing by a power of 2 doesn't work, unless you keep track of how many powers of 2 you've discarded, because the eventual binary -> decimal conversion at the end would depend on that.
So for this algorithm, you have to do extended-precision addition, and division by 10 (or whatever power of 10 you want).
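A scaled-down sketch of that idea, using a single 64-bit integer per value instead of extended precision (so it is only good for a handful of leading digits and modest n; the function name and the 10^18 threshold are my own choices for illustration):

```c
#include <stdint.h>

/* Returns up to the 9 leading decimal digits of Fibonacci(n), using
   fixed-size integers and dropping low decimal digits as values grow. */
static uint64_t fib_leading_digits(unsigned n)
{
    const uint64_t LIMIT = 1000000000000000000ULL; /* 10^18 */
    uint64_t a = 0, b = 1;                          /* F(0), F(1) */
    for (unsigned i = 0; i < n; i++) {
        uint64_t next = a + b;  /* both below 10^18, so no 64-bit overflow */
        a = b;
        b = next;
        if (b >= LIMIT) {       /* too large: discard one low digit of both */
            a /= 10;
            b /= 10;
        }
    }
    while (a >= 1000000000ULL)  /* keep only the leading 9 digits */
        a /= 10;
    return a;
}
```

For n = 100 this returns 354224848, matching the start of Fibonacci(100) = 354224848179261915075. The discarded low digits introduce small errors that accumulate with n, which is why the real thing keeps extra limbs beyond the precision it wants.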
I stored base-10^9 limbs in 32-bit integer elements. Dividing by 10^9 is trivially cheap: just a pointer increment to skip the low limb. Instead of actually doing a memmove, I just offset the pointer used by the next add iteration.
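In C terms, that pointer-offset trick might look like the following sketch (the struct and function names are hypothetical; the original was hand-written x86 machine code, not this):

```c
#include <stddef.h>
#include <stdint.h>

/* A view of a number stored as base-10^9 limbs, least significant first. */
typedef struct {
    const uint32_t *limbs;
    size_t count;
} decimal_view;

/* Divides by 10^9 (discarding the remainder) by skipping the low limb.
   No limb is read or written. */
static decimal_view div_by_1e9(decimal_view v)
{
    if (v.count > 0) {
        v.limbs += 1;
        v.count -= 1;
    }
    return v;
}
```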
I think division by a power of 10 other than 10^9 would be somewhat cheap, but would require an actual division on each limb, and propagating the remainder to the next limb.
Extended-precision addition is somewhat more expensive this way than with binary limbs, because I have to generate the carry-out manually with a compare against the base: sum[i] = a[i] + b[i]; carry = sum[i] >= 10^9; (unsigned comparison). And also manually wrap back below 10^9 based on that compare, with a conditional-move instruction. But I was able to use that carry-out as an input to adc (x86 add-with-carry instruction).
You don't need a full modulo to handle the wrapping on addition, because you know you've wrapped at most once.
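A C sketch of addition on base-10^9 limbs with that compare-and-subtract wrap (illustrative names; this shows the technique, not the machine code from the answer):

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define LIMB_BASE 1000000000u /* 10^9 */

/* Adds two n-limb base-10^9 numbers (least significant limb first).
   Returns the final carry-out (0 or 1). */
static uint32_t add_base1e9(uint32_t *r, const uint32_t *a,
                            const uint32_t *b, size_t n)
{
    uint32_t carry = 0;
    for (size_t i = 0; i < n; i++) {
        assert(a[i] < LIMB_BASE && b[i] < LIMB_BASE);
        uint32_t sum = a[i] + b[i] + carry; /* below 2*10^9, fits in 32 bits */
        carry = (sum >= LIMB_BASE);         /* manual carry-out via a compare */
        if (carry)
            sum -= LIMB_BASE;               /* wrapped at most once: no modulo */
        r[i] = sum;
    }
    return carry;
}
```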
This wastes just over 2 bits of each 32-bit limb: 10^9 instead of 2^32 = 4.29... * 10^9. Storing base-10 digits one per byte would be significantly less space efficient, and very much worse for performance, because an 8-bit binary addition costs the same as a 64-bit binary addition on a modern 64-bit CPU.
I was aiming for code-size: for pure performance I would have used 64-bit limbs holding base-10^19 "digits". (2^64 = 1.84... * 10^19, so this wastes less than 1 bit per 64.) This lets you get twice as much work done with each hardware add instruction. Hmm, actually this might be a problem: the sum of two limbs might wrap the 64-bit integer, so just checking for > 10^19 isn't sufficient anymore. You could work in base 5*10^18, or in base 10^18, or do more complicated carry-out detection that checks for binary carry as well as manual carry.
Storing packed BCD with one digit per 4 bit nibble would be even worse for performance, because there isn't hardware support for blocking carry from one nibble to the next within a byte.
Overall, my version ran about 10x faster than the Python extended-precision version on the same hardware (but it had room for significant optimization for speed, by dividing less often). (70 seconds or 80 seconds vs. 12 minutes)
Still, I think for this particular implementation of that algorithm (where I only needed addition and division, and division happened after every few additions), the choice of base-10^9 limbs was very good. There are much more efficient algorithms for the Nth Fibonacci number that don't need to do 1 billion extended-precision additions.