Adding double precision values yields different results between separate programs in C++

I have a question about floating point addition. I understand how compilers and processor architecture can lead to differences in floating point arithmetic results. I have seen many questions on here similar to mine, but they all have some variation such as a different compiler, different code, a different machine, etc. However, I am running into an issue when adding doubles in the exact same way in two different programs calling the identical function with the same arguments, and it is leading to different results. Both programs are compiled on the same machine with the same compiler/flags. The code looks similar to this:
void function(double tx, double ty, double tz){
    double answer;
    double x, y;
    x = y = answer = 0;
    x = tx - ty;
    y = ty - tz;
    answer = (tx + ty + tz) * (x * y);
}
The values of:
tx,ty,tz
are on the order of [10e-15, 10e-30]. Obviously this is a very simplified version of the functions I am actually using, but is it possible for two programs, running identical floating point arithmetic (not just the same function, the exact same code), on the same machine, with the same compiler/flags, to get different results from the function?

Some possibilities:
1. The source code of function is identical in the two programs, but it appears in different contexts, resulting in the compiler compiling it in different ways. For example, the compiler might inline it in one place and not another, and inlining might lead to some expression reduction due to combination with other expressions at the point of the inlined call, and hence different arithmetic is performed. (To test this, move function to a separate source file, compile it separately, and link it with a linker without cross-module optimization. Also, try compiling with optimization disabled.)
2. You think there are identical inputs to function because they appear the same when printed or viewed in the debugger, but they are actually different due to small differences in the low digits that are not printed. (To test this, print the full values using the hexadecimal floating-point format. To do that, insert std::hexfloat into the output stream, followed by floating-point values. Alternately, use a C printf using the %a format. A small sketch appears after this list.)
3. Something else in the programs changes floating-point state, such as the rounding mode.
4. You think you have used an identical compiler, identical sources, identical compilation switches, and so on, but actually have not.
David Schwartz notes that floating-point values can change when they are stored, as occurs when they are simply spilled to the stack. This occurs because some processors and C++ implementations may store floating-point values with extended precision in registers but less precision in memory. Technically, this fits into either 1. (different computation nominally inside function) or 2. (different values passed to function), but it is insidious enough to warrant separate mention.
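For item 2, a minimal sketch of the hexadecimal-dump idea (the function name dump_inputs is just illustrative):
#include <cstdio>
#include <iostream>

void dump_inputs(double tx, double ty, double tz) {
    // Print the exact bit patterns; diff this output between the two programs.
    std::cout << std::hexfloat << tx << ' ' << ty << ' ' << tz << '\n';
    std::printf("%a %a %a\n", tx, ty, tz);   // same information via C printf
}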

Well, the answer is quite easy. If your computer behaves deterministically, it will always return the same results for the same input. That's the basic idea behind programming languages so far. (Unless we are talking about quantum computers, of course.)
So the question reduces to whether you really have the same input.
Although the above function looks strictly functional, there are often hidden inputs that are not that obvious. E.g. you might adjust the rounding mode of your FPU before calling the function. Or you might set up different exception behavior. In both cases the function may behave differently for certain inputs.
So even if your computer isn't non-deterministic (i.e. buggy), the above function might return different results, although it is not that likely.
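To illustrate the rounding-mode point, a hedged sketch (whether this behaves exactly as shown depends on the compiler honouring the floating-point environment; strictly, #pragma STDC FENV_ACCESS ON should be in effect):
#include <cfenv>
#include <cstdio>

int main() {
    volatile double x = 1.0, y = 3.0;    // volatile so the divisions happen at run time

    std::fesetround(FE_DOWNWARD);
    double a = x / y;                    // 1/3 rounded toward -infinity

    std::fesetround(FE_UPWARD);
    double b = x / y;                    // 1/3 rounded toward +infinity

    std::printf("a == b? %d\n", a == b); // typically prints 0: same code, different result
    return 0;
}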

Related

Enforcing order of execution

I would like to ensure that the calculations requested are executed exactly in the order I specify, without any alterations from either the compiler or CPU (including the linker, assembler, and anything else you can think of).
Operator left-to-right associativity is assumed in the C language
I am working in C (though I'm possibly also interested in C++ solutions). The language states that for operations of equal precedence there is an assumed left-to-right operator associativity, and hence
a = b + c - d + e + f - g ...;
is equivalent to
a = (...(((((b + c) - d) + e) + f) - g) ...);
A small example
However, consider the following example:
double a, b = -2, c = -3;
a = 1 + 2 - 2 + 3 + 4;
a += 2*b;
a += c;
So many opportunities for optimisation
Many compilers and pre-processors may be clever enough to recognise that the "+ 2 - 2" is redundant and optimise it away. Similarly, they could recognise that the "+= 2*b" followed by the "+= c" can be written using a single FMA. Even if they don't optimise into an FMA, they may switch the order of these operations, etc. Furthermore, if the compiler doesn't do any of these optimisations, the CPU may well decide to do some out-of-order execution and do the "+= c" before the "+= 2*b", etc.
As floating-point arithmetic is non-associative, each type of optimisation may result in a different end result, which may be noticeable if this snippet is inlined somewhere.
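As a tiny illustration of that non-associativity (with default IEEE-754 double semantics; regrouping the same three constants changes the last bit):
#include <cstdio>

int main() {
    double left  = (0.1 + 0.2) + 0.3;    // 0.6000000000000001
    double right = 0.1 + (0.2 + 0.3);    // 0.6
    std::printf("%d\n", left == right);  // prints 0
    return 0;
}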
Why worry about floating point associativity?
For most of my code I would like as much optimisation as I can have and don't care about floating-point associativity or bit-wise reproducibility, but occasionally there is a small snippet (similar to the above example) which I would like to be untampered with and totally respected. This is because I am working with a mathematical method which requires an exactly reproducible result.
What can I do to resolve this?
A few ideas which have come to mind:
Disable compiler optimisations and out of order execution
I don't want this, as I want the other 99% of my code to be heavily optimised. (This seems to be cutting off my nose to spite my face). I also most likely won't have permission to change my hardware settings.
Use a pragma
Write some assembly
The code snippets are small enough that this might be reasonable, although I'm not very confident in this, especially if (when) it comes to debugging.
Put this in a separate file, compile separately as un-optimised as possible, and then link using a function call
Volatile variables
To my mind these are just for ensuring that memory access is respected and un-optimised, but perhaps they might prove useful.
Access everything through judicious use of pointers
Perhaps, but this seems like a disaster in readability, performance, and bugs waiting to happen.
If anyone can think of any feasible solutions (either from any of the ideas I've suggested or otherwise) that would be ideal. The "pragma" option or "function call" to my mind seem like the best approaches.
The ultimate goal
To have something that marks off a small chunk of simple and largely vanilla C code as protected and untouchable by any (realistically most) optimisations, while allowing the rest of the code to be heavily optimised, covering optimisations from both the CPU and compiler.
This is not a complete answer, but it is informative, partially answers the question, and is too long for a comment.
Clarifying the Goal
The question actually seeks reproducibility of floating-point results, not order of execution. Also, order of execution is irrelevant; we do not care if, in (a+b)+(c+d), a+b or c+d is executed first. We care that the result of a+b is added to the result of c+d, without any reassociation or other rewriting of arithmetic unless the result is known to be the same.
Reproducibility of floating-point arithmetic is in general an unsolved technological problem. (There is no theoretical barrier; we have reproducible elementary operations. Reproducibility is a matter of what hardware and software vendors have provided and how hard it is to express the computations we want performed.)
Do you want reproducibility on one platform (e.g., always using the same version of the same math library)? Does your code use any math library routines like sin or log? Do you want reproducibility across different platforms? With multithreading? Across changes of compiler version?
Addressing Some Specific Issues
The samples shown in the question can largely be handled by writing each individual floating-point operation in its own statement, as by replacing:
a = 1 + 2 - 2 + 3 + 4;
a += 2*b;
a += c;
with:
t0 = 1 + 2;
t0 = t0 - 2;
t0 = t0 + 3;
t0 = t0 + 4;
t1 = 2*b;
t0 += t1;
a = t0 + c;
The basis for this is that both C and C++ permit an implementation to use “excess precision” when evaluating an expression but require that precision to be “discarded” when an assignment or cast is performed. Limiting each assignment expression to one operation or executing a cast after each operation effectively isolates the operations.
In many cases, a compiler will then generate code using instructions of the nominal type, instead of instructions using a type with excess precision. In particular, this should avoid a fused multiply-add (FMA) being substituted for a multiplication followed by an addition. (An FMA has effectively infinite precision in the product before it is added to the addend, thus falling under the “excess precision is permitted” rule.)
There are caveats, however. An implementation might first evaluate an operation with excess precision and then round it to the nominal precision. In general, this can cause a different result than doing a single operation in the nominal precision. For the elementary operations of addition, subtraction, multiplication, division, and even square root, this does not happen if the excess precision is sufficiently greater than the nominal precision. (There are proofs that a result with sufficient excess precision is always close enough to the infinitely precise result that the rounding to nominal precision gets the same result.) This is true for the case where the nominal precision is the IEEE-754 basic 32-bit binary floating-point format and the excess precision is the 64-bit format. However, it is not true where the nominal precision is the 64-bit format and the excess precision is Intel’s 80-bit format.
So, whether this workaround works depends on the platform.
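For concreteness, here is an illustrative-only sketch of the "cast after each operation" idea, assuming float is the nominal type and a wider type might be used for excess precision:
float sum3(float a, float b, float c) {
    float t = static_cast<float>(a + b);   // any excess precision is discarded here
    t = static_cast<float>(t + c);         // and again here
    return t;
}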
Other Issues
Aside from the use of excess precision and features like FMA or the optimizer rewriting expressions, there are other things that affect reproducibility, such as non-standard treatment of subnormals (notably replacing them with zeroes) and variations between math library routines. (sin, log, and similar functions return different results on different platforms. Nobody has fully implemented correctly rounded math library routines with known bounded performance.)
These are discussed in other Stack Overflow questions about floating-point reproducibility, as well as papers, specifications, and standards documents.
Irrelevant Issues
The order in which a processor executes floating-point operations is irrelevant. Processor reordering of calculations obeys rigid semantics; the results are identical regardless of the chronological order of execution. (Processor timing can affect results if, for example, a task is partitioned into subtasks, such as assigning multiple threads or processes to process different parts of the arrays. Among other issues, their results could arrive in different orders, and the process receiving their results might then add or otherwise combine their results in different orders.)
Using pointers will not fix anything. As far as C or C++ is concerned, *p where p is a pointer to double is the same as a where a is a double. One of the objects has a name (a) and one of them does not, but they are like roses: they smell the same. (There are issues where, if you have some other pointer q, the compiler might not know whether *q and *p refer to the same thing. But that also holds true for *q and a.)
Using volatile qualifiers will not aid in reproducibility regarding the excess precision or expression rewriting issue. That is because only an object (not a value) is volatile, which means it has no effect until you write it or read it. But, if you write it, you are using an assignment expression1, so the rule about discarding excess precision already applies. When reading the object, you would force the compiler to retrieve the actual value from memory, but this value will not be any different than the non-volatile object has after assignment, so nothing is accomplished.
Footnote
1 I would have to check on other things that modify an object, such as ++, but those are likely not significant for this discussion.
Write this critical chunk of code in assembly language.
The situation you're in is unusual. Most of the time people want the compiler to do optimizations, so compiler developers don't spend much development effort on means to avoid them. Even with the knobs you do get (pragmas, separate compilation, indirections, ...) you can never be sure something won't be optimized. Some of the undesirable optimizations you mention (constant folding, for instance) cannot be turned off by any means in modern compilers.
If you use assembly language you can be sure you're getting exactly what you wrote. If you do it any other way you won't have that level of confidence.
"clever enough to recognise the + 2 - 2 is redundant and optimise this
away"
No ! All decent compilers will apply constant propagation and figure out that a is constant and optimize all your statement away, into something equivalent to a = 1;. Here the example with assembly.
Now if you make a volatile, the compiler has to assume that any change of a could have an impact outside the C++ programme. Constant propagation will still be performed to optimise each of these calculations, but the intermediate assignments are guaranteed to happen. Here is the example with assembly.
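A rough sketch of what that looks like in source form (the function name demo is just illustrative; the right-hand sides may still be folded to constants, but every store to a must be emitted):
void demo() {
    volatile double a;
    double b = -2, c = -3;
    a = 1 + 2 - 2 + 3 + 4;   // store 8.0
    a += 2 * b;              // load a, store 4.0
    a += c;                  // load a, store 1.0
}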
If you don't want constant propagation to happen, you need to deactivate optimisations. In this case, the best option would be to keep your code in a separate file so you can compile the rest with all optimisations on.
However, this is not ideal. The optimizer could outperform you, and with this approach you'll lose global optimisation across the function boundaries.
Recommendation/quote of the day:
Don't diddle code; Find better algorithms
- B. W. Kernighan & P. J. Plauger

What is the standard way to maintain accuracy when dealing with incredibly precise floating point calculations in C++?

I'm in the process of converting a program to C++ from Scilab (similar to Matlab) and I'm required to maintain the same level of precision that is kept by the previous code.
Note: although maintaining the same level of precision would be ideal, it's acceptable if there is some error in the finished result. The problem I'm facing (as I'll show below) is due to looping, so the calculation error compounds rather quickly. But if the final result is only a thousandth or so off (e.g. 1/1000 vs 1/1001) it won't be a problem.
I've briefly looked into a number of different ways to do this including:
GMP (A Multiple Precision Arithmetic Library)
Using integers instead of floats (see example below)
Int vs Float Example: Instead of using the float 12.45, store it as the integer 124,500. Then simply convert everything back when appropriate to do so. Note: I'm not exactly sure how this will work with the code I'm working with (more detail below).
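A minimal sketch of that scaled-integer idea, assuming a fixed scale of 10^4 (so 12.45 is stored as 124500); the names here are purely illustrative:
#include <cstdint>
#include <cstdio>

int main() {
    const std::int64_t scale = 10000;            // four implied decimal digits
    std::int64_t a = 124500;                     // represents 12.4500
    std::int64_t b = 50;                         // represents  0.0050
    std::int64_t sum = a + b;                    // exact integer addition
    std::printf("%.4f\n", sum / (double)scale);  // prints 12.4550
    return 0;
}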
An example of how my program is producing incorrect results:
float float1 = 0.0f, float2 = 0.0f;
for (int i = 0; i <= 1000; i++)
{
    for (int j = 0; j <= 10000; j++)
    {
        // This calculation will be computed with less precision than in Scilab
        float1 = (1.0 / 100000.0);
        // The above error in float1 will become significant in float2 by the end of the loop
        float2 = (float1 + float2);
    }
}
My question is:
Is there a generally accepted way to go about retaining accuracy in floating point arithmetic OR will one of the above methods suffice?
Maintaining precision when porting code like this is very difficult to do. Not because the languages have implicitly different perspectives on what a float is, but because of what the different algorithms or assumptions of accuracy limits are. For example, when performing numerical integration in Scilab, it may use a Gaussian quadrature method. Whereas you might try using a trapezoidal method. The two may both be working on identical IEEE754 single-precision floating point numbers, but you will get different answers due to the convergence characteristics of the two algorithms. So how do you get around this?
Well, you can go through the Scilab source code and look at all of the algorithms it uses for each thing you need. You can then replicate these algorithms taking care of any pre- or post-conditioning of the data that Scilab implicitly does (if any at all). That's a lot of work. And, frankly, probably not the best way to spend your time. Rather, I would look into using the Interfacing with Other Languages section from the developer's documentation to see how you can call the Scilab functions directly from your C, C++, Java, or Fortran code.
Of course, with the second option, you have to consider how you are going to distribute your code (if you need to). Scilab has a GPL-compatible license, so you can just bundle it with your code. However, it is quite big (~180MB) and you may want to bundle just the pieces you need (e.g., you don't need the whole interpreter system). This is more work in a different way, but guarantees numerical compatibility with your current Scilab solutions.
Is there a generally accepted way to go about retaining accuracy in floating point arithmetic
"Generally accepted" is too broad, so no.
will one of the above methods suffice?
Yes. Particularly gmp seems to be a standard choice. I would also have a look at the Boost Multiprecision library.
A hand-coded integer approach can work as well, but is surely not the method of choice: it requires much more coding, plus some means to store and process arbitrarily precise integers.
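For example, a minimal sketch of the Boost Multiprecision suggestion above, assuming Boost is available (cpp_dec_float_50 gives roughly 50 decimal digits and needs no external GMP dependency):
#include <boost/multiprecision/cpp_dec_float.hpp>
#include <iomanip>
#include <iostream>

int main() {
    using big = boost::multiprecision::cpp_dec_float_50;
    big increment = big(1) / 100000;   // exactly representable in decimal, unlike in float
    big total = 0;
    for (int i = 0; i <= 1000; i++)
        for (int j = 0; j <= 10000; j++)
            total += increment;
    std::cout << std::setprecision(20) << total << '\n';
    return 0;
}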
If your compiler supports it, use BCD (binary-coded decimal).
Well, another alternative if you use GCC compilers is to go with quadmath/__float128 types.

How to store doubles in memory

Recently I changed some code
double d0, d1;
// ... assign things to d0/d1 ...
double result = f(d0, d1);
to
double d[2];
// ... assign things to d[0]/d[1]
double result = f(d[0], d[1]);
I did not change any of the assignments to d, nor the calculations in f, nor anything else apart from the fact that the doubles are now stored in a fixed-length array.
However when compiling in release mode, with optimizations on, result changed.
My question is, why, and what should I know about how I should store doubles? Is one way more efficient, or better, than the other? Are there memory alignment issues? I'm looking for any information that would help me understand what's going on.
EDIT: I will try to get some code demonstrating the problem, however this is quite hard as the process that these numbers go through is huge (a lot of maths, numerical solvers, etc.).
However there is no change when compiled in Debug. I will double check this again to make sure but this is almost certain, i.e. the double values are identical in Debug between version 1 and version 2.
Comparing Debug to Release, results have never ever been the same between the two compilation modes, for various optimization reasons.
You probably have a 'fast math' compiler switch turned on, or are doing something in the "assign things" (which we can't see) which allows the compiler to legally reorder calculations. Even though the sequences are equivalent, it's likely the optimizer is treating them differently, so you end up with slightly different code generation. If it's reordered, you end up with slight differences in the least significant bits. Such is life with floating point.
You can prevent this by not using 'fast math' (if that's turned on), or by forcing ordering through the way you construct the formulas and intermediate values. Even that's hard (impossible?) to guarantee. The question is really "Why is the compiler generating different code for arrays vs numbered variables?", but that's basically an analysis of the code generator.
No, these are equivalent - you have something else wrong.
Check the /fp:precise flag (or equivalent); the processor's floating point hardware can run in a higher-accuracy or a higher-speed mode, and it may have a different default in an optimized build.
With regard to floating-point semantics, these are equivalent. However, it is conceivable that the compiler might decide to generate slightly different code sequences for the two, and that could result in differences in the result.
Can you post a complete code example that illustrates the difference? Without that to go on, anything anyone posts as an answer is just speculation.
To your concerns: memory alignment cannot affect the value of a double, and a compiler should be able to generate equivalent code for either example, so you don't need to worry that you're doing something wrong (at least, not in the limited example you posted).
The first way is more efficient, in a very theoretical way. It gives the compiler slightly more leeway in assigning stack slots and registers. In the second example, the compiler has to pick 2 consecutive slots - except of course if the compiler is smart enough to realize that you'd never notice.
It's quite possible that the double[2] causes the array to be allocated as two adjacent stack slots where it wasn't before, and that in turn can cause code reordering to improve memory access efficiency. IEEE 754 floating point math doesn't obey the regular rules of algebra, e.g. a+b+c may not equal c+b+a.

Strange results with /fp:fast

We have some code that looks like this:
inline int calc_something(double x) {
if (x > 0.0) {
// do something
return 1;
} else {
// do something else
return 0;
}
}
Unfortunately, when using the flag /fp:fast, we get calc_something(0)==1 so we are clearly taking the wrong code path. This only happens when we use the method at multiple points in our code with different parameters, so I think there is some fishy optimization going on here from the compiler (Microsoft Visual Studio 2008, SP1).
Also, the above problem goes away when we change the interface to
inline int calc_something(const double& x) {
But I have no idea why this fixes the strange behaviour. Can anyone explain this behaviour? If I cannot understand what's going on we will have to remove the /fp:fast switch, but this would make our application quite a bit slower.
I'm not familiar enough with FPUs to comment with any certainty, but my guess would be that the compiler is letting an existing value that it thinks should be equal to x sit in on that comparison. Maybe you go y = x + 20.; y = y - 20; y is already on the FP stack, so rather than load x the compiler just compares against y. But due to rounding errors, y isn't quite 0.0 like it is supposed to be, and you get the odd results you see.
For a better explanation: Why is cos(x) != cos(y) even though x == y? from the C++FAQ lite. This is part of what I'm trying to get across, I just couldn't remember where exactly I had read it until just now.
Changing to a const reference fixes this because the compiler is worried about aliasing. It forces a load from x because it can't assume its value hasn't changed at some point after creating y, and since x is actually exactly 0.0 [which is representable in every floating point format I'm familiar with] the rounding errors vanish.
I'm pretty sure MS provides a pragma that allows you to set the FP flags on a per-function basis. Or you could move this routine to a separate file and give that file custom flags. Either way, it could prevent your whole program from suffering just to keep that one routine happy.
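A hedged sketch of the per-function pragma idea on MSVC (float_control pushes stricter "precise" semantics around just this routine and pops them afterwards; check your compiler version's documentation for the exact interaction with /fp:fast):
#pragma float_control(precise, on, push)
inline int calc_something(double x) {
    if (x > 0.0) {
        // do something
        return 1;
    } else {
        // do something else
        return 0;
    }
}
#pragma float_control(pop)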
What are the results of calc_something(0L) or calc_something(0.0f)? It could be linked to the size of the types before casting. An integer is 4 bytes, a double is 8.
Have you tried looking at the assembled code, to see how the aforementioned conversion is done?
Googling for 'fp fast', I found this post [social.msdn.microsoft.com]
As I've said in another question, compilers suck at generating floating point code. The article Dennis links to explains the problems well. Here's another: An MSDN article.
If the performance of the code is important, you can easily¹ out-perform the compiler by writing your own assembler code. If your algorithm is vectorisable then you can make use of SIMD too (with a slight loss of precision though).
¹ Assuming you understand the way the FPU works.
inline int calc_something(double x) will (probably) use an 80-bit register. inline int calc_something(const double& x) would store the double in memory, where it takes 64 bits. That at least explains the difference between the two.
However, I find your test quite fishy to begin with. The results of calc_something are extremely sensitive to rounding of its input. Your FP algorithms should be robust to rounding. calc_something(1.0-(1.0/3.0)*3) should be the same as calc_something(0.0).
I think the behavior is correct.
You should never compare a floating point number more tightly than the holding type's precision allows.
Something that comes out as zero may compare equal to, greater than or less than another zero.
See http://floating-point-gui.de/

Why isn't `int pow(int base, int exponent)` in the standard C++ libraries?

I feel like I must just be unable to find it. Is there any reason that the C++ pow function does not implement the "power" function for anything except floats and doubles?
I know the implementation is trivial, I just feel like I'm doing work that should be in a standard library. A robust power function (i.e. handles overflow in some consistent, explicit way) is not fun to write.
As of C++11, special cases were added to the suite of power functions (and others). C++11 [c.math] /11 states, after listing all the float/double/long double overloads (my emphasis, and paraphrased):
Moreover, there shall be additional overloads sufficient to ensure that, if any argument corresponding to a double parameter has type double or an integer type, then all arguments corresponding to double parameters are effectively cast to double.
So, basically, integer parameters will be upgraded to doubles to perform the operation.
Prior to C++11 (which was when your question was asked), no integer overloads existed.
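A small illustration of that rule: integer arguments are effectively converted to double, so these calls resolve cleanly and return double.
#include <cmath>
#include <iostream>

int main() {
    std::cout << std::pow(2, 10) << '\n';   // 1024 (computed as pow(2.0, 10.0))
    std::cout << std::pow(3, -2) << '\n';   // 0.111111, a double, not an int
    return 0;
}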
Since I was neither closely associated with the creators of C nor C++ in the days of their creation (though I am rather old), nor part of the ANSI/ISO committees that created the standards, this is necessarily opinion on my part. I'd like to think it's informed opinion but, as my wife will tell you (frequently and without much encouragement needed), I've been wrong before :-)
Supposition, for what it's worth, follows.
I suspect that the reason the original pre-ANSI C didn't have this feature is because it was totally unnecessary. First, there was already a perfectly good way of doing integer powers (with doubles and then simply converting back to an integer, checking for integer overflow and underflow before converting).
Second, another thing you have to remember is that the original intent of C was as a systems programming language, and it's questionable whether floating point is desirable in that arena at all.
Since one of its initial use cases was to code up UNIX, the floating point would have been next to useless. BCPL, on which C was based, also had no use for powers (it didn't have floating point at all, from memory).
As an aside, an integral power operator would probably have been a binary operator rather than a library call. You don't add two integers with x = add (y, z) but with x = y + z - part of the language proper rather than the library.
Third, since the implementation of integral power is relatively trivial, it's almost certain that the developers of the language would better use their time providing more useful stuff (see below comments on opportunity cost).
That's also relevant for the original C++. Since the original implementation was effectively just a translator which produced C code, it carried over many of the attributes of C. Its original intent was C-with-classes, not C-with-classes-plus-a-little-bit-of-extra-math-stuff.
As to why it was never added to the standards before C++11, you have to remember that the standards-setting bodies have specific guidelines to follow. For example, ANSI C was specifically tasked to codify existing practice, not to create a new language. Otherwise, they could have gone crazy and given us Ada :-)
Later iterations of that standard also have specific guidelines and can be found in the rationale documents (rationale as to why the committee made certain decisions, not rationale for the language itself).
For example the C99 rationale document specifically carries forward two of the C89 guiding principles which limit what can be added:
Keep the language small and simple.
Provide only one way to do an operation.
Guidelines (not necessarily those specific ones) are laid down for the individual working groups and hence limit the C++ committees (and all other ISO groups) as well.
In addition, the standards-setting bodies realise that there is an opportunity cost (an economic term meaning what you have to forego for a decision made) to every decision they make. For example, the opportunity cost of buying that $10,000 uber-gaming machine is cordial relations (or probably all relations) with your other half for about six months.
Eric Gunnerson explains this well with his -100 points explanation as to why things aren't always added to Microsoft products- basically a feature starts 100 points in the hole so it has to add quite a bit of value to be even considered.
In other words, would you rather have a integral power operator (which, honestly, any half-decent coder could whip up in ten minutes) or multi-threading added to the standard? For myself, I'd prefer to have the latter and not have to muck about with the differing implementations under UNIX and Windows.
I would also like to see thousands and thousands of collections in the standard library (hashes, btrees, red-black trees, dictionaries, arbitrary maps and so forth) but, as the rationale states:
A standard is a treaty between implementer and programmer.
And the number of implementers on the standards bodies far outweigh the number of programmers (or at least those programmers that don't understand opportunity cost). If all that stuff was added, the next standard C++ would be C++215x and would probably be fully implemented by compiler developers three hundred years after that.
Anyway, that's my (rather voluminous) thoughts on the matter. If only votes were handed out based on quantity rather than quality, I'd soon blow everyone else out of the water. Thanks for listening :-)
For any fixed-width integral type, nearly all of the possible input pairs overflow the type anyway. What's the use of standardizing a function that doesn't give a useful result for the vast majority of its possible inputs?
You pretty much need to have a big integer type in order to make the function useful, and most big integer libraries provide the function.
Edit: In a comment on the question, static_rtti writes "Most inputs cause it to overflow? The same is true for exp and double pow, I don't see anyone complaining." This is incorrect.
Let's leave aside exp, because that's beside the point (though it would actually make my case stronger), and focus on double pow(double x, double y). For what portion of (x,y) pairs does this function do something useful (i.e., not simply overflow or underflow)?
I'm actually going to focus only on a small portion of the input pairs for which pow makes sense, because that will be sufficient to prove my point: if x is positive and |y| <= 1, then pow does not overflow or underflow. This comprises nearly one-quarter of all floating-point pairs (exactly half of non-NaN floating-point numbers are positive, and just less than half of non-NaN floating-point numbers have magnitude less than 1). Obviously, there are a lot of other input pairs for which pow produces useful results, but we've ascertained that it's at least one-quarter of all inputs.
Now let's look at a fixed-width (i.e. non-bignum) integer power function. For what portion of inputs does it not simply overflow? To maximize the number of meaningful input pairs, the base should be signed and the exponent unsigned. Suppose that the base and exponent are both n bits wide. We can easily get a bound on the portion of inputs that are meaningful:
If the exponent is 0 or 1, then any base is meaningful.
If the exponent is 2 or greater, then no base larger than 2^(n/2) produces a meaningful result.
Thus, of the 2^(2n) input pairs, less than 2^(n+1) + 2^(3n/2) produce meaningful results. If we look at what is likely the most common usage, 32-bit integers, this means that something on the order of 1/1000th of one percent of input pairs do not simply overflow.
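To make that bound concrete for n = 32: 2^(n+1) + 2^(3n/2) = 2^33 + 2^48 ≈ 2.8×10^14 potentially meaningful pairs out of 2^64 ≈ 1.8×10^19 total, i.e. roughly 0.0015% of all inputs, which is the "1/1000th of one percent" order of magnitude quoted above.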
Because there's no way to represent all integer powers in an int anyways:
>>> print 2**-4
0.0625
That's actually an interesting question. One argument I haven't found in the discussion is the simple lack of obvious return values for the arguments. Let's count the ways the hypothetical int pow_int(int, int) function could fail.
Overflow
Result undefined pow_int(0,0)
Result can't be represented pow_int(2,-1)
The function has at least 2 failure modes. Integers can't represent these values, the behaviour of the function in these cases would need to be defined by the standard - and programmers would need to be aware of how exactly the function handles these cases.
Overall leaving the function out seems like the only sensible option. The programmer can use the floating point version with all the error reporting available instead.
Short answer:
A specialisation of pow(x, n) to the case where n is a natural number is often useful for time performance. But the standard library's generic pow() still works pretty (surprisingly!) well for this purpose, and it is absolutely critical to include as little as possible in the standard C library so it can be made as portable and as easy to implement as possible. On the other hand, that doesn't stop it at all from being in the C++ standard library or the STL, which I'm pretty sure nobody is planning on using in some kind of embedded platform.
Now, for the long answer.
pow(x, n) can be made much faster in many cases by specialising n to a natural number. I have had to use my own implementation of this function for almost every program I write (but I write a lot of mathematical programs in C). The specialised operation can be done in O(log(n)) time, but when n is small, a simpler linear version can be faster. Here are implementations of both:
// Computes x^n, where n is a natural number.
double pown(double x, unsigned n)
{
    // n = 2*d + r. x^n = (x^2)^d * x^r.
    unsigned d = n >> 1;
    unsigned r = n & 1;
    double x_2_d = d == 0 ? 1 : pown(x*x, d);
    double x_r = r == 0 ? 1 : x;
    return x_2_d*x_r;
}
// The linear implementation.
double pown_l(double x, unsigned n)
{
    double y = 1;
    for (unsigned i = 0; i < n; i++)
        y *= x;
    return y;
}
(I left x and the return value as doubles because the result of pow(double x, unsigned n) will fit in a double about as often as pow(double, double) will.)
(Yes, pown is recursive, but breaking the stack is absolutely impossible since the maximum stack size will roughly equal log_2(n) and n is an integer. If n is a 64-bit integer, that gives you a maximum stack size of about 64. No hardware has such extreme memory limitations, except for some dodgy PICs with hardware stacks that only go 3 to 8 function calls deep.)
As for performance, you'll be surprised by what a garden variety pow(double, double) is capable of. I tested a hundred million iterations on my 5-year-old IBM Thinkpad with x equal to the iteration number and n equal to 10. In this scenario, pown_l won. glibc pow() took 12.0 user seconds, pown took 7.4 user seconds, and pown_l took only 6.5 user seconds. So that's not too surprising. We were more or less expecting this.
Then, I let x be constant (I set it to 2.5), and I looped n from 0 to 19 a hundred million times. This time, quite unexpectedly, glibc pow won, and by a landslide! It took only 2.0 user seconds. My pown took 9.6 seconds, and pown_l took 12.2 seconds. What happened here? I did another test to find out.
I did the same thing as above, only with x equal to a million. This time, pown won at 9.6s. pown_l took 12.2s and glibc pow took 16.3s. Now, it's clear! glibc pow performs better than the other two when x is low, but worst when x is high. When x is high, pown_l performs best when n is low, and pown performs best when n is high.
So here are three different algorithms, each capable of performing better than the others under the right circumstances. So, ultimately, which to use most likely depends on how you're planning on using pow, but using the right version is worth it, and having all of the versions is nice. In fact, you could even automate the choice of algorithm with a function like this:
double pown_auto(double x, unsigned n, double x_expected, unsigned n_expected) {
    // x_threshold and n_threshold are tuning constants (not defined here)
    // marking where each algorithm starts to win.
    if (x_expected < x_threshold)
        return pow(x, n);
    if (n_expected < n_threshold)
        return pown_l(x, n);
    return pown(x, n);
}
As long as x_expected and n_expected are constants decided at compile time, along with possibly some other caveats, an optimising compiler worth its salt will automatically remove the entire pown_auto function call and replace it with the appropriate choice of the three algorithms. (Now, if you are actually going to attempt to use this, you'll probably have to toy with it a little, because I didn't exactly try compiling what I'd written above. ;))
On the other hand, glibc pow does work and glibc is big enough already. The C standard is supposed to be portable, including to various embedded devices (in fact embedded developers everywhere generally agree that glibc is already too big for them), and it can't be portable if for every simple math function it needs to include every alternative algorithm that might be of use. So, that's why it isn't in the C standard.
footnote: In the time performance testing, I gave my functions relatively generous optimisation flags (-s -O2) that are likely to be comparable to, if not worse than, what was likely used to compile glibc on my system (archlinux), so the results are probably fair. For a more rigorous test, I'd have to compile glibc myself and I reeeally don't feel like doing that. I used to use Gentoo, so I remember how long it takes, even when the task is automated. The results are conclusive (or rather inconclusive) enough for me. You're of course welcome to do this yourself.
Bonus round: A specialisation of pow(x, n) to all integers is instrumental if an exact integer output is required, which does happen. Consider allocating memory for an N-dimensional array with p^N elements. Getting p^N off even by one will result in a possibly randomly occurring segfault.
One reason for C++ to not have additional overloads is to be compatible with C.
C++98 has functions like double pow(double, int), but these have been removed in C++11 with the argument that C99 didn't include them.
http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2011/n3286.html#550
Getting a slightly more accurate result also means getting a slightly different result.
The World is constantly evolving and so are the programming languages. The fourth part of the C decimal TR¹ adds some more functions to <math.h>. Two families of these functions may be of interest for this question:
The pown functions, that takes a floating point number and an intmax_t exponent.
The powr functions, that takes two floating points numbers (x and y) and compute x to the power y with the formula exp(y*log(x)).
It seems that the standard guys eventually deemed these features useful enough to be integrated in the standard library. However, the rationale is that these functions are recommended by the ISO/IEC/IEEE 60559:2011 standard for binary and decimal floating point numbers. I can't say for sure what "standard" was followed at the time of C89, but the future evolutions of <math.h> will probably be heavily influenced by the future evolutions of the ISO/IEC/IEEE 60559 standard.
Note that the fourth part of the decimal TR won't be included in C2x (the next major C revision), and will probably be included later as an optional feature. There hasn't been any intent I know of to include this part of the TR in a future C++ revision.
¹ You can find some work-in-progress documentation here.
Here's a really simple O(log(n)) implementation of pow() that works for any numeric types, including integers:
template<typename T>
static constexpr inline T pown(T x, unsigned p) {
    T result = 1;
    while (p) {
        if (p & 0x1) {
            result *= x;
        }
        x *= x;
        p >>= 1;
    }
    return result;
}
It's better than enigmaticPhysicist's O(log(n)) implementation because it doesn't use recursion.
It's also almost always faster than his linear implementation (as long as p > ~3) because:
it doesn't require any extra memory
it only does ~1.5x more operations per loop
it only does ~1.25x more memory updates per loop
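A possible usage sketch of the template above (assumes C++14 or later, since the loop runs inside a constexpr function):
static_assert(pown(2, 10) == 1024, "integer bases are evaluated at compile time");
static_assert(pown(3u, 0) == 1u, "anything to the power 0 is 1");

double eighth = pown(0.5, 3);   // also works for floating-point bases: 0.125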
Perhaps because the processor's ALU didn't implement such a function for integers, but there is such an FPU instruction (as Stephen points out, it's actually a pair). So it was actually faster to cast to double, call pow with doubles, then test for overflow and cast back, than to implement it using integer arithmetic.
(for one thing, logarithms reduce powers to multiplication, but logarithms of integers lose a lot of accuracy for most inputs)
Stephen is right that on modern processors this is no longer true, but the C standard when the math functions were selected (C++ just used the C functions) is now what, 20 years old?
As a matter of fact, it does.
Since C++11 there is a templated implementation of pow(int, int) --- and even more general cases, see (7) in
http://en.cppreference.com/w/cpp/numeric/math/pow
EDIT: purists may argue this is not correct, as there is actually "promoted" typing used. One way or another, one gets a correct int result, or an error, on int parameters.
A very simple reason:
5^-2 = 1/25
Everything in the STL library is based on the most accurate, robust stuff imaginable. Sure, an int version would return zero (from 1/25), but this would be an inaccurate answer.
I agree, it's weird in some cases.