c++ - deterministic alternative to std::uniform_XXX_distribution with mt19937?

c++ - deterministic alternative to std::uniform_XXX_distribution with mt19937? - c++

I need a way to get deterministic sequences of ints and doubles.
template <class U>
constexpr auto get_random_value (std::mt19937 &gen, U min_value, U max_value)->U
{
if constexpr ( std::is_same_v <U, double> or std::is_same_v <U, float> ){
std::uniform_real_distribution <U> distrib( min_value, max_value );
return distrib( gen );
}
else if constexpr ( std::is_same_v <U, u32> or std::is_same_v <U, i32> ){
std::uniform_int_distribution distrib( min_value, max_value );
return distrib( gen );
}
else {
throw std::runtime_error( "error value type" );
}
}
My issue is that one day to another, the same seeded value will lead to different results.
The distribution is to blame because it goes a long way to avoid the pitfall of the modulo.
But I need a precise way to always be certain that a sequence will always be the same starting from a given seed. And I need an unbiased partition (so % and rand() are out).
What kind of implementation will guarantee this?

The distributions in the C++ standard are not portable ("seed-stable"), in the sense that the result can change between different implementations (e.g. Microsoft STL vs gcc libstdc++ vs clang libc++) or even different versions (e.g. Microsoft changed their implementation before). The standard simply does not prescribe a specific algorithm, with the intention to allow implementations to select the one with the best performance for each platform.
So far, there is only a proposal (D2059R0) to do something about this.
Note, however, that the generators are actually portable.
I have yet to see a library that guarantees portability.
However, in practice boost.random is known to produce reproducible result across platforms (see e.g. here or here or here). Also, Google's abseil library explicitly states that they do not provide a stability guarantee, but seems to produce the same result on different platforms nevertheless.
Here is a live example on godbolt where you can see that to some extent (well, at least Linux vs Windows for a tiny selection of parameters).
The major point is not to update the libraries without checking for a breaking change. Also compare e.g. this blog post.
Or you could also implement a specific algorithm yourself (see e.g. here) or simply copy the code from one the libraries to your code base and thereby effectively freezing its version.
For distributions involving floating point arithmetic, you also have the problem that the arithmetic itself is, generally speaking, far from stable across platforms since stuff like automatic vectorization (SSE2 vs AVX etc) or FPU settings might change behavior. See this blog post for more information. How far distributions, including the above mentioned libraries, are affected by this, I unfortunately do not know. The small example on godbolt mentioned above does at least not show any problem with -ffast-math, which is a good sign.
Whatever you do, I highly recommend to back up your choice by appropriate automatic tests (unit tests etc.) that catch any potential deviations in behavior.

Related

What is the maximum value I can pass to std::thread::sleep_for() and sleep_until()?

This question on sleeping forever has an answer that mentions this:
std::this_thread::sleep_until(
std::chrono::time_point<std::chrono::system_clock>::max());
and this:
std::this_thread::sleep_for(
std::chrono::system_clock::durat‌ion::max());
Running this code on Visual C++ 2017 RC actually doesn't sleep at all. I haven't checked out the sleep_until() case, so I'm not sure what's going on there.
In the sleep_for() case, the given duration seems to be converted to an absolute time by adding it to system_clock::now()which is then forwarded to sleep_until(). The problem is that the addition overflows, giving a time in the past.
Looking at the C++17 draft in 30.3.2, neither sleep_until() nor sleep_for() seem to mention limits. There is nothing relevant in Timing specifications (30.2.4). As for duration::max(), it is described in duration_values (20.17.4.3) as: "The value returned shall compare greater than zero()", which isn't helpful at all.
Honestly, I was rather surprised to see sleep_for() fail for system_clock::duration::max(), as it is a construct that make perfect sense to me.
What is the highest value I can pass to those functions that has a well-defined behaviour?

Technically speaking std::chrono::system_clock::durat‌ion::max() should sleep for a very long time (longer than you will or your grandchildren will live). And the standard enforces that.
But practically, implementors are still learning how to deal with overflow induced by chrono conversions among durations of different precisions. So bugs are common.
It might be more practical to sleep for 9'000h (a little over a year). There's no way this is going to cause overflow. And it is surely "forever" for your application.
However, don't hesitate to send a bug report to your vendor complaining that std::chrono::system_clock::durat‌ion::max() doesn't work. It should. It is just tricky to make it work correctly. And making it work isn't portable, so it isn't reasonable to ask you to write some wrapper to do it.
Motivated by isanae's excellent comment below which asks for references:
30.3.3 [thread.thread.this]/p7 which describes sleep_for says:
Effects: Blocks the calling thread for the relative timeout (30.2.4) specified by rel_time.
30.2.4 [thread.req.timing] which is a specification of all the timing requirements in the thread support library, says:
2 Implementations necessarily have some delay in returning from a timeout. Any overhead in interrupt response, function return, and scheduling induces a “quality of implementation” delay, expressed as duration Di. Ideally, this delay would be zero. Further, any contention for processor and memory resources induces a “quality of management” delay, expressed as duration Dm. The delay durations may vary from timeout to timeout, but in all cases shorter is better.
3 The member functions whose names end in _for take an argument that specifies a duration. These functions produce relative timeouts. Implementations should use a steady clock to measure time for these functions.330 Given a duration argument Dt, the real-time duration of the timeout is Dt + Di + Dm .
Ok, so now I'm amused, because we aren't talking about a member function. We're talking about a namespace-scope function. This is a defect. Feel free to submit one.
But the spec provides no grace to overflow. The spec (nearly) clearly says that the implementation can't return until after the specified delay. It is vague on how much after, but clear on that it can't return before.
If you "bug" STL and he isn't cooperative, just refer him to me, and we will work it out. :-) Perhaps there is a standards bug I'm not seeing, and should be fixed. If so, I can help you file the bug against the standard instead of against VS. Or maybe VS has already addressed this issue, and the fix is available in an upgrade.
If this is a bug in VS, please let STL know that I am more than happy to assist in fixing it. There are different tradeoffs in addressing this issue on different platforms.
At the moment, I can't swear that there isn't a bug of this class in my own implementation (libc++). So no high-horse here. It is a difficult area for a std::lib to get right.
Update
I've looked at the libc++ sleep_for and sleep_until. sleep_for correctly handles the overflow by sleeping for a "long time" (as much as the OS can handle). sleep_until has the overflow bug.
Here is a very lightly tested fixed sleep_until:
template <class _Clock, class _Duration>
void
sleep_until(const chrono::time_point<_Clock, _Duration>& __t)
{
using namespace chrono;
using __ldsec = duration<long double>;
_LIBCPP_CONSTEXPR time_point<_Clock, __ldsec> _Max =
time_point<_Clock, nanoseconds>::max();
time_point<_Clock, nanoseconds> __ns;
if (__t < _Max)
{
__ns = time_point_cast<nanoseconds>(__t);
if (__ns < __t)
__ns += nanoseconds{1};
}
else
__ns = time_point<_Clock, nanoseconds>::max();
mutex __mut;
condition_variable __cv;
unique_lock<mutex> __lk(__mut);
while (_Clock::now() < __ns)
__cv.wait_until(__lk, __ns);
}
The basic strategy is to do the overflow check using a long double representation which not only has a very large maximum representable value, but also uses saturation arithmetic (has an infinity). If the input value is too big for the OS to handle, truncate it down to something the OS can handle.
On some platforms it might not be desirable to resort to floating point arithmetic. One might use __int128_t instead. Or there is a more involved trick of converting to the "least common multiple" of the input and the native duration before doing the comparison. That conversion will only involve division (not multiplication) and so can't overflow. However it will not always give accurate answers for two values that are nearly equal. But it should work well enough for this use case.
For those interested in the latter (lcm) strategy, here is how to compute that type:
namespace detail
{
template <class Duration0, class ...Durations>
struct lcm_type;
template <class Duration>
struct lcm_type<Duration>
{
using type = Duration;
};
template <class Duration1, class Duration2>
struct lcm_type<Duration1, Duration2>
{
template <class D>
using invert = std::chrono::duration
<
typename D::rep,
std::ratio_divide<std::ratio<1>, typename D::period>
>;
using type = invert<typename std::common_type<invert<Duration1>,
invert<Duration2>>::type>;
};
template <class Duration0, class Duration1, class Duration2, class ...Durations>
struct lcm_type<Duration0, Duration1, Duration2, Durations...>
{
using type = typename lcm_type<
typename lcm_type<Duration0, Duration1>::type,
Duration2, Durations...>::type;
};
} // namespace detail
One can think of lcm_type<duration1, duration2> as the opposite of common_type<duration1, duration2>. The former finds a duration which the conversion to only divides. The latter finds a duration which the conversion to only multiplies.

It's unspecified, and it will overflow
I've had discussions with Billy O'Neal, one of the Visual C++ standard library developers, and Howard Hinnant, lead author of libc++. My conclusion is that the _for and _until family from the threading library will overflow in unspecified ways and you should not try to pass largish values to them. Whether the standard is under-specified on that subject is unclear to me.
The problem
All timed functions1 take either a duration or a time_point. Both are defined by their underlying type (representation) and ratio (period). The period can also be considered a "unit", such as a second or nanosecond.
There are two main places where overflow can happen:
Before the platform-specific call, and
During the conversion to a platform-specific type
Before the call
It is possible to avoid overflow in this situation, like Howard mentions in his answer, but "implementors are still learning how to deal with overflow induced by chrono conversions among durations of different precisions".
Visual C++ 2017, for example, implements sleep_for() in terms of sleep_until() by adding the given duration to the current time returned by
system_clock::now(). If the duration is too large, this will overflow. Other libraries, such as libstdc++, don't seem to have this problem.
The system call
Once you go deep enough, you'll have to interact with whatever platform you're on to do the actual work. This is where it gets messy.
On libstdc++, for example, the call to sleep_for() ends up in nanosleep(), which takes a timespec. This is a simplified version of it:
auto s = duration_cast<seconds>(time);
auto ns = duration_cast<nanoseconds>(time - s);
timespec ts = { s.count(), ns.count() };
nanosleep(&ts, &ts);
It's easy to overflow this: you just have to pass a time that is longer than LLONG_MAX seconds:
std::this_thread::sleep_for(hours::max());
This overflows the duration_cast into seconds and sets ts.tv_sec to -3600, which doesn't sleep at all because nanosleep() fails on negative values. It gets even better with sleep_until(), which tries to call nanosleep() in a loop, but it keeps failing, so it takes 100% of the processor for the duration of the wait.
The same thing happens in the Visual C++ 2017 library. Ignoring the overflow in sleep_for() because it adds the duration to the current time, it ends up calling Sleep, which takes an unsigned 32-bit value in milliseconds.
Even if it called something more flexible like NtWaitForSingleObject() (which it might in the future), it's still only a signed 64-bit value in 100-nanosecond increments and can still overflow.
Bugs and limitations
I personally consider an overflow in the <chrono> library itself to be a bug, such as Visual C++'s implementation of sleep_for() in terms of sleep_until(). I think whatever value you give should end up untouched right up to the final conversion before calling into a platform-specific function.
Once you get there though, if the platform doesn't support sleeping for the duration you're asking for, there is no real solution. As <chrono> is prohibited from throwing exceptions, I accept than overflowing is a possibility. Although this then becomes undefined behaviour, I wish implementations would be a bit more careful treating overflows, such as libstdc++'s various failings of handling EINVAL and spinning in a tight loop.
Visual C++
I'm quoting a few things from the emails I got from Billy O'Neal because they add the point of view of a standard library developer:
Are you saying that this:
this_thread::sleep_for(system_clock::durat‌ion::max());
is undefined behaviour by the standard?
As far as I can tell, yes. It's kind of a grey area -- no maximum allowable range is really specified for these functions, but given their nature of accepting arbitrary time_point/duration, which may be backed by some user-supplied bignum type of which the standard library has no knowledge, a conversion to some underlying time_point/duration type is essentially mandated. <chrono>'s design treats dealing with overflows as a non-goal (see duration_cast, for example, which outright prohibits implementing "as if infinity" and similar).
The standard [...] doesn't give us any way to report failure to convert here -- the behavior is literally undefined. We are explicitly prohibited from throwing exceptions, we have no way of reasoning about what happens if you exceed LLONG_MAX, and so our only possible responses are "as if infinity" or go directly to std::terminate(), do not pass go, do not collect $200.
libstdc++ and libc++ are targeting platforms for which system_clock actually maps to something the platform understands, where Unix timestamps are the law of the land. We are not targeting such a platform, and are obligated to map to/from "DWORD milliseconds" and/or FILETIME.
About the only thing I can think of might be a reasonable use case for this thing would be to have some kind of sentinel value which means "infinity," but if we want to go there the standard should introduce a named constant and describe the behavior thereof.
I'd rather solve your direct problem (wanting a time value to be a sentinel for infinity) rather than attempting to mandate overflow checking. Overflow checking when you don't know anything about the types involved can get really expensive (in both complexity and run time), but checking for a magic constant (e.g. chrono::duration<rep, period>::max() or chrono::time_point<clock, duration>::max()) should be cheap.
It also looks like a future update (ABI incompatible) would make major changes to <thread> so it doesn't overflow in sleep_for() anymore, but it is still limited by what the Windows API supports. Something like NtWaitForSingleObject() does support 64-bit values, but signed, because it supports both relative (negative) and absolute (positive) times.
1 By "timed functions", I mean any function for which 30.2.4 [thread.req.timing] applies, such as this_thread::sleep_for() and this_thread::sleep_until(), but also stuff in timed_mutex, recursive_timed_mutex, condition_variable, etc.

In C++11 or later is there a way to have a constexpr that determines endian without UB?

Note that earlier similar questions I found were before C++11 and/or included UB and/or could not be a constexpr.
Not a dup of
Is there a way to do a C++ style compile-time assertion to determine machine's endianness? or similar
AFAIK there is little endian, big endian, and other. At a minimum I need to at least not compile if other, even better if other architectures can be added
constexpr and endianness was asked earlier and does not include other, which would leave other architectures ill defined as one or the other
Basically I want to be able to specialize a template based on the target architecture's endianness

As part of my hash_append work I hope to provide what you're asking for:
https://github.com/HowardHinnant/hash_append/blob/master/endian.h
other would be detected by:
endian::native != endian::little && endian::native != endian::big
The first static_assert in this header is currently incorrect with respect to the other issue and should be removed.
This header is very easy to provide for any given platform. But of course it is not portable, and thus is ideal to have it be provided by your std::lib implementor instead.

How can elusive 64-bit portability issues be detected?

I found a snippet similar to this in some (C++) code I'm preparing for a 64-bit port.
int n;
size_t pos, npos;
/* ... initialization ... */
while((pos = find(ch, start)) != npos)
{
/* ... advance start position ... */
n++; // this will overflow if the loop iterates too many times
}
While I seriously doubt this would actually cause a problem in even memory-intensive applications, it's worth looking at from a theoretical standpoint because similar errors could surface that will cause problems. (Change n to a short in the above example and even small files could overflow the counter.)
Static analysis tools are useful, but they can't detect this kind of error directly. (Not yet, anyway.) The counter n doesn't participate in the while expression at all, so this isn't as simple as other loops (where typecasting errors give the error away). Any tool would need to determine that the loop would execute more than 231 times, but that means it needs to be able to estimate how many times the expression (pos = find(ch, start)) != npos will evaluate as true—no small feat! Even if a tool could determine that the loop could execute more than 231 times (say, because it recognizes the find function is working on a string), how could it know that the loop won't execute more than 264 times, overflowing a size_t value, too?
It seems clear that to conclusively identify and fix this kind of error requires a human eye, but are there patterns that give away this kind of error so it can be manually inspected? What similar errors exist that I should be watchful for?
EDIT 1: Since short, int and long types are inherently problematic, this kind of error could be found by examining every instance of those types. However, given their ubiquity in legacy C++ code, I'm not sure this is practical for a large piece of software. What else gives away this error? Is each while loop likely to exhibit some kind of error like this? (for loops certainly aren't immune to it!) How bad is this kind of error if we're not dealing with 16-bit types like short?
EDIT 2: Here's another example, showing how this error appears in a for loop.
int i = 0;
for (iter = c.begin(); iter != c.end(); iter++, i++)
{
/* ... */
}
It's fundamentally the same problem: loops are counting on some variable that never directly interacts with a wider type. The variable can still overflow, but no compiler or tool detects a casting error. (Strictly speaking, there is none.)
EDIT 3: The code I'm working with is very large. (10-15 million lines of code for C++ alone.) It's infeasible to inspect all of it, so I'm specifically interested in ways to identify this sort of problem (even if it results in a high false-positive rate) automatically.

Code reviews. Get a bunch of smart people looking at the code.
Use of short, int, or long is a warning sign, because the range of these types isn't defined in the standard. Most usage should be changed to the new int_fastN_t types in <stdint.h>, usage dealing with serialization to intN_t. Well, actually these <stdint.h> types should be used to typedef new application-specific types.
This example really ought to be:
typedef int_fast32_t linecount_appt;
linecount_appt n;
This expresses a design assumption that linecount fits in 32 bits, and also makes it easy to fix the code if the design requirements change.

Its clear what you need is a smart "range" analyzer tool to determine what the range of values are that are computed vs the type in which those values are being stored. (Your fundamental objection is to that smart range analyzer being a person). You might need some additional code annotations (manually well-placed typedefs or assertions that provide explicit range constraints) to enable a good analysis, and to handle otherwise apparantly arbitrarily large user input.
You'd need special checks to handle the place where C/C++ says the arithmetic is legal but dumb (e.g., assumption that you don't want [twos complement] overflows).
For your n++ example, (equivalent to n_after=n_before+1), n_before can be 2^31-1 (because of your observations about strings), so n_before+1 can be 2^32 which is overflow. (I think standard C/C++ semantics says that overflow to -0 without complaint is OK).
Our DMS Software Reengineering Toolkit in fact has range analysis machinery built in... but it is not presently connected to the DMS's C++ front end; we can only peddle so fast :-{ [We have used it on COBOL programs for different problems involving ranges].
In the absence of such range analysis, you could probably detect the existing of loops with such dependent flows; the value of n clearly depends on the loop count. I suspect this would get you every loop in the program that had a side effect, which might not be that much help.
Another poster suggests somehow redeclaring all the int-like declarations using application specific types (e.g., *linecount_appt*) and then typedef'ing those to value that work for your application. To do this, I'd think you'd have to classify each int-like declaration into categories (e.g., "these declarations are all *linecount_appt*"). Doing this by manual inspection for 10M SLOC seems pretty hard and very error prone. Finding all declarations which receive (by assignment) values from the "same" value sources might be a way to get hints about where such application types are. You'd want to be able to mechanically find such groups of declarations, and then have some tool automatically replace the actual declarations with a designated application type (e.g., *linecount_appt*). This is likely somewhat easier than doing precise range analysis.

There are tools that help find such issues. I won't give any links here because the ones I know of are commercial but should be pretty easy to find.

Retrofitting existing code with a floating point arbitrary precision C++ library, any chance of success?

Let say I have a snippet of code like this:
typedef double My_fp_t;
My_fp_t my_fun( My_fp_t input )
{
// some fp computation, it uses operator+, operator- and so on for type My_fp_t
}
My_fp_t input = 0.;
My_fp_t output = my_fun( input );
Is it possible to retrofit my existing code with a floating point arbitrary precision C++ library?
I would like to simple add #include <cpp_arbitrary_precision_fp>, change my typedef double My_fp_t; into typedef arbitrary_double_t My_fp_t; and let the operator overloading of C++ doing its job...
My main problem is that actually my code do NOT have the typedef :-( and so maybe my plan is doomed to failure.
Assuming that my code had the typedef, what other problems would I face?

This might be tough. I used a template approach in my PhD thesis code do deal with different numerical types. You might want to take a look at it to see the problems I encountered.
The thing is you are fine if all you do with your numbers is use the standard arithmetic operators. However, as soon as you use a square root or some other non operator function you need to create helper objects to detect your object's type (at compile time as it is too slow to do this at run time; see the boost metaprogramming library for help on that) and then call the correct function and return it as the correct type. It is all totally doable, but is likely to take longer than you think and will add considerably to the complexity of your code.
In my experience, (I was using GMP which must be the fastest arbitrary precision library available for C++) after all of the effort and complexity I had introduced, I found that GMP was just too slow for the sorts of computation that I was doing; so it was academically interesting, but practically useless. Before you start on this do some speed tests to see whether your library will still be usable if you use arbitrary precision arithmetic.

If the library defines a type that correctly overloads the operators you use, I don't see any problem...

Why isn't `int pow(int base, int exponent)` in the standard C++ libraries?

I feel like I must just be unable to find it. Is there any reason that the C++ pow function does not implement the "power" function for anything except floats and doubles?
I know the implementation is trivial, I just feel like I'm doing work that should be in a standard library. A robust power function (i.e. handles overflow in some consistent, explicit way) is not fun to write.

As of C++11, special cases were added to the suite of power functions (and others). C++11 [c.math] /11 states, after listing all the float/double/long double overloads (my emphasis, and paraphrased):
Moreover, there shall be additional overloads sufficient to ensure that, if any argument corresponding to a double parameter has type double or an integer type, then all arguments corresponding to double parameters are effectively cast to double.
So, basically, integer parameters will be upgraded to doubles to perform the operation.
Prior to C++11 (which was when your question was asked), no integer overloads existed.
Since I was neither closely associated with the creators of C nor C++ in the days of their creation (though I am rather old), nor part of the ANSI/ISO committees that created the standards, this is necessarily opinion on my part. I'd like to think it's informed opinion but, as my wife will tell you (frequently and without much encouragement needed), I've been wrong before :-)
Supposition, for what it's worth, follows.
I suspect that the reason the original pre-ANSI C didn't have this feature is because it was totally unnecessary. First, there was already a perfectly good way of doing integer powers (with doubles and then simply converting back to an integer, checking for integer overflow and underflow before converting).
Second, another thing you have to remember is that the original intent of C was as a systems programming language, and it's questionable whether floating point is desirable in that arena at all.
Since one of its initial use cases was to code up UNIX, the floating point would have been next to useless. BCPL, on which C was based, also had no use for powers (it didn't have floating point at all, from memory).
As an aside, an integral power operator would probably have been a binary operator rather than a library call. You don't add two integers with x = add (y, z) but with x = y + z - part of the language proper rather than the library.
Third, since the implementation of integral power is relatively trivial, it's almost certain that the developers of the language would better use their time providing more useful stuff (see below comments on opportunity cost).
That's also relevant for the original C++. Since the original implementation was effectively just a translator which produced C code, it carried over many of the attributes of C. Its original intent was C-with-classes, not C-with-classes-plus-a-little-bit-of-extra-math-stuff.
As to why it was never added to the standards before C++11, you have to remember that the standards-setting bodies have specific guidelines to follow. For example, ANSI C was specifically tasked to codify existing practice, not to create a new language. Otherwise, they could have gone crazy and given us Ada :-)
Later iterations of that standard also have specific guidelines and can be found in the rationale documents (rationale as to why the committee made certain decisions, not rationale for the language itself).
For example the C99 rationale document specifically carries forward two of the C89 guiding principles which limit what can be added:
Keep the language small and simple.
Provide only one way to do an operation.
Guidelines (not necessarily those specific ones) are laid down for the individual working groups and hence limit the C++ committees (and all other ISO groups) as well.
In addition, the standards-setting bodies realise that there is an opportunity cost (an economic term meaning what you have to forego for a decision made) to every decision they make. For example, the opportunity cost of buying that $10,000 uber-gaming machine is cordial relations (or probably all relations) with your other half for about six months.
Eric Gunnerson explains this well with his -100 points explanation as to why things aren't always added to Microsoft products- basically a feature starts 100 points in the hole so it has to add quite a bit of value to be even considered.
In other words, would you rather have a integral power operator (which, honestly, any half-decent coder could whip up in ten minutes) or multi-threading added to the standard? For myself, I'd prefer to have the latter and not have to muck about with the differing implementations under UNIX and Windows.
I would like to also see thousands and thousands of collection the standard library (hashes, btrees, red-black trees, dictionary, arbitrary maps and so forth) as well but, as the rationale states:
A standard is a treaty between implementer and programmer.
And the number of implementers on the standards bodies far outweigh the number of programmers (or at least those programmers that don't understand opportunity cost). If all that stuff was added, the next standard C++ would be C++215x and would probably be fully implemented by compiler developers three hundred years after that.
Anyway, that's my (rather voluminous) thoughts on the matter. If only votes were handed out based on quantity rather than quality, I'd soon blow everyone else out of the water. Thanks for listening :-)

For any fixed-width integral type, nearly all of the possible input pairs overflow the type, anyway. What's the use of standardizing a function that doesn't give a useful result for vast majority of its possible inputs?
You pretty much need to have an big integer type in order to make the function useful, and most big integer libraries provide the function.
Edit: In a comment on the question, static_rtti writes "Most inputs cause it to overflow? The same is true for exp and double pow, I don't see anyone complaining." This is incorrect.
Let's leave aside exp, because that's beside the point (though it would actually make my case stronger), and focus on double pow(double x, double y). For what portion of (x,y) pairs does this function do something useful (i.e., not simply overflow or underflow)?
I'm actually going to focus only on a small portion of the input pairs for which pow makes sense, because that will be sufficient to prove my point: if x is positive and |y| <= 1, then pow does not overflow or underflow. This comprises nearly one-quarter of all floating-point pairs (exactly half of non-NaN floating-point numbers are positive, and just less than half of non-NaN floating-point numbers have magnitude less than 1). Obviously, there are a lot of other input pairs for which pow produces useful results, but we've ascertained that it's at least one-quarter of all inputs.
Now let's look at a fixed-width (i.e. non-bignum) integer power function. For what portion inputs does it not simply overflow? To maximize the number of meaningful input pairs, the base should be signed and the exponent unsigned. Suppose that the base and exponent are both n bits wide. We can easily get a bound on the portion of inputs that are meaningful:
If the exponent 0 or 1, then any base is meaningful.
If the exponent is 2 or greater, then no base larger than 2^(n/2) produces a meaningful result.
Thus, of the 2^(2n) input pairs, less than 2^(n+1) + 2^(3n/2) produce meaningful results. If we look at what is likely the most common usage, 32-bit integers, this means that something on the order of 1/1000th of one percent of input pairs do not simply overflow.

Because there's no way to represent all integer powers in an int anyways:
>>> print 2**-4
0.0625

That's actually an interesting question. One argument I haven't found in the discussion is the simple lack of obvious return values for the arguments. Let's count the ways the hypthetical int pow_int(int, int) function could fail.
Overflow
Result undefined pow_int(0,0)
Result can't be represented pow_int(2,-1)
The function has at least 2 failure modes. Integers can't represent these values, the behaviour of the function in these cases would need to be defined by the standard - and programmers would need to be aware of how exactly the function handles these cases.
Overall leaving the function out seems like the only sensible option. The programmer can use the floating point version with all the error reporting available instead.

Short answer:
A specialisation of pow(x, n) to where n is a natural number is often useful for time performance. But the standard library's generic pow() still works pretty (surprisingly!) well for this purpose and it is absolutely critical to include as little as possible in the standard C library so it can be made as portable and as easy to implement as possible. On the other hand, that doesn't stop it at all from being in the C++ standard library or the STL, which I'm pretty sure nobody is planning on using in some kind of embedded platform.
Now, for the long answer.
pow(x, n) can be made much faster in many cases by specialising n to a natural number. I have had to use my own implementation of this function for almost every program I write (but I write a lot of mathematical programs in C). The specialised operation can be done in O(log(n)) time, but when n is small, a simpler linear version can be faster. Here are implementations of both:
// Computes x^n, where n is a natural number.
double pown(double x, unsigned n)
{
double y = 1;
// n = 2*d + r. x^n = (x^2)^d * x^r.
unsigned d = n >> 1;
unsigned r = n & 1;
double x_2_d = d == 0? 1 : pown(x*x, d);
double x_r = r == 0? 1 : x;
return x_2_d*x_r;
}
// The linear implementation.
double pown_l(double x, unsigned n)
{
double y = 1;
for (unsigned i = 0; i < n; i++)
y *= x;
return y;
}
(I left x and the return value as doubles because the result of pow(double x, unsigned n) will fit in a double about as often as pow(double, double) will.)
(Yes, pown is recursive, but breaking the stack is absolutely impossible since the maximum stack size will roughly equal log_2(n) and n is an integer. If n is a 64-bit integer, that gives you a maximum stack size of about 64. No hardware has such extreme memory limitations, except for some dodgy PICs with hardware stacks that only go 3 to 8 function calls deep.)
As for performance, you'll be surprised by what a garden variety pow(double, double) is capable of. I tested a hundred million iterations on my 5-year-old IBM Thinkpad with x equal to the iteration number and n equal to 10. In this scenario, pown_l won. glibc pow() took 12.0 user seconds, pown took 7.4 user seconds, and pown_l took only 6.5 user seconds. So that's not too surprising. We were more or less expecting this.
Then, I let x be constant (I set it to 2.5), and I looped n from 0 to 19 a hundred million times. This time, quite unexpectedly, glibc pow won, and by a landslide! It took only 2.0 user seconds. My pown took 9.6 seconds, and pown_l took 12.2 seconds. What happened here? I did another test to find out.
I did the same thing as above only with x equal to a million. This time, pown won at 9.6s. pown_l took 12.2s and glibc pow took 16.3s. Now, it's clear! glibc pow performs better than the three when x is low, but worst when x is high. When x is high, pown_l performs best when n is low, and pown performs best when x is high.
So here are three different algorithms, each capable of performing better than the others under the right circumstances. So, ultimately, which to use most likely depends on how you're planning on using pow, but using the right version is worth it, and having all of the versions is nice. In fact, you could even automate the choice of algorithm with a function like this:
double pown_auto(double x, unsigned n, double x_expected, unsigned n_expected) {
if (x_expected < x_threshold)
return pow(x, n);
if (n_expected < n_threshold)
return pown_l(x, n);
return pown(x, n);
}
As long as x_expected and n_expected are constants decided at compile time, along with possibly some other caveats, an optimising compiler worth its salt will automatically remove the entire pown_auto function call and replace it with the appropriate choice of the three algorithms. (Now, if you are actually going to attempt to use this, you'll probably have to toy with it a little, because I didn't exactly try compiling what I'd written above. ;))
On the other hand, glibc pow does work and glibc is big enough already. The C standard is supposed to be portable, including to various embedded devices (in fact embedded developers everywhere generally agree that glibc is already too big for them), and it can't be portable if for every simple math function it needs to include every alternative algorithm that might be of use. So, that's why it isn't in the C standard.
footnote: In the time performance testing, I gave my functions relatively generous optimisation flags (-s -O2) that are likely to be comparable to, if not worse than, what was likely used to compile glibc on my system (archlinux), so the results are probably fair. For a more rigorous test, I'd have to compile glibc myself and I reeeally don't feel like doing that. I used to use Gentoo, so I remember how long it takes, even when the task is automated. The results are conclusive (or rather inconclusive) enough for me. You're of course welcome to do this yourself.
Bonus round: A specialisation of pow(x, n) to all integers is instrumental if an exact integer output is required, which does happen. Consider allocating memory for an N-dimensional array with p^N elements. Getting p^N off even by one will result in a possibly randomly occurring segfault.

One reason for C++ to not have additional overloads is to be compatible with C.
C++98 has functions like double pow(double, int), but these have been removed in C++11 with the argument that C99 didn't include them.
http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2011/n3286.html#550
Getting a slightly more accurate result also means getting a slightly different result.

The World is constantly evolving and so are the programming languages. The fourth part of the C decimal TR¹ adds some more functions to <math.h>. Two families of these functions may be of interest for this question:
The pown functions, that takes a floating point number and an intmax_t exponent.
The powr functions, that takes two floating points numbers (x and y) and compute x to the power y with the formula exp(y*log(x)).
It seems that the standard guys eventually deemed these features useful enough to be integrated in the standard library. However, the rational is that these functions are recommended by the ISO/IEC/IEEE 60559:2011 standard for binary and decimal floating point numbers. I can't say for sure what "standard" was followed at the time of C89, but the future evolutions of <math.h> will probably be heavily influenced by the future evolutions of the ISO/IEC/IEEE 60559 standard.
Note that the fourth part of the decimal TR won't be included in C2x (the next major C revision), and will probably be included later as an optional feature. There hasn't been any intent I know of to include this part of the TR in a future C++ revision.
¹ You can find some work-in-progress documentation here.

Here's a really simple O(log(n)) implementation of pow() that works for any numeric types, including integers:
template<typename T>
static constexpr inline T pown(T x, unsigned p) {
T result = 1;
while (p) {
if (p & 0x1) {
result *= x;
}
x *= x;
p >>= 1;
}
return result;
}
It's better than enigmaticPhysicist's O(log(n)) implementation because it doesn't use recursion.
It's also almost always faster than his linear implementation (as long as p > ~3) because:
it doesn't require any extra memory
it only does ~1.5x more operations per loop
it only does ~1.25x more memory updates per loop

Perhaps because the processor's ALU didn't implement such a function for integers, but there is such an FPU instruction (as Stephen points out, it's actually a pair). So it was actually faster to cast to double, call pow with doubles, then test for overflow and cast back, than to implement it using integer arithmetic.
(for one thing, logarithms reduce powers to multiplication, but logarithms of integers lose a lot of accuracy for most inputs)
Stephen is right that on modern processors this is no longer true, but the C standard when the math functions were selected (C++ just used the C functions) is now what, 20 years old?

As a matter of fact, it does.
Since C++11 there is a templated implementation of pow(int, int) --- and even more general cases, see (7) in
http://en.cppreference.com/w/cpp/numeric/math/pow
EDIT: purists may argue this is not correct, as there is actually "promoted" typing used. One way or another, one gets a correct int result, or an error, on int parameters.

A very simple reason:
5^-2 = 1/25
Everything in the STL library is based on the most accurate, robust stuff imaginable. Sure, the int would return to a zero (from 1/25) but this would be an inaccurate answer.
I agree, it's weird in some cases.

We Keep Coding

c++ django amazon-web-services regex python-2.7 google-cloud-platform list unit-testing opengl ember.js