What is `std::string_view::max_size()`?

What is `std::string_view::max_size()`? - c++

While creating a version of std::basic_string_view for a private project (choices made for me: C++11; no boost:: allowed; with a pinch of NIH, so no GSL either) I came to implementing std::basic_string_view::max_size() for which the standard (n4820 21.4.2.3 Capacity) simply says:
Returns: The largest possible number of char-like objects that can be referred to by a basic_string_view.
Logically, this would be the maximum number that std::basic_string_view::size_type can represent: std::numeric_limits<std::basic_string_view::size_type>::max() which comes out at 18446744073709551615 on my platform where size_type is std::size_t.
I figured that, since I want to be compatible with the standard libraries, I should ensure that I arrive at the same number as other implementations. This is where I get lost.
Given that I have a auto max_size = string_view{"foo"}.max_size() then I get the following results:
+--------------+--------------------------+
| Library | Result |
+--------------+--------------------------+
| libstdc++ | 4611686018427387899 |
| libc++ | 18446744073709551615 |
| boost 1.72.0 | 3 |
+--------------+--------------------------+
If my interpretation is correct, then that means that libc++ and I agree on what the value should be. I feel that boost is completely wrong, since the specification for max_size is to return the largest possible number that a, not this, string_view can refer to. However, as noted in the comments, boost::string_view predates the standard and it is thus unfair to call it "completely wrong". Further, looking at the implementations of all three libraries libc++ returns
numeric_limits<size_type>::max();
libstdc++ returns
(npos - sizeof(size_type) - sizeof(void*)) / sizeof(value_type) / 4;
and boost returns:
len_;
Basically, two implementations appear to be wrong, but the question is: which one is correct?

Related

c++ - deterministic alternative to std::uniform_XXX_distribution with mt19937?

I need a way to get deterministic sequences of ints and doubles.
template <class U>
constexpr auto get_random_value (std::mt19937 &gen, U min_value, U max_value)->U
{
if constexpr ( std::is_same_v <U, double> or std::is_same_v <U, float> ){
std::uniform_real_distribution <U> distrib( min_value, max_value );
return distrib( gen );
}
else if constexpr ( std::is_same_v <U, u32> or std::is_same_v <U, i32> ){
std::uniform_int_distribution distrib( min_value, max_value );
return distrib( gen );
}
else {
throw std::runtime_error( "error value type" );
}
}
My issue is that one day to another, the same seeded value will lead to different results.
The distribution is to blame because it goes a long way to avoid the pitfall of the modulo.
But I need a precise way to always be certain that a sequence will always be the same starting from a given seed. And I need an unbiased partition (so % and rand() are out).
What kind of implementation will guarantee this?

The distributions in the C++ standard are not portable ("seed-stable"), in the sense that the result can change between different implementations (e.g. Microsoft STL vs gcc libstdc++ vs clang libc++) or even different versions (e.g. Microsoft changed their implementation before). The standard simply does not prescribe a specific algorithm, with the intention to allow implementations to select the one with the best performance for each platform.
So far, there is only a proposal (D2059R0) to do something about this.
Note, however, that the generators are actually portable.
I have yet to see a library that guarantees portability.
However, in practice boost.random is known to produce reproducible result across platforms (see e.g. here or here or here). Also, Google's abseil library explicitly states that they do not provide a stability guarantee, but seems to produce the same result on different platforms nevertheless.
Here is a live example on godbolt where you can see that to some extent (well, at least Linux vs Windows for a tiny selection of parameters).
The major point is not to update the libraries without checking for a breaking change. Also compare e.g. this blog post.
Or you could also implement a specific algorithm yourself (see e.g. here) or simply copy the code from one the libraries to your code base and thereby effectively freezing its version.
For distributions involving floating point arithmetic, you also have the problem that the arithmetic itself is, generally speaking, far from stable across platforms since stuff like automatic vectorization (SSE2 vs AVX etc) or FPU settings might change behavior. See this blog post for more information. How far distributions, including the above mentioned libraries, are affected by this, I unfortunately do not know. The small example on godbolt mentioned above does at least not show any problem with -ffast-math, which is a good sign.
Whatever you do, I highly recommend to back up your choice by appropriate automatic tests (unit tests etc.) that catch any potential deviations in behavior.

What is std::__lg?

As title I don't know what is std::__lg mean after google it?And what exactly this line do : int n = std::__lg(block_sz - pos_l + 1);

It's helper to compute the 2-based logarithm of integer number, i.e. it return the index of highest set bit in the number (or -1 for 0).
I.e. for 1 it will return 0, for 16 it will return 4, for 1024 it will return 10, etc.
This can be used to efficiently predict the pre-allocated size for arrays, round to the nearest power of 2 and thing like that.
Note, that as any other function starting with __, it's internal function of the compiler or the library, so you should not rely on its existence, such code wouldn't be portable. Other implementations of std library can come with completely solution and different names of similar helpers (if they use something similar at all).
POSIX provides similar function - ffs(), there are also ffsl and ffsll (see the same page) which is GNU extension and work with long and long long respectively.
For the question from comment - how to use it from Java. Because of the above it's not good idea first, secondly it would require JNI wrapper for this. And third but most important - there is no reason for this actually. Java already provides similar methods Integer.heghestOneBit(), although note it returns +1 in comparison to described std::__lg, i.e. 0 for 0, 1 for 1, 11 for 1024, etc.

It's an identifier used internally by your compiler (very likely GCC), because all identifiers with double underscores belong to the compiler implementation.
Nowhere in your own code should something like __lg be seen or used. Use the interface of the standard library, not its implementation. If your own code directly uses __lg, then you have no guarantee that the code will compile or do the right thing with any other compiler or even with any other version of the same compiler.
As the C++ standard says at §2.10 [lex.name]:
Each identifier that contains a double underscore __ or begins
with an underscore followed by an uppercase letter is reserved to the
implementation for any use.
As for what that GCC thing actually is, just look at the source code which a Google search for "std::__lg" turns up.
Depending on the actual types of block_sz and pos_l, it should be either this:
/// This is a helper function for the sort routines and for random.tcc.
// Precondition: __n > 0.
template<typename _Size>
inline _Size
__lg(_Size __n)
{
_Size __k;
for (__k = 0; __n != 0; __n >>= 1)
++__k;
return __k - 1;
}
Or this:
inline int
__lg(int __n)
{ return sizeof(int) * __CHAR_BIT__ - 1 - __builtin_clz(__n); }
Now, __CHAR_BIT__ is like the standard CHAR_BIT macro. As GCC documentation says:
Defined to the number of bits used in the representation of the char
data type. It exists to make the standard header given numerical
limits work correctly. You should not use this macro directly;
instead, include the appropriate headers.
__builtin_clz is another GCC-specific function. Again, GCC documentation explains its purpose:
Returns the number of leading 0-bits in x, starting at the most
significant bit position. If x is 0, the result is undefined.
I think if you need such functionality, then it's trivial to write it yourself. In fact, the question is why you need it in the first place. The real answer to your actual problem probably lies in the code around the int n = std::__lg(block_sz - pos_l + 1); line.
Things to keep in mind:
Do not use anything with two consecutive underscores in your own code.
GCC is open source, so the internal implementation of special functions or macros is not secret but can easily be browsed online.

Ada-style Range Types in D

After having read this interesting article about Ada and C++ and knowing of D's support for CTFE and constant-parameter specialization of functions I wonder if Ada-Style Range types could be more easily/efficiently implemented in D than in C++. Has anybody perhaps already written such a library?
If such ranges could be implemented efficiently and developer-friendly in D it could be used as a promotor for establishing D in sectors with demands on determinism and type- and memory-safety (were D already shines) such as in avionics and automotive. D would thereby gain more developer-interest and stronger financial support.

Having scalar (bounded) variable is easily done in D as a template, and in fact I remember I saw the code that someone already did it. Unfortunately I do not remember where I saw it. This said, there is IMHO no need for this to become part of the language, but rather part of the standard library.
(Edit: Adam reminded me of the code: http://arsdnet.net/dcode/ranged.d )
Ranges are more wider concept nicely explained in Andrei's article - http://www.informit.com/articles/printerfriendly.aspx?p=1407357&rll=1 .
This type of ranges are now a core concept of D. D's slice is an implementation of the most powerful range - RandomAccessRange.
Example:
import std.stdio;
import std.algorithm;
void main()
{
int[] values = [ 1, 20, 7, 11 ]; // values is a RandomAcessRange
writeln(filter!(value => value > 10)(values));
}
Good reads:
http://ddili.org/ders/d.en/ranges.html
http://www.drdobbs.com/architecture-and-design/component-programming-in-d/240008321
http://dlang.org/phobos/std_range.html

I wrote some little code that does min and max of integers with overflow check:
http://arsdnet.net/dcode/ranged.d
This was just a proof of concept though, I doubt it will perform very well, but might if inlined.

Template code bloat with unordered_map

I wondered if unordered_map is implemented using type erasure, since an unordered_map<Key, A*> and unordered_map<Key, B*> can use exactly the same code (apart from casting, which is a no-op in machine code). That is, the implementation of both could be based on unordered_map<Key, void*> to save code size.
Update: This technique is commonly referred to as the Thin Template Idiom (Thanks to the commenters below for pointing that out).
Update 2: I would be particlarly interested in Howard Hinnant's opinion. Let's hope he reads this.
So I wrote this small test:
#include <iostream>
#if BOOST
# include <boost/unordered_map.hpp>
using boost::unordered_map;
#else
# include <unordered_map>
using std::unordered_map;
#endif
struct A { A(int x) : x(x) {} int x; };
struct B { B(int x) : x(x) {} int x; };
int main()
{
#if SMALL
unordered_map<std::string, void*> ma, mb;
#else
unordered_map<std::string, A*> ma;
unordered_map<std::string, B*> mb;
#endif
ma["foo"] = new A(1);
mb["bar"] = new B(2);
std::cout << ((A*) ma["foo"])->x << std::endl;
std::cout << ((B*) mb["bar"])->x << std::endl;
// yes, it leaks.
}
And determined the size of the compiled output with various settings:
#!/bin/sh
for BOOST in 0 1 ; do
for OPT in 2 3 s ; do
for SMALL in 0 1 ; do
clang++ -stdlib=libc++ -O${OPT} -DSMALL=${SMALL} -DBOOST=${BOOST} map_test.cpp -o map_test
strip map_test
SIZE=$(echo "scale=1;$(stat -f "%z" map_test)/1024" | bc)
echo boost=$BOOST opt=$OPT small=$SMALL size=${SIZE}K
done
done
done
It turns out, that with all settings I tried, lots of inner code of unordered_map seems to be instantiated twice:
With Clang and libc++:
| -O2 | -O3 | -Os
-DSMALL=0 | 24.7K | 23.5K | 28.2K
-DSMALL=1 | 17.9K | 17.2K | 19.8K
With Clang and Boost:
| -O2 | -O3 | -Os
-DSMALL=0 | 23.9K | 23.9K | 32.5K
-DSMALL=1 | 17.4K | 17.4K | 22.3K
With GCC and Boost:
| -O2 | -O3 | -Os
-DSMALL=0 | 21.8K | 21.8K | 35.5K
-DSMALL=1 | 16.4K | 16.4K | 26.2K
(With the compilers from Apple's Xcode)
Now to the question: Is there some convincing technical reason due to which the implementers have chosen to omit this simple optimization?
Also: why the hell is the effect of -Os exactly the opposite of what is advertised?
Update 3:
As suggested by Nicol Bolas, I have repeated the measurements with shared_ptr<void/A/B> instead of naked pointers (created with make_shared and cast with static_pointer_cast). The tendency in the results is the same:
With Clang and libc++:
| -O2 | -O3 | -Os
-DSMALL=0 | 27.9K | 26.7K | 30.9K
-DSMALL=1 | 25.0K | 20.3K | 26.8K
With Clang and Boost:
| -O2 | -O3 | -Os
-DSMALL=0 | 35.3K | 34.3K | 43.1K
-DSMALL=1 | 27.8K | 26.8K | 32.6K

Since I've been specifically asked to comment, I will, though I'm not sure I have much more to add than has already been said. (sorry it took me 8 days to get here)
I've implemented the thin template idiom before, for some containers, namely vector, deque and list. I don't currently have it implemented for any container in libc++. And I've never implemented it for the unordered containers.
It does save on code size. It also adds complexity, much more so than the referenced wikibooks link implies. One can also do it for more than just pointers. You can do it for all scalars which have the same size. For example why have different instantiations for int and unsigned? Even ptrdiff_t can be stored in the same instantiation as T*. After all, it is all just a bag bits at the bottom. But it is extremely tricky to get the member templates which take a range of iterators correct when playing these tricks.
There are disadvantages though (besides difficulty of implementation). It doesn't play nearly as nicely with the debugger. At the very least it makes it much more difficult for the debugger to display container innards. And while the code size savings can be significant, I would stop short of calling the code size savings dramatic. Especially when compared to the memory required to store the photographs, animations, audio clips, street maps, years of email with all of the attachments from your best friends and family, etc. I.e. optimizing code size is important. But you should take into account that in many apps today (even on embedded devices), if you cut your code size in half, you might cut your app size by 5% (statistics admittedly pulled from thin air).
My current position is that this particular optimization is one best paid for and implemented in the linker instead of in the template container. Though I know this isn't easy to implement in the linker, I have heard of successful implementations.
That being said, I still do try to make code size optimizations in templates. For example in the libc++ helper structures such as __hash_map_node_destructor are templated on as few parameters as possible, so if any of their code gets outlined, it is more likely that one instantiation of the helper can serve more than one instantiation of unordered_map. This technique is debugger friendly, and not that hard to get right. And can even have some positive side effects for the client when applied to iterators (N2980).
In summary, I wouldn't hold it against code for going the extra mile and implementing this optimization. But I also wouldn't classify it as high a priority as I did a decade ago, both because linker technology has progressed, and the ratio of code size to application size has tended to fairly dramatically decrease.

When you have a void* parameter there is no type checking at compile-time.
Such maps as those you propose would be a flaw in a program since they would accept value elements of type A*, B*, and even more unimaginable fancy types that would have nothing to do in that map. ( for example int*, float*; std::string*, CString*, CWnd*... imagine the mess in your map...)
Your optimisation is premature. And premature optimization is root of all evil.

Why isn't `int pow(int base, int exponent)` in the standard C++ libraries?

I feel like I must just be unable to find it. Is there any reason that the C++ pow function does not implement the "power" function for anything except floats and doubles?
I know the implementation is trivial, I just feel like I'm doing work that should be in a standard library. A robust power function (i.e. handles overflow in some consistent, explicit way) is not fun to write.

As of C++11, special cases were added to the suite of power functions (and others). C++11 [c.math] /11 states, after listing all the float/double/long double overloads (my emphasis, and paraphrased):
Moreover, there shall be additional overloads sufficient to ensure that, if any argument corresponding to a double parameter has type double or an integer type, then all arguments corresponding to double parameters are effectively cast to double.
So, basically, integer parameters will be upgraded to doubles to perform the operation.
Prior to C++11 (which was when your question was asked), no integer overloads existed.
Since I was neither closely associated with the creators of C nor C++ in the days of their creation (though I am rather old), nor part of the ANSI/ISO committees that created the standards, this is necessarily opinion on my part. I'd like to think it's informed opinion but, as my wife will tell you (frequently and without much encouragement needed), I've been wrong before :-)
Supposition, for what it's worth, follows.
I suspect that the reason the original pre-ANSI C didn't have this feature is because it was totally unnecessary. First, there was already a perfectly good way of doing integer powers (with doubles and then simply converting back to an integer, checking for integer overflow and underflow before converting).
Second, another thing you have to remember is that the original intent of C was as a systems programming language, and it's questionable whether floating point is desirable in that arena at all.
Since one of its initial use cases was to code up UNIX, the floating point would have been next to useless. BCPL, on which C was based, also had no use for powers (it didn't have floating point at all, from memory).
As an aside, an integral power operator would probably have been a binary operator rather than a library call. You don't add two integers with x = add (y, z) but with x = y + z - part of the language proper rather than the library.
Third, since the implementation of integral power is relatively trivial, it's almost certain that the developers of the language would better use their time providing more useful stuff (see below comments on opportunity cost).
That's also relevant for the original C++. Since the original implementation was effectively just a translator which produced C code, it carried over many of the attributes of C. Its original intent was C-with-classes, not C-with-classes-plus-a-little-bit-of-extra-math-stuff.
As to why it was never added to the standards before C++11, you have to remember that the standards-setting bodies have specific guidelines to follow. For example, ANSI C was specifically tasked to codify existing practice, not to create a new language. Otherwise, they could have gone crazy and given us Ada :-)
Later iterations of that standard also have specific guidelines and can be found in the rationale documents (rationale as to why the committee made certain decisions, not rationale for the language itself).
For example the C99 rationale document specifically carries forward two of the C89 guiding principles which limit what can be added:
Keep the language small and simple.
Provide only one way to do an operation.
Guidelines (not necessarily those specific ones) are laid down for the individual working groups and hence limit the C++ committees (and all other ISO groups) as well.
In addition, the standards-setting bodies realise that there is an opportunity cost (an economic term meaning what you have to forego for a decision made) to every decision they make. For example, the opportunity cost of buying that $10,000 uber-gaming machine is cordial relations (or probably all relations) with your other half for about six months.
Eric Gunnerson explains this well with his -100 points explanation as to why things aren't always added to Microsoft products- basically a feature starts 100 points in the hole so it has to add quite a bit of value to be even considered.
In other words, would you rather have a integral power operator (which, honestly, any half-decent coder could whip up in ten minutes) or multi-threading added to the standard? For myself, I'd prefer to have the latter and not have to muck about with the differing implementations under UNIX and Windows.
I would like to also see thousands and thousands of collection the standard library (hashes, btrees, red-black trees, dictionary, arbitrary maps and so forth) as well but, as the rationale states:
A standard is a treaty between implementer and programmer.
And the number of implementers on the standards bodies far outweigh the number of programmers (or at least those programmers that don't understand opportunity cost). If all that stuff was added, the next standard C++ would be C++215x and would probably be fully implemented by compiler developers three hundred years after that.
Anyway, that's my (rather voluminous) thoughts on the matter. If only votes were handed out based on quantity rather than quality, I'd soon blow everyone else out of the water. Thanks for listening :-)

For any fixed-width integral type, nearly all of the possible input pairs overflow the type, anyway. What's the use of standardizing a function that doesn't give a useful result for vast majority of its possible inputs?
You pretty much need to have an big integer type in order to make the function useful, and most big integer libraries provide the function.
Edit: In a comment on the question, static_rtti writes "Most inputs cause it to overflow? The same is true for exp and double pow, I don't see anyone complaining." This is incorrect.
Let's leave aside exp, because that's beside the point (though it would actually make my case stronger), and focus on double pow(double x, double y). For what portion of (x,y) pairs does this function do something useful (i.e., not simply overflow or underflow)?
I'm actually going to focus only on a small portion of the input pairs for which pow makes sense, because that will be sufficient to prove my point: if x is positive and |y| <= 1, then pow does not overflow or underflow. This comprises nearly one-quarter of all floating-point pairs (exactly half of non-NaN floating-point numbers are positive, and just less than half of non-NaN floating-point numbers have magnitude less than 1). Obviously, there are a lot of other input pairs for which pow produces useful results, but we've ascertained that it's at least one-quarter of all inputs.
Now let's look at a fixed-width (i.e. non-bignum) integer power function. For what portion inputs does it not simply overflow? To maximize the number of meaningful input pairs, the base should be signed and the exponent unsigned. Suppose that the base and exponent are both n bits wide. We can easily get a bound on the portion of inputs that are meaningful:
If the exponent 0 or 1, then any base is meaningful.
If the exponent is 2 or greater, then no base larger than 2^(n/2) produces a meaningful result.
Thus, of the 2^(2n) input pairs, less than 2^(n+1) + 2^(3n/2) produce meaningful results. If we look at what is likely the most common usage, 32-bit integers, this means that something on the order of 1/1000th of one percent of input pairs do not simply overflow.

Because there's no way to represent all integer powers in an int anyways:
>>> print 2**-4
0.0625

That's actually an interesting question. One argument I haven't found in the discussion is the simple lack of obvious return values for the arguments. Let's count the ways the hypthetical int pow_int(int, int) function could fail.
Overflow
Result undefined pow_int(0,0)
Result can't be represented pow_int(2,-1)
The function has at least 2 failure modes. Integers can't represent these values, the behaviour of the function in these cases would need to be defined by the standard - and programmers would need to be aware of how exactly the function handles these cases.
Overall leaving the function out seems like the only sensible option. The programmer can use the floating point version with all the error reporting available instead.

Short answer:
A specialisation of pow(x, n) to where n is a natural number is often useful for time performance. But the standard library's generic pow() still works pretty (surprisingly!) well for this purpose and it is absolutely critical to include as little as possible in the standard C library so it can be made as portable and as easy to implement as possible. On the other hand, that doesn't stop it at all from being in the C++ standard library or the STL, which I'm pretty sure nobody is planning on using in some kind of embedded platform.
Now, for the long answer.
pow(x, n) can be made much faster in many cases by specialising n to a natural number. I have had to use my own implementation of this function for almost every program I write (but I write a lot of mathematical programs in C). The specialised operation can be done in O(log(n)) time, but when n is small, a simpler linear version can be faster. Here are implementations of both:
// Computes x^n, where n is a natural number.
double pown(double x, unsigned n)
{
double y = 1;
// n = 2*d + r. x^n = (x^2)^d * x^r.
unsigned d = n >> 1;
unsigned r = n & 1;
double x_2_d = d == 0? 1 : pown(x*x, d);
double x_r = r == 0? 1 : x;
return x_2_d*x_r;
}
// The linear implementation.
double pown_l(double x, unsigned n)
{
double y = 1;
for (unsigned i = 0; i < n; i++)
y *= x;
return y;
}
(I left x and the return value as doubles because the result of pow(double x, unsigned n) will fit in a double about as often as pow(double, double) will.)
(Yes, pown is recursive, but breaking the stack is absolutely impossible since the maximum stack size will roughly equal log_2(n) and n is an integer. If n is a 64-bit integer, that gives you a maximum stack size of about 64. No hardware has such extreme memory limitations, except for some dodgy PICs with hardware stacks that only go 3 to 8 function calls deep.)
As for performance, you'll be surprised by what a garden variety pow(double, double) is capable of. I tested a hundred million iterations on my 5-year-old IBM Thinkpad with x equal to the iteration number and n equal to 10. In this scenario, pown_l won. glibc pow() took 12.0 user seconds, pown took 7.4 user seconds, and pown_l took only 6.5 user seconds. So that's not too surprising. We were more or less expecting this.
Then, I let x be constant (I set it to 2.5), and I looped n from 0 to 19 a hundred million times. This time, quite unexpectedly, glibc pow won, and by a landslide! It took only 2.0 user seconds. My pown took 9.6 seconds, and pown_l took 12.2 seconds. What happened here? I did another test to find out.
I did the same thing as above only with x equal to a million. This time, pown won at 9.6s. pown_l took 12.2s and glibc pow took 16.3s. Now, it's clear! glibc pow performs better than the three when x is low, but worst when x is high. When x is high, pown_l performs best when n is low, and pown performs best when x is high.
So here are three different algorithms, each capable of performing better than the others under the right circumstances. So, ultimately, which to use most likely depends on how you're planning on using pow, but using the right version is worth it, and having all of the versions is nice. In fact, you could even automate the choice of algorithm with a function like this:
double pown_auto(double x, unsigned n, double x_expected, unsigned n_expected) {
if (x_expected < x_threshold)
return pow(x, n);
if (n_expected < n_threshold)
return pown_l(x, n);
return pown(x, n);
}
As long as x_expected and n_expected are constants decided at compile time, along with possibly some other caveats, an optimising compiler worth its salt will automatically remove the entire pown_auto function call and replace it with the appropriate choice of the three algorithms. (Now, if you are actually going to attempt to use this, you'll probably have to toy with it a little, because I didn't exactly try compiling what I'd written above. ;))
On the other hand, glibc pow does work and glibc is big enough already. The C standard is supposed to be portable, including to various embedded devices (in fact embedded developers everywhere generally agree that glibc is already too big for them), and it can't be portable if for every simple math function it needs to include every alternative algorithm that might be of use. So, that's why it isn't in the C standard.
footnote: In the time performance testing, I gave my functions relatively generous optimisation flags (-s -O2) that are likely to be comparable to, if not worse than, what was likely used to compile glibc on my system (archlinux), so the results are probably fair. For a more rigorous test, I'd have to compile glibc myself and I reeeally don't feel like doing that. I used to use Gentoo, so I remember how long it takes, even when the task is automated. The results are conclusive (or rather inconclusive) enough for me. You're of course welcome to do this yourself.
Bonus round: A specialisation of pow(x, n) to all integers is instrumental if an exact integer output is required, which does happen. Consider allocating memory for an N-dimensional array with p^N elements. Getting p^N off even by one will result in a possibly randomly occurring segfault.

One reason for C++ to not have additional overloads is to be compatible with C.
C++98 has functions like double pow(double, int), but these have been removed in C++11 with the argument that C99 didn't include them.
http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2011/n3286.html#550
Getting a slightly more accurate result also means getting a slightly different result.

The World is constantly evolving and so are the programming languages. The fourth part of the C decimal TR¹ adds some more functions to <math.h>. Two families of these functions may be of interest for this question:
The pown functions, that takes a floating point number and an intmax_t exponent.
The powr functions, that takes two floating points numbers (x and y) and compute x to the power y with the formula exp(y*log(x)).
It seems that the standard guys eventually deemed these features useful enough to be integrated in the standard library. However, the rational is that these functions are recommended by the ISO/IEC/IEEE 60559:2011 standard for binary and decimal floating point numbers. I can't say for sure what "standard" was followed at the time of C89, but the future evolutions of <math.h> will probably be heavily influenced by the future evolutions of the ISO/IEC/IEEE 60559 standard.
Note that the fourth part of the decimal TR won't be included in C2x (the next major C revision), and will probably be included later as an optional feature. There hasn't been any intent I know of to include this part of the TR in a future C++ revision.
¹ You can find some work-in-progress documentation here.

Here's a really simple O(log(n)) implementation of pow() that works for any numeric types, including integers:
template<typename T>
static constexpr inline T pown(T x, unsigned p) {
T result = 1;
while (p) {
if (p & 0x1) {
result *= x;
}
x *= x;
p >>= 1;
}
return result;
}
It's better than enigmaticPhysicist's O(log(n)) implementation because it doesn't use recursion.
It's also almost always faster than his linear implementation (as long as p > ~3) because:
it doesn't require any extra memory
it only does ~1.5x more operations per loop
it only does ~1.25x more memory updates per loop

Perhaps because the processor's ALU didn't implement such a function for integers, but there is such an FPU instruction (as Stephen points out, it's actually a pair). So it was actually faster to cast to double, call pow with doubles, then test for overflow and cast back, than to implement it using integer arithmetic.
(for one thing, logarithms reduce powers to multiplication, but logarithms of integers lose a lot of accuracy for most inputs)
Stephen is right that on modern processors this is no longer true, but the C standard when the math functions were selected (C++ just used the C functions) is now what, 20 years old?

As a matter of fact, it does.
Since C++11 there is a templated implementation of pow(int, int) --- and even more general cases, see (7) in
http://en.cppreference.com/w/cpp/numeric/math/pow
EDIT: purists may argue this is not correct, as there is actually "promoted" typing used. One way or another, one gets a correct int result, or an error, on int parameters.

A very simple reason:
5^-2 = 1/25
Everything in the STL library is based on the most accurate, robust stuff imaginable. Sure, the int would return to a zero (from 1/25) but this would be an inaccurate answer.
I agree, it's weird in some cases.

We Keep Coding

c++ django amazon-web-services regex python-2.7 google-cloud-platform list unit-testing opengl ember.js