Why isn't std::hash<T> specialized for char*?

Why doesn't the C++ standard specify that std::hash<T> is specialized for char*, const char*, unsigned char*, const unsigned char*, etc? I.e., it would hash the contents of the C string until the terminating null is found.
Any harm in injecting my own specializations into the std namespace for my own code?

Why doesn't the C++ standard specify that std::hash<T> is specialized for char*, const char*, unsigned char*, const unsigned char*, etc?
It looks like this originated in proposal N1456:
Some earlier hash table implementations gave char* special treatment: it specialized the default hash function to look at character array being pointed to, rather than the pointer itself. This proposal removes that special treatment. Special treatment makes it slightly easier to use hash tables for C string, but at the cost of removing uniformity and making it harder to write generic code. Since naive users would generally be expected to use std::basic_string instead of C strings, the cost of special treatment outweighs the benefit.
If I'm interpreting this correctly, the reasoning is that supporting C-style strings would break code that acts generically on hashes of pointers.
Any harm in injecting my own specializations into the std namespace for my own code?
There is potential harm, yes.
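In the future, anything you add to namespace std could collide with a newly standardized name.
In the present, anything you add to namespace std could be a "better match" for other components of the standard library, silently breaking behavior. On top of that, the standard only permits adding a specialization of a standard library template to namespace std when it depends on a user-defined type, so specializing std::hash<const char*> yourself is undefined behavior.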

char* (and its ilk) doesn't always mean string. It can be a simple byte array, a binary file dump, or any number of other things. If you mean a string in C++, you generally use std::string.
As for creating your own, given the above it's a bad idea. For user defined types, though, it is acceptable to create specializations of the std:: functions in the std:: namespace.

There is a standard partial specialization for pointer types:
template< class T > struct hash<T*>;
So it does cover char*, but it hashes the pointer value (the address), not the characters it points to.
If you mean a specialization for C-style strings, there's no technical problem implementing one. But since C++ already provides a specialization for std::string (and, since C++17, for std::string_view), it's not worth having one for C-style strings.
For the second part of your question: you could put your own declarations into namespace std (although for types that are not user-defined the standard makes that undefined behavior), but what would you gain? It goes against the purpose of namespaces. Keep to your own namespace.

Related

Is std::string now usable as a compile-time constant?

Many developers and library authors have been struggling with compile-time strings for quite a few years now - as the standard (library) string, std::string, requires dynamic memory allocation, and isn't constexpr.
So we have a bunch of questions and blog posts about how to get compile-time strings right:
Conveniently Declaring Compile-Time Strings in C++
Concatenate compile-time strings in a template at compile time?
C++ Compile-Time string manipulation
(off-site) Compile-time strings with constexpr
We've now learned that not only is new available in constexpr code, allowing for dynamic allocation at compile time, but, in fact, std::string will become constexpr in C++20 (C++ standard working group meeting report by Herb Sutter).
Does that mean that for C++20-and-up code we should chuck all of those nifty compile-time string implementations and just always go with std::string?
If not - when would we do so, and when would we stick to what's possible today (other than backwards-compatible code of course)?
Note: I'm not talking about strings whose contents are part of their type, i.e. not the equivalent of std::integral_constant; that's definitely not going to be std::string.
It depends on what you mean by "constexpr string".
What C++20 allows you to do is to use std::string within a function marked constexpr (or consteval). Such a function can create a string, manipulate it, and so forth just like any literal type. However, that string cannot leak out into non-constexpr code; that would be a non-transient allocation and is forbidden.
The thing is, all of the examples you give are attempts to use strings as template parameters. That's a similar-yet-different thing. You're not just talking about building a string at compile-time; you now want to use it to instantiate a template.
C++20 solves this problem by allowing user-defined class types as non-type template parameters. But the requirements on such types (so-called structural types) are much stricter than merely being literal types: all base classes and non-static data members must be public and non-mutable, and their types must in turn follow the same restrictions. Basically, the compiler needs to be able to compare two values member by member to decide whether they are the same template argument. Even a constexpr-capable std::string doesn't work that way, since its data members are private.
But std::array<char, N> can do that. And in constexpr code, you can call a constexpr function that returns a std::string and call size() on the result within the same constant expression; string::size() is a constexpr function, so that gives you a compile-time constant you can use for the N of your array.
Copying the characters into a constexpr array (since it's a constexpr value, it's immutable) is a bit more involved, but it's doable.
So C++20 solves those problems, just not (directly) with std::string.

Why is the `std::sto`... series not a template?

I wonder if there is a reason why the std::sto series (e.g. std::stoi, std::stol) is not a function template, like this:
#include <cstddef>
#include <string>

template<typename T>
T sto(std::string const & str, std::size_t *pos = nullptr, int base = 10);
and then:
template<>
int sto<int>(std::string const & str, std::size_t *pos, int base)
{
    // do the stuff, e.g. forward to the existing function
    return std::stoi(str, pos, base);
}
template<>
long sto<long>(std::string const & str, std::size_t *pos, int base)
{
    return std::stol(str, pos, base);
}
/* etc. */
In my view, that would be a better design, because at the moment, when I have to convert a string into whatever numeric type a user wants, I have to handle each case manually.
Is there a reason not to have such a template function? Was this a deliberate choice, or did it just turn out this way?
Looking at the description of these functions on cppreference, I note the following:
... Interprets a signed integer value in the string str.
1) calls std::strtol(str.c_str(), &ptr, base)...
and std::strtol is a C standard library function that's also available in C++.
Reading further, we see (for the C++ sto* functions):
Return value
The string converted to the specified signed integer type.
Exceptions
std::invalid_argument if no conversion could be performed
std::out_of_range if the converted value would fall out of the range of the result type or if the underlying function (std::strtol or std::strtoll) sets errno to ERANGE.
So while I have no original source for this, and indeed have never worked with these functions, I would guess that:
TL;DR: These functions are C++-style wrappers around already existing C/C++ functions (strtol*), so they resemble those functions as closely as possible.
I have to manage each case manually. Is there a reason not to have such a template function?
For questions like this, Eric Lippert (C#) usually says something along these lines:
If a feature is missing, then it's missing because no one implemented it yet. And that's either because no one wanted it earlier, or because it was considered not worth the effort, or because it couldn't be finished before the current release was published.
Here, I guess it's the "not worth it" part, but I have neither asked the committee about it nor managed to find any answer in old questions and FAQs. I didn't spend much time searching, though.
I say this because I suppose that most (if not all) of these functions' functionality is already provided by the stream classes, such as istringstream. Just like cin and friends, it has an operator>> overloaded for all the basic numeric types (and more).
Furthermore, stream manipulators like std::hex (std::setbase) already solve the problem of passing type-dependent configuration parameters to the actual conversion functions. There are no problems with mixed function signatures (like those mentioned by DavidHaim in his answer): there's just a single operator>>.
So, since we can already read numbers and the like from strings with a simple foo >> bar >> std::setbase(16) >> baz >> ..., I think it was considered not worth the effort to add more complicated layers on top of the old C runtime functions.
No proof for that though. Just a hunch.
The problem with template specialization is that each specialization must match the original template's signature, so every specialization has to implement the (string, pos, base) interface.
If you wanted some other type that does not follow this interface, you would be in trouble.
Suppose that, in the future, we wanted sto<std::pair<int,int>>. We would want a pos and a base for both the first and the second stringified integer, so we would like the signature to be (string, pos1, base1, pos2, base2). Since the sto signature is already fixed, we cannot do that.
You can always wrap std::sto* in your implementation of sto for integral types, but you cannot do that the other way around.
The purpose of these functions is to provide simple conversions for common cases. They are not intended as a general-purpose conversion suite; string streams (std::istringstream, for parsing) are much better for that kind of thing.
In my sense, there would be a better design, because for the moment, when I have to convert a string in whatever numerical value an user want, I have to manage manually each case.
No, it would not. The goal of templates (deliberately setting template metaprogramming aside) is not to replace overloading; you should always prefer overloading to templates. In fact, the language already does this for you: when a non-template function and a template specialization are equally good matches, the non-template function is preferred. Using language features just for the sake of it is bad.
I don't see how templates could help either. Whatever type the user decides to input, it won't be known till runtime, and template types are deduced at compile time. C++ is a statically typed language. In this case, templates will just add an unneeded layer of complexity over normal function overloading.

Uses of std::basic_string

The basic_string class was apparently designed as a general purpose container, as I cannot find any text-specific function in its specification except for the c_str() function. Just out of curiosity, have you ever used the std::basic_string container class for anything else than storing human-readable character data?
The reason I ask this is because one often has to choose between making something general or specific. The designers chose to make the std::basic_string class general, but I doubt it is ever used that way.
It was designed as a string class (hence, for example, length() and all those dozens of find functions), but after the introduction of the STL into the std lib it was outfitted to be an STL container, too (hence size() and the iterators, with <algorithm> making all the find functions redundant).
Its main purpose is to store characters, though. Using anything other than PODs isn't guaranteed to work (and doesn't work, for example, with Dinkumware's standard library). Also, the necessary std::char_traits is only required to be provided for the standard character types (char and wchar_t originally; char16_t and char32_t were added in C++11, char8_t in C++20), although many implementations come with a reasonable implementation of the primary template.
In the original standard, the class wasn't required to store its data in a contiguous piece of memory; that requirement was only added in C++11.
In short, it's mostly useful as a container of characters (a.k.a. "string"), where "character" has a fairly wide definition.
The "wildest" thing I have used it for is storing strings in different encodings by using different character types. That way, strings of different encodings are incompatible even if they use the same character size (ASCII and UTF-8), and, for example, assigning one to the other causes a compile-time error.
Yes: I've implemented a state machine over 'unsigned int', and used basic_string to store and compare the states.

operator char* in STL string class

Why doesn't the STL string class have an overloaded char* operator built-in? Is there any specific reason for them to avoid it?
If there was one, then using the string class with C functions would become much more convenient.
I would like to know your views.
Here is a quote from Josuttis's STL book:
However, there is no automatic type conversion from a string object to a C-string. This is for safety reasons to prevent unintended type conversions that result in strange behavior (type char* often has strange behavior) and ambiguities (for example, in an expression that combines a string and a C-string it would be possible to convert string into char* and vice versa). Instead, there are several ways to create or write/copy in a C-string. In particular, c_str() is provided to generate the value of a string as a C-string (as a character array that has '\0' as its last character).
You should always avoid cast operators, as they tend to introduce ambiguities into your code that can only be resolved with further casts, or, worse still, code that compiles but doesn't do what you expect. An operator char*() would have lots of problems. For example:
string s = "hello";
strcpy( s, "some more text" );
would compile without a warning, but clobber the string.
A const version would be possible, but as strings must (possibly) be copied in order to implement it, it would have an undesirable hidden cost. The explicit c_str() function means you must always state that you really intend to use a const char *.
Before C++11, the string template specification deliberately allowed for a "disconnected" representation of strings, where the entire string contents are made up of multiple chunks. Such a representation doesn't allow for easy conversion to char*.
However, the string template also provides the c_str method for precisely the purpose you want: what's wrong with using that method?
Around 1998-2002 this was a hot topic on C++ forums. The main problem is the null terminator: std::string (and std::wstring) allows '\0' as an ordinary character, but a char* C string does not.
You can use c_str instead:
string s("I like rice!");
const char* cstr = s.c_str();
I believe that in most cases you don't need the char*, and can work more conveniently with the string class itself.
If you need interop with C-style functions, a std::vector<char> (or std::vector<wchar_t>) is often easier to use.
It's not as convenient, and unfortunately you can't O(1)-swap it with a std::string (now that would be a nice thing).
In that respect, I much prefer the interface of MFC/ATL CString which has stricter performance guarantees, provides interop, and doesn't treat wide character/unicode strings as totally foreign (but ok, the latter is somewhat platform specific).