Using std::string's std::hash specialization without constructing a string object - c++

I have a codebase which makes extensive use of CUDA, which unfortunately only supports C++14 to date. However, I still want to use string_view, which is a C++17 feature. The implementation is relatively straight-forward, especially since I do not require the 'find' functionalities.
However, I do need hashing to work. The standard mandates that std::hash of a string_view must be the equal to the hash of a string constructed from the string_view (and I intend to rely on this guarantee). Is there a standard-conform way of getting the output from std::hash without having to temporarily construct a string object, which may come with an unoptimizable heap allocation (which is the route that string-view-lite went)? I would rather not rely on copying over the algorithm from the concrete stdlib implementation, since that might break in the future or already breaks compilation with older versions.
Alternatively, is there a way to let MSVC (EDIT: v14.16) use std::string_view in C++14-mode, which NVCC also recognizes? It would be great if Clang and GCC had a similar option as well, since the codebase might migrate away from MSVC one day.

You are out of luck in my opinion since, also by assuming that you could mimic the internal structure and pass a tailored object to fulfill and look like a std::string, that would surely be more fragile than just copying the implementation.
I see two choices, either:
you copy the std::hash<std::string> specialization implementation and put some asserts with hardcoded cases which may warn you if anything changes or is different (rather a clumsy solution).
you provide your own hash function which overrides std::string one and pass it as template argument so that you can enforce the constraint of them being equal when working with STL collections.

Related

Why insert_or_assign doesn't have iterator overload?

Issue
In C++17, associative containers in standard library will have insert_or_assign member function, which will do what its name suggests. Unfortunately, it seems like it doesn't have iterator based interface for bulk insert/assign. I even tried to compile small example, and from the compiler error, compiler couldn't find suitable overload, and neither candidate was reasonably close to iterator based interface.
Question
Why C++17 didn't include iterator based insert_or_assign for operations in bulk? Was there any technical reasons? Design issues?
My assumptions and ideas
I don't see any technical reason to not add iterator based bulk insertion/addition. It seems quite doable. It needs to lookup the key anyway, so I don't see any violations of "Don't pay for what you don't use".
Actually not having the overload makes standard library less consistent, and kind of going against itself. Plain insert supports it, so I'd expect insert_or_assign to support that too. I don't think that not having the overload will make it "Easier to use correctly, and harder to use incorrectly".
The only clue left is this notice from cppreference:
insert_or_assign returns more information than operator[] and does not require default-constructibility of the mapped type.
I'm not sure why this might be a limitation, as the associative container has access to all of the internals and doesn't need to deal with operator[] at all.
Binary compatibility is not applicable here, if I didn't forget anything. Modification will be in a header, and everything will need to recompile anyway.
I couldn't find any associated paper either. Splicing maps and sets doesn't seem to mention it. The feature looks like a phantom.
Standard could at least include insert_assign_iterator, so that one could write std::copy(input_first, input_last, insert_assign_iterator{map});, but standard includes neither.
insert_or_emplace is intended to be a better form of doing some_map[key] = value;. The latter requires that the mapped_type is default constructible, while insert_or_emplace does not.
What you're talking about is similar, but different. You have some range of key-value pairs, and you want to stick all of those values into the map, whether they have equivalent keys or not. This was not the problem that these functions were created to solve, as evidenced by the original proposal for them.
It's not that they're necessarily a bad idea. It simply wasn't the problem the function was added to solve.

What is the actual purpose of std::type_info::name()?

Today a colleague of mine came and asked me the question as mentioned in the title.
He's currently trying to reduce the binaries footprint of a codebase, that is also used on small targets (like Cortex M3 and alike). Apparently they have decided to compile with RTTI switched on (GCC actually), to support proper exception handling.
Well, his major complaint was why std::type_info::name() is actually needed at all for support of RTTI, and asked, if I know a way to just suppress generation of the string literals needed to support this, or at least to shorten them.
std::type_info::name
const char* name() const;
Returns an implementation defined null-terminated character string containing the name of the type. No guarantees are given, in particular, the returned string can be identical for several types and change between invocations of the same program.
A ,- however compiler specific -, implementation of e.g. the dynamic_cast<> operator would not use this information, but rather something like a hash-tag for type determination (similar for catch() blocks with exception handling).
I think the latter is clearly expressed by the current standard definitions for
std::type_info::hash_code
std::type_index
I had to agree, that I also don't really see a point of using std::type_info::name(), other than for debugging (logging) purposes. I wasn't a 100% sure that exception handling will work just without RTTI with current versions of GCC (I think they're using 4.9.1), so I hesitated to recommend simply switching off RTTI.
Also it's the case that dynamic_casts<> are used in their code base, but for these, I just recommended not to use it, in favor of static_cast (they don't really have something like plugins, or need for runtime type detection other than assertions).
Question:
Are there real life, production code level use cases for std::type_info::name() other than logging?
Sub-Questions (more concrete):
Does anyone have an idea, how to overcome (work around) the generation of these useless string literals (under assumption they'll never be used)?
Is RTTI really (still) needed to support exception handling with GCC?
(This part is well solved now by #Sehe's answer, and I have accepted it. The other sub-question still remains for the left over generated std::type_info instances for any exceptions used in the code. We're pretty sure, that these literals are never used anywhere)
Bit of related: Strip unused runtime functions which bloat executable (GCC)
Isolating this bit:
The point is, if they switch off RTTI, will exception handling still work properly in GCC? — πάντα ῥεῖ 1 hour ago
The answer is yes:
-fno-rtti
Disable generation of information about every class with virtual functions for use by the C++ runtime type identification features (dynamic_cast and typeid). If you don't use those parts of the language, you can save some space by using this flag. Note that exception handling uses the same information, but it will generate it as needed. The dynamic_cast operator can still be used for casts that do not require runtime type information, i.e. casts to void * or to unambiguous base classes.
Are there real life, production code level use cases for std::type_info::name() other than logging?
The Itanium ABI describes how operator== for std::type_info objects can be easily implemented in terms of testing strings returned from std::type_info::name() for pointer equality.
In a non-flat address space, where it might be possible to have multiple type_info objects for the same type (e.g. because a dynamic library has been loaded with RTLD_LOCAL) the implementation of operator== might need to use strcmp to determine if two types are the same.
So the name() function is used to determine if two type_info objects refer to the same type. For examples of real use cases, that's typically used in at least two places in the standard library, in std::function<F>::target<T>() and std::get_deleter<D>(const std::shared_ptr<T>&).
If you're not using RTTI then all that's irrelevant, as you won't have any type_info objects anyway (and consequently in libstdc++ the function::target and get_deleter functions can't be used).
I think GCC's exception-handling code uses the addresses of type_info objects themselves, not the addresses of the strings returned by name(), so if you use exceptions but no RTTI the name() strings aren't needed.

Specializing std::optional

Will it be possible to specialize std::optional for user-defined types? If not, is it too late to propose this to the standard?
My use case for this is an integer-like class that represents a value within a range. For instance, you could have an integer that lies somewhere in the range [0, 10]. Many of my applications are sensitive to even a single byte of overhead, so I would be unable to use a non-specialized std::optional due to the extra bool. However, a specialization for std::optional would be trivial for an integer that has a range smaller than its underlying type. We could simply store the value 11 in my example. This should provide no space or time overhead over a non-optional value.
Am I allowed to create this specialization in namespace std?
The general rule in 17.6.4.2.1 [namespace.std]/1 applies:
A program may add a template specialization for any standard library template to namespace std only if the declaration depends on a user-defined type and the specialization meets the standard library requirements for the original template and is not explicitly
prohibited.
So I would say it's allowed.
N.B. optional will not be part of the C++14 standard, it will be included in a separate Technical Specification on library fundamentals, so there is time to change the rule if my interpretation is wrong.
If you are after a library that efficiently packs the value and the "no-value" flag into one memory location, I recommend looking at compact_optional. It does exactly this.
It does not specialize boost::optional or std::experimental::optional but it can wrap them inside, giving you a uniform interface, with optimizations where possible and a fallback to 'classical' optional where needed.
I've asked about the same thing, regarding specializing optional<bool> and optional<tribool> among other examples, to only use one byte. While the "legality" of doing such things was not under discussion, I do think that one should not, in theory, be allowed to specialize optional<T> in contrast to eg.: hash (which is explicitly allowed).
I don't have the logs with me but part of the rationale is that the interface treats access to the data as access to a pointer or reference, meaning that if you use a different data structure in the internals, some of the invariants of access might change; not to mention providing the interface with access to the data might require something like reinterpret_cast<(some_reference_type)>. Using a uint8_t to store a optional-bool, for example, would impose several extra requirements on the interface of optional<bool> that are different to the ones of optional<T>. What should the return type of operator* be, for example?
Basically, I'm guessing the idea is to avoid the whole vector<bool> fiasco again.
In your example, it might not be too bad, as the access type is still your_integer_type& (or pointer). But in that case, simply designing your integer type to allow for a "zombie" or "undetermined" value instead of relying on optional<> to do the job for you, with its extra overhead and requirements, might be the safest choice.
Make it easy to opt-in to space savings
I have decided that this is a useful thing to do, but a full specialization is a little more work than necessary (for instance, getting operator= correct).
I have posted on the Boost mailing list a way to simplify the task of specializing, especially when you only want to specialize some instantiations of a class template.
http://boost.2283326.n4.nabble.com/optional-Specializing-optional-to-save-space-td4680362.html
My current interface involves a special tag type used to 'unlock' access to particular functions. I have creatively named this type optional_tag. Only optional can construct an optional_tag. For a type to opt-in to a space-efficient representation, it needs the following member functions:
T(optional_tag) constructs an uninitialized value
initialize(optional_tag, Args && ...) constructs an object when there may be one in existence already
uninitialize(optional_tag) destroys the contained object
is_initialized(optional_tag) checks whether the object is currently in an initialized state
By always requiring the optional_tag parameter, we do not limit any function signatures. This is why, for instance, we cannot use operator bool() as the test, because the type may want that operator for other reasons.
An advantage of this over some other possible methods of implementing it is that you can make it work with any type that can naturally support such a state. It does not add any requirements such as having a move constructor.
You can see a full code implementation of the idea at
https://bitbucket.org/davidstone/bounded_integer/src/8c5e7567f0d8b3a04cc98142060a020b58b2a00f/bounded_integer/detail/optional/optional.hpp?at=default&fileviewer=file-view-default
and for a class using the specialization:
https://bitbucket.org/davidstone/bounded_integer/src/8c5e7567f0d8b3a04cc98142060a020b58b2a00f/bounded_integer/detail/class.hpp?at=default&fileviewer=file-view-default
(lines 220 through 242)
An alternative approach
This is in contrast to my previous implementation, which required users to specialize a class template. You can see the old version here:
https://bitbucket.org/davidstone/bounded_integer/src/2defec41add2079ba023c2c6d118ed8a274423c8/bounded_integer/detail/optional/optional.hpp
and
https://bitbucket.org/davidstone/bounded_integer/src/2defec41add2079ba023c2c6d118ed8a274423c8/bounded_integer/detail/optional/specialization.hpp
The problem with this approach is that it is simply more work for the user. Rather than adding four member functions, the user must go into a new namespace and specialize a template.
In practice, all specializations would have an in_place_t constructor that forwards all arguments to the underlying type. The optional_tag approach, on the other hand, can just use the underlying type's constructors directly.
In the specialize optional_storage approach, the user also has the responsibility of adding proper reference-qualified overloads of a value function. In the optional_tag approach, we already have the value so we do not have to pull it out.
optional_storage also required standardizing as part of the interface of optional two helper classes, only one of which the user is supposed to specialize (and sometimes delegate their specialization to the other).
The difference between this and compact_optional
compact_optional is a way of saying "Treat this special sentinel value as the type being not present, almost like a NaN". It requires the user to know that the type they are working with has some special sentinel. An easily specializable optional is a way of saying "My type does not need extra space to store the not present state, but that state is not a normal value." It does not require anyone to know about the optimization to take advantage of it; everyone who uses the type gets it for free.
The future
My goal is to get this first into boost::optional, and then part of the std::optional proposal. Until then, you can always use bounded::optional, although it has a few other (intentional) interface differences.
I don't see how allowing or not allowing some particular bit pattern to represent the unengaged state falls under anything the standard covers.
If you were trying to convince a library vendor to do this, it would require an implementation, exhaustive tests to show you haven't inadvertently blown any of the requirements of optional (or accidentally invoked undefined behavior) and extensive benchmarking to show this makes a notable difference in real world (and not just contrived) situations.
Of course, you can do whatever you want to your own code.

How is is_standard_layout useful?

From what I understand, standard layout allows three things:
Empty base class optimization
Backwards compatibility with C with certain pointer casts
Use of offsetof
Now, included in the library is the is_standard_layout predicate metafunction, but I can't see much use for it in generic code as those C features I listed above seem extremely rare to need checking in generic code. The only thing I can think of is using it inside static_assert, but that is only to make code more robust and isn't required.
How is is_standard_layout useful? Are there any things which would be impossible without it, thus requiring it in the standard library?
General response
It is a way of validating assumptions. You wouldn't want to write code that assumes standard layout if that wasn't the case.
C++11 provides a bunch of utilities like this. They are particularly valuable for writing generic code (templates) where you would otherwise have to trust the client code to not make any mistakes.
Notes specific to is_standard_layout
It looks to me like the (pseudo code) definition of is_pod would roughly be...
// note: applied recursively to all members
bool is_pod(T) { return is_standard_layout(T) && is_trivial(T); }
So, you need to know is_standard_layout in order to implement is_pod. Given that, we might as well expose is_standard_layout as a tool available to library developers. Also of note: if you have a use-case for is_pod, you might want to consider the possibility that is_standard_layout might actually be a better (more accurate) choice in that case, since POD is essentially a subset of standard layout.
I get the feeling that they added every conceivable variant of type evaluation, regardless of any obvious value, just in case someone might encounter a need sometime before the next standard comes out. I doubt if piling on these "extra" type properties adds a significant additional burden to compiler developers.
There is a nice discussion of standard layout here: Why is C++11's POD "standard layout" definition the way it is?
There is also a lot of good detail at cppreference.com: Non-static data members

Creating serializeable unique compile-time identifiers for arbitrary UDT's

I would like a generic way to create unique compile-time identifiers for any C++ user defined types.
for example:
unique_id<my_type>::value == 0 // true
unique_id<other_type>::value == 1 // true
I've managed to implement something like this using preprocessor meta programming, the problem is, serialization is not consistent. For instance if the class template unique_id is instantiated with other_type first, then any serialization in previous revisions of my program will be invalidated.
I've searched for solutions to this problem, and found several ways to implement this with non-consistent serialization if the unique values are compile-time constants. If RTTI or similar methods, like boost::sp_typeinfo are used, then the unique values are obviously not compile-time constants and extra overhead is present. An ad-hoc solution to this problem would be, instantiating all of the unique_id's in a separate header in the correct order, but this causes additional maintenance and boilerplate code, which is not different than using an enum unique_id{my_type, other_type};.
A good solution to this problem would be using user-defined literals, unfortunately, as far as I know, no compiler supports them at this moment. The syntax would be 'my_type'_id; 'other_type'_id; with udl's.
I'm hoping somebody knows a trick that allows implementing serialize-able unique identifiers in C++ with the current standard (C++03/C++0x), I would be happy if it works with the latest stable MSVC and GNU-G++ compilers, although I expect if there is a solution, it's not portable.
I would like to make clear, that using mpl::set or similar constructs like mpl::vector and filtering, does not solve this problem, because the scope of the meta-set/vector is limited and actually causes more problems than just preprocessor meta programming.
A while back I added a build step to one project of mine, which allowed me to write #script_name(args) in a C++ source file and have it automatically replaced with the output of the associated script, for instance ./script_name.pl args or ./script_name.py args.
You may balk at the idea of polluting the language into nonstandard C++, but all you'd have to do is write #sha1(my_type) to get the unique integer hash of the class name, regardless of build order and without the need for explicit instantiation.
This is just one of many possible nonstandard solutions, and I think a fairly clean one at that. There's currently no great way to impose an arbitrary, consistent ordering on your classes without just specifying it explicitly, so I recommend you simply give in and go the explicit instantiation route; there's nothing really wrong with centralising the information, but as you said it's not all that different from an enumeration, which is what I'd actually use in this situation.
Persistence of data is a very interesting problem.
My first question would be: do you really want serialization ? If you are willing to investigate an alternative, then jump to the next section.
If you're still there, I think you have not given the typeid solution all its due.
// static detection
template <typename T>
size_t unique_id()
{
static size_t const id = some_hash(typeid(T)); // or boost::sp_typeinfo
return id;
}
// dynamic detection
template <typename T>
size_t unique_id(T const& t)
{
return some_hash(typeid(t)); // no memoization possible
}
Note: I am using a local static to avoid the order of initialization issue, in case this value is required before main is entered
It's pretty similar to your unique_id<some_type>::value, and even though it's computed at runtime, it's only computed once, and the result (for the static detection) is then memoized for future calls.
Also note that it's fully generic: no need to explicitly write the function for each type.
It may seem silly, but the issue of serialization is that you have a one-to-one mapping between the type and its representation:
you need to version the representation, so as to be able to decode "older" versions
dealing with forward compatibility is pretty hard
dealing with cyclic reference is pretty hard (some framework handle it)
and then there is the issue of moving information from one to another --> deserializing older versions becomes messy and frustrating
For persistent saves, I usually recommend using a dedicated BOM. Think of the saved data as a message to your future self. And I usually go the extra mile and proposes the awesome Google Proto Buffer library:
Backward and Forward compatibility baked-in
Several format outputs -> human readable (for debug) or binary
Several languages can read/write the same messages (C++, Java, Python)
Pretty sure that you will have to implement your own extension to make this happen, I've not seen nor heard of any such construct for compile-time. MSVC offers __COUNTER__ for the preprocessor but I know of no template equivalent.