C++11 - typeid uniqueness - c++

In C++11, I am using this
typeid(T).name()
for my own hash computation. I don't need the result to be the same between program runs or compilations; I just need it to be unique per type.
I know that it can return the same name for different types, but that usually involves const, pointers, etc.
In my case, T is only class XY, struct XX, or types derived from them.
In this case, can I assume that the name for T will be unique?

You should use std::type_index for mapping purposes.
The type_index class is a wrapper class around a std::type_info
object, that can be used as index in associative and unordered
associative containers. The relationship with type_info object is
maintained through a pointer, therefore type_index is
CopyConstructible and CopyAssignable.

std::type_info::name is implementation-defined, so you shouldn't rely on it being unique for different types.
Since you're doing this for hash computation, you should use std::type_info::hash_code instead. Although this doesn't guarantee that the values will be unique, the standard says that implementations should try and return different values for different types. So long as your hash map implementation has reasonable collision handling, this should be sufficient for you.
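A minimal sketch of the std::type_index approach, assuming the goal is a hash map keyed by type. The types XX, XY and Derived and the helpers register_type and find_type are hypothetical names chosen for illustration; std::type_index has a std::hash specialization, so it can be used directly as an unordered_map key, and the map's own collision handling takes care of equal hash codes.

```cpp
#include <cassert>
#include <string>
#include <typeindex>
#include <typeinfo>
#include <unordered_map>

// Hypothetical example types standing in for the asker's classes.
struct XX {};
class XY {};
struct Derived : XX {};

// Hypothetical registry: maps a type to a label chosen by the caller.
using TypeRegistry = std::unordered_map<std::type_index, std::string>;

template <typename T>
void register_type(TypeRegistry& reg, const std::string& label) {
    reg[std::type_index(typeid(T))] = label;
}

// Returns a pointer to the stored label, or nullptr if T was never registered.
template <typename T>
const std::string* find_type(const TypeRegistry& reg) {
    auto it = reg.find(std::type_index(typeid(T)));
    return it == reg.end() ? nullptr : &it->second;
}

void demo_registry() {
    TypeRegistry reg;
    register_type<XX>(reg, "XX");
    register_type<XY>(reg, "XY");
    register_type<Derived>(reg, "Derived");
    assert(reg.size() == 3);                 // three distinct keys
    assert(*find_type<Derived>(reg) == "Derived");
    assert(find_type<int>(reg) == nullptr);  // never registered
}
```

Lookup correctness here rests on type_index equality rather than on hash_code() being unique, which is why a collision between two types would cost performance but not correctness.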

As stated on cppreference:
Returns an implementation defined null-terminated character string
containing the name of the type. No guarantees are given, in
particular, the returned string can be identical for several types and
change between invocations of the same program.
So, no, you can't. You can't assume anything actually.
Although, hash_code() gives you:
size_t hash_code() const noexcept;
7 Returns: An unspecified value, except that within a single execution
of the program, it shall return the same value for any two type_info
objects which compare equal.
8 Remark: an implementation should return different values for two
type_info objects which do not compare equal.
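In other words, hash_code() is only required to agree for type_info objects that compare equal under operator==. Different values for different types are recommended ("should") but not required, so two different hash codes imply two different types, while two equal hash codes do not prove that the types are the same.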

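What you might be able to do is take the address of a static member. Each instantiation of the class template gets its own static object, so the address differs per type (within a single program image).
#include <cstdint>
class HashBase {
public:
    virtual ~HashBase() = default;
    virtual std::intptr_t get() const = 0;
};
template <typename T>
class Hash : public HashBase {
    static const int _addr;  // one object per instantiation of Hash<T>
public:
    std::intptr_t get() const override { return reinterpret_cast<std::intptr_t>(&_addr); }
};
template <typename T>
const int Hash<T>::_addr = 0;  // out-of-class definition, needed because the address is taken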
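A lighter variant of the same address-based idea can be sketched with a function template instead of a class hierarchy. This is only an illustration, under the assumption that every instantiation lives in one program image (with shared libraries, each library may end up with its own copy of the static). The names type_id, XX, XY and Derived are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: returns an integer that is distinct for each T
// by taking the address of a static that exists once per instantiation.
template <typename T>
std::uintptr_t type_id() {
    static char tag;  // one object per instantiation of type_id<T>
    return reinterpret_cast<std::uintptr_t>(&tag);
}

// Hypothetical example types.
struct XX {};
class XY {};
struct Derived : XX {};

void demo_type_id() {
    assert(type_id<XX>() == type_id<XX>());       // stable for the same type
    assert(type_id<XX>() != type_id<XY>());       // distinct objects, distinct addresses
    assert(type_id<XX>() != type_id<Derived>());  // derived types get their own id
}
```

Unlike typeid, this treats cv-qualified and reference variants of a type as separate instantiations, so type_id<const XX>() is a different value from type_id<XX>().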

Related

Iterate over C++ Vector of Booleans using Boolean References [duplicate]

Item 18 of Scott Meyers's book Effective STL: 50 Specific Ways to Improve Your Use of the Standard Template Library says to avoid vector <bool> as it's not an STL container and it doesn't really hold bools.
The following code:
vector <bool> v;
bool *pb =&v[0];
will not compile, violating a requirement of STL containers.
Error:
cannot convert 'std::vector<bool>::reference* {aka std::_Bit_reference*}' to 'bool*' in initialization
vector<T>::operator [] return type is supposed to be T&, but why is it a special case for vector<bool>?
What does vector<bool> really consist of?
The Item further says:
deque<bool> v; // is a STL container and it really contains bools
Can this be used as an alternative to vector<bool>?
Can anyone please explain this?
For space-optimization reasons, the C++ standard (as far back as C++98) explicitly calls out vector<bool> as a special standard container where each bool uses only one bit of space rather than one byte as a normal bool would (implementing a kind of "dynamic bitset"). In exchange for this optimization it doesn't offer all the capabilities and interface of a normal standard container.
In this case, since you can't take the address of a bit within a byte, things such as operator[] can't return a bool& but instead return a proxy object that allows you to manipulate the particular bit in question. Since this proxy object is not a bool&, you can't assign its address to a bool* like you could with the result of such an operator call on a "normal" container. In turn this means that bool *pb =&v[0]; isn't valid code.
On the other hand, deque doesn't have any such specialization called out, so each bool takes a byte and you can take the address of the value returned from operator[].
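The difference described above can be checked at compile time. The following is a small sketch; the function names flip_first_via_pointer and flip_first_via_proxy are hypothetical and exist only to contrast the two access styles.

```cpp
#include <cassert>
#include <deque>
#include <type_traits>
#include <utility>
#include <vector>

// operator[] on a non-const vector<bool> yields the proxy type, not bool&.
static_assert(std::is_same<decltype(std::declval<std::vector<bool>&>()[0]),
                           std::vector<bool>::reference>::value,
              "vector<bool>::operator[] returns a proxy object");
static_assert(!std::is_same<std::vector<bool>::reference, bool&>::value,
              "the proxy type is not bool&");

// deque<bool> has no such specialization, so operator[] yields a real bool&.
static_assert(std::is_same<decltype(std::declval<std::deque<bool>&>()[0]),
                           bool&>::value,
              "deque<bool>::operator[] returns bool&");

// With deque<bool>, a plain bool* works because a real bool is stored.
bool flip_first_via_pointer(std::deque<bool>& d) {
    bool* p = &d[0];
    *p = !*p;
    return d[0];
}

// With vector<bool>, the element has to be modified through the proxy.
bool flip_first_via_proxy(std::vector<bool>& v) {
    std::vector<bool>::reference r = v[0];
    r = !r;  // converts to bool, negates, assigns back through the proxy
    return v[0];
}
```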
Finally note that the MS standard library implementation is (arguably) suboptimal in that it uses a small chunk size for deques, which means that using deque as a substitute isn't always the right answer.
The problem is that vector<bool> returns a proxy reference object instead of a true reference, so that C++98 style code bool * p = &v[0]; won't compile. However, modern C++11 with auto p = &v[0]; can be made to compile if operator& also returns a proxy pointer object. Howard Hinnant has written a blog post detailing the algorithmic improvements when using such proxy references and pointers.
Scott Meyers has a long Item 30 in More Effective C++ about proxy classes. You can go a long way towards mimicking the builtin types: for any given type T, a pair of proxies (e.g. reference_proxy<T> and iterator_proxy<T>) can be made mutually consistent in the sense that reference_proxy<T>::operator&() and iterator_proxy<T>::operator*() are each other's inverse.
However, at some point one needs to map the proxy objects back to behave like T* or T&. For iterator proxies, one can overload operator->() and access the template T's interface without reimplementing all the functionality. However, for reference proxies, you would need to overload operator.(), and that is not allowed in current C++ (although Sebastian Redl presented such a proposal on BoostCon 2013). You can make a verbose work-around like a .get() member inside the reference proxy, or implement all of T's interface inside the reference (this is what is done for vector<bool>::bit_reference), but this will either lose the builtin syntax or introduce user-defined conversions that do not have builtin semantics for type conversions (you can have at most one user-defined conversion per argument).
TL;DR: no, vector<bool> is not a container, because the Standard requires a real reference, but it can be made to behave almost like a container, at least much closer with C++11 (auto) than in C++98.
vector<bool> contains boolean values in compressed form, using only one bit per value (rather than the 8 bits per element that a bool[] array typically uses). It is not possible to return a reference to a bit in C++, so there is a special helper type, "bit reference", which provides you with an interface to a single bit in memory and allows you to use standard operators and casts.
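To make the "bit reference" idea concrete, here is a minimal sketch of a packed bit container with a proxy class. TinyBits is a hypothetical type written for this illustration; it is not how std::vector<bool> or any particular library is implemented, only an instance of the same technique.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical minimal packed bit container (8 values per byte).
class TinyBits {
    std::vector<std::uint8_t> bytes_;
    std::size_t size_;

public:
    // Proxy that remembers which byte and which bit it stands for.
    class reference {
        std::uint8_t* byte_;
        std::uint8_t mask_;
        friend class TinyBits;
        reference(std::uint8_t* byte, std::uint8_t mask) : byte_(byte), mask_(mask) {}

    public:
        reference(const reference&) = default;
        // Writing a bool sets or clears the bit in the underlying byte.
        reference& operator=(bool b) {
            if (b) *byte_ |= mask_;
            else   *byte_ &= static_cast<std::uint8_t>(~mask_);
            return *this;
        }
        // Assigning from another proxy copies the bit value, not the location.
        reference& operator=(const reference& other) {
            return *this = static_cast<bool>(other);
        }
        // Reading converts the bit to a plain bool.
        operator bool() const { return (*byte_ & mask_) != 0; }
    };

    explicit TinyBits(std::size_t n) : bytes_((n + 7) / 8, 0), size_(n) {}
    std::size_t size() const { return size_; }

    reference operator[](std::size_t i) {
        return reference(&bytes_[i / 8], static_cast<std::uint8_t>(1u << (i % 8)));
    }
    bool operator[](std::size_t i) const {
        return ((bytes_[i / 8] >> (i % 8)) & 1u) != 0;
    }
};
```

Because operator[] returns the proxy by value, an expression such as b[5] = true works, while bool* p = &b[5]; cannot, which is the same limitation discussed for vector<bool>.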
Many consider the vector<bool> specialization to be a mistake.
The paper "Deprecating Vestigial Library Parts in C++17" contains a proposal to
Reconsider vector<bool> Partial Specialization.
There has been a long history of the bool partial specialization of
std::vector not satisfying the container requirements, and in
particular, its iterators not satisfying the requirements of a random
access iterator. A previous attempt to deprecate this container was
rejected for C++11, N2204.
One of the reasons for rejection is that it is not clear what it would
mean to deprecate a particular specialization of a template. That
could be addressed with careful wording. The larger issue is that the
(packed) specialization of vector<bool> offers an important
optimization that clients of the standard library genuinely seek, but
would no longer be available. It is unlikely that we would be able to
deprecate this part of the standard until a replacement facility is
proposed and accepted, such as N2050. Unfortunately, there are no such
revised proposals currently being offered to the Library Evolution
Working Group.
Look at how it is implemented. The STL is built largely on templates, so the headers contain the actual implementation code.
For instance, look at the libstdc++ implementation.
Also interesting, even though it is not an STL-conforming bit vector, is llvm::BitVector.
The essence of llvm::BitVector is a nested class called reference plus suitable operator overloading, which makes BitVector behave similarly to vector, with some limitations. The code below is a simplified interface showing how BitVector uses the nested reference class to make the real implementation behave almost like a real array of bool without using 1 byte for each value.
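class BitVector {
public:
    // Proxy standing in for a single bit.
    class reference {
    public:
        reference& operator=(reference t);
        reference& operator=(bool t);
        operator bool() const;
    };
    BitVector(unsigned s, bool t = false);  // s bits, all initialized to t
    reference operator[](unsigned Idx);     // non-const access returns the proxy
    bool operator[](unsigned Idx) const;    // const access returns a plain bool
};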
This code has some nice properties:
BitVector b(10, false); // size 10, default false
BitVector::reference x = b[5]; // that's what really happens: operator[] returns a proxy by value
bool y = b[5]; // implicitly converted to bool
assert(b[5] == false); // converted to bool
assert(b[6] == b[7]); // both proxies are converted to bool, then compared
b[5] = true; // assignment through the proxy
assert(b[5] == true); // and it actually works
However, this code has a flaw. Try to run:
std::for_each(&b[5], &b[6], some_func); // address of a reference proxy, not an iterator
This will not work, because assert( (&b[5] - &b[3]) == (5 - 3) ); will fail (within llvm::BitVector).
This is the very simple llvm version. std::vector<bool> also has working iterators,
so the loop for(auto i = b.begin(), e = b.end(); i != e; ++i) will work, and so will std::vector<bool>::const_iterator.
However, there are still limitations in std::vector<bool> that make it behave differently in some cases.
This comes from http://www.cplusplus.com/reference/vector/vector-bool/
Vector of bool This is a specialized version of vector, which is used
for elements of type bool and optimizes for space.
It behaves like the unspecialized version of vector, with the
following changes:
The storage is not necessarily an array of bool values, but the library implementation may optimize storage so that each value is
stored in a single bit.
Elements are not constructed using the allocator object, but their value is directly set on the proper bit in the internal storage.
Member function flip and a new signature for member swap.
A special member type, reference, a class that accesses individual bits in the container's internal storage with an interface that
emulates a bool reference. Conversely, member type const_reference is
a plain bool.
The pointer and iterator types used by the container are not necessarily neither pointers nor conforming iterators, although they
shall simulate most of their expected behavior.
These changes provide a quirky interface to this specialization and
favor memory optimization over processing (which may or may not suit
your needs). In any case, it is not possible to instantiate the
unspecialized template of vector for bool directly. Workarounds to
avoid this range from using a different type (char, unsigned char) or
container (like deque) to use wrapper types or further specialize for
specific allocator types.
bitset is a class that provides a similar functionality for fixed-size
arrays of bits.

Comparing two type_info from typeid() operator

Is it OK to compare the results from two typeid() results? cppreference has this note about this operator:
There is no guarantee that the same std::type_info instance will be
referred to by all evaluations of the typeid expression on the same
type, although std::type_info::hash_code of those type_info objects
would be identical, as would be their std::type_index.
const std::type_info& ti1 = typeid(A);
const std::type_info& ti2 = typeid(A);
assert(&ti1 == &ti2); // not guaranteed
assert(ti1.hash_code() == ti2.hash_code()); // guaranteed
assert(std::type_index(ti1) == std::type_index(ti2)); // guaranteed
My understanding is that the result is a reference to a static lvalue of type type_info. The note says that &ti1 == &ti2 is not guaranteed to hold for the same type, and recommends using the hash code or the std::type_index class instead. However, it doesn't mention whether comparing the type_info objects directly:
ti1 == ti2;
is guaranteed to be true. I've used this before, does the documentation implicitly mean this is guaranteed?
std::type_info is a class-type, which means that the ti1 == ti2 expression will trigger an overloaded operator==. Its behavior is described by [type.info]/p2:
bool operator==(const type_info& rhs) const noexcept;
Effects: Compares the current object with rhs.
Returns: true if the two values describe the same type.
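A short sketch of what that wording means in practice. The types A and B and the helper same_type are hypothetical names used only for illustration; the comparison itself is the standard type_info::operator==.

```cpp
#include <cassert>
#include <typeinfo>

// Hypothetical example types.
struct A {};
struct B {};

// Compares the type_info objects by value, not by address.
template <typename T, typename U>
bool same_type() {
    return typeid(T) == typeid(U);
}

void demo_same_type() {
    const std::type_info& ti1 = typeid(A);
    const std::type_info& ti2 = typeid(A);
    assert(ti1 == ti2);              // same type, so operator== returns true
    assert(!(ti1 == typeid(B)));     // different types compare unequal
    assert((same_type<const A, A>()));  // typeid ignores top-level cv-qualifiers
    assert((same_type<A&, A>()));       // typeid of a reference refers to the referenced type
}
```

Note that none of these checks compare addresses, so they remain valid even if several type_info objects exist for the same type.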
Some information on the implementation could be of interest: for g++/clang, the type_info starts with two pointers. The second one points to a fixed character string, which is the value returned by name().
** Note that this implementation is not required by the standard, and may vary across different targets for the same compiler.
Comparison is done by first checking if the type_info are at the same address; if so, they are equal; if not, next call strcmp() on the two 'name' strings. And the strcmp result determines the ordering for .before() method (and by extension, the ordering for type_index).
Usually, there is only one type_info in the program for any given type. But, when using shared libraries, it's possible to end up with one in a shared library, and another somewhere else. So, comparing the address is not sufficient to test whether two type_info represent the same type, nor can the address be used for ordering.
If two type_info exist for the same type, their name() will return equivalent character strings, but those strings will be at different addresses, because the string constant and the type_info are generated together.
The .hash_code() method is disappointing: it calls a function to hash the name() string, character by character. The g++ version calls strlen to find its length, and then calls the same function used for std::hash<std::string>. This happens even if the type is statically known, as in e.g. typeid(std::complex<float>).hash_code(), where the compiler could, in principle, compute the result at compile time.
In my x86_64 clang++-9.0 installation, I'm seeing an odd result - hash_code() returns the same thing as name(), but cast to a size_t. This will often work, but will fail in cases where two type_info for the same type exist in the program. Also, it's not a very rich hash, consider the range of values which occur within a 64-bit address space. It's possible that my installation is somehow getting the wrong header files and this is the result, but it seems to work OK otherwise. Maybe this is an actual defect and nobody uses hash_code() because it's so slow...
I tried another clang-9 for a RISC processor, and it was similar to g++ for hash_code(), but didn't need to call strlen.

Is this use of reinterpret_cast on differently-qualified struct members safe?

I have looked at the following — related — questions, and none of them seem to address my exact issue: one, two, three.
I am writing a collection of which the elements (key-value pairs) are stored along with some bookkeeping information:
struct Element {
Key key;
Value value;
int flags;
};
std::vector<Element> elements;
(For simplicity, suppose that both Key and Value are standard-layout types. The collection won't be used with any other types anyway.)
In order to support iterator-based access, I've written iterators that override operator-> and operator* to return to the user a pointer and a reference, respectively, to the key-value pair. However, due to the nature of the collection, the user is never allowed to change the returned key. For this reason, I've declared a KeyValuePair structure:
struct KeyValuePair {
const Key key;
Value value;
};
And I've implemented operator-> on the iterator like this:
struct iterator {
size_t index;
KeyValuePair *operator->() {
return reinterpret_cast<KeyValuePair *>(&elements[index]);
}
};
My question is: is this use of reinterpret_cast well-defined, or does it invoke undefined behavior? I have tried to interpret relevant parts of the standard and examined answers to questions about similar issues, however, I failed to draw a definitive conclusion from them, because…:
the two struct types share some initial data members (namely, key and value) that only differ in const-qualification;
the standard does not explicitly say that T and cv T are layout-compatible, but it doesn't state the converse either; furthermore, it mandates that they should have the same representation and alignment requirements;
Two standard-layout class types share a common initial sequence if the first however many members have layout-compatible types;
for union types containing members of class type that share a common initial sequence, it is permitted to examine the members of such initial sequence using either of the union members (9.2p18).
– there's no similar explicit guarantee made about reinterpret_casted pointers-to-structs sharing a common initial sequence.
– it is, however, guaranteed that a pointer-to-struct points to its initial member (9.2p19).
Using merely this information, I found it impossible to deduce whether the Element and KeyValuePair structs share a common initial sequence, or have anything else in common that would justify my reinterpret_cast.
As an aside, if you think using reinterpret_cast for this purpose is inappropriate, and I'm really facing an XY problem and therefore I should simply do something else to achieve my goal, let me know.
My question is: is this use of reinterpret_cast well-defined, or does
it invoke undefined behavior?
reinterpret_cast is the wrong approach here, you're simply violating strict aliasing. It is somewhat perplexing that reinterpret_cast and union diverge here, but the wording is very clear about this scenario.
You might be better off simply defining a union thusly:
union elem_t {
Element e{}; KeyValuePair p;
/* special member functions defined if necessary */
};
… and using that as your vector element type. Note that cv-qualification is ignored when determining layout-compatibility - [basic.types]/11:
Two types cv1 T1 and cv2 T2 are layout-compatible types if
T1 and T2 are the same type, […]
Hence Element and KeyValuePair do indeed share a common initial sequence, and accessing the corresponding members of p, provided e is alive, is well-defined.
Another approach: Define
struct KeyValuePair {
Key key;
mutable Value value;
};
struct Element : KeyValuePair {
int flags;
};
Now provide an iterator that simply wraps a const_iterator from the vector and upcasts the references/pointers to be exposed. key won't be modifiable, but value will be.
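A rough sketch of that second approach, to show how the pieces could fit together. Key and Value are replaced by hypothetical stand-in types, and the Iterator class and make_element helper are illustrative names that do not appear in the question.

```cpp
#include <cassert>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-ins for the asker's Key and Value types.
using Key = int;
using Value = std::string;

struct KeyValuePair {
    Key key;
    mutable Value value;  // stays writable through a const access path
};

struct Element : KeyValuePair {
    int flags;
};

// Hypothetical helper, since Element is not an aggregate in C++11/14.
Element make_element(Key k, Value v, int flags) {
    Element e;
    e.key = k;
    e.value = std::move(v);
    e.flags = flags;
    return e;
}

// Wraps a const_iterator and exposes only the KeyValuePair base.
class Iterator {
    std::vector<Element>::const_iterator it_;

public:
    explicit Iterator(std::vector<Element>::const_iterator it) : it_(it) {}
    const KeyValuePair& operator*() const { return *it_; }    // implicit upcast
    const KeyValuePair* operator->() const { return &*it_; }  // implicit upcast
    Iterator& operator++() { ++it_; return *this; }
    bool operator!=(const Iterator& other) const { return it_ != other.it_; }
};

void demo_iteration() {
    std::vector<Element> elements;
    elements.push_back(make_element(1, "one", 0));
    Iterator it(elements.cbegin());
    it->value = "uno";   // allowed: value is mutable
    // it->key = 2;      // would not compile: key is const through this path
    assert(elements[0].value == "uno");
    assert(it->key == 1);
}
```

No reinterpret_cast is involved: the conversion from const Element to const KeyValuePair is an ordinary derived-to-base conversion, and flags stays hidden from the user of the iterator.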

What is `type_info::before` useful for?

According to cplusplus.com, the std::type_info::before() function...
Returns true if the type precedes the type of rhs in the collation order.
The collation order is just an internal order kept by a particular implementation and is not necessarily related to inheritance relations or declaring order.
So what is it useful for?
Consider you want to put your type_info objects as keys into a map<type_info*, value>. The type_info doesn't have an operator < defined, so you must provide your own comparator. The only thing that is guaranteed to work from the type_info interface is the before() function, since neither the addresses of type_info nor the name() must be unique:
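#include <map>
#include <string>
#include <typeinfo>

// Orders type_info pointers by the implementation's collation order.
struct compare {
    bool operator()(const std::type_info* a, const std::type_info* b) const {
        return a->before(*b);
    }
};
std::map<const std::type_info*, std::string, compare> m;
void f() {
    m[&typeid(int)] = "Hello world";
}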
This is useful to define an order on typeinfo objects, e.g. to put them into a std::map. The obvious follow-up question is: why isn't it spelled operator<()? I don't know the answer to this question.
It gives an ordering.
That is required if you want to store values in some containers, like std::map.
Think of it as a less-than (<) operator for type_info objects. If you ever want to store them in an ordered collection, such as a set or map, you can use it to make an appropriate comparator. It's a reliable and preferred way, as opposed to, say, using the type's name, which might not be unique.
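Since C++11, std::type_index wraps this ordering in an operator< that is specified in terms of before(), so a hand-written comparator is usually not needed. A minimal sketch; the function names make_names and name_of are hypothetical.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <typeindex>
#include <typeinfo>

using NameMap = std::map<std::type_index, std::string>;

// Builds an ordered map keyed by type; ordering comes from type_index::operator<.
NameMap make_names() {
    NameMap names;
    names[std::type_index(typeid(int))] = "int";
    names[std::type_index(typeid(double))] = "double";
    return names;
}

// Looks up the label for a type_info; throws std::out_of_range if absent.
const std::string& name_of(const NameMap& names, const std::type_info& ti) {
    return names.at(std::type_index(ti));
}

void demo_names() {
    NameMap names = make_names();
    assert(names.size() == 2);
    assert(name_of(names, typeid(int)) == "int");
    assert(name_of(names, typeid(double)) == "double");
}
```

The iteration order of such a map is still the implementation's collation order, so it should not be relied on to match declaration order or inheritance relations.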