Is this use of reinterpret_cast on differently-qualified struct members safe?

I have looked at the following related questions, and none of them seems to address my exact issue: one, two, three.
I am writing a collection of which the elements (key-value pairs) are stored along with some bookkeeping information:
struct Element {
    Key key;
    Value value;
    int flags;
};

std::vector<Element> elements;
(For simplicity, suppose that both Key and Value are standard-layout types. The collection won't be used with any other types anyway.)
In order to support iterator-based access, I've written iterators that overload operator-> and operator* to return to the user a pointer and a reference, respectively, to the key-value pair. However, due to the nature of the collection, the user is never allowed to change the returned key. For this reason, I've declared a KeyValuePair structure:
struct KeyValuePair {
    const Key key;
    Value value;
};
And I've implemented operator-> on the iterator like this:
struct iterator {
    size_t index;

    KeyValuePair *operator->() {
        return reinterpret_cast<KeyValuePair *>(&elements[index]);
    }
};
My question is: is this use of reinterpret_cast well-defined, or does it invoke undefined behavior? I have tried to interpret the relevant parts of the standard and examined answers to questions about similar issues, but I failed to draw a definitive conclusion from them, because:
– the two struct types share some initial data members (namely, key and value) that differ only in const-qualification;
– the standard does not explicitly say that T and cv T are layout-compatible, but it doesn't state the converse either; furthermore, it mandates that they have the same representation and alignment requirements;
– two standard-layout class types share a common initial sequence if their first however-many members have layout-compatible types;
– for union types containing members of class type that share a common initial sequence, it is permitted to examine the members of that initial sequence using either of the union members (9.2p18);
– there is no similar explicit guarantee about reinterpret_cast'ed pointers to structs that share a common initial sequence;
– it is, however, guaranteed that a pointer to a standard-layout struct points to its initial member (9.2p19).
Using merely this information, I found it impossible to deduce whether the Element and KeyValuePair structs share a common initial sequence, or have anything other in common that would justify my reinterpret_cast.
As an aside, if you think using reinterpret_cast for this purpose is inappropriate, and I'm really facing an XY problem and therefore I should simply do something else to achieve my goal, let me know.

My question is: is this use of reinterpret_cast well-defined, or does
it invoke undefined behavior?
reinterpret_cast is the wrong approach here; you are simply violating strict aliasing. It is somewhat perplexing that reinterpret_cast and union access diverge here, but the wording is very clear about this scenario.
You might be better off simply defining a union thusly:
union elem_t {
    Element e{};
    KeyValuePair p;
    /* special member functions defined if necessary */
};
… and using that as your vector element type. Note that cv-qualification is ignored when determining layout-compatibility; see [basic.types]/11:
Two types cv1 T1 and cv2 T2 are layout-compatible types if
T1 and T2 are the same type, […]
Hence Element and KeyValuePair do indeed share a common initial sequence, and accessing the corresponding members of p, provided e is alive, is well-defined.
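For illustration, here is a minimal sketch of how the union element type could be used. Key and Value are stand-ins (int and double) chosen only to keep the example self-contained; they are not from the question.

#include <vector>

using Key = int;       // assumption: some standard-layout key type
using Value = double;  // assumption: some standard-layout value type

struct Element { Key key; Value value; int flags; };
struct KeyValuePair { const Key key; Value value; };

union elem_t {
    Element e;       // the member the container actually constructs and keeps alive
    KeyValuePair p;  // used only to *read* the common initial sequence (key, value)
    elem_t(Key k, Value v) : e{k, v, 0} {}
};

int main() {
    std::vector<elem_t> elements;
    elements.emplace_back(1, 3.14);

    Key k = elements[0].p.key;   // OK: reading the common initial sequence through p
    elements[0].e.value = 2.71;  // writes still go through the active member e
    (void)k;
}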
Another approach: Define
struct KeyValuePair {
    Key key;
    mutable Value value;
};

struct Element : KeyValuePair {
    int flags;
};
Now provide an iterator that simply wraps a const_iterator from the vector and upcasts the references/pointers to be exposed. key won't be modifiable, but value will be.
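A rough sketch of that wrapping iterator follows; it is deliberately incomplete (no iterator traits or full interface), and Key/Value are placeholder types, not the questioner's real ones:

#include <vector>

using Key = int;       // placeholder
using Value = double;  // placeholder

struct KeyValuePair {
    Key key;
    mutable Value value;  // writable even through a const reference
};

struct Element : KeyValuePair {
    int flags;
};

struct iterator {
    std::vector<Element>::const_iterator it;  // wrap the vector's const_iterator

    // Derived-to-base conversion exposes only the KeyValuePair subobject; no casts needed.
    const KeyValuePair& operator*()  const { return *it; }
    const KeyValuePair* operator->() const { return &*it; }

    iterator& operator++() { ++it; return *this; }
    bool operator!=(const iterator& other) const { return it != other.it; }
};

With this, it->key is read-only, while it->value = something; still compiles thanks to mutable.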

Related

Can C++20 range adaptors refine the range concept of underlying range?

Consider the following simple range:
struct my_range {
    int* begin();
    int* end();
    const int* data();
};
Although this class has a data() member, according to the definition of contiguous_range in [range.refinements]:
template<class T>
concept contiguous_range =
    random_access_range<T> && contiguous_iterator<iterator_t<T>> &&
    requires(T& t) {
        { ranges::data(t) } -> same_as<add_pointer_t<range_reference_t<T>>>;
    };
ranges::data(t) will directly call my_range's member function data() and return const int*. But since my_range::begin() returns int*, add_pointer_t<range_reference_t<my_range>> is int*, so the last requirement is not satisfied and my_range is not a contiguous_range.
However, when I apply some range adaptors to my_range, the result can be a contiguous_range (godbolt):
random_access_range auto r1 = my_range{};
static_assert(!contiguous_range<my_range>);
contiguous_range auto r2 = r1 | std::views::take(1);
This is because take_view inherits from view_interface, and view_interface::data() only requires the derived class's iterator to model contiguous_iterator.
Since my_range::begin() returns int*, which models contiguous_iterator, view_interface::data() is available and returns to_address(ranges::begin(derived)), which is int*. This makes both take_view::data() and begin() return int*, so r2 satisfies the last requirement and models contiguous_range.
Here, the range adaptors seem to refine the range concept of the underlying range, that is, to convert a random_access_range into a contiguous_range. This seems dangerous, since it means ranges::data(r2) can return a modifiable int* pointer:
std::same_as<const int*> auto d1 = r1.data();
std::same_as<int*> auto d2 = r2.data();
Is this refinement allowed? Can it be considered a defect in the standard? Or is there something wrong with the definition of my_range?
I would not consider this a defect. The iterator/range model treats iterators as the truth. Several kinds of ranges deliberately have more functionality than just being a pair of iterators of some kind. This is because said functionality is materially useful and users of a range of a particular kind should expect to be able to use it. But this also leaves open the possibility of defining an incoherent range: where the iterator concept is stronger than the range concept because the range lacks certain functionality expected of the iterator concept.
If someone creates an incoherent range type, any functionality that operates on such a range (views, but also algorithms) has 3 options:
Believe the iterators.
Believe the range.
Error out.
Now for algorithms, if the algorithm wants to use the data member, it makes sense that it will believe the range. That is where the member is after all.
But for a view, does it make sense to believe the range? Views don't store copies of ranges. After all, a range can be a container. They instead store and operate on iterators and sentinels. They therefore treat ranges as just a way to get an iterator/sentinel pair.
When a view defines itself as a particular range type, it therefore manufactures the ancillary functionality of that range from what it stores: the iterator/sentinel pair. And most such things are pretty simple to manufacture. The data() member of a contiguous_range can be manufactured by using std::to_address (a requirement of being a contiguous_iterator) on the result of begin().
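As a rough illustration (not the library's actual implementation), a data() function can be synthesized from begin() alone whenever the iterator models contiguous_iterator, regardless of what the original range's own data() member returns. synthesized_data is a name made up for this sketch:

#include <iterator>
#include <memory>   // std::to_address
#include <ranges>

template <std::ranges::range R>
    requires std::contiguous_iterator<std::ranges::iterator_t<R>>
auto synthesized_data(R& r) {
    // to_address works on any contiguous iterator, so the element pointer can be
    // recovered without ever consulting the range's own data() member.
    return std::to_address(std::ranges::begin(r));
}

Applied to my_range, this yields int*, matching what view_interface::data() produces for take_view.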
So when given an incoherent range, it would actually be harder to filter such things out based on the original range type. Particularly in light of view_interface, which only sees the new view type, not the range it is built from.
After all, not all views are built from other ranges. iota_view is a view, but it isn't built from anything. But its iterators are random access; so too is its range. single_view is likewise not built from a "range"; it treats a single object as a single-element contiguous range. And subrange is built from an iterator/sentinel pair, not a range.
So either views built from other ranges would have to have their own special view_interface... or you create circumstances where views are just better than their original ranges. Or you error out.
It should also be noted that the current behavior is 100% safe. No code will be functionally broken by having a view be stronger than the range it was built from. After all, your not-quite-contiguous_range type still provides non-const access to the elements. The user just has to work a bit harder for it.

Why is the reference of std::vector<bool> a prvalue, whereas some documents say that "std::vector::reference is to provide an l-value"? [duplicate]

Item 18 of Scott Meyers's book Effective STL: 50 Specific Ways to Improve Your Use of the Standard Template Library says to avoid vector <bool> as it's not an STL container and it doesn't really hold bools.
The following code:
vector<bool> v;
bool* pb = &v[0];
will not compile, violating a requirement of STL containers.
Error:
cannot convert 'std::vector<bool>::reference* {aka std::_Bit_reference*}' to 'bool*' in initialization
vector<T>::operator[] is supposed to return T&, so why is it a special case for vector<bool>?
What does vector<bool> really consist of?
The Item further says:
deque<bool> v; // is an STL container and it really contains bools
Can this be used as an alternative to vector<bool>?
Can anyone please explain this?
For space-optimization reasons, the C++ standard (as far back as C++98) explicitly calls out vector<bool> as a special standard container where each bool uses only one bit of space rather than one byte as a normal bool would (implementing a kind of "dynamic bitset"). In exchange for this optimization it doesn't offer all the capabilities and interface of a normal standard container.
In this case, since you can't take the address of a bit within a byte, things such as operator[] can't return a bool& but instead return a proxy object that allows you to manipulate the particular bit in question. Since this proxy object is not a bool&, you can't assign its address to a bool* like you could with the result of such an operator call on a "normal" container. In turn this means that bool *pb = &v[0]; isn't valid code.
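A short, self-contained illustration of that proxy behavior:

#include <cassert>
#include <vector>

int main() {
    std::vector<bool> v{true, false};

    // bool* pb = &v[0];        // does not compile: operator[] returns a proxy, not bool&
    auto ref = v[0];            // std::vector<bool>::reference
    ref = false;                // writes through the proxy into the packed bit
    assert(v[0] == false);

    bool copy = v[1];           // the proxy converts to a plain bool value
    (void)copy;
}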
On the other hand, deque doesn't have any such specialization called out, so each bool takes a byte and you can take the address of the value returned from operator[].
Finally note that the MS standard library implementation is (arguably) suboptimal in that it uses a small chunk size for deques, which means that using deque as a substitute isn't always the right answer.
The problem is that vector<bool> returns a proxy reference object instead of a true reference, so that C++98-style code like bool* p = &v[0]; won't compile. However, modern C++11 code with auto p = &v[0]; can be made to compile if operator& also returns a proxy pointer object. Howard Hinnant has written a blog post detailing the algorithmic improvements when using such proxy references and pointers.
Scott Meyers has a long Item 30 in More Effective C++ about proxy classes. You can come a long way to almost mimic the builtin types: for any given type T, a pair of proxies (e.g. reference_proxy<T> and iterator_proxy<T>) can be made mutually consistent in the sense that reference_proxy<T>::operator&() and iterator_proxy<T>::operator*() are each other's inverse.
However, at some point one needs to map the proxy objects back to behave like T* or T&. For iterator proxies, one can overload operator->() and access the template T's interface without reimplementing all the functionality. However, for reference proxies, you would need to overload operator.(), and that is not allowed in current C++ (although Sebastian Redl presented such a proposal on BoostCon 2013). You can make a verbose work-around like a .get() member inside the reference proxy, or implement all of T's interface inside the reference (this is what is done for vector<bool>::bit_reference), but this will either lose the builtin syntax or introduce user-defined conversions that do not have builtin semantics for type conversions (you can have at most one user-defined conversion per argument).
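A minimal sketch of such a mutually consistent proxy pair; reference_proxy and iterator_proxy are the names from the paragraph above, and the storage model (one raw T* inside each proxy) is an assumption made purely to keep the sketch short:

template <typename T> struct iterator_proxy;

template <typename T>
struct reference_proxy {
    T* p;
    operator T&() const { return *p; }                     // user-defined conversion to T&
    reference_proxy& operator=(const T& v) { *p = v; return *this; }
    iterator_proxy<T> operator&() const { return {p}; }    // inverse of iterator_proxy::operator*
    T& get() const { return *p; }                          // verbose work-around for the missing operator.
};

template <typename T>
struct iterator_proxy {
    T* p;
    reference_proxy<T> operator*() const { return {p}; }   // inverse of reference_proxy::operator&
    T* operator->() const { return p; }                    // forwards to T's own interface
};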
TL;DR: no, vector<bool> is not a container, because the Standard requires a real reference; but it can be made to behave almost like a container, at least much closer with C++11 (auto) than in C++98.
vector<bool> stores boolean values in compressed form, using only one bit per value (not one byte, as a bool[] array does). It is not possible to return a reference to a bit in C++, so there is a special helper type, a "bit reference", which provides an interface to a particular bit in memory and allows you to use the standard operators and conversions.
Many consider the vector<bool> specialization to be a mistake.
The paper "Deprecating Vestigial Library Parts in C++17" contains a proposal to reconsider the vector<bool> partial specialization:
There has been a long history of the bool partial specialization of std::vector not satisfying the container requirements, and in particular, its iterators not satisfying the requirements of a random access iterator. A previous attempt to deprecate this container was rejected for C++11, N2204.
One of the reasons for rejection is that it is not clear what it would mean to deprecate a particular specialization of a template. That could be addressed with careful wording. The larger issue is that the (packed) specialization of vector<bool> offers an important optimization that clients of the standard library genuinely seek, but would no longer be available. It is unlikely that we would be able to deprecate this part of the standard until a replacement facility is proposed and accepted, such as N2050. Unfortunately, there are no such revised proposals currently being offered to the Library Evolution Working Group.
Look at how it is implemented: the STL relies heavily on templates, so the headers contain the actual code.
For instance, look at the stdc++ implementation here.
Also interesting, even though not an STL-conforming bit vector, is llvm::BitVector from here.
The essence of llvm::BitVector is a nested class called reference plus suitable operator overloading to make the BitVector behave like vector<bool>, with some limitations. The code below is a simplified interface showing how BitVector hides a class called reference so that the real implementation behaves almost like a real array of bool without using one byte for each value.
class BitVector {
public:
    class reference {
    public:
        reference& operator=(reference t);
        reference& operator=(bool t);
        operator bool() const;
    };

    reference operator[](unsigned Idx);
    bool operator[](unsigned Idx) const;
};
This code has the following nice properties:
BitVector b(10, false);          // size 10, default false
BitVector::reference x = b[5];   // that's what really happens: a proxy, returned by value
bool y = b[5];                   // implicitly converted to bool
assert(b[5] == false);           // converted to bool
assert(b[6] == b[7]);            // both sides convert to bool before comparing
b[5] = true;                     // assignment through the reference proxy
assert(b[5] == true);            // and it actually works
This code actually has a flaw; try to run:
std::for_each(&b[5], &b[6], some_func); // address of a reference proxy is not an iterator
It will not work, because assert((&b[5] - &b[3]) == (5 - 3)); would fail (within llvm::BitVector).
This is the very simple LLVM version. std::vector<bool> also has working iterators, so the loop for (auto i = b.begin(), e = b.end(); i != e; ++i) will work, and so will std::vector<bool>::const_iterator.
However, there are still limitations in std::vector<bool> that make it behave differently in some cases.
This comes from http://www.cplusplus.com/reference/vector/vector-bool/
Vector of bool
This is a specialized version of vector, which is used for elements of type bool and optimizes for space.
It behaves like the unspecialized version of vector, with the following changes:
– The storage is not necessarily an array of bool values, but the library implementation may optimize storage so that each value is stored in a single bit.
– Elements are not constructed using the allocator object, but their value is directly set on the proper bit in the internal storage.
– Member function flip and a new signature for member swap.
– A special member type, reference, a class that accesses individual bits in the container's internal storage with an interface that emulates a bool reference. Conversely, member type const_reference is a plain bool.
– The pointer and iterator types used by the container are not necessarily neither pointers nor conforming iterators, although they shall simulate most of their expected behavior.
These changes provide a quirky interface to this specialization and favor memory optimization over processing (which may or may not suit your needs). In any case, it is not possible to instantiate the unspecialized template of vector for bool directly. Workarounds to avoid this range from using a different type (char, unsigned char) or container (like deque) to use wrapper types or further specialize for specific allocator types.
bitset is a class that provides a similar functionality for fixed-size arrays of bits.

Allowed usages of memset in C++ [duplicate]

Let's say I have defined a zero_initialize() function:
template<class T>
T zero_initialize()
{
T result;
std::memset(&result, 0, sizeof(result));
return result;
}
// usage: auto data = zero_initialize<Data>();
Calling zero_initialize() for some types would lead to undefined behavior1, 2. I'm currently requiring T to satisfy std::is_pod. With that trait deprecated in C++20 and the arrival of concepts, I'm curious how zero_initialize() should evolve.
What (minimal) trait / concept can guarantee memsetting an object is well defined?
Should I use std::uninitialized_fill instead of std::memset? And why?
Is this function made obsolete by one of C++ initialization syntaxes for a subset of types? Or will it be with the upcoming of future C++ versions?
1) Erase all members of a class.
2) What would be reason for “undefined behaviors” upon using memset on library class(std::string)? [closed]
There is technically no object property in C++ which specifies that user code can legally memset a C++ object. And that includes POD, so if you want to be technical, your code was never correct. Even TriviallyCopyable is a property about doing byte-wise copies between existing objects (sometimes through an intermediary byte buffer); it says nothing about inventing data and shoving it into the object's bits.
That being said, you can be reasonably sure this will work if you test is_trivially_copyable and is_trivially_default_constructible. That last one is important, because some TriviallyCopyable types still want to be able to control their contents. For example, such a type could have a private int variable that is always 5, initialized in its default constructor. So long as no code with access to the variable changes it, it will always be 5. The C++ object model guarantees this.
So you can't memset such an object and still get well-defined behavior from the object model.
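For example (a hypothetical type written only to illustrate the point above):

#include <type_traits>

// Trivially copyable, but NOT trivially default-constructible: the default
// constructor (via the member initializer) establishes an invariant (x == 5).
class Invariant {
    int x = 5;
public:
    int value() const { return x; }
};

static_assert(std::is_trivially_copyable_v<Invariant>, "bytewise copies are fine");
static_assert(!std::is_trivially_default_constructible_v<Invariant>, "all-zero bytes are not a valid state");

// memset-ing an Invariant to all-zero bytes would break the invariant that the
// object model otherwise guarantees, which is why the extra trait check matters.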
What (minimal) trait / concept can guarantee memsetting an object is well defined?
Per the std::memset reference on cppreference, the behavior of memset on a non-TriviallyCopyable type is undefined. So if it is okay to memset a TriviallyCopyable type, you can add a static_assert to the function to check for that, like:
template<class T>
T zero_initialize()
{
    static_assert(std::is_trivial_v<T>, "Error: T must be a trivial type");
    T result;
    std::memset(&result, 0, sizeof(result));
    return result;
}
Here we use std::is_trivial_v to make sure that the class is not only trivially copyable but also has a trivial default constructor, so we know it is safe to zero-initialize.
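With that guard in place, usage looks like this (Data is just an assumed trivial aggregate; the std::string line is commented out because it would be rejected at compile time):

#include <cstring>
#include <string>
#include <type_traits>

template<class T>
T zero_initialize()
{
    static_assert(std::is_trivial_v<T>, "Error: T must be a trivial type");
    T result;
    std::memset(&result, 0, sizeof(result));
    return result;
}

struct Data { int id; double weight; };          // assumed trivial aggregate

int main() {
    auto data = zero_initialize<Data>();         // fine: Data is trivial
    // auto s = zero_initialize<std::string>();  // would fail the static_assert
    return data.id;
}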
Should I use std::uninitialized_fill instead of std::memset? And why?
You don't need to here since you are only initializing a single object.
Is this function made obsolete by one of C++ initialization syntaxes for a subset of types? Or will it be with the upcoming of future C++ versions?
Value or braced initialization does make this function "obsolete". T() and T{} will give you a value-initialized T, and if T doesn't have a user-provided default constructor it will be zero-initialized. That means you could rewrite the function as
template<class T>
T zero_initialize()
{
    static_assert(std::is_trivial_v<T>, "Error: T must be a trivial type");
    return {};
}
The most general definable trait that guarantees your zero_initialize will actually zero-initialize objects is
template <typename T>
struct can_zero_initialize :
    std::bool_constant<std::is_integral_v<
        std::remove_cv_t<std::remove_all_extents_t<T>>>> {};
Not too useful. But the only guarantee about bitwise or bytewise representations of fundamental types in the Standard is [basic.fundamental]/7 "The representations of integral types shall define values by use of a pure binary numeration system." There is no guarantee that a floating-point value with all bytes zero is a zero value. There is no guarantee that any pointer or pointer-to-member value with all bytes zero is a null pointer value. (Though both of these are usually true in practice.)
If all non-static members of a trivially-copyable class type are (arrays of) (cv-qualified) integral types, I think that would also be okay, but there's no possible way to test for that, unless reflection comes to C++.

C++11 - typeid uniqueness

In C++11, I am using this
typeid(T).name()
for my own hash computation. I don't need the result to be same between the program runs or compilations. I just need it to be unique for the types.
I know that it can return the same name for different types, but that usually involves const, pointers, etc.
In my case, T is only ever class XY, struct XX, or types derived from them.
In this case, can I assume that the name will be unique for each T?
You should use std::type_index for mapping purposes.
The type_index class is a wrapper class around a std::type_info
object, that can be used as index in associative and unordered
associative containers. The relationship with type_info object is
maintained through a pointer, therefore type_index is
CopyConstructible and CopyAssignable.
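For example (a small sketch; the type names come from the question, the map contents are made up):

#include <string>
#include <typeindex>
#include <typeinfo>
#include <unordered_map>

struct XX {};
class XY {};

int main() {
    // std::type_index is hashable and equality-comparable, so it can be used as
    // a key directly, without relying on name() being unique across types.
    std::unordered_map<std::type_index, std::string> registry;
    registry[std::type_index(typeid(XX))] = "struct XX";
    registry[std::type_index(typeid(XY))] = "class XY";

    return registry.count(std::type_index(typeid(XX))) == 1 ? 0 : 1;
}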
std::type_info::name is implementation-defined, so you shouldn't rely on it being unique for different types.
Since you're doing this for hash computation, you should use std::type_info::hash_code instead. Although this doesn't guarantee that the values will be unique, the standard says that implementations should try and return different values for different types. So long as your hash map implementation has reasonable collision handling, this should be sufficient for you.
As stated on cppreference:
Returns an implementation defined null-terminated character string
containing the name of the type. No guarantees are given, in
particular, the returned string can be identical for several types and
change between invocations of the same program.
So, no, you can't. You can't assume anything actually.
Although, hash_code() gives you:
size_t hash_code() const noexcept;
7 Returns: An unspecified value, except that within a single execution
of the program, it shall return the same value for any two type_info
objects which compare equal.
8 Remark: an implementation should return different values for two
type_info objects which do not compare equal.
Which means that hash_code() can be used to distinguish two different types only if operator== for type_info supports this.
What you might be able to do is take the address of a static member:
#include <cstdint>

class HashBase {
public:
    virtual std::intptr_t get() = 0;
};

template <typename T>
class Hash : public HashBase {
    static const int _addr;  // one object per T, so &_addr is unique per type
public:
    std::intptr_t get() override { return reinterpret_cast<std::intptr_t>(&_addr); }
};

// a definition is required in order to take the member's address
template <typename T> const int Hash<T>::_addr = 0;