what is the new feature in c++20 [[no_unique_address]]? - c++

i have read the new c++20 feature no_unique_address several times and i hope if some one can explain and illustrate with an example better than this example below taken from c++ reference.
Explanation Applies to the name being declared in the declaration of a
non-static data member that's not a bit field.
Indicates that this data member need not have an address distinct from
all other non-static data members of its class. This means that if the
member has an empty type (e.g. stateless Allocator), the compiler may
optimise it to occupy no space, just like if it were an empty base. If
the member is not empty, any tail padding in it may be also reused to
store other data members.
#include <iostream>
struct Empty {}; // empty class
struct X {
int i;
Empty e;
};
struct Y {
int i;
[[no_unique_address]] Empty e;
};
struct Z {
char c;
[[no_unique_address]] Empty e1, e2;
};
struct W {
char c[2];
[[no_unique_address]] Empty e1, e2;
};
int main()
{
// e1 and e2 cannot share the same address because they have the
// same type, even though they are marked with [[no_unique_address]].
// However, either may share address with c.
static_assert(sizeof(Z) >= 2);
// e1 and e2 cannot have the same address, but one of them can share with
// c[0] and the other with c[1]
std::cout << "sizeof(W) == 2 is " << (sizeof(W) == 2) << '\n';
}
can some one explain to me what is the purpose behind this feature and when should i use it?
e1 and e2 cannot have the same address, but one of them can share with c[0] and the other with c[1] can some one explain? why do we have such kind of relation ?

The purpose behind the feature is exactly as stated in your quote: "the compiler may optimise it to occupy no space". This requires two things:
An object which is empty.
An object that wants to have an non-static data member of a type which may be empty.
The first one is pretty simple, and the quote you used even spells it out an important application. Objects of type std::allocator do not actually store anything. It is merely a class-based interface into the global ::new and ::delete memory allocators. Allocators that don't store data of any kind (typically by using a global resource) are commonly called "stateless allocators".
Allocator-aware containers are required to store the value of an allocator that the user provides (which defaults to a default-constructed allocator of that type). That means the container must have a subobject of that type, which is initialized by the allocator value the user provides. And that subobject takes up space... in theory.
Consider std::vector. The common implementation of this type is to use 3 pointers: one for the beginning of the array, one for the end of the useful part of the array, and one for the end of the allocated block for the array. In a 64-bit compilation, these 3 pointers require 24 bytes of storage.
A stateless allocator doesn't actually have any data to store. But in C++, every object has a size of at least 1. So if vector stored an allocator as a member, every vector<T, Alloc> would have to take up at least 32 bytes, even if the allocator stores nothing.
The common workaround to this is to derive vector<T, Alloc> from Alloc itself. The reason being that base class subobject are not required to have a size of 1. If a base class has no members and has no non-empty base classes, then the compiler is permitted to optimize the size of the base class within the derived class to not actually take up space. This is called the "empty base optimization" (and it's required for standard layout types).
So if you provide a stateless allocator, a vector<T, Alloc> implementation that inherits from this allocator type is still just 24 bytes in size.
But there's a problem: you have to inherit from the allocator. And that's really annoying. And dangerous. First, the allocator could be final, which is in fact allowed by the standard. Second, the allocator could have members that interfere with the vector's members. Third, it's an idiom that people have to learn, which makes it folk wisdom among C++ programmers, rather than an obvious tool for any of them to use.
So while inheritance is a solution, it's not a very good one.
This is what [[no_unique_address]] is for. It would allow a container to store the allocator as a member subobject rather than as a base class. If the allocator is empty, then [[no_unique_address]] will allow the compiler to make it take up no space within the class's definition. So such a vector could still be 24 bytes in size.
e1 and e2 cannot have the same address, but one of them can share with c[0] and the other with c1 can some one explain? why do we have such kind of relation ?
C++ has a fundamental rule that its object layout must follow. I call it the "unique identity rule".
For any two objects, at least one of the following must be true:
They must have different types.
They must have different addresses in memory.
They must actually be the same object.
e1 and e2 are not the same object, so #3 is violated. They also share the same type, so #1 is violated. Therefore, they must follow #2: they must not have the same address. In this case, since they are subobjects of the same type, this means that the compiler-defined object layout of this type cannot give them the same offset within the object.
e1 and c[0] are distinct objects, so again #3 fails. But they satisfy #1, since they have different types. Therefore (subject to the rules of [[no_unique_address]]) the compiler could assign them to the same offset within the object. The same goes for e2 and c[1].
If the compiler wants to assign two different members of a class to the same offset within the containing object, then they must be of different types (note that this is recursive through all of each of their subobjects). Therefore, if they have the same type, they must have different addresses.

In order to understand [[no_unique_address]], let's take a look at unique_ptr. It has the following signature:
template<class T, class Deleter = std::default_delete<T>>
class unique_ptr;
In this declaration, Deleter represents a type which provides the operation used to delete a pointer.
We can implement unique_ptr like this:
template<class T, class Deleter>
class unique_ptr {
T* pointer = nullptr;
Deleter deleter;
public:
// Stuff
// ...
// Destructor:
~unique_ptr() {
// deleter must overload operator() so we can call it like a function
// deleter can also be a lambda
deleter(pointer);
}
};
So what's wrong with this implementation? We want unique_ptr to be as light-weight as possible. Ideally, it should be the exact same size as a regular pointer. But because we have the Deleter member, unqiue_ptr will end up being at least 16 bytes: 8 for the pointer, and then 8 additional ones to store the Deleter, even if Deleter is empty.
[[no_unique_address]] solves this issue:
template<class T, class Deleter>
class unique_ptr {
T* pointer = nullptr;
// Now, if Deleter is empty it won't take up any space in the class
[[no_unique_address]] Deleter deleter;
public:
// STuff...

While the other answers explained it pretty well already, let me explain it from a slightly different perspective:
The root of the problem is that C++ does not allow for zero sized objects (i.e. we always have sizeof(obj) > 0).
This is essentially a consequence of very fundamental definitions in the C++ standard: The unique identity rule (as Nicol Bolas explained) but also from the definition of the "object" as a non-empty sequence of bytes.
However this leads to unpleasant issues when writing generic code. This is somewhat expected because here a corner-case (-> empty type) receives a special treatment, that deviates from the systematic behavior of the other cases (-> size increases in a non-systematic way).
The effects are:
Space is wasted, when stateless objects (i.e. classes/structs with no members) are used
Zero length arrays are forbidden.
Since one arrives at these problems very quickly when writing generic code, there have been several attempts for mitigation
The empty base class optimization. This solves 1) for a subset of cases
Introduction of std::array which allows for N==0. This solves 2) but still has issue 1)
The introcduction of [no_unique_address], which finally solves 1) for all remaining cases. At least when the user explicity requests it.
Introduction of std::is_empty. Needed because the obvious sizeof does not work (as sizeof(Empty) >= 1). (Thanks to Dwayne Robinson)
Maybe allowing zero-sized objects would have been the cleaner solution which could have prevented the fragmentation *). However when you search for zero-sized object on SO you will find questions with different answers (sometimes not convincing) and quickly notice that this is a disputed topic.
Allowing zero-sized objects would require a change at the heart of the C++ language and given the fact that the C++ language is very complex already, the standard comittee likely decided for the minimal invasive route and just introduced a new attribute.
Together with the other mitigations from above it finally solves all issues due to disallowal of zero-sized objects. Even though it is maybe not be the nicest solution from a fundamental point of view, it is effective.
*) To me the unique-identity-rule for zero sized types does not make much sense anyway. Why should we want objects, which are stateless per programmers choice (i.e. have no non-static data members), to have an unique address in the first place? The address is some kind of (immutable) state of an object and if the programmer wanted a state they could just add a nonstatic data member.

Related

Initializing an array of trivially_copyable but not default_constructible objects from bytes. Confusion in [intro.object]

We are initializing (large) arrays of trivially_copiable objects from secondary storage, and questions such as this or this leaves us with little confidence in our implemented approach.
Below is a minimal example to try to illustrate the "worrying" parts in the code.
Please also find it on Godbolt.
Example
Let's have a trivially_copyable but not default_constructible user type:
struct Foo
{
Foo(double a, double b) :
alpha{a},
beta{b}
{}
double alpha;
double beta;
};
Trusting cppreference:
Objects of trivially-copyable types that are not potentially-overlapping subobjects are the only C++ objects that may be safely copied with std::memcpy or serialized to/from binary files with std::ofstream::write()/std::ifstream::read().
Now, we want to read a binary file into an dynamic array of Foo. Since Foo is not default constructible, we cannot simply:
std::unique_ptr<Foo[]> invalid{new Foo[dynamicSize]}; // Error, no default ctor
Alternative (A)
Using uninitialized unsigned char array as storage.
std::unique_ptr<unsigned char[]> storage{
new unsigned char[dynamicSize * sizeof(Foo)] };
input.read(reinterpret_cast<char *>(storage.get()), dynamicSize * sizeof(Foo));
std::cout << reinterpret_cast<Foo *>(storage.get())[index].alpha << "\n";
Is there an UB because object of actual type Foo are never explicitly created in storage?
Alternative (B)
The storage is explicitly typed as an array of Foo.
std::unique_ptr<Foo[]> storage{
static_cast<Foo *>(::operator new[](dynamicSize * sizeof(Foo))) };
input.read(reinterpret_cast<char *>(storage.get()), dynamicSize * sizeof(Foo));
std::cout << storage[index].alpha << "\n";
This alternative was inspired by this post. Yet, is it better defined? It seems there are still no explicit creation of object of type Foo.
It is notably getting rid of the reinterpret_cast when accessing the Foo data member (this cast might have violated the Type Aliasing rule).
Overall Questions
Are any of these alternatives defined by the standard? Are they actually different?
If not, is there a correct way to implement this (without first initializing all Foo instances to values that will be discarded immediately after)
Is there any difference in undefined behaviours between versions of the C++ standard?
(In particular, please see this comment with regard to C++20)
What you're trying to do ultimately is create an array of some type T by memcpying bytes from elsewhere without default constructing the Ts in the array first.
Pre-C++20 cannot do this without provoking UB at some point.
The problem ultimately comes down to [intro.object]/1, which defines the ways objects get created:
An object is created by a definition, by a new-expression, when implicitly changing the active member of a union, or when a temporary object is created ([conv.rval], [class.temporary]).
If you have a pointer of type T*, but no T object has been created in that address, you can't just pretend that the pointer points to an actual T. You have to cause that T to come into being, and that requires doing one of the above operations. And the only available one for your purposes is the new-expression, which requires that the T is default constructible.
If you want to memcpy into such objects, they must exist first. So you have to create them. And for arrays of such objects, that means they need to be default constructible.
So if it is at all possible, you need a (likely defaulted) default constructor.
In C++20, certain operations can implicitly create objects (provoking "implicit object creation" or IOC). IOC only works on implicit lifetime types, which for classes:
A class S is an implicit-lifetime class if it is an aggregate or has at least one trivial eligible constructor and a trivial, non-deleted destructor.
Your class qualifies, as it has a trivial copy constructor (which is "eligible") and a trivial destructor.
If you create an array of byte-wise types (unsigned char, std::byte, or char), this is said to "implicitly create objects" in that storage. This property also applies to the memory returned by malloc and operator new. This means that if you do certain kinds of undefined behavior to pointers to that storage, the system will automatically create objects (at the point where the array was created) that would make that behavior well-defined.
So if you allocate such storage, cast a pointer to it to a T*, and then start using it as though it pointed to a T, the system will automatically create Ts in that storage, so long as it was appropriately aligned.
Therefore, your alternative A works just fine:
When you apply [index] to your casted pointer, C++ will retroactively create an array of Foo in that storage. That is, because you used the memory like an array of Foo exists there, C++20 will make an array of Foo exist there, exactly as if you had created it back at the new unsigned char statement.
However, alternative B will not work as is. You did not use new[] Foo to create the array, so you cannot use delete[] Foo to delete it. You can still use unique_ptr, but you'll have to create a deleter that explicitly calls operator delete on the pointer:
struct mem_delete
{
template<typename T>
void operator(T *ptr)
{
::operator delete[](ptr);
}
};
std::unique_ptr<Foo[], mem_delete> storage{
static_cast<Foo *>(::operator new[](dynamicSize * sizeof(Foo))) };
input.read(reinterpret_cast<char *>(storage.get()), dynamicSize * sizeof(Foo));
std::cout << storage[index].alpha << "\n";
Again, storage[index] creates an array of T as if it were created at the time the memory was allocated.
My first question is: What are you trying to achieve?
Is there an issue with reading each entry individually?
Are you assuming that your code will speed up by reading an array?
Is latency really a factor?
Why can't you just add a default constructor to the class?
Why can't you enhance input.read() to read directly into an array? See std::extent_v<T>
Assuming the constraints you defined, I would start with writing it the simple way, reading one entry at a time, and benchmark it.
Having said that, that which you describe is a common paradigm and, yes, can break a lot of rules.
C++ is very (overly) cautious about things like alignment which can be issues on certain platforms and non-issues on others. This is only "undefined behaviour" because no cross-platform guarantees can be given by the C++ standard itself, even though many techniques work perfectly well in practice.
The textbook way to do this is to create an empty buffer and memcpy into a proper object, but as your input is serialised (potentially by another system), there isn't actually a guarantee that the padding and alignment will match the memory layout which the local compiler determined for the sequence so you would still have to do this one item at a time.
My advice is to write a unit-test to ensure that there are no issues and potentially embed that into the code as a static assertion. The technique you described breaks some C++ rules but that doesn't mean it's breaking, for example, x86 rules.
Alternative (A): Accessing a —non-static— member of an object before its lifetime begins.
The behavior of the program is undefined (See: [basic.life]).
Alternative (B): Implicit call to the implicitly deleted default constructor.
The program is ill-formed (See: [class.default.ctor]).
I'm not sure about the latter. If someone more knowledgeable knows if/why this is UB please correct me.
You can manage the memory yourself, and then return a unique_ptr which uses a custom deleter. Since you can't use new[], you can't use the plain version of unique_ptr<T[]> and you need to manually call the destructor and deleter using an allocator.
template <class Allocator = std::allocator<Foo>>
struct FooDeleter : private Allocator {
using pointer = typename std::allocator_traits<Allocator>::pointer;
explicit FooDeleter(const Allocator &alloc, len) : Allocator(alloc), len(len) {}
void operator()(pointer p) {
for (pointer i = p; i != p + len; ++i) {
Allocator::destruct(i);
}
Allocator::deallocate(p, len);
}
size_t len;
};
std::unique_ptr<Foo[], FooDeleter<>> create(size_t len) {
std::allocator<Foo> alloc;
Foo *p = nullptr, *i = nullptr;
try {
p = alloc.allocate(len);
for (i = p; i != p + len; ++i) {
alloc.construct(i , 1.0f, 2.0f);
}
} catch (...) {
while (i > p) {
alloc.destruct(i--);
}
if (p)
alloc.deallocate(p);
throw;
}
return std::unique_ptr<Foo[], FooDeleter<>>{p, FooDeleter<>(alloc, len)};
}

Can I use an int (as opposed to a char) array as a memory arena where objects are created with placement new?

The question concerns a home-grown container template (a kind of std::array/vector hybrid) which holds an untyped array for storage. New elements are added by a push_back() member function which copy-constructs an element via placement new. This way the container does not require the contained type to have a default constructor, and we avoid default-constructing potentially never needed elements.
Typically, such storage would be a character type like std::byte.
We are using a bouquet of compilers. One of them predates the C++11 alignment facilities like alignasor aligned_storage. Absent that, they all all need different pragmas or attributes to guarantee alignment. In order to simplify the build and avoid manual alignment computation noise, we had the idea to use an array of 32 bit integers which have an alignment guarantee. Here is the core of the implementation:
template <class T> struct vec
{
uint32_t storage[NUM];
T *freeMem;
vec() : freeMem((T *)storage) {}
T *push_back(const T &t) { return new (freeMem++) T(t); }
};
Notably, we use a typed (differently typed than the storage array) pointer to pass the storage location to placement new; we think that's not an aliasing violation because we don't read or write through it before the object with type T is created.
Also notable is that the newly created objects may incompletely straddle two or more of the original int objects in the storage area, if sizeof(T) is not a multiple of sizeof(uint32_t).
I would think we have neither aliasing nor object lifetime issues. Is that so?

Why would the behavior of std::memcpy be undefined for objects that are not TriviallyCopyable?

From http://en.cppreference.com/w/cpp/string/byte/memcpy:
If the objects are not TriviallyCopyable (e.g. scalars, arrays, C-compatible structs), the behavior is undefined.
At my work, we have used std::memcpy for a long time to bitwise swap objects that are not TriviallyCopyable using:
void swapMemory(Entity* ePtr1, Entity* ePtr2)
{
static const int size = sizeof(Entity);
char swapBuffer[size];
memcpy(swapBuffer, ePtr1, size);
memcpy(ePtr1, ePtr2, size);
memcpy(ePtr2, swapBuffer, size);
}
and never had any issues.
I understand that it is trivial to abuse std::memcpy with non-TriviallyCopyable objects and cause undefined behavior downstream. However, my question:
Why would the behavior of std::memcpy itself be undefined when used with non-TriviallyCopyable objects? Why does the standard deem it necessary to specify that?
UPDATE
The contents of http://en.cppreference.com/w/cpp/string/byte/memcpy have been modified in response to this post and the answers to the post. The current description says:
If the objects are not TriviallyCopyable (e.g. scalars, arrays, C-compatible structs), the behavior is undefined unless the program does not depend on the effects of the destructor of the target object (which is not run by memcpy) and the lifetime of the target object (which is ended, but not started by memcpy) is started by some other means, such as placement-new.
PS
Comment by #Cubbi:
#RSahu if something guarantees UB downstream, it renders the entire program undefined. But I agree that it appears to be possible to skirt around UB in this case and modified cppreference accordingly.
Why would the behavior of std::memcpy itself be undefined when used with non-TriviallyCopyable objects?
It's not! However, once you copy the underlying bytes of one object of a non-trivially copyable type into another object of that type, the target object is not alive. We destroyed it by reusing its storage, and haven't revitalized it by a constructor call.
Using the target object - calling its member functions, accessing its data members - is clearly undefined[basic.life]/6, and so is a subsequent, implicit destructor call[basic.life]/4 for target objects having automatic storage duration. Note how undefined behavior is retrospective. [intro.execution]/5:
However, if any such execution contains an undefined operation, this
International Standard places no requirement on the implementation
executing that program with that input (not even with regard to
operations preceding the first undefined operation).
If an implementation spots how an object is dead and necessarily subject to further operations that are undefined, ... it may react by altering your programs semantics. From the memcpy call onward. And this consideration gets very practical once we think of optimizers and certain assumptions that they make.
It should be noted that standard libraries are able and allowed to optimize certain standard library algorithms for trivially copyable types, though. std::copy on pointers to trivially copyable types usually calls memcpy on the underlying bytes. So does swap.
So simply stick to using normal generic algorithms and let the compiler do any appropriate low-level optimizations - this is partly what the idea of a trivially copyable type was invented for in the first place: Determining the legality of certain optimizations. Also, this avoids hurting your brain by having to worry about contradictory and underspecified parts of the language.
It is easy enough to construct a class where that memcpy-based swap breaks:
struct X {
int x;
int* px; // invariant: always points to x
X() : x(), px(&x) {}
X(X const& b) : x(b.x), px(&x) {}
X& operator=(X const& b) { x = b.x; return *this; }
};
memcpying such object breaks that invariant.
GNU C++11 std::string does exactly that with short strings.
This is similar to how the standard file and string streams are implemented. The streams eventually derive from std::basic_ios which contains a pointer to std::basic_streambuf. The streams also contain the specific buffer as a member (or base class sub-object), to which that pointer in std::basic_ios points to.
Because the standard says so.
Compilers may assume that non-TriviallyCopyable types are only copied via their copy/move constructors/assignment operators. This could be for optimization purposes (if some data is private, it could defer setting it until a copy / move occurs).
The compiler is even free to take your memcpy call and have it do nothing, or format your hard drive. Why? Because the standard says so. And doing nothing is definitely faster than moving bits around, so why not optimize your memcpy to an equally-valid faster program?
Now, in practice, there are many problems that can occur when you just blit around bits in types that don't expect it. Virtual function tables might not be set up right. Instrumentation used to detect leaks may not be set up right. Objects whose identity includes their location get completely messed up by your code.
The really funny part is that using std::swap; swap(*ePtr1, *ePtr2); should be able to be compiled down to a memcpy for trivially copyable types by the compiler, and for other types be defined behavior. If the compiler can prove that copy is just bits being copied, it is free to change it to memcpy. And if you can write a more optimal swap, you can do so in the namespace of the object in question.
C++ does not guarantee for all types that their objects occupy contiguous bytes of storage [intro.object]/5
An object of trivially copyable or standard-layout type (3.9) shall
occupy contiguous bytes of storage.
And indeed, through virtual base classes, you can create non-contiguous objects in major implementations. I have tried to build an example where a base class subobject of an object x is located before x's starting address. To visualize this, consider the following graph/table, where the horizontal axis is address space, and the vertical axis is the level of inheritance (level 1 inherits from level 0). Fields marked by dm are occupied by direct data members of the class.
L | 00 08 16
--+---------
1 | dm
0 | dm
This is a usual memory layout when using inheritance. However, the location of a virtual base class subobject is not fixed, since it can be relocated by child classes that also inherit from the same base class virtually. This can lead to the situation that the level 1 (base class sub)object reports that it begins at address 8 and is 16 bytes large. If we naively add those two numbers, we'd think it occupies the address space [8, 24) even though it actually occupies [0, 16).
If we can create such a level 1 object, then we cannot use memcpy to copy it: memcpy would access memory that does not belong to this object (addresses 16 to 24). In my demo, is caught as a stack-buffer-overflow by clang++'s address sanitizer.
How to construct such an object? By using multiple virtual inheritance, I came up with an object that has the following memory layout (virtual table pointers are marked as vp). It is composed through four layers of inheritance:
L 00 08 16 24 32 40 48
3 dm
2 vp dm
1 vp dm
0 dm
The issue described above will arise for the level 1 base class subobject. Its starting address is 32, and it is 24 bytes large (vptr, its own data members and level 0's data members).
Here's the code for such a memory layout under clang++ and g++ # coliru:
struct l0 {
std::int64_t dummy;
};
struct l1 : virtual l0 {
std::int64_t dummy;
};
struct l2 : virtual l0, virtual l1 {
std::int64_t dummy;
};
struct l3 : l2, virtual l1 {
std::int64_t dummy;
};
We can produce a stack-buffer-overflow as follows:
l3 o;
l1& so = o;
l1 t;
std::memcpy(&t, &so, sizeof(t));
Here's a complete demo that also prints some info about the memory layout:
#include <cstdint>
#include <cstring>
#include <iomanip>
#include <iostream>
#define PRINT_LOCATION() \
std::cout << std::setw(22) << __PRETTY_FUNCTION__ \
<< " at offset " << std::setw(2) \
<< (reinterpret_cast<char const*>(this) - addr) \
<< " ; data is at offset " << std::setw(2) \
<< (reinterpret_cast<char const*>(&dummy) - addr) \
<< " ; naively to offset " \
<< (reinterpret_cast<char const*>(this) - addr + sizeof(*this)) \
<< "\n"
struct l0 {
std::int64_t dummy;
void report(char const* addr) { PRINT_LOCATION(); }
};
struct l1 : virtual l0 {
std::int64_t dummy;
void report(char const* addr) { PRINT_LOCATION(); l0::report(addr); }
};
struct l2 : virtual l0, virtual l1 {
std::int64_t dummy;
void report(char const* addr) { PRINT_LOCATION(); l1::report(addr); }
};
struct l3 : l2, virtual l1 {
std::int64_t dummy;
void report(char const* addr) { PRINT_LOCATION(); l2::report(addr); }
};
void print_range(void const* b, std::size_t sz)
{
std::cout << "[" << (void const*)b << ", "
<< (void*)(reinterpret_cast<char const*>(b) + sz) << ")";
}
void my_memcpy(void* dst, void const* src, std::size_t sz)
{
std::cout << "copying from ";
print_range(src, sz);
std::cout << " to ";
print_range(dst, sz);
std::cout << "\n";
}
int main()
{
l3 o{};
o.report(reinterpret_cast<char const*>(&o));
std::cout << "the complete object occupies ";
print_range(&o, sizeof(o));
std::cout << "\n";
l1& so = o;
l1 t;
my_memcpy(&t, &so, sizeof(t));
}
Live demo
Sample output (abbreviated to avoid vertical scrolling):
l3::report at offset 0 ; data is at offset 16 ; naively to offset 48
l2::report at offset 0 ; data is at offset 8 ; naively to offset 40
l1::report at offset 32 ; data is at offset 40 ; naively to offset 56
l0::report at offset 24 ; data is at offset 24 ; naively to offset 32
the complete object occupies [0x9f0, 0xa20)
copying from [0xa10, 0xa28) to [0xa20, 0xa38)
Note the two emphasized end offsets.
Many of these answers mention that memcpy could break invariants in the class, which would cause undefined behaviour later (and which in most cases should be reason enough not to risk it), but that doesn't seem to be what you're really asking.
One reason for why the memcpy call itself is deemed to be undefined behaviour is to give as much room as possible to the compiler to make optimizations based on the target platform. By having the call itself be UB, the compiler is allowed to do weird, platform-dependent things.
Consider this (very contrived and hypothetical) example: For a particular hardware platform, there might be several different kinds of memory, with some being faster than others for different operations. There might, for instance, be a kind of special memory that allows extra fast memory copies. A compiler for this (imaginary) platform is therefore allowed to place all TriviallyCopyable types in this special memory, and implement memcpy to use special hardware instructions that only work on this memory.
If you were to use memcpy on non-TriviallyCopyable objects on this platform, there might be some low-level INVALID OPCODE crash in the memcpy call itself.
Not the most convincing of arguments, perhaps, but the point is that the standard doesn't forbid it, which is only possible through making the memcpy call UB.
memcpy will copy all the bytes, or in your case swap all the bytes, just fine. An overzealous compiler could take the "undefined behaviour" as an excuse to to all kinds of mischief, but most compilers won't do that. Still, it is possible.
However, after these bytes are copied, the object that you copied them to may not be a valid object anymore. Simple case is a string implementation where large strings allocate memory, but small strings just use a part of the string object to hold characters, and keep a pointer to that. The pointer will obviously point to the other object, so things will be wrong. Another example I have seen was a class with data that was used in very few instances only, so that data was kept in a database with the address of the object as a key.
Now if your instances contain a mutex for example, I would think that moving that around could be a major problem.
Another reason that memcpy is UB (apart from what has been mentioned in the other answers - it might break invariants later on) is that it is very hard for the standard to say exactly what would happen.
For non-trivial types, the standard says very little about how the object is laid out in memory, in which order the members are placed, where the vtable pointer is, what the padding should be, etc. The compiler has huge amounts of freedom in deciding this.
As a result, even if the standard wanted to allow memcpy in these "safe" situations, it would be impossible to state what situations are safe and which aren't, or when exactly the real UB would be triggered for unsafe cases.
I suppose that you could argue that the effects should be implementation-defined or unspecified, but I'd personally feel that would be both digging a bit too deep into platform specifics and giving a little bit too much legitimacy to something that in the general case is rather unsafe.
First, note that it is unquestionable that all memory for mutable C/C++ objects has to be un-typed and un-specialized, usable for any mutable object. (I guess the memory for global const variables could hypothetically be typed; there is just no point in such complication for such a tiny corner case.) Unlike Java, C++ has no typed allocation of a dynamic object: new Class(args) in Java is a typed object creation: creation of an object of a well defined type, that might live in typed memory. On the other hand, the C++ expression new Class(args) is just a thin typing wrapper around type-less memory allocation, equivalent to new (operator new(sizeof(Class))) Class(args): the object is created in "neutral memory". Changing that would mean changing a very big part of C++.
Forbidding the bit copy operation (whether done by memcpy or an equivalent user-defined byte-by-byte copy) on some type gives a lot of freedom to the implementation for polymorphic classes (those with virtual functions), and other so-called "virtual classes" (not a standard term), that is, the classes that use the virtual keyword.
The implementation of polymorphic classes could use a global associative map of addresses, associating the address of a polymorphic object with its virtual functions. I believe that was an option seriously considered during the design of the first iterations of the C++ language (or even "C with classes"). That map of polymorphic objects might use special CPU features and special associative memory (such features aren't exposed to the C++ user).
Of course we know that all practical implementations of virtual functions use vtables (a constant record describing all dynamic aspects of a class) and put a vptr (vtable pointer) in each polymorphic base class subobject, as that approach is extremely simple to implement (at least for the simplest cases) and very efficient. There is no global registry of polymorphic objects in any real-world implementation, except possibly in debug mode (I don't know of such a debug mode).
The C++ standard made the lack of a global registry somewhat official by saying that you can skip the destructor call when you reuse the memory of an object, as long as you don't depend on the "side effects" of that destructor call. (I believe that means that the "side effects" are user-created, that is, the body of the destructor, not implementation-created, as done automatically by the implementation.)
Because in practice, in all implementations, the compiler just uses vptr (pointer-to-vtable) hidden members, and these hidden members will be copied properly by memcpy, as if you did a plain member-wise copy of the C struct representing the polymorphic class (with all its hidden members). Bit-wise copies, or complete member-wise copies of that C struct (the complete C struct includes the hidden members), will behave exactly like a constructor call (as done by placement new), so all you have to do is let the compiler think you might have called placement new. If you do a strongly external function call (a call to a function that cannot be inlined and whose implementation cannot be examined by the compiler, like a call to a function defined in a dynamically loaded code unit, or a system call), then the compiler will just assume that such constructors could have been called by the code it cannot examine. Thus the behavior of memcpy here is defined not by the language standard, but by the compiler ABI (Application Binary Interface). The behavior of a strongly external function call is defined by the ABI, not just by the language standard. A call to a potentially inlinable function is defined by the language, as its definition can be seen (either during compilation or during link-time global optimization).
So in practice, given appropriate "compiler fences" (such as a call to an external function, or just asm("")), you can memcpy classes that only use virtual functions.
Of course, you have to be allowed by the language semantic to do such placement new when you do a memcpy: you cannot willy-nilly redefine the dynamic type of an existing object and pretend you have not simply wrecked the old object. If you have a non const global, static, automatic, member subobject, array subobject, you can overwrite it and put another, unrelated object there; but if the dynamic type is different, you cannot pretend that it's still the same object or subobject:
struct A { virtual void f(); };
struct B : A { };

void test() {
    A a;
    if (sizeof(A) != sizeof(B)) return;
    new (&a) B; // OK (assuming alignment is OK)
    a.f();      // undefined
}
The change of the polymorphic type of an existing object is simply not allowed: the new object has no relation with a except for the region of memory: the contiguous bytes starting at &a. They have different types.
[The standard is strongly divided on whether *&a can be used (in typical flat memory machines) or (A&)(char&)a (in any case) to refer to the new object. Compiler writers are not divided: you should not do it. This is a deep defect in C++, perhaps the deepest and most troubling.]
But you cannot, in portable code, perform a bitwise copy of classes that use virtual inheritance, as some implementations implement those classes with pointers to the virtual base subobjects: these pointers, properly initialized by the constructor of the most derived object, would have their values copied by memcpy (like a plain member-wise copy of the C struct representing the class with all its hidden members) and wouldn't point to the subobjects of the derived object!
Other ABIs use address offsets to locate these base subobjects; these depend only on the type of the most derived object, like final overriders and typeid, and thus can be stored in the vtable. On those implementations, memcpy will work as guaranteed by the ABI (with the above limitation on changing the type of an existing object).
In either case, it is entirely an object representation issue, that is, an ABI issue.
OK, let's try your code with a little example:
#include <iostream>
#include <string>
#include <string.h>

void swapMemory(std::string* ePtr1, std::string* ePtr2) {
    static const int size = sizeof(*ePtr1);
    char swapBuffer[size];
    memcpy(swapBuffer, ePtr1, size);
    memcpy(ePtr1, ePtr2, size);
    memcpy(ePtr2, swapBuffer, size);
}

int main() {
    std::string foo = "foo", bar = "bar";
    std::cout << "foo = " << foo << ", bar = " << bar << std::endl;
    swapMemory(&foo, &bar);
    std::cout << "foo = " << foo << ", bar = " << bar << std::endl;
    return 0;
}
On my machine, this prints the following before crashing:
foo = foo, bar = bar
foo = foo, bar = bar
Weird, eh? The swap does not seem to be performed at all. Well, the memory was swapped, but std::string uses the small-string-optimization on my machine: It stores short strings within a buffer that's part of the std::string object itself, and just points its internal data pointer at that buffer.
When swapMemory() swaps the bytes, it swaps both the pointers and the buffers. So, the pointer in the foo object now points at the storage in the bar object, which now contains the string "foo". Two levels of swap make no swap.
When std::string's destructor subsequently tries to clean up, more evil happens: The data pointer does not point at the std::string's own internal buffer anymore, so the destructor deduces that that memory must have been allocated on the heap, and tries to delete it. The result on my machine is a simple crash of the program, but the C++ standard would not care if pink elephants were to appear. The behavior is totally undefined.
And that is the fundamental reason why you should not be using memcpy() on non-trivially copyable objects: You do not know whether the object contains pointers/references to its own data members, or depends on its own location in memory in any other way. If you memcpy() such an object, the basic assumption that the object cannot move around in memory is violated, and some classes like std::string do rely on this assumption. The C++ standard draws the line at the distinction between (non-)trivially copyable objects to avoid going into more, unnecessary detail about pointers and references. It only makes an exception for trivially copyable objects and says: Well, in this case you are safe. But do not blame me on the consequences should you try to memcpy() any other objects.
What I can perceive here is that -- for some practical applications -- the C++ Standard may be too restrictive, or rather, not permissive enough.
As shown in other answers, memcpy breaks down quickly for "complicated" types, but IMHO it actually should work for Standard Layout Types, as long as the memcpy doesn't break what the defined copy operations and destructor of the Standard Layout type do. (Note that even a TC class is allowed to have a non-trivial constructor.) The standard only explicitly calls out TC types with respect to this, however.
A recent draft quote (N3797):
3.9 Types
...
2 For any object (other than a base-class subobject) of trivially
copyable type T, whether or not the object holds a valid value of type
T, the underlying bytes (1.7) making up the object can be copied into
an array of char or unsigned char. If the content of the array of char
or unsigned char is copied back into the object, the object shall
subsequently hold its original value. [ Example:
#define N sizeof(T)
char buf[N];
T obj; // obj initialized to its original value
std::memcpy(buf, &obj, N);  // between these two calls to std::memcpy,
                            // obj might be modified
std::memcpy(&obj, buf, N);  // at this point, each subobject of obj of
                            // scalar type holds its original value
—end example ]
3 For any trivially copyable type T, if two pointers to T point to
distinct T objects obj1 and obj2, where neither obj1 nor obj2 is a
base-class subobject, if the underlying bytes (1.7) making up obj1 are
copied into obj2, obj2 shall subsequently hold the same value as obj1.
[ Example:
T* t1p;
T* t2p;
// provided that t2p points to an initialized object ...
std::memcpy(t1p, t2p, sizeof(T));
// at this point, every subobject of trivially copyable type in *t1p contains
// the same value as the corresponding subobject in *t2p
—end example ]
The standard here talks about trivially copyable types, but as was observed by @dyp above, there are also standard-layout types that do not, as far as I can see, necessarily overlap with trivially copyable types.
The standard says:
1.8 The C++ object model
(...)
5 (...) An object of trivially copyable or standard-layout type (3.9) shall occupy contiguous bytes of storage.
So what I see here is that:
The standard says nothing about non Trivially Copyable types wrt. memcpy. (as already mentioned several times here)
The standard has a separate concept for Standard Layout types that occupy contiguous storage.
The standard does not explicitly allow nor disallow using memcpy on objects of Standard Layout that are not Trivially Copyable.
So it does not seem to be explicitly called out as UB, but it certainly also isn't what is referred to as unspecified behavior, so one could conclude what @underscore_d did in the comment to the accepted answer:
(...) You can't just say "well, it
wasn't explicitly called out as UB, therefore it's defined
behaviour!", which is what this thread seems to amount to. N3797 3.9
points 2~3 do not define what memcpy does for non-trivially-copyable
objects, so (...) [t]hat's pretty much functionally
equivalent to UB in my eyes as both are useless for writing reliable, i.e. portable code
I personally would conclude that it amounts to UB as far as portability goes (oh, those optimizers), but I think that with some hedging and knowledge of the concrete implementation, one can get away with it. (Just make sure it's worth the trouble.)
Side Note: I also think that the standard really should explicitly incorporate Standard Layout type semantics into the whole memcpy mess, because it's a valid and useful usecase to do bitwise copy of non Trivially Copyable objects, but that's beside the point here.
Link: Can I use memcpy to write to multiple adjacent Standard Layout sub-objects?

Struct inheritance vs class inheritance in C++

I just discovered from this Q/A that structs are inheritable in C++ but, is it a good practice, or is it preferable to use classes? In which cases is preferable and in which ones is not?
I have never needed this, but now I have a bunch of messages of different types, but the same length. I have them in binary in a char array, and I just copy them with memcpy to the struct to fill its fields (I don't know if it is even possible to do it with std::copy).
I guess it would be great to be able to inherit every struct from a base struct with common headers, that is why I searched for this. So a second question would be: if I do this with classes, is it possible to do a memcpy (or std:copy) from a buffer to a class?
Whether you can use a bitwise copy or not has nothing to do with the struct or class tag; it only depends on whether said struct or class is trivially copyable. That property is defined in the Standard (9/6 [class]) and it basically boils down to not declaring any special member functions other than constructors.
The bitwise copy is then allowed by the Standard in 3.9/2 [basic.types]
For any object (other than a base-class subobject) of trivially copyable type T, whether or not the object holds a valid value of type T, the underlying bytes (1.7) making up the object can be copied into an array of char or unsigned char. If the content of the array of char or unsigned char is copied back into the object, the object shall subsequently hold its original value. [ Example:
#define N sizeof(T)
char buf[N];
T obj; // obj initialized to its original value
std::memcpy(buf, &obj, N); // between these two calls to std::memcpy,
// `obj` might be modified
std::memcpy(&obj, buf, N); // at this point, each subobject of `obj`
// of scalar type holds its original value
—end example ]
Note: a bitwise copy of padding bytes will lead to reports in Valgrind.
Using std::copy to the same effect:
char const* b = reinterpret_cast<char const*>(&obj);
std::copy(b, b + N, buf);
The only difference between struct and class is the default access: members and base classes are public by default in a struct and private by default in a class (until stated otherwise). Besides that, struct and class are identical in C++.
Sometimes structs are preferred over classes for POD (plain old data) types for readability, but that's really up to a coding convention.

Empty Data Member Optimization: would it be possible?

In C++, most of the optimizations are derived from the as-if rule. That is, as long as the program behaves as-if no optimization had taken place, then they are valid.
The Empty Base Optimization is one such trick: in some conditions, if the base class is empty (does not have any non-static data member), then the compiler may elide its memory representation.
Apparently the standard forbids this optimization on data members; that is, even if a data member is empty, it must still take up at least one byte of space: from n3225, [class]
4 - Complete objects and member subobjects of class type shall have nonzero size.
Note: this leads to the use of private inheritance for Policy Design in order to have EBO kick in when appropriate
I was wondering if, using the as-if rule, one could still be able to perform this optimization.
edit: following a number of answers and comments, and to make it clearer what I am wondering about.
First, let me give an example:
struct Empty {};
struct Foo { Empty e; int i; };
My question is, why is sizeof(Foo) != sizeof(int)? In particular, unless you specify some packing, chances are that due to alignment issues Foo will be twice the size of int, which seems ridiculously inflated.
Note: my question is not why is sizeof(Foo) != 0, this is not actually required by EBO either
According to C++, it is because no subobject may have zero size. However, a base is authorized to have zero size (EBO), therefore:
struct Bar: Empty { int i; };
is likely (thanks to EBO) to obey sizeof(Bar) == sizeof(int).
Steve Jessop seems to be of the opinion that it is so that no two subobjects would have the same address. I thought about it; however, it doesn't actually prevent the optimization in most cases:
If you have "unused" memory, then it is trivial:
struct UnusedPadding { Empty e; Empty f; double d; int i; };
// chances are that the layout will leave some memory after int
But in fact, it's even "worse" than that, because the Empty space is never written to (you'd better not write to it if EBO kicks in...), and therefore you could actually place it at an occupied location that is not the address of another object:
struct Virtual { virtual ~Virtual() {} Empty e; Empty f; int i; };
// most compilers will reserve some space for a virtual pointer!
Or, even in our original case:
struct Foo { Empty e; int i; }; // deja vu!
One could have (char*)&foo.e == (char*)&foo.i + 1 if all we wanted were different addresses.
It is coming to C++20 with the [[no_unique_address]] attribute.
The proposal P0840r2 has been accepted into the draft standard. It has this example:
template<typename Key, typename Value, typename Hash, typename Pred, typename Allocator>
class hash_map {
    [[no_unique_address]] Hash hasher;
    [[no_unique_address]] Pred pred;
    [[no_unique_address]] Allocator alloc;
    Bucket *buckets;
    // ...
public:
    // ...
};
Under the as-if rule:
struct A {
    EmptyThing x;
    int y;
};

A a;
assert((void*)&(a.x) != (void*)&(a.y));
The assert must not be triggered. So I don't see any benefit in secretly making x have size 0, when you'd just need to add padding to the structure anyway.
I suppose in theory a compiler could track whether pointers might be taken to the members, and make the optimization only if they definitely aren't. This would have limited use, since there'd be two different versions of the struct with different layouts: one for the optimized case and one for general code.
But for example if you create an instance of A on the stack, and do something with it that is entirely inlined (or otherwise visible to the optimizer), yes, parts of the struct could be completely omitted. This isn't specific to empty objects, though - an empty object is just a special case of an object whose storage isn't accessed, and therefore could in some situations never be allocated at all.
C++ for technical reasons mandates that empty classes should have non-zero size.
This is to enforce that distinct objects have distinct memory addresses. So compilers silently insert a byte into "empty" objects.
This constraint does not apply to base class parts of derived classes as they are not free-standing.
Because Empty is a POD-type, you can use memcpy to overwrite its "representation", so it better not share it with another C++ object or useful data.
Given struct Empty { };, consider what happens if sizeof(Empty) == 0. Generic code that allocates heap for Empty objects could easily behave differently, as - for example - a realloc(p, n * sizeof(T)), where T is Empty, is then equivalent to free(p). And if sizeof(Empty) stayed non-zero while the objects themselves were optimised to occupy no storage, things like memset/memcpy would operate on memory regions that weren't actually in use by the Empty objects. So, the compiler would need to stitch up things like sizeof(Empty) on the basis of the eventual usage of the value - that sounds close to impossible to me.
Separately, under current C++ rules the assurance that each member has a distinct address means you can use those addresses to encode some state about those fields - e.g. a textual field name, whether some member function of the field object should be visited etc.. If addresses suddenly coincide, any existing code reliant on these keys could break.