Empty Data Member Optimization: would it be possible?

Empty Data Member Optimization: would it be possible? - c++

In C++, most of the optimizations are derived from the as-if rule. That is, as long as the program behaves as-if no optimization had taken place, then they are valid.
The Empty Base Optimization is one such trick: in some conditions, if the base class is empty (does not have any non-static data member), then the compiler may elide its memory representation.
Apparently it seems that the standard forbids this optimization on data members, that is even if a data member is empty, it must still take at least one byte worth of place: from n3225, [class]
4 - Complete objects and member subobjects of class type shall have nonzero size.
Note: this leads to the use of private inheritance for Policy Design in order to have EBO kick in when appropriate
I was wondering if, using the as-if rule, one could still be able to perform this optimization.
edit: following a number of answers and comments, and to make it clearer what I am wondering about.
First, let me give an example:
struct Empty {};
struct Foo { Empty e; int i; };
My question is, why is sizeof(Foo) != sizeof(int) ? In particular, unless you specify some packing, chances are due to alignment issues that Foo will be twice the size of int, which seems ridiculously inflated.
Note: my question is not why is sizeof(Foo) != 0, this is not actually required by EBO either
According to C++, it is because no sub-object may have a zero size. However a base is authorized to have a zero size (EBO) therefore:
struct Bar: Empty { int i; };
is likely (thanks to EBO) to obey sizeof(Bar) == sizeof(int).
Steve Jessop seems to be of an opinion that it is so that no two sub-objects would have the same address. I thought about it, however it doesn't actually prevent the optimization in most cases:
If you have "unused" memory, then it is trivial:
struct UnusedPadding { Empty e; Empty f; double d; int i; };
// chances are that the layout will leave some memory after int
But in fact, it's even "worse" than that, because Empty space is never written to (you'd better not if EBO kicks in...) and therefore you could actually place it at an occupied place that is not the address of another object:
struct Virtual { virtual ~Virtual() {} Empty e; Empty f; int i; };
// most compilers will reserve some space for a virtual pointer!
Or, even in our original case:
struct Foo { Empty e; int i; }; // deja vu!
One could have (char*)foo.e == (char*)foo.i + 1 if all we wanted were different address.

It is coming to c++20 with the [[no_unique_address]] attribute.
The proposal P0840r2 has been accepted into the draft standard. It has this example:
template<typename Key, typename Value, typename Hash, typename Pred, typename Allocator>
class hash_map {
[[no_unique_address]] Hash hasher;
[[no_unique_address]] Pred pred;
[[no_unique_address]] Allocator alloc;
Bucket *buckets;
// ...
public:
// ...
};

Under the as-if rule:
struct A {
EmptyThing x;
int y;
};
A a;
assert((void*)&(a.x) != (void*)&(a.y));
The assert must not be triggered. So I don't see any benefit in secretly making x have size 0, when you'd just need to add padding to the structure anyway.
I suppose in theory a compiler could track whether pointers might be taken to the members, and make the optimization only if they definitely aren't. This would have limited use, since there'd be two different versions of the struct with different layouts: one for the optimized case and one for general code.
But for example if you create an instance of A on the stack, and do something with it that is entirely inlined (or otherwise visible to the optimizer), yes, parts of the struct could be completely omitted. This isn't specific to empty objects, though - an empty object is just a special case of an object whose storage isn't accessed, and therefore could in some situations never be allocated at all.

C++ for technical reasons mandates that empty classes should have non-zero size.
This is to enforce that distinct objects have distinct memory addresses. So compilers silently insert a byte into "empty" objects.
This constraint does not apply to base class parts of derived classes as they are not free-standing.

Because Empty is a POD-type, you can use memcpy to overwrite its "representation", so it better not share it with another C++ object or useful data.

Given struct Empty { }; consider what happens if sizeof(Empty) == 0. Generic code that allocates heap for Empty objects could easily behave differently, as - for example - a realloc(p, n * sizeof(T)), where T is Empty, is then equivalent to free(p). If sizeof(Empty) != 0 then things like memset/memcpy etc. would try to work on memory regions that weren't in use by the Empty objects. So, the compiler would need to stitch up things like sizeof(Empty) on the basis of the eventual usage of the value - that sounds close to impossible to me.
Separately, under current C++ rules the assurance that each member has a distinct address means you can use those addresses to encode some state about those fields - e.g. a textual field name, whether some member function of the field object should be visited etc.. If addresses suddenly coincide, any existing code reliant on these keys could break.

Related

Reset directly the content of a struct

I have implemented a function that resets the content of the structure to which a pointer points:
template <typename Struct>
void initialize(Struct* s)
{
*s = Struct{};
}
I have performance issues when Struct becomes big (above 10K) because Struct is created in the stack and then assigned to *s. I was wondering if I could improve it:
Is it possible to initialize directly *s without the temporary Struct{} object?
Should I instead evaluate the size of Struct and build it in the heap if it is big?
Thank you in advance

Firstly, you should probably use a reference; not a pointer. The point of this is to avoid null indirection bugs.
If the class is trivial and value initialise to zero (which is typically the case for most trivial types), then the optimiser should compile your function into a call to memset, without any need for initialisation of a temporary object. So there should be no reason to worry in that case.
You could call memset explicitly, although that is technically not portable to exotic systems in case the class contains certain types (for example, null pointer does not necessarily have the representation of zero).
Is it possible to initialize directly *s without the temporary Struct{} object?.
Yes, if you're willing to change the requirements of the function. Currently it works for classes that are default constructible and move assignable.
You can avoid creation of a temporary object if you modify the pointed object directly. In following example, there are no temporaries of type Struct created:
constexpr void
initialize(Struct& s)
{
s.member1 = T1{};
s.member2 = T2{};
To make this generic, the operation could be performed in a member function. Thus, you could specify a requirement that the pointed class has a member function with particular name and no parameters:
s.clear();
You can combine both approaches for types which they apply to:
template<class Struct>
constexpr void
initialize(Struct& s)
{
if constexpr (std::is_trivially_copyable_v<Struct>) {
// alternative 1, if you trust your optimiser
s = Struct{};
// alternative 2, if you doubt the quality of the optimiser
// technically may have different meaning on exotic systems
std::memset(&s, 0, sizeof s);
} else {
s.clear();
}
}
If you need this to work with some classes that conforms to neither requirement, then you'll need to specialise the template.
Should I instead evaluate the size of Struct and build it in the heap if it is big [10K]?
You generally should avoid having public classes that large entirely. If you need such large storage, you could wrap it in a type that allocates it dynamically. Something like this:
class Struct
{
private:
struct VeryLarge{/.../};
std::unique_ptr<VeryLarge> storage;
public:
// public interface

what is the new feature in c++20 [[no_unique_address]]?

i have read the new c++20 feature no_unique_address several times and i hope if some one can explain and illustrate with an example better than this example below taken from c++ reference.
Explanation Applies to the name being declared in the declaration of a
non-static data member that's not a bit field.
Indicates that this data member need not have an address distinct from
all other non-static data members of its class. This means that if the
member has an empty type (e.g. stateless Allocator), the compiler may
optimise it to occupy no space, just like if it were an empty base. If
the member is not empty, any tail padding in it may be also reused to
store other data members.
#include <iostream>
struct Empty {}; // empty class
struct X {
int i;
Empty e;
};
struct Y {
int i;
[[no_unique_address]] Empty e;
};
struct Z {
char c;
[[no_unique_address]] Empty e1, e2;
};
struct W {
char c[2];
[[no_unique_address]] Empty e1, e2;
};
int main()
{
// e1 and e2 cannot share the same address because they have the
// same type, even though they are marked with [[no_unique_address]].
// However, either may share address with c.
static_assert(sizeof(Z) >= 2);
// e1 and e2 cannot have the same address, but one of them can share with
// c[0] and the other with c[1]
std::cout << "sizeof(W) == 2 is " << (sizeof(W) == 2) << '\n';
}
can some one explain to me what is the purpose behind this feature and when should i use it?
e1 and e2 cannot have the same address, but one of them can share with c[0] and the other with c[1] can some one explain? why do we have such kind of relation ?

The purpose behind the feature is exactly as stated in your quote: "the compiler may optimise it to occupy no space". This requires two things:
An object which is empty.
An object that wants to have an non-static data member of a type which may be empty.
The first one is pretty simple, and the quote you used even spells it out an important application. Objects of type std::allocator do not actually store anything. It is merely a class-based interface into the global ::new and ::delete memory allocators. Allocators that don't store data of any kind (typically by using a global resource) are commonly called "stateless allocators".
Allocator-aware containers are required to store the value of an allocator that the user provides (which defaults to a default-constructed allocator of that type). That means the container must have a subobject of that type, which is initialized by the allocator value the user provides. And that subobject takes up space... in theory.
Consider std::vector. The common implementation of this type is to use 3 pointers: one for the beginning of the array, one for the end of the useful part of the array, and one for the end of the allocated block for the array. In a 64-bit compilation, these 3 pointers require 24 bytes of storage.
A stateless allocator doesn't actually have any data to store. But in C++, every object has a size of at least 1. So if vector stored an allocator as a member, every vector<T, Alloc> would have to take up at least 32 bytes, even if the allocator stores nothing.
The common workaround to this is to derive vector<T, Alloc> from Alloc itself. The reason being that base class subobject are not required to have a size of 1. If a base class has no members and has no non-empty base classes, then the compiler is permitted to optimize the size of the base class within the derived class to not actually take up space. This is called the "empty base optimization" (and it's required for standard layout types).
So if you provide a stateless allocator, a vector<T, Alloc> implementation that inherits from this allocator type is still just 24 bytes in size.
But there's a problem: you have to inherit from the allocator. And that's really annoying. And dangerous. First, the allocator could be final, which is in fact allowed by the standard. Second, the allocator could have members that interfere with the vector's members. Third, it's an idiom that people have to learn, which makes it folk wisdom among C++ programmers, rather than an obvious tool for any of them to use.
So while inheritance is a solution, it's not a very good one.
This is what [[no_unique_address]] is for. It would allow a container to store the allocator as a member subobject rather than as a base class. If the allocator is empty, then [[no_unique_address]] will allow the compiler to make it take up no space within the class's definition. So such a vector could still be 24 bytes in size.
e1 and e2 cannot have the same address, but one of them can share with c[0] and the other with c1 can some one explain? why do we have such kind of relation ?
C++ has a fundamental rule that its object layout must follow. I call it the "unique identity rule".
For any two objects, at least one of the following must be true:
They must have different types.
They must have different addresses in memory.
They must actually be the same object.
e1 and e2 are not the same object, so #3 is violated. They also share the same type, so #1 is violated. Therefore, they must follow #2: they must not have the same address. In this case, since they are subobjects of the same type, this means that the compiler-defined object layout of this type cannot give them the same offset within the object.
e1 and c[0] are distinct objects, so again #3 fails. But they satisfy #1, since they have different types. Therefore (subject to the rules of [[no_unique_address]]) the compiler could assign them to the same offset within the object. The same goes for e2 and c[1].
If the compiler wants to assign two different members of a class to the same offset within the containing object, then they must be of different types (note that this is recursive through all of each of their subobjects). Therefore, if they have the same type, they must have different addresses.

In order to understand [[no_unique_address]], let's take a look at unique_ptr. It has the following signature:
template<class T, class Deleter = std::default_delete<T>>
class unique_ptr;
In this declaration, Deleter represents a type which provides the operation used to delete a pointer.
We can implement unique_ptr like this:
template<class T, class Deleter>
class unique_ptr {
T* pointer = nullptr;
Deleter deleter;
public:
// Stuff
// ...
// Destructor:
~unique_ptr() {
// deleter must overload operator() so we can call it like a function
// deleter can also be a lambda
deleter(pointer);
}
};
So what's wrong with this implementation? We want unique_ptr to be as light-weight as possible. Ideally, it should be the exact same size as a regular pointer. But because we have the Deleter member, unqiue_ptr will end up being at least 16 bytes: 8 for the pointer, and then 8 additional ones to store the Deleter, even if Deleter is empty.
[[no_unique_address]] solves this issue:
template<class T, class Deleter>
class unique_ptr {
T* pointer = nullptr;
// Now, if Deleter is empty it won't take up any space in the class
[[no_unique_address]] Deleter deleter;
public:
// STuff...

While the other answers explained it pretty well already, let me explain it from a slightly different perspective:
The root of the problem is that C++ does not allow for zero sized objects (i.e. we always have sizeof(obj) > 0).
This is essentially a consequence of very fundamental definitions in the C++ standard: The unique identity rule (as Nicol Bolas explained) but also from the definition of the "object" as a non-empty sequence of bytes.
However this leads to unpleasant issues when writing generic code. This is somewhat expected because here a corner-case (-> empty type) receives a special treatment, that deviates from the systematic behavior of the other cases (-> size increases in a non-systematic way).
The effects are:
Space is wasted, when stateless objects (i.e. classes/structs with no members) are used
Zero length arrays are forbidden.
Since one arrives at these problems very quickly when writing generic code, there have been several attempts for mitigation
The empty base class optimization. This solves 1) for a subset of cases
Introduction of std::array which allows for N==0. This solves 2) but still has issue 1)
The introcduction of [no_unique_address], which finally solves 1) for all remaining cases. At least when the user explicity requests it.
Introduction of std::is_empty. Needed because the obvious sizeof does not work (as sizeof(Empty) >= 1). (Thanks to Dwayne Robinson)
Maybe allowing zero-sized objects would have been the cleaner solution which could have prevented the fragmentation *). However when you search for zero-sized object on SO you will find questions with different answers (sometimes not convincing) and quickly notice that this is a disputed topic.
Allowing zero-sized objects would require a change at the heart of the C++ language and given the fact that the C++ language is very complex already, the standard comittee likely decided for the minimal invasive route and just introduced a new attribute.
Together with the other mitigations from above it finally solves all issues due to disallowal of zero-sized objects. Even though it is maybe not be the nicest solution from a fundamental point of view, it is effective.
*) To me the unique-identity-rule for zero sized types does not make much sense anyway. Why should we want objects, which are stateless per programmers choice (i.e. have no non-static data members), to have an unique address in the first place? The address is some kind of (immutable) state of an object and if the programmer wanted a state they could just add a nonstatic data member.

Why would the behavior of std::memcpy be undefined for objects that are not TriviallyCopyable?

From http://en.cppreference.com/w/cpp/string/byte/memcpy:
If the objects are not TriviallyCopyable (e.g. scalars, arrays, C-compatible structs), the behavior is undefined.
At my work, we have used std::memcpy for a long time to bitwise swap objects that are not TriviallyCopyable using:
void swapMemory(Entity* ePtr1, Entity* ePtr2)
{
static const int size = sizeof(Entity);
char swapBuffer[size];
memcpy(swapBuffer, ePtr1, size);
memcpy(ePtr1, ePtr2, size);
memcpy(ePtr2, swapBuffer, size);
}
and never had any issues.
I understand that it is trivial to abuse std::memcpy with non-TriviallyCopyable objects and cause undefined behavior downstream. However, my question:
Why would the behavior of std::memcpy itself be undefined when used with non-TriviallyCopyable objects? Why does the standard deem it necessary to specify that?
UPDATE
The contents of http://en.cppreference.com/w/cpp/string/byte/memcpy have been modified in response to this post and the answers to the post. The current description says:
If the objects are not TriviallyCopyable (e.g. scalars, arrays, C-compatible structs), the behavior is undefined unless the program does not depend on the effects of the destructor of the target object (which is not run by memcpy) and the lifetime of the target object (which is ended, but not started by memcpy) is started by some other means, such as placement-new.
PS
Comment by #Cubbi:
#RSahu if something guarantees UB downstream, it renders the entire program undefined. But I agree that it appears to be possible to skirt around UB in this case and modified cppreference accordingly.

Why would the behavior of std::memcpy itself be undefined when used with non-TriviallyCopyable objects?
It's not! However, once you copy the underlying bytes of one object of a non-trivially copyable type into another object of that type, the target object is not alive. We destroyed it by reusing its storage, and haven't revitalized it by a constructor call.
Using the target object - calling its member functions, accessing its data members - is clearly undefined[basic.life]/6, and so is a subsequent, implicit destructor call[basic.life]/4 for target objects having automatic storage duration. Note how undefined behavior is retrospective. [intro.execution]/5:
However, if any such execution contains an undefined operation, this
International Standard places no requirement on the implementation
executing that program with that input (not even with regard to
operations preceding the first undefined operation).
If an implementation spots how an object is dead and necessarily subject to further operations that are undefined, ... it may react by altering your programs semantics. From the memcpy call onward. And this consideration gets very practical once we think of optimizers and certain assumptions that they make.
It should be noted that standard libraries are able and allowed to optimize certain standard library algorithms for trivially copyable types, though. std::copy on pointers to trivially copyable types usually calls memcpy on the underlying bytes. So does swap.
So simply stick to using normal generic algorithms and let the compiler do any appropriate low-level optimizations - this is partly what the idea of a trivially copyable type was invented for in the first place: Determining the legality of certain optimizations. Also, this avoids hurting your brain by having to worry about contradictory and underspecified parts of the language.

It is easy enough to construct a class where that memcpy-based swap breaks:
struct X {
int x;
int* px; // invariant: always points to x
X() : x(), px(&x) {}
X(X const& b) : x(b.x), px(&x) {}
X& operator=(X const& b) { x = b.x; return *this; }
};
memcpying such object breaks that invariant.
GNU C++11 std::string does exactly that with short strings.
This is similar to how the standard file and string streams are implemented. The streams eventually derive from std::basic_ios which contains a pointer to std::basic_streambuf. The streams also contain the specific buffer as a member (or base class sub-object), to which that pointer in std::basic_ios points to.

Because the standard says so.
Compilers may assume that non-TriviallyCopyable types are only copied via their copy/move constructors/assignment operators. This could be for optimization purposes (if some data is private, it could defer setting it until a copy / move occurs).
The compiler is even free to take your memcpy call and have it do nothing, or format your hard drive. Why? Because the standard says so. And doing nothing is definitely faster than moving bits around, so why not optimize your memcpy to an equally-valid faster program?
Now, in practice, there are many problems that can occur when you just blit around bits in types that don't expect it. Virtual function tables might not be set up right. Instrumentation used to detect leaks may not be set up right. Objects whose identity includes their location get completely messed up by your code.
The really funny part is that using std::swap; swap(*ePtr1, *ePtr2); should be able to be compiled down to a memcpy for trivially copyable types by the compiler, and for other types be defined behavior. If the compiler can prove that copy is just bits being copied, it is free to change it to memcpy. And if you can write a more optimal swap, you can do so in the namespace of the object in question.

C++ does not guarantee for all types that their objects occupy contiguous bytes of storage [intro.object]/5
An object of trivially copyable or standard-layout type (3.9) shall
occupy contiguous bytes of storage.
And indeed, through virtual base classes, you can create non-contiguous objects in major implementations. I have tried to build an example where a base class subobject of an object x is located before x's starting address. To visualize this, consider the following graph/table, where the horizontal axis is address space, and the vertical axis is the level of inheritance (level 1 inherits from level 0). Fields marked by dm are occupied by direct data members of the class.
L | 00 08 16
--+---------
1 | dm
0 | dm
This is a usual memory layout when using inheritance. However, the location of a virtual base class subobject is not fixed, since it can be relocated by child classes that also inherit from the same base class virtually. This can lead to the situation that the level 1 (base class sub)object reports that it begins at address 8 and is 16 bytes large. If we naively add those two numbers, we'd think it occupies the address space [8, 24) even though it actually occupies [0, 16).
If we can create such a level 1 object, then we cannot use memcpy to copy it: memcpy would access memory that does not belong to this object (addresses 16 to 24). In my demo, is caught as a stack-buffer-overflow by clang++'s address sanitizer.
How to construct such an object? By using multiple virtual inheritance, I came up with an object that has the following memory layout (virtual table pointers are marked as vp). It is composed through four layers of inheritance:
L 00 08 16 24 32 40 48
3 dm
2 vp dm
1 vp dm
0 dm
The issue described above will arise for the level 1 base class subobject. Its starting address is 32, and it is 24 bytes large (vptr, its own data members and level 0's data members).
Here's the code for such a memory layout under clang++ and g++ # coliru:
struct l0 {
std::int64_t dummy;
};
struct l1 : virtual l0 {
std::int64_t dummy;
};
struct l2 : virtual l0, virtual l1 {
std::int64_t dummy;
};
struct l3 : l2, virtual l1 {
std::int64_t dummy;
};
We can produce a stack-buffer-overflow as follows:
l3 o;
l1& so = o;
l1 t;
std::memcpy(&t, &so, sizeof(t));
Here's a complete demo that also prints some info about the memory layout:
#include <cstdint>
#include <cstring>
#include <iomanip>
#include <iostream>
#define PRINT_LOCATION() \
std::cout << std::setw(22) << __PRETTY_FUNCTION__ \
<< " at offset " << std::setw(2) \
<< (reinterpret_cast<char const*>(this) - addr) \
<< " ; data is at offset " << std::setw(2) \
<< (reinterpret_cast<char const*>(&dummy) - addr) \
<< " ; naively to offset " \
<< (reinterpret_cast<char const*>(this) - addr + sizeof(*this)) \
<< "\n"
struct l0 {
std::int64_t dummy;
void report(char const* addr) { PRINT_LOCATION(); }
};
struct l1 : virtual l0 {
std::int64_t dummy;
void report(char const* addr) { PRINT_LOCATION(); l0::report(addr); }
};
struct l2 : virtual l0, virtual l1 {
std::int64_t dummy;
void report(char const* addr) { PRINT_LOCATION(); l1::report(addr); }
};
struct l3 : l2, virtual l1 {
std::int64_t dummy;
void report(char const* addr) { PRINT_LOCATION(); l2::report(addr); }
};
void print_range(void const* b, std::size_t sz)
{
std::cout << "[" << (void const*)b << ", "
<< (void*)(reinterpret_cast<char const*>(b) + sz) << ")";
}
void my_memcpy(void* dst, void const* src, std::size_t sz)
{
std::cout << "copying from ";
print_range(src, sz);
std::cout << " to ";
print_range(dst, sz);
std::cout << "\n";
}
int main()
{
l3 o{};
o.report(reinterpret_cast<char const*>(&o));
std::cout << "the complete object occupies ";
print_range(&o, sizeof(o));
std::cout << "\n";
l1& so = o;
l1 t;
my_memcpy(&t, &so, sizeof(t));
}
Live demo
Sample output (abbreviated to avoid vertical scrolling):
l3::report at offset 0 ; data is at offset 16 ; naively to offset 48
l2::report at offset 0 ; data is at offset 8 ; naively to offset 40
l1::report at offset 32 ; data is at offset 40 ; naively to offset 56
l0::report at offset 24 ; data is at offset 24 ; naively to offset 32
the complete object occupies [0x9f0, 0xa20)
copying from [0xa10, 0xa28) to [0xa20, 0xa38)
Note the two emphasized end offsets.

Many of these answers mention that memcpy could break invariants in the class, which would cause undefined behaviour later (and which in most cases should be reason enough not to risk it), but that doesn't seem to be what you're really asking.
One reason for why the memcpy call itself is deemed to be undefined behaviour is to give as much room as possible to the compiler to make optimizations based on the target platform. By having the call itself be UB, the compiler is allowed to do weird, platform-dependent things.
Consider this (very contrived and hypothetical) example: For a particular hardware platform, there might be several different kinds of memory, with some being faster than others for different operations. There might, for instance, be a kind of special memory that allows extra fast memory copies. A compiler for this (imaginary) platform is therefore allowed to place all TriviallyCopyable types in this special memory, and implement memcpy to use special hardware instructions that only work on this memory.
If you were to use memcpy on non-TriviallyCopyable objects on this platform, there might be some low-level INVALID OPCODE crash in the memcpy call itself.
Not the most convincing of arguments, perhaps, but the point is that the standard doesn't forbid it, which is only possible through making the memcpy call UB.

memcpy will copy all the bytes, or in your case swap all the bytes, just fine. An overzealous compiler could take the "undefined behaviour" as an excuse to to all kinds of mischief, but most compilers won't do that. Still, it is possible.
However, after these bytes are copied, the object that you copied them to may not be a valid object anymore. Simple case is a string implementation where large strings allocate memory, but small strings just use a part of the string object to hold characters, and keep a pointer to that. The pointer will obviously point to the other object, so things will be wrong. Another example I have seen was a class with data that was used in very few instances only, so that data was kept in a database with the address of the object as a key.
Now if your instances contain a mutex for example, I would think that moving that around could be a major problem.

Another reason that memcpy is UB (apart from what has been mentioned in the other answers - it might break invariants later on) is that it is very hard for the standard to say exactly what would happen.
For non-trivial types, the standard says very little about how the object is laid out in memory, in which order the members are placed, where the vtable pointer is, what the padding should be, etc. The compiler has huge amounts of freedom in deciding this.
As a result, even if the standard wanted to allow memcpy in these "safe" situations, it would be impossible to state what situations are safe and which aren't, or when exactly the real UB would be triggered for unsafe cases.
I suppose that you could argue that the effects should be implementation-defined or unspecified, but I'd personally feel that would be both digging a bit too deep into platform specifics and giving a little bit too much legitimacy to something that in the general case is rather unsafe.

First, note that it is unquestionable that all memory for mutable C/C++ objects has to be un-typed, un-specialized, usable for any mutable object. (I guess the memory for global const variables could hypothetically be typed, there is just no point with such hyper complication for such tiny corner case.) Unlike Java, C++ has no typed allocation of a dynamic object: new Class(args) in Java is a typed object creation: creation an object of a well defined type, that might live in typed memory. On the other hand, the C++ expression new Class(args) is just a thin typing wrapper around type-less memory allocation, equivalent with new (operator new(sizeof(Class)) Class(args): the object is created in "neutral memory". Changing that would mean changing a very big part of C++.
Forbidding the bit copy operation (whether done by memcpy or the equivalent user defined byte by byte copy) on some type gives a lot freedom to the implementation for polymorphic classes (those with virtual functions), and other so called "virtual classes" (not a standard term), that is the classes that use the virtual keyword.
The implementation of polymorphic classes could use a global associative map of addresses which associate the address of a polymorphic object and its virtual functions. I believe that was an option seriously considered during the design of the first iterations C++ language (or even "C with classes"). That map of polymorphic objects might use special CPU features and special associative memory (such features aren't exposed to the C++ user).
Of course we know that all practical implementations of virtual functions use vtables (a constant record describing all dynamic aspects of a class) and put a vptr (vtable pointer) in each polymorphic base class subobject, as that approach is extremely simple to implement (at least for the simplest cases) and very efficient. There is no global registry of polymorphic objects in any real world implementation except possibly in debug mode (I don't know such debug mode).
The C++ standard made the lack of global registry somewhat official by saying that you can skip the destructor call when you reuse the memory of an object, as long as you don't depend on the "side effects" of that destructor call. (I believe that means that the "side effects" are user created, that is the body of the destructor, not implementation created, as automatically done to the destructor by the implementation.)
Because in practice in all implementations, the compiler just uses vptr (pointer to vtables) hidden members, and these hidden members will be copied properly bymemcpy; as if you did a plain member-wise copy of the C struct representing the polymorphic class (with all its hidden members). Bit-wise copies, or complete C struct members-wise copies (the complete C struct includes hidden members) will behave exactly as a constructor call (as done by placement new), so all you have to do it let the compiler think you might have called placement new. If you do a strongly external function call (a call to a function that cannot be inlined and whose implementation cannot be examined by the compiler, like a call to a function defined in a dynamically loaded code unit, or a system call), then the compiler will just assume that such constructors could have been called by the code it cannot examine. Thus the behavior of memcpy here is defined not by the language standard, but by the compiler ABI (Application Binary Interface). The behavior of a strongly external function call is defined by the ABI, not just by the language standard. A call to a potentially inlinable function is defined by the language as its definition can be seen (either during compiler or during link time global optimization).
So in practice, given appropriate "compiler fences" (such as a call to an external function, or just asm("")), you can memcpy classes that only use virtual functions.
Of course, you have to be allowed by the language semantic to do such placement new when you do a memcpy: you cannot willy-nilly redefine the dynamic type of an existing object and pretend you have not simply wrecked the old object. If you have a non const global, static, automatic, member subobject, array subobject, you can overwrite it and put another, unrelated object there; but if the dynamic type is different, you cannot pretend that it's still the same object or subobject:
struct A { virtual void f(); };
struct B : A { };
void test() {
A a;
if (sizeof(A) != sizeof(B)) return;
new (&a) B; // OK (assuming alignement is OK)
a.f(); // undefined
}
The change of polymorphic type of an existing object is simply not allowed: the new object has no relation with a except for the region of memory: the continuous bytes starting at &a. They have different types.
[The standard is strongly divided on whether *&a can be used (in typical flat memory machines) or (A&)(char&)a (in any case) to refer to the new object. Compiler writers are not divided: you should not do it. This a deep defect in C++, perhaps the deepest and most troubling.]
But you cannot in portable code perform bitwise copy of classes that use virtual inheritance, as some implementations implement those classes with pointers to the virtual base subobjects: these pointers that were properly initialized by the constructor of the most derived object would have their value copied by memcpy (like a plain member wise copy of the C struct representing the class with all its hidden members) and wouldn't point the subobject of the derived object!
Other ABI use address offsets to locate these base subobjects; they depend only on the type of the most derived object, like final overriders and typeid, and thus can be stored in the vtable. On these implementation, memcpy will work as guaranteed by the ABI (with the above limitation on changing the type of an existing object).
In either case, it is entirely an object representation issue, that is, an ABI issue.

Ok, lets try your code with a little example:
#include <iostream>
#include <string>
#include <string.h>
void swapMemory(std::string* ePtr1, std::string* ePtr2) {
static const int size = sizeof(*ePtr1);
char swapBuffer[size];
memcpy(swapBuffer, ePtr1, size);
memcpy(ePtr1, ePtr2, size);
memcpy(ePtr2, swapBuffer, size);
}
int main() {
std::string foo = "foo", bar = "bar";
std::cout << "foo = " << foo << ", bar = " << bar << std::endl;
swapMemory(&foo, &bar);
std::cout << "foo = " << foo << ", bar = " << bar << std::endl;
return 0;
}
On my machine, this prints the following before crashing:
foo = foo, bar = bar
foo = foo, bar = bar
Weird, eh? The swap does not seem to be performed at all. Well, the memory was swapped, but std::string uses the small-string-optimization on my machine: It stores short strings within a buffer that's part of the std::string object itself, and just points its internal data pointer at that buffer.
When swapMemory() swaps the bytes, it swaps both the pointers and the buffers. So, the pointer in the foo object now points at the storage in the bar object, which now contains the string "foo". Two levels of swap make no swap.
When std::string's destructor subsequently tries to clean up, more evil happens: The data pointer does not point at the std::string's own internal buffer anymore, so the destructor deduces that that memory must have been allocated on the heap, and tries to delete it. The result on my machine is a simple crash of the program, but the C++ standard would not care if pink elephants were to appear. The behavior is totally undefined.
And that is the fundamental reason why you should not be using memcpy() on non-trivially copyable objects: You do not know whether the object contains pointers/references to its own data members, or depends on its own location in memory in any other way. If you memcpy() such an object, the basic assumption that the object cannot move around in memory is violated, and some classes like std::string do rely on this assumption. The C++ standard draws the line at the distinction between (non-)trivially copyable objects to avoid going into more, unnecessary detail about pointers and references. It only makes an exception for trivially copyable objects and says: Well, in this case you are safe. But do not blame me on the consequences should you try to memcpy() any other objects.

What I can perceive here is that -- for some practical applications -- the C++ Standard may be to restrictive, or rather, not permittive enough.
As shown in other answers memcpy breaks down quickly for "complicated" types, but IMHO, it actually should work for Standard Layout Types as long as the memcpy doesn't break what the defined copy-operations and destructor of the Standard Layout type do. (Note that a even TC class is allowed to have a non-trivial constructor.) The standard only explicitly calls out TC types wrt. this, however.
A recent draft quote (N3797):
3.9 Types
...
2 For any object (other than a base-class subobject) of trivially
copyable type T, whether or not the object holds a valid value of type
T, the underlying bytes (1.7) making up the object can be copied into
an array of char or unsigned char. If the content of the array of char
or unsigned char is copied back into the object, the object shall
subsequently hold its original value. [ Example:
#define N sizeof(T)
char buf[N]; T obj; // obj initialized to its original value
std::memcpy(buf, &obj, N); // between these two calls to std::memcpy,
// obj might be modified
std::memcpy(&obj, buf, N); // at this point, each subobject of obj of scalar type
// holds its original value
—end example ]
3 For any trivially copyable type T, if two pointers to T point to
distinct T objects obj1 and obj2, where neither obj1 nor obj2 is a
base-class subobject, if the underlying bytes (1.7) making up obj1 are
copied into obj2, obj2 shall subsequently hold the same value as obj1.
[ Example:
T* t1p;
T* t2p;
// provided that t2p points to an initialized object ...
std::memcpy(t1p, t2p, sizeof(T));
// at this point, every subobject of trivially copyable type in *t1p contains
// the same value as the corresponding subobject in *t2p
—end example ]
The standard here talks about trivially copyable types, but as was observed by #dyp above, there are also standard layout types that do not, as far as I can see, necessarily overlap with Trivially Copyable types.
The standard says:
1.8 The C++ object model
(...)
5 (...) An object of trivially copyable or standard-layout type (3.9) shall occupy contiguous bytes of storage.
So what I see here is that:
The standard says nothing about non Trivially Copyable types wrt. memcpy. (as already mentioned several times here)
The standard has a separate concept for Standard Layout types that occupy contiguous storage.
The standard does not explicitly allow nor disallow using memcpy on objects of Standard Layout that are not Trivially Copyable.
So it does not seem to be explicitly called out UB, but it certainly also isn't what is referred to as unspecified behavior, so one could conclude what #underscore_d did in the comment to the accepted answer:
(...) You can't just say "well, it
wasn't explicitly called out as UB, therefore it's defined
behaviour!", which is what this thread seems to amount to. N3797 3.9
points 2~3 do not define what memcpy does for non-trivially-copyable
objects, so (...) [t]hat's pretty much functionally
equivalent to UB in my eyes as both are useless for writing reliable, i.e. portable code
I personally would conclude that it amounts to UB as far as portability goes (oh, those optimizers), but I think that with some hedging and knowledge of the concrete implementation, one can get away with it. (Just make sure it's worth the trouble.)
Side Note: I also think that the standard really should explicitly incorporate Standard Layout type semantics into the whole memcpy mess, because it's a valid and useful usecase to do bitwise copy of non Trivially Copyable objects, but that's beside the point here.
Link: Can I use memcpy to write to multiple adjacent Standard Layout sub-objects?

Why can it be dangerous to use this POD struct as a base class?

I had this conversation with a colleague, and it turned out to be interesting. Say we have the following POD class
struct A {
void clear() { memset(this, 0, sizeof(A)); }
int age;
char type;
};
clear is intended to clear all members, setting to 0 (byte wise). What could go wrong if we use A as a base class? There's a subtle source for bugs here.

The compiler is likely to add padding bytes to A. So sizeof(A) extends beyond char type (until the end of the padding). However in case of inheritance the compiler might not add the padded bytes. So the call to memset will overwrite part of the subclass.

In addition to the other notes, sizeof is a compile-time operator, so clear() will not zero out any members added by derived classes (except as noted due to padding weirdness).
There's nothing really "subtle" about this; memset is a horrible thing to be using in C++. In the rare cases where you really can just fill memory with zeros and expect sane behaviour, and you really need to fill the memory with zeros, and zero-initializing everything via the initializer list the civilized way is somehow unacceptable, use std::fill instead.

In theory, the compiler can lay out base classes differently. C++03 §10 paragraph 5 says:
A base class subobject might have a layout (3.7) different from the layout of a most derived object of the same type.
As StackedCrooked mentioned, this might happen by the compiler adding padding to the end of the base class A when it exists as its own object, but the compiler might not add that padding when it's a base class. This would cause A::clear() to overwrite the first few bytes of the members of the subclass.
However in practice, I have not been able to get this to happen with either GCC or Visual Studio 2008. Using this test:
struct A
{
void clear() { memset(this, 0, sizeof(A)); }
int age;
char type;
};
struct B : public A
{
char x;
};
int main(void)
{
B b;
printf("%d %d %d\n", sizeof(A), sizeof(B), ((char*)&b.x - (char*)&b));
b.x = 3;
b.clear();
printf("%d\n", b.x);
return 0;
}
And modifying A, B, or both to be 'packed' (with #pragma pack in VS and __attribute__((packed)) in GCC), I couldn't get b.x to be overwritten in any case. Optimizations were enabled. The 3 values printed for the sizes/offsets were always 8/12/8, 8/9/8, or 5/6/5.

The clear method of the base class will only set the values of the class members.
According to alignment rules, the compiler is allowed to insert padding so that the next data member will occur on the aligned boundary. Thus there will be padding after the type data member. The first data member of the descendant will occupy this slot and be free from the effects of memset, since the sizeof the base class does not include the size of the descendant. Size of parent != size of child (unless child has no data members). See slicing.
Packing of structures is not a part of the language standard. Hopefully, with a good compiler, the size of a packed structure does not include any extra bytes after the last. Even so, a packed descendant inheriting from a packed parent should produce the same result: parent sets only the data members in the parent.

Briefly: It seems to me that the only one potentional problem is in that I can not found any info about "padding bytes" guarantees in C89, C2003 standarts....Do they have some extraordinary volatile or readonly behavior - I can not find even what does term "padding bytes" mean by the standarts...
Detailed:
For objects of POD types it is guaranteed by the C++2003 standard that:
when you memcpy the contents of your object into an array of char or unsigned char, and then memcpy the contents back into your object, the object will hold its original value
guaranteed that there will be no padding in the beginning of a POD object
can break C++ rules about: goto statement, lifetime
For C89 there is also exist some guarantees about structures:
When used for a mixture of union structures if structs have same begining, then first compoments have perfect mathing
sizeof structures in C is equal to the amount of memory to store all the components, the place under the padding between the components, place padding under the following structures
In C components of the structure are given addresses. There is a guarantee that the components of the address are in ascending order. And the address of the first component coincides with the start address of the structure. Regardless of which endian the computer where the program runs
So It seems to me that such rules is appropriate to C++ also, and all is fine. I really think that in hardware level nobody will restrict you from write in padding bytes for non-const object.

memset() or value initialization to zero out a struct?

In Win32 API programming it's typical to use C structs with multiple fields. Usually only a couple of them have meaningful values and all others have to be zeroed out. This can be achieved in either of the two ways:
STRUCT theStruct;
memset( &theStruct, 0, sizeof( STRUCT ) );
or
STRUCT theStruct = {};
The second variant looks cleaner - it's a one-liner, it doesn't have any parameters that could be mistyped and lead to an error being planted.
Does it have any drawbacks compared to the first variant? Which variant to use and why?

Those two constructs a very different in their meaning. The first one uses a memset function, which is intended to set a buffer of memory to certain value. The second to initialize an object. Let me explain it with a bit of code:
Lets assume you have a structure that has members only of POD types ("Plain Old Data" - see What are POD types in C++?)
struct POD_OnlyStruct
{
int a;
char b;
};
POD_OnlyStruct t = {}; // OK
POD_OnlyStruct t;
memset(&t, 0, sizeof t); // OK as well
In this case writing a POD_OnlyStruct t = {} or POD_OnlyStruct t; memset(&t, 0, sizeof t) doesn't make much difference, as the only difference we have here is the alignment bytes being set to zero-value in case of memset used. Since you don't have access to those bytes normally, there's no difference for you.
On the other hand, since you've tagged your question as C++, let's try another example, with member types different from POD:
struct TestStruct
{
int a;
std::string b;
};
TestStruct t = {}; // OK
{
TestStruct t1;
memset(&t1, 0, sizeof t1); // ruins member 'b' of our struct
} // Application crashes here
In this case using an expression like TestStruct t = {} is good, and using a memset on it will lead to crash. Here's what happens if you use memset - an object of type TestStruct is created, thus creating an object of type std::string, since it's a member of our structure. Next, memset sets the memory where the object b was located to certain value, say zero. Now, once our TestStruct object goes out of scope, it is going to be destroyed and when the turn comes to it's member std::string b you'll see a crash, as all of that object's internal structures were ruined by the memset.
So, the reality is, those things are very different, and although you sometimes need to memset a whole structure to zeroes in certain cases, it's always important to make sure you understand what you're doing, and not make a mistake as in our second example.
My vote - use memset on objects only if it is required, and use the default initialization x = {} in all other cases.

Depending on the structure members, the two variants are not necessarily equivalent. memset will set the structure to all-bits-zero whereas value initialization will initialize all members to the value zero. The C standard guarantees these to be the same only for integral types, not for floating-point values or pointers.
Also, some APIs require that the structure really be set to all-bits-zero. For instance, the Berkeley socket API uses structures polymorphically, and there it is important to really set the whole structure to zero, not just the values that are apparent. The API documentation should say whether the structure really needs to be all-bits-zero, but it might be deficient.
But if neither of these, or a similar case, applies, then it's up to you. I would, when defining the structure, prefer value initialization, as that communicates the intent more clearly. Of course, if you need to zeroize an existing structure, memset is the only choice (well, apart from initializing each member to zero by hand, but that wouldn't normally be done, especially for large structures).

If your struct contains things like :
int a;
char b;
int c;
Then bytes of padding will be inserted between b and c. memset will zero those, the other way will not, so there will be 3 bytes of garbage (if your ints are 32 bits). If you intend to use your struct to read/write from a file, this might be important.

I would use value initialization because it looks clean and less error prone as you mentioned. I don't see any drawback in doing it.
You might rely on memset to zero out the struct after it has been used though.

Not that it's common, but I guess the second way also has the benefit of initializing floats to zero, while doing a memset would certainly not.

The value initialization is prefered because it can be done at compile time.
Also it correctly 0 initializes all POD types.
The memset is done at runtime.
Also using memset is suspect if the struct is not POD.
Does not correctly initialize (to zero) non int types.

In some compilers STRUCT theStruct = {}; would translate to memset( &theStruct, 0, sizeof( STRUCT ) ); in the executable. Some C functions are already linked in to do runtime setup so the compiler have these library functions like memset/memcpy available to use.

If there are lots of pointer members and you are likely to add more in the future, it can help to use memset. Combined with appropriate assert(struct->member) calls you can avoid random crashes from trying to deference a bad pointer that you forgot to initialize. But if you're not as forgetful as me, then member-initialization is probably the best!
However, if your struct is being used as part of a public API, you should get client code to use memset as a requirement. This helps with future proofing, because you can add new members and the client code will automatically NULL them out in the memset call, rather than leaving them in a (possibly dangerous) uninitialized state. This is what you do when working with socket structures for example.

We Keep Coding

c++ django amazon-web-services regex python-2.7 google-cloud-platform list unit-testing opengl ember.js