#include <iostream>
#include <cstring>
#include <type_traits>
// This struct is not guaranteed to occupy contiguous storage
// in the sense of the C++ Object model (§1.8.5):
struct separated {
int i;
separated(int a, int b){i=a; i2=b;}
~separated(){i=i2=-1;} // nontrivial destructor --> not trivially copyable
private: int i2; // different access control --> not standard layout
};
int main() {
static_assert(not std::is_standard_layout<separated>::value,"sl");
static_assert(not std::is_trivial<separated>::value,"tr");
separated a[2]={{1,2},{3,4}};
std::memset(&a[0],0,sizeof(a[0]));
std::cout<<a[1].i;
// No guarantee that the previous line outputs 3.
}
// compiled with Debian clang version 3.5.0-10, C++14-standard
// (outputs 3)
What is the rationale behind weakening standard guarantees to the point that this program may show undefined behaviour?
The standard says:
"An object of array type contains a contiguously allocated non-empty set of N subobjects of type T." [dcl.array] §8.3.4.
If objects of type T do not occupy contiguous storage, how can an array of such objects do so?
edit: removed possibly distracting explanatory text
1.
This is an instance of Occam's razor as adopted by the dragons that actually write compilers: Do not give more guarantees than needed to solve the problem, because otherwise your workload will double without compensation. Sophisticated classes adapted to fancy hardware or to historic hardware were part of the problem. (hinting by BaummitAugen and M.M)
2.
(contiguous=sharing a common border, next or together in sequence)
First, it is not that objects of type T either always or never occupy contiguous storage. There may be different memory layouts for the same type within a single binary.
[class.derived] §10 (8): A base class subobject might have a layout different from ...
This would be enough to lean back and be satisfied that what is happening on our computers does not contradict the standard. But let's amend the question. A better question would be:
Does the standard permit arrays of objects that do not occupy contiguous storage individually, while at the same time every two successive subobjects share a common border?
If so, this would influence heavily how char* arithmetic relates to T* arithmetic.
Depending on whether you read the standard quote in the OP as meaning that only the subobjects share a common border, or that the bytes within each subobject also share a common border, you may arrive at different conclusions.
Assuming the first, you find that
'contiguously allocated' or 'stored contiguously' may simply mean &a[n]==&a[0] + n (§23.3.2.1), which is a statement about subobject addresses that would not imply that the array resides within a single sequence of contiguous bytes.
If you assume the stronger version, you may arrive at the 'element offset==sizeof(T)' conclusion brought forward in T* versus char* pointer arithmetic.
That would also imply that one could force otherwise possibly non-contiguous objects into a contiguous layout by declaring them T t[1]; instead of T t;
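A small sketch may make the two readings concrete (only the first assertion follows from the weaker reading; the second reflects the stronger, byte-level reading):
#include <cassert>

struct T { int x; };

int main() {
    T a[4];
    // weaker reading: subobject addresses follow &a[n] == &a[0] + n
    assert(&a[2] == &a[0] + 2);
    // stronger reading: the byte offset of element n is n * sizeof(T)
    assert(reinterpret_cast<char*>(&a[2]) ==
           reinterpret_cast<char*>(&a[0]) + 2 * sizeof(T));
}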
Now how to resolve this mess? There is a fundamentally ambiguous definition of the sizeof() operator in the standard that seems to be a relic of the time when, at least per architecture, type roughly equaled layout, which is no longer the case. (How does placement new know which layout to create?)
When applied to a class, the result [of sizeof()] is the number of bytes in an object of that class including any padding required for placing objects of that type in an array. [expr.sizeof] §5.3.3 (2)
But wait, the amount of required padding depends on the layout, and a single type may have more than one layout. So we're bound to add a grain of salt and take the minimum over all possible layouts, or do something equally arbitrary.
Finally, the array definition would benefit from a disambiguation in terms of char* arithmetic, in case this is the intended meaning. Otherwise, the answer to question 1 applies accordingly.
A few remarks related to now deleted answers and comments:
As is discussed in Can technically objects occupy non-contiguous bytes of storage?, non-contiguous objects actually exist. Furthermore, memsetting a subobject naively may invalidate unrelated subobjects of the containing object, even for perfectly contiguous, trivially copyable objects:
#include <iostream>
#include <cstring>
#include <type_traits>
struct A {
private: int a;
public: short i;
};
struct B : A {
short i;
};
int main()
{
static_assert(std::is_trivial<A>::value , "A not trivial.");
static_assert(not std::is_standard_layout<A>::value , "sl.");
static_assert(std::is_trivial<B>::value , "B not trivial.");
B object;
object.i=1;
std::cout<< object.B::i;
std::memset((void*)&(A&)object ,0,sizeof(A));
std::cout<<object.B::i;
}
// outputs 10 with g++/clang++, c++11, Debian 8, amd64
Therefore, it is conceivable that the memset in the question post might zero a[1].i, such that the program would output 0 instead of 3.
There are few occasions where one would use memset-like functions with C++-objects at all. (Normally, destructors of subobjects will fail blatantly if you do that.) But sometimes one wishes to scrub the contents of an 'almost-POD'-class in its destructor, and this might be the exception.
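For illustration, a sketch of that exceptional case (the names are made up; note also that a compiler may elide a memset whose result is never read, so real scrubbing code usually needs a stronger primitive):
#include <cstring>

// hypothetical 'almost-POD' holder that wipes its own bytes on destruction
struct ScrubbedKey {
    unsigned char key[32];
    ~ScrubbedKey() {
        // wipe only this object's own storage, never a subobject of something larger
        std::memset(static_cast<void*>(this), 0, sizeof(*this));
    }
};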
Related
In C++, it's well-defined to read from an union member which was most recently written, aka the active union member.
My question is whether std::memcpying a whole union object, as opposed to copying a particular union member, to an uninitialized memory area will preserve the active union member.
union A {
int x;
char y[4];
};
A a;
a.y[0] = 'U';
a.y[1] = 'B';
a.y[2] = '?';
a.y[3] = '\0';
std::byte buf[sizeof(A)];
std::memcpy(buf, &a, sizeof(A));
A& a2 = *reinterpret_cast<A*>(buf);
std::cout << a2.y << '\n'; // is `A::y` the active member of `a2`?
The assignments you have are okay because the assignment to non-class member a.y "begins its lifetime". However, your std::memcpy does not do that, so any accesses to members of a2 are invalid. So, you are relying on the consequences of undefined behaviour. Technically. In practice most toolchains are rather lax about aliasing and lifetime of primitive-type union members.
Unfortunately, there's more UB here, in that you're violating aliasing for the union itself: you can pretend that a T is a bunch of bytes, but you can't pretend that a bunch of bytes is a T, no matter how much reinterpret_casting you do. You could instantiate an A a2 normally and std::copy/std::memcpy over it from a, and then you're just back to the union member lifetime problem, if you care about that. But, I guess, if this option were open to you, you'd just be writing A a2 = a in the first place…
My question is whether std::memcpying a whole union object, as opposed to copying a particular union member, to an uninitialized memory area will preserve the active union member.
It'll be copied as expected.
It's how you're reading the result that may, or may not, make your program have undefined behavior.
Using std::memcpy copies char's from one source to a destination. Raw memory copying is ok. Reading from memory as something it wasn't initialized to be is not ok.
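A sketch of the safer variant mentioned above: copy the bytes into a real A object instead of reinterpreting a byte buffer as an A (the active-member question raised above still applies to the read):
#include <cstring>
#include <iostream>

union A {
    int  x;
    char y[4];
};

int main() {
    A a;
    a.y[0] = 'U'; a.y[1] = 'B'; a.y[2] = '?'; a.y[3] = '\0';

    A a2;                            // a distinct A object, not a reinterpreted buffer
    std::memcpy(&a2, &a, sizeof(A)); // bytewise copy of the whole union
    std::cout << a2.y << '\n';       // prints "UB?" in practice
}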
So far as I can tell, the C++ Standard makes no distinction between the following two functions on platforms where the sizes of int and foo would happen to be identical [as would typically be the case]
struct s1 { int x; };
struct s2 { int x; };
union foo { s1 a; s2 b; } u1, u2;
void test1(void)
{
u1.a.x = 1;
u2.b.x = 2;
std::memcpy(&u1, &u2, sizeof u1);
}
void test2(void)
{
u1.a.x = 1;
u2.b.x = 2;
std::memcpy(&u1.a.x, &u2.b.x, sizeof u1.a.x);
}
If a union of trivially-copyable types is itself a trivially-copyable type, that would suggest that the active member of u1 after the memcpy in test1 should be b. In the equivalent function test2, however, copying all the bytes from an int object into one that is part of the active union member u1.a should leave the active union member as a.
IMHO, this issue could be easily resolved by recognizing that a union may have multiple "potentially active" members, and allowing certain actions to be performed on any member that is at least potentially active (rather than limiting them to one particular active member). Among other things, that would allow the Common Initial Sequence rule to be made much clearer and more useful, without unduly inhibiting optimizations, by specifying that the act of taking the address of a union member makes it "at least potentially" active until the next time the union is written via non-character-based access, and by providing that common-initial-sequence inspection or bytewise writing of a potentially active union member is allowed, but does not change the active member.
Unfortunately, when the Standard was first written, there was no effort to explore all of the relevant corner cases, much less reach a consensus about how they should be handled. At the time, I don't think there would have been opposition to the idea of officially accommodating multiple potentially-active members, since most compiler designs would naturally accommodate that without difficulty. Unfortunately, some compilers have evolved in a way that would make support for such constructs more difficult than it would have been if accommodated from the start, and their maintainers would block any changes that would contradict their design decisions, even if the Standard was never intended to allow such decisions in the first place.
Before I answer your question, I think your code should add this:
static_assert(std::is_trivial<A>());
Because in order to keep compatibility with C, trivial types get extra guarantees. For example, the requirement of running an object's constructor before using it (See https://eel.is/c++draft/class.cdtor) applies only to an object whose constructor is not trivial.
Because your union is trivial, your code is fine up to and including the memcpy. Where you run into trouble is *reinterpret_cast<A*>(buf);
Specifically, you are using an A object before its lifetime has begun.
As stated in https://eel.is/c++draft/basic.life , lifetime begins when storage with proper alignment and size for the type has been obtained, and its initialization is complete. A trivial type has "vacuous" initialization, so no problem there, however the storage is a problem.
When your example gets storage for buf,
std::byte buf[sizeof(A)];
it does not obtain proper alignment for the type. You'd need to change that line to:
alignas(A) std::byte buf[sizeof(A)];
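Putting it together, a minimal sketch of the corrected buffer (C++17 for std::byte); whether the subsequent reinterpret_cast is valid is the separate lifetime question discussed above:
#include <cstddef>
#include <cstring>

union A {
    int  x;
    char y[4];
};

int main() {
    A a;
    a.x = 42;

    alignas(A) std::byte buf[sizeof(A)]; // storage now suitably aligned for A
    std::memcpy(buf, &a, sizeof(A));
    // *reinterpret_cast<A*>(buf) still raises the lifetime question,
    // but alignment is no longer the problem.
}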
This question is a follow up of a comment to an answer of another question.
Consider the following example:
#include <cstring>
#include <type_traits>
#include <cassert>
int main() {
std::aligned_storage_t<sizeof(void*), alignof(void*)> storage, copy;
int i = 42;
std::memcpy(&storage, &i, sizeof(int));
copy = storage;
int j{};
std::memcpy(&j, &copy, sizeof(int));
assert(j == 42);
}
This works (for some definitions of works). However, the standard tells us this:
For any object (other than a base-class subobject) of trivially copyable type T, whether or not the object holds a valid value of type T, the underlying bytes making up the object can be copied into an array of char, unsigned char, or std::byte.
If the content of that array is copied back into the object, the object shall subsequently hold its original value. [ Example:
#define N sizeof(T)
char buf[N];
T obj; // obj initialized to its original value
std::memcpy(buf, &obj, N); // between these two calls to std::memcpy, obj might be modified
std::memcpy(&obj, buf, N); // at this point, each subobject of obj of scalar type holds its original value
— end example ]
And this:
For any trivially copyable type T, if two pointers to T point to distinct T objects obj1 and obj2, where neither obj1 nor obj2 is a base-class subobject, if the underlying bytes making up obj1 are copied into obj2, obj2 shall subsequently hold the same value as obj1. [ Example:
T* t1p;
T* t2p;
// provided that t2p points to an initialized object ...
std::memcpy(t1p, t2p, sizeof(T));
// at this point, every subobject of trivially copyable type in *t1p contains
// the same value as the corresponding subobject in *t2p
— end example ]
In any case, it says that copying a trivially copyable type into a buffer and then copying it back into a new instance of the original type is allowed.
In the example above I do something similar, plus I also copy the buffer into a new buffer (this resembles the real-world case a bit more).
In the comments linked at the top of the question, the author says that this behavior is underspecified. On the other hand, I cannot see, for example, how I could send an int over the network and use it on the other end if this weren't allowed (copy an int into a buffer, send it over the network, receive it as a buffer and memcpy it into an instance of int - more or less what I do in the example, without a network in between).
Is this allowed by some other bullets of the standard I missed or is this really underspecified?
It reads fine to me.
You've copied the underlying bytes of obj1 into obj2. Both are trivial and of the same type. The prose you quote permits this explicitly.
The fact that said underlying bytes were temporarily stored in a correctly-sized and correctly-aligned holding area, via an also-explicitly-permitted reinterpretation as char*, doesn't seem to change that. They're still "those bytes". There's no rule that says copying must be "direct" in order to satisfy features like this.
Indeed, this is not only a completely common pattern when dealing with network transfer (conventional use of course doesn't make it right on its own), but also a historically normal thing to do that the standard would be mad not to account for (which gives me all the assurance I need that it is indeed intended).
I can see how there may be doubt, given that the rule is first given for copying those bytes back into the original object, then given again for copying those bytes into a new object. But I can't detect any logical difference between the two circumstances, and therefore find the first quoted wording to be largely redundant. It's possible the author just wanted to be crystal clear that this safety applies identically in both cases.
To me, this is one of the most ambiguous issues in C++. Honestly speaking, I never got confused by anything in C++ as much as type punning. There's always a corner case that seems to be not covered (or underspecified, like you put it).
However, conversion from integers to raw memory (char*) is supposed to be allowed for serialization/examination of underlying object.
What's the solution?
Unit tests. That's my solution to the problem. You do what complies most with the standard, and you write basic unit tests that test your particular assumption. Then, whenever you compile a new version or move to a new compiler, you run the unit tests and verify that the compiler does what you expect it to do.
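As an illustration, a sketch of what such an assumption test might look like (the helper name is made up):
#include <cassert>
#include <cstring>
#include <type_traits>

// round-trip a trivially copyable value through raw byte buffers and
// check that it survives, mirroring the pattern in the question
template <typename T>
bool roundtrip_preserves(const T& value) {
    static_assert(std::is_trivially_copyable<T>::value, "T must be trivially copyable");
    unsigned char buf[sizeof(T)];
    std::memcpy(buf, &value, sizeof(T));

    unsigned char copy[sizeof(T)];
    std::memcpy(copy, buf, sizeof(T));   // the extra buffer-to-buffer copy

    T out{};
    std::memcpy(&out, copy, sizeof(T));
    return std::memcmp(&out, &value, sizeof(T)) == 0;
}

int main() {
    assert(roundtrip_preserves(42));
    assert(roundtrip_preserves(3.14));
}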
The working draft of the standard N4659 says:
[basic.compound]
If two objects are pointer-interconvertible, then they have the same address
and then notes that
An array object and its first element are not pointer-interconvertible, even though they have the same address
What is the rationale for making an array object and its first element non-pointer-interconvertible? More generally, what is the rationale for distinguishing the notion of pointer-interconvertibility from the notion of having the same address? Isn't there a contradiction in there somewhere?
It would appear that given this sequence of statements
int a[10];
void* p1 = static_cast<void*>(&a[0]);
void* p2 = static_cast<void*>(&a);
int* i1 = static_cast<int*>(p1);
int* i2 = static_cast<int*>(p2);
we have p1 == p2, however, i1 is well defined and using i2 would result in UB.
There are apparently existing implementations that optimize based on this. Consider:
struct A {
double x[4];
int n;
};
void g(double* p);
int f() {
A a { {}, 42 };
g(&a.x[1]);
return a.n; // optimized to return 42;
// valid only if you can't validly obtain &a.n from &a.x[1]
}
Given p = &a.x[1];, g might attempt to obtain access to a.n by reinterpret_cast<A*>(reinterpret_cast<double(*)[4]>(p - 1))->n. If the inner cast successfully yielded a pointer to a.x, then the outer cast will yield a pointer to a, giving the class member access defined behavior and thus outlawing the optimization.
More generally, what is the rationale for distinguishing the notion of pointer-interconvertibility from the notion of having the same address?
It is hard if not impossible to answer why certain decisions are made by the standard, but this is my take.
Logically, pointers point to objects, not addresses. Addresses are the value representations of pointers. The distinction is particularly important when reusing the space of an object containing const members
struct S {
const int i;
};
S s = {42};
auto ps = &s;
new (ps) S{420};
foo(ps->i); // UB, requires std::launder
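For reference, a sketch of the laundered access hinted at in the comment above (C++17):
#include <new>

struct S {
    const int i;
};

void foo(int);

void demo() {
    S s{42};
    S* ps = &s;
    new (ps) S{420};            // reuse the storage of s
    foo(std::launder(ps)->i);   // OK: launder yields a pointer to the new object
    // foo(ps->i);              // UB without launder, as noted above
}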
That a pointer with the same value representation can be used as if it were the same pointer should be thought of as the special case instead of the other way round.
Practically, the standard tries to place as little restriction as possible on implementations. Pointer-interconvertibility is the condition that pointers may be reinterpret_cast and yield the correct result. Seeing as how reinterpret_cast is meant to be compiled into nothing, it also means the pointers share the same value representation. Since that places more restrictions on implementations, the condition won't be given without compelling reasons.
Because the committee wants to make clear that an array is a low-level concept and not a first-class object: you cannot return an array or assign to it, for example. Pointer-interconvertibility is meant to be a concept between objects of the same level: only standard-layout classes or unions.
The concept is seldom used in the whole draft: in [expr.static.cast] where it appears as a special case, in [class.mem] where a note says that for standard-layout classes, pointers to an object and to its first subobject are interconvertible, in [class.union] where pointers to the union and its non-static data members are also declared interconvertible, and in [ptr.launder].
That last occurrence separates two use cases: either pointers are interconvertible, or one element is an array. This is stated in a remark and not in a note like it is in [basic.compound], which makes it clearer that pointer-interconvertibility willingly does not concern arrays.
Having read this section of Standard closely, I have the understanding that two objects are pointer-interconvertible, as the name suggests, if
They are “interconnected”, through their class definition (note that pointer interconvertible concept is defined for a class object and its first non-static data member).
They point to the same address. But, because their types are different, we need to “convert” their pointers' types, using reinterpret_cast operator.
For an array object, mentioned in the question, the array and its first element have no interconnectivity in terms of class definition and also we don’t need to convert their pointer types to be able to work with them. They just point to the same address.
The answers I have received for this question so far fall into two exactly opposite camps: "it's safe" and "it's undefined behaviour". I decided to rewrite the question in full to get some clearer answers, for me and for anyone who might arrive here via Google.
Also, I removed the C tag, so this question is now C++-specific.
I am making an 8-byte-aligned memory heap that will be used in my virtual machine. The most obvious approach that I can think of is by allocating an array of std::uint64_t.
std::unique_ptr<std::uint64_t[]> block(new std::uint64_t[100]);
Let's assume sizeof(float) == 4 and sizeof(double) == 8. I want to store a float and a double in block and print the value.
float* pf = reinterpret_cast<float*>(&block[0]);
double* pd = reinterpret_cast<double*>(&block[1]);
*pf = 1.1;
*pd = 2.2;
std::cout << *pf << std::endl;
std::cout << *pd << std::endl;
I'd also like to store a C-string saying "hello".
char* pc = reinterpret_cast<char*>(&block[2]);
std::strcpy(pc, "hello\n");
std::cout << pc;
Now I want to store "Hello, world!" which goes over 8 bytes, but I still can use 2 consecutive cells.
char* pc2 = reinterpret_cast<char*>(&block[3]);
std::strcpy(pc2, "Hello, world\n");
std::cout << pc2;
For integers, I don't need a reinterpret_cast.
block[5] = 1;
std::cout << block[5] << std::endl;
I'm allocating block as an array of std::uint64_t for the sole purpose of memory alignment. I also do not expect anything larger than 8 bytes by its own to be stored in there. The type of the block can be anything if the starting address is guaranteed to be 8-byte-aligned.
Some people already answered that what I'm doing is totally safe, but some others said that I'm definitely invoking undefined behaviour.
Am I writing correct code to do what I intend? If not, what is the appropriate way?
The global allocation functions
To allocate an arbitrary (untyped) block of memory, the global allocation functions (§3.7.4/2);
void* operator new(std::size_t);
void* operator new[](std::size_t);
can be used to do this (§3.7.4.1/2).
§3.7.4.1/2
The allocation function attempts to allocate the requested amount of storage. If it is successful, it shall return the address of the start of a block of storage whose length in bytes shall be at least as large as the requested size. There are no constraints on the contents of the allocated storage on return from the allocation function. The order, contiguity, and initial value of storage allocated by successive calls to an allocation function are unspecified. The pointer returned shall be suitably aligned so that it can be converted to a pointer of any complete object type with a fundamental alignment requirement (3.11) and then used to access the object or array in the storage allocated (until the storage is explicitly deallocated by a call to a corresponding deallocation function).
And 3.11 has this to say about a fundamental alignment requirement;
§3.11/2
A fundamental alignment is represented by an alignment less than or equal to the greatest alignment supported by the implementation in all contexts, which is equal to alignof(std::max_align_t).
Just to be sure on the requirement that the allocation functions must behave like this;
§3.7.4/3
Any allocation and/or deallocation functions defined in a C++ program, including the default versions in the library, shall conform to the semantics specified in 3.7.4.1 and 3.7.4.2.
Quotes from C++ WD n4527.
Assuming the 8-byte alignment is less than the fundamental alignment of the platform (and it looks like it is, but this can be verified on the target platform with static_assert(alignof(std::max_align_t) >= 8)) - you can use the global ::operator new to allocate the memory required. Once allocated, the memory can be segmented and used given the size and alignment requirements you have.
An alternative here is std::aligned_storage, which would give you memory aligned to whatever requirement you specify.
typename std::aligned_storage<sizeof(T), alignof(T)>::type buffer[100];
From the question, I assume here that the both the size and alignment of T would be 8.
A sample of what the final memory block could look like is (basic RAII included);
struct DataBlock {
const std::size_t element_count;
static constexpr std::size_t element_size = 8;
void * data = nullptr;
explicit DataBlock(size_t elements) : element_count(elements)
{
data = ::operator new(elements * element_size);
}
~DataBlock()
{
::operator delete(data);
}
DataBlock(DataBlock&) = delete; // no copy
DataBlock& operator=(DataBlock&) = delete; // no assign
// probably shouldn't move either
DataBlock(DataBlock&&) = delete;
DataBlock& operator=(DataBlock&&) = delete;
template <class T>
T* get_location(std::size_t index)
{
// https://stackoverflow.com/a/6449951/3747990
// C++ WD n4527 3.9.2/4
void* t = reinterpret_cast<void*>(reinterpret_cast<unsigned char*>(data) + index*element_size);
// 5.2.9/13
return static_cast<T*>(t);
// C++ WD n4527 5.2.10/7 would allow this to be condensed
//T* t = reinterpret_cast<T*>(reinterpret_cast<unsigned char*>(data) + index*element_size);
//return t;
}
};
// ....
DataBlock block(100);
I've constructed more detailed examples of the DataBlock with suitable template construct and get functions etc., live demo here and here with further error checking etc..
A note on the aliasing
It does look like there are some aliasing issues in the original code (strictly speaking); you allocate memory of one type and cast it to another type.
It will probably work as you expect on your target platform, but you cannot rely on it. The most practical comment I've seen on this is;
"Undefined behaviour has the nasty result of usually doing what you think it should do, until it doesn’t” - hvd.
The code you have will probably work. I think it is better to use the appropriate global allocation functions and be sure that there is no undefined behaviour when allocating and using the memory you require.
Aliasing will still be applicable; once the memory is allocated - aliasing is applicable in how it is used. Once you have an arbitrary block of memory allocated (as above with the global allocation functions) and the lifetime of an object begins (§3.8/1) - aliasing rules apply.
What about std::allocator?
Whilst std::allocator is for homogeneous data containers and what you are looking for is akin to heterogeneous allocations, the implementation in your standard library (given the Allocator concept) offers some guidance on raw memory allocations and the corresponding construction of the objects required.
Update for the new question:
The great news is there's a simple and easy solution to your real problem: Allocate the memory with new (unsigned char[size]). Memory allocated with new is guaranteed in the standard to be aligned in a way suitable for use as any type, and you can safely alias any type with char*.
The standard reference, 3.7.3.1/2, allocation functions:
The pointer returned shall be suitably aligned so that it can be
converted to a pointer of any complete object type and then used to
access the object or array in the storage allocated
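A short sketch of that approach (the sizes and offsets are illustrative):
#include <cstring>
#include <memory>

int main() {
    constexpr std::size_t size = 800;
    // storage from a new-expression of unsigned char[] is suitably aligned
    // for any type with fundamental alignment, and char may alias anything
    std::unique_ptr<unsigned char[]> block(new unsigned char[size]);

    double d = 2.2;
    std::memcpy(block.get() + 8, &d, sizeof d);     // slot at byte offset 8

    double out;
    std::memcpy(&out, block.get() + 8, sizeof out); // read it back via memcpy
    return out == 2.2 ? 0 : 1;
}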
Original answer for the original question:
At least in C++98/03 in 3.10/15 we have the following which pretty clearly makes it still undefined behavior (since you're accessing the value through a type that's not enumerated in the list of exceptions):
If a program attempts to access the stored value of an object through
an lvalue of other than one of the following types the behavior is
undefined:
— the dynamic type of the object,
— a cv-qualified version of the dynamic type of the object,
— a type that is the signed or unsigned type corresponding to the dynamic type of the object,
— a type that is the signed or unsigned type corresponding to a cv-qualified version of the dynamic type of the object,
— an aggregate or union type that includes one of the aforementioned types among its members (including, recursively, a member of a subaggregate or contained union),
— a type that is a (possibly cv-qualified) base class type of the dynamic type of the object,
— a char or unsigned char type.
pc, pf and pd are all pointers of different types that access memory declared in block as uint64_t; so for, say, pf the types involved are float and uint64_t.
One would violate the strict aliasing rule were one to write using one type and read using another, since the compiler could reorder the operations thinking there is no shared access. This is not your case, however: since the uint64_t array is only used for allocation, it is much the same as using alloca to allocate the memory.
Incidentally, there is no issue with the strict aliasing rule when casting from any type to a char type and vice versa. This is a common pattern used for data serialization and deserialization.
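A minimal sketch of that serialization pattern (the function names are illustrative):
#include <cstdint>
#include <cstring>

// object -> bytes: writing through unsigned char* is allowed by the rule above
void serialize(std::uint32_t v, unsigned char* out) {
    std::memcpy(out, &v, sizeof v);
}

// bytes -> object: memcpy back into a real uint32_t object
std::uint32_t deserialize(const unsigned char* in) {
    std::uint32_t v;
    std::memcpy(&v, in, sizeof v);
    return v;
}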
There is a lot of discussion here, and some answers that are slightly wrong but make good points, so I'll just try to summarize:
exactly following the text of the standard (no matter what version) ... yes, this is undefined behaviour. Note the standard doesn't even use the term strict aliasing -- it just has a set of rules that enforce it, no matter what implementations might define.
understanding the reason behind the "strict aliasing" rule, it should work nicely on any implementation as long as neither float nor double takes more than 64 bits.
the standard won't guarantee you anything about the size of float or double (intentionally) and that's the reason why it is that restrictive in the first place.
you can get around all this by ensuring your "heap" is an allocated object (e.g. obtained with malloc()) and accessing the aligned slots through char *, computing byte offsets by shifting the slot index left by 3 bits (i.e. multiplying it by 8).
you still have to make sure that anything you store in such a slot won't take more than 64 bits. (that's the hard part when it comes to portability)
In a nutshell: your code should be safe on any "sane" implementation as long as size constraints aren't a problem (means: the answer to the question in your title is most likely no), BUT it's still undefined behaviour (means: the answer to your last paragraph is yes)
I'll make it short: All your code works with defined semantics if you allocate the block using
std::unique_ptr<char[], void (*)(void*)>
mem(static_cast<char*>(std::malloc(800)), std::free);
Because
every type is allowed to alias with a char[] and
malloc() is guaranteed to return a block of memory sufficiently aligned for all types (except maybe SIMD ones).
We pass std::free as a custom deleter, because we used malloc(), not new[], so calling delete[], the default, would be undefined behaviour.
If you're a purist, you can also use operator new:
std::unique_ptr<char[]>
mem(static_cast<char*>(operator new[](800)));
Then we don't need a custom deleter. Or
std::unique_ptr<char[]> mem(new char[800]);
to avoid the static_cast from void* to char*. But operator new can be replaced by the user, so I'm always a bit wary of using it. OTOH; malloc cannot be replaced (only in platform-specific ways, such as LD_PRELOAD).
Yes, because the memory locations pointed to by pf and pd could overlap depending on the size of float and double. If they didn't overlap, the results of reading *pd and *pf would be well defined, but not the results of reading from block or pc.
The behavior of C++ and the CPU are distinct. Although the standard provides memory suitable for any object, the rules and optimizations imposed by the CPU make the alignment for any given object "undefined" - an array of short would reasonably be 2 byte aligned, but an array of a 3 byte structure may be 8 byte aligned. A union of all possible types can be created and used between your storage and the usage to ensure no alignment rules are broken.
// requires <cstdint> and <cstring>
union copyOut {
char Buffer[200]; // max string length
std::int16_t shortVal;
std::int32_t intVal;
std::int64_t longIntVal;
float fltVal;
double doubleVal;
} copyTarget;
// move the stored bytes from the (possibly unaligned) block slot into the union
std::memcpy( copyTarget.Buffer, &block[n], sizeof copyTarget.longIntVal );
// then use the appropriate copyTarget member here.
If you tag this as C++ question,
(1) why use uint64_t[] but not std::vector?
(2) in terms of memory management, your code lacks management logic: it should keep track of which blocks are in use and which are free, track contiguous blocks, and of course provide allocate and release methods.
(3) the code shows an unsafe way of using memory. For example, the char* is not const, so the block can potentially be written to and overwrite the next block(s). The reinterpret_cast is considered dangerous and should be abstracted away from the memory-user logic.
(4) the code doesn't show the allocator logic. In the C world, the malloc function is untyped, and in the C++ world, operator new is typed. You should consider something like operator new.
I have used unions earlier comfortably; today I was alarmed when I read this post and came to know that this code
union ARGB
{
uint32_t colour;
struct componentsTag
{
uint8_t b;
uint8_t g;
uint8_t r;
uint8_t a;
} components;
} pixel;
pixel.colour = 0xff040201; // ARGB::colour is the active member from now on
// somewhere down the line, without any edit to pixel
if(pixel.components.a) // accessing the non-active member ARGB::components
is actually undefined behaviour, i.e. reading from a member of the union other than the one most recently written to leads to undefined behaviour. If this isn't the intended usage of unions, what is? Can someone please explain it elaborately?
Update:
I wanted to clarify a few things in hindsight.
The answer to the question isn't the same for C and C++; my ignorant younger self tagged it as both C and C++.
After scouring through C++11's standard I couldn't conclusively say that it calls out accessing/inspecting a non-active union member is undefined/unspecified/implementation-defined. All I could find was §9.5/1:
If a standard-layout union contains several standard-layout structs that share a common initial sequence, and if an object of this standard-layout union type contains one of the standard-layout structs, it is permitted to inspect the common initial sequence of any of the standard-layout struct members. §9.2/19: Two standard-layout structs share a common initial sequence if corresponding members have layout-compatible types and either neither member is a bit-field or both are bit-fields with the same width for a sequence of one or more initial members.
While in C, (C99 TC3 - DR 283 onwards) it's legal to do so (thanks to Pascal Cuoq for bringing this up). However, attempting to do it can still lead to undefined behavior, if the value read happens to be invalid (so called "trap representation") for the type it is read through. Otherwise, the value read is implementation defined.
C89/90 called this out under unspecified behavior (Annex J) and K&R's book says it's implementation defined. Quote from K&R:
This is the purpose of a union - a single variable that can legitimately hold any of one of several types. [...] so long as the usage is consistent: the type retrieved must be the type most recently stored. It is the programmer's responsibility to keep track of which type is currently stored in a union; the results are implementation-dependent if something is stored as one type and extracted as another.
Extract from Stroustrup's TC++PL (emphasis mine)
Use of unions can be essential for compactness of data [...] sometimes misused for "type conversion".
Above all, this question (whose title remains unchanged since my ask) was posed with an intention of understanding the purpose of unions AND not on what the standard allows E.g. Using inheritance for code reuse is, of course, allowed by the C++ standard, but it wasn't the purpose or the original intention of introducing inheritance as a C++ language feature. This is the reason Andrey's answer continues to remain as the accepted one.
The purpose of unions is rather obvious, but for some reason people miss it quite often.
The purpose of union is to save memory by using the same memory region for storing different objects at different times. That's it.
It is like a room in a hotel. Different people live in it for non-overlapping periods of time. These people never meet, and generally don't know anything about each other. By properly managing the time-sharing of the rooms (i.e. by making sure different people don't get assigned to one room at the same time), a relatively small hotel can provide accommodations to a relatively large number of people, which is what hotels are for.
That's exactly what union does. If you know that several objects in your program hold values with non-overlapping value-lifetimes, then you can "merge" these objects into a union and thus save memory. Just like a hotel room has at most one "active" tenant at each moment of time, a union has at most one "active" member at each moment of program time. Only the "active" member can be read. By writing into other member you switch the "active" status to that other member.
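A minimal illustration of the "one active tenant at a time" idea:
#include <iostream>

union Slot {
    int    i;
    double d;
};

int main() {
    Slot s;
    s.i = 7;                    // i is now the active member
    std::cout << s.i << '\n';   // OK: reading the active member
    s.d = 2.5;                  // writing d makes d the active member
    std::cout << s.d << '\n';   // OK
    // std::cout << s.i;        // would read a non-active member
}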
For some reason, this original purpose of the union got "overridden" with something completely different: writing one member of a union and then inspecting it through another member. This kind of memory reinterpretation (aka "type punning") is not a valid use of unions. It generally leads to undefined behavior (it is described as producing implementation-defined behavior in C89/90).
EDIT: Using unions for the purposes of type punning (i.e. writing one member and then reading another) was given a more detailed definition in one of the Technical Corrigenda to the C99 standard (see DR#257 and DR#283). However, keep in mind that formally this does not protect you from running into undefined behavior by attempting to read a trap representation.
You could use unions to create structs like the following, which contains a field that tells us which component of the union is actually used:
struct VAROBJECT
{
enum o_t { Int, Double, String } objectType;
union
{
int intValue;
double dblValue;
char *strValue;
} value;
} object;
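Hypothetical usage of the struct above: the tag is always written together with the member and checked before reading:
object.objectType = VAROBJECT::Double;
object.value.dblValue = 3.25;

if (object.objectType == VAROBJECT::Double)
    std::cout << object.value.dblValue << '\n'; // read only the member the tag names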
The behavior is undefined from the language point of view. Consider that different platforms can have different constraints in memory alignment and endianness. The code in a big endian versus a little endian machine will update the values in the struct differently. Fixing the behavior in the language would require all implementations to use the same endianness (and memory alignment constraints...) limiting use.
If you are using C++ (you are using two tags) and you really care about portability, then you can just use the struct and provide a setter that takes the uint32_t and sets the fields appropriately through bitmask operations. The same can be done in C with a function.
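For example, a sketch of such a setter using shifts and masks (the byte order chosen here makes b the least significant byte, matching the 0xff040201 example above):
#include <cstdint>

struct ARGB {
    std::uint8_t b, g, r, a;

    void set(std::uint32_t colour) {
        b = static_cast<std::uint8_t>(colour         & 0xFF);
        g = static_cast<std::uint8_t>((colour >> 8)  & 0xFF);
        r = static_cast<std::uint8_t>((colour >> 16) & 0xFF);
        a = static_cast<std::uint8_t>((colour >> 24) & 0xFF);
    }
};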
Edit: I was expecting AProgrammer to write down an answer to vote for and close this one. As some comments have pointed out, endianness is dealt with in other parts of the standard by letting each implementation decide what to do, and alignment and padding can also be handled differently. Now, the strict aliasing rules that AProgrammer implicitly refers to are an important point here. The compiler is allowed to make assumptions about the modification (or lack of modification) of variables. In the case of the union, the compiler could reorder instructions and move the read of each color component ahead of the write to the colour variable.
The most common use of union I regularly come across is aliasing.
Consider the following:
union Vector3f
{
struct{ float x,y,z ; } ;
float elts[3];
};
What does this do? It allows clean, neat access of a Vector3f vec;'s members by either name:
vec.x=vec.y=vec.z=1.f ;
or by integer access into the array
for( int i = 0 ; i < 3 ; i++ )
vec.elts[i]=1.f;
In some cases, accessing by name is the clearest thing you can do. In other cases, especially when the axis is chosen programmatically, the easier thing to do is to access the axis by numerical index - 0 for x, 1 for y, and 2 for z.
As you say, this is strictly undefined behaviour, though it will "work" on many platforms. The real reason for using unions is to create variant records.
union A {
int i;
double d;
};
A a[10]; // records in "a" can be either ints or doubles
a[0].i = 42;
a[1].d = 1.23;
Of course, you also need some sort of discriminator to say what the variant actually contains. And note that in C++ unions are not much use because they can only contain POD types - effectively those without constructors and destructors.
In C it was a nice way to implement something like a variant.
enum possibleTypes{
eInt,
eDouble,
eChar
};
struct Value{
union {
int iVal_;
double dval;
char cVal;
} value_;
enum possibleTypes discriminator_;
};
switch(val.discriminator_)
{
case eInt: val.value_.iVal_; break;
In times of little memory this structure uses less memory than a struct that has all the members.
By the way, C provides
typedef struct {
unsigned int mantissa_low:32; //mantissa
unsigned int mantissa_high:20;
unsigned int exponent:11; //exponent
unsigned int sign:1;
} realVal;
to access bit values.
Although this is strictly undefined behaviour, in practice it will work with pretty much any compiler. It is such a widely used paradigm that any self-respecting compiler will need to do "the right thing" in cases such as this. It's certainly to be preferred over type-punning, which may well generate broken code with some compilers.
In C++, Boost.Variant implements a safe version of the union, designed to prevent undefined behavior as much as possible.
Its performance is identical to the enum + union construct (stack allocated too, etc.) but it uses a template list of types instead of the enum :)
The behaviour may be undefined, but that just means there isn't a "standard". All decent compilers offer #pragmas to control packing and alignment, but may have different defaults. The defaults will also change depending on the optimisation settings used.
Also, unions are not just for saving space. They can help modern compilers with type punning. If you reinterpret_cast<> everything the compiler can't make assumptions about what you are doing. It may have to throw away what it knows about your type and start again (forcing a write back to memory, which is very inefficient these days compared to CPU clock speed).
Technically it's undefined, but in reality most (all?) compilers treat it exactly the same as using a reinterpret_cast from one type to the other, the result of which is implementation defined. I wouldn't lose sleep over your current code.
For one more example of the actual use of unions, the CORBA framework serializes objects using the tagged union approach. All user-defined classes are members of one (huge) union, and an integer identifier tells the demarshaller how to interpret the union.
Others have mentioned the architecture differences (little - big endian).
I read the problem as: since the memory for the variables is shared, writing to one changes the others and, depending on their type, the resulting value could be meaningless.
eg.
union{
float f;
int i;
} x;
Writing to x.i would be meaningless if you then read from x.f - unless that is what you intended in order to look at the sign, exponent or mantissa components of the float.
I think there is also an issue of alignment: If some variables must be word aligned then you might not get the expected result.
eg.
union{
char c[4];
int i;
} x;
If, hypothetically, on some machine a char had to be word aligned then c[0] and c[1] would share storage with i but not c[2] and c[3].
In the C language as it was documented in 1974, all structure members shared a common namespace, and the meaning of "ptr->member" was defined as adding the member's displacement to "ptr" and accessing the resulting address using the member's type. This design made it possible to use the same ptr with member names taken from different structure definitions but with the same offset; programmers used that ability for a variety of purposes.
When structure members were assigned their own namespaces, it became impossible to declare two structure members with the same displacement. Adding unions to the language made it possible to achieve the same semantics that had been available in earlier versions of the language (though the inability to have names exported to an enclosing context may have still necessitated using a find/replace to change foo->member into foo->type1.member). What was important was not so much that the people who added unions had any particular target usage in mind, but rather that they provided a means by which programmers who had relied upon the earlier semantics, for whatever purpose, should still be able to achieve the same semantics even if they had to use a different syntax to do it.
As others mentioned, unions combined with enumerations and wrapped into structs can be used to implement tagged unions. One practical use is to implement Rust's Result<T, E>, which is originally implemented using a pure enum (Rust can hold additional data in enumeration variants). Here is a C++ example:
#include <cstdint>

template <typename T, typename E> struct Result {
public:
enum class Success : std::uint8_t { Ok, Err };
Result(T val) {
m_success = Success::Ok;
m_value.ok = val;
}
Result(E val) {
m_success = Success::Err;
m_value.err = val;
}
inline bool operator==(const Result& other) {
return other.m_success == this->m_success;
}
inline bool operator!=(const Result& other) {
return other.m_success != this->m_success;
}
inline T expect(const char* errorMsg) {
if (m_success == Success::Err) throw errorMsg;
else return m_value.ok;
}
inline bool is_ok() {
return m_success == Success::Ok;
}
inline bool is_err() {
return m_success == Success::Err;
}
inline const T* ok() {
if (is_ok()) return &m_value.ok;
else return nullptr;
}
inline const E* err() {
if (is_err()) return &m_value.err;
else return nullptr;
}
// Other methods from https://doc.rust-lang.org/std/result/enum.Result.html
private:
Success m_success;
union _val_t { T ok; E err; } m_value;
};
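For completeness, a hypothetical usage of the sketch above (here the error type is just a string literal):
// returns Ok(v) for positive input, Err otherwise
Result<int, const char*> parse_positive(int v) {
    if (v > 0) return Result<int, const char*>(v);
    return Result<int, const char*>("value must be positive");
}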
You can use a union for two main reasons:
A handy way to access the same data in different ways, like in your example
A way to save space when there are different data members of which only one can ever be 'active'
1. is really more of a C-style hack to short-cut writing code on the basis that you know how the target system's memory architecture works. As already said, you can normally get away with it if you don't actually target lots of different platforms. I believe some compilers might let you use packing directives as well (I know they do for structs)?
A good example of 2. can be found in the VARIANT type used extensively in COM.
@bobobobo's code is correct, as @Joshua pointed out (sadly I'm not allowed to add comments, so doing it here; IMO a bad decision to disallow that in the first place):
https://en.cppreference.com/w/cpp/language/data_members#Standard_layout says that it is fine to do so, at least since C++14:
In a standard-layout union with an active member of non-union class type T1, it is permitted to read a non-static data member m of another union member of non-union class type T2 provided m is part of the common initial sequence of T1 and T2 (except that reading a volatile member through non-volatile glvalue is undefined).
since in the current case T1 and T2 denote the same type anyway.
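A minimal illustration of the common-initial-sequence rule the quote describes:
struct S1 { int kind; int a; };
struct S2 { int kind; float b; };

union U { S1 s1; S2 s2; };

int probe(U& u) {
    u.s1 = {1, 42};        // s1 becomes the active member
    return u.s2.kind;      // OK: kind is in the common initial sequence of S1 and S2
}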