Is the handling of [[no_unique_address]] implementation-defined? - c++

The code below is excerpted from cppreference and reduced to a minimal demo:
#include <iostream>
struct Empty {}; // empty class
struct W
{
char c[2];
[[no_unique_address]] Empty e1, e2;
};
int main()
{
std::cout << std::boolalpha;
// e1 and e2 cannot have the same address, but one of them can share with
// c[0] and the other with c[1]
std::cout << "sizeof(W) == 2 is " << (sizeof(W) == 2) << '\n';
}
The documentation says the output might be:
sizeof(W) == 2 is true
However, both gcc and clang output as follows:
sizeof(W) == 2 is false
Is the handling of [[no_unique_address]] implementation-defined?

See [intro.object]/8:
An object has nonzero size if ... Otherwise, if the object is a base class subobject of a standard-layout class type with no non-static data members, it has zero size.
Otherwise, the circumstances under which the object has zero size are implementation-defined.
Empty base class optimization became mandatory for standard-layout classes in C++11 (see here for discussion). Empty member optimization is never mandatory. It is implementation-defined, as you suspected.

Yes, virtually all aspects of no_unique_address are implementation-defined. It is a tool to allow for optimizations, not to enforce them.
That being said, you should never assume that no_unique_address will work when you attempt to have two subobjects with the same type. The standard still requires that all distinct subobjects of the same type have different addresses, no_unique_address or not. And while it is possible that the compiler could assign these empty subobjects distinct addresses by radically reordering the members... they're pretty much not going to do that.
Your best bet for reasonably taking advantage of no_unique_address optimizations is to never have two subobjects of the same type, and try to put all possibly empty members first. That is, you should expect implementations to assign an empty no_unique_address member to the offset of the next member (or to the offset of the containing struct as a whole).
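The advice above can be sketched as follows. This is only an illustration: the type names EmptyA, EmptyB, Packed and SameType and the helper function are made up for the example, and whether Packed actually shrinks to two bytes remains implementation-defined, so the code checks only what the standard guarantees.

```cpp
struct EmptyA {};  // hypothetical empty types, distinct from each other
struct EmptyB {};

// Possibly-empty members come first and have different types, so an
// implementation is free (but not required) to overlap them with `c`.
struct Packed {
    [[no_unique_address]] EmptyA a;
    [[no_unique_address]] EmptyB b;
    char c[2];
};

// Two members of the same empty type must still get distinct addresses,
// with or without the attribute.
struct SameType {
    [[no_unique_address]] EmptyA first;
    [[no_unique_address]] EmptyA second;
};

// The only portable statement about the size: it is at least that of `c`.
static_assert(sizeof(Packed) >= sizeof(char[2]), "Packed must contain c");

// Returns true when the two same-type members have different addresses.
inline bool same_type_members_are_distinct() {
    SameType s;
    return &s.first != &s.second;
}
```

Printing sizeof(Packed) on a given compiler shows whether it took the opportunity to overlap the empty members.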

Related

Do data member addresses lie between (this) and (this+1)?

Suppose that we have the following two inequalities inside a member function
this <= (void *) &this->data_member
and
&this->data_member < (void *) (this+1)
Are they guaranteed to be true?
(They seem to be true in a few cases that I checked.)
Edit: I missed ampersands, now it's the correct form of inequalities.
From CPP standard draft 4713:
6.6.2 Object model [intro.object]/7
An object of trivially copyable or standard-layout type (6.7) shall occupy contiguous bytes of storage.
12.2 Class members [class.mem]/18
Non-static data members of a (non-union) class with the same access control (Clause 14) are allocated so that later members have higher addresses within a class object.
12.2 Class members [class.mem]/25
If a standard-layout class object has any non-static data members, its address is the same as the address of its first non-static data member. Otherwise, its address is the same as the address of its first base class subobject (if any).
Taking all the above together, we can say the first inequality holds for at least trivially copyable objects.
Also from the online cpp reference:
The result of comparing two pointers to objects (after conversions) is defined as follows:
1) If two pointers point to different elements of the same array, or to subobjects within different elements of the same array, the pointer to the element with the higher subscript compares greater. In other words, the result of comparing the pointers is the same as the result of comparing the indexes of the elements they point to.
2) If one pointer points to an element of an array, or to a subobject of the element of the array, and another pointer points one past the last element of the array, the latter pointer compares greater. Pointers to single objects are treated as pointers to arrays of one: &obj+1 compares greater than &obj (since C++17)
So if your data_member is not a pointer and has not been allocated memory separately, the inequalities you have posted hold for at least trivially copyable objects.
The full standard text amounts to this:
[expr.rel] - 4: The result of comparing unequal pointers to objects (footnote 82) is defined in terms of a partial order consistent with the following rules:
We are dealing with a partial order here, not a total order. That does mean a < b and b < c implies a < c, but not much else.
(Note 82 states that non-array objects are considered elements of a single-element array for this purpose, with the intuitive meaning/behavior of "pointer to element one past the end").
(4.1)
If two pointers point to different elements of the same array, or to subobjects thereof, the pointer to the element with the higher subscript is required to compare greater.
Pointers to different members are not pointers to (subobjects of) elements of the same array. This rule does not apply.
(4.2)
If two pointers point to different non-static data members of the same object, or to subobjects of such members, recursively, the pointer to the later declared member is required to compare greater provided the two members have the same access control ([class.access]), neither member is a subobject of zero size, and their class is not a union.
This rule only relates pointers to data members of the same object, not of a different object.
(4.3)
Otherwise, neither pointer is required to compare greater than the other.
Thus, you do not get any guarantees from the standard. Whether you can find a real-world system where you get a different result than you expect is another question.
The value of an object is within its representation, and this representation is a sequence of unsigned char: [basic.types]/4
The object representation of an object of type T is the sequence of N unsigned char objects taken up by the object of type T, where N equals sizeof(T).
The value representation of an object of type T is the set of bits that participate in representing a value of type T.[...]
So for formalism fundamentalists, it is true that value is not defined but the term appears in the definition of access:[defns.access]:
read or modify the value of an object
So is the value of a subobject part of the value of a complete object? I suppose this is what is intended by the standard.
The comparison should be true if you cast object pointers to unsigned char*. (This is a common practice that falls on an under-specification core issue #1701)
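A minimal sketch of that unsigned char* comparison is shown below. The helper lies_within and the type Pair are hypothetical names for this example. It uses std::less, which provides a strict total order over pointers even where the built-in < is unspecified; the pointer arithmetic on the cast pointer still relies on the common practice mentioned above rather than on an explicit guarantee.

```cpp
#include <functional>

// Hypothetical helper: does `p` fall inside the storage [obj, obj + 1)?
template <class T>
bool lies_within(const T* obj, const void* p) {
    // View the object as its sequence of sizeof(T) bytes.
    const unsigned char* first = reinterpret_cast<const unsigned char*>(obj);
    const unsigned char* last = first + sizeof(T);
    const unsigned char* q = static_cast<const unsigned char*>(p);

    // std::less gives a total order over pointers, unlike built-in <.
    std::less<const unsigned char*> before;
    return !before(q, first) && before(q, last);
}

// Example type with two ordinary data members.
struct Pair {
    int first_member;
    int second_member;
};
```

For a simple type such as Pair the helper is expected to report true for both members. For members inherited from a virtual base it may report false, because the base subobject can live outside the bytes of the derived subobject.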
No, here's a counterexample:
#include <iostream>
#include <memory> // std::addressof
struct A
{
int a_member[10];
};
struct B : public virtual A
{
int b_member[10];
void print_b() { std::cout << static_cast<void*>(this) << " " << static_cast<void*>(std::addressof(this->a_member)) << " " << static_cast<void*>(this + 1) << std::endl; }
};
struct C : public virtual A
{
int c_member[10];
void print_c() { std::cout << static_cast<void*>(this) << " " << static_cast<void*>(std::addressof(this->a_member)) << " " << static_cast<void*>(this + 1) << std::endl; }
};
struct D : public B, public C
{
void print_d()
{
print_b();
print_c();
}
};
int main()
{
D d;
d.print_d();
}
With the possible output (as seen here)
0x7fffc6bf9fb0 0x7fffc6bfa010 0x7fffc6bfa008
0x7fffc6bf9fe0 0x7fffc6bfa010 0x7fffc6bfa038
Note that a_member lies outside of the B object pointed to by this in print_b.
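The difference between the two situations can be checked at compile time. The sketch below reuses the names A and B from the counterexample and adds a hypothetical helper; it assumes only the <type_traits> facilities available since C++11. A class with a virtual base is not standard-layout, so the [class.mem] guarantee about the first member's address does not apply to it.

```cpp
#include <type_traits>

struct A {
    int a_member[10];
};

struct B : virtual A {
    int b_member[10];
};

// A is standard-layout; B is not, because it has a virtual base class.
static_assert(std::is_standard_layout<A>::value, "A is standard-layout");
static_assert(!std::is_standard_layout<B>::value, "B has a virtual base");

// For a standard-layout class the object and its first non-static data
// member share an address, so the first inequality holds with equality.
inline bool first_member_at_object_address() {
    A a{};
    return static_cast<void*>(&a) == static_cast<void*>(&a.a_member);
}
```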

Do array elements count as a common initial sequence?

Sort of related to my previous question:
Do elements of arrays count as a common initial sequence?
struct arr4 { int arr[4]; };
struct arr2 { int arr[2]; };
union U
{
arr4 _arr4;
arr2 _arr2;
};
U u;
u._arr4.arr[0] = 0; //write to active
u._arr2.arr[0]; //read from inactive
According to this cppreference page:
In a standard-layout union with an active member of non-union class type T1, it is permitted to read a non-static data member m of another union member of non-union class type T2 provided m is part of the common initial sequence of T1 and T2....
Would this be legal, or would it also be illegal type punning?
C++11 says (9.2):
If a standard-layout union contains two or more standard-layout structs that share a common initial sequence,
and if the standard-layout union object currently contains one of these standard-layout structs, it is permitted
to inspect the common initial part of any of them. Two standard-layout structs share a common initial
sequence if corresponding members have layout-compatible types and either neither member is a bit-field or
both are bit-fields with the same width for a sequence of one or more initial members.
As to whether arrays of different size form a valid common initial sequence, 3.9 says:
If two types T1 and T2 are the same type, then T1 and T2 are layout-compatible types
These arrays are not the same type, so this doesn't apply. There is no special further exception for arrays, so the arrays may not be layout-compatible and do not form a common initial sequence.
In practice, though, I know of a compiler (GCC) which:
ignores the "common initial sequence" rule, and
allows type punning anyway, but only when accesses are "via the union type" (as in your example), in which case the "common initial sequence" rule is obeyed indirectly (because a "common initial sequence" implies a common initial layout on the architectures the compiler supports).
I suspect many other compilers take a similar approach. In your example, where you type-pun via the union object, such compilers will give you the expected result: reading from the inactive member should give you the value written via the active member.
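If the code should not depend on that compiler behaviour, one option is to copy the bytes out of the active member instead of reading the inactive one. The following is a sketch under that assumption, not something proposed in the answers above; the helper first_two is a hypothetical name, and it must only be called while _arr4 is the active member.

```cpp
#include <cstring>
#include <type_traits>

struct arr4 { int arr[4]; };
struct arr2 { int arr[2]; };

union U {
    arr4 _arr4;
    arr2 _arr2;
};

// int[4] and int[2] are different types, so the "same type" rule quoted
// above does not make them layout-compatible.
static_assert(!std::is_same<decltype(arr4::arr), decltype(arr2::arr)>::value,
              "arrays with different bounds are distinct types");
static_assert(std::is_trivially_copyable<arr4>::value &&
              std::is_trivially_copyable<arr2>::value,
              "the memcpy below relies on trivially copyable types");
static_assert(sizeof(arr2) <= sizeof(arr4), "copy must stay inside arr4");

// Copies the leading bytes of the active member into a separate arr2
// rather than reading the inactive union member directly.
inline arr2 first_two(const U& u) {
    arr2 out;
    std::memcpy(&out, &u._arr4, sizeof(out));
    return out;
}
```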
The C Standard would allow an implementation to vary the placement of an array object within a structure based upon the number of elements. Among other things, there may be some circumstances where it may be useful to word-align a byte array which would occupy exactly one word, but not to word-align arrays of other sizes. For example, on a system with 8-bit char and 32-bit words, processing a structure such as:
struct foo {
char header;
char dat[4];
};
in a manner that word-aligns dat may allow an access to dat[i] to be processed by loading a word and shifting it right by 0, 8, 16, or 24 bits, but such advantages might not be applicable had the structure instead been:
struct foo {
char header;
char dat[5];
};
The Standard was clearly not intended to forbid implementations from laying out structures in such ways, on platforms where doing so would be useful. On the other hand, when the Standard was written, compilers which would place arrays within a structure at offsets that were unaffected by the arrays' sizes would unanimously behave as though array elements that were present in two structures were part of the same Common Initial Sequence, and nothing in the published Rationale for the Standard suggests any intention to discourage such implementations from continuing to behave in such fashion. Code which relied upon such treatment would have been "non-portable", but correct on all implementations which followed common struct layout practices.
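Whether a particular implementation follows the common layout practice can be checked with offsetof. In the sketch below the two variants of struct foo are given the hypothetical names foo4 and foo5 so that both can coexist, and the code is written as C++ although the same check works in C.

```cpp
#include <cstddef>

struct foo4 {
    char header;
    char dat[4];
};

struct foo5 {
    char header;
    char dat[5];
};

// True when this implementation places `dat` at the same offset in both
// structs, i.e. the array's size did not influence its placement.
constexpr bool dat_offsets_match() {
    return offsetof(foo4, dat) == offsetof(foo5, dat);
}
```

On an implementation where dat_offsets_match() is false, code that treats the leading array elements as a shared prefix would break, which is the non-portability described above.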

Union initialization

What are the rules that govern the uninitialized bytes of a union? (Assuming some are initialized.)
Below is a 32-byte union of which I initialize only the first 16 bytes via the first member.
It seems the remaining bytes are zero-initialized. That's great for my use case, but I am wondering what the rule behind this is. I was expecting garbage.
#include <cstdint>
#include <iostream>
using namespace std;
union Blah {
struct {
int64_t a;
int64_t b;
};
int64_t c[4];
};
int main()
{
Blah b = {{ 1, 2 }}; // initialize first member, so only the first 16 bytes.
// prints 1, 2, 0, 0 -- not 1, 2, <garbage>, <garbage>
cout << b.c[0] << ", " << b.c[1] << ", " << b.c[2] << ", " << b.c[3] << '\n';
return 0;
}
I've compiled on GCC 4.7.2 with -O3 -Wall -Wextra -pedantic (that last one required giving a name to the anonymous struct). That hopefully should save me from being lucky.
I've also tried to overlay two variables with two different scopes on the stack but gcc didn't give them the same address.
I've also tried replacing the array by another struct in that case that would have mattered, but it didn't change anything.
I can't access online compilers from here, they're blocked by my work.
The most pertinent part of the C11 standard 6.2.6.1.7, while not speaking specifically to initialization:
When a value is stored in a member of an object of union type, the
bytes of the object representation that do not correspond to that
member but do correspond to other members take unspecified values.
Section 6.7.9.17 says:
Each brace-enclosed initializer list has an associated current object.
When no designations are present, subobjects of the current object are
initialized in order according to the type of the current object:
array elements in increasing subscript order, structure members in
declaration order, and the first named member of a union.
but doesn't explicitly come out and say the other bits are not initialized. For static unions, 6.7.9.10 says:
the first named member is initialized (recursively) according to these
rules, and any padding is initialized to zero bits;
so the first named member and any padding bits would be zero-initialized, but the bits corresponding to other (by implication, larger) members of the union would be unspecified.
So you cannot count on those extra bytes being initialized to zero.
Note that technically, even if you do initialize your c array to zero, the moment you store something in your struct those excess bits become unspecified again, and you can't count on them still being zero. There's a lot of code out there which assumes this is true (e.g. putting a char array in a union to access the individual bytes), and in reality it probably will be, but the standard doesn't guarantee it.
Brace-enclosed initializers for a union are only permitted to initialize the first member. This is fine, and your initializer does initialize the anonymous struct, and causes the first member to be the active member.
In C++ only one member of a union may be active at any time. Trying to read the other members via the union causes undefined behaviour. Trying to read them by aliasing them as a character type gives unspecified values.
So I would say that the observed behavior is backed by the standard.
ISO/IEC 9899:201x in 6.7.9 (Initialization) statement 12 says:
If there are fewer initializers in a brace-enclosed list than there are elements or members
of an aggregate, or fewer characters in a string literal used to initialize an array of known
size than there are elements in the array, the remainder of the aggregate shall be
initialized implicitly the same as objects that have static storage duration.
Static objects are initialized to 0 (see 6.7.9.10 or The initialization of static variables in C).
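One way to turn the observed zeros into a guarantee is to make the array the member that gets initialized, so that the rule about missing initializers applies to it directly. The following is a sketch of that idea; Blah2, Pair, p and make_blah are hypothetical names, and the struct is named because anonymous structs are an extension.

```cpp
#include <cstdint>

// Variant of the union above with the array declared first, so that
// brace-initialization targets the array rather than the struct.
union Blah2 {
    std::int64_t c[4];
    struct Pair {
        std::int64_t a;
        std::int64_t b;
    } p;
};

// Only two initializers are given; the remaining array elements are
// zero-initialized by the aggregate-initialization rule.
inline Blah2 make_blah() {
    Blah2 b = {{1, 2}};
    return b;
}
```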

Are anonymous unions acceptable for aliasing member variables in a struct?

Let's say that I have the following C++ code:
struct something
{
// ...
union { int size, length; };
// ...
};
This would create two members of the struct which access the same value: size and length.
Would treating the two members as complete aliases (i.e. setting the size, then accessing the length and vice/versa) be undefined behaviour? Is there a "better" way to implement this type of behaviour, or is this an acceptable implementation?
Yes, this is allowed and well-defined. According to §3.10 [basic.lval]:
10/ If a program attempts to access the stored value of an object through a glvalue of other than one of the following types the behavior is undefined:
— the dynamic type of the object
[...]
Since here we store an int and read through an int, we access the object through a glvalue of the same dynamic type as the object, thus things are fine.
There even is a special caveat in the Standard for structures that share the same prefix. Or, in standardese, standard-layout types that share a common initial sequence.
§9.2/18 If a standard-layout union contains two or more standard-layout structs that share a common initial sequence, and if the standard-layout union object currently contains one of these standard-layout structs, it is permitted to inspect the common initial part of any of them. Two standard-layout structs share a common initial sequence if corresponding members have layout-compatible types and either neither member is a bit-field or both are bit-fields with the same width for a sequence of one or more initial members.
That is:
struct A { unsigned size; char type; };
struct B { unsigned length; unsigned capacity; };
union { A a; B b; } x;
assert(x.a.size == x.b.length);
EDIT: Given that int is not a struct (nor a class) I am afraid it's actually not formally defined (I certainly could not see anything in the Standard), but should be safe in practice... I've brought the matters to the isocpp forums; you might have found a hole.
EDIT: Following the above-mentioned discussion, I have been shown §3.10/10.
It is not undefined behavior. Both of the aliases in the union will be accessing the same location in the memory. See below:
§9.2/18 If a standard-layout union contains two or more
standard-layout structs that share a common initial sequence, and if
the standard-layout union object currently contains one of these
standard-layout structs, it is permitted to inspect the common initial
part of any of them. Two standard-layout structs share a common
initial sequence if corresponding members have layout-compatible types
and either neither member is a bit-field or both are bit-fields with
the same width for a sequence of one or more initial members.
It is undefined if the types have a different initial sequence.
The values will be the same. If you assign 5 to size then length will also be 5.
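On the question of whether there is a "better" way: one alternative that avoids the union altogether is to store a single int and expose the second name through accessor functions. This is a sketch of a different technique, not something taken from the answers above, and the accessor name length() is chosen only to mirror the question.

```cpp
// One stored value, two names: `size` is the data member and `length()`
// returns a reference to the same int.
struct something {
    int size = 0;

    int& length() { return size; }
    const int& length() const { return size; }
};
```

With this layout, writing s.length() = 7 changes s.size as well, and no rule about active union members is involved.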

Can sizeof return 0 (zero)

Is it possible for the sizeof operator to ever return 0 (zero) in C or C++? If it is possible, is it correct from a standards point of view?
In C++ an empty class or struct has a sizeof at least 1 by definition. From the C++ standard, 9/3 "Classes": "Complete objects and member subobjects of class type shall have nonzero size."
In C an empty struct is not permitted, except by extension (or a flaw in the compiler).
This is a consequence of the grammar (which requires that there be something inside the braces) along with this sentence from 6.7.2.1/7 "Structure and union specifiers": "If the struct-declaration-list contains no named members, the behavior is undefined".
If a zero-sized structure is permitted, then it's a language extension (or a flaw in the compiler). For example, in GCC the extension is documented in "Structures with No Members", which says:
GCC permits a C structure to have no members:
struct empty {
};
The structure will have size zero. In C++, empty structures are part of the language. G++ treats empty structures as if they had a single member of type char.
sizeof never returns 0 in C or in C++. Every time you see sizeof evaluating to 0 it is a bug/glitch/extension of a specific compiler that has nothing to do with the language.
Every object in C must have a unique address. Worded another way, an address must hold no more than one object of a given type (in order for pointer dereferencing to work). That being said, consider an 'empty' struct:
struct emptyStruct {};
and, more specifically, an array of them:
struct emptyStruct array[10];
struct emptyStruct* ptr = &array[0];
If the objects were indeed empty (that is, if sizeof(struct emptyStruct) == 0), then ptr++ ==> (void*)ptr + sizeof(struct emptyStruct) ==> ptr, which doesn't make sense. Which object would *ptr then refer to, ptr[0] or ptr[1]?
Even if a structure has no contents, the compiler should treat it as if it is one byte in length in order to maintain the "one address, one object" principle.
The C language specification (section A7.4.8) words this requirement as
when applied to a structure or union,
the result (of the sizeof operator)
is the number of bytes in the object,
including any padding required to make
the object tile an array
Since a padding byte must be added to an "empty" object in order for it to work in an array, sizeof() must therefore return a value of at least 1 for any valid input.
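The "one address, one object" argument can be checked directly in C++, where empty structs are part of the language. The sketch below uses the hypothetical names EmptyStruct and elements_have_distinct_addresses; it says nothing about compilers that accept empty structs in C as an extension.

```cpp
struct EmptyStruct {};  // valid C++; standard C requires at least one member

// A complete object of class type has nonzero size.
static_assert(sizeof(EmptyStruct) >= 1, "complete objects have nonzero size");

// Returns true when every element of an array of empty structs has its own
// address, which is what a nonzero size makes possible.
inline bool elements_have_distinct_addresses() {
    EmptyStruct array[3];
    return &array[0] != &array[1] && &array[1] != &array[2];
}
```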
Edit:
Section A8.3 of the C spec calls a struct without a list of members an incomplete type, and the definition of sizeof specifically states (with emphasis added):
The operator (sizeof) may not be
applied to an operand of function
type, or of incomplete type, or to a
bit-field.
That would imply that using sizeof on an empty struct would be equally as invalid as using it on a data type that has not been defined. If your compiler allows the use of empty structs, be aware that using sizeof on them is not allowed as per the C spec. If your compiler allows you to do this anyway, understand that this is non-standard behavior that will not work on all compilers; do not rely on this behavior.
Edit: See also this entry in Bjarne Stroustrup's FAQ.
Empty structs, as isbadawi mentions. Also gcc allows arrays of 0 size:
int a[0];
sizeof(a);
EDIT: After seeing the MSDN link, I tried the empty struct in VS2005 and sizeof did return 1. I'm not sure if that's a VS bug or if the spec is somehow flexible about that sort of thing
In my view, it would be better for sizeof to return 0 for a structure of size 0 (in the spirit of C).
The programmer would then have to be careful when taking the sizeof of an empty struct,
and it may cause a problem:
when an array of such structures is defined, then
&arr[1] == &arr[2] == &arr[0]
which makes the elements lose their identities.
I guess this doesn't directly answer your question of whether it is possible or not.
It may be possible depending on the compiler (as said in Michael's answer above).
typedef struct {
int : 0;
} x;
x x1;
x x2;
Under MSVC 2010 (/Za /Wall):
sizeof(x) == 4
&x1 != &x2
Under GCC (-ansi -pedantic -Wall) :
sizeof(x) == 0
&x1 != &x2
i.e. Even though under GCC it has zero size, instances of the struct have distinct addresses.
ANSI C (C89 and C99 - I haven't looked at C++) says "It shall be possible to express the address of each individual byte of an object uniquely." This seems ambiguous in the case of a zero-sized object, since it arguably has no bytes.
Edit: "A bit-field declaration with no declarator, but only a colon and a width, indicates an unnamed bit-field. As a special case of this, a bit-field with a width of 0 indicates that no further bit-field is to be packed into the unit in which the previous bit-field, if any, was placed."
I think it never returns 0 in C, since empty structs are not allowed.
Here's a test, where sizeof yields 0
#include <stdio.h>
void func(int i)
{
int vla[i];
printf ("%u\n",(unsigned)sizeof vla);
}
int main(void)
{
func(0);
return 0;
}
If you have this :
struct Foo {};
struct Bar { Foo v[]; };
g++ -ansi returns sizeof(Bar) == 0. As does the clang & intel compiler.
However, this does not compile with gcc. I deduce it's a C++ extension.
struct Empty {
} em;
struct Zero {
Empty a[0];
} zr;
printf("em=%zu\n", sizeof(em)); /* %zu is the conversion for size_t */
printf("zr=%zu\n", sizeof(zr));
Result:
em=1
zr=0