C++ standard for member offsets of standard layout struct - c++

Does the C++11 standard guarantee that all compilers will choose the same memory offsets for all members in a given standard layout struct, assuming all members have guaranteed sizes (e.g. int32_t instead of int)?
That is, for a given member in a standard layout struct, does C++11 guarantee that offsetof will give the same value across all compilers?
If so, is there any specification of what that value would be, e.g. as a function of size, alignment, and order of the struct members?

There is no guarantee that offsetof will yield the same values across compilers.
There are guarantees about minimum sizes of types (e.g., char >= 8 bits, short, int >= 16 bits, long >= 32 bits, long long >= 64 bits), and the relationship between sizes1 (sizeof(char) <= sizeof(short) <= sizeof(int) <= sizeof(long) <= sizeof(long long)).
For most types, it's guaranteed that the alignment requirement is no greater than the size of the type.
For any struct/class, the first (non-static2) element must be at the beginning of the class/struct, and in the absence of changes to the visibility, order is guaranteed to be in the order of definition. For example:
struct { // same if you use `class`
int a;
int b;
};
Since these are both public, a and b must be in that order. But:
struct {
int a;
int b;
private:
int c;
};
The first element (a) is required to be at the beginning of the struct, but because of the change from public to private, the compiler is (theoretically) allowed to arrange c before b.
This rule has changed over time though. In C++98, even a vacuous visibility specifier allowed rearrangement of members.
struct A {
int a;
int b;
public:
int c;
};
The public allows rearranging b and c even though they're both public. Since then it's been tightened up so it's only elements with differing visibility, and in C++ 23 the whole idea of rearranging elements based on visibility is gone (and long past time, in my opinion--I don't think anybody ever used it, so it's always been a rule you sort of needed to know, but did nobody any real good).
If you want to get really technical, the requirement isn't really on the size, but on the range, so in theory the relationship between sizes isn't quite guaranteed, but for for most practical purposes, it is.
A static element isn't normally allocated as part of the class/struct object at all. A static member is basically allocated as a global variable, but with some extra rules about visibility of its name.

No, there are no such guarantees. The C++ standard explicitly provides for type-specific padding and alignment requirements, for one thing, and that automatically dissolves this kind of guarantee.
It might be reasonable to anticipate uniform padding and alignment requirements for a specific hardware platform, that all compilers on that platform will implement, but that again is not guaranteed.

Absolutely not. Memory layout is completely up to the C++ implementation. The only exception, only for standard-layout classes, is that the first non-static data member or base class subobject(s) have zero offset. There are also some other constraints, e.g. due to sizes and alignment of subobjects and constraints on ordering of addresses of subobjects, but nothing that determines concrete offsets of subobjects.
However, typically compilers follow some ABI specification on any given architecture/platform, so that compilers for the same architecture/platform will likely use the same ABI and same memory layout (e.g. the SysV x86-64 ABI together with the Itanium C++ ABI on Linux x86-64 at least for both GCC and Clang).

Related

Is the address of the first data member of an instance the same as the address of the instance? [duplicate]

Can I assume that a C/C++ struct pointer will always point to the first member?
Example 1:
typedef struct {
unsigned char array_a[2];
unsigned char array_b[5];
}test;
//..
test var;
//..
In the above example will &var always point to array_a?
Also in the above example is it possible to cast the pointer
to an unsigned char pointer and access each byte separately?
Example 2:
function((unsigned char *)&var,sizeof(test));
//...
//...
void function(unsigned char *array, int len){
int i;
for( i=0; i<len; i++){
array[i]++;
}
}
Will that work correctly?
Note: I know that chars are byte aligned in a struct therefore I assume the size of the above struct is 7 bytes.
For C structs, yes, you can rely on it. This is how almost all "object orientated"-style APIs work in C (such as GObject and GTK).
For C++, you can rely on it only for "plain old data" (POD) types, which are guaranteed to be laid out in memory the same way as C structs. Exactly what constitutes a POD type is a little complicated and has changed between C++03 and C++11, but the crux of it is that if your type has any virtual functions then it's not a POD.
(In C++11 you can use std::is_pod to test at compile-time whether a struct is a POD type.)
EDIT: This tells you what constitutes a POD type in C++: http://en.cppreference.com/w/cpp/concept/PODType
EDIT2: Actually, in C++11, it doesn't need to be a POD, just "standard layout", which is a lightly weaker condition. Quoth section 9.2 [class.mem] paragraph 20 of the standard:
A pointer to a standard-layout struct object, suitably converted using a reinterpret_cast, points to its
initial member (or if that member is a bit-field, then to the unit in which it resides) and vice versa. [ Note:
There might therefore be unnamed padding within a standard-layout struct object, but not at its beginning,
as necessary to achieve appropriate alignment. — end note ]
From the C99 standard section 6.7.2.1 bullet point 13:
Within a structure object, the non-bit-field members and the units in
which bit-fields reside have addresses that increase in the order in
which they are declared. A pointer to a structure object, suitably
converted, points to its initial member (or if that member is a
bit-field, then to the unit in which it resides), and vice versa.
There may be unnamed padding within a structure object, but not at its
beginning.
The answer to your question is therefore yes.
Reference (see page 103)
The compiler is free to add padding and reorganize the struct how it sees fit. Especially in C++ you can add (virtual) functions and then chances are that the virtual table is hidden before that. But of course that are implementation details.
For C this assumption is valid.
For C, it's largely implementation-specific, but in practice the rule (in the absence of #pragma pack or something likewise) is:
Struct members are stored in the order they are declared. (This is required by the C99 standard, as mentioned here earlier.)
If necessary, padding is added before each struct member, to ensure correct alignment.
So given a struct like
struct test{
char ch;
int i;
}
will have ch at offset 0, then a padding byte to align, i at offset 2 and then at the end, padding bytes are added to make the struct size a multiple of 8 bytes.(on a 64-bit machine, 4 byte alignment may be permitted in 32 bit machines)
So at least in this case, for C, I think you can assume that the struct pointer will point to the first array.

Does the C/C++ memory model allow atomics at different granularities for the same bytes?

Suppose that I have the following type:
struct T {
int32 high;
int32 low;
};
Is it defined behavior to perform atomic accesses (using, e.g. atomic_load, atomic_fetch_add) on all of x, &x->high, and &x->low (assuming U* x)?
My understanding is that the C/C++ memory models are defined using histories of over individual locations (to accommodate weak memory architectures). If accesses can cross locations, does this mean synchronization across locations? If that is the case, then I assume that would imply that histories are essentially per-byte and accessing an int is just like synchronizing across the underlying 4 (or 8) bytes.
edit: revised the example to avoid the union since the main part of the question is about the concurrency model.
edit: revised to use the standard atomics from stdatomic.h
For C11/C18 (I cannot speak about C++) the Standard atomic_xxx() functions of <stdatomic.h> are only defined to take _Atomic qualified arguments. So to do atomic_xxx() operations on the fields of your struct T you would need:
struct T {
_Atomic int32 high;
_Atomic int32 low;
} ;
struct T foo, bar ;
and then you would be able to do (for example) atomic_fetch_add(&foo->high, 42). But bar = atomic_load(&foo) would be undefined.
Conversely, you could have:
struct T {
int32 high;
int32 low;
} ;
_Atomic struct T foo ;
struct T bar ;
and now bar = atomic_load(&foo) is defined. But access to any individual field in foo is undefined -- whether or not it's _Atomic.
Going by the Standard, _Atomic xxxx objects should be thought of as entirely distinct from "ordinary" xxxx objects -- they may have different sizes, representations and alignments. Casting an xxxx to/from an _Atomic xxxx is, therefore, no more sensible than casting one struct to/from another, different struct.
But, for gcc and the __atomic_xxx() built-ins, you can do whatever the processor will support. Indeed, for gcc the (otherwise) standard atomic_xxx() will accept arguments which are not_Atomic qualified types, and are mapped to the built-ins. clang, on the other hand, treats passing a not _Atomic qualified type to the standard functions as an error. IMHO this is a bug in gcc's <stdatomic.h>.
Loading an inactive union member yields an indeterminate value regardless of using or not using __atomic_load to do that load.

If `atomic<T>` is lock free and has the same size as `T`, will the memory layout be the same?

This question here indicates that std::atomic<T> is generally supposed to have the same size as T, and indeed that seems to be the case for gcc, clang, and msvc on x86, x64, and ARM.
In an implementation where std::atomic<T> is always lock free for some type T, is it's memory layout guaranteed to be the same as the memory layout of T? Are there any additional special requirements imposed by std::atomic, such as alignment?
Upon reviewing [atomics.types.generic], which the answer you linked quotes in part, the only remark regarding alignment is the note which you saw before:
Note: The representation of an atomic specialization need not have the same size as its corresponding argument type. Specializations should have the same size whenever possible, as this reduces the effort required to port existing code
In a newer version:
The representation of an atomic specialization
need not have the same size and alignment requirement as
its corresponding argument type.
Moreover, at least one architecture, IA64, gives a requirement for atomic behavior of instructions such as cmpxchg.acq, which indicates that it's likely that a compiler targeting IA64 may need to align atomic types differently than non-atomic types, even in the absence of a lock.
Furthermore, the use of a compiler feature such as packed structs will cause alignment to differ between atomic and non-atomic variants. Consider the following example:
#include <atomic>
#include <iostream>
struct __attribute__ ((packed)) atom{
char a;
std::atomic_long b;
};
struct __attribute__ ((packed)) nonatom{
char a;
long b;
};
atom atom1;
nonatom nonatom1;
int disp_aligns(int num) {
std::cout<< alignof(atom1.b) << std::endl;
std::cout<< alignof(nonatom1.b) << std::endl;
}
On at least one configuration, the alignment of atom1.b will be on an 8-byte boundary, while the alignment of nonatom1.b will be on a 1-byte boundary. However, this is under the supposition that we requested that the structs be packed; it's not clear whether you are interested in this case.
From the standard:
The representation of an atomic specialization need not have the same size and alignment requirement as its corresponding argument type.
So the answer, at least for now, is no, it is not guaranteed to be the same size, nor have same alignment. But it might have, unless it doesn't and then it won't.

Is sizeof(T) == sizeof(int)?

I've been poring over the draft standard and can't seem to find what I'm looking for.
If I have a standard-layout type
struct T {
unsigned handle;
};
Then I know that reinterpret_cast<unsigned*>(&t) == &t.handle for some T t;
The goal is to create some vector<T> v and pass &v[0] to a C function that expects a pointer to an array of unsigned integers.
So, does the standard define sizeof(T) == sizeof(unsigned) and does that imply that an array of T would have the same layout as an array of unsigned?
While this question addresses a very similar topic, I'm asking about the specific case where both the data member and the class are standard layout, and the data member is a fundamental type.
I've read some paragraphs that seem to hint that maybe it might be true, but nothing that hits the nail on the head. For example:
§ 9.2.17
Two standard-layout struct (Clause 9) types are layout-compatible if
they have the same number of non-static data members and corresponding
non-static data members (in declaration order) have layout-compatible
types
This isn't quite what I'm looking for, I don't think.
You essentially are asking, given:
struct T {
U handle;
};
whether it's guaranteed that sizeof(T) == sizeof(U). No, it is not.
Section 9.2/17 of the ISO C++03 standard says:
A pointer to a POD-struct object, suitably converted using a
reinterpret_cast, points to its initial member (or if that member is a
bit-field, then to the unit in which it resides) and vice versa.
Suppose you have an array of struct T. The vice versa part means that the address of any of the T::handle members must also be a valid address of a struct T. Now, suppose that these members are of type char and that your claim is true. This would mean that struct T would be allowed to have an unaligned address, which seems rather unlikely. The standard usually tries to not tie the hands of implementations in such a way. For your claim to be true, the standard would have to require that struct T be allowed to have unaligned addresses. And it would have to be allowed for all structures, because struct T could be a forward-declared, opaque type.
Furthermore, section 9.2/17 goes on to state:
[Note: There might therefore be unnamed padding within a POD-struct object, but not at its beginning, as necessary to achieve appropriate alignment.]
Which, taken a different way, means that there is no guarantee that there will never be padding.
I am used to Borland environments and for them:
T is a struct in your case so sizeof(T) is size of struct
that depends on #pragma pack and align setting of your compiler
so sometimes it can be greater than sizeof(unsigned) !!!
for the same reason if you have 4Byte struct (uint32) and 16Byte allign
struct T { uint32 u; };
then T a[100] is not the same as uint32 a[100];
because T is uint32 + 12 Byte empty space !!!
RETRACTION: The argument is erroneous. The proof of Lemma 2 relies on a hidden premise that the alignment of an aggregate type is determined strictly by the alignments of its member types. As Dyp points out in the commentary, that premise is not supported by the standard. It is therefore admissible for struct { Foo f } to have a more strict alignment requirement that Foo.
I'll play devil's advocate here, since no one else seems to be willing. I will argue that standard C++ (I'll refer to N3797 herein) guarantees that sizeof(T) == sizeof(U) when T is a standard layout class (9/7) with default alignment having a single default-aligned non-static data member U, e.g,
struct T { // or class, or union
U u;
};
It's well-established that:
sizeof(T) >= sizeof(U)
offsetof(T, u) == 0 (9.2/19)
U must be a standard layout type for T to be a standard layout class
u has a representation consisting of exactly sizeof(U) contiguous bytes of memory (1.8/5)
Together these facts imply that the first sizeof(U) bytes of the representation of T are occupied by the representation of u. If sizeof(T) > sizeof(U), the excess bytes must then be tail padding: unused padding bytes inserted after the representation of u in T.
The argument is, in short:
The standard details the circumstances in which an implementation may add padding to a standard-layout class,
None of those cirumstances applies in this particular instance, and hence
A conforming implementation may not add padding.
Potential Sources of Padding
Under what circumstances does the standard allow an implementation to add such padding to the representation of a standard layout class? When it's necessary for alignment: per 3.11/1, "An alignment is an implementation-defined integer value representing the number of bytes between successive addresses at which a given object can be allocated." There are two mentions of adding padding, both for alignment reasons:
5.3.3 Sizeof [expr.sizeof]/2 states "When applied to a reference or a reference type, the result is the size of the referenced type. When applied
to a class, the result is the number of bytes in an object of that class including any padding required for placing objects of that type in an array. The size of a most derived class shall be greater than zero (1.8). The result of applying sizeof to a base class subobject is the size of the base class type.77 When applied to an array, the result is the total number of bytes in the array. This implies that the size of an array of n elements is n times the size of an element."
9.2 Class members [class.mem]/13 states "Implementation alignment requirements might cause two adjacent members not to be allocated immediately after each other; so might requirements for space for managing virtual functions (10.3) and virtual base classes (10.1)."
(Notably the C++ standard does not contain a blanket statement allowing implementations to insert padding in structures as in the C standards, e.g., N1570 (C11-ish) §6.7.2.1/15 "There may be unnamed padding within a structure object, but not at its beginning." and /17 "There may be unnamed padding at the end of a structure or union.")
Clearly the text of 9.2 doesn't apply to our problem, since (a) T has only one member and thus no "adjacent members", and (b) T is standard layout and hence has no virtual functions or virtual base classes (per 9/7). Demonstrating that 5.3.3/2 doesn't allow padding in our problem is more challenging.
Some Prerequisites
Lemma 1: For any type W with default alignment, alignof(W) divides sizeof(W): By 5.3.3/2, the size of an array of n elements of type W is exactly n times sizeof(W) (i.e., there is no "external" padding between array elements). The addresses of consecutive array elements are then sizeof(W) bytes apart. By the definition of alignment, it must then be that alignof(W) divides sizeof(W).
Lemma 2: The alignment alignof(W) of a default-aligned standard layout class W with only default-aligned data members is the least common multiple LCM(W) of the alignments of the data members (or 1 if there are none): Given an address at which an object of W can be allocated, the address LCM(W) bytes away must also be appropriately aligned: the difference between the addresses of member subobjects would also be LCM(W) bytes, and the alignment of each such member subobject divides LCM(W). Per the definition of alignment in 3.11/1, we have that alignof(W) divides LCM(W). Any whole number of bytes n < LCM(W) must not be divisible by the alignment of some member v of W, so an address that is only n bytes away from an address at which an object of W can be allocated is consequently not appropriately aligned for an object of W, i.e., alignof(W) >= LCM(W). Given that alignof(W) divides LCM(W) and alignof(W) >= LCM(W), we have alignof(W) == LCM(W).
Conclusion
Application of this lemma to the original problem has the immediate consequence that alignof(T) == alignof(U). So how much padding might be "required for placing objects of that type in an array"? None. Since alignof(T) == alignof(U) by the second lemma, and alignof(U) divides sizeof(U) by the first, it must be that alignof(T) divides sizeof(U) so zero bytes of padding are required to place objects of type T in an array.
Since all possible sources of padding bytes have been eliminated, an implementation may not add padding to T and we have sizeof(T) == sizeof(U) as required.

C/C++ Pointer to a POD struct also points to the 1st struct member

Can I assume that a C/C++ struct pointer will always point to the first member?
Example 1:
typedef struct {
unsigned char array_a[2];
unsigned char array_b[5];
}test;
//..
test var;
//..
In the above example will &var always point to array_a?
Also in the above example is it possible to cast the pointer
to an unsigned char pointer and access each byte separately?
Example 2:
function((unsigned char *)&var,sizeof(test));
//...
//...
void function(unsigned char *array, int len){
int i;
for( i=0; i<len; i++){
array[i]++;
}
}
Will that work correctly?
Note: I know that chars are byte aligned in a struct therefore I assume the size of the above struct is 7 bytes.
For C structs, yes, you can rely on it. This is how almost all "object orientated"-style APIs work in C (such as GObject and GTK).
For C++, you can rely on it only for "plain old data" (POD) types, which are guaranteed to be laid out in memory the same way as C structs. Exactly what constitutes a POD type is a little complicated and has changed between C++03 and C++11, but the crux of it is that if your type has any virtual functions then it's not a POD.
(In C++11 you can use std::is_pod to test at compile-time whether a struct is a POD type.)
EDIT: This tells you what constitutes a POD type in C++: http://en.cppreference.com/w/cpp/concept/PODType
EDIT2: Actually, in C++11, it doesn't need to be a POD, just "standard layout", which is a lightly weaker condition. Quoth section 9.2 [class.mem] paragraph 20 of the standard:
A pointer to a standard-layout struct object, suitably converted using a reinterpret_cast, points to its
initial member (or if that member is a bit-field, then to the unit in which it resides) and vice versa. [ Note:
There might therefore be unnamed padding within a standard-layout struct object, but not at its beginning,
as necessary to achieve appropriate alignment. — end note ]
From the C99 standard section 6.7.2.1 bullet point 13:
Within a structure object, the non-bit-field members and the units in
which bit-fields reside have addresses that increase in the order in
which they are declared. A pointer to a structure object, suitably
converted, points to its initial member (or if that member is a
bit-field, then to the unit in which it resides), and vice versa.
There may be unnamed padding within a structure object, but not at its
beginning.
The answer to your question is therefore yes.
Reference (see page 103)
The compiler is free to add padding and reorganize the struct how it sees fit. Especially in C++ you can add (virtual) functions and then chances are that the virtual table is hidden before that. But of course that are implementation details.
For C this assumption is valid.
For C, it's largely implementation-specific, but in practice the rule (in the absence of #pragma pack or something likewise) is:
Struct members are stored in the order they are declared. (This is required by the C99 standard, as mentioned here earlier.)
If necessary, padding is added before each struct member, to ensure correct alignment.
So given a struct like
struct test{
char ch;
int i;
}
will have ch at offset 0, then a padding byte to align, i at offset 2 and then at the end, padding bytes are added to make the struct size a multiple of 8 bytes.(on a 64-bit machine, 4 byte alignment may be permitted in 32 bit machines)
So at least in this case, for C, I think you can assume that the struct pointer will point to the first array.