just a simple question... what the standard says about the structure members alignment?
for example with this one:
struct
{
uint8_t a;
uint8_t b;
/* other members */
} test;
It is guarateed that b is at offset 1 from the struct start?
Thanks
The standard (as of C99) doesn't really say anything.
The only real guarantees are that (void *)&test == (void *)&a, and that a is at a lower address than b. Everything else is up to the implementation.
C11 6.7.2.1 Structure and union specifiers p14 says
Each non-bit-field member of a structure or union object is aligned in
an implementation- defined manner appropriate to its type.
meaning that you can't make any portable assumptions about the difference between the addresses of a and b.
It should be possible to use offsetof to determine the offset of members.
For C the alignment is implementation defined, we can see that in the draft C99 standard section 6.7.2.1 Structure and union specifiers paragraph 12(In C11 it would be paragraph 14) which says:
Each non-bit-field member of a structure or union object is aligned in an implementation defined manner appropriate to its type.
and paragraph 13 says:
Within a structure object, the non-bit-field members and the units in which bit-fields
reside have addresses that increase in the order in which they are declared. A pointer to a
structure object, suitably converted, points to its initial member (or if that member is a
bit-field, then to the unit in which it resides), and vice versa. There may be unnamed
padding within a structure object, but not at its beginning.
and for C++ we have the following similar quotes from the draft standard section 9.2 Class members paragraph 13 says:
Nonstatic data members of a (non-union) class with the same access control (Clause 11) are allocated so that later members have higher addresses within a class object. The order of allocation of non-static data members with different access control is unspecified (Clause 11). Implementation alignment requirements might cause two adjacent members not to be allocated immediately after each other;
and paragraph 19 says:
A pointer to a standard-layout struct object, suitably converted using a reinterpret_cast, points to its
initial member (or if that member is a bit-field, then to the unit in which it resides) and vice versa. [ Note:
There might therefore be unnamed padding within a standard-layout struct object, but not at its beginning,
as necessary to achieve appropriate alignment. —end note ]
the case you're using is not really an edge case, both uint_8 are small enough to fit in the same word in memory and it would be no use to put each uint_8 in a uint_16.
A more critical case would be something like :
{
uint8_t a;
uint8_t b;
uint_32 c; // where is C, at &a+2 or &a+4 ?
/* other members */
} test;
and anyway this will always depend on the target architecture and your compiler...
K&R second edition (ANSI C) in chapter 6.4 (page 138) says:
Don't assume, however, that the size of a structure is the sum of the sizes of its memebers. Because of alignment requirements for different objects, there may be unnamed "holes" in a structure.
So no, ANSI C does not guarantee that b is at offset 1.
It is even likely that the compiler puts b at offset sizeof(int) so that it's aligned on the size of a machine word, which is easier to deal with.
Some compilers support pack-pragmas so that you can force that there are no such "holes" in the struct, but that is not portable.
What is guaranteed by the C-Standard already had been mentioned by other answers.
However, to make sure b is at offset 1 your compiler might offer options to "pack" the structure, will say to explicitly add no padding.
For gcc this can be achieved by the #pragma pack().
#pragma pack(1)
struct
{
uint8_t a; /* Guaranteed to be at offset 0. */
uint8_t b; /* Guaranteed to be at offset 1. */
/* other members are guaranteed to start at offset 2. */
} test_packed;
#pragma pack()
struct
{
uint8_t a; /* Guaranteed to by at offset 0. */
uint8_t b; /* NOT guaranteed to be at offset 1. */
/* other members are NOT guaranteed to start at offset 2. */
} test_unpacked;
A portable (and save) solution would be to simply use an array:
struct
{
uint8_t ab[2]; /* ab[0] is guaranteed to be at offset 0. */
/* ab[1] is guaranteed to be at offset 1. */
/* other members are NOT guaranteed to start at offset 2. */
} test_packed;
Related
Can I assume that a C/C++ struct pointer will always point to the first member?
Example 1:
typedef struct {
unsigned char array_a[2];
unsigned char array_b[5];
}test;
//..
test var;
//..
In the above example will &var always point to array_a?
Also in the above example is it possible to cast the pointer
to an unsigned char pointer and access each byte separately?
Example 2:
function((unsigned char *)&var,sizeof(test));
//...
//...
void function(unsigned char *array, int len){
int i;
for( i=0; i<len; i++){
array[i]++;
}
}
Will that work correctly?
Note: I know that chars are byte aligned in a struct therefore I assume the size of the above struct is 7 bytes.
For C structs, yes, you can rely on it. This is how almost all "object orientated"-style APIs work in C (such as GObject and GTK).
For C++, you can rely on it only for "plain old data" (POD) types, which are guaranteed to be laid out in memory the same way as C structs. Exactly what constitutes a POD type is a little complicated and has changed between C++03 and C++11, but the crux of it is that if your type has any virtual functions then it's not a POD.
(In C++11 you can use std::is_pod to test at compile-time whether a struct is a POD type.)
EDIT: This tells you what constitutes a POD type in C++: http://en.cppreference.com/w/cpp/concept/PODType
EDIT2: Actually, in C++11, it doesn't need to be a POD, just "standard layout", which is a lightly weaker condition. Quoth section 9.2 [class.mem] paragraph 20 of the standard:
A pointer to a standard-layout struct object, suitably converted using a reinterpret_cast, points to its
initial member (or if that member is a bit-field, then to the unit in which it resides) and vice versa. [ Note:
There might therefore be unnamed padding within a standard-layout struct object, but not at its beginning,
as necessary to achieve appropriate alignment. — end note ]
From the C99 standard section 6.7.2.1 bullet point 13:
Within a structure object, the non-bit-field members and the units in
which bit-fields reside have addresses that increase in the order in
which they are declared. A pointer to a structure object, suitably
converted, points to its initial member (or if that member is a
bit-field, then to the unit in which it resides), and vice versa.
There may be unnamed padding within a structure object, but not at its
beginning.
The answer to your question is therefore yes.
Reference (see page 103)
The compiler is free to add padding and reorganize the struct how it sees fit. Especially in C++ you can add (virtual) functions and then chances are that the virtual table is hidden before that. But of course that are implementation details.
For C this assumption is valid.
For C, it's largely implementation-specific, but in practice the rule (in the absence of #pragma pack or something likewise) is:
Struct members are stored in the order they are declared. (This is required by the C99 standard, as mentioned here earlier.)
If necessary, padding is added before each struct member, to ensure correct alignment.
So given a struct like
struct test{
char ch;
int i;
}
will have ch at offset 0, then a padding byte to align, i at offset 2 and then at the end, padding bytes are added to make the struct size a multiple of 8 bytes.(on a 64-bit machine, 4 byte alignment may be permitted in 32 bit machines)
So at least in this case, for C, I think you can assume that the struct pointer will point to the first array.
I am currently working on a project in which I am provided the following
structre. My work is C++ but the project uses both C and C++. The same structure
definition is used by both C and C++.
typedef struct PacketHeader {
//Byte 0
uint8_t bRes :4;
uint8_t bEmpty :1;
uint8_t bWait :1;
uint8_t bErr :1;
uint8_t bEnable :1;
//Byte 1
uint8_t bInst :4;
uint8_t bCount :3;
uint8_t bRres :1;
//Bytes 2, 3
union {
uint16_t wId; /* Needed for Endian swapping */
struct{
uint16_t wMake :4;
uint16_t wMod :12;
};
};
} PacketHeader;
Depending on how instances of the structure are used, the required endianness of
the structure can be big or little endian. As the first two bytes of the
structure are each single bytes, these don't need altering when the endianness
changes.
Bytes 2 and 3, stored as a single uint16_t, are the only bytes which we need to
swap to acheive the desired endianness. To acheive the endianness swap, we have
been performing the following:
//Returns a constructed instance of PacketHeader with relevant fields set and the provided counter value
PacketHeader myHeader = mmt::BuildPacketHeader(count);
uint16_t packetIdFlipped;
//Swap positions of byte 2 and 3
packetIdFlipped = myHeader.wId << 8;
packetIdFlipped |= (uint16_t)myHeader.wId >> 8;
myHeader.wId = packetIdFlipped;
The function BuildPacketHeader(uint8_t) assigns values to the members wMake and
wMod explicitly, and does not write to the member wId. My question is regarding
the safety of reading from the member wId inside the returned instance of the
structure.
Questions such as
Accessing inactive union member and undefined behavior?,
Purpose of Unions in C and C++,
and Section 10.4 of the draft standard I have each mention the undefined behaviour arising from accessing an inactive member of a union in C++.
Paragraph 1 in Section 10.4 of the linked draft also contains the following note, though I'm not sure I understand all the terminology used:
[Note: One special guarantee is made in order to simplify the use of unions: If a standard-layout union contains several standard-layout structs that share a common initial sequence (10.3), and if a non-static datamember of an object of this standard-layout union type is active and is one of the standard-layout structs, itis permitted to inspect the common initial sequence of any of the standard-layout struct members; see 10.3.— end note]
Is reading myHeader.wId in the line packetIdFlipped = myHeader.wId << 8 undefined behaviour?
Is the unnamed struct the active member as it was the last member written to in the function call?
Or does the note mean it is safe to access the wId member, as it and the struct share a common type? (and is this what is meant by common initial sequence?)
Thanks in advance
The function BuildPacketHeader(uint8_t) assigns values to the members
wMake and wMod explicitly, and does not write to the member wId. My
question is regarding the safety of reading from the member wId inside
the returned instance of the structure.
Yes, it's UB. It does not mean it's not working, just that it may not work. You can use memcpy inside BuildPacketHeader to avoid that (see this and this).
Is reading myHeader.wId in the line packetIdFlipped = myHeader.wId << 8 undefined behaviour?
Yes. You assigned to wMake and wMod making the unamed struct the active member so wId is the inactive member and you are not allowed to read from it without setting a value to it.
and is this what is meant by common initial sequence?
The common initial sequence is when two standard layout types share the same members in the same order. In
struct foo
{
int a;
int b;
};
struct bar
{
int a;
int b;
int c;
};
a and b are of the same type in foo and bar so they are the common initial sequence of them. If you put objects of foo and bar in a union it would be safe to read a or b from wither object after it is set in one of them.
This is not your case though since wId isn't a standard layout type struct.
What the C++ standard kind of says is given two structs A and B and the following untion:
union U
{
A a;
B b;
};
The following is valid code:
U u;
A a;
u.a = a;
a = u.a;
B b;
u.b = b;
b = u.b;
You read and write the same type. This is obviously correct code.
But the problem comes when you have the following code:
A a;
B b;
u.a = a;
b = u.b;
What do we know about A and B? First in the union they share the same memory space. Now the C++ standard has plainly declared it as undefined behavior.
But that does not mean it's fully out of the window. C99 comes into play, since it is the normative base and there are weak guarantees about unions. That is if the union member have the same memory layout they are compatible and each structs first memory address is the same. So if you can ensure that your structs / union members are all padded in the correct way the operation is safe, even if C++ says it's undefined.
Finally from a pragmatic standpoint, if you don't mess with padding and get the standard layout, the compiler will generally do the right thing, since that is a quite old usage pattern in C and breaking this will break LOTS of code.
inside a definition like this
typedef struct
{
myType array[N];
} myStruct;
myStruct obj;
can I always assume that ([edit] assuming proper casting will happen which is not the focus of the question here [/edit])
(&obj == &obj.array[0])
will return TRUE or the should I worry about the compiler introducing extra padding to accomodate the myType alignment requisites? In theory this shouldn't happen as the struct has a single field but I'm not entirely sure about this.
With a proper cast, this will always return true.
From section 6.7.2.1 of the C standard:
13. Within a structure object, the non-bit-field members and the units in which bit-fields reside have addresses that increase
in the order in which they are declared. A pointer to a structure
object, suitably converted, points to its initial member (or
if that member is a bit-field, then to the unit in which it
resides), and vice versa. There may be unnamed padding within a
structure object, but not at its beginning.
According to the current C++ standard draft [class.mem] §20 (N4527), emphasis added:
If a standard-layout class object has any non-static data members, its address is the same as the address
of its first non-static data member. Otherwise, its address is the same as the address of its first base class
subobject (if any). [ Note: There might therefore be unnamed padding within a standard-layout struct
object, but not at its beginning, as necessary to achieve appropriate alignment. — end note ]
Whether myStruct is standard-layout, depends on whether myType is standard-layout. If it is, then what you're asking is guaranteed by the c++ standard.
Note that &obj and &obj.array[0] have unrelated pointer types, so the expression is not legal in c++.
I've been poring over the draft standard and can't seem to find what I'm looking for.
If I have a standard-layout type
struct T {
unsigned handle;
};
Then I know that reinterpret_cast<unsigned*>(&t) == &t.handle for some T t;
The goal is to create some vector<T> v and pass &v[0] to a C function that expects a pointer to an array of unsigned integers.
So, does the standard define sizeof(T) == sizeof(unsigned) and does that imply that an array of T would have the same layout as an array of unsigned?
While this question addresses a very similar topic, I'm asking about the specific case where both the data member and the class are standard layout, and the data member is a fundamental type.
I've read some paragraphs that seem to hint that maybe it might be true, but nothing that hits the nail on the head. For example:
§ 9.2.17
Two standard-layout struct (Clause 9) types are layout-compatible if
they have the same number of non-static data members and corresponding
non-static data members (in declaration order) have layout-compatible
types
This isn't quite what I'm looking for, I don't think.
You essentially are asking, given:
struct T {
U handle;
};
whether it's guaranteed that sizeof(T) == sizeof(U). No, it is not.
Section 9.2/17 of the ISO C++03 standard says:
A pointer to a POD-struct object, suitably converted using a
reinterpret_cast, points to its initial member (or if that member is a
bit-field, then to the unit in which it resides) and vice versa.
Suppose you have an array of struct T. The vice versa part means that the address of any of the T::handle members must also be a valid address of a struct T. Now, suppose that these members are of type char and that your claim is true. This would mean that struct T would be allowed to have an unaligned address, which seems rather unlikely. The standard usually tries to not tie the hands of implementations in such a way. For your claim to be true, the standard would have to require that struct T be allowed to have unaligned addresses. And it would have to be allowed for all structures, because struct T could be a forward-declared, opaque type.
Furthermore, section 9.2/17 goes on to state:
[Note: There might therefore be unnamed padding within a POD-struct object, but not at its beginning, as necessary to achieve appropriate alignment.]
Which, taken a different way, means that there is no guarantee that there will never be padding.
I am used to Borland environments and for them:
T is a struct in your case so sizeof(T) is size of struct
that depends on #pragma pack and align setting of your compiler
so sometimes it can be greater than sizeof(unsigned) !!!
for the same reason if you have 4Byte struct (uint32) and 16Byte allign
struct T { uint32 u; };
then T a[100] is not the same as uint32 a[100];
because T is uint32 + 12 Byte empty space !!!
RETRACTION: The argument is erroneous. The proof of Lemma 2 relies on a hidden premise that the alignment of an aggregate type is determined strictly by the alignments of its member types. As Dyp points out in the commentary, that premise is not supported by the standard. It is therefore admissible for struct { Foo f } to have a more strict alignment requirement that Foo.
I'll play devil's advocate here, since no one else seems to be willing. I will argue that standard C++ (I'll refer to N3797 herein) guarantees that sizeof(T) == sizeof(U) when T is a standard layout class (9/7) with default alignment having a single default-aligned non-static data member U, e.g,
struct T { // or class, or union
U u;
};
It's well-established that:
sizeof(T) >= sizeof(U)
offsetof(T, u) == 0 (9.2/19)
U must be a standard layout type for T to be a standard layout class
u has a representation consisting of exactly sizeof(U) contiguous bytes of memory (1.8/5)
Together these facts imply that the first sizeof(U) bytes of the representation of T are occupied by the representation of u. If sizeof(T) > sizeof(U), the excess bytes must then be tail padding: unused padding bytes inserted after the representation of u in T.
The argument is, in short:
The standard details the circumstances in which an implementation may add padding to a standard-layout class,
None of those cirumstances applies in this particular instance, and hence
A conforming implementation may not add padding.
Potential Sources of Padding
Under what circumstances does the standard allow an implementation to add such padding to the representation of a standard layout class? When it's necessary for alignment: per 3.11/1, "An alignment is an implementation-defined integer value representing the number of bytes between successive addresses at which a given object can be allocated." There are two mentions of adding padding, both for alignment reasons:
5.3.3 Sizeof [expr.sizeof]/2 states "When applied to a reference or a reference type, the result is the size of the referenced type. When applied
to a class, the result is the number of bytes in an object of that class including any padding required for placing objects of that type in an array. The size of a most derived class shall be greater than zero (1.8). The result of applying sizeof to a base class subobject is the size of the base class type.77 When applied to an array, the result is the total number of bytes in the array. This implies that the size of an array of n elements is n times the size of an element."
9.2 Class members [class.mem]/13 states "Implementation alignment requirements might cause two adjacent members not to be allocated immediately after each other; so might requirements for space for managing virtual functions (10.3) and virtual base classes (10.1)."
(Notably the C++ standard does not contain a blanket statement allowing implementations to insert padding in structures as in the C standards, e.g., N1570 (C11-ish) §6.7.2.1/15 "There may be unnamed padding within a structure object, but not at its beginning." and /17 "There may be unnamed padding at the end of a structure or union.")
Clearly the text of 9.2 doesn't apply to our problem, since (a) T has only one member and thus no "adjacent members", and (b) T is standard layout and hence has no virtual functions or virtual base classes (per 9/7). Demonstrating that 5.3.3/2 doesn't allow padding in our problem is more challenging.
Some Prerequisites
Lemma 1: For any type W with default alignment, alignof(W) divides sizeof(W): By 5.3.3/2, the size of an array of n elements of type W is exactly n times sizeof(W) (i.e., there is no "external" padding between array elements). The addresses of consecutive array elements are then sizeof(W) bytes apart. By the definition of alignment, it must then be that alignof(W) divides sizeof(W).
Lemma 2: The alignment alignof(W) of a default-aligned standard layout class W with only default-aligned data members is the least common multiple LCM(W) of the alignments of the data members (or 1 if there are none): Given an address at which an object of W can be allocated, the address LCM(W) bytes away must also be appropriately aligned: the difference between the addresses of member subobjects would also be LCM(W) bytes, and the alignment of each such member subobject divides LCM(W). Per the definition of alignment in 3.11/1, we have that alignof(W) divides LCM(W). Any whole number of bytes n < LCM(W) must not be divisible by the alignment of some member v of W, so an address that is only n bytes away from an address at which an object of W can be allocated is consequently not appropriately aligned for an object of W, i.e., alignof(W) >= LCM(W). Given that alignof(W) divides LCM(W) and alignof(W) >= LCM(W), we have alignof(W) == LCM(W).
Conclusion
Application of this lemma to the original problem has the immediate consequence that alignof(T) == alignof(U). So how much padding might be "required for placing objects of that type in an array"? None. Since alignof(T) == alignof(U) by the second lemma, and alignof(U) divides sizeof(U) by the first, it must be that alignof(T) divides sizeof(U) so zero bytes of padding are required to place objects of type T in an array.
Since all possible sources of padding bytes have been eliminated, an implementation may not add padding to T and we have sizeof(T) == sizeof(U) as required.
Can I assume that a C/C++ struct pointer will always point to the first member?
Example 1:
typedef struct {
unsigned char array_a[2];
unsigned char array_b[5];
}test;
//..
test var;
//..
In the above example will &var always point to array_a?
Also in the above example is it possible to cast the pointer
to an unsigned char pointer and access each byte separately?
Example 2:
function((unsigned char *)&var,sizeof(test));
//...
//...
void function(unsigned char *array, int len){
int i;
for( i=0; i<len; i++){
array[i]++;
}
}
Will that work correctly?
Note: I know that chars are byte aligned in a struct therefore I assume the size of the above struct is 7 bytes.
For C structs, yes, you can rely on it. This is how almost all "object orientated"-style APIs work in C (such as GObject and GTK).
For C++, you can rely on it only for "plain old data" (POD) types, which are guaranteed to be laid out in memory the same way as C structs. Exactly what constitutes a POD type is a little complicated and has changed between C++03 and C++11, but the crux of it is that if your type has any virtual functions then it's not a POD.
(In C++11 you can use std::is_pod to test at compile-time whether a struct is a POD type.)
EDIT: This tells you what constitutes a POD type in C++: http://en.cppreference.com/w/cpp/concept/PODType
EDIT2: Actually, in C++11, it doesn't need to be a POD, just "standard layout", which is a lightly weaker condition. Quoth section 9.2 [class.mem] paragraph 20 of the standard:
A pointer to a standard-layout struct object, suitably converted using a reinterpret_cast, points to its
initial member (or if that member is a bit-field, then to the unit in which it resides) and vice versa. [ Note:
There might therefore be unnamed padding within a standard-layout struct object, but not at its beginning,
as necessary to achieve appropriate alignment. — end note ]
From the C99 standard section 6.7.2.1 bullet point 13:
Within a structure object, the non-bit-field members and the units in
which bit-fields reside have addresses that increase in the order in
which they are declared. A pointer to a structure object, suitably
converted, points to its initial member (or if that member is a
bit-field, then to the unit in which it resides), and vice versa.
There may be unnamed padding within a structure object, but not at its
beginning.
The answer to your question is therefore yes.
Reference (see page 103)
The compiler is free to add padding and reorganize the struct how it sees fit. Especially in C++ you can add (virtual) functions and then chances are that the virtual table is hidden before that. But of course that are implementation details.
For C this assumption is valid.
For C, it's largely implementation-specific, but in practice the rule (in the absence of #pragma pack or something likewise) is:
Struct members are stored in the order they are declared. (This is required by the C99 standard, as mentioned here earlier.)
If necessary, padding is added before each struct member, to ensure correct alignment.
So given a struct like
struct test{
char ch;
int i;
}
will have ch at offset 0, then a padding byte to align, i at offset 2 and then at the end, padding bytes are added to make the struct size a multiple of 8 bytes.(on a 64-bit machine, 4 byte alignment may be permitted in 32 bit machines)
So at least in this case, for C, I think you can assume that the struct pointer will point to the first array.