Is it legal to index into a struct? - c++

Regardless of how 'bad' the code is, and assuming that alignment etc are not an issue on the compiler/platform, is this undefined or broken behavior?
If I have a struct like this :-
struct data
{
int a, b, c;
};
struct data thing;
Is it legal to access a, b and c as (&thing.a)[0], (&thing.a)[1], and (&thing.a)[2]?
In every case, on every compiler and platform I tried it on, with every setting I tried it 'worked'. I'm just worried that the compiler might not realize that b and thing[1] are the same thing and stores to 'b' might be put in a register and thing[1] reads the wrong value from memory (for example). In every case I tried it did the right thing though. (I realize of course that doesn't prove much)
This is not my code; it's code I have to work with, I'm interested in whether this is bad code or broken code as the different affects my priorities for changing it a great deal :)
Tagged C and C++ . I'm mostly interested in C++ but also C if it is different, just for interest.

It is illegal 1. That's an Undefined behavior in C++.
You are taking the members in an array fashion, but here is what the C++ standard says (emphasis mine):
[dcl.array/1]: ...An object of array type contains a contiguously allocated non-empty set of N
subobjects of type T...
But, for members, there's no such contiguous requirement:
[class.mem/17]: ...;Implementation alignment requirements might cause two adjacent
members not to be allocated immediately after each other...
While the above two quotes should be enough to hint why indexing into a struct as you did isn't a defined behavior by the C++ standard, let's pick one example: look at the expression (&thing.a)[2] - Regarding the subscript operator:
[expr.post//expr.sub/1]:
A postfix expression followed by an expression in square brackets is a
postfix expression. One of the expressions shall be a glvalue of type
“array of T” or a prvalue of type “pointer to T” and the other shall
be a prvalue of unscoped enumeration or integral type. The result is
of type “T”. The type “T” shall be a completely-defined object type.66
The expression E1[E2] is identical (by definition) to ((E1)+(E2))
Digging into the bold text of the above quote: regarding adding an integral type to a pointer type (note the emphasis here)..
[expr.add/4]: When an expression that has integral type is added to or subtracted from a
pointer, the result has the type of the pointer operand. If the
expression P points to element x[i] of an array object x
with n elements, the expressions P + J and J + P (where J has
the value j) point to the (possibly-hypothetical) element x[i + j]
if 0 ≤ i + j ≤ n; otherwise, the behavior is undefined. ...
Note the array requirement for the if clause; else the otherwise in the above quote. The expression (&thing.a)[2] obviously doesn't qualify for the if clause; Hence, Undefined Behavior.
On a side note: Though I have extensively experimented the code and its variations on various compilers and they don't introduce any padding here, (it works); from a maintenance view, the code is extremely fragile. you should still assert that the implementation allocated the members contiguously before doing this. And stay in-bounds :-). But its still Undefined behavior....
Some viable workarounds (with defined behavior) have been provided by other answers.
As rightly pointed out in the comments, [basic.lval/8], which was in my previous edit doesn't apply. Thanks #2501 and #M.M.
1: See #Barry's answer to this question for the only one legal case where you can access thing.a member of the struct via this parttern.

No. In C, this is undefined behavior even if there is no padding.
The thing that causes undefined behavior is out-of-bounds access1. When you have a scalar (members a,b,c in the struct) and try to use it as an array2 to access the next hypothetical element, you cause undefined behavior, even if there happens to be another object of the same type at that address.
However you may use the address of the struct object and calculate the offset into a specific member:
struct data thing = { 0 };
char* p = ( char* )&thing + offsetof( thing , b );
int* b = ( int* )p;
*b = 123;
assert( thing.b == 123 );
This has to be done for each member individually, but can be put into a function that resembles an array access.
1 (Quoted from: ISO/IEC 9899:201x 6.5.6 Additive operators 8)
If the result points one past the last element of the array object, it
shall not be used as the operand of a unary * operator that is evaluated.
2 (Quoted from: ISO/IEC 9899:201x 6.5.6 Additive operators 7)
For the purposes of these operators, a pointer to an object that is not an element of an
array behaves the same as a pointer to the first element of an array of length one with the
type of the object as its element type.

In C++ if you really need it - create operator[]:
struct data
{
int a, b, c;
int &operator[]( size_t idx ) {
switch( idx ) {
case 0 : return a;
case 1 : return b;
case 2 : return c;
default: throw std::runtime_error( "bad index" );
}
}
};
data d;
d[0] = 123; // assign 123 to data.a
it is not only guaranteed to work but usage is simpler, you do not need to write unreadable expression (&thing.a)[0]
Note: this answer is given in assumption that you already have a structure with fields, and you need to add access via index. If speed is an issue and you can change the structure this could be more effective:
struct data
{
int array[3];
int &a = array[0];
int &b = array[1];
int &c = array[2];
};
This solution would change size of structure so you can use methods as well:
struct data
{
int array[3];
int &a() { return array[0]; }
int &b() { return array[1]; }
int &c() { return array[2]; }
};

For c++: If you need to access a member without knowing its name, you can use a pointer to member variable.
struct data {
int a, b, c;
};
typedef int data::* data_int_ptr;
data_int_ptr arr[] = {&data::a, &data::b, &data::c};
data thing;
thing.*arr[0] = 123;

In ISO C99/C11, union-based type-punning is legal, so you can use that instead of indexing pointers to non-arrays (see various other answers).
ISO C++ doesn't allow union-based type-punning. GNU C++ does, as an extension, and I think some other compilers that don't support GNU extensions in general do support union type-punning. But that doesn't help you write strictly portable code.
With current versions of gcc and clang, writing a C++ member function using a switch(idx) to select a member will optimize away for compile-time constant indices, but will produce terrible branchy asm for runtime indices. There's nothing inherently wrong with switch() for this; this is simply a missed-optimization bug in current compilers. They could compiler Slava' switch() function efficiently.
The solution/workaround to this is to do it the other way: give your class/struct an array member, and write accessor functions to attach names to specific elements.
struct array_data
{
int arr[3];
int &operator[]( unsigned idx ) {
// assert(idx <= 2);
//idx = (idx > 2) ? 2 : idx;
return arr[idx];
}
int &a(){ return arr[0]; } // TODO: const versions
int &b(){ return arr[1]; }
int &c(){ return arr[2]; }
};
We can have a look at the asm output for different use-cases, on the Godbolt compiler explorer. These are complete x86-64 System V functions, with the trailing RET instruction omitted to better show what you'd get when they inline. ARM/MIPS/whatever would be similar.
# asm from g++6.2 -O3
int getb(array_data &d) { return d.b(); }
mov eax, DWORD PTR [rdi+4]
void setc(array_data &d, int val) { d.c() = val; }
mov DWORD PTR [rdi+8], esi
int getidx(array_data &d, int idx) { return d[idx]; }
mov esi, esi # zero-extend to 64-bit
mov eax, DWORD PTR [rdi+rsi*4]
By comparison, #Slava's answer using a switch() for C++ makes asm like this for a runtime-variable index. (Code in the previous Godbolt link).
int cpp(data *d, int idx) {
return (*d)[idx];
}
# gcc6.2 -O3, using `default: __builtin_unreachable()` to promise the compiler that idx=0..2,
# avoiding an extra cmov for idx=min(idx,2), or an extra branch to a throw, or whatever
cmp esi, 1
je .L6
cmp esi, 2
je .L7
mov eax, DWORD PTR [rdi]
ret
.L6:
mov eax, DWORD PTR [rdi+4]
ret
.L7:
mov eax, DWORD PTR [rdi+8]
ret
This is obviously terrible, compared to the C (or GNU C++) union-based type punning version:
c(type_t*, int):
movsx rsi, esi # sign-extend this time, since I didn't change idx to unsigned here
mov eax, DWORD PTR [rdi+rsi*4]

In C++, this is mostly undefined behavior (it depends on which index).
From [expr.unary.op]:
For purposes of pointer
arithmetic (5.7) and comparison (5.9, 5.10), an object that is not an array element whose address is taken in
this way is considered to belong to an array with one element of type T.
The expression &thing.a is thus considered to refer to an array of one int.
From [expr.sub]:
The expression E1[E2] is identical (by definition) to *((E1)+(E2))
And from [expr.add]:
When an expression that has integral type is added to or subtracted from a pointer, the result has the type of the pointer operand. If the expression P points to element x[i] of an array object x with n elements, the expressions P + J and J + P (where J has the value j) point to the (possibly-hypothetical) element x[i + j] if 0 <= i + j <= n; otherwise, the behavior is undefined.
(&thing.a)[0] is perfectly well-formed because &thing.a is considered an array of size 1 and we're taking that first index. That is an allowed index to take.
(&thing.a)[2] violates the precondition that 0 <= i + j <= n, since we have i == 0, j == 2, n == 1. Simply constructing the pointer &thing.a + 2 is undefined behavior.
(&thing.a)[1] is the interesting case. It doesn't actually violate anything in [expr.add]. We're allowed to take a pointer one past the end of the array - which this would be. Here, we turn to a note in [basic.compound]:
A value of a pointer type that is a pointer to or past the end of an object represents the address of the
first byte in memory (1.7) occupied by the object53 or the first byte in memory after the end of the storage
occupied by the object, respectively. [ Note: A pointer past the end of an object (5.7) is not considered to
point to an unrelated object of the object’s type that might be located at that address.
Hence, taking the pointer &thing.a + 1 is defined behavior, but dereferencing it is undefined because it does not point to anything.

This is undefined behavior.
There are lots of rules in C++ that attempt to give the compiler some hope of understanding what you are doing, so it can reason about it and optimize it.
There are rules about aliasing (accessing data through two different pointer types), array bounds, etc.
When you have a variable x, the fact that it isn't a member of an array means that the compiler can assume that no [] based array access can modify it. So it doesn't have to constantly reload the data from memory every time you use it; only if someone could have modified it from its name.
Thus (&thing.a)[1] can be assumed by the compiler to not refer to thing.b. It can use this fact to reorder reads and writes to thing.b, invalidating what you want it to do without invalidating what you actually told it to do.
A classic example of this is casting away const.
const int x = 7;
std::cout << x << '\n';
auto ptr = (int*)&x;
*ptr = 2;
std::cout << *ptr << "!=" << x << '\n';
std::cout << ptr << "==" << &x << '\n';
here you typically get a compiler saying 7 then 2 != 7, and then two identical pointers; despite the fact that ptr is pointing at x. The compiler takes the fact that x is a constant value to not bother reading it when you ask for the value of x.
But when you take the address of x, you force it to exist. You then cast away const, and modify it. So the actual location in memory where x is has been modified, the compiler is free to not actually read it when reading x!
The compiler may get smart enough to figure out how to even avoid following ptr to read *ptr, but often they are not. Feel free to go and use ptr = ptr+argc-1 or somesuch confusion if the optimizer is getting smarter than you.
You can provide a custom operator[] that gets the right item.
int& operator[](std::size_t);
int const& operator[](std::size_t) const;
having both is useful.

Heres a way to use a proxy class to access elements in a member array by name. It is very C++, and has no benefit vs. ref-returning accessor functions, except for syntactic preference. This overloads the -> operator to access elements as members, so to be acceptable, one needs to both dislike the syntax of accessors (d.a() = 5;), as well as tolerate using -> with a non-pointer object. I expect this might also confuse readers not familiar with the code, so this might be more of a neat trick than something you want to put into production.
The Data struct in this code also includes overloads for the subscript operator, to access indexed elements inside its ar array member, as well as begin and end functions, for iteration. Also, all of these are overloaded with non-const and const versions, which I felt needed to be included for completeness.
When Data's -> is used to access an element by name (like this: my_data->b = 5;), a Proxy object is returned. Then, because this Proxy rvalue is not a pointer, its own -> operator is auto-chain-called, which returns a pointer to itself. This way, the Proxy object is instantiated and remains valid during evaluation of the initial expression.
Contruction of a Proxy object populates its 3 reference members a, b and c according to a pointer passed in the constructor, which is assumed to point to a buffer containing at least 3 values whose type is given as the template parameter T. So instead of using named references which are members of the Data class, this saves memory by populating the references at the point of access (but unfortunately, using -> and not the . operator).
In order to test how well the compiler's optimizer eliminates all of the indirection introduced by the use of Proxy, the code below includes 2 versions of main(). The #if 1 version uses the -> and [] operators, and the #if 0 version performs the equivalent set of procedures, but only by directly accessing Data::ar.
The Nci() function generates runtime integer values for initializing array elements, which prevents the optimizer from just plugging constant values directly into each std::cout << call.
For gcc 6.2, using -O3, both versions of main() generate the same assembly (toggle between #if 1 and #if 0 before the first main() to compare): https://godbolt.org/g/QqRWZb
#include <iostream>
#include <ctime>
template <typename T>
class Proxy {
public:
T &a, &b, &c;
Proxy(T* par) : a(par[0]), b(par[1]), c(par[2]) {}
Proxy* operator -> () { return this; }
};
struct Data {
int ar[3];
template <typename I> int& operator [] (I idx) { return ar[idx]; }
template <typename I> const int& operator [] (I idx) const { return ar[idx]; }
Proxy<int> operator -> () { return Proxy<int>(ar); }
Proxy<const int> operator -> () const { return Proxy<const int>(ar); }
int* begin() { return ar; }
const int* begin() const { return ar; }
int* end() { return ar + sizeof(ar)/sizeof(int); }
const int* end() const { return ar + sizeof(ar)/sizeof(int); }
};
// Nci returns an unpredictible int
inline int Nci() {
static auto t = std::time(nullptr) / 100 * 100;
return static_cast<int>(t++ % 1000);
}
#if 1
int main() {
Data d = {Nci(), Nci(), Nci()};
for(auto v : d) { std::cout << v << ' '; }
std::cout << "\n";
std::cout << d->b << "\n";
d->b = -5;
std::cout << d[1] << "\n";
std::cout << "\n";
const Data cd = {Nci(), Nci(), Nci()};
for(auto v : cd) { std::cout << v << ' '; }
std::cout << "\n";
std::cout << cd->c << "\n";
//cd->c = -5; // error: assignment of read-only location
std::cout << cd[2] << "\n";
}
#else
int main() {
Data d = {Nci(), Nci(), Nci()};
for(auto v : d.ar) { std::cout << v << ' '; }
std::cout << "\n";
std::cout << d.ar[1] << "\n";
d->b = -5;
std::cout << d.ar[1] << "\n";
std::cout << "\n";
const Data cd = {Nci(), Nci(), Nci()};
for(auto v : cd.ar) { std::cout << v << ' '; }
std::cout << "\n";
std::cout << cd.ar[2] << "\n";
//cd.ar[2] = -5;
std::cout << cd.ar[2] << "\n";
}
#endif

If reading values is enough, and efficiency is not a concern, or if you trust your compiler to optimize things well, or if struct is just that 3 bytes, you can safely do this:
char index_data(const struct data *d, size_t index) {
assert(sizeof(*d) == offsetoff(*d, c)+1);
assert(index < sizeof(*d));
char buf[sizeof(*d)];
memcpy(buf, d, sizeof(*d));
return buf[index];
}
For C++ only version, you would probably want to use static_assert to verify that struct data has standard layout, and perhaps throw exception on invalid index instead.

It is illegal, but there is a workaround:
struct data {
union {
struct {
int a;
int b;
int c;
};
int v[3];
};
};
Now you can index v:

Related

C++ use address of member as an array [duplicate]

Regardless of how 'bad' the code is, and assuming that alignment etc are not an issue on the compiler/platform, is this undefined or broken behavior?
If I have a struct like this :-
struct data
{
int a, b, c;
};
struct data thing;
Is it legal to access a, b and c as (&thing.a)[0], (&thing.a)[1], and (&thing.a)[2]?
In every case, on every compiler and platform I tried it on, with every setting I tried it 'worked'. I'm just worried that the compiler might not realize that b and thing[1] are the same thing and stores to 'b' might be put in a register and thing[1] reads the wrong value from memory (for example). In every case I tried it did the right thing though. (I realize of course that doesn't prove much)
This is not my code; it's code I have to work with, I'm interested in whether this is bad code or broken code as the different affects my priorities for changing it a great deal :)
Tagged C and C++ . I'm mostly interested in C++ but also C if it is different, just for interest.
It is illegal 1. That's an Undefined behavior in C++.
You are taking the members in an array fashion, but here is what the C++ standard says (emphasis mine):
[dcl.array/1]: ...An object of array type contains a contiguously allocated non-empty set of N
subobjects of type T...
But, for members, there's no such contiguous requirement:
[class.mem/17]: ...;Implementation alignment requirements might cause two adjacent
members not to be allocated immediately after each other...
While the above two quotes should be enough to hint why indexing into a struct as you did isn't a defined behavior by the C++ standard, let's pick one example: look at the expression (&thing.a)[2] - Regarding the subscript operator:
[expr.post//expr.sub/1]:
A postfix expression followed by an expression in square brackets is a
postfix expression. One of the expressions shall be a glvalue of type
“array of T” or a prvalue of type “pointer to T” and the other shall
be a prvalue of unscoped enumeration or integral type. The result is
of type “T”. The type “T” shall be a completely-defined object type.66
The expression E1[E2] is identical (by definition) to ((E1)+(E2))
Digging into the bold text of the above quote: regarding adding an integral type to a pointer type (note the emphasis here)..
[expr.add/4]: When an expression that has integral type is added to or subtracted from a
pointer, the result has the type of the pointer operand. If the
expression P points to element x[i] of an array object x
with n elements, the expressions P + J and J + P (where J has
the value j) point to the (possibly-hypothetical) element x[i + j]
if 0 ≤ i + j ≤ n; otherwise, the behavior is undefined. ...
Note the array requirement for the if clause; else the otherwise in the above quote. The expression (&thing.a)[2] obviously doesn't qualify for the if clause; Hence, Undefined Behavior.
On a side note: Though I have extensively experimented the code and its variations on various compilers and they don't introduce any padding here, (it works); from a maintenance view, the code is extremely fragile. you should still assert that the implementation allocated the members contiguously before doing this. And stay in-bounds :-). But its still Undefined behavior....
Some viable workarounds (with defined behavior) have been provided by other answers.
As rightly pointed out in the comments, [basic.lval/8], which was in my previous edit doesn't apply. Thanks #2501 and #M.M.
1: See #Barry's answer to this question for the only one legal case where you can access thing.a member of the struct via this parttern.
No. In C, this is undefined behavior even if there is no padding.
The thing that causes undefined behavior is out-of-bounds access1. When you have a scalar (members a,b,c in the struct) and try to use it as an array2 to access the next hypothetical element, you cause undefined behavior, even if there happens to be another object of the same type at that address.
However you may use the address of the struct object and calculate the offset into a specific member:
struct data thing = { 0 };
char* p = ( char* )&thing + offsetof( thing , b );
int* b = ( int* )p;
*b = 123;
assert( thing.b == 123 );
This has to be done for each member individually, but can be put into a function that resembles an array access.
1 (Quoted from: ISO/IEC 9899:201x 6.5.6 Additive operators 8)
If the result points one past the last element of the array object, it
shall not be used as the operand of a unary * operator that is evaluated.
2 (Quoted from: ISO/IEC 9899:201x 6.5.6 Additive operators 7)
For the purposes of these operators, a pointer to an object that is not an element of an
array behaves the same as a pointer to the first element of an array of length one with the
type of the object as its element type.
In C++ if you really need it - create operator[]:
struct data
{
int a, b, c;
int &operator[]( size_t idx ) {
switch( idx ) {
case 0 : return a;
case 1 : return b;
case 2 : return c;
default: throw std::runtime_error( "bad index" );
}
}
};
data d;
d[0] = 123; // assign 123 to data.a
it is not only guaranteed to work but usage is simpler, you do not need to write unreadable expression (&thing.a)[0]
Note: this answer is given in assumption that you already have a structure with fields, and you need to add access via index. If speed is an issue and you can change the structure this could be more effective:
struct data
{
int array[3];
int &a = array[0];
int &b = array[1];
int &c = array[2];
};
This solution would change size of structure so you can use methods as well:
struct data
{
int array[3];
int &a() { return array[0]; }
int &b() { return array[1]; }
int &c() { return array[2]; }
};
For c++: If you need to access a member without knowing its name, you can use a pointer to member variable.
struct data {
int a, b, c;
};
typedef int data::* data_int_ptr;
data_int_ptr arr[] = {&data::a, &data::b, &data::c};
data thing;
thing.*arr[0] = 123;
In ISO C99/C11, union-based type-punning is legal, so you can use that instead of indexing pointers to non-arrays (see various other answers).
ISO C++ doesn't allow union-based type-punning. GNU C++ does, as an extension, and I think some other compilers that don't support GNU extensions in general do support union type-punning. But that doesn't help you write strictly portable code.
With current versions of gcc and clang, writing a C++ member function using a switch(idx) to select a member will optimize away for compile-time constant indices, but will produce terrible branchy asm for runtime indices. There's nothing inherently wrong with switch() for this; this is simply a missed-optimization bug in current compilers. They could compiler Slava' switch() function efficiently.
The solution/workaround to this is to do it the other way: give your class/struct an array member, and write accessor functions to attach names to specific elements.
struct array_data
{
int arr[3];
int &operator[]( unsigned idx ) {
// assert(idx <= 2);
//idx = (idx > 2) ? 2 : idx;
return arr[idx];
}
int &a(){ return arr[0]; } // TODO: const versions
int &b(){ return arr[1]; }
int &c(){ return arr[2]; }
};
We can have a look at the asm output for different use-cases, on the Godbolt compiler explorer. These are complete x86-64 System V functions, with the trailing RET instruction omitted to better show what you'd get when they inline. ARM/MIPS/whatever would be similar.
# asm from g++6.2 -O3
int getb(array_data &d) { return d.b(); }
mov eax, DWORD PTR [rdi+4]
void setc(array_data &d, int val) { d.c() = val; }
mov DWORD PTR [rdi+8], esi
int getidx(array_data &d, int idx) { return d[idx]; }
mov esi, esi # zero-extend to 64-bit
mov eax, DWORD PTR [rdi+rsi*4]
By comparison, #Slava's answer using a switch() for C++ makes asm like this for a runtime-variable index. (Code in the previous Godbolt link).
int cpp(data *d, int idx) {
return (*d)[idx];
}
# gcc6.2 -O3, using `default: __builtin_unreachable()` to promise the compiler that idx=0..2,
# avoiding an extra cmov for idx=min(idx,2), or an extra branch to a throw, or whatever
cmp esi, 1
je .L6
cmp esi, 2
je .L7
mov eax, DWORD PTR [rdi]
ret
.L6:
mov eax, DWORD PTR [rdi+4]
ret
.L7:
mov eax, DWORD PTR [rdi+8]
ret
This is obviously terrible, compared to the C (or GNU C++) union-based type punning version:
c(type_t*, int):
movsx rsi, esi # sign-extend this time, since I didn't change idx to unsigned here
mov eax, DWORD PTR [rdi+rsi*4]
In C++, this is mostly undefined behavior (it depends on which index).
From [expr.unary.op]:
For purposes of pointer
arithmetic (5.7) and comparison (5.9, 5.10), an object that is not an array element whose address is taken in
this way is considered to belong to an array with one element of type T.
The expression &thing.a is thus considered to refer to an array of one int.
From [expr.sub]:
The expression E1[E2] is identical (by definition) to *((E1)+(E2))
And from [expr.add]:
When an expression that has integral type is added to or subtracted from a pointer, the result has the type of the pointer operand. If the expression P points to element x[i] of an array object x with n elements, the expressions P + J and J + P (where J has the value j) point to the (possibly-hypothetical) element x[i + j] if 0 <= i + j <= n; otherwise, the behavior is undefined.
(&thing.a)[0] is perfectly well-formed because &thing.a is considered an array of size 1 and we're taking that first index. That is an allowed index to take.
(&thing.a)[2] violates the precondition that 0 <= i + j <= n, since we have i == 0, j == 2, n == 1. Simply constructing the pointer &thing.a + 2 is undefined behavior.
(&thing.a)[1] is the interesting case. It doesn't actually violate anything in [expr.add]. We're allowed to take a pointer one past the end of the array - which this would be. Here, we turn to a note in [basic.compound]:
A value of a pointer type that is a pointer to or past the end of an object represents the address of the
first byte in memory (1.7) occupied by the object53 or the first byte in memory after the end of the storage
occupied by the object, respectively. [ Note: A pointer past the end of an object (5.7) is not considered to
point to an unrelated object of the object’s type that might be located at that address.
Hence, taking the pointer &thing.a + 1 is defined behavior, but dereferencing it is undefined because it does not point to anything.
This is undefined behavior.
There are lots of rules in C++ that attempt to give the compiler some hope of understanding what you are doing, so it can reason about it and optimize it.
There are rules about aliasing (accessing data through two different pointer types), array bounds, etc.
When you have a variable x, the fact that it isn't a member of an array means that the compiler can assume that no [] based array access can modify it. So it doesn't have to constantly reload the data from memory every time you use it; only if someone could have modified it from its name.
Thus (&thing.a)[1] can be assumed by the compiler to not refer to thing.b. It can use this fact to reorder reads and writes to thing.b, invalidating what you want it to do without invalidating what you actually told it to do.
A classic example of this is casting away const.
const int x = 7;
std::cout << x << '\n';
auto ptr = (int*)&x;
*ptr = 2;
std::cout << *ptr << "!=" << x << '\n';
std::cout << ptr << "==" << &x << '\n';
here you typically get a compiler saying 7 then 2 != 7, and then two identical pointers; despite the fact that ptr is pointing at x. The compiler takes the fact that x is a constant value to not bother reading it when you ask for the value of x.
But when you take the address of x, you force it to exist. You then cast away const, and modify it. So the actual location in memory where x is has been modified, the compiler is free to not actually read it when reading x!
The compiler may get smart enough to figure out how to even avoid following ptr to read *ptr, but often they are not. Feel free to go and use ptr = ptr+argc-1 or somesuch confusion if the optimizer is getting smarter than you.
You can provide a custom operator[] that gets the right item.
int& operator[](std::size_t);
int const& operator[](std::size_t) const;
having both is useful.
Heres a way to use a proxy class to access elements in a member array by name. It is very C++, and has no benefit vs. ref-returning accessor functions, except for syntactic preference. This overloads the -> operator to access elements as members, so to be acceptable, one needs to both dislike the syntax of accessors (d.a() = 5;), as well as tolerate using -> with a non-pointer object. I expect this might also confuse readers not familiar with the code, so this might be more of a neat trick than something you want to put into production.
The Data struct in this code also includes overloads for the subscript operator, to access indexed elements inside its ar array member, as well as begin and end functions, for iteration. Also, all of these are overloaded with non-const and const versions, which I felt needed to be included for completeness.
When Data's -> is used to access an element by name (like this: my_data->b = 5;), a Proxy object is returned. Then, because this Proxy rvalue is not a pointer, its own -> operator is auto-chain-called, which returns a pointer to itself. This way, the Proxy object is instantiated and remains valid during evaluation of the initial expression.
Contruction of a Proxy object populates its 3 reference members a, b and c according to a pointer passed in the constructor, which is assumed to point to a buffer containing at least 3 values whose type is given as the template parameter T. So instead of using named references which are members of the Data class, this saves memory by populating the references at the point of access (but unfortunately, using -> and not the . operator).
In order to test how well the compiler's optimizer eliminates all of the indirection introduced by the use of Proxy, the code below includes 2 versions of main(). The #if 1 version uses the -> and [] operators, and the #if 0 version performs the equivalent set of procedures, but only by directly accessing Data::ar.
The Nci() function generates runtime integer values for initializing array elements, which prevents the optimizer from just plugging constant values directly into each std::cout << call.
For gcc 6.2, using -O3, both versions of main() generate the same assembly (toggle between #if 1 and #if 0 before the first main() to compare): https://godbolt.org/g/QqRWZb
#include <iostream>
#include <ctime>
template <typename T>
class Proxy {
public:
T &a, &b, &c;
Proxy(T* par) : a(par[0]), b(par[1]), c(par[2]) {}
Proxy* operator -> () { return this; }
};
struct Data {
int ar[3];
template <typename I> int& operator [] (I idx) { return ar[idx]; }
template <typename I> const int& operator [] (I idx) const { return ar[idx]; }
Proxy<int> operator -> () { return Proxy<int>(ar); }
Proxy<const int> operator -> () const { return Proxy<const int>(ar); }
int* begin() { return ar; }
const int* begin() const { return ar; }
int* end() { return ar + sizeof(ar)/sizeof(int); }
const int* end() const { return ar + sizeof(ar)/sizeof(int); }
};
// Nci returns an unpredictible int
inline int Nci() {
static auto t = std::time(nullptr) / 100 * 100;
return static_cast<int>(t++ % 1000);
}
#if 1
int main() {
Data d = {Nci(), Nci(), Nci()};
for(auto v : d) { std::cout << v << ' '; }
std::cout << "\n";
std::cout << d->b << "\n";
d->b = -5;
std::cout << d[1] << "\n";
std::cout << "\n";
const Data cd = {Nci(), Nci(), Nci()};
for(auto v : cd) { std::cout << v << ' '; }
std::cout << "\n";
std::cout << cd->c << "\n";
//cd->c = -5; // error: assignment of read-only location
std::cout << cd[2] << "\n";
}
#else
int main() {
Data d = {Nci(), Nci(), Nci()};
for(auto v : d.ar) { std::cout << v << ' '; }
std::cout << "\n";
std::cout << d.ar[1] << "\n";
d->b = -5;
std::cout << d.ar[1] << "\n";
std::cout << "\n";
const Data cd = {Nci(), Nci(), Nci()};
for(auto v : cd.ar) { std::cout << v << ' '; }
std::cout << "\n";
std::cout << cd.ar[2] << "\n";
//cd.ar[2] = -5;
std::cout << cd.ar[2] << "\n";
}
#endif
If reading values is enough, and efficiency is not a concern, or if you trust your compiler to optimize things well, or if struct is just that 3 bytes, you can safely do this:
char index_data(const struct data *d, size_t index) {
assert(sizeof(*d) == offsetoff(*d, c)+1);
assert(index < sizeof(*d));
char buf[sizeof(*d)];
memcpy(buf, d, sizeof(*d));
return buf[index];
}
For C++ only version, you would probably want to use static_assert to verify that struct data has standard layout, and perhaps throw exception on invalid index instead.
It is illegal, but there is a workaround:
struct data {
union {
struct {
int a;
int b;
int c;
};
int v[3];
};
};
Now you can index v:

Is this a violation of strict aliasing rules? [duplicate]

Regardless of how 'bad' the code is, and assuming that alignment etc are not an issue on the compiler/platform, is this undefined or broken behavior?
If I have a struct like this :-
struct data
{
int a, b, c;
};
struct data thing;
Is it legal to access a, b and c as (&thing.a)[0], (&thing.a)[1], and (&thing.a)[2]?
In every case, on every compiler and platform I tried it on, with every setting I tried it 'worked'. I'm just worried that the compiler might not realize that b and thing[1] are the same thing and stores to 'b' might be put in a register and thing[1] reads the wrong value from memory (for example). In every case I tried it did the right thing though. (I realize of course that doesn't prove much)
This is not my code; it's code I have to work with, I'm interested in whether this is bad code or broken code as the different affects my priorities for changing it a great deal :)
Tagged C and C++ . I'm mostly interested in C++ but also C if it is different, just for interest.
It is illegal 1. That's an Undefined behavior in C++.
You are taking the members in an array fashion, but here is what the C++ standard says (emphasis mine):
[dcl.array/1]: ...An object of array type contains a contiguously allocated non-empty set of N
subobjects of type T...
But, for members, there's no such contiguous requirement:
[class.mem/17]: ...;Implementation alignment requirements might cause two adjacent
members not to be allocated immediately after each other...
While the above two quotes should be enough to hint why indexing into a struct as you did isn't a defined behavior by the C++ standard, let's pick one example: look at the expression (&thing.a)[2] - Regarding the subscript operator:
[expr.post//expr.sub/1]:
A postfix expression followed by an expression in square brackets is a
postfix expression. One of the expressions shall be a glvalue of type
“array of T” or a prvalue of type “pointer to T” and the other shall
be a prvalue of unscoped enumeration or integral type. The result is
of type “T”. The type “T” shall be a completely-defined object type.66
The expression E1[E2] is identical (by definition) to ((E1)+(E2))
Digging into the bold text of the above quote: regarding adding an integral type to a pointer type (note the emphasis here)..
[expr.add/4]: When an expression that has integral type is added to or subtracted from a
pointer, the result has the type of the pointer operand. If the
expression P points to element x[i] of an array object x
with n elements, the expressions P + J and J + P (where J has
the value j) point to the (possibly-hypothetical) element x[i + j]
if 0 ≤ i + j ≤ n; otherwise, the behavior is undefined. ...
Note the array requirement for the if clause; else the otherwise in the above quote. The expression (&thing.a)[2] obviously doesn't qualify for the if clause; Hence, Undefined Behavior.
On a side note: Though I have extensively experimented the code and its variations on various compilers and they don't introduce any padding here, (it works); from a maintenance view, the code is extremely fragile. you should still assert that the implementation allocated the members contiguously before doing this. And stay in-bounds :-). But its still Undefined behavior....
Some viable workarounds (with defined behavior) have been provided by other answers.
As rightly pointed out in the comments, [basic.lval/8], which was in my previous edit doesn't apply. Thanks #2501 and #M.M.
1: See #Barry's answer to this question for the only one legal case where you can access thing.a member of the struct via this parttern.
No. In C, this is undefined behavior even if there is no padding.
The thing that causes undefined behavior is out-of-bounds access1. When you have a scalar (members a,b,c in the struct) and try to use it as an array2 to access the next hypothetical element, you cause undefined behavior, even if there happens to be another object of the same type at that address.
However you may use the address of the struct object and calculate the offset into a specific member:
struct data thing = { 0 };
char* p = ( char* )&thing + offsetof( thing , b );
int* b = ( int* )p;
*b = 123;
assert( thing.b == 123 );
This has to be done for each member individually, but can be put into a function that resembles an array access.
1 (Quoted from: ISO/IEC 9899:201x 6.5.6 Additive operators 8)
If the result points one past the last element of the array object, it
shall not be used as the operand of a unary * operator that is evaluated.
2 (Quoted from: ISO/IEC 9899:201x 6.5.6 Additive operators 7)
For the purposes of these operators, a pointer to an object that is not an element of an
array behaves the same as a pointer to the first element of an array of length one with the
type of the object as its element type.
In C++ if you really need it - create operator[]:
struct data
{
int a, b, c;
int &operator[]( size_t idx ) {
switch( idx ) {
case 0 : return a;
case 1 : return b;
case 2 : return c;
default: throw std::runtime_error( "bad index" );
}
}
};
data d;
d[0] = 123; // assign 123 to data.a
it is not only guaranteed to work but usage is simpler, you do not need to write unreadable expression (&thing.a)[0]
Note: this answer is given in assumption that you already have a structure with fields, and you need to add access via index. If speed is an issue and you can change the structure this could be more effective:
struct data
{
int array[3];
int &a = array[0];
int &b = array[1];
int &c = array[2];
};
This solution would change size of structure so you can use methods as well:
struct data
{
int array[3];
int &a() { return array[0]; }
int &b() { return array[1]; }
int &c() { return array[2]; }
};
For c++: If you need to access a member without knowing its name, you can use a pointer to member variable.
struct data {
int a, b, c;
};
typedef int data::* data_int_ptr;
data_int_ptr arr[] = {&data::a, &data::b, &data::c};
data thing;
thing.*arr[0] = 123;
In ISO C99/C11, union-based type-punning is legal, so you can use that instead of indexing pointers to non-arrays (see various other answers).
ISO C++ doesn't allow union-based type-punning. GNU C++ does, as an extension, and I think some other compilers that don't support GNU extensions in general do support union type-punning. But that doesn't help you write strictly portable code.
With current versions of gcc and clang, writing a C++ member function using a switch(idx) to select a member will optimize away for compile-time constant indices, but will produce terrible branchy asm for runtime indices. There's nothing inherently wrong with switch() for this; this is simply a missed-optimization bug in current compilers. They could compiler Slava' switch() function efficiently.
The solution/workaround to this is to do it the other way: give your class/struct an array member, and write accessor functions to attach names to specific elements.
struct array_data
{
int arr[3];
int &operator[]( unsigned idx ) {
// assert(idx <= 2);
//idx = (idx > 2) ? 2 : idx;
return arr[idx];
}
int &a(){ return arr[0]; } // TODO: const versions
int &b(){ return arr[1]; }
int &c(){ return arr[2]; }
};
We can have a look at the asm output for different use-cases, on the Godbolt compiler explorer. These are complete x86-64 System V functions, with the trailing RET instruction omitted to better show what you'd get when they inline. ARM/MIPS/whatever would be similar.
# asm from g++6.2 -O3
int getb(array_data &d) { return d.b(); }
mov eax, DWORD PTR [rdi+4]
void setc(array_data &d, int val) { d.c() = val; }
mov DWORD PTR [rdi+8], esi
int getidx(array_data &d, int idx) { return d[idx]; }
mov esi, esi # zero-extend to 64-bit
mov eax, DWORD PTR [rdi+rsi*4]
By comparison, #Slava's answer using a switch() for C++ makes asm like this for a runtime-variable index. (Code in the previous Godbolt link).
int cpp(data *d, int idx) {
return (*d)[idx];
}
# gcc6.2 -O3, using `default: __builtin_unreachable()` to promise the compiler that idx=0..2,
# avoiding an extra cmov for idx=min(idx,2), or an extra branch to a throw, or whatever
cmp esi, 1
je .L6
cmp esi, 2
je .L7
mov eax, DWORD PTR [rdi]
ret
.L6:
mov eax, DWORD PTR [rdi+4]
ret
.L7:
mov eax, DWORD PTR [rdi+8]
ret
This is obviously terrible, compared to the C (or GNU C++) union-based type punning version:
c(type_t*, int):
movsx rsi, esi # sign-extend this time, since I didn't change idx to unsigned here
mov eax, DWORD PTR [rdi+rsi*4]
In C++, this is mostly undefined behavior (it depends on which index).
From [expr.unary.op]:
For purposes of pointer
arithmetic (5.7) and comparison (5.9, 5.10), an object that is not an array element whose address is taken in
this way is considered to belong to an array with one element of type T.
The expression &thing.a is thus considered to refer to an array of one int.
From [expr.sub]:
The expression E1[E2] is identical (by definition) to *((E1)+(E2))
And from [expr.add]:
When an expression that has integral type is added to or subtracted from a pointer, the result has the type of the pointer operand. If the expression P points to element x[i] of an array object x with n elements, the expressions P + J and J + P (where J has the value j) point to the (possibly-hypothetical) element x[i + j] if 0 <= i + j <= n; otherwise, the behavior is undefined.
(&thing.a)[0] is perfectly well-formed because &thing.a is considered an array of size 1 and we're taking that first index. That is an allowed index to take.
(&thing.a)[2] violates the precondition that 0 <= i + j <= n, since we have i == 0, j == 2, n == 1. Simply constructing the pointer &thing.a + 2 is undefined behavior.
(&thing.a)[1] is the interesting case. It doesn't actually violate anything in [expr.add]. We're allowed to take a pointer one past the end of the array - which this would be. Here, we turn to a note in [basic.compound]:
A value of a pointer type that is a pointer to or past the end of an object represents the address of the
first byte in memory (1.7) occupied by the object53 or the first byte in memory after the end of the storage
occupied by the object, respectively. [ Note: A pointer past the end of an object (5.7) is not considered to
point to an unrelated object of the object’s type that might be located at that address.
Hence, taking the pointer &thing.a + 1 is defined behavior, but dereferencing it is undefined because it does not point to anything.
This is undefined behavior.
There are lots of rules in C++ that attempt to give the compiler some hope of understanding what you are doing, so it can reason about it and optimize it.
There are rules about aliasing (accessing data through two different pointer types), array bounds, etc.
When you have a variable x, the fact that it isn't a member of an array means that the compiler can assume that no [] based array access can modify it. So it doesn't have to constantly reload the data from memory every time you use it; only if someone could have modified it from its name.
Thus (&thing.a)[1] can be assumed by the compiler to not refer to thing.b. It can use this fact to reorder reads and writes to thing.b, invalidating what you want it to do without invalidating what you actually told it to do.
A classic example of this is casting away const.
const int x = 7;
std::cout << x << '\n';
auto ptr = (int*)&x;
*ptr = 2;
std::cout << *ptr << "!=" << x << '\n';
std::cout << ptr << "==" << &x << '\n';
here you typically get a compiler saying 7 then 2 != 7, and then two identical pointers; despite the fact that ptr is pointing at x. The compiler takes the fact that x is a constant value to not bother reading it when you ask for the value of x.
But when you take the address of x, you force it to exist. You then cast away const, and modify it. So the actual location in memory where x is has been modified, the compiler is free to not actually read it when reading x!
The compiler may get smart enough to figure out how to even avoid following ptr to read *ptr, but often they are not. Feel free to go and use ptr = ptr+argc-1 or somesuch confusion if the optimizer is getting smarter than you.
You can provide a custom operator[] that gets the right item.
int& operator[](std::size_t);
int const& operator[](std::size_t) const;
having both is useful.
Heres a way to use a proxy class to access elements in a member array by name. It is very C++, and has no benefit vs. ref-returning accessor functions, except for syntactic preference. This overloads the -> operator to access elements as members, so to be acceptable, one needs to both dislike the syntax of accessors (d.a() = 5;), as well as tolerate using -> with a non-pointer object. I expect this might also confuse readers not familiar with the code, so this might be more of a neat trick than something you want to put into production.
The Data struct in this code also includes overloads for the subscript operator, to access indexed elements inside its ar array member, as well as begin and end functions, for iteration. Also, all of these are overloaded with non-const and const versions, which I felt needed to be included for completeness.
When Data's -> is used to access an element by name (like this: my_data->b = 5;), a Proxy object is returned. Then, because this Proxy rvalue is not a pointer, its own -> operator is auto-chain-called, which returns a pointer to itself. This way, the Proxy object is instantiated and remains valid during evaluation of the initial expression.
Contruction of a Proxy object populates its 3 reference members a, b and c according to a pointer passed in the constructor, which is assumed to point to a buffer containing at least 3 values whose type is given as the template parameter T. So instead of using named references which are members of the Data class, this saves memory by populating the references at the point of access (but unfortunately, using -> and not the . operator).
In order to test how well the compiler's optimizer eliminates all of the indirection introduced by the use of Proxy, the code below includes 2 versions of main(). The #if 1 version uses the -> and [] operators, and the #if 0 version performs the equivalent set of procedures, but only by directly accessing Data::ar.
The Nci() function generates runtime integer values for initializing array elements, which prevents the optimizer from just plugging constant values directly into each std::cout << call.
For gcc 6.2, using -O3, both versions of main() generate the same assembly (toggle between #if 1 and #if 0 before the first main() to compare): https://godbolt.org/g/QqRWZb
#include <iostream>
#include <ctime>
template <typename T>
class Proxy {
public:
T &a, &b, &c;
Proxy(T* par) : a(par[0]), b(par[1]), c(par[2]) {}
Proxy* operator -> () { return this; }
};
struct Data {
int ar[3];
template <typename I> int& operator [] (I idx) { return ar[idx]; }
template <typename I> const int& operator [] (I idx) const { return ar[idx]; }
Proxy<int> operator -> () { return Proxy<int>(ar); }
Proxy<const int> operator -> () const { return Proxy<const int>(ar); }
int* begin() { return ar; }
const int* begin() const { return ar; }
int* end() { return ar + sizeof(ar)/sizeof(int); }
const int* end() const { return ar + sizeof(ar)/sizeof(int); }
};
// Nci returns an unpredictible int
inline int Nci() {
static auto t = std::time(nullptr) / 100 * 100;
return static_cast<int>(t++ % 1000);
}
#if 1
int main() {
Data d = {Nci(), Nci(), Nci()};
for(auto v : d) { std::cout << v << ' '; }
std::cout << "\n";
std::cout << d->b << "\n";
d->b = -5;
std::cout << d[1] << "\n";
std::cout << "\n";
const Data cd = {Nci(), Nci(), Nci()};
for(auto v : cd) { std::cout << v << ' '; }
std::cout << "\n";
std::cout << cd->c << "\n";
//cd->c = -5; // error: assignment of read-only location
std::cout << cd[2] << "\n";
}
#else
int main() {
Data d = {Nci(), Nci(), Nci()};
for(auto v : d.ar) { std::cout << v << ' '; }
std::cout << "\n";
std::cout << d.ar[1] << "\n";
d->b = -5;
std::cout << d.ar[1] << "\n";
std::cout << "\n";
const Data cd = {Nci(), Nci(), Nci()};
for(auto v : cd.ar) { std::cout << v << ' '; }
std::cout << "\n";
std::cout << cd.ar[2] << "\n";
//cd.ar[2] = -5;
std::cout << cd.ar[2] << "\n";
}
#endif
If reading values is enough, and efficiency is not a concern, or if you trust your compiler to optimize things well, or if struct is just that 3 bytes, you can safely do this:
char index_data(const struct data *d, size_t index) {
assert(sizeof(*d) == offsetoff(*d, c)+1);
assert(index < sizeof(*d));
char buf[sizeof(*d)];
memcpy(buf, d, sizeof(*d));
return buf[index];
}
For C++ only version, you would probably want to use static_assert to verify that struct data has standard layout, and perhaps throw exception on invalid index instead.
It is illegal, but there is a workaround:
struct data {
union {
struct {
int a;
int b;
int c;
};
int v[3];
};
};
Now you can index v:

How references in C++ are implemented with pointers

I've heared that C++ reference variables are implemented using pointers. I've been looking at some C code a while before i started with C++, I would because of that like to have something which i can relate to in the C language. Does anyone actually have an example on how it is implemented using pointers?
I've been doing some research on the question, this is what i've found. However, i'm not sure if it is legit or not as i've not been able to run this properly in a C++ file before. Any answers will be appreciated, thanks! :)
int& = *(int const *)
A reference has the same meaning as a constant pointer.
The code for references and pointers is the same, at least with MSVC2013.
For example:
int u=1,v=3, w;
int& x = u; // initialize reference
w = x; // x is synonym for u;
x++;
int *y = &u; // initalize pointer
w = *y; // y points to u
(*y)++;
The initialisation generates in both case:
lea eax, DWORD PTR _u$[ebp]
mov DWORD PTR _x$[ebp], eax ; of course in the case of the pointe _y$ instead of _x$.
The assignment, in both case generates:
mov eax, DWORD PTR _x$[ebp]
mov ecx, DWORD PTR [eax]
mov DWORD PTR _w$[ebp], ecx
And even the increment generates the same code.
The code generated is the same for whether it is a pointer, a const pointer, or a poitner to const (in the latter case, the ++ does not work). It's the compiler buisness to make sure that const is const.
Now there is a subtle difference in syntax:
int& x = u; // initialize reference
int *y = &u; // initalize pointer
int const *z = &u; // initialize pointer to "const int". Pointer itself is not const !
int * const a = &u; // initialize "const pointer" to "int", which is basically a reference.
With these definitions:
//(*z)++; // invalid because the it points to a const int.
z = &w; // valid because pointer intself is not const
and:
(*a)++; // valid because the it points to an int.
// a = &w; // invalid because it's a const pointer
So you could say int& is pretty much equivalent to int * const (I don't use '=' here, because it is reserved to the operator= and looks like a syntax error !)
Of course, with the reference you always use its name as if it was a real variable (ex: x++) while with the const pointer you always have to dereference (aka: (*x)++). So same meaning, but different syntax.
This is the best proof I can think of:
#include <iostream>
using namespace std;
double value;
struct pointer {
double *value; };
struct reference {
double& value; };
int main() {
pointer ptr { &value };
reference ref { value };
*ptr.value = 3.14;
cout << sizeof(value) << endl;
cout << sizeof(pointer) << endl;
cout << sizeof(reference) << endl;
cout << ref.value;
}
Output:
8
4
4
3.14
Can you see the sizes? sizeof(pointer) == sizeof(reference) but not sizeof(value).
Regarding that added/edited example in the question int& = *(int const *):
ref.value = *ptr.value; ref.value = *(double*)ref.value; //no const
ref.value = *(double const *)ptr.value; // yes, we can add const before *
int i = 1; const j = 2;
const int& cri = *(const int const *)&i; // same as *&i and simple i, notice two consts
const int& crj = j;
int& rj = j; // error: const cannot be removed that way
In other words: References are constant pointers (cannot be changed) with a different set of operators. The main advantage is that compilers can optimize them better (discard the reference/pointer completely), but that does not proof that they are any different because compilers can optimize/discard pointers as well (and gcc -O3 can do that well, I have seen and dissassembled good loop with pointers, where gcc could overcome so called aliasing problem).
A compiler can implement references any way that works, including that some references are completely optimized away.
You can think of the reference
T& r = o;
as
T* const pr = &o;
#define r (*pr)
which serves as a nice explanation of most properties of lvalue references, except that this conceptual picture does not include lifetime extension of temporaries and initialization with a temporary.
It “explains” that
a reference must be initialized in the declaration, except when it’s a formal argument,
a reference can’t be reassigned,
a reference can’t be a “null-reference” in valid code.
But keep in mind that it’s only a conceptual picture: while it’s very likely to be the underlying reality for much of the generated code, it is by no means a requirement that a compiler implements a reference this way.
You might find the section about references in the C++ FAQ instructive.
In C, when you use a variable as a parameter to a function, it is sent by value. If the function is supposed to change the variable, then the parameter must be sent as a pointer.
int x = 5;
void addtwo(int* param)
{
*param += 2;
}
addtwo(&x); // This will send a pointer to x to the function addtwo
In C++, you can send a reference:
void addtwo(int& param)
{
param += 2;
}
addtwo(x); // This will send a reference to x to the function addtwo
This doesn't look like much of a difference in this simple example, but in the second one, it looks like param is not being dereferenced. But it actually is, since it was sent as a reference. Even though param does not look like a pointer (and it isn't - it's a reference) the function addtwo is sent the address of x, very much as if it were a pointer. The difference is subtle, but concrete.

strict aliasing and alignment

I need a safe way to alias between arbitrary POD types, conforming to ISO-C++11 explicitly considering 3.10/10 and 3.11 of n3242 or later.
There are a lot of questions about strict aliasing here, most of them regarding C and not C++. I found a "solution" for C which uses unions, probably using this section
union type that includes one of the aforementioned types among its
elements or nonstatic data members
From that I built this.
#include <iostream>
template <typename T, typename U>
T& access_as(U* p)
{
union dummy_union
{
U dummy;
T destination;
};
dummy_union* u = (dummy_union*)p;
return u->destination;
}
struct test
{
short s;
int i;
};
int main()
{
int buf[2];
static_assert(sizeof(buf) >= sizeof(double), "");
static_assert(sizeof(buf) >= sizeof(test), "");
access_as<double>(buf) = 42.1337;
std::cout << access_as<double>(buf) << '\n';
access_as<test>(buf).s = 42;
access_as<test>(buf).i = 1234;
std::cout << access_as<test>(buf).s << '\n';
std::cout << access_as<test>(buf).i << '\n';
}
My question is, just to be sure, is this program legal according to the standard?*
It doesn't give any warnings whatsoever and works fine when compiling with MinGW/GCC 4.6.2 using:
g++ -std=c++0x -Wall -Wextra -O3 -fstrict-aliasing -o alias.exe alias.cpp
* Edit: And if not, how could one modify this to be legal?
This will never be legal, no matter what kind of contortions you perform with weird casts and unions and whatnot.
The fundamental fact is this: two objects of different type may never alias in memory, with a few special exceptions (see further down).
Example
Consider the following code:
void sum(double& out, float* in, int count) {
for(int i = 0; i < count; ++i) {
out += *in++;
}
}
Let's break that out into local register variables to model actual execution more closely:
void sum(double& out, float* in, int count) {
for(int i = 0; i < count; ++i) {
register double out_val = out; // (1)
register double in_val = *in; // (2)
register double tmp = out_val + in_val;
out = tmp; // (3)
in++;
}
}
Suppose that (1), (2) and (3) represent a memory read, read and write, respectively, which can be very expensive operations in such a tight inner loop. A reasonable optimization for this loop would be the following:
void sum(double& out, float* in, int count) {
register double tmp = out; // (1)
for(int i = 0; i < count; ++i) {
register double in_val = *in; // (2)
tmp = tmp + in_val;
in++;
}
out = tmp; // (3)
}
This optimization reduces the number of memory reads needed by half and the number of memory writes to 1. This can have a huge impact on the performance of the code and is a very important optimization for all optimizing C and C++ compilers.
Now, suppose that we don't have strict aliasing. Suppose that a write to an object of any type can affect any other object. Suppose that writing to a double can affect the value of a float somewhere. This makes the above optimization suspect, because it's possible the programmer has in fact intended for out and in to alias so that the sum function's result is more complicated and is affected by the process. Sounds stupid? Even so, the compiler cannot distinguish between "stupid" and "smart" code. The compiler can only distinguish between well-formed and ill-formed code. If we allow free aliasing, then the compiler must be conservative in its optimizations and must perform the extra store (3) in each iteration of the loop.
Hopefully you can see now why no such union or cast trick can possibly be legal. You cannot circumvent fundamental concepts like this by sleight of hand.
Exceptions to strict aliasing
The C and C++ standards make special provision for aliasing any type with char, and with any "related type" which among others includes derived and base types, and members, because being able to use the address of a class member independently is so important. You can find an exhaustive list of these provisions in this answer.
Furthermore, GCC makes special provision for reading from a different member of a union than what was last written to. Note that this kind of conversion-through-union does not in fact allow you to violate aliasing. Only one member of a union is allowed to be active at any one time, so for example, even with GCC the following would be undefined behavior:
union {
double d;
float f[2];
};
f[0] = 3.0f;
f[1] = 5.0f;
sum(d, f, 2); // UB: attempt to treat two members of
// a union as simultaneously active
Workarounds
The only standard way to reinterpret the bits of one object as the bits of an object of some other type is to use an equivalent of memcpy. This makes use of the special provision for aliasing with char objects, in effect allowing you to read and modify the underlying object representation at the byte level. For example, the following is legal, and does not violate strict aliasing rules:
int a[2];
double d;
static_assert(sizeof(a) == sizeof(d));
memcpy(a, &d, sizeof(d));
This is semantically equivalent to the following code:
int a[2];
double d;
static_assert(sizeof(a) == sizeof(d));
for(size_t i = 0; i < sizeof(a); ++i)
((char*)a)[i] = ((char*)&d)[i];
GCC makes a provision for reading from an inactive union member, implicitly making it active. From the GCC documentation:
The practice of reading from a different union member than the one most recently written to (called “type-punning”) is common. Even with -fstrict-aliasing, type-punning is allowed, provided the memory is accessed through the union type. So, the code above will work as expected. See Structures unions enumerations and bit-fields implementation. However, this code might not:
int f() {
union a_union t;
int* ip;
t.d = 3.0;
ip = &t.i;
return *ip;
}
Similarly, access by taking the address, casting the resulting pointer and dereferencing the result has undefined behavior, even if the cast uses a union type, e.g.:
int f() {
double d = 3.0;
return ((union a_union *) &d)->i;
}
Placement new
(Note: I'm going by memory here as I don't have access to the standard right now).
Once you placement-new an object into a storage buffer, the lifetime of the underlying storage objects ends implicitly. This is similar to what happens when you write to a member of a union:
union {
int i;
float f;
} u;
// No member of u is active. Neither i nor f refer to an lvalue of any type.
u.i = 5;
// The member u.i is now active, and there exists an lvalue (object)
// of type int with the value 5. No float object exists.
u.f = 5.0f;
// The member u.i is no longer active,
// as its lifetime has ended with the assignment.
// The member u.f is now active, and there exists an lvalue (object)
// of type float with the value 5.0f. No int object exists.
Now, let's look at something similar with placement-new:
#define MAX_(x, y) ((x) > (y) ? (x) : (y))
// new returns suitably aligned memory
char* buffer = new char[MAX_(sizeof(int), sizeof(float))];
// Currently, only char objects exist in the buffer.
new (buffer) int(5);
// An object of type int has been constructed in the memory pointed to by buffer,
// implicitly ending the lifetime of the underlying storage objects.
new (buffer) float(5.0f);
// An object of type int has been constructed in the memory pointed to by buffer,
// implicitly ending the lifetime of the int object that previously occupied the same memory.
This kind of implicit end-of-lifetime can only occur for types with trivial constructors and destructors, for obvious reasons.
Aside from the error when sizeof(T) > sizeof(U), the problem there could be, that the union has an appropriate and possibly higher alignment than U, because of T.
If you don't instantiate this union, so that its memory block is aligned (and large enough!) and then fetch the member with destination type T, it will break silently in the worst case.
For example, an alignment error occurs, if you do the C-style cast of U*, where U requires 4 bytes alignment, to dummy_union*, where dummy_union requires alignment to 8 bytes, because alignof(T) == 8. After that, you possibly read the union member with type T aligned at 4 instead of 8 bytes.
Alias cast (alignment & size safe reinterpret_cast for PODs only):
This proposal does explicitly violate strict aliasing, but with static assertions:
///#brief Compile time checked reinterpret_cast where destAlign <= srcAlign && destSize <= srcSize
template<typename _TargetPtrType, typename _ArgType>
inline _TargetPtrType alias_cast(_ArgType* const ptr)
{
//assert argument alignment at runtime in debug builds
assert(uintptr_t(ptr) % alignof(_ArgType) == 0);
typedef typename std::tr1::remove_pointer<_TargetPtrType>::type target_type;
static_assert(std::tr1::is_pointer<_TargetPtrType>::value && std::tr1::is_pod<target_type>::value, "Target type must be a pointer to POD");
static_assert(std::tr1::is_pod<_ArgType>::value, "Argument must point to POD");
static_assert(std::tr1::is_const<_ArgType>::value ? std::tr1::is_const<target_type>::value : true, "const argument must be cast to const target type");
static_assert(alignof(_ArgType) % alignof(target_type) == 0, "Target alignment must be <= source alignment");
static_assert(sizeof(_ArgType) >= sizeof(target_type), "Target size must be <= source size");
//reinterpret cast doesn't remove a const qualifier either
return reinterpret_cast<_TargetPtrType>(ptr);
}
Usage with pointer type argument ( like standard cast operators such as reinterpret_cast ):
int* x = alias_cast<int*>(any_ptr);
Another approach (circumvents alignment and aliasing issues using a temporary union):
template<typename ReturnType, typename ArgType>
inline ReturnType alias_value(const ArgType& x)
{
//test argument alignment at runtime in debug builds
assert(uintptr_t(&x) % alignof(ArgType) == 0);
static_assert(!std::tr1::is_pointer<ReturnType>::value ? !std::tr1::is_const<ReturnType>::value : true, "Target type can't be a const value type");
static_assert(std::tr1::is_pod<ReturnType>::value, "Target type must be POD");
static_assert(std::tr1::is_pod<ArgType>::value, "Argument must be of POD type");
//assure, that we don't read garbage
static_assert(sizeof(ReturnType) <= sizeof(ArgType),"Target size must be <= argument size");
union dummy_union
{
ArgType x;
ReturnType r;
};
dummy_union dummy;
dummy.x = x;
return dummy.r;
}
Usage:
struct characters
{
char c[5];
};
//.....
characters chars;
chars.c[0] = 'a';
chars.c[1] = 'b';
chars.c[2] = 'c';
chars.c[3] = 'd';
chars.c[4] = '\0';
int r = alias_value<int>(chars);
The disadvantage of this is, that the union may require more memory than actually needed for the ReturnType
Wrapped memcpy (circumvents alignment and aliasing issues using memcpy):
template<typename ReturnType, typename ArgType>
inline ReturnType alias_value(const ArgType& x)
{
//assert argument alignment at runtime in debug builds
assert(uintptr_t(&x) % alignof(ArgType) == 0);
static_assert(!std::tr1::is_pointer<ReturnType>::value ? !std::tr1::is_const<ReturnType>::value : true, "Target type can't be a const value type");
static_assert(std::tr1::is_pod<ReturnType>::value, "Target type must be POD");
static_assert(std::tr1::is_pod<ArgType>::value, "Argument must be of POD type");
//assure, that we don't read garbage
static_assert(sizeof(ReturnType) <= sizeof(ArgType),"Target size must be <= argument size");
ReturnType r;
memcpy(&r,&x,sizeof(ReturnType));
return r;
}
For dynamic sized arrays of any POD type:
template<typename ReturnType, typename ElementType>
ReturnType alias_value(const ElementType* const array,const size_t size)
{
//assert argument alignment at runtime in debug builds
assert(uintptr_t(array) % alignof(ElementType) == 0);
static const size_t min_element_count = (sizeof(ReturnType) / sizeof(ElementType)) + (sizeof(ReturnType) % sizeof(ElementType) != 0 ? 1 : 0);
static_assert(!std::tr1::is_pointer<ReturnType>::value ? !std::tr1::is_const<ReturnType>::value : true, "Target type can't be a const value type");
static_assert(std::tr1::is_pod<ReturnType>::value, "Target type must be POD");
static_assert(std::tr1::is_pod<ElementType>::value, "Array elements must be of POD type");
//check for minimum element count in array
if(size < min_element_count)
throw std::invalid_argument("insufficient array size");
ReturnType r;
memcpy(&r,array,sizeof(ReturnType));
return r;
}
More efficient approaches may do explicit unaligned reads with intrinsics, like the ones from SSE, to extract primitives.
Examples:
struct sample_struct
{
char c[4];
int _aligner;
};
int test(void)
{
const sample_struct constPOD = {};
sample_struct pod = {};
const char* str = "abcd";
const int* constIntPtr = alias_cast<const int*>(&constPOD);
void* voidPtr = alias_value<void*>(pod);
int intValue = alias_value<int>(str,strlen(str));
return 0;
}
EDITS:
Assertions to assure conversion of PODs only, may be improved.
Removed superfluous template helpers, now using tr1 traits only
Static assertions for clarification and prohibition of const value (non-pointer) return type
Runtime assertions for debug builds
Added const qualifiers to some function arguments
Another type punning function using memcpy
Refactoring
Small example
I think that at the most fundamental level, this is impossible and violates strict aliasing. The only thing you've achieved is tricking the compiler into not noticing.
My question is, just to be sure, is this program legal according to the standard?
No. The alignment may be unnatural using the alias you have provided. The union you wrote just moves the point of the alias. It may appear to work, but that program may fail when CPU options, ABI, or compiler settings change.
And if not, how could one modify this to be legal?
Create natural temporary variables and treat your storage as a memory blob (moving in and out of the blob to/from temporaries), or use a union which represents all your types (remember, one active element at a time here).

Why strange behavior with casting back pointer to the original class?

Assume that in my code I have to store a void* as data member and typecast it back to the original class pointer when needed. To test its reliability, I wrote a test program (linux ubuntu 4.4.1 g++ -04 -Wall) and I was shocked to see the behavior.
struct A
{
int i;
static int c;
A () : i(c++) { cout<<"A() : i("<<i<<")\n"; }
};
int A::c;
int main ()
{
void *p = new A[3]; // good behavior for A* p = new A[3];
cout<<"p->i = "<<((A*)p)->i<<endl;
((A*&)p)++;
cout<<"p->i = "<<((A*)p)->i<<endl;
((A*&)p)++;
cout<<"p->i = "<<((A*)p)->i<<endl;
}
This is just a test program; in actual for my case, it's mandatory to store any pointer as void* and then cast it back to the actual pointer (with help of template). So let's not worry about that part. The output of the above code is,
p->i = 0
p->i = 0 // ?? why not 1
p->i = 1
However if you change the void* p; to A* p; it gives expected behavior. WHY ?
Another question, I cannot get away with (A*&) otherwise I cannot use operator ++; but it also gives warning as, dereferencing type-punned pointer will break strict-aliasing rules. Is there any decent way to overcome warning ?
Well, as the compiler warns you, you are violating the strict aliasing rule, which formally means that the results are undefined.
You can eliminate the strict aliasing violation by using a function template for the increment:
template<typename T>
void advance_pointer_as(void*& p, int n = 1) {
T* p_a(static_cast<T*>(p));
p_a += n;
p = p_a;
}
With this function template, the following definition of main() yields the expected results on the Ideone compiler (and emits no warnings):
int main()
{
void* p = new A[3];
std::cout << "p->i = " << static_cast<A*>(p)->i << std::endl;
advance_pointer_as<A>(p);
std::cout << "p->i = " << static_cast<A*>(p)->i << std::endl;
advance_pointer_as<A>(p);
std::cout << "p->i = " << static_cast<A*>(p)->i << std::endl;
}
You have already received the correct answer and it is indeed the violation of the strict aliasing rule that leads to the unpredictable behavior of the code. I'd just note that the title of your question makes reference to "casting back pointer to the original class". In reality your code does not have anything to do with casting anything "back". Your code performs reinterpretation of raw memory content occupied by a void * pointer as a A * pointer. This is not "casting back". This is reinterpretation. Not even remotely the same thing.
A good way to illustrate the difference would be to use and int and float example. A float value declared and initialized as
float f = 2.0;
cab be cast (explicitly or implicitly converted) to int type
int i = (int) f;
with the expected result
assert(i == 2);
This is indeed a cast (a conversion).
Alternatively, the same float value can be also reinterpreted as an int value
int i = (int &) f;
However, in this case the value of i will be totally meaningless and generally unpredictable. I hope it is easy to see the difference between a conversion and a memory reinterpretation from these examples.
Reinterpretation is exactly what you are doing in your code. The (A *&) p expression is nothing else than a reinterpretation of raw memory occupied by pointer void *p as pointer of type A *. The language does not guarantee that these two pointer types have the same representation and even the same size. So, expecting the predictable behavior from your code is like expecting the above (int &) f expression to evaluate to 2.
The proper way to really "cast back" your void * pointer would be to do (A *) p, not (A *&) p. The result of (A *) p would indeed be the original pointer value, that can be safely manipulated by pointer arithmetic. The only proper way to obtain the original value as an lvalue would be to use an additional variable
A *pa = (A *) p;
...
pa++;
...
And there's no legal way to create an lvalue "in place", as you attempted to by your (A *&) p cast. The behavior of your code is an illustration of that.
As others have commented, your code appears like it should work. Only once (in 17+ years of coding in C++) I ran across something where I was looking straight at the code and the behavior, like in your case, just didn't make sense. I ended up running the code through debugger and opening a disassembly window. I found what could only be explained as a bug in VS2003 compiler because it was missing exactly one instruction. Simply rearranging local variables at the top of the function (30 lines or so from the error) made the compiler put the correct instruction back in. So try debugger with disassembly and follow memory/registers to see what it's actually doing?
As far as advancing the pointer, you should be able to advance it by doing:
p = (char*)p + sizeof( A );
VS2003 through VS2010 never give you complaints about that, not sure about g++