I need a safe way to alias between arbitrary POD types, conforming to ISO-C++11 explicitly considering 3.10/10 and 3.11 of n3242 or later.
There are a lot of questions about strict aliasing here, most of them regarding C and not C++. I found a "solution" for C which uses unions, probably using this section
union type that includes one of the aforementioned types among its
elements or nonstatic data members
From that I built this.
#include <iostream>
template <typename T, typename U>
T& access_as(U* p)
{
union dummy_union
{
U dummy;
T destination;
};
dummy_union* u = (dummy_union*)p;
return u->destination;
}
struct test
{
short s;
int i;
};
int main()
{
int buf[2];
static_assert(sizeof(buf) >= sizeof(double), "");
static_assert(sizeof(buf) >= sizeof(test), "");
access_as<double>(buf) = 42.1337;
std::cout << access_as<double>(buf) << '\n';
access_as<test>(buf).s = 42;
access_as<test>(buf).i = 1234;
std::cout << access_as<test>(buf).s << '\n';
std::cout << access_as<test>(buf).i << '\n';
}
My question is, just to be sure, is this program legal according to the standard?*
It doesn't give any warnings whatsoever and works fine when compiling with MinGW/GCC 4.6.2 using:
g++ -std=c++0x -Wall -Wextra -O3 -fstrict-aliasing -o alias.exe alias.cpp
* Edit: And if not, how could one modify this to be legal?
This will never be legal, no matter what kind of contortions you perform with weird casts and unions and whatnot.
The fundamental fact is this: two objects of different type may never alias in memory, with a few special exceptions (see further down).
Example
Consider the following code:
void sum(double& out, float* in, int count) {
for(int i = 0; i < count; ++i) {
out += *in++;
}
}
Let's break that out into local register variables to model actual execution more closely:
void sum(double& out, float* in, int count) {
for(int i = 0; i < count; ++i) {
register double out_val = out; // (1)
register double in_val = *in; // (2)
register double tmp = out_val + in_val;
out = tmp; // (3)
in++;
}
}
Suppose that (1), (2) and (3) represent a memory read, read and write, respectively, which can be very expensive operations in such a tight inner loop. A reasonable optimization for this loop would be the following:
void sum(double& out, float* in, int count) {
register double tmp = out; // (1)
for(int i = 0; i < count; ++i) {
register double in_val = *in; // (2)
tmp = tmp + in_val;
in++;
}
out = tmp; // (3)
}
This optimization reduces the number of memory reads needed by half and the number of memory writes to 1. This can have a huge impact on the performance of the code and is a very important optimization for all optimizing C and C++ compilers.
Now, suppose that we don't have strict aliasing. Suppose that a write to an object of any type can affect any other object. Suppose that writing to a double can affect the value of a float somewhere. This makes the above optimization suspect, because it's possible the programmer has in fact intended for out and in to alias so that the sum function's result is more complicated and is affected by the process. Sounds stupid? Even so, the compiler cannot distinguish between "stupid" and "smart" code. The compiler can only distinguish between well-formed and ill-formed code. If we allow free aliasing, then the compiler must be conservative in its optimizations and must perform the extra store (3) in each iteration of the loop.
Hopefully you can see now why no such union or cast trick can possibly be legal. You cannot circumvent fundamental concepts like this by sleight of hand.
Exceptions to strict aliasing
The C and C++ standards make special provision for aliasing any type with char, and with any "related type" which among others includes derived and base types, and members, because being able to use the address of a class member independently is so important. You can find an exhaustive list of these provisions in this answer.
Furthermore, GCC makes special provision for reading from a different member of a union than what was last written to. Note that this kind of conversion-through-union does not in fact allow you to violate aliasing. Only one member of a union is allowed to be active at any one time, so for example, even with GCC the following would be undefined behavior:
union {
double d;
float f[2];
};
f[0] = 3.0f;
f[1] = 5.0f;
sum(d, f, 2); // UB: attempt to treat two members of
// a union as simultaneously active
Workarounds
The only standard way to reinterpret the bits of one object as the bits of an object of some other type is to use an equivalent of memcpy. This makes use of the special provision for aliasing with char objects, in effect allowing you to read and modify the underlying object representation at the byte level. For example, the following is legal, and does not violate strict aliasing rules:
int a[2];
double d;
static_assert(sizeof(a) == sizeof(d), "size mismatch"); // C++11 requires a message
memcpy(a, &d, sizeof(d));
This is semantically equivalent to the following code:
int a[2];
double d;
static_assert(sizeof(a) == sizeof(d), "size mismatch"); // C++11 requires a message
for(size_t i = 0; i < sizeof(a); ++i)
((char*)a)[i] = ((char*)&d)[i];
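As a concrete illustration of the memcpy approach, here is a minimal sketch that inspects and rebuilds the bits of a double. The helper names bits_of and double_from_bits are made up for this example, and it assumes that double is 64 bits wide.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Assumption: double is 64 bits wide, as on mainstream platforms.
static_assert(sizeof(std::uint64_t) == sizeof(double), "double must be 64 bits wide");

// Copy the object representation of a double into an integer.
std::uint64_t bits_of(double d) {
    std::uint64_t u;
    std::memcpy(&u, &d, sizeof u);
    return u;
}

// Inverse operation: rebuild a double from a stored bit pattern.
double double_from_bits(std::uint64_t u) {
    double d;
    std::memcpy(&d, &u, sizeof d);
    return d;
}

// Usage sketch: the round trip reproduces the original value exactly.
void round_trip_example() {
    const double original = 42.1337;
    assert(double_from_bits(bits_of(original)) == original);
}
```

Neither function ever accesses a double through an integer lvalue or the other way around; only bytes are copied, which is what the char provision permits.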
GCC makes a provision for reading from an inactive union member, implicitly making it active. From the GCC documentation:
The practice of reading from a different union member than the one most recently written to (called “type-punning”) is common. Even with -fstrict-aliasing, type-punning is allowed, provided the memory is accessed through the union type. So, the code above will work as expected. See Structures unions enumerations and bit-fields implementation. However, this code might not:
int f() {
union a_union t;
int* ip;
t.d = 3.0;
ip = &t.i;
return *ip;
}
Similarly, access by taking the address, casting the resulting pointer and dereferencing the result has undefined behavior, even if the cast uses a union type, e.g.:
int f() {
double d = 3.0;
return ((union a_union *) &d)->i;
}
Placement new
(Note: I'm going by memory here as I don't have access to the standard right now).
Once you placement-new an object into a storage buffer, the lifetime of the underlying storage objects ends implicitly. This is similar to what happens when you write to a member of a union:
union {
int i;
float f;
} u;
// No member of u is active. Neither i nor f refer to an lvalue of any type.
u.i = 5;
// The member u.i is now active, and there exists an lvalue (object)
// of type int with the value 5. No float object exists.
u.f = 5.0f;
// The member u.i is no longer active,
// as its lifetime has ended with the assignment.
// The member u.f is now active, and there exists an lvalue (object)
// of type float with the value 5.0f. No int object exists.
Now, let's look at something similar with placement-new:
#define MAX_(x, y) ((x) > (y) ? (x) : (y))
// new returns suitably aligned memory
char* buffer = new char[MAX_(sizeof(int), sizeof(float))];
// Currently, only char objects exist in the buffer.
new (buffer) int(5);
// An object of type int has been constructed in the memory pointed to by buffer,
// implicitly ending the lifetime of the underlying storage objects.
new (buffer) float(5.0f);
// An object of type float has been constructed in the memory pointed to by buffer,
// implicitly ending the lifetime of the int object that previously occupied the same memory.
This kind of implicit end-of-lifetime can only occur for types with trivial constructors and destructors, for obvious reasons.
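The storage reuse described above can be sketched as a small function. This is only an illustration under the stated rules: reuse_storage is a hypothetical name, and every access goes through the pointer returned by the corresponding placement-new.

```cpp
#include <cassert>
#include <new>

// Hypothetical helper: reuse one buffer first for an int, then for a float.
float reuse_storage(int start) {
    // Raw storage that is large enough and suitably aligned for both types.
    alignas(int) alignas(float) unsigned char
        buffer[sizeof(int) > sizeof(float) ? sizeof(int) : sizeof(float)];

    int* pi = new (buffer) int(start);  // an int object now lives in buffer
    const int seen = *pi;               // read it while its lifetime lasts
    assert(seen == start);

    // Reusing the storage ends the int's lifetime; only the float exists now.
    float* pf = new (buffer) float(static_cast<float>(seen) + 0.5f);
    return *pf;                         // access through the pointer new returned
}
```

Reading *pi after the second placement-new would be an access to an object whose lifetime has ended, which is exactly the situation the aliasing rules forbid.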
Aside from the error when sizeof(T) > sizeof(U), there is a possible alignment problem: because of T, the union may require a stricter alignment than U.
If you do not actually instantiate the union, so that its memory block is suitably aligned (and large enough!), before fetching the member of destination type T, the code will break silently in the worst case.
For example, an alignment error occurs if you C-style cast a U*, where U requires 4-byte alignment, to dummy_union*, where dummy_union requires 8-byte alignment because alignof(T) == 8. After that, you may read the union member of type T at an address aligned to 4 instead of 8 bytes.
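The alignment concern can be made visible with a small run-time check. This is a sketch only: is_aligned_for and aligned_buffer_demo are hypothetical names, and the check assumes that converting a pointer to std::uintptr_t yields its numeric address, which holds on mainstream platforms but is implementation-defined.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: true if p satisfies the alignment requirement of T.
// Assumption: the integer value of the pointer is its address.
template <typename T>
bool is_aligned_for(const void* p) {
    return reinterpret_cast<std::uintptr_t>(p) % alignof(T) == 0;
}

// Usage sketch: the start of an alignas(double) buffer is fine for a double,
// but one byte further in it is not (whenever alignof(double) > 1).
bool aligned_buffer_demo() {
    alignas(double) unsigned char buffer[2 * sizeof(double)];
    assert(is_aligned_for<double>(buffer));
    return is_aligned_for<double>(buffer + 1);
}
```

A check like this only detects misalignment; it does nothing about the aliasing problem itself.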
Alias cast (alignment & size safe reinterpret_cast for PODs only):
This proposal does explicitly violate strict aliasing, but with static assertions:
/// @brief Compile time checked reinterpret_cast where destAlign <= srcAlign && destSize <= srcSize
template<typename _TargetPtrType, typename _ArgType>
inline _TargetPtrType alias_cast(_ArgType* const ptr)
{
//assert argument alignment at runtime in debug builds
assert(uintptr_t(ptr) % alignof(_ArgType) == 0);
typedef typename std::tr1::remove_pointer<_TargetPtrType>::type target_type;
static_assert(std::tr1::is_pointer<_TargetPtrType>::value && std::tr1::is_pod<target_type>::value, "Target type must be a pointer to POD");
static_assert(std::tr1::is_pod<_ArgType>::value, "Argument must point to POD");
static_assert(std::tr1::is_const<_ArgType>::value ? std::tr1::is_const<target_type>::value : true, "const argument must be cast to const target type");
static_assert(alignof(_ArgType) % alignof(target_type) == 0, "Target alignment must be <= source alignment");
static_assert(sizeof(_ArgType) >= sizeof(target_type), "Target size must be <= source size");
//reinterpret cast doesn't remove a const qualifier either
return reinterpret_cast<_TargetPtrType>(ptr);
}
Usage with pointer type argument ( like standard cast operators such as reinterpret_cast ):
int* x = alias_cast<int*>(any_ptr);
Another approach (circumvents alignment and aliasing issues using a temporary union):
template<typename ReturnType, typename ArgType>
inline ReturnType alias_value(const ArgType& x)
{
//test argument alignment at runtime in debug builds
assert(uintptr_t(&x) % alignof(ArgType) == 0);
static_assert(!std::tr1::is_pointer<ReturnType>::value ? !std::tr1::is_const<ReturnType>::value : true, "Target type can't be a const value type");
static_assert(std::tr1::is_pod<ReturnType>::value, "Target type must be POD");
static_assert(std::tr1::is_pod<ArgType>::value, "Argument must be of POD type");
//assure, that we don't read garbage
static_assert(sizeof(ReturnType) <= sizeof(ArgType),"Target size must be <= argument size");
union dummy_union
{
ArgType x;
ReturnType r;
};
dummy_union dummy;
dummy.x = x;
return dummy.r;
}
Usage:
struct characters
{
char c[5];
};
//.....
characters chars;
chars.c[0] = 'a';
chars.c[1] = 'b';
chars.c[2] = 'c';
chars.c[3] = 'd';
chars.c[4] = '\0';
int r = alias_value<int>(chars);
The disadvantage of this is that the union may require more memory than is actually needed for the ReturnType.
Wrapped memcpy (circumvents alignment and aliasing issues using memcpy):
template<typename ReturnType, typename ArgType>
inline ReturnType alias_value(const ArgType& x)
{
//assert argument alignment at runtime in debug builds
assert(uintptr_t(&x) % alignof(ArgType) == 0);
static_assert(!std::tr1::is_pointer<ReturnType>::value ? !std::tr1::is_const<ReturnType>::value : true, "Target type can't be a const value type");
static_assert(std::tr1::is_pod<ReturnType>::value, "Target type must be POD");
static_assert(std::tr1::is_pod<ArgType>::value, "Argument must be of POD type");
//assure, that we don't read garbage
static_assert(sizeof(ReturnType) <= sizeof(ArgType),"Target size must be <= argument size");
ReturnType r;
memcpy(&r,&x,sizeof(ReturnType));
return r;
}
For dynamic sized arrays of any POD type:
template<typename ReturnType, typename ElementType>
ReturnType alias_value(const ElementType* const array,const size_t size)
{
//assert argument alignment at runtime in debug builds
assert(uintptr_t(array) % alignof(ElementType) == 0);
static const size_t min_element_count = (sizeof(ReturnType) / sizeof(ElementType)) + (sizeof(ReturnType) % sizeof(ElementType) != 0 ? 1 : 0);
static_assert(!std::tr1::is_pointer<ReturnType>::value ? !std::tr1::is_const<ReturnType>::value : true, "Target type can't be a const value type");
static_assert(std::tr1::is_pod<ReturnType>::value, "Target type must be POD");
static_assert(std::tr1::is_pod<ElementType>::value, "Array elements must be of POD type");
//check for minimum element count in array
if(size < min_element_count)
throw std::invalid_argument("insufficient array size");
ReturnType r;
memcpy(&r,array,sizeof(ReturnType));
return r;
}
More efficient approaches may do explicit unaligned reads with intrinsics, like the ones from SSE, to extract primitives.
Examples:
struct sample_struct
{
char c[4];
int _aligner;
};
int test(void)
{
const sample_struct constPOD = {};
sample_struct pod = {};
const char* str = "abcd";
const int* constIntPtr = alias_cast<const int*>(&constPOD);
void* voidPtr = alias_value<void*>(pod);
int intValue = alias_value<int>(str,strlen(str));
return 0;
}
EDITS:
Assertions to assure conversion of PODs only, may be improved.
Removed superfluous template helpers, now using tr1 traits only
Static assertions for clarification and prohibition of const value (non-pointer) return type
Runtime assertions for debug builds
Added const qualifiers to some function arguments
Another type punning function using memcpy
Refactoring
Small example
I think that at the most fundamental level, this is impossible and violates strict aliasing. The only thing you've achieved is tricking the compiler into not noticing.
My question is, just to be sure, is this program legal according to the standard?
No. The alignment may be unnatural using the alias you have provided. The union you wrote just moves the point of the alias. It may appear to work, but that program may fail when CPU options, ABI, or compiler settings change.
And if not, how could one modify this to be legal?
Create natural temporary variables and treat your storage as a memory blob (moving in and out of the blob to/from temporaries), or use a union which represents all your types (remember, one active element at a time here).
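The "memory blob" suggestion could look roughly like the following sketch. The names load, store and blob_demo are invented for this example; the struct mirrors the one in the question. Values only ever live in properly typed temporaries, and the storage is touched as bytes.

```cpp
#include <cassert>
#include <cstring>

struct test { short s; int i; };  // same members as the struct in the question

// Copy a value of type T out of raw bytes into a properly typed temporary.
template <typename T>
T load(const unsigned char* blob) {
    T tmp;
    std::memcpy(&tmp, blob, sizeof(T));
    return tmp;
}

// Copy a properly typed value into raw bytes.
template <typename T>
void store(unsigned char* blob, const T& value) {
    std::memcpy(blob, &value, sizeof(T));
}

// Usage sketch: the same blob holds a double first and a test afterwards.
int blob_demo() {
    unsigned char blob[sizeof(double) > sizeof(test) ? sizeof(double) : sizeof(test)];
    store(blob, 42.1337);
    assert(load<double>(blob) == 42.1337);

    const test t = {42, 1234};
    store(blob, t);
    const test back = load<test>(blob);
    return back.s + back.i;
}
```

Because the blob is an unsigned char array and is only read and written with memcpy, its alignment does not matter either.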
Related
Regardless of how 'bad' the code is, and assuming that alignment etc are not an issue on the compiler/platform, is this undefined or broken behavior?
If I have a struct like this :-
struct data
{
int a, b, c;
};
struct data thing;
Is it legal to access a, b and c as (&thing.a)[0], (&thing.a)[1], and (&thing.a)[2]?
In every case, on every compiler and platform I tried it on, with every setting I tried it 'worked'. I'm just worried that the compiler might not realize that b and thing[1] are the same thing and stores to 'b' might be put in a register and thing[1] reads the wrong value from memory (for example). In every case I tried it did the right thing though. (I realize of course that doesn't prove much)
This is not my code; it's code I have to work with. I'm interested in whether this is bad code or broken code, as the difference affects my priorities for changing it a great deal :)
Tagged C and C++. I'm mostly interested in C++ but also C if it is different, just for interest.
It is illegal [1]. That is undefined behavior in C++.
You are taking the members in an array fashion, but here is what the C++ standard says (emphasis mine):
[dcl.array/1]: ...An object of array type contains a contiguously allocated non-empty set of N
subobjects of type T...
But, for members, there's no such contiguous requirement:
[class.mem/17]: ...Implementation alignment requirements might cause two adjacent
members not to be allocated immediately after each other...
While the above two quotes should be enough to hint why indexing into a struct as you did is not defined behavior under the C++ standard, let's pick one example: look at the expression (&thing.a)[2]. Regarding the subscript operator:
[expr.post//expr.sub/1]:
A postfix expression followed by an expression in square brackets is a
postfix expression. One of the expressions shall be a glvalue of type
“array of T” or a prvalue of type “pointer to T” and the other shall
be a prvalue of unscoped enumeration or integral type. The result is
of type “T”. The type “T” shall be a completely-defined object type.
The expression E1[E2] is identical (by definition) to *((E1)+(E2))
Digging into the bold text of the above quote: regarding adding an integral type to a pointer type (note the emphasis here)..
[expr.add/4]: When an expression that has integral type is added to or subtracted from a
pointer, the result has the type of the pointer operand. If the
expression P points to element x[i] of an array object x
with n elements, the expressions P + J and J + P (where J has
the value j) point to the (possibly-hypothetical) element x[i + j]
if 0 ≤ i + j ≤ n; otherwise, the behavior is undefined. ...
Note the array requirement in the if clause, and the otherwise that follows it in the above quote. The expression (&thing.a)[2] obviously doesn't satisfy the if clause; hence, undefined behavior.
On a side note: I have experimented extensively with this code and its variations on various compilers, and they don't introduce any padding here (it works). From a maintenance point of view, though, the code is extremely fragile. You should still assert that the implementation allocated the members contiguously before doing this, and stay in bounds :-). But it's still undefined behavior.
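The contiguity check mentioned above could be written as a compile-time assertion along these lines. This is a sketch; members_are_contiguous is a name invented here. It only documents the layout assumption and does not make the pointer arithmetic defined.

```cpp
#include <cstddef>

struct data { int a, b, c; };

// True only if b and c follow a without padding and nothing trails c.
constexpr bool members_are_contiguous() {
    return offsetof(data, b) == sizeof(int)
        && offsetof(data, c) == 2 * sizeof(int)
        && sizeof(data) == 3 * sizeof(int);
}

// Fails the build on an implementation that inserts padding.
static_assert(members_are_contiguous(), "data has padding between or after its members");
```

Even when the assertion passes, (&thing.a)[2] still violates the [expr.add] rule quoted above.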
Some viable workarounds (with defined behavior) have been provided by other answers.
As rightly pointed out in the comments, [basic.lval/8], which was in my previous edit, doesn't apply. Thanks @2501 and @M.M.
[1]: See @Barry's answer to this question for the only legal case where you can access the thing.a member of the struct via this pattern.
No. In C, this is undefined behavior even if there is no padding.
The thing that causes undefined behavior is out-of-bounds access1. When you have a scalar (members a,b,c in the struct) and try to use it as an array2 to access the next hypothetical element, you cause undefined behavior, even if there happens to be another object of the same type at that address.
However you may use the address of the struct object and calculate the offset into a specific member:
struct data thing = { 0 };
char* p = ( char* )&thing + offsetof( struct data , b ); /* offsetof takes a type, not an object */
int* b = ( int* )p;
*b = 123;
assert( thing.b == 123 );
This has to be done for each member individually, but can be put into a function that resembles an array access.
1 (Quoted from: ISO/IEC 9899:201x 6.5.6 Additive operators 8)
If the result points one past the last element of the array object, it
shall not be used as the operand of a unary * operator that is evaluated.
2 (Quoted from: ISO/IEC 9899:201x 6.5.6 Additive operators 7)
For the purposes of these operators, a pointer to an object that is not an element of an
array behaves the same as a pointer to the first element of an array of length one with the
type of the object as its element type.
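The per-member offset calculation from this answer can be wrapped in a function that resembles array access. The following is a sketch written as C++ to match the rest of the thread; member_at is a hypothetical name, and the code relies on the answer's claim that offsetting from the address of the struct object is permitted.

```cpp
#include <cassert>
#include <cstddef>

struct data { int a, b, c; };

// Hypothetical helper: array-like access built from per-member offsets.
int* member_at(data* d, std::size_t idx) {
    static const std::size_t offsets[] = {
        offsetof(data, a), offsetof(data, b), offsetof(data, c)
    };
    assert(idx < sizeof(offsets) / sizeof(offsets[0]));
    char* base = reinterpret_cast<char*>(d);             // byte view of the struct
    return reinterpret_cast<int*>(base + offsets[idx]);  // address of the chosen member
}
```

Unlike (&thing.a)[idx], this never steps past the end of a single int; each offset is computed from the start of the enclosing struct.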
In C++ if you really need it - create operator[]:
struct data
{
int a, b, c;
int &operator[]( size_t idx ) {
switch( idx ) {
case 0 : return a;
case 1 : return b;
case 2 : return c;
default: throw std::runtime_error( "bad index" );
}
}
};
data d;
d[0] = 123; // assign 123 to data.a
Not only is it guaranteed to work, but usage is simpler: you do not need to write an unreadable expression like (&thing.a)[0].
Note: this answer assumes that you already have a structure with fields and need to add access by index. If speed is an issue and you can change the structure, this could be more efficient:
struct data
{
int array[3];
int &a = array[0];
int &b = array[1];
int &c = array[2];
};
This solution changes the size of the structure, so you can use member functions instead:
struct data
{
int array[3];
int &a() { return array[0]; }
int &b() { return array[1]; }
int &c() { return array[2]; }
};
For c++: If you need to access a member without knowing its name, you can use a pointer to member variable.
struct data {
int a, b, c;
};
typedef int data::* data_int_ptr;
data_int_ptr arr[] = {&data::a, &data::b, &data::c};
data thing;
thing.*arr[0] = 123;
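To get index-based access out of the pointer-to-member table, it can be wrapped in a small function. This is a sketch only; member_at is a name made up for this example, and the bounds check uses assert for brevity.

```cpp
#include <cassert>
#include <cstddef>

struct data { int a, b, c; };

// Hypothetical helper: select a member by index through a pointer-to-member table.
int& member_at(data& d, std::size_t idx) {
    static int data::* const members[] = { &data::a, &data::b, &data::c };
    assert(idx < sizeof(members) / sizeof(members[0]));
    return d.*members[idx];
}
```

No assumption about padding or member order is needed here, because each table entry names its member directly.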
In ISO C99/C11, union-based type-punning is legal, so you can use that instead of indexing pointers to non-arrays (see various other answers).
ISO C++ doesn't allow union-based type-punning. GNU C++ does, as an extension, and I think some other compilers that don't support GNU extensions in general do support union type-punning. But that doesn't help you write strictly portable code.
With current versions of gcc and clang, writing a C++ member function using a switch(idx) to select a member will optimize away for compile-time constant indices, but will produce terrible branchy asm for runtime indices. There's nothing inherently wrong with switch() for this; this is simply a missed-optimization bug in current compilers. They could compile Slava's switch() function efficiently.
The solution/workaround to this is to do it the other way: give your class/struct an array member, and write accessor functions to attach names to specific elements.
struct array_data
{
int arr[3];
int &operator[]( unsigned idx ) {
// assert(idx <= 2);
//idx = (idx > 2) ? 2 : idx;
return arr[idx];
}
int &a(){ return arr[0]; } // TODO: const versions
int &b(){ return arr[1]; }
int &c(){ return arr[2]; }
};
We can have a look at the asm output for different use-cases, on the Godbolt compiler explorer. These are complete x86-64 System V functions, with the trailing RET instruction omitted to better show what you'd get when they inline. ARM/MIPS/whatever would be similar.
# asm from g++6.2 -O3
int getb(array_data &d) { return d.b(); }
mov eax, DWORD PTR [rdi+4]
void setc(array_data &d, int val) { d.c() = val; }
mov DWORD PTR [rdi+8], esi
int getidx(array_data &d, int idx) { return d[idx]; }
mov esi, esi # zero-extend to 64-bit
mov eax, DWORD PTR [rdi+rsi*4]
By comparison, @Slava's answer using a switch() for C++ makes asm like this for a runtime-variable index. (Code in the previous Godbolt link).
int cpp(data *d, int idx) {
return (*d)[idx];
}
# gcc6.2 -O3, using `default: __builtin_unreachable()` to promise the compiler that idx=0..2,
# avoiding an extra cmov for idx=min(idx,2), or an extra branch to a throw, or whatever
cmp esi, 1
je .L6
cmp esi, 2
je .L7
mov eax, DWORD PTR [rdi]
ret
.L6:
mov eax, DWORD PTR [rdi+4]
ret
.L7:
mov eax, DWORD PTR [rdi+8]
ret
This is obviously terrible, compared to the C (or GNU C++) union-based type punning version:
c(type_t*, int):
movsx rsi, esi # sign-extend this time, since I didn't change idx to unsigned here
mov eax, DWORD PTR [rdi+rsi*4]
In C++, this is mostly undefined behavior (it depends on which index).
From [expr.unary.op]:
For purposes of pointer
arithmetic (5.7) and comparison (5.9, 5.10), an object that is not an array element whose address is taken in
this way is considered to belong to an array with one element of type T.
The expression &thing.a is thus considered to refer to an array of one int.
From [expr.sub]:
The expression E1[E2] is identical (by definition) to *((E1)+(E2))
And from [expr.add]:
When an expression that has integral type is added to or subtracted from a pointer, the result has the type of the pointer operand. If the expression P points to element x[i] of an array object x with n elements, the expressions P + J and J + P (where J has the value j) point to the (possibly-hypothetical) element x[i + j] if 0 <= i + j <= n; otherwise, the behavior is undefined.
(&thing.a)[0] is perfectly well-formed because &thing.a is considered an array of size 1 and we're taking that first index. That is an allowed index to take.
(&thing.a)[2] violates the precondition that 0 <= i + j <= n, since we have i == 0, j == 2, n == 1. Simply constructing the pointer &thing.a + 2 is undefined behavior.
(&thing.a)[1] is the interesting case. It doesn't actually violate anything in [expr.add]. We're allowed to take a pointer one past the end of the array - which this would be. Here, we turn to a note in [basic.compound]:
A value of a pointer type that is a pointer to or past the end of an object represents the address of the
first byte in memory (1.7) occupied by the object or the first byte in memory after the end of the storage
occupied by the object, respectively. [ Note: A pointer past the end of an object (5.7) is not considered to
point to an unrelated object of the object’s type that might be located at that address.
Hence, taking the pointer &thing.a + 1 is defined behavior, but dereferencing it is undefined because it does not point to anything.
This is undefined behavior.
There are lots of rules in C++ that attempt to give the compiler some hope of understanding what you are doing, so it can reason about it and optimize it.
There are rules about aliasing (accessing data through two different pointer types), array bounds, etc.
When you have a variable x, the fact that it isn't a member of an array means that the compiler can assume that no [] based array access can modify it. So it doesn't have to reload the data from memory every time you use it; it only has to do so if something could have modified it through its name.
Thus (&thing.a)[1] can be assumed by the compiler to not refer to thing.b. It can use this fact to reorder reads and writes to thing.b, invalidating what you want it to do without invalidating what you actually told it to do.
A classic example of this is casting away const.
const int x = 7;
std::cout << x << '\n';
auto ptr = (int*)&x;
*ptr = 2;
std::cout << *ptr << "!=" << x << '\n';
std::cout << ptr << "==" << &x << '\n';
Here you typically get a program that prints 7, then 2!=7, and then two identical pointers, despite the fact that ptr points at x. The compiler uses the fact that x is a constant to avoid reading it when you ask for the value of x.
But when you take the address of x, you force it to exist. You then cast away const and modify it. So the actual location in memory where x lives has been modified, yet the compiler is free not to read it when reading x!
The compiler may get smart enough to figure out how to even avoid following ptr to read *ptr, but often they are not. Feel free to go and use ptr = ptr+argc-1 or somesuch confusion if the optimizer is getting smarter than you.
You can provide a custom operator[] that gets the right item.
int& operator[](std::size_t);
int const& operator[](std::size_t) const;
having both is useful.
Here's a way to use a proxy class to access elements in a member array by name. It is very C++, and has no benefit vs. ref-returning accessor functions, except for syntactic preference. This overloads the -> operator to access elements as members, so to be acceptable, one needs to both dislike the syntax of accessors (d.a() = 5;), as well as tolerate using -> with a non-pointer object. I expect this might also confuse readers not familiar with the code, so this might be more of a neat trick than something you want to put into production.
The Data struct in this code also includes overloads for the subscript operator, to access indexed elements inside its ar array member, as well as begin and end functions, for iteration. Also, all of these are overloaded with non-const and const versions, which I felt needed to be included for completeness.
When Data's -> is used to access an element by name (like this: my_data->b = 5;), a Proxy object is returned. Then, because this Proxy rvalue is not a pointer, its own -> operator is auto-chain-called, which returns a pointer to itself. This way, the Proxy object is instantiated and remains valid during evaluation of the initial expression.
Construction of a Proxy object populates its 3 reference members a, b and c according to a pointer passed in the constructor, which is assumed to point to a buffer containing at least 3 values whose type is given as the template parameter T. So instead of using named references which are members of the Data class, this saves memory by populating the references at the point of access (but unfortunately, using -> and not the . operator).
In order to test how well the compiler's optimizer eliminates all of the indirection introduced by the use of Proxy, the code below includes 2 versions of main(). The #if 1 version uses the -> and [] operators, and the #if 0 version performs the equivalent set of procedures, but only by directly accessing Data::ar.
The Nci() function generates runtime integer values for initializing array elements, which prevents the optimizer from just plugging constant values directly into each std::cout << call.
For gcc 6.2, using -O3, both versions of main() generate the same assembly (toggle between #if 1 and #if 0 before the first main() to compare): https://godbolt.org/g/QqRWZb
#include <iostream>
#include <ctime>
template <typename T>
class Proxy {
public:
T &a, &b, &c;
Proxy(T* par) : a(par[0]), b(par[1]), c(par[2]) {}
Proxy* operator -> () { return this; }
};
struct Data {
int ar[3];
template <typename I> int& operator [] (I idx) { return ar[idx]; }
template <typename I> const int& operator [] (I idx) const { return ar[idx]; }
Proxy<int> operator -> () { return Proxy<int>(ar); }
Proxy<const int> operator -> () const { return Proxy<const int>(ar); }
int* begin() { return ar; }
const int* begin() const { return ar; }
int* end() { return ar + sizeof(ar)/sizeof(int); }
const int* end() const { return ar + sizeof(ar)/sizeof(int); }
};
// Nci returns an unpredictable int
inline int Nci() {
static auto t = std::time(nullptr) / 100 * 100;
return static_cast<int>(t++ % 1000);
}
#if 1
int main() {
Data d = {Nci(), Nci(), Nci()};
for(auto v : d) { std::cout << v << ' '; }
std::cout << "\n";
std::cout << d->b << "\n";
d->b = -5;
std::cout << d[1] << "\n";
std::cout << "\n";
const Data cd = {Nci(), Nci(), Nci()};
for(auto v : cd) { std::cout << v << ' '; }
std::cout << "\n";
std::cout << cd->c << "\n";
//cd->c = -5; // error: assignment of read-only location
std::cout << cd[2] << "\n";
}
#else
int main() {
Data d = {Nci(), Nci(), Nci()};
for(auto v : d.ar) { std::cout << v << ' '; }
std::cout << "\n";
std::cout << d.ar[1] << "\n";
d.ar[1] = -5; // direct array access, so this version avoids Proxy entirely
std::cout << d.ar[1] << "\n";
std::cout << "\n";
const Data cd = {Nci(), Nci(), Nci()};
for(auto v : cd.ar) { std::cout << v << ' '; }
std::cout << "\n";
std::cout << cd.ar[2] << "\n";
//cd.ar[2] = -5;
std::cout << cd.ar[2] << "\n";
}
#endif
If reading values is enough, and efficiency is not a concern, or if you trust your compiler to optimize things well, or if the struct is just those 3 bytes, you can safely do this:
char index_data(const struct data *d, size_t index) {
assert(sizeof(*d) == offsetof(struct data, c)+1);
assert(index < sizeof(*d));
char buf[sizeof(*d)];
memcpy(buf, d, sizeof(*d));
return buf[index];
}
For a C++-only version, you would probably want to use static_assert to verify that struct data has standard layout, and perhaps throw an exception on an invalid index instead.
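A C++ variant along those lines might look like the sketch below. It assumes, as the size check in the C version implies, that struct data consists of three char members; that definition and the function index_data_example are illustrative assumptions, not taken from the question.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <stdexcept>
#include <type_traits>

struct data { char a, b, c; };  // assumption: three one-byte members

static_assert(std::is_standard_layout<data>::value, "data must have standard layout");
static_assert(sizeof(data) == offsetof(data, c) + 1, "data must not have trailing padding");

// Return the byte at the given index of d's object representation.
char index_data(const data& d, std::size_t index) {
    if (index >= sizeof(data))
        throw std::out_of_range("index_data: bad index");
    char buf[sizeof(data)];
    std::memcpy(buf, &d, sizeof(data));  // copy out the bytes, then index the copy
    return buf[index];
}

// Usage sketch.
void index_data_example() {
    const data d = {'x', 'y', 'z'};
    assert(index_data(d, 1) == 'y');
}
```

The layout checks now happen at compile time, and an invalid index is reported with an exception rather than an assert.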
It is illegal, but there is a workaround:
struct data {
union {
struct {
int a;
int b;
int c;
};
int v[3];
};
};
Now you can index v:
We're playing some code golf at work. The goal is to keep the signature of to_upper and return all arguments converted to upper case. One of my colleagues proposed this ~~ugly~~ brilliant code:
#include <iostream>
#include <memory>
#include <stdexcept>
#include <string>
std::string operator+(std::string_view& a, int const& b) {
std::string res;
for (auto c : a) {
res += (c - b);
}
return (res);
}
struct Toto {
std::string data;
};
struct Result {
std::string a;
std::string b;
};
std::unique_ptr<Toto> to_upper(std::string_view input_a,
std::string_view input_b) {
auto* res = new Result;
res->a = (input_a + 32);
res->b = (input_b + 32);
auto* void_res = reinterpret_cast<void*>(res);
auto* toto_res = reinterpret_cast<Toto*>(void_res);
return std::unique_ptr<Toto>(toto_res);
}
int main() {
std::unique_ptr<Toto> unique_toto_res = to_upper("pizza", "ananas");
auto* toto_res = unique_toto_res.release();
auto* res = reinterpret_cast<Result*>(toto_res);
std::cout << res->a << std::endl;
std::cout << res->b << std::endl;
return 0;
}
Is this use of reinterpret_cast fine in terms of portability and UB?
We think that it's ok because we just trick the compiler on types, but maybe there's something we missed.
std::string operator+(std::string_view& a, int const& b)
It might not be exactly disallowed, but defining an operator overload for a standard class in the global namespace is just asking for ODR violations. If you use any libraries and if everyone else thinks this will just be fine, then someone else may also define that overload. So, this is a bad idea.
auto* void_res = reinterpret_cast<void*>(res);
This is entirely unnecessary. You get exactly the same result by reinterpret casting directly to Toto*.
Valid (and portable)
Assuming that lower and upper case are 32 apart isn't an assumption that is portable to all character encodings. The function also doesn't work as one might expect for characters outside the range a...z.
Now about the main question. Using reinterpret_cast to convert a pointer (or reference) to another type never has UB in itself. It's all about how you use the resulting pointer (or reference).
The example is a bit precarious while the unique pointer owns the reinterpreted pointer because if an exception is thrown, then it would attempt to delete it which would result in UB. But I don't think an exception can be thrown, so it should be OK. Otherwise, you just reinterpret cast back, which is explicitly well defined by the standard in the case the alignment requirement of the intermediate type isn't stricter than the original (which applies to this example).
The program does leak memory.
The only problem here is you have a memory leak. You never delete the pointer after you call release.
You are allowed to use reinterpret_cast to cast a pointer to an object to a pointer to an unrelated type. You are just not allowed to access the object through that unrelated type. Going from Result* to Toto* and then back to Result* is okay, and you only access the Result object through a Result*.
When doing T* to U* and then back to T*, both T and U need to be object types and U cannot have a stricter alignment than T. In this case both Result and Toto have the same alignment, so you are okay. This is detailed in [expr.reinterpret.cast]/7.
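A minimal sketch of that round-trip rule, assuming two unrelated types shaped like the Result and Toto from the question. The helper name round_trip is hypothetical. The disguised pointer is never dereferenced, and ownership returns to a unique_ptr<Result>, so the object is deleted through its real type and nothing leaks:

```cpp
#include <cassert>
#include <memory>
#include <string>

struct Toto   { std::string data; };
struct Result { std::string a; std::string b; };

// The round trip is only guaranteed when the intermediate type's
// alignment requirement is no stricter than the original's.
static_assert(alignof(Toto) <= alignof(Result), "round trip not guaranteed");

// Hypothetical helper: pass a Result through a Toto* and take it back.
inline std::unique_ptr<Result> round_trip(std::unique_ptr<Result> in) {
    Result* original  = in.release();
    Toto*   disguised = reinterpret_cast<Toto*>(original);    // never dereferenced
    Result* restored  = reinterpret_cast<Result*>(disguised); // back to the real type
    assert(restored == original);                             // same pointer value
    return std::unique_ptr<Result>(restored);                 // deleted as a Result
}
```

Unlike the code in the question, no unique_ptr<Toto> ever owns the object here, so an exception cannot trigger a delete through the wrong type.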
I've taken over some code, and came across a weird reallocation of an array. This is a function from within an Array class (used by the JsonValue)
void reserve( uint32_t newCapacity ) {
if ( newCapacity > length + additionalCapacity ) {
newCapacity = std::min( newCapacity, length + std::numeric_limits<decltype( additionalCapacity )>::max() );
JsonValue *newPtr = new JsonValue[newCapacity];
if ( length > 0 ) {
memcpy( newPtr, values, length * sizeof( JsonValue ) );
memset( values, 0, length * sizeof( JsonValue ) );
}
delete[] values;
values = newPtr;
additionalCapacity = uint16_t( newCapacity - length );
}
}
I get the point of this; it is just allocating a new array, and doing a copy of the memory contents from the old array into the new array, then zero-ing out the old array's contents. I also know this was done in order to prevent calling destructors, and moves.
The JsonValue is a class with functions, and some data which is stored in a union (string, array, number, etc.).
My concern is whether this is actually defined behaviour or not. I know it works, and has not had a problem since we began using it a few months ago; but if it's undefined, then that doesn't mean it is going to keep working.
EDIT:
JsonValue looks something like this:
struct JsonValue {
// …
~JsonValue() {
switch ( details.type ) {
case Type::Array:
case Type::Object:
array.destroy();
break;
case Type::String:
delete[] string.buffer;
break;
default: break;
}
}
private:
struct Details {
Key key = Key::Unknown;
Type type = Type::Null; // (0)
};
union {
Array array;
String string;
EmbedString embedString;
Number number;
Details details;
};
};
Where Array is a wrapper around an array of JsonValues, String is a char*, EmbedString is char[14], Number is a union of int, unsigned int, and double, and Details contains the type of value it holds. All values have 16 bits of unused data at the beginning, which is used for Details. Example:
struct EmbedString {
uint16_t : 16;
char buffer[14] = { 0 };
};
Whether this code has well-defined behavior basically depends on two things: 1) is JsonValue trivially copyable, and 2) if so, are a bunch of all-zero bytes a valid object representation for a JsonValue.
If JsonValue is trivially-copyable, then the memcpy from one array of JsonValues to another will indeed be equivalent to copying all the elements over [basic.types]/3. If all-zeroes is a valid object representation for a JsonValue, then the memset should be ok (I believe this actually falls into a bit of a grey-area with the current wording of the standard, but I believe at least the intention would be that this is fine).
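The first condition can be checked at compile time. The following is a sketch with hypothetical stand-in types (PlainValue, OwningValue and relocate are illustrative names, not the real JsonValue code). Note that a class with a user-provided destructor is never trivially copyable, which is worth checking against the JsonValue shown in the question:

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <type_traits>

// Hypothetical stand-ins, not the real JsonValue.
struct PlainValue { unsigned short tag; char buffer[14]; };
struct OwningValue {
    char* buffer = nullptr;
    ~OwningValue() { delete[] buffer; }   // user-provided destructor
};

static_assert(std::is_trivially_copyable<PlainValue>::value,
              "memcpy is a valid copy for PlainValue");
static_assert(!std::is_trivially_copyable<OwningValue>::value,
              "memcpy is not a valid copy for OwningValue");

// Relocate n elements with memcpy; refuses to compile for types
// where a byte-wise copy is not equivalent to copying the objects.
template <typename T>
void relocate(T* dst, const T* src, std::size_t n) {
    static_assert(std::is_trivially_copyable<T>::value,
                  "T must be trivially copyable");
    assert(n == 0 || (dst != nullptr && src != nullptr));
    std::memcpy(dst, src, n * sizeof(T));
}
```

Calling relocate with OwningValue fails at compile time instead of silently relying on undefined behavior.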
I'm not sure why you'd need to "prevent calling destructors and moves", but overwriting objects with zeroes does not prevent destructors from running. delete[] values will call the destructors of the array members. And moving the elements of an array of trivially-copyable type should compile down to just copying over the bytes anyways.
Furthermore, I would suggest to get rid of these String and EmbedString classes and simply use std::string. At least, it would seem to me that the sole purpose of EmbedString is to manually perform small string optimization. Any std::string implementation worth its salt is already going to do exactly that under the hood. Note that std::string is not guaranteed (and will often not be) trivially-copyable. Thus, you cannot simply replace String and EmbedString with std::string while keeping the rest of this current implementation.
If you can use C++17, I would suggest to simply use std::variant instead of or at least inside this custom JsonValue implementation as that seems to be exactly what it's trying to do. If you need some common information stored in front of whatever the variant value may be, just have a suitable member holding that information in front of the member that holds the variant value rather than relying on every member of the union starting with the same couple of members (which would only be well-defined if all union members are standard-layout types that keep this information in their common initial sequence [class.mem]/23).
The sole purpose of Array would seem to be to serve as a vector that zeroes memory before deallocating it for security reasons. If this is the case, I would suggest to just use an std::vector with an allocator that zeros memory before deallocating instead. For example:
template <typename T>
struct ZeroingAllocator
{
using value_type = T;
T* allocate(std::size_t N)
{
return reinterpret_cast<T*>(new unsigned char[N * sizeof(T)]);
}
void deallocate(T* buffer, std::size_t N) noexcept
{
auto ptr = reinterpret_cast<volatile unsigned char*>(buffer);
std::fill(ptr, ptr + N * sizeof(T), 0); // N counts elements, so zero N * sizeof(T) bytes
delete[] reinterpret_cast<unsigned char*>(buffer);
}
};
template <typename A, typename B>
bool operator ==(const ZeroingAllocator<A>&, const ZeroingAllocator<B>&) noexcept { return true; }
template <typename A, typename B>
bool operator !=(const ZeroingAllocator<A>&, const ZeroingAllocator<B>&) noexcept { return false; }
and then
using Array = std::vector<JsonValue, ZeroingAllocator<JsonValue>>;
Note: I fill the memory via volatile unsigned char* to prevent the compiler from optimizing away the zeroing. If you need to support overaligned types, you can replace the new[] and delete[] with direct calls to ::operator new and ::operator delete (doing this will prevent the compiler from optimizing away allocations). Pre C++17, you will have to allocate a sufficiently large buffer and then manually align the pointer, e.g., using std::align…
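A sketch of the ::operator new variant mentioned in the note, assuming C++17 (for std::align_val_t). The names AlignedZeroingAllocator, Wide and WideVector are hypothetical, and overflow checking of n * sizeof(T) is omitted for brevity:

```cpp
#include <cassert>
#include <cstddef>
#include <new>
#include <vector>

// Sketch only: allocates through ::operator new / ::operator delete with an
// explicit alignment, so over-aligned element types are handled as well.
template <typename T>
struct AlignedZeroingAllocator {
    using value_type = T;

    AlignedZeroingAllocator() = default;
    template <typename U>
    AlignedZeroingAllocator(const AlignedZeroingAllocator<U>&) noexcept {}

    T* allocate(std::size_t n) {
        void* raw = ::operator new(n * sizeof(T), std::align_val_t(alignof(T)));
        return static_cast<T*>(raw);
    }

    void deallocate(T* buffer, std::size_t n) noexcept {
        assert(buffer != nullptr);
        // Zero every byte (n elements * sizeof(T)) through a volatile pointer
        // so the stores are not optimized away.
        auto* bytes = reinterpret_cast<volatile unsigned char*>(buffer);
        for (std::size_t i = 0; i < n * sizeof(T); ++i) bytes[i] = 0;
        ::operator delete(buffer, std::align_val_t(alignof(T)));
    }
};

template <typename A, typename B>
bool operator==(const AlignedZeroingAllocator<A>&, const AlignedZeroingAllocator<B>&) noexcept { return true; }
template <typename A, typename B>
bool operator!=(const AlignedZeroingAllocator<A>&, const AlignedZeroingAllocator<B>&) noexcept { return false; }

// Hypothetical over-aligned element type to exercise the allocator.
struct alignas(64) Wide { int value; };
using WideVector = std::vector<Wide, AlignedZeroingAllocator<Wide>>;
```

The converting constructor is there because containers may rebind the allocator to other types internally.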
Regardless of how 'bad' the code is, and assuming that alignment etc are not an issue on the compiler/platform, is this undefined or broken behavior?
If I have a struct like this :-
struct data
{
int a, b, c;
};
struct data thing;
Is it legal to access a, b and c as (&thing.a)[0], (&thing.a)[1], and (&thing.a)[2]?
In every case, on every compiler and platform I tried it on, with every setting I tried it 'worked'. I'm just worried that the compiler might not realize that b and thing[1] are the same thing and stores to 'b' might be put in a register and thing[1] reads the wrong value from memory (for example). In every case I tried it did the right thing though. (I realize of course that doesn't prove much)
This is not my code; it's code I have to work with. I'm interested in whether this is bad code or broken code, as the difference affects my priorities for changing it a great deal :)
Tagged C and C++. I'm mostly interested in C++, but also C if it is different, just for interest.
It is illegal [1]. That's undefined behavior in C++.
You are taking the members in an array fashion, but here is what the C++ standard says (emphasis mine):
[dcl.array/1]: ...An object of array type contains a contiguously allocated non-empty set of N
subobjects of type T...
But, for members, there's no such contiguous requirement:
[class.mem/17]: ...;Implementation alignment requirements might cause two adjacent
members not to be allocated immediately after each other...
While the above two quotes should be enough to hint why indexing into a struct as you did isn't a defined behavior by the C++ standard, let's pick one example: look at the expression (&thing.a)[2] - Regarding the subscript operator:
[expr.post//expr.sub/1]:
A postfix expression followed by an expression in square brackets is a
postfix expression. One of the expressions shall be a glvalue of type
“array of T” or a prvalue of type “pointer to T” and the other shall
be a prvalue of unscoped enumeration or integral type. The result is
of type “T”. The type “T” shall be a completely-defined object type.
The expression E1[E2] is identical (by definition) to *((E1)+(E2))
Digging into the last part of the above quote: regarding adding an integral type to a pointer type (note the array requirement here):
[expr.add/4]: When an expression that has integral type is added to or subtracted from a
pointer, the result has the type of the pointer operand. If the
expression P points to element x[i] of an array object x
with n elements, the expressions P + J and J + P (where J has
the value j) point to the (possibly-hypothetical) element x[i + j]
if 0 ≤ i + j ≤ n; otherwise, the behavior is undefined. ...
Note the array requirement in the if clause, and the otherwise that applies when it is not met. The expression (&thing.a)[2] obviously doesn't satisfy the if clause; hence, undefined behavior.
On a side note: I have experimented extensively with this code and its variations on various compilers, and they don't introduce any padding here (it works). From a maintenance view, however, the code is extremely fragile. You should still assert that the implementation allocated the members contiguously before doing this, and stay in bounds :-). But it's still undefined behavior.
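A minimal sketch of such an assertion, assuming the struct data from the question (kDataIsContiguous is a hypothetical name). It only verifies the layout on the current implementation; it does not make the (&thing.a)[i] indexing defined:

```cpp
#include <cstddef>

struct data { int a, b, c; };

// True only if a, b and c sit back to back with no padding anywhere.
constexpr bool kDataIsContiguous =
    offsetof(data, a) == 0 &&
    offsetof(data, b) == sizeof(int) &&
    offsetof(data, c) == 2 * sizeof(int) &&
    sizeof(data) == 3 * sizeof(int);

static_assert(kDataIsContiguous, "data members are not laid out contiguously");
```

If a compiler or ABI ever inserts padding, the build fails instead of the program misbehaving at run time.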
Some viable workarounds (with defined behavior) have been provided by other answers.
As rightly pointed out in the comments, [basic.lval/8], which was in my previous edit, doesn't apply. Thanks @2501 and @M.M.
[1]: See @Barry's answer to this question for the only legal case where you can access the thing.a member of the struct via this pattern.
No. In C, this is undefined behavior even if there is no padding.
The thing that causes undefined behavior is out-of-bounds access [1]. When you have a scalar (members a, b, c in the struct) and try to use it as an array [2] to access the next hypothetical element, you cause undefined behavior, even if there happens to be another object of the same type at that address.
However you may use the address of the struct object and calculate the offset into a specific member:
struct data thing = { 0 };
char* p = ( char* )&thing + offsetof( struct data , b );
int* b = ( int* )p;
*b = 123;
assert( thing.b == 123 );
This has to be done for each member individually, but can be put into a function that resembles an array access.
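Such a function could look roughly like the following, written here in C++ with a hypothetical helper name member_at. The answer above is about C; in C++ the formal status of this kind of byte-offset arithmetic has been debated, so treat this as a sketch of the idea rather than a guarantee:

```cpp
#include <cassert>
#include <cstddef>

struct data { int a, b, c; };

// Hypothetical helper: returns a pointer to member number idx
// (0 = a, 1 = b, 2 = c), computed from the struct's address and offsetof.
inline int* member_at(data* d, std::size_t idx) {
    static const std::size_t offsets[] = {
        offsetof(data, a), offsetof(data, b), offsetof(data, c)
    };
    assert(idx < sizeof offsets / sizeof offsets[0]);
    char* base = reinterpret_cast<char*>(d);
    return reinterpret_cast<int*>(base + offsets[idx]);
}
```

Because each offset comes from offsetof, the helper keeps working even if the implementation inserts padding between members.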
[1] (Quoted from: ISO/IEC 9899:201x 6.5.6 Additive operators 8)
If the result points one past the last element of the array object, it
shall not be used as the operand of a unary * operator that is evaluated.
[2] (Quoted from: ISO/IEC 9899:201x 6.5.6 Additive operators 7)
For the purposes of these operators, a pointer to an object that is not an element of an
array behaves the same as a pointer to the first element of an array of length one with the
type of the object as its element type.
In C++ if you really need it - create operator[]:
#include <cstddef>   // std::size_t
#include <stdexcept> // std::runtime_error
struct data
{
int a, b, c;
int &operator[]( std::size_t idx ) {
switch( idx ) {
case 0 : return a;
case 1 : return b;
case 2 : return c;
default: throw std::runtime_error( "bad index" );
}
}
};
data d;
d[0] = 123; // assign 123 to data.a
It is not only guaranteed to work, but usage is also simpler: you do not need to write the unreadable expression (&thing.a)[0].
Note: this answer assumes that you already have a structure with fields and need to add access via index. If speed is an issue and you can change the structure, this could be more efficient:
struct data
{
int array[3];
int &a = array[0];
int &b = array[1];
int &c = array[2];
};
This solution changes the size of the structure, because each reference member takes up storage, so you can use member functions instead:
struct data
{
int array[3];
int &a() { return array[0]; }
int &b() { return array[1]; }
int &c() { return array[2]; }
};
For C++: If you need to access a member without knowing its name, you can use a pointer to member variable.
struct data {
int a, b, c;
};
typedef int data::* data_int_ptr;
data_int_ptr arr[] = {&data::a, &data::b, &data::c};
data thing;
thing.*arr[0] = 123;
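Building on the pointer-to-member table, an index-based accessor might be sketched as follows (the function name at is hypothetical). Every access goes through a real member, so no out-of-bounds pointer arithmetic is involved:

```cpp
#include <cstddef>
#include <stdexcept>

struct data { int a, b, c; };

// Hypothetical accessor: maps an index to a member via pointers to members.
inline int& at(data& d, std::size_t idx) {
    static int data::* const members[] = {&data::a, &data::b, &data::c};
    if (idx >= sizeof members / sizeof members[0]) {
        throw std::out_of_range("at: bad index");
    }
    return d.*members[idx];
}
```

The table also works for members that are not adjacent or that are declared in a different order.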
In ISO C99/C11, union-based type-punning is legal, so you can use that instead of indexing pointers to non-arrays (see various other answers).
ISO C++ doesn't allow union-based type-punning. GNU C++ does, as an extension, and I think some other compilers that don't support GNU extensions in general do support union type-punning. But that doesn't help you write strictly portable code.
With current versions of gcc and clang, a C++ member function that uses a switch(idx) to select a member will optimize away for compile-time constant indices, but will produce terrible branchy asm for runtime indices. There's nothing inherently wrong with switch() for this; this is simply a missed-optimization bug in current compilers. They could compile Slava's switch() function efficiently.
The solution/workaround to this is to do it the other way: give your class/struct an array member, and write accessor functions to attach names to specific elements.
struct array_data
{
int arr[3];
int &operator[]( unsigned idx ) {
// assert(idx <= 2);
//idx = (idx > 2) ? 2 : idx;
return arr[idx];
}
int &a(){ return arr[0]; } // TODO: const versions
int &b(){ return arr[1]; }
int &c(){ return arr[2]; }
};
We can have a look at the asm output for different use-cases, on the Godbolt compiler explorer. These are complete x86-64 System V functions, with the trailing RET instruction omitted to better show what you'd get when they inline. ARM/MIPS/whatever would be similar.
# asm from g++6.2 -O3
int getb(array_data &d) { return d.b(); }
mov eax, DWORD PTR [rdi+4]
void setc(array_data &d, int val) { d.c() = val; }
mov DWORD PTR [rdi+8], esi
int getidx(array_data &d, int idx) { return d[idx]; }
mov esi, esi # zero-extend to 64-bit
mov eax, DWORD PTR [rdi+rsi*4]
By comparison, @Slava's answer using a switch() for C++ makes asm like this for a runtime-variable index. (Code in the previous Godbolt link).
int cpp(data *d, int idx) {
return (*d)[idx];
}
# gcc6.2 -O3, using `default: __builtin_unreachable()` to promise the compiler that idx=0..2,
# avoiding an extra cmov for idx=min(idx,2), or an extra branch to a throw, or whatever
cmp esi, 1
je .L6
cmp esi, 2
je .L7
mov eax, DWORD PTR [rdi]
ret
.L6:
mov eax, DWORD PTR [rdi+4]
ret
.L7:
mov eax, DWORD PTR [rdi+8]
ret
This is obviously terrible, compared to the C (or GNU C++) union-based type punning version:
c(type_t*, int):
movsx rsi, esi # sign-extend this time, since I didn't change idx to unsigned here
mov eax, DWORD PTR [rdi+rsi*4]
In C++, this is mostly undefined behavior (it depends on which index).
From [expr.unary.op]:
For purposes of pointer
arithmetic (5.7) and comparison (5.9, 5.10), an object that is not an array element whose address is taken in
this way is considered to belong to an array with one element of type T.
The expression &thing.a is thus considered to refer to an array of one int.
From [expr.sub]:
The expression E1[E2] is identical (by definition) to *((E1)+(E2))
And from [expr.add]:
When an expression that has integral type is added to or subtracted from a pointer, the result has the type of the pointer operand. If the expression P points to element x[i] of an array object x with n elements, the expressions P + J and J + P (where J has the value j) point to the (possibly-hypothetical) element x[i + j] if 0 <= i + j <= n; otherwise, the behavior is undefined.
(&thing.a)[0] is perfectly well-defined because &thing.a is considered to point to the first element of an array of size 1, and we're taking that first index. That is an allowed index to take.
(&thing.a)[2] violates the precondition that 0 <= i + j <= n, since we have i == 0, j == 2, n == 1. Simply constructing the pointer &thing.a + 2 is undefined behavior.
(&thing.a)[1] is the interesting case. It doesn't actually violate anything in [expr.add]. We're allowed to take a pointer one past the end of the array - which this would be. Here, we turn to a note in [basic.compound]:
A value of a pointer type that is a pointer to or past the end of an object represents the address of the
first byte in memory (1.7) occupied by the object or the first byte in memory after the end of the storage
occupied by the object, respectively. [ Note: A pointer past the end of an object (5.7) is not considered to
point to an unrelated object of the object’s type that might be located at that address.
Hence, taking the pointer &thing.a + 1 is defined behavior, but dereferencing it is undefined because it does not point to anything.
This is undefined behavior.
There are lots of rules in C++ that attempt to give the compiler some hope of understanding what you are doing, so it can reason about it and optimize it.
There are rules about aliasing (accessing data through two different pointer types), array bounds, etc.
When you have a variable x, the fact that it isn't a member of an array means that the compiler can assume that no [] based array access can modify it. So it doesn't have to constantly reload the data from memory every time you use it; it only has to do so if someone could have modified it through its name.
Thus (&thing.a)[1] can be assumed by the compiler to not refer to thing.b. It can use this fact to reorder reads and writes to thing.b, invalidating what you want it to do without invalidating what you actually told it to do.
A classic example of this is casting away const.
const int x = 7;
std::cout << x << '\n';
auto ptr = (int*)&x;
*ptr = 2;
std::cout << *ptr << "!=" << x << '\n';
std::cout << ptr << "==" << &x << '\n';
Here you typically get a program that prints 7, then 2!=7, and then two identical pointers, despite the fact that ptr is pointing at x. The compiler uses the fact that x is a constant value and does not bother reading it when you ask for the value of x.
But when you take the address of x, you force it to exist. You then cast away const and modify it. So even though the actual location in memory where x lives has been modified, the compiler is free to not actually read it when reading x!
The compiler may get smart enough to figure out how to even avoid following ptr to read *ptr, but often they are not. Feel free to go and use ptr = ptr+argc-1 or somesuch confusion if the optimizer is getting smarter than you.
You can provide a custom operator[] that gets the right item.
int& operator[](std::size_t);
int const& operator[](std::size_t) const;
Having both is useful.
Here's a way to use a proxy class to access elements in a member array by name. It is very C++, and has no benefit vs. ref-returning accessor functions, except for syntactic preference. This overloads the -> operator to access elements as members, so to be acceptable, one needs to both dislike the syntax of accessors (d.a() = 5;), as well as tolerate using -> with a non-pointer object. I expect this might also confuse readers not familiar with the code, so this might be more of a neat trick than something you want to put into production.
The Data struct in this code also includes overloads for the subscript operator, to access indexed elements inside its ar array member, as well as begin and end functions, for iteration. Also, all of these are overloaded with non-const and const versions, which I felt needed to be included for completeness.
When Data's -> is used to access an element by name (like this: my_data->b = 5;), a Proxy object is returned. Then, because this Proxy rvalue is not a pointer, its own -> operator is auto-chain-called, which returns a pointer to itself. This way, the Proxy object is instantiated and remains valid during evaluation of the initial expression.
Construction of a Proxy object populates its 3 reference members a, b and c according to a pointer passed in the constructor, which is assumed to point to a buffer containing at least 3 values whose type is given as the template parameter T. So instead of using named references which are members of the Data class, this saves memory by populating the references at the point of access (but unfortunately, using -> and not the . operator).
In order to test how well the compiler's optimizer eliminates all of the indirection introduced by the use of Proxy, the code below includes 2 versions of main(). The #if 1 version uses the -> and [] operators, and the #if 0 version performs the equivalent set of procedures, but only by directly accessing Data::ar.
The Nci() function generates runtime integer values for initializing array elements, which prevents the optimizer from just plugging constant values directly into each std::cout << call.
For gcc 6.2, using -O3, both versions of main() generate the same assembly (toggle between #if 1 and #if 0 before the first main() to compare): https://godbolt.org/g/QqRWZb
#include <iostream>
#include <ctime>
template <typename T>
class Proxy {
public:
T &a, &b, &c;
Proxy(T* par) : a(par[0]), b(par[1]), c(par[2]) {}
Proxy* operator -> () { return this; }
};
struct Data {
int ar[3];
template <typename I> int& operator [] (I idx) { return ar[idx]; }
template <typename I> const int& operator [] (I idx) const { return ar[idx]; }
Proxy<int> operator -> () { return Proxy<int>(ar); }
Proxy<const int> operator -> () const { return Proxy<const int>(ar); }
int* begin() { return ar; }
const int* begin() const { return ar; }
int* end() { return ar + sizeof(ar)/sizeof(int); }
const int* end() const { return ar + sizeof(ar)/sizeof(int); }
};
// Nci returns an unpredictable int
inline int Nci() {
static auto t = std::time(nullptr) / 100 * 100;
return static_cast<int>(t++ % 1000);
}
#if 1
int main() {
Data d = {Nci(), Nci(), Nci()};
for(auto v : d) { std::cout << v << ' '; }
std::cout << "\n";
std::cout << d->b << "\n";
d->b = -5;
std::cout << d[1] << "\n";
std::cout << "\n";
const Data cd = {Nci(), Nci(), Nci()};
for(auto v : cd) { std::cout << v << ' '; }
std::cout << "\n";
std::cout << cd->c << "\n";
//cd->c = -5; // error: assignment of read-only location
std::cout << cd[2] << "\n";
}
#else
int main() {
Data d = {Nci(), Nci(), Nci()};
for(auto v : d.ar) { std::cout << v << ' '; }
std::cout << "\n";
std::cout << d.ar[1] << "\n";
d.ar[1] = -5;
std::cout << d.ar[1] << "\n";
std::cout << "\n";
const Data cd = {Nci(), Nci(), Nci()};
for(auto v : cd.ar) { std::cout << v << ' '; }
std::cout << "\n";
std::cout << cd.ar[2] << "\n";
//cd.ar[2] = -5;
std::cout << cd.ar[2] << "\n";
}
#endif
If reading values is enough, and efficiency is not a concern, or if you trust your compiler to optimize things well, or if struct is just that 3 bytes, you can safely do this:
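#include <assert.h>
#include <stddef.h>
#include <string.h>
char index_data(const struct data *d, size_t index) {
assert(sizeof(*d) == offsetof(struct data, c) + sizeof(d->c)); /* no trailing padding */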
assert(index < sizeof(*d));
char buf[sizeof(*d)];
memcpy(buf, d, sizeof(*d));
return buf[index];
}
For a C++-only version, you would probably want to use static_assert to verify that struct data has standard layout, and perhaps throw an exception on an invalid index instead.
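One possible shape for such a C++ version is sketched below. It is adapted to return whole int members by index rather than single bytes, which is an assumption about what the caller wants; the padding check and the exception type are also illustrative choices:

```cpp
#include <cstddef>
#include <cstring>
#include <stdexcept>
#include <type_traits>

struct data { int a, b, c; };

static_assert(std::is_standard_layout<data>::value, "data must have standard layout");
static_assert(sizeof(data) == 3 * sizeof(int), "data must not contain padding");

// Hypothetical C++ counterpart of index_data: returns member number
// `index` (0 = a, 1 = b, 2 = c) by value, reading through a memcpy'd copy.
inline int index_data(const data& d, std::size_t index) {
    if (index >= 3) throw std::out_of_range("index_data: bad index");
    int buf[3];
    std::memcpy(buf, &d, sizeof buf);   // copy the object representation
    return buf[index];                  // ordinary in-bounds array access
}
```

The indexing happens on a genuine array, so the pointer arithmetic rules discussed in the other answers are respected.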
It is illegal, but there is a workaround:
struct data {
union {
struct {
int a;
int b;
int c;
};
int v[3];
};
};
Now you can index v:
In C++, this is mostly undefined behavior (it depends on which index).
From [expr.unary.op]:
For purposes of pointer
arithmetic (5.7) and comparison (5.9, 5.10), an object that is not an array element whose address is taken in
this way is considered to belong to an array with one element of type T.
The expression &thing.a is thus considered to refer to an array of one int.
From [expr.sub]:
The expression E1[E2] is identical (by definition) to *((E1)+(E2))
And from [expr.add]:
When an expression that has integral type is added to or subtracted from a pointer, the result has the type of the pointer operand. If the expression P points to element x[i] of an array object x with n elements, the expressions P + J and J + P (where J has the value j) point to the (possibly-hypothetical) element x[i + j] if 0 <= i + j <= n; otherwise, the behavior is undefined.
(&thing.a)[0] is perfectly well-defined because &thing.a is considered to point to the first element of an array of size 1, and we're taking that first index. That is an allowed index to take.
(&thing.a)[2] violates the precondition that 0 <= i + j <= n, since we have i == 0, j == 2, n == 1. Simply constructing the pointer &thing.a + 2 is undefined behavior.
(&thing.a)[1] is the interesting case. It doesn't actually violate anything in [expr.add]. We're allowed to take a pointer one past the end of the array - which this would be. Here, we turn to a note in [basic.compound]:
A value of a pointer type that is a pointer to or past the end of an object represents the address of the
first byte in memory (1.7) occupied by the object or the first byte in memory after the end of the storage
occupied by the object, respectively. [ Note: A pointer past the end of an object (5.7) is not considered to
point to an unrelated object of the object’s type that might be located at that address.
Hence, taking the pointer &thing.a + 1 is defined behavior, but dereferencing it is undefined because it does not point to anything.
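The three cases can be summarised in code. This is a sketch in which thing_t and read_first are hypothetical names mirroring thing in the text; the undefined cases are left as comments so the snippet itself stays well-defined.

```cpp
#include <cassert>

struct thing_t { int a, b, c; };  // hypothetical; mirrors `thing` in the text

inline int read_first(thing_t& thing) {
    int* p = &thing.a;      // treated as pointing into an array of one int
    int* past = p + 1;      // OK: a one-past-the-end pointer may be formed
    assert(past - p == 1);  // OK: arithmetic stays within the one-element array
    // int bad1 = *past;    // undefined: a past-the-end pointer must not be dereferenced
    // int* bad2 = p + 2;   // undefined: even forming this pointer is UB
    return p[0];            // OK: same as *(p + 0), i.e. thing.a
}
```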
This is undefined behavior.
There are lots of rules in C++ that attempt to give the compiler some hope of understanding what you are doing, so it can reason about it and optimize it.
There are rules about aliasing (accessing data through two different pointer types), array bounds, etc.
When you have a variable x, the fact that it isn't a member of an array means that the compiler can assume that no [] based array access can modify it. So it doesn't have to constantly reload the data from memory every time you use it; only if someone could have modified it from its name.
Thus (&thing.a)[1] can be assumed by the compiler to not refer to thing.b. It can use this fact to reorder reads and writes to thing.b, invalidating what you want it to do without invalidating what you actually told it to do.
A classic example of this is casting away const.
const int x = 7;
std::cout << x << '\n';
auto ptr = (int*)&x;
*ptr = 2;
std::cout << *ptr << "!=" << x << '\n';
std::cout << ptr << "==" << &x << '\n';
Here you typically get output of 7, then 2!=7, and then two identical pointers, despite the fact that ptr is pointing at x. Because x is a constant, the compiler doesn't bother reading it from memory when you ask for the value of x.
But when you take the address of x, you force it to exist. You then cast away const and modify it. So even though the actual location in memory where x lives has been modified, the compiler is free to not actually read it when reading x!
The compiler may get smart enough to figure out how to even avoid following ptr to read *ptr, but often they are not. Feel free to go and use ptr = ptr+argc-1 or somesuch confusion if the optimizer is getting smarter than you.
You can provide a custom operator[] that gets the right item.
int& operator[](std::size_t);
int const& operator[](std::size_t) const;
Having both is useful.
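One possible way to implement such an operator[] without pointer arithmetic across members is a table of pointers-to-member. This is a sketch under assumptions: indexed_thing, member_ptr and table() are names I made up, and the bounds check is a plain assert.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical struct; the named members stay ordinary data members.
struct indexed_thing {
    int a, b, c;

    using member_ptr = int indexed_thing::*;

    // Table mapping index -> pointer-to-member; no arithmetic between members.
    static const member_ptr* table() {
        static const member_ptr members[3] = {
            &indexed_thing::a, &indexed_thing::b, &indexed_thing::c
        };
        return members;
    }

    int& operator[](std::size_t idx) {
        assert(idx < 3);
        return this->*table()[idx];
    }
    const int& operator[](std::size_t idx) const {
        assert(idx < 3);
        return this->*table()[idx];
    }
};
```

Used as t[1] = 20; on a non-const object, this writes t.b; on a const object only the read-only overload is selected.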
Here's a way to use a proxy class to access elements in a member array by name. It is very C++, and has no benefit vs. ref-returning accessor functions, except for syntactic preference. This overloads the -> operator to access elements as members, so to find it acceptable, one needs to both dislike the syntax of accessors (d.a() = 5;) and tolerate using -> with a non-pointer object. I expect this might also confuse readers not familiar with the code, so this might be more of a neat trick than something you want to put into production.
The Data struct in this code also includes overloads for the subscript operator, to access indexed elements inside its ar array member, as well as begin and end functions, for iteration. Also, all of these are overloaded with non-const and const versions, which I felt needed to be included for completeness.
When Data's -> is used to access an element by name (like this: my_data->b = 5;), a Proxy object is returned. Then, because this Proxy rvalue is not a pointer, its own -> operator is auto-chain-called, which returns a pointer to itself. This way, the Proxy object is instantiated and remains valid during evaluation of the initial expression.
Construction of a Proxy object populates its 3 reference members a, b and c according to a pointer passed in the constructor, which is assumed to point to a buffer containing at least 3 values whose type is given as the template parameter T. So instead of using named references which are members of the Data class, this saves memory by populating the references at the point of access (but unfortunately, using -> and not the . operator).
In order to test how well the compiler's optimizer eliminates all of the indirection introduced by the use of Proxy, the code below includes 2 versions of main(). The #if 1 version uses the -> and [] operators, and the #if 0 version performs the equivalent set of procedures, but only by directly accessing Data::ar.
The Nci() function generates runtime integer values for initializing array elements, which prevents the optimizer from just plugging constant values directly into each std::cout << call.
For gcc 6.2, using -O3, both versions of main() generate the same assembly (toggle between #if 1 and #if 0 before the first main() to compare): https://godbolt.org/g/QqRWZb
#include <iostream>
#include <ctime>
template <typename T>
class Proxy {
public:
T &a, &b, &c;
Proxy(T* par) : a(par[0]), b(par[1]), c(par[2]) {}
Proxy* operator -> () { return this; }
};
struct Data {
int ar[3];
template <typename I> int& operator [] (I idx) { return ar[idx]; }
template <typename I> const int& operator [] (I idx) const { return ar[idx]; }
Proxy<int> operator -> () { return Proxy<int>(ar); }
Proxy<const int> operator -> () const { return Proxy<const int>(ar); }
int* begin() { return ar; }
const int* begin() const { return ar; }
int* end() { return ar + sizeof(ar)/sizeof(int); }
const int* end() const { return ar + sizeof(ar)/sizeof(int); }
};
// Nci returns an unpredictable int
inline int Nci() {
static auto t = std::time(nullptr) / 100 * 100;
return static_cast<int>(t++ % 1000);
}
#if 1
int main() {
Data d = {Nci(), Nci(), Nci()};
for(auto v : d) { std::cout << v << ' '; }
std::cout << "\n";
std::cout << d->b << "\n";
d->b = -5;
std::cout << d[1] << "\n";
std::cout << "\n";
const Data cd = {Nci(), Nci(), Nci()};
for(auto v : cd) { std::cout << v << ' '; }
std::cout << "\n";
std::cout << cd->c << "\n";
//cd->c = -5; // error: assignment of read-only location
std::cout << cd[2] << "\n";
}
#else
int main() {
Data d = {Nci(), Nci(), Nci()};
for(auto v : d.ar) { std::cout << v << ' '; }
std::cout << "\n";
std::cout << d.ar[1] << "\n";
d.ar[1] = -5;
std::cout << d.ar[1] << "\n";
std::cout << "\n";
const Data cd = {Nci(), Nci(), Nci()};
for(auto v : cd.ar) { std::cout << v << ' '; }
std::cout << "\n";
std::cout << cd.ar[2] << "\n";
//cd.ar[2] = -5;
std::cout << cd.ar[2] << "\n";
}
#endif
If reading values is enough, and efficiency is not a concern, or if you trust your compiler to optimize things well, or if the struct is just those 3 bytes, you can safely do this:
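#include <assert.h>  /* assert */
#include <stddef.h>  /* offsetof, size_t */
#include <string.h>  /* memcpy */
char index_data(const struct data *d, size_t index) {
assert(sizeof(*d) == offsetof(struct data, c)+1);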
assert(index < sizeof(*d));
char buf[sizeof(*d)];
memcpy(buf, d, sizeof(*d));
return buf[index];
}
For a C++-only version, you would probably want to use static_assert to verify that struct data has standard layout, and perhaps throw an exception on an invalid index instead.
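A C++ version along those lines might look like the sketch below. I'm assuming the struct is three char members; the name char_data and the choice of std::out_of_range are mine.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <stdexcept>
#include <type_traits>

// Assumed layout for illustration: three one-byte members.
struct char_data { char a, b, c; };

static_assert(std::is_standard_layout<char_data>::value,
              "offsetof requires a standard-layout type");
static_assert(sizeof(char_data) == offsetof(char_data, c) + 1,
              "unexpected padding after the last member");

// Read the byte at `index` by copying the object representation into a buffer.
inline char index_data(const char_data& d, std::size_t index) {
    if (index >= sizeof(char_data)) {
        throw std::out_of_range("index_data: index out of range");
    }
    char buf[sizeof(char_data)];
    std::memcpy(buf, &d, sizeof(char_data));
    assert(buf[0] == d.a);  // debug sanity check: first member sits at offset 0
    return buf[index];
}
```

The layout checks now fail at compile time instead of at run time, and an invalid index is reported with an exception rather than an assert.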
It is illegal, but there is a workaround:
struct data {
union {
struct {
int a;
int b;
int c;
};
int v[3];
};
};
Now you can index v: