Struct offsets and pointer safety in C++

Struct offsets and pointer safety in C++ - c++

This question is about pointers derived using pointer arithmetic with struct offsets.
Consider the following program:
#include <cstddef>
#include <iostream>
#include <new>
struct A {
float a;
double b;
int c;
};
static constexpr auto off_c = offsetof(A, c);
int main() {
A * a = new A{0.0f, 0.0, 5};
char * a_storage = reinterpret_cast<char *>(a);
int * c = reinterpret_cast<int *>(a_storage + off_c));
std::cout << *c << std::endl;
delete a;
}
This program appears to work and give expected results on compilers that I tested, using default settings and C++11 standard.
(A closely related program, where we use void * instead of char * and static_cast instead of reinterpret_cast,
is not universally accepted. gcc 5.4 issues a warning about pointer arithmetic with void pointers, and clang 6.0 says that pointer
arithmetic with void * is an error.)
Does this program have well-defined behavior according to the C++ standard?
Does the answer depend on whether the implementation has relaxed or strict pointer safety ([basic.stc.dynamic.safety])?

There are no fundamental errors in your code.
If A isn't plain old data, the above is UB (prior to C++17) and conditionally supported (after C++17).
You might want to replace char* and int* with auto*, but that is a style thing.
Note that pointers to members do this exact same thing in a type-safe manner. Most compilers implement a pointer to member ... as the offset of the member in the type. They do, however, work everywhere even on non-pod structures.
Aside: I don't see a guarantee that offsetof is constexpr in the standard. ;)
In any case, replace:
static constexpr auto off_c = offsetof(A, c);
with
static constexpr auto off_c = &A::c;
and
auto* a_storage = static_cast<char *>(a);
auto* c = reinterpret_cast<int *>(a_storage + off_c));
with
auto* c = &(a->*off_c);
to do it the C++ way.

It is safe in your specific example, but only because your struct is a standard layout, which you can double-check using std::is_standard_layout<>.
Trying to apply this to a struct such as:
struct A {
float a;
double b;
int c;
std::string str;
};
Would be illegal, even if the string is past the part of the struct that's relevant.
Edit
Heres what im concerned abt: in 3.7.4.3 [basic.stc.dynamic.safety] it says pointer is safely derived only if (conditions) and if we have strict pointer safety then a pointer is invalid if it doesnt come from such a place. In 5.7 pointer arithmetic, it says i can do usual arithmetic within an array but i dont see anything there telling me struct offset arithmetic is ok. Im trying to figure out if this isnt relevant in the way i think it is, or if struct offset arithmetic is not ok in the hypothetical "strict" impls, or if i read 5.7 wrong (n4296)
When you are doing the pointer arithmatic, you are performing it on an array of char, the size of which is at least sizeof(A), so that's fine.
Then, when you cast back into the second member, you are covered under (2.4):
— the result of a well-defined pointer conversion (4.10, 5.4) of a safely-derived pointer value;

you should examine your assumptions.
Assumption #1) offsetof gives the correct offset in bytes. This is only guaranteed if the class is considered "standard-layout", which has a number of restrictions, such as no virtual methods, avoids multiple inheritances, etc. In this case, it should be fine, but in general you can't be sure.
Assumption #2) A char is the same size as a byte. In C, this is by definition, so you are safe.
Assumption #3) The offsetof gives the correct offset from the pointer to the class, not from the beginning of the data. This is basically the same as #1, but a vtable could certainly be a problem. Again, only works with standard-layout.

Related

Using memcpy to switch active member of union in C++

I know about the memcpy/memmove to a union member, does this set the 'active' member? question , but I guess my question is different. So:
Suppose sizeof( int ) == sizeof( float ) and I have the following code snippet:
union U{
int i;
float f;
};
U u;
u.i = 1; //i is the active member of u
::std::memcpy( &u.f, &u.i, sizeof( u ) ); //copy memory content of u.i to u.f
My questions:
Does the code lead to an undefined behaviour (UB)? If yes why?
If the code does not lead to an UB, what is the active member of u after the memcpy call and why?
What would be the answer to previous two questions if sizeof( int ) != sizeof( float ) and why?

Regardless of the union, the behaviour of std::memcpy is undefined if the source and destination overlap. This is the case for every member of the union, and it would not be different if the sizes weren't the same.
If you were to use std::memmove instead, there is no longer an issue due to the overlap, and it also doesn't matter that you copy from a member of a union. Since both types are trivially copyable, the behaviour is defined and u.f becomes the active member of the union, but the union holds the same bytes as before in practice.
The only issue would arise if sizeof(U) was larger than sizeof(int), because you would be copying potentially uninitialized bytes. This is undefined behaviour.

You are not allowed to use memcpy to copy overlapping regions of memory:
If the objects overlap, the behavior is undefined.
Your code has undefined behavior because of this violation of memcpy's precondition, as u.f and u.i occupy the same address in memory.

You can use std::bit_cast to switch the active members in a union. Additionally, it's constexpr so you can even use the union in a core constant calculation.
#include <memory>
#include <bit>
union U {
int i;
float f;
constexpr void switch_to_int() { this->i = std::bit_cast<int>(f); }
constexpr void switch_to_float() { this->f = std::bit_cast<float>(i); }
};
constexpr int foo() {
U u{};
u.f = 2.0f;
u.switch_to_int();
return u.i;
}
int main()
{
constexpr int i = foo();
}
Compiler Explorer
std::bit_cast uses memcpy under the hood and compilers do a great job of optimizing the code. In this case memcpy is used to create an r-value that is then written to the new, active union element. This is what's left of the call to foo() at runtime.
mov eax,40000000h

Yes, it's undefined behaviour (UB). Because &u.f and &u.i points to the same start address. See the definition of memcpy:
void * memcpy (void *__restrict, const void *__restrict, size_t);
The C99 keyword restrict is an indication to the compiler that different object pointer types and function parameter arrays do not point to overlapping regions of memory.
This enables the compiler to perform optimizations that might otherwise be prevented because of possible aliasing.
It is your responsibility to ensure that restrict-qualified pointers do not point to overlapping regions of memory.
__restrict, permitted in C90 and C++, is a synonym for restrict.
Because it is UB. The result are undefined.
Also UB. Because &u.f and &u.i always points to the same start address, regardless equality of their lengths. sizeof(U) will get the maximum size of all the members of the union.

Is it legal to use address of one field of a union to access another field?

Consider following code:
union U
{
int a;
float b;
};
int main()
{
U u;
int *p = &u.a;
*(float *)p = 1.0f; // <-- this line
}
We all know that addresses of union fields are usually same, but I'm not sure is it well-defined behavior to do something like this.
So, question is: Is it legal and well-defined behavior to cast and dereference a pointer to union field like in the code above?
P.S. I know that it's more C than C++, but I'm trying to understand if it's legal in C++, not C.

All members of a union must reside at the same address, that is guaranteed by the standard. What you are doing is indeed well-defined behavior, but it shall be noted that you cannot read from an inactive member of a union using the same approach.
Accessing inactive union member - undefined behavior?
Note: Do not use c-style casts, prefer reinterpret_cast in this case.
As long as all you do is write to the other data-member of the union, the behavior is well-defined; but as stated this changes which is considered to be the active member of the union; meaning that you can later only read from that you just wrote to.
union U {
int a;
float b;
};
int main () {
U u;
int *p = &u.a;
reinterpret_cast<float*> (p) = 1.0f; // ok, well-defined
}
Note: There is an exception to the above rule when it comes to layout-compatible types.
The question can be rephrased into the following snippet which is semantically equivalent to a boiled down version of the "problem".
#include <type_traits>
#include <algorithm>
#include <cassert>
int main () {
using union_storage_t = std::aligned_storage<
std::max ( sizeof(int), sizeof(float)),
std::max (alignof(int), alignof(float))
>::type;
union_storage_t u;
int * p1 = reinterpret_cast< int*> (&u);
float * p2 = reinterpret_cast<float*> (p1);
float * p3 = reinterpret_cast<float*> (&u);
assert (p2 == p3); // will never fire
}
What does the Standard (n3797) say?
9.5/1 Unions [class.union]
In a union, at most one of the non-static data members can be
active at any time, that is, the value of at most one of the
non-static dat amembers ca nbe stored in a union at any time.
[...] The size of a union is sufficient to contain the largest of
its non-static data members. Each non-static data member is
allocated as if it were the sole member of a struct. All non-static data members of a union object have the same address.
Note: The wording in C++11 (n3337) was underspecified, even though the intent has always been that of C++14.

Yes, it is legal. Using explicit casts, you can do almost anything.
As other comments have stated, all members in a union start at the same address / location so casting a pointer to a different member is pointless.
The assembly language will be the same. You want to make the code easy to read so I don't recommend the practice. It is confusing and there is no benefit.
Also, I recommend a "type" field so that you know when the data is in float format versus int format.

Strict aliasing rule

I'm reading notes about reinterpret_cast and it's aliasing rules ( http://en.cppreference.com/w/cpp/language/reinterpret_cast ).
I wrote that code:
struct A
{
int t;
};
char *buf = new char[sizeof(A)];
A *ptr = reinterpret_cast<A*>(buf);
ptr->t = 1;
A *ptr2 = reinterpret_cast<A*>(buf);
cout << ptr2->t;
I think these rules doesn't apply here:
T2 is the (possibly cv-qualified) dynamic type of the object
T2 and T1 are both (possibly multi-level, possibly cv-qualified at each level) pointers to the same type T3 (since C++11)
T2 is an aggregate type or a union type which holds one of the aforementioned types as an element or non-static member (including, recursively, elements of subaggregates and non-static data members of the contained unions): this makes it safe to cast from the first member of a struct and from an element of a union to the struct/union that contains it.
T2 is the (possibly cv-qualified) signed or unsigned variant of the dynamic type of the object
T2 is a (possibly cv-qualified) base class of the dynamic type of the object
T2 is char or unsigned char
In my opinion this code is incorrect. Am I right? Is code correct or not?
On the other hand what about connect function (man 2 connect) and struct sockaddr?
int connect(int sockfd, const struct sockaddr *addr,
socklen_t addrlen);
Eg. we have struct sockaddr_in and we have to cast it to struct sockaddr. Above rules also doesn't apply, so is this cast incorrect?

Yeah, it's invalid, but not because you're converting a char* to an A*: it's because you are not obtaining a A* that actually points to an A* and, as you've identified, none of the type aliasing options fit.
You'd need something like this:
#include <new>
#include <iostream>
struct A
{
int t;
};
char *buf = new char[sizeof(A)];
A* ptr = new (buf) A;
ptr->t = 1;
// Also valid, because points to an actual constructed A!
A *ptr2 = reinterpret_cast<A*>(buf);
std::cout << ptr2->t;
Now type aliasing doesn't come into it at all (though keep reading because there's more to do!).
(live demo with -Wstrict-aliasing=2)
In reality, this is not enough. We must also consider alignment. Though the above code may appear to work, to be fully safe and whatnot you will need to placement-new into a properly-aligned region of storage, rather than just a casual block of chars.
The standard library (since C++11) gives us std::aligned_storage to do this:
using Storage = std::aligned_storage<sizeof(A), alignof(A)>::type;
auto* buf = new Storage;
Or, if you don't need to dynamically allocate it, just:
Storage data;
Then, do your placement-new:
new (buf) A();
// or: new(&data) A();
And to use it:
auto ptr = reinterpret_cast<A*>(buf);
// or: auto ptr = reinterpret_cast<A*>(&data);
All in it looks like this:
#include <iostream>
#include <new>
#include <type_traits>
struct A
{
int t;
};
int main()
{
using Storage = std::aligned_storage<sizeof(A), alignof(A)>::type;
auto* buf = new Storage;
A* ptr = new(buf) A();
ptr->t = 1;
// Also valid, because points to an actual constructed A!
A* ptr2 = reinterpret_cast<A*>(buf);
std::cout << ptr2->t;
}
(live demo)
Even then, since C++17 this is somewhat more complicated; see the relevant cppreference pages for more information and pay attention to std::launder.
Of course, this whole thing appears contrived because you only want one A and therefore don't need array form; in fact, you'd just create a bog-standard A in the first place. But, assuming buf is actually larger in reality and you're creating an allocator or something similar, this makes some sense.

The C aliasing rules from which the rules of C++ were derived included a footnote specifying that the purpose of the rules was to say when things may alias. The authors of the Standard didn't think it necessary to forbid implementations from applying the rules in needlessly restrictive fashion in cases where things don't alias, because they thought compiler writers would honor the proverb "Don't prevent the programmer from doing what needs to be done", which the authors of the Standard viewed as part of the Spirit of C.
Situations where it would be necessary to use an lvalue of an aggregate's member type to actually alias a value of the aggregate type are rare, so it's entirely reasonable that the Standard doesn't require compilers to recognize such aliasing. Applying the rules restrictively in cases that don't involve aliasing, however, would cause something like:
union foo {int x; float y;} foo;
int *p = &foo.x;
*p = 1;
or even, for that matter,
union foo {int x; float y;} foo;
foo.x = 1;
to invoke UB since the assignment is used to access the stored values of a union foo and a float using an int, which is not one of the allowed types. Any quality compiler, however, should be able to recognize that an operation done on an lvalue which is visibly freshly derived from a union foo is an access to a union foo, and an access to a union foo is allowed to affect the stored values of its members (like the float member in this case).
The authors of the Standard probably declined to make the footnote normative because doing so would require a formal definition of when an access via freshly-derived lvalue is an access to the parent, and what kinds of access patterns constitute aliasing. While most cases would be pretty clear cut, there are some corner cases which implementations intended for low-level programming should probably interpret more pessimistically than those intended for e.g. high-end number crunching, and the authors of the Standard figured that anyone who could figure out how to handle the harder cases should be able to handle the easy ones.

C++ invalid cast from type ‘void*’ to type ‘double’

How can I cast a void pointer to a double, retaining the exact binary stored in the void pointer? I thought this could be done with reinterpret_cast<double>(voidp), but g++ doesn't let me. I know you can cast void pointers to integers, so I tried reinterpret_cast<double>(reinterpret_cast<long>(voidp)), but apparently that's also invalid. sizeof(double) and sizeof(void*) are both 8, so it can't be a size thing. Is there anything I can do to accomplish this?
EDIT: The double in this case is not pointed to by the void pointer, but /is/ the void pointer - the pointer itself contains the data I want, it does not point to the data I want.

Direct memory reinterpretation, by definition, means working with lvalues. The most straightforward approach would be to do it though a cast to reference type
double d = reinterpret_cast<double &>(voidp);
You can also do it through a pointer cast, as other answers suggested, although it "overloads" the procedure with a number of completely unnecessary operator applications. Both approaches are equivalent, since by definition reinterpret_cast to reference type reinterpret_cast<T &>(v) is equivalent to the pointer version *reinterpret_cast<T *>(&v).
However, the above approaches suffer from type-punning issues. Formally, doing this is simply illegal. You are not allowed to read void * objects as double objects in C++. Direct memory reinterpretation exists in C++ for re-interpreting objects as arrays of chars, not for arbitrary type-punning like the above. Even if we ignore the formal issue and stick to purely "practical" considerations, trying to directly reinterpret a void * value as double value might produce completely unexpected and meaningless results in a compiler that follows strict-aliasing semantics when performing optimizations.
A better idea might be to memcpy the void * object to the double object
double d;
assert(sizeof d == sizeof voidp); // <- a static assert would be even better
memcpy(&d, &voidp, sizeof d);
Alternatively, in C you are now allowed to use unions for that purpose. I'm not sure the formal permission made into C++ yet, but it will typically work in practice.

The memcpy() method should be your preferred method of type punning:
double d = 100;
void *x;
std::memcpy(&x, &d, sizeof x);
std::cout << x << '\n';
double d2;
std::memcpy(&d2, &x, sizeof d2);
std::cout << d2 << '\n';
You might think this would be slower than a cast, but in fact compilers are smart enough to recognize what's going on here and generate optimal code: http://blog.regehr.org/archives/959
In addition, this method cannot result in undefined behavior due to aliasing violations as can happen with casts or union methods.
You can write a bit_cast operator to make this more convienent and more safe:
http://pastebin.com/n4yDjBde
template <class Dest, class Source>
inline Dest bit_cast(Source const &source) {
static_assert(sizeof(Dest)==sizeof(Source), "size of destination and source objects must be equal");
static_assert(std::is_trivially_copyable<Dest>::value, "destination type must be trivially copyable.");
static_assert(std::is_trivially_copyable<Source>::value, "source type must be trivially copyable");
Dest dest;
std::memcpy(&dest, &source, sizeof(dest));
return dest;
}
Example usage:
void *p = ...;
double d = bit_cast<double>(p);
If you do type punning you ought to be aware of trap values for the involved types and your compiler's behavior with traps and unspecified values.

This is not recommended at all, but if you have to, use:
*reinterpret_cast<double*>(&voidp)

void* some_void_data = NULL;
double* as_double = static_cast<double*>(some_void_data);
double as_double_copy = *as_double;
double as_double_copy_one_line = *static_cast<double*>(some_void_data);
Obviously you'd want to apply the cast to a non-null pointer in real usage. static_cast can only be used to convert from one pointer type to another pointer type. If you want a copy of the pointer you're expected to perform the copy yourself by dereferencing the pointer returned by the cast.
This is a good resource on casting.

Accessing struct members directly

I have a testing struct definition as follows:
struct test{
int a, b, c;
bool d, e;
int f;
long g, h;
};
And somewhere I use it this way:
test* t = new test; // create the testing struct
int* ptr = (int*) t;
ptr[2] = 15; // directly manipulate the third word
cout << t->c; // look if it really affected the third integer
This works correctly on my Windows - it prints 15 as expected, but is it safe? Can I be really sure the variable is on the spot in memory I want it to be - expecially in case of such combined structs (for example f is on my compiler the fifth word, but it is a sixth variable)?
If not, is there any other way to manipulate struct members directly without actually having struct->member construct in the code?

It looks like you are asking two questions
Is it safe to treat &test as a 3 length int arrray?
It's probably best to avoid this. This may be a defined action in the C++ standard but even if it is, it's unlikely that everyone you work with will understand what you are doing here. I believe this is not supported if you read the standard because of the potential to pad structs but I am not sure.
Is there a better way to access a member without it's name?
Yes. Try using the offsetof macro/operator. This will provide the memory offset of a particular member within a structure and will allow you to correctly position a point to that member.
size_t offset = offsetof(mystruct,c);
int* pointerToC = (int*)((char*)&someTest + offset);
Another way though would be to just take the address of c directly
int* pointerToC = &(someTest->c);

No you can't be sure. The compiler is free to introduce padding between structure members.

To add to JaredPar's answer, another option in C++ only (not in plain C) is to create a pointer-to-member object:
struct test
{
int a, b, c;
bool d, e;
int f;
long g, h;
};
int main(void)
{
test t1, t2;
int test::*p; // declare p as pointing to an int member of test
p = &test::c; // p now points to 'c', but it's not associating with an object
t1->*p = 3; // sets t1.c to 3
t2->*p = 4; // sets t2.c to 4
p = &test::f;
t1->*p = 5; // sets t1.f to 5
t2->*p = 6; // sets t2.f to 6
}

You are probably looking for the offsetof macro. This will get you the byte offset of the member. You can then manipulate the member at that offset. Note though, this macro is implementation specific. Include stddef.h to get it to work.

It's probably not safe and is 100% un-readable; thus making that kind of code unacceptable in real life production code.

use set methods and boost::bind for create functor which will change this variable.

Aside from padding/alignment issues other answers have brought up, your code violates strict aliasing rules, which means it may break for optimized builds (not sure how MSVC does this, but GCC -O3 will break on this type of behavior). Essentially, because test *t and int *ptr are of different types, the compiler may assume they point to different parts of memory, and it may reorder operations.
Consider this minor modification:
test* t = new test;
int* ptr = (int*) t;
t->c = 13;
ptr[2] = 15;
cout << t->c;
The output at the end could be either 13 or 15, depending on the order of operations the compiler uses.

According to paragraph 9.2.17 of the standard, it is in fact legal to cast a pointer-to-struct to a pointer to its first member, providing the struct is POD:
A pointer to a POD-struct object,
suitably converted using a
reinterpret_cast, points to its
initial member (or if that member is a
bit-field, then to the unit in which
it resides) and vice versa. [Note:
There might therefore be unnamed
padding within a POD-struct object,
but not at its beginning, as necessary
to achieve appropriate alignment. ]
However, the standard makes no guarantees about the layout of structs -- even POD structs -- other than that the address of a later member will be greater than the address of an earlier member, provided there are no access specifiers (private:, protected: or public:) between them. So, treating the initial part of your struct test as an array of 3 integers is technically undefined behaviour.

We Keep Coding

c++ django amazon-web-services regex python-2.7 google-cloud-platform list unit-testing opengl ember.js

Struct offsets and pointer safety in C++ - c++

Related

Using memcpy to switch active member of union in C++

Is it legal to use address of one field of a union to access another field?

Strict aliasing rule

C++ invalid cast from type ‘void*’ to type ‘double’

Accessing struct members directly

Categories

Resources