Memory layout of globals - c++

Consider
Foo data[]={{...},{...},{...}};
Foo data_end={...};
If the end marker is defined right after the array, is it guaranteed that
&data[3]==&data_end
I do not want to have to count the number of elements in data manually.
It happens to look OK in gdb with no optimization options, but before relying on it I need to know that the compiler cannot move data_end. If it can, what should I do instead?

Not only is what you ask for not guaranteed; a C++03-compliant implementation must in fact ensure that &data[3] != &data_end:
5.10 Equality operators[expr.eq]
1 … Two pointers of the same type compare equal if and only if they are both null, both point to the same object or function, or both point one past the end of the same array.
In C++11, it is a little more complicated:
Two pointers of the same type compare equal if and only if they are both null, both point to the same function, or both represent the same address (3.9.2).
3.9.2 notes:
… If an object of type T is located at an address A, a pointer of type cv T* whose value is the address A is said to point to that object, regardless of how the value was obtained. [Note: for instance, the address one past the end of an array (5.7) would be considered to point to an unrelated object of the array’s element type that might be located at that address. … —end note]
Thus, the new standard allows that a complying implementation might yield true in your comparison; still, there are no guarantees.
If you need to count the number of elements of an array, use the following stock macro:
#define countof(ARR) (sizeof (ARR) / sizeof *(ARR))
Or if you don't like macros, use the following function:
template<class T, std::size_t N>
constexpr std::size_t countof(T (&)[N])
{
return N;
}
The latter option, however, requires a compiler that supports the constexpr keyword in order to be a full functional equivalent of the former.
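To illustrate why constexpr matters here, the following is a minimal sketch. Foo's single member and the flags array are placeholders invented for the example, not part of the question. The array bound and the static_assert both need a constant expression, which the function template can only supply when it is declared constexpr.

```cpp
#include <cstddef>

struct Foo { int id; };  // placeholder element type (assumption)

template <class T, std::size_t N>
constexpr std::size_t countof(T (&)[N])
{
    return N;
}

Foo data[] = {{1}, {2}, {3}};

// Usable as an array bound only because countof is constexpr.
int flags[countof(data)] = {};

// Compile-time check that the count follows the initializer list.
static_assert(countof(data) == 3, "data has three initializers");
```

Without constexpr, the template would still work in ordinary run-time expressions such as loop bounds, just not in these two places.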

No. It is not a reliable assumption that the compiler places the variables in any order or any alignment. There can be gaps between the variables for alignment and I've already seen compilers that order variables alphabetically.
If you want a pointer to the position one past the last array element, you need to know the number of array elements.
#define _dimof(a) (sizeof(a)/sizeof(a[0]))
Foo data[] = ... ;
// data_end is now a pointer to one past the last element, not a separate Foo object.
Foo* data_end = &data[_dimof(data)];

No, it is not reliable. In C++11, just do
for (/* const */ Foo& foo : data) {
// stuff with foo.
}
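If an explicit end pointer is still wanted, std::begin and std::end from <iterator> work on built-in arrays and give the same range without any counting. A small sketch, assuming a placeholder Foo with one int member:

```cpp
#include <cassert>
#include <iterator>

struct Foo { int id; };  // placeholder element type (assumption)

Foo data[] = {{1}, {2}, {3}};

// Range-based for: the compiler derives the bounds from the array type.
int sum_ids()
{
    int sum = 0;
    for (const Foo& foo : data) {
        sum += foo.id;
    }
    return sum;
}

// Same walk with explicit pointers; std::end(data) is one past the last element.
int sum_ids_with_pointers()
{
    assert(std::end(data) - std::begin(data) == 3);  // sanity check for this example
    int sum = 0;
    for (const Foo* p = std::begin(data); p != std::end(data); ++p) {
        sum += p->id;
    }
    return sum;
}
```

Both functions visit exactly the elements of data, whatever the compiler happens to place after the array.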

Related

Pointer-Interconvertible Types and Arrays of Unions

If I have a union.
struct None {};
template<typename T>
union Storage {
/* Ctors and Methods */
T obj;
None none;
};
Pointer-interconvertibility means it is legal to perform the following conversion:
Storage<T> value(/* ctor args */);
T* obj = static_cast<T*>(static_cast<void*>(&value));
Is it legal to treat an array of Storage<T> as an array of T?
Storage<T> values[20] = { /* initialisation */ };
T* objs = static_cast<T*>(static_cast<void*>(values));
for(auto i = 0; i < 20; ++i) {
objs[i].method(); // Is this pointer access legal?
}
No, it is not legal. The only thing that may be treated as an array with regard to pointer arithmetic is an actual array (or the hypothetical single-element array formed by an object that is not an element of an array). So the "array" relevant to the pointer arithmetic in objs[i] here is the hypothetical single-element array formed by obj of the first array element, since obj is not itself an element of an array. For i >= 1, objs[i] will not point to an object, and so method may not be called on it.
Practically, there will be an issue in particular if T's size and the size of the union don't coincide, since even the arithmetic on the addresses will be off in this case. There is no guarantee that the two sizes coincide (even if None has sizeof and alignof equal to 1).
Aside from that issue, I doubt that compilers actually make use of this undefined behavior for optimization purposes. I can't guarantee it though.
Also note that you are only allowed to access obj through the pointer obtained by the cast if obj is the active member of the union, meaning that obj is the member which was initialized in the example.
You indicate that you intend to use this in a constant expression, in which case the compiler is required to diagnose the undefined behavior and is likely to reject such a program, regardless of the practical considerations about the optimizer.
Also, in a constant expression a cast from void* to a different object type (or a reinterpret_cast) is not allowed. So static_cast<T*>(static_cast<void*>(values)); will cause that to fail anyway. Although that is simply remedied by just taking a pointer to the union member directly (e.g. &values[0].obj). There is no reason to use the casts here.
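A sketch of the cast-free approach, under some assumptions: the two constexpr constructors stand in for the elided "Ctors and Methods", and Counter is a made-up element type with a method(). The loop indexes the array of unions and then names the active member, so no pointer ever walks across elements as if they were T objects.

```cpp
#include <cstddef>

struct None {};

template <typename T>
union Storage {
    T obj;
    None none;
    // Assumed constructors; the question elides the real ones.
    constexpr Storage() : none{} {}
    constexpr explicit Storage(const T& v) : obj(v) {}
};

// Hypothetical element type for the example.
struct Counter {
    int n;
    constexpr int method() const { return n * 2; }
};

constexpr int sum_all()
{
    Storage<Counter> values[3] = {
        Storage<Counter>(Counter{1}),
        Storage<Counter>(Counter{2}),
        Storage<Counter>(Counter{3}),
    };
    int total = 0;
    for (std::size_t i = 0; i < 3; ++i) {
        total += values[i].obj.method();  // index the union array, then name the member
    }
    return total;
}

// Accepted in a constant expression because obj is the active member everywhere.
static_assert(sum_all() == 12, "2 + 4 + 6");
```

The arithmetic happens on Storage<Counter>*, which really does point into an array, so the size mismatch discussed above cannot come into play.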

Odd usage of special pointer values

I am using a C++ implementation of an algorithm which makes odd use of special pointer values, and I would like to know how safe and portable this is.
First, there is some structure containing a pointer field. It initializes an array of such structures by zeroing the array with memset(). Later on, the code relies on the pointer fields initialized that way to compare equal to NULL; wouldn't that fail on a machine whose internal representation of the NULL pointer is not all-bits-zero?
Subsequently, the code sets some pointers to, and later compares some pointers against, specific pointer values, namely ((type*) 1) and ((type*) 2). Clearly, these pointers are meant to be flags and are not supposed to be dereferenced. But can I be sure that some genuine valid pointer would not compare equal to one of these? Is there a better (safe, portable) way to do that (i.e. use specific pointer values that pointer variables can take only through explicit assignment, in order to flag specific situations)?
Any comment is welcome.
To sum up the comments I received, both issues raised in the question are indeed expected to work on a "usual" setup, but come with no guarantee.
Now if I want absolute guarantees, it seems my best option is, for the NULL pointers, to set them either manually or with a proper constructor, and for the special pointer values, to create sentinel pointer values manually.
For the latter, in a C++ class I guess the most elegant solution is to use static members
class The_class
{
static const type reserved;
static const type* const sentinel;
};
provided that they can be initialized somewhere:
const type The_class::reserved = foo; // 'foo' is a constant expression of type 'type'
const type* const The_class::sentinel = &The_class::reserved;
If type is templated, either the above initialization must be instantiated for each type intended, or one must resort to non-static (less elegant but still useful) "reserved" and "sentinel" members.
template <typename type>
class The_class
{
type reserved; // cannot be static anymore, nor const for complicated 'type' without adapted constructor
const type* const sentinel;
public:
The_class() : sentinel(&reserved) {}
};
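As a possible variation on the same idea (not what the original code does), function-local statics give one sentinel object per instantiated type without any out-of-class definition. The sketch below assumes the element type is default-constructible; Node, Sentinels, Entry and starts_null are names made up for the example. It also shows value-initialization as the portable replacement for memset(), since a value-initialized pointer is a null pointer regardless of its bit pattern.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

struct Node { int value; };  // stand-in for the question's `type` (assumption)

// Hypothetical helper: each flag is the address of its own static object,
// so no genuine object of type T can share that address.
template <typename T>
struct Sentinels {
    static const T* visited() { static const T s{}; return &s; }
    static const T* removed() { static const T s{}; return &s; }
    static bool is_flag(const T* p) { return p == visited() || p == removed(); }
};

struct Entry { const Node* link; };

// std::vector<Entry>(n) value-initializes each Entry, so link is a null pointer.
bool starts_null(std::size_t n)
{
    std::vector<Entry> table(n);
    for (const Entry& e : table) {
        if (e.link != nullptr) return false;
    }
    return true;
}
```

A caller would then write e.link = Sentinels<Node>::visited(); and later test with Sentinels<Node>::is_flag(e.link) before dereferencing.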

Does the "cast to first member of standard layout" type punning rule extend to arrays?

Specifically, I am wrapping a C API in a friendly C++ wrapper. The C API has this fairly standard shape:
struct foo {...};
void get_foos(size_t* count, foo* dst);
What I'd like to do is save myself an extra copy by passing a type-punned wrapper array directly to the C API, guarded by a bunch of sanity-checking static_assert()s.
class fooWrapper {
foo raw_;
public:
[...]
};
std::vector<fooWrapper> get_foo_vector() {
size_t count = 0;
get_foos(&count, nullptr);
std::vector<fooWrapper> result(count);
// Is this OK?
static_assert(sizeof(foo) == sizeof(fooWrapper), "");
static_assert(std::is_standard_layout<fooWrapper>::value, "");
get_foos(&count, reinterpret_cast<foo*>(result.data()));
return result;
}
My understanding is that it is valid code, since all accessed memory locations individually qualify under the rule, but I'd like confirmation on that.
Edit: Obviously, as long as reinterpret_cast<char*>(result.data() + n) == reinterpret_cast<char*>(result.data()) + n*sizeof(foo) is true, it'll work under all major compilers today. But I'm wondering if the standard agrees.
First, this is not type punning. The reinterpret_cast you're doing is just a long-winded way of writing &result.data()->raw_. Type punning is accessing an object of one type through a pointer/reference to another type. You're accessing a subobject of the other type.
Second, this doesn't work. Pointer arithmetic is based on having an array (a single object acts as an array of 1 element for the purposes of pointer arithmetic). And vector<T> is defined by fiat to produce an array of Ts. But an array of T is not equivalent to an array of some subobject of T, even if that subobject is the same size as T and T is standard layout.
Therefore, if get_foos performs pointer arithmetic on its given array of foos, that's UB. Oh sure, it will almost certainly work. But the language's answer is UB.
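A strictly conforming alternative is to let the C API fill a real array of foo and build the wrappers afterwards, at the price of the extra copy the question wanted to avoid. The sketch below is only illustrative: the body of get_foos is a stub written for this example (the real API is not shown in the question), and foo's id member plus the two fooWrapper constructors are assumptions.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

struct foo { int id; };  // placeholder for the C struct (assumption)

// Stub standing in for the C API: reports 3 items, fills dst when it is non-null.
void get_foos(std::size_t* count, foo* dst)
{
    const std::size_t available = 3;
    if (dst != nullptr) {
        const std::size_t n = *count < available ? *count : available;
        for (std::size_t i = 0; i < n; ++i) {
            dst[i].id = static_cast<int>(i) + 1;
        }
        *count = n;
    } else {
        *count = available;
    }
}

class fooWrapper {
    foo raw_;
public:
    fooWrapper() : raw_{} {}
    explicit fooWrapper(const foo& f) : raw_(f) {}
    int id() const { return raw_.id; }
};

std::vector<fooWrapper> get_foo_vector()
{
    std::size_t count = 0;
    get_foos(&count, nullptr);

    std::vector<foo> raw(count);   // a genuine array of foo for the C side
    get_foos(&count, raw.data());
    assert(count <= raw.size());
    raw.resize(count);

    // Element-wise construction of the wrappers: the extra copy.
    return std::vector<fooWrapper>(raw.begin(), raw.end());
}
```

Whether the copy matters in practice depends on how large foo is and how often the call is made; for small structs it is often negligible.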

Pointer interconvertibility vs having the same address

The working draft of the standard N4659 says:
[basic.compound]
If two objects are pointer-interconvertible, then they have the same address
and then notes that
An array object and its first element are not pointer-interconvertible, even though they have the same address
What is the rationale for making an array object and its first element non-pointer-interconvertible? More generally, what is the rationale for distinguishing the notion of pointer-interconvertibility from the notion of having the same address? Isn't there a contradiction in there somewhere?
It would appear that given this sequence of statements
int a[10];
void* p1 = static_cast<void*>(&a[0]);
void* p2 = static_cast<void*>(&a);
int* i1 = static_cast<int*>(p1);
int* i2 = static_cast<int*>(p2);
we have p1 == p2, however, i1 is well defined and using i2 would result in UB.
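For contrast, here is a small sketch of a route from p2 to the first element that stays within the rules: cast the void* back to the pointer-to-array type it came from, then index the array. The function names and the initial value 7 are made up for the example.

```cpp
#include <cassert>

int a[10] = {7};

// The two void* values compare equal: same address.
bool same_address()
{
    void* p1 = static_cast<void*>(&a[0]);
    void* p2 = static_cast<void*>(&a);
    return p1 == p2;
}

// Round trip through void* back to the original type int(*)[10].
int first_via_array_pointer()
{
    void* p2 = static_cast<void*>(&a);
    int (*pa)[10] = static_cast<int (*)[10]>(p2);
    return (*pa)[0];  // ordinary subscripting of the array object
}
```

The difference from i2 is only in the type the void* is converted back to, which is exactly what pointer-interconvertibility governs.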
There are apparently existing implementations that optimize based on this. Consider:
struct A {
double x[4];
int n;
};
void g(double* p);
int f() {
A a { {}, 42 };
g(&a.x[1]);
return a.n; // optimized to return 42;
// valid only if you can't validly obtain &a.n from &a.x[1]
}
Given p = &a.x[1];, g might attempt to obtain access to a.n by reinterpret_cast<A*>(reinterpret_cast<double(*)[4]>(p - 1))->n. If the inner cast successfully yielded a pointer to a.x, then the outer cast will yield a pointer to a, giving the class member access defined behavior and thus outlawing the optimization.
More generally, what is the rationale for distinguishing the notion of pointer-interconvertibility from the notion of having the same address?
It is hard if not impossible to answer why certain decisions are made by the standard, but this is my take.
Logically, pointers point to objects, not addresses. Addresses are the value representations of pointers. The distinction is particularly important when reusing the space of an object containing const members:
struct S {
const int i;
};
S s = {42};
auto ps = &s;
new (ps) S{420};
foo(ps->i); // UB, requires std::launder
That a pointer with the same value representation can be used as if it were the same pointer should be thought of as the special case instead of the other way round.
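Under the C++17 wording that the answer relies on, there are two ways to reach the new object in the snippet above: keep the pointer returned by placement new, or pass the old pointer through std::launder. A minimal sketch, with function names invented for the example:

```cpp
#include <cassert>
#include <new>

struct S {
    const int i;
};

// Option 1: the pointer returned by placement new already points to the new object.
int read_via_new_pointer()
{
    S s = {42};
    S* fresh = new (&s) S{420};
    return fresh->i;
}

// Option 2: the old pointer, laundered (std::launder is declared in <new>, C++17).
int read_via_launder()
{
    S s = {42};
    S* ps = &s;
    new (ps) S{420};
    return std::launder(ps)->i;
}
```

In both cases the address is unchanged; only which object the pointer is considered to designate differs.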
Practically, the standard tries to place as little restriction as possible on implementations. Pointer-interconvertibility is the condition that pointers may be reinterpret_cast and yield the correct result. Seeing as how reinterpret_cast is meant to be compiled into nothing, it also means the pointers share the same value representation. Since that places more restrictions on implementations, the condition won't be given without compelling reasons.
Because the committee wants to make clear that an array is a low-level concept and not a first-class object: for example, you cannot return an array or assign to one. Pointer-interconvertibility is meant to be a concept between objects of the same level: only standard-layout classes or unions.
The concept is seldom used in the whole draft: in [expr.static.cast], where it appears as a special case; in [class.mem], where a note says that for standard-layout classes, pointers to an object and its first subobject are interconvertible; in [class.union], where pointers to the union and its non-static data members are also declared interconvertible; and in [ptr.launder].
That last occurrence separates two use cases: either pointers are interconvertible, or one element is an array. This is stated in a remark and not in a note as it is in [basic.compound], which makes it clearer that pointer-interconvertibility deliberately does not concern arrays.
Having read this section of Standard closely, I have the understanding that two objects are pointer-interconvertible, as the name suggests, if
They are “interconnected”, through their class definition (note that pointer interconvertible concept is defined for a class object and its first non-static data member).
They point to the same address. But, because their types are different, we need to “convert” their pointers' types, using reinterpret_cast operator.
For an array object, mentioned in the question, the array and its first element have no interconnectivity in terms of class definition and also we don’t need to convert their pointer types to be able to work with them. They just point to the same address.
