Using offsetof for template classes - c++

From the C++ standard:
A standard-layout class is a class that:
— has no non-static data members of type non-standard-layout class (or
array of such types) or reference,
— has no virtual functions (10.3) and no virtual base classes (10.1),
— has the same access control (Clause 11) for all non-static data members,
— has no non-standard-layout base classes,
— either has no non-static data members in the most derived class and
at most one base class with non-static data members, or has no base
classes with non-static data members, and
— has no base classes of the same type as the first non-static data
member
The macro offsetof(type, member-designator) accepts a restricted set
of type arguments in this International Standard. If type is not a
standard-layout class (Clause 9), the results are undefined
Considering these statements is there any safe way of using offsetof for members that depend on template parameters? If not, how may I get the offset of a member in template classes? What might be unsafe when using something like:
//MS Visual Studio 2013 definition
#define offsetof(s,m) (size_t)&reinterpret_cast<const volatile char&>((((s *)0)->m))
on non standard layout classes?
Following a sample where is NOT SAFE according to the standard:
#include <cstddef>
#include <iostream>
template<typename T>
struct Test
{
int a;
T b;
};
struct NonStdLayout
{
virtual void f(){};
};
int main()
{
std::cout << offsetof(Test<int>, b) << std::endl;
std::cout << offsetof(Test<NonStdLayout>, b) << std::endl;
return 0;
}

offsetof cannot be used on non-standard-layout classes simply because their layout in memory is unknown. For example, the standard does not specify how virtual member functions are implemented. One common way of doing so is to add a pointer to the vtable as the first data member of a class, but it's not the only way.
As to your definition of offsetof: there is no guarantee that a null pointer converts to a 0 via reinterpret_cast (or via a C-style cast), neither is there any semantics specified for other values of pointers cast to integers.
So if you know that your definition makes sense in the underlying addressing scheme used by your compiler for your platform, it can work. But it's an if you have to be aware of.

The answer is that it is perfectly safe to use offsetof in templates. No harm will be done thereby. However, if you choose to do so then you impose a restriction on the type of parameter for the template. It will work correctly for standard layout classes, and in principle at least the compiler should tell you when the parameter is of a type for which it won't work.
There is no way under the standard to obtain the offset for a member of a non-standard-layout class, regardless of whether any template is involved. It will probably work in individual compilers, but it may not. It is likely to work on all non-virtual classes (although this is not a requirement of the standard). Maybe you just have to experiment.
We are frequently forced to write non-standards compliant code to solve problems like this, so we test it carefully on individual compilers. It just means more hard work in research and testing.

Related

C++ Union, two active members only differing by CV-qualification

If I have a union with two data members of the same type, differing only by CV-qualification:
template<typename T>
union A
{
private:
T x_priv;
public:
const T x_publ;
public:
// Accept-all constructor
template<typename... Args>
A(Args&&... args) : x_priv(args...) {}
// Destructor
~A() { x_priv.~T(); }
};
And I have a function f that declares a union A, thus making x_priv the active member and then reads x_publ from that union:
int f()
{
A<int> a {7};
return a.x_publ;
}
In every compiler I tested there were no errors compiling nor at runtime for both int types and other, more complex, types such as std::string and std::thread.
I went to see on the standard if this was legal behavior and I started on looking at the difference of T and const T:
6.7.3.1 [basic.type.qualifier]
The cv-qualified or cv-unqualified versions of a type are distinct types; however, they shall have the same representation and alignment requirements ([basic.align]).
This means that when declaring a const T it has the exact same representation in memory as a T. But then I found that the standard actually disallows this for some types, which I found weird, as I see no reason for it.
I started my search on accessing non-active members.
It is only legal to access the common initial sequence of T and const T if both are standard-layout types.
10.4.1[class.union]
At most one of the non-static data members of an object of union type can be active at any time [...] [ Note: One special guarantee is made in order to simplify the use of unions: If a standard-layout union contains several standard-layout structs that share a common initial sequence ([class.mem]), and if a non-static data member of an object of this standard-layout union type is active and is one of the standard-layout structs, it is permitted to inspect the common initial sequence of any of the standard-layout struct members; see [class.mem]. — end note ]
The initial sequence is basically the order of the non-static data members with a few exceptions, but since T and const T have the exact same members in the same layout, this means that the common initial sequence of T and const T is all of the members of T.
10.3.22 [class.mem]
The common initial sequence of two standard-layout struct ([class.prop]) types is the longest sequence of non-static data members and bit-fields in declaration order, starting with the first such entity in each of the structs, such that corresponding entities have layout-compatible types, either both entities are declared with the no_­unique_­address attribute ([dcl.attr.nouniqueaddr]) or neither is, and either both entities are bit-fields with the same width or neither is a bit-field. [ Example:
And here is where the restrictions come in, it restricts some types from being accessed, even though they have the exact same representation in memory:
10.1.3 [class.prop]
A class S is a standard-layout class if it:
(3.1) has no non-static data members of type non-standard-layout class (or array of such types) or reference,
(3.2) has no virtual functions and no virtual base classes,
(3.3) has the same access control for all non-static data members,
(3.4) has no non-standard-layout base classes,
(3.5) has at most one base class subobject of any given type,
(3.6) has all non-static data members and bit-fields in the class and its base classes first declared in the same class, and
(3.7) has no element of the set M(S) of types as a base class, where for any type X, M(X) is defined as follows.108 [ Note: M(X) is the set of the types of all non-base-class subobjects that may be at a zero offset in X. — end note ]
(3.7.1) If X is a non-union class type with no (possibly inherited) non-static data members, the set M(X) is empty.
(3.7.2) If X is a non-union class type with a non-static data member of type X_0 that is either of zero size or is the first non-static data member of X (where said member may be an anonymous union), the set M(X) consists of X_0 and the elements of M(X_0).
(3.7.3) If X is a union type, the set M(X) is the union of all M(U_i) and the set containing all U_i, where each U_i is the type of the ith non-static data member of X.
(3.7.4) If X is an array type with element type X_e , the set M(X) consists of X e and the elements of M (X_e).
(3.7.5) If X is a non-class, non-array type, the set M(X) is empty.
My questions is is there any reason for this to not be valid behavior?.
Essentially is it that:
The standard makers forgot to account for this particular case?
I haven't read some part of the standard that allows this behavior?
There's some more specific reason for this not to be valid behavior?
A reason for this to be valid syntax is, for example, having a 'readonly' variable in a class, as such:
struct B;
struct A
{
... // Everything that struct A had before
friend B;
}
struct B
{
A member;
void f() { member.x_priv = 100; }
}
int main()
{
B b;
b.f(); // Modifies the value of member.x_priv
//b.member.x_priv = 100; // Invalid, x_priv is private
int x = b.member.x_publ; // Fine, x_publ is public
}
This way you don't need a getter function, which can cause performance overhead and although most compiler would optimize that away it still increases your class, and to get the variable you'd have to write int x = b.get_x().
Nor would you need a const reference to that variable (as described in this question), which while it works great, it adds size to your class, which can be bad for sufficiently big classes or classes that need to be as small as possible.
And it is weird having to write b.member.x_priv instead of b.x_priv but this would be fixable if we could have private members in anonymous unions then we could rewrite it like this:
struct B
{
union
{
private:
int x_priv;
public:
int x_publ;
friend B;
};
void f() { x_priv = 100; }
}
int main()
{
B b;
b.f(); // Modifies the value of member.x_priv
//b.x_priv = 100; // Invalid, x_priv is private
int x = b.x_publ; // Fine, x_publ is public
}
Another use case might be to give various names to the same data member, lie for example in a Shape, the user might want to refer to the position as either shape.pos, shape.position, shape.cur_pos or shape.shape_pos.
Although this would probably create more problems than it is worth, such a use case might be favorable when for example a name should be deprecated .
Code like this:
struct A { int i; };
struct B { int j; };
union U {
struct A a;
struct B b;
};
int main() {
union U u;
u.a.i = 1;
printf("%d\n", u.b.j);
}
is valid in C. For the sake of backward compatibility, it was considered desirable to ensure that it is also valid in C++. The special rules about common initial sequences of standard-layout structs ensure this backward compatibility. Extending the rule to allow more cases to be well-defined—ones involving non-standard-layout structs—is not necessary for C compatibility, since all structs that can be defined in the common subset of C and C++ are automatically standard-layout structs in C++.
Actually, the C++ rules are a little bit more permissive than required for C compatibility. They allow some cases involving base classes too:
struct A { int i; };
struct B { int j; };
struct C : A { };
struct D : B { };
// C and D have a common initial sequence consisting of C::i and D::j
But in general, structs in C++ can be much more complicated than their C counterparts. They can, for example, have virtual functions and virtual base classes, and those can affect their layout in an implementation-defined manner. For this reason, it's not so easy to make more cases of type punning through unions well-defined in C++. You would really have to sit down with implementers and discuss what the conditions would be such that the committee should mandate that two classes have the same layout for their common initial sequence and not leave it up to the implementation. Currently, that mandate applies only to standard-layout classes.
There are various rules in the standard that are strong enough to imply that T and const T always have the exact same layout even if T is not a standard-layout class. For this reason, it would be possible to make certain forms of type punning between a T member and a const T member of a union well-defined even if T is not standard-layout. However, adding only this very special case to the language is of dubious value and I think it's unlikely that the committee would accept such a proposal unless you have a really compelling use case. Not wanting to provide a getter that returns a const reference, simply because you don't want to write the () to call the getter each time you need access, is unlikely to convince the committee.

Does a nested enum inside POD class makes it not POD?

By definition here, POD is a simple class with no user-defined constructors, non static members, and containing simple data types only.
The question is, will these 2 classes below be equivalent as POD types (in terms of memory footprint):
class pod
{
public:
int x;
double y;
};
class pod1
{
public:
int x;
double y;
enum POD_TYPE
{
POD1 = 0,
POD = 1
};
};
In other words does adding enum to the class only affects scope resolution of enum and does not affect properties of the class itself? By observation, it seems that class is still pod, but I would like to confirm based on the standard.
Yes, that type is still POD. The definition is given by C++11 9/10:
A POD struct is a non-union class that is both a trivial class and a standard-layout class, and has no non-static data members of type non-POD struct, non-POD union (or array of such types).
Trivial means that it doesn't do any funny business when creating, destroying or copying objects. Standard-layout means that it doesn't do any funny business with the layout of data members: no polymorphism, and restrictions on what you can do with access specifiers and inheritance. These terms are fully defined in C++11 9/6 and 9/7, if you want more detail.
Nested types (such as your enumeration), static data members and non-virtual member functions (apart from constructors etc. which would make it non-trivial) will not effect any of those things, so it is still POD.
UPDATE: Since you say you're interested in historic definitions, C++03 defined:
9/4 A POD-struct is an aggregate class that has no non-static members of type non-POD-struct, non-POD-union (or array of such types), and has no user-defined copy assignment operator and no user-defined destructor"
8.5.1/1 An aggregate is an array or class with no user-declared constructors, no private or protected non-static data members, no base classes and no virtual functions.
So there were more restrictions; but nested types were still allowed. I don't have a copy of C++98, but I'm sure that would be identical to C++03.
It doesn't make any difference regarding POD status because defining a nested enum does not add data members to the class. In fact, you could also define a nested class that is not POD inside pod1 and it would still not make a difference regarding the PODness of pod1.
The enum POD_TYPE is a type and won't effect the layout, we can see this from the draft C++ standard section 9.2 Class members paragraph 1 which says:
[...]Members of a class are data members, member functions (9.3), nested types, and
enumerators. Data members and member functions are static or non-static; see 9.4. Nested types are classes (9.1, 9.7) and enumerations (7.2) [...]
as opposed to data members and we can further see that the definition of a standard layout class depends only on data member from paragraph 16 which says:
Two standard-layout struct (Clause 9) types are layout-compatible if they have the same number of non-static data members and corresponding non-static data members (in declaration order) have layout-compatible types (3.9).
and we further see going back to section 9 Classes paragraph 10 which says(emphasis mine):
A POD struct108 is a non-union class that is both a trivial class and a standard-layout class, and has no non-static data members of type non-POD struct, non-POD union (or array of such types). Similarly, a POD union is a union that is both a trivial class and a standard layout class, and has no non-static data members of type non-POD struct, non-POD union (or array of such types). A POD class is a class that is either a POD struct or a POD union.
As far as I can tell the pre C++11 standard does not diff much in the above items.

Unions used like Classes/Structs

I was trying to learn more about unions and their usefulness, when I was surprised that the following code is perfectly valid and works exactly as expected:
template <class T>
union Foo
{
T a;
float b;
Foo(const T& value)
: a(value)
{
}
Foo(float f)
: b(f)
{
}
void bar()
{
}
~Foo()
{
}
};
int main(int argc, char* argv[])
{
Foo<int> foo1(12.0f);
Foo<int> foo2((int) 12);
foo1.bar();
foo2.bar();
int s = sizeof(foo1); // s = 4, correct
return 0;
}
Until now, I had no idea that it is legal to declare unions with templates, constructors, destructor, and even member functions. In case it's relevant, I'm using Visual Studio 2012.
When I searched the internet to find more about using unions in this manner, I found nothing. Is this a new feature of C++, or something specific to MSVC? If not, I'd like to learn more about unions, specifically examples of them used like classes (above). If someone could point me to a more detailed explanation of unions and their usage as data structures, it'd be much appreciated.
Is this a new feature of C++, or something specific to MSVC?
No, as BoBtFish said, the 2003 C++ standard section 9.5 Unions paragraph 1 says:
[...] A union can have member functions (including constructors and destructors), but not virtual (10.3) functions. A union shall not have base classes. A union shall not be used as a base class. An object of a class with a non-trivial constructor (12.1), a non-trivial copy constructor (12.8), a non-trivial destructor (12.4), or a non-trivial copy assignment operator (13.5.3, 12.8) cannot be a member of a union, nor can an array of such objects. If a union contains a static data member, or a member of reference type, the program is ill-formed.
unions do come under section 9 Classes and the grammar for class-key is as follows:
class-key:
class
struct
union
So acts like a class but has many more restrictions. The key restriction being that unions can only have one active non-static member at a time, which is also covered in paragraph 1:
In a union, at most one of the non-static data members can be active at any time, that is, the value of at most one of the non-static data members can be stored in a union at any time. [...]
The wording in the C++11 draft standard is similar so it has not changed too much since 2003.
As for the use of a union, there are two common reasons which are covered from different angles in this previous thread C/C++: When would anyone use a union? Is it basically a remnant from the C only days? to summarize:
To implement your own Variant type, a union gives you the ability to represent all the varying types without wasting memory. This answer to the thread gives a good example.
Type punning but I would read Understanding Strict Aliasing as well since there are many cases where type punning is undefined behavior.
This answer to Unions cannot be used as Base class gives some really great insight into why unions are implemented as they are in C++.

struct alignment C/C++

In c/c++ (I am assuming they are the same in this regard), if I have the following:
struct S {
T a;
.
.
.
} s;
Is the following guaranteed to be true?
(void*)&s == (void*)&s.a;
Or in other words, is there any kind of guarantee that there will be no padding before the first member?
In C, yes, they're the same address. Simple, and straightforward.
In C++, no, they're not the same address. Base classes can (and I would suspect, do) come before all members, and virtual member functions usually add hidden data to the struct somewhere. Even more confusing, a C++ compiler may also rearrange members at will, unless the class is a standard layout type (though I don't know that any compiler does so)
Finally, if the C++ struct is composed of standard layout types, contains no base classes nor virtual functions and all members have the same visibility, and possibly other limitations I forgot, then it falls back on the C rules, and requires the first member to be at the same address as the object itself.
§ 9.2/7
A standard-layout class is a class that:
— has no non-static data members of type non-standard-layout class (or array of such types) or reference,
— has no virtual functions (10.3) and no virtual base classes (10.1),
— has the same access control (Clause 11) for all non-static data members,
— has no non-standard-layout base classes,
— either has no non-static data members in the most derived class and at most one base class with non-static data members, or has no base classes with non-static data members, and
— has no base classes of the same type as the first non-static data member.
§ 9.2/20
A pointer to a standard-layout struct object, suitably converted using a reinterpret_cast, points to its initial member (or if that member is a bit-field, then to the unit in which it resides) and vice versa. [ Note: There might therefore be unnamed padding within a standard-layout struct object, but not at its beginning, as necessary to achieve appropriate alignment. —end note ]
Yes, it is.
It is guaranteed there is no padding before the first struct member in C and in C++ (if it is a POD).
C quote:
(C11, 6.7.2.1p15) "There may be unnamed padding within a structure object, but not at its beginning."
C++ quote:
(C++11, 9.2p20) "There might therefore be unnamed padding within a standard-layout struct object, but not at its beginning, as necessary to achieve appropriate alignment"

C++ optimise away private variable

Does ISO C++ (11) permit a private non-static class member variable to be optimised away?
This could be detected:
class X { int x; };
assert (sizeof(X) >= sizeof(int));
but I am not aware of a clause that demands the assertion above.
To clarify: (a) Is there a clause in the C++ Standard that ensure the assertion above.
(b) Can anyone think of any other way to detect the elision of x?
[offsetof?]
(c) Is the optimisation permitted anyhow, despite (a) and (b)?
I have a feeling the optimisation could be possible if the class is local to a function but not otherwise (but I'd like to have a definitive citation).
I do not think it is forbidden, but I think it is impractical.
§9 Classes [class]
7/ A standard-layout class is a class that:
has no non-static data members of type non-standard-layout class (or array of such types) or reference,
has no virtual functions (10.3) and no virtual base classes (10.1),
has the same access control (Clause 11) for all non-static data members,
has no non-standard-layout base classes,
either has no non-static data members in the most derived class and at most one base class with non-static data members, or has no base classes with non-static data members, and
has no base classes of the same type as the first non-static data member.107
8/ A standard-layout struct is a standard-layout class defined with the class-key struct or the class-key class.
... thus class X { int x; }; is a standard-layout struct.
§9.2 Class members [class.mem]
16/ Two standard-layout struct (Clause 9) types are layout-compatible if they have the same number of non-static data members and corresponding non-static data members (in declaration order) have layout-compatible types (3.9).
... thus class X { int x; }; is layout-compatible with struct Y { int y; };.
The unfortunate thing is that layout-compatible is not formally defined in the Standard. However given the use of the word layout it seems the intent is to declare that two layout-compatible types should have the same underlying representation.
Therefore, to be able to remove the x in X one would have to prove that all structures that are layout-compatible (such as Y) are amenable to the same optimization (to keep the layout compatibility). It seems quite... improbable... in any non-trivial program.