Initializing mutually-referencing objects - c++

Consider the following pair of mutually referencing types:
struct A;
struct B { A& a; };
struct A { B& b; };
This can be initialized with aggregate initialization in GCC, Clang, Intel, MSVC, but not SunPro which insists that user-defined ctors are required.
struct {A first; B second;} pair = {pair.second, pair.first};
Is this initialization legal?
slightly more elaborate demo: http://ideone.com/P4XFw
Now, heeding Sun's warning, what about classes with user-defined constructors? The following works in GCC, clang, Intel, SunPro, and MSVC, but is it legal?
struct A;
struct B { A& ref; B(A& a) : ref(a) {} };
struct A { B& ref; A(B& b) : ref(b) {} };
struct {B first; A second;} pair = {pair.second, pair.first};
demo: http://ideone.com/QQEpA
And finally, what if the container is not trivial either? For example, the following works in G++, Intel, and Clang (with warnings), but not in MSVC ("pair" unknown in initializer) or SunPro ("pair is not a structure"):
std::pair<A, B> pair(pair.second, pair.first);
From what I can see, §3.8[basic.life]/6 forbids access to a non-static data member before lifetime begins, but is lvalue evaluation of pair.second "access" to second? If it is, then are all three initializations illegal? Also, §8.3.2[dcl.ref]/5 says "reference shall be initialized to refer to a valid object" which probably makes all three illegal as well, but perhaps I'm missing something and the compilers accept this for a reason.
PS: I realize these classes are not practical in any way, hence the language-lawyer tag. Related and marginally more practical old discussion here: Circular reference in C++ without pointers

This one was warping my mind at first but I think I got it now. As per 12.6.2.5 of 1998 Standard, C++ guarantees that data members are initialized in the order they are declared in the class, and that the constructor body is executed after all members have been initialized. This means that the expression
struct A;
struct B { A& a; };
struct A { B& b; };
struct {A first; B second;} pair = {pair.second, pair.first};
makes sense since pair is an auto (local, stack) variable, so its relative address and address of members are known to the compiler, AND there are no constructors for first and second.
Why the two conditions mean the code above makes sense: when first, of type A, is constructed (before any other data member of pair), first's data member b is set to reference pair.second, the address of which is known to the compiler because it is a stack variable (space already exists for it in the program, AFAIU). Note that pair.second as an object, ie memory segment, has not been initialized (contains garbage), but that doesn't change the fact that the address of that garbage is known at compile time and can be used to set references. Since A has no constructor, it can't attempt to do anything with b, so behavior is well defined. Once first has been initialized, it is the turn of second, and same: its data member a references pair.first, which is of type A, and pair.first address is known by compiler.
If the addresses were not known by compiler (say because using heap memory via new operator), there should be compile error, or if not, undefined behavior. Though judicious use of the placement new operator might allow it to work, since then again the addresses of both first and second could be known by the time first is initialized.
Now for the variation:
struct A;
struct B { A& ref; B(A& a) : ref(a) {} };
struct A { B& ref; A(B& b) : ref(b) {} };
struct {B first; A second;} pair = {pair.second, pair.first};
The only difference from first code example is that B constructor is explicitly defined, but the assembly code is surely identical as there is no code in the constructor bodies. So if first code sample works, the second should too.
HOWEVER, if there is code in the constructor body of B, which receives a reference to something (pair.second) that hasn't been initialized yet (but whose address is defined and known), and that code uses a, then you're clearly looking for trouble. If you're lucky you'll get a crash, but writing to a will probably fail silently, as the values get overwritten later when the A constructor is eventually called.
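A minimal sketch of the safe case described above, where the constructors only bind the references and never read through them. The wrapper type Linked and the function links_are_mutual are hypothetical names, not part of the question; the wrapper's own constructor does the wiring, so the object is never named inside its own initializer.

```cpp
#include <cassert>

struct A;
struct B { A& ref; explicit B(A& a) : ref(a) {} };
struct A { B& ref; explicit A(B& b) : ref(b) {} };

// Hypothetical wrapper: each member is bound to its sibling.
// Only addresses are used here; neither constructor reads through its reference.
struct Linked {
    B first;
    A second;
    Linked() : first(second), second(first) {}
    // A copy would keep referring to the source object's members.
    Linked(const Linked&) = delete;
    Linked& operator=(const Linked&) = delete;
};

// Returns true when both references ended up bound to the sibling member.
bool links_are_mutual() {
    Linked p;
    bool ok = (&p.first.ref == &p.second) && (&p.second.ref == &p.first);
    assert(ok);
    return ok;
}
```

Some compilers may still warn that second is used before it is initialized; the warning is about exactly the hazard discussed above, which this sketch avoids by not touching the referenced object during construction.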

From compiler point of view references are nothing else but const pointers. Rewrite your example with pointers and it becomes clear how and why it works:
struct A;
struct B { A* a; };
struct A { B* b; };
struct {A first; B second;} pair = {&(pair.second), &(pair.first)}; //parentheses for clarity
As Schollii wrote: memory is allocated beforehand, thus addressable. There is no access or evaluation of the objects, because only references/pointers are involved. That's merely taking the addresses of "second" and "first", simple pointer arithmetic.
I could rant about how using references in any place other than operator is language abuse, but I think this example highlights the issue well enough :)
(From now on I write all the ctors manually. Your compiler may or may not do this automagically for you.)
Try using new:
struct A;
struct B { A& a; B(A& arg):a(arg){;} };
struct A { B& b; A(B& arg):b(arg){;} };
typedef struct PAIR{A first; B second; PAIR(B& argB, A& argA):first(argB),second(argA){;}} *PPAIR, *const CPPAIR;
PPAIR pPair = NULL;// just to clean garbage or 0xCDCD
pPair = new PAIR(pPair->second, pPair->first);
Now it depends on the order of execution. If the assignment is made last (after the ctor), second.a will refer to address 0x0000 and first.b to e.g. 0x0004.
Actually, http://codepad.org/yp911ug6 here it's the ctors which are run last (makes most sense!), therefore everything works (even though it appears it shouldn't).
Can't speak about templates, though.
But your question was "Is that legal?". No law forbids it.
Will it work? Well, I don't trust compiler makers enough to make any statements about that.

Related

Mutual referencing among objects in C++

I want to create two objects with mutual member-references between them. Later it can be extended to e.g. closed loop of N referencing objects, where N is known in compile time.
The initial attempt was with the simplest struct A lacking any constructors, which make it an aggregate (v simulates some payload):
struct A {
const A & a;
int v = 0;
};
struct B {
A a1, a2;
};
consteval bool f()
{
B z{ z.a2, z.a1 };
return &z.a1 == &z.a2.a;
}
static_assert( f() );
Unfortunately it is not accepted by the compilers due to the error:
accessing uninitialized member 'B::a2'
which is actually strange, because no real read access is done, only remembering of its address. Demo: https://gcc.godbolt.org/z/cGzYx1Pea
The problem is solved after adding constructors in A, making it not-aggregate any more:
struct A {
constexpr A(const A & a_) : a(a_) {}
constexpr A(const A & a_, int v_) : a(a_), v(v_) {}
const A & a;
int v = 0;
};
Now all compilers accept the program, demo: https://gcc.godbolt.org/z/bs17xfxEs
It is surprising that seemingly equivalent modification of the program makes it valid. Is it really some wording in the standard preventing the usage of aggregates in this case? And what exactly makes the second version safe and accepted?
B z{ z.a2, z.a1 }; attempts to copy-construct a1 and a2, rather than aggregate-initialize them with z.a2, z.a1 as first fields.1
B z{{z.a2, 0}, {z.a1, 0}}; works in GCC and Clang. MSVC gives error C2078: too many initializers, which looks like a bug.
1 Here, direct-list-initialization is performed for z, which in this case resolves to aggregate initialization, which in turn performs copy-initialization for each member, and:
[dcl.init.general]/15.6.2
... if it is copy-initialization where the cv-unqualified version of the source type is the same class as, or a derived class of, the class of the destination, constructors are considered.
So, because the initializers z.a2, z.a1 have the same type as the corresponding members, the aggregate-ness of the members is ignored, and copy constructors are used.
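The difference between the two spellings can be seen in a sketch that avoids self-reference altogether. The types Node and Two and the function copy_vs_aggregate are hypothetical, and a pointer member stands in for the reference to keep the example simple; it assumes C++14 or later, where a class with a default member initializer is still an aggregate.

```cpp
#include <cassert>

struct Node {
    const Node* next;   // pointer stand-in for the reference member
    int v = 0;
};
struct Two { Node n1, n2; };

int copy_vs_aggregate() {
    Node src{nullptr, 7};

    // The initializers have type Node, so each member is copy-constructed
    // from src; src is NOT taken as the first field of n1 or n2.
    Two copied{src, src};
    assert(copied.n1.next == nullptr && copied.n1.v == 7);

    // Nested braces aggregate-initialize each member field by field.
    Two nested{{&src, 1}, {&src, 2}};
    assert(nested.n1.next == &src && nested.n2.v == 2);

    return copied.n2.v + nested.n1.v;   // 7 + 1
}
```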

C++ - Refactor a long parameter list with references into a struct

I like to have classes which have a valid state simply after calling their constructor - i.e. all required dependencies are passed into the constructor.
I also like required dependencies to be passed as references, because then nullptr is simply forbidden at compile-time as a value for these arguments.
Example:
class B;
class A
{
public:
A(B& b) : b(b) {}
private:
B& b;
};
After instantiating A, you are (almost) guaranteed that the instance is in a valid state. I find that code style to be very safe from programming mistakes.
My question relates to refactoring such classes when they have lots of dependencies.
Example:
// Includes for B, C, D, E, F...
class A
{
public:
A(B b, C c, D d, E e, F f) : b(b), c(c), d(d), e(e), f(f) {}
private:
B b;
C c;
D d;
E e;
F f;
};
Usually, I put long lists of parameters in structs, like this:
struct Deps
{
B b;
C c;
D d;
E e;
F f;
};
class A
{
public:
A(Deps deps) : b(deps.b), c(deps.c), d(deps.d), e(deps.e), f(deps.f) {}
private:
B b;
C c;
D d;
E e;
F f;
};
That way, it makes the call sites more explicit and less error prone as well: since all parameters must be named, you are not at risk of mistakenly switching two of them by having them in a wrong order.
Sadly, that technique works badly with references. Having references in the Deps struct forwards the problem to that struct: then, the Deps struct needs to have a constructor which initializes the references, and then that constructor will have a long parameter list, essentially solving nothing.
Now for the question: is there a way to refactor long parameter lists in constructors containing references, such that no function results in having a long parameter list, all parameters are always valid, and no instance of the class is ever in an invalid state (i.e. with some dependencies not initialized or null)?
You can't have a cake and eat it, too. Well, unless you'd use magic (also known as more powerful types).
The key idea of having the constructor take all necessary dependencies is to make sure they are all provided because the construction happens, and enforcing this statically. If you move this burden to a structure, this structure should only be passed to a constructor if all fields have been filled. If you have unwrapped references, it's obviously impossible to have this structure be only partially filled, and you can't really prove to the compiler that you'll provide the required parameters later.
You can do a run-time check, of course, but that's not what we're after. Ideally, we'd be able to encode which parameters have been initialized in the type itself. This is quite hard to implement in a generic way and only slightly easier if you make some concessions and hand-write it for specific types.
Consider a simplified example in which the types don't repeat in a signature (e.g. the signature of the constructor is ctor(int, bool, string)). We can then use std::tuple to represent a partially filled argument list like so:
auto start = tuple<>();
auto withIntArg = push(42, start);
auto withStringArg = push("xyz"s, withIntArg);
auto withBoolArg = push(true, withStringArg);
I've used auto, but if you think about the types of those variables, you'll realize that the tuple only contains all of int, string and bool after all of those calls have been executed (in whatever order they were made). Then you can write the class constructor as a template accepting only tuples that indeed have all required types, write the push function and voila!
Of course, this is a lot of boilerplate and a potential for very nasty errors, unless you take a lot of care writing the above. Any other solution you'd like to do would need to effectively do the same thing; modifying the type of the partially filled argument list until it fits the desired set.
Is it worth it? Well, you decide for yourself.
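For illustration, here is a minimal sketch of what such a push function and a constructor that only accepts complete tuples could look like. The names push, Widget and make_widget are hypothetical, and it assumes C++14 or later (for return type deduction and type-based std::get).

```cpp
#include <cassert>
#include <string>
#include <tuple>
#include <type_traits>
#include <utility>

// Hypothetical helper: returns a new tuple whose type records one more
// supplied argument.
template <class T, class... Ts>
auto push(T value, std::tuple<Ts...> args) {
    return std::tuple_cat(std::move(args), std::make_tuple(std::move(value)));
}

// Hypothetical consumer: std::get<T> fails to compile if T is missing from
// the tuple or appears more than once, so incomplete lists are rejected.
class Widget {
public:
    template <class... Ts>
    explicit Widget(const std::tuple<Ts...>& args)
        : count_(std::get<int>(args)),
          flag_(std::get<bool>(args)),
          name_(std::get<std::string>(args)) {}

    int count() const { return count_; }
    bool flag() const { return flag_; }
    const std::string& name() const { return name_; }

private:
    int count_;
    bool flag_;
    std::string name_;
};

Widget make_widget() {
    auto start = std::tuple<>();
    auto with_int = push(42, start);
    auto with_string = push(std::string("xyz"), with_int);
    auto with_bool = push(true, with_string);
    static_assert(std::is_same<decltype(with_bool),
                               std::tuple<int, std::string, bool>>::value,
                  "the tuple type tracks what has been supplied so far");
    assert(std::get<int>(with_bool) == 42);
    return Widget(with_bool);
}
```

Passing with_string instead of with_bool to Widget would be a compile-time error, which is the static guarantee the answer is after.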
Actually, there is a pretty elegant/simple solution using std::tuple:
#include <tuple>
struct A{};
struct B{};
struct C{};
struct D{};
struct E{};
struct F{};
class Bar
{
public:
template<class TTuple>
Bar(TTuple refs)
: a(std::get<A&>(refs))
, b(std::get<B&>(refs))
, c(std::get<C&>(refs))
, d(std::get<D&>(refs))
, e(std::get<E&>(refs))
, f(std::get<F&>(refs))
{
}
private:
A& a;
B& b;
C& c;
D& d;
E& e;
F& f;
};
void test()
{
A a; B b; C c; D d; E e; F f;
// Different ways to incrementally build the reference holder:
auto tac = std::tie(a, c); // This is a std::tuple<A&, C&>.
auto tabc = std::tuple_cat(tac, std::tie(b));
auto tabcdef = std::tuple_cat(tabc, std::tie(d, f), std::tie(e));
// We have everything, let's build the object:
Bar bar(tabcdef);
}
https://godbolt.org/z/pG1R7U
std::tie exists precisely to create a tuple of references. We can combine reference tuples using std::tuple_cat. And std::get<T> allows retrieving exactly the reference we need from a given tuple.
This has:
Minimum boilerplate: You only have to write std::get<X&> in the member initializer list for each referenced type. Nothing else needs to be provided/repeated to use this for more referenced or reference-containing types.
Complete compile-time safety: If you forget to provide a reference or provide it twice, the compiler will complain. The type system encodes all the necessary information.
No constraints on the order in which references are added.
No hand-written template machinery. Using standard facilities instead of hand-written template machinery means you don't introduce bugs/forget corner-cases. It also means users/readers of this approach have nothing they need to wade through (and might run away from, screaming).
I think this is a really simple solution, if only because std::tuple and friends already implement all the meta-programming needed here. It's still slightly more complex than "everything in one long list", but I'm pretty sure it'll be worth the tradeoff.
(My previous hand-written template version exists in edit history. But I realized that std::tuple does everything we need here already.)
If you are fine with a run-time check of completeness, I would recommend storing pointers in Deps and checking in the constructor of A that all the pointers are non-null. That allows you to build Deps incrementally and be exactly as safe as before. To perform the non-null check before dereferencing the pointers you might need some ugliness (like the comma operator). You might as well store pointers instead of references for the A members because (if the constructor checks for null-ness) it is exactly as safe but allows e.g. assignment operators. And makes the null-check easier:
#include <cassert>
struct Deps
{
B* b;
C* c;
D* d;
E* e;
F* f;
};
template<class ... Ts>
bool allNonNull(Ts* ... ts)
{
return ((ts != nullptr) && ...);
}
class A
{
public:
A(Deps deps) : b(deps.b), c(deps.c), d(deps.d), e(deps.e), f(deps.f)
{
assert(allNonNull(b, c, d, e, f));
if (!allNonNull(b, c, d, e, f))
/*whatever error handling you want*/;
}
private:
B* b;
C* c;
D* d;
E* e;
F* f;
};
The disadvantages are of course that there is no more compile-time check and that there is a lot of code duplication. One might also forget to update the parameters of the null check function.

Make only const copies of a const object

I have a class which contains references, like:
class A {
public:
A(B &b) : b(b) {} // constructor
B &b;
};
Sometimes b must be read-only, sometimes it is writeable. When I make a const A a(b); object, it's obvious that I want to protect the data inside it as const.
But - by accident - it's easy to make a non-const copy of the object which will make the data inside it vulnerable.
const A a(b); // b object protected here
A a_non_const(a);
a_non_const.b.non_const_function(...); // b not protected now
I think that I should somehow prevent copies of the object when it is const like this:
const A a(b);
const A a2(a); // OK!
A a_non_const(a); // Compiler error
Is this possible at all?
flaw in your code: your data isn't "protected" even with const
The const type qualifier manages access to the member functions of a type as well as the access to its members. Since your member B & b is a reference, const doesn't do much for you here: A reference cannot be changed after initialization either way. How you access the target of that reference isn't even considered:
const A a(b);
a.b.non_const_function(); // OOPS, no problem!
solution with templates
Instead of (ab)using the const type qualifier you could add a "flag" to your type, to differentiate between cases where you need to be able to have non-const access and cases where you don't:
#include <type_traits>
struct B {
void danger() {
}
void all_fine() const {
}
};
template<bool Writeable>
struct A {
using BRef = typename std::conditional<Writeable, B &, B const &>::type;
BRef b;
A (BRef b) : b(b) {};
};
using ConstA = A<false>;
using NonConstA = A<true>;
int main() {
B b;
ConstA a(b);
//NonConstA nc_a(a);
ConstA another_a(a);
//another_a.b.danger();
another_a.b.all_fine();
NonConstA a2(b);
a2.b.danger();
}
With some std::enable_if you can then selectively enable / disable member functions of A depending on whether they need "writeable" b or not.
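A rough sketch of that std::enable_if idea follows. The member functions read and modify, the value member of B and the helper write_then_read are hypothetical additions for illustration, not part of the code above.

```cpp
#include <cassert>
#include <type_traits>

struct B {
    int value = 0;
    void danger() { ++value; }               // mutating
    int all_fine() const { return value; }   // read-only
};

template <bool Writeable>
struct A {
    using BRef = typename std::conditional<Writeable, B&, const B&>::type;
    BRef b;
    explicit A(BRef b_) : b(b_) {}

    // Available for both flavours.
    int read() const { return b.all_fine(); }

    // Exists only when Writeable is true. The extra parameter W makes the
    // condition dependent, so SFINAE removes the function for A<false>.
    template <bool W = Writeable,
              typename std::enable_if<W, int>::type = 0>
    void modify() { b.danger(); }
};

int write_then_read() {
    B target;
    A<true> writer(target);
    writer.modify();                 // fine: writeable flavour
    A<false> reader(target);
    // reader.modify();              // would not compile: no such member
    int seen = reader.read();
    assert(seen == 1);
    return seen;
}
```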
real solution: refactor your design
BUT I'd like to highlight this comment even more:
"Sometimes b must be read-only, sometimes it is writeable." All your problems stem from this weird duality. I suggest picking one set of semantics for your class, not two
From Lightness Races in Orbit
You should probably instead consider splitting your class such that you have a CommonA with functionality used by both a WriteableA and a NonWriteableA (the names are terrible, but I hope you understand what I mean).
You can do it for the heap:
static const A *constCopy(const A &a); // and of course implement it somewhere
Then you will not accidentally modify the object via the pointer you get (which has to be stored in const A *, otherwise the compiler will complain).
However it will not work with stack-based objects, as returning const A & of a local variable is a rather deadly action, and "const constructor" has not been invented yet (related: Why does C++ not have a const constructor?)

Safety of reinterpret_cast on pointer to template aggregate type

I would like to know if my following use of reinterpret_cast is undefined behaviour.
Given a template aggregate such as ...
template<typename T>
struct Container
{
Container(T* p) : ptr(p) { }
...
T* ptr;
};
... and a type hierarchy like ...
struct A { };
struct B : A { };
Is the following cast safe, given that B is a dynamic type of A ...
Container<B>* b = new Container<B>( new B() );
Container<A>* a = reinterpret_cast<Container<A>*>(b);
... in so far as that I can now safely use a->ptr and its (possibly virtual) members?
The code where I use this compiles and executes fine (Clang, OS X) but I'm concerned that I've placed a ticking bomb. I guess every instance of Container<T> shares the same layout and size so it shouldn't be a problem, right?
Looking at what cppreference.com says about reinterpret_cast, there seems to be a statement for legal use that covers what I'm trying to do ...
Type aliasing
When a pointer or reference to object of type T1 is reinterpret_cast (or C-style cast) to a pointer or reference to object of a different type T2, the cast always succeeds, but the resulting pointer or reference may only be accessed if both T1 and T2 are standard-layout types and one of the following is true:
...
T2 is an aggregate type or a union type which holds one of the aforementioned types as an element or non-static member (including, recursively, elements of subaggregates and non-static data members of the contained unions): this makes it safe to cast from the first member of a struct and from an element of a union to the struct/union that contains it.
I appreciate that it looks like I'm going the wrong way about this. That's not what I'm concerned about. I'd just like to know if what I'm doing is safe / legal or not. Thanks in advance for any help.
there seems to be a statement for legal use that covers what I'm trying to do ...
That's not what that exception says or means. That exception says that given
struct S { int i; } s;
you can use *reinterpret_cast<int *>(&s) to access s.i.
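As a small sketch of the case the exception does cover (read_first_member and first_member_demo are hypothetical names used only here):

```cpp
#include <cassert>
#include <type_traits>

struct S { int i; };
static_assert(std::is_standard_layout<S>::value,
              "the rule applies to standard-layout classes");

// A pointer to a standard-layout struct may be converted to a pointer to
// its first non-static data member and used to access that member.
int read_first_member(S& s) {
    return *reinterpret_cast<int*>(&s);
}

int first_member_demo() {
    S s = {5};
    assert(read_first_member(s) == s.i);
    return read_first_member(s);
}
```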
There is no similar exception for what you're trying to do. What you're trying to do is simply not valid in C++. Even the below is invalid:
struct S { int i; };
struct T { int i; };
int f(S s) { return ((T &) s).i; }
and compilers optimise based on the assumption that you don't write code like that.
For an actual example that fails at run-time with a current compiler:
#include <cstdlib>
struct S { int i; };
struct T { int i; };
void f(S *s, T *t) { int i = s->i; t->i++; if (s->i == i) std::abort(); }
Here, GCC optimises away the check s->i == i (GCC 4.9.2, with -O2 in the command-line options), and unconditionally calls std::abort(), because the compiler knows that s and t cannot possibly point to the same region of memory. Even though you might try to call it as
int main() { S s = { 0 }; f(&s, reinterpret_cast<T *>(&s)); }
Whether or not the type aliasing is legal according to the standard, you may have other issues.
I guess every instance of Container<T> shares the same layout and size so it shouldn't be a problem, right?
Actually, not every instance of Container<T> shares the same layout! As explained in this question, template members are only created if they are used, so your Container<A> and Container<B> might have different memory layouts if different members are used for each type.

c++ variable assignment, is this a normal way..?

This may be a silly question, but still I'm a bit curious...
Recently I was working on one of my former colleague projects, and I've noticed that he really loved to use something like this:
int foo(7);
instead of:
int foo = 7;
Is this a normal/good way to do it in C++?
Are there any benefits to it? (Or is this just some silly programming style that he was into..?)
This really reminds me a bit of a good way how class member variables can be assigned in the class constructor... something like this:
class MyClass
{
public:
MyClass(int foo) : mFoo(foo)
{ }
private:
int mFoo;
};
instead of this:
class MyClass
{
public:
MyClass(int foo)
{
mFoo = foo;
}
private:
int mFoo;
};
For basic types there's no difference. Use whichever is consistent with the existing code and looks more natural to you.
Otherwise,
A a(x);
performs direct initialization, and
A a = x;
performs copy initialization.
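One place where the two forms visibly differ is with explicit constructors. The sketch below uses hypothetical types Implicit and Explicit and a hypothetical helper forms_differ; the type traits simply ask the compiler which form would be accepted.

```cpp
#include <cassert>
#include <type_traits>

struct Implicit { Implicit(int) {} };
struct Explicit { explicit Explicit(int) {} };

// Direct initialization, T t(7), may use explicit constructors.
constexpr bool direct_ok = std::is_constructible<Explicit, int>::value;
// Copy initialization, T t = 7, needs an implicit conversion.
constexpr bool copy_ok_explicit = std::is_convertible<int, Explicit>::value;
constexpr bool copy_ok_implicit = std::is_convertible<int, Implicit>::value;

bool forms_differ() {
    Explicit a(7);        // fine: direct initialization
    // Explicit b = 7;    // would not compile: constructor is explicit
    Implicit c = 7;       // fine: copy initialization via implicit conversion
    (void)a;
    (void)c;
    bool as_expected = direct_ok && !copy_ok_explicit && copy_ok_implicit;
    assert(as_expected);
    return as_expected;
}
```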
The second part is a member initializer list, there's a bunch of Q&As about it on StackOverflow.
Both are valid. For builtin types they do the same thing; for class types there is a subtle difference.
MyClass m(7); // uses MyClass(int)
MyClass n = 3; // uses MyClass(int) to create a temporary object,
// then uses MyClass(const MyClass&) to copy the
// temporary object into n
The obvious implication is that if MyClass has no copy constructor, or it has one but it isn't accessible, the attempted construction fails. If the construction would succeed, the compiler is allowed to skip the copy constructor and use MyClass(int) directly.
All the answers above are correct. I would just add that C++11 supports another, more generic way to initialize variables:
int a = {2} ;
or
int a {2} ;
Several other good answers point out the difference between constructing "in place" (ClassType v(<constructor args>)) and creating a temporary object and using the copy constructor to copy it (ClassType v = <constructor arg>). Two additional points need to be made, I think. First, the second form obviously has only a single argument, so if your constructor takes more than one argument, you should prefer the first form (yes, there are ways around that, but I think the direct construction is more concise and readable - but, as has been pointed out, that's a personal preference).
Secondly, the form you use matters if your copy constructor does something significantly different than your standard constructor. This won't be the case most of the time, and some will argue that it's a bad idea to do so, but the language does allow for this to be the case (all surprises you end up dealing with because of it, though, are your own fault).
It's a C++ style of initializing variables - C++ added it for fundamental types so the same form could be used for fundamental and user-defined types. This can be very important for template code that's intended to be instantiated for either kind of type.
Whether you like to use it for normal initialization of fundamental types is a style preference.
Note that C++11 also adds the uniform initialization syntax which allows the same style of initialization to be used for all types - even aggregates like POD structs and arrays (though user-defined types may need a new kind of constructor that takes a std::initializer_list to allow the uniform syntax to be used with them).
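A short sketch of the brace syntax applied to different kinds of types. Point, Bag and brace_init_demo are hypothetical names used only for this illustration.

```cpp
#include <cassert>
#include <initializer_list>
#include <vector>

struct Point { int x; int y; };   // aggregate

// A user-defined type that opts in to brace lists of arbitrary length
// by providing a std::initializer_list constructor.
struct Bag {
    std::vector<int> items;
    Bag(std::initializer_list<int> init) : items(init) {}
};

int brace_init_demo() {
    int a{2};              // fundamental type
    // int bad{2.5};       // would not compile: narrowing is rejected
    Point p{3, 4};         // aggregate
    int arr[]{5, 6};       // array
    Bag bag{9, 10, 11};    // initializer_list constructor
    assert(bag.items.size() == 3);
    return a + p.x + p.y + arr[0] + arr[1] + static_cast<int>(bag.items.size());
}
```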
Yours is not a silly question at all as things are not as simple as they may seem. Suppose you have:
class A {
public:
A() {}
};
and
class B {
public:
B(A const &) {}
};
Writing
B b = B(A());
Requires that B's copy constructor be accessible. Writing
B b = A();
Requires also that B's converting constructor B(A const &) be not declared explicit. On the other hand if you write
A a;
B b(a);
all is well, but if you write
B b(A());
This is interpreted by the compiler as the declaration of a function b that takes a nameless argument which is a parameterless function returning A, resulting in mysterious bugs. This is known as C++'s most vexing parse.
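Two common ways around the vexing parse are sketched below. The tag member and the function vexing_parse_fixes are hypothetical additions so the result can be observed.

```cpp
#include <cassert>

struct A { A() {} };
struct B {
    int tag;
    B(A const&) : tag(1) {}
};

int vexing_parse_fixes() {
    // B b(A());      // declares a function named b, not an object
    B b1{A()};        // C++11 braces cannot be read as a function declaration
    B b2((A()));      // extra parentheses force the argument to be an expression
    assert(b1.tag == 1 && b2.tag == 1);
    return b1.tag + b2.tag;
}
```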
I prefer using the parenthetical style...though I always use a space to distinguish from function or method calls, on which I don't use a space:
int foo (7); // initialization
myVector.push_back(7); // method call
One of my reasons for preferring using this across the board for initialization is because it helps remind people that it is not an assignment. Hence overloads to the assignment operator will not apply:
#include <iostream>
class Bar {
private:
int value;
public:
Bar (int value) : value (value) {
std::cout << "code path A" << "\n";
}
Bar& operator=(int right) {
value = right;
std::cout << "code path B" << "\n";
return *this;
}
};
int main() {
Bar b = 7;
b = 7;
return 0;
}
The output is:
code path A
code path B
It feels like the presence of the equals sign obscures the difference. Even if it's "common knowledge" I like to make initialization look notably different than assignment, since we are able to do so.
It's just the syntax for initialization of something :-
SomeClass data(12, 134);
That looks reasonable, but
int data(123);
looks strange, yet both use the same syntax.