This is my first post. I believe I'm aware of Stack Overflow best practices, though probably not 100%. As far as I can tell, no existing post addresses my question; I also hope it isn't too vague.
I am trying to figure out good practices for writing C++ constructors
that do medium-to-heavy-duty work.
Pushing (all?) initialization work into initialization lists seems like a good idea,
for two reasons that come to mind, namely:
Resource Acquisition Is Initialization
As far as I know, the simplest way of guaranteeing that members
are initialized correctly at resource acquisition is to make sure that
what's inside the parentheses of the initialization list is correct
when it is evaluated.
class A
{
public:
A(const B & b, const C & c)
: _c(c)
{
/* _c was allocated and defined at the same time */
/* _b is allocated but its content is undefined */
_b = b;
}
private:
B _b;
C _c;
};
const class members
Using initialization lists is the only correct way to give
const members an actual value.
class A
{
public:
A(int m, int n, int p)
: _m(m) /* correct, _m will be initialized to m */
{
_n = n; /* error: _n is const, so it cannot be assigned here; it has to be initialized in the list */
*(const_cast<int*>(&_p)) = p; /* undefined behaviour: writing to a const object, and certainly not RAII-friendly */
}
private:
const int _m, _n, _p;
};
However, some problems seem to arise from heavy use of initialization lists:
Order
Member variables are always initialized in the order they are declared in the class definition, so write them in that order in the constructor initialization list. Writing them in a different order just makes the code confusing because it won't run in the order you see, and that can make it hard to see order-dependent bugs.
http://isocpp.github.io/CppCoreGuidelines/CppCoreGuidelines#S-discussion
This is important if you initialize a value using a value
initialized previously in the list. For example:
A(int in) : _n(in), _m(_n) {}
If _m is declared before _n in the class, _m is initialized first, so its value after initialization is undefined.
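For illustration, here is a minimal sketch where the same dependency is safe because the declaration order matches the use:
class A
{
public:
    A(int in) : _n(in), _m(_n) {}
private:
    int _n; // declared first, therefore initialized first
    int _m; // may safely depend on _n
};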
I am ready to apply this rule in my code, but when working
with other people it creates redundancy and forces reading
two files at once, which is hardly acceptable and somewhat error-prone.
Solution — initialize using only data from ctor arguments.
Solution's problem — keeping work in the init list without
inner dependency means repeating operations. For example:
int in_to_n(int in)
{
/* ... */
return n;
}
int modify(int n)
{
/* ... */
return modified;
}
A::A(int in)
: _n(in_to_n(in))
, _n_modified(modify(in_to_n(in)))
{}
For tiny repeated operations I believe compilers can reuse
already-computed results, but I don't think one should rely on that
for significant work (and I doubt it happens at all when the work
is done in separate, non-inlined code).
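One workaround (a sketch, assuming C++11 delegating constructors, the in_to_n/modify helpers above, and a hypothetical private Tag type) is to compute the shared value once and delegate to a private constructor:
class A
{
public:
    explicit A(int in) : A(in_to_n(in), Tag{}) {} // in_to_n runs exactly once
private:
    struct Tag {}; // only exists so the two constructors have different signatures
    A(int n, Tag) : _n(n), _n_modified(modify(n)) {}
    int _n;
    int _n_modified;
};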
How much work can you put in the list?
In the previous example, I called functions to compute what the
attributes are to be initialized to. These can be plain/lambda
functions or static/nonstatic methods,
of the current class or of another.
(I don't suggest using non-static methods of the current class;
it might even be undefined behaviour according to the standard, I'm not sure.)
I guess this is not in itself a big problem, but one needs to make
a special effort to keep the intent of the code clear when
writing big classes that do big work that way.
Also, when trying to apply the solution to the previous problem,
there is only so much independent work you can do when initializing
your instance... This usually gets big if you have a long sequence
of attributes to initialize with inner dependencies.
It starts to look like the whole program has simply been translated
into an initialization list; I guess this is not where C++ is
supposed to be heading?
Multiple inits
One often computes two variables at once. Setting two variables
at once in an init list means either:
using an ugly intermediate attribute
struct InnerAData
{
B b;
C c;
};
/* must be exported with the class definition (ugly) */
class A
{
public:
    A(const D & input)
        : _inner(work(input))
        , _b(_inner.b)
        , _c(_inner.c) {}
private:
    InnerAData _inner; /* must be declared first so it is initialized before _b and _c */
    B _b;
    C _c;
};
This is awful and forces extra useless copies.
or some ugly hack
class A
{
public:
A(const D & input) : _b(work(input)) {}
private:
B _b;
C _c;
B work(const D & input)
{
/* ... work ... */
_c = ...; /* side effect: also sets _c */
return ...; /* value used to initialize _b */
}
};
This is even more awful and doesn't even work with const
or non-builtin type attributes.
Keeping stuff const
Sometimes it can take most of the ctor to figure out the value
to give to an attribute, so making sure it is const,
and therefore moving all that work into the initialization list,
can feel quite constraining. I won't give a full example, but think
something like computing data from a default filename, then
computing the full filename from that data, then checking if
the corresponding file exists to set a const boolean, etc.
I guess it's not a fundamental problem, but all that seems
intuitively more legible in the body of the ctor, and moving
it to the init list just to do a correct initialization of
a const field seems overkill. Maybe I'm just imagining things.
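For illustration, here is roughly what moving such work into the list can look like with an immediately invoked lambda (a minimal sketch; buildFullName is a hypothetical helper standing in for the real computation):
#include <fstream>
#include <string>

class Config
{
public:
    explicit Config(const std::string & defaultName)
        : _fullName(buildFullName(defaultName)) // hypothetical helper
        , _fileExists([&] {
              // an arbitrary amount of work can go here; the result still lands in a const member
              std::ifstream f(_fullName); // _fullName is declared first, so it is already initialized
              return f.good();
          }())
    {}
private:
    static std::string buildFullName(const std::string & name) { return name + ".cfg"; }
    const std::string _fullName;
    const bool _fileExists;
};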
So here's the hard part: asking a specific question!
Have you faced similar problems, did you find a better solution,
if not what's the lesson to learn — or is there something I'm
missing here?
I guess my problem is that I'm pretty much trying to move all the work
into the init list when I could look for a compromise about which state
is initialized there and leave some work for later. I just feel like init lists
could play a bigger role in modern C++ code than they do, but
I haven't seen them pushed further than basic usage yet.
Additionally, I'm really not convinced as to why the values are
initialized in that order, and not in the order of the list.
I've been told verbally that it's because attributes are laid out in order on the stack and
the compiler must guarantee that stack data never sits above the stack pointer.
I'm not sure that's a conclusive answer... I'm pretty sure one could
implement safely reordered initialization lists;
correct me if I'm wrong.
In your code:
class A
{
public:
A(const B & b, const C & c)
: _c(c)
{
/* _c was allocated and defined at the same time */
/* _b is allocated but its content is undefined */
_b = b;
}
private:
B _b;
C _c;
};
the constructor calls B::B() and then B::operator=, which may be a problem if either of them doesn't exist, is expensive, or is not implemented correctly with respect to RAII and the rule of three. The rule of thumb is to always prefer the initializer list when possible.
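For reference, the preferred form of that constructor simply initializes both members in the list:
A(const B & b, const C & c)
    : _b(b) /* B is copy-constructed directly; no default construction followed by assignment */
    , _c(c)
{}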
Since C++11, an alternative is to use delegating constructors:
struct InnerAData;
InnerAData work(const D&);
class A
{
public:
A(const D & input) : A(work(input)) {}
private:
A(const InnerAData&);
private:
const B _b;
const C _c;
};
And (this can be inlined, but is then visible in the header):
struct InnerAData
{
B b;
C c;
};
A::A(const InnerAData& inner) : _b(inner.b), _c(inner.c) {}
Related
I want to write a template function that receives parameter by move or by copy.
The most efficient way that I use is:
void setA(A a)
{
m_a = std::move(a);
}
Here, when we use it:
A a;
setA(a); // <<---- one copy ctor & one move ctor
setA(std::move(a)); // <<---- two move ctors
I recently found out that defining it this way, with two functions:
void setA(A&& a)
{
m_a = std::move(a);
}
void setA(const A& a)
{
m_a = a; // we could write "m_a = std::move(a);" here too, but since a is const it would still pick the copy assignment
}
Will save a lot!
A a;
setA(a); // <<---- one copy ctor
setA(std::move(a)); // <<---- one move ctor
This is great for one parameter... but what is the best way to create a function with 10 parameters?!
void setAAndBAndCAndDAndEAndF...()
Anyone have any ideas?
Thanks!
The two setter versions setA(A&& a) and setA(const A& a) can be combined into a single one using a forwarding reference (a.k.a. perfect forwarding):
template<typename A>
void setA(A&& a)
{
m_a = std::forward<A>(a);
}
The compiler will then synthesize either the rvalue- or lvalue-reference version as needed depending on the value category.
This also solves the issue of multi-value setters, as the right one will be synthesized depending on the value category of each parameter.
Having said that, keep in mind that setters are just regular functions; the object is technically already constructed by the time any setter can be called. In the case of setA, if A has a non-trivial constructor, then an instance m_a would already have been (default-)constructed and setA would actually have to overwrite it.
That's why in modern C++, the focus is often not so much on move- vs. copy-, but on in-place construction vs. move/copy.
For example:
struct A {
A(int x) : m_x(x) {}
int m_x;
};
struct B {
template<typename T>
B(T&& a) : m_a(std::forward<T>(a)) {}
A m_a;
};
int main() {
B b{ 1 }; // zero copies/moves
}
The standard library also often offers "emplace"-style calls in addition to more traditional "push"/"add"-style calls. For example, vector::emplace takes the arguments needed to construct an element, and constructs one inside the vector, without having to copy or move anything.
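For instance (a minimal sketch):
#include <string>
#include <vector>

int main() {
    std::vector<std::string> v;
    v.push_back(std::string(40, 'x')); // constructs a temporary string, then moves it into the vector
    v.emplace_back(40, 'x');           // constructs the string directly inside the vector
}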
The best option would be to construct the member in place, within the constructor. As for setters, there is no single best approach. Taking by value and moving works fine in most cases, but can sometimes be less efficient. Overloading as you showed is maximally efficient, but causes a lot of code duplication. Templates can avoid the duplication with the help of forwarding (universal) references, but then you have to roll your own type checking and it gets complicated. Unless you've identified this as a bottleneck with a profiler, I suggest you stick with take-by-value-then-move: it's the simplest, causes minimal code duplication and provides good exception safety.
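In the multi-parameter case, take-by-value-then-move stays just as compact (a sketch, assuming hypothetical members m_a and m_b):
void setAAndB(A a, B b)
{
    m_a = std::move(a); // at most one copy + one move per argument,
    m_b = std::move(b); // whether the caller passes an lvalue or an rvalue
}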
After a lot of research, I have found an answer!
I made an efficient wrapper class that allows you to hold both options and lets you decide in the inner function whether you want to copy or not!
#include <utility> // std::move

#pragma pack(push, 1)
template<class T>
class CopyOrMove {
public:
    // Bound to an rvalue: remember a pointer we are allowed to move from.
    CopyOrMove(T&& t) : m_move(&t), m_isMove(true) {}
    // Bound to an lvalue: remember a const pointer we may only copy from.
    CopyOrMove(const T& t) : m_reference(&t), m_isMove(false) {}
    bool hasInstance() const { return m_isMove; }
    const T& getConstReference() const {
        return *m_reference;
    }
    // Move out if we were given an rvalue, otherwise fall back to a copy.
    T extract() && {
        if (hasInstance())
            return std::move(*m_move);
        else
            return *m_reference;
    }
    void fastExtract(T* out) && {
        if (hasInstance())
            *out = std::move(*m_move);
        else
            *out = *m_reference;
    }
private:
    union
    {
        T* m_move;
        const T* m_reference;
    };
    bool m_isMove;
};
#pragma pack(pop)
Now you can have the function:
void setAAndBAndCAndDAndEAndF(CopyOrMove<A> a, CopyOrMove<B> b, CopyOrMove<C> c, CopyOrMove<D> d, CopyOrMove<E> e, CopyOrMove<F> f)
With zero code duplication! And no redundant copy or move!
Short answer:
It's a compromise between verbosity and speed. Speed is not everything.
defining it this way, with two functions ... Will save a lot!
It will save a single move-assignment, which often isn't a lot.
Unless you need this specific piece of code to be as fast as possible (e.g. you're writing a custom container), I'd prefer passing by value because it's less verbose.
Other possible approaches are:
Using a forwarding reference, as suggested in the other answers. It'll give you the same number of copies/moves as a pair of overloads (const T & + T &&), but it makes passing more than one parameter easier, because you only have to write a single function instead of 2^N of them (see the sketch after this list).
Making the setter behave like emplace(). This will give you no performance benefit (because you're assigning to an existing object instead of creating a new one), so it doesn't make much sense.
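A sketch of the forwarding-reference variant for several parameters (assuming hypothetical members m_a and m_b):
template<typename TA, typename TB>
void setAAndB(TA&& a, TB&& b)
{
    m_a = std::forward<TA>(a);
    m_b = std::forward<TB>(b);
}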
All the old and modern C++ books and experts say to initialize class members in their declaration order, but none of them explains what happens if I don't??
I am not talking about classes with members of const types or anything like that, just a plain simple class.
Consider the sample:
class A
{
int n;
std::vector<double> VD;
char c;
public:
A():
VD(std::vector<double>(3)),
c('a'),
n(44)
{
}
};
What's the difference between the written code and one with the same order in which they are declared???
What's the difference between the written code and one with the same order in which they are declared???
If members don't depend on each other's initialization order, there is no difference whatsoever. But if they do, then a member initialization list may be telling a lie.
Many a programmer has been bitten by this, thinking their constructors were written correctly when in fact they had undefined behavior on their hands.
Consider this simple case:
struct foo {
int _a;
int _b;
foo(int b) : _b(b), _a(2 * _b) {}
};
What's _a in the above example? If you answer anything but "the behavior is undefined because _b is used uninitialized", you'd be wrong.
But none of them explains what happens if I don't?
Programmers have no control over it: the order in which you list members in the initialization list has no effect on the actual order of initialization. The compiler ignores the order of items on the list, and re-orders the expressions to match declaration order.
Here is a short example to illustrate this point:
#include <iostream>
using std::cout;
using std::endl;
struct Foo {
Foo(const char *s) { cout << "Init " << s << endl; }
};
struct Bar {
Foo a;
Foo b;
Foo c;
Bar() : c("c"), b("b"), a("a") {
}
};
int main() { Bar bar; }
The above prints
Init a
Init b
Init c
even though the initialization list names the members in the opposite order.
There ought to be absolutely no difference in the generated assembly, although the "as-if" rule might get in the way.
Conceptually at least, n is initialised before c.
Reference: http://en.cppreference.com/w/cpp/language/as_if
You can't change the initialisation order - it's always the order in which the members appear in the class - so the order in the initialisation list is not significant, though compilers may warn if the two orders don't match up.
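For example (a small sketch; GCC and Clang typically diagnose this under -Wall via -Wreorder):
struct S {
    int a;
    int b;
    // The list below names b before a; initialization still happens a-then-b,
    // and compilers commonly warn about the mismatch.
    S() : b(2), a(1) {}
};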
I think there are two reasons to order them properly.
Initializing them in the same order as they are declared makes the code more readable, especially when you later add or remove variables.
The order in which they are declared determines their layout in memory, so initializing them in that order gives better locality.
It depends on how the members are used. If there is a dependency then order must be followed.
Consider below example.
class x{
size_t n;
char * ch; // the size of dynamic char array depends on n
};
Here, if ch's initializer uses n, declaring the members in a different order results in undefined behavior.
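A sketch of a constructor that makes the dependency concrete (n has to be declared, and therefore initialized, before ch):
#include <cstddef>

class x {
    size_t n;
    char * ch; // the size of the dynamic char array depends on n
public:
    explicit x(size_t count)
        : n(count)
        , ch(new char[n]) // fine only because n is declared (and initialized) first
    {}
    ~x() { delete[] ch; }
};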
Other than this reason, of course readability and uniformity matters from coding guidelines POV.
Is there a best practice for deferred initialization of a private class member M of class C? For example:
class M {
public:
    M(int param);
};
class C {
public:
    C();
    // This works properly without m, and may be called at any time,
    // even before startWork was called.
    void someSimpleStuff();
    // Called a single time, once param is known and work can be started.
    void startWork(int param);
    // Uses m. Called multiple times.
    // Guaranteed to only be called after startWork was called.
    void doProcessing();
private:
    M m;
};
Objects of class C can't be constructed because M doesn't have a default constructor.
If you can modify M's implementation, it's possible to add an init method to M, and make its constructor accept no arguments, which would allow constructing objects of class C.
If not, you can wrap the C's member m in std::unique_ptr, and construct it when it becomes possible.
However, both of these solutions are prone to errors that would only be caught at run time. Is there some practice to make sure at compile time that m is only used after it has been initialized?
Restriction: An object of class C is handed to external code which makes use of its public interface, so C's public methods can't be split into multiple classes.
The best practice is to never use deferred initialisation.
In your case, ditch the default constructor for C and replace it with C(int param) : m(param){}. That is, class members get initialised at the point of construction using base member initialisation.
Using deferred initialisation means your object is potentially in an undefined state, and achieving things like concurrency is harder.
#include <memory> // std::unique_ptr, std::make_unique
#include <mutex>  // std::call_once, std::once_flag

#define ENABLE_THREAD_SAFETY

class M {
public:
    M(int param);
};

class C {
public:
    C();
    // This works properly without m, and may be called at any time,
    // even before startWork was called.
    void someSimpleStuff();
    // Called a single time, once param is known and work can be started.
    void startWork(int param);
    // Uses m. Called multiple times.
    // Guaranteed to only be called after startWork was called.
    void doProcessing();
    M* mptr()
    {
#ifdef ENABLE_THREAD_SAFETY
        std::call_once(create_m_once_flag, [&] {
            m = std::make_unique<M>(mparam);
        });
#else
        if (m == nullptr)
            m = std::make_unique<M>(mparam);
#endif
        return m.get();
    }
private:
    int mparam;
    std::unique_ptr<M> m;
#ifdef ENABLE_THREAD_SAFETY
    std::once_flag create_m_once_flag;
#endif
};
Now all you have to do is stop using m directly, and access it through mptr() instead. It will only create the M class once, when it's first used.
I would go with unique_ptr... Where do you see issues with that? When using M, you can easily check:
if(m)
m->foo();
I know that this is not a compile-time check, but as far as I know there is no such check possible with current compilers. Code analysis would have to be quite complicated to catch something like this, because you can initialize m whenever you like, in any method, or - if it's public/protected - even in another file. A compile-time check would mean that the lazy initialization is resolved at compile time, but the very concept of lazy initialization is runtime-based.
Ok, from what I understand of your problem, would this be a solution?
You put the functionality that does not require M into class D. You create a D object and use it. Once you need M and want to run the doProcessing() code, you create an object of C, pass D to it, and initialize it with the param that you now have.
The code below is just meant to illustrate the idea. You probably don't need startWork() to be a separate function in this case, and its code could be written in the constructor of C.
Note: I have made all the functions empty, so I could compile the code to check for syntax errors :)
class M
{
public:
M(int param) {}
};
class D
{
public:
D() {}
// This works properly without m, and maybe called at any time,
// even before startWork was called.
void someSimpleStuff() {}
};
class C
{
public:
C(D& d, int param) : d(d), m(param) { startWork(param); }
// Uses m. Called multiple times.
// Guaranteed to only be called after startWork was called
void doProcessing() {}
private:
// Called single time, once param is known and work can be started.
void startWork(int param) {}
D& d;
M m;
};
int main()
{
D d;
d.someSimpleStuff();
C c(d, 1337);
c.doProcessing();
c.doProcessing();
}
The question is: "Is it possible to check at compile time that m is only used after it has been initialized, without splitting the interface of C?"
The answer is no: you must use the type system to ensure that the object M is not used before it is initialized, which implies splitting C's interface. At compile time, compilers only know the types of objects and the values of constant expressions, and C cannot be a literal type. So you must use the type system: you must split C's interface to ensure at compile time that M is only used after initialization.
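A sketch of what that split could look like (hypothetical names; the pre-start and post-start states become distinct types, so calling doProcessing() before startWork() simply does not compile):
class CStarted {
public:
    explicit CStarted(int param) : m(param) {}
    void doProcessing() { /* uses m */ }
private:
    M m;
};

class CNotStarted {
public:
    void someSimpleStuff() { /* does not need m */ }
    CStarted startWork(int param) { return CStarted(param); }
};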
I recently came across classes that use a configuration object instead of the usual setter methods for configuration. A small example:
struct AConfiguration { int a, b; };

class A {
    int a, b;
public:
    A(const AConfiguration& conf) { a = conf.a; b = conf.b; }
};
The upsides:
You can extend your object and easily guarantee reasonable default values for new values without your users ever needing to know about it.
You can check a configuration for consistency (e.g. your class only allows some combinations of values)
You save a lot of code by omitting the setters.
You get a default constructor by giving your Configuration struct a default constructor and writing A(const AConfiguration& conf = AConfiguration()).
The downside(s):
You need to know the configuration at construction time and can't change it later on.
Are there more downsides to this that I'm missing? If there aren't: Why isn't this used more frequently?
Whether you pass the data individually or per struct is a question of style and needs to be decided on a case-by-case basis.
The important question is this: is the object ready and usable after construction, and does the compiler enforce that you pass all the necessary data to the constructor? Or do you have to remember to call a bunch of setters after construction, whose number might increase at any time without the compiler giving you any hint that you need to adapt your code? So whether this is
A(const AConfiguration& conf) : a(conf.a), b(conf.b) {}
or
A(int a_, int b_) : a(a_), b(b_) {}
doesn't matter all that much. (There's a number of parameters where everyone would prefer the former, but which number this is - and whether such a class is well designed - is debatable.) However, whether I can use the object like this
A a1(AConfiguration(42,42));
A a2 = AConfiguration(4711,4711);
A a3(7,7);
or have to do this
A urgh;
urgh.setA(13);
urgh.setB(13);
before I can use the object, does make a huge difference. Especially so, when someone comes along and adds another data field to A.
Using this method makes binary compatibility easier.
When the library version changes and the configuration struct contains it, the constructor can distinguish whether an "old" or "new" configuration is passed, and avoid an access violation/segfault when accessing non-existent fields.
Moreover, the mangled name of the constructor is retained, whereas it would have changed if the constructor's signature had changed. This also helps retain binary compatibility.
Example:
//version 1
struct AConfiguration { int version; int a; AConfiguration(): version(1) {} };
//version 2
struct AConfiguration { int version; int a, b; AConfiguration(): version(2) {} };
class A {
    int a, b;
public:
    A(const AConfiguration& conf) {
        switch (conf.version){
        case 1: a = conf.a; b = 0; // No access violation for old callers!
            break;
        case 2: a = conf.a; b = conf.b; // New callers do have the b member
            break;
        }
    }
};
The main upside is that the A object can be immutable. I don't know if having the AConfiguration struct actually gives any benefit over just an a and a b parameter to the constructor.
Using this method makes binary compatibility harder.
If the struct is changed (one new optional field is added), all code using the class might need a recompile. If one new non-virtual setter function is added, no such recompilation is necessary.
I would support the point about decreased binary compatibility here.
The problem I see comes from direct access to the struct's fields.
struct AConfig1 { int a; int b; };
struct AConfig2 { int a; std::map<int,int> b; };
Since I modified the representation of b, I am screwed, whereas with:
class AConfig1 { public: int getA() const; int getB() const; /* */ };
class AConfig2 { public: int getA() const; int getB(int key = 0) const; /* */ };
The physical layout of the object might have changed, but my getters have not, and the offsets to the functions have not either.
Of course, for binary compatibility, one should check out the PIMPL idiom.
namespace details { class AConfigurationImpl; }
class AConfiguration {
public:
int getA() const;
int getB() const;
private:
AConfigurationImpl* m_impl;
};
While you do end up writing more code, you have the guarantee here of backward compatibility of your object as long as you add supplementary methods AFTER the existing ones.
The representation of an instance in memory does not depend on the number of methods, it only depends on:
the presence or absence of virtual methods
the base classes
the attributes
Which is what is VISIBLE (not what is accessible).
And here we guarantee that we won't have any change in the attributes. The definition of AConfigurationImpl might change without any problem and the implementation of the methods might change too.
The extra code means: constructor, copy constructor, assignment operator and destructor, which is a fair amount, and of course the getters and setters. Also note that these methods can no longer be inlined, since their implementations are defined in a source file.
Whether or not it suits you, you're on your own to decide.
So I'm working with this huge repository of code and have realized that one of the structs lacks an important field. I looked at the code (which uses the struct) as closely as I could and concluded that adding an extra field wasn't going to break anything - yet something broke anyway.
Any ideas on where I could've screwed up?
Also: design advice is welcome - what's the best way I can accomplish this?
E.g. (if I wasn't clear):
typedef struct foo
{
int a;
int b;
} foo;
Now it's :
typedef struct foo
{
int a;
int b;
int c;
} foo;
If that structure is being serialized/deserialized anywhere, be sure to pay attention to that section of the code.
Double check areas of the code where memory is being allocated.
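For example (a sketch of the kind of code to look for; the function names are hypothetical):
#include <cstdio>
#include <cstdlib>

// Records written to disk before 'c' existed no longer line up with the new sizeof(foo).
void load(foo * out, std::FILE * f)
{
    std::fread(out, sizeof(foo), 1, f);
}

// A hard-coded allocation size that matched the old two-int layout is now too small.
foo * make_foo()
{
    return static_cast<foo *>(std::malloc(2 * sizeof(int)));
}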
From what you've written above I can't see anything wrong. Two things I can think of:
Whenever you change code and recompile, you open the door to "hidden" bugs showing up. That is, things like uninitialized pointers, where your data structure might now be just big enough to get corrupted by them.
Are you making sure you initialize c before it gets used?
Follow Up:
Since you haven't found the error yet, I'd stop looking at your struct. Someone once wrote: look for horses first, zebras second. That is, the error is probably not an exotic one. How much coverage do you have in your unit tests? I'm assuming this is legacy code, which almost invariably means 0% - or at least that's been my experience. Is that accurate?
If you are using sizeof(struct) to allocate memory everywhere and are accessing the members using the -> or . operators, I don't think you should face any problems. But it also depends on where you add the member; it might screw up your structure's alignment if you are not careful.
Any ideas on where I could've screwed up?
Nothing. Everything. It all depends on how, where and why this is used.
Assuming the structure you're talking about is a C-style POD and the code is as simple as it gets, you'll get away with it. But the moment you try something more ambitious, you're dealing with alignment issues (depending on how and where you create objects) and padding, at the very least. If this is C++ and your POD contains custom operators/ctors etc., you're getting into a lot of trouble. Cross-platform issues may also arise if you ever rely on endianness, etc.
If the code had a robust set of unit tests, it would probably be much easier to track down the problem (you asked for design advice ;) )
I assume you don't need to use the new 'c' variable everywhere in this giant codebase, you're just adding it so you can use it in some code you're adding or modifying? Instead of adding c to foo, you could make a new struct, bar, which contains a foo object and c. Then use bar where it's needed.
As for the actual bug, it could be anything with so little information to go on, but if I had to guess, I'd say someone used a magic number instead of sizeof() somewhere.
Look for memcpy, memset, memcmp. These functions are not member-wise; if they are called with the previous structure's length, you may have problems.
Also search the files for every instance of the struct. There may be functions or methods that do not use the new important field. As others have said, if you find the structure in a #define or typedef, you'll have to search those too.
Since you tagged your question C++:
For the future, Pimpl/d-Pointer is a strategy that allows you much greater freedom in extending or re-designing your classes without breaking compatibility.
For example, if you had originally written
// foo.h
class Foo {
public:
Foo();
Foo(const Foo &);
~Foo();
int a() const;
void a(int);
int b() const;
void b(int);
private:
class FooPrivate *const d;
};
// foo.c
class FooPrivate {
public:
FooPrivate() : a(0), b(0) {}
FooPrivate(const FooPrivate &o) : a(o.a), b(o.b) {}
int a;
int b;
};
Foo::Foo() : d(new FooPrivate()) {}
Foo::Foo(const Foo &o) : d(new FooPrivate(*o.d)) {}
Foo::~Foo() { delete d; }
int Foo::a() const { return d->a; }
void Foo::a(int a) { d->a = a; }
// ...
you can easily extend this to
// foo.h
class Foo {
public:
// ...
int a() const;
void a(int);
int b() const;
void b(int);
int c() const;
void c(int);
// ...
};
// foo.c
class FooPrivate {
// ...
int a;
int b;
int c;
};
// ...
without breaking any existing (compiled!) code using Foo.
If the code is used to transfer data across the network, you could be breaking things.
If adding a structure member anywhere other than as the first member breaks anything, then the code has undefined behaviour and it's wrong. So at least you have someone else (or your earlier self) to blame for the breakage. But yes, undefined behaviour includes "happens to do what we'd like it to do", so as the other guys say, watch out for memory allocation, serialization (network and file IO).
As an aside, I always cringe when I see typedef FOO ... struct FOO, as if one is trying to make C code look like C++. I realize I'm in a minority here :)
It's always safe to add new elements at the end of a C struct. Even if that struct is passed to different processes, the code that has been recompiled will see the new struct member, and the code that hasn't been recompiled will just be aware of the old struct size and read the old members it knows about.
The caveat here is that the new member has to be added at the end of the structure and not in the middle.