In C++11, we have two ways to initialize the data members of a class/struct as illustrated in the following examples:
struct A
{
int n = 7;
};
struct B
{
int n;
B() : n(7) {}
};
Question 1:
Which way is better?
Question 2:
Is the traditional way (the latter) not encouraged from the view of a modern-C++-style checker?
You can actually mix both styles. This is useful if you have multiple constructors, but the variable is explicitly initialized by only one or a few of them.
Example
struct A
{
int n = 7;
A() {} // n will be initialized to 7
A(int n_): n{n_} {} // Initialize n to something else
};
I am not sure, but I think that the first case is possible only with C++ primitive types (C++11 actually allows a default member initializer on a member of any type). In most books, especially Effective C++: 55 Specific Ways to Improve Your Programs and Designs by Scott Meyers, it is recommended to go the first way, so I would stick with that. :-)
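For what it's worth, default member initializers are not restricted to built-in types. A minimal sketch, assuming a hypothetical Config struct whose names are made up for illustration:

```cpp
#include <string>
#include <vector>

// Hypothetical type for illustration; none of these names come from the thread.
struct Config
{
    int retries = 3;                 // built-in type
    std::string name = "default";    // class type
    std::vector<int> ports{80, 443}; // brace initializer on a container

    Config() {}                                        // every default applies
    explicit Config(const std::string& n) : name(n) {} // overrides name only
};
```

A member named in a constructor's initializer list uses that initializer; any member left out falls back to its default.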
Don't forget that the order of initialization is determined by the order in which the members are declared in the class, not by the order in which they appear in the initializer list.
I prefer just the second style of initialization.
Neither way is better; however, the new uniform initialization has the perk of being similar to other languages and being more understandable overall. Uniform initialization applies not only to struct members, but also across the board to initializer lists and constructor arguments.
This is my first post. I believe I am aware of best practices on stackoverflow but probably not 100%. I believe there is no specific post that addresses my interrogation; also I hope it's not too vague.
I am trying to figure out good practices for writing C++ constructors
that do medium-to-heavy-duty work.
Pushing (all?) init work into initialization lists seems a good idea
for two reasons that cross my mind, namely:
Resource Acquisition Is Initialization
As far as I know, the simplest way of guaranteeing that members
are initialized correctly at resource acquisition is to make sure that
what's inside the parentheses of the initialization list is correct
when it is evaluated.
class A
{
public:
A(const B & b, const C & c)
: _c(c)
{
/* _c was allocated and defined at the same time */
/* _b is allocated but its content is undefined */
_b = b;
}
private:
B _b;
C _c;
};
const class members
Using initialization lists is the only correct way of using
const members which can hold actual content.
class A
{
public:
A(int m, int n, int p)
: _m(m) /* correct, _m will be initialized to m */
{
_n = n; /* incorrect: a const member cannot be assigned to, so this does not compile */
*(const_cast<int*>(&_p)) = p; /* casts away const, but modifying a const object is undefined behavior; ugly and not particularly RAII-friendly */
}
private:
const int _m, _n, _p;
};
However some problems seem to affect over usage of initialization lists:
Order
Member variables are always initialized in the order they are declared in the class definition, so write them in that order in the constructor initialization list. Writing them in a different order just makes the code confusing because it won't run in the order you see, and that can make it hard to see order-dependent bugs.
http://isocpp.github.io/CppCoreGuidelines/CppCoreGuidelines#S-discussion
This is important if you initialize a value using a value
initialized previously in the list. For example:
A(int in) : _n(in), _m(_n) {}
If _m is declared before _n in the class, _m is initialized first and reads _n before it has a value, so its value at initialization is undefined.
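The ordering rule can be observed without reading an uninitialized value. The sketch below uses a hypothetical note() helper that records which member is being initialized; the struct and helper names are assumptions for illustration only.

```cpp
#include <string>

// Hypothetical helper: appends a tag each time a member initializer runs.
static std::string g_order;
static int note(char tag, int value)
{
    g_order += tag;
    return value;
}

struct Ordered
{
    int m; // declared first, so initialized first
    int n; // declared second, so initialized second

    // The list names n before m, but declaration order still wins.
    // (Compilers typically warn about this mismatch.)
    Ordered(int in) : n(note('n', in)), m(note('m', in)) {}
};

// Returns the tags in the order the initializers actually ran.
inline std::string init_order()
{
    g_order.clear();
    Ordered o(1);
    (void)o;
    return g_order;
}
```

Here init_order() yields "mn" even though the list is written as n, m, which is why keeping the list in declaration order avoids surprises.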
I am ready to apply this rule in my code, but when working
with other people it causes code redundancy and forces reading
two files at once.
That is not acceptable and somewhat error-prone.
Solution — initialize using only data from ctor arguments.
Solution's problem — keeping work in the init list without
inner dependency means repeating operations. For example:
int in_to_n(int in)
{
/* ... */
return n;
}
int modify(int n)
{
/* ... */
return modified;
}
A::A(int in)
: _n(in_to_n(in))
, _n_modified(modify(in_to_n(in)))
{}
For tiny bits of repeated operations I believe compilers
can reuse existing data but I don't think one should rely on that
for significant work (and I don't even think it's done if calling
noninlined separate code).
How much work can you put in the list?
In the previous example, I called functions to compute what the
attributes are to be initialized to. These can be plain/lambda
functions or static/nonstatic methods,
of the current class or of another.
(I don't suggest using nonstatic methods of the current class,
it might even be undefined usage according to the standard, not sure.)
I guess this is not in itself a big problem, but one needs to make
special efforts in clarity to keep the intent of the code clear if
writing big classes that do big work that way.
Also, when trying to apply the solution to the previous problem,
there is only so much independent work you can do when initializing
your instance... This usually gets big if you have a long sequence
of attributes to initialize with inner dependencies.
It's starting to look like just the program, translated into an
initialization list; I guess this is not what C++ is supposed to be
transitioning into?
Multiple inits
One often computes two variables at once. Setting two variables
at once in an init list means either:
using an ugly intermediate attribute
struct InnerAData
{
B b;
C c;
};
/* must be exported with the class definition (ugly) */
class A
{
public:
A(const D & input)
: _inner(work(input))
, _b(_inner.b)
, _c(_inner.c) {}
private:
InnerAData _inner; /* declared first so it is initialized before _b and _c read from it */
B _b;
C _c;
};
This is awful and forces extra useless copies.
or some ugly hack
class A
{
public:
A(const D & input) : _b(work(input)) {}
private:
B _b;
C _c;
B work(const D & input)
{
/* ... work ... */
_c = ...;
}
};
This is even more awful and doesn't even work with const
or non-builtin type attributes.
keeping stuff const
Sometimes it can take most of the ctor to figure out the value
to give to an attribute, so that making sure it is const,
and therefore moving the work to the initialization list,
can seem constrained. I won't give a full example, but think
something like computing data from a default filename, then
computing the full filename from that data, then checking if
the corresponding file exists to set a const boolean, etc.
I guess it's not a fundamental problem, but all that seems
intuitively more legible in the body of the ctor, and moving
it to the init list just to do a correct initialization of
a const field seems overkill. Maybe I'm just imagining things.
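One possible compromise, sketched under assumptions: move the multi-step work into private static helpers, so the initializer list stays a short list of calls while the members remain const. The Report class, its helpers, and the file-name rules below are hypothetical and only stand in for the filename example described above.

```cpp
#include <string>

// Hypothetical class for illustration; the naming rules are invented.
class Report
{
public:
    explicit Report(const std::string& base)
        : path_(makePath(base))    // the multi-step work lives in the helper
        , hidden_(isHidden(path_)) // safe: path_ is declared before hidden_
    {}

    const std::string& path() const { return path_; }
    bool hidden() const { return hidden_; }

private:
    // Static helpers cannot touch members that are not initialized yet.
    static std::string makePath(const std::string& base)
    {
        std::string p = base.empty() ? "report" : base;
        if (p.find('.', 1) == std::string::npos)
            p += ".txt"; // no extension found after the first character
        return p;
    }
    static bool isHidden(const std::string& p)
    {
        return !p.empty() && p[0] == '.';
    }

    const std::string path_;
    const bool hidden_;
};
```

Because the helpers are static, they depend only on their arguments, which sidesteps the question of calling nonstatic methods during initialization.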
So here's the hard part: asking a specific question!
Have you faced similar problems, did you find a better solution,
if not what's the lesson to learn — or is there something I'm
missing here?
I guess my problem is I'm pretty much trying to move all the work
to the init list when I could search for a compromise of what state
is initiated and leave some work for later. I just feel like init list
could play a bigger role in making modern C++ code than it does but
I haven't seen them pushed further than basic usage yet.
Additionally, I'm really not convinced as to why the values are
initialized in that order, and not in the order of the list.
I've been orally told it's because attributes are in order on the stack and
the compiler must guarantee that stack data is never above the SP.
I'm not sure how that's a final answer... pretty sure one could
implement safe arbitrarily reordered initialization lists,
correct me if I'm wrong.
In your code:
class A
{
public:
A(const B & b, const C & c)
: _c(c)
{
/* _c was allocated and defined at the same time */
/* _b is allocated but its content is undefined */
_b = b;
}
private:
B _b;
C _c;
};
the constructor calls B::B() and then B::operator=, which may be a problem if either of them doesn't exist, is expensive, or is not implemented according to the RAII and rule-of-three guidelines. The rule of thumb is to always prefer the initializer list when possible.
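The difference can be made visible with a small tracing type. This is a sketch with hypothetical names (Traced, ByAssignment, ByInitList) that logs which special member functions run in each style.

```cpp
#include <string>

// Hypothetical tracing type: logs D (default ctor), C (copy ctor), = (assignment).
struct Traced
{
    static std::string log;
    Traced() { log += "D"; }
    Traced(const Traced&) { log += "C"; }
    Traced& operator=(const Traced&) { log += "="; return *this; }
};
std::string Traced::log;

struct ByAssignment
{
    Traced t;
    ByAssignment(const Traced& src) { t = src; } // default-construct, then assign
};

struct ByInitList
{
    Traced t;
    ByInitList(const Traced& src) : t(src) {}    // copy-construct once
};

inline std::string trace_assignment()
{
    Traced src;
    Traced::log.clear();
    ByAssignment a(src);
    (void)a;
    return Traced::log;
}

inline std::string trace_init_list()
{
    Traced src;
    Traced::log.clear();
    ByInitList b(src);
    (void)b;
    return Traced::log;
}
```

The assignment style runs two operations on the member (default construction followed by assignment), while the initializer list runs a single copy construction.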
Since C++11, an alternative is to use delegating constructors:
struct InnerAData;
InnerAData work(const D&);
class A
{
public:
A(const D & input) : A(work(input)) {}
private:
A(const InnerAData&);
private:
const B _b;
const C _c;
};
And the following, which can be inlined but is then visible in the header:
struct InnerAData
{
B b;
C c;
};
A::A(const InnerAData& inner) : _b(inner.b), _c(inner.c) {}
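A self-contained version of the same delegating-constructor idea, as a sketch only: the Endpoint class, the Parsed struct, and the parse() helper are hypothetical stand-ins for A, InnerAData, and work().

```cpp
#include <string>

// Hypothetical intermediate result, playing the role of InnerAData.
struct Parsed
{
    std::string host;
    int port;
};

// Hypothetical helper, playing the role of work(): splits "host:port".
inline Parsed parse(const std::string& s)
{
    const std::string::size_type colon = s.find(':');
    if (colon == std::string::npos)
        return Parsed{s, 80}; // assumed default port, for the example only
    return Parsed{s.substr(0, colon), std::stoi(s.substr(colon + 1))};
}

class Endpoint
{
public:
    // Public constructor computes both values once, then delegates.
    explicit Endpoint(const std::string& s) : Endpoint(parse(s)) {}

    const std::string& host() const { return host_; }
    int port() const { return port_; }

private:
    // Private target constructor initializes both const members from one result.
    explicit Endpoint(const Parsed& p) : host_(p.host), port_(p.port) {}

    const std::string host_;
    const int port_;
};
```

The work runs once, both members stay const, and no intermediate member has to be stored in the class.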
This may be a silly question, but still I'm a bit curious...
Recently I was working on one of my former colleague projects, and I've noticed that he really loved to use something like this:
int foo(7);
instead of:
int foo = 7;
Is this a normal/good thing to do in C++?
Are there any benefits to it? (Or is this just some silly programming style that he was into..?)
This reminds me a bit of how class member variables can be initialized in the class constructor... something like this:
class MyClass
{
public:
MyClass(int foo) : mFoo(foo)
{ }
private:
int mFoo;
};
instead of this:
class MyClass
{
public:
MyClass(int foo)
{
mFoo = foo;
}
private:
int mFoo;
};
For basic types there's no difference. Use whichever is consistent with the existing code and looks more natural to you.
Otherwise,
A a(x);
performs direct initialization, and
A a = x;
performs copy initialization.
The second part is a member initializer list, there's a bunch of Q&As about it on StackOverflow.
Both are valid. For builtin types they do the same thing; for class types there is a subtle difference.
MyClass m(7); // uses MyClass(int)
MyClass n = 3; // uses MyClass(int) to create a temporary object,
// then uses MyClass(const MyClass&) to copy the
// temporary object into n
The obvious implication is that if MyClass has no copy constructor, or it has one but it isn't accessible, the attempted construction fails. If the construction would succeed, the compiler is allowed to skip the copy constructor and use MyClass(int) directly. (Since C++17 this elision is guaranteed, and the copy constructor no longer needs to be accessible for this form.)
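Another place where the two forms differ is an explicit constructor: direct initialization may use it, copy initialization may not. A compile-time sketch with two hypothetical types, using standard type traits:

```cpp
#include <type_traits>

// Hypothetical types for illustration.
struct Implicit { Implicit(int) {} };
struct Explicit { explicit Explicit(int) {} };

// Direct initialization, e.g. Explicit e(7), is allowed for both.
static_assert(std::is_constructible<Implicit, int>::value, "Implicit i(7) compiles");
static_assert(std::is_constructible<Explicit, int>::value, "Explicit e(7) compiles");

// Copy initialization, e.g. Explicit e = 7, needs a non-explicit constructor.
static_assert(std::is_convertible<int, Implicit>::value, "Implicit i = 7 compiles");
static_assert(!std::is_convertible<int, Explicit>::value, "Explicit e = 7 is rejected");
```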
All the answers above are correct. I would just add that C++11 supports another, more generic way to initialize variables:
int a = {2} ;
or
int a {2} ;
Several other good answers point out the difference between constructing "in place" (ClassType v(<constructor args>)) and creating a temporary object and using the copy constructor to copy it (ClassType v = <constructor arg>). Two additional points need to be made, I think. First, the second form obviously has only a single argument, so if your constructor takes more than one argument, you should prefer the first form (yes, there are ways around that, but I think the direct construction is more concise and readable - but, as has been pointed out, that's a personal preference).
Secondly, the form you use matters if your copy constructor does something significantly different than your standard constructor. This won't be the case most of the time, and some will argue that it's a bad idea to do so, but the language does allow for this to be the case (all surprises you end up dealing with because of it, though, are your own fault).
It's a C++ style of initializing variables - C++ added it for fundamental types so the same form could be used for fundamental and user-defined types. This can be very important for template code that's intended to be instantiated for either kind of type.
Whether you like to use it for normal initialization of fundamental types is a style preference.
Note that C++11 also adds the uniform initialization syntax which allows the same style of initialization to be used for all types - even aggregates like POD structs and arrays (though user defined types may need to have a new type of constructor that takes an initialization list to allow the uniform syntax to be used with them).
Yours is not a silly question at all as things are not as simple as they may seem. Suppose you have:
class A {
public:
A() {}
};
and
class B {
public:
B(A const &) {}
};
Writing
B b = B(A());
Requires that B's copy constructor be accessible. Writing
B b = A();
Requires also that B's converting constructor B(A const &) be not declared explicit. On the other hand if you write
A a;
B b(a);
all is well, but if you write
B b(A());
This is interpreted by the compiler as the declaration of a function b that takes a nameless argument which is a parameterless function returning A, resulting in mysterious bugs. This is known as C++'s most vexing parse.
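The parse can be checked at compile time, and there are two common ways around it. A sketch reusing the A and B classes above (the tag() member and build_objects() function are added here purely for illustration):

```cpp
#include <type_traits>

struct A { A() {} };
struct B
{
    B(A const&) {}
    int tag() const { return 1; } // hypothetical member, for the example only
};

// Parsed as the declaration of a function named b, not as an object.
B b(A());
static_assert(std::is_function<decltype(b)>::value, "b is a function declaration");

inline int build_objects()
{
    B b1((A())); // extra parentheses force the argument to be an expression
    B b2{A()};   // C++11 brace initialization cannot declare a function
    return b1.tag() + b2.tag();
}
```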
I prefer using the parenthetical style...though I always use a space to distinguish from function or method calls, on which I don't use a space:
int foo (7); // initialization
myVector.push_back(7); // method call
One of my reasons for preferring using this across the board for initialization is because it helps remind people that it is not an assignment. Hence overloads to the assignment operator will not apply:
#include <iostream>
class Bar {
private:
int value;
public:
Bar (int value) : value (value) {
std::cout << "code path A" << "\n";
}
Bar& operator=(int right) {
value = right;
std::cout << "code path B" << "\n";
return *this;
}
};
int main() {
Bar b = 7;
b = 7;
return 0;
}
The output is:
code path A
code path B
It feels like the presence of the equals sign obscures the difference. Even if it's "common knowledge" I like to make initialization look notably different than assignment, since we are able to do so.
It's just the syntax for initialization of something :-
SomeClass data(12, 134);
That looks reasonable, but
int data(123);
Looks strange but they are the same syntax.
Instead of having to remember to initialize a simple 'C' structure, I might derive from it and zero it in the constructor like this:
struct MY_STRUCT
{
int n1;
int n2;
};
class CMyStruct : public MY_STRUCT
{
public:
CMyStruct()
{
memset(this, 0, sizeof(MY_STRUCT));
}
};
This trick is often used to initialize Win32 structures and can sometimes set the ubiquitous cbSize member.
Now, as long as there isn't a virtual function table for the memset call to destroy, is this a safe practice?
You can simply value-initialize the base, and all its members will be zeroed out. This is guaranteed.
struct MY_STRUCT
{
int n1;
int n2;
};
class CMyStruct : public MY_STRUCT
{
public:
CMyStruct():MY_STRUCT() { }
};
For this to work, there should be no user declared constructor in the base class, like in your example.
No nasty memset for that. It's not guaranteed that memset works in your code, even though it should work in practice.
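One way to see the zeroing, sketched with a hypothetical check function: construct the object with placement new over a buffer that was deliberately filled with non-zero bytes.

```cpp
#include <cstring>
#include <new>

struct MY_STRUCT
{
    int n1;
    int n2;
};

class CMyStruct : public MY_STRUCT
{
public:
    CMyStruct() : MY_STRUCT() {} // value-initializes the base subobject
};

// Hypothetical check: build the object on top of "dirty" memory.
inline bool zeroed_over_dirty_memory()
{
    alignas(CMyStruct) unsigned char buf[sizeof(CMyStruct)];
    std::memset(buf, 0xFF, sizeof buf); // make stale contents obvious
    CMyStruct* p = new (buf) CMyStruct; // constructor runs on the dirty buffer
    const bool ok = (p->n1 == 0 && p->n2 == 0);
    p->~CMyStruct();
    return ok;
}
```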
PREAMBLE:
While my answer is still Ok, I find litb's answer quite superior to mine because:
It teaches me a trick that I did not know (litb's answers usually have this effect, but this is the first time I write it down)
It answers exactly the question (that is, initializing the original struct's part to zero)
So please, consider litb's answer before mine. In fact, I suggest the question's author to consider litb's answer as the right one.
Original answer
Putting a true object (i.e. std::string) etc. inside will break, because the true object will be initialized before the memset, and then, overwritten by zeroes.
Using the initialization list doesn't work for g++ (I'm surprised...). Initialize it instead in the CMyStruct constructor body. It will be C++ friendly:
class CMyStruct : public MY_STRUCT
{
public:
CMyStruct() { n1 = 0 ; n2 = 0 ; }
};
P.S.: I assumed you had no control over MY_STRUCT, of course. With control, you would have added the constructor directly inside MY_STRUCT and forgotten about inheritance. Note that you can add non-virtual methods to a C-like struct, and still have it behave as a struct.
EDIT: Added missing parenthesis, after Lou Franco's comment. Thanks!
EDIT 2 : I tried the code on g++, and for some reason, using the initialization list does not work. I corrected the code using the body constructor. The solution is still valid, though.
Please reevaluate my post, as the original code was changed (see changelog for more info).
EDIT 3 : After reading Rob's comment, I guess he has a point worthy of discussion: "Agreed, but this could be an enormous Win32 structure which may change with a new SDK, so a memset is future proof."
I disagree: Knowing Microsoft, it won't change because of their need for perfect backward compatibility. They will create instead an extended MY_STRUCTEx struct with the same initial layout as MY_STRUCT, with additional members at the end, and recognizable through a "size" member variable like the struct used for a RegisterWindow, IIRC.
So the only valid point remaining from Rob's comment is the "enormous" struct. In this case, perhaps a memset is more convenient, but you will have to make MY_STRUCT a variable member of CMyStruct instead of inheriting from it.
I see another hack, but I guess this would break because of possible struct alignment problem.
EDIT 4: Please take a look at Frank Krueger's solution. I can't promise it's portable (I guess it is), but it is still interesting from a technical viewpoint because it shows one case where, in C++, the "this" pointer "address" moves from its base class to its inherited class.
Much better than a memset, you can use this little trick instead:
MY_STRUCT foo = { 0 };
This will initialize all members to 0 (or their default value iirc), no need to specify a value for each.
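A short sketch of what that covers, using a hypothetical aggregate with mixed member types: members without an explicit initializer are value-initialized, so pointers become null and floating-point members become 0.0 regardless of their bit representation.

```cpp
// Hypothetical aggregate for illustration.
struct Sample
{
    int n;
    double d;
    const char* p;
    int arr[3];
};

inline Sample make_zeroed()
{
    Sample s = { 0 }; // n gets 0 explicitly; the rest are value-initialized
    return s;
}
```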
This would make me feel much safer as it should work even if there is a vtable (or the compiler will scream).
memset(static_cast<MY_STRUCT*>(this), 0, sizeof(MY_STRUCT));
I'm sure your solution will work, but I doubt there are any guarantees to be made when mixing memset and classes.
This is a perfect example of porting a C idiom to C++ (and why it might not always work...)
The problem you will have with using memset is that in C++, a struct and a class are exactly the same thing except that by default, a struct has public visibility and a class has private visibility.
Thus, what if later on, some well meaning programmer changes MY_STRUCT like so:
struct MY_STRUCT
{
int n1;
int n2;
// Provide a default implementation...
virtual int add() {return n1 + n2;}
};
By adding that single function, your memset might now cause havoc.
There is a detailed discussion in comp.lang.c+
The examples have "unspecified behaviour".
For a non-POD, the order by which the compiler lays out an object (all base classes and members) is unspecified (ISO C++ 10/3). Consider the following:
struct A {
int i;
};
class B : public A { // 'B' is not a POD
public:
B ();
private:
int j;
};
This can be laid out as:
[ int i ][ int j ]
Or as:
[ int j ][ int i ]
Therefore, using memset directly on the address of 'this' is very much unspecified behaviour. One of the answers above, at first glance looks to be safer:
memset(static_cast<MY_STRUCT*>(this), 0, sizeof(MY_STRUCT));
I believe, however, that strictly speaking this too results in unspecified behaviour. I cannot find the normative text, however the note in 10/5 says: "A base class subobject may have a layout (3.7) different from the layout of a most derived object of the same type".
As a result, a compiler could perform space optimizations with the different members:
struct A {
char c1;
};
struct B {
char c2;
char c3;
char c4;
int i;
};
class C : public A, public B
{
public:
C ()
{
c1 = 10; // a base-class member cannot be named in C's initializer list
memset(static_cast<B*>(this), 0, sizeof(B));
}
};
Can be laid out as:
[ char c1 ] [ char c2, char c3, char c4, int i ]
On a 32-bit system, due to alignment etc. for 'B', sizeof(B) will most likely be 8 bytes. However, sizeof(C) can also be '8' bytes if the compiler packs the data members. Therefore the call to memset might overwrite the value given to 'c1'.
Precise layout of a class or structure is not guaranteed in C++, which is why you should not make assumptions about the size of it from the outside (that means if you're not a compiler).
Probably it works, until you find a compiler on which it doesn't, or you throw some vtable into the mix.
If you already have a constructor, why not just initialize it there with n1=0; n2=0; -- that's certainly the more normal way.
Edit: Actually, as paercebal has shown, ctor initialization is even better.
My opinion is no. I'm not sure what it gains either.
As your definition of CMyStruct changes and you add/delete members, this can lead to bugs. Easily.
Create a constructor for CMyStruct that takes a MyStruct as a parameter.
CMyStruct::CMyStruct(MyStruct &)
Or something of that sort. You can then initialize a public or private 'MyStruct' member.
From an ISO C++ viewpoint, there are two issues:
(1) Is the object a POD? The acronym stands for Plain Old Data, and the standard enumerates what you can't have in a POD (Wikipedia has a good summary). If it's not a POD, you can't memset it.
(2) Are there members for which all-bits-zero is invalid ? On Windows and Unix, the NULL pointer is all bits zero; it need not be. Floating point 0 has all bits zero in IEEE754, which is quite common, and on x86.
Frank Krueger's tip addresses your concerns by restricting the memset to the POD base of the non-POD class.
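Issue (1) can be enforced at compile time in C++11. The sketch below is an assumption-laden helper (zero_fill is a hypothetical name, not an existing API) that refuses to memset anything that is not trivial and standard-layout. It does nothing about issue (2): all-bits-zero may still be the wrong representation for pointers or floating-point members on unusual platforms.

```cpp
#include <cstring>
#include <type_traits>

// Hypothetical helper: memset is only attempted on POD-like types.
template <typename T>
void zero_fill(T& obj)
{
    static_assert(std::is_trivial<T>::value && std::is_standard_layout<T>::value,
                  "zero_fill requires a trivial, standard-layout type");
    std::memset(&obj, 0, sizeof(T));
}

struct MY_STRUCT
{
    int n1;
    int n2;
};
```

Calling zero_fill on a type with a virtual function or a std::string member fails to compile instead of silently corrupting the object.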
Try this - overload new.
EDIT: I should add - This is safe because the memory is zeroed before any constructors are called. Big flaw - only works if object is dynamically allocated.
struct MY_STRUCT
{
int n1;
int n2;
};
class CMyStruct : public MY_STRUCT
{
public:
CMyStruct()
{
// whatever
}
void* operator new(size_t size)
{
// dangerous alternative: return memset(malloc(size), 0, size); // malloc may return NULL
// better: check the allocation before zeroing it
if (void *p = malloc(size))
{
return memset(p, 0, size);
}
else
{
throw std::bad_alloc();
}
}
void operator delete(void *p, size_t size)
{
free(p);
}
};
If MY_STRUCT is your code, and you are happy using a C++ compiler, you can put the constructor there without wrapping in a class:
struct MY_STRUCT
{
int n1;
int n2;
MY_STRUCT(): n1(0), n2(0) {}
};
I'm not sure about efficiency, but I hate doing tricks when you haven't proved efficiency is needed.
Comment on litb's answer (seems I'm not yet allowed to comment directly):
Even with this nice C++-style solution you have to be very careful that you don't apply this naively to a struct containing a non-POD member.
Some compilers then don't initialize correctly anymore.
See this answer to a similar question.
I personally had the bad experience on VC2008 with an additional std::string.
What I do is use aggregate initialization, but only specifying initializers for members I care about, e.g:
STARTUPINFO si = {
sizeof si, /*cb*/
0, /*lpReserved*/
0, /*lpDesktop*/
"my window" /*lpTitle*/
};
The remaining members will be initialized to zeros of the appropriate type (as in Drealmer's post). Here, you are trusting Microsoft not to gratuitously break compatibility by adding new structure members in the middle (a reasonable assumption). This solution strikes me as optimal - one statement, no classes, no memset, no assumptions about the internal representation of floating point zero or null pointers.
I think the hacks involving inheritance are horrible style. Public inheritance means IS-A to most readers. Note also that you're inheriting from a class which isn't designed to be a base. As there's no virtual destructor, clients who delete a derived class instance through a pointer to base will invoke undefined behaviour.
I assume the structure is provided to you and cannot be modified. If you can change the structure, then the obvious solution is adding a constructor.
Don't over engineer your code with C++ wrappers when all you want is a simple macro to initialise your structure.
#include <stdio.h>
#define MY_STRUCT(x) MY_STRUCT x = {0}
struct MY_STRUCT
{
int n1;
int n2;
};
int main(int argc, char *argv[])
{
MY_STRUCT(s);
printf("n1(%d),n2(%d)\n", s.n1, s.n2);
return 0;
}
It's a bit of code, but it's reusable; include it once and it should work for any POD. You can pass an instance of this class to any function expecting a MY_STRUCT, or use the GetPointer function to pass it into a function that will modify the structure.
template <typename STR>
class CStructWrapper
{
private:
STR MyStruct;
public:
CStructWrapper() { STR temp = {}; MyStruct = temp;}
CStructWrapper(const STR &myStruct) : MyStruct(myStruct) {}
operator STR &() { return MyStruct; }
operator const STR &() const { return MyStruct; }
STR *GetPointer() { return &MyStruct; }
};
CStructWrapper<MY_STRUCT> myStruct;
CStructWrapper<ANOTHER_STRUCT> anotherStruct;
This way, you don't have to worry about whether NULLs are all 0, or floating point representations. As long as STR is a simple aggregate type, things will work. When STR is not a simple aggregate type, you'll get a compile-time error, so you won't have to worry about accidentally misusing this. Also, if the type contains something more complex, as long as it has a default constructor, you're ok:
struct MY_STRUCT2
{
int n1;
std::string s1;
};
CStructWrapper<MY_STRUCT2> myStruct2; // n1 is set to 0, s1 is set to "";
On the downside, it's slower since you're making an extra temporary copy, and the compiler will assign each member to 0 individually, instead of one memset.
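A possible variation that avoids the temporary copy, offered as a sketch rather than a drop-in replacement: value-initialize the member directly in the initializer list. The ValueInitWrapper, Plain, and Mixed names are hypothetical. Unlike the original, this version also compiles for non-aggregate types, so it gives up the compile-time error mentioned above, and some older compilers reportedly did not value-initialize mixed POD/non-POD types correctly.

```cpp
#include <string>

// Hypothetical variant of the wrapper above.
template <typename STR>
class ValueInitWrapper
{
public:
    ValueInitWrapper() : value_() {} // value-initialization, no temporary
    ValueInitWrapper(const STR& v) : value_(v) {}

    operator STR&() { return value_; }
    operator const STR&() const { return value_; }
    STR* GetPointer() { return &value_; }

private:
    STR value_;
};

// Hypothetical structs used to exercise the wrapper.
struct Plain
{
    int n1;
    int n2;
};

struct Mixed
{
    int n1;
    std::string s1;
};
```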