I've started to pick up this pattern:
template<typename T>
struct DefaultInitialize
{
DefaultInitialize():m_value(T()){}
// ... conversions, assignments, etc ....
};
So that when I have classes with primitive members, I can set them to be initialized to 0 on construction:
struct Class
{
...
DefaultInitialize<double> m_double;
...
};
The reason I do this is to avoid having to remember to initialize the member in each constructor (if there are multiple constructors). I'm trying to figure out if:
This is a valid pattern?
I am using the right terminology?
This is a valid pattern?
It's a known "valid" pattern, i would say. Boost has a class template called value_initialized that does exactly that, too.
I am using the right terminology?
Well, your template can be optimized to have fewer requirements on the type parameter. As of now, your type T requires a copy constructor, unfortunately. Let's change the initializer to the following
DefaultInitialize():m_value(){}
Then, technically this kind of initialization is called value initialization, starting with C++03. It's a little bit weird, since no kind of value is provided in the first place. Well, this kind of initialization looks like default initialization, but is intended to fill things with zero, but respecting any user defined constructor and executing that instead.
To summarize, what you did was to value initialize an object having type T, then to copy that object to m_value. What my version of above does it to value initialize the member directly.
Seems like a lot of work to avoid having to type m_double(0). I think it's harder to understand at first glance, but it does seem fine as long as everything is implemented properly.
But is it worth it? Do you really want to have to #include "DefaultInitialize.h" everywhere?
To clarify, basically, you're:
Making your compile times longer because of the includes.
Your code base larger because you have to manage the deceptively simple DefaultInitialize class
Increase the time it takes other people to read your code. If you have a member of a class that's a double, that's natural to me, but when I see DefaultInitialize, I have to learn what that is and why it was created
All that because you don't like to type out a constructor. I understand that it seems very nice to not have to do this, but most worth-while classes I've ever written tend to need to have a constructor written anyway.
This is certainly only my opinion, but I think most other people will agree with it. That is: it would be handy to not have to explicitly initialize members to 0, but the alternative (your class) isn't worth it.
Not to mention that in C++0x, you can do this;
class Foo
{
private:
int i = 0; // will be initialized to 0
}
Some compilers don't properly implement value initialization. For example, see Microsoft Connect, Value-initialization in new-expression, reported by Pavel Kuznetsov.
Fernando Cacciola's boost::value_initialized (mentioned already here by litb) offers a workaround to such compiler bugs.
If you're just initializing basic types to zero, you can override new and have it memset allocated memory to zero. May be simpler. There are pros and cons to doing this.
Related
Let's consider the following:
#include <iostream>
#include <initializer_list>
class Foo {
public:
Foo(int) {
std::cout << "with int\n";
}
};
int main() {
Foo a{10}; // new style initialization
Foo b(20); // old style initialization
}
Upon running it prints:
with int
with int
All good. Now due to new requirements I have added a constructor which takes an initializer list.
Foo(std::initializer_list<int>) {
std::cout << "with initializer list\n";
}
Now it prints:
with initializer list
with int
So my old code Foo a{10} got silently broken. a was supposed to be initialized with an int.
I understand that the language syntax is considering {10} as a list with one item. But how can I prevent such silent breaking of old code?
Is there any compiler option which will give us warning on such cases? Since this will be compiler specific I'm mostly interested with gcc. I have already tried -Wall -Wextra.
If there is no such option then do we always need to use old style construction, i.e. with () Foo b(20), for other constructors and use {} only when we really meant an initializer list?
It's impossible to generate any warning in these cases, because presented behaviour of choosing std::initializer_list constructor over direct matches is well defined and compliant with a standard.
This issue is described in detail in Scott Meyers Effective Modern C++ book
Item 7:
If, however, one or more constructors declare a parameter of type
std::initializer_list, calls using the braced initialization syntax
strongly prefer the overloads taking std::initializer_lists. Strongly.
If there’s any way for compilers to construe a call using a braced
initializer to be to a constructor taking a std::initializer_list,
compilers will employ that interpretation.
He also presents a few of edge cases of this issue, I strongly recommend reading it.
I couldn't find such an option, so apparently in such cases you should use parentheses for classes that have initializer_list constructor, and uniform initialization for all other classes as you wish.
Some useful insights can be found in this answer and comments to it: https://stackoverflow.com/a/18224556/2968646
There are no compiler warnings and there never will be. It just doesn't make sense to warn on code doing something common like
std::vector vec{1};
Remember that the compiler only warns about really unwanted stuff, like undefined behavior. It has no way of knowing that in the definition above, you meant to call the constructor taking a size argument. For all it knows you actually want to have a vector with one element! It can't read your mind :)
The answer to your second question is basically yes. You can always add a dummy parameter like struct {} dummy; to avoid using the constructor with initializer list, but really, the only same solution is just to use parentheses instead of braces (or don't break the interface suddenly).
If you want to change every portion of code that uses list initialization, you can delete the initializer list constructor, change them to braces, and then implement the constructor correctly. I would consider such change a breaking one, and deal with it appropriately. The other idea would have been to come up with the initializer list use case beforehand, and implement it right away.
Suppose I have a class like this:
class Foo
{
public:
Foo(int something) {}
};
And I create it using this syntax:
Foo f{10};
Then later I add a new constructor:
class Foo
{
public:
Foo(int something) {}
Foo(std::initializer_list<int>) {}
};
What happens to the construction of f? My understanding is that it will no longer call the first constructor but instead now call the init list constructor. If so, this seems bad. Why are so many people recommending using the {} syntax over () for object construction when adding an initializer_list constructor later may break things silently?
I can imagine a case where I'm constructing an rvalue using {} syntax (to avoid most vexing parse) but then later someone adds an std::initializer_list constructor to that object. Now the code breaks and I can no longer construct it using an rvalue because I'd have to switch back to () syntax and that would cause most vexing parse. How would one handle this situation?
What happens to the construction of f? My understanding is that it will no longer call the first constructor but instead now call the init list constructor. If so, this seems bad. Why are so many people recommending using the {} syntax over () for object construction when adding an initializer_list constructor later may break things silently?
On one hand, it's unusual to have the initializer-list constructor and the other one both be viable. On the other hand, "universal initialization" got a bit too much hype around the C++11 standard release, and it shouldn't be used without question.
Braces work best for like aggregates and containers, so I prefer to use them when surrounding some things which will be owned/contained. On the other hand, parentheses are good for arguments which merely describe how something new will be generated.
I can imagine a case where I'm constructing an rvalue using {} syntax (to avoid most vexing parse) but then later someone adds an std::initializer_list constructor to that object. Now the code breaks and I can no longer construct it using an rvalue because I'd have to switch back to () syntax and that would cause most vexing parse. How would one handle this situation?
The MVP only happens with ambiguity between a declarator and an expression, and that only happens as long as all the constructors you're trying to call are default constructors. An empty list {} always calls the default constructor, not an initializer-list constructor with an empty list. (This means that it can be used at no risk. "Universal" value-initialization is a real thing.)
If there's any subexpression inside the braces/parens, the MVP problem is already solved.
Retrofitting classes with initializer lists in updated code is something that sounds like it will be a common thing to happen. So people start using {} syntax for existing constructors before the class is updated, and we want to automatically catch any old uses, especially those used in templates where they may be overlooked.
If I had a class like vector that took a size, then arguably using {} syntax is "wrong", but for the transition we want to catch that anyway. Constructing C c1 {val} means take some (one, in this case) values for the collection, and C c2 (arg) means use val as a descriptive piece of metadata for the class.
In order to support both uses, when the type of element happens to be compatible with the descriptive argument, code that used C c2 {arg} will change meaning. There seems to be no way around it in that case if we want to support both forms with different meanings.
So what would I do? If the compiler provides some way to issue a warning, I'd make the initializer list with one argument give a warning. That sounds tricky not to mention compiler specific, so I'd make a general template for that, if it's not already in Boost, and promote its use.
Other than containers, what other situations would have initializer list and single argument constructors with different meanings where the single argument isn't something of a very distinct type from what you'd be using with the list? For non-containers, it might suffice to notice that they won't be confused because the types are different or the list will always have multiple elements. But it's good to think about that and take additional steps if they could be confused in this manner.
For a non-container being enhanced with initializer_list features, it might be sufficient to specifically avoid designing a one-argument constructor that can be mistaken. So, the one-arg constructor would be removed in the updated class, or the initializer list would require other (possibly tag) arguments first. That is, don't do that, under penalty of pie-in-face at the code review.
Even for container-like classes, a class that's not a standard library class could impose that the one-arg constructor form is no longer available. E.g. C c3 (size); would have to be written as C c3 (size, C()); or designed to take an enumeration argument also, which is handy to specify initialized to one value vs. reserved size, so you can argue it's a feature and point out code that begins with a separate call to reserve. So again, don't do that if I can reasonably avoid it.
It recently occurred to me that more often than I'd like to admit, the fix for a "strange bug that only materializes occasionally" is to simply initialize the one member variable of a class I forgot to add to the initializer list.
To prevent wasting time on such bugs in the future, I've been toying with the idea to completely abandon built-in primitive types and replacing them with wrapper classes that act exactly like their primitive counterparts, except that they'll always be initialized.
I'm proficient enough in C++ to know that I can write classes that get very close to that goal. I.e. I'm confident I can write a MyInt class that behaves very much like a real int. But I also know C++ well enough to know that there's probably an arcane thing or two I'd be missing ;)
Has anyone done something like this before? Any pointers to relevant documentation, or list of pitfalls to watch out for? Is this even a good idea, or are there any downsides I'm not seeing?
Thanks!
EDIT: Thanks everyone for your comments, here's an update. I started playing with Jarod42's wrapper snippet and see if I could convert a small hobby project codebase. Not completely unexpectedly, it's quite a PITA, but would probably be doable. It does start feeling like a very big hammer for this problem though.
Compiler warnings are not a real option, since only a single one (-Weffc++) seems to find the problem and it only exists for gcc, i.e. this is not a safe, portable solution.
My conclusion so far is to start using C++11's in-class initialization for all primitive members as suggested by Praetorian below. I.e. something like
struct C {
int x = 0; // yay C++11!
};
.. and hope that after some time, ommitting such initialized feels as "naked" as declaring an unititialized variable in code within a function (which I've stopped doing a long time ago). This seems much less error-prone than trying to keep the initializer list(s) up-to-date, because it's right there with the declaration.
C++11 makes it quite easy to avoid this pitfall by allowing in-class initialization of non-static data members. For example:
struct foo
{
foo(int i) : i(i) {} // here i will be set to argument value
foo() {} // here i will be zero
int i = {}; // value (zero) initialize i
};
This feature is not restricted to trivial types either. So just start initializing data members as soon you declare them within the class definition.
If your compiler doesn't support this feature, try cranking the warning level up to see if it'll tell you about uninitialized data members. For instance, g++ has the -Weffc++ flag that will warn you about uninitialized data members (amongst other things).
Next thing to try would be a static analysis tool to catch these mistakes.
In conclusion, there are several things I'd try before going down the path of boxing every trivial data type.
It's a lot easier to just turn on compiler warnings. Most decent compilers will warn you about using uninitialized variables - granted there are a few edge cases that compilers can miss.
Try better debugging.
You can switch on compiler warnings for uninitialized variables (see this SO question).
You can also use programs that do this and other checking of your code (static analysis), like cppcheck, which lists uninitialised variables.
Also try changing how you code. In C++ you have control over when memory is allocated, what constructors are used, and so on. If you are coding in a style where you construct objects with partial data, and then later populate other pieces, then you are likely to run into uninitialised variables a lot. But if you make sure that all the constructors construct a valid object in a valid state, and avoid having multiple points of truth (see Single Point Of Truth principle), then your errors are more likely to be caught by the compiler - you would have to pass an uninitialised variable as a value (which VC++ will warn you on), or have the wrong number or type of things in your constructor call (compile error), etc.
Might I suggest you pick out your most recent source of this kind of thing, complete with the chain of structures that got you there, and ask how you might restructure it better? There is a particularly disciplined style of coding in C++ which makes maximum use of the compiler and thus tips you off as early as possible. Really the bugs you create when using that style shouldn't be anything less than multi-threading issues, resource issues etc.
I worry that if you initialise everything just to prevent such errors, you'll miss out on learning that disciplined style which extends so much further than uninitialised variables.
Following may help:
template <typename T>
class My
{
public:
constexpr My() : t() {}
constexpr My(const T&t) : t(t) {}
operator T& () { return t; }
constexpr operator const T& () const { return t; }
const T* operator& () const { return &t; }
T* operator& () { return &t; }
private:
T t;
};
Note sure if it is better to check that My<int> is used in place of each possible uninitialized int...
But note that you have to do special job for union anyway.
Writing code like
struct S
{
this() // compile-time error
{
}
}
gives me an error message saying
default constructor for structs only allowed with #disable and no body.
Why??
This is one of cases much more tricky than one can initially expect.
One of important and useful features D has over C++ is that every single type (including all user types) has some initial non-garbage value that can be evaluated at compile-time. It is used as T.init and has two important use cases:
Template constraints can use T.init value to check if certain operations can be done on given type (quoting Kenji Hara's snippet):
template isSomething(T) {
enum isSomething = is(typeof({
//T t1; // not good if T is nested struct, or has #disable this()
//T t2 = void; auto x = t2; // not good if T is non-mutable type
T t = T.init; // avoid default construct check
...use t...
}));
}
Your variables are always initialized properly unless you explicitly use int i = void syntax. No garbage possible.
Given that, difficult question arises. Should we guarantee that T() and T.init are the same (as many programmers coming from C++ will expect) or allow default construction that may easily destroy that guarantee. As far as I know, decision was made that first approach is safer, despite being surprising.
However, discussions keep popping with various improvements proposed (for example, allowing CTFE-able default constructor). One such thread has appeared very recently.
It stems from the fact that all types in D must have a default value. There are quite a few places where a type's init value gets used, including stuff like default-initializing member variables and default-initializing every value in an array when it's allocated, and init needs to be known at compile for a number of those cases. Having init provides quite a few benefits, but it does get in the way of having a default constructor.
A true default constructor would need to be used in all of the places that init is used (or it wouldn't be the default), but allowing arbitrary code to run in a number of the cases that init is used would be problematic at best. At minimum, you'd probably be forced to make it CTFE-able and possibly pure. And as soon as you start putting restrictions like that on it, pretty soon, you might as well just directly initialize all of the member variables to what you want (which is what happens with init), as you wouldn't be gaining much (if anything) over that, which would make having default constructors pretty useless.
It might be possible to have both init and a default constructor, but then the question comes up as to when one is used over the other, and the default constructor wouldn't really be the default anymore. Not to mention, it could become very confusing to developers as to when the init value was used and when the default constructor was used.
Now, we do have the ability to #disable the init value of a struct (which causes its own set of problems), in which case, it would be illegal to use that struct in any situation that required init. So, it might be possible to then have a default constructor which could run arbitrary code at runtime, but what the exact consequences of that would be, I don't know. However, I'm sure that there are cases where people would want to have a default constructor that would require init and therefore wouldn't work, because it had been #disabled (things like declaring arrays of the type would probably be one of them).
So, as you can see, by doing what D has done with init, it's made the whole question of default constructors much more complicated and problematic than it is in other languages.
The normal way to get something akin to default construction is to use a static opCall. Something like
struct S
{
static S opCall()
{
//Create S with the values that you want and return it.
}
}
Then whenever you use S() - e.g.
auto s = S();
then the static opCall gets called, and you get a value that was created at runtime. However, S.init will still be used any place that it was before (including S s;), and the static opCall will only be used when S() is used explicitly. But if you couple that with #disable this() (which disables the init property), then you get something akin to what I described earlier where we might have default constructors with an #disabled init.
We may or may not end up with default constructors being added to the language eventually, but there are a number of technical problems with adding them due to how init and the language work, and Walter Bright doesn't think that they should be added. So, for default constructors to be added to the language, someone would have to come up with a really compelling design which appropriately resolves all of the issues (including convincing Walter), and I don't expect that to happen, but we'll see.
The standard defines that Unions cannot be used as Base class, but is there any specific reasoning for this? As far as I understand Unions can have constructors, destructors, also member variables, and methods to operate on those varibales. In short a Union can encapsulate a datatype and state which might be accessed through member functions. Thus it in most common terms qualifies for being a class and if it can act as a class then why is it restricted from acting as a base class?
Edit: Though the answers try to explain the reasoning I still do not understand how Union as a Derived class is worst than when Union as just a class. So in hope of getting more concrete answer and reasoning I will push this one for a bounty. No offence to the already posted answers, Thanks for those!
Tony Park gave an answer which is pretty close to the truth. The C++ committee basically didn't think it was worth the effort to make unions a strong part of C++, similarly to the treatment of arrays as legacy stuff we had to inherit from C but didn't really want.
Unions have problems: if we allow non-POD types in unions, how do they get constructed? It can certainly be done, but not necessarily safely, and any consideration would require committee resources. And the final result would be less than satisfactory, because what is really required in a sane language is discriminated unions, and bare C unions could never be elevated to discriminated unions in way compatible with C (that I can imagine, anyhow).
To elaborate on the technical issues: since you can wrap a POD-component only union in a struct without losing anything, there's no advantage allowing unions as bases. With POD-only union components, there's no problem with explicit constructors simply assigning one of the components, nor with using a bitblit (memcpy) for compiler generated copy constructor (or assignment).
Such unions, however, aren't useful enough to bother with except to retain them so existing C code can be considered valid C++. These POD-only unions are broken in C++ because they fail to retain a vital invariant they possess in C: any data type can be used as a component type.
To make unions useful, we must allow constructable types as members. This is significant because it is not acceptable to merely assign a component in a constructor body, either of the union itself, or any enclosing struct: you cannot, for example, assign a string to an uninitialised string component.
It follows one must invent some rules for initialising union component with mem-initialisers, for example:
union X { string a; string b; X(string q) : a(q) {} };
But now the question is: what is the rule? Normally the rule is you must initialise every member and base of a class, if you do not do so explicitly, the default constructor is used for the remainder, and if one type which is not explicitly initialised does not have a default constructor, it's an error [Exception: copy constructors, the default is the member copy constructor].
Clearly this rule can't work for unions: the rule has to be instead: if the union has at least one non-POD member, you must explicitly initialise exactly one member in a constructor. In this case, no default constructor, copy constructor, assignment operator, or destructor will be generated and if any of these members are actually used, they must be explicitly supplied.
So now the question becomes: how would you write, say, a copy constructor? It is, of course quite possible to do and get right if you design your union the way, say, X-Windows event unions are designed: with the discriminant tag in each component, but you will have to use placement operator new to do it, and you will have to break the rule I wrote above which appeared at first glance to be correct!
What about default constructor? If you don't have one of those, you can't declare an uninitialised variable.
There are other cases where you can determine the component externally and use placement new to manage a union externally, but that isn't a copy constructor. The fact is, if you have N components you'd need N constructors, and C++ has a broken idea that constructors use the class name, which leaves you rather short of names and forces you to use phantom types to allow overloading to choose the right constructor .. and you can't do that for the copy constructor since its signature is fixed.
Ok, so are there alternatives? Probably, yes, but they're not so easy to dream up, and harder to convince over 100 people that it's worthwhile to think about in a three day meeting crammed with other issues.
It is a pity the committee did not implement the rule above: unions are mandatory for aligning arbitrary data and external management of the components is not really that hard to do manually, and trivial and completely safe when the code is generated by a suitable algorithm, in other words, the rule is mandatory if you want to use C++ as a compiler target language and still generate readable, portable code. Such unions with constructable members have many uses but the most important one is to represent the stack frame of a function containing nested blocks: each block has local data in a struct, and each struct is a union component, there is no need for any constructors or such, the compiler will just use placement new. The union provides alignment and size, and cast free component access.
[And there is no other conforming way to get the right alignment!]
Therefore the answer to your question is: you're asking the wrong question. There's no advantage to POD-only unions being bases, and they certainly can't be derived classes because then they wouldn't be PODs. To make them useful, some time is required to understand why one should follow the principle used everywhere else in C++: missing bits aren't an error unless you try to use them.
Union is a type that can be used as any one of its members depending on which member has been set - only that member can be later read.
When you derive from a type the derived type inherits the base type - the derived type can be used wherever the base type could be. If you could derive from a union the derived class could be used (not implicitly, but explicitly through naming the member) wherever any of the union members could be used, but among those members only one member could be legally accessed. The problem is the data on which member has been set is not stored in the union.
To avoid this subtle yet dangerous contradiction that in fact subverts a type system deriving from a union is not allowed.
Bjarne Stroustrup said 'there seems little reason for it' in The Annotated C++ Reference Manual.
The title asks why unions can't be a base class, but the question appears to be about unions as a derived class. So, which is it?
There's no technical reason why unions can't be a base class; it's just not allowed. A reasonable interpretation would be to think of the union as a struct whose members happen to potentially overlap in memory, and consider the derived class as a class that inherits from this (rather odd) struct. If you need that functionality, you can usually persuade most compilers to accept an anonymous union as a member of a struct. Here's an example, that's suitable for use as a base class. (And there's an anonymous struct in the union for good measure.)
struct V3 {
union {
struct {
float x,y,z;
};
float f[3];
};
};
The rationale for unions as a derived class is probably simpler: the result wouldn't be a union. Unions would have to be the union of all their members, and all of their bases. That's fair enough, and might open up some interesting template possibilities, but you'd have a number of limitations (all bases and members would have to be POD -- and would you be able to inherit twice, because a derived type is inherently non-POD?), this type of inheritance would be different from the other type the language sports (OK, not that this has stopped C++ before) and it's sort of redundant anyway -- the existing union functionality would do just as well.
Stroustrup says this in the D&E book:
As with void *, programmers should know that unions ... are inherently dangerous, should be avoided wherever possible, and should be handled with special care when actually needed.
(The elision doesn't change the meaning.)
So I imagine the decision is arbitrary, and he just saw no reason to change the union functionality (it works fine as-is with the C subset of C++), and so didn't design any integration with the new C++ features. And when the wind changed, it got stuck that way.
I think you got the answer yourself in your comments on EJP's answer.
I think unions are only included in C++ at all in order to be backwards compatible with C. I guess unions seemed like a good idea in 1970, on systems with tiny memory spaces. By the time C++ came along I imagine unions were already looking less useful.
Given that unions are pretty dangerous anyway, and not terribly useful, the vast new opportunities for creating bugs that inheriting from unions would create probably just didn't seem like a good idea :-)
Here's my guess for C++ 03.
As per $9.5/1, In C++ 03, Unions can not have virtual functions. The whole point of a meaningful derivation is to be able to override behaviors in the derived class. If a union cannot have virtual functions, that means that there is no point in deriving from a union.
Hence the rule.
You can inherit the data layout of a union using the anonymous union feature from C++11.
#include <cstddef>
template <size_t N,typename T>
struct VecData {
union {
struct {
float x;
float y;
float z;
};
float a[N];
};
};
template <size_t N, typename T>
class Vec : public VecData<N,T> {
//methods..
};
In general its almost always better not work with unions directly but enclose them within a struct or class. Then you can base your inheritance off the struct outer layer and use unions within if you need to.