I have a class that represents a data stream, it basically
reads or writes into a file, but first the data are being encrypted/decrypted and there is also an underlying codec object that handles the media being accessed.
I'm trying to write this class in a RAII way and I'd like a clean, nice, usable design.
What bothers me is that right now there is a lot of work being done in the constructor.
Before the object's I/O routines can be safely used, first of all the codec needs to initialized (this isn't very demanding), but then a key is taken into account and crypto and other things are intialized - these require some analysis of the media which takes quite a lot of computation.
Right now I'm doing all this in the constructor, which makes it take a long time. I'm thinking of moving the crypto init stuff (most work) out of the ctor into a separate method (say, Stream::auth(key)), but then again, this would move some responsibility to the user of the class, as they'd be required to run auth() before they call any I/O ops. This also means I'd have to place a check in the I/O calls to verify that auth() had been called.
What do you think is a good design?
P.S. I did read similar question but I wasn't really able to apply the answers on this case. They're mostly like "It depens"... :-/
Thanks
The only truly golden unbreakable rule is that the class must be in a valid, consistent, state after the constructor has executed.
You can choose to design the class so that it is in some kind of "empty"/"inactive" state after the constructor has run, or you can put it directly into the "active" state that it is intended to be in.
Generally, it should be preferred to have the constructor construct your class. Usually, you wouldn't consider a class fully "constructed", until it's actually ready to be used, but exceptions do exist.
However, keep in mind that in RAII, one of the key ideas is that the class shouldn't exist unless it is ready, initalized and usable. That's why its destructor does the cleanup, and that's why its constructor should do the setup.
Again, exceptions do exist (for example, some RAII objects allow you to release the resource and perform cleanup early, and then have the destructor do nothing.)
So at the end of the day, it depends, and you'll have to use your own judgment.
Think of it in terms of invariants. What can I rely on if I'm given an instance of your class? The more I can safely assume about it, the easier it is to use. If it might be ready to use, and might be in some "constructed, but not initialized" state, and might be in a "cleaned up but not destroyed" state, then using it quickly becomes painful.
On the other hand, if it guarantees that "if the object exists, it can be used as-is", then I'll know that I can use it without worrying about what was done to it before.
It sounds like your problem is that you're doing too much in the constructor.
What if you split the work up into multiple smaller classes? Have the codec be initialized separately, then I can simply pass the already-initialized codec to your constructor. And all the authentication and cryptography stuff and whatnot could possibly be moved out into separate objects as well, and then simply passed to "this" constructor once they're ready.
Then the remaining constructor doesn't have to do everything from scratch, but can start from a handful of helper objects which are already initialized and ready to be used, so it just has to connect the dots.
you could just place the check in the IO calls to see if auth has been called, and if it has, then continue, if not, then call it.
this removes the burden from the user, and delays the expense until needed.
Basically, this all boils down to which design to choose from the following three:
Designs
Disclaimer: this post is not encouraging the use of exception specifications or exceptions for that matter. The errors may equivalently be reported using error codes if you wish. Exception specifications as used here are just meant to illustrate when different errors can occur using a concise syntax.
Design 1
This is the most recurring design out there, and totally non-RAII. The constructor just puts the object in some stale state and each instance must be initialized manually after construction takes place.
class SecureStream
{
public:
SecureStream();
void initialize(Stream&,const Key&) throw(InvalidKey,AlreadyInitialized);
std::size_t get( void*,std::size_t) throw(NotInitialized,IOError);
std::size_t put(const void*,std::size_t) throw(NotInitialized,IOError);
};
Pros:
Users have control over when to invoke the "heavy" initialization process
The object can be created before the key exists. This is important for frameworks such as COM, where all objects must have a default constructor (the CoCreateObject() does not allow you to forward extra arguments the object constructor). Sometimes, there are still workarounds, such as a builder object.
Cons:
Objects must be checked for the stale state before using the object. This may be enforced by the object by returning an error code or throwing an exception. Personally, I hate objects that allow me to use them and just appear to ignore my calls (e.g. a failed std::ostream).
Design 2
This is the RAII approch. Make sure the object is 100% usable with no extra artefacts (e.g. manually calling stream.initialize(...); on each instance.
class SecureStream
{
public:
SecureStream(Stream&,const Key&) throw(InvalidKey);
std::size_t get( void*,std::size_t) throw(IOError);
std::size_t put(const void*,std::size_t) throw(IOError);
};
Pros:
The object can always be assumed to be in a valid state. This is so much simpler to use.
Cons:
Constructor might take a long time to execute.
All required arguments must be available at the instance construction. This has once in a while been a problem for me, especially if most other objects in the code base use design #1.
Design 3
Somewhat of a compromise between the two previous cases. Don't initialize yet, but have the other methods lazily invoke the internal .initialize(...) method when necessary.
class SecureStream
{
public:
SecureStream(Stream&,const Key&);
std::size_t get( void*,std::size_t) throw(InvalidKey,IOError);
std::size_t put(const void*,std::size_t) throw(InvalidKey,IOError);
private:
void initialize() throw(InvalidKey);
};
Pros:
Almost as easy to use as design #1. Almost (see below).
Cons:
If the initialization step may fail, it may now fail anywhere there is a first call to any of the public methods. Proper error handling for this scenario is extremely difficult.
Discussion
If you absolutely must pay for the initialization for every instance, then design #1 is out of the question as it just results in more bugs in the software.
The question is just about when to pay for the initialization cost. Do you prefer paying it upfront, or on first use? In most scenarios, I prefer paying upfront because I don't want to assume users can handle errors later in the program. However, there might be specific threading semantics in your program, and you might not be able to stall threads at creation time (or, conversely, at use time).
In any case, you can still get the benefits of design #3 by using dynamic allocation of the class in design #2.
Conclusion
Basically, if the only reason you are hesitating is for some philosophical ideal where constructors execute quickly, I would just go with the pure RAII design.
There's no hard and fast rule on this, but in general it's best to avoid heavy constructors for two reasons that come to mind (maybe others as well):
The order of the objects created intializer list can give rise to subtle bugs
What to do with exceptions in the constructor? Will you need to handle partially-constructed objects in your app?
Related
After many years of coding scientific software in C++, I still can't seem to get used to exceptions and I've no idea when I should use them. I know that using them for controlling program flow is a big no-no, but otherwise than that... consider the following example (excerpt from a class that represents an image mask and lets the user add areas to it as polygons):
class ImageMask
{
public:
ImageMask() {}
ImageMask(const Size2DI &imgSize);
void addPolygon(const PolygonI &polygon);
protected:
Size2DI imgSize_;
std::vector<PolygonI> polygons_;
};
The default constructor for this class creates a useless instance, with an undefined image size. I don't want the user to be able to add polygons to such an object. But I'm not sure how to handle that situation. When the size is undefined, and addPolygon() is called, should I:
Silently return,
assert(imgSize_.valid) to detect violations in code using this class and fix them before a release,
throw an exception?
Most of the time I go either with 1) or 2) (depending on my mood), because it seems to me exceptions are costly, messy and simply overkill for such a simple scenario. Some insight please?
The general rule is that you throw an exception when you cannot perform the desired operation. So in your case, yes, it does make sense to throw an exception when addPolygon is called and the size is undefined or inconsistent.
Silently returning is almost always the wrong thing to do. assert is not a good error-handling technique (it is more of a design/documentation technique).
However, in your case a redesign of the interface to make an error condition impossible or unlikely may be better. For example, something like this:
class ImageMask
{
public:
// Constructor requires collection of polygons and size.
// Neither can be changed after construction.
ImageMask(std::vector<PolygonI>& polygons, size_t size);
}
or like this
class ImageMask
{
public:
class Builder
{
public:
Builder();
void addPolygon();
};
ImageMask(const Builder& builder);
}
// used like this
ImageMask::Builder builder;
builder.addPolygon(polyA);
builder.addPolygon(polyB);
ImageMask mask(builder);
I would try to avoid any situation where it's possible to create data that is in some kind of useless state. If you need a polygon that is not empty, than don't let empty polygons be created and you save yourself much trouble because the compiler will enforce that there are no empty polygons.
I never use silent returns, because they hide bugs and this makes finding bugs much more complicated than it have to be.
I use asserts when I detect that the program is in a state that it only can be in, if there is a bug in the software. In your example, if you check in the c'tor that takes a Size2DI, that this size is not empty, than asserting if the size stored is not empty, is useful to detect bugs. Asserts should not have side effect and it must be possible to remove them, without changing the behavior of the software. I find them very useful, to find my own bugs and to document, the current state of the object / function etc.
If it's very likely, that a runtime error will be handled directly by a caller of a function, I would use conventional return values. If it's very likely, that this error situation have to be communicated over several function calls at the call stack, I prefer exceptions. In doubt I offer two function.
kind regards
Torsten
To me, 1 is a no option. Whether it is 2 or 3 depends on the design of your program/library, whether you consider (and document) default-constructing image mask and then adding polygons a valid or invalid usage of your component. This is an important design decision. I recommend reading this article by Matthew Wilson.
Note that you have more options:
Invent your own assert that always calls std::terminate and does additional logging
Disable the default constructor (as others already pointed out) -- this is my favourite
"Silently return" - that's real 'the big no-no'. The program should know what's wrong.
"assert" - the second rule is that asserts using only if normal program's flow couldn't be restored.
"throw exception" - yes, this right and good technique. Just take care about exception-safety. There are many articles about exception-safe coding on GotW.
Don't afraid exceptions. They don't bite. :) If you'll take this technique enough, you'll be a strong coder. ;)
This may be a subjective question, but I'm more or less asking it and hoping that people share their experiences. (As that is the biggest thing which I lack in C++)
Anyways, suppose I have -for some obscure reason- an initialize function that initializes a datastructure from the heap:
void initialize() {
initialized = true;
pointer = new T;
}
now When I would call the initialize function twice, an memory leak would happen (right?). So I can prevent this is multiple ways:
ignore the call (just check wether I am initialized, and if I am don't do anything)
Throw an error
automatically "cleanup" the code and then reinitialize the thing.
Now what is generally the "best" method, which helps keeping my code manegeable in the future?
EDIT: thank you for the answers so far. However I'd like to know how people handle this is a more generic way. - How do people handle "simple" errors which can be ignored. (like, calling the same function twice while only 1 time it makes sense).
You're the only one who can truly answer the question : do you consider that the initialize function could eventually be called twice, or would this mean that your program followed an unexpected execution flow ?
If the initialize function can be called multiple times : just ignore the call by testing if the allocation has already taken place.
If the initialize function has no decent reason to be called several times : I believe that would be a good candidate for an exception.
Just to be clear, I don't believe cleanup and regenerate to be a viable option (or you should seriously consider renaming the function to reflect this behavior).
This pattern is not unusual for on-demand or lazy initialization of costly data structures that might not always be needed. Singleton is one example, or for a class data member that meets those criteria.
What I would do is just skip the init code if the struct is already in place.
void initialize() {
if (!initialized)
{
initialized = true;
pointer = new T;
}
}
If your program has multiple threads you would have to include locking to make this thread-safe.
I'd look at using boost or STL smart pointers.
I think the answer depends entirely on T (and other members of this class). If they are lightweight and there is no side-effect of re-creating a new one, then by all means cleanup and re-create (but use smart pointers). If on the other hand they are heavy (say a network connection or something like that), you should simply bypass if the boolean is set...
You should also investigate boost::optional, this way you don't need an overall flag, and for each object that should exist, you can check to see if instantiated and then instantiate as necessary... (say in the first pass, some construct okay, but some fail..)
The idea of setting a data member later than the constructor is quite common, so don't worry you're definitely not the first one with this issue.
There are two typical use cases:
On demand / Lazy instantiation: if you're not sure it will be used and it's costly to create, then better NOT to initialize it in the constructor
Caching data: to cache the result of a potentially expensive operation so that subsequent calls need not compute it once again
You are in the "Lazy" category, in which case the simpler way is to use a flag or a nullable value:
flag + value combination: reuse of existing class without heap allocation, however this requires default construction
smart pointer: this bypass the default construction issue, at the cost of heap allocation. Check the copy semantics you need...
boost::optional<T>: similar to a pointer, but with deep copy semantics and no heap allocation. Requires the type to be fully defined though, so heavier on dependencies.
I would strongly recommend the boost::optional<T> idiom, or if you wish to provide dependency insulation you might fall back to a smart pointer like std::unique_ptr<T> (or boost::scoped_ptr<T> if you do not have access to a C++0x compiler).
I think that this could be a scenario where the Singleton pattern could be applied.
This question already has answers here:
Closed 12 years ago.
Possible Duplicate:
How much work should be done in a constructor?
I'm strugging with some advice I have in the back of my mind but for which I can't remember the reasoning.
I seem to remember at some point reading some advice (can't remember the source) that C++ constructors should not do real work. Rather, they should initialize variables only. The advice went on to explain that real work should be done in some sort of init() method, to be called separately after the instance was created.
The situation is I have a class that represents a hardware device. It makes logical sense to me for the constructor to call the routines that query the device in order to build up the instance variables that describe the device. In other words, once new instantiates the object, the developer receives an object which is ready to be used, no separate call to object->init() required.
Is there a good reason why constructors shouldn't do real work? Obviously it could slow allocation time, but that wouldn't be any different if calling a separate method immediately after allocation.
Just trying to figure out what gotchas I not currently considering that might have lead to such advice.
I remember that Scott Meyers in More Effective C++ recommends against having a superfluous default constructor. In that article, he also touched on using methods liked Init() to 'create' the objects. Basically, you have introduced an extra step which places the responsibility on the client of the class. Also, if you want to create an array of said objects, each of them would have to manually call Init(). You can have an Init function which the constructor can call inside for keeping the code tidy, or for the object to call if you implement a Reset(), but from experiences it is better to delete an object and recreate it rather than try to reset its values to default, unless the objects is created and destroyed many times real-time (say, particle effects).
Also, note that constructors can perform initialization lists which normal functions could not.
One reasons why one may caution against using constructors to do heavy allocation of resources is because it can be hard to catch exceptions in constructors. However, there are ways around it. Otherwise, I think constructors are meant to do what they are supposed to do - prepare an object for its initial state of execution (important for object creation is resource allocation).
The one reason not to do "work" in the constructor is that if an exception is thrown from there, the class destructor won't get called. But if you use RAII principles and don't rely on your destructor to do clean up work, then I feel it's better not to introduce a method which isn't required.
Depends on what you mean by real work. The constructor should put the object into a usable state, even if that state is a flag meaning it hasn't yet been initialised :-)
The only rationale I've ever come across for not doing real work would be the fact that the only way a constructor can fail is with an exception (and the destructor won't be called in that case). There is no opportunity to return a nice error code.
The question you have to ask yourself is:
Is the object usable without calling the init method?
If the answer to that is "No", I would be doing all that work in the constructor. Otherwise you'll have to catch the situation when a user has instantiated but not yet initialised and return some sort of error.
Of course, if you can re-initialise the device, you should provide some sort of init method but, in that case, I would still call that method from the constructor if the condition above is met.
In addition to the other suggestions regarding exception handling, one thing to consider when connecting to a hardware device is how your class will handle the situation where a device is not present or communication fails.
In the situation where you can't communicate with the device, you may need to provide some methods on your class to perform later initialization anyway. In that case, it may make more sense to just instantiate the object and then run through an initialization call. If the initialization fails, you can just keep the object around and try to initialize communication again at a later time. Or you may need to handle the situation where communication is lost after initialization. In either case, you will probably want to think about how you will design the class to handle communication problems in general and that may help you decide what you want to do in the constructor versus an initialization method.
When I've implemented classes that communicate with external hardware, I've found it easier to instantiate a "disconnected" object and provide methods for connecting and setting up initial status. This has generally provide more flexibility connecting/disconnecting/reconnecting with the device.
The only real reason is Testability. If your constructors are full of "real work", that usually means the objects can only be instantiated within a fully initialized, running application. It's a sign the object/class needs further decomposition.
When using a constructor and an Init() method you have a source of error. In my experience you will encounter situation where someone forgets to call it, and you might have a subtle bug in your hands. I would say you shouldn't do much work in your constructor but if any init method is needed, then you have a non-trivial construction scenario, and it is about time to look at the creational patterns. A builder function or a factory be wise to have a look at. With a private constructor making sure that no one except your factory or builder function actually build the objects, so you can be sure that it is always constructed correctly.
If your design allow for mistakes in implementation, someone will do those mistakes. My friend Murphy told me that ;)
In my field we work with loads of similar hardware related situations. Factories gives us both testability, security and better ways of failing construction.
It is worth considering lifetime issues and connecting/reconnecting, as Neal S. points out.
If you fail to connect to a device at the other end of a link then it is often the case that the 'device' at your end is usable and will be later if the other end gets its act together. Examples being network connections etc.
On the other hand if you try and access some local hardware device that does not exist and will never exist within the scope of your program (for example a graphics card that is not present) then I think this is a case where you want to know this in the constructor, so that the constructor can throw and the object can not exist. If you don't then you may end up with an object that is invalid and will always be so. Throwing in the constructor means that the object will not exist, and thus functions can't be called on that object. Obviously you need to be aware of cleanup issues if you throw in a constructor, but if you don't in cases like this then you typically end up with validation checks in all functions that may be called.
So I think that you should do enough in the constructor to ensure you have a valid, usable, object created.
I'd like to add my own experience there.
I won't say much about the traditional debate Constructor/Init... for example Google guidelines advise against anything the in the Constructor but that's because they advise against Exceptions and the 2 work together.
I can speak about a Connection class I use though.
When the Connection class is created, it will attempt to actually connect itself (at least, if not default constructed). If the Connection fails... the object is still constructed and you don't know about it.
When you try to use the Connection class you are thus in one of 3 cases:
no parameter has ever been precised > exception or error code
the object is actually connected > fine
the object is not connected, it will attempt to connect > this succeeds, fine, this fails, you get an exception or an error code
I think it's quite useful to have both. However, it means that in every single method actually using the connection, you need to test whether or not it works.
It's worth it though because of disconnection events. When you are connected, you may lose the connection without the object knowing about it. By encapsulating the connection self-check into a reconnect method that is called internally by all methods needing a working connection, you really isolate the developers from dealing with the issues... or at least as much as you can since when everything fails you have no other solution that letting them know :)
Doing "real work" in a constructor is best avoided.
If I setup database connections, open files etc inside a constructor and if in doing so one of them raise an exception then it would lead to a memory leak. This will compromise your application's exception safety.
Another reason to avoid doing work in a constructor is that it would make your application less testable. Suppose you are writing a credit-card payment processor. If say in CreditCardProcessor class's constructor you do all the work of connecting to a payment gateway, authentate and bill the credit card how do I ever write unit tests for CreditCardProcessor class?
Coming to your scenario, if the routines that query the device do not raise any exceptions and you are not going to test the class in isolation then there is its probably preferable to do work in the constructor and avoid calls to that extra init method.
There are a couple reasons I would use separate constructor/init():
Lazy/Delayed initialization. This allows you to create the object quickly, fast user response, and delay a more lengthy initialization for later or background processing. This is also a part of one or more design patterns concerning reusable object pools to avoid expensive allocation.
Not sure if this has a proper name, but perhaps when the object is created, the initialization information is unavailable or not understood by whoever is creating the object (for example, bulk generic object creation). Another section of code has the know-how to initialize it, but not create it.
As a personal reason, the destructor should be able to undo everything the constructor did. If that involves using internal init/deinit(), no problem, so long as they are mirror images of each other.
We're trying to build a class that provides the MFC CRecordset (or, really, the CODBCRecordset class) thread safety. Everything actually seem to work out pretty well for the various functions like opening and moving through the recordset (we enclose these calls with critical sections), however, one problem remains, a problem that seems to introduce deadlocks in practice.
The problem seems to lie in our constructor, like this:
CThreadSafeRecordset::CThreadSafeRecordset(void) : CODBCRecordset(g_db)
{ // <-- Deadlock!
}
The other thread might be one having ended up in CThreadSafeRecordset::Close() despite us guarding the enclosed Close call, but that doesn't really matter since the constructor is threading unaware. I assume the original CRecordset class is the culprit, doing bad things at construction time. I've looked around for programming techniques to work around this problem, but I'm unsure what could be the best solution? Since we have no code and can't control other code in our constructor, we can't wrap anything special in a critical section...?
Update: Thanks for the input; I've marked what I ended up with as the answer to my question. That, in combination with returning a shared_ptr as the returned instance for ease of updating the existing thread-unaware code.
You can make the CThreadSafeRecordset constructor private, then provide a public factory method that participates in your locking and returns an instance.
If there's no way to make CODBCRecordset move its thread-unsafe operations out of the constructor (a default constructor followed by an Initialize() call, say), you can always use composition instead of inheritance. Let CThreadSafeRecordset be a wrapper around CODBCRecordset instead of a subclass of it. That way, you can explicitly construct your recordset whenever you like, and can defend it with whatever rigor is appropriate.
The drawback, of course, is that you'll have to wrap every CODBCRecordset method you wish to expose, even the ones that don't relate to your threading guarantees. A C macro in the cpp file (so that it can't escape and afflict your clients) may help.
I have read multiple articles about why singletons are bad.
I know it has few uses like logging but what about initalizing and deinitializing.
Are there any problems doing that?
I have a scripting engine that I need to bind on startup to a library.
Libraries don't have main() so what should I use?
Regular functions or a Singleton.
Can this object be copied somehow:
class
{
public:
static void initialize();
static void deinitialize();
} bootstrap;
If not why do people hide the copy ctor, assignment operator and the ctor?
Libraries in C++ have a much simpler way to perform initialization and cleanup. It's the exact same way you'd do it for anything else. RAII.
Wrap everything that needs to be initialized in a class, and perform its initialization in the constructor. Voila, problems solved.
All the usual problems with singletons still apply:
You are going to need more than one instance, even if you hadn't planned for it. If nothing else, you'll want it when unit-testing. Each test should initialize the library from scratch so that it runs in a clean environment. That's hard to do with a singleton approach.
You're screwed as soon as these singletons start referencing each others. Because the actual initialization order isn't visible, you quickly end up with a bunch of circular references resulting in accessing uninitialized singletons or stack overflows or deadlocks or other fun errors which could have been caught at compile-time if you hadn't been obsessed with making everything global.
Multithreading. It's usually a bad idea to force all threads to share the same instance of a class, becaus it forces that class to lock and synchronize everything, which costs a lot of performance, and may lead to deadlocks.
Spaghetti code. You're hiding your code's dependencies every time you use a singleton or a global. It is no longer clear which objects a function depends on, because not all of them are visible as parameters. And because you don't need to add them as parameters, you easily end up adding far more dependencies than necessary. Which is why singletons are almost impossible to remove once you have them.
A singleton's purpose is to have only ONE instance of a certain class in your system.
The C'tor, D'tor and CC'tor are hidden, in order to have a single access point for receiving the only existing instance.
Usually the instance is static (could be allocated on the heap too) and private, and there's a static method (usually called GetInstance) which returns a reference to this instance.
The question you should ask yourself when deciding whether to have a singleton is : Do I really need to enforce having one object of this class?
There's also the inheritance problem - it can make things complicated if you are planning to inherit from a singleton.
Another problem is How to kill a singleton (the web is filled with articles about this issue)
In some cases it's better to have your private data held statically rather than having a singleton, all depends on the domain.
Note though, that if you're multi-threaded, static variables can give you a pain in the XXX...
So you should analyse your problem carefully before deciding on the design pattern you're going to use...
In your case, I don't think you need a singleton because you want the libraries to be initialized at the beginning, but it has nothing to do with enforcing having only one instance of your class. You could just hold a static flag (static bool Initialized) if all you want is to ensure initializing it only once.
Calling a method once is not reason enough to have a singleton.
It's a good practice to provide an interface for your libraries so that multiple modules (or threads) can use them simultaneously. If you really need to run some code when modules are loaded then use singletons to init parts that must be init once.
count the number of singletons in your design, and call this number 's'
count the number of threads in your design, and call this number 't'
now, raise t to the s-th power; this is roughly the number of hairs you are likely to lose while debugging the resulting code.
(I personally have run afoul of code that has over 50 singletons with 10 different threads all racing to get to .getInstance() first)