Long delegation chains in C++

Long delegation chains in C++ - c++

This is definitely subjective, but I'd like to try to avoid it
becoming argumentative. I think it could be an interesting question if
people treat it appropriately.
In my several recent projects I used to implement architectures where long delegation chains are a common thing.
Dual delegation chains can be encountered very often:
bool Exists = Env->FileSystem->FileExists( "foo.txt" );
And triple delegation is not rare at all:
Env->Renderer->GetCanvas()->TextStr( ... );
Delegation chains of higher order exist but are really scarce.
In above mentioned examples no NULL run-time checks are performed since the objects used are always there and are vital to the functioning of the program and
explicitly constructed when execution starts. Basically I used to split a delegation chain in these cases:
1) I reuse the object obtained through a delegation chain:
{ // make C invisible to the parent scope
clCanvas* C = Env->Renderer->GetCanvas();
C->TextStr( ... );
C->TextStr( ... );
C->TextStr( ... );
}
2) An intermediate object somewhere in the middle of the delegation chain should be checked for NULL before usage. Eg.
clCanvas* C = Env->Renderer->GetCanvas();
if ( C ) C->TextStr( ... );
I used to fight the case (2) by providing proxy objects so that a method can be invoked on non-NULL object leading to an empty result.
My questions are:
Is either of cases (1) or (2) a pattern or an antipattern?
Is there a better way to deal with long delegation chains in C++?
Here are some pros and cons I considered while making my choice:
Pros:
it is very descriptive: it is clear out of 1 line of code where did the object came from
long delegation chains look nice
Cons:
interactive debugging is labored since it is hard to inspect more than one temporary object in the delegation chain
I would like to know other pros and cons of the long delegation chains. Please, present your reasoning and vote based on how well-argued opinion is and not how well you agree with it.

I wouldn't go so far to call either an anti-pattern. However, the first has the disadvantage that your variable C is visible even after it's logically relevant (too gratuitous scoping).
You can get around this by using this syntax:
if (clCanvas* C = Env->Renderer->GetCanvas()) {
C->TextStr( ... );
/* some more things with C */
}
This is allowed in C++ (while it's not in C) and allows you to keep proper scope (C is scoped as if it were inside the conditional's block) and check for NULL.
Asserting that something is not NULL is by all means better than getting killed by a SegFault. So I wouldn't recommend simply skipping these checks, unless you're a 100% sure that that pointer can never ever be NULL.
Additionally, you could encapsulate your checks in an extra free function, if you feel particularly dandy:
template <typename T>
T notNULL(T value) {
assert(value);
return value;
}
// e.g.
notNULL(notNULL(Env)->Renderer->GetCanvas())->TextStr();

In my experience, chains like that often contains getters that are less than trivial, leading to inefficiencies. I think that (1) is a reasonable approach. Using proxy objects seems like an overkill. I would rather see a crash on a NULL pointer rather than using a proxy objects.

Such long chain of delegation should not happens if you follow the Law of Demeter. I've often argued with some of its proponents that they where holding themselves to it too conscientiously, but if you come to the point to wonder how best to handle long delegation chains, you should probably be a little more compliant with its recommendations.

Interesting question, I think this is open to interpretation, but:
My Two Cents
Design patterns are just reusable solutions to common problems which are generic enough to be widely applied in object oriented (usually) programming. Many common patterns will start you out with interfaces, inheritance chains, and/or containment relationships that will result in you using chaining to call things to some extent. The patterns are not trying to solve a programming issue like this though - chaining is just a side effect of them solving the functional problems at hand. So, I wouldn't really consider it a pattern.
Equally, anti-patterns are approaches that (in my mind) counter-act the purpose of design patterns. For example, design patterns are all about structure and the adaptability of your code. People consider a singleton an anti-pattern because it (often, not always) results in spider-web like code due to the fact that it inherently creates a global, and when you have many, your design deteriorates fast.
So, again, your chaining problem doesn't necessarily indicate good or bad design - it's not related to the functional objectives of patterns or the drawbacks of anti-patterns. Some designs just have a lot of nested objects even when designed well.
What to do about it:
Long delegation chains can definitely be a pain in the butt after a while, and as long as your design dictates that the pointers in those chains won't be reassigned, I think saving a temporary pointer to the point in the chain you're interested in is completely fine (function scope or less preferably).
Personally though, I'm against saving a permanent pointer to a part of the chain as a class member as I've seen that end up in people having 30 pointers to sub objects permanently stored, and you lose all conception of how the objects are laid out in the pattern or architecture you're working with.
One other thought - I'm not sure if I like this or not, but I've seen some people create a private (for your sanity) function that navigates the chain so you can recall that and not deal with issues about whether or not your pointer changes under the covers, or whether or not you have nulls. It can be nice to wrap all that logic up once, put a nice comment at the top of the function stating which part of the chain it gets the pointer from, and then just use the function result directly in your code instead of using your delegation chain each time.
Performance
My last note would be that this wrap-in-function approach as well as your delegation chain approach both suffer from performance drawbacks. Saving a temporary pointer lets you avoid the extra two dereferences potentially many times if you're using these objects in a loop. Equally, storing the pointer from the function call will avoid the over head of an extra function call every loop cycle.

For bool Exists = Env->FileSystem->FileExists( "foo.txt" ); I'd rather go for an even more detailed breakdown of your chain, so in my ideal world, there are the following lines of code:
Environment* env = GetEnv();
FileSystem* fs = env->FileSystem;
bool exists = fs->FileExists( "foo.txt" );
and why? Some reasons:
readability: my attention gets lost till I have to read to the end of the line in case of bool Exists = Env->FileSystem->FileExists( "foo.txt" ); It's just too long for me.
validity: regardles that you mentioned the objects are, if your company tomorrow hires a new programmer and he starts writing code, the day after tomorrow the objects might not be there. These long lines are pretty unfriendly, new people might get scared of them and will do something interesting such as optimising them... which will take more experienced programmer extra time to fix.
debugging: if by any chance (and after you have hired the new programmer) the application throws a segmentation fault in the long list of chain it is pretty difficult to find out which object was the guilty one. The more detailed the breakdown the more easier to find the location of the bug.
speed: if you need to do lots of calls for getting the same chain elements, it might be faster to "pull out" a local variable from the chain instead of calling a "proper" getter function for it. I don't know if your code is production or not, but it seems to miss the "proper" getter function, instead it seems to use only the attribute.

Long delegation chains are a bit of a design smell to me.
What a delegation chain tells me is that one piece of code has deep access to an unrelated piece of code, which makes me think of high coupling, which goes against the SOLID design principles.
The main problem I have with this is maintainability. If you're reaching two levels deep, that is two independent pieces of code that could evolve on their own and break under you. This quickly compounds when you have functions inside the chain, because they can contain chains of their own - for example, Renderer->GetCanvas() could be choosing the canvas based on information from another hierarchy of objects and it is difficult to enforce a code path that does not end up reaching deep into objects over the life time of the code base.
The better way would be to create an architecture that obeyed the SOLID principles and used techniques like Dependency Injection and Inversion Of Control to guarantee your objects always have access to what they need to perform their duties. Such an approach also lends itself well to automated and unit testing.
Just my 2 cents.

If it is possible I would use references instead of pointers. So delegates are guaranteed to return valid objects or throw exception.
clCanvas & C = Env.Renderer().GetCanvas();
For objects which can not exist i will provide additional methods such as has, is, etc.
if ( Env.HasRenderer() ) clCanvas* C = Env.Renderer().GetCanvas();

If you can guarantee that all the objects exist, I don't really see a problem in what you're doing. As others have mentioned, even if you think that NULL will never happen, it may just happen anyway.
This being said, I see that you use bare pointers everywhere. What I would suggest is that you start using smart pointers instead. When you use the -> operator, a smart pointer will usually throw if the pointer is NULL. So you avoid a SegFault. Not only that, if you use smart pointers, you can keep copies and the objects don't just disappear under your feet. You have to explicitly reset each smart pointer before the pointer goes to NULL.
This being said, it wouldn't prevent the -> operator from throwing once in a while.
Otherwise I would rather use the approach proposed by AProgrammer. If object A needs a pointer to object C pointed by object B, then the work that object A is doing is probably something that object B should actually be doing. So A can guarantee that it has a pointer to B at all time (because it holds a shared pointer to B and thus it cannot go NULL) and thus it can always call a function on B to do action Z on object C. In function Z, B knows whether it always has a pointer to C or not. That's part of its B's implementation.
Note that with C++11 you have std::smart_ptr<>, so use it!

Related

C++ std features and Binary size

I was told recently in a job interview their project works on building the smallest size binary for their application (runs embedded) so I would not be able to use things such as templating or smart pointers as these would increase the binary size, they generally seemed to imply using things from std would be generally a no go (not all cases).
After the interview, I tried to do research online about coding and what features from standard lib caused large binary sizes and I could find basically nothing in regards to this. Is there a way to quantify using certain features and the size impact they would have (without needing to code 100 smart pointers in a code base vs self managed for example).

This question probably deserves more attention than it’s likely to get, especially for people trying to pursue a career in embedded systems. So far the discussion has gone about the way that I would expect, specifically a lot of conversation about the nuances of exactly how and when a project built with C++ might be more bloated than one written in plain C or a restricted C++ subset.
This is also why you can’t find a definitive answer from a good old fashioned google search. Because if you just ask the question “is C++ more bloated than X?”, the answer is always going to be “it depends.”
So let me approach this from a slightly different angle. I’ve both worked for, and interviewed at companies that enforced these kinds of restrictions, I’ve even voluntarily enforced them myself. It really comes down to this. When you’re running an engineering organization with more than one person with plans to keep hiring, it is wildly impractical to assume everyone on your team is going to fully understand the implications of using every feature of a language. Coding standards and language restrictions serve as a cheap way to prevent people from doing “bad things” without knowing they’re doing “bad things”.
How you define a “bad thing” is then also context specific. On a desktop platform, using lots of code space isn’t really a “bad” enough thing to rigorously enforce. On a tiny embedded system, it probably is.
C++ by design makes it very easy for an engineer to generate lots of code without having to type it out explicitly. I think that statement is pretty self-evident, it’s the whole point of meta-programming, and I doubt anyone would challenge it, in fact it’s one of the strengths of the language.
So then coming back to the organizational challenges, if your primary optimization variable is code space, you probably don’t want to allow people to use features that make it trivial to generate code that isn’t obvious. Some people will use that feature responsibly and some people won’t, but you have to standardize around the least common denominator. A C compiler is very simple. Yes you can write bloated code with it, but if you do, it will probably be pretty obvious from looking at it.

(Partially extracted from comments I wrote earlier)
I don't think there is a comprehensive answer. A lot also depends on the specific use case and needs to be judged on a case-by-case basis.
Templates
Templates may result in code bloat, yes, but they can also avoid it. If your alternative is introducing indirection through function pointers or virtual methods, then the templated function itself may become bigger in code size simply because function calls take several instructions and removes optimization potential.
Another aspect where they can at least not hurt is when used in conjunction with type erasure. The idea here is to write generic code, then put a small template wrapper around it that only provides type safety but does not actually emit any new code. Qt's QList is an example that does this to some extend.
This bare-bones vector type shows what I mean:
class VectorBase
{
protected:
void** start, *end, *capacity;
void push_back(void*);
void* at(std::size_t i);
void clear(void (*cleanup_function)(void*));
};
template<class T>
class Vector: public VectorBase
{
public:
void push_back(T* value)
{ this->VectorBase::push_back(value); }
T* at(std::size_t i)
{ return static_cast<T*>(this->VectorBase::at(i)); }
~Vector()
{ clear(+[](void* object) { delete static_cast<T*>(object); }); }
};
By carefully moving as much code as possible into the non-templated base, the template itself can focus on type-safety and to provide necessary indirections without emitting any code that wouldn't have been here anyway.
(Note: This is just meant as a demonstration of type erasure, not an actually good vector type)
Smart pointers
When written carefully, they won't generate much code that wouldn't be there anyway. Whether an inline function generates a delete statement or the programmer does it manually doesn't really matter.
The main issue that I see with those is that the programmer is better at reasoning about code and avoiding dead code. For example even after a unique_ptr has been moved away, the destructor of the pointer still has to emit code. A programmer knows that the value is NULL, the compiler often doesn't.
Another issue comes up with calling conventions. Objects with destructors are usually passed on the stack, even if you declare them pass-by-value. Same for return values. So a function unique_ptr<foo> bar(unique_ptr<foo> baz) will have higher overhead than foo* bar(foo* baz) simply because pointers have to be put on and off the stack.
Even more egregiously, the calling convention used for example on Linux makes the caller clean up parameters instead of the callee. That means if a function accepts a complex object like a smart pointer by value, a call to the destructor for that parameter is replicated at every call site, instead of putting it once inside the function. Especially with unique_ptr this is so stupid because the function itself may know that the object has been moved away and the destructor is superfluous; but the caller doesn't know this (unless you have LTO).
Shared pointers are a different beast altogether, simply because they allow a lot of different tradeoffs. Should they be atomic? Should they allow type casting, weak pointers, what indirection is used for destruction? Do you really need two raw pointers per shared pointer or can the reference counter be accessed through shared object?
Exceptions, RTTI
Generally avoided and removed via compiler flags.
Library components
On a bare-metal system, pulling in parts of the standard library can have a significant effect that can only be measured after the linker step. I suggest any such project use continuous integration and tracks the code size as a metric.
For example I once added a small feature, I don't remember which, and in its error handling it used std::stringstream. That pulled in the entire iostream library. The resulting code exceeded my entire RAM and ROM capacity. IIRC the issue was that even though exception handling was deactivated, the exception message was still being set up.
Move constructors and destructors
It's a shame that C++'s move semantics aren't the same as for example Rust's where objects can be moved with a simple memcpy and then "forgetting" their original location. In C++ the destructor for a moved object is still invoked, which requires more code in the move constructor / move assignment operator, and in the destructor.
Qt for example accounts for such simple cases in its meta type system.

Is there a way to optimize shared_ptr for the case of permanent objects?

I've got some code that is using shared_ptr quite widely as the standard way to refer to a particular type of object (let's call it T) in my app. I've tried to be careful to use make_shared and std::move and const T& where I can for efficiency. Nevertheless, my code spends a great deal of time passing shared_ptrs around (the object I'm wrapping in shared_ptr is the central object of the whole caboodle). The kicker is that pretty often the shared_ptrs are pointing to an object that is used as a marker for "no value"; this object is a global instance of a particular T subclass, and it lives forever since its refcount never goes to zero.
Using a "no value" object is nice because it responds in nice ways to various methods that get sent to these objects, behaving in the way that I want "no value" to behave. However, performance metrics indicate that a huge amount of the time in my code is spent incrementing and decrementing the refcount of that global singleton object, making new shared_ptrs that refer to it and then destroying them. To wit: for a simple test case, the execution time went from 9.33 seconds to 7.35 seconds if I stuck nullptr inside the shared_ptrs to indicate "no value", instead of making them point to the global singleton T "no value" object. That's a hugely important difference; run on much larger problems, this code will soon be used to do multi-day runs on computing clusters. So I really need that speedup. But I'd really like to have my "no value" object, too, so that I don't have to put checks for nullptr all over my code, special-casing that possibility.
So. Is there a way to have my cake and eat it too? In particular, I'm imagining that I might somehow subclass shared_ptr to make a "shared_immortal_ptr" class that I could use with the "no value" object. The subclass would act just like a normal shared_ptr, but it would simply never increment or decrement its refcount, and would skip all related bookkeeping. Is such a thing possible?
I'm also considering making an inline function that would do a get() on my shared_ptrs and would substitute a pointer to the singleton immortal object if get() returned nullptr; if I used that everywhere in my code, and never used * or -> directly on my shared_ptrs, I would be insulated, I suppose.
Or is there another good solution for this situation that hasn't occurred to me?

Galik asked the central question that comes to mind regarding your containment strategy. I'll assume you've considered that and have reason to rely on shared_ptr as a communical containment strategy for which no alternative exists.
I have to suggestions which may seem controversial. What you've defined is that you need a type of shared_ptr that never has a nullptr, but std::shared_ptr doesn't do that, and I checked various versions of the STL to confirm that the customer deleter provided is not an entry point to a solution.
So, consider either making your own smart pointer, or adopting one that you change to suit your needs. The basic idea is to establish a kind of shared_ptr which can be instructed to point it's shadow pointer to a global object it doesn't own.
You have the source to std::shared_ptr. The code is uncomfortable to read. It may be difficult to work with. It is one avenue, but of course you'd copy the source, change the namespace and implement the behavior you desire.
However, one of the first things all of us did in the middle 90's when templates were first introduced to the compilers of the epoch was to begin fashioning containers and smart pointers. Smart pointers are remarkably easy to write. They're harder to design (or were), but then you have a design to model (which you've already used).
You can implement the basic interface of shared_ptr to create a drop in replacement. If you used typedefs well, there should be a limited few places you'd have to change, but at least a search and replace would work reasonably well.
These are the two means I'm suggesting, both ending up with the same feature. Either adopt shared_ptr from the library, or make one from scratch. You'd be surprised how quickly you can fashion a replacement.
If you adopt std::shared_ptr, the main theme would be to understand how shared_ptr determines it should decrement. In most implementations shared_ptr must reference a node, which in my version it calls a control block (_Ref). The node owns the object to be deleted when the reference count reaches zero, but naturally shared_ptr skips that if _Ref is null. However, operators like -> and *, or the get function, don't bother checking _Ref, they just return the shadow, or _Ptr in my version.
Now, _Ptr will be set to nullptr (or 0 in my source) when a reset is called. Reset is called when assigning to another object or pointer, so this works even if using assignment to nullptr. The point is, that for this new type of shared_ptr you need, you could simply change the behavior such that whenever that happens (a reset to nullptr), you set _Ptr, the shadow pointer in shared_ptr, to the "no value global" object's address.
All uses of *,get or -> will return the _Ptr of that no value object, and will correctly behave when used in another assignment, or reset is called again, because those functions don't rely upon the shadow pointer to act upon the node, and since in this special condition that node (or control block) will be nullptr, the rest of shared_ptr would behave as though it was pointing to nullptr correctly - that is, not deleting the global object.
Obviously this sounds crazy to alter std::pointer to such application specific behavior, but frankly that's what performance work tends to make us do; otherwise strange things, like abandoning C++ occasionally in order to obtain the more raw speed of C, or assembler.
Modifying std::shared_ptr source, taken as a copy for this special purpose, is not what I would choose (and, factually, I've faced other versions of your situation, so I have made this choice several times over decades).
To that end, I suggest you build a policy based smart pointer. I find it odd I suggested this earlier on another post today (or yesterday, it's 1:40am).
I refer to Alexandrescu's book from 2001 (I think it was Modern C++...and some words I don't recall). In that he presented loki, which included a policy based smart pointer design, which is still published and freely available on his website.
The idea should have been incorporated into shared_ptr, in my opinion.
Policy based design is implemented as the paradigm of a template class deriving from one or more of it's parameters, like this:
template< typename T, typename B >
class TopClass : public B {};
In this way, you can provide B, from which the object is built. Now, B may have the same construction, it may also be a policy level which derives from it's second parameter (or multiple derivations, however the design works).
Layers can be combined to implement unique behaviors in various categories.
For example:
std::shared_ptr and std::weak_ptrare separate classes which interact as a family with others (the nodes or control blocks) to provide smart pointer service. However, in a design I used several times, these two were built by the same top level template class. The difference between a shared_ptr and a weak_ptr in that design was the attachment policy offered in the second parameter to the template. If the type is instantiated with the weak attachment policy as the second parameter, it's a weak pointer. If it's given a strong attachment policy, it's a smart pointer.
Once you create a policy designed template, you can introduce layers not in the original design (expanding it), or to "intercept" behavior and specialize it like the one you currently require - without corrupting the original code or design.
The smart pointer library I developed had high performance requirements, along with a number of other options including custom memory allocation and automatic locking services to make writing to smart pointers thread safe (which std::shared_ptr doesn't provide). The interface and much of the code is shared, yet several different kinds of smart pointers could be fashioned simply by selecting different policies. To change behavior, a new policy could be inserted without altering the existing code. At present, I use both std::shared_ptr (which I used when it was in boost years ago) and the MetaPtr library I developed years ago, the latter when I need high performance or flexible options, like yours.
If std::shared_ptr had been a policy based design, as loki demonstrates, you'd be able to do this with shared_ptr WITHOUT having to copy the source and move it to a new namespace.
In any event, simply creating a shared pointer which points the shadow pointer to the global object on reset to nullptr, leaving the node pointing to null, provides the behavior you described.

C++: Safe to use locals of caller in function?

I think it's best if I describe the situation using a code example:
int MyFuncA()
{
MyClass someInstance;
//<Work with and fill someInstance...>
MyFuncB( &someInstance )
}
int MyFuncB( MyClass* instance )
{
//Do anything you could imagine with instance, *except*:
//* Allowing references to it or any of it's data members to escape this function
//* Freeing anything the class will free in it's destructor, including itself
instance->DoThis();
instance->ModifyThat();
}
And here come my straightforward questions:
Is the above concept guranteed, by C and C++ standards, to work as expected? Why (not)?
Is this considered doing this, sparingly and with care, bad practice?

Is the above concept guranteed, by C and C++ standards, to work as expected? Why (not)?
Yes, it will work as expected. someInstance is available through the scope of MyFuncA. The call to MyFuncB is within that scope.
Is this considered doing this, sparingly and with care, bad practice?
Don't see why.

I don't see any problem in actually using the pointer you were passed to call functions on the object. As long as you call public methods of MyClass, everything remains valid C/C++.
The actual instance you create at the beginning of MyFuncA() will get destroyed at the end of MyFuncA(), and you are guaranteed that the instance will remain valid for the whole execution of MyFuncB() because someInstance is still valid in the scope of MyFuncA().

Yes it will work. It does not matter if the pointer you pass into MyFuncB is on the stack or on the heap (in this specific case).
In regards for the bad practice part you can probably argue both ways. In general it's bad I think because if for any reason any object which is living outside of MyFuncA gets hold of the object reference then it will die a horrible death later on and cause sometime very hard to track bugs. It rewally depends how extensive the usage of the object becomes in MyFuncB. Especially when it starts involving another 3rd class it can get messy.

Others have answered the basic question, with "yeah, that's legal". And in the absence of greater architecture it is hard to call it good or bad practice. But I'll try and wax philosophical on the broader question you seem to be picking up about pointers, object lifetimes, and expectations across function calls...
In the C++ language, there's no built-in way to pass a pointer to a function and "enforce" that it won't stow that away after the call is complete. And since C++ pointers are "weak references" by default, the objects pointed to may disappear out from under someone you pass it to.
But explicitly weak pointer abstractions do exist, for instance in Qt:
http://doc.qt.nokia.com/latest/qweakpointer.html
These are designed to specifically encode the "paranoia" to the recipient that the object it is holding onto can disappear out from under it. Anyone dereferencing one sort of realizes something is up, and they have to take the proper cautions under the design contract.
Additionally, abstractions like shared pointer exist which signal a different understanding to the recipient. Passing them one of those gives them the right to keep the object alive as long as they want, giving you something like garbage collection:
http://doc.qt.nokia.com/4.7-snapshot/qsharedpointer.html
These are only some options. But in the most general sense, if you come up with any interesting invariant for the lifetimes of your object...consider not passing raw pointers. Instead pass some pointer-wrapping class that embodies and documents the rules of the "game" in your architecture.
(One of major the reasons to use C++ instead of other languages is the wealth of tools you have to do cool things like that, without too much runtime cost!)

i don't think there should be any problem with that barring, as you say, something that frees the object, or otherwise trashes its state. i think whatever unexpected things happen would not have anything to do with using the class this way. (nothing in life is guaranteed of course, but classes are intended to be passed around and operated on, whether it's a local variable or otherwise i do not believe is relevant.)
the one thing you would not be able to do is keep a reference to the class after it goes out of scope when MyFuncA() returns, but that's just the nature of the scoping rules.

Best practice when calling initialize functions multiple times?

This may be a subjective question, but I'm more or less asking it and hoping that people share their experiences. (As that is the biggest thing which I lack in C++)
Anyways, suppose I have -for some obscure reason- an initialize function that initializes a datastructure from the heap:
void initialize() {
initialized = true;
pointer = new T;
}
now When I would call the initialize function twice, an memory leak would happen (right?). So I can prevent this is multiple ways:
ignore the call (just check wether I am initialized, and if I am don't do anything)
Throw an error
automatically "cleanup" the code and then reinitialize the thing.
Now what is generally the "best" method, which helps keeping my code manegeable in the future?
EDIT: thank you for the answers so far. However I'd like to know how people handle this is a more generic way. - How do people handle "simple" errors which can be ignored. (like, calling the same function twice while only 1 time it makes sense).

You're the only one who can truly answer the question : do you consider that the initialize function could eventually be called twice, or would this mean that your program followed an unexpected execution flow ?
If the initialize function can be called multiple times : just ignore the call by testing if the allocation has already taken place.
If the initialize function has no decent reason to be called several times : I believe that would be a good candidate for an exception.
Just to be clear, I don't believe cleanup and regenerate to be a viable option (or you should seriously consider renaming the function to reflect this behavior).

This pattern is not unusual for on-demand or lazy initialization of costly data structures that might not always be needed. Singleton is one example, or for a class data member that meets those criteria.
What I would do is just skip the init code if the struct is already in place.
void initialize() {
if (!initialized)
{
initialized = true;
pointer = new T;
}
}
If your program has multiple threads you would have to include locking to make this thread-safe.

I'd look at using boost or STL smart pointers.

I think the answer depends entirely on T (and other members of this class). If they are lightweight and there is no side-effect of re-creating a new one, then by all means cleanup and re-create (but use smart pointers). If on the other hand they are heavy (say a network connection or something like that), you should simply bypass if the boolean is set...
You should also investigate boost::optional, this way you don't need an overall flag, and for each object that should exist, you can check to see if instantiated and then instantiate as necessary... (say in the first pass, some construct okay, but some fail..)

The idea of setting a data member later than the constructor is quite common, so don't worry you're definitely not the first one with this issue.
There are two typical use cases:
On demand / Lazy instantiation: if you're not sure it will be used and it's costly to create, then better NOT to initialize it in the constructor
Caching data: to cache the result of a potentially expensive operation so that subsequent calls need not compute it once again
You are in the "Lazy" category, in which case the simpler way is to use a flag or a nullable value:
flag + value combination: reuse of existing class without heap allocation, however this requires default construction
smart pointer: this bypass the default construction issue, at the cost of heap allocation. Check the copy semantics you need...
boost::optional<T>: similar to a pointer, but with deep copy semantics and no heap allocation. Requires the type to be fully defined though, so heavier on dependencies.
I would strongly recommend the boost::optional<T> idiom, or if you wish to provide dependency insulation you might fall back to a smart pointer like std::unique_ptr<T> (or boost::scoped_ptr<T> if you do not have access to a C++0x compiler).

I think that this could be a scenario where the Singleton pattern could be applied.

How defensive should you be? [duplicate]

This question already has answers here:
Closed 13 years ago.
Possible Duplicate:
Defensive programming
We had a great discussion this morning about the subject of defensive programming. We had a code review where a pointer was passed in and was not checked if it was valid.
Some people felt that only a check for null pointer was needed. I questioned whether it could be checked at a higher level, rather than every method it is passed through, and that checking for null was a very limited check if the object at the other end of the point did not meet certain requirements.
I understand and agree that a check for null is better than nothing, but it feels to me that checking only for null provides a false sense of security since it is limited in scope. If you want to ensure that the pointer is usable, check for more than the null.
What are your experiences on the subject? How do you write defenses in to your code for parameters that are passed to subordinate methods?

In Code Complete 2, in the chapter on error handling, I was introduced to the idea of barricades. In essence, a barricade is code which rigorously validates all input coming into it. Code inside the barricade can assume that any invalid input has already been dealt with, and that the inputs that are received are good. Inside the barricade, code only needs to worry about invalid data passed to it by other code within the barricade. Asserting conditions and judicious unit testing can increase your confidence in the barricaded code. In this way, you program very defensively at the barricade, but less so inside the barricade. Another way to think about it is that at the barricade, you always handle errors correctly, and inside the barricade you merely assert conditions in your debug build.
As far as using raw pointers goes, usually the best you can do is assert that the pointer is not null. If you know what is supposed to be in that memory then you could ensure that the contents are consistent in some way. This begs the question of why that memory is not wrapped up in an object which can verify it's consistency itself.
So, why are you using a raw pointer in this case? Would it be better to use a reference or a smart pointer? Does the pointer contain numeric data, and if so, would it be better to wrap it up in an object which managed the lifecycle of that pointer?
Answering these questions can help you find a way to be more defensive, in that you'll end up with a design that is easier to defend.

The best way to be defensive is not to check pointers for null at runtime, but to avoid using pointers that may be null to begin with
If the object being passed in must not be null, use a reference! Or pass it by value! Or use a smart pointer of some sort.
The best way to do defensive programming is to catch your errors at compile-time.
If it is considered an error for an object to be null or point to garbage, then you should make those things compile errors.
Ultimately, you have no way of knowing if a pointer points to a valid object. So rather than checking for one specific corner case (which is far less common than the really dangerous ones, pointers pointing to invalid objects), make the error impossible by using a data type that guarantees validity.
I can't think of another mainstream language that allows you to catch as many errors at compile-time as C++ does. use that capability.

There is no way to check if a pointer is valid.

In all serious, it depends on how many bugs you'd like to have to have inflicted upon you.
Checking for a null pointer is definitely something that I would consider necessary but not sufficient. There are plenty of other solid principles you can use starting with entry points of your code (e.g., input validation = does that pointer point to something useful) and exit points (e.g., you thought the pointer pointed to something useful but it happened to cause your code to throw an exception).
In short, if you assume that everyone calling your code is going to do their best to ruin your life, you'll probably find a lot of the worst culprits.
EDIT for clarity: some other answers are talking about unit tests. I firmly believe that test code is sometimes more valuable than the code that it's testing (depending on who's measuring the value). That said, I also think that units tests are also necessary but not sufficient for defensive coding.
Concrete example: consider a 3rd party search method that is documented to return a collection of values that match your request. Unfortunately, what wasn't clear in the documentation for that method is that the original developer decided that it would be better to return a null rather than an empty collection if nothing matched your request.
So now, you call your defensive and well unit-tested method thinking (that is sadly lacking an internal null pointer check) and boom! NullPointerException that, without an internal check, you have no way of dealing with:
defensiveMethod(thirdPartySearch("Nothing matches me"));
// You just passed a null to your own code.

I'm a big fan of the "let it crash" school of design. (Disclaimer: I don't work on medical equipment, avionics, or nuclear power-related software.) If your program blows up, you fire up the debugger and figure out why. In contrast, if your program keeps running after illegal parameters have been detected, by the time it crashes you'll probably have no idea what went wrong.
Good code consists of many small functions/methods, and adding a dozen lines of parameter-checking to every one of those snippets of code makes it harder to read and harder to maintain. Keep it simple.

I may be a bit extreme, but I don't like Defensive Programming, I think it's laziness that has introduced the principle.
For this particular example, there is no sense in assert that the pointer is not null. If you want a null pointer, there is no better way to actually enforce it (and document it clearly at the same time) than to use a reference instead. And it's documentation that will actually be enforced by the compiler and does not cost a ziltch at runtime!!
In general, I tend not to use 'raw' types directly. Let's illustrate:
void myFunction(std::string const& foo, std::string const& bar);
What are the possible values of foo and bar ? Well that's pretty much limited only by what a std::string may contain... which is pretty vague.
On the other hand:
void myFunction(Foo const& foo, Bar const& bar);
is much better!
if people mistakenly reverse the order of the arguments, it's detected by the compiler
each class is solely responsible for checking that the value is right, the users are not burdenned.
I have a tendency to favor Strong Typing. If I have an entry that should be composed only of alphabetical characters and be up to 12 characters, I'd rather create a small class wrapping a std::string, with a simple validate method used internally to check the assignments, and pass that class around instead. This way I know that if I test the validation routine ONCE, I don't have to actually worry about all the paths through which that value can get to me > it will be validated when it reaches me.
Of course, that doesn't me that the code should not be tested. It's just that I favor strong encapsulation, and validation of an input is part of knowledge encapsulation in my opinion.
And as no rule can come without an exception... exposed interface is necessarily bloated with validation code, because you never know what might come upon you. However with self-validating objects in your BOM it's quite transparent in general.

"Unit tests verifying the code does what it should do" > "production code trying to verify its not doing what its not supposed to do".
I wouldn't even check for null myself, unless its part of a published API.

It very much depends; is the method in question ever called by code external to your group, or is it an internal method?
For internal methods, you can test enough to make this a moot point, and if you're building code where the goal is highest possible performance, you might not want to spend the time on checking inputs you're pretty darn sure are right.
For externally visible methods - if you have any - you should always double check your inputs. Always.

From debugging point of view, it is most important that your code is fail-fast. The earlier the code fails, the easier to find the point of failure.

For internal methods, we usually stick to asserts for these kinds of checks. That does get errors picked up in unit tests (you have good test coverage, right?) or at least in integration tests that are running with assertions on.

checking for null pointer is only half of the story,
you should also assign a null value to every unassigned pointer.
most responsible API will do the same.
checking for a null pointer comes very cheap in CPU cycles, having an application crashing once its delivered can cost you and your company in money and reputation.
you can skip null pointer checks if the code is in a private interface you have complete control of and/or you check for null by running a unit test or some debug build test (e.g. assert)

There are a few things at work here in this question which I would like to address:
Coding guidelines should specify that you either deal with a reference or a value directly instead of using pointers. By definition, pointers are value types that just hold an address in memory -- validity of a pointer is platform specific and means many things (range of addressable memory, platform, etc.)
If you find yourself ever needing a pointer for any reason (like for dynamically generated and polymorphic objects) consider using smart pointers. Smart pointers give you many advantages with the semantics of "normal" pointers.
If a type for instance has an "invalid" state then the type itself should provide for this. More specifically, you can implement the NullObject pattern that specifies how an "ill-defined" or "un-initialized" object behaves (maybe by throwing exceptions or by providing no-op member functions).
You can create a smart pointer that does the NullObject default that looks like this:
template <class Type, class NullTypeDefault>
struct possibly_null_ptr {
possibly_null_ptr() : p(new NullTypeDefault) {}
possibly_null_ptr(Type* p_) : p(p_) {}
Type * operator->() { return p.get(); }
~possibly_null_ptr() {}
private:
shared_ptr<Type> p;
friend template<class T, class N> Type & operator*(possibly_null_ptr<T,N>&);
};
template <class Type, class NullTypeDefault>
Type & operator*(possibly_null_ptr<Type,NullTypeDefault> & p) {
return *p.p;
}
Then use the possibly_null_ptr<> template in cases where you support possibly null pointers to types that have a default derived "null behavior". This makes it explicit in the design that there is an acceptable behavior for "null objects", and this makes your defensive practice documented in the code -- and more concrete -- than a general guideline or practice.

Pointer should only be used if do you need to do something with the pointer. Such as pointer arithmetic to transverse some data structure. Then if possible that should be encapsulated in a class.
IF the pointer is passed into the function to do something with the object to which it points, then pass in a reference instead.
One method for defensive programming is to assert almost everything that you can. At the beginning of the project it is annoying but later it is a good adjunct to unit testing.

A number of answer address the question of how to write defenses in your code, but no much was said about "how defensive should you be?". That's something you have to evaluate based on the criticality of your software components.
We're doing flight software and the impacts of a software error range from a minor annoyance to loss of aircraft/crew. We categorize different pieces of software based on their potential adverse impacts which affects coding standards, testing, etc. You need to evaluate how your software will be used and the impacts of errors and set what level of defensiveness you want (and can afford). The DO-178B standard calls this "Design Assurance Level".

We Keep Coding

c++ django amazon-web-services regex python-2.7 google-cloud-platform list unit-testing opengl ember.js