I was browsing my teacher's code when I stumbled across this:
Order* order1 = NULL;
then
order1 = order(customer1, product2);
which calls
Order* order(Customer* customer, Product* product)
{
return new Order(customer, product);
}
This looks like silly code. I'm not sure why, but the teacher initialized all pointers to NULL instead of initializing them right where they are declared (looking at the code, that would have been entirely possible, but he chose not to).
My question is: is this good or acceptable code? Does the function call have any benefits over calling a constructor explicitly? And how does new work in this case? Can I imagine the code now as kind of like:
order1 = new Order(customer, product);
Init to NULL
[edit] since there's a valid discussion, I've changed the order of the options a bit to emphasize the recommended option.
Variables should be declared as local and as late as possible, and initialized immediately. Thus, the most common pattern is:
Order * order1 = order(...);
just before order1 is required.
If there is any reason to separate the declaration of order1 from the instantiation, like this:
Order * order1; // Oh no! not initialized!
// ... some code
order1 = order(...);
order1 should be initialized to NULL, to prevent common bugs that occur with uninitialized variables, easily introduced when // some code changes.
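To make that concrete, here is a minimal sketch of the separated declaration. Order, make_order and process are hypothetical stand-ins of mine, not the teacher's code. Because the pointer starts out as NULL, the later check is well defined even when the assignment is skipped.

```cpp
#include <cstddef>

// Hypothetical stand-in for the Order class from the question.
struct Order {
    int id;
};

// Hypothetical factory, playing the role of order(...).
Order* make_order(int id) { return new Order{id}; }

// Returns the order id, or -1 if no order was created.
int process(bool create) {
    Order* order1 = NULL;         // known state from the very start
    if (create) {
        order1 = make_order(42);  // the real value arrives later, maybe
    }
    if (order1 == NULL) {         // safe to test; an uninitialized
        return -1;                // pointer could hold anything here
    }
    int id = order1->id;
    delete order1;
    return id;
}
```

If the `Order* order1 = NULL;` line were a bare `Order* order1;`, the `create == false` path would read an indeterminate pointer.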
Factory method
Again, there's some more change resilience here: the requirements for instantiating an Order may change. There are two scenarios I can think of off the top of my head:
(1) Validation that can't be done by Order's constructor. Order may come from a 3rd party library and can't be changed, or instantiation needs to add validation that isn't within the scope of Order:
Order* order(Customer* customer, Product* product)
{
// Order can't validate these, since it doesn't "know" the database
database.ValidateCustomer(customer); // throws on error
database.ValidateProduct(product); // throws on error
return new Order(customer, product);
}
(2) You may need an order that behaves differently.
class DemoOrder : public Order { ... }
Order* order(Customer* customer, Product* product)
{
if (demoMode)
return new DemoOrder(customer, product); // doesn't write to web service
else
return new Order(customer, product);
}
However, I wouldn't make this a general pattern blindly.
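For reference, a self-contained sketch that combines both scenarios. The class bodies, the writesToService method, the demoMode parameter and the use of std::unique_ptr are all my own assumptions for illustration; the original code returns a raw Order*.

```cpp
#include <memory>
#include <stdexcept>
#include <string>

struct Customer { std::string name; };
struct Product  { std::string sku; };

class Order {
public:
    Order(const Customer* c, const Product* p) : customer_(c), product_(p) {}
    virtual ~Order() = default;
    // Hypothetical behaviour that the demo variant changes.
    virtual bool writesToService() const { return true; }
    const Customer* customer() const { return customer_; }
    const Product* product() const { return product_; }
private:
    const Customer* customer_;
    const Product* product_;
};

class DemoOrder : public Order {
public:
    using Order::Order;  // reuse the base class constructor
    bool writesToService() const override { return false; }
};

// Factory: validates its inputs, then picks the concrete type.
std::unique_ptr<Order> order(const Customer* c, const Product* p, bool demoMode) {
    if (c == nullptr || p == nullptr)
        throw std::invalid_argument("order: customer and product are required");
    if (demoMode)
        return std::unique_ptr<Order>(new DemoOrder(c, p));
    return std::unique_ptr<Order>(new Order(c, p));
}
```

Callers only ever see `order(...)`, so either requirement can change without touching them.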
It seems to me that your teacher is an old C programmer who hasn't quite shaken off some of his old habits. In the old times, you had to declare all variables at the beginning of a function, so it's not unusual to see some old timers still doing so.
If that's really all the important code, I see no benefit to the function or the initial NULL value. new works the way it always works. It constructs an Order and returns a pointer to it, which is in turn returned by order.
The code given could be important if the assignment to NULL happens in one function, e.g. the constructor, and the assignment that calls new happens in another function. Here are three reasons:
The customer and product parameters might not be available yet when order = NULL is executed.
The NULL value could be significant in the interim to let other functions know that the order hasn't yet been created.
If the Order class took a lot of resources, deferring its initialization could be beneficial.
Best to ask your teacher why they did it this way. Sometimes the obvious conclusions aren't the right ones.
If the things really are as you describe them then your teacher's code illustrates some rather bad programming practices.
Your question is tagged C++. In C++ the proper technique would be to move the variable declaration to the point where it can be meaningfully initialized. I.e. it should've looked as follows
Order* order1 = order(customer1, product2);
The same would apply to C99 code, since C99 allows declaring variables in the middle of a block.
In C89/90 code the declaration must be placed at the beginning of the block, which might lead to situations where you can't meaningfully initialize a variable at the point of declaration. Some people believe that in this case you have to initialize the variable with something, anything, just to keep it initialized as opposed to leaving it uninitialized. I can respect this as a matter of personal preference, but personally consider it counterproductive. It interferes with compiler optimizations and tends to hide bugs by sweeping them under the carpet (instead of encouraging proper fixes).
I would say that your teacher's C++ code suffers from some bad habits brought over from C.
For the given code, there is no difference. It's better to initialize the order pointer with the newly created object in a single place. That avoids ever using an order variable that still holds NULL.
Related
I've been reading an excellent book written by Bjarne Stroustrup, and he recommends that you declare variables as late as possible, preferably just before you use them. However, it fails to mention any benefits of declaring variables late rather than at the start of the function body.
So what is the benefit of declaring variable late like this:
int main()
{
/* some
code
here
*/
int MyVariable1;
int MyVariable2;
std::cin >> MyVariable1 >> MyVariable2;
return(0);
}
instead of at the start of a function body like this:
int main()
{
int MyVariable1;
int MyVariable2;
/* some
code
here
*/
std::cin >> MyVariable1 >> MyVariable2;
return (0);
}
It makes the code easier to follow. In general, you declare variables when you need them, e.g. near a loop when you want to find a minimum of something via that loop. In this way, when someone reads your code, (s)he doesn't have to try to decipher what 25 variables mean at the start of a function, but the variables will "explain" themselves when going through the code. After all, it's not important to know what variables mean, but to understand what the code does.
Remember that most of the time you use that local variable in a very small portion of your code, so it makes sense to define it in that small portion where you need it.
A few points that come to mind:
Not all objects are default-constructible, so declaring the object at the beginning of the function is often not an option; it can only be declared where it is initialized (e.g. auto myObj = creationalfunction();).
Your function gets fewer lines and is hence more readable. Declaring each variable at the beginning of the function makes it a few lines longer, throughout the code.
If your function throws, it's not economical to build a list of objects just to destroy them on stack unwinding.
Declaring variables on the same line they are initialized lets you use auto, which makes the code much more flexible.
It's the common convention for C++ these days, and that is pretty important.
Creating an object and assigning to it later on might be slower than directly initializing the object with values.
If "other code" is a page of code then you can't actually see the declaration on the screen when you read the values. If you thought that you were reading two doubles, you can't see on the screen that you are wrong. If you declare the variable on one line and use it on the next, any mistake would be obvious.
Suppose, that you deal with some objects and construction of these objects is an expensive operation. In such situation there are a few reasons why it is better to define variables just before their usage:
1) First of all, it is sometimes faster to create an object using appropriate constructor instead of default-constructing and assignment. So this:
T obj(/* some arguments here */);
may be faster than this:
T obj;
/* some code here*/
obj = T(/* some arguments here */);
Note that in the first example only a single constructor is invoked. But in the second example default constructor and assignment operator are invoked.
2) If an exception is thrown somewhere between object definition and its first usage you just do unnecessary work creating and destroying your object without any usage at all. The same is applicable when function returns between object definition and its first usage.
3) Yes, readability is also worth mentioning here :)
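Point 1 can be made visible by counting operations. This is only a sketch: Tracked, its counters and the two helper functions are names I made up for the illustration.

```cpp
// Hypothetical type that counts how often it is constructed and assigned.
struct Tracked {
    static int constructions;
    static int assignments;
    int value;

    Tracked() : value(0) { ++constructions; }
    explicit Tracked(int v) : value(v) { ++constructions; }
    Tracked(const Tracked&) = default;
    Tracked& operator=(const Tracked& other) {
        value = other.value;
        ++assignments;
        return *this;
    }
};
int Tracked::constructions = 0;
int Tracked::assignments = 0;

void reset_counters() {
    Tracked::constructions = 0;
    Tracked::assignments = 0;
}

// Declared where it can be initialized: one constructor call.
int direct() {
    Tracked obj(7);
    return obj.value;
}

// Declared early, filled in later: default constructor, a temporary,
// and an assignment.
int two_step() {
    Tracked obj;
    /* some code here */
    obj = Tracked(7);
    return obj.value;
}
```

Calling direct() bumps constructions once; two_step() bumps it twice and assignments once, for the same final value.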
When starting to get good at programming you will usually end up holding the entire program in your head at the same time. Later, you will learn how to reduce this to one function in your head.
Both of these limit how large/complex a program or function you can work with. You can help this problem by simplifying what is going on so you no longer have to think about it: reduce your working memory needs. Also you can trade one kind of complexity for another: fancy variable value dancing for some complex higher level algorithm, or for certainty of code correctness.
There are many ways to do this. You can work with chunkable patterns, and think in those patterns instead of in lower level primitives (this is basically what you did when you graduated from whole program state to single function state). You can also do this by making your state simpler.
Every variable carries state. It modifies what that line of code means, and what every previous line of code means up to the point of its declaration. A variable that exists on a line could be modified by the line or read by the line. To understand what the reading of a variable means, you have to audit every line between its declaration and its use for the possibility it is edited.
Now, this may not happen, but checking it takes both time and working memory. If you have 10 variables, having to remember which of them were modified "above" and which were not, and what their values mean, can burn a lot of headspace.
On the other hand, a variable created, used, and either falling out of scope or never used again is not going to cause this cognitive load. You do not have to check for hidden state or meaning. What's more, you are not tempted (indeed not able) to use it prior to that line. You are definitely not going to overwrite important state that later code relies on when you set it, and you are not going to have it modified to something surprising between initialization and use.
In short, declaring late reduces the "state space" of the lines of code that use the variable, and even of the lines that don't.
Sometimes this is difficult to achieve, and sometimes impractical or impossible. But quite often it is easy, improves code quality, and makes the code easier to read and understand. The most important audience of code is humans; there is a reason we don't check in the object file output of a compiler (or some intermediate representation).
Such "low state" code is also way easier to modify after the fact. In the limit, it becomes pure functional code.
I don't want to use pointers when I don't have to, but here's the problem: in the code below if I remove the asterisk and make level simply an object, and of course remove the line level = new Level; I get a runtime error, the reason being that level is then initialized on the first line, BEFORE initD3D and init_pipeline - the methods that set up the projection and view for use. You see the problem is that level uses these two things but when done first I get a null pointer exception.
Is this simply a circumstance where the answer is to use a pointer? I've had this problem before, basically it seems extremely vexing that when a class type accepts no arguments, you are essentially initializing it where you declare it.... or am I wrong about this last part?
Level* level;
D3DMATRIX* matProjection,* matView;
//called once
void Initialise(HWND hWnd)
{
initD3D(hWnd);
init_pipeline();
level = new Level;
}
I'm coming from C#, and in C# you are simply declaring a name with the line Level level; arguments or not, you still have to initialize it at some point.
You are correct that if you do:
Level level;
then level will be instantiated at that point. That is because the above statement, which appears to define a global, isn't just a declaration, but also a definition.
If this is causing you problems because Level is being instantiated before something else is being instantiated, then you have encountered a classic reason why globals suck.
You have attempted to resolve this by making level a pointer and then "initializing" it later. What might surprise you is that level is still being instantiated at the same point. The difference now is the type of level. It's not a Level anymore; now it's a pointer-to-Level. If you examine the value of level when your code enters Initialise you'll see that it has a value of NULL.
It has a value of NULL instead of a garbage value because globals are statically initialized, which in this case means zero-initialized.
But this is all somewhat tangential to the real problem, which is that you are using globals in the first place. If you need to instantiate objects in a specific order, then instantiate them in that order. Don't use globals, and you may find that by doing that, you don't need to use pointers, either.
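One way to "instantiate them in that order" without globals or pointers is to make both objects members of an owning class. A rough sketch, assuming hypothetical Graphics, Level and Game types (the log vector exists only to make the order observable):

```cpp
#include <string>
#include <vector>

// Hypothetical stand-in for whatever initD3D/init_pipeline set up.
struct Graphics {
    explicit Graphics(std::vector<std::string>& log) { log.push_back("graphics"); }
};

// Hypothetical Level that needs the graphics state to exist first.
struct Level {
    Level(const Graphics&, std::vector<std::string>& log) { log.push_back("level"); }
};

// Members are constructed in the order they are declared in the class,
// so graphics is always ready before level is built.
struct Game {
    Graphics graphics;
    Level level;

    explicit Game(std::vector<std::string>& log)
        : graphics(log), level(graphics, log) {}
};
```

Level is a plain object here, not a pointer, and it cannot be constructed before the thing it depends on.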
Is this simply a circumstance where the answer is to use a pointer
Yea, basically.
This may be a subjective question, but I'm more or less asking it and hoping that people share their experiences. (As that is the biggest thing which I lack in C++)
Anyways, suppose I have -for some obscure reason- an initialize function that initializes a datastructure from the heap:
void initialize() {
initialized = true;
pointer = new T;
}
Now, if I call the initialize function twice, a memory leak will happen (right?). I can prevent this in multiple ways:
Ignore the call (just check whether I am initialized, and if I am, don't do anything).
Throw an error.
Automatically "clean up" and then reinitialize the thing.
Now what is generally the "best" method, which helps keep my code manageable in the future?
EDIT: Thank you for the answers so far. However, I'd like to know how people handle this in a more generic way: how do people handle "simple" errors which can be ignored (like calling the same function twice when it only makes sense to call it once)?
You're the only one who can truly answer the question: do you consider that the initialize function could eventually be called twice, or would this mean that your program followed an unexpected execution flow?
If the initialize function can be called multiple times: just ignore the call by testing if the allocation has already taken place.
If the initialize function has no decent reason to be called several times: I believe that would be a good candidate for an exception.
Just to be clear, I don't believe cleanup and regenerate to be a viable option (or you should seriously consider renaming the function to reflect this behavior).
This pattern is not unusual for on-demand or lazy initialization of costly data structures that might not always be needed. Singleton is one example; a class data member that meets those criteria is another.
What I would do is just skip the init code if the struct is already in place.
void initialize() {
if (!initialized)
{
initialized = true;
pointer = new T;
}
}
If your program has multiple threads you would have to include locking to make this thread-safe.
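A minimal sketch of the locked variant, assuming a C++11 compiler. Holder, init_count and the use of std::unique_ptr are my own additions; the original uses a raw pointer and a bool flag.

```cpp
#include <memory>
#include <mutex>

struct T { int value = 0; };

// Hypothetical owner of the lazily created T.
class Holder {
public:
    // Safe to call repeatedly and from several threads:
    // only the first call allocates.
    void initialize() {
        std::lock_guard<std::mutex> lock(mutex_);
        if (!pointer_) {
            pointer_.reset(new T);
            ++init_count_;
        }
    }

    T* get() const { return pointer_.get(); }
    int init_count() const { return init_count_; }

private:
    std::mutex mutex_;
    std::unique_ptr<T> pointer_;  // doubles as the "initialized" flag
    int init_count_ = 0;
};
```

The pointer itself serves as the flag, so the two can never get out of sync, and the unique_ptr releases the T when the Holder goes away.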
I'd look at using boost or STL smart pointers.
I think the answer depends entirely on T (and other members of this class). If they are lightweight and there is no side-effect of re-creating a new one, then by all means cleanup and re-create (but use smart pointers). If on the other hand they are heavy (say a network connection or something like that), you should simply bypass if the boolean is set...
You should also investigate boost::optional, this way you don't need an overall flag, and for each object that should exist, you can check to see if instantiated and then instantiate as necessary... (say in the first pass, some construct okay, but some fail..)
The idea of setting a data member later than the constructor is quite common, so don't worry you're definitely not the first one with this issue.
There are two typical use cases:
On demand / Lazy instantiation: if you're not sure it will be used and it's costly to create, then better NOT to initialize it in the constructor
Caching data: to cache the result of a potentially expensive operation so that subsequent calls need not compute it once again
You are in the "Lazy" category, in which case the simpler way is to use a flag or a nullable value:
flag + value combination: reuse of existing class without heap allocation, however this requires default construction
smart pointer: this bypasses the default construction issue, at the cost of heap allocation. Check the copy semantics you need...
boost::optional<T>: similar to a pointer, but with deep copy semantics and no heap allocation. Requires the type to be fully defined though, so heavier on dependencies.
I would strongly recommend the boost::optional<T> idiom, or if you wish to provide dependency insulation you might fall back to a smart pointer like std::unique_ptr<T> (or boost::scoped_ptr<T> if you do not have access to a C++0x compiler).
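As a sketch of the optional idiom: the example below uses std::optional, the C++17 standard library counterpart of boost::optional<T>, which is an assumption about the compiler available. Report, build and builds are hypothetical names.

```cpp
#include <optional>
#include <string>

// Hypothetical class with one costly, lazily computed member.
class Report {
public:
    // Computes the text on first use, then reuses the stored value.
    const std::string& text() {
        if (!cache_) {
            cache_.emplace(build());
            ++builds_;
        }
        return *cache_;
    }

    int builds() const { return builds_; }

private:
    // Stands in for the expensive operation.
    static std::string build() { return "expensive result"; }

    std::optional<std::string> cache_;  // empty until first needed
    int builds_ = 0;
};
```

No separate flag and no heap allocation for the wrapper: the optional knows whether it holds a value.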
I think that this could be a scenario where the Singleton pattern could be applied.
I've seen numerous arguments that using a return value is preferable to out parameters. I am convinced of the reasons why to avoid them, but I find myself unsure if I'm running into cases where it is unavoidable.
Part One of my question is: What are some of your favorite/common ways of getting around using an out parameter? Stuff along the lines: Man, in peer reviews I always see other programmers do this when they could have easily done it this way.
Part Two of my question deals with some specific cases I've encountered where I would like to avoid an out parameter but cannot think of a clean way to do so.
Example 1:
I have a class with an expensive copy that I would like to avoid. Work can be done on the object and this builds up the object to be expensive to copy. The work to build up the data is not exactly trivial either. Currently, I will pass this object into a function that will modify the state of the object. This to me is preferable to new'ing the object internal to the worker function and returning it back, as it allows me to keep things on the stack.
class ExpensiveCopy //Defines some interface I can't change.
{
public:
ExpensiveCopy(); // needed by "ExpensiveCopy toModify;" below
ExpensiveCopy(const ExpensiveCopy& toCopy){ /*Ouch! This hurts.*/ } // a copy constructor must take a reference
ExpensiveCopy& operator=(const ExpensiveCopy& toCopy){ /*Ouch! This hurts.*/ return *this; }
void addToData(SomeData);
SomeData getData();
};
class B
{
public:
static void doWork(ExpensiveCopy& ec_out, int someParam);
//or
// Your Function Here.
};
Using my function, I get calling code like this:
const int SOME_PARAM = 5;
ExpensiveCopy toModify;
B::doWork(toModify, SOME_PARAM);
I'd like to have something like this:
ExpensiveCopy theResult = B::doWork(SOME_PARAM);
But I don't know if this is possible.
Second Example:
I have an array of objects. The objects in the array are a complex type, and I need to do work on each element, work that I'd like to keep separated from the main loop that accesses each element. The code currently looks like this:
void doWork(ComplexType& ct_out)
{
//Do work on the individual element.
}
// ... later, in the calling code:
std::vector<ComplexType> theCollection;
for(std::size_t index = 0; index < theCollection.size(); ++index)
{
doWork(theCollection[index]);
}
Any suggestions on how to deal with some of these situations? I work primarily in C++, but I'm interested to see if other languages facilitate an easier setup. I have encountered RVO as a possible solution, but I need to read up more on it and it sounds like a compiler specific feature.
I'm not sure why you're trying to avoid passing references here. It's pretty much these situations that pass-by-reference semantics exist.
The code
static void doWork(ExpensiveCopy& ec_out, int someParam);
looks perfectly fine to me.
If you really want to modify it then you've got a couple of options
Move doWork so that it's a member of ExpensiveCopy (which you say you can't do, so that's out)
Return a (smart) pointer from doWork instead of copying it (which you don't want to do, as you want to keep things on the stack)
Rely on RVO (which others have pointed out is supported by pretty much all modern compilers)
Every useful compiler does RVO (return value optimization) if optimizations are enabled, thus the following effectively doesn't result in copying:
Expensive work() {
// ... no branched returns here
return Expensive(foo);
}
Expensive e = work();
In some cases compilers can apply NRVO, named return value optimization, as well:
Expensive work() {
Expensive e; // named object
// ... no branched returns here
return e; // return named object
}
This however isn't exactly reliable, only works in more trivial cases and would have to be tested. If you're not up to testing every case, just use out-parameters with references in the second case.
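If you'd rather measure than trust, you can count copies. The sketch below assumes a C++11 compiler and a type you control, unlike the asker's fixed interface; Expensive, its copies counter and doWork are illustrative names. With a move constructor available, returning a named local never calls the copy constructor: the compiler either elides the transfer (NRVO) or falls back to a move.

```cpp
#include <utility>
#include <vector>

// Hypothetical expensive-to-copy type that counts its copies.
struct Expensive {
    static int copies;
    std::vector<int> data;

    Expensive() = default;
    Expensive(const Expensive& other) : data(other.data) { ++copies; }
    Expensive(Expensive&& other) noexcept : data(std::move(other.data)) {}

    void add(int v) { data.push_back(v); }
};
int Expensive::copies = 0;

// Builds the result locally and returns it by value.
Expensive doWork(int n) {
    Expensive result;  // named object
    for (int i = 0; i < n; ++i) {
        result.add(i);
    }
    return result;     // elided or moved, not copied
}
```

That gives the calling code the shape the question asks for: `Expensive theResult = doWork(SOME_PARAM);`.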
IMO the first thing you should ask yourself is whether copying ExpensiveCopy really is so prohibitively expensive. And to answer that, you will usually need a profiler. Unless a profiler tells you that the copying really is a bottleneck, simply write the code that's easier to read: ExpensiveCopy obj = doWork(param);.
Of course, there are indeed cases where objects cannot be copied for performance or other reasons. Then Neil's answer applies.
In addition to all the comments here, I'd mention that in C++0x you'd rarely use an output parameter for optimization purposes, because of move constructors.
Unless you are going down the "everything is immutable" route, which doesn't sit too well with C++, you cannot easily avoid out parameters. The C++ Standard Library uses them, and what's good enough for it is good enough for me.
As to your first example: return value optimization will often allow the returned object to be created directly in-place, instead of having to copy the object around. All modern compilers do this.
What platform are you working on?
The reason I ask is that many people have suggested Return Value Optimization, which is a very handy compiler optimization present in almost every compiler. Additionally Microsoft and Intel implement what they call Named Return Value Optimization which is even more handy.
In standard Return Value Optimization your return statement is a call to an object's constructor, which tells the compiler to eliminate the temporary values (not necessarily the copy operation).
In Named Return Value Optimization you can return a value by its name and the compiler will do the same thing. The advantage to NRVO is that you can do more complex operations on the created value (like calling functions on it) before returning it.
While neither of these really eliminates an expensive copy if your returned data is very large, they do help.
In terms of avoiding the copy the only real way to do that is with pointers or references because your function needs to be modifying the data in the place you want it to end up in. That means you probably want to have a pass-by-reference parameter.
Also I figure I should point out that pass-by-reference is very common in high-performance code for specifically this reason. Copying data can be incredibly expensive, and it is often something people overlook when optimizing their code.
As far as I can see, the reasons to prefer return values to out parameters are that it's clearer, and it works with pure functional programming (you can get some nice guarantees if a function depends only on input parameters, returns a value, and has no side effects). The first reason is stylistic, and in my opinion not all that important. The second isn't a good fit with C++. Therefore, I wouldn't try to distort anything to avoid out parameters.
The simple fact is that some functions have to return multiple things, and in most languages this suggests out parameters. Common Lisp has multiple-value-bind and values: the function returns several values with values, and the caller binds them to a list of symbols with multiple-value-bind. In some cases, a function can return a composite value, such as a list of values which will then get deconstructed, and it isn't a big deal for a C++ function to return a std::pair. Returning more than two values this way in C++ gets awkward. It's always possible to define a struct, but defining and creating it will often be messier than out parameters.
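For comparison, here is roughly what the pair and struct alternatives look like in C++11. div_mod, Stats and summarize are hypothetical examples of mine, not anything from the question.

```cpp
#include <utility>
#include <vector>

// Two results: std::pair is enough, but the caller has to remember
// that .first is the quotient and .second the remainder.
std::pair<int, int> div_mod(int a, int b) {
    return std::make_pair(a / b, a % b);
}

// Three results: a small struct gives each value a name.
struct Stats {
    int min;
    int max;
    double mean;
};

// Returns all-zero Stats for an empty input.
Stats summarize(const std::vector<int>& values) {
    Stats s{0, 0, 0.0};
    if (values.empty()) {
        return s;
    }
    s.min = s.max = values[0];
    long long sum = 0;
    for (int v : values) {
        if (v < s.min) s.min = v;
        if (v > s.max) s.max = v;
        sum += v;
    }
    s.mean = static_cast<double>(sum) / values.size();
    return s;
}
```

Whether the extra struct definition is "messier" than three out parameters is a judgment call; it does keep the call site free of pre-declared variables.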
In some cases, the return value gets overloaded. In C, getchar() returns an int, with the idea being that there are more int values than char (true in all implementations I know of, false in some I can easily imagine), so one of the values can be used to denote end-of-file. atoi() returns an integer, either the integer represented by the string it's passed or zero if there is none, so it returns the same thing for "0" and "frog". (If you want to know whether there was an int value or not, use strtol(), which does have an out parameter.)
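The strtol() point can be sketched as follows. parse_long is a hypothetical wrapper; std::strtol and its end-pointer out parameter are the standard library's.

```cpp
#include <cerrno>
#include <cstdlib>

// Returns true and stores the number in `out` only if the whole
// string is a valid base-10 integer that fits in a long.
bool parse_long(const char* text, long& out) {
    char* end = nullptr;  // strtol's out parameter
    errno = 0;
    long value = std::strtol(text, &end, 10);
    if (end == text      // no digits were consumed
        || *end != '\0'  // trailing junk after the number
        || errno == ERANGE) {
        return false;
    }
    out = value;
    return true;
}
```

With this, "0" parses successfully as 0 while "frog" is rejected, whereas atoi() returns 0 for both.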
There's always the technique of throwing an exception in case of an error, but not all multiple return values are errors, and not all errors are exceptional.
So, overloaded return values cause problems, multiple value returns aren't easy to use in all languages, and single returns don't always exist. Throwing an exception is often inappropriate. Using out parameters is very often the cleanest solution.
Ask yourself why you have some method that performs work on this expensive-to-copy object in the first place. Say you have a tree: would you send the tree off into some building method, or else give the tree its own building method? Situations like this come up constantly when your design is a little bit off, but tend to fold into themselves when you have it down pat.
I know in practicality we don't always get to change every object at all, but passing in out parameters is a side effect operation, and it makes it much harder to figure out what's going on, and you never really have to do it (except as forced by working within others' code frameworks).
Sometimes it is easier, but it's definitely not desirable to use it for no reason (if you've suffered through a few large projects where there's always half a dozen out parameters you'll know what I mean).
#define SAFE_DELETE(a) if( (a) != NULL ) delete (a); (a) = NULL;
OR
template<typename T> void safe_delete(T*& a) {
delete a;
a = NULL;
}
or is there any other, better way?
I would say neither, as both will give you a false sense of security. For example, suppose you have a function:
void Func( SomePtr * p ) {
// stuff
SafeDelete( p );
}
You set p to NULL, but the copies of p outside the function are unaffected.
However, if you must do this, go with the template - macros will always have the potential for tromping on other names.
Clearly the function, for a simple reason. The macro evaluates its argument multiple times. This can have evil side effects. Also the function can be scoped. Nothing better than that :)
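The multiple-evaluation problem is easy to demonstrate. In this sketch the macro and the template are the two candidates from the question; slot() and the counters are hypothetical helpers that record how often the argument expression is evaluated.

```cpp
#include <cstddef>

#define SAFE_DELETE(a) if( (a) != NULL ) delete (a); (a) = NULL;

template<typename T> void safe_delete(T*& a) {
    delete a;
    a = NULL;
}

static int* g_ptr = NULL;
static int g_calls = 0;

// Hypothetical accessor with a visible side effect.
int*& slot() {
    ++g_calls;
    return g_ptr;
}

// The macro pastes its argument in three places, so slot() runs
// once for the test, once for the delete and once for the reset.
int calls_with_macro() {
    g_calls = 0;
    g_ptr = new int(1);
    SAFE_DELETE(slot());
    return g_calls;
}

// The function template evaluates its argument exactly once.
int calls_with_function() {
    g_calls = 0;
    g_ptr = new int(1);
    safe_delete(slot());
    return g_calls;
}
```

Here the extra calls are harmless; with an argument like arr[i++] the macro would test, delete and reset three different elements.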
delete a;
ISO C++ specifies that delete on a NULL pointer just doesn't do anything.
Quote from ISO 14882:
5.3.5 Delete [expr.delete]
2 [...] In either alternative, if the value of the operand of delete is the
null pointer the operation has no effect. [...]
Regards, Bodo
/edit: I didn't notice the a=NULL; in the original post, so new version: delete a; a=NULL; however, the problem with setting a=NULL has already been pointed out (false feeling of security).
Generally, prefer inline functions over macros, as macros don't respect scope, and may conflict with some symbols during preprocessing, leading to very strange compile errors.
Of course, sometimes templates and functions won't do, but here this is not the case.
Additionally, the better safe-delete is not necessary, as you could use smart-pointers, therefore not requiring to remember using this method in the client-code, but encapsulating it.
(edit) As others have pointed out, safe-delete is not safe, as even if somebody does not forget to use it, it still may not have the desired effect. So it is actually completely worthless, because using safe_delete correctly needs more thought than just setting to 0 by oneself.
You don't need to test for nullity with delete, it is equivalent to a no-op. (a) = NULL makes me lift an eyebrow. The second option is better.
However, if you have a choice, you should use smart pointers, such as std::auto_ptr or tr1::shared_ptr, which already do this for you.
I think
#define SAFE_DELETE(pPtr) { delete pPtr; pPtr = NULL; }
is better.
It's OK to call delete if pPtr is NULL, so the if check is not required.
If you call SAFE_DELETE(ptr+i), it will result in a compilation error.
The template definition will create a separate instance of the function for each data type. In my opinion, these multiple definitions do not add any value in this case.
Moreover, with the template function definition, you have the overhead of a function call.
Usage of SAFE_DELETE really appears to be a C programmer's approach to commandeering the built-in memory management in C++. My question is: will C++ allow this method of using SAFE_DELETE on pointers that have been properly encapsulated as private? Would this macro ONLY work on pointers that are declared public? OOP BAD!!
As mentioned quite a bit above, the second one is the better one: it's not a macro with potential unintended side effects, it doesn't have the unneeded check against NULL (although I suspect you are doing that as a type check), etc. But neither is promising any safety. If you do use something like tr1::shared_ptr, please make sure you read the docs on it and are sure that it has the right semantics for your task. I just recently had to hunt down and clean up a huge memory leak due to a co-worker putting shared_ptrs into a data structure with circular links :) (he should have used weak_ptrs for back references)
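A small sketch of the weak_ptr fix, using std::shared_ptr and std::weak_ptr (the C++11 successors of the TR1 classes). Node and its alive counter are hypothetical.

```cpp
#include <memory>

// Hypothetical doubly linked node that counts live instances.
struct Node {
    static int alive;
    std::shared_ptr<Node> next;  // owning forward link
    std::weak_ptr<Node> prev;    // non-owning back reference

    Node() { ++alive; }
    ~Node() { --alive; }
};
int Node::alive = 0;

// Links two nodes to each other, lets them go out of scope, and
// reports how many are still alive afterwards.
int alive_after_scope() {
    {
        std::shared_ptr<Node> a = std::make_shared<Node>();
        std::shared_ptr<Node> b = std::make_shared<Node>();
        a->next = b;  // a owns b
        b->prev = a;  // b only observes a: no ownership cycle
    }
    return Node::alive;
}
```

If prev were a shared_ptr as well, a and b would keep each other alive forever and the count would stay at 2.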
I prefer this version:
~scoped_ptr() {
delete this->ptr_; //this-> for emphasis, ptr_ is owned by this
}
Setting the pointer to null after deleting it is quite pointless, as the only reason that you would use pointers is to allow an object to be referenced in multiple places at once. Even if the pointer in one part of the program is 0 there may well be others that are not set to 0.
Furthermore the safe_delete macro / function template is very difficult to use right, because there are only two places that it can be used if there is code that may throw between the new and delete for the given pointer.
1) Inside either a catch (...) block that rethrows the exception and also duplicated next to the catch (...) block for the path that doesn't throw. (Also duplicated next to every break, return, continue etc that may allow the pointer to fall out of scope)
2) Inside a destructor for an object that owns the pointer (unless there is no code between the new and delete that can throw).
Even if there is no code that could throw when you write the code, this could change in the future (all it takes is for someone to come along and add another new after the first one). It is better to write code in a way that stays correct even in the face of exceptions.
Option 1 creates so much code duplication and is so easy to get wrong that I am doubtful to even call it an option.
Option 2 makes safe_delete redundant, as the ptr_ that you are setting to 0 will go out of scope on the next line.
In summary -- don't use safe_delete as it is not safe at all (it is very difficult to use correctly, and leads to redundant code even when its use is correct). Use SBRM and smart pointers.
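To illustrate the SBRM point, a minimal sketch assuming C++11's std::unique_ptr; Resource, risky and alive_after are hypothetical names. The object is released on both the normal path and the throwing path, with no delete and no catch block inside the function that owns it.

```cpp
#include <memory>
#include <stdexcept>

// Hypothetical resource that counts live instances.
struct Resource {
    static int alive;
    Resource() { ++alive; }
    ~Resource() { --alive; }
};
int Resource::alive = 0;

// Owns a Resource for the duration of the call, however it ends.
void risky(bool fail) {
    std::unique_ptr<Resource> r(new Resource);
    if (fail) {
        throw std::runtime_error("something went wrong");
    }
    // normal path: r is destroyed at the closing brace
}

// Runs risky() and reports how many Resources are left over.
int alive_after(bool fail) {
    try {
        risky(fail);
    } catch (const std::runtime_error&) {
        // stack unwinding has already destroyed the Resource
    }
    return Resource::alive;
}
```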