Why aren't data instances NULL by default? (C++)

Something that has always bothered me is why isn't data set to NULL by the compiler, if nothing else was specified? I can't see any reason NOT to, and surely it must be better than the junk that gets referred to otherwise?

The original reason was "for speed" -- the variables were only going to make sense once you assigned something to them, so there was an effort to not make a pointless initialization to 0 that was just going to be overwritten by the programmer in a few milliseconds with the actual data anyway.
While it isn't exactly expensive, the reason compilers don't zero out variables today is much the same: any compiler that did would look slow next to other compilers on benchmarks.

The only good reason is efficiency. C and C++ (unlike higher-level languages) tend not to do anything you didn't ask for; that is how they got their reputation for being extremely fast. Other languages are safer, but slower.
If you wanted to initialise a field to anything other than NULL, it would be a waste of time to first initialise it to NULL and then initialise it again to the value you actually want. So C/C++ assumes you know what you are doing.

My guess is speed. It would take additional CPU instructions to do it.

Setting the allocated memory to zeroes and marking the structures as 'null' takes time. Your application may have time-critical sections where objects must be instantiated as quickly as possible, after which you initialise them in as minimal a way as possible to make them usable. So the compiler and runtimes do as little work as possible to create the objects, and leave it to you to complete their setup as you see fit.
Some high-level languages do implement further initialisation to 'null', but these are then not always suitable for time-critical applications.

If you declare a variable with static storage duration (i.e. globally, not on the free store), it will be initialised to 0.
Data on the stack, however, is deliberately left uninitialised as a way to speed things up. There's no point initialising data to 0 if you're going to overwrite it anyway, which is the assumption made for local (stack) data.
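A minimal sketch of the difference (the names are illustrative):

#include <iostream>

int global_counter;  // static storage duration: guaranteed zero-initialised

int main() {
    int local;                            // automatic storage: indeterminate value
    // std::cout << local << '\n';       // undefined behaviour: never read it uninitialised
    std::cout << global_counter << '\n';  // prints 0
    local = 5;                            // you pay for initialisation only when you ask for it
    std::cout << local << '\n';           // prints 5
}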

Related

Performance: should I use a global variable in a function which gets called often?

First off, let me get off my chest the fact that I'm a greenhorn trying to do things the right way, which means I get into a contradiction about what the right way is every now and then.
I am modifying a driver for a peripheral which contains a function - let's call it Send(). In the function I have a timestamp variable so the function loops for a specified amount of time.
So, should I declare the variable global (that way it is always in memory and no time is lost for declaring it each time the function runs) or do I leave the variable local to the function context (and avoid a bad design pattern with global variables)?
Please bear in mind that the function can be called multiple times per millisecond.
Speed of execution shouldn't be significantly different for a local vs. a global variable. The only real difference is where the variable lives. Local variables are allocated on the stack, global variables are in a different memory segment. It is true that local variables are allocated every time you enter a routine, but allocating memory is a single instruction to move the stack pointer.
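A sketch of the local-variable version (the names and timeout value are illustrative, not from the actual driver); the timestamp costs one stack-pointer adjustment per call, which is negligible even at many calls per millisecond:

#include <chrono>

void Send() {
    // One local per call: allocating it is just a stack-pointer bump.
    const auto start = std::chrono::steady_clock::now();
    const auto timeout = std::chrono::microseconds(100);  // illustrative value
    while (std::chrono::steady_clock::now() - start < timeout) {
        // poll the peripheral here ...
    }
}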
There are much more important considerations when deciding if a variable should be global or local.
When implementing a driver, try to avoid global variables as much as possible, because:
They are thread-unsafe, and you have no idea about the scheduling scheme of the user application (in fact, even without threads, using multiple instances of the same driver is a potential problem).
It automatically causes a data section to be created as part of the executable image of any application that links to your driver (which is something the application programmer might want to avoid).
Did you profile a fully-optimized, release build of your code and identify the bottleneck to be small allocations in this function?
The change you are proposing is a micro-optimization; a change to a small part of your code with the intent to make it more efficient. If the answer to the above question is "no", as I'd expect, you shouldn't even be thinking of such things.
Select the correct algorithm for your code. Write your code using idiomatic techniques. Do not write micro-optimizations. You might be surprised how good your compiler is at optimizing your code for you. It will often be able to optimize away these small allocations, but even if it can't, you still don't know if the performance penalty imposed by them is even noticeable or significant.
Drivers are usually position-independent, so global variables are accessed indirectly through the GOT (global offset table) unless IP-relative addressing is available (as on x86_64, ARM, etc.).
In the GOT case, you can think of it as an extra indirect pointer.
However, even with an extra pointer it won't make any observable difference if the function is "only" called at millisecond frequency.

Why do C++ compilers not initialise every integer declaration to 0, be it local or global or a member?

It baffles me why C++ compilers do not initialise every integer declaration to 0, be it local or global or a member? Why do uninitialised sections exist in the memory model?
I know it's a dull answer, but your question begs for it exactly:
Because the C++ standard says so.
Why does it say so? Because C++ is built on a principle:
Don't pay for what you don't use.
Setting memory to a certain value costs CPU time and memory bandwidth. If you want to do it, do it explicitly. A variable declaration should not incur this cost.
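In C++ the opt-in is a single token; a sketch of the choices (inside a function, where no implicit zeroing happens):

void example() {
    int a;      // indeterminate: you didn't ask for initialisation, so you don't pay for it
    int b = 0;  // explicit zero: you asked for it, you pay one store
    int c{};    // value-initialisation (C++11 syntax): also zero, stated explicitly
    (void)a; (void)b; (void)c;  // silence unused-variable warnings
}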
C++ is based on C, and in C a primary design concern was code efficiency. In most cases you want to initialize a new variable to a specific value right after declaring it. If the compiler wrote 0 to that memory address just to have it overwritten with another value shortly afterwards, it would waste CPU cycles.
Sure, a smart compiler could detect that a variable isn't read before it gets a value assigned and could optimize the initialization to 0 away. But when C was developed, compilers weren't that smart yet.
The C language and its standard library generally follow the principle of not doing things automatically when they might be unnecessary under some circumstances.
It might make life easier for you if it did, but C++ errs on the side of avoiding overheads, e.g. setting values which you might then reset to something else.

C++ code migration: handling uninitialized pointers

As per the title, I am planning to move some legacy code developed a decade+ ago for AIX. The problem is the code base is huge. The developers didn't initialize their pointers in the original code. Now while migrating the code to the latest servers, I see some problems with it.
I know that the best solution is to run through all the code and initialize all the variables wherever required. However, I am just keen to know if there are any other solutions available to this problem. I tried Google but couldn't find an appropriate answer.
The most preventive long-term approach is to initialize all pointers at the location they're declared, changing the code to use appropriate smart pointers to manage the lifetime. If you have any sort of unit tests this refactoring can be relatively painless.
In a shorter term and if you're porting to Linux you could use valgrind and get a good shot at tracking down the one or two real issues that are biting you, giving you time to refactor at a more leisurely pace.
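A minimal sketch of that refactoring (the type and function names are placeholders):

#include <memory>

struct Connection {   // placeholder type for illustration
    void send() {}
};

// Legacy pattern, an uninitialised raw pointer:
//   Connection* conn;   // indeterminate; dereferencing it is undefined behaviour

// Refactored pattern: initialise at the point of declaration and let a
// smart pointer manage the lifetime.
void use_connection() {
    auto conn = std::make_unique<Connection>();  // never uninitialised, never leaked
    conn->send();
}  // destroyed automatically here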
Just initializing all the variables may not be a good idea.
Reliable behavior generally depends on variables having values known to be correct ("guaranteed by construction" to be correct). The problem with uninitialized variables isn't simply that they have unknown values. Obviously being unknown is a problem, but again the desired state is having known and correct values. Initializing a variable to a known value that is not correct does not yield reliable behavior.
Not infrequently it happens that there is no 'default' value that is correct to use as a fallback if more complicated initialization fails. A program may choose not to initialize a variable with a value if that value must be over-written before the variable can be used.
Initializing a variable to a default value may have a few problems in such cases. Often 'default' values are inoffensive in that if they are used the consequences aren't immediately obvious. That's not generally desirable, because as the developer you want to notice when things go wrong. You can avoid this problem by picking default values that will have obvious consequences, but that doesn't solve a second issue: static analyzers can often detect and report when an uninitialized variable is used. If there's a problem with some complicated initialization logic such that no value is set, you want that to be detectable. Setting a default value prevents static analysis from detecting such cases. So there are cases where you do not want to initialize variables.
With pointers the default value is typically nullptr, which to a certain extent avoids the first issue discussed above because dereferencing a null pointer typically produces an immediate crash (good for debugging). However code might also detect a null pointer and report an error (good for debugging) or might fall back to some other method (bad for debugging). You may be better off using static analysis to detect usages of uninitialized pointers rather than initializing them. Though static analysis may detect dereferencing of null pointers it won't detect when null pointers cause error reporting or fallback routines to be used.
In response to your comment:
The major problems that I see are:
Pointers to local variables are returned from functions.
Almost all the pointer variables are not initialized. I am sure that AIX provided this comfort for the customer on the earlier platform, however I really doubt that the code would run flawlessly on Linux when it is put to the real test (production).
I cannot deliver partial solutions which may work. I prefer to give the best to my customer, who pays me for my work, so I won't use workarounds.
Quality cannot be compromised.
fix them (and pay special attention to correctly cleaning up)
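For the first point (pointers to locals returned from functions), a sketch of the bug and one common fix (the function names are hypothetical):

#include <string>

// The bug: returning a pointer to a local.
const char* make_label_bad() {
    char buf[16] = "label";
    return buf;        // dangling: buf is destroyed when the function returns
}

// One fix: return by value so the caller owns the data.
std::string make_label_good() {
    return "label";
}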
As I argue above simply lacking an initializer is not in and of itself a defect. There is only a defect if the uninitialized value is actually used in an illegal manner. I'm not sure what you mean about AIX providing comfort.
As I argue above the 'partial solution' and 'workaround' would be to blindly initialize everything.
Again, blindly initializing everything can result not only in useless work, but it can actually compromise quality by taking away some tools for detecting bugs.

In either C or C++, should I check pointer parameters against NULL/nullptr?

This question was inspired by this answer.
I've always been of the philosophy that the callee is never responsible when the caller does something stupid, like passing of invalid parameters. I have arrived at this conclusion for several reasons, but perhaps the most important one comes from this article:
Everything not defined is undefined.
If a function doesn't say in its docs that it's valid to pass nullptr, then you damn well better not be passing nullptr to that function. I don't think it's the responsibility of the callee to deal with such things.
However, I know there are going to be some who disagree with me. I'm curious whether or not I should be checking for these things, and why.
If you're going to check for NULL pointer arguments where you have not entered into a contract to accept and interpret them, do it with an assert, not a conditional error return. This way the bugs in the caller will be immediately detected and can be fixed, and it makes it easy to disable the overhead in production builds. I question the value of the assert except as documentation however; a segfault from dereferencing the NULL pointer is just as effective for debugging.
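A sketch of the assert approach (the type and names are illustrative):

#include <cassert>
#include <cstddef>

struct Buffer { std::size_t size; };  // placeholder type

// Assert the contract instead of returning an error code: a debug build
// halts at the buggy call site, and compiling with -DNDEBUG removes the
// check entirely from production builds.
std::size_t buffer_size(const Buffer* buf) {
    assert(buf != nullptr && "buffer_size: buf must not be null");
    return buf->size;
}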
If you return an error code to a caller which has already proven itself buggy, the most likely result is that the caller will ignore the error, and bad things will happen much later down the line when the original cause of the error has become difficult or impossible to track down. Why is it reasonable to assume the caller will ignore the error you return? Because the caller already ignored the error return of malloc or fopen or some other library-specific allocation function which returned NULL to indicate an error!
In C++, if you don't want to accept NULL pointers, then don't take the chance: accept a reference instead.
While in general I don't see the value in detecting NULL (why NULL and not some other invalid address?) for a public API I'd probably still do it simply because many C and C++ programmers expect such behavior.
Defense in Depth principle says yes. If this is an external API then totally essential. Otherwise, at least an assert to assist in debugging misuse of your API.
You can document the contract until you are blue in the face, but you cannot in callee code prevent ill-advised or malicious misuse of your function. The decision you have to make is what's the likely cost of misuse.
In my view, it's not a question of responsibility. It's a question of robustness.
Unless I have full control on the caller and I must optimize for even the minute speed improvement, I always check for NULL.
I lean heavily on the side of 'don't trust your user's input to not blow up your system' and towards defensive programming in general. Having built APIs in a past life, I have seen users of those libraries pass in null pointers, with application crashes as the result.
If it is truly an internal library and I'm the only person (or only a select few) have the ability to use it, then I might ease up on null pointer checks as long as everyone agrees to abide by general contracts. I can't trust the user base at large to adhere to that.
The answer is going to be different for C and C++.
C++ has references. The only difference between passing a pointer and passing a reference is that the pointer can be null. So, if the writer of the called function expects a pointer argument and forgets to do something sane when it's null, he's silly, crazy or writing C-with-classes.
Either way, this is not a matter of who wears the responsibility hat. In order to write good software, the two programmers must co-operate, and it is the responsibility of all programmers to 1° avoid special cases that would require this kind of decision and 2° when that fails, write code that blows up in a non-ambiguous and documented way in order to help with debugging.
So, sure, you can point and laugh at the caller because he messed up ("everything not defined is undefined") and had to spend an hour debugging a simple null pointer bug, but your team wasted some precious time on that.
My philosophy is: Your users should be allowed to make mistakes, your programming team should not.
What this means is that the only place you should check for invalid parameters including NULL, is in the top-level user interface. Everywhere the user can provide input to your code, you should check for errors, and handle them as gracefully as possible.
Everywhere else, you should use ASSERTS to ensure the programmers are using the functions correctly.
If you are writing an API, then only the top-level functions should catch and handle bad input. It is pointless to keep checking for a NULL pointer three or four levels deep into your call stack.
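A sketch of that layering (the names are illustrative): validate user-facing input at the public boundary, and assert the contract internally.

#include <cassert>
#include <cstdlib>

namespace detail {
    // Internal function: callers inside the library guarantee non-null,
    // so a failing assert means a programmer error, not a user error.
    int parse_impl(const char* text) {
        assert(text != nullptr);
        return std::atoi(text);
    }
}

// Public entry point: user input is checked and handled gracefully.
bool parse(const char* text, int& out) {
    if (text == nullptr) return false;  // user mistake: report it, don't crash
    out = detail::parse_impl(text);
    return true;
}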
I am pro defensive programming.
Unless you can show by profiling that these nullptr checks happen in a bottleneck of your application (in such a case it is conceivable that you should skip the pointer tests at those points),
comparing a pointer against null is a really cheap operation.
I think it is a shame to leave potential crash bugs in place just to save so little CPU.
so: Test your pointers against NULL!
I think that you should strive to write code that is robust for every conceivable situation. Passing a NULL pointer to a function is very common; therefore, your code should check for it and deal with it, usually by returning an error value. Library functions should NOT crash an application.
For C++, if your function doesn't accept null pointers, then use a reference argument. In general.
There are some exceptions. For example, many people, including myself, think it's better to take a pointer argument when the actual argument will most naturally be a pointer, especially when the function stores away a copy of the pointer. Even when the function doesn't support null-pointer arguments.
How much to defend against invalid argument depends, including that it depends on subjective opinion and gut-feeling.
Cheers & hth.,
One thing you have to consider is what happens if some caller DOES misuse your API. In the case of passing NULL pointers, the result is an obvious crash, so it's OK not to check. Any misuse will be readily apparent to the calling code's developer.
The infamous glibc debacle is another thing entirely. The misuse resulted in actually useful behavior for the caller, and the API stayed that way for decades. Then they changed it.
In this case, the API developers' should have checked values with an assert or some similar mechanism. But you can't go back in time to correct an error. The wailing and gnashing of teeth were inevitable. Read all about it here.
If you don't want a NULL then don't make the parameter a pointer.
By using a reference you guarantee that the object will not be NULL.
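A sketch of the two signatures (the types and names are illustrative):

struct Shape {};  // placeholder type

void draw_ref(const Shape&) {}  // cannot receive null: the caller must supply an object
void draw_ptr(const Shape*) {}  // may receive nullptr: the callee must decide what to do

void caller(Shape* maybe_null) {
    if (maybe_null)
        draw_ref(*maybe_null);  // the null check moves to the caller, where it belongs
}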
He who performs invalid operations on invalid or nonexistent data only deserves his system state to become invalid.
I consider it complete nonsense that functions which expect input should check for NULL. Or whatever other value, for that matter. The sole job of a function is to do a task based on its input or scope-state, nothing else. If you have no valid input, or no input at all, then don't even call the function. Besides, a NULL check doesn't detect the other millions and millions of possible invalid values. You know beforehand you would be passing NULL, so why would you still pass it, waste valuable cycles on yet another function call with parameter passing, an in-function comparison of some pointer, and then checking the function output again for success or not? Sure, I might have done so when I was 6 years old back in 1982, but those days have long since gone.
There is of course the argument to be made for public APIs. Like some DLL offering idiot-proof checking. You know, those arguments: "If the user supplies NULL you don't want your application to crash." What a non-argument. It is the user who passes bogus data in the first place; it's an explicit choice and nothing else than that. If one feels that is quality, well... I prefer solid logic and performance over such things. Besides, a programmer is supposed to know what he's doing. If he's operating on invalid data for the particular scope, then he has no business calling himself a programmer. I see no reason to downgrade the performance, increase power consumption, and increase binary size, which in turn affects instruction caching and branch prediction, of my products in order to support such users.
I don't think it's the responsibility of the callee to deal with such things
If it doesn't take this responsibility it might create bad results, like dereferencing NULL pointers. The problem is that it always implicitly takes this responsibility. That's why I prefer graceful handling.
In my opinion, it's the callee's responsibility to enforce its contract.
If the callee shouldn't accept NULL, then it should assert that.
Otherwise, the callee should be well behaved when it's handed a NULL. That is, either it should functionally be a no-op, return an error code, or allocate its own memory, depending on the contract that you specified for it. It should do whatever seems to be the most sensible from the caller's perspective.
As the user of the API, I want to be able to continue using it without having the program crash; I want to be able to recover at the least or shut down gracefully at worst.
One side effect of that approach is that when your library crashes in response to being passed an invalid argument, you will tend to get the blame.
There is no better example of this than the Windows operating system. Initially, Microsoft's approach was to eliminate many tests for bogus arguments. The result was an operating system that was more efficient.
However, the reality is that invalid arguments are passed all the time, from programmers that aren't up to snuff, or just from values returned by other functions that weren't expected to be NULL. Now, Windows performs more validation and is less efficient as a result.
If you want to allow your routines to crash, then don't test for invalid parameters.
Yes, you should check for null pointers. You don't want to crash an application because the developer messed something up.
Overhead of development time + runtime performance has a trade-off with the robustness of the API you are designing.
If the API you are publishing has to run inside the process of the calling routine, you SHOULD NOT check for NULL or invalid arguments. In this scenario, if you crash, the client program crashes and the developer using your API should mend his ways.
However, if you are providing a runtime/framework which will run the client program inside it (e.g., you are writing a virtual machine or a middleware which can host the code, or an operating system), you should definitely check the correctness of the arguments passed. You don't want your program to be blamed for the mistakes of a plugin.
There is a distinction between what I would call legal and moral responsibility in this case. As an analogy, suppose you see a man with poor eyesight walking towards a cliff edge, blithely unaware of its existence. As far as your legal responsibility goes, it would in general not be possible to successfully prosecute you if you fail to warn him and he carries on walking, falls off the cliff and dies. On the other hand, you had an opportunity to warn him -- you were in a position to save his life, and you deliberately chose not to do so. The average person tends to regard such behaviour with contempt, judging that you had a moral responsibility to do the right thing.
How does this apply to the question at hand? Simple -- the callee is not "legally" responsible for the actions of the caller, stupid or otherwise, such as passing in invalid input. On the other hand, when things go belly up and it is observed that a simple check within your function could have saved the caller from his own stupidity, you will end up sharing some of the moral responsibility for what has happened.
There is of course a trade-off going on here, dependent on how much the check actually costs you. Returning to the analogy, suppose that you found out that the same stranger was inching slowly towards a cliff on the other side of the world, and that by spending your life savings to fly there and warn him, you could save him. Very few people would judge you entirely harshly if, in this particular situation, you neglected to do so (let's assume that the telephone has not been invented, for the purposes of this analogy). In coding terms, however, if the check is as simple as checking for NULL, you are remiss if you fail to do so, even if the "real" blame in the situation lies with the caller.

Consequences of only using stack in C++

Let's say I know a guy who is new to C++. He does not pass around pointers (rightly so) but he refuses to pass by reference. He uses pass by value always. Reason being that he feels that "passing objects by reference is a sign of a broken design".
The program is a small graphics program and most of the passing in question is mathematical Vector(3-tuple) objects. There are some big controller objects but nothing more complicated than that.
I'm finding it hard to find a killer argument against only using the stack.
I would argue that pass by value is fine for small objects such as vectors but even then there is a lot of unnecessary copying occurring in the code. Passing large objects by value is obviously wasteful and most likely not what you want functionally.
On the pro side, I believe the stack is faster at allocating/deallocating memory and has a constant allocation time.
The only major argument I can think of is that the stack could possibly overflow, but I'm guessing that it is improbable that this will occur? Are there any other arguments against using only the stack/pass by value as opposed to pass by reference?
Subtyping-polymorphism is a case where passing by value wouldn't work because you would slice the derived class to its base class. Maybe to some, using subtyping-polymorphism is bad design?
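A minimal sketch of the slicing problem (the types are illustrative):

#include <iostream>

struct Shape {
    virtual const char* name() const { return "shape"; }
    virtual ~Shape() = default;
};
struct Circle : Shape {
    const char* name() const override { return "circle"; }
};

void by_value(Shape s)      { std::cout << s.name() << '\n'; }  // slices the Circle away
void by_ref(const Shape& s) { std::cout << s.name() << '\n'; }  // dynamic dispatch intact

int main() {
    Circle c;
    by_value(c);  // prints "shape"
    by_ref(c);    // prints "circle"
}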
Your friend's problem is not his idea as much as his religion. Given any function, always consider the pros and cons of passing by value, reference, const reference, pointer or smart pointer. Then decide.
The only sign of broken design I see here is your friend's blind religion.
That said, there are a few signatures that don't bring much to the table. Taking a const by value might be silly, because if you promise not to change the object then you might as well not make your own copy of it. Unless it's a primitive, of course, in which case the compiler can be smart enough to take a reference still. Or, sometimes it's clumsy to take a pointer to a pointer as an argument; this adds complexity, and instead you might be able to get away with taking a reference to a pointer, for the same effect.
But don't take these guidelines as set in stone; always consider your options because there is no formal proof that eliminates any alternative's usefulness.
If you need to change the argument for your own needs, but don't want to affect the client, then take the argument by value.
If you want to provide a service to the client, and the client is not closely related to the service, then consider taking an argument by reference.
If the client is closely related to the service then consider taking no arguments but write a member function.
If you wish to write a service function for a family of clients that are closely related to the service but very distinct from each other then consider taking a reference argument, and perhaps make the function a friend of the clients that need this friendship.
If you don't need to change the client at all then consider taking a const-reference.
There are all sorts of things that cannot be done without using references - starting with a copy constructor. References (or pointers) are fundamental and whether he likes it or not, he is using references. (One advantage, or maybe disadvantage, of references is that you do not have to alter the code, in general, to pass a (const) reference.) And there is no reason not to use references most of the time.
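For instance, a copy constructor has to take its parameter by reference; a sketch:

struct Vec3 {
    double x, y, z;
    Vec3(double x_, double y_, double z_) : x(x_), y(y_), z(z_) {}
    // A copy constructor must take its parameter by reference: taking it
    // by value would require making a copy, which would call the copy
    // constructor, recursing forever.
    Vec3(const Vec3& other) : x(other.x), y(other.y), z(other.z) {}
};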
And yes, passing by value is OK for smallish objects without requirements for dynamic allocation, but it is still silly to hobble oneself by saying "no references" without concrete measurements that the so-called overhead is (a) perceptible and (b) significant. "Premature optimization is the root of all evil"¹.

¹ Various attributions, including C. A. R. Hoare (although apparently he disclaims it).
I think there is a huge misunderstanding in the question itself.
There is no relationship between stack- or heap-allocated objects on the one hand and pass by value or reference or pointer on the other.
Stack vs Heap allocation
Always prefer the stack when possible: the object's lifetime is then managed for you, which is much easier to deal with.
It might not be possible in a couple of situations though:
Virtual construction (think of a Factory)
Shared Ownership (though you should always try to avoid it)
And I might be missing some, but in these cases you should use SBRM (Scope-Bound Resource Management) to leverage the stack's lifetime-management abilities, for example by using smart pointers.
Pass by: value, reference, pointer
First of all, there is a difference of semantics:
value, const reference: the passed object will not be modified by the method
reference: the passed object might be modified by the method
pointer/const pointer: same as reference (for the behavior), but might be null
Note that some languages (the functional kind, like Haskell) do not offer references/pointers by default: the values are immutable once created. Apart from some workarounds for dealing with the exterior environment, they are not that restricted by this, and it somehow makes debugging easier.
Your friend should learn that there is absolutely nothing wrong with pass-by-reference or pass-by-pointer: for example, think of swap, which cannot be implemented with pass-by-value.
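A minimal sketch of why (the function name is illustrative):

// swap is impossible with pass-by-value: the function would exchange
// its own local copies and leave the caller's variables untouched.
void swap_ints(int& a, int& b) {
    int tmp = a;
    a = b;
    b = tmp;
}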
Finally, Polymorphism does not allow pass-by-value semantics.
Now, let's speak about performances.
It's usually well accepted that built-ins should be passed by value (to avoid an indirection) and user-defined big classes should be passed by reference/pointer (to avoid copying). big in fact generally means that the Copy Constructor is not trivial.
There is however an open question regarding small user-defined classes. Some articles published recently suggest that in some cases pass-by-value might allow better optimization from the compiler, for example, in this case:
Object foo(Object d) { d.bar(); return d; }

int main(int argc, char* argv[])
{
    Object o;
    o = foo(o);
    return 0;
}
Here a smart compiler is able to determine that o can be modified in place without any copying! (It is necessary that the function definition be visible I think, I don't know if Link-Time Optimization would figure it out)
Therefore, there is only one answer to the performance question, as always: measure.
Reason being that he feels that "passing objects by reference is a sign of a broken design".
Although this is wrong in C++ for purely technical reasons, always using pass-by-value is a good enough approximation for beginners – it’s certainly much better than passing everything by pointers (or perhaps even than passing everything by reference). It will make some code inefficient but, hey! As long as this doesn’t bother your friend, don’t be unduly disturbed by this practice. Just remind him that someday he might want to reconsider.
On the other hand, this:
There are some big controller objects but nothing more complicated than that.
is a problem. Your friend is talking about broken design, and then all the code uses is a few 3D vectors and large controller objects? That is a broken design. Good code achieves modularity through the use of data structures. It doesn't seem as though this were the case.
… And once you use such data structures, code without pass-by-reference may indeed become quite inefficient.
First thing is, stack rarely overflows outside this website, except in the recursion case.
About his reasoning, I think he might be wrong because he over-generalizes, but what he has done might still be correct... or not?
For example, the Windows Forms library uses a Rectangle struct with 4 members, and Apple's QuartzCore has a CGRect struct; those structs are always passed by value. I think we can compare them to a Vector with 3 floating-point members.
However, as I have not seen the code, I feel I should not judge what he has done, though I have a feeling he might have done the right thing despite his over-generalized idea.
I would argue that pass by value is fine for small objects such as vectors but even then there is a lot of unnecessary copying occurring in the code. Passing large objects by value is obviously wasteful and most likely not what you want functionally.
It's not quite as obvious as you might think. C++ compilers perform copy elision very aggressively, so you can often pass by value without incurring the cost of a copy operation. And in some cases, passing by value might even be faster.
Before condemning the issue for performance reasons, you should at the very least produce the benchmarks to back it up. And they might be hard to create because the compiler typically eliminates the performance difference.
So the real issue should be one of semantics. How do you want your code to behave? Sometimes, reference semantics are what you want, and then you should pass by reference. If you specifically want/need value semantics then you pass by value.
There is one point in favor of passing by value. It's helpful in achieving a more functional style of code, with fewer side effects and where immutability is the default. That makes a lot of code easier to reason about, and it may make it easier to parallelize the code as well.
But in truth, both have their place. And never using pass-by-reference is definitely a big warning sign.
For the last 6 months or so, I've been experimenting with making pass-by-value the default. If I don't explicitly need reference semantics, then I try to assume that the compiler will perform copy elision for me, so I can pass by value without losing any efficiency.
So far, the compiler hasn't really let me down. I'm sure I'll run into cases where I have to go back and change some calls to passing by reference, but I'll do that when I know that
performance is a problem, and
the compiler failed to apply copy elision
I would say that not using pointers in C is a sign of a newbie programmer.
It sounds like your friend is scared of pointers.
Remember, C++ pointers were actually inherited from the C language, and C was developed when computers were much less powerful. Nevertheless, speed and efficiency continue to be vital until this day.
So, why use pointers? They allow the developer to optimize a program to run faster or use less memory than it would otherwise! Referring to the memory location of data is much more efficient than copying all the data around.
Pointers are usually a concept that is difficult to grasp for those beginning to program, because all the experiments done involve small arrays, maybe a few structs, but basically they consist of working with a couple of megabytes (if you're lucky) when you have 1 GB of memory lying around the house. In that setting, a couple of MB is nothing, and it usually is too little to have a significant impact on the performance of your program.
So let's exaggerate that a little bit. Think of a char array with 2147483648 elements - 2 GB of data - that you need to pass to a function that will write all the data to the disk. Now, which technique do you think is going to be more efficient/faster?
Pass by value, which is going to have to re-copy those 2GB of data to another location in memory before the program can write the data to the disk, or
Pass by reference, which will just refer to that memory location.
What happens when you just don't have 4 GB of RAM? Will you spend money and buy more RAM chips just because you are afraid of using pointers?
Re-copying the data in memory when you don't have to sounds a bit redundant, and it's a waste of computer resources.
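A sketch of the pass-by-reference version (a 1 MiB buffer stands in for the 2 GB example, and stdout stands in for a real file):

#include <cstdio>
#include <vector>

// Only a reference crosses the call boundary; the buffer itself is not copied.
void write_to_disk(const std::vector<char>& data) {
    std::fwrite(data.data(), 1, data.size(), stdout);
}

int main() {
    std::vector<char> big(1 << 20, 'x');
    write_to_disk(big);  // no copy of the buffer is made
}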
Anyway, be patient with your friend. If he would like to become a serious/professional programmer at some point in his life he will eventually have to take the time to really understand pointers.
Good Luck.
As already mentioned, the big difference between a reference and a pointer is that a pointer can be null. If a class requires data, a reference declaration will make it required. Adding const will make it 'read only' if that is what is desired by the caller.
The pass-by-value 'flaw' mentioned is simply not true. Passing everything by value will completely change the performance of an application. It is not so bad when primitive types (i.e. int, double, etc.) are passed by value, but when a class instance is passed by value, temporary objects are created, which requires constructors and later destructors to be called on the class and on all of the member variables in the class. This is exacerbated when large class hierarchies are used, because parent class constructors/destructors must be called as well.
Also, just because the vector is passed by value does not mean that it only uses stack memory. The heap may be used for each element as it is created in the temporary vector that is passed to the method/function. The vector itself may also have to reallocate via the heap if it reaches its capacity.
If pass by value is being used so that the caller's values are not modified, then just use a const reference.
The answers that I've seen so far have all focused on performance: cases where pass-by-reference is faster than pass-by-value. You may have more success in your argument if you focus on cases that are impossible with pass-by-value.
Small tuples or vectors are a very simple type of data-structure. More complex data-structures share information, and that sharing can't be represented directly as values. You either need to use references/pointers or something that simulates them such as arrays and indices.
Lots of problems boil down to data that forms a Graph, or a Directed-Graph. In both cases you have a mixture of edges and nodes that need to be stored within the data-structure. Now you have the problem that the same data needs to be in multiple places. If you avoid references then firstly the data needs to be duplicated, and then every change needs to be carefully replicated in each of the other copies.
Your friend's argument boils down to saying: tackling any problem complex enough to be represented by a Graph is a bad-design....
The only major argument I can think of is that the stack could possibly overflow, but I'm guessing that it is improbable that this will occur? Are there any other arguments against using only the stack/pass by value as opposed to pass by reference?
Well, gosh, where to start...
As you mention, "there is a lot of unnecessary copying occurring in the code". Let's say you've got a loop where you call a function on these objects. Using a pointer instead of duplicating the objects can accelerate execution by one or more orders of magnitude.
You can't pass variable-sized data structures, arrays, etc. around on the stack. You have to dynamically allocate them and pass a pointer or reference to the beginning. If your friend hasn't run into this, then yes, he's "new to C++."
As you mention, the program in question is simple and mostly uses quite small objects like graphics 3-tuples, which, if the elements are doubles, would be 24 bytes apiece. But in graphics, it's common to deal with 4x4 arrays, which handle both rotation and translation. Those would be 128 bytes apiece, so a program that had to deal with those would be about five times slower per function call with pass-by-value, due to the increased copying. With pass-by-reference, passing a 3-tuple or a 4x4 array in a 32-bit executable would just involve duplicating a single 4-byte pointer.
On register-rich CPU architectures like ARM, PowerPC, 64-bit x86, and 680x0 - but not 32-bit x86 - pointers (and references, which are secretly pointers wearing fancy syntactical clothing) are commonly passed or returned in a register, which is really freaking fast compared to the memory access involved in a stack operation.
You mention the improbability of running out of stack space. And yes, that's so on a small program one might write for a class assignment. But a couple of months ago, I was debugging commercial code that was probably 80 function calls below main(). If they'd used pass-by-value instead of pass-by-reference, the stack would have been ginormous. And lest your friend think this was a "broken design", this was actually a WebKit-based browser implemented on Linux using GTK+, all of which is very state-of-the-art, and the function call depth is normal for professional code.
Some executable architectures limit the size of an individual stack frame, so even though you might not run out of stack space per se, you could exceed that and wind up with perfectly valid C++ code that wouldn't build on such a platform.
I could go on and on.
If your friend is interested in graphics, he should take a look at some of the common APIs used in graphics: OpenGL and X Windows on Linux, Quartz on Mac OS X, DirectX on Windows. And he should look at the internals of large C/C++ systems like the WebKit or Gecko HTML rendering engines, or any of the Mozilla browsers, or the GTK+ or Qt GUI toolkits. They all pass anything much larger than a single integer or float by reference, and often fill in results by reference rather than as a function return value.
Nobody with any serious real world C/C++ chops - and I mean nobody - passes data structures by value. There's a reason for this: it's just flipping inefficient and problem-prone.
Wow, there are already 13 answers… I didn't read them all in detail, but I think this is quite different from the others…
He has a point. The advantage of pass-by-value as a rule is that subroutines cannot subtly modify their arguments. Passing non-const references would indicate that every function has ugly side effects, indicating poor design.
Simply explain to him the difference between vector3& and vector3 const&, and demonstrate how the latter may be initialized by a temporary as in vec_function( vector3(1,2,3) ), but not the former. Pass by const reference is a simple optimization of pass by value.
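A minimal sketch of that demonstration (the function names are illustrative):

struct vector3 {
    float x, y, z;
    vector3(float x_, float y_, float z_) : x(x_), y(y_), z(z_) {}
};

void takes_ref(vector3&) {}         // non-const reference
void takes_cref(const vector3&) {}  // const reference

int main() {
    takes_cref(vector3(1, 2, 3));   // OK: a temporary binds to const&
    // takes_ref(vector3(1, 2, 3)); // error: a temporary cannot bind to non-const&
}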
Buy your friend a good C++ book. Passing non-trivial objects by reference is good practice and saves you a lot of unnecessary constructor/destructor calls. This also has nothing to do with allocating on the free store vs. using the stack: you can (and should) pass objects allocated on the program stack by reference without any free-store usage. You can also ignore the free store completely, but that throws you back to the old Fortran days, which your friend probably didn't have in mind - otherwise he would have picked an ancient f77 compiler for your project, wouldn't he...?