Consequences of only using stack in C++

Let's say I know a guy who is new to C++. He does not pass around pointers (rightly so), but he refuses to pass by reference; he always passes by value. His reason is that he feels that "passing objects by reference is a sign of a broken design".
The program is a small graphics program and most of the passing in question is mathematical Vector(3-tuple) objects. There are some big controller objects but nothing more complicated than that.
I'm finding it hard to find a killer argument against only using the stack.
I would argue that pass by value is fine for small objects such as vectors but even then there is a lot of unnecessary copying occurring in the code. Passing large objects by value is obviously wasteful and most likely not what you want functionally.
On the pro side, I believe the stack is faster at allocating/deallocating memory and has a constant allocation time.
The only major argument I can think of is that the stack could possibly overflow, but I'm guessing that it is improbable that this will occur? Are there any other arguments against using only the stack/pass by value as opposed to pass by reference?

Subtype polymorphism is a case where passing by value doesn't work, because you would slice the derived class down to its base class. Maybe to some, using subtype polymorphism is itself bad design?
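For illustration, here is a minimal sketch of slicing (the Shape/Circle names are made up):

    #include <iostream>

    struct Shape {
        virtual ~Shape() = default;
        virtual const char* name() const { return "Shape"; }
    };

    struct Circle : Shape {
        const char* name() const override { return "Circle"; }
    };

    void byValue(Shape s)      { std::cout << s.name() << '\n'; }
    void byRef(const Shape& s) { std::cout << s.name() << '\n'; }

    int main() {
        Circle c;
        byValue(c);  // prints "Shape": the Circle part was sliced away
        byRef(c);    // prints "Circle": dynamic dispatch still works
    }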

Your friend's problem is not his idea as much as his religion. Given any function, always consider the pros and cons of passing by value, reference, const reference, pointer or smart pointer. Then decide.
The only sign of broken design I see here is your friend's blind religion.
That said, there are a few signatures that don't bring much to the table. Taking a const object by value can be silly: if you promise not to change the object, you might as well not make your own copy of it. (Unless it's a primitive, of course, in which case the compiler can still be smart enough to effectively take a reference.) Sometimes it's also clumsy to take a pointer to a pointer as an argument; that adds complexity, and you can often get the same effect by taking a reference to a pointer instead.
But don't take these guidelines as set in stone; always consider your options because there is no formal proof that eliminates any alternative's usefulness.
If you need to change the argument for your own needs, but don't want to affect the client, then take the argument by value.
If you want to provide a service to the client, and the client is not closely related to the service, then consider taking an argument by reference.
If the client is closely related to the service then consider taking no arguments but write a member function.
If you wish to write a service function for a family of clients that are closely related to the service but very distinct from each other then consider taking a reference argument, and perhaps make the function a friend of the clients that need this friendship.
If you don't need to change the client at all, consider taking a const reference. (The sketch below maps these guidelines to concrete signatures.)
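A small sketch of those signature choices; all the names here are made up for illustration:

    #include <cctype>
    #include <string>

    struct Widget { std::string name; };

    // Needs its own mutable copy; the caller is unaffected: take by value.
    std::string normalized(std::string s) {
        for (char& c : s) c = static_cast<char>(std::tolower(c));
        return s;
    }

    // A service that modifies the client's object: take by reference.
    void rename(Widget& w, const std::string& newName) { w.name = newName; }

    // Read-only access: take by const reference.
    bool hasName(const Widget& w) { return !w.name.empty(); }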

There are all sorts of things that cannot be done without using references, starting with a copy constructor. References (or pointers) are fundamental, and whether he likes it or not, he is already using references. (One advantage, or maybe disadvantage, of references is that you generally do not have to alter the calling code to pass a (const) reference.) And there is no reason not to use references most of the time.
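To see why the copy constructor is the canonical example, consider this sketch: a by-value parameter would itself require a copy, recursing forever, which is why the language requires a reference here.

    struct Vec3 {
        double x, y, z;

        Vec3() : x(0), y(0), z(0) {}

        // Must take a reference: Vec3(Vec3 other) would need to copy its
        // argument, which would call the copy constructor again, forever.
        Vec3(const Vec3& other) : x(other.x), y(other.y), z(other.z) {}
    };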
And yes, passing by value is OK for smallish objects without requirements for dynamic allocation, but it is still silly to hobble oneself by saying "no references" without concrete measurements showing that the so-called overhead is (a) perceptible and (b) significant. "Premature optimization is the root of all evil"¹.
¹ Various attributions, including C. A. R. Hoare (although apparently he disclaims it).

I think there is a huge misunderstanding in the question itself.
There is no relationship between stack- or heap-allocated objects on the one hand, and passing by value, reference, or pointer on the other.
Stack vs Heap allocation
Always prefer the stack when possible; the object's lifetime is then managed for you, which is much easier to deal with.
It might not be possible in a couple of situations though:
Virtual construction (think of a Factory)
Shared Ownership (though you should always try to avoid it)
There may be others I'm missing, but in these cases you should use SBRM (Scope-Bound Resource Management) to leverage the stack's lifetime-management abilities, for example by using smart pointers.
Pass by: value, reference, pointer
First of all, there is a difference of semantics:
value, const reference: the passed object will not be modified by the method
reference: the passed object might be modified by the method
pointer/const pointer: same as reference (for the behavior), but might be null
Note that some languages (the functional kind, like Haskell) do not offer references/pointers by default. The values are immutable once created. Apart from some workarounds for dealing with the outside environment, they are not very restricted by this, and it arguably makes debugging easier.
Your friend should learn that there is absolutely nothing wrong with pass-by-reference or pass-by-pointer: think of swap, for example, which cannot be implemented with pass-by-value.
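A minimal sketch of the swap point (the function names are made up):

    // swap must take references; with by-value parameters it would only
    // exchange local copies, leaving the caller's variables untouched.
    void swap_ints(int& a, int& b) {
        int tmp = a;
        a = b;
        b = tmp;
    }

    void broken_swap(int a, int b) {  // compiles, but has no visible effect
        int tmp = a;
        a = b;
        b = tmp;
    }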
Finally, polymorphism does not allow pass-by-value semantics.
Now, let's talk about performance.
It's usually well accepted that built-ins should be passed by value (to avoid an indirection) and that big user-defined classes should be passed by reference/pointer (to avoid copying). "Big" generally means that the copy constructor is not trivial.
There is, however, an open question regarding small user-defined classes. Some articles published recently suggest that in some cases pass-by-value might allow better optimization by the compiler, for example in this case:
    // Minimal stub so the example compiles (Object is hypothetical).
    struct Object { void bar() {} };

    Object foo(Object d) { d.bar(); return d; }

    int main()
    {
        Object o;
        o = foo(o);
        return 0;
    }
Here a smart compiler is able to determine that o can be modified in place without any copying! (I think the function definition has to be visible; I don't know whether link-time optimization would figure it out.)
Therefore, there is only one answer to the performance question, as always: measure.

Reason being that he feels that "passing objects by reference is a sign of a broken design".
Although this is wrong in C++ for purely technical reasons, always using pass-by-value is a good enough approximation for beginners – it’s certainly much better than passing everything by pointers (or perhaps even than passing everything by reference). It will make some code inefficient but, hey! As long as this doesn’t bother your friend, don’t be unduly disturbed by this practice. Just remind him that someday he might want to reconsider.
On the other hand, this:
There are some big controller objects but nothing more complicated than that.
is a problem. Your friend is talking about broken design, and yet all the code uses is a few 3D vectors and some large controller objects? That is a broken design. Good code achieves modularity through the use of data structures, and it doesn't seem as though that were the case here.
… And once you use such data structures, code without pass-by-reference may indeed become quite inefficient.

First things first: the stack rarely overflows outside this website, except in cases of deep recursion.
About his reasoning, I think he might be wrong because he overgeneralizes, but what he has done might still be correct... or not?
For example, the Windows Forms library uses a Rectangle struct that has 4 members, and Apple's QuartzCore has a CGRect struct; those structs are always passed by value. I think we can compare them to a Vector with 3 floating-point members.
However, as I have not seen the code, I feel I should not judge what he has done, though I have a feeling he might have done the right thing despite his overgeneralized idea.

I would argue that pass by value is fine for small objects such as vectors but even then there is a lot of unnecessary copying occurring in the code. Passing large objects by value is obviously wasteful and most likely not what you want functionally.
It's not quite as obvious as you might think. C++ compilers perform copy elision very aggressively, so you can often pass by value without incurring the cost of a copy operation. And in some cases, passing by value might even be faster.
Before condemning the issue for performance reasons, you should at the very least produce the benchmarks to back it up. And they might be hard to create because the compiler typically eliminates the performance difference.
So the real issue should be one of semantics. How do you want your code to behave? Sometimes, reference semantics are what you want, and then you should pass by reference. If you specifically want/need value semantics then you pass by value.
There is one point in favor of passing by value. It's helpful in achieving a more functional style of code, with fewer side effects and where immutability is the default. That makes a lot of code easier to reason about, and it may make it easier to parallelize the code as well.
But in truth, both have their place. And never using pass-by-reference is definitely a big warning sign.
For the last 6 months or so, I've been experimenting with making pass-by-value the default. If I don't explicitly need reference semantics, then I try to assume that the compiler will perform copy elision for me, so I can pass by value without losing any efficiency.
So far, the compiler hasn't really let me down. I'm sure I'll run into cases where I have to go back and change some calls to passing by reference, but I'll do that when I know that
performance is a problem, and
the compiler failed to apply copy elision

I would say that not using pointers in C is a sign of a newbie programmer.
It sounds like your friend is scared of pointers.
Remember, C++ pointers were inherited from the C language, and C was developed when computers were much less powerful. Nevertheless, speed and efficiency remain vital to this day.
So, why use pointers? They allow the developer to optimize a program to run faster or use less memory than it would otherwise. Referring to the memory location of data is much more efficient than copying all the data around.
Pointers are usually a difficult concept to grasp for beginning programmers, because all the experiments they do involve small arrays and maybe a few structs; basically, they work with a couple of megabytes (if they're lucky) when there is 1 GB of memory lying around. In this scenario, a couple of MB is nothing and usually too little to have a significant impact on the performance of your program.
So let's exaggerate that a little bit. Think of a char array with 2147483648 elements (2 GB of data) that you need to pass to a function that will write all the data to disk. Now, which technique do you think is going to be more efficient/faster?
Pass by value, which has to re-copy those 2 GB of data to another location in memory before the program can write them to disk, or
Pass by reference, which will just refer to that memory location.
What happens when you just don't have 4 GB of RAM? Will you spend money on more RAM chips just because you are afraid of using pointers?
Re-copying data in memory when you don't have to is redundant, and it's a waste of computer resources.
Anyway, be patient with your friend. If he would like to become a serious/professional programmer at some point in his life he will eventually have to take the time to really understand pointers.
Good Luck.

As already mentioned, the big difference between a reference and a pointer is that a pointer can be null. If a class requires data, a reference parameter makes that data required; adding const makes it read-only if that is what the caller wants.
The pass-by-value 'flaw' mentioned is simply not true. Passing everything by value can completely change the performance of an application. It is not so bad when primitive types (i.e. int, double, etc.) are passed by value, but when a class instance is passed by value, temporary objects are created, which requires constructors and later destructors to be called on the class and on all of its member variables. This is exacerbated when large class hierarchies are used, because parent class constructors/destructors must be called as well.
Also, just because the vector is passed by value does not mean that it only uses stack memory. The heap may be used for each element as it is created in the temporary vector that is passed to the method/function. The vector itself may also have to reallocate from the heap if it reaches its capacity.
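A sketch of that point, assuming std::vector: the by-value copy duplicates the heap-allocated element storage, not just the small vector object itself.

    #include <numeric>
    #include <vector>

    // Copies every element (heap allocation for the temporary's storage).
    long long sumByValue(std::vector<int> v) {
        return std::accumulate(v.begin(), v.end(), 0LL);
    }

    // No copy at all; a read-only view of the caller's vector.
    long long sumByConstRef(const std::vector<int>& v) {
        return std::accumulate(v.begin(), v.end(), 0LL);
    }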
If pass by value is being used so that the caller's values are not modified, then just use a const reference.

The answers that I've seen so far have all focused on performance: cases where pass-by-reference is faster than pass-by-value. You may have more success in your argument if you focus on cases that are impossible with pass-by-value.
Small tuples or vectors are a very simple kind of data structure. More complex data structures share information, and that sharing can't be represented directly with values. You either need to use references/pointers or something that simulates them, such as arrays and indices.
Lots of problems boil down to data that forms a graph or a directed graph. In both cases you have a mixture of edges and nodes that need to be stored within the data structure, and now the same data needs to be reachable from multiple places. If you avoid references, then the data first needs to be duplicated, and then every change needs to be carefully replicated in each of the other copies, as the sketch below illustrates.
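A minimal sketch (with a hypothetical Node type) of how sharing in a graph is expressed with pointers:

    #include <vector>

    struct Node {
        int value;
        std::vector<Node*> neighbors;  // non-owning links to shared nodes
    };

    int main() {
        Node hub{0, {}}, a{1, {}}, b{2, {}};
        a.neighbors.push_back(&hub);   // a and b point at the same hub
        b.neighbors.push_back(&hub);
        hub.value = 42;                // one update, visible from a and b
        return 0;
    }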
Your friend's argument boils down to saying: tackling any problem complex enough to be represented by a graph is bad design...

The only major argument I can think of is that the stack could possibly overflow, but I'm guessing that it is improbable that this will occur? Are there any other arguments against using only the stack/pass by value as opposed to pass by reference?
Well, gosh, where to start...
As you mention, "there is a lot of unnecessary copying occurring in the code". Let's say you've got a loop where you call a function on these objects. Using a pointer instead of duplicating the objects can accelerate execution by one or more orders of magnitude.
You can't pass variable-sized data structures, arrays, etc. around on the stack. You have to allocate them dynamically and pass a pointer or reference to the beginning. If your friend hasn't run into this, then yes, he's "new to C++."
As you mention, the program in question is simple and mostly uses quite small objects like graphics 3-tuples, which, if the elements are doubles, would be 24 bytes apiece. But in graphics, it's common to deal with 4x4 matrices, which handle both rotation and translation. Those would be 128 bytes apiece, so a program that had to deal with those would be roughly five times slower per function call with pass-by-value due to the increased copying. With pass-by-reference, passing either a 3-tuple or a 4x4 matrix in a 32-bit executable would just involve duplicating a single 4-byte pointer.
On register-rich CPU architectures like ARM, PowerPC, 64-bit x86, and the 680x0 (but not 32-bit x86), pointers (and references, which are secretly pointers wearing fancy syntactic clothing) are commonly passed or returned in a register, which is really freaking fast compared to the memory access involved in a stack operation.
You mention the improbability of running out of stack space. And yes, that's so on a small program one might write for a class assignment. But a couple of months ago, I was debugging commercial code that was probably 80 function calls below main(). If they'd used pass-by-value instead of pass-by-reference, the stack would have been ginormous. And lest your friend think this was a "broken design", this was actually a WebKit-based browser implemented on Linux using GTK+, all of which is very state-of-the-art, and the function call depth is normal for professional code.
Some executable architectures limit the size of an individual stack frame, so even though you might not run out of stack space per se, you could exceed that and wind up with perfectly valid C++ code that wouldn't build on such a platform.
I could go on and on.
If your friend is interested in graphics, he should take a look at some of the common APIs used in graphics: OpenGL and X Windows on Linux, Quartz on Mac OS X, DirectX on Windows. And he should look at the internals of large C/C++ systems like the WebKit or Gecko HTML rendering engines, any of the Mozilla browsers, or the GTK+ or Qt GUI toolkits. They all pass anything much larger than a single integer or float by reference, and often fill in results by reference rather than via a function return value.
Nobody with any serious real world C/C++ chops - and I mean nobody - passes data structures by value. There's a reason for this: it's just flipping inefficient and problem-prone.

Wow, there are already 13 answers… I didn't read all in detail but I think this is quite different from the others…
He has a point. The advantage of pass-by-value as a rule is that subroutines cannot subtly modify their arguments. Passing non-const references everywhere would suggest that every function has ugly side effects, which would indeed be a sign of poor design.
Simply explain to him the difference between vector3& and vector3 const&, and demonstrate how the latter can be initialized from a temporary, as in vec_function( vector3(1,2,3) );, while the former cannot. Pass by const reference is a simple optimization of pass by value.
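A sketch of that demonstration (vector3 here is a stand-in for the question's type):

    struct vector3 {
        float x, y, z;
        vector3(float x, float y, float z) : x(x), y(y), z(z) {}
    };

    void vec_function(vector3 const& v) { (void)v; /* read-only use */ }
    void vec_mutate(vector3& v)         { v.x = 0; }

    int main() {
        vec_function(vector3(1, 2, 3));   // OK: const& binds to a temporary
        // vec_mutate(vector3(1, 2, 3));  // error: non-const& cannot
        return 0;
    }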

Buy your friend a good C++ book. Passing non-trivial objects by reference is good practice and saves a lot of unnecessary constructor/destructor calls. This also has nothing to do with allocating on the free store vs. using the stack: you can (and should) pass objects allocated on the program stack by reference without any free-store usage. You could also ignore the free store completely, but that throws you back to the old Fortran days, which your friend probably didn't have in mind; otherwise he would have picked an ancient F77 compiler for your project, wouldn't he...?

Related

C++ std features and Binary size

I was told recently in a job interview that their project focuses on building the smallest possible binary for their application (it runs embedded), so I would not be able to use things such as templates or smart pointers, as these would increase the binary size. They generally seemed to imply that using things from std would be a no-go (in most cases, not all).
After the interview, I tried to research online what coding practices and which features from the standard library cause large binary sizes, and I could find basically nothing about it. Is there a way to quantify the size impact of using certain features (without needing to code 100 smart pointers in a codebase vs. self-managed, for example)?
This question probably deserves more attention than it’s likely to get, especially for people trying to pursue a career in embedded systems. So far the discussion has gone about the way that I would expect, specifically a lot of conversation about the nuances of exactly how and when a project built with C++ might be more bloated than one written in plain C or a restricted C++ subset.
This is also why you can’t find a definitive answer from a good old fashioned google search. Because if you just ask the question “is C++ more bloated than X?”, the answer is always going to be “it depends.”
So let me approach this from a slightly different angle. I’ve both worked for, and interviewed at companies that enforced these kinds of restrictions, I’ve even voluntarily enforced them myself. It really comes down to this. When you’re running an engineering organization with more than one person with plans to keep hiring, it is wildly impractical to assume everyone on your team is going to fully understand the implications of using every feature of a language. Coding standards and language restrictions serve as a cheap way to prevent people from doing “bad things” without knowing they’re doing “bad things”.
How you define a “bad thing” is then also context specific. On a desktop platform, using lots of code space isn’t really a “bad” enough thing to rigorously enforce. On a tiny embedded system, it probably is.
C++ by design makes it very easy for an engineer to generate lots of code without having to type it out explicitly. I think that statement is pretty self-evident, it’s the whole point of meta-programming, and I doubt anyone would challenge it, in fact it’s one of the strengths of the language.
So then coming back to the organizational challenges, if your primary optimization variable is code space, you probably don’t want to allow people to use features that make it trivial to generate code that isn’t obvious. Some people will use that feature responsibly and some people won’t, but you have to standardize around the least common denominator. A C compiler is very simple. Yes you can write bloated code with it, but if you do, it will probably be pretty obvious from looking at it.
(Partially extracted from comments I wrote earlier)
I don't think there is a comprehensive answer. A lot also depends on the specific use case and needs to be judged on a case-by-case basis.
Templates
Templates may result in code bloat, yes, but they can also avoid it. If your alternative is introducing indirection through function pointers or virtual methods, the non-templated version may actually end up bigger in code size, simply because function calls take several instructions and remove optimization potential.
Another aspect where they can at least do no harm is when used in conjunction with type erasure. The idea here is to write generic code, then put a small template wrapper around it that only provides type safety but does not actually emit any new code. Qt's QList is an example that does this to some extent.
This bare-bones vector type shows what I mean:
    #include <cstddef>

    class VectorBase
    {
    protected:
        // Type-erased storage: an array of void* element slots.
        void **start, **end, **capacity;

        void push_back(void*);
        void* at(std::size_t i);
        void clear(void (*cleanup_function)(void*));
    };

    template<class T>
    class Vector : public VectorBase
    {
    public:
        void push_back(T* value)
        { this->VectorBase::push_back(value); }

        T* at(std::size_t i)
        { return static_cast<T*>(this->VectorBase::at(i)); }

        ~Vector()
        { clear(+[](void* object) { delete static_cast<T*>(object); }); }
    };
By carefully moving as much code as possible into the non-templated base, the template itself can focus on type safety and on providing the necessary indirections, without emitting any code that wouldn't have been there anyway.
(Note: This is just meant as a demonstration of type erasure, not an actually good vector type)
Smart pointers
When written carefully, they won't generate much code that wouldn't be there anyway. Whether an inline function generates a delete statement or the programmer does it manually doesn't really matter.
The main issue I see with them is that the programmer is better at reasoning about code and eliminating dead code. For example, even after a unique_ptr has been moved away from, the destructor of the pointer still has to emit code. A programmer knows that the value is null; the compiler often doesn't.
Another issue comes up with calling conventions. Objects with destructors are usually passed on the stack, even if you declare them pass-by-value. The same goes for return values. So a function unique_ptr<foo> bar(unique_ptr<foo> baz) will have higher overhead than foo* bar(foo* baz), simply because the pointers have to be put on and taken off the stack.
Even more egregiously, the calling convention used on Linux, for example, makes the caller clean up parameters instead of the callee. That means that if a function accepts a complex object like a smart pointer by value, a call to the destructor for that parameter is replicated at every call site, instead of appearing once inside the function. This is especially unfortunate with unique_ptr, because the function itself may know that the object has been moved away from and the destructor is superfluous, but the caller doesn't know this (unless you have LTO).
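A sketch of the two signatures being compared (Foo is hypothetical; the codegen claims assume the Itanium C++ ABI used on Linux):

    #include <memory>

    struct Foo { int x; };

    // By-value smart pointer: the parameter lives on the stack, and under
    // the Itanium ABI its destructor call is emitted at every call site.
    std::unique_ptr<Foo> through_smart(std::unique_ptr<Foo> p) {
        return p;
    }

    // Raw pointer: passed and returned in registers, no destructor calls;
    // ownership bookkeeping is left entirely to the programmer.
    Foo* through_raw(Foo* p) {
        return p;
    }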
Shared pointers are a different beast altogether, simply because they allow a lot of different tradeoffs. Should they be atomic? Should they allow type casting, weak pointers? What indirection is used for destruction? Do you really need two raw pointers per shared pointer, or can the reference counter be accessed through the shared object?
Exceptions, RTTI
Generally avoided and removed via compiler flags.
Library components
On a bare-metal system, pulling in parts of the standard library can have a significant effect that can only be measured after the linker step. I suggest that any such project use continuous integration and track code size as a metric.
For example, I once added a small feature (I don't remember which), and its error handling used std::stringstream. That pulled in the entire iostream library, and the resulting code exceeded my entire RAM and ROM capacity. IIRC the issue was that even though exception handling was deactivated, the exception message was still being set up.
Move constructors and destructors
It's a shame that C++'s move semantics aren't like Rust's, for example, where objects can be moved with a simple memcpy and their original location is then simply "forgotten". In C++ the destructor of a moved-from object is still invoked, which requires more code in the move constructor / move assignment operator, and in the destructor.
Qt for example accounts for such simple cases in its meta type system.
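A minimal sketch of the extra work a C++ move must do (Buffer is a hypothetical type):

    #include <cstddef>

    // A C++ move must leave the source in a destructible state, so the move
    // constructor does extra work and the destructor still runs on the
    // moved-from object; a Rust-style move would be a memcpy plus "forget".
    struct Buffer {
        char* data = nullptr;
        std::size_t size = 0;

        Buffer() = default;
        explicit Buffer(std::size_t n) : data(new char[n]), size(n) {}

        Buffer(Buffer&& other) noexcept : data(other.data), size(other.size) {
            other.data = nullptr;  // required so ~Buffer() on the source is safe
            other.size = 0;
        }

        ~Buffer() { delete[] data; }  // also invoked for moved-from objects
    };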

Why is a function not an object?

I read in the standard's draft N4296, § 1.8, page 7:
An object is a region of storage. [ Note: A function is not an object, regardless of whether or not it occupies storage in the way that objects do. —end note ]
I spent several days searching the net for a good reason for this exclusion, with no luck; maybe because I do not fully understand objects. So:
Why is a function not an object? How does it differ?
And does this have any relation to functors (function objects)?
A lot of the difference comes down to pointers and addressing. In C++¹ pointers to functions and pointers to objects are strictly separate kinds of things.
C++ requires that you can convert a pointer to any object type into a pointer to void, then convert it back to the original type, and the result will be equal to the pointer you started with². In other words, regardless of exactly how they do it, the implementation has to ensure that a conversion from pointer-to-object-type to pointer-to-void is lossless, so no matter what the original was, whatever information it contained can be recreated so you can get back the same pointer as you started with by conversion from T* to void * and back to T*.
That's not true with a pointer to a function though--if you take a pointer to a function, convert it to void *, and then convert it back to a pointer to a function, you may lose some information in the process. You might not get back the original pointer, and dereferencing what you do get back gives you undefined behavior (in short, don't do that).
For what it's worth, you can, however, convert a pointer to one function to a pointer to a different type of function, then convert that result back to the original type, and you're guaranteed that the result is the same as you started with.
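A sketch of the two guarantees just described:

    #include <cassert>

    int global = 7;
    int add_one(int x) { return x + 1; }

    int main() {
        // Object pointer: the void* round trip is guaranteed lossless.
        int* p = &global;
        void* v = p;
        assert(static_cast<int*>(v) == p);

        // Function pointer: a round trip through a *different* function
        // pointer type is guaranteed; a round trip through void* is not
        // (it is only conditionally-supported).
        using F = int (*)(int);
        using G = void (*)();
        F f = add_one;
        G g = reinterpret_cast<G>(f);
        assert(reinterpret_cast<F>(g) == f);
        return 0;
    }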
Although it's not particularly relevant to the discussion at hand, there are a few other differences that may be worth noting. For example, you can copy most objects--but you can't copy any functions.
As far as relationship to function objects goes: well, there really isn't much of one beyond one point: a function object supports syntax that looks like a function call--but it's still an object, not a function. So, a pointer to a function object is still a pointer to an object. If, for example, you convert one to void *, then convert it back to the original type, you're still guaranteed that you get back the original pointer value (which wouldn't be true with a pointer to a function).
As to why pointers to functions are (at least potentially) different from pointers to objects: part of it comes down to existing systems. For example, on MS-DOS (among others) there were four entirely separate memory models: small, medium, compact, and large. Small model used 16-bit addressing for both functions and data. Medium used 16-bit addresses for data and 20-bit addresses for code. Compact reversed that (16-bit addresses for code, 20-bit addresses for data). Large used 20-bit addresses for both code and data. So, in either the compact or the medium model, converting between pointers to code and pointers to data really could and did lead to problems.
More recently, a fair number of DSPs have used entirely separate memory buses for code and for data, and (as with the MS-DOS memory models) they were often different widths, so converting between the two could and did lose information.
¹ These particular rules came to C++ from C, so the same is true in C, for whatever that's worth.
² Although it's not directly required, with the way things work, pretty much the same holds for a conversion from the original type to a pointer to char and back, for whatever that's worth.
Why is a function not an object? How does it differ?
To understand this, let's move from bottom to top in terms of the abstractions involved. You have your address space, through which you can define the state of memory, and we have to remember that fundamentally it's all about this state you operate on.
Okay, let's move a bit higher in terms of abstractions. I am not talking about any abstractions imposed by a programming language yet (like objects, arrays, etc.); simply as a layman, I want to keep a record of a portion of the memory. Let's call it Ab1, and another one Ab2.
Both fundamentally have state, but I intend to manipulate/make use of their state differently.
Differently... Why and how?
Why?
Because of my requirements (to add two numbers and store the result back, for example). I will use Ab1 as long-lived state and Ab2 as relatively short-lived state. So I will create state in Ab1 (the two numbers to add), use it to populate some of the state of Ab2 (copy the numbers temporarily), perform further manipulation of Ab2 (add them), and save a portion of the resulting Ab2 back to Ab1 (the added result). After that, Ab2 becomes useless and we reset its state.
How?
I am going to need some management of both portions, to keep track of which words to pick from Ab1 and copy to Ab2, and so on. At this point I realize that I can make it work for some simple operations, but anything serious will require a laid-out specification for managing this memory.
So I look for such a management specification, and it turns out there exists a variety of them (some with a built-in memory model, others providing the flexibility to manage the memory yourself) with better designs. In fact, without even dictating how to manage the memory directly, they have successfully defined the encapsulation for this long-lived storage, and the rules for how and when it can be created and destroyed.
The same goes for Ab2, but the way they present it makes me feel like it is much different from Ab1. And indeed it turns out to be: they use a stack for the state manipulation of Ab2 and reserve memory from the heap for Ab1. Ab2 dies after a while (after it finishes executing).
Also, the way you define what to do with Ab2 is expressed through yet another storage portion called Ab2_Code, and the specification for Ab1 similarly involves Ab1_Code.
I would say, this is fantastic! I get so much convenience that allows me to solve so many problems.
Now, I am still looking at it from a layman's perspective, so having gone through the thought process of it all, I don't feel really surprised. But if you question things top-down, it can get a bit difficult to put everything into perspective. (I suspect that's what happened in your case.)
By the way, I forgot to mention that Ab1 is officially called an object and Ab2 a function stack frame, while Ab1_Code is the class definition and Ab2_Code is the function definition's code.
And it is because of these differences, imposed by the programming language, that you find them so different (your question).
Note: don't take my representation of Ab1/objects as long-lived storage as a rule or a concrete thing; it was a layman's perspective. Programming languages provide much more flexibility in managing the lifecycle of an object. So an object may be used like Ab1, but it can be much more.
And does this have any relation to functors (function objects)?
Note that the first part of this answer is valid for many programming languages in general (including C++); this part is specifically about C++ (whose spec you quoted). Just as you can have a pointer to a function, you can have a pointer to an object too; it's just another programming construct that C++ defines. Notice that this is about having a pointer to Ab1 or Ab2 in order to manipulate them, rather than having another distinct abstraction to act upon.
You can read about its definition, usage here:
C++ Functors - and their uses
Let me answer the question in simpler language (terms).
What does a function contain?
It basically contains instructions to do something. While executing the instructions, the function can temporarily store and/or use some data, and it might return some data.
Although the instructions are stored somewhere, those instructions themselves are not considered objects.
Then, what are the objects?
Generally, objects are entities which contain data, which gets manipulated/changed/updated by functions (the instructions).
Why the difference?
Because computers are designed in such a way that the instructions do not depend on the data.
To understand this, let's think about a calculator. We do different mathematical operations using a calculator. Say, if we want to add some numbers, we provide the numbers to the calculator. No matter what the numbers are, the calculator will add them in the same way following the same instructions (if the result exceeds the calculator's capacity to store, it will show an error - but that is because of calculator's limitation to store the result (the data), not because of its instructions for addition).
Computers are designed in a similar manner. That is why when you use a library function (for example qsort()) on some data which is compatible with the function, you get the result you expect; the functionality of the function doesn't change when the data changes, because the instructions of the function remain unchanged.
Relation between functions and functors
Functions are sets of instructions, and while they are being executed, some temporary objects may be created to hold data. A functor, though, is something more specific: a function object, i.e. an object of a class that overloads operator(), so it can be called like a function while also carrying state.
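For the record, here is a minimal sketch of what a functor looks like in C++ (the Adder name is made up):

    #include <iostream>

    // A functor is an object of a class that overloads operator(), so it
    // can be called like a function while also carrying state.
    struct Adder {
        int amount;  // state stored in the object
        int operator()(int x) const { return x + amount; }
    };

    int main() {
        Adder addFive{5};
        std::cout << addFive(10) << '\n';  // prints 15
        return 0;
    }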

Pointers in C++ : How large should an object be to need use of a pointer?

Oftentimes I read literature explaining that one of the use cases of C++ pointers is dealing with big objects, but how large should an object be for a pointer to be needed when it is manipulated? Is there any guiding principle in this regard?
I don't think size is the main factor to consider.
Pointers (or references) are a way to designate a single bunch of data (be it an object, a function or a collection of untyped bytes) from different locations.
If you make copies instead of using pointers, you run the risk of having two separate versions of the same data becoming inconsistent with each other. If the two copies are meant to represent a single piece of information, then you will have to do twice the work to make sure they stay consistent.
So in some cases using a pointer to reference even a single byte could be the right thing to do, even though storing copies of the said byte would be more efficient in terms of memory usage.
EDIT: to answer jogojapan's remarks, here is my opinion on memory efficiency:
I often ran programs through profilers and discovered that an amazing percentage of the CPU power went into various forms of memory-to-memory copies.
I also noticed that the cost of optimizing memory efficiency was often offset by code complexity, for surprisingly little gains.
On the other hand, I spent many hours tracing bugs down to data inconsistencies, some of them requiring sizeable code refactoring to get rid of.
As I see it, memory efficiency should become more of a concern near the end of a project, when profiling reveals where the CPU/memory drain really occurs, while code robustness (especially data flows and data consistency) should be the main factor to consider in the early stages of conception and coding.
Only the bulkiest data types should be dimensioned at the start, if the application is expected to handle considerable amounts of data. On a modern PC, we are talking about hundreds of megabytes, which most applications will never need.
When I designed embedded software 10 or 20 years ago, memory usage was a constant concern. But in environments like a desktop PC, where memory requirements are most of the time negligible compared to the amount of available RAM, focusing on a reliable design seems more of a priority to me.
You should use a pointer when you want to refer to the same object from different places. In fact you can even use references for that, but pointers give you the added advantage of being able to refer to different objects over time, while a reference keeps referring to the same object.
On second thought, maybe you are referring to objects created on the free store using new etc. and then referred to through pointers. There is no definitive rule for that, but in general you can do so when:
The object being created is too large to be accommodated on the stack, or
You want to extend the lifetime of the object beyond the enclosing scope (see the sketch below).
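A sketch of the second case (Mesh and makeMesh are hypothetical names):

    #include <memory>

    struct Mesh { /* large graphics resource */ };

    // Free-store allocation lets the object outlive the scope that created
    // it; the unique_ptr carries ownership out to the caller.
    std::unique_ptr<Mesh> makeMesh() {
        auto m = std::make_unique<Mesh>();
        return m;  // a stack-allocated Mesh would be destroyed here
    }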
There is no such limitation or guideline; you will have to decide for yourself.
Assume the class definition below. Its size is 100 ints, i.e. 400 bytes with 4-byte ints.
    class test
    {
    private:
        int m_nVar[100];
    };
When you use the following function declaration (pass by value), the copy constructor will be called (even if you don't provide one), so a copy of 100 ints will be made, which will obviously take some time to finish:
void passing_to_function(test a);
When you change the parameter to a reference or pointer, no such copying happens; just a test* is transferred (only the size of a pointer):
void passing_to_function(test& a);
So you clearly have an advantage when passing by reference or by pointer rather than by value!

How to return smart pointers (shared_ptr), by reference or by value?

Let's say I have a class with a method that returns a shared_ptr.
What are the possible benefits and drawbacks of returning it by reference or by value?
Two possible clues:
Early object destruction. If I return the shared_ptr by (const) reference, the reference counter is not incremented, so I incur the risk of having the object deleted when it goes out of scope in another context (e.g. another thread). Is this correct? And if the environment is single-threaded, can this situation happen as well?
Cost. Pass-by-value is certainly not free. Is it worth avoiding it whenever possible?
Thanks everybody.
Return smart pointers by value.
As you've said, if you return it by reference, you won't properly increment the reference count, which opens up the risk of deleting something at the improper time. That alone should be enough reason to not return by reference. Interfaces should be robust.
The cost concern is nowadays moot thanks to return value optimization (RVO), so you won't incur an increment-increment-decrement sequence or anything like that in modern compilers. The best way to return a shared_ptr is therefore simply to return it by value:
    shared_ptr<T> Foo()
    {
        return shared_ptr<T>(/* acquire something */);
    }
This is a dead-obvious RVO opportunity for modern C++ compilers. I know for a fact that Visual C++ compilers implement RVO even when all optimizations are turned off. And with C++11's move semantics, this concern is even less relevant. (But the only way to be sure is to profile and experiment.)
If you're still not convinced, Dave Abrahams has an article that makes an argument for returning by value. I reproduce a snippet here; I highly recommend that you go read the entire article:
Be honest: how does the following code make you feel?
std::vector<std::string> get_names();
...
std::vector<std::string> const names = get_names();
Frankly, even though I should know better, it makes me nervous. In principle, when get_names() returns, we have to copy a vector of strings. Then, we need to copy it again when we initialize names, and we need to destroy the first copy. If there are N strings in the vector, each copy could require as many as N+1 memory allocations and a whole slew of cache-unfriendly data accesses as the string contents are copied.
Rather than confront that sort of anxiety, I've often fallen back on pass-by-reference to avoid needless copies:
get_names(std::vector<std::string>& out_param );
...
std::vector<std::string> names;
get_names( names );
Unfortunately, this approach is far from ideal.
The code grew by 150%
We’ve had to drop const-ness because we’re mutating names.
As functional programmers like to remind us, mutation makes code more complex to reason about by undermining referential transparency and equational reasoning.
We no longer have strict value semantics for names.
But is it really necessary to mess up our code in this way to gain efficiency? Fortunately, the answer turns out to be no (and especially not if you are using C++0x).
Regarding any smart pointer (not just shared_ptr), I don't think it's ever acceptable to return a reference to one, and I would be very hesitant to pass them around by reference or raw pointer. Why? Because you cannot be certain that it will not be shallow-copied via a reference later. Your first point defines the reason why this should be a concern. This can happen even in a single-threaded environment; you don't need concurrent access to data to put bad copy semantics into your programs. You don't really control what your users do with the pointer once you pass it off, so don't encourage misuse by giving your API users enough rope to hang themselves.
Secondly, look at your smart pointer's implementation, if possible. Construction and destruction should be darn close to negligible. If this overhead isn't acceptable, then don't use a smart pointer! But beyond this, you will also need to examine the concurrency architecture that you've got, because mutually exclusive access to the mechanism that tracks the uses of the pointer is going to slow you down more than mere construction of the shared_ptr object.
Edit, 3 years later: with the advent of more modern features in C++, I would tweak my answer to be more accepting of cases where you've simply written a lambda that never lives outside the calling function's scope and isn't copied somewhere else. Here, if you want to save the very minimal overhead of copying a shared pointer, it is fair and safe. Why? Because you can guarantee that the reference will never be misused.
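A sketch of that exception (hypothetical names; the key is that the lambda never escapes the calling scope):

    #include <memory>

    // Taking the pointer by const& skips reference-count churn; it is safe
    // here only because the callee never stores or copies the pointer.
    void process(const std::shared_ptr<int>& p) {
        if (p) { /* use *p locally, never retain it */ }
    }

    int main() {
        auto sp = std::make_shared<int>(42);
        auto useIt = [&sp] { process(sp); };  // sp outlives every call
        useIt();
        return 0;
    }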

Does anyone use nothing but reference variables to increase efficiency and decrease size?

If some of you haven't noticed already, I'm a noob. With that said, here is my question:
Do any of you experienced programmers use reference variables to decrease the memory required by your programs? I was thinking that, while it is probably a dangerous practice, you could use reference variables in mobile applications to make them use less memory and run faster.
I know that in C++, when you pass a variable as an argument to a function, it creates a copy of that variable, but you can use & to make it a reference variable, which just points to the variable's memory location. Wouldn't that make your program use less memory overall and make things faster?
For big things like structs and objects, a reference uses less memory. However, most people already pass these by reference anyway, so it doesn't matter for our discussion.
Smaller things like ints and chars are the same size as or smaller than a reference. There is no memory gain from passing them by reference...
... but there is a performance penalty, since references need to be dereferenced in order to manipulate the value.
Finally, pass-by-reference is more prone to bugs than pass-by-value. Programs should be built for correctness first and performance second.
This depends on (a) the underlying architecture, (b) the framework and (c) the language you use, but the general answer is no: this is not the best (or even a common) optimization practice, and yes: programs may run slower (much slower!) using your approach.
In C, variables are passed by value by default, and in most cases there is no advantage in passing by reference, as you are still passing a value (it's just the pointer rather than the value it refers to). Keep in mind that a pointer to a byte will be bigger than the byte itself!
Arrays are different: in C an array name decays to a pointer, so arrays are effectively passed by reference. Structs, on the other hand, are still passed by value, and it is inefficient to create a copy of a struct to pass into a function unless the function needs its own copy of the structure for some reason.
If you want to pass by reference but are worried about the function changing your struct, you can use the const keyword to ensure it cannot be changed (or at least not easily).
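A sketch of that last point (Config is a hypothetical struct):

    struct Config { int width, height; };

    // C++ style: pass by const reference; no copy is made, and modifying
    // the caller's struct is rejected at compile time.
    void render(const Config& cfg) {
        // cfg.width = 0;   // error: cfg is const
    }

    // C style: pass a pointer to const; same protection, same efficiency.
    void render_c(const Config* cfg) {
        // cfg->width = 0;  // error: the pointee is const
    }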