I read in the draft standard n4296, § 1.8, page 7:
An object is a region of storage. [ Note: A function is not an object,
regardless of whether or not it occupies storage in the way that
objects do. —end note ]
I spent some days on the net looking for a good reason for this exclusion, with no luck. Maybe that is because I do not fully understand objects. So:
Why is a function not an object? How does it differ?
And does this have any relation with the functors (function objects)?
A lot of the difference comes down to pointers and addressing. In C++¹ pointers to functions and pointers to objects are strictly separate kinds of things.
C++ requires that you can convert a pointer to any object type into a pointer to void, then convert it back to the original type, and the result will be equal to the pointer you started with². In other words, however the implementation does it, a conversion from pointer-to-object-type to pointer-to-void must be lossless: whatever information the original pointer contained can be recovered, so converting from T* to void * and back to T* gives you the same pointer you started with.
That's not true with a pointer to a function though--if you take a pointer to a function, convert it to void *, and then convert it back to a pointer to a function, you may lose some information in the process. You might not get back the original pointer, and dereferencing what you do get back gives you undefined behavior (in short, don't do that).
For what it's worth, you can, however, convert a pointer to one function to a pointer to a different type of function, then convert that result back to the original type, and you're guaranteed that the result is the same as you started with.
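To make the two guarantees above concrete, here is a minimal sketch. The helper names (`twice`, `object_pointer_round_trips`, `function_pointer_round_trips`) are hypothetical. It shows only the portable round trips; the function-pointer-to-`void *` conversion is deliberately left out because it is only conditionally supported.

```cpp
#include <cassert>

// Hypothetical function used only to have something to point at.
int twice(int x) { return 2 * x; }

using IntFn = int (*)(int);  // the function's real pointer type
using VoidFn = void (*)();   // an unrelated function pointer type

// Object pointer -> void* -> original type: the result must compare equal.
bool object_pointer_round_trips() {
    double value = 3.5;
    double* p = &value;
    void* vp = p;                             // implicit conversion, always allowed
    double* back = static_cast<double*>(vp);  // recovers the original pointer
    return back == p && *back == 3.5;
}

// Function pointer -> another function pointer type -> original type:
// the result is guaranteed to equal the original pointer.
bool function_pointer_round_trips() {
    IntFn f = &twice;
    VoidFn g = reinterpret_cast<VoidFn>(f);   // never call through g
    IntFn f2 = reinterpret_cast<IntFn>(g);
    return f2 == f && f2(21) == 42;
}

// Deliberately absent: converting f to void*. That conversion is only
// conditionally supported, so portable code cannot rely on it.
```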
Although it's not particularly relevant to the discussion at hand, there are a few other differences that may be worth noting. For example, you can copy most objects--but you can't copy any functions.
As far as relationship to function objects goes: well, there really isn't much of one beyond one point: a function object supports syntax that looks like a function call--but it's still an object, not a function. So, a pointer to a function object is still a pointer to an object. If, for example, you convert one to void *, then convert it back to the original type, you're still guaranteed that you get back the original pointer value (which wouldn't be true with a pointer to a function).
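The point about function objects can be sketched like this. `Adder`, `functor_pointer_round_trips` and `call_copy` are hypothetical names. The functor is invoked with call syntax, but its address is an ordinary object pointer, and the functor itself can be copied.

```cpp
#include <cassert>

// Hypothetical function object: a class type that overloads operator().
struct Adder {
    int offset;
    int operator()(int x) const { return x + offset; }
};

// The functor is called like a function but is an object, so a pointer to it
// is an object pointer and survives a round trip through void*.
bool functor_pointer_round_trips() {
    Adder add5{5};
    void* vp = &add5;
    Adder* back = static_cast<Adder*>(vp);
    return back == &add5 && (*back)(10) == 15;
}

// Also unlike a function, a functor can be copied, e.g. passed by value.
int call_copy(Adder a, int x) { return a(x); }
```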
As to why pointers to functions are (at least potentially) different from pointers to objects: part of it comes down to existing systems. For example, on MS-DOS (among others) there were four entirely separate memory models: small, medium, compact, and large. Small model used 16-bit addressing for both functions and data. Medium used 16-bit addresses for data and 20-bit addresses for code. Compact reversed that (16-bit addresses for code, 20-bit addresses for data). Large used 20-bit addresses for both code and data. So, in either compact or medium model, converting between pointers to code and pointers to data really could and did lead to problems.
More recently, a fair number of DSPs have used entirely separate memory buses for code and for data, and (as with the MS-DOS memory models) the buses were often of different widths, so converting between the two kinds of pointer could and did lose information.
¹ These particular rules came to C++ from C, so the same is true in C, for whatever that's worth.
² Although it's not directly required, given the way things work, pretty much the same turns out to be true for a conversion from the original type to a pointer to char and back, for whatever that's worth.
Why is a function not an object? How does it differ?
To understand this, let's move from the bottom to the top in terms of the abstractions involved. You have your address space, through which you define the state of memory, and we have to remember that fundamentally it's all about this state that you operate on.
Okay, let's move a bit higher in terms of abstraction. I am not talking about any abstractions imposed by a programming language yet (like objects, arrays, etc.); simply as a layman, I want to keep track of one portion of memory, let's call it Ab1, and another one called Ab2.
Both have a state fundamentally but I intend to manipulate/make use of the state differently.
Differently...Why and How?
Why ?
Because of my requirements (for example, to add 2 numbers and store the result back). I will be using Ab1 as long-lived state and Ab2 as relatively short-lived state. So, I will create a state for Ab1 (with the 2 numbers to add), then use this state to populate some of Ab2's state (copy them temporarily), perform further manipulation on Ab2 (add them), and save a portion of the resulting Ab2 back to Ab1 (the sum). After that, Ab2 becomes useless and we reset its state.
How?
I am going to need some management of both portions to keep track of which words to pick from Ab1 and copy to Ab2, and so on. At this point I realize that I can make this work for some simple operations, but anything serious will require a laid-out specification for managing this memory.
So I look for such a management specification, and it turns out that a variety of these specifications exist (some with a built-in memory model, others giving you the flexibility to manage memory yourself), with better designs. In fact, without even dictating how to manage the memory directly, they successfully define an encapsulation for this long-lived storage, along with rules for how and when it can be created and destroyed.
The same goes for Ab2, but the way they present it makes me feel that it is very different from Ab1. And indeed, it turns out to be. They use a stack for manipulating Ab2's state and reserve memory from the heap for Ab1. Ab2 dies after a while (once it has finished executing).
Also, what to do with Ab2 is defined through yet another portion of storage called Ab2_Code, and similarly the specification for Ab1 involves Ab1_Code.
I would say, this is fantastic! I get so much convenience that allows me to solve so many problems.
Now, I am still looking at this from a layman's perspective, so having gone through the thought process I am not really surprised. But if you question things top-down, they can be a bit difficult to put into perspective (I suspect that's what happened in your case).
By the way, I forgot to mention that Ab1 is officially called an object and Ab2 a function's stack frame, while Ab1_Code is the class definition and Ab2_Code is the function definition.
And it is because of these differences imposed by the programming language that you find them so different (your question).
Note: Don't take my representation of Ab1 (the object) as long-lived storage as a rule or a concrete thing; it was a layman's perspective. The programming language provides much more flexibility in managing the lifecycle of an object, so an object may be used like Ab1, but it can be much more.
And does this have any relation with the functors (function objects)?
Note that the first part of this answer is valid for many programming languages in general (including C++), while this part is specific to C++ (whose spec you quoted). Just as you can have a pointer to a function, you can have a pointer to an object; a function object is just another programming construct that C++ defines. Notice that this is about having a pointer to Ab1 or Ab2 to manipulate them, rather than having another distinct abstraction to act upon.
You can read about its definition, usage here:
C++ Functors - and their uses
Let me answer the question in simpler terms.
What does a function contain?
It basically contains instructions to do something. While executing the instructions, the function can temporarily store and / or use some data - and might return some data.
Although the instructions are stored somewhere, those instructions themselves are not considered objects.
Then, what are the objects?
Generally, objects are entities which contain data - which get manipulated / changed / updated by functions (the instructions).
Why the difference?
Because computers are designed in such a way that the instructions do not depend on the data.
To understand this, let's think about a calculator. We do different mathematical operations using a calculator. Say, if we want to add some numbers, we provide the numbers to the calculator. No matter what the numbers are, the calculator will add them in the same way following the same instructions (if the result exceeds the calculator's capacity to store, it will show an error - but that is because of calculator's limitation to store the result (the data), not because of its instructions for addition).
Computers are designed in a similar manner. That is why, when you use a library function (for example qsort()) on data that is compatible with the function, you get the result you expect: the behavior of the function doesn't change when the data changes, because the function's instructions remain unchanged.
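The qsort() point can be seen with a small sketch: one comparison function and one call sequence, applied unchanged to whatever int data is passed in. `compare_ints` and `sort_ints` are hypothetical helper names.

```cpp
#include <cassert>
#include <cstdlib>

// Comparison callback for std::qsort: negative, zero or positive.
int compare_ints(const void* a, const void* b) {
    int lhs = *static_cast<const int*>(a);
    int rhs = *static_cast<const int*>(b);
    return (lhs > rhs) - (lhs < rhs);  // avoids the overflow risk of lhs - rhs
}

// The same instructions (std::qsort + compare_ints) run on every call;
// only the data handed in differs.
void sort_ints(int* data, std::size_t count) {
    std::qsort(data, count, sizeof(int), compare_ints);
}
```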
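Relation between functions and functors
Functions are sets of instructions, and while they are being executed, some temporary data may need to be stored. In other words, some objects might be created temporarily while the function executes. These temporary objects are not functors, though: a functor (function object) is an object of a class type that overloads operator(), so it can be called with function-call syntax while still being an object.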
I often read in the literature that one of the use cases of C++ pointers is dealing with big objects, but how large should an object be to need a pointer when it is manipulated? Is there any guiding principle in this regard?
I don't think size is the main factor to consider.
Pointers (or references) are a way to designate a single bunch of data (be it an object, a function or a collection of untyped bytes) from different locations.
If you do copies instead of using pointers, you run the risk of having two separate versions of the same data becoming inconsistent with each other. If the two copies are meant to represent a single piece of information, then you will have to do twice the work to make sure they stay consistent.
So in some cases using a pointer to reference even a single byte could be the right thing to do, even though storing copies of the said byte would be more efficient in terms of memory usage.
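A minimal sketch of the consistency argument: one holder keeps its own copy, the other keeps a pointer to the single original. `Settings`, `CopyHolder` and `PointerHolder` are hypothetical types. After the original changes, only the pointer-based holder sees the new value.

```cpp
#include <cassert>
#include <string>

// Hypothetical shared setting used only for illustration.
struct Settings {
    std::string user_name;
};

// Holds its own copy: later changes to the original are not seen.
struct CopyHolder {
    Settings copy;
    explicit CopyHolder(const Settings& s) : copy(s) {}
    const std::string& name() const { return copy.user_name; }
};

// Holds a pointer: always reads the single original.
struct PointerHolder {
    const Settings* source;
    explicit PointerHolder(const Settings* s) : source(s) {}
    const std::string& name() const { return source->user_name; }
};
```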
EDIT: to answer jogojapan's remarks, here is my opinion on memory efficiency.
I often ran programs through profilers and discovered that an amazing percentage of the CPU power went into various forms of memory-to-memory copies.
I also noticed that the cost of optimizing memory efficiency was often offset by code complexity, for surprisingly little gains.
On the other hand, I spent many hours tracing bugs down to data inconsistencies, some of them requiring sizeable code refactoring to get rid of.
As I see it, memory efficiency should become more of a concern near the end of a project, when profiling reveals where the CPU/memory drain really occurs, while code robustness (especially data flows and data consistency) should be the main factor to consider in the early stages of conception and coding.
Only the bulkiest data types should be sized at the start, if the application is expected to handle considerable amounts of data. On a modern PC, we are talking about hundreds of megabytes, which most applications will never need.
When I designed embedded software 10 or 20 years ago, memory usage was a constant concern. But in environments like a desktop PC, where memory requirements are most of the time negligible compared to the amount of available RAM, focusing on a reliable design seems more of a priority to me.
You should use a pointer when you want to refer to the same object from different places. You can use references for this too, but pointers give you the added advantage of being able to refer to different objects over time, while a reference keeps referring to the same object.
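The difference between a reference that stays bound and a pointer that can be re-pointed, as a minimal sketch (`reseat_demo` is a hypothetical name):

```cpp
#include <cassert>
#include <utility>

std::pair<int, int> reseat_demo() {
    int a = 1;
    int b = 2;
    int& ref = a;   // ref is bound to a for its whole lifetime
    ref = b;        // does NOT rebind: copies b's value (2) into a
    int* ptr = &a;
    ptr = &b;       // a pointer can be re-pointed at another object
    *ptr = 20;      // writes to b
    return {a, b};
}
```

The call returns `{2, 20}`: assigning through the reference changed `a`, while the pointer moved on to `b`.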
On second thought, maybe you are referring to objects created on the free store using new, etc., and then accessed through pointers. There is no definitive rule for that, but in general you can do so when:
The object being created is too large to be accommodated on the stack, or
You want to extend the lifetime of the object beyond the current scope.
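For the lifetime case, a sketch of an object created inside a function that outlives that function's scope. This swaps in `std::unique_ptr` (C++14 `std::make_unique`) instead of a raw `new`/`delete` pair, so the free-store object is released automatically. `Document` and `make_document` are hypothetical names.

```cpp
#include <cassert>
#include <memory>
#include <string>

// Hypothetical type standing in for "an object created on the free store".
struct Document {
    std::string title;
};

// The Document is created here but outlives this scope: it lives on the
// free store and ownership moves to the caller, who frees it automatically.
std::unique_ptr<Document> make_document(const std::string& title) {
    return std::make_unique<Document>(Document{title});
}
```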
There is no such limitation or guideline. You will have to decide it.
Assume the class definition below. Its size is 100 ints = 400 bytes (with a 4-byte int).
class test
{
private:
    int m_nVar[100];
};
When you use the following function declaration (pass by value), the copy constructor will be called (even if you don't provide one, the compiler generates it). So all 100 ints will be copied, which obviously takes some time:
void passing_to_function(test a);
When you change the parameter to a reference or a pointer, no such copying happens; only an address is passed (the size of a pointer):
void passing_to_function(test& a);
So passing by reference or by pointer has an obvious advantage over passing by value here.
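A runnable variant of the `test` class makes the copy observable. `Tracked`, `first_by_value` and `first_by_ref` are hypothetical names. The copy constructor is written out only so that it can count how often it runs.

```cpp
#include <algorithm>
#include <cassert>

// Variant of the `test` class whose copy constructor counts its calls.
struct Tracked {
    int m_nVar[100] = {};
    static int copies;  // how many times the copy constructor ran
    Tracked() = default;
    Tracked(const Tracked& other) {
        std::copy(other.m_nVar, other.m_nVar + 100, m_nVar);
        ++copies;
    }
};
int Tracked::copies = 0;

int first_by_value(Tracked t) { return t.m_nVar[0]; }       // copies the whole array
int first_by_ref(const Tracked& t) { return t.m_nVar[0]; }  // no copy at all
```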
I have been writing a program that has a rather large structure that is passed by reference to a few functions. However, there are a few other functions that need access to small pieces of information within the large structure. It's not being edited, just read.
I was thinking of creating a second structure that copies just the specific pieces of information needed and passing that by reference, rather than passing the entire structure by reference.
What I am wondering is two things:
Since I am passing the large structure by reference, there really is no performance impact. Correct?
Even if 1) is correct, is it bad practice to be passing around a structure that shouldn't be edited (even though it wouldn't be edited, but still I'm talking about the principle here).
More specifically:
I have a configuration structure that sets up the program's configuration by calling a function and passing the structure by reference. There is some information (process name, command line arguments) that I want to use for informative purposes only. I'm asking if it's bad practice to pass around a structure that wasn't meant for the purpose I want to use it for.
1) Since I am passing the large structure by reference, there really is no performance impact. Correct?
Correct.
2) Even if 1) is correct, is it bad etiquette to be passing around a structure that shouldn't be edited (even though it wouldn't be edited, but still I'm talking about the principle here).
You could let your function accept a reference to const to make sure the function won't alter the state of the corresponding argument.
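A minimal sketch of the reference-to-const suggestion. `Config` and `describe` are hypothetical, loosely modeled on the process name and arguments mentioned in the question. The function reads the fields without copying the structure, and the compiler would reject any attempt to write to them.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical stand-in for the question's large configuration structure.
struct Config {
    std::string process_name;
    std::vector<std::string> arguments;
    // ... many other fields ...
};

// Read-only access: const& avoids the copy, and a write such as
// `cfg.process_name = "x";` would not compile inside this function.
std::string describe(const Config& cfg) {
    return cfg.process_name + " (" + std::to_string(cfg.arguments.size()) + " args)";
}
```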
I'm asking if it's bad practice to pass around a structure that wasn't meant for the purpose of what I want to use it for.
I'm not sure what you mean by this. The way you write it, this definitely seems to be a bad practice: you shouldn't use something for doing what it wasn't meant for. That means distorting the semantics of an object. However, the rest of your question doesn't seem to imply this.
Rather, it seems like you are concerned with passing a reference to a function because that may allow the function to alter the argument's state; but provided the function takes a reference to const, it won't be able to alter the state of its argument. In that case, no it's not a bad practice.
If you are referring to the fact that the function only needs to work with some of the data members or member functions of your structure, then again that is not necessarily a bad design. It would be silly to require that each function access every member of a data structure.
Of course, this is the best I can write without knowing anything concrete about the semantics of the function and the particular data structure.
Correct.
Pass it by const reference; you'll get the performance benefits of pass-by-reference without allowing edits.
By the way, if a function requires only a fraction of the "big structure", that may indicate that those fields store some information "on their own", i.e. the rest of the big structure is not needed to interpret them correctly. In that case, you may consider moving them to a separate struct that is itself a member of the big structure.
As a further step, you can keep such configuration objects in a shared pointer and pass it anywhere you want, so you don't have to worry about ownership of the structure. This way you ensure that a single original configuration object is shared by all program components.
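One way to sketch the shared-pointer idea, assuming each component stores a `std::shared_ptr<const ...>` so that it shares ownership and also cannot modify the configuration. `AppConfig` and `Logger` are hypothetical names.

```cpp
#include <cassert>
#include <memory>
#include <string>

// Hypothetical configuration type for illustration.
struct AppConfig {
    std::string process_name;
};

// Each component keeps a shared_ptr to the same const configuration object,
// so no single component has to own it, and none of them can modify it.
struct Logger {
    std::shared_ptr<const AppConfig> config;
    std::string prefix() const { return "[" + config->process_name + "] "; }
};
```

Usage: create the object once with `std::make_shared<AppConfig>(...)` and hand copies of the pointer to every component; the configuration is destroyed when the last holder goes away.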
Like others have said, use const.
If you are doing C++, access those small pieces of information with accessor functions. Then functions that don't need to change the state of your struct will not have to touch any member fields, only member functions.
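A sketch of the accessor approach. `ProgramConfig` and `is_verbose` are hypothetical names. The data members are private, and a read-only caller goes only through const member functions.

```cpp
#include <cassert>
#include <string>
#include <utility>

// Hypothetical class: data members are private, read through const accessors.
class ProgramConfig {
public:
    ProgramConfig(std::string name, int verbosity)
        : name_(std::move(name)), verbosity_(verbosity) {}
    const std::string& name() const { return name_; }
    int verbosity() const { return verbosity_; }

private:
    std::string name_;
    int verbosity_;
};

// Uses only the public const member functions; no data member is touched.
bool is_verbose(const ProgramConfig& cfg) { return cfg.verbosity() > 1; }
```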
As others have mentioned, const& if you aren't modifying the data.
However, your point about "should I copy the data to a smaller struct" has mostly been glossed over. The answer is "maybe".
A good reason not to do it is that it is a waste of time -- literally, it costs time to copy stuff around.
A good reason to do it is that it reduces the effective state of your subprocedure. A subprocedure that doesn't access global variables (and hence global state), and isn't passed any pointers, has a very limited state. Procedures with limited state are easier to test, often easier to understand, and usually easier to debug.
Often you want to call each function with the absolute least amount of data required for that function to solve the problem it has. If you avoid passing in a "pointer to everything" (and references are pointers) to every function, you can maintain this rule, and it can often result in code that is easier to maintain.
On the other hand, copying the data out of the big monolithic state into small local structs can itself introduce bugs and errors.
One way to avoid this problem entirely is to avoid the big monolithic state object with parameters all mixed together, and if there are some parameters that are bundled together to answer some questions, they should be in their own sub-struct to start with. Now calling the subprocedure is easy -- you pass in the sub-struct which already has the parameters bundled.
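A sketch of the sub-struct layout described above. `ProcessInfo`, `BigState` and `banner` are hypothetical names. The bundled parameters live in their own struct inside the big state, so the subprocedure receives just that piece, with no extra copying step.

```cpp
#include <cassert>
#include <string>

// Hypothetical layout: the fields one subprocedure needs are grouped in
// their own sub-struct, which is a member of the big state object.
struct ProcessInfo {
    std::string name;
    int argument_count = 0;
};

struct BigState {
    ProcessInfo process;     // bundled parameters
    double tolerance = 0.0;  // unrelated parameters
    int retries = 0;
};

// The subprocedure receives only the sub-struct it actually needs.
std::string banner(const ProcessInfo& info) {
    return info.name + " started with " + std::to_string(info.argument_count) + " argument(s)";
}
```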
I'm pretty much a beginner at C++. Just started learning it a few weeks ago. I'm really interested in improving my skills as a programmer, and there's something that's been confusing me in the last few days. It is pointers. Pointers and the reference operator. My question is, what exactly is the functionality of the pointers and reference operator? How will I know when to use them, and what are their purposes and common usages. Any examples consisting of common algorithms using dereference and reference will be greatly appreciated.
How can I use reference and dereference to become a better programmer and improve my algorithms (and possibly make them simpler)?
Thanks :D
Definitely check this question out; the accepted answer explains pointers and the common errors made with them nicely.
Update: a few words of my own
Pointers are bunches of bits, like any other kind of variable. We use them so much because they have several very convenient properties:
Their size (in bytes) is fixed, making it trivial to know how many bytes we need to read to get the value of a pointer.
When using other types of variables (e.g. objects), some mechanism needs to be in place so that the compiler knows how large each object is. This introduces various restrictions which vary among languages and compilers. Pointers have no such problems.
Their size is also small (typically 4 or 8 bytes), making it very fast to update their values.
This is very useful when you use the pointer as a token that points to a potentially large amount of information. Consider an example: we have a book with pictures of paintings. You need to describe a painting to me so I can find it in the book. You can either sit down and paint an exact copy of it, show it to me, and let me search the book for it; or you can tell me "it's on page 25". The latter is like using a pointer, and so much faster.
They can be used to implement polymorphism, which is one of the foundations of object-oriented-programming.
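Two of the properties above in one small sketch. `Huge`, `Shape`, `Circle` and `name_of` are hypothetical types and names. A pointer to a 1 MiB object is only pointer-sized, and a call through a base-class pointer reaches the derived override.

```cpp
#include <cassert>
#include <string>

// Hypothetical types for illustration.
struct Huge { char bytes[1 << 20]; };  // 1 MiB of data; never instantiated here

struct Shape {
    virtual ~Shape() = default;
    virtual std::string name() const { return "shape"; }
};
struct Circle : Shape {
    std::string name() const override { return "circle"; }
};

// Calling through a base-class pointer dispatches to the derived override.
std::string name_of(const Shape* s) { return s->name(); }
```

On common platforms `sizeof(Huge*)` is 4 or 8 bytes, the same as any other object pointer.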
So, to find out how to use pointers: find cases where these properties will come in handy. :)
There's some things a programmer needs to understand before diving into pointers and C++ references.
First you must understand how a program works. When you declare variables and write statements, you need to understand what's happening at a lower level; it's important to know what happens from the computer's standpoint.
Essentially your program becomes data in memory (a process) when you execute it. At this point you need a simple way to refer to spots of data; we call these variables. You can store things and read them, all from memory (the computer's memory).
Now imagine having to pass some data to a function - you want this function to manipulate this data - you can either do this by passing the entire set of data, or you can do it by passing its address (the location of the data in memory). All the function really needs is the address of this data, it doesn't need the entire data itself.
So pointers are used exactly for this sort of task - when you need to pass address of data around - pointers in fact are just regular variables that contain an address.
C++ makes things a bit easier with references (int &var), but the concept is the same: a reference lets you skip the step of creating a pointer to store the address of some data, and does it all automatically for you when passing data to a function.
This is just a simple introduction to how they work; you should search Google for more detailed resources and all the cool things you can do with pointers/references.
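Side by side, the two ways of giving a function access to the caller's data look like this (hypothetical function names). The pointer version needs `&x` at the call site and `*` inside; the reference version does both implicitly.

```cpp
#include <cassert>

// Both functions change the caller's variable; only the syntax differs.
void increment_via_pointer(int* value) { *value += 1; }   // caller passes &x
void increment_via_reference(int& value) { value += 1; }  // caller passes x
```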
A better name for the operator is the "address-of" operator, because it returns the address of its operand.
In C++ you will use pointers (and both reference/dereference operators) when dealing with dynamically allocated memory or when working with pointer arithmetic.
Pointers are also used to break down static bindings since they imply dynamic binding (through the address stored in the pointer, which can change dynamically).
For all other uses, it is usually better to use references instead of pointers.
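For the pointer-arithmetic case mentioned above, a minimal sketch (`sum` is a hypothetical name) that walks an array by advancing a pointer:

```cpp
#include <cassert>
#include <cstddef>

// Pointer arithmetic: advance p one element at a time up to first + count.
int sum(const int* first, std::size_t count) {
    int total = 0;
    for (const int* p = first; p != first + count; ++p) {
        total += *p;  // dereference to read the current element
    }
    return total;
}
```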
To be short:
References are an improvement over the pointers that C++ inherited from C.
They are a bit safer because they help you avoid using "*" in your functions, which means fewer segmentation faults,
or, as my friends say, "avoid the Star Wars".
There is a lot to learn about it:
look at the use of "&" for sending and receiving values by reference, and
understand the use of "&" for getting a variable's address.
It's a very big question; it would help if you could be more specific.
Let's say I know a guy who is new to C++. He does not pass around pointers (rightly so), but he refuses to pass by reference. He always uses pass by value. His reason is that he feels that "passing objects by reference is a sign of a broken design".
The program is a small graphics program and most of the passing in question is mathematical Vector(3-tuple) objects. There are some big controller objects but nothing more complicated than that.
I'm finding it hard to find a killer argument against only using the stack.
I would argue that pass by value is fine for small objects such as vectors but even then there is a lot of unnecessary copying occurring in the code. Passing large objects by value is obviously wasteful and most likely not what you want functionally.
On the pro side, I believe the stack is faster at allocating/deallocating memory and has a constant allocation time.
The only major argument I can think of is that the stack could possibly overflow, but I'm guessing that it is improbable that this will occur? Are there any other arguments against using only the stack/pass by value as opposed to pass by reference?
Subtyping-polymorphism is a case where passing by value wouldn't work because you would slice the derived class to its base class. Maybe to some, using subtyping-polymorphism is bad design?
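Slicing in a minimal sketch (`Base`, `Derived`, `by_value` and `by_reference` are hypothetical names). Passing a `Derived` by value copies only its `Base` part, so the virtual call no longer reaches the override.

```cpp
#include <cassert>
#include <string>

struct Base {
    virtual ~Base() = default;
    virtual std::string who() const { return "Base"; }
};
struct Derived : Base {
    std::string who() const override { return "Derived"; }
};

// By value: the argument is copied into a Base, slicing off the Derived part.
std::string by_value(Base b) { return b.who(); }
// By const reference: no copy, so the dynamic type is preserved.
std::string by_reference(const Base& b) { return b.who(); }
```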
Your friend's problem is not his idea as much as his religion. Given any function, always consider the pros and cons of passing by value, reference, const reference, pointer or smart pointer. Then decide.
The only sign of broken design I see here is your friend's blind religion.
That said, there are a few signatures that don't bring much to the table. Taking a const parameter by value might be silly, because if you promise not to change the object, then you might as well not make your own copy of it. Unless it's a primitive, of course, in which case the compiler can be smart enough to take a reference still. Or, sometimes it's clumsy to take a pointer to a pointer as an argument. This adds complexity; instead, you might be able to get away with taking a reference to a pointer and get the same effect.
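The pointer-to-pointer versus reference-to-pointer choice, sketched on a hypothetical singly linked `Node`. Both functions replace the caller's head pointer; only the call syntax differs.

```cpp
#include <cassert>

// Hypothetical node type for illustration.
struct Node {
    int value;
    Node* next;
};

// Pointer-to-pointer version: callers write push_front_pp(&head, v).
void push_front_pp(Node** head, int v) { *head = new Node{v, *head}; }

// Reference-to-pointer version: same effect, callers write push_front_ref(head, v).
void push_front_ref(Node*& head, int v) { head = new Node{v, head}; }
```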
But don't take these guidelines as set in stone; always consider your options because there is no formal proof that eliminates any alternative's usefulness.
If you need to change the argument for your own needs, but don't want to affect the client, then take the argument by value.
If you want to provide a service to the client, and the client is not closely related to the service, then consider taking an argument by reference.
If the client is closely related to the service then consider taking no arguments but write a member function.
If you wish to write a service function for a family of clients that are closely related to the service but very distinct from each other then consider taking a reference argument, and perhaps make the function a friend of the clients that need this friendship.
If you don't need to change the client at all then consider taking a const-reference.
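Three of these guidelines as function signatures, in a minimal sketch with hypothetical names: by value when the function needs its own modified copy, by reference when changing the caller's object is the point, and by const reference for read-only access.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <string>

// By value: we need a modified copy, and the caller's string must not change.
std::string to_upper(std::string s) {
    for (char& c : s) {
        c = static_cast<char>(std::toupper(static_cast<unsigned char>(c)));
    }
    return s;
}

// By reference: the point of the call is to change the caller's object.
void append_exclamation(std::string& s) { s += '!'; }

// By const reference: read-only access, no copy.
std::size_t count_spaces(const std::string& s) {
    std::size_t n = 0;
    for (char c : s) {
        if (c == ' ') ++n;
    }
    return n;
}
```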
There are all sorts of things that cannot be done without using references - starting with a copy constructor. References (or pointers) are fundamental and whether he likes it or not, he is using references. (One advantage, or maybe disadvantage, of references is that you do not have to alter the code, in general, to pass a (const) reference.) And there is no reason not to use references most of the time.
And yes, passing by value is OK for smallish objects without requirements for dynamic allocation, but it is still silly to hobble oneself by saying "no references" without concrete measurements showing that the so-called overhead is (a) perceptible and (b) significant. "Premature optimization is the root of all evil"¹.
¹ Various attributions, including C. A. R. Hoare (although he apparently disclaims it).
I think there is a huge misunderstanding in the question itself.
There is no relationship between stack- or heap-allocated objects on the one hand and pass by value, reference, or pointer on the other.
Stack vs Heap allocation
Always prefer the stack when possible: the object's lifetime is then managed for you, which is much easier to deal with.
It might not be possible in a couple of situations though:
Virtual construction (think of a Factory)
Shared Ownership (though you should always try to avoid it)
I might be missing some, but in these cases you should use SBRM (Scope-Bound Resource Management) to leverage the stack's lifetime management abilities, for example by using smart pointers.
Pass by: value, reference, pointer
First of all, there is a difference of semantics:
value, const reference: the passed object will not be modified by the method
reference: the passed object might be modified by the method
pointer/const pointer: same as reference (for the behavior), but might be null
Note that some languages (of the functional kind, like Haskell) do not offer references/pointers by default. Values are immutable once created. Apart from some workarounds for dealing with the external environment, these languages are not that restricted by this, and it somehow makes debugging easier.
Your friend should learn that there is absolutely nothing wrong with pass-by-reference or pass-by-pointer: for example, think of swap, which cannot be implemented with pass-by-value.
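The swap example, together with the value / reference / pointer semantics listed above, as a minimal sketch with hypothetical names. The by-value version only swaps its own copies, and the pointer version has to handle a possible null.

```cpp
#include <cassert>

// By value: the function swaps its own copies; the caller sees no change.
void swap_by_value(int a, int b) { int t = a; a = b; b = t; }

// By reference: swaps the caller's variables.
void swap_by_reference(int& a, int& b) { int t = a; a = b; b = t; }

// By pointer: same effect as by reference, but a null pointer is possible,
// so this version checks for it and reports whether it swapped.
bool swap_by_pointer(int* a, int* b) {
    if (a == nullptr || b == nullptr) return false;
    int t = *a;
    *a = *b;
    *b = t;
    return true;
}
```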
Finally, Polymorphism does not allow pass-by-value semantics.
Now, let's speak about performances.
It's usually well accepted that built-ins should be passed by value (to avoid an indirection) and user-defined big classes should be passed by reference/pointer (to avoid copying). big in fact generally means that the Copy Constructor is not trivial.
There is, however, an open question regarding small user-defined classes. Some recently published articles suggest that in some cases pass-by-value might allow better optimization by the compiler, for example in this case:
struct Object { void bar() {} };  // hypothetical minimal Object so the example compiles
Object foo(Object d) { d.bar(); return d; }
int main(int argc, char* argv[])
{
    Object o;
    o = foo(o);
    return 0;
}
Here a smart compiler can determine that o can be modified in place without any copying! (I think the function definition needs to be visible; I don't know whether link-time optimization would figure it out.)
Therefore, as always, there is only one way to settle the performance question: measure.
Reason being that he feels that "passing objects by reference is a sign of a broken design".
Although this is wrong in C++ for purely technical reasons, always using pass-by-value is a good enough approximation for beginners – it’s certainly much better than passing everything by pointers (or perhaps even than passing everything by reference). It will make some code inefficient but, hey! As long as this doesn’t bother your friend, don’t be unduly disturbed by this practice. Just remind him that someday he might want to reconsider.
On the other hand, this:
There are some big controller objects but nothing more complicated than that.
is a problem. Your friend is talking about broken design, and yet all the code uses is a few 3D vectors and large control structures? That is a broken design. Good code achieves modularity through the use of data structures. It doesn't seem as though this were the case.
… And once you use such data structures, code without pass-by-reference may indeed become quite inefficient.
First thing is, stack rarely overflows outside this website, except in the recursion case.
About his reasoning: I think he might be wrong because he is overgeneralizing, but what he has done might still be correct... or not?
For example, the Windows Forms library uses a Rectangle struct with four members, and Apple's Core Graphics has the CGRect struct; those structs are always passed by value. I think we can compare them to a Vector with three floating-point members.
However, since I have not seen the code, I feel I should not judge what he has done, though I have a feeling he might have done the right thing despite his overgeneralized idea.
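A sketch of the kind of small value type meant here, assuming a three-float vector like the one in the question (`Vec3` and `add` are hypothetical names). It is trivially copyable, so passing and returning it by value is a plain copy of three floats.

```cpp
#include <cassert>
#include <type_traits>

// Hypothetical small value type, comparable to the 3-tuple vectors in the question.
struct Vec3 {
    float x, y, z;
};
static_assert(std::is_trivially_copyable<Vec3>::value, "copying is a plain memberwise copy");

// Small and trivially copyable: passing and returning by value is cheap
// and keeps the function free of aliasing concerns.
Vec3 add(Vec3 a, Vec3 b) { return {a.x + b.x, a.y + b.y, a.z + b.z}; }
```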
I would argue that pass by value is fine for small objects such as vectors but even then there is a lot of unnecessary copying occurring in the code. Passing large objects by value is obviously wasteful and most likely not what you want functionally.
It's not quite as obvious as you might think. C++ compilers perform copy elision very aggressively, so you can often pass by value without incurring the cost of a copy operation. And in some cases, passing by value might even be faster.
Before condemning the issue for performance reasons, you should at the very least produce the benchmarks to back it up. And they might be hard to create because the compiler typically eliminates the performance difference.
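A small experiment in that spirit, using a hypothetical `Counted` type that counts only copy-constructor calls. A by-value parameter initialized from a temporary never invokes the copy constructor: the copy is elided (guaranteed since C++17) or, in older modes, replaced by a move. Only passing a named object copies.

```cpp
#include <cassert>

// Hypothetical type that counts how often its copy constructor runs.
struct Counted {
    static int copies;
    Counted() = default;
    Counted(const Counted&) { ++copies; }
    Counted(Counted&&) noexcept {}  // moves are not counted
};
int Counted::copies = 0;

// Takes its parameter by value.
int consume(Counted c) { (void)c; return 0; }

// Returns a prvalue, so no copy is needed to produce the result.
Counted make() { return Counted{}; }
```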
So the real issue should be one of semantics. How do you want your code to behave? Sometimes, reference semantics are what you want, and then you should pass by reference. If you specifically want/need value semantics then you pass by value.
There is one point in favor of passing by value. It's helpful in achieving a more functional style of code, with fewer side effects and where immutability is the default. That makes a lot of code easier to reason about, and it may make it easier to parallelize the code as well.
But in truth, both have their place. And never using pass-by-reference is definitely a big warning sign.
For the last 6 months or so, I've been experimenting with making pass-by-value the default. If I don't explicitly need reference semantics, then I try to assume that the compiler will perform copy elision for me, so I can pass by value without losing any efficiency.
So far, the compiler hasn't really let me down. I'm sure I'll run into cases where I have to go back and change some calls to passing by reference, but I'll do that when I know that
performance is a problem, and
the compiler failed to apply copy elision
I would say that not using pointers in C is a sign of a newbie programmer.
It sounds like your friend is scared of pointers.
Remember, C++ pointers were inherited from the C language, and C was developed when computers were much less powerful. Nevertheless, speed and efficiency remain vital to this day.
So, why use pointers? They allow the developer to optimize a program to run faster or use less memory than it would otherwise! Referring to the memory location of the data is much more efficient than copying all the data around.
Pointers are usually a difficult concept to grasp for those beginning to program, because all their experiments involve small arrays and maybe a few structs; basically they amount to working with a couple of megabytes (if you're lucky) when you have 1GB of memory lying around the house. In this setting, a couple of MB are nothing, and usually too little to have a significant impact on the performance of your program.
So let's exaggerate a little bit. Think of a char array with 2147483648 elements (2GB of data) that you need to pass to a function that will write all the data to disk. Which technique do you think is going to be more efficient/faster?
Pass by value, which is going to have to re-copy those 2GB of data to another location in memory before the program can write the data to the disk, or
Pass by reference, which will just refer to that memory location.
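A scaled-down sketch of the two options. It uses `std::vector<char>` as the buffer because a raw array parameter in C or C++ decays to a pointer and cannot actually be passed by value; the function names are hypothetical. Passing by value gives the callee its own freshly allocated copy of every byte, while a const reference lets it look at the caller's buffer directly:

```cpp
#include <vector>

// Pass by value: the parameter is a new vector with its own buffer, so
// every byte is copied before the function body even starts.
bool sees_same_buffer_by_value(std::vector<char> data, const char* original) {
    return data.data() == original;
}

// Pass by const reference: the callee reads the caller's buffer in place.
bool sees_same_buffer_by_ref(const std::vector<char>& data, const char* original) {
    return data.data() == original;
}
```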
What happens when you just don't have 4GB of RAM? Will you spend money on more RAM just because you are afraid of using pointers?
Re-copying the data in memory when you don't have to is redundant, and it's a waste of computer resources.
Anyway, be patient with your friend. If he would like to become a serious/professional programmer at some point in his life he will eventually have to take the time to really understand pointers.
Good Luck.
As already mentioned, the big difference between a reference and a pointer is that a pointer can be null. If a class requires data, a reference declaration will make it required. Adding const will make it 'read only', if that is what the caller wants.
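A small sketch of that difference, with hypothetical function names: a pointer parameter has to cope with "no object", whereas a const reference parameter requires an object and only lets the function read it.

```cpp
#include <cstddef>
#include <string>

// Pointer parameter: nullptr is a legal input, so the callee must check.
std::size_t length_or_zero(const std::string* s) {
    return s != nullptr ? s->size() : 0;
}

// Reference parameter: the caller must supply an object; const makes it
// read-only inside the function.
std::size_t length(const std::string& s) {
    return s.size();
}
```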
The pass-by-value 'flaw' mentioned is simply not true. Passing everything by value will completely change the performance of an application. It is not so bad when primitive types (e.g. int, double) are passed by value, but when a class instance is passed by value, temporary objects are created, which requires constructors and later destructors to be called on the class and on all of its member variables. This is exacerbated when large class hierarchies are used, because parent class constructors/destructors must be called as well.
Also, just because the vector is passed by value does not mean that it only uses stack memory. Heap memory may be allocated for each element as it is created in the temporary vector that is passed to the method/function. The vector itself may also have to reallocate on the heap if it reaches its capacity.
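A sketch that counts those hidden constructor calls, using a hypothetical `Element` type: copying a `std::vector` allocates a new heap buffer and runs the element copy constructor once per element, whereas a const reference runs none.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical element type that counts its copy constructions.
struct Element {
    inline static int copies = 0;  // C++17 inline static data member
    int value;

    explicit Element(int v) : value(v) {}
    Element(const Element& other) : value(other.value) { ++copies; }
};

// By value: a new vector (new heap buffer) plus one Element copy per element.
std::size_t count_by_value(std::vector<Element> v) { return v.size(); }

// By const reference: no new vector and no element copies.
std::size_t count_by_ref(const std::vector<Element>& v) { return v.size(); }
```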
If pass by value is being used so that the caller's values are not modified, then just use a const reference.
The answers that I've seen so far have all focused on performance: cases where pass-by-reference is faster than pass-by-value. You may have more success in your argument if you focus on cases that are impossible with pass-by-value.
Small tuples or vectors are a very simple type of data-structure. More complex data-structures share information, and that sharing can't be represented directly as values. You either need to use references/pointers or something that simulates them such as arrays and indices.
Lots of problems boil down to data that forms a Graph, or a Directed-Graph. In both cases you have a mixture of edges and nodes that need to be stored within the data-structure. Now you have the problem that the same data needs to be in multiple places. If you avoid references then firstly the data needs to be duplicated, and then every change needs to be carefully replicated in each of the other copies.
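A minimal sketch of the "arrays and indices" alternative mentioned above, with hypothetical `Node`, `Edge` and `Graph` types: every node is stored exactly once and edges refer to it by index, so a single update is visible through every edge. If each edge instead held its own copy of the node by value, the update would have to be repeated for each copy.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Each node is stored exactly once.
struct Node {
    std::string label;
};

// Edges refer to nodes by index, so several edges can share one node.
struct Edge {
    std::size_t from;
    std::size_t to;
};

struct Graph {
    std::vector<Node> nodes;
    std::vector<Edge> edges;

    // Label of the node that the given edge points at.
    const std::string& label_of_target(std::size_t edge) const {
        return nodes[edges[edge].to].label;
    }
};
```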
Your friend's argument boils down to saying: tackling any problem complex enough to be represented by a Graph is a bad-design....
The only major argument I can think of is that the stack could possibly overflow, but I'm guessing that it is improbable that this will occur? Are there any other arguments against using only the stack/pass by value as opposed to pass by reference?
Well, gosh, where to start...
As you mention, "there is a lot of unnecessary copying occurring in the code". Let's say you've got a loop where you call a function on these objects. Using a pointer instead of duplicating the objects can accelerate execution by one or more orders of magnitude.
You can't pass variable-sized data structures, arrays, etc. around on the stack. You have to allocate them dynamically and pass a pointer or reference to the beginning. If your friend hasn't run into this, then yes, he's "new to C++."
As you mention, the program in question is simple and mostly uses quite small objects like graphics 3-tuples, which, if the elements are doubles, would be 24 bytes apiece. But in graphics, it's common to deal with 4x4 arrays, which handle both rotation and translation. Those would be 128 bytes apiece, so a program that had to pass those around by value would copy more than five times as much data per function call. With pass-by-reference, passing a 3-tuple or a 4x4 array in a 32-bit executable would just involve copying a single 4-byte pointer.
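The arithmetic behind those numbers, as a sketch with hypothetical `Vec3` and `Mat4` types. It assumes 8-byte `double` and no padding, which holds on mainstream platforms but is not guaranteed by the standard; 128 / 24 is about 5.3.

```cpp
#include <cstddef>

// 3-tuple of doubles, as in the question.
struct Vec3 { double x, y, z; };

// 4x4 transformation matrix (rotation + translation).
struct Mat4 { double m[4][4]; };

// Bytes copied per call when each type is passed by value...
constexpr std::size_t vec3_bytes = sizeof(Vec3);  // 24 with 8-byte doubles
constexpr std::size_t mat4_bytes = sizeof(Mat4);  // 128 with 8-byte doubles

// ...versus the size of the pointer a reference is typically implemented as.
constexpr std::size_t ref_bytes = sizeof(void*);  // 4 on 32-bit, 8 on 64-bit targets
```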
On register-rich CPU architectures like ARM, PowerPC, 64-bit x86 and 680x0 (but not 32-bit x86), pointers (and references, which are secretly pointers wearing fancy syntactical clothing) are commonly passed or returned in a register, which is really freaking fast compared to the memory access involved in a stack operation.
You mention the improbability of running out of stack space. And yes, that's so on a small program one might write for a class assignment. But a couple of months ago, I was debugging commercial code that was probably 80 function calls below main(). If they'd used pass-by-value instead of pass-by-reference, the stack would have been ginormous. And lest your friend think this was a "broken design", this was actually a WebKit-based browser implemented on Linux using GTK+, all of which is very state-of-the-art, and the function call depth is normal for professional code.
Some executable architectures limit the size of an individual stack frame, so even though you might not run out of stack space per se, you could exceed that and wind up with perfectly valid C++ code that wouldn't build on such a platform.
I could go on and on.
If your friend is interested in graphics, he should take a look at some of the common APIs used in graphics: OpenGL and X Window on Linux, Quartz on Mac OS X, DirectX on Windows. And he should look at the internals of large C/C++ systems like the WebKit or Gecko HTML rendering engines, or any of the Mozilla browsers, or the GTK+ or Qt GUI toolkits. They all pass anything much larger than a single integer or float by reference, and often fill in results by reference rather than returning them as a function return value.
Nobody with any serious real world C/C++ chops - and I mean nobody - passes data structures by value. There's a reason for this: it's just flipping inefficient and problem-prone.
Wow, there are already 13 answers… I didn't read all in detail but I think this is quite different from the others…
He has a point. The advantage of pass-by-value as a rule is that subroutines cannot subtly modify their arguments. Passing non-const references would indicate that every function has ugly side effects, indicating poor design.
Simply explain to him the difference between vector3 & and vector3 const&, and demonstrate how the latter may be initialized by a constant as in vec_function( vector3(1,2,3) );, but not the former. Pass by const reference is a simple optimization of pass by value.
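A minimal sketch of that demonstration. `vector3` and `vec_function` follow the names in the text; the struct body and `vec_function_mut` are hypothetical. A `const&` parameter accepts the temporary `vector3(1,2,3)`, while a plain `&` parameter only accepts a named object:

```cpp
// Minimal stand-in for the question's vector3 type.
struct vector3 {
    double x, y, z;
    vector3(double a, double b, double c) : x(a), y(b), z(c) {}
};

// A const reference binds to temporaries such as vector3(1,2,3).
double vec_function(const vector3& v) { return v.x + v.y + v.z; }

// A non-const reference does not: vec_function_mut(vector3(1,2,3))
// fails to compile; it needs a named (lvalue) vector3.
double vec_function_mut(vector3& v) {
    v.x = 0.0;  // visible side effect on the caller's object
    return v.y + v.z;
}
```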
Buy your friend a good C++ book. Passing non-trivial objects by reference is good practice and saves you a lot of unnecessary constructor/destructor calls. This also has nothing to do with allocating on the free store vs. using the stack. You can (or should) pass objects allocated on the program stack by reference without any free store usage. You can also ignore the free store completely, but that throws you back to the old Fortran days, which your friend probably didn't have in mind (otherwise he would pick an ancient f77 compiler for your project, wouldn't he...?).
In my application I have quite a few void-pointers (for historical reasons: the application was originally written in pure C). In one of my modules I know that the void-pointers point to instances of classes that could inherit from a known base class, but I cannot be 100% sure of it. Therefore, doing a dynamic_cast on the void-pointer might cause problems. The void-pointer might even point to a plain struct (so no vptr in the struct).
I would like to investigate the first 4 bytes of the memory the void-pointer is pointing to, to see if this is the address of the valid vtable. I know this is platform, maybe even compiler-version-specific, but it could help me in moving the application forward, and getting rid of all the void-pointers over a limited time period (let's say 3 years).
Is there a way to get a list of all vtables in the application, or a way to check whether a pointer points to a valid vtable, and whether that instance pointing to the vtable inherits from a known base class?
I would like to investigate the first 4 bytes of the memory the void-pointer is pointing to, to see if this is the address of the valid vtable.
You can do that, but you have no guarantees whatsoever that it will work. You don't even know whether the memory at the void* starts with the vtable pointer. Last time I looked into this (5+ years ago), I believe some compiler stored the vtable pointer before the address an instance pointer points to.
I know this is platform, maybe even compiler-version-specific,
It may also be compiler-options-specific, depending on what optimizations you use and so on.
but it could help me in moving the application forward, and getting rid of all the void-pointers over a limited time period (let's say 3 years).
Is this the only option you can see for moving the application forward? Have you considered others?
Is there a way to get a list of all vtables in the application,
No :(
or a way to check whether a pointer points to a valid vtable,
No standard way. What you can do is inspect some class pointers in your favorite debugger (or cast the memory to bytes and log it to a file), compare them, and see if they make sense. Even so, you have no guarantee that some of your data (or other pointers in the application) will not look similar enough (when viewed as bytes) to confuse whatever code you write.
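For the "cast the memory to bytes and log it" route, a small helper like the following (hypothetical, standard C++ only) formats the first bytes of any object as hex; inspecting an object's bytes through `unsigned char` is allowed by the language. It only shows you the bytes, it cannot tell you whether they are a vptr. For example, `hex_dump(ptr, sizeof(void*))` shows the first pointer-sized word.

```cpp
#include <cstddef>
#include <cstdio>
#include <string>

// Formats n bytes starting at p as space-separated lowercase hex pairs.
std::string hex_dump(const void* p, std::size_t n) {
    const unsigned char* bytes = static_cast<const unsigned char*>(p);
    std::string out;
    char buf[3];  // two hex digits + terminating null
    for (std::size_t i = 0; i < n; ++i) {
        if (i != 0) out += ' ';
        std::snprintf(buf, sizeof buf, "%02x", static_cast<unsigned>(bytes[i]));
        out += buf;
    }
    return out;
}
```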
and whether that instance pointing to the vtable inherits from a known base class?
No again.
Here are some questions (you may have considered them already). Answers to these may give you more options, or may give us other ideas to propose:
how large is the code base? Is it feasible to introduce global changes, or is the functionality too spread around for that?
do you treat all pointers uniformly (that is: are there common points in your source code where you could plug in and add your own metadata?)
what can you change in your source code? (If you have access to your memory allocation subroutines, or could plug in your own, for example, you may be able to plug in your own metadata.)
If different data types are cast to void* in various parts of your code, how do you decide later what is in those pointers? Can you use the code that discriminates the void* to decide if they are classes or not?
Does your code-base allow for refactoring methodologies? (refactoring in small iterations, by plugging in alternate implementations for parts of your code, then removing the initial implementation and testing everything)
Edit (proposed solution):
Do the following steps:
define a metadata (base) class
replace your memory allocation routines with custom ones which just refer to the standard / old routines (and make sure your code still works with the custom routines).
on each allocation, allocate the requested size + sizeof(Metadata*) (and make sure your code still works).
replace the first sizeof(Metadata*) bytes of your allocation with a standard byte sequence that you can easily test for (I'm partial to 0xDEADBEEF :D). Then, return [allocated address] + sizeof(Metadata*) to the application. On deallocation, take the received pointer, decrement it by sizeof(Metadata*), then call the system / previous routine to perform the deallocation. Now you have an extra buffer allocated in your code, specifically for metadata, on each allocation.
In the cases where you want metadata, create/obtain a metadata class pointer, then store it in the 0xDEADBEEF zone. When you need to check metadata, reinterpret_cast<Metadata**>([your void* here]), decrement it, then check whether the stored pointer value is 0xDEADBEEF (no metadata) or something else.
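The steps above can be sketched roughly as follows. All names (`Metadata`, `tagged_alloc`, `tagged_free`, `set_metadata`, `get_metadata`) are hypothetical, and the sketch deviates from the text in two places: the header is padded to `alignof(std::max_align_t)` instead of exactly `sizeof(Metadata*)`, so the pointer handed to the application stays suitably aligned, and the slot is stored as a `std::uintptr_t` so the 0xDEADBEEF sentinel can be compared without forging a pointer.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstdlib>

// Hypothetical metadata base class; real code would store type info here.
struct Metadata {
    virtual ~Metadata() = default;
};

namespace {
// Sentinel meaning "no metadata attached".
constexpr std::uintptr_t kNoMetadata = 0xDEADBEEF;

// Header placed in front of every allocation; alignas keeps the payload
// aligned for any fundamental type.
struct alignas(std::max_align_t) Header {
    std::uintptr_t meta;  // kNoMetadata, or a Metadata* stored as an integer
};
}  // namespace

// Replacement for malloc: allocates header + payload, returns the payload.
void* tagged_alloc(std::size_t size) {
    void* raw = std::malloc(sizeof(Header) + size);
    if (raw == nullptr) return nullptr;
    Header* h = static_cast<Header*>(raw);
    h->meta = kNoMetadata;
    return h + 1;  // payload starts right after the header
}

// Replacement for free: step back to the header before releasing.
void tagged_free(void* p) {
    if (p == nullptr) return;
    std::free(static_cast<Header*>(p) - 1);
}

void set_metadata(void* p, Metadata* m) {
    (static_cast<Header*>(p) - 1)->meta = reinterpret_cast<std::uintptr_t>(m);
}

// Returns nullptr while the sentinel is still in place.
Metadata* get_metadata(void* p) {
    std::uintptr_t v = (static_cast<Header*>(p) - 1)->meta;
    return v == kNoMetadata ? nullptr : reinterpret_cast<Metadata*>(v);
}
```

This only works if every allocation that might reach `get_metadata` went through `tagged_alloc`; calling it on a pointer from anywhere else reads memory you do not own.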
Note that this code should only be there for refactoring - for production code it is slow, error prone and generally other bad things that you do not want your production code to be. I would make all this code dependent on some REFACTORING_SUPPORT_ENABLED macro that would never allow your Metadata class to see the light of a production release (except for testing builds maybe).
I would say it is not possible without related reference (header declaration).
If you want to replace those void pointers with the correct interface type, here is how I think you could automate it:
Go through your code base to get a list of all classes that have virtual functions; you could do this quickly by writing a script, e.g. in Perl
Write a function that takes a void* pointer as input, iterates over those classes trying to dynamic_cast it, and logs information on success, such as the interface type and code line
Call this function everywhere you use a void* pointer; maybe you could wrap it in a macro so you can easily get file and line information
Run your full automated test suite (if you have one) and analyse the output.
The easier way would be to overload operator new for your particular base class. That way, if you know your void* pointers are to heap objects, then you can also with 100% certainty determine whether they're pointing to your object.
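A minimal sketch of that idea, with `KnownBase` standing in for your real base class (hypothetical name). Its class-specific `operator new` records every address it hands out, so `allocated_here` can answer "was this pointer produced by `new` for a `KnownBase`-derived object?". It assumes the void* holds the address returned by `new` (true for single inheritance when the pointer was stored from the most-derived or first-base type), and it does not see array `new[]`, stack, or static objects.

```cpp
#include <cstddef>
#include <mutex>
#include <new>
#include <unordered_set>

class KnownBase {
public:
    virtual ~KnownBase() = default;

    // Class-specific allocation: also used by every derived class.
    static void* operator new(std::size_t size) {
        void* p = ::operator new(size);
        try {
            std::lock_guard<std::mutex> lock(mutex());
            registry().insert(p);
        } catch (...) {
            ::operator delete(p);  // don't leak if bookkeeping fails
            throw;
        }
        return p;
    }

    static void operator delete(void* p) noexcept {
        {
            std::lock_guard<std::mutex> lock(mutex());
            registry().erase(p);
        }
        ::operator delete(p);
    }

    // True if p is exactly an address handed out by KnownBase::operator new.
    static bool allocated_here(const void* p) {
        std::lock_guard<std::mutex> lock(mutex());
        return registry().count(p) != 0;
    }

private:
    static std::unordered_set<const void*>& registry() {
        static std::unordered_set<const void*> live;
        return live;
    }
    static std::mutex& mutex() {
        static std::mutex m;
        return m;
    }
};
```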