I'm writing something performance-critical and wanted to know if it could make a difference if I use:
int test( int a, int b, int c )
{
// Do millions of calculations with a, b, c
}
or
class myStorage
{
public:
int a, b, c;
};
int test( myStorage values )
{
// Do millions of calculations with values.a, values.b, values.c
}
Does this basically result in similar code? Is there an extra overhead of accessing the class members?
I'm sure that this is clear to an expert in C++, so I won't try to write an unrealistic benchmark for it right now.
The compiler will probably equalize them. If it has any brains at all, it will copy values.a, values.b, and values.c into local variables or registers, which is also what happens in the simple case.
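To make that concrete, here is a minimal sketch of the two variants from the question with a placeholder calculation filled in. The names sum_args and sum_struct and the loop body are made up for illustration. With optimization enabled, a compiler will typically keep a, b and c in registers in both versions, but it is worth confirming that in the assembly your own compiler generates:

```cpp
#include <cstdint>

// Same members as the class in the question.
struct myStorage {
    int a, b, c;
};

// Variant 1: three separate parameters.
std::int64_t sum_args(int a, int b, int c, int iterations) {
    std::int64_t total = 0;
    for (int i = 0; i < iterations; ++i) {
        total += static_cast<std::int64_t>(a) * i + b - c;  // placeholder work
    }
    return total;
}

// Variant 2: the same values bundled into one object passed by value.
std::int64_t sum_struct(myStorage values, int iterations) {
    std::int64_t total = 0;
    for (int i = 0; i < iterations; ++i) {
        total += static_cast<std::int64_t>(values.a) * i + values.b - values.c;
    }
    return total;
}
```

Both functions compute the same result, so they can be swapped freely while you compare timings or assembly output.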
The relevant maxims:
Premature optimization is the root of much evil.
Write it so you can read it at 1am six months from now and still understand what you were trying to do.
Most of the time significant optimization comes from restructuring your algorithm, not small changes in how variables are accessed. Yes, I know there are exceptions, but this probably isn't one of them.
This sounds like premature optimization.
That being said, there are some differences and opportunities but they will affect multiple calls to the function rather than performance in the function.
First of all, in the second option you may want to pass myStorage as a constant reference.
As a result of that, your compiled code will likely be pushing a single value onto the stack (the address that lets you access the object), rather than pushing three separate values. If you have additional fields (in addition to a-c), passing myStorage by value might actually cost you more, because you will be invoking a copy constructor and copying all the additional fields. All of these are per-call costs, not costs within the function.
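A small sketch of the two signatures, assuming a version of myStorage with an extra array member (the member named extra is hypothetical and only there to make the copy larger):

```cpp
// The class from the question, plus a hypothetical bulky member.
struct myStorage {
    int a, b, c;
    int extra[64];
};

// By value: the whole object, including extra, is copied for each call.
int test_by_value(myStorage values) {
    return values.a + values.b + values.c;
}

// By const reference: only the object's address is passed, nothing is copied.
int test_by_cref(const myStorage& values) {
    return values.a + values.b + values.c;
}
```

Inside the function the member accesses look the same either way; the difference is only in what has to happen at the call site.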
If you are doing tons of calculations with a b and c within the function, then it really doesn't matter how you transfer or access them. If you passed by reference, the initial cost might be slightly more (since your object, if passed by reference, could be on the heap rather than the stack), but once accessed for the first time, caching and registers on your machine will probably mean low-cost access. If you have passed your object by value, then it really doesn't matter, since even initially, the values will be nearby on the stack.
For the code you provided, if these are the only fields, there will likely not be a difference. The "values.variable" is merely interpreted as an offset into the stack, not as "look up one object, then access another address".
Of course, if you don't buy these arguments, just define local variables as the first step in your function, copy the values from the object, and then use these variables. If you really use them multiple times, the initial cost of this copy won't matter :)
No, your cpu would cache the variables you use over and over again.
I think there is some overhead, but it may not be much. The memory address of the object is stored on the stack and points to the object in heap memory, and only then do you access the instance variable.
If you store the int variable on the stack, it would be faster, because the value is already on the stack and the machine just reads it from there to do the calculation :).
It also depends on if you store the class's instance variable value on stack or not. If inside the test(), you do like:
int a = objA.a;
int b = objA.b;
int c = objA.c;
I think it would be almost the same performance.
If you're really writing performance-critical code and you think one version should be faster than the other, write both versions and time them (with the code compiled with the right optimization switches). You may even want to look at the generated assembly. A lot of quite subtle things can affect the speed of a code snippet, such as register spilling.
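A minimal timing helper along those lines, using std::chrono::steady_clock from the standard library. The name time_ns is made up for this sketch, and a single run like this is only a rough measurement; a real comparison should repeat the run many times and use an optimized build:

```cpp
#include <chrono>
#include <cstdint>

// Runs the callable once and returns the elapsed wall-clock time in nanoseconds.
template <typename F>
std::int64_t time_ns(F&& work) {
    const auto start = std::chrono::steady_clock::now();
    work();
    const auto stop = std::chrono::steady_clock::now();
    return std::chrono::duration_cast<std::chrono::nanoseconds>(stop - start).count();
}
```

When timing, make sure the result of the work is actually used (for example, written to a volatile variable), otherwise the optimizer may remove the code you are trying to measure.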
You can also start your function with
int & a = values.a;
int & b = values.b;
although the compiler should be smart enough to do that for you behind the scenes. In general I prefer to pass around structures or classes: this often makes it clearer what the function is meant to do, and you don't have to change the signature every time you want to take another parameter into account.
As with your previous, similar question: it depends on the compiler and platform. If there is any difference at all, it will be very small.
Both values on the stack and values in an object are commonly accessed using a pointer (the stack pointer, or the this pointer) and some offset (the location in the function's stack frame, or the location inside the class).
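The "pointer plus offset" view can be made visible with offsetof. This is only an illustration of the addressing idea, assuming the three-int class from the question; the constants and the helper read_at are hypothetical names:

```cpp
#include <cstddef>
#include <cstring>

struct myStorage {
    int a, b, c;
};

// Byte offsets of the members from the start of the object.
constexpr std::size_t kOffsetA = offsetof(myStorage, a);
constexpr std::size_t kOffsetB = offsetof(myStorage, b);
constexpr std::size_t kOffsetC = offsetof(myStorage, c);

// Reads an int located `offset` bytes into the object, which is roughly
// what the generated code does for values.a, values.b and values.c.
int read_at(const myStorage& s, std::size_t offset) {
    int value;
    std::memcpy(&value, reinterpret_cast<const unsigned char*>(&s) + offset, sizeof value);
    return value;
}
```

A local variable is addressed the same way, except that the base is the stack (or frame) pointer instead of the object's address.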
Here are some cases where it might make a difference:
Depending on your platform, the stack pointer might be held in a CPU register, whereas the this pointer might not. If this is the case, accessing this (which is presumably on the stack) would require an extra memory lookup.
Memory locality might be different. If the object in memory is larger than one cache line, the fields are spread out over multiple cache lines. Bringing only the relevant values together in a stack frame might improve cache efficiency.
Do note, however, how often I used the word "might" here. The only way to be sure is to measure it.
If you can't profile the program, print out the assembly language for the code fragments.
Fewer assembly instructions usually means less work to execute, though instruction count is only a rough proxy for speed. This is a technique for getting a rough estimate of performance when a profiler is not available.
An assembly language listing will allow you to see differences, if any, between implementations.
First off, let me get off my chest the fact that I'm a greenhorn trying to do things the right way, which means I run into contradictions about what the right way is every now and then.
I am modifying a driver for a peripheral which contains a function - let's call it Send(). In the function I have a timestamp variable so that the function loops for a specified amount of time.
So, should I declare the variable global (that way it is always in memory and no time is lost for declaring it each time the function runs) or do I leave the variable local to the function context (and avoid a bad design pattern with global variables)?
Please bear in mind that the function can be called multiple times per millisecond.
Speed of execution shouldn't be significantly different for a local vs. a global variable. The only real difference is where the variable lives. Local variables are allocated on the stack, global variables are in a different memory segment. It is true that local variables are allocated every time you enter a routine, but allocating memory is a single instruction to move the stack pointer.
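As a rough sketch of what the local-variable version could look like, here is a stand-in for Send() written in C++ with std::chrono. The name send_for, the microsecond budget and the polling body are all assumptions, since the real driver code isn't shown:

```cpp
#include <chrono>

// Hypothetical stand-in for the driver's Send(): keeps polling until the
// time budget is used up and reports how many polls were made.
unsigned long send_for(std::chrono::microseconds budget) {
    // Local variable: its storage comes from the stack-frame adjustment
    // made on function entry, so there is no separate allocation step.
    const auto deadline = std::chrono::steady_clock::now() + budget;
    unsigned long polls = 0;
    do {
        ++polls;  // placeholder for the real peripheral access
    } while (std::chrono::steady_clock::now() < deadline);
    return polls;
}
```

Because the deadline lives in the function's own frame, concurrent or nested calls cannot interfere with each other's timestamps.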
There are much more important considerations when deciding if a variable should be global or local.
When implementing a driver, try to avoid global variables as much as possible, because:
They are thread-unsafe, and you have no idea about the scheduling scheme of the user application (in fact, even without threads, using multiple instances of the same driver is a potential problem).
They automatically cause a data section to be created in the executable image of any application that links to your driver (which is something that the application programmer might want to avoid).
Did you profile a fully-optimized, release build of your code and identify the bottleneck to be small allocations in this function?
The change you are proposing is a micro-optimization: a change to a small part of your code with the intent to make it more efficient. If the answer to the above question is "no", as I'd expect, you shouldn't even be thinking of such things.
Select the correct algorithm for your code. Write your code using idiomatic techniques. Do not write in micro-optimizations. You might be surprised how good your compiler is at optimizing your code for you. It will often be able to optimize away these small allocations, but even if it can't you still don't know if the performance penalty imposed by them is even noticeable or significant.
For drivers, which are usually position-independent, global variables are accessed indirectly through the GOT unless IP-relative addressing is available (e.g. x86_64, ARM, etc.).
In the GOT case, you can think of it as an extra pointer indirection.
However, even with an extra pointer it won't make any observable difference if it's "only" called at millisecond frequency.
I often read in the literature that one of the use cases for C++ pointers is dealing with big objects, but how large should an object be before it needs a pointer when being manipulated? Is there any guiding principle in this regard?
I don't think size is the main factor to consider.
Pointers (or references) are a way to designate a single bunch of data (be it an object, a function or a collection of untyped bytes) from different locations.
If you do copies instead of using pointers, you run the risk of having two separate versions of the same data becoming inconsistent with each other. If the two copies are meant to represent a single piece of information, then you will have to do twice the work to make sure they stay consistent.
So in some cases using a pointer to reference even a single byte could be the right thing to do, even though storing copies of the said byte would be more efficient in terms of memory usage.
EDIT: to answer jogojapan's remarks, here is my opinion on memory efficiency
I often ran programs through profilers and discovered that an amazing percentage of the CPU power went into various forms of memory-to-memory copies.
I also noticed that the cost of optimizing memory efficiency was often offset by code complexity, for surprisingly little gains.
On the other hand, I spent many hours tracing bugs down to data inconsistencies, some of them requiring sizeable code refactoring to get rid of.
As I see it, memory efficiency should become more of a concern near the end of a project, when profiling reveals where the CPU/memory drain really occurs, while code robustness (especially data flows and data consistency) should be the main factor to consider in the early stages of conception and coding.
Only the bulkiest data types should be dimensioned at the start, if the application is expected to handle considerable amounts of data. In a modern PC, we are talking about hundreds of megabytes, which most applications will never need.
When I designed embedded software 10 or 20 years ago, memory usage was a constant concern. But in environments like a desktop PC, where memory requirements are most of the time negligible compared to the amount of available RAM, focusing on a reliable design seems more of a priority to me.
You should use a pointer when you want to refer to the same object in different places. In fact you can use references for the same purpose, but pointers give you the added advantage of being able to refer to different objects over time, while a reference keeps referring to the same object.
On second thought, maybe you are referring to objects created on the free store using new and then referred to through pointers. There is no definitive rule for that, but in general you can do so when:
Object being created is too large to be accommodated on stack or
You want to increase the lifetime of the object beyond the scope etc.
There is no such limitation or guideline. You will have to decide it.
Assume class definition below. Size is 100 ints = 400 bytes.
class test
{
private:
int m_nVar[100];
};
When you use the following function definition (pass by value), the copy constructor will be called (even if you don't provide one). So 100 ints will be copied, which takes some time:
void passing_to_function(test a);
When you change the function definition to take a reference or a pointer, no such copying happens. Only a test* (pointer-sized) is transferred:
void passing_to_function(test& a);
So passing by reference or by pointer has a clear advantage over passing by value here!
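To see the copy happen, here is a sketch with an instrumented stand-in for the class above. BigObject, its copies counter and the two function names are hypothetical; the counter exists only so the copy constructor calls can be observed:

```cpp
#include <algorithm>

// Stand-in for the 100-int class, with a counter added for illustration.
struct BigObject {
    static int copies;  // number of copy-constructor calls so far
    int m_nVar[100];

    BigObject() : m_nVar{} {}
    BigObject(const BigObject& other) {
        std::copy(other.m_nVar, other.m_nVar + 100, m_nVar);
        ++copies;
    }
};
int BigObject::copies = 0;

// By value: the argument is copy-constructed into the parameter.
int first_by_value(BigObject a) { return a.m_nVar[0]; }

// By const reference: only the address is passed, no copy is made.
int first_by_ref(const BigObject& a) { return a.m_nVar[0]; }
```

Calling first_by_value with an existing object bumps the counter once per call, while first_by_ref leaves it unchanged.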
Is it more efficient for a class to access member variables or local variables? For example, suppose you have a (callback) method whose sole responsibility is to receive data, perform calculations on it, then pass it off to other classes. Performance-wise, would it make more sense to have a list of member variables that the method populates as it receives data? Or just declare local variables each time the callback method is called?
Assume this method would be called hundreds of times a second...
In case I'm not being clear, here's some quick examples:
// use local variables
class thisClass {
public:
void callback( msg& msg )
{
int varA;
double varB;
std::string varC;
varA = msg.getInt();
varB = msg.getDouble();
varC = msg.getString();
// do a bunch of calculations
}
};
// use member variables
class thisClass {
public:
void callback( msg& msg )
{
m_varA = msg.getInt();
m_varB = msg.getDouble();
m_varC = msg.getString();
// do a bunch of calculations
}
private:
int m_varA;
double m_varB;
std::string m_varC;
};
Executive summary: In virtually all scenarios, it doesn't matter, but there is a slight advantage for local variables.
Warning: You are micro-optimizing. You will end up spending hours trying to understand code that is supposed to win a nanosecond.
Warning: In your scenario, performance shouldn't be the question, but the role of the variables - are they temporary, or state of thisClass?
Warning: First, second and last rule of optimization: measure!
First of all, look at the typical assembly generated for x86 (your platform may vary):
// stack variable: load into eax
mov eax, [esp+10]
// member variable: load into eax
mov ecx, [address of object]
mov eax, [ecx+4]
Once the address of the object is loaded into a register, the instructions are identical. Loading the object address can usually be paired with an earlier instruction and doesn't hit execution time.
But this means the ecx register isn't available for other optimizations. However, modern CPUs do some intense trickery to make that less of an issue.
Also, when accessing many objects this may cost you extra. However, this is less than one cycle on average, and there are often more opportunities for pairing instructions.
Memory locality: here's a chance for the stack to win big time. Top of stack is virtually always in the L1 cache, so the load takes one cycle. The object is more likely to be pushed back to L2 cache (rule of thumb, 10 cycles) or main memory (100 cycles).
However, you pay this only for the first access. If all you have is a single access, the 10 or 100 cycles are unnoticeable. If you have thousands of accesses, the object data will be in L1 cache, too.
In summary, the gain is so small that it virtually never makes sense to copy member variables into locals to achieve better performance.
I'd prefer the local variables on general principles, because they minimize evil mutable state in your program. As for performance, your profiler will tell you all you need to know. Locals should be faster for ints and perhaps other builtins, because they can be put in registers.
This should be your compiler's problem. Instead, optimize for maintainability: if the information is only ever used locally, store it in local (automatic) variables. I hate reading classes littered with member variables that don't actually tell me anything about the class itself, but only some details about how a bunch of methods work together :(
In fact, I would be surprised if local variables aren't faster anyway - they are bound to be in cache, since they are close to the rest of the function's data (call frame), while an object's pointer might point somewhere else entirely - but I am just guessing here.
Silly question.
It all depends on the compiler and what it does for optimization.
Even if it did work, what have you gained? A way to obfuscate your code?
Variable access is usually done via a pointer and an offset.
Pointer to Object + offset
Pointer to Stack Frame + offset
Also don't forget to add in the cost of moving the variables to local storage and then copying the results back. All of which could be meaningless, as the compiler may be smart enough to optimize most of it away anyway.
A few points that have not been mentioned explicitly by others:
You are potentially invoking assignment operators in your code.
e.g. varC = msg.getString();
You have some wasted cycles every time the function frame is set up. You are creating the variables, which calls the default constructor, and then invoking the assignment operator to get the RHS value into the locals.
Declare the locals to be const-refs and, of course, initialize them.
Member variables might be on the heap (if your object was allocated there) and hence suffer from non-locality.
Even a few cycles saved is good - why waste computation time at all, if you could avoid it.
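A sketch of the "initialize instead of default-construct and assign" point. The Msg class here is a hypothetical stand-in for the msg type in the question, and it assumes getString() returns a const reference; if it returned by value, binding the result to a const reference would still be valid, it just would not avoid creating the string:

```cpp
#include <cstddef>
#include <string>
#include <utility>

// Hypothetical message type; the getters mirror the ones used in the question.
class Msg {
public:
    Msg(int i, double d, std::string s) : i_(i), d_(d), s_(std::move(s)) {}
    int getInt() const { return i_; }
    double getDouble() const { return d_; }
    const std::string& getString() const { return s_; }  // assumption: returns a reference

private:
    int i_;
    double d_;
    std::string s_;
};

// Locals are initialized at their declaration, so there is no default
// construction followed by an assignment, and the string is not copied.
std::size_t callback(const Msg& msg) {
    const int varA = msg.getInt();
    const double varB = msg.getDouble();
    const std::string& varC = msg.getString();
    // Placeholder for "a bunch of calculations".
    return varC.size() + static_cast<std::size_t>(varA) + static_cast<std::size_t>(varB);
}
```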
When in doubt, benchmark and see for yourself. And make sure it makes a difference first - hundreds of times a second isn't a huge burden on a modern processor.
That said, I don't think there will be any difference. Both will be constant offsets from a pointer, the locals will be from the stack pointer and the members will be from the "this" pointer.
In my opinion, it should not impact performance, because:
In your first example, the variables are accessed via a lookup on the stack, e.g. [ESP]+4, which means the current end of the stack plus four bytes.
In the second example, the variables are accessed via a lookup relative to this (remember, varB equals to this->varB). This is a similar machine instruction.
Therefore, there is not much of a difference.
However, You should avoid copying the string ;)
The amount of data that you will be interacting with will have a bigger influence on the execution speed than the way you represent the data in the implementation of the algorithm.
The processor does not really care if the data is on the stack or on the heap (apart from the chance that the top of the stack will be in the processor cache as peterchen mentioned) but for maximum speed, the data will have to fit into the processor's cache (L1 cache if you have more than one level of cache, which pretty much all modern processors have). Any load from L2 cache - or $DEITY forbid, main memory - will slow down the execution. So if you're processing a string that's a few hundred KB in size and changes on every invocation, the difference will not even be measurable.
Keep in mind that in most cases, a 10% speedup in a program is pretty much undetectable to the end user (unless you manage to reduce the runtime of your overnight batch from 25h back to less than 24h) so this is not worth fretting over unless you are sure and have the profiler output to back up that this particular piece of code is within the 10%-20% 'hot area' that has a major influence over your program's runtime.
Other considerations should be more important, like maintainability or other external factors. For example if the above code is in heavily multithreaded code, using local variables can make the implementation easier.
It depends, but I expect there would be absolutely no difference.
What is important is this: Using member variables as temporaries will make your code non-reentrant - For example, it will fail if two threads try to call callback() on the same object. Using static locals (or static member variables) is even worse, because then your code will fail if two threads try to call callback() on any thisClass object - or descendant.
Using the member variables should be marginally faster since they only have to be allocated once (when the object is constructed) instead of every time the callback is invoked. But in comparison to the rest of the work you're probably doing I expect this would be a very tiny percentage. Benchmark both and see which is faster.
Also, there's a third option: static locals. These don't get re-allocated every time the function is called (in fact, they get preserved across calls) but they don't pollute the class with excessive member variables.
Let's say I know a guy who is new to C++. He does not pass around pointers (rightly so) but he refuses to pass by reference. He always uses pass by value, the reason being that he feels that "passing objects by reference is a sign of a broken design".
The program is a small graphics program and most of the passing in question is mathematical Vector(3-tuple) objects. There are some big controller objects but nothing more complicated than that.
I'm finding it hard to find a killer argument against only using the stack.
I would argue that pass by value is fine for small objects such as vectors but even then there is a lot of unnecessary copying occurring in the code. Passing large objects by value is obviously wasteful and most likely not what you want functionally.
On the pro side, I believe the stack is faster at allocating/deallocating memory and has a constant allocation time.
The only major argument I can think of is that the stack could possibly overflow, but I'm guessing that it is improbable that this will occur? Are there any other arguments against using only the stack/pass by value as opposed to pass by reference?
Subtyping-polymorphism is a case where passing by value wouldn't work because you would slice the derived class to its base class. Maybe to some, using subtyping-polymorphism is bad design?
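A short sketch of the slicing problem, using made-up Shape and Circle classes:

```cpp
#include <string>

struct Shape {
    virtual ~Shape() = default;
    virtual std::string name() const { return "Shape"; }
};

struct Circle : Shape {
    std::string name() const override { return "Circle"; }
};

// By value: the parameter is a plain Shape, so a Circle argument is sliced
// down to its base part and the override is lost.
std::string name_by_value(Shape s) { return s.name(); }

// By reference: the parameter refers to the original object, so the
// virtual call reaches Circle::name().
std::string name_by_ref(const Shape& s) { return s.name(); }
```

Passing the same Circle to both functions gives "Shape" from the first and "Circle" from the second.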
Your friend's problem is not his idea as much as his religion. Given any function, always consider the pros and cons of passing by value, reference, const reference, pointer or smart pointer. Then decide.
The only sign of broken design I see here is your friend's blind religion.
That said, there are a few signatures that don't bring much to the table. Taking a const by value might be silly, because if you promise not to change the object then you might as well not make your own copy of it. Unless it's a primitive, of course, in which case the compiler can be smart enough to take a reference still. Or, sometimes it's clumsy to take a pointer to a pointer as an argument. This adds complexity; instead, you might be able to get away with taking a reference to a pointer, and get the same effect.
But don't take these guidelines as set in stone; always consider your options because there is no formal proof that eliminates any alternative's usefulness.
If you need to change the argument for your own needs, but don't want to affect the client, then take the argument by value.
If you want to provide a service to the client, and the client is not closely related to the service, then consider taking an argument by reference.
If the client is closely related to the service then consider taking no arguments but write a member function.
If you wish to write a service function for a family of clients that are closely related to the service but very distinct from each other then consider taking a reference argument, and perhaps make the function a friend of the clients that need this friendship.
If you don't need to change the client at all then consider taking a const-reference.
There are all sorts of things that cannot be done without using references - starting with a copy constructor. References (or pointers) are fundamental and whether he likes it or not, he is using references. (One advantage, or maybe disadvantage, of references is that you do not have to alter the code, in general, to pass a (const) reference.) And there is no reason not to use references most of the time.
And yes, passing by value is OK for smallish objects without requirements for dynamic allocation, but it is still silly to hobble oneself by saying "no references" without concrete measurements that the so-called overhead is (a) perceptible and (b) significant. "Premature optimization is the root of all evil" [1].
[1] Various attributions, including C. A. R. Hoare (although apparently he disclaims it).
I think there is a huge misunderstanding in the question itself.
There is no relationship between stack- or heap-allocated objects on the one hand and pass by value, reference or pointer on the other.
Stack vs Heap allocation
Always prefer stack when possible, the object's lifetime is then managed for you which is much easier to deal with.
It might not be possible in a couple of situations though:
Virtual construction (think of a Factory)
Shared Ownership (though you should always try to avoid it)
And I might miss some, but in this case you should use SBRM (Scope Bound Resources Management) to leverage the stack lifetime management abilities, for example by using smart pointers.
Pass by: value, reference, pointer
First of all, there is a difference of semantics:
value, const reference: the passed object will not be modified by the method
reference: the passed object might be modified by the method
pointer/const pointer: same as reference (for the behavior), but might be null
Note that some languages (the functional kind like Haskell) do not offer reference/pointer by default. The values are immutable once created. Apart from some work-arounds for dealing with the exterior environment, they are not that restricted by this use and it somehow makes debugging easier.
Your friend should learn that there is absolutely nothing wrong with pass-by-reference or pass-by-pointer: for example, think of swap, which cannot be implemented with pass-by-value.
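The swap case can be sketched in a few lines (both function names are made up for this illustration):

```cpp
// By value: only the function's private copies are exchanged,
// so the caller's variables are left untouched.
void swap_by_value(int a, int b) {
    const int tmp = a;
    a = b;
    b = tmp;
}

// By reference: a and b are aliases for the caller's variables,
// so the exchange is visible after the call.
void swap_by_ref(int& a, int& b) {
    const int tmp = a;
    a = b;
    b = tmp;
}
```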
Finally, polymorphism does not work with pass-by-value semantics, because passing a derived object by value slices it.
Now, let's talk about performance.
It's usually well accepted that built-ins should be passed by value (to avoid an indirection) and big user-defined classes should be passed by reference/pointer (to avoid copying). Here, "big" generally means that the copy constructor is not trivial.
There is however an open question regarding small user-defined classes. Some articles published recently suggest that in some cases pass-by-value might allow better optimization by the compiler, for example, in this case:
Object foo(Object d) { d.bar(); return d; }
int main(int argc, char* argv[])
{
Object o;
o = foo(o);
return 0;
}
Here a smart compiler is able to determine that o can be modified in place without any copying! (It is necessary that the function definition be visible I think, I don't know if Link-Time Optimization would figure it out)
Therefore, there is only one way to settle the performance question, as always: measure.
Reason being that he feels that "passing objects by reference is a sign of a broken design".
Although this is wrong in C++ for purely technical reasons, always using pass-by-value is a good enough approximation for beginners – it’s certainly much better than passing everything by pointers (or perhaps even than passing everything by reference). It will make some code inefficient but, hey! As long as this doesn’t bother your friend, don’t be unduly disturbed by this practice. Just remind him that someday he might want to reconsider.
On the other hand, this:
There are some big controller objects but nothing more complicated than that.
is a problem. Your friend is talking about broken design, and then all the code uses are a few 3D vectors and large control structures? That is a broken design. Good code achieves modularity through the use of data structures. It doesn’t seem as though this were the case.
… And once you use such data structures, code without pass-by-reference may indeed become quite inefficient.
First thing is, stack rarely overflows outside this website, except in the recursion case.
About his reasoning, I think he might be wrong because he generalizes too much, but what he has actually done might be correct... or not?
For example, the Windows Forms library uses a Rectangle struct that has 4 members, Apple's QuartzCore also has a CGRect struct, and those structs are always passed by value. I think we can compare that to a Vector with 3 floating-point members.
However, as I have not seen the code, I feel I should not judge what he has done, though I have a feeling he might have done the right thing despite his over-generalized idea.
I would argue that pass by value is fine for small objects such as vectors but even then there is a lot of unnecessary copying occurring in the code. Passing large objects by value is obviously wasteful and most likely not what you want functionally.
It's not quite as obvious as you might think. C++ compilers perform copy elision very aggressively, so you can often pass by value without incurring the cost of a copy operation. And in some cases, passing by value might even be faster.
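One case that is easy to check is passing a temporary. In the sketch below, Vec3 is a made-up instrumented type whose counter records copy-constructor calls. Passing the result of make_unit_x() straight into a by-value parameter does not invoke the copy constructor: since C++17 the copy is elided by rule, and under earlier standards the compiler either elides it or uses the move constructor. Passing a named object still copies once:

```cpp
// Hypothetical 3-component vector with a counter for illustration only.
struct Vec3 {
    static int copies;  // number of copy-constructor calls so far
    float x, y, z;

    Vec3(float x_, float y_, float z_) : x(x_), y(y_), z(z_) {}
    Vec3(const Vec3& o) : x(o.x), y(o.y), z(o.z) { ++copies; }
    Vec3(Vec3&& o) noexcept : x(o.x), y(o.y), z(o.z) {}
};
int Vec3::copies = 0;

// Returns a temporary.
Vec3 make_unit_x() { return Vec3(1.0f, 0.0f, 0.0f); }

// Takes its argument by value.
float length_squared(Vec3 v) { return v.x * v.x + v.y * v.y + v.z * v.z; }
```

Whether the remaining copy for named objects matters is exactly the kind of thing a benchmark has to decide.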
Before condemning the issue for performance reasons, you should at the very least produce the benchmarks to back it up. And they might be hard to create because the compiler typically eliminates the performance difference.
So the real issue should be one of semantics. How do you want your code to behave? Sometimes, reference semantics are what you want, and then you should pass by reference. If you specifically want/need value semantics then you pass by value.
There is one point in favor of passing by value. It's helpful in achieving a more functional style of code, with fewer side effects and where immutability is the default. That makes a lot of code easier to reason about, and it may make it easier to parallelize the code as well.
But in truth, both have their place. And never using pass-by-reference is definitely a big warning sign.
For the last 6 months or so, I've been experimenting with making pass-by-value the default. If I don't explicitly need reference semantics, then I try to assume that the compiler will perform copy elision for me, so I can pass by value without losing any efficiency.
So far, the compiler hasn't really let me down. I'm sure I'll run into cases where I have to go back and change some calls to passing by reference, but I'll do that when I know that
performance is a problem, and
the compiler failed to apply copy elision
I would say that not using pointers in C is a sign of a newbie programmer.
It sounds like your friend is scared of pointers.
Remember, C++ pointers were actually inherited from the C language, and C was developed when computers were much less powerful. Nevertheless, speed and efficiency continue to be vital until this day.
So, why use pointers? They allow the developer to optimize a program to run faster or use less memory than it otherwise would! Referring to the memory location of a piece of data is much more efficient than copying all the data around.
Pointers are usually a concept that is difficult to grasp for those beginning to program, because all the experiments done involve small arrays, maybe a few structs, but basically they consist of working with a couple of megabytes (if you're lucky) when you have 1GB of memory lying around the house. In this scenario, a couple of MB are nothing, and it is usually too little to have a significant impact on the performance of your program.
So let's exaggerate that a little bit. Think of a char array with 2147483648 elements - 2GB of data - that you need to pass to a function that will write all the data to the disk. Now, what technique do you think is going to be more efficient/faster?
Pass by value, which is going to have to re-copy those 2GB of data to another location in memory before the program can write the data to the disk, or
Pass by reference, which will just refer to that memory location.
What happens when you just don't have 4GB of RAM? Will you spend $ and buy chips of RAM just because you are afraid of using pointers?
Re-copying the data in memory is redundant when you don't have to, and it's a waste of computer resources.
Anyway, be patient with your friend. If he would like to become a serious/professional programmer at some point in his life he will eventually have to take the time to really understand pointers.
Good Luck.
As already mentioned the big difference between a reference and a pointer is that a pointer can be null. If a class requires data a reference declaration will make it required. Adding const will make it 'read only' if that is what is desired by the caller.
The pass-by-value 'flaw' mentioned is simply not true. Passing everything by value will completely change the performance of an application. It is not so bad when primitive types (i.e. int, double, etc.) are passed by value, but when a class instance is passed by value, temporary objects are created, which requires constructors and later on destructors to be called on the class and on all of the member variables in the class. This is exacerbated when large class hierarchies are used, because parent class constructors/destructors must be called as well.
Also, just because the vector is passed by value does not mean that it only uses stack memory. Heap memory may be used for each element as it is created in the temporary vector that is passed to the method/function. The vector itself may also have to reallocate on the heap if it reaches its capacity.
If pass by value is being used so that the caller's values are not modified, then just use a const reference.
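A small sketch of the point about temporaries, assuming a made-up Tracked type that counts how often its copy constructor runs (the type and function names are not from the question):

```cpp
#include <cassert>

// Hypothetical type that counts copy-constructor calls.
struct Tracked {
    static int copies;
    int payload = 0;
    Tracked() = default;
    Tracked(const Tracked& other) : payload(other.payload) { ++copies; }
};
int Tracked::copies = 0;

// Pass by value: a temporary Tracked is copy-constructed for every call.
int read_by_value(Tracked t) { return t.payload; }

// Pass by const reference: no copy, and the callee still cannot modify the argument.
int read_by_const_ref(const Tracked& t) { return t.payload; }

void demo_copy_count() {
    Tracked t;
    Tracked::copies = 0;
    read_by_value(t);
    assert(Tracked::copies == 1);  // one copy for the by-value call
    read_by_const_ref(t);
    assert(Tracked::copies == 1);  // still one: the reference call copied nothing
}
```

For a class with many members or base classes, each of those copies also runs the member and parent constructors and destructors.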
The answers that I've seen so far have all focused on performance: cases where pass-by-reference is faster than pass-by-value. You may have more success in your argument if you focus on cases that are impossible with pass-by-value.
Small tuples or vectors are a very simple type of data-structure. More complex data-structures share information, and that sharing can't be represented directly as values. You either need to use references/pointers or something that simulates them such as arrays and indices.
Lots of problems boil down to data that forms a Graph, or a Directed-Graph. In both cases you have a mixture of edges and nodes that need to be stored within the data-structure. Now you have the problem that the same data needs to be in multiple places. If you avoid references then firstly the data needs to be duplicated, and then every change needs to be carefully replicated in each of the other copies.
Your friend's argument boils down to saying: tackling any problem complex enough to be represented by a Graph is a bad-design....
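As a rough illustration of the sharing problem (Node, Edge and the helper are hypothetical names, not from the original program): two edges refer to the same node by pointer, so a single update is visible through both of them without any replication.

```cpp
#include <cassert>
#include <vector>

struct Node { int value; };

// An edge refers to its endpoints by pointer instead of holding copies of them.
struct Edge { Node* from; Node* to; };

// Hypothetical helper: sums the values of both endpoints of every edge.
int sum_endpoint_values(const std::vector<Edge>& edges) {
    int total = 0;
    for (const Edge& e : edges) total += e.from->value + e.to->value;
    return total;
}

void demo_shared_node() {
    Node a{1}, b{2}, c{3};
    std::vector<Edge> edges{{&a, &b}, {&b, &c}};  // b is shared by both edges
    assert(sum_endpoint_values(edges) == 8);      // (1 + 2) + (2 + 3)
    b.value = 10;                                 // one update...
    assert(sum_endpoint_values(edges) == 24);     // ...seen by both edges: (1 + 10) + (10 + 3)
}
```

If each Edge stored Node copies instead, the change to b would have to be repeated by hand in every edge that mentions it.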
The only major argument I can think of is that the stack could possibly overflow, but I'm guessing that it is improbable that this will occur? Are there any other arguments against using only the stack/pass by value as opposed to pass by reference?
Well, gosh, where to start...
As you mention, "there is a lot of unnecessary copying occurring in the code". Let's say you've got a loop where you call a function on these objects. Using a pointer instead of duplicating the objects can accelerate execution by one or more orders of magnitude.
You can't pass variable-sized data structures, arrays, etc. around on the stack. You have to dynamically allocate them and pass a pointer or reference to the beginning. If your friend hasn't run into this, then yes, he's "new to C++."
As you mention, the program in question is simple and mostly uses quite small objects like graphics 3-tuples, which if the elements are doubles would be 24 bytes apiece. But in graphics, it's common to deal with 4x4 arrays, which handle both rotation and translation. Those would be 128 bytes apiece, so a program that had to deal with those would be roughly five times slower per function call with pass-by-value due to the increased copying. With pass-by-reference, passing a 3-tuple or a 4x4 array in a 32-bit executable would just involve duplicating a single 4-byte pointer.
On register-rich CPU architectures like ARM, PowerPC, 64-bit x86, 680x0 - but not 32-bit x86 - pointers (and references, which are secretly pointers wearing fancy syntactical clothing) are commonly passed or returned in a register, which is really freaking fast compared to the memory access involved in a stack operation.
You mention the improbability of running out of stack space. And yes, that's so on a small program one might write for a class assignment. But a couple of months ago, I was debugging commercial code that was probably 80 function calls below main(). If they'd used pass-by-value instead of pass-by-reference, the stack would have been ginormous. And lest your friend think this was a "broken design", this was actually a WebKit-based browser implemented on Linux using GTK+, all of which is very state-of-the-art, and the function call depth is normal for professional code.
Some executable architectures limit the size of an individual stack frame, so even though you might not run out of stack space per se, you could exceed that and wind up with perfectly valid C++ code that wouldn't build on such a platform.
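The 24-byte versus 128-byte arithmetic from the 3-tuple and 4x4 array point can be checked directly. This is only a sketch: Vec3, Mat4 and the two cost helpers are illustrative names, it assumes the usual 8-byte double with no padding, and it counts raw bytes only (a real compiler may pass small structs in registers).

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical graphics types; the names are illustrative only.
struct Vec3 { double x, y, z; };   // 3-tuple of doubles
struct Mat4 { double m[4][4]; };   // 4x4 transform

// Assumes an 8-byte double and no padding.
static_assert(sizeof(Vec3) == 24, "3 doubles = 24 bytes");
static_assert(sizeof(Mat4) == 128, "16 doubles = 128 bytes");

// Bytes that have to be duplicated to pass one argument.
template <typename T> constexpr std::size_t cost_by_value()     { return sizeof(T); }
template <typename T> constexpr std::size_t cost_by_reference() { return sizeof(const T*); }

void demo_argument_cost() {
    assert(cost_by_value<Mat4>() == 128);
    assert(cost_by_value<Vec3>() == 24);
    // By reference, both cost one pointer: 4 bytes on a 32-bit build, 8 on a 64-bit one.
    assert(cost_by_reference<Mat4>() == cost_by_reference<Vec3>());
}
```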
I could go on and on.
If your friend is interested in graphics, he should take a look at some of the common APIs used in graphics: OpenGL and XWindows on Linux, Quartz on Mac OS X, Direct X on Windows. And he should look at the internals of large C/C++ systems like the WebKit or Gecko HTML rendering engines, or any of the Mozilla browsers, or the GTK+ or Qt GUI toolkits. They all pass anything much larger than a single integer or float by reference, and often fill in results by reference rather than as a function return value.
Nobody with any serious real world C/C++ chops - and I mean nobody - passes data structures by value. There's a reason for this: it's just flipping inefficient and problem-prone.
Wow, there are already 13 answers… I didn't read all in detail but I think this is quite different from the others…
He has a point. The advantage of pass-by-value as a rule is that subroutines cannot subtly modify their arguments. Passing non-const references would indicate that every function has ugly side effects, indicating poor design.
Simply explain to him the difference between vector3 & and vector3 const&, and demonstrate how the latter may be initialized by a constant as in vec_function( vector3(1,2,3) );, but not the former. Pass by const reference is a simple optimization of pass by value.
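A minimal sketch of that demonstration, assuming a simple vector3 with a three-argument constructor (sum_components and scale_in_place are made-up names standing in for vec_function):

```cpp
#include <cassert>

struct vector3 {
    double x, y, z;
    vector3(double x_, double y_, double z_) : x(x_), y(y_), z(z_) {}
};

// Const reference: accepts named objects and temporaries alike, and cannot modify them.
double sum_components(vector3 const& v) { return v.x + v.y + v.z; }

// Non-const reference: announces that the caller's object may be modified.
void scale_in_place(vector3& v, double k) { v.x *= k; v.y *= k; v.z *= k; }

void demo_reference_binding() {
    assert(sum_components(vector3(1, 2, 3)) == 6.0);  // temporary binds to const&

    vector3 v(1, 2, 3);
    scale_in_place(v, 2.0);                           // needs a named, modifiable object
    assert(sum_components(v) == 12.0);

    // scale_in_place(vector3(1, 2, 3), 2.0);  // rejected by standard C++:
    //                                         // a temporary cannot bind to vector3&
}
```

The call sites look the same as with pass by value; only the signature changes.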
Buy your friend a good C++ book. Passing non-trivial objects by reference is good practice and saves you a lot of unnecessary constructor/destructor calls. This also has nothing to do with allocating on the free store vs. using the stack. You can (or should) pass objects allocated on the program stack by reference without any free store usage. You can also ignore the free store completely, but that throws you back to the old Fortran days, which your friend probably didn't have in mind - otherwise he would pick an ancient f77 compiler for your project, wouldn't he...?
Is it more efficient for a class to access member variables or local variables? For example, suppose you have a (callback) method whose sole responsibility is to receive data, perform calculations on it, then pass it off to other classes. Performance-wise, would it make more sense to have a list of member variables that the method populates as it receives data? Or just declare local variables each time the callback method is called?
Assume this method would be called hundreds of times a second...
In case I'm not being clear, here's some quick examples:
// use local variables
class thisClass {
public:
void callback( msg& msg )
{
int varA;
double varB;
std::string varC;
varA = msg.getInt();
varB = msg.getDouble();
varC = msg.getString();
// do a bunch of calculations
}
};
// use member variables
class thisClass {
public:
void callback( msg& msg )
{
m_varA = msg.getInt();
m_varB = msg.getDouble();
m_varC = msg.getString();
// do a bunch of calculations
}
private:
int m_varA;
double m_varB;
std::string m_varC;
};
Executive summary: In virtually all scenarios, it doesn't matter, but there is a slight advantage for local variables.
Warning: You are micro-optimizing. You will end up spending hours trying to understand code that is supposed to win a nanosecond.
Warning: In your scenario, performance shouldn't be the question, but the role of the variables - are they temporary, or state of thisClass?
Warning: First, second and last rule of optimization: measure!
First of all, look at the typical assembly generated for x86 (your platform may vary):
// stack variable: load into eax
mov eax, [esp+10]
// member variable: load into eax
mov ecx, [address of object]
mov eax, [ecx+4]
Once the address of the object is loaded into a register, the instructions are identical. Loading the object address can usually be paired with an earlier instruction and doesn't hit execution time.
But this means the ecx register isn't available for other optimizations. However, modern CPUs do some intense trickery to make that less of an issue.
Also, when accessing many objects this may cost you extra. However, this is less than one cycle on average, and there are often more opportunities for pairing instructions.
Memory locality: here's a chance for the stack to win big time. Top of stack is virtually always in the L1 cache, so the load takes one cycle. The object is more likely to be pushed back to L2 cache (rule of thumb, 10 cycles) or main memory (100 cycles).
However, you pay this only for the first access. If all you have is a single access, the 10 or 100 cycles are unnoticeable. If you have thousands of accesses, the object data will be in L1 cache, too.
In summary, the gain is so small that it virtually never makes sense to copy member variables into locals to achieve better performance.
I'd prefer the local variables on general principles, because they minimize evil mutable state in your program. As for performance, your profiler will tell you all you need to know. Locals should be faster for ints and perhaps other builtins, because they can be put in registers.
This should be your compiler's problem. Instead, optimize for maintainability: If the information is only ever used locally, store it in local (automatic) variables. I hate reading classes littered with member variables that don't actually tell me anything about the class itself, but only some details about how a bunch of methods work together :(
In fact, I would be surprised if local variables aren't faster anyway - they are bound to be in cache, since they are close to the rest of the function's data (call frame), while an object's pointer might point somewhere else entirely - but I am just guessing here.
Silly question.
It all depends on the compiler and what it does for optimization.
Even if it did work, what have you gained? A way to obfuscate your code?
Variable access is usually done via a pointer and an offset.
Pointer to Object + offset
Pointer to Stack Frame + offset
Also don't forget to add in the cost of moving the variables to local storage and then copying the results back. All of which could be meaningless, as the compiler may be smart enough to optimize most of it away anyway.
A few points that have not been mentioned explicitly by others:
You are potentially invoking assignment operators in your code.
e.g varC = msg.getString();
You have some wasted cycles every time the function frame is set up. You are creating variables, which calls their default constructors, and then invoking the assignment operator to get the RHS value into the locals.
Declare the locals to be const-refs and, of course, initialize them.
Member variables might be on the heap (if your object was allocated there) and hence suffer from non-locality.
Even a few cycles saved is good - why waste computation time at all, if you could avoid it.
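Here is a sketch of the "declare, then assign" versus "initialize as a const-ref" point. Message is a made-up stand-in for the question's msg class, and it assumes getString() returns a const reference. If it returned by value instead, binding to a const reference would still be safe, because the temporary's lifetime is extended to that of the reference.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical stand-in for the question's msg class.
struct Message {
    std::string text;
    const std::string& getString() const { return text; }
};

// Default-constructs a string, then copy-assigns the message text into it.
std::size_t length_assign_after_declare(const Message& m) {
    std::string varC;
    varC = m.getString();
    return varC.size();
}

// Binds a const reference at the point of declaration:
// no std::string is constructed or copied here.
std::size_t length_bind_const_ref(const Message& m) {
    const std::string& varC = m.getString();
    return varC.size();
}

void demo_initialization() {
    Message m{"hello"};
    assert(length_assign_after_declare(m) == 5);
    assert(length_bind_const_ref(m) == 5);
}
```

Both versions compute the same result; the second just skips the default construction and the copy.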
When in doubt, benchmark and see for yourself. And make sure it makes a difference first - hundreds of times a second isn't a huge burden on a modern processor.
That said, I don't think there will be any difference. Both will be constant offsets from a pointer, the locals will be from the stack pointer and the members will be from the "this" pointer.
In my opinion, it should not impact performance, because:
In your first example, the variables are accessed via a lookup on the stack, e.g. [ESP]+4, which means the current end of the stack plus four bytes.
In the second example, the variables are accessed via a lookup relative to this (remember, m_varB is equivalent to this->m_varB). This is a similar machine instruction.
Therefore, there is not much of a difference.
However, you should avoid copying the string ;)
The amount of data that you will be interacting with will have a bigger influence on the execution speed than the way you represent the data in the implementation of the algorithm.
The processor does not really care if the data is on the stack or on the heap (apart from the chance that the top of the stack will be in the processor cache as peterchen mentioned) but for maximum speed, the data will have to fit into the processor's cache (L1 cache if you have more than one level of cache, which pretty much all modern processors have). Any load from L2 cache - or $DEITY forbid, main memory - will slow down the execution. So if you're processing a string that's a few hundred KB in size and changes on every invocation, the difference will not even be measurable.
Keep in mind that in most cases, a 10% speedup in a program is pretty much undetectable to the end user (unless you manage to reduce the runtime of your overnight batch from 25h back to less than 24h) so this is not worth fretting over unless you are sure and have the profiler output to back up that this particular piece of code is within the 10%-20% 'hot area' that has a major influence over your program's runtime.
Other considerations should be more important, like maintainability or other external factors. For example if the above code is in heavily multithreaded code, using local variables can make the implementation easier.
It depends, but I expect there would be absolutely no difference.
What is important is this: Using member variables as temporaries will make your code non-reentrant - For example, it will fail if two threads try to call callback() on the same object. Using static locals (or static member variables) is even worse, because then your code will fail if two threads try to call callback() on any thisClass object - or descendant.
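The hazard can be shown without threads, which keeps the example deterministic. In this sketch (Handler and its methods are hypothetical) a nested call on the same object plays the role of the second thread: it overwrites the member "temporary" before the outer call is done with it.

```cpp
#include <cassert>

// Hypothetical class; the names are illustrative only.
class Handler {
public:
    // Keeps its "temporary" in a member variable.
    int echo_with_member(int n) {
        m_scratch = n;
        if (n > 0) echo_with_member(n - 1);  // nested call overwrites m_scratch
        return m_scratch;                    // ends up 0 for any n > 0, not n
    }

    // Keeps its temporary in a local: every call gets its own copy.
    int echo_with_local(int n) {
        int scratch = n;
        if (n > 0) echo_with_local(n - 1);
        return scratch;                      // always n
    }

private:
    int m_scratch = 0;
};

void demo_reentrancy() {
    Handler h;
    assert(h.echo_with_member(3) == 0);  // the value was clobbered
    assert(h.echo_with_local(3) == 3);   // the value survived
}
```

With real threads the same overwrite happens, just at unpredictable moments.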
Using the member variables should be marginally faster since they only have to be allocated once (when the object is constructed) instead of every time the callback is invoked. But in comparison to the rest of the work you're probably doing I expect this would be a very tiny percentage. Benchmark both and see which is faster.
Also, there's a third option: static locals. These don't get re-allocated every time the function is called (in fact, they get preserved across calls) but they don't pollute the class with excessive member variables.
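A minimal sketch of a static local, using a made-up count_invocations function: the variable is initialized once and keeps its value between calls, without adding a member to the class.

```cpp
#include <cassert>

// Hypothetical callback helper: returns how many times it has run so far.
int count_invocations() {
    static int calls = 0;  // initialized once, then preserved across calls
    ++calls;
    return calls;
}

void demo_static_local() {
    int first = count_invocations();
    int second = count_invocations();
    assert(second == first + 1);  // the value survived between the two calls
}
```

Keep in mind that a static local is shared by every caller and every object, so it is not a per-object temporary.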