Is it more efficient for a class to access member variables or local variables? For example, suppose you have a (callback) method whose sole responsibility is to receive data, perform calculations on it, then pass it off to other classes. Performance-wise, would it make more sense to have a list of member variables that the method populates as it receives data? Or just declare local variables each time the callback method is called?
Assume this method would be called hundreds of times a second...
In case I'm not being clear, here's some quick examples:
// use local variables
class thisClass {
public:
    void callback( msg& msg )
    {
        int varA;
        double varB;
        std::string varC;

        varA = msg.getInt();
        varB = msg.getDouble();
        varC = msg.getString();

        // do a bunch of calculations
    }
};
// use member variables
class thisClass {
public:
    void callback( msg& msg )
    {
        m_varA = msg.getInt();
        m_varB = msg.getDouble();
        m_varC = msg.getString();

        // do a bunch of calculations
    }

private:
    int m_varA;
    double m_varB;
    std::string m_varC;
};
Executive summary: In virtually all scenarios, it doesn't matter, but there is a slight advantage for local variables.
Warning: You are micro-optimizing. You will end up spending hours trying to understand code that is supposed to win a nanosecond.
Warning: In your scenario, performance shouldn't be the question, but the role of the variables - are they temporary, or state of thisClass?
Warning: First, second and last rule of optimization: measure!
First of all, look at the typical assembly generated for x86 (your platform may vary):
// stack variable: load into eax
mov eax, [esp+10]
// member variable: load into eax
mov ecx, [address of object]
mov eax, [ecx+4]
Once the address of the object is loaded into a register, the instructions are identical. Loading the object address can usually be paired with an earlier instruction and doesn't add to execution time.
But this means the ecx register isn't available for other optimizations. However, modern CPUs do some intense trickery to make that less of an issue.
Also, when accessing many objects this may cost you extra. However, this is less than one cycle on average, and there are often more opportunities for pairing instructions.
Memory locality: here's a chance for the stack to win big time. Top of stack is virtually always in the L1 cache, so the load takes one cycle. The object is more likely to be pushed back to L2 cache (rule of thumb, 10 cycles) or main memory (100 cycles).
However, you pay this only for the first access. If all you have is a single access, the 10 or 100 cycles are unnoticeable. If you have thousands of accesses, the object data will be in the L1 cache, too.
In summary, the gain is so small that it virtually never makes sense to copy member variables into locals to achieve better performance.
I'd prefer the local variables on general principles, because they minimize evil mutable state in your program. As for performance, your profiler will tell you all you need to know. Locals should be faster for ints and perhaps other builtins, because they can be put in registers.
This should be your compiler's problem. Instead, optimize for maintainability: if the information is only ever used locally, store it in local (automatic) variables. I hate reading classes littered with member variables that don't actually tell me anything about the class itself, but only some details about how a bunch of methods work together :(
In fact, I would be surprised if local variables weren't faster anyway - they are bound to be in cache, since they are close to the rest of the function's data (the call frame), while an object's pointer might point somewhere else entirely - but I am just guessing here.
Silly question.
It all depends on the compiler and what it does for optimization.
Even if it did make a difference, what would you have gained? A way to obfuscate your code?
Variable access is usually done via a pointer and an offset.
Pointer to Object + offset
Pointer to Stack Frame + offset
Also don't forget to add in the cost of moving the variables to local storage and then copying the results back. All of which could be meaningless, as the compiler may be smart enough to optimize most of it away anyway.
A few points that have not been mentioned explicitly by others:
You are potentially invoking assignment operators in your code.
e.g. varC = msg.getString();
You also waste some cycles every time the function frame is set up: the locals are created (default constructors called), and then the assignment operators are invoked to get the RHS values into them.
Declare the locals to be const-refs and, of course, initialize them.
Member variables might be on the heap (if your object was allocated there) and hence suffer from non-locality.
Even a few cycles saved is good - why waste computation time at all if you can avoid it?
When in doubt, benchmark and see for yourself. And make sure it makes a difference first - hundreds of times a second isn't a huge burden on a modern processor.
That said, I don't think there will be any difference. Both will be constant offsets from a pointer, the locals will be from the stack pointer and the members will be from the "this" pointer.
In my opinion, it should not impact performance, because:
In your first example, the variables are accessed via a lookup on the stack, e.g. [ESP+4], which means stack pointer plus an offset of four bytes.
In the second example, the variables are accessed via a lookup relative to this (remember, m_varB is really this->m_varB). This produces similar machine instructions.
Therefore, there is not much of a difference.
However, you should avoid copying the string ;)
The amount of data that you will be interacting with will have a bigger influence on the execution speed than the way you represent the data in the implementation of the algorithm.
The processor does not really care if the data is on the stack or on the heap (apart from the chance that the top of the stack will be in the processor cache, as peterchen mentioned), but for maximum speed, the data will have to fit into the processor's cache (L1 cache if you have more than one level of cache, which pretty much all modern processors have). Any load from L2 cache - or $DEITY forbid, main memory - will slow down the execution. So if you're processing a string that's a few hundred KB in size and changes on every invocation, the difference will not even be measurable.
Keep in mind that in most cases, a 10% speedup in a program is pretty much undetectable to the end user (unless you manage to reduce the runtime of your overnight batch from 25h back to less than 24h) so this is not worth fretting over unless you are sure and have the profiler output to back up that this particular piece of code is within the 10%-20% 'hot area' that has a major influence over your program's runtime.
Other considerations should be more important, like maintainability or other external factors. For example if the above code is in heavily multithreaded code, using local variables can make the implementation easier.
It depends, but I expect there would be absolutely no difference.
What is important is this: Using member variables as temporaries will make your code non-reentrant - For example, it will fail if two threads try to call callback() on the same object. Using static locals (or static member variables) is even worse, because then your code will fail if two threads try to call callback() on any thisClass object - or descendant.
Using the member variables should be marginally faster since they only have to be allocated once (when the object is constructed) instead of every time the callback is invoked. But in comparison to the rest of the work you're probably doing, I expect this would be a very tiny percentage. Benchmark both and see which is faster.
Also, there's a third option: static locals. These don't get re-allocated every time the function is called (in fact, they get preserved across calls) but they don't pollute the class with excessive member variables.
First off, let me get off my chest the fact that I'm a greenhorn trying to do things the right way, which means I get into a contradiction about what is the right way every now and then.
I am modifying a driver for a peripheral which contains a function - let's call it Send(). In the function I have a timestamp variable so the function loops for a specified amount of time.
So, should I declare the variable global (that way it is always in memory and no time is lost for declaring it each time the function runs) or do I leave the variable local to the function context (and avoid a bad design pattern with global variables)?
Please bear in mind that the function can be called multiple times per millisecond.
Speed of execution shouldn't be significantly different for a local vs. a global variable. The only real difference is where the variable lives. Local variables are allocated on the stack, global variables are in a different memory segment. It is true that local variables are allocated every time you enter a routine, but allocating memory is a single instruction to move the stack pointer.
There are much more important considerations when deciding if a variable should be global or local.
When implementing a driver, try to avoid global variables as much as possible, because:
They are thread-unsafe, and you have no idea about the scheduling scheme of the user application (in fact, even without threads, using multiple instances of the same driver is a potential problem).
It automatically causes the creation of a data section as part of the executable image of any application that links to your driver (which is something the application programmer might want to avoid).
Did you profile a fully-optimized, release build of your code and identify the bottleneck to be small allocations in this function?
The change you are proposing is a micro-optimization: a change to a small part of your code with the intent to make it more efficient. If the answer to the above question is "no", as I'd expect, you shouldn't even be thinking of such things.
Select the correct algorithm for your code. Write your code using idiomatic techniques. Do not write in micro-optimizations. You might be surprised how good your compiler is at optimizing your code for you. It will often be able to optimize away these small allocations, but even if it can't you still don't know if the performance penalty imposed by them is even noticeable or significant.
For drivers, which are usually position-independent, global variables are accessed indirectly through the GOT unless IP-relative addressing is available (e.g. on x86_64, ARM, etc.).
In case of GOT, you can think it as an extra indirect pointer.
However, even with an extra pointer it won't make any observable difference if it's "only" called at millisecond frequency.
I am using Sublime and Arduino to program a Barometer (MS5611). But what is the best practice to store variables that is only used as temporary storage inside a specific function:
1) Create private variables in my header file for all variables used?
2) Create the variables inside the functions where they are used?
Which takes more processing power and memory - (1) creating them once as private variables and changing their contents in the functions, or (2) creating the variables each time I call a function?
Always declare them inside the function. This improves readability as it shows the intent behind the declaration. Also it lowers the chance for mistakes.
Wherever possible as "const", e.g.
uint16_t sample_it() {
    const uint16_t sample = analogRead(...);
    const uint16_t result = do_something(sample);
    return result;
}
For almost the same reasons, but this also gives the compiler more optimization options.
If and how variables are allocated is up to the compiler and its optimizer. Unless you have very tight performance constraints, chances are that the compiler will optimize much better than you would. Actually, using global variables instead will sometimes slow down your code. Of course you might avoid the allocation, but you will pay with additional store instructions. On the other hand, the "allocation" might get optimized away, and then your global-variables code becomes slower than the local-variables code.
It depends on your sample rate - that is, how many times the function is called to save the data.
In any case, it is also important to take into account how you free the memory once you've collected and processed the data. If you do not have a lot of variables, but several functions have to use them, it may be best to make them global.
At least, I do so in my projects, and I have never had a problem.
You should avoid using global variables: they take up RAM for the duration of the program (forever, in embedded systems). Globals also make for less maintainable and more fragile programs.
If you only need the data inside a function, declare it there. There is almost no penalty (initialization only), and the used space is automatically returned when the function returns, since local variables are placed on the stack, as are passed parameters.
For clarification: I know how evil globals are and when not to use them :)
Is there any performance penalty when accessing/setting a global variable vs. a local one in a compiled C++ program?
That would depend entirely on your machine architecture. Global variables are accessed via a single known address, whereas local variables are typically accessed by indexing off an address register. The chances of the difference between the two being significant are extremely remote, but if you think it will be important, you should write a test for your target architecture and measure the difference.
It depends, but usually yes, although it is a micro-issue. Global variables must be referenceable from many contexts, which means that keeping them in a register is not possible; for local variables, that is possible and preferable. In fact, the narrower the scope, the more opportunity the compiler has to optimize access to that variable.
Local variables are probably "faster" in many cases, but I don't think the performance gain would be noticeable or outweigh the additional maintenance cost of having many global variables. Everything I list below either has a negligible cost or can easily be dwarfed by just about any other inefficiency in your program. I would consider these to be a perfect example of a micro-optimization.
Local variables are on the stack, which is more likely to be in the cache. This point is moot if your global variable is frequently used, since it will therefore also be in the cache.
Local variables are scoped to the function - therefore, the compiler can presume that they won't be changed by any other function calls. With a global, the compiler may be forced to reload the global value.
On some 64-bit machines, getting the address of a global variable is a two-step process - you must also add the 32-bit offset of the global to a 64-bit base address. Local variables can always be directly accessed off of the stack pointer.
Strictly speaking, no.
Some things to consider:
Global variables increase the static size of your program in memory.
If access to the variable needs to be synchronized, that would incur some performance overhead.
There are a number of compiler optimisations that are possible with local variables but not with global variables, so in some cases you might see a difference in performance. I doubt that your global variable is being accessed in a performance-critical loop though (very bad design if it is !) so it's probably not going to be an issue.
It's more of the way how you use data stored in your variables that matters performance wise then how you declare them. I'm not sure about the correct terminology here, but one can define two types of data access. Shared access (where you access same data from different parts of the code) and private data, where each part has its own data. By default global variables imply shared access and local imply private access. But both types of access can be achieved with both types of variables (i.e. local pointers pointing to the same chunk of memory, or global array where each part of code access different part of array).
Shared access has better caching, lower memory footprint, but is harder to optimize, especially in multi threaded environment. It is also bad for scaling especially with NUMA architecture..
Private access is easier to optimise and better for scaling. Problems with private access usually exist in situation where you have multiple copies of same data. The problems usually associated with these scenario's are higher memory footprint, synchronization between copies, worse caching etc.
The answer is tied to the overall structure of the program.
For example, I just disassembled this, and in both cases the looping variable was moved into a register, after which there was no difference:
#include <cstdio>

int n = 9;

int main()
{
    for (n = 0; n < 10; ++n)
        printf("%d", n);

    for (int r = 0; r < 10; ++r)
        printf("%d", r);

    return 0;
}
Just to be sure, I did similar things with classes and again saw no difference. But if the global is in a different compilation unit that might change.
There is no performance penalty, either way, that you should be concerned about. In addition to what everyone else has said, you should also bear in mind the paging overhead. Member variables are fetched from the object structure, which has likely already been paged into the cache. Global variables, on the other hand, may cause a different pattern of virtual memory paging.
Again, the performance is really not worth any consideration on your part.
In addition to other answers, I would just point out that accessing a global variable in a multithreaded environment is likely to be more expensive because you need to ensure it is locked properly and the threads may wait in line to access it. With local variables it is not an issue.
I'm writing something performance-critical and wanted to know if it could make a difference if I use:
int test( int a, int b, int c )
{
// Do millions of calculations with a, b, c
}
or
class myStorage
{
public:
int a, b, c;
};
int test( myStorage values )
{
// Do millions of calculations with values.a, values.b, values.c
}
Does this basically result in similar code? Is there an extra overhead of accessing the class members?
I'm sure that this is clear to an expert in C++ so I won't try and write an unrealistic benchmark for it right now
The compiler will probably equalize them. If it has any brains at all, it will copy values.a, values.b, and values.c into local variables or registers, which is also what happens in the simple case.
The relevant maxims:
Premature optimization is the root of much evil.
Write it so you can read it at 1am six months from now and still understand what you were trying to do.
Most of the time significant optimization comes from restructuring your algorithm, not small changes in how variables are accessed. Yes, I know there are exceptions, but this probably isn't one of them.
This sounds like premature optimization.
That being said, there are some differences and opportunities but they will affect multiple calls to the function rather than performance in the function.
First of all, in the second option you may want to pass myStorage as a constant reference.
As a result, your compiled code will likely push a single value onto the stack (to let you access the container), rather than pushing three separate values. If you have additional fields (beyond a-c), passing myStorage not by reference might actually cost you more, because you will be invoking a copy constructor and essentially copying all the additional fields. All of this is a cost per call, not within the function.
If you are doing tons of calculations with a b and c within the function, then it really doesn't matter how you transfer or access them. If you passed by reference, the initial cost might be slightly more (since your object, if passed by reference, could be on the heap rather than the stack), but once accessed for the first time, caching and registers on your machine will probably mean low-cost access. If you have passed your object by value, then it really doesn't matter, since even initially, the values will be nearby on the stack.
For the code you provided, if these are the only fields, there will likely not be a difference. The "values.variable" is merely interpreted as an offset from the stack pointer, not as "look up one object, then access another address".
Of course, if you don't buy these arguments, just define local variables as the first step in your function, copy the values from the object, and then use those variables. If you really use them multiple times, the initial cost of this copy won't matter :)
No; your CPU would cache the variables you use over and over again.
I think there is some overhead, but it may not be much: the address of the object is stored on the stack and points to the object on the heap, and only then do you access the instance variable.
If you store the int on the stack, it would be faster, because the value is already on the stack and the machine just gets it from there to calculate :).
It also depends on whether you copy the class's instance variables onto the stack. If inside test() you do something like:
int a = objA.a;
int b = objA.b;
int c = objA.c;
I think the performance would be almost the same.
If you're really writing performance-critical code and you think one version should be faster than the other, write both versions and test the timing (with the code compiled with the right optimization switch). You may even want to look at the generated assembly code. A lot of things can affect the speed of a code snippet in quite subtle ways, like register spilling, etc.
you can also start your function with
int & a = values.a;
int & b = values.b;
although the compiler should be smart enough to do that for you behind the scenes. In general I prefer to pass around structures or classes; this often makes it clearer what the function is meant to do, plus you don't have to change the signature every time you want to take another parameter into account.
As with your previous, similar question: it depends on the compiler and platform. If there is any difference at all, it will be very small.
Both values on the stack and values in an object are commonly accessed using a pointer (the stack pointer, or the this pointer) and some offset (the location in the function's stack frame, or the location inside the class).
Here are some cases where it might make a difference:
Depending on your platform, the stack pointer might be held in a CPU register, whereas the this pointer might not. If this is the case, accessing this (which is presumably on the stack) would require an extra memory lookup.
Memory locality might be different. If the object in memory is larger than one cache line, the fields are spread out over multiple cache lines. Bringing only the relevant values together in a stack frame might improve cache efficiency.
Do note, however, how often I used the word "might" here. The only way to be sure is to measure it.
If you can't profile the program, print out the assembly language for the code fragments.
In general, less assembly code means fewer instructions to execute, which speeds up performance. This is a technique for getting a rough estimate of performance when a profiler is not available.
An assembly language listing will allow you to see differences, if any, between implementations.