Is there any performance difference between:
uint a;
for (uint i = 0 ; i < n ; ++i)
{
a = i;
}
and:
for (uint i = 0 ; i < n ; ++i)
{
uint a;
a = i;
}
Does the second piece of code result in the program creating a variable (allocating memory or something similar) multiple times instead of just once? If not, is it because the two pieces of code are equivalent, or because of some compiler optimization?
I expect all modern compilers to produce identical code in both cases.
You might be surprised to know that with most compilers there is absolutely no difference whatsoever in overhead between the two cases.
In brief, these kinds of variables are instantiated on the stack. For each function, the compiler computes the maximum amount of stack needed to hold all local variables inside the function and allocates the required stack space at the function's entry point.
Of course, if instead of a simple int you have some class with a non-trivial amount of complexity in its constructor, then it would certainly make a lot of difference, in several ways. But for a simple int like this, absolutely nothing.
Assuming uint is a basic data type, I would say they are the same. Apart from the fact you can use a after the loop in the first example, of course. This is because the cost of placing basic data-types on the stack is trivial and the compiler can optimize this case so the memory is actually reserved once, as if you had declared it outside the loop.
If uint is a class, placing it outside the loop and reusing it could be faster. For example, declaring a std::string before a loop and reusing it inside the loop can be faster than creating a new string each iteration. This is because a string can reuse its existing dynamic memory allocation.
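For example, a minimal sketch of that pattern (the function and names here are purely illustrative, not taken from the question):
#include <string>
#include <vector>

void make_labels(const std::vector<int>& ids)
{
    std::string label;                  // declared once; its heap buffer can be reused
    for (int id : ids)
    {
        label.clear();                  // keeps the existing capacity
        label += "item-";
        label += std::to_string(id);
        // ... use label ...
    }
}
Declaring label inside the loop would instead construct and destroy a fresh string on every iteration, potentially reallocating each time.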
You can look at the disassembly of your compiler to see if there is any difference.
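For instance, with GCC or Clang you can emit the assembly for each variant and compare the output (the file names here are hypothetical; -S and -O2 are the usual flags for these compilers):
g++ -O2 -S variant_outside.cpp -o variant_outside.s
g++ -O2 -S variant_inside.cpp -o variant_inside.s
diff variant_outside.s variant_inside.s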
In principle, the second example could be slightly slower because a is redefined for each iteration, which means space has to be set aside to store it (in this case, probably 4 bytes of memory). In practice, as the other answers note, compilers reserve that stack space once at function entry, so no extra instruction is executed per iteration. Does it matter? Probably not.
Related
Someone has written a function in our C++ application that is already in production, and I don't know why it isn't crashing the application yet. Below is the code.
char *str_Modify()
{
char buffer[70000] = { 0 };
char targetString[70000] = { 0 };
memset(targetString, '\0', sizeof(targetString));
...
...
...
return targetString;
}
As you can see, the function is returning the address of a local variable and the allocated memory will be released once the function returns.
My question
Wanted to know what is the static data memory limit?
What can be the quick fix for this code? Is it a good practice to make the variable targetString static?
(Note that your call to memset has no effect, all the elements are zero-initialised prior to the call.)
It's not crashing the application since one manifestation of the undefined behaviour of your code (returning back a pointer to a now out-of-scope variable with automatic storage duration) is not crashing the application.
Yes, making it static does make the returned pointer valid, but it can create other issues centred around concurrent access.
And pick your language: in C++ there are other techniques (such as returning a std::string by value).
Returning targetString is indeed UB as other answers have said. But there's another supplemental reason why it might crash on some platforms (especially embedded ones): Stack size. The stack segment, where auto variables usually live, is often limited to a few kilobytes; 64K may be common. Two 70K arrays might not be safe to use.
Making targetString static fixes both problems and is an unalloyed improvement IMO; but it might still be problematic if the code is used re-entrantly or from multiple threads. In some circumstances it could also be considered an inefficient use of memory.
An alternative approach might be to allocate the return buffer dynamically, return the pointer, and have the calling code free it when no longer required.
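A hedged sketch of that approach (the size and names are illustrative only; the caller owns and frees the buffer):
#include <stdlib.h>

char *str_Modify_dyn(void)
{
    const size_t size = 70000;
    char *targetString = (char *)calloc(size, 1);  /* zero-initialised heap buffer */
    if (targetString == NULL)
        return NULL;
    /* ... fill targetString ... */
    return targetString;                           /* stays valid after the function returns */
}

/* Caller:
   char *s = str_Modify_dyn();
   if (s) { ... use s ...; free(s); }
*/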
As for why might it not crash: if the stack segment is large enough and no other function uses enough of it to overwrite buffer[] and that gets pushed first; then targetString[] might survive unscathed, hanging just below the used stack, effectively in a world of its own. Very unsafe though!
It is well-defined behaviour in C and C++ to return the address of a static variable, because a static variable remains in memory after the function call is over.
For example:
#include <stdio.h>
int *f()
{
static int a[10]={0};
return a;
}
int main()
{
f();
return 0;
}
It works fine with the GCC compiler.
But, if you remove static keyword then compiler generates warning:
prog.c: In function 'f':
prog.c:6:12: warning: function returns address of local variable [-Wreturn-local-addr]
return a;
^
Also, see the comments on this question written by Ludin:
I believe you are confusing this with int* fun (void) { static int i = 10; return &i; } versus int* fun (void) { int i = 10; return &i; }, which is another story. The former is well-defined, the latter is undefined behavior.
Also, tutorialspoint says:
Second point to remember is that C does not advocate to return the address of a local variable to outside of the function, so you would have to define the local variable as static variable.
Wanted to know what is the static data memory limit?
Platform-specific. You haven't specified a platform (OS, compiler, version), so no-one can possibly tell you. It's probably fine though.
What can be the quick fix for this code?
The quick fix is indeed to make the buffer static.
The good fix is to rewrite the function as
char *modify(char *out, size_t outsz) {
// ...
return out;
}
(returning the input is just to simplify reusing the new function in existing code).
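A possible call site for that rewritten signature (the buffer size is chosen by the caller and purely illustrative):
char buf[256];                               /* caller picks a size appropriate to the context */
printf("%s\n", modify(buf, sizeof buf));     /* returning the input lets it be used inline */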
Is it a good practice to make the variable targetString static?
No. Sometimes it's the best you can do, but it has a number of problems:
The buffer is always the same size, always using ~68 KB of memory and/or address space for no good reason. You can't use a bigger one in some contexts and a smaller one in others. If you really have to memset the whole thing, this incurs a speed penalty in situations where the buffer could be much smaller.
Using static (or global) variables breaks re-entrancy. Simple example: code like
printf("%s,%s\n", str_Modify(1), str_Modify(2));
cannot work sanely, because the second invocation overwrites the first (compare strtok, which can't be used to interleave the tokenizing of two different strings, because it has persistent state).
Since it isn't re-entrant, it also isn't thread-safe, in case you use multiple threads. It's a mess.
for(int i=0;i<10;i++)
{
int x=0;
printf("%d",x);
{
int x=10;
printf("%d",x);
}
printf("%d",x);
}
Here I want to know whether memory for the variable x will be allocated twice, or whether the value is just reset after exiting the second block and memory is allocated only once (for x).
From the point of view of the C programming model, the two definitions of x are two completely different objects. The assignment in the inner block will not affect the value of x in the outer block.
Moreover, the definitions for each iteration of the loop count as different objects too. Assigning a value to either x in one iteration will not affect the x in subsequent iterations.
As far as real implementations are concerned, there are two common scenarios, assuming no optimisation is done. If you have optimisation turned on, much of your code is likely to be simplified away, because it's quite easy for the compiler to figure out that, apart from the printf calls, the loop has no effect on anything outside it.
The two common scenarios are
The variables are stored on the stack. In this scenario, the compiler will reserve a slot on the stack for the outer x and a slot on the stack for the inner x. In theory it ought to allocate the slots at the beginning of the scope and deallocate at the end of the scope, but that just wastes time, so it'll reuse the slots on each iteration.
The variables are stored in registers. This is the more likely option on modern 64-bit architectures. Again, the compiler ought to "allocate" (allocate is not really the right word) a register at the beginning of the scope and "deallocate" it at the end, but it'll just reuse the same registers in real life.
In both cases, you will note that the value from each iteration will be preserved to the next iteration because the compiler uses the same storage space. However, never do this
for (int i = 0 ; i < 10 ; ++i)
{
int x;
if (i > 0)
{
printf("Before %d\n", x); // UNDEFINED BEHAVIOUR
}
x = i;
printf("After %d\n", x);
}
If you compile and run the above (with no optimisation), you'll probably find it prints sensible values but, each time you go round the loop, x is theoretically a completely new object, so the first printf accesses an uninitialised variable. This is undefined behaviour so the program may give you the value from the previous iteration because it is using the same storage or it may firebomb your house and sell your daughter into slavery.
Setting aside compiler optimization that might remove these unused variables, the answer is twice.
The second definition of x masks (technical term) the other definition in its scope following its declaration.
But the first definition is visible again after that scope.
So logically (forgetting about optimization) the first value of x (x=0) has to be held somewhere while x=10 is 'in play'. So two pieces of storage are (logically) required.
Execute the C program below. Typical partial output:
A0 x==0 0x7ffc1c47a868
B0 x==0 0x7ffc1c47a868
C0 x==10 0x7ffc1c47a86c
D0 x==0 0x7ffc1c47a868
A1 x==0 0x7ffc1c47a868
B1 x==0 0x7ffc1c47a868
C1 x==10 0x7ffc1c47a86c
//Etc...
Notice how only point C sees the variable x with value 10 and the variable with value 0 is visible again at point D. Also see how the two versions of x are stored at different addresses.
Theoretically the addresses could be different for each iteration, but I'm not aware of an implementation that actually does that, because it is unnecessary. However, if you made these non-trivial C++ objects, their constructors and destructors would get called on each iteration, though they would still reside at the same addresses in practice (a small sketch after the program below illustrates this).
It is obviously confusing to human readers to hide variables like this and it's not recommended.
#include <stdio.h>
int main(void) {
for(int i=0;i<10;i++)
{
int x=0;
printf("A%d x==%d %p\n",i,x,&x);
{
printf("B%d x==%d %p\n",i,x,&x);
int x=10;
printf("C%d x==%d %p\n",i,x,&x);
}
printf("D%d x==%d %p\n",i,x,&x);
}
}
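To illustrate the earlier point about non-trivial C++ objects, here is a minimal sketch (not part of the question) showing that the constructor and destructor of each x run on every iteration, even though the storage address typically stays the same:
#include <cstdio>

struct Noisy {
    int v;
    Noisy(int v) : v(v) { std::printf("construct %d at %p\n", v, (void*)this); }
    ~Noisy()            { std::printf("destroy   %d at %p\n", v, (void*)this); }
};

int main() {
    for (int i = 0; i < 3; i++) {
        Noisy x(0);       // constructed and destroyed once per iteration
        {
            Noisy x(10);  // the inner object likewise, usually at its own (reused) address
        }
    }
}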
This is an implementation-specific detail.
For example, in the code-optimization stage the compiler might detect that those variables are not used, so no space will be allocated for them at all.
Even without such an optimization, there may be cases where two separate storage locations are not allocated.
For your information, braces don't always mean separate memory or stack space; they are a matter of scope. The variables might even be kept in CPU registers.
So you can't say anything in general. What you can say is that the two variables have different scopes.
I would expect that most compilers will use memory on the stack for variables of this type, if any memory is needed at all. In some cases a CPU register might be used for one or both the x's. Both will have their own storage, but it's compiler-dependent whether the lifetime of that storage is the same as the scope of the variables as declared in the source. So, for example, the memory used for the "inner" x might continue to be in use beyond the point at which that variable is out of scope -- this really depends on the compiler implementation.
Suppose I have a function
double tauscale(NumericVector x){
int n = x.size();
const double k = 2;
double sc = 1.48*sqrt(median(x*x));
double tauscale = 0.0;
for(int i = 0 ; i < n ; ++i){
tauscale = tauscale + rhobiweight(x(i)/sc,k);
}
return (1.0/n)*pow(sc,2)*tauscale;
}
Here we call the function rhobiweight, which accepts two doubles and is currently written as:
double rhobiweight(double x,double k = 2.0){
double rho = 1.0;
if(std::abs(x)<k){
rho = 1.0-pow((1.0-pow(x/k,2)),3);
}
return rho/Erho(k) ;
}
The question is: how can I make use of pointers or references such that the x-value doesn't get copied? Ideally the computation time and memory use should be the same as if I had never written rhobiweight, but implemented this function directly in tauscale.
how can I make use of pointers or references such that the x-value doesn't get copied?
By declaring the arguments as either pointers or references. But don't do that. Then you need to copy the address of the variable, which is just as slow, because the size of a double is the same (or nearly the same) as the size of a memory address. Not only that, but you need to dereference the pointer whenever you use it in the function, or dereference it once and copy the value anyway.
Ideally the computation time and memory use should be the same as if I had never written rhobiweight, but implemented this function directly in tauscale.
That would happen if the function is expanded inline by the optimizer. There is no standard way to force the compiler to expand a function inline, but if the optimizer thinks it's advantageous, and you've enabled optimization, it will do that as long as the function is inlinable. To make the function inlinable, make sure that the definition is visible at the call site. A trivial way to do that is to declare the function inline.
Note that the peak memory use may actually be higher if many function calls are inlined.
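For example, a hedged sketch of making the definition visible at the call site by marking the function inline in a header (or the same translation unit) that the file defining tauscale includes; Erho is assumed to be declared elsewhere, as in the original code:
#include <cmath>

double Erho(double k);   // assumed to exist, as in the question

inline double rhobiweight(double x, double k = 2.0) {
    double rho = 1.0;
    if (std::abs(x) < k) {
        rho = 1.0 - std::pow(1.0 - std::pow(x / k, 2), 3);
    }
    return rho / Erho(k);
}
With optimization enabled, the compiler can then expand rhobiweight directly into the loop body of tauscale.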
TL;DR: Don't try to do this.
Full story:
You ask:
"how can I make use of pointers or references such that the x-value doesn't get copied?"
If you are compiling your program with optimization turned on, the variables probably do not get copied anyway. Using pointers and/or references might just make things slower.
This leads me to a more important point: How do you even know that copying the values is taking a long time? Why would you expect that using pointers would take less time?
The way to optimize your code is to measure where the time is spent, then try to optimize that.
I was asked in an interview to explain the exact difference between the following C/C++ code statements:
int a = 10;
and
int a;
a = 10;
Though both assign the same value, they told me there is a lot of difference between the two in memory.
Can anyone please explain to me about this?
As far as the language is concerned, they are two ways to do the same thing: the variable a ends up holding the value 10 (the first via initialization, the second via declaration followed by assignment).
The statement
int a; reserves memory for the variable a, which at that point contains garbage.
Because of that, you then give it a value with a = 10;
In the statement int a = 10; these two steps are done in one statement:
first a piece of memory is reserved for the variable a, and then that memory is overwritten with the value 10.
int a = 10;
^^^^^          reserve memory for the variable a
      ^^^^^    write 10 to that memory location
Regarding memory, the first declaration uses less memory on your PC because fewer characters are used, so your .c file will be smaller.
But after compilation the produced executable files will be the same.
IMPORTANT: if those statements are outside any function, they are possibly not the same (although they will produce the same result).
The reason is that a plain int a; at file scope will first set a to 0, because the C standard requires global variables to be zero-initialized.
"Though both assign the same value, but there is a lot of difference between the two in memory."
No, there's no difference in stack memory usage!
The difference is that assigning a value through initialization may incur some extra cost in additional assembler instructions (and thus memory needed to store them, i.e. code footprint) that the compiler can't optimize away at this point (because the initialization is demanded).
If you initialize a immediately this will have some cost in code. You might want to delay initialization for later use, when the value of a is actually needed:
void foo(int x) {
int a; // int a = 30; may generate unwanted extra assembler instructions!
switch(x) {
case 0:
a = 10;
break;
case 1:
a = 20;
break;
default:
return;
}
// Do something with a correctly initialized a
}
This could well have been an interview question asked at our company by particular colleagues of mine, and they would have wanted you to answer that just having the declaration int a; in the first place is the more efficient choice.
I'd say this interview question was meant to check whether you really have an in-depth understanding of the C and C++ languages (a mean-spirited question, though!).
Speaking for myself, I'm usually more lenient in interviews about such details.
I consider the effect to be minimal, though it could well matter seriously on embedded MCU targets, where you have very limited space left for the code footprint (say, 256K or less), and/or need to use compiler toolchains that aren't able to optimize this out by themselves.
If you are talking about a global variable (one that doesn't appear in a block of code, but outside of all functions/methods):
int a;
makes a zero-initialized variable. Some (most?) C++ implementations will place this variable in a memory region (segment? section? whatever it is called) dedicated to zero-initialized variables.
int a = 10;
makes a variable initialized to something other than 0. Some implementations have a different region in memory for these. So this variable may have an address (&a) that is very different from the previous case.
This is, I guess, what you mean by "lot of difference between the two in memory".
Practically, this can affect your program if it has severe bugs (memory overruns) - they may get masked if a is defined in one manner or the other.
P.S. To make it clear, I am only talking about global variables here. So if your code is like int main() {int a; a = 10;} - here a is typically allocated on stack and there is no "difference in memory" between initialization and assignment.
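A small sketch of that distinction (segment placement is typical for ELF platforms, but implementation-specific):
#include <stdio.h>

int zeroed;        /* zero-initialized global: typically placed in .bss, occupying no bytes in the executable */
int tenned = 10;   /* non-zero initializer: typically placed in .data */

int main(void) {
    printf("&zeroed = %p\n", (void*)&zeroed);
    printf("&tenned = %p\n", (void*)&tenned);  /* often in a noticeably different region */
    return 0;
}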
Is
int array[100] = {};
faster than
int array[100];
for(int i=0; i<100; ++i){
array[i] = 0;
}
Or are they equal? What are the differences if any?
The initialisation of non-statically allocated arrays might well be implemented the same way for both shown variants. You will have to measure or look at the generated assembly.
For statically allocated data (namespace-scope data in C++ parlance), on UNIX there is the BSS segment for zero-initialized data and the data segment for non-zero-initialized data. Symbols placed in the BSS segment are only specified by location and size; their content is implicitly zero and occupies no space in the executable. I'd certainly try to take advantage of zero-initialization for big arrays. (However, most of the time I'm dealing with big arrays, I don't know how big they will have to be and I have to allocate and initialise them dynamically anyway.)
Once you need initial values different from zero, their compile-time initialization will occupy space in the executable (data segment) and you're facing a classic space/time tradeoff.
Given that today CPU speed is much faster than memory and disk bandwidth, dynamic initialization will carry you a long way and is also more flexible.
It largely depends on your compiler, my guess is with appropriate optimizations turned on, both will be the same. It also depends on what you do with the values afterwards. If array goes out of scope immediately, it won't be created at all. If the values are just read, 0 might be directly available and an actual read might not even happen at all. Your best bet is to profile for your specific use-case.
Note that this only works for initialization to 0; if you need another value, the first wouldn't be an option.
This code:
int array[100] = {};
Means the compiler is free to do whatever it wants to initialize the array. This might mean baking the initialization into the compiled code in such a way that the initialization takes zero, or at least constant time. Potentially this could be O(1) performance, but there's no guarantee.
This code on the other hand:
int array[100];
for(int i=0; i<100; ++i){
array[i] = 0;
}
is O(N). It will never be less than O(N) because you have a for loop in it. Perhaps in some cases a compiler might be able to see it can optimize away the for loop, but it's going to be a harder problem for it.
So static initialization can be faster, but isn't necessarily. A for loop will almost certainly not be faster.
int array[100] = {};
This will value-initialize the array, which means that for basic (scalar) types the entire array is properly zero-initialized, and since this initialization is generated by the compiler itself, it is bound to be well optimized.
int array[100];
for(int i=0; i<100; ++i){
array[i] = 0;
}
Here the programmer steps up and takes responsibility for initializing the array. Whether this is on par with the previous initialization or falls short depends on how well the compiler's optimizer handles the loop.