I was asked in an interview to explain the exact difference between the following C/C++ statements:
int a = 10;
and
int a;
a = 10;
Though both assign the same value, they told me there is a lot of difference between the two in memory.
Can anyone please explain this to me?
As far as the language is concerned, they are two ways to do the same thing: define the variable a and give it the value 10.
The statement int a; reserves memory for the variable a, which (for a local variable) holds an indeterminate, garbage value.
Because of that, you then assign it with a = 10;
In the statement int a = 10; these two steps are done in a single statement.
First, a part of memory is reserved for the variable a, and then that memory is overwritten with the value 10.
int a = 10;
^^^^^ ^^^^^
  |     |
  |     +-- write 10 to that memory location
  +-------- reserve memory for the variable a
Regarding memory, the first declaration uses less memory on your PC because it has fewer characters, so your .c file will be smaller.
But after compilation, the produced executable files will be the same.
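IMPORTANT: if those statements are outside any function, they are possibly not the same (although they will produce the same result). Note that a = 10; cannot appear outside a function at all, since only declarations are allowed at file scope.
The difference is that a global int a; is zero-initialized before the program starts. This is required by the C standard, not merely something most compilers do; the value 10 is only stored later, when a = 10; executes inside some function.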
"Though both assign the same value, but there is a lot of difference between the two in memory."
No, there's no difference in stack memory usage!
The difference is that assigning a value through initialization may cost some extra assembler instructions (and thus memory to store them, i.e. code footprint), which the compiler can't optimize out at that point because the initialization is explicitly demanded.
If you initialize a immediately, this will have some cost in code. You might want to delay initialization until the value of a is actually needed:
void foo(int x) {
int a; // int a = 30; may generate unwanted extra assembler instructions!
switch(x) {
case 0:
a = 10;
break;
case 1:
a = 20;
break;
default:
return;
}
// Do something with a correctly initialized a
}
This could well have been an interview question put to you at our company by particular colleagues of mine, and they would have wanted you to answer that just having the declaration int a; in the first place is the more efficient choice.
I'd say this interview question was meant to see whether you really have an in-depth understanding of the C and C++ languages (a mean-spirited one, though!).
Speaking for myself, I'm usually more relaxed about such stuff in interviews.
I consider the effect very minimal, though it could seriously matter on embedded MCU targets, where you have very limited space for the code footprint (say 256K or less), and/or need to use compiler toolchains that aren't able to optimize this out themselves.
If you are talking about a global variable (one that doesn't appear in a block of code, but outside of all functions/methods):
int a;
makes a zero-initialized variable. Some (most?) C++ implementations will place this variable in a memory region (segment? section? whatever it is called; often .bss) dedicated to zero-initialized variables.
int a = 10;
makes a variable initialized to something other than 0. Some implementations have a different region in memory for these. So this variable may have an address (&a) that is very different from the previous case.
This is, I guess, what you mean by "lot of difference between the two in memory".
Practically, this can affect your program if it has severe bugs (memory overruns) - they may get masked if a is defined in one manner or the other.
P.S. To make it clear, I am only talking about global variables here. If your code is like int main() {int a; a = 10;}, a is typically allocated on the stack, and there is no "difference in memory" between initialization and assignment.
Someone has written a function in our C++ application that is already in production, and I don't know why it hasn't crashed the application yet. Below is the code.
char *str_Modify()
{
char buffer[70000] = { 0 };
char targetString[70000] = { 0 };
memset(targetString, '\0', sizeof(targetString));
...
...
...
return targetString;
}
As you can see, the function is returning the address of a local variable and the allocated memory will be released once the function returns.
My questions:
What is the static data memory limit?
What can be the quick fix for this code? Is it a good practice to make the variable targetString static?
(Note that your call to memset has no effect; all the elements are already zero-initialised before the call.)
It's not crashing the application because one possible manifestation of the undefined behaviour of your code (returning a pointer to a now out-of-scope variable with automatic storage duration) is not crashing the application.
Yes, making it static does validate the pointer, but can create other issues centred around concurrent access.
And pick your language: in C++ there are other techniques, such as returning a std::string by value.
Returning targetString is indeed UB as other answers have said. But there's another supplemental reason why it might crash on some platforms (especially embedded ones): Stack size. The stack segment, where auto variables usually live, is often limited to a few kilobytes; 64K may be common. Two 70K arrays might not be safe to use.
Making targetString static fixes both problems and is an unalloyed improvement IMO, but it might still be problematic if the code is used re-entrantly from multiple threads. In some circumstances it could also be considered an inefficient use of memory.
An alternative approach might be to allocate the return buffer dynamically, return the pointer, and have the calling code free it when no longer required.
As for why it might not crash: if the stack segment is large enough, buffer[] gets pushed first, and no other function uses enough stack to overwrite it, then targetString[] might survive unscathed, hanging just below the used stack, effectively in a world of its own. Very unsafe, though!
Returning the address of a static variable is well-defined behaviour in C and C++, because a static variable still exists in memory after the function call is over.
For example:
#include <stdio.h>
int *f()
{
static int a[10]={0};
return a;
}
int main()
{
f();
return 0;
}
It works fine with the GCC compiler.
But if you remove the static keyword, the compiler generates a warning:
prog.c: In function 'f':
prog.c:6:12: warning: function returns address of local variable [-Wreturn-local-addr]
return a;
^
Also, see this comment on the question, written by Ludin:
I believe you are confusing this with int* fun (void) { static int i = 10; return &i; } versus int* fun (void) { int i = 10; return &i; }, which is another story. The former is well-defined, the latter is undefined behavior.
Also, tutorialspoint says:
Second point to remember is that C does not advocate to return the address of a local variable to outside of the function, so you would have to define the local variable as static variable.
What is the static data memory limit?
Platform-specific. You haven't specified a platform (OS, compiler, version), so no-one can possibly tell you. It's probably fine though.
What can be the quick fix for this code?
The quick fix is indeed to make the buffer static.
The good fix is to rewrite the function as
char *modify(char *out, size_t outsz) {
// ...
return out;
}
(returning the input is just to simplify reusing the new function in existing code).
Is it a good practice to make the variable targetString static?
No. Sometimes it's the best you can do, but it has a number of problems:
The buffer is always the same size, always using ~68 KiB of memory and/or address space for no good reason. You can't use a bigger one in some contexts and a smaller one in others. If you really have to memset the whole thing, this incurs a speed penalty in situations where the buffer could be much smaller.
Using static (or global) variables breaks re-entrancy. Simple example: code like
printf("%s,%s\n", str_Modify(1), str_Modify(2));
cannot work sanely, because the second invocation overwrites the first (compare strtok, which can't be used to interleave the tokenizing of two different strings, because it has persistent state).
Since it isn't re-entrant, it also isn't thread-safe, in case you use multiple threads. It's a mess.
for(int i=0;i<10;i++)
{
int x=0;
printf("%d",x);
{
int x=10;
printf("%d",x);
}
printf("%d",x);
}
Here I want to know whether memory for the variable x will be allocated twice, or whether the value is just reset after exiting the second block and memory is allocated only once (for x)?
From the point of view of the C programming model, the two definitions of x are two completely different objects. The assignment in the inner block will not affect the value of x in the outer block.
Moreover, the definitions for each iteration of the loop count as different objects too. Assigning a value to either x in one iteration will not affect the x in subsequent iterations.
As far as real implementations are concerned, there are two common scenarios, assuming no optimisation is done. If you have optimisation turned on, your code is likely to be discarded because it's quite easy for the compiler to figure out that the loop has no effect on anything outside it except i.
The two common scenarios are:
The variables are stored on the stack. In this scenario, the compiler will reserve a slot on the stack for the outer x and a slot on the stack for the inner x. In theory it ought to allocate the slots at the beginning of the scope and deallocate at the end of the scope, but that just wastes time, so it'll reuse the slots on each iteration.
The variables are stored in registers. This is the more likely option on modern 64-bit architectures. Again, the compiler ought to "allocate" (allocate is not really the right word) a register at the beginning of the scope and "deallocate" it at the end, but in real life it'll just reuse the same registers.
In both cases, you may notice that the value from each iteration is preserved into the next iteration, because the compiler uses the same storage space. However, never do this:
for (int i = 0 ; i < 10 ; ++i)
{
int x;
if (i > 0)
{
printf("Before %d\n", x); // UNDEFINED BEHAVIOUR
}
x = i;
printf("After %d\n", x);
}
If you compile and run the above (with no optimisation), you'll probably find it prints sensible values, but each time you go round the loop, x is theoretically a completely new object, so the first printf accesses an uninitialised variable. This is undefined behaviour, so the program may give you the value from the previous iteration because it is using the same storage, or it may firebomb your house and sell your daughter into slavery.
Setting aside compiler optimization that might remove these unused variables, the answer is twice.
The second definition of x hides (the C standard's term; "shadows" is also common) the other definition in its scope following its declaration.
But the first definition is visible again after that scope.
So logically (forgetting about optimization) the first value of x (x=0) has to be held somewhere while x=10 is 'in play'. So two pieces of storage are (logically) required.
Execute the C program below. Typical partial output:
A0 x==0 0x7ffc1c47a868
B0 x==0 0x7ffc1c47a868
C0 x==10 0x7ffc1c47a86c
D0 x==0 0x7ffc1c47a868
A1 x==0 0x7ffc1c47a868
B1 x==0 0x7ffc1c47a868
C1 x==10 0x7ffc1c47a86c
//Etc...
Notice how only point C sees the variable x with value 10 and the variable with value 0 is visible again at point D. Also see how the two versions of x are stored at different addresses.
Theoretically the addresses could be different for each iteration, but I'm not aware of an implementation that actually does that because it is unnecessary. However, if you made these non-trivial C++ objects, their constructors and destructors would get called on each iteration, though they would still reside at the same addresses (in practice).
It is obviously confusing to human readers to hide variables like this and it's not recommended.
#include <stdio.h>
int main(void) {
for(int i=0;i<10;i++)
{
int x=0;
printf("A%d x==%d %p\n",i,x,(void *)&x);
{
printf("B%d x==%d %p\n",i,x,(void *)&x);
int x=10;
printf("C%d x==%d %p\n",i,x,(void *)&x);
}
printf("D%d x==%d %p\n",i,x,(void *)&x);
}
}
This is an implementation specific detail.
For example, in the code optimization stage the compiler might detect that they are not used, so no space will be allocated for them.
Even if a compiler doesn't do that, you can expect that there might be cases where separate space is not allocated for the two different variables.
For your information, braces don't always mean different memory or stack space; they are a matter of scope. A variable might also be allocated in a CPU register.
So you can't say anything in general. All you can say is that they have different scopes.
I would expect that most compilers will use memory on the stack for variables of this type, if any memory is needed at all. In some cases a CPU register might be used for one or both the x's. Both will have their own storage, but it's compiler-dependent whether the lifetime of that storage is the same as the scope of the variables as declared in the source. So, for example, the memory used for the "inner" x might continue to be in use beyond the point at which that variable is out of scope -- this really depends on the compiler implementation.
Is there any performance difference between:
uint a;
for (uint i = 0 ; i < n ; ++i)
{
a = i;
}
and:
for (uint i = 0 ; i < n ; ++i)
{
uint a;
a = i;
}
Does the second piece of code result in a program creating a variable (allocating memory or something) multiple times instead of just once? If not, is it because both pieces of code are equivalent or because of some compiler optimization?
I expect all modern compilers to produce identical code in both cases.
You might be surprised to know that with most compilers there is absolutely no difference whatsoever in the amount of overhead between either case.
In brief, these kinds of variables are instantiated on the stack, and a compiler, for each function, computes the maximum amount of stack needed to instantiate all local variables inside the function, and allocates the required stack space at the function's entry point.
Of course, if instead of a simple int you have some class with a non-trivial amount of complexity in its constructor, then it would certainly make a lot of difference, in several ways. But for a simple int like this, absolutely nothing.
Assuming uint is a basic data type, I would say they are the same, apart from the fact that you can use a after the loop in the first example, of course. This is because the cost of placing basic data types on the stack is trivial and the compiler can optimize this case so the memory is actually reserved once, as if you had declared it outside the loop.
If uint is a class, placing it outside the loop and reusing it could be faster. For example, declaring a std::string before a loop and reusing it inside the loop can be faster than creating a new string each iteration. This is because a string can reuse its existing dynamic memory allocation.
You can look at the disassembly of the code your compiler generates to see if there is any difference.
The second example must be slightly slower because a is being redefined for each iteration. This means that space is being set aside to store the maximum possible value of a (in this case, probably 4 bytes of memory). The allocation of that space is a command which will be executed, so I think that it is slower. Does it matter? Probably not.
Do built-in types which are not defined dynamically, always stay in the same piece of memory during the duration of the program?
If it's something I should understand, how do I go about checking it?
For example:
int j = 0;
double k = 2.2;
double* p = &k;
Does the system architecture or compiler move around all these objects if a C/C++ program is, say, highly memory intensive?
Note: I'm not talking about containers such as std::vector<T>. These can obviously reallocate in certain situations, but again this is dynamic.
Side question:
The following scenario will obviously raise a few eyebrows. Just as an example, will this pointer always be valid during the duration of the program?
This side-question is obsolete, thanks to my ignorance!
#include <memory>
struct null_deleter
{
void operator() (void const *) const {};
};
int main()
{
// define object
double b=0;
// define shared pointer
std::shared_ptr<double> ptr_store;
ptr_store.reset(&b,null_deleter()); // this works and behaves how you would expect
}
In the abstract machine, an object's address does not change during that object's lifetime.
(The word "object" here does not refer to "object-oriented" anything; an "object" is merely a region of storage.)
That really means that a program must behave as if an object's address never changes. A compiler can generate code that plays whatever games it likes, including moving objects around or not storing them anywhere at all, as long as such games don't affect the visible behavior in a way that violates the standard.
For example, this:
int n;
int *addr1 = &n;
int *addr2 = &n;
if (addr1 == addr2) {
std::cout << "Equal\n";
}
must print "Equal" -- but a clever optimizing compiler could legally eliminate everything but the output statement.
The ISO C standard states this explicitly, in section 6.2.4:
The lifetime of an object is the portion of program execution during which storage is guaranteed to be reserved for it. An object exists, has a constant address, and retains its last-stored value throughout its lifetime.
with a (non-normative) footnote:
The term "constant address" means that two pointers to the object constructed at possibly different times will compare equal. The address may be different during two different executions of the same program.
I haven't found a similar explicit statement in the C++ standard; either I'm missing it, or the authors considered it too obvious to bother stating.
The compiler is free to do whatever it wants, so long as it doesn't affect the observable program behaviour.
Firstly, consider that local variables might not even get put in memory (they might get stored in registers only, or optimized away entirely).
So even in your example where you take the address of a local variable, that doesn't mean that it has to live in a fixed location in memory. It depends what you go on to do with it, and whether the compiler is smart enough to optimize it. For example, this:
double k = 2.2;
double *p = &k;
*p = 3.3;
is probably equivalent to this:
double k = 3.3;
Yes and no.
Global variables will stay in the same place.
Stack variables (inside a function) will get allocated and deallocated each time the function is called and returns. For example:
void k(int);
void f() {
int x = 0;
k(x);
}
void g() {
f();
}
int main() {
f();
g();
}
Here, the second time f() is called (via g()), its x will typically be at a different location.
There are several answers to this question, depending on factors you haven't mentioned.
If a data object's address is never taken, then a conforming C program cannot tell whether or not it even has an address. It might exist only in registers, or be optimized completely out; if it does exist in memory, it need not have a fixed address.
Data objects with "automatic" storage duration (to first approximation, function-local variables not declared with static) are created each time their containing function is invoked and destroyed when it exits; there may be multiple copies of them at any given time, and there's no guarantee that a new instance of one has the same address as an old one.
We speak of the & operator as "taking the address" of a data object, but technically speaking that's not what it does. It constructs a pointer to that data object. Pointers are opaque entities in the C standard. If you inspect the bits (by converting to integer) the result is implementation-defined. And if you inspect the bits twice in a row there is no guarantee that you get the same number! A hypothetical garbage-collected C implementation could track all pointers to each datum and update them as necessary when it moved the heap around. (People have actually tried this. It tends to break programs that don't stick to the letter of the rules.)
What does the following statement mean?
Local and dynamically allocated variables have addresses that are not known by the compiler when the source file is compiled
I used to think that local variables are allocated addresses at compile time, but that this address can change when a variable goes out of scope and then comes into scope again during function calls. But the above statement says addresses of local variables are not known by the compiler. Then how are local variables allocated? Why can global variables' addresses be known at compile time?
Also, can you please provide a good link to read how local variables and other are allocated?
Thanks in advance!
The above quote is correct - the compiler typically doesn't know the address of local variables at compile-time. That said, the compiler probably knows the offset from the base of the stack frame at which a local variable will be located, but depending on the depth of the call stack, that might translate into a different address at runtime. As an example, consider this recursive code (which, by the way, is not by any means good code!):
int Factorial(int num) {
int result;
if (num == 0)
result = 1;
else
result = num * Factorial(num - 1);
return result;
}
Depending on the parameter num, this code might end up making several recursive calls, so there will be several copies of result in memory, each holding a different value. Consequently, the compiler can't know where they all will go. However, each instance of result will probably be offset the same amount from the base of the stack frame containing each Factorial invocation, though in theory the compiler might do other things like optimizing this code so that there is only one copy of result.
Typically, compilers allocate local variables by maintaining a model of the stack frame and tracking where the next free location in the stack frame is. That way, local variables can be allocated relative to the start of the stack frame, and when the function is called that relative address can be used, in conjunction with the stack address, to look up the location of that variable in the particular stack frame.
Global variables, on the other hand, can have their addresses known at compile-time. They differ from locals primarily in that there is always one copy of a global variable in a program. Local variables might exist 0 or more times depending on how execution goes. As a result of the fact that there is one unique copy of the global, the compiler can hardcode an address in for it.
As for further reading, if you'd like a fairly in-depth treatment of how a compiler can lay out variables, you may want to pick up a copy of Compilers: Principles, Techniques, and Tools, Second Edition by Aho, Lam, Sethi, and Ullman. Although much of this book concerns other compiler construction techniques, a large section of the book is dedicated to implementing code generation and the optimizations that can be used to improve generated code.
Hope this helps!
In my opinion the statement is not talking about runtime access to variables or scoping, but is trying to say something subtler.
The key words here are "local and dynamically allocated" and "compile time".
I believe what the statement is saying is that those addresses can not be used as compile time constants. This is in contrast to the address of statically allocated variables, which can be used as compile time constants. One example of this is in templates:
template<int *>
class Klass
{
};
int x;
//OK as it uses address of a static variable;
Klass<&::x> x_klass;
int main()
{
int y;
Klass<&y> y_klass; //NOT OK since y is local.
}
It seems there are some additional constraints on templates that don't allow this to compile (at least before C++17, which relaxed the linkage requirement on pointer template arguments):
int main()
{
static int y;
Klass<&y> y_klass;
}
However other contexts that use compile time constants may be able to use &y.
And similarly I'd expect this to be invalid:
static int * p;
int main()
{
p = new int();
Klass<p> p_klass;
}
Since p's data is now dynamically allocated (even though p is static).
Addresses of dynamic variables are not known for the expected reason: they are allocated dynamically from a memory pool.
Addresses of local variables are not known because they reside in the "stack" memory region. Stack winding and unwinding in a program may differ based on runtime conditions of the code flow.
For example:
void bar(); // forward declare
void foo ()
{
int i; // 'i' comes before 'j'
bar();
}
void bar ()
{
int j; // 'j' comes before 'i'
foo();
}
int main ()
{
if(...)
foo();
else
bar();
}
The if condition can be true or false, and the result is known only at runtime. Based on that, int i or int j would be placed at the appropriate offset on the stack.
It's a nice question.
When the code is executed, the program is loaded into memory, and only then does a local variable get its address. At compile time, source code is converted into machine code so that it can be executed.