Let's say that I have a function that I call a lot which has an array in it:
char foo[LENGTH];
Depending upon the value of LENGTH this may be expensive to allocate every time the function is called. I have seen:
static char foo[LENGTH];
So that it is only allocated once and that array is always used: https://en.cppreference.com/w/cpp/language/storage_duration#Static_local_variables
Is that best practice for arrays?
EDIT:
I've seen several responses that static locals are not best. But what about initialization cost? What if I'd called:
char foo[LENGTH] = "lorem ipsum";
Isn't that going to have to be copied every time I call the function?
As LENGTH is supposed to be a compile time constant (C++, no C99 VLA), foo is just going to use space on the stack. Very fast.
First off, the time to allocate an automatic array of char does not depend on its size; on any sane implementation it is a constant-time adjustment of the stack pointer, which is extremely fast. Note that this would be the same even for a VLA (which is not valid in C++), only the adjustment would use a run-time operand. Also note that the answer would be different if your array were initialized.
So it is really unclear what performance drawback you are referring to.
On the other hand, if you make the array static, you incur no penalty whatsoever in the provided example: since the array is not initialized, there is none of the synchronization that normally prevents static variables from being initialized twice. However, your function will (likely) become thread-unsafe.
Bottom line: premature optimization is the root of all evil.
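To make the difference concrete, here is a minimal sketch of the two variants. The value of LENGTH and both function names are assumptions chosen for illustration; they are not from the question.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

constexpr std::size_t LENGTH = 64;  // hypothetical size, chosen for illustration

// Automatic storage: the array is created anew on every call.
std::size_t copy_into_automatic(const char* text) {
    assert(text != nullptr);
    char foo[LENGTH];
    std::strncpy(foo, text, LENGTH - 1);
    foo[LENGTH - 1] = '\0';  // strncpy does not terminate on truncation
    return std::strlen(foo);
}

// Static storage: one array shared by every call (and every thread).
// Passing nullptr returns whatever an earlier call left behind.
const char* copy_into_static(const char* text) {
    static char foo[LENGTH];  // zero-initialised, lives until program exit
    if (text != nullptr) {
        std::strncpy(foo, text, LENGTH - 1);
        foo[LENGTH - 1] = '\0';
    }
    return foo;  // valid to return: foo outlives the call
}
```

The static version keeps its contents between calls, which is exactly what makes it unsafe to call from two threads at once.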
"Allocating" an object of primitive data type and with automatic storage duration is usually not a big deal. The question is more: Do you want that the contents of foo to survive the execution of the function or not?
Consider, for example, following function:
char* bar() {
    char foo[LENGTH];
    strcpy(foo, "Hello!");
    return foo; // returns a pointer to a local variable; undefined behaviour if anyone uses it
}
In this case, foo will go out of scope and will not be (legally) accessible when bar has finished.
Everything is OK, however, if you write
char* bar() {
    static char foo[LENGTH];
    strcpy(foo, "Hello!");
    return foo; // foo has static storage duration and will be destroyed at the end of your program (not at the end of bar())
}
An issue with large variables with automatic storage duration might arise, if they get so large that they will exceed a (limited) stack size, or if you call the function recursively. To overcome this issue, however, you'd need to use dynamic memory allocation instead (i.e. new/delete).
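In C++ you rarely need to write new/delete by hand for this. As a sketch (the helper name is made up for illustration), std::vector keeps its elements on the heap and releases them automatically, so a large buffer does not eat into the stack:

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper: the vector object itself is small; its elements
// are allocated dynamically, so a large count does not use stack space.
std::vector<char> make_scratch_buffer(std::size_t count) {
    assert(count > 0);
    return std::vector<char>(count, '\0');  // zero-filled, freed automatically
}
```

Note that this swaps raw new/delete for a container that performs the same dynamic allocation internally.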
Related
Someone wrote a function in our C++ application that is already in production, and I don't know why it hasn't crashed the application yet. The code is below.
char *str_Modify()
{
    char buffer[70000] = { 0 };
    char targetString[70000] = { 0 };
    memset(targetString, '\0', sizeof(targetString));
    ...
    ...
    ...
    return targetString;
}
As you can see, the function is returning the address of a local variable and the allocated memory will be released once the function returns.
My question
What is the static data memory limit?
What can be the quick fix for this code? Is it a good practice to make the variable targetString static?
(Note that your call to memset has no effect, all the elements are zero-initialised prior to the call.)
It's not crashing the application since one manifestation of the undefined behaviour of your code (returning back a pointer to a now out-of-scope variable with automatic storage duration) is not crashing the application.
Yes, making it static does validate the pointer, but can create other issues centred around concurrent access.
And pick your language: In C++ there are other techniques.
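One such technique, sketched under assumptions: return a std::string by value. The body of the original function is elided, so the transformation below (replacing spaces with underscores) and the name str_modify are purely hypothetical stand-ins.

```cpp
#include <algorithm>
#include <cassert>
#include <string>

// Hypothetical rewrite: the result owns its own heap storage, so nothing
// dangles after the function returns and no static state is shared.
std::string str_modify(const std::string& input) {
    std::string target = input;
    std::replace(target.begin(), target.end(), ' ', '_');  // stand-in for the real work
    assert(target.size() == input.size());
    return target;  // returned by value (moved or elided, not a dangling pointer)
}
```

The caller gets an independent object each time, so concurrent calls do not interfere with each other.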
Returning targetString is indeed UB as other answers have said. But there's another supplemental reason why it might crash on some platforms (especially embedded ones): Stack size. The stack segment, where auto variables usually live, is often limited to a few kilobytes; 64K may be common. Two 70K arrays might not be safe to use.
Making targetString static fixes both problems and is an unalloyed improvement IMO, but it might still be problematic if the code is used re-entrantly or from multiple threads. In some circumstances it could also be considered an inefficient use of memory.
An alternative approach might be to allocate the return buffer dynamically, return the pointer, and have the calling code free it when no longer required.
As for why might it not crash: if the stack segment is large enough and no other function uses enough of it to overwrite buffer[] and that gets pushed first; then targetString[] might survive unscathed, hanging just below the used stack, effectively in a world of its own. Very unsafe though!
Returning the address of a static variable is well-defined behaviour in C and C++, because a static variable continues to exist in memory after the function call is over.
For example:
#include <stdio.h>
int *f()
{
    static int a[10]={0};
    return a;
}
int main()
{
    f();
    return 0;
}
It works fine with the GCC compiler.
But, if you remove static keyword then compiler generates warning:
prog.c: In function 'f':
prog.c:6:12: warning: function returns address of local variable [-Wreturn-local-addr]
return a;
^
Also, see the comment on this question written by Ludin:
I believe you are confusing this with int* fun (void) { static int i = 10; return &i; } versus int* fun (void) { int i = 10; return &i; }, which is another story. The former is well-defined, the latter is undefined behavior.
Also, tutorialspoint says:
Second point to remember is that C does not advocate to return the address of a local variable to outside of the function, so you would have to define the local variable as static variable.
Wanted to know what is the static data memory limit?
Platform-specific. You haven't specified a platform (OS, compiler, version), so no-one can possibly tell you. It's probably fine though.
What can be the quick fix for this code?
The quick fix is indeed to make the buffer static.
The good fix is to rewrite the function as
char *modify(char *out, size_t outsz) {
    // ...
    return out;
}
(returning the input is just to simplify reusing the new function in existing code).
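A minimal sketch of how that might look when filled in. The extra input parameter and the bracket formatting are assumptions made up for the example, since the real body is not shown in the question.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdio>

// The caller owns the storage and decides how large it is.
// 'input' and the "[...]" formatting are hypothetical.
char* modify(char* out, std::size_t outsz, const char* input) {
    assert(out != nullptr && outsz > 0 && input != nullptr);
    // snprintf never writes more than outsz bytes and always
    // NUL-terminates when outsz > 0, truncating if necessary.
    std::snprintf(out, outsz, "[%s]", input);
    return out;
}
```

Each caller can now pass a buffer of whatever size suits it, on the stack or elsewhere, and two calls with different buffers never interfere.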
Is it a good practice to make the variable targetString static?
No. Sometimes it's the best you can do, but it has a number of problems:
The buffer is always the same size, and always using ~68Kb of memory and/or address space for no good reason. You can't use a bigger one in some contexts, and a smaller one in others. If you really have to memset the whole thing, this incurs a speed penalty in situations where the buffer could be much smaller.
Using static (or global) variables breaks re-entrancy. Simple example: code like
printf("%s,%s\n", str_Modify(1), str_Modify(2));
cannot work sanely, because the second invocation overwrites the first (compare strtok, which can't be used to interleave the tokenizing of two different strings, because it has persistent state).
Since it isn't re-entrant, it also isn't thread-safe, in case you use multiple threads. It's a mess.
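To see the re-entrancy problem in isolation, here is a small sketch. label_static is a hypothetical stand-in for str_Modify with a static buffer; it is not the original code.

```cpp
#include <cassert>
#include <cstdio>
#include <cstring>

// Hypothetical stand-in for a function that returns a static buffer.
const char* label_static(int id) {
    static char buffer[32];
    std::snprintf(buffer, sizeof buffer, "item-%d", id);
    return buffer;
}

// Returns true if the text seen through the first pointer was replaced
// by the second call.
bool second_call_clobbers_first() {
    const char* first = label_static(1);
    const char* second = label_static(2);
    assert(first == second);  // both point at the same static buffer
    return std::strcmp(first, "item-2") == 0;
}
```

This is why the printf example cannot print two different strings: both arguments are the same pointer.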
Do built-in types which are not defined dynamically, always stay in the same piece of memory during the duration of the program?
If it's something I should understand how do I go about and check it?
i.e.
int j = 0;
double k = 2.2;
double* p = &k;
Does the system architecture or compiler move around all these objects if a C/C++ program is, say, highly memory intensive?
Note: I'm not talking about containers such as std::vector<T>. These can obviously reallocate in certain situations, but again this is dynamic.
side question:
The following scenario will obviously raise a few eyebrows. Just as an example, will this pointer always be valid during the duration of the program?
This side-question is obsolete, thanks to my ignorance!
struct null_deleter
{
void operator() (void const *) const {};
};
int main()
{
// define object
double b=0;
// define shared pointer
std::shared_ptr<double> ptr_store;
ptr_store.reset(&b,null_deleter()); // this works and behaves how you would expect
}
In the abstract machine, an object's address does not change during that object's lifetime.
(The word "object" here does not refer to "object-oriented" anything; an "object" is merely a region of storage.)
That really means that a program must behave as if an object's address never changes. A compiler can generate code that plays whatever games it likes, including moving objects around or not storing them anywhere at all, as long as such games don't affect the visible behavior in a way that violates the standard.
For example, this:
int n;
int *addr1 = &n;
int *addr2 = &n;
if (addr1 == addr2) {
std::cout << "Equal\n";
}
must print "Equal" -- but a clever optimizing compiler could legally eliminate everything but the output statement.
The ISO C standard states this explicitly, in section 6.2.4:
The lifetime of an object is the portion of program execution during
which storage is guaranteed to be reserved for it. An object exists,
has a constant address, and retains its last-stored value throughout
its lifetime.
with a (non-normative) footnote:
The term "constant address" means that two pointers to the object
constructed at possibly different times will compare equal. The
address may be different during two different executions of the same
program.
I haven't found a similar explicit statement in the C++ standard; either I'm missing it, or the authors considered it too obvious to bother stating.
The compiler is free to do whatever it wants, so long as it doesn't affect the observable program behaviour.
Firstly, consider that local variables might not even get put in memory (they might get stored in registers only, or optimized away entirely).
So even in your example where you take the address of a local variable, that doesn't mean that it has to live in a fixed location in memory. It depends what you go on to do with it, and whether the compiler is smart enough to optimize it. For example, this:
double k = 2.2;
double *p = &k;
*p = 3.3;
is probably equivalent to this:
double k = 3.3;
Yes and no.
Global variables will stay in the same place.
Stack variables (inside a function) will get allocated and deallocated each time the function is called and returns. For example:
void k(int);
void f() {
int x;
k(x);
}
void g() {
f();
}
int main() {
f();
g();
}
Here, the second time f() is called (from g()), its x will typically be in a different location, because the call stack is one frame deeper.
There are several answers to this question, depending on factors you haven't mentioned.
If a data object's address is never taken, then a conforming C program cannot tell whether or not it even has an address. It might exist only in registers, or be optimized completely out; if it does exist in memory, it need not have a fixed address.
Data objects with "automatic" storage duration (to first approximation, function-local variables not declared with static) are created each time their containing function is invoked and destroyed when it exits; there may be multiple copies of them at any given time, and there's no guarantee that a new instance of one has the same address as an old one.
We speak of the & operator as "taking the address" of a data object, but technically speaking that's not what it does. It constructs a pointer to that data object. Pointers are opaque entities in the C standard. If you inspect the bits (by converting to integer) the result is implementation-defined. And if you inspect the bits twice in a row there is no guarantee that you get the same number! A hypothetical garbage-collected C implementation could track all pointers to each datum and update them as necessary when it moved the heap around. (People have actually tried this. It tends to break programs that don't stick to the letter of the rules.)
What does the following statement mean?
Local and dynamically allocated variables have addresses that are not known by the compiler when the source file is compiled
I used to think that local variables are allocated addresses at compile time, but that this address can change when the variable goes out of scope and then comes into scope again during a function call. But the above statement says the addresses of local variables are not known by the compiler. Then how are local variables allocated? Why can global variables' addresses be known at compile time?
Also, can you please provide a good link to read how local variables and other are allocated?
Thanks in advance!
The above quote is correct - the compiler typically doesn't know the address of local variables at compile-time. That said, the compiler probably knows the offset from the base of the stack frame at which a local variable will be located, but depending on the depth of the call stack, that might translate into a different address at runtime. As an example, consider this recursive code (which, by the way, is not by any means good code!):
int Factorial(int num) {
int result;
if (num == 0)
result = 1;
else
result = num * Factorial(num - 1);
return result;
}
Depending on the parameter num, this code might end up making several recursive calls, so there will be several copies of result in memory, each holding a different value. Consequently, the compiler can't know where they all will go. However, each instance of result will probably be offset the same amount from the base of the stack frame containing each Factorial invocation, though in theory the compiler might do other things like optimizing this code so that there is only one copy of result.
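The "several copies" point can be checked with a small sketch. The function names are made up for illustration. Each active call has its own local, and since all of them are alive at the same time, they must all have distinct addresses:

```cpp
#include <cassert>
#include <cstddef>
#include <set>

// Records the address of one local variable per active invocation.
void collect_addresses(int depth, std::set<const void*>& seen) {
    int local = depth;  // one instance per active call
    seen.insert(&local);
    if (depth > 1) {
        collect_addresses(depth - 1, seen);
    }
    // local is still alive here, after the deeper calls have returned.
    assert(seen.count(&local) == 1);
}

// Number of distinct addresses observed for 'local' across 'depth' nested calls.
std::size_t distinct_locals(int depth) {
    assert(depth >= 1);
    std::set<const void*> seen;
    collect_addresses(depth, seen);
    return seen.size();
}
```

With a nesting depth of 5 there are five simultaneously live instances, so no single compile-time address could describe them all.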
Typically, compilers allocate local variables by maintaining a model of the stack frame and tracking where the next free location in the stack frame is. That way, local variables can be allocated relative to the start of the stack frame, and when the function is called that relative address can be used, in conjunction with the stack address, to look up the location of that variable in the particular stack frame.
Global variables, on the other hand, can have their addresses known at compile-time. They differ from locals primarily in that there is always one copy of a global variable in a program. Local variables might exist 0 or more times depending on how execution goes. As a result of the fact that there is one unique copy of the global, the compiler can hardcode an address in for it.
As for further reading, if you'd like a fairly in-depth treatment of how a compiler can lay out variables, you may want to pick up a copy of Compilers: Principles, Techniques, and Tools, Second Edition by Aho, Lam, Sethi, and Ullman. Although much of this book concerns other compiler construction techniques, a large section of the book is dedicated to implementing code generation and the optimizations that can be used to improve generated code.
Hope this helps!
In my opinion the statement is not talking about runtime access to variables or scoping, but is trying to say something subtler.
The key here is that it's "local and dynamically allocated" and "compile time".
I believe what the statement is saying is that those addresses can not be used as compile time constants. This is in contrast to the address of statically allocated variables, which can be used as compile time constants. One example of this is in templates:
template<int *>
class Klass
{
};
int x;
//OK as it uses address of a static variable;
Klass<&::x> x_klass;
int main()
{
int y;
Klass<&y> y_klass; //NOT OK since y is local.
}
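It seems there are some additional constraints on templates that don't allow this to compile (before C++17, a non-type template argument had to refer to an object with linkage, which a function-local static does not have):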
int main()
{
static int y;
Klass<&y> y_klass;
}
However other contexts that use compile time constants may be able to use &y.
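One such context, as a sketch (the function name is an assumption): since C++11 the address of a function-local static can initialise a constexpr pointer, because the object has static storage duration.

```cpp
#include <cassert>

// Returns a reference to a function-local static through a constexpr pointer.
int& local_static_counter() {
    static int y = 0;
    constexpr int* p = &y;  // accepted: &y is an address constant expression
    assert(p == &y);        // the constant pointer and the runtime address agree
    return *p;
}
```

The same declaration with a plain (non-static) local y would be rejected, since its address is not known until the function is actually running.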
And similarly I'd expect this to be invalid:
static int * p;
int main()
{
p = new int();
Klass<p> p_klass;
}
Since p's data is now dynamically allocated (even though p is static).
Addresses of dynamic variables are not known for the expected reason: they are allocated dynamically from a memory pool.
Addresses of local variables are not known because they reside in the "stack" memory region. The stack winding and unwinding of a program may differ based on runtime conditions of the code flow.
For example:
void bar(); // forward declare
void foo ()
{
int i; // 'i' comes before 'j'
bar();
}
void bar ()
{
int j; // 'j' comes before 'i'
foo();
}
int main ()
{
if(...)
foo();
else
bar();
}
The if condition can be true or false and the result is known only at runtime. Based on that int i or int j would take place at appropriate offset on stack.
It's a nice question.
When the code is executed, the program is loaded into memory, and only then does a local variable get its address. At compile time, the source code is merely converted into machine code so that it can be executed later.
what does something like this do?
static int i;
// wrapped in a big loop
void update_text()
{
std::stringstream ss; // this gets called again and again
++i;
ss << i;
text = new_text(ss.str()); // text and new_text are defined elsewhere
show_text(text); // so is this
}
Does it create a new instance of ss on the stack with a new address and everything? Would it be smarter to use sprintf with a char array?
Each time the function is called, a new, local instance of std::stringstream ss is pushed onto the stack. At the end of the function, this instance is destroyed and popped off the stack.
At no point in time does the scope of function update_text have multiple variables in its scope with the identifier ss. So, within the scope of update_text, there is only one ss identifier.
A character array would make no difference. Each time the function is called, the char array, if declared as a fixed-size local variable, will be pushed onto the stack and popped off at the end. If you dynamically allocate the character array instead, the new and delete statements would still be executed each time the function is called, and the pointer to the array would still be pushed onto and popped off the stack. The std::stringstream is already handling the new and delete for you internally.
Declaring an object multiple times would look like this:
void Function()
{
int x;
int x;
}
This would cause compiler errors.
Be warned, this however, is valid:
void Function()
{
int x;
if(true)
{
int x;
}
}
Because the two variables are of different scopes. The second x exists only within that if statement. As such, the compiler can infer that any reference to x after that declaration and within that scope refers to the second x. Note that the type doesn't matter, it's the identifier or "name" that matters.
Small point: your question isn't about declaring an object more than once, but about reaching a point where an object is initialized more than once.
So to answer your real question: Yes it will create new instance of ss each time the function is called (although if it's called from the loop chances are that the address will actually be the same, but that really shouldn't matter to the programmer).
For your second question: would it be smarter to use sprintf with a char array? Well, if you are new to C++, the answer you should take from this is no, since sprintf is in a way more dangerous to use than streams (lack of type safety, risk of buffer overflows). The actual answer is that it depends. Use sprintf if you know what you are doing and the performance you get from stringstreams isn't enough for your purposes (which should happen rarely). Furthermore, note that you could reuse a stringstream, which reduces the overhead of creating a new one each time (which is significant for streaming a single int). You can also look at Boost.Lexical_Cast for this type of conversion. According to the performance section of its documentation, it should be as fast as sprintf for things like converting int to string (I haven't tested it myself, so no guarantees) without exposing the lack of type safety (and risk of buffer overflows) of sprintf. C++11 also has std::to_string, which does the conversion without giving up safety (however, it's much less flexible than boost::lexical_cast).
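As a rough sketch of the two standard-library options mentioned above (the function names are made up for illustration, and no performance claim is implied):

```cpp
#include <cassert>
#include <sstream>
#include <string>

// C++11: converts without any stream object.
std::string format_with_to_string(int i) {
    return std::to_string(i);
}

// Reuses a caller-owned stream instead of constructing one per call.
std::string format_with_reused_stream(std::ostringstream& ss, int i) {
    ss.str("");   // discard the previous contents
    ss.clear();   // reset any error flags left from earlier use
    ss << i;
    assert(!ss.fail());
    return ss.str();
}
```

In update_text, the stream could then live outside the loop and be passed in on each iteration.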
POD means primitive data type without constructor and destructor.
I am curious how compilers handle lazy initialization of POD static local variables. What is the implication of lazy initialization if the functions are meant to be run inside tight loops in multithreaded applications? These are the possible choices. Which one is better?
void foo_1() {
static const int v[4] = {1, 2, 3, 4};
}
void foo_2() {
const int v[4] = {1, 2, 3, 4};
}
How about this? No lazy initialization, but slightly clumsy syntax?
struct Bar
{
    static const int v[4];
    void foo_3()
    {
        // do something
    }
};
const int Bar::v[4] = {1, 2, 3, 4};
When a static variable is initialized with constant data, all compilers that I'm familiar with will initialize the values at compile time so that there is no run time overhead whatsoever.
If the variable isn't static it must be allocated on each function invocation, and the values must be copied into it. I suppose it's possible that the compiler might optimize this into a static if it's a const variable, except that const-ness can be cast away.
In foo_1(), v is initialized sometime before main() starts. In foo_2(), v is created and initialized every time foo_2() is called. Use foo_1() to eliminate that extra cost.
In the second example, Bar::v is also initialized sometime before main().
Performance is more complex than just allocation. For example, you could cause an extra cache line to have to be in cache with the static variable, because it's not contiguous with other local memory that you're using, and increase cache pressure, cache misses, and suchlike. In comparison to this cost, I would say that the incredibly tiny overhead of re-allocating the array on the stack every time would be very trivial. Not just that, but any compiler is excellent at optimizing things like that, whereas it can't do anything about static variables.
In any case, I would suggest that the performance difference between the two is minimal - even for inside a tight loop.
Finally, you may as well use foo_2(): the compiler is perfectly within its rights to make a variable like that static. As it was initially defined as const, const_casting the const away and then modifying it is undefined behaviour, regardless of whether or not it's static. However, the compiler can't choose to make a static constant non-static, as you could be depending upon the ability to return its address, for example.
An easy method to find out how variables are initialized is to print an assembly language listing of a function that has static and local variables.
Not all compilers initialize variables in the same way. Here is a common practice:
Before the main() method global variables are initialized by copying a section of values into the variables. Many compilers will place the constants into an area so that the data can be assigned using simple assembly move or copy instructions.
Local variables (variables with local scope) may be initialized upon entering the local scope and before the first statement in the scope is executed. This depends upon many factors, one of them is the constness of the variable.
Constants may be placed directly into the executable code, or they may be a pointer to a value in ROM, or copied into memory or register. This is decided by the compiler for best performance or code size, depending on the compiler's settings.
On the technical side, foo_1 and foo_3 are required to initialize their arrays before any functions, including class constructors, are called. That guarantee is essentially as good as no runtime. And in practice, most implementations don't need any runtime to initialize them.
This guarantee applies only to objects of POD type with static storage duration which are initialized with "constant expressions". A few more contrasting examples:
void foo_4() {
static const int v[4] = { firstv(), 2, 3, 4 };
}
namespace { // anonymous
const int foo_5_data[4] = { firstv(), 2, 3, 4 };
}
void foo_5() {
const int (&v)[4] = foo_5_data;
}
The data for foo_4 is initialized the first time foo_4 is called. (Since C++11 the standard requires this initialization to be thread-safe; for older compilers, check the documentation!)
The data for foo_5 is initialized at some time before main() but might be after some other dynamic initializations.
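The foo_4 case can be observed with a counter. In this sketch the body of firstv, the counter, the return value of foo_4, and demo_lazy_init are all assumptions added for illustration; the original leaves firstv undefined.

```cpp
#include <cassert>

int firstv_calls = 0;  // counts how often the initialiser expression runs

int firstv() {         // hypothetical body
    ++firstv_calls;
    return 1;
}

int foo_4() {
    // Dynamic initialisation: performed the first time control passes here.
    static const int v[4] = { firstv(), 2, 3, 4 };
    return v[0] + v[1] + v[2] + v[3];
}

void demo_lazy_init() {
    assert(firstv_calls == 0);  // nothing has run before the first call
    foo_4();
    foo_4();
    assert(firstv_calls == 1);  // the initialiser ran exactly once
}
```

Calling demo_lazy_init() before any other use of foo_4 shows that the initialiser is deferred to the first call and never repeated.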
But none of this really answers questions about performance, and I'm not qualified to comment on that. @DeadMG's answer looks helpful.
You have static initialization in all those cases: all your static variables will be initialized by virtue of loading the data segment into memory. The const in foo_2 can be optimized away if the compiler finds it possible.
If you had a dynamic initialization, then initialization of variables in the namespace scope can be deferred until their first use. Similarly, dynamic initialization of local static variables in the scope of function can be performed during the first pass through the function or earlier. Additionally, compiler can statically initialize those variables if it's able to do that. I don't remember the exact verbiage from the Standard.