Static data memory limit in C++ application - c++

Someone has written function in our C++ application and is already in the production and I don't know why it's not crashing the application yet. Below is the code.
char *str_Modify()
{
char buffer[70000] = { 0 };
char targetString[70000] = { 0 };
memset(targetString, '\0', sizeof(targetString));
...
...
...
return targetString;
}
As you can see, the function is returning the address of a local variable and the allocated memory will be released once the function returns.
My question
Wanted to know what is the static data memory limit?
What can be the quick fix for this code? Is it a good practice to make the variable targetString static?

(Note that your call to memset has no effect, all the elements are zero-initialised prior to the call.)
It's not crashing the application since one manifestation of the undefined behaviour of your code (returning back a pointer to a now out-of-scope variable with automatic storage duration) is not crashing the application.
Yes, making it static does validate the pointer, but can create other issues centred around concurrent access.
And pick your language: In C++ there are other techniques.

Returning targetString is indeed UB as other answers have said. But there's another supplemental reason why it might crash on some platforms (especially embedded ones): Stack size. The stack segment, where auto variables usually live, is often limited to a few kilobytes; 64K may be common. Two 70K arrays might not be safe to use.
Making targetString static fixes both problems and is an unalloyed improvement IMO; but might still be problematic if the code is used re-entrantly from multiple threads. In some circumstances it could also be considered an inefficent use of memory.
An alternative approach might be to allocate the return buffer dynamically, return the pointer, and have the calling code free it when no longer required.
As for why might it not crash: if the stack segment is large enough and no other function uses enough of it to overwrite buffer[] and that gets pushed first; then targetString[] might survive unscathed, hanging just below the used stack, effectively in a world of its own. Very unsafe though!

It is well defined behaviour in C and C++ because return an address of a static variable exist in memory after function call is over.
For example:
#include <stdio.h>
int *f()
{
static int a[10]={0};
return a;
}
int main()
{
f();
return 0;
}
It is working fine on GCC compiler. [Live Demo]
But, if you remove static keyword then compiler generates warning:
prog.c: In function 'f':
prog.c:6:12: warning: function returns address of local variable [-Wreturn-local-addr]
return a;
^
Also, see this question comments wrote by Ludin.
I believe you are confusing this with int* fun (void) { static int i
= 10; return &i; } versus int* fun (void) { int i = 10; return &i; }, which is another story. The former is well-defined, the latter is
undefined behavior.
Also, tutorialspoint say's :
Second point to remember is that C does not advocate to return the
address of a local variable to outside of the function, so you would
have to define the local variable as static variable.

Wanted to know what is the static data memory limit?
Platform-specific. You haven't specified a platform (OS, compiler, version), so no-one can possibly tell you. It's probably fine though.
What can be the quick fix for this code?
The quick fix is indeed to make the buffer static.
The good fix is to rewrite the function as
char *modify(char *out, size_t outsz) {
// ...
return out;
}
(returning the input is just to simplify reusing the new function in existing code).
Is it a good practice to make the variable targetString static?
No. Sometimes it's the best you can do, but it has a number of problems:
The buffer is always the same size, and always using ~68Kb of memory and/or address space for no good reason. You can't use a bigger one in some contexts, and a smaller one in others. If you really have to memset the whole thing, this incurs a speed penalty in situations where the buffer could be much smaller.
Using static (or global) variables breaks re-entrancy. Simple example: code like
printf("%s,%s\n", str_Modify(1), str_Modify(2));
cannot work sanely, because the second invocation overwrites the first (compare strtok, which can't be used to interleave the tokenizing of two different strings, because it has persistent state).
Since it isn't re-entrant, it also isn't thread-safe, in case you use multiple threads. It's a mess.

Related

How to return a const char * from a C/C++ function?

I have the following C++ function that reads a string from the flash memory and returns it.
I want to avoid using the String class because this is on an Arduino and I've been advised that the String class for the Arduino is buggy.
const char* readFromFlashMemory()
{
char s[FLASH_STOP_ADDR-FLASH_START_ADDR+2];
memset(s, (byte)'\0', FLASH_STOP_ADDR-FLASH_START_ADDR+2);
unsigned short int numChars = 0;
for (int i = FLASH_START_ADDRESS; i <= FLASH_STOP_ADDRESS; i++)
{
byte b = EEPROM.read(i);
if (b == (byte)'\0')
return s;
else
s[numChars++] = (char)b;
}
}
This function seems to work. But the calling method gets back an empty string.
Am I not allowed to return a pointer to a character array that is on this function's stack?
How is the best/most idiomatic way I should write this function so the calling function receives the value that I want to pass to it?
The comments are probably going to lead you astray or at best confuse you. Let me break it down with some options.
Firstly, the problem is as you say: the array whose address you are returning no longer exists when the function is popped off the stack to the caller. Ignoring this results in Undefined Behavior.
Here are a few options, along with some discussion:
The caller owns the buffer
void readFromFlashMemory(char *s, size_t len)
Advantage is that the caller chooses how this memory is allocated, which is useful in embedded environments.
Note you could also choose to return s from this function as a convenience, or to convey some extra meaning.
For me, if I was working in an embedded environment such as Arduino, this would be my preference 100%.
Use std::string, std::vector or similar
std::string readFromFlashMemory()
This is probably the way you'd do it if you didn't care about allocation overhead and other potential issues such as fragmentation over time.
Allocate memory yourself
char* readFromFlashMemory()
If you want to ensure the allocated memory is exactly the right size, then you'd probably read into a local buffer first, and then allocate the memory and copy. Same memory considerations as std::string or other solutions dealing with heap memory.
This form also has the nightmarish property of the caller being responsible for managing the returned pointer and eventually calling delete[]. It's highly inadvisable. It's also distressingly common. :(
Better way to return dynamically allocated memory, if you absolutely must
std::unique_ptr<char[]> readFromFlashMemory()
Same as #3, but the pointer is managed safely. Requires C++11 or later.
Use a static buffer
const char* readFromFlashMemory()
{
static char s[FLASH_STOP_ADDR-FLASH_START_ADDR+2];
// ...
return s;
}
Generally frowned upon. Particularly because this type of pattern results in nasty problems in multi-threaded environments.
People mostly choose this approach because they're lazy and want a simple interface. I guess one benefit is that the caller doesn't have to know anything about what size buffer is acceptable.
Make your own class with an internal buffer
class FlashReader
{
public:
const char* Read();
private:
char buffer[FLASH_STOP_ADDR-FLASH_START_ADDR+2];
};
This is more of a verbose solution, and may start to smell like over-engineering. But it does combine the best parts of both #1 and #5. That is, you get stack allocation of your buffer, you don't need to know the size, and the function itself doesn't need extra arguments.
If you did want to have a static buffer, then you could just define one static instance of the class, but the difference would be a clear intent of this in the code.

Do I need to declare arrays as static local variables?

Let's say that I have a function that I call a lot which has an array in it:
char foo[LENGTH];
Depending upon the value of LENGTH this may be expensive to allocate every time the function is called. I have seen:
static char foo[LENGTH];
So that it is only allocated once and that array is always used: https://en.cppreference.com/w/cpp/language/storage_duration#Static_local_variables
Is that best practice for arrays?
EDIT:
I've seen several responses that static locals are not best. But what about initialization cost? What if I'd called:
char foo[LENGTH] = "lorem ipsum";
Isn't that going to have to be copied every time I call the function?
As LENGTH is supposed to be a compile time constant (C++, no C99 VLA), foo is just going to use space on the stack. Very fast.
First off, time to allocate automatic array of char is not dependent on it's size, and on any sane implementation is a constant time complexity of incrementing stack pointer, which is superfast. Please note, this would be the same even for VLA (which are not valid in C++), only that increment would be a run-time operand. Also please note, the answer would be different if your array would be initialized.
So it is really unclear what performance drawback you are referring to.
On the other hand, if you make the said array static, you would incur no penalty whatsoever in the provided example - since char is not initialized, there will be no normal synchronization which prevents static variables from getting doubly initialized. However, your function will (likely) become thread-unsafe.
Bottom line: premature optimization is the root of evil.
"Allocating" an object of primitive data type and with automatic storage duration is usually not a big deal. The question is more: Do you want that the contents of foo to survive the execution of the function or not?
Consider, for example, following function:
char* bar() {
char foo[LENGTH];
strcpy(foo, "Hello!");
return foo; // returning a pointer to a local variable; undefined behaviour if someone will use it.
}
In this case, foo will go out of scope and will not be (legally) accessible when bar has finished.
Everything is OK, however, if you write
char* bar() {
static char foo[LENGTH];
strcpy(foo, "Hello!");
return foo; // foo has static storage duration and will be destroyed at the end of your program (not at the end of bar())
}
An issue with large variables with automatic storage duration might arise, if they get so large that they will exceed a (limited) stack size, or if you call the function recursively. To overcome this issue, however, you'd need to use dynamic memory allocation instead (i.e. new/delete).

Some basic C++ concepts — initialization and assignment

I was asked in an interview, to tell the exact difference for the following c/c++ code statements
int a = 10;
and
int a;
a = 10;
Though both assign the same value, they told me there is a lot of difference between the two in memory.
Can anyone please explain to me about this?
As far as language concerned, they are two ways to do the same thing, initialize the variable a and assign 10 to it.
The statement
int a; reserves memory for the value a which certainly contains garbage.
Because of that you initialize it with a = 10;
In the statement int a = 10; these two steps are done in the same statement.
First a part of memory is reserved to the variable a, and then the memory is overwritten with the value of 10.
int a = 10;
^^^^^ ^^^^^
reserve memory for the variable a write 10 to that memory location
Regarding memory the first declaration uses less memory on your PC because less characters are used, so your .c file will be smaller.
But after compilation the produced executable files will be the same.
IMPORTANT: if those statements are outside any function they are possibly not the same (although they will produce the same result).
The problem is that the first statement will assign 0 to a in the first statement because most compilers do that to global variables in C (and is defined by C standard).
"Though both assign the same value, but there is a lot of difference between the two in memory."
No, there's no difference in stack memory usage!
The difference is that assigning a value though initialization may cause some extra cost for additional assembler instructions (and thus memory needed to store it, aka. code footprint), the compiler can't optimize out at this point (because it's demanded).
If you initialize a immediately this will have some cost in code. You might want to delay initialization for later use, when the value of a is actually needed:
void foo(int x) {
int a; // int a = 30; may generate unwanted extra assembler instructions!
switch(x) {
case 0:
a = 10;
break;
case 1:
a = 20;
break;
default:
return;
}
// Do something with a correctly initialized a
}
This could have well been an interview question made to you in our company, by particular colleagues of mine. And they'd wanted you to answer, that just having the declaration for int a; in 1st place is the more efficient choice.
I'd say this interview question was made to see, if you're really have an in-depth understanding of c and c++ language (A mean-spirited though!).
Speaking for me personally, I'm more convenient on interviews about such stuff usually.
I consider the effect is just very minimal. Though it could well seriously matter on embedded MCU targets, where you have very limited space left for the code footprint (say less/equal than 256K), and/or need to use compiler toolchains that actually aren't able to optimize this out for themselves.
If you are talking about a global variable (one that doesn't appear in a block of code, but outside of all functions/methods):
int a;
makes a zero-initialized variable. Some (most?) c++ implementations will place this variable in a memory place (segment? section? whatever it is called) dedicated for zero-initialized variables.
int a = 10;
makes a variable initialized to something other than 0. Some implementations have a different region in memory for these. So this variable may have an address (&a) that is very different from the previous case.
This is, I guess, what you mean by "lot of difference between the two in memory".
Practically, this can affect your program if it has severe bugs (memory overruns) - they may get masked if a is defined in one manner or the other.
P.S. To make it clear, I am only talking about global variables here. So if your code is like int main() {int a; a = 10;} - here a is typically allocated on stack and there is no "difference in memory" between initialization and assignment.

Why doesn't the compiler know the addresses of local variables at compile-time?

What does the following statement mean?
Local and dynamically allocated variables have addresses that are not known by the compiler when the source file is compiled
I used to think that local variables are allocated addresses at compile time, but this address can change when it will go out of scope and then come in scope again during function calling. But the above statement says addresess of local variables are not known by the compiler. Then how are local variables allocated? Why can global variables' addresses be known at compile time??
Also, can you please provide a good link to read how local variables and other are allocated?
Thanks in advance!
The above quote is correct - the compiler typically doesn't know the address of local variables at compile-time. That said, the compiler probably knows the offset from the base of the stack frame at which a local variable will be located, but depending on the depth of the call stack, that might translate into a different address at runtime. As an example, consider this recursive code (which, by the way, is not by any means good code!):
int Factorial(int num) {
int result;
if (num == 0)
result = 1;
else
result = num * Factorial(num - 1);
return result;
}
Depending on the parameter num, this code might end up making several recursive calls, so there will be several copies of result in memory, each holding a different value. Consequently, the compiler can't know where they all will go. However, each instance of result will probably be offset the same amount from the base of the stack frame containing each Factorial invocation, though in theory the compiler might do other things like optimizing this code so that there is only one copy of result.
Typically, compilers allocate local variables by maintaining a model of the stack frame and tracking where the next free location in the stack frame is. That way, local variables can be allocated relative to the start of the stack frame, and when the function is called that relative address can be used, in conjunction with the stack address, to look up the location of that variable in the particular stack frame.
Global variables, on the other hand, can have their addresses known at compile-time. They differ from locals primarily in that there is always one copy of a global variable in a program. Local variables might exist 0 or more times depending on how execution goes. As a result of the fact that there is one unique copy of the global, the compiler can hardcode an address in for it.
As for further reading, if you'd like a fairly in-depth treatment of how a compiler can lay out variables, you may want to pick up a copy of Compilers: Principles, Techniques, and Tools, Second Edition by Aho, Lam, Sethi, and Ullman. Although much of this book concerns other compiler construction techniques, a large section of the book is dedicated to implementing code generation and the optimizations that can be used to improve generated code.
Hope this helps!
In my opinion the statement is not talking about runtime access to variables or scoping, but is trying to say something subtler.
The key here is that its "local and dynamically allocated" and "compile time".
I believe what the statement is saying is that those addresses can not be used as compile time constants. This is in contrast to the address of statically allocated variables, which can be used as compile time constants. One example of this is in templates:
template<int *>
class Klass
{
};
int x;
//OK as it uses address of a static variable;
Klass<&::x> x_klass;
int main()
{
int y;
Klass<&y> y_klass; //NOT OK since y is local.
}
It seems there are some additional constraints on templates that don't allow this to compile:
int main()
{
static int y;
Klass<&y> y_klass;
}
However other contexts that use compile time constants may be able to use &y.
And similarly I'd expect this to be invalid:
static int * p;
int main()
{
p = new int();
Klass<p> p_klass;
}
Since p's data is now dynamically allocated (even though p is static).
Address of dynamic variables are not known for the expected reason,
as they are allocated dynamically from memory pool.
Address of local variables are not known, because they reside on
"stack" memory region. Stack winding-unwinding of a program may
defer based on runtime conditions of the code flow.
For example:
void bar(); // forward declare
void foo ()
{
int i; // 'i' comes before 'j'
bar();
}
void bar ()
{
int j; // 'j' comes before 'i'
foo();
}
int main ()
{
if(...)
foo();
else
bar();
}
The if condition can be true or false and the result is known only at runtime. Based on that int i or int j would take place at appropriate offset on stack.
It's a nice question.
While executing the code, program is loaded into memory. Then the local variable gets the address. At compile time, source code is converted into machine language code so that it can be executed

what happens when you declare the same object/variable more than once (newbie)

what does something like this do?
static int i;
// wrapped in a big loop
void update_text()
{
std::stringstream ss; // this gets called again and again
++i;
ss << i;
text = new_text(ss.str()); // text and new_text are defined elsewhere
show_text(text); // so is this
}
does is create a new instance of ss in the stack with a new address and everything? would it be smarter to use sprintf with a char array?
Each time the function is called, a new, local, instance of std::stringstream ss is pushed upon on the stack. At the end of the function, this instance is destroyed and popped off the stack.
At no point in time does the scope of function update_text have multiple variables in its scope with the identifier ss. So, within the scope of update_text, there is only one ss identifier.
A character array would make no difference. Each time the function is called, the char array, if statically allocated, will be pushed onto the stack and popped off at the end. If you use dynamic memory and dynamically allocate the character array, the new and delete statements would still be executed each time the function was called, and the pointer to this character array would still be pushed and popped off the stack. The std::stringstream is already handling the new and delete for you internally.
Declaring an object multiple times would look like this:
void Function()
{
int x;
int x;
}
This would cause compiler errors.
Be warned, this however, is valid:
void Function()
{
int x;
if(true)
{
int x;
}
}
Because the two variables are of different scopes. The second x exists only within that if statement. As such, the compiler can infer that any reference to x after that declaration and within that scope refers to the second x. Note that the type doesn't matter, it's the identifier or "name" that matters.
Small point: Your question isn't about declaring an object more then once, but encountering a position where an object is initialized more then once.
So to answer your real question: Yes it will create new instance of ss each time the function is called (although if it's called from the loop chances are that the address will actually be the same, but that really shouldn't matter to the programmer).
For your second question: Would it be smarter to use sprintf with a char array? Well if you are new to c++ the answer you should take from this is no, since sprintf is in a way more dangerous to use then streams (lack of typesafety, risk over overflows). The actual answer would be it depends. Use sprintf if you know what you are doing and the performance you get with using stringstreams isn't enough for your purposes (which should happen rarely). Furthermore note that you could reuses stringstreams, which reduces the overhead of creating new ones each time (which is significant for streaming a single int) you can also look at Boost.Lexical_Cast for this type of casting. According to their the performance section of there documentation it should be as fast as sprintf for things like converting int to string (haven't tested it myself, so no guarantees) without exposing the lack of typesafety (and risk of bufferoverflows) of sprintf. C++11 also has std::to_string, which does the conversion without giving up safety (however it's much less flexible then boost::lexical_cast`).