Creating a temporary char buffer as a default function argument and binding an r-value reference to it allows us to compose statements on a single line whilst preventing the need to create storage on the heap.
const char* foo(int id, tmp_buf&& buf = tmp_buf()) // buf exists at call-site
Binding a reference/pointer to the temporary buffer and accessing it later yields undefined behaviour, because the temporary no longer exists.
As can be seen from the example app below the destructor for tmp_buf is called after the first output, and before the second output.
My compiler (gcc-4.8.2) doesn't warn that I'm binding a variable to a temporary. This means that using this kind of micro-optimisation to use an auto char buffer rather than std::string with associated heap allocation is very dangerous.
Someone else coming in and capturing the returned const char* could inadvertently introduce a bug.
1. Is there any way to get the compiler to warn for the second case below (capturing the temporary)?
Interestingly you can see that I tried to invalidate the buffer - which I failed to do, so it likely shows I don't fully understand where on the stack tmp_buf is being created.
2. Why did I not trash the memory in tmp_buf when I called try_stomp()? How can I trash tmp_buf?
3. Alternatively - is it safe to use in the manner I have shown? (I'm not expecting this to be true!)
code:
#include <iostream>
struct tmp_buf
{
char arr[24];
~tmp_buf() { std::cout << " [~] "; }
};
const char* foo(int id, tmp_buf&& buf = tmp_buf())
{
sprintf(buf.arr, "foo(%X)", id);
return buf.arr;
}
void try_stomp()
{
double d = 22./7.;
char buf[32];
snprintf(buf, sizeof(buf), "pi=%lf", d);
std::cout << "\n" << buf << "\n";
}
int main()
{
std::cout << "at call site: " << foo(123456789);
std::cout << "\n";
std::cout << "after call site: ";
const char* p = foo(123456789);
try_stomp();
std::cout << p << "\n";
return 0;
}
output:
at call site: foo(75BCD15) [~]
after call site: [~]
pi=3.142857
foo(75BCD15)
For question 2.
The reason you didn't trash the variable is that the compile probably allocated all the stack space it needed at the start of the function call. This includes all the stack space for the temporary objects, and objects that are declared inside a nested scope. You can't guarantee that the compiler does this (I think), rather than push objects on the stack as needed, but it is more efficient and easier to keep track of where your stack variables are this way.
When you call the try_stomp function, that function then allocates its stack after (or before, depending on your system) the stack for the main function.
Note that the default variables for a function call are actually by the compile to the calling code, rather than being part of the called function (which is why the need to be part of the function declaration, rather than the definition, if it was declared separately).
So your stack when in try_stomp looks something like this (there is a lot more going on in the stack, but these are the relevant parts):
main - p
main - temp1
main - temp2
try_stomp - d
try_stomp - buf
So you can't trash the temporary from try_stomp, at least not without doing something really outrageous.
Again, you can't rely on this layout, as it is compile dependent, and is just an exmaple of how the compiler might do it.
The way to trash the temporary buffer would be to do it in the destructor of tmp_buf.
Also interestingly, MSVC seems to allocate stack space for all of the temporary objects separately, rather than re-use the stack space for both objects. This means that even repeated calls to foo won't trash each other. Again, you can't depend on this behavior (I think - I couldn't find an reference to it).
For question 3.
No, don't do this!
Related
After reading the answer in Similar Question, I am still wodering why this code snippet doesn't get segment fault. The temporary object "data" shoule be release after leaving the scope. But apparently, "data" is not released.
Can somebody help me?
Thank you
#include <iostream>
#include <string>
#include <thread>
using namespace std;
void thfunc(string &data)
{
for (;;)
{
cout << data << endl;
}
}
int main()
{
{
string data = "123";
std::thread th1(thfunc, std::ref(data));
th1.detach();
}
for (;;)
{
cout << "main loop" << endl;
}
return 0;
}
Similar Question
The temporary object is released per the standard, leading to undefined behavior. An implementation may do anything, including keeping the object's bytes in stack memory until they are overwritten, which allows your code to (incorrectly) work.
When I disassembled the binary produced by my compiler (clang++ 9.0.1) I noticed that the stack pointer was not "regressed" when the block containing data ended, thus preventing it from being overwritten when
cout << "main loop" << endl; resulted in function calls.
Furthermore, due to short string optimization, the actual ASCII "123" is stored within the std::string object itself and not in a heap-allocated buffer.
The draft standard says the following:
6.6.4.3 Automatic storage duration
Block-scope variables not explicitly declared static, thread_local, or extern have automatic storage duration. The storage for these entities lasts until the block in which they are created exits.
Experimentally, if I make the string long enough to disable short string optimization, the program still silently works since the bytes in the buffer happen to remain intact in my experiment. If I enable ASAN, I get a correct heap use-after-free warning, because the bytes were freed at the end of the string's lifetime, yet were accessed through an illegal use of a pointer to the now-destructed string.
How variable int a is in existence without object creation? It is not of static type also.
#include <iostream>
using namespace std;
class Data
{
public:
int a;
void print() { cout << "a is " << a << endl; }
};
int main()
{
Data *cp;
int Data::*ptr = &Data::a;
cp->*ptr = 5;
cp->print();
}
Your code shows some undefined behavior, let's go through it:
Data *cp;
Creates a pointer on the stack, though, does not initialize it. On it's own not a problem, though, it should be initialized at some point. Right now, it can contain 0x0badc0de for all we know.
int Data::*ptr=&Data::a;
Nothing wrong with this, it simply creates a pointer to a member.
cp->*ptr=5;
Very dangerous code, you are now using cp without it being initialized. In the best case, this crashes your program. You are now assigning 5 to the memory pointed to by cp. As this was not initialized, you are writing somewhere in the memory space. This can include in the best case: memory you don't own, memory without write access. In both cases, your program can crash. In the worst case, this actually writes to memory that you do own, resulting in corruption of data.
cp->print();
Less dangerous, still undefined, so will read the memory. If you reach this statement, the memory is most likely allocated to your program and this will print 5.
It becomes worse
This program might actually just work, you might be able to execute it because your compiler has optimized it. It noticed you did a write, followed by a read, after which the memory is ignored. So, it could actually optimize your program to: cout << "a is "<< 5 <<endl;, which is totally defined.
So if this actually, for some unknown reason, would work, you have a bug in your program which in time will corrupt or crash your program.
Please write the following instead:
int main()
{
int stackStorage = 0;
Data *cp = &stackStorage;
int Data::*ptr=&Data::a;
cp->*ptr=5;
cp->print();
}
I'd like to add a bit more on the types used in this example.
int Data::*ptr=&Data::a;
For me, ptr is a pointer to int member of Data. Data::a is not an instance, so the address operator returns the offset of a in Data, typically 0.
cp->*ptr=5;
This dereferences cp, a pointer to Data, and applies the offset stored in ptr, namely 0, i.e., a;
So the two lines
int Data::*ptr=&Data::a;
cp->*ptr=5;
are just an obfuscated way of writing
cp->a = 5;
This question already has answers here:
Can a local variable's memory be accessed outside its scope?
(20 answers)
Closed 8 years ago.
Please consider this simple example:
#include <iostream>
const int CALLS_N = 3;
int * hackPointer;
void test()
{
static int callCounter = 0;
int local = callCounter++;
hackPointer = &local;
}
int main()
{
for(int i = 0; i < CALLS_N; i++)
{
test();
std::cout << *hackPointer << "(" << hackPointer << ")";
std::cout << *hackPointer << "(" << hackPointer << ")";
std::cout << std::endl;
}
}
The output (VS2010, MinGW without optimization) has the same structure:
0(X) Y(X)
1(X) Y(X)
2(X) Y(X)
...
[CALLS_N](X) Y(X)
where X - some address in memory, Y - some rubbish number.
What is done here is the case of undefined behaviour. However I want to understand why there is such behaviour in current conditions (and it is rather stable for two compilers).
It seems that after test() call first read of hackPointer leads to valid memory, but second successive instant read of it leads to rubbish. Also on any call address of local is the same. I always thought that memory for stack variable is allocated on every function call and is released after return but I can't explain output of the program from this point of view.
"Releasing" automatic storage doesn't make the memory go away, or change the pattern of bits stored there. It just makes it available for reuse, and causes undefined behaviour if you try to access the object that used to be there.
Immediately after returning from the function, the memory occupied by the local probably hasn't been overwritten, so reading it will probably give the value that was assigned within the function.
After calling another function (in this case, operator<<()), the memory is likely to have been reused for a variable within that function, so probably has a different value.
You are quite right that this is undefined behaviour.
That aside, what's happening is that std::cout << *hackPointer involves a function call: operator<<() gets called after the value of *hackPointer has been read. In all likelihood, operator<<() uses its own local variables that end up on the stack where local was, wiping out the latter.
For example, if loaded a text file into an std::string, did what I needed to do with it, then called clear() on it, would this release the memory that held the text? Or would I be better off just declaring it as a pointer, calling new when I need it, and deleting it when I'm done?
Calling std::string::clear() merely sets the size to zero. The capacity() won't change (nor will reserve()ing less memory than currently reserved change the capacity). If you want to reclaim the memory allocated for a string, you'll need to do something along the lines of
std::string(str).swap(str);
Copying the string str will generally only reserve a reasonable amount of memory and swapping it with str's representation will install the resulting representation into str. Obviously, if you want the string to be empty you could use
std::string().swap(str);
The only valid method to release unused memory is to use member function shrink_to_fit(). Using swap has no any sense because the Standard does not say that unused memory will be released when this operation is used.
As an example
s.clear();
s.shrink_to_fit();
I realized the OP is old but wanted to add some precision. I found this post when I was attempting to understand a behavior that seemed to contredict the answers provide here.
With gcc4.8.3, clear() might look like it's releasing memory in some scenario.
std::string local("this is a test");
std::cout << "Before clear: " << local.capacity() << '\n';
local.clear();
std::cout << "After clear: " << local.capacity() << '\n';
As expected I get
Before clear: 14
After clear: 14
Now lets add another string into the mix:
std::string local("this is a test");
std::string ref(local); // created from a ref to local
std::cout << "Before clear: " << local.capacity() << '\n';
local.clear();
std::cout << "After clear: " << local.capacity() << '\n';
This time I get:
Before clear: 14
After clear: 0
Looks like stdc++ has some optimisation to, whenever possible, share the memory holding the string content. Depending if a string is shared or not the behavior will differ. In the last example, when clear() is called, a new instance of the internal of std::string local is created and it will be empty(). The capacity will be set to 0. One might conclude from the ouput of capacity() that some memory was freed, but that is not the case.
This can be proven with the following code:
std::string local("this is a test");
std::string ref(local); // created from a ref to local
std::cout << (int*)local.data() << ' ' << (int*)ref.data() << '\n';
Will give:
0x84668cc 0x84668cc
The two strings point to the same data, i was not execting that. Add a local.clear() or anything that modifies local or ref and the adresses will then differ, obviously.
Regards
... would this release the memory that held the text?
No.
Or would I be better off just declaring it as a pointer ...
No, you'd be better off declaring the string in the scope in which it is needed, and letting its destructor be called. If you must release the memory in the scope which the string still exists, you can do this:
std::string().swap(the_string_to_clear);
I'm trying to write a simple program to show how variables can be manipulated indirectly on the stack. In the code below everything works as planned: even though the address for a is passed in, I can indirectly change the value of c. However, if I delete the last line of code (or any of the last three), then this no longer applies. Do those lines somehow force the compiler to put my 3 in variables sequentially onto the stack? My expectation was that that would always be the case.
#include <iostream>
using namespace std;
void someFunction(int* intPtr)
{
// write some code to break main's critical output
int* cptr = intPtr - 2;
*cptr = 0;
}
int main()
{
int a = 1;
int b = 2;
int c = 3;
someFunction(&a);
cout << a << endl;
cout << b << endl;
cout << "Critical value is (must be 3): " << c << endl;
cout << &a << endl;
cout << &b << endl;
cout << &c << endl; //when commented out, critical value is 3
}
Your code causes undefined behaviour. You can't pass a pointer to an int and then just subtract an arbitrary amount from it and expect it to point to something meaningful. The compiler can put a, b, and c wherever it likes in whatever order it likes. There is no guaranteed relationship of any kind between them, so you you can't assume someFunction will do anything meaningful.
The compiler can place those wherever and in whatever order it likes in the current stack frame, it may even optimize them out if not used. Just make the compiler do what you want, by using arrays, where pointer arithmetic is safe:
int main()
{
int myVars[3] = {1,2,3};
//In C++, one could use immutable (const) references for convenience,
//which should be optimized/eliminated pretty well.
//But I would never ever use them for pointer arithmetic.
int& const a = myVars[0];
int& const b = myVars[1];
int& const c = myVars[2];
}
What you do is undefined behaviour, so anything may happen. But what is probably going on, is that when you don't take the adress of c by commenting out cout << &c << endl;, the compiler may optimize avay the variable c. It then substitutes cout << c with cout << 3.
As many have answered, your code is wrong since triggering undefined behavior, see also this answer to a similar question.
In your original code the optimizing compiler could place a, b and c in registers, overlap their stack location, etc....
There are however legitimate reasons for wanting to know what are the location of local variables on the stack (precise garbage collection, introspection and reflection, ...).
The correct way would then to pack these variables in a struct (or a class) and to have some way to access that structure (for example, linking them in a list, etc.)
So your code might start with
void fun (void)
{
struct {
int a;
int b;
int c;
} _frame;
#define a _frame.a
#define b _frame.b
#define c _frame.c
do_something_with(&_frame); // e.g. link it
You could also use array members (perhaps even flexible or zero-length arrays for housekeeping routines), and #define a _frame.v[0] etc...
Actually, a good optimizing compiler could optimize that nearly as well as your original code.
Probably, the type of the _frame might be outside of the fun function, and you'll generate housekeeping functions for inspecting, or garbage collecting, that _frame.
Don't forget to unlink the frame at end of the routine. Making the frame an object with a proper constructor and destructor definitely helps. The constructor would link the frame and the destructor would unlink it.
For two examples where such techniques are used (both because a precise garbage collector is needed), see my qish garbage collector and the (generated C++) code of MELT (a domain specific language to extend GCC). See also the (generated) C code of Chicken Scheme or Ocaml runtime conventions (and its <caml/memory.h> header).
In practice, such an approach is much more welcome for generated C or C++ code (precisely because you will also generate the housekeeping code). If writing them manually, consider at least fancy macros (and templates) to help you. See e.g. gcc/melt-runtime.h
I actually believe that this is a deficiency in C. There should be some language features (and compiler implementations) to introspect the stack and to (portably) backtrace on it.