C++ scope guard with zero overhead - c++

In C++ we can ensure foo is called when we exit a scope by putting foo() in the destructor of a local object. That's what I think of when I hear "scope guard." There are plenty of generic implementations.
I'm wondering—just for fun—if it's possible to achieve the behavior of a scope guard with zero overhead compared to just writing foo() at every exit point.
Zero overhead, I think:
{
try {
do_something();
} catch (...) {
foo();
throw;
}
foo();
}
Overhead of at least 1 byte to give the scope guard an address:
{
scope_guard<foo> sg;
do_something();
}
Do compilers optimize away giving sg an address?
A slightly more complicated case:
{
Bar bar;
try {
do_something();
} catch (...) {
foo(bar);
throw;
}
foo(bar);
}
versus
{
Bar bar;
scope_guard<[&]{foo(bar);}> sg;
do_something();
}
The lifetime of bar entirely contains the lifetime of sg and its held lambda (destructors are called in reverse order) but the lambda held by sg still has to hold a reference to bar. I mean for example int x; auto l = [&]{return x;}; gives sizeof(l) == 8 on my 64-bit system.
Is there maybe some template metaprogramming magic that achieves the scope_guard sugar without any overhead?

If by overhead you mean how much space the scope-guard variable occupies, then zero overhead is possible when the function object is a compile-time value. I've coded a small snippet to illustrate this:
Try it online!
#include <iostream>
template <auto F>
class ScopeGuard {
public:
~ScopeGuard() { F(); }
};
void Cleanup() {
std::cout << "Cleanup func..." << std::endl;
}
int main() {
{
char a = 0;
ScopeGuard<&Cleanup> sg;
char b = 0;
std::cout << "Stack difference "
<< int(&a - &b - sizeof(char)) << std::endl;
}
{
auto constexpr f = []{
std::cout << "Cleanup lambda..." << std::endl; };
char a = 0;
ScopeGuard<f> sg;
char b = 0;
std::cout << "Stack difference "
<< int(&a - &b - sizeof(char)) << std::endl;
}
}
Output:
Stack difference 0
Cleanup func...
Stack difference 0
Cleanup lambda...
The code above doesn't use even a single byte of stack for the guard. Although sizeof of an empty class is 1, an optimizing compiler doesn't have to reserve stack space for a local object that has no fields and whose address is never observed; this is a common optimization rather than a guarantee. If you take a pointer to such an object, the compiler is obliged to give it a 1-byte memory object. But in your case you don't take the address of the scope guard.
You can see that not a single byte is occupied by following the Try it online! link above the code; it shows the assembler output of Clang.
To have no fields at all, the scope guard class should only use a compile-time function object, such as a global function pointer or a lambda without captures. These two kinds of objects are used in my code above.
In the code above I also print the difference between the addresses of the char variables declared before and after the scope guard, to show that the guard actually occupies 0 bytes.
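As a small sanity check on the "no fields" claim, the sketch below asserts at compile time that such a guard is an empty class. EmptyGuard, CountCleanup, g_cleanups and RunGuardedScope are hypothetical names used only for illustration. Note that sizeof is still 1, so any zero-byte stack footprint comes from the optimizer, not from the language.

```cpp
#include <cassert>
#include <type_traits>

// Hypothetical guard mirroring the ScopeGuard above: the callable is a
// template parameter, so the class needs no data members.
template <auto F>
struct EmptyGuard {
    ~EmptyGuard() { F(); }
};

inline int g_cleanups = 0;                 // illustration-only counter
inline void CountCleanup() { ++g_cleanups; }

// The guard has no fields, yet sizeof is still 1 because every complete
// object type has a nonzero size.
static_assert(std::is_empty_v<EmptyGuard<&CountCleanup>>);
static_assert(sizeof(EmptyGuard<&CountCleanup>) == 1);

// Runs one guarded scope and reports how many cleanups have happened so far.
inline int RunGuardedScope() {
    {
        EmptyGuard<&CountCleanup> guard;
        // ... work that may return early or throw ...
    }
    return g_cleanups;
}
```

Calling RunGuardedScope() once should leave the counter at 1, since the destructor runs when the inner block ends.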
Let's go a bit further and allow function objects that are not compile-time values.
For this we again create a class with no fields, but now store all function objects inside one shared vector with thread-local storage.
Again, since the class has no fields and we never take a pointer to the scope guard object, the compiler doesn't need to reserve a single byte for it on the stack.
Instead, a single shared vector is allocated on the heap. This way you can trade stack storage for heap storage if you're running out of stack memory.
A shared vector also keeps memory use low, because it holds only as many elements as there are nested blocks with an active scope guard. If all scope guards are located sequentially in different blocks, the vector holds at most 1 element at a time, so all of them share the same few bytes.
Why can the heap memory of the shared vector be more economical than stack-stored guards? Because with stack storage, if you have several sequential blocks of guards:
void test() {
{
ScopeGuard sg(f0);
}
{
ScopeGuard sg(f1);
}
{
ScopeGuard sg(f2);
}
}
then the 3 guards can occupy triple the amount of stack memory, because the compiler reserves stack space for all of a function's variables up front; unless it reuses the same slot for the disjoint scopes, the 3 guards take three slots.
With the shared vector, the test() function above uses just 1 vector element at a time, so the vector never grows beyond size 1 and stores only a single function object.
Hence, if you have many non-nested scope guards inside one function, the shared vector can be much more economical.
Below is the code snippet for the shared-vector approach, with zero fields and zero stack memory overhead. As a reminder, this approach allows non-compile-time function objects, unlike the solution in the first part of my answer.
#include <iostream>
#include <vector>
#include <functional>
class ScopeGuard2 {
public:
static auto & Funcs() {
thread_local std::vector<std::function<void()>> funcs_;
return funcs_;
}
ScopeGuard2(std::function<void()> f) {
Funcs().emplace_back(std::move(f));
}
~ScopeGuard2() {
Funcs().at(Funcs().size() - 1)();
Funcs().pop_back();
}
};
void Cleanup() {
std::cout << "Cleanup func..." << std::endl;
}
int main() {
{
ScopeGuard2 sg(&Cleanup);
}
{
auto volatile x = 123;
auto const f = [&]{
std::cout << "Cleanup lambda... x = "
<< x << std::endl;
};
ScopeGuard2 sg(f);
}
}
Output:
Cleanup func...
Cleanup lambda... x = 123

It's not exactly clear what you mean by 'zero overhead' here.
Do compilers optimize away giving sg an address?
Most likely, modern mainstream compilers will do it when run in optimizing modes. Unfortunately, that is as definite as it gets. It depends on the environment and has to be tested before you rely on it.
If the question is whether there is a guaranteed way to avoid <anything> in the resulting assembly, the answer is no. As @Peter said in the comments, the compiler is allowed to do anything that produces an equivalent result. It may never call foo() at all, even if you write it there verbatim, when it can prove that nothing in the observable program behavior will change.
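For the capturing case from the question, a commonly used shape is a guard templated on the callable's type, so that the lambda is stored by value and its call can be inlined. The following is a minimal sketch under that assumption (C++17 class template argument deduction); ScopeExit and TraceGuardedScope are hypothetical names, and whether the stored reference disappears from the generated code still depends on the optimizer.

```cpp
#include <cassert>
#include <string>
#include <utility>

// Hypothetical generic guard: stores the callable by value and invokes it
// when the guard goes out of scope, on normal exit and during unwinding.
template <typename F>
class ScopeExit {
public:
    explicit ScopeExit(F f) : f_(std::move(f)) {}
    ScopeExit(const ScopeExit&) = delete;
    ScopeExit& operator=(const ScopeExit&) = delete;
    ~ScopeExit() { f_(); }

private:
    F f_;
};

// Records the order of events for a guarded scope, with or without an exception.
inline std::string TraceGuardedScope(bool fail) {
    std::string trace;
    try {
        ScopeExit guard{[&trace] { trace += "cleanup;"; }};
        trace += "work;";
        if (fail) throw 1;
        trace += "done;";
    } catch (int) {
        trace += "caught;";
    }
    return trace;
}
```

The cleanup runs exactly once on both paths, which is the behavior the hand-written try/catch version in the question spells out manually.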

Related

Accessing data on heap in c++

I have been trying to dive deeper into the limitations of pointers to see how they affect the program behind the scenes. One thing my research has led me to is that objects allocated through pointers must be deleted in a language like C++, otherwise the data stays in memory.
My question pertains to accessing the data after a function's lifetime ends. If I create a pointer variable within a function, and the function then returns normally, how would the data be accessed? Would it just be garbage taking up space, or is there supposed to be a way to still reference it without having stored the address in another variable?
There's no automatic garbage collection. If you lose the handle (pointer, reference, index, ...) to your resource, your resource will live ad vitam æternam.
If you want your resources to cease to live when their handle goes out of scope, RAII and smart pointers are the tool you need.
If you want your resources to continue to live after their handle goes out of scope, you need to copy the handle and pass it around.
With the standard smart pointers std::unique_ptr and std::shared_ptr, memory is freed when the pointer goes out of scope (for std::shared_ptr, when the last owner does). After the scope ends the object is immediately destroyed and freed, and there is no way to access it anymore, unless you move or copy the pointer out to an enclosing scope, in which case it is deleted there.
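A short sketch of both cases described above, using only standard library calls. The function names ObjectGoneAfterScope and SurvivesWhenMovedOut are hypothetical and exist only for this illustration.

```cpp
#include <cassert>
#include <memory>
#include <utility>

// The object dies with its last owner: the weak observer reports it as expired.
inline bool ObjectGoneAfterScope() {
    std::weak_ptr<int> observer;
    {
        auto owner = std::make_shared<int>(42);
        observer = owner;          // non-owning view, does not extend the lifetime
    }                              // owner destroyed here, the int is freed
    return observer.expired();
}

// Moving the owning pointer to an enclosing scope keeps the object alive.
inline int SurvivesWhenMovedOut() {
    std::unique_ptr<int> outer;
    {
        auto inner = std::make_unique<int>(7);
        outer = std::move(inner);  // ownership transferred out of the block
    }
    return *outer;                 // still valid: outer owns the int now
}
```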
But it is not so difficult to implement a lazy garbage collector. As before, you use smart pointers everywhere, but a lazy variant. Now when a pointer goes out of scope its object is not immediately destroyed and freed; instead it is handed to a lazy garbage collector, which destroys and frees it later in a separate thread. I implemented exactly this lazy behaviour in my code below.
I wrote the following code from scratch, just for fun and as a demo for you; in most cases there is no strong reason to avoid the standard eager freeing of std::unique_ptr and std::shared_ptr. There is one important use case, though: std::shared_ptr constructs objects at well-known points in the code, when you call the constructor, but destroys them at points that are hard to predict, because copies of the shared pointer are held elsewhere. Thus you may get long destruction delays at unpredictable points in time, which may harm real-time, high-performance code. The destruction itself may also take too long. Lazy deletion moves destruction into a separate thread, where it can proceed at its own pace.
Although the smart pointer is disposed of at scope end, the object's memory may remain undestroyed and unfreed for some nanoseconds (or even microseconds); the exact time is not guaranteed, and the pointer itself is already gone, so this is not a supported way to access the object. It just means that real destruction can happen much later than the end of the scope, hence the name lazy garbage collector. You can even tweak this kind of collector so that it deletes objects only, say, 1 millisecond after their smart pointers have been destroyed.
Real garbage collectors do a similar thing: they free objects much later, and some (conservative collectors) do it automatically by scanning memory for bytes that look like pointers into the heap.
There is a Test() function in my code that shows how my lazy variants of the standard pointers are used. When the code is run, the console output looks something like:
Construct Obj( 592)
Construct Obj( 1264)
LazyDeleter Dispose( 1264)
LazyDeleter Dispose( 592)
Test finished
Destroy ~Obj( 1264)
Destroy ~Obj( 592)
Here the number in parentheses is the id of the object (the lower bits of its pointer). You may see that disposal and destruction happen in the order exactly opposite to construction. Disposal to the lazy garbage collector happens before the test finishes, while real destruction happens later, in a separate thread, after the test finishes.
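#include <algorithm>  // std::min, std::move (range overload)
#include <array>
#include <atomic>
#include <cstddef>    // size_t
#include <cstdint>    // uintptr_t
#include <deque>
#include <iomanip>
#include <iostream>
#include <memory>
#include <mutex>
#include <thread>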
using DelObj = void (void *);
void Dispose(void * obj, DelObj * del);
template <typename T>
struct LazyDeleter {
void operator ()(T * ptr) const {
struct SDel { static void Del(void * ptr) { delete (T*)ptr; } };
std::cout << "LazyDeleter Dispose(" << std::setw(5) << uintptr_t(ptr) % (1 << 16) << ")" << std::endl;
Dispose(ptr, &SDel::Del);
}
};
template <typename T>
using lazy_unique_ptr = std::unique_ptr<T, LazyDeleter<T>>;
template <typename T>
std::shared_ptr<T> make_lazy_shared(T * ptr) {
return std::shared_ptr<T>(ptr, LazyDeleter<T>{});
}
void Dispose(void * obj, DelObj * del) {
class AtomicMutex {
public:
auto Locker() { return std::lock_guard<AtomicMutex>(*this); }
void lock() { while (f_.test_and_set(std::memory_order_acquire)) {} }
void unlock() { f_.clear(std::memory_order_release); }
auto & Flag() { return f_; }
private:
std::atomic_flag f_ = ATOMIC_FLAG_INIT;
};
class DisposeThread {
struct Entry {
void * obj = nullptr;
DelObj * del = nullptr;
};
public:
DisposeThread() : thr_([&]{
size_t constexpr block = 32;
while (!finish_.load(std::memory_order_relaxed)) {
while (true) {
std::array<Entry, block> cent{};
size_t cent_cnt = 0;
{
auto lock = mux_.Locker();
if (entries_.empty())
break;
cent_cnt = std::min(block, entries_.size());
std::move(entries_.begin(), entries_.begin() + cent_cnt, cent.data());
entries_.erase(entries_.begin(), entries_.begin() + cent_cnt);
}
for (size_t i = 0; i < cent_cnt; ++i) {
auto & entry = cent[i];
try { (*entry.del)(entry.obj); } catch (...) {}
}
}
std::this_thread::yield();
}
}) {}
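~DisposeThread() {
// Check the queue under the lock: the worker thread modifies it concurrently.
while (true) {
{
auto lock = mux_.Locker();
if (entries_.empty())
break;
}
std::this_thread::yield();
}
finish_.store(true, std::memory_order_relaxed);
thr_.join();
}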
void Add(void * obj, DelObj * del) {
auto lock = mux_.Locker();
entries_.emplace_back(Entry{obj, del});
}
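private:
// thr_ is declared last so that the queue and the flag it uses are fully
// constructed before the worker thread starts running.
AtomicMutex mux_{};
std::deque<Entry> entries_;
std::atomic<bool> finish_ = false;
std::thread thr_{};
};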
static DisposeThread dt{};
dt.Add(obj, del);
}
void Test() {
struct Obj {
Obj() { std::cout << "Construct Obj(" << std::setw(5) << uintptr_t(this) % (1 << 16) << ")" << std::endl << std::flush; }
~Obj() { std::cout << "Destroy ~Obj(" << std::setw(5) << uintptr_t(this) % (1 << 16) << ")" << std::endl << std::flush; }
};
{
lazy_unique_ptr<Obj> uptr(new Obj());
std::shared_ptr<Obj> sptr = make_lazy_shared(new Obj());
auto sptr2 = sptr;
}
std::cout << "Test finished" << std::endl;
}
int main() {
Test();
}

Is there a way to track memory leaks?

I'm asking myself whether there's a proper way to systematically detect undestroyed instances of an object. After learning that even smart pointers like std::shared_ptr can leak memory when circular references are established, I became a bit paranoid about this topic. As programs grow bigger, it becomes impossible to check for memory leaks just by looking at the code.
Therefore I came up with an extremely naive approach within a few minutes to track things like that. I would like to ask for your opinion about it and about things I might be missing here:
My idea is to define a class template that defines every constructor (default, copy, move), increments a static counter of created instances on construction, and decrements that counter on destruction.
Every class I want to track inherits from that class:
Watch this code:
template<typename T>
class creationTracker
{
public:
creationTracker() { ++createdInstances; }
creationTracker(const creationTracker&) {++createdInstances; }
creationTracker(creationTracker&&) noexcept { ++createdInstances; }
virtual ~creationTracker() { --createdInstances; }
static unsigned createdInstances;
};
template<typename T>
unsigned creationTracker<T>::createdInstances{ 0 };
/* example class to demo detection of circular references */
class foo : public creationTracker<foo>
{
public:
void setSharedPointer(std::shared_ptr<foo> p) { pv = p; }
private:
std::shared_ptr<foo> pv;
};
Every class that inherits from creationTracker has its own static counter.
The class foo is used to force a circular reference when used in the wrong way, for example like this:
int main()
{
{//inner scope to be able to print instances before exiting main
std::shared_ptr<foo> foo1(new foo);
std::shared_ptr<foo> foo2(new foo);
foo1->setSharedPointer(foo2);
foo2->setSharedPointer(foo1);
std::cout << "In Scope: " << foo::createdInstances << std::endl;
}
std::cout << "Scope exit: " << foo::createdInstances << std::endl;
}
Console output:
In Scope: 2
Scope exit: 2 // not destroyed as expected
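For comparison, here is a sketch of the same two-object cycle where the back-reference is a non-owning std::weak_ptr, which is the usual way to avoid this kind of leak. Node, its live counter and LiveNodesAfterScope are hypothetical stand-ins for illustration, not the foo class above.

```cpp
#include <cassert>
#include <memory>

// Hypothetical node type with a simple live-instance counter.
struct Node {
    static inline int live = 0;   // illustration-only counter
    Node() { ++live; }
    ~Node() { --live; }
    std::weak_ptr<Node> peer;     // non-owning reference: does not keep the peer alive
};

// Builds the same mutual reference as above and reports what is left afterwards.
inline int LiveNodesAfterScope() {
    {
        auto a = std::make_shared<Node>();
        auto b = std::make_shared<Node>();
        a->peer = b;
        b->peer = a;              // a cycle of weak references owns nothing
    }
    return Node::live;
}
```

With weak references the counter returns to zero at scope exit, so a counter of this kind can distinguish the leaking and non-leaking designs.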
On the other hand, the tracker also confirms correct destruction of every instance in a case like this:
int main()
{
{//inner scope to be able to print instances before exiting main
foo foo1{}; // normal construct
foo foo2{ foo1 }; // copy
foo foo3{ std::move(foo1) }; // move construct
foo foo4 = foo2; // copy-initialization (copy constructor, not assignment)
foo foo5 = std::move(foo3); // move construction (not assignment)
std::cout << "In Scope: " << foo::createdInstances << std::endl;
}
std::cout << "Scope exit: " << foo::createdInstances << std::endl;
}
Console output:
In Scope: 5
Scope exit: 0 all instances destroyed as expected
Consider that I didn't spend much time developing this approach, so take that into account if I'm missing trivial things.
I would really appreciate your opinions on whether this approach can lead to successful tracking of created and destroyed instances or whether it is a dead end.
Thanks in advance

Techniques / design patterns for postponed delete action from inside of object member function?

Say I get into a situation where I know that I want an object deleted while I am executing code that is part of a member function of that object. In other words, after the function has returned whatever it is to return, I want the object to be destroyed. Do techniques or design patterns suitable for this situation exist? I guess trying to call the destructor from inside the object is not safe (or even allowed?).
Answers explaining why this is a bad idea and what to do instead are also welcome.
I think you want a self-containing object.
This can be implemented using an object that "holds" itself with a strong reference (in C++ a strong reference is a shared_ptr, which is one of the smart pointers).
#include <iostream>
#include <chrono>
#include <memory>
#include <thread>
using namespace std;
class LengthyOperation {
private:
// Just a marker for debugging: it distinguishes objects and, if it shows an illogical
// value (or when run under Valgrind / AddressSanitizer), indicates a released object.
int i;
// Privatise the constructor, so it can't be constructed without the static factory method.
LengthyOperation(): i(0) {}
LengthyOperation(int i): i(i) {}
// The "Holder", a reference to "this".
weak_ptr<LengthyOperation> holder;
public:
int getId() {
return i;
}
void executeTheOperation() {
// Strongify the weak "holder" reference
// So that no-one would release the object without ending of this function
shared_ptr<LengthyOperation> strongHolder = holder.lock();
// Simulate a "lengthy" operation, by pausing this thread for 1 second
std::this_thread::sleep_for(std::chrono::seconds(1));
cout << "Operation " << i << " ends" << "\n";
// Remove the reference to "this" in the holder.
holder.reset();
// Now, the "strong" reference which was temporary created (strongHolder)
// is removed when the scope ends. So that if it is held somewhere
// else, it will not be released until all other holders release it.
// Make sure you will NOT need it again here, because the object
// may be released from memory.
}
~LengthyOperation() {
cout << "Object with id: " << i << " Will destruct now" << "\n";
}
static shared_ptr<LengthyOperation> factory(int i = 0) {
shared_ptr<LengthyOperation> ret = shared_ptr<LengthyOperation>(new LengthyOperation(i));
// Make the weak pointer "holder", hold a reference to "this"
ret->holder = ret;
return ret;
}
};
int main() {
thread thr1([](){
weak_ptr<LengthyOperation> operation1Weak;
{
shared_ptr<LengthyOperation> operation1 = LengthyOperation::factory(3);
operation1Weak = operation1;
operation1->executeTheOperation();
cout << "Still there is a strong reference: it refers to object with id "
<< operation1->getId() << "\n";
cout << "Releasing the strong reference" << "\n";
}
cout << "No strong reference: it is "
<< (operation1Weak.expired() ? "invalid" : "valid") << "\n";
});
// Wait for a relative long time, to give chance for all threads to end
// One could use "join" as a better approach.
std::this_thread::sleep_for(std::chrono::seconds(2));
// Detach the thread to avoid crashes
thr1.detach();
thread thr2([](){
// Make an operation, an execute it directly without putting any strong reference to
LengthyOperation::factory(5)->executeTheOperation();
});
std::this_thread::sleep_for(std::chrono::seconds(2));
thr2.detach();
thread thr3([](){
// Try to create the object, without executing the operation, to see what
// weakening the "holder" pointer have done.
weak_ptr<LengthyOperation> oper = LengthyOperation::factory(1);
cout << "The weak non-called is " << (oper.expired() ? "expired" : "valid") << "\n";
});
std::this_thread::sleep_for(std::chrono::seconds(1));
thr3.detach();
return 0;
}
It is like calling "delete" in executeTheOperation, but somewhat safer, because it ensures that no other object still needs it.
Using RAII is better still, but that puts the responsibility in the caller's hands: whoever instantiated the object must release it.
(This answer is refined after the comment saying that strong "holder" reference would cause a memory leak if you didn't call the executeTheOperation, one should design his code to be self-correcting if its user couldn't call it correctly)
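The standard library offers a similar facility, std::enable_shared_from_this, which keeps an internal weak reference much like the hand-written holder above. The following is a minimal sketch of that alternative, not the code of this answer; Operation, Create and Run are hypothetical names.

```cpp
#include <cassert>
#include <memory>

// Hypothetical operation that pins itself for the duration of Run().
class Operation : public std::enable_shared_from_this<Operation> {
public:
    // Factory: the object must be owned by a shared_ptr for shared_from_this() to work.
    static std::shared_ptr<Operation> Create() {
        return std::shared_ptr<Operation>(new Operation());
    }

    // Returns the owner count observed while the operation holds itself.
    long Run() {
        auto self = shared_from_this();  // temporary strong reference to *this
        // ... lengthy work; the object cannot be destroyed while self exists ...
        return self.use_count();
    }

private:
    Operation() = default;               // force construction through Create()
};
```

If the caller drops its own shared_ptr during Run(), the object is destroyed only when self goes out of scope at the end of the function.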
What you are describing is the whole basis of Resource Acquisition Is Initialization (RAII for short). In short, a holder object owns the memory that you allocated, and that memory is tied to the lifetime of the holder. That means when the holder object goes away, the resource it is carrying is also properly destroyed.
An example of this would be along the following:
class Class { /* definition */ };
int doOperation(/* arguments */) {
// code
// this 'smart pointer' contains an object of type Class
// you create an object of type Class via dynamic allocation and then it is stored within the ptr object
// this will hold the memory until the end of the function
std::unique_ptr<Class> ptr = std::make_unique<Class>(/*arguments to pass to the object*/);
// use ptr
// assign a return value to returnValue
return returnValue;
// as the function ends, the object ptr is automatically destroyed that in turn will
// automatically delete the memory of object Class it held
}
This use of std::unique_ptr is an example of the RAII pattern. Other smart pointers, like std::shared_ptr, also implement this pattern.

Using a variable that was defined in an if statement before

int main() {
if(i = 0) {
myclass1 a = "Example1";
}
else {
myclass2 a = "Example2";
}
cout << a << endl;
}
I know a way to do this is by defining it outside the block but what if I have not decided what type a is before checking the condition of i?
If you are able to use C++17, you can use std::variant or std::any in case your types don't have a common base class. These classes are type-safe containers for a specified set of types (std::variant) or for any type (std::any). An example with std::variant could be the following:
#include <iostream>
#include <string>
#include <variant>
int main() {
bool input = false;
std::cin >> input;
std::variant<int, long, double, std::string> myVariant;
if(input)
myVariant = "Example1";
else
myVariant = 3.14;
std::visit([](auto&& arg) { std::cout << arg << std::endl; }, myVariant);
}
If C++17 is not available, you can also use boost::variant or boost::any.
C++ is a statically typed language, and requires the type of variables being used in the code to be known at compile time.
There's no way to write a C++ program where a statement like std::cout << a; is compiled and the type of a is not known until run-time.
For that you need a dynamically typed language, like for example Python or JavaScript.
int main() {
auto call = [](auto a) {
std::cout << a << std::endl;
};
if(i == 0)
call(myclass1 { "Example1" });
else
call(myclass2 { "Example2" });
}
You could try polymorphism.
Assuming myclass1 and myclass2 "implement" a class called myclass, you can do something like this:
int main() {
myclass*a;
if (i == 0) {
a = new myclass1("Example1");
} else {
a = new myclass2("Example2");
}
cout<<*a<<endl;
}
If you want to actively use the type myclass1 or myclass2 later on, you can use dynamic_cast, but depending on your needs and on what behaviour you implement in your derived classes and your base class, that may not be necessary.
Note that I use a raw pointer here, since it's a short-lived object and it's clear the program is about to end. I encourage you to read about smart pointers and use them appropriately to avoid memory leaks. Beware that on some platforms leaked memory may persist until a reboot, so you may need to free (delete) allocated memory manually.
This definitively calls for polymorphism, and optionally, if you want to have it a bit more elegant, the factory pattern. The factory pattern is no magic, it just hides the if within a nice wrapper.
Why not another approach, such as e.g. std::variant which is basically a union in disguise? Well, it's nice if you are able to store different kinds of things, or even any kind (std::any) under the same name, but it is not very useful since you also want to do something meaningful with the object. If you want to do completely different, unrelated things, then you can as well have different objects scoped by the if blocks (and with completely different code). If, however, you want to do the same or similar things on different objects, then they (usually) need to be the same or a related type.
Different types typically do not have the same data members or the same publicly accessible member functions. So, doing the same thing on a source code level with different types typically doesn't work (except by coincidence).
But if two classes do have identical subsets on their interfaces, and you want to be able to do it in one or the other way interchangeably, then inheriting from a base class is the most natural and idiomatic thing to do. That's what polymorphism was invented for. Use the idiomatic thing.
(You can get the same net effect of calling functions with the same name on different, unrelated types via a template helper, and provided that the names you use exist, that will just work. But it's not nearly as good style, and it causes code bloat by instantiating the function twice.)
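A minimal sketch of the polymorphism-plus-factory idea described in this answer. All names here (Printable, First, Second, MakePrintable, Render) are hypothetical stand-ins for myclass, myclass1 and myclass2, whose definitions the question does not show.

```cpp
#include <cassert>
#include <memory>
#include <sstream>
#include <string>

// Hypothetical common base: everything that can be streamed.
struct Printable {
    virtual ~Printable() = default;
    virtual void print(std::ostream& os) const = 0;
};

// One operator<< for the whole hierarchy, dispatching virtually.
inline std::ostream& operator<<(std::ostream& os, const Printable& p) {
    p.print(os);
    return os;
}

struct First : Printable {
    void print(std::ostream& os) const override { os << "Example1"; }
};

struct Second : Printable {
    void print(std::ostream& os) const override { os << "Example2"; }
};

// The factory hides the if: the caller only sees the base type.
inline std::unique_ptr<Printable> MakePrintable(int i) {
    if (i == 0) return std::make_unique<First>();
    return std::make_unique<Second>();
}

// Streams whichever object the factory chose into a string.
inline std::string Render(int i) {
    std::ostringstream out;
    out << *MakePrintable(i);
    return out.str();
}
```

Returning std::unique_ptr also takes care of deleting the object, so no manual delete is needed.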
I'll try to give you a practical answer that assumes you're used to doing this sort of thing in JavaScript or something and just trying to write code in C++.
First, you should understand that in C++, cout << a can actually call a completely different method depending on the type of a. For that reason, it doesn't make any sense to write cout << a when you don't know anything about that type. In fact, you can't do anything at all with a unless you know enough about the type for C++ to decide which method or operator you want to invoke.
If both of your classes have an acceptable common base, then you could do something like this:
int main() {
base_class *pa;
my_class1 a1;
my_class2 a2;
if(i == 0) {
a1 = "Example1";
pa = &a1;
}
else {
a2 = "Example2";
pa = &a2;
}
cout << *pa << endl;
}
Note that when you write cout << *pa, you are not necessarily calling the same method that cout << a would use. In the first case you are calling a method that knows how to output all subclasses of base_class, while in the second case you may be calling a method that was written specifically for myclass1 or myclass2.
When there is no acceptable base class, then we just don't write code like that in C++:
int main() {
if(i == 0) {
myclass1 a = "Example1";
cout << a << endl;
}
else {
myclass2 a = "Example2";
cout << a << endl;
}
}
Remember that the two methods being called in these cases can be completely different methods. It's exactly like calling cout.printClass1(a) vs. cout.printClass2(a). C++ lets you use the same name for completely different methods when it can figure out which one you want to call based on the argument types.
JavaScript doesn't have any magic that could automatically choose between printClass1 and printClass2 when you write cout.callWhatever(a), and neither does C++. In both languages, if you have to call completely different methods for myclass1 vs. myclass2, then you write different calls.
I had such code myself, when I was in fact trying different variations of the same code. Then I realized the best option would be to use a preprocessor #if and it solved my problem:
#define VARIATION 2
...
#if VARIATION == 1
myclass1 a = "Example1";
#else
myclass2 a = "Example2";
#endif
I know it probably doesn't solve yours, but at least it is a workaround.
If it is just this specific problem, I think this would be much easier:
int main(){
if(i == 0) //You wrote i=0 !! silly mistake
std::cout << myclass1("Example1");
else
std::cout << myclass2("Example2");
}
or you can choose
template<class T>
void foo(T out)
{
std::cout << out;
}
int main()
{
if( i==0 )
foo(myclass1("ex1"));
else
foo(myclass2("ex2"));
}
else
this is the way to go
And I would advise against using cout here as it may not have overloads to accept your user defined class.

Is it necessary to clean up stack contents?

We are undergoing PCI PA-DSS certification, and one of its requirements is to avoid writing the clear PAN (card number) to disk. The application does not write such information to disk, but if the operating system (Windows, in this case) needs to swap, the memory contents are written to the page file. Therefore the application must clean up its memory to prevent RAM-capturing services from reading sensitive data.
There are three situations to handle:
heap allocation (malloc): before freeing the memory, the area can be cleaned up with memset
static or global data: after being used, the area can be cleaned up using memset
local data (function member): the data is put on stack and is not accessible after the function is finished
For example:
void test()
{
char card_number[17];
strcpy(card_number, "4000000000000000");
}
After test executes, the memory still contains the card_number information.
One instruction could zero the variable card_number at the end of test, but this should be for all functions in the program.
memset(card_number, 0, sizeof(card_number));
Is there a way to clean up the stack at some point, like right before the program finishes?
Cleaning the stack right when the program finishes might be too late; it could already have been swapped out at any point during its runtime. You should keep your sensitive data only in memory locked with VirtualLock so that it does not get swapped out. The locking has to happen before the sensitive data is read into that memory.
There is a small limit on how much memory you can lock like this, so you probably cannot lock the whole stack and should avoid storing sensitive data on the stack at all.
I assume you want to get rid of this situation below:
#include <cstring>  // strcpy
#include <iostream>
using namespace std;
void test()
{
char card_number[17];
strcpy(card_number, "1234567890123456");
cout << "test() -> " << card_number << endl;
}
void test_trash()
{
// don't initialize, so get the trash from previous call to test()
char card_number[17];
cout << "trash from previous function -> " << card_number << endl;
}
int main(int argc, const char * argv[])
{
test();
test_trash();
return 0;
}
Output:
test() -> 1234567890123456
trash from previous function -> 1234567890123456
You CAN do something like this:
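// memset_s comes from the optional C11 Annex K; not every standard library provides it.
#define __STDC_WANT_LIB_EXT1__ 1
#include <string.h>  // memset_s, strncpy
#include <iostream>
using namespace std;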
class CardNumber
{
char card_number[17];
public:
CardNumber(const char * value)
{
strncpy(card_number, value, sizeof(card_number) - 1);
card_number[sizeof(card_number) - 1] = '\0'; // strncpy does not terminate on truncation
}
virtual ~CardNumber()
{
// as suggested by @piedar, use memset_s() so the compiler
// doesn't optimize it away.
memset_s(card_number, sizeof(card_number), 0, sizeof(card_number));
}
const char * operator()()
{
return card_number;
}
};
void test()
{
CardNumber cardNumber("1234567890123456");
cout << "test() -> " << cardNumber() << endl;
}
void test_trash()
{
// don't initialize, so get the trash from previous call to test()
char card_number[17];
cout << "trash from previous function -> " << card_number << endl;
}
int main(int argc, const char * argv[])
{
test();
test_trash();
return 0;
}
Output:
test() -> 1234567890123456
trash from previous function ->
You can do something similar to clean up memory on the heap or static variables.
Obviously, we assume the card number will come from a dynamic source instead of the hard-coded thing...
AND YES, to explicitly answer the title of your question: the stack will not be cleaned automatically... you have to clean it yourself.
I believe it is necessary, but this is only half of the problem.
There are two issues here:
In principle, nothing prevents the OS from swapping out your data while you are still using it. As pointed out in the other answer, you want VirtualLock on Windows and mlock on Linux.
You need to prevent the optimizer from optimizing out the memset. This also applies to global and dynamically allocated memory. I strongly suggest taking a look at cryptopp's SecureWipeBuffer.
In general, you should avoid doing it manually, as it is an error-prone procedure. Instead, consider using a custom allocator or a custom class template for secure data that wipes it in the destructor.
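A rough sketch of the "custom class template" idea, assuming a fixed-size buffer is enough. SecureWipe and SecureBuffer are hypothetical names, not the cryptopp API. Writing through a volatile pointer is a commonly used way to discourage the optimizer from dropping the stores, but it is not a guarantee of the standard; where available, platform facilities such as SecureZeroMemory or explicit_bzero are worth preferring.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Hypothetical helper: zeroes n bytes through a volatile pointer so the
// stores are less likely to be removed as dead writes.
inline void SecureWipe(void* p, std::size_t n) {
    volatile unsigned char* bytes = static_cast<volatile unsigned char*>(p);
    for (std::size_t i = 0; i < n; ++i) bytes[i] = 0;
}

// Hypothetical fixed-size holder for sensitive text, wiped on destruction.
template <std::size_t N>
class SecureBuffer {
public:
    SecureBuffer() { SecureWipe(data_, N); }
    ~SecureBuffer() { SecureWipe(data_, N); }
    SecureBuffer(const SecureBuffer&) = delete;             // avoid stray copies
    SecureBuffer& operator=(const SecureBuffer&) = delete;

    // Copies at most N - 1 characters and always null-terminates.
    void assign(const char* s) {
        std::strncpy(data_, s, N - 1);
        data_[N - 1] = '\0';
    }
    const char* c_str() const { return data_; }

private:
    char data_[N];
};
```

A function would then declare SecureBuffer<17> pan; instead of a raw char array, so the wipe happens on every exit path. This does nothing about swapping, which still needs the memory-locking calls mentioned above.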
The stack is cleaned up by moving the stack pointer, not by actually popping values from it. The only mechanics are to pop the return into the appropriate registers. You must do it all manually. Also -- volatile can help you avoid optimizations on a per variable basis. You can manually pop the stack clean, but -- you need assembler to do that -- and it is not so simple to start manipulating the stack -- it is not actually your resource -- the compiler owns it as far as you are concerned.