I have the following simplified code, which I thought was fine while writing it, but I have seen some random access violations.
Initially I thought that as long as the arguments passed to async were on the stack, and not temporary variables, the code would be safe. I also thought that filename and extraResult would be destroyed (considered gone) at the closing brace where they leave scope.
I did some more research and read about the 'as if' rule that compilers apparently use for optimisation. I've also often seen stack variables being optimised away in the debugger right after their last use.
My question is basically: is it guaranteed that those stack variables will be around for the entire duration of the async function running? The .get() call on the future obviously synchronises the call before the two stack variables leave scope.
My current thinking is that it's not thread safe, as the compiler can't see the variables being used after the call to the function and might therefore think it is safe to remove them. I can easily change the code to eliminate the problem (if there is one), but I really want to understand this.
The randomness of the AV, which occurs more on some computers than others, suggests it is a problem and that the scheduling order dictates whether it shows up.
Any help is much appreciated.
#include <future>
#include <fstream>
#include <string>
#include <iostream>
int write_some_file(const char * const filename, int * extra_result)
{
std::ofstream fs;
try {
fs.open(filename);
} catch (const std::ios_base::failure&) { // catch by const reference to avoid slicing
return 1;
}
fs << "Hello";
*extra_result = 1;
return 0;
}
int main(void)
{
std::string filename {"myffile.txt"};
int extraResult = 0;
auto result = std::async(std::launch::async, write_some_file, filename.c_str(), &extraResult);
// Do some other work
// ...
int returnCode = result.get();
std::cout << returnCode << std::endl;
std::cout << extraResult << std::endl;
return 0;
}
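If the lifetime of the arguments is in doubt, one way to sidestep the question is to hand the task its own copies. The sketch below is only an assumption about what such a change might look like, not a diagnosis of the access violation: launch_with_copy is a hypothetical name, and the example relies on std::async copying its arguments into the state it shares with the task.

```cpp
#include <cassert>
#include <future>
#include <string>

// Hypothetical helper: starts a task that owns a copy of the file name.
std::future<std::string> launch_with_copy() {
    std::string filename{"myffile.txt"};
    // filename is an lvalue, so std::async stores a copy of it. The task
    // never touches the local variable, which is destroyed on return.
    return std::async(std::launch::async,
                      [](std::string name) { return name; },
                      filename);
}
```

Calling launch_with_copy().get() returns the copied name even though the original string no longer exists by then.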
We are undergoing PCI PA-DSS certification, and one of its requirements is to avoid writing the clear PAN (card number) to disk. The application does not write such information to disk, but if the operating system (Windows, in this case) needs to swap, the memory contents are written to the page file. Therefore the application must clean up the memory to prevent RAM capture tools from reading sensitive data.
There are three situations to handle:
heap allocation (malloc): before freeing the memory, the area can be cleaned up with memset
static or global data: after being used, the area can be cleaned up using memset
local data (function member): the data is put on stack and is not accessible after the function is finished
For example:
void test()
{
char card_number[17];
strcpy(card_number, "4000000000000000");
}
After test executes, the memory still contains the card_number information.
One instruction could zero the variable card_number at the end of test, but this should be for all functions in the program.
memset(card_number, 0, sizeof(card_number));
Is there a way to clean up the stack at some point, like right before the program finishes?
Cleaning the stack right when the program finishes might be too late; it could already have been swapped out at any point during its runtime. You should keep your sensitive data only in memory locked with VirtualLock so that it does not get swapped out. The locking has to happen before the sensitive data is read.
There is a small limit on how much memory you can lock like this, so you probably cannot lock the whole stack and should avoid storing sensitive data on the stack at all.
I assume you want to get rid of this situation below:
#include <cstring>  // strcpy
#include <iostream>
using namespace std;
void test()
{
char card_number[17];
strcpy(card_number, "1234567890123456");
cout << "test() -> " << card_number << endl;
}
void test_trash()
{
// don't initialize, so get the trash from previous call to test()
char card_number[17];
cout << "trash from previous function -> " << card_number << endl;
}
int main(int argc, const char * argv[])
{
test();
test_trash();
return 0;
}
Output:
test() -> 1234567890123456
trash from previous function -> 1234567890123456
You CAN do something like this:
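#define __STDC_WANT_LIB_EXT1__ 1  // memset_s is an optional C11 Annex K function
#include <string.h>               // strncpy, memset_s (where available)
#include <iostream>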
using namespace std;
class CardNumber
{
char card_number[17];
public:
CardNumber(const char * value)
{
strncpy(card_number, value, sizeof(card_number) - 1);
card_number[sizeof(card_number) - 1] = '\0'; // strncpy does not always terminate
}
virtual ~CardNumber()
{
// as suggested by @piedar, use memset_s() so the compiler
// doesn't optimize it away.
memset_s(card_number, sizeof(card_number), 0, sizeof(card_number));
}
const char * operator()()
{
return card_number;
}
};
void test()
{
CardNumber cardNumber("1234567890123456");
cout << "test() -> " << cardNumber() << endl;
}
void test_trash()
{
// don't initialize, so get the trash from previous call to test()
char card_number[17];
cout << "trash from previous function -> " << card_number << endl;
}
int main(int argc, const char * argv[])
{
test();
test_trash();
return 0;
}
Output:
test() -> 1234567890123456
trash from previous function ->
You can do something similar to clean up memory on the heap or static variables.
Obviously, we assume the card number will come from a dynamic source instead of the hard-coded thing...
AND YES: to explicitly answer the title of your question: the stack will not be cleaned automatically... you have to clean it yourself.
I believe it is necessary, but this is only half of the problem.
There are two issues here:
In principle, nothing prevents the OS from swapping your data while you are still using it. As pointed out in the other answer, you want VirtualLock on Windows and mlock on Linux.
You need to prevent the optimizer from optimizing out the memset. This also applies to global and dynamically allocated memory. I strongly suggest taking a look at Crypto++'s SecureWipeBuffer.
In general, you should avoid doing it manually, as it is an error-prone procedure. Instead, consider using a custom allocator or a custom class template for secure data that wipes its contents in the destructor.
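A class template along those lines could be sketched as follows. The names secure_wipe and SecureBuffer are hypothetical, and the volatile loop is a commonly used technique rather than the Crypto++ implementation; page locking (VirtualLock or mlock) is deliberately left out of the sketch.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Hypothetical helper: writes zeros through a volatile pointer so that the
// stores count as observable accesses the optimizer should not drop.
inline void secure_wipe(void* data, std::size_t size) {
    volatile unsigned char* p = static_cast<volatile unsigned char*>(data);
    while (size--) {
        *p++ = 0;
    }
}

// Hypothetical fixed-size buffer that wipes itself when it is destroyed.
template <std::size_t N>
class SecureBuffer {
public:
    SecureBuffer() { secure_wipe(data_, N); }
    ~SecureBuffer() { secure_wipe(data_, N); }

    // Copies are disabled so the secret does not spread to other objects.
    SecureBuffer(const SecureBuffer&) = delete;
    SecureBuffer& operator=(const SecureBuffer&) = delete;

    // Stores at most N - 1 characters and always null-terminates.
    void assign(const char* value) {
        std::strncpy(data_, value, N - 1);
        data_[N - 1] = '\0';
    }

    const char* c_str() const { return data_; }

private:
    char data_[N];
};
```

A function would then declare SecureBuffer<17> card; and call card.assign(...) instead of using a raw char array, so the wipe runs on every exit path.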
The stack is cleaned up by moving the stack pointer, not by actually popping values from it; the only mechanics are to pop the return value into the appropriate registers. You must do all the clearing manually. Also, volatile can help you avoid optimizations on a per-variable basis. You can manually wipe the stack clean, but you need assembler to do that, and manipulating the stack yourself is not simple: it is not really your resource, since the compiler owns it as far as you are concerned.
I wrote some very simple code to reproduce my problem.
#include <iostream>
#include "tools.h" //contains s_sleep()
#include <thread>
using namespace std;
void change( int *i)
{
while (true)
{
*i = 4356;
}
}
int main()
{
int v=3;
cout << v <<endl;
thread t(change, &v);
s_sleep(1); //sleep one second
cout << v << endl;
t.join();
}
The output is 3, and after a second 3 again. But when I change one line to
//while ( true )
I receive 3, and a second later 4356.
How can that be?
Hope somebody can help.
Please specify which compiler you are using. I am using the Microsoft Visual C++ compiler, and in Visual Studio I see the same output both times: 3 followed by 4356.
Here is the code I ran on my computer.
#include <ctime>
#include <thread>
#include <iostream>
using namespace std;
void change(int *i) {
while (true) { // commented out this later, the result is same.
*i = 4356;
}
}
int main() {
clock_t tstart = clock();
int v = 3;
cout << v << endl;
thread t(change, &v);
while(double(clock() - tstart)/CLOCKS_PER_SEC < 3.0) { // Instead of your s_sleep
int x = 1; // Just a dummy instruction
}
cout << v << endl;
t.join();
return 0;
}
The explanation for my result is that the thread "t" does not know anything about the variable "v". It just gets a pointer to int and writes the value directly to the memory at the pointer's location. So, when the main (first) thread accesses the variable "v" again, it simply reads the memory assigned to "v" and prints what it gets.
Also, what code is in "tools.h"? Does it have anything to do with the variable "v"?
If it doesn't, then it must be a compiler difference (your compiler may be different from mine, maybe gcc/g++?). That would mean your compiler has cached the variable (or something like that) for faster access. Since the variable has not been changed in the current thread, the compiler gives back the old value, which it sees as unchanged, whenever it is accessed. (I am not sure about this.)
This might also be due to caching. You first read a variable from one thread, then modify it from another thread, and then read it again from the first thread. The compiler cannot know that it changed in the meantime.
To do this safely, "v" has to be declared volatile so that the compiler re-reads it from memory. Note, however, that standard C++ only makes concurrent access like this well defined when the variable is a std::atomic or is protected by other synchronization.
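In standard C++, sharing a plain int between threads like this is a data race, and std::atomic is the usual tool for making the write visible. The following is a minimal sketch under that assumption; observe_shared_value is a hypothetical name, and the stop flag is an addition so that join() can actually return.

```cpp
#include <atomic>
#include <cassert>
#include <thread>

// Hypothetical helper: returns the value the calling thread observes once
// the writer thread has stored into the shared atomic.
int observe_shared_value() {
    std::atomic<int> v{3};
    std::atomic<bool> stop{false};

    std::thread writer([&] {
        // Keep writing like the loop in the question, but with an exit
        // condition so the thread can be joined.
        while (!stop.load()) {
            v.store(4356);
        }
    });

    // Atomic loads are real reads on every iteration, so this loop sees
    // the writer's store instead of a cached value.
    while (v.load() != 4356) {
        std::this_thread::yield();
    }

    stop.store(true);
    writer.join();
    return v.load();
}
```

The function returns 4356 regardless of whether the writer loops once or many times, because both threads only touch the shared state through atomic operations.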
A followup with reference to the upcoming feature in C++20 from n3721 "Improvements to std::future and related APIs"
#include <iostream>
#include <future>
#include <exception>
#include <thread>
using std::cout;
using std::endl;
int main() {
auto prom_one = std::promise<std::future<int>>{};
auto fut_one = prom_one.get_future();
std::thread{[prom_one = std::move(prom_one)]() mutable {
auto prom_two = std::promise<int>{};
auto fut_two = prom_two.get_future();
prom_two.set_value(1);
prom_one.set_value(std::move(fut_two));
}}.detach();
auto inner_fut_unwrap = fut_one.unwrap();
auto inner_fut_get = fut_one.get();
auto th_one = std::thread{[&]() {
cout << inner_fut_unwrap.get() << endl;
}};
auto th_two = std::thread{[&]() {
cout << inner_fut_get.get() << endl;
}};
th_one.join();
th_two.join();
return 0;
}
In the code above, which will win the race to print 1? th_one or th_two?
To clarify what race I was talking about, there are two (potential) racy situations here, the latter being the one that is really confusing me.
The first is in the setting and unwrapping of the inner future; the unwrapped future should act as a suitable proxy for the inner future even when the actual set_value has not been called on the inner future. So unwrap() must return a proxy that exposes a thread safe interface regardless of what happens on the other side.
The other situation is what happens to a get() from a future when a proxy for it already exists elsewhere, in this example inner_fut_unwrap is the proxy for inner_fut_get. In such a situation, which should win the race? The unwrapped future or the future fetched via a call to get() on the outer future?
This code makes me worried that there is some kind of misunderstanding about what futures and promises are, and what .get() does. It's also a bit weird that we have using namespace std; followed by a lot of std::.
Let's break it down. Here's the important part:
#include <iostream>
#include <future>
int main() {
auto prom_one = std::promise<std::future<int>>{};
auto fut_one = prom_one.get_future();
auto inner_fut_unwrap = fut_one.unwrap();
auto inner_fut_get = fut_one.get();
// Boom! throws std::future_error()
So neither thread "wins" the race, since neither thread actually gets a chance to run. Note from the document you linked, for .unwrap(), on p13:
Removes the outer-most future and returns a proxy to the inner future.
So the outer-most future, fut_one, is not valid. When you call .get(), it throws std::future_error [1]. There is no race.
[1] Not guaranteed. Technically undefined behavior.
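The standard std::future has no unwrap() member (it comes from the proposal and the Concurrency TS), but the same invalidation can be seen with plain get(), which also leaves the future without a shared state. The sketch below illustrates only that analogous behavior; future_valid_after_get is a hypothetical name.

```cpp
#include <cassert>
#include <future>

// Hypothetical helper: reports whether a future is still valid after its
// value has been retrieved once.
bool future_valid_after_get() {
    std::promise<int> prom;
    std::future<int> fut = prom.get_future();
    prom.set_value(1);

    int value = fut.get();  // consumes the shared state
    (void)value;

    // Calling get() again here would be undefined behavior, so check
    // valid() instead of relying on an exception.
    return fut.valid();
}
```

Checking valid() before calling get() is the portable way to detect this situation, since an exception on misuse is not guaranteed.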
In the C++11 standard we can declare a variable in an unusual way: we can declare myVar as int(myVar); instead of int myVar. What is the point of this?
#include <iostream>
using namespace std;
int main() {
int(myVar);
myVar = 1000;
cout << myVar << endl;
return 0;
}
UPD: Actually there is a specific reason why I asked this. Code that looked innocent stopped compiling when we tried to port it from MSVC C++03 to GCC C++11.
Here is an example
#include <iostream>
using namespace std;
struct MyAssert
{
MyAssert(bool)
{
}
};
#define ASSERT(cond) MyAssert(cond)
void func(void* ptr)
{
ASSERT(ptr); // error: declaration of 'MyAssert ptr' shadows a parameter
}
int main()
{
func(nullptr);
return 0;
}
Sure. I can even do this:
int (main)()
{
}
Parentheses serve to group things in C and C++. They often do not carry additional meaning beyond grouping. Function calls/declarations/definitions are a bit special, but if you need convincing that we should not be allowed to omit them there, just look at Ruby, where parentheses are optional in function calls.
There is not necessarily a point to it. But sometimes being able to slap on some theoretically unnecessary parentheses helps make code easier to read. Not in your particular example, and not in mine, of course.
#include <typeinfo>
#include <iostream>
int main(void)
{
int *first_var[2];
int (*second_var)[2];
std::cout << typeid(first_var).name() << std::endl;
std::cout << typeid(second_var).name() << std::endl;
return 0;
}
Running this on my machine gives :
A2_Pi
PA2_i
The parentheses in the declaration mostly serve the same purpose they do everywhere: they group things that should be together regardless of the language's default precedence.
Of course, parentheses with only one element inside are equivalent to just typing that element, except in cases where parentheses are mandatory (e.g. function calls).
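The grouping can also be checked at compile time instead of reading mangled typeid names. This is a small sketch using std::is_same; plain_var is a name added here purely for illustration.

```cpp
#include <cassert>
#include <type_traits>

int *first_var[2];     // array of 2 pointers to int
int (*second_var)[2];  // pointer to an array of 2 ints
int (plain_var);       // redundant parentheses: same as `int plain_var;`

static_assert(std::is_same<decltype(first_var), int *[2]>::value,
              "without parentheses, [] binds before *");
static_assert(std::is_same<decltype(second_var), int (*)[2]>::value,
              "parentheses make * bind first");
static_assert(std::is_same<decltype(plain_var), int>::value,
              "parentheses around a lone name change nothing");
```

If any of the three declarations meant something else, the corresponding static_assert would stop the build.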
C++ does not break backward compatibility if it can help it.
The C that it was developed from had this syntax. So C++ inherited it.
A side effect of this backward compatibility are the vexing parse problems. They have not proved sufficiently vexing to justify breaking backward compatibility.
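For a macro like the ASSERT in the question, one possible way around the declaration parse is to make the expansion something that can only be an expression. The sketch below assumes a cast to void is acceptable in the code base; the constructed() counter and check_pointer are hypothetical additions that only make the effect observable.

```cpp
#include <cassert>

// Stand-in for the MyAssert type from the question, with a hypothetical
// counter recording how many temporaries were constructed.
struct MyAssert {
    static int& constructed() {
        static int count = 0;
        return count;
    }
    MyAssert(bool) { ++constructed(); }
};

// The leading cast means the statement cannot start a declaration, so
// ASSERT(ptr) builds a temporary instead of declaring a variable `ptr`.
#define ASSERT(cond) ((void)MyAssert(cond))

// Hypothetical helper mirroring func() from the question.
int check_pointer(void* ptr) {
    ASSERT(ptr);
    return MyAssert::constructed();
}
```

With this expansion the parameter ptr is used as the constructor argument rather than being shadowed.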
I have written a piece of code which writes either to console or to a file depending upon the boolean value set by user.
The code looks like this.
#include <iostream>
#include <fstream>
int main()
{
bool bDump;
std::cout<<"bDump bool"<<std::endl;
std::cin>>bDump;
std::ostream* osPtr;
std::ofstream files;
if(bDump)
{
files.open("dump.txt");
osPtr = &files;
}
else
{
osPtr = &std::cout;
}
std::ostream& stream = *osPtr;
stream<<"hello";
if(bDump)
{
files.close();
}
return 0;
}
Here I am creating a std::ostream pointer and, depending on the boolean value, assigning it the address of either an ofstream object or std::cout. My only concern is whether file operations like open and close are done properly. As I am new to C++, please help me out. Also point out any bad programming practice being followed here.
It's correct and works.
The main thing I would do differently is not to explicitly call close() as this is done automatically by the destructor.
You can simplify your code slightly (and get rid of the pointer) with the ternary operator:
#include <iostream>
#include <fstream>
int main()
{
bool bDump;
std::cout << "bDump bool"<<std::endl;
std::cin >> bDump;
std::ofstream files;
std::ostream& stream = (bDump) ? (files.open("dump.txt"), files)
: std::cout;
stream<<"hello";
}
There's no potential leak. However, if an exception is thrown by
stream<<"hello";
then
files.close();
will never be called, but for your specific example of code there's no concern. ofstream's destructor happens to call close() for you.
You did everything fine, but there is no need for the close() at the end, because in C++ we rely on destructors to clean up for us, and std::ofstream has one which closes the file automatically.
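The same selection can be wrapped in a small function so the caller never calls close(). This is only a sketch: choose_stream is a hypothetical name, and the ofstream is owned by the caller so that its destructor closes the file when it goes out of scope.

```cpp
#include <cassert>
#include <fstream>
#include <iostream>
#include <string>

// Hypothetical helper: opens `file` and returns it when dumping is
// requested, otherwise returns std::cout and leaves `file` closed.
std::ostream& choose_stream(bool dump, std::ofstream& file,
                            const std::string& path) {
    if (dump) {
        file.open(path);
        return file;
    }
    return std::cout;
}
```

A caller would write std::ofstream files; std::ostream& stream = choose_stream(bDump, files, "dump.txt"); and then simply let files go out of scope.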
You can also omit the return 0; statement at the bottom of main() in C++: 0 (success, really) will be returned by default.