Cleaning up a timed-out future - C++

I need to run a function with a timeout. If it doesn't return within the given timeout, I need to discard it and fall back to a different method.
Following is a (greatly) simplified sample to highlight the problem. (In reality, this is an always-running, highly available application. There I first read from the cache and query the database only if the cache has stale data. However, if the database query takes too long, I need to continue with the stale data.)
My question is: in the case where the future read timed out, do I have to handle the clean-up of the future separately (i.e., keep a copy and check from time to time whether it is ready), or can I simply ignore it (i.e., keep the code as is)?
/* DB query can be time-consuming, but the result is fresh */
string readFromDatabase(){
    // ...
    // auto dbValue = db.query("select name from users where id=" + _id);
    // ...
    return dbValue;
}

/* Cache query is instant, but the result could be stale */
string readFromLocalCache(){
    // ...
    // auto cachedVal = _cache[_id];
    // ...
    return cachedVal;
}
string getValue(){
    // Idea:
    // - Try reading from the database.
    // - If the db query didn't return within 1 second, fall back to the other method.
    using namespace std::chrono_literals;
    auto fut = std::async(std::launch::async, [&](){ return readFromDatabase(); });
    switch (fut.wait_for(1s)){
        case std::future_status::ready: // query returned within the allotted time
        {
            auto freshVal = fut.get();
            // update cache
            return freshVal;
        }
        case std::future_status::timeout: // timed out, fall back ------ (*)
        {
            break;
        }
        case std::future_status::deferred: // should not be reached with std::launch::async
        {
            break;
        }
    }
    return readFromLocalCache();
    // question: what happens to `fut`?
}

From my personal perspective, it depends on what you want. Under your current (minimal) implementation, the getValue function will be blocked by the future's destructor (see the cppreference page and related SO questions): the destructor of a future returned by std::async waits until the shared state is ready.
If you do not want the blocking behavior, there are some solutions, as proposed in this question, like:
move the future to some outside scope (sketched below)
use a detached executor and some handy code/data structure to handle the return status
see if you can replace the future with I/O operations that support timeouts, like select/poll
etc.
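For the first option, here is a minimal sketch; the pendingQueries container and drainPendingQueries are illustrative names, not part of the original code. The timed-out future is moved into a longer-lived container, so its blocking destructor runs later, off the hot path.

#include <chrono>
#include <future>
#include <string>
#include <vector>

std::string readFromDatabase();   // as in the question
std::string readFromLocalCache(); // as in the question

// Outlives getValue(); not thread-safe by itself - add a mutex if
// getValue() and the draining can race.
std::vector<std::future<std::string>> pendingQueries;

std::string getValue() {
    using namespace std::chrono_literals;
    auto fut = std::async(std::launch::async, readFromDatabase);
    if (fut.wait_for(1s) == std::future_status::ready) {
        return fut.get();
    }
    // Timed out: park the future so its blocking destructor
    // does not run here.
    pendingQueries.push_back(std::move(fut));
    return readFromLocalCache();
}

// Called periodically (e.g., on the next tick) to reap finished queries.
void drainPendingQueries() {
    for (auto it = pendingQueries.begin(); it != pendingQueries.end();) {
        if (it->wait_for(std::chrono::seconds(0)) == std::future_status::ready) {
            it->get();  // could refresh the cache with the fresh value here
            it = pendingQueries.erase(it);
        } else {
            ++it;
        }
    }
}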

Related

Ending a loop on time expiration

I have a long computation in a loop, which I need to end prematurely if the allowed compute time expires (and return a partially computed result). I plan to do it via a SIGALRM handler and a timer:
// Alarm handler will set it to true.
bool expired = false;

int compute ()
{
    int result;
    // Computation loop:
    for (...) {
        // Computation here.
        if (expired)
            break;
    }
    return result;
}
My question is: how to correctly define the expired variable (volatile bool, std::atomic<bool>, std::sig_atomic_t, etc.), how to set it to true in the signal handler (just an assignment or an atomic operation), and how to check its value in the compute function?
This is single-threaded C++17 code...
If you aren't using multiple threads, you don't need an atomic operation. Just set the global variable expired = true in the signal handler.
EDIT: as #Frank demonstrated below, the compiler might optimize the check away. You can avoid this by declaring the flag volatile; the type the standard actually guarantees is safe to assign from a signal handler is volatile std::sig_atomic_t.
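A minimal sketch of that approach, assuming POSIX (SIGALRM and alarm()); onAlarm and the 10-second budget are illustrative:

#include <csignal>
#include <unistd.h>  // alarm() - POSIX

// volatile std::sig_atomic_t is the type the standard guarantees can be
// safely assigned from a signal handler.
volatile std::sig_atomic_t expired = 0;

extern "C" void onAlarm(int) { expired = 1; }

int compute()
{
    int result = 0;
    std::signal(SIGALRM, onAlarm);
    alarm(10);  // deliver SIGALRM after 10 seconds

    for (long i = 0; i < 2000000000L; ++i) {
        // ... one computation step, updating result ...
        if (expired)
            break;  // time is up: return the partial result
    }
    return result;
}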
Unless one iteration takes a considerable amount of time, I would suggest that you don't bother with signals and simply check the elapsed time (e.g. via gettimeofday() or std::chrono) at each iteration.
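A sketch of this polling alternative using std::chrono (steady_clock rather than gettimeofday, so the clock can't jump); the budget parameter is illustrative:

#include <chrono>

int compute(std::chrono::milliseconds budget)
{
    using clock = std::chrono::steady_clock;
    const auto deadline = clock::now() + budget;

    int result = 0;
    for (long i = 0; i < 2000000000L; ++i) {
        // ... one computation step, updating result ...

        // If a single iteration is very cheap, check the clock only
        // every N iterations to reduce overhead.
        if ((i & 1023) == 0 && clock::now() >= deadline)
            break;  // budget exhausted: return the partial result
    }
    return result;
}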

Can't save CDocument in a worker thread -- object is destroyed from memory before thread starts

Overview
I need to save a CDocument in a background worker thread. There is a point in our MFC application which prompts the user to save before continuing. Normally, they are able to continue without saving, and there is no problem. However, occasionally we need that document later in the process, so if the user clicks "No", we want to save a temp version of the file in the background without making the user wait for the save before continuing.
Problem
When I launch AfxBeginThread(SaveDocumentThread, &threadInput), the threadInput object has already been destroyed by the time SaveDocumentThread starts.
Code
BOOL SPackagerDoc::OnSaveDocument( IN LPCTSTR lpszPathName)
{
    ProcessDocumentThreadInput threadInput(this, lpszPathName);

    // Temp Save Mode
    if (m_bTempMode)
    {
        m_TempSaveThread = AfxBeginThread(SaveDocumentThread, &threadInput);

        // This fixes the problem, but is considered unstable
        // if (m_TempSaveThread->m_hThread)
        //     WaitForSingleObject(m_TempSaveThread->m_hThread, 500);

        return TRUE;
    }

    // Normal save mode
    SFileLoadingDialog loadingDialog(SFileLoadingDialog::SAVE, lpszPathName, SaveDocumentThread, &threadInput);
    BOOL result = (BOOL)loadingDialog.DoModal();
    return result;
}

StUInt32 SPackagerDoc::SaveDocumentThread(IN StVoid* pParam)
{
    ProcessDocumentThreadInput* input = (ProcessDocumentThreadInput*)pParam;
    ASSERT_NOT_NULL(input);
    ASSERT_NOT_NULL(input->pPackager);
    ASSERT_NOT_NULL(input->pszPathName);

    CString path_name(input->pszPathName);
    BOOL result = input->pPackager->SPackagerDocBase::OnSaveDocument(path_name);
    return result;
}
If I uncomment WaitForSingleObject(..., 500); then the thread starts, all the information is present, and there are no errors. But if I remove those lines, then in SaveDocumentThread input is NULL and all data is zeros or garbage.
Is there a way to ensure SaveDocumentThread has started before moving on, i.e. wait for the thread to start, but not for a fixed amount of time (500 ms)? It may be that 500 ms is not a sufficient wait on some other computers.
Is there an "official" way to do this?
This is an issue of variable lifetime.
The following comments mark the lifetime of the local variable threadInput.
ProcessDocumentThreadInput threadInput(this, lpszPathName);   // <=== threadInput created

if (m_bTempMode)
{
    m_TempSaveThread = AfxBeginThread(SaveDocumentThread, &threadInput);

    // This fixes the problem, but is considered unstable
    // if (m_TempSaveThread->m_hThread)
    //     WaitForSingleObject(m_TempSaveThread->m_hThread, 500);

    return TRUE;                                              // <=== threadInput destroyed
}
Your workaround WaitForSingleObject() delays the return, and with it the destruction of threadInput, which is why it appears to work.
To outlive the local scope, you can:
Store it in a class member variable (see the sketch below).
Store it behind a (preferably smart) pointer allocated on the heap, so you don't have to handle its destruction by hand.
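A minimal sketch of the first option, assuming the input can live in the document object itself (the member name m_pThreadInput is illustrative):

class SPackagerDoc : public SPackagerDocBase
{
    // Owned by the document, so it is still alive when the worker
    // thread starts, long after OnSaveDocument() has returned.
    std::unique_ptr<ProcessDocumentThreadInput> m_pThreadInput;
    // ...
};

BOOL SPackagerDoc::OnSaveDocument(IN LPCTSTR lpszPathName)
{
    if (m_bTempMode)
    {
        // Caveat: a second temp save while the first thread is still
        // running would overwrite m_pThreadInput; guard against that
        // if it can happen.
        m_pThreadInput.reset(new ProcessDocumentThreadInput(this, lpszPathName));
        m_TempSaveThread = AfxBeginThread(SaveDocumentThread, m_pThreadInput.get());
        return TRUE;
    }
    // ... normal save mode as before ...
}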
Edit:
As #Jabberwocky stated, OnSaveDocument() might be called more than once, since it is also called from the background thread.
I suggest refactoring the actual save logic out into a separate function and letting the two branches call it separately.
As others have pointed out, the problem is the lifetime of threadInput ends before the thread begins.
You can dynamically allocate the instance of ProcessDocumentThreadInput and pass the pointer to that instance to the thread.
auto* threadInput = new ProcessDocumentThreadInput(this, lpszPathName);
...
AfxBeginThread(SaveDocumentThread, threadInput);
However, in this case, the responsibility to release the memory gets messy.
Since you put the C++11 tag on your question, you might want to make use of std::shared_ptr or std::unique_ptr and pass it to the thread, which would mean using std::thread instead of AfxBeginThread. (Note that std::make_unique below is C++14; in C++11 use std::unique_ptr<T>(new T(...)). BTW, I have no experience using MFC.)
BOOL SPackagerDoc::OnSaveDocument( IN LPCTSTR lpszPathName)
{
    ...
    std::thread t(SaveDocumentThread, std::make_unique<ProcessDocumentThreadInput>(this, lpszPathName));
    ...
}
...
StUInt32 SaveDocumentThread(std::unique_ptr<ProcessDocumentThreadInput>&& threadInput)
{
    ...
}

I don't understand how optimistic concurrency can be implemented in C++11

I'm trying to implement a protected variable that does not use locks in C++11. I have read a little about optimistic concurrency, but I can't understand how it can be implemented, either in C++ or in any other language.
The way I'm trying to implement optimistic concurrency is by using a 'last modification id'. The process is:
Take a copy of the last modification id.
Modify the protected value.
Compare the local copy of the modification id with the current one.
If the above comparison is true, commit the changes.
The problem I see is that, after comparing the 'last modification ids' (the local copy and the current one) and before committing the changes, there is no way to ensure that no other thread has modified the value of the protected variable.
Below is an example of the code. Let's suppose there are many threads executing it and sharing the variable var.
/**
 * This struct is intended to implement a protected variable,
 * using optimistic concurrency instead of locks.
 */
struct ProtectedVariable final {
    ProtectedVariable() : var(0), lastModificationId(0){ }

    int getValue() const {
        return var.load();
    }

    void setValue(int val) {
        // This method is not atomic: another thread could observe the new
        // value before the 'last modification id' has been incremented.
        var.store(val);
        lastModificationId.store(lastModificationId.load() + 1);
    }

    size_t getLastModificationId() const {
        return lastModificationId.load();
    }

private:
    std::atomic<int> var;
    std::atomic<size_t> lastModificationId;
};
ProtectedVariable var;

/**
 * Suppose this method writes a value in some sort of database.
 */
int commitChanges(int val, size_t currModifId){
    // Now, if nobody has changed the value of 'var', commit its value,
    // retry the transaction otherwise.
    if(var.getLastModificationId() == currModifId) {
        // Here is one of the problems. After comparing the two ids, another
        // thread could modify the value of 'var', hence I would be
        // performing the commit with a corrupted value.
        var.setValue(val);
        // Again, the same problem as above.
        writeToDatabase(val);
        // Return 'ok' in case everything has gone ok.
        return 0;
    } else {
        // If someone has changed the value of var while we were
        // calculating and committing it, return an error.
        return -1;
    }
}

/**
 * This method is intended to behave atomically, but without using locks.
 */
void modifyVar(){
    // Get the modification id for checking whether some other thread
    // modifies the value of 'var' while we compute the new value.
    size_t currModifId = var.getLastModificationId();
    // Get a local copy of 'var'.
    int currVal = var.getValue();
    // Perform some operations based on the current value of 'var'.
    int newVal = currVal + 1 * 2 / 3;
    if(commitChanges(newVal, currModifId) != 0){
        // If someone has changed the value of var while we were
        // calculating and committing it, retry the transaction.
        modifyVar();
    }
}
I know that the above code is buggy, but I don't understand how to implement something like it correctly.
Optimistic concurrency doesn't mean that you don't use locks; it merely means that you don't hold the locks during most of the operation.
The idea is that you split your modification into three parts:
Initialization, like getting the lastModificationId. This part may need locks, but not necessarily.
Actual computation. All expensive or blocking code goes here (including any disk writes or network code). The results are written in such a way that they do not obscure the previous version. A likely way this works is by storing the new values next to the old ones, indexed by a not-yet-committed version.
Atomic commit. This part is locked, and must be short, simple, and non-blocking. A likely way this works is that it just bumps the version number, after confirming that no other version was committed in the meantime. No database writes at this stage.
The main assumption here is that the computation part is much more expensive than the commit part. If your modification is trivial and the computation cheap, then you can just use a lock, which is much simpler.
Some example code structured into these 3 parts could look like this:
struct Data {
    ...
};
...
std::mutex lock;
volatile const Data* value;  // The protected data
volatile int current_value_version = 0;
...
bool modifyProtectedValue() {
    // Initialize.
    int version_on_entry = current_value_version;

    // Compute the new value, using the current value.
    // We don't have any lock here, so it's fine to make heavy
    // computations or block on I/O.
    Data* new_value = new Data;
    compute_new_value(value, new_value);

    // Commit or fail.
    bool success;
    lock.lock();
    if (current_value_version == version_on_entry) {
        value = new_value;
        current_value_version++;
        success = true;
    } else {
        success = false;
    }
    lock.unlock();

    // Roll back in case of failure.
    if (!success) {
        delete new_value;
    }

    // Inform the caller about success or failure.
    return success;
}

// It's cleaner to keep the retry logic separate.
bool retryModification(int retries = 5) {
    for (int i = 0; i < retries; ++i) {
        if (modifyProtectedValue()) {
            return true;
        }
    }
    return false;
}
This is a very basic approach, and the rollback in particular is trivial. In a real-world example, re-creating the whole Data object (or its counterpart) would likely be infeasible, so the versioning would have to be done somewhere inside, and the rollback could be much more complex. But I hope it shows the general idea.
The key here is acquire-release semantics and test-and-increment. Acquire-release semantics are how you enforce an order of operations. Test-and-increment is how you choose which thread wins in case of a race.
Your problem therefore is the .store(lastModificationId + 1). You'll need .fetch_add(1), which returns the old value. If that's not the expected value (the one from before your read), then you lost the race and must retry.
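For a single int like var, the whole read-compute-commit cycle can also be expressed directly as a compare-and-swap loop on the value itself, with no separate modification id. A minimal sketch (my example, not the poster's code):

#include <atomic>

std::atomic<int> var{0};

// Optimistic update: read a snapshot, compute from it, and commit only
// if var still holds the snapshot; otherwise reload and retry.
void modifyVar()
{
    int cur = var.load(std::memory_order_acquire);
    int newVal;
    do {
        newVal = cur + 1;  // stand-in for the real computation
        // On failure, compare_exchange_weak reloads cur with the current
        // value, so the next iteration recomputes from fresh data.
    } while (!var.compare_exchange_weak(cur, newVal,
                                        std::memory_order_acq_rel,
                                        std::memory_order_acquire));
}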
If I understand your question, you want to make sure that var and lastModificationId are either both changed, or neither is.
Why not use std::atomic<T>, where T is a structure that holds both the int and the size_t?
struct VarWithModificationId {
    int var;
    size_t lastModificationId;
};

class ProtectedVariable {
private:
    std::atomic<VarWithModificationId> protectedVar;
public:
    // Add your public setter/getter methods here.
    // You should be guaranteed that if two threads access protectedVar,
    // they'll each get a 'consistent' view of that variable, but the
    // setter will need to use a lock (or a compare-exchange loop, as
    // sketched below).
};
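A sketch of what a lock-free setter could look like, under the assumption that the struct fits in 8 bytes so std::atomic<VarWithModificationId> is lock-free on common 64-bit platforms (worth verifying with is_lock_free()). The 32-bit fields are my assumption and differ from the original int/size_t pair, which would usually be 16 bytes with padding and may fall back to an internal lock:

#include <atomic>
#include <cstdint>

// Packed into 8 bytes with no padding, so compare_exchange compares
// no indeterminate padding bits and the atomic is typically lock-free.
struct VarWithModificationId {
    std::int32_t  var;
    std::uint32_t lastModificationId;
};

class ProtectedVariable {
    std::atomic<VarWithModificationId> protectedVar;
public:
    ProtectedVariable() : protectedVar(VarWithModificationId{0, 0}) {}

    // Reads value and id as one consistent pair.
    VarWithModificationId get() const { return protectedVar.load(); }

    // Replaces the value and bumps the id in a single atomic step.
    void setValue(std::int32_t val) {
        VarWithModificationId cur = protectedVar.load();
        VarWithModificationId next{};
        do {
            next.var = val;
            next.lastModificationId = cur.lastModificationId + 1;
        } while (!protectedVar.compare_exchange_weak(cur, next));
    }
};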
Optimistic concurrency is used in database engines when it's expected that different users will rarely access the same data. It can go like this:
The first user reads the data and a timestamp. The user works with the data for some time, then checks whether the timestamp in the DB has changed since the data was read; if it hasn't, the user updates both the data and the timestamp.
Internally, though, the DB engine uses locks for the update anyway: during that lock it checks whether the timestamp has changed and, if it hasn't, updates the data. The time for which the data is locked is simply much shorter than with pessimistic concurrency. So you still need some kind of locking.

Ensuring that only one instance of a function is running?

I'm just getting into concurrent programming. Most probably my issue is very common, but since I can't find a good name for it, I can't google it.
I have a C++ UWP application where I try to apply MVVM pattern, but I guess that the pattern or even being UWP is not relevant.
First, I have a service interface that exposes an operation:
struct IService
{
virtual task<int> Operation() = 0;
};
Of course, I provide a concrete implementation, but it is not relevant for this discussion. The operation is potentially long-running: it makes an HTTP request.
Then I have a class that uses the service (again, irrelevant details omitted):
class ViewModel
{
unique_ptr<IService> service;
public:
task<void> Refresh();
};
I use coroutines:
task<void> ViewModel::Refresh()
{
auto result = co_await service->Operation();
// use result to update UI
}
The Refresh function is invoked on timer every minute, or in response to a user request. What I want is: if a Refresh operation is already in progress when a new one is started or requested, then abandon the second one and just wait for the first one to finish (or time out). In other words, I don't want to queue all the calls to Refresh - if a call is already in progress, I prefer to skip a call until the next timer tick.
My attempt (probably very naive) was:
mutex refresh;
task<void> ViewModel::Refresh()
{
unique_lock<mutex> lock(refresh, try_to_lock);
if (!lock)
{
// lock.release(); commented out as harmless but useless => irrelevant
co_return;
}
auto result = co_await service->Operation();
// use result to update UI
}
Edit after the original post: I commented out the line in the code snippet above, as it makes no difference. The issue is still the same.
But of course an assertion fails: unlock of unowned mutex. I guess the problem is the unlock of the mutex by the unique_lock destructor, which happens in the continuation of the coroutine, on a different thread than the one that locked it (unlocking a std::mutex from another thread is undefined behavior).
Using Visual C++ 2017.
use std::atomic_bool:

std::atomic<bool> isRunning{false};

task<void> ViewModel::Refresh()
{
    if (isRunning.exchange(true, std::memory_order_acq_rel) == false)
    {
        try
        {
            auto result = co_await service->Operation();
            isRunning.store(false, std::memory_order_release);
            // use result to update UI
        }
        catch (...)
        {
            isRunning.store(false, std::memory_order_release);
            throw;
        }
    }
}
Two possible improvements: wrap the isRunning.store in a RAII class (see the sketch below), and use std::shared_ptr<std::atomic_bool> if the lifetime of the atomic_bool is scoped.
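A sketch of that first improvement (RunningGuard is an illustrative name; task is the coroutine type from the question): the flag is cleared on every exit path, including exceptions, without repeating the store.

#include <atomic>

// Clears the flag when the enclosing scope (coroutine frame) ends,
// even if the guarded code throws.
class RunningGuard {
    std::atomic<bool>& flag_;
public:
    explicit RunningGuard(std::atomic<bool>& flag) : flag_(flag) {}
    ~RunningGuard() { flag_.store(false, std::memory_order_release); }
    RunningGuard(const RunningGuard&) = delete;
    RunningGuard& operator=(const RunningGuard&) = delete;
};

std::atomic<bool> isRunning{false};

task<void> ViewModel::Refresh()
{
    if (isRunning.exchange(true, std::memory_order_acq_rel))
        co_return;                   // another Refresh is already in flight
    RunningGuard guard(isRunning);   // released on every exit path
    auto result = co_await service->Operation();
    // use result to update UI
}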

Concurrently processing data. What do I need to watch out for?

I have a routine that is meant to load and parse data from a file. There is a possibility that the data from the same file might need to be retrieved from two places at once, i.e. during a background caching process and from a user request.
Specifically I am using C++11 thread and mutex libraries. We compile with Visual C++ 11 (aka 2012), so are limited by whatever it lacks.
My naive implementation went something like this:
map<wstring, weak_ptr<DataStruct>> data_cache;
mutex data_cache_mutex;

shared_ptr<DataStruct> ParseDataFile(wstring file_path) {
    auto data_ptr = make_shared<DataStruct>();
    /* Parses and processes the data, may take a while */
    return data_ptr;
}

shared_ptr<DataStruct> CreateStructFromData(wstring file_path) {
    lock_guard<mutex> lock(data_cache_mutex);
    auto cache_iter = data_cache.find(file_path);
    if (cache_iter != end(data_cache)) {
        auto data_ptr = cache_iter->second.lock();
        if (data_ptr)
            return data_ptr;
        // reference died, remove it
        data_cache.erase(cache_iter);
    }
    auto data_ptr = ParseDataFile(file_path);
    if (data_ptr)
        data_cache.emplace(make_pair(file_path, data_ptr));
    return data_ptr;
}
My goals were two-fold:
Allow multiple threads to load separate files concurrently
Ensure that a file is only processed once
The problem with my current approach is that it doesn't allow concurrent parsing of multiple files at all. If I understand correctly what will happen, the threads will each hit the lock and end up processing linearly, one at a time. The order in which the threads pass through the lock may change from run to run, but the end result is the same.
One solution I've considered was to create a second map:
map<wstring, mutex> data_parsing_mutex;

shared_ptr<DataStruct> ParseDataFile(wstring file_path) {
    lock_guard<mutex> lock(data_parsing_mutex[file_path]);
    /* etc. */
    data_parsing_mutex.erase(file_path);
}
But now I have to be concerned with how data_parsing_mutex is being updated. So I guess I need another mutex?
map<wstring, mutex> data_parsing_mutex;
mutex data_parsing_mutex_mutex;

shared_ptr<DataStruct> ParseDataFile(wstring file_path) {
    unique_lock<mutex> super_lock(data_parsing_mutex_mutex);
    lock_guard<mutex> lock(data_parsing_mutex[file_path]);
    super_lock.unlock();
    /* etc. */
    super_lock.lock();
    data_parsing_mutex.erase(file_path);
}
In fact, looking at this, it's not necessarily going to avoid double-processing a file if the background process hasn't completed it when the user requests it, unless I check the cache yet again.
But by now my spidey senses are saying There must be a better way. Is there? Would futures, promises, or atomics help me at all here?
From what you described, it sounds like you're trying to do a form of lazy initialization of the DataStruct using a thread pool, along with a reference counted cache. std::async should be able to provide a lot of the dispatch and synchronization necessary for something like this.
Using std::async, the code would look something like this...
map<wstring, weak_ptr<DataStruct>> cache;
map<wstring, shared_future<shared_ptr<DataStruct>>> pending;
mutex cache_mutex, pending_mutex;

shared_ptr<DataStruct> ParseDataFromFile(wstring file) {
    auto data_ptr = make_shared<DataStruct>();
    /* Parses and processes the data, may take a while */
    return data_ptr;
}

shared_ptr<DataStruct> CreateStructFromData(wstring file) {
    shared_future<shared_ptr<DataStruct>> pf;
    shared_ptr<DataStruct> ce;
    {
        lock_guard<mutex> lock(cache_mutex);
        auto ci = cache.find(file);
        if (!(ci == cache.end() || ci->second.expired()))
            return ci->second.lock();
    }
    {
        lock_guard<mutex> lock(pending_mutex);
        auto fi = pending.find(file);
        if (fi == pending.end()) {
            // Not yet being parsed: launch the parse asynchronously.
            pf = async(launch::async, ParseDataFromFile, file).share();
            pending.insert(fi, make_pair(file, pf));
        } else {
            // Already being parsed by someone else: share their future.
            pf = fi->second;
        }
    }
    pf.wait();
    ce = pf.get();
    {
        lock_guard<mutex> lock(cache_mutex);
        auto ci = cache.find(file);
        if (ci == cache.end() || ci->second.expired())
            cache[file] = ce;
    }
    {
        lock_guard<mutex> lock(pending_mutex);
        auto pi = pending.find(file);
        if (pi != pending.end())
            pending.erase(pi);
    }
    return ce;
}
This can probably be optimized a bit, but the general idea should be the same.
On a typical computer there is little point in trying to load files concurrently, since disk access will be the bottleneck. Instead, it's better to have a single thread load files (or use asynchronous I/O) and dish out the parsing to a thread pool. Then store the results in a shared container.
Regarding preventing double work, you should consider whether it is really necessary. If you are only doing this as a premature optimization, you'd probably make users happier by focusing on making the program responsive rather than efficient. That is, make sure the user gets what they ask for quickly, even if it means doing double work.
OTOH, if there is a technical reason for not parsing a file twice, you can keep track of the status of each file (loading, parsing, parsed) in the shared container, as sketched below.
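One hedged sketch of that status-tracking idea, assuming a C++11 compiler (all names here are illustrative, and eviction of finished entries is omitted): the first requester claims the file and parses it outside the lock; later requesters block on a condition_variable until the status flips to parsed, so no file is processed twice.

#include <condition_variable>
#include <map>
#include <memory>
#include <mutex>
#include <string>

struct DataStruct { /* parsed contents */ };

struct FileEntry {
    enum Status { Parsing, Parsed };
    Status status;
    std::shared_ptr<DataStruct> data;
    FileEntry() : status(Parsing) {}
};

std::map<std::wstring, FileEntry> file_table;
std::mutex table_mutex;
std::condition_variable table_cv;

std::shared_ptr<DataStruct> ParseDataFile(const std::wstring& path) {
    auto data = std::make_shared<DataStruct>();
    /* parse the file at 'path'; may take a while */
    return data;
}

std::shared_ptr<DataStruct> CreateStructFromData(const std::wstring& path) {
    std::unique_lock<std::mutex> lock(table_mutex);
    auto it = file_table.find(path);
    if (it == file_table.end()) {
        file_table[path];            // claim: inserts an entry in Parsing state
        lock.unlock();               // parse without holding the lock
        auto data = ParseDataFile(path);
        lock.lock();
        FileEntry& entry = file_table[path];
        entry.data = data;
        entry.status = FileEntry::Parsed;
        table_cv.notify_all();       // wake any waiters for this file
        return data;
    }
    // Another thread is (or was) parsing this file: wait until done.
    table_cv.wait(lock, [&path] {
        return file_table[path].status == FileEntry::Parsed;
    });
    return file_table[path].data;
}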