Waiting Thread or Create New Thread? - c++

I have a decision to make regarding the way I code something that runs on an embedded platform, and I am hoping there is a general "rule of thumb" that can be applied in this case. Coding both my ideas and then benchmarking would obviously be the best way to go, but getting any meaningful, or rather accurate, results out of this platform would, in my particular case, be quite tricky. I'm also sure there may be others who have the same question on their respective platforms, so I decided to ask it here. Please be kind, as I'm not very familiar with the threading library, so constructive feedback would be useful.
I have many threads (well, about 10-20 at maximum) all wanting to write to this hardware device. So I decided on using a simple ring-buffer consisting of 2 buffers (primary/secondary) of 8k each. This way each incoming thread can be dealt with in a timely fashion. An arriving thread obtains a mutex, writes into the primary buffer, and then releases the mutex ready for the next thread. Now, when the primary buffer is full, new incoming threads obviously switch to using the secondary buffer, and you start to write the primary buffer to the hardware device.
So the question really is... How best to write to the hardware device??? I'm thinking that there are two choices:
As soon as the buffer is full, create a new thread that does the write operation.
Signal a pre-created waiting worker-thread to do the write operation.
Both of the options seem to come with their respective pros/cons. Option 1 is the simplest to code and there are a number of ways to do this, but its effectiveness depends on how expensive it is to create/start the thread. The thread would be created, it would perform the write operation, and then it would die. Option 2, however, seems to be the most performant, but if you're going to have a reusable thread, you need a mutex and a couple of condition variables to control it: one to notify the thread that data is ready, and another to ask the thread to terminate when the program ends. Add to that a sprinkle of atomics for spurious wake-ups/missed notifications etc., and you've got quite an intricate solution to get right.
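For illustration, Option 1 would look roughly like this (only a sketch; the function name is made up and error/lifetime handling is omitted):
void OnPrimaryBufferFull(Bar* full_buffer)
{
    std::thread writer([full_buffer]()
    {
        <<< WRITE full_buffer TO DEDICATED HARDWARE >>>
    });
    writer.detach(); // the thread performs the write and then dies
}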
So what is the best method here? Are threads in general heavy to create/start or is this something that is completely platform dependent and benchmarking is the only way to know? Is there any benefit to using one method over the other that I've not thought about?
-- This is for the people not suffering from TL;DR syndrome --
I'm sure some of you have already wondered what happens if the secondary buffer becomes full before the write operation has finished? The answer in my case is fairly simple: this should never happen! Although the write operation is slow, it would never be slow enough such that the secondary buffer is filled before the write is complete. However, if someone is going to use this ring-buffer method, they must be prepared for this contingency. The way I thought about tackling this is to have a second mutex that is held during the write operation. This would mean that the thread that was due to write to the buffer would block until the write completed and the mutex was released.
Here's roughly what I ended up with after going with Option 2, but it seems awfully messy. I actually wanted to use promises/futures to avoid the spin-lock predicates on the condition variable, but couldn't think of a good way of moving a promise to an already created thread. Anyway... nice feedback is appreciated; as for bad feedback, well, I'm not overly familiar with the threading library.
class Bar
{
public:
    Bar(const size_t size) : buffer(new uint8_t[size]), buffer_size(size), used_size(0) {}

    size_t GetRemainingBufferSize(void) const { return buffer_size - used_size; }
    size_t GetUsedBufferSize(void) const { return used_size; }
    const uint8_t* GetBuffer(void) const { return buffer.get(); }
    size_t GetBufferSize(void) const { return buffer_size; }

    void ResetBuffer(void) { used_size = 0; }
    void WriteIntoBuffer(const std::vector<uint8_t>& data)
    {
        std::copy(data.begin(), data.end(), buffer.get() + used_size);
        used_size += data.size();
    }

private:
    std::unique_ptr<uint8_t[]> buffer;
    size_t buffer_size;
    size_t used_size;
};
class Foo
{
public:
    Foo(const size_t buffer_size = 8192)
        : bar_buffers{ buffer_size, buffer_size }
        , primary_buffer(&bar_buffers[0])
        , secondary_buffer(&bar_buffers[1])
        , write_predicate(false)
        , quit_predicate(false)
        , write_buffer(primary_buffer)
    {
        foo_thread = std::thread(&Foo::WriteHWThread, this);
    }

    ~Foo()
    {
        {
            // Take the lock before setting the flag, so the worker cannot
            // check the predicate and then block in the window between the
            // store and the notify.
            std::lock_guard<std::mutex> write_lk(write_lock);
            quit_predicate = true;
        }
        begin_write.notify_one();
        if (foo_thread.joinable())
            foo_thread.join();
    }

    Foo(const Foo&) = delete;
    Foo& operator=(const Foo&) = delete;

    void WriteData(const std::vector<uint8_t>& data)
    {
        // Hold foo_lock for the whole call: the WriteIntoBuffer below must
        // be serialized as well, not just the buffer swap.
        std::lock_guard<std::mutex> foo_lk(foo_lock);
        if (primary_buffer->GetRemainingBufferSize() < data.size())
        {
            std::unique_lock<std::mutex> write_lk(write_lock);
            write_buffer = primary_buffer;
            write_predicate = true; // set under the lock, same reason as above
            write_lk.unlock();
            std::swap(primary_buffer, secondary_buffer);
            primary_buffer->ResetBuffer();
            begin_write.notify_one();
        }
        primary_buffer->WriteIntoBuffer(data);
    }

    void WriteHWThread(void)
    {
        do
        {
            std::unique_lock<std::mutex> write_lk(write_lock);
            begin_write.wait(write_lk, [&]() -> bool { return write_predicate.load() || quit_predicate.load(); });
            write_predicate = false;
            if (write_buffer.load()->GetUsedBufferSize())
                <<< WRITE TO DEDICATED HARDWARE >>>
            write_lk.unlock();
        } while (!quit_predicate);
    }

private:
    Bar bar_buffers[2];
    Bar* primary_buffer, *secondary_buffer;
    std::atomic<bool> write_predicate, quit_predicate;
    std::atomic<Bar*> write_buffer;
    std::mutex foo_lock, write_lock;
    std::thread foo_thread;
    std::condition_variable begin_write;
};
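As an aside, regarding the promise/future idea I mentioned: one pattern I've since come across for moving work (and the promise that goes with it) to an already-created thread is a queue of std::packaged_task. A minimal sketch, assuming a single worker and with shutdown handling omitted:
#include <condition_variable>
#include <functional>
#include <future>
#include <mutex>
#include <queue>

std::queue<std::packaged_task<void()>> tasks;
std::mutex tasks_lock;
std::condition_variable tasks_ready;

void WorkerThread()
{
    for (;;)
    {
        std::packaged_task<void()> task;
        {
            std::unique_lock<std::mutex> lk(tasks_lock);
            tasks_ready.wait(lk, [] { return !tasks.empty(); });
            task = std::move(tasks.front());
            tasks.pop();
        }
        task(); // runs the work and makes the corresponding future ready
    }
}

std::future<void> Submit(std::function<void()> fn)
{
    std::packaged_task<void()> task(std::move(fn));
    std::future<void> result = task.get_future();
    {
        std::lock_guard<std::mutex> lk(tasks_lock);
        tasks.push(std::move(task));
    }
    tasks_ready.notify_one();
    return result;
}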

Related

Trying to control multithreaded access to array using std::atomic

I'm trying to control multithreaded access to a vector of data which is fixed in size, so threads will wait until their position in it has been filled before trying to use it, or will fill it themselves if no-one else has yet (while making sure no-one waits around if their position is already filled, or if no-one has started filling it yet).
However, I am struggling to understand a good way to do this, especially involving std::atomic. I'm just not very familiar with C++ multithreading concepts aside from basic std::thread usage.
Here is a very rough example of the problem:
class myClass
{
    struct Data
    {
        int res1;
    };
    std::vector<Data*> myData;

    int foo(unsigned long position)
    {
        if (!myData[position])
        {
            bar(myData[position]);
        }
        // Do something with the data
        return 5 * myData[position]->res1;
    }

    void bar(Data*& data)
    {
        data = new Data;
        // Do a whole bunch of calculations and so-on here
        data->res1 = 42;
    }
};
Now imagine that foo() is being called multi-threaded, and multiple threads may (or may not) be at the same position at once. If that happens, there's a chance that a thread may (between when the Data was created and when bar() is finished) try to actually use the data.
So, what are the options?
1: Make a std::mutex for every position in myData. What if there are 10,000 elements in myData? That's 10,000 std::mutexes, not great.
2: Put a lock_guard around it like this:
std::mutex myMutex;
{
    const std::lock_guard<std::mutex> lock(myMutex);
    if (!myData[position])
    {
        bar(myData[position]);
    }
}
While this works, it also means if different threads are working in different positions, they wait needlessly, wasting all of the threading advantage.
3: Use a vector of chars and a spinlock as a poor man's mutex? Here's what that might look like:
static std::vector<char> positionInProgress;
static std::vector<char> positionComplete;

class myClass
{
    struct Data
    {
        int res1;
    };
    std::vector<Data*> myData;

    int foo(unsigned long position)
    {
        if (positionInProgress[position])
        {
            while (positionInProgress[position])
            {
                ; // do nothing, just wait until it is done
            }
        }
        else
        {
            if (!positionComplete[position])
            {
                // Fill the data and prevent anyone from using it until it is complete
                positionInProgress[position] = true;
                bar(myData[position]);
                positionInProgress[position] = false;
                positionComplete[position] = true;
            }
        }
        // Do something with the data
        return 5 * myData[position]->res1;
    }

    void bar(Data*& data) // take the pointer by reference, as in the first example
    {
        data = new Data;
        // Do a whole bunch of calculations and so-on here
        data->res1 = 42;
    }
};
This seems to work, but none of the test or set operations are atomic, so I have a feeling I'm just getting lucky.
4: What about std::atomic and std::atomic_flag? Well, there are a few problems.
std::atomic_flag doesn't have a way to test without setting in C++11...which makes this kind of difficult.
std::atomic is not movable or copy-constructible, so I cannot make a vector of them (I do not know the number of positions during construction of myClass)
Conclusion:
This is the simplest example I can think of that (likely) compiles and demonstrates my real problem. In reality, myData is a 2-dimensional vector implemented using a special hand-rolled solution, Data itself is a vector of pointers to more complex data types, the data isn't simply returned, etc. This is the best I could come up with.
The biggest problem you're likely to have is that a vector itself is not thread-safe, so you can't do ANY operation that might change the vector (invalidate references to its elements) while another thread might be accessing it, such as resize or push_back. However, if your vector is effectively "fixed" (you set the size prior to ever spawning threads and thereafter only ever access elements using at or operator[] and never modify the vector itself), you can get away with using a vector of atomic objects. In this case you could have:
std::vector<std::atomic<Data*>> myData;
and your code to set up and use an element could look like:
if (!myData[position]) {
    Data* tmp = new Data;
    Data* expected = nullptr; // compare_exchange_strong needs an lvalue
    if (!myData[position].compare_exchange_strong(expected, tmp)) {
        // some other thread did the setup
        delete tmp;
    }
}
myData[position].load()->bar();
Of course you still need to make sure that the operations done on members of Data in bar are themselves thread-safe, as you can get multiple threads calling bar on the same Data instance here.
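Putting it all together, a self-contained sketch of the pattern (the element count and setup values are invented for illustration):
#include <atomic>
#include <vector>

struct Data { int res1; };

// value-initialized, so every element starts out as nullptr
std::vector<std::atomic<Data*>> myData(10000);

int foo(unsigned long position)
{
    if (!myData[position].load()) {
        Data* tmp = new Data;
        tmp->res1 = 42;               // finish ALL setup before publishing
        Data* expected = nullptr;
        if (!myData[position].compare_exchange_strong(expected, tmp))
            delete tmp;               // another thread won the race; use theirs
    }
    return 5 * myData[position].load()->res1;
}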

Multithreaded C++: force read from memory, bypassing cache

I'm working on a personal hobby-time game engine and I'm working on a multithreaded batch executor. I was originally using a concurrent lockless queue and std::function all over the place to facilitate communication between the master and slave threads, but decided to scrap it in favor of a lighter-weight way of doing things that give me tight control over memory allocation: function pointers and memory pools.
Anyway, I've run into a problem:
The function pointer, no matter what I try, is only getting read correctly by one thread while the others read a null pointer and thus fail an assert.
I'm fairly certain this is a problem with caching. I have confirmed that all threads have the same address for the pointer. I've tried declaring it as volatile, intptr_t, std::atomic, and tried all sorts of casting-fu and the threads all just seem to ignore it and continue reading their cached copies.
I've modeled the master and slave in a model checker to make sure the concurrency is good, and there is no livelock or deadlock (provided that the shared variables all synchronize correctly).
void Executor::operator()(int me) {
    while (true) {
        printf("Slave %d waiting.\n", me);
        {
            std::unique_lock<std::mutex> lock(batch.ready_m);
            while (!batch.running) batch.ready.wait(lock);
            running_threads++;
        }
        printf("Slave %d running.\n", me);
        BatchFunc func = batch.func;
        assert(func != nullptr);

        int index;
        if (batch.store_values) {
            while ((index = batch.item.fetch_add(1)) < batch.n_items) {
                void* data = reinterpret_cast<void*>(batch.data_buffer + index * batch.item_size);
                func(batch.share_data, data);
            }
        }
        else {
            while ((index = batch.item.fetch_add(1)) < batch.n_items) {
                void** data = reinterpret_cast<void**>(batch.data_buffer + index * batch.item_size);
                func(batch.share_data, *data);
            }
        }

        // at least one thread finished, so make sure we won't loop back around
        batch.running = false;
        if (running_threads.fetch_sub(1) == 1) { // I am the last one
            batch.done = true; // therefore all threads are done
            batch.complete.notify_all();
        }
    }
}
void Executor::run_batch() {
    assert(!batch.running);
    if (batch.func == nullptr || batch.n_items == 0) return;

    batch.item.store(0);
    batch.running = true;
    batch.done = false;
    batch.ready.notify_all();

    printf("Master waiting.\n");
    {
        std::unique_lock<std::mutex> lock(batch.complete_m);
        while (!batch.done) batch.complete.wait(lock);
    }
    printf("Master ready.\n");

    batch.func = nullptr;
    batch.n_items = 0;
}
batch.func is being set by another function
template<typename SharedT, typename ItemT>
void set_batch_job(void (*func)(const SharedT*, ItemT*), const SharedT& share_data, bool byValue = true) {
    static_assert(sizeof(SharedT) <= SHARED_DATA_MAXSIZE, "Shared data too large");
    static_assert(std::is_pod<SharedT>::value, "Shared data type must be POD");
    assert(std::is_pod<ItemT>::value || !byValue);
    assert(!batch.running);

    batch.func = reinterpret_cast<volatile BatchFunc>(func);
    memcpy(batch.share_data, (void*) &share_data, sizeof(SharedT));
    batch.store_values = byValue;
    if (byValue) {
        batch.item_size = sizeof(ItemT);
    }
    else { // store pointers instead of values
        batch.item_size = sizeof(ItemT*);
    }
    batch.n_items = 0;
}
and here is the struct (and typedef) that it's dealing with
typedef void (*BatchFunc)(const void*, void*);

struct JobBatch {
    volatile BatchFunc func;
    void* const share_data = operator new(SHARED_DATA_MAXSIZE);
    intptr_t const data_buffer = reinterpret_cast<intptr_t>(operator new(EXEC_DATA_BUFFER_SIZE));
    volatile size_t item_size;
    std::atomic<int> item;            // Index into the data array
    volatile int n_items = 0;

    std::condition_variable complete; // slave -> master signal
    std::condition_variable ready;    // master -> slave signal
    std::mutex complete_m;
    std::mutex ready_m;

    bool store_values = false;
    volatile bool running = false;    // there is work to do in the batch
    volatile bool done = false;       // there is no work left to do

    JobBatch();
} batch;
How do I make sure that all the necessary reads and writes to batch.func get synchronized properly between threads?
Just in case it matters: I'm using Visual Studio and compiling an x64 Debug Windows executable. Intel i5, Windows 10, 8GB RAM.
So I did a little reading on the C++ memory model and I managed to hack together a solution using atomic_thread_fence. Everything is probably super broken because I'm crazy and shouldn't roll my own system here, but hey, it's fun to learn!
Basically, whenever you're done writing things that you want other threads to see, you need to call atomic_thread_fence(std::memory_order_release).
On the receiving thread(s), you call atomic_thread_fence(std::memory_order_acquire) before reading shared data.
In my case, release should be done immediately before waiting on a condition variable and acquire should be done immediately before using data written by other threads.
This ensures that the writes on one thread are seen by the others.
I'm no expert, so this is probably not the right way to tackle the problem and will likely be faced with certain doom. For instance, I still have a deadlock/livelock problem to sort out.
tl;dr: it's not exactly a cache thing: threads may not have their data totally in sync with each other unless you enforce that with atomic memory fences.
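A stripped-down sketch of that fence pairing (illustrative only; my real code is messier than this):
#include <atomic>

int shared_payload = 0;          // plain data, written before the flag is set
std::atomic<bool> flag(false);

void producer()
{
    shared_payload = 42;                                  // write the data
    std::atomic_thread_fence(std::memory_order_release);  // publish it
    flag.store(true, std::memory_order_relaxed);
}

void consumer()
{
    while (!flag.load(std::memory_order_relaxed)) { /* spin */ }
    std::atomic_thread_fence(std::memory_order_acquire);  // synchronize
    // shared_payload is now guaranteed to be seen as 42
}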

std::function in combination with thread c++11 fails debug assertion in vector

I want to build a helper class that can accept a std::function (created via std::bind) so that I can call this function repeatedly from another thread:
short example:
void loopme() {
    std::cout << "yay";
}

int main() {
    LoopThread loop = { std::bind(&loopme) };
    loop.start();
    // wait 1 second
    loop.stop();
    // be happy about output
}
However, when calling stop(), my current implementation raises the following error: Debug Assertion Failed (see image: i.stack.imgur.com/aR9hP.png).
Does anyone know why the error is thrown?
I don't even use vectors in this example.
When I don't call loopme from within the thread but write directly to std::cout, no error is thrown.
Here is the full implementation of my class:
class LoopThread {
public:
    LoopThread(std::function<void(LoopThread*, uint32_t)> function)
        : function_{ function }, thread_{ nullptr }, is_running_{ false }, counter_{ 0 } {};
    ~LoopThread();

    void start();
    void stop();
    bool isRunning() { return is_running_; };

private:
    std::function<void(LoopThread*, uint32_t)> function_;
    std::thread* thread_;
    bool is_running_;
    uint32_t counter_;

    void executeLoop();
};
LoopThread::~LoopThread() {
    if (isRunning()) {
        stop();
    }
}

void LoopThread::start() {
    if (is_running_) {
        throw std::runtime_error("Thread is already running");
    }
    if (thread_ != nullptr) {
        throw std::runtime_error("Thread is not stopped yet");
    }
    is_running_ = true;
    thread_ = new std::thread{ &LoopThread::executeLoop, this };
}

void LoopThread::stop() {
    if (!is_running_) {
        throw std::runtime_error("Thread is already stopped");
    }
    is_running_ = false;
    thread_->detach();
}

void LoopThread::executeLoop() {
    while (is_running_) {
        function_(this, counter_);
        ++counter_;
    }
    if (!is_running_) {
        std::cout << "end";
    }
    //delete thread_;
    //thread_ = nullptr;
}
I used the following Googletest code for testing (however, a simple main function containing the code should work):
void testfunction(pft::LoopThread*, uint32_t i) {
    std::cout << i << ' ';
}

TEST(pfFiles, TestLoop)
{
    pft::LoopThread loop{ std::bind(&testfunction, std::placeholders::_1, std::placeholders::_2) };
    loop.start();
    std::this_thread::sleep_for(std::chrono::milliseconds(500));
    loop.stop();
    std::this_thread::sleep_for(std::chrono::milliseconds(2500));
    std::cout << "Why does this fail";
}
Your use of is_running_ is undefined behavior, because you write in one thread and read in another without a synchronization barrier.
Partly due to this, your stop() doesn't stop anything. Even without this UB (i.e., you "fix" it by using an atomic), it just tries to say "oy, stop at some point"; by the end, it does not even attempt to guarantee the stop happened.
Your code calls new needlessly. There is no reason to use a std::thread* here.
Your code violates the rule of 5. You wrote a destructor, then neglected copy/move operations. It is ridiculously fragile.
As stop() does nothing of consequence to stop a thread, the thread holding a pointer to this outlives your LoopThread object. LoopThread goes out of scope, destroying what the pointer your std::thread stores points to. The still-running executeLoop invokes a std::function that has been destroyed, then increments a counter in invalid memory (possibly on the stack, where another variable has been created).
Roughly, there is 1 fundamental error in using std threading in every 3-5 lines of your code (not counting interface declarations).
Beyond the technical errors, the design is wrong as well; using detach is almost always a horrible idea; unless you have a promise you make ready at thread exit and then wait on the completion of that promise somewhere, doing that and getting anything like a clean and dependable shutdown of your program is next to impossible.
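The promise handshake mentioned above looks roughly like this (a sketch; the names are invented):
#include <future>
#include <thread>

void worker(std::promise<void> p)
{
    // ... do the actual work ...
    p.set_value_at_thread_exit(); // promise made ready at thread exit
}

int main()
{
    std::promise<void> finished;
    std::future<void> done = finished.get_future();
    std::thread(worker, std::move(finished)).detach();
    // ... rest of the program ...
    done.wait(); // dependable shutdown: blocks until the detached thread has exited
}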
As a guess, the vector error is because you are stomping all over stack memory and following nearly invalid pointers to find functions to execute. The test system either puts an array index in the spot you are trashing and then the debug vector catches that it is out of bounds, or a function pointer that half-makes sense for your std::function execution to run, or somesuch.
Only communicate through synchronized data between threads. That means atomic data, or mutex guarded, unless you are getting ridiculously fancy. You don't understand threading enough to get fancy. You don't understand threading enough to copy someone who got fancy and properly use it. Don't get fancy.
Don't use new. Almost never, ever use new. Use make_shared or make_unique if you absolutely have to. But use those rarely.
Don't detach a thread. Period. Yes this means you might have to wait for it to finish a loop or somesuch. Deal with it, or write a thread manager that does the waiting at shutdown or somesuch.
Be extremely clear about what data is owned by what thread. Be extremely clear about when a thread is finished with data. Avoid using data shared between threads; communicate by passing values (or pointers to immutable shared data), and get information from std::futures back.
There are a number of hurdles in learning how to program. If you have gotten this far, you have passed a few. But you probably know people who learned along side of you that fell over at one of the earlier hurdles.
Sequence, that things happen one after another.
Flow control.
Subprocedures and functions.
Looping.
Recursion.
Pointers/references and dynamic vs automatic allocation.
Dynamic lifetime management.
Objects and Dynamic dispatch.
Complexity
Coordinate spaces
Message
Threading and Concurrency
Non-uniform address spaces, Serialization and Networking
Functional programming, meta functions, currying, partial application, Monads
This list is not complete.
The point is, each of these hurdles can cause you to crash and fail as a programmer, and getting each of these hurdles right is hard.
Threading is hard. Do it the easy way. Dynamic lifetime management is hard. Do it the easy way. In both cases, extremely smart people have mastered the "manual" way to do it, and the result is programs that exhibit random unpredictable/undefined behavior and crash a lot. Muddling through manual resource allocation and deallocation and multithreaded code can be made to work, but the result is usually someone whose small programs work accidentally (they work insofar as you fixed the bugs you noticed). And when you master it, initial mastery comes in the form of holding an entire program's "state" in your head and understanding how it works; this fails to scale to large many-developer code bases, so you usually graduate to having large programs that work accidentally.
Both make_unique style and only-immutable-shared-data based threading are composable strategies. This means that if small pieces are correct, and you put them together, the resulting program is correct (with regards to resource lifetime and concurrency). That permits local mastery of small-scale threading or resource management to apply to large-scale programs in the domains where these strategies work.
After following the guidance from @Yakk, I decided to restructure my program:
1. bool is_running_ changes to std::atomic<bool> is_running_.
2. stop() will not only trigger the stopping, but will actively wait for the thread to stop via thread_->join().
3. All calls of new are replaced with std::make_unique<std::thread>(&LoopThread::executeLoop, this).
4. I have no experience with copy or move constructors, so I decided to forbid them. This should prevent me from accidentally using them. If at some point in the future I need those, I will have to take a deeper look at them.
5. thread_->detach() was replaced by thread_->join() (see 2.).
class LoopThread {
public:
    LoopThread(std::function<void(LoopThread*, uint32_t)> function)
        : function_{ function }, is_running_{ false }, counter_{ 0 } {};
    LoopThread(LoopThread&&) = delete;
    LoopThread(const LoopThread&) = delete;
    LoopThread& operator=(const LoopThread&) = delete;
    LoopThread& operator=(LoopThread&&) = delete;
    ~LoopThread();

    void start();
    void stop();
    bool isRunning() const { return is_running_; };

private:
    std::function<void(LoopThread*, uint32_t)> function_;
    std::unique_ptr<std::thread> thread_;
    std::atomic<bool> is_running_;
    uint32_t counter_;

    void executeLoop();
};
LoopThread::~LoopThread() {
    if (isRunning()) {
        stop();
    }
}

void LoopThread::start() {
    if (is_running_) {
        throw std::runtime_error("Thread is already running");
    }
    if (thread_ != nullptr) {
        throw std::runtime_error("Thread is not stopped yet");
    }
    is_running_ = true;
    thread_ = std::make_unique<std::thread>(&LoopThread::executeLoop, this);
}

void LoopThread::stop() {
    if (!is_running_) {
        throw std::runtime_error("Thread is already stopped");
    }
    is_running_ = false;
    thread_->join();
    thread_ = nullptr;
}

void LoopThread::executeLoop() {
    while (is_running_) {
        function_(this, counter_);
        ++counter_;
    }
}
TEST(pfThread, TestLoop)
{
    pft::LoopThread loop{ std::bind(&testFunction, std::placeholders::_1, std::placeholders::_2) };
    loop.start();
    std::this_thread::sleep_for(std::chrono::milliseconds(50));
    loop.stop();
}

Concurrently processing data. What do I need to watch out for?

I have a routine that is meant to load and parse data from a file. There is a possibility that the data from the same file might need to be retrieved from two places at once, i.e. during a background caching process and from a user request.
Specifically I am using C++11 thread and mutex libraries. We compile with Visual C++ 11 (aka 2012), so are limited by whatever it lacks.
My naive implementation went something like this:
map<wstring, weak_ptr<DataStruct>> data_cache;
mutex data_cache_mutex;

shared_ptr<DataStruct> ParseDataFile(wstring file_path) {
    auto data_ptr = make_shared<DataStruct>();
    /* Parses and processes the data, may take a while */
    return data_ptr;
}

shared_ptr<DataStruct> CreateStructFromData(wstring file_path) {
    lock_guard<mutex> lock(data_cache_mutex);
    auto cache_iter = data_cache.find(file_path);
    if (cache_iter != end(data_cache)) {
        auto data_ptr = cache_iter->second.lock();
        if (data_ptr)
            return data_ptr;
        // reference died, remove it
        data_cache.erase(cache_iter);
    }

    auto data_ptr = ParseDataFile(file_path);
    if (data_ptr)
        data_cache.emplace(make_pair(file_path, data_ptr));
    return data_ptr;
}
My goals were two-fold:
Allow multiple threads to load separate files concurrently
Ensure that a file is only processed once
The problem with my current approach is that it doesn't allow concurrent parsing of multiple files at all. If I understand correctly what will happen, each thread is going to hit the lock and they will end up processing linearly, one thread at a time. The order in which the threads pass through the lock may change from run to run, but the end result is the same.
One solution I've considered was to create a second map:
map<wstring, mutex> data_parsing_mutex;

shared_ptr<DataStruct> ParseDataFile(wstring file_path) {
    lock_guard<mutex> lock(data_parsing_mutex[file_path]);
    /* etc. */
    data_parsing_mutex.erase(file_path);
}
But now I have to be concerned with how data_parsing_mutex is being updated. So I guess I need another mutex?
map<wstring, mutex> data_parsing_mutex;
mutex data_parsing_mutex_mutex;

shared_ptr<DataStruct> ParseDataFile(wstring file_path) {
    unique_lock<mutex> super_lock(data_parsing_mutex_mutex);
    lock_guard<mutex> lock(data_parsing_mutex[file_path]);
    super_lock.unlock();
    /* etc. */
    super_lock.lock();
    data_parsing_mutex.erase(file_path);
}
In fact, looking at this, it's not necessarily going to avoid double-processing a file if it hasn't been completed by the background process when the user requests it, unless I check the cache yet again.
But by now my spidey senses are saying "There must be a better way." Is there? Would futures, promises, or atomics help me at all here?
From what you described, it sounds like you're trying to do a form of lazy initialization of the DataStruct using a thread pool, along with a reference counted cache. std::async should be able to provide a lot of the dispatch and synchronization necessary for something like this.
Using std::async, the code would look something like this...
map<wstring, weak_ptr<DataStruct>> cache;
map<wstring, shared_future<shared_ptr<DataStruct>>> pending;
mutex cache_mutex, pending_mutex;

shared_ptr<DataStruct> ParseDataFromFile(wstring file) {
    auto data_ptr = make_shared<DataStruct>();
    /* Parses and processes the data, may take a while */
    return data_ptr;
}

shared_ptr<DataStruct> CreateStructFromData(wstring file) {
    shared_future<shared_ptr<DataStruct>> pf;
    shared_ptr<DataStruct> ce;

    {
        lock_guard<mutex> lock(cache_mutex);
        auto ci = cache.find(file);
        if (!(ci == cache.end() || ci->second.expired()))
            return ci->second.lock();
    }

    {
        lock_guard<mutex> lock(pending_mutex);
        auto fi = pending.find(file);
        if (fi == pending.end()) {
            // launch::async so the parse really runs on another thread
            // (the default policy may defer it)
            pf = async(launch::async, ParseDataFromFile, file).share();
            pending.insert(fi, make_pair(file, pf));
        } else {
            pf = fi->second;
        }
    }

    pf.wait();
    ce = pf.get();

    {
        lock_guard<mutex> lock(cache_mutex);
        auto ci = cache.find(file);
        if (ci == cache.end() || ci->second.expired())
            cache.insert(ci, make_pair(file, ce));
    }
    {
        lock_guard<mutex> lock(pending_mutex);
        auto pi = pending.find(file);
        if (pi != pending.end())
            pending.erase(pi);
    }
    return ce;
}
This can probably be optimized a bit, but the general idea should be the same.
On a typical computer there is little point in trying to load files concurrently, since disk access will be the bottleneck. Instead, it's better to have a single thread load files (or use asynchronous I/O) and dish out the parsing to a thread pool. Then store the results in a shared container.
Regarding preventing double work, you should consider if this is really necessary. If you are only doing this out of premature optimization, you'd probably make users happier by focussing on making the program responsive, rather than efficient. That is, make sure the user gets what they ask for quickly, even if it means doing double work.
OTOH, if there is a technical reason for not parsing a file twice, you can keep track of the status of each file (loading, parsing, parsed) in the shared container.
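That status tracking could be as simple as this sketch (hypothetical, just to illustrate the idea; it reuses the naming style from above):
enum class FileStatus { Loading, Parsing, Parsed };

struct CacheEntry {
    FileStatus status;
    shared_ptr<DataStruct> data; // valid once status == FileStatus::Parsed
};

map<wstring, CacheEntry> cache; // guarded by a mutex, as before; pair it with
                                // a condition_variable so interested threads can
                                // block until an entry reaches FileStatus::Parsed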

How and what data must be synced in multithreaded c++

I built a little application which has a render thread and some worker threads for tasks that can be done alongside the rendering, e.g. uploading files onto some server. So render = output, worker = input. Now, in those worker threads I use different objects to store feedback information and share these with the render thread, which reads them for output purposes. Those shared objects are int, float, bool, STL string and STL list.
I had this running for a few months and all was fine, except for 2 random crashes during output, but now I have learned about thread syncing. I read that int, bool, etc. do not require syncing, and I think that makes sense, but when I look at string and list I fear potential crashes if 2 threads attempt to read/write the same object at the same time. Basically, I expect one thread to change the size of the string while the other uses the outdated size to loop through its characters and then reads from unallocated memory. This evening I want to build a little test scenario with 2 threads writing/reading the same object in a loop; however, I was hoping to get some ideas here as well.
I was reading about the CriticalSection in Win32 and thought it may be worth a try. Yet I am unsure what the best way would be to implement it. If I put it at the start and at the end of a read/write function, it feels like time is wasted. And if I wrap EnterCriticalSection and LeaveCriticalSection in Set and Get functions for each object I want to have synced across the threads, it is a lot of administration.
I think I must crawl through more references.
Okay, I am still not sure how to proceed. I was studying the links provided by StackedCrooked, but I still have no picture of how to do this.
I copied/modified the following together and have no idea how to continue or what to do: does anyone have ideas?
class CSync
{
public:
    CSync()
        : m_isEnter(false)
    { InitializeCriticalSection(&m_CriticalSection); }

    ~CSync()
    { DeleteCriticalSection(&m_CriticalSection); }

    bool TryEnter()
    {
        m_isEnter = TryEnterCriticalSection(&m_CriticalSection) == 0 ? false : true;
        return m_isEnter;
    }

    void Enter()
    {
        if (!m_isEnter)
        {
            EnterCriticalSection(&m_CriticalSection);
            m_isEnter = true;
        }
    }

    void Leave()
    {
        if (m_isEnter)
        {
            LeaveCriticalSection(&m_CriticalSection);
            m_isEnter = false;
        }
    }

private:
    CRITICAL_SECTION m_CriticalSection;
    bool m_isEnter;
};
/* not needed
class CLockGuard
{
public:
    CLockGuard(CSync& refSync) : m_refSync(refSync) { Lock(); }
    ~CLockGuard() { Unlock(); }
private:
    CSync& m_refSync;

    CLockGuard(const CLockGuard& refcSource);
    CLockGuard& operator=(const CLockGuard& refcSource);

    void Lock() { m_refSync.Enter(); }
    void Unlock() { m_refSync.Leave(); }
};*/
template<class T> class Call_proxy; // forward declaration, needed by Wrap

template<class T> class Wrap
{
public:
    Wrap(T* pp, CSync& sync)
        : p(pp)
        , m_refSync(sync)
    {}
    Call_proxy<T> operator->() { m_refSync.Enter(); return Call_proxy<T>(p, m_refSync); }
private:
    T* p;
    CSync& m_refSync;
};

template<class T> class Call_proxy
{
public:
    Call_proxy(T* pp, CSync& sync)
        : p(pp)
        , m_refSync(sync)
    {}
    ~Call_proxy() { m_refSync.Leave(); }
    T* operator->() { return p; }
private:
    T* p;
    CSync& m_refSync;
};
int main()
{
    CSync sync;
    Wrap<string> safeVar(new string, sync);
    // safeVar what now?
    return 0;
}
Okay, so I was preparing a little test to see if my attempts do something good. First I created a setup that, I believed, would make the application crash...
But it does not crash!? Does that mean I don't need syncing? What does the program need in order to effectively crash? And if it does not crash, why do I even bother? It seems I am missing some point again. Any ideas?
string gl_str, str_test;

void thread1()
{
    while (true)
    {
        gl_str = "12345";
        str_test = gl_str;
    }
};

void thread2()
{
    while (true)
    {
        gl_str = "123456789";
        str_test = gl_str;
    }
};

CreateThread(NULL, 0, (LPTHREAD_START_ROUTINE)thread1, NULL, 0, NULL);
CreateThread(NULL, 0, (LPTHREAD_START_ROUTINE)thread2, NULL, 0, NULL);
Just added more stuff and now it crashes when calling clear(). Good.
int gl_int = 0; // declared here for completeness; it was used but not shown above

void thread1()
{
    while (true)
    {
        gl_str = "12345";
        str_test = gl_str;
        gl_str.clear();
        gl_int = 124;
    }
};

void thread2()
{
    while (true)
    {
        gl_str = "123456789";
        str_test = gl_str;
        gl_str.clear();
        if (gl_str.empty())
            gl_str = "aaaaaaaaaaaaa";
        gl_int = 244;
        if (gl_int == 124)
            gl_str.clear();
    }
};
The rule is simple: if the object can be modified in any thread, all accesses to it require synchronization. The type of object doesn't matter: even bool or int require external synchronization of some sort (possibly by means of a special, system-dependent function, rather than with a lock). There are no exceptions, at least in C++. (If you're willing to use inline assembler, and understand the implications of fences and memory barriers, you may be able to avoid a lock.)
I read int, bool, etc do not require syncing
This is not true:
A thread may store a copy of the variable in a CPU register and keep using the old value even if the original variable has been modified by another thread.
Simple operations like i++ are not atomic.
The compiler may reorder reads and writes to the variable. This may cause synchronization issues in multithreaded scenarios.
See Lockless Programming Considerations for more details.
You should use mutexes to protect against race conditions. See this article for a quick introduction to the boost threading library.
First, you do need protection even for accessing the most primitive of data types.
If you have an int x somewhere, you can write
x += 42;
... but that will mean, at the lowest level: read the old value of x, calculate a new value, write the new value to the variable x. If two threads do that at about the same time, strange things will happen. You need a lock/critical section.
I'd recommend using the C++11 and related interfaces, or, if that is not available, the corresponding things from the boost::thread library. If that is not an option either, critical sections on Win32 and pthread_mutex_* for Unix.
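For example, guarding one of those shared strings with the C++11 primitives looks like this (a sketch; the names are invented):
#include <mutex>
#include <string>

std::mutex status_mutex;
std::string status_text; // shared: a worker thread writes, the render thread reads

void SetStatus(const std::string& s)   // called from a worker thread
{
    std::lock_guard<std::mutex> lock(status_mutex);
    status_text = s;
}

std::string GetStatus()                // called from the render thread
{
    std::lock_guard<std::mutex> lock(status_mutex);
    return status_text;                // return a copy, not a reference
}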
NO, Don't Start Writing Multithreaded Programs Yet!
Let's talk about invariants first.
In a (hypothetical) well-defined program, every class has an invariant.
The invariant is some logical statement that is always true about an instance's state, i.e. about the values of all its member variables. If the invariant ever becomes false, the object is broken, corrupted, your program may crash, bad things have already happened. All your functions assume that the invariant is true when they are called, and they make sure that it is still true afterwards.
When a member function changes a member variable, the invariant might temporarily become false, but that is OK because the member function will make sure that everything "fits together" again before it exits.
You need a lock that protects the invariant - whenever you do something that might affect the invariant, take the lock and do not release it until you've made sure that the invariant is restored.
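A toy sketch of that idea (an invented example): the invariant is that the two balances always sum to the same total, and the lock is held whenever the invariant could be observed broken.
#include <mutex>

class Accounts {
    std::mutex m;
    int checking = 100;
    int savings = 0;
    // invariant: checking + savings == 100
public:
    void transfer(int amount)
    {
        std::lock_guard<std::mutex> lock(m);
        checking -= amount; // invariant temporarily false here...
        savings += amount;  // ...restored before the lock is released
    }
};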