C++ Threading using 2 Containers - c++

I have the following problem. I use a vector that gets filled with values from a temperature sensor. This function runs in one thread. Another thread, which runs once every second, is responsible for publishing all the values to a database. The publishing thread locks the vector with a mutex, so the function that fills it gets blocked. However, while the publishing thread is using the vector, I want to save the temperature values into a second vector so that I don't lose any values while the data is being published. How do I get around this problem? I thought about using a pointer that points to one of the containers and switching it to the other container once the first gets locked, but I don't quite know how.
I tried to add a minimal reproducible example; I hope it explains my situation.
void publish(std::vector<temperature> &inputVector)
{
//this function would publish the values into a database
//via mqtt and also runs in a thread.
}
int main()
{
std::vector<temperature> testVector;
std::vector<temperature> testVector2;
while(1)
{
//I am repeatedly saving values into the vector.
//I want to do this in a thread, but if the vector is locked by a mutex
//I want to switch over to the other vector
testVector.push_back(testSensor.getValue());
}
}

Assuming you are using std::mutex, you can use mutex::try_lock on the producer side. Something like this:
while(1)
{
if (myMutex.try_lock()) {
// locking succeeded - move all queued values and push the new value
std::move(testVector2.begin(), testVector2.end(), std::back_inserter(testVector));
testVector2.clear();
testVector.push_back(testSensor.getValue());
myMutex.unlock();
} else {
// locking failed - queue the value
testVector2.push_back(testSensor.getValue());
}
}
Of course publish() needs to lock the mutex, too.
void publish(std::vector<temperature> &inputVector)
{
std::lock_guard<std::mutex> lock(myMutex);
//this function would publish the values into a database
//via mqtt and also runs in a thread.
}

This seems like the perfect opportunity for an additional (shared) buffer or queue, that's protected by the lock.
main would be essentially as it is now, pushing your new values into the shared buffer.
The other thread would, when it can, lock that buffer and take the new values from it. This should be very fast.
Then, it does not need to lock the shared buffer while doing its database things (which take longer), as it's only working on its own vector during that procedure.
Here's some pseudo-code:
std::mutex pendingTempsMutex;
std::vector<temperature> pendingTemps;
void thread2()
{
std::vector<temperature> temps;
while (1)
{
// Get new temps if we have any
{
std::scoped_lock l(pendingTempsMutex);
temps.swap(pendingTemps);
}
if (!temps.empty())
publish(temps);
}
}
void thread1()
{
while (1)
{
std::scoped_lock l(pendingTempsMutex);
pendingTemps.push_back(testSensor.getValue());
/*
Or, if getValue() blocks:
temperature newValue = testSensor.getValue();
std::scoped_lock l(pendingTempsMutex);
pendingTemps.push_back(newValue);
*/
}
}
Usually you'd use a std::queue for pendingTemps though. I don't think it really matters in this example, because you're always consuming everything in thread 2, but it's more conventional and can be more efficient in some scenarios. It can't lose you much as it's backed by a std::deque. But you can measure/test to see what's best for you.
This solution is pretty much what you already proposed/explored in the question, except that the producer shouldn't be in charge of managing the second vector.
You can improve it by having thread2 wait to be "informed" that there are new values, with a condition variable, otherwise you're going to be doing a lot of busy-waiting. I leave that as an exercise to the reader ;) There should be an example and discussion in your multi-threaded programming book.
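For reference, here is a minimal sketch of that condition-variable variant, assuming C++11 or later. The class name PendingBuffer and its member names are made up for illustration and do not appear in the code above; treat it as one possible shape rather than the definitive solution.

```cpp
#include <cassert>
#include <condition_variable>
#include <mutex>
#include <utility>
#include <vector>

// Hypothetical helper (not part of the original code): a mutex-protected
// buffer whose consumer sleeps until the producer has pushed something.
template <typename T>
class PendingBuffer {
public:
    // Producer side (thread1): append one value, then wake the consumer.
    void push(T value) {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            pending_.push_back(std::move(value));
        }
        ready_.notify_one();
    }

    // Consumer side (thread2): block until something is pending, then take
    // the whole batch with a swap so the lock is only held briefly.
    std::vector<T> take_all() {
        std::vector<T> batch;
        {
            std::unique_lock<std::mutex> lock(mutex_);
            ready_.wait(lock, [this] { return !pending_.empty(); });
            batch.swap(pending_);
        }
        assert(!batch.empty());  // guaranteed by the wait predicate
        return batch;
    }

private:
    std::mutex mutex_;
    std::condition_variable ready_;
    std::vector<T> pending_;
};
```

With this, thread1 would call push(testSensor.getValue()) and thread2 would loop over take_all() followed by publish(), sleeping inside take_all() instead of spinning.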

Related

Multiple threads access shared resources

I'm currently working on a particle system which uses one thread in which the particles are first updated, then drawn. The particles are stored in a std::vector. I would like to move the update function to a separate thread to improve the system's performance. However, this means that I run into problems when the update thread and the draw thread access the std::vector at the same time. My update function changes the position and colour of all particles, and also almost always resizes the std::vector.
Single thread approach:
std::vector<Particle> particles;
void tick() //tick would be called from main update loop
{
//slow as must wait for update to draw
updateParticles();
drawParticles();
}
Multithreaded:
std::vector<Particle> particles;
//quicker as no longer need to wait to draw and update
//crashes when both threads access the same data, or update resizes vector
void updateThread()
{
updateParticles();
}
void drawThread()
{
drawParticles();
}
To fix this problem I have investigated using std::mutex; however, in practice, with a large number of particles, the constant locking meant that performance didn't increase. I have also investigated std::atomic; however, neither the particles nor std::vector are trivially copyable, so I can't use this either.
Multithreaded using mutex:
NOTE: I am using SDL mutex, as far as I am aware, the principles are the same.
SDL_mutex* lock = SDL_CreateMutex();
SDL_cond* canDraw = SDL_CreateCond();
SDL_cond* canUpdate = SDL_CreateCond();
std::vector<Particle> particles;
//locking the threads leads to the same problems as before,
//now each thread must wait for the other one
void updateThread()
{
SDL_LockMutex(lock);
while(!canUpdate)
{
SDL_CondWait(canUpdate, lock);
}
updateParticles();
SDL_UnlockMutex(lock);
SDL_CondSignal(canDraw);
}
void drawThread()
{
SDL_LockMutex(lock);
while(!canDraw)
{
SDL_CondWait(canDraw, lock);
}
drawParticles();
SDL_UnlockMutex(lock);
SDL_CondSignal(canUpdate);
}
I am wondering if there are any other ways to implement the multi threaded approach? Essentially preventing the same data from being accessed by both threads at the same time, without having to make each thread wait for the other. I have thought about making a local copy of the vector to draw from, but this seems like it would be inefficient, and may run into the same problems if the update thread changes the vector while it's being copied?
I would use a more granular locking strategy. Instead of storing a particle object in your vector, I would store a pointer to a different object.
struct lockedParticle {
Particle* containedParticle;
SDL_mutex* lockingObject;
};
In updateParticles() I would attempt to obtain the individual locking objects using SDL_TryLockMutex() - if I fail to obtain control of the mutex I would add the pointer to this particular lockedParticle instance to another vector, and retry later to update them.
I would follow a similar strategy inside the drawParticles(). This relies on the fact that draw order does not matter for particles, which is often the case.
If data consistency is not a concern you can avoid blocking the whole vector by encapsulating vector in a custom class and setting mutex on single read/write operations only, something like:
struct SharedVector
{
// ...
std::vector<Particle> vec;
void push(const Particle& particle)
{
SDL_LockMutex(lock);
vec.push_back(particle);
SDL_UnlockMutex(lock);
}
};
//...
SharedVector particles;
Then of course, you need to amend updateParticles() and drawParticles() to use new type instead of std::vector.
EDIT:
You can avoid creating new structure by using mutexes in updateParticles() and drawParticles() methods, e.g.
void updateParticles()
{
//... get Particle particle object
SDL_LockMutex(lock);
particles.push_back(particle);
SDL_UnlockMutex(lock);
}
The same should be done for drawParticles() as well.
If the vector is changing all the time, you can use two vectors. drawParticles would have its own copy, and updateParticles would write to another one. Once both functions are done, swap, copy, or move the vector used by updateParticles to the one used by drawParticles. (updateParticles can read from the same vector used by drawParticles to get at the current particle positions, so you shouldn't need to create a complete new copy.) No locking necessary.
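A rough sketch of the two-vector idea, under a few assumptions: the Particle struct is a stand-in, FrameExchange is a made-up name, and the hand-over is guarded by a very short lock around the swap only. That lock is an assumption of this sketch; the hand-over has to be synchronised in some way, and a frame barrier would be another option.

```cpp
#include <cassert>
#include <mutex>
#include <vector>

// Stand-in for the question's particle type.
struct Particle {
    float x = 0.0f;
    float y = 0.0f;
};

// Hypothetical hand-over point between the update and draw threads.
// Each thread works on its own vector; only the swap is synchronised.
class FrameExchange {
public:
    // Update thread: hand over a finished frame. 'frame' receives an old
    // vector back, so its allocation can be reused for the next frame.
    void publish(std::vector<Particle>& frame) {
        std::lock_guard<std::mutex> lock(mutex_);
        shared_.swap(frame);
        fresh_ = true;
    }

    // Draw thread: pick up the newest frame if there is one. Returns false
    // (and leaves 'frame' untouched) when nothing new was published.
    bool acquire(std::vector<Particle>& frame) {
        std::lock_guard<std::mutex> lock(mutex_);
        if (!fresh_) return false;
        shared_.swap(frame);
        fresh_ = false;
        return true;
    }

private:
    std::mutex mutex_;
    std::vector<Particle> shared_;
    bool fresh_ = false;
};

// Usage sketch, single-threaded for clarity.
inline void frame_exchange_example() {
    FrameExchange exchange;
    std::vector<Particle> updating(3);  // owned by the update thread
    std::vector<Particle> drawing;      // owned by the draw thread

    exchange.publish(updating);
    bool got = exchange.acquire(drawing);
    assert(got && drawing.size() == 3);
    (void)got;
}
```

The swap is O(1), so neither thread waits for the other's update or draw pass. Note that after publish() the update thread gets stale contents back and has to rebuild its frame from its own state.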

Why does my lock-free message queue segfault :(?

As a purely mental exercise I'm trying to get this to work without locks or mutexes. The idea is that when the consumer thread is reading/executing messages it atomically swaps which std::vector the producer thread uses for writes. Is this possible? I've tried playing around with thread fences to no avail. There's a race condition here somewhere because it occasionally seg faults. I imagine it's somewhere in the enqueue function. Any ideas?
// should execute functions on the original thread
class message_queue {
public:
using fn = std::function<void()>;
using queue = std::vector<fn>;
message_queue() : write_index(0) {
}
// should only be called from consumer thread
void run () {
// atomically gets the current pending queue and switches it with the other one
// for example if we're writing to queues[0], we grab a reference to queue[0]
// and tell the producer to write to queues[1]
queue& active = queues[write_index.fetch_xor(1)];
// skip if we don't have any messages
if (active.size() == 0) return;
// run all messages/callbacks
for (auto fn : active) {
fn();
}
// clear the active queue so it can be re-used
active.clear();
// swap active and pending threads
write_index.fetch_xor(1);
}
void enqueue (fn value) {
// loads the current pending queue and append some work
queues[write_index.load()].push_back(value);
}
private:
queue queues[2];
std::atomic<bool> is_empty; // unused for now
std::atomic<int> write_index;
};
int main(int argc, const char * argv[])
{
message_queue queue{};
// flag to stop the message loop
// doesn't actually need to be atomic because it's only read/written on the main thread
std::atomic<bool> done(false);
std::thread worker([&queue, &done] {
int count = 100;
// send 100 messages
while (--count) {
queue.enqueue([count] {
// should be executed in the main thread
std::cout << count << "\n";
});
}
// finally tell the main thread we're done
queue.enqueue([&] {
std::cout << "done!\n";
done = true;
});
});
// run messages until the done flag is set
while(!done) queue.run();
worker.join();
}
If I understand your code correctly, there are data races, e.g.:
// producer
int r0 = write_index.load(); // r0 == 0
// consumer
int r1 = write_index.fetch_xor(1); // r1 == 0
queue& active = queues[r1];
active.size();
// producer
queues[r0].push_back(...);
Now both threads access the same queue at the same time. That's a data race, and that means undefined behaviour.
Your lock-free queue fails to work because you did not start with at least a semi-formal proof of correctness and then turn that proof into an algorithm, with the proof as the primary text and comments connecting the proof to the code.
Unless you are copy/pasting someone else's implementation who did do that, any attempt to write a lock-free algorithm will fail. If you are copy-pasting someone else's implementation, please provide it.
Lock free algorithms are not robust unless you have such a proof that they are correct, because the kind of errors that make them fail are subtle, and extreme care must be taken. Simply "rolling" a lock free algorithm, even if it fails to result in apparent problems during testing, is a recipe for unreliable code.
One way to get around writing a formal proof in this kind of situation is to track down someone who has written proven correct pseudo code or the like. Sketch out the pseudo code, together with the proof of correctness, in comments. Then fill in the code in the holes.
In general, proving an "almost correct" lock free algorithm is flawed is harder than writing a solid proof that a lock free algorithm is correct if implemented in a particular way, then implementing it. Now, if your algorithm is so flawed that it is easy to find the flaws, then you aren't showing a basic understanding of the problem domain.
In short, by posting "why is my algorithm wrong", you are approaching how to write lock free algorithms incorrectly. "Where is the flaw in my proof?", "I proved this pseudo-code correct here, and then I implemented it, why do my tests show deadlocks?" are good lock-free questions. "Here is a bunch of code with comments that merely describe what the next line of code does, and no comments describing why I do the next line of code, or how that line of code maintains my lock-free invariants" is not a good lock-free question.
Step back. Find some proven-correct algorithms. Learn how the proofs work. Implement some proven-correct algorithms via monkey-see monkey-do. Look at the footnotes to note the issues their proofs overlooked (like ABA issues). After you have a bunch of those under your belt, try a variation, and do the proof, and check the proof, and do the implementation, and check the implementation.

How to use C++11 <thread> designing a system which pulls data from sources

This question comes from:
C++11 thread doesn't work with virtual member function
As suggested in a comment, my question in the previous post may not be the right one to ask, so here is the original question:
I want to make a capturing system which queries a few sources at a constant or dynamic frequency (it varies by source, say 10 times per second) and pulls the data into each source's queue. The sources are not fixed; they may be added or removed at run time.
There is also a monitor which pulls from the queues at a constant frequency and displays the data.
So what is the best design pattern or structure for this problem?
I'm trying to keep a list of all the source pullers, where each puller holds a thread and a specified pulling function (the pulling function may interact with the puller: say, if the source is drained, it will ask to stop the pulling process on that thread).
Unless the operation where you query a source is blocking (or you have lots of them), you don't need to use threads for this. We could start with a Producer which will work with either synchronous or asynchronous (threaded) dispatch:
template <typename OutputType>
class Producer
{
std::list<OutputType> output;
protected:
int poll_interval; // seconds? milliseconds?
virtual OutputType query() = 0;
public:
virtual ~Producer() = default;
int next_poll_interval() const { return poll_interval; }
void poll() { output.push_back(this->query()); }
std::size_t size() const { return output.size(); }
// whatever accessors you need for the queue here:
// pop_front, swap entire list, etc.
};
Now we can derive from this Producer and just implement the query method in each subtype. You can set poll_interval in the constructor and leave it alone, or change it on every call to query. There's your general producer component, with no dependency on the dispatch mechanism.
template <typename OutputType>
class ThreadDispatcher
{
Producer<OutputType> *producer;
bool shutdown;
std::thread thread;
static void loop(ThreadDispatcher *self)
{
Producer<OutputType> *producer = self->producer;
while (!self->shutdown)
{
producer->poll();
// some mechanism to pass the produced values back to the owner
auto delay = // assume millis for sake of argument
std::chrono::milliseconds(producer->next_poll_interval());
std::this_thread::sleep_for(delay);
}
}
public:
explicit ThreadDispatcher(Producer<OutputType> *p)
: producer(p), shutdown(false), thread(loop, this)
{
}
~ThreadDispatcher()
{
shutdown = true;
thread.join();
}
// again, the accessors you need for reading produced values go here
// Producer::output isn't synchronised, so you can't expose it directly
// to the calling thread
};
This is a quick sketch of a simple dispatcher that would run your producer in a thread, polling it however often you ask it to. Note that passing produced values back to the owner isn't shown, because I don't know how you want to access them.
Also note I haven't synchronized access to the shutdown flag - it should probably be atomic, but it might be implicitly synchronized by whatever you choose to do with the produced values.
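To illustrate the atomic shutdown flag, here is a stripped-down sketch. PollingWorker is a hypothetical name, and the call to producer->poll() is replaced by a counter so the example stays self-contained; the point is only the std::atomic<bool> and the member declaration order.

```cpp
#include <atomic>
#include <cassert>
#include <chrono>
#include <thread>

// Hypothetical, stripped-down dispatcher: the real poll() call is replaced
// by a counter so that the sketch stays self-contained.
class PollingWorker {
public:
    explicit PollingWorker(std::chrono::milliseconds interval)
        : interval_(interval), thread_([this] { loop(); }) {
        assert(interval.count() >= 0);
    }

    // Request shutdown, then wait for the loop to notice and finish.
    ~PollingWorker() {
        shutdown_.store(true);
        thread_.join();
    }

    // Number of polls performed so far (safe to call from any thread).
    int polls() const { return polls_.load(); }

private:
    void loop() {
        while (!shutdown_.load()) {
            polls_.fetch_add(1);  // stands in for producer->poll()
            std::this_thread::sleep_for(interval_);
        }
    }

    std::chrono::milliseconds interval_;
    std::atomic<bool> shutdown_{false};
    std::atomic<int> polls_{0};
    std::thread thread_;  // declared last: the members above must exist first
};
```

Because the flag is atomic, the write in the destructor and the reads in the loop no longer form a data race. The worker may still sleep for up to one interval before it notices the request.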
With this organization, it'd also be easy to write a synchronous dispatcher to query multiple producers in a single thread, for example from a select/poll loop, or using something like Boost.Asio and a deadline timer per producer.

C++ multithreading, simple consumer / producer threads, LIFO, notification, counter

I am new to multi-thread programming, I want to implement the following functionality.
There are 2 threads, producer and consumer.
Consumer only processes the latest value, i.e., last in first out (LIFO).
Producer sometimes generates new values at a faster rate than the consumer can
process them. For example, the producer may generate 2 new values in 1
millisecond, but it takes the consumer approximately 5 milliseconds to process one.
If consumer receives a new value in the middle of processing an old
value, there is no need to interrupt. In other words, consumer will finish current
execution first, then start an execution on the latest value.
Here is my design process, please correct me if I am wrong.
There is no need for a queue, since only the latest value is
processed by consumer.
Are notifications sent from the producer queued automatically?
I will use a counter instead.
ConsumerThread() checks the counter at the end, to make sure the producer
hasn't generated a new value.
But what happens if the producer generates a new value just before the consumer
goes to sleep(), but after it has checked the counter?
Here is some pseudo code.
boost::mutex mutex;
double x;
int counter = 0;
void ProducerThread()
{
{
boost::mutex::scoped_lock lock(mutex);
x = rand();
counter++;
}
notify(); // wake up consumer thread
}
void ConsumerThread()
{
counter = 0; // reset counter, only process the latest value
... do something which takes 5 milli-seconds ...
if (counter > 0)
{
... execute this function again, not too sure how to implement this ...
}
else
{
... what happen if producer generates a new value here??? ...
sleep();
}
}
Thanks.
If I understood your question correctly, for your particular application, the consumer only needs to process the latest available value provided by the producer. In other words, it's acceptable for values to get dropped because the consumer cannot keep up with the producer.
If that's the case, then I agree that you can get away without a queue and use a counter. However, the shared counter and value variables will need to be accessed atomically.
You can use boost::condition_variable to signal notifications to the consumer that a new value is ready. Here is a complete example; I'll let the comments do the explaining.
#include <cstdlib>
#include <iostream>
#include <boost/thread/thread.hpp>
#include <boost/thread/mutex.hpp>
#include <boost/thread/condition_variable.hpp>
#include <boost/thread/locks.hpp>
#include <boost/date_time/posix_time/posix_time_types.hpp>
boost::mutex mutex;
boost::condition_variable condvar;
typedef boost::unique_lock<boost::mutex> LockType;
// Variables that are shared between producer and consumer.
double value = 0;
int count = 0;
void producer()
{
while (true)
{
{
// value and count must both be updated atomically
// using a mutex lock
LockType lock(mutex);
value = std::rand();
++count;
// Notify the consumer that a new value is ready.
condvar.notify_one();
}
// Simulate exaggerated 2ms delay
boost::this_thread::sleep(boost::posix_time::milliseconds(200));
}
}
void consumer()
{
// Local copies of 'count' and 'value' variables. We want to do the
// work using local copies so that they don't get clobbered by
// the producer when it updates.
int currentCount = 0;
double currentValue = 0;
while (true)
{
{
// Acquire the mutex before accessing 'count' and 'value' variables.
LockType lock(mutex); // mutex is locked while in this scope
while (count == currentCount)
{
// Wait for producer to signal that there is a new value.
// While we are waiting, Boost releases the mutex so that
// other threads may acquire it.
condvar.wait(lock);
}
// `lock` is automatically re-acquired when we come out of
// condvar.wait(lock). So it's safe to access the 'value'
// variable at this point.
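currentValue = value; // Grab a copy of the latest value
currentCount = count; // and remember its count while we hold
// the lock, so that the next wait blocks
// until a newer value arrives.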
}
// Now that we are out of the mutex lock scope, we work with our
// local copy of `value`. The producer can keep on clobbering the
// 'value' variable all it wants, but it won't affect us here
// because we are now using `currentValue`.
std::cout << "value = " << currentValue << "\n";
// Simulate exaggerated 5ms delay
boost::this_thread::sleep(boost::posix_time::milliseconds(500));
}
}
int main()
{
boost::thread c(&consumer);
boost::thread p(&producer);
c.join();
p.join();
}
ADDENDUM
I was thinking about this question recently, and realized that this solution, while it may work, is not optimal. Your producer is using all that CPU just to throw away half of the computed values.
I suggest that you reconsider your design and go with a bounded blocking queue between the producer and consumer. Such a queue should have the following characteristics:
Thread-safe
The queue has a fixed size (bounded)
If the consumer wants to pop the next item, but the queue is empty, the operation will be blocked until notified by the producer that an item is available.
The producer can check if there's room to push another item and block until the space becomes available.
With this type of queue, you can effectively throttle down the producer so that it doesn't outpace the consumer. It also ensures that the producer doesn't waste CPU resources computing values that will be thrown away.
Libraries such as TBB and PPL provide implementations of concurrent queues. If you want to attempt to roll your own using std::queue (or boost::circular_buffer) and boost::condition_variable, check out this blogger's example.
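If you do roll your own, a bounded blocking queue could look roughly like the sketch below. It uses the standard-library equivalents (std::mutex, std::condition_variable, std::queue) rather than Boost, and the class and member names are hypothetical. try_push() is the "check if there's room" variant; push() is the blocking one.

```cpp
#include <cassert>
#include <condition_variable>
#include <cstddef>
#include <mutex>
#include <queue>
#include <utility>

// Hypothetical bounded blocking queue; capacity is fixed at construction.
template <typename T>
class BoundedQueue {
public:
    explicit BoundedQueue(std::size_t capacity) : capacity_(capacity) {
        assert(capacity > 0);
    }

    // Producer: blocks while the queue is full.
    void push(T item) {
        {
            std::unique_lock<std::mutex> lock(mutex_);
            not_full_.wait(lock, [this] { return items_.size() < capacity_; });
            items_.push(std::move(item));
        }
        not_empty_.notify_one();
    }

    // Producer, non-blocking: returns false instead of waiting for room.
    bool try_push(T item) {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            if (items_.size() >= capacity_) return false;
            items_.push(std::move(item));
        }
        not_empty_.notify_one();
        return true;
    }

    // Consumer: blocks while the queue is empty.
    T pop() {
        std::unique_lock<std::mutex> lock(mutex_);
        not_empty_.wait(lock, [this] { return !items_.empty(); });
        T item = std::move(items_.front());
        items_.pop();
        lock.unlock();
        not_full_.notify_one();
        return item;
    }

private:
    const std::size_t capacity_;
    std::mutex mutex_;
    std::condition_variable not_full_;
    std::condition_variable not_empty_;
    std::queue<T> items_;
};
```

A small capacity throttles the producer hard; a larger one smooths out bursts at the cost of processing older values.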
The short answer is that you're almost certainly wrong.
With a producer/consumer, you pretty much need a queue between the two threads. There are basically two alternatives: either your code will simply lose tasks (which usually equals not working at all), or else your producer thread will need to block until the consumer thread is idle before it can produce an item, which effectively translates to single threading.
For the moment, I'm going to assume that the value you get back from rand is supposed to represent the task to be executed (i.e., is the value produced by the producer and consumed by the consumer). In that case, I'd write the code something like this:
void producer() {
for (int i=0; i<100; i++)
queue.insert(random()); // queue.insert blocks if queue is full
queue.insert(-1.0); // Tell consumer to exit
}
void consumer() {
double value;
while ((value = queue.get()) != -1) // queue.get blocks if queue is empty
process(value);
}
This relegates nearly all the interlocking to the queue. The rest of the code for both threads pretty much ignores threading issues entirely.
Implementing a pipeline is actually quite tricky if you are doing it from the ground up. For example, you'd have to use a condition variable to avoid the kind of race condition you described in your question, avoid busy waiting when implementing the mechanism for "waking up" the consumer, and so on. Even using a "queue" of just 1 element won't save you from some of these complexities.
It's usually much better to use specialized libraries that were developed and extensively tested specifically for this purpose. If you can live with Visual C++ specific solution, take a look at Parallel Patterns Library, and the concept of Pipelines.

Multithreaded data processing pipeline in Qt

What would be a good way to solve the following problem in Qt:
I have a sensor class, which continuously produces data. On this data, several operations have to be performed after another, which may take quite long. For this I have some additional classes. Basically, every time a new data item is recorded, the first class should get the data, process it, pass it to the next and so on.
sensor --> class 1 --> ... --> last class
I want to put the individual classes of the pipeline into their own threads, so that class 1 may already work on sample n+1 when class 2 is processing sample n...
Also, as the individual steps may differ greatly in their performance (e.g. the sensor is way faster than the rest) and I'm not interested in outdated data, I want class 1 (and everything after it) to always get the newest data from their predecessor, discarding old data. So, no big buffer between the steps of the pipeline.
First I thought about using Qt::QueuedConnections for signals/slots, but I guess that this would introduce a queue full of outdated samples waiting to be processed by the slower parts of the pipeline?
Just build your own one-element "queue" class. It should have:
A piece of data (or pointer to data)
A Boolean "dataReady"
A mutex
A condition variable
The "enqueue" function is just:
lock mutex
Replace data with new data
dataReady = true
signal condition variable
unlock mutex
The "dequeue" function is just:
lock mutex
while (!dataReady) cond_wait(condition, mutex)
tmpData = data
data = NULL (or zero)
dataReady = false
unlock mutex
return tmpData
The type of the data can be a template parameter.
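The steps above could be written out roughly as follows, here with the standard library rather than Qt's QMutex/QWaitCondition. LatestValue, put and take are hypothetical names, and the sketch assumes T is default-constructible and movable.

```cpp
#include <cassert>
#include <condition_variable>
#include <mutex>
#include <utility>

// Hypothetical one-element "queue": a newer value overwrites an unread one.
// Assumes T is default-constructible and movable.
template <typename T>
class LatestValue {
public:
    // "enqueue": replace the stored value and wake a waiting consumer.
    void put(T value) {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            data_ = std::move(value);
            ready_ = true;
        }
        cond_.notify_one();
    }

    // "dequeue": block until a value is ready, then hand it out.
    T take() {
        std::unique_lock<std::mutex> lock(mutex_);
        cond_.wait(lock, [this] { return ready_; });
        ready_ = false;
        return std::move(data_);
    }

private:
    std::mutex mutex_;
    std::condition_variable cond_;
    T data_{};
    bool ready_ = false;
};

// Usage sketch: two quick writes, one read; only the newer value survives.
inline void latest_value_example() {
    LatestValue<int> slot;
    slot.put(1);
    slot.put(2);  // overwrites 1 before anyone has read it
    int value = slot.take();
    assert(value == 2);
    (void)value;
}
```

One such slot would sit between each pair of adjacent pipeline stages, so a slow stage always picks up the newest sample from its predecessor.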
What you are dealing with is a Producer Consumer Pattern. You can find a general overview of that here. http://en.wikipedia.org/wiki/Producer-consumer_problem
You want to use a QMutex to limit access to the data to one thread at a time. Use the QMutexLocker to lock it.
For a VERY simplified example:
QList<quint32> data;
QMutex mutex;
// Consumer Thread calls this
int GetData()
{
quint32 result(-1); // if -1 is a valid value, you may have to return a bool and
// get the value through a reference to an int
// in the parameter list.
QMutexLocker lock(&mutex);
if (data.size())
{
result = data.front(); // or back
data.clear();
}
return result;
}
// Producer Thread calls this
void SetData(quint32 value)
{
QMutexLocker lock(&mutex);
data.push_back(value);
}