A multi-threaded writer: concurrency issues in C++

I am trying to build a library that would write to a single file, and would be able to work in a multi-threaded environment. The requirements are:
No concurrency problems will occur while writing to the file.
The order in which threads are handled is not important.
The library should be non-blocking, i.e. the write and flush functions return before the given buffer has been written.
Here's what I have so far:
int write2device(char *buffer, int length) {
    Task * task = new Task(id++, buffer, length);
    pthread_t * thread = new pthread_t;
    Argument * arg = new Argument; // A struct with pthread_t and task fields
    arg->task = task;
    arg->thread = thread;
    pthread_create(thread, NULL, deamonWrite, arg); // one thread per call, as described below
    return 0;
}
void wait(Argument * arg) {
    // manager is a singleton class that handles the threads database and related
    manager->lock(arg->task->getId()); // mutex - only one thread can write
}
void * deamonWrite(void * arg) {
    Argument * temp = (Argument *) arg;
    wait(temp);
    // critical section
    // will add signal() later
    return NULL;
}
The idea is that for every thread calling write2device I open a thread that runs deamonWrite(). This function has the structure of wait() -> critical section -> signal().
In wait(), if someone else is writing, I will suspend the thread (not implemented yet) so that the caller does not have to wait until the write is done.
I have two questions:
How do I implement the mutex (lock function)? I understand that this must be an atomic function, since several threads trying to acquire a lock at once might result in chaos.
Is my general structure on the right track?
I am new to concurrency and would appreciate any thoughts on this matter - thanks!

Push the Task structures to a queue/vector and process them sequentially from a single thread instead of multiple threads for each task individually. The only place where you'll need a mutex is when pushing to and pulling from the queue. As Ben correctly noted in the comments, you should leave the implementation of thread synchronization primitives (mutex, critical section) to the OS and/or whatever system API you're allowed to use.
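A minimal sketch of that single-writer idea, using the C++11 standard library rather than raw pthreads. AsyncWriter, write and demo_write are made-up names for illustration, and the sketch assumes the caller's buffer may be copied so that write() can return immediately:

```cpp
#include <cassert>
#include <condition_variable>
#include <deque>
#include <mutex>
#include <ostream>
#include <sstream>
#include <string>
#include <thread>

// Hypothetical class: all names here are illustrative, not from the question.
class AsyncWriter {
public:
    explicit AsyncWriter(std::ostream& out)
        : out_(out), worker_([this] { run(); }) {}

    ~AsyncWriter() {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            stopping_ = true;
        }
        cv_.notify_one();
        worker_.join();  // the worker drains the queue before it exits
    }

    // Non-blocking for the caller: copies the buffer and returns at once.
    void write(const char* buffer, std::size_t length) {
        assert(buffer != nullptr);
        {
            std::lock_guard<std::mutex> lock(mutex_);
            queue_.emplace_back(buffer, length);
        }
        cv_.notify_one();
    }

private:
    void run() {
        std::unique_lock<std::mutex> lock(mutex_);
        for (;;) {
            cv_.wait(lock, [this] { return stopping_ || !queue_.empty(); });
            if (queue_.empty()) return;  // only reached when stopping
            std::string chunk = std::move(queue_.front());
            queue_.pop_front();
            lock.unlock();  // do the slow I/O without holding the mutex
            out_.write(chunk.data(), static_cast<std::streamsize>(chunk.size()));
            lock.lock();
        }
    }

    std::ostream& out_;
    std::mutex mutex_;
    std::condition_variable cv_;
    std::deque<std::string> queue_;
    bool stopping_ = false;
    std::thread worker_;  // declared last so the other members exist first
};

// Example run: two chunks queued from one thread reach the sink in order.
std::string demo_write() {
    std::ostringstream sink;
    {
        AsyncWriter writer(sink);
        writer.write("abc", 3);
        writer.write("def", 3);
    }  // the destructor drains the queue and joins the writer thread
    return sink.str();
}
```

The mutex only guards the queue, so callers never wait for the file itself. With several producers the chunks are written in the order they were queued, which fits the "order is not important" requirement.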


Is there any way to wake up multiple threads at the same time in C/C++?

Well, actually, I'm not asking that the threads "line up" to work; I just want to notify multiple threads, so I'm not looking for a barrier.
It's kind of like condition_variable::notify_all(), but I don't want the threads to wake up one by one, which may cause starvation (also a potential problem with multiple semaphore post operations). Something like:
std::atomic_flag flag{ATOMIC_FLAG_INIT};
void example() {
    if (!flag.test_and_set()) {
        // this is the thread to do the job, and notify others
        notify_others(); // this is what I'm looking for
    } else {
        // this is the waiting thread
    }
}
void runner() {
    for (int i=0; i<10; ++i) {
        threads.emplace_back([]() {
            while(1) {
                // ...
            }
        });
    }
}
So how can I do this in C/C++, or maybe with the POSIX API?
Sorry, I didn't make this question clear enough; here is some more explanation.
It's not the thundering herd problem I'm talking about, and yes, it's re-acquiring the lock that bothers me. I tried shared_mutex, but there is still a problem.
Let me split the threads into two groups: one leader thread, which does the writing job, and the others as worker threads, which do the reading job.
In the program they are all equal, though: the leader is simply the first thread that got access to the job (think of it as the thread for which the shared buffer underflowed). Once the job is done, the other workers just need to be notified that they have access.
If a mutex is used here, any thread would block the others.
To give an example: the main thread's job do_something() here is a read, and it blocks the main thread, thus the whole system is blocked.
Unfortunately, shared_mutex won't solve this problem:
void example() {
    if (!flag.test_and_set()) {
        // leader thread:
    } else {
        // worker thread
    }
}
// outer loop
void looper() {
    for (int i=0; i<10; ++i) {
        threads.emplace_back([]() {
            while(1) {
                // ...
            }
        });
    }
}
In this code, if the leader's job is done and there is not much to do between this unlock and the next lock (remember they're in a loop), the leader may get the lock again and leave the workers idle, which is why I called it starvation earlier.
As for the blocking in do_something(): I don't want this part of the job to take all my CPU time, even when the leader's job is not ready (no data has arrived to read).
And std::call_once may still not be the answer, because, as you can see, the workers must wait until the leader's job has finished.
To summarize, this is actually a one-producer, multiple-consumer problem.
But I want the consumers to be able to do their job as soon as the product is ready for them, and any thread can be the producer or a consumer: whichever thread first finds that the product has run out becomes the producer, and the others are automatically consumers.
Unfortunately, I'm not sure whether this idea would work.
"it's kind of like the condition_variable::notify_all(), but I don't want the threads wakeup one-by-one, which may cause starvation"
In principle it's not waking up that is serialized, but re-acquiring the lock.
You can avoid that by using std::condition_variable_any with a std::shared_lock - so long as nobody ever gets an exclusive lock on the std::shared_mutex. Alternatively, you can provide your own Lockable type.
Note however that this won't magically allow you to concurrently run more threads than you have cores, or force the scheduler to start them all running in parallel. They'll just be marked as runnable and scheduled as normal - this only fixes the avoidable serialization in your own code.
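As a rough sketch of the std::condition_variable_any plus std::shared_lock idea (C++17; Broadcast, publish, wait_ready and run_demo are made-up names, not from the question). One deviation from the description above, stated openly: the leader does take the exclusive lock here, but only for the instant it flips the flag, so that a waiter cannot miss the notification. The workers re-acquire the mutex in shared mode and so are not serialized behind each other:

```cpp
#include <atomic>
#include <cassert>
#include <condition_variable>
#include <mutex>
#include <shared_mutex>
#include <thread>
#include <vector>

// Hypothetical helper; requires C++17 for std::shared_mutex.
struct Broadcast {
    std::shared_mutex mutex;
    std::condition_variable_any cv;
    bool ready = false;  // written under the exclusive lock, read under a shared one

    // Leader: publish the state, then wake every waiter.
    void publish() {
        {
            std::unique_lock<std::shared_mutex> exclusive(mutex);  // held only briefly
            ready = true;
        }
        cv.notify_all();
    }

    // Worker: after the wakeup the mutex is re-acquired in shared mode,
    // so the waiters do not queue up behind one another.
    void wait_ready() {
        std::shared_lock<std::shared_mutex> shared(mutex);
        cv.wait(shared, [this] { return ready; });
    }
};

// Starts `workers` waiting threads, publishes once, and returns how many woke up.
int run_demo(int workers) {
    assert(workers >= 0);
    Broadcast b;
    std::atomic<int> woken{0};
    std::vector<std::thread> threads;
    for (int i = 0; i < workers; ++i) {
        threads.emplace_back([&] {
            b.wait_ready();
            ++woken;
        });
    }
    b.publish();
    for (auto& t : threads) t.join();
    return woken.load();
}
```

A thread that calls wait_ready() after publish() simply sees ready == true and returns without blocking, so a late arrival is not a problem.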
It sounds like you are looking for std::call_once:
#include <mutex>
void example()
{
    static std::once_flag flag;
    bool i_did_once = false;
    std::call_once(flag, [&i_did_once]() mutable {
        i_did_once = true;
        // ... the job that must run only once
    });
    if(! i_did_once)
        do_some_other_thing();
}
I don't see how your problem relates to starvation. Are you perhaps thinking about the thundering herd problem? This may arise if do_some_other_thing has a mutex but in that case you have to describe your problem in more detail.

std::async analogue for specified thread

I need to work with several objects, where each operation may take a lot of time.
The processing cannot run in the GUI (main) thread, where I start it.
I need all communication with these objects to be asynchronous, similar to std::async with std::future, or to QtConcurrent::run() with QFuture in my main framework (Qt 5), but neither lets me select the thread. Each selected object (objects == devices) must always be accessed from one and the same additional thread.
I need a universal solution and don't want to make each class thread-safe.
For example, even if I make a thread-safe container for QSerialPort, a serial port in Qt cannot be accessed from more than one thread:
Note: The serial port is always opened with exclusive access (that is, no other process or thread can access an already opened serial port).
Usually, communication with a device consists of transmitting a command and receiving an answer. I want to process each answer exactly where the request was sent, and I don't want to use event-driven-only logic.
So, my question:
How can the following function be implemented?
MyFuture<T> fut = myAsyncStart(func, &specificLiveThread);
It must be possible to pass the same live thread many times.
Let me answer without reference to the Qt library, since I don't know its threading API.
In the C++11 standard library there is no straightforward way to reuse a created thread. A thread executes a single function and can only be joined or detached. However, you can implement reuse with the producer-consumer pattern: the consumer thread executes tasks (represented as std::function objects, for instance) that the producer thread places in a queue. So, if I am correct, you need a single-threaded thread pool.
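To make the producer-consumer idea concrete, here is a small standard-library-only sketch. SerialExecutor and submit are hypothetical names, not an existing API; submit plays the role of the myAsyncStart function from the question, with the executor object standing in for the "specific live thread":

```cpp
#include <cassert>
#include <condition_variable>
#include <deque>
#include <functional>
#include <future>
#include <memory>
#include <mutex>
#include <thread>
#include <utility>

// Hypothetical single-thread executor; the names are illustrative only.
class SerialExecutor {
public:
    SerialExecutor() : worker_([this] { run(); }) {}

    ~SerialExecutor() {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            stopping_ = true;
        }
        cv_.notify_one();
        worker_.join();  // remaining tasks are run before the thread exits
    }

    // Queues `func` for the worker thread and returns a future for its result.
    template <typename F>
    auto submit(F func) -> std::future<decltype(func())> {
        using R = decltype(func());
        // std::function needs a copyable callable, hence the shared_ptr.
        auto task = std::make_shared<std::packaged_task<R()>>(std::move(func));
        std::future<R> result = task->get_future();
        {
            std::lock_guard<std::mutex> lock(mutex_);
            assert(!stopping_);
            tasks_.emplace_back([task] { (*task)(); });
        }
        cv_.notify_one();
        return result;
    }

private:
    void run() {
        for (;;) {
            std::function<void()> job;
            {
                std::unique_lock<std::mutex> lock(mutex_);
                cv_.wait(lock, [this] { return stopping_ || !tasks_.empty(); });
                if (tasks_.empty()) return;  // stopping and nothing left to run
                job = std::move(tasks_.front());
                tasks_.pop_front();
            }
            job();  // runs outside the lock, always on the same thread
        }
    }

    std::mutex mutex_;
    std::condition_variable cv_;
    std::deque<std::function<void()>> tasks_;
    bool stopping_ = false;
    std::thread worker_;  // declared last so it starts after the other members
};
```

One executor per device would give each device its own thread, and every call submitted to the same executor runs on that thread in submission order.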
I can recommend my C++14 implementation of thread pools as task queues. It isn't commonly used (yet!), but it is covered with unit tests and has been checked with thread sanitizer multiple times. The documentation is sparse, but feel free to ask anything in the GitHub issues!
Library repository: https://github.com/Ravirael/concurrentpp
And your use case:
#include <future>
#include <task_queues.hpp>
int main() {
    // The single threaded task queue object - creates one additional thread.
    concurrent::n_threaded_fifo_task_queue queue(1);
    // Add tasks to queue, task is executed in created thread.
    std::future<int> future_result = queue.push_with_result([] { return 4; });
    // Blocks until task is completed.
    int result = future_result.get();
    // Executes task on the same thread as before.
    std::future<int> second_future_result = queue.push_with_result([] { return 4; });
}
If you want to follow the Active Object approach, here is an example using templates:
The WorkPackage and its interface are just for storing functions of different return types in a vector (see the ActiveObject::async member function later):
#include <future>
#include <utility>
class IWorkPackage {
public:
    virtual void execute() = 0;
    virtual ~IWorkPackage() {}
};
template <typename R>
class WorkPackage : public IWorkPackage {
    std::packaged_task<R()> task;
public:
    WorkPackage(std::packaged_task<R()> t) : task(std::move(t)) {}
    void execute() final {
        task(); // run the stored call; the result goes to the future
    }
    std::future<R> get_future() {
        return task.get_future();
    }
};
Here's the ActiveObject class, which takes your device type as a template parameter. It has a vector to store the method requests for the device and a thread to execute those methods one after another. Finally, the async function is used to request a method call on the device:
#include <atomic>
#include <condition_variable>
#include <functional>
#include <memory>
#include <mutex>
#include <thread>
#include <vector>
template <typename Device>
class ActiveObject {
    Device servant;
    std::thread worker;
    std::vector<std::unique_ptr<IWorkPackage>> work_queue;
    std::atomic<bool> done;
    std::mutex queue_mutex;
    std::condition_variable cv;
    void worker_thread() {
        while(done.load() == false) {
            std::unique_ptr<IWorkPackage> wp;
            {
                std::unique_lock<std::mutex> lck {queue_mutex};
                cv.wait(lck, [this] {return !work_queue.empty() || done.load() == true;});
                if(done.load() == true) continue;
                // take the oldest request so that calls run in the order they were made
                wp = std::move(work_queue.front());
                work_queue.erase(work_queue.begin());
            }
            if(wp) wp->execute();
        }
    }
public:
    ActiveObject(): done(false) {
        worker = std::thread {&ActiveObject::worker_thread, this};
    }
    ~ActiveObject() {
        {
            std::unique_lock<std::mutex> lck{queue_mutex};
            done.store(true);
        }
        cv.notify_one();
        worker.join();
    }
    template<typename R, typename ...Args, typename ...Params>
    std::future<R> async(R (Device::*function)(Params...), Args... args) {
        std::unique_ptr<WorkPackage<R>> wp {new WorkPackage<R> {std::packaged_task<R()> { std::bind(function, &servant, args...) }}};
        std::future<R> fut = wp->get_future();
        {
            std::unique_lock<std::mutex> lck{queue_mutex};
            work_queue.push_back(std::move(wp));
        }
        cv.notify_one();
        return fut;
    }
    // In case you want to call some functions directly on the device
    Device* operator->() {
        return &servant;
    }
};
You can use it as follows:
ActiveObject<QSerialPort> ao_serial_port;
// direct call:
// async call:
std::future<void> buf_future = ao_serial_port.async(&QSerialPort::setReadBufferSize, size);
// note: parity() is a const member function, so async() needs an extra overload taking R (Device::*)(Params...) const
std::future<QSerialPort::Parity> parity_future = ao_serial_port.async(&QSerialPort::parity);
// Maybe do some other work here
buf_future.get(); // wait until calculations are ready
QSerialPort::Parity p = parity_future.get(); // blocks if result not ready yet, i.e. if method has not finished execution yet
EDIT to answer the question in the comments: The AO is mainly a concurrency pattern for multiple readers/writers. As always, its use depends on the situation. The pattern is commonly used in distributed systems and network applications, for example when multiple clients request a service from a server. The clients benefit from the AO pattern because they are not blocked while waiting for the server to answer.
One reason why this pattern is not used so often in fields other than network applications might be the thread overhead. Creating a thread for every active object results in a lot of threads, and thus in thread contention if the number of CPUs is low and many active objects are used at once.
I can only guess why people think it is a strange issue: as you already found out, it does require some additional programming. Maybe that's the reason, but I'm not sure.
But I think the pattern is also very useful for other reasons and uses. Your example is one: the main thread (and other background threads) require a service from singletons, for example devices or hardware interfaces, which are available only in small numbers, are slow in their computations, and need concurrent access without the callers being blocked waiting for a result.
It's Qt. Its signal-slot mechanism is thread-aware. On your secondary (non-GUI) thread, create a QObject-derived class with an execute slot. Signals connected to this slot will marshal the event to that thread.
Note that this QObject can't be a child of a GUI object, since children need to live in their parent's thread, and this object explicitly does not live in the GUI thread.
You can handle the result using existing std::promise logic, just like std::future does.

Use the same boost::thread variable to create multiple threads

In the following example (not all the code is included, just the necessary portions):
class A
{
    void FlushToDisk(char* pData, unsigned int uiSize)
    {
        char* pTmp = new char[uiSize];
        memcpy(pTmp, pData, uiSize);
        m_Thread = boost::thread(&CSimSwcFastsimExporter::WriteToDisk, this, pTmp, uiSize);
    }
    void WriteToDisk(char* pData, unsigned int uiSize)
    {
        boost::lock_guard<boost::mutex> lock(m_Mtx); // writes are serialized on the mutex
        m_ExportFile.write(pData, uiSize);
        delete[] pData;
    }
    boost::thread m_Thread;
    boost::mutex m_Mtx;
};
Is it safe to use m_Thread that way, given that the FlushToDisk method can be called while the created thread is still executing the WriteToDisk method?
Or should I do something like:
m_Thread = boost::thread(&CSimSwcFastsimExporter::WriteToDisk, this, pTmp, uiSize);
Would this second solution be slower than the first?
From what I saw at http://www.boost.org/doc/libs/1_59_0/doc/html/thread/thread_management.html#thread.thread_management.tutorial
"When the boost::thread object that represents a thread of execution is destroyed the thread becomes detached. Once a thread is detached, it will continue executing until the invocation of the function or callable object supplied on construction has completed, or the program is terminated".
So in my case the threads should not be interrupted, right?
Thanks in advance.
The second solution will pause the main thread until the writer thread completes. If you go this way you can remove the mutex, since you are guaranteed to have only one thread writing to the file at a time.
The first solution allows the main thread to continue and creates uncontrolled writer threads, serialized on the mutex. While you might believe this is better (the main thread will not wait), I do not like this solution for several reasons.
First, you do not have any control over the number of created threads. If the function is called often and the operation is slow, you can easily run out of threads! Second, and much more important, you will accumulate a backlog of detached threads waiting on the mutex. If your main application decides to exit, all those threads will be silently killed and the updates will be lost.
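For comparison, a minimal sketch of the "wait for the previous writer first" variant, written with std::thread instead of boost::thread. Exporter, flush_to_disk and export_twice are hypothetical names, and an std::ostream stands in for the export file. Note that std::thread, unlike the boost::thread behaviour quoted in the question, terminates the program if a still-joinable thread object is assigned over or destroyed, so the explicit join is mandatory here:

```cpp
#include <cassert>
#include <cstddef>
#include <ostream>
#include <sstream>
#include <string>
#include <thread>

// Hypothetical exporter using std::thread instead of boost::thread.
class Exporter {
public:
    explicit Exporter(std::ostream& out) : out_(out) {}
    ~Exporter() { join_writer(); }  // never leave a write unfinished on exit

    void flush_to_disk(const char* data, std::size_t size) {
        assert(data != nullptr);
        std::string copy(data, size);  // owns the bytes, no manual delete[]
        join_writer();                 // at most one writer thread at a time
        writer_ = std::thread([this, copy] {
            out_.write(copy.data(), static_cast<std::streamsize>(copy.size()));
        });
    }

private:
    void join_writer() {
        if (writer_.joinable()) writer_.join();
    }

    std::ostream& out_;
    std::thread writer_;
};

// Example run: the second call waits for the first write, so order is kept.
std::string export_twice() {
    std::ostringstream sink;
    {
        Exporter exporter(sink);
        exporter.flush_to_disk("abc", 3);
        exporter.flush_to_disk("def", 3);
    }  // the destructor joins the last writer
    return sink.str();
}
```

The caller only blocks when a previous write is still in flight, no mutex is needed because two writers never overlap, and no thread is left detached at exit.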

Access a std::map shared between pthreads without a data race

My scenario is to have a main thread and tens of worker threads. Worker threads will process incoming messages from different ports.
What I want to do is to have the main and worker threads share the same map: the worker threads save data into the map (in different buckets), and the main thread greps the map content periodically.
The code looks like this:
struct cStruct
{
    std::map<std::string, std::string> map1;
    pthread_mutex_t mutex1;
    pthread_mutex_t mutex2;
};
int main(){
    struct cStruct cStruct1;
    while (condition){
        pthread_t th;
        int th_rc=pthread_create(&th,NULL,&task,(void *) &cStruct1);
    }
}
void* task(void* arg){
    struct cStruct cs = * (struct cStruct*) arg;
    while (coming data){
        if (main_thread_work){
            pthread_cond_wait(&count_cond, &cs.mutex1);
        }
        // add a new bucket to the map
    }
}
void* main_thread_task(void* arg){
    main_thread_work = true;
    // main_thread reads the std::map
    main_thread_work = false;
    pthread_cond_broadcast(&count_cond); // broadcast takes only the condition variable
}
My questions are:
When the map size changes, I should use a lock to protect the map.
But for updates to an existing key, can I let different threads modify the map concurrently? (Assume no two threads access the same bucket at the same time.)
For the main thread grepping the map, I thought of using a condition wait to hold all the worker threads while the main thread is reading the map content, and then calling pthread_cond_broadcast to wake them up. The problem is that if a worker thread is updating the map when the main thread starts its work, there will be a data race.
Please share some ideas to help me improve my design.
Edit 1:
Added main_thread_task().
The thing I want to avoid is a worker thread arriving at pthread_cond_wait "after" pthread_cond_broadcast, which would break the logic.
So I set main_thread_work to false before the main thread broadcasts to the worker threads.
while (coming data){
    if (main_thread_work){
        pthread_cond_wait(&count_cond, &cs.mutex1)
    }
This clearly can't be right. You can't check main_thread_work unless you hold the lock that protects it. How can the call to pthread_cond_wait release a lock it doesn't hold?!
This should be something like:
void* task(void* arg){
    // Use a pointer: copying the struct would also copy the map and the mutex
    struct cStruct* cs = (struct cStruct*) arg;
    // Acquire the lock so we can check shared state
    pthread_mutex_lock(&cs->mutex1);
    // Wait until the shared state is what we need it to be
    while (main_thread_work)
        pthread_cond_wait(&count_cond, &cs->mutex1);
    // Do whatever it is we're supposed to do when the
    // shared state is in this state
    // release the lock
    pthread_mutex_unlock(&cs->mutex1);
    return NULL;
}
You should lock the mutex on every access to the map (in your case), not only when adding a new 'bucket'. If T1 looks up or writes a value in the map while T2 inserts a new bucket, that is a data race: the insertion may restructure the map while T1 is still traversing it.
Regarding pthread_cond_wait: it may do the job if the only thing the other threads do is modify the map. If they also perform other calculations or process non-shared data, it is better to use the same mutex just to protect access to the map and let the other threads carry on with work that is unrelated to the shared map.
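A small sketch of the "one mutex around every map access" advice, using std::mutex rather than raw pthread calls. SharedMap, put, snapshot and fill_concurrently are made-up names; the sketch assumes the main thread can afford to copy the map and inspect the copy, which keeps the time the workers are blocked short:

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <mutex>
#include <string>
#include <thread>
#include <vector>

// Hypothetical wrapper; every access to the map goes through one mutex.
class SharedMap {
public:
    // Called by worker threads, for new keys and existing keys alike.
    void put(const std::string& key, const std::string& value) {
        std::lock_guard<std::mutex> lock(mutex_);
        map_[key] = value;
    }

    // Called by the main thread: copy under the lock, inspect the copy afterwards.
    std::map<std::string, std::string> snapshot() const {
        std::lock_guard<std::mutex> lock(mutex_);
        return map_;
    }

private:
    mutable std::mutex mutex_;
    std::map<std::string, std::string> map_;
};

// Runs `workers` threads that each insert one distinct key,
// then returns the size of a snapshot taken after they finished.
std::size_t fill_concurrently(int workers) {
    assert(workers >= 0);
    SharedMap shared;
    std::vector<std::thread> threads;
    for (int i = 0; i < workers; ++i) {
        threads.emplace_back([&shared, i] {
            shared.put("port" + std::to_string(i), "data");
        });
    }
    for (auto& t : threads) t.join();
    return shared.snapshot().size();
}
```

With this shape there is no need for the main_thread_work flag or the condition variable: workers are held up only for the duration of one put() or one copy.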

Avoiding multiple thread spawns in pthreads

I have an application that is parallelized using pthreads. The application calls a routine iteratively, and the routine spawns threads (pthread_create and pthread_join) to parallelize its computation-intensive section. When I use an instrumentation tool like PIN to collect statistics, the tool reports statistics for many threads (number of threads x number of iterations). I believe this is because a new set of threads is spawned each time the routine is called.
How can I ensure that the threads are created only once and that all successive calls reuse the threads created the first time?
When I do the same with OpenMP and then collect the statistics, I see that the threads are created only once. Is that because of the OpenMP runtime?
Here is a simplified version of the code:
int main()
{
    //some code
    do {
        compute_distance(objects,clusters, &delta); //routine with pthread
    } while (delta > threshold );
}
void compute_distance(double **objects,double *clusters, double *delta)
{
    //some code again
    //computation moved to a separate parallel routine..
    for (i=0; i<nthreads; i++)
        /* pthread_create(&thread[i], ...) call, arguments not shown */ ;
    for (i=0; i<nthreads; i++)
        rc = pthread_join(thread[i], &status);
}
I hope this explains the problem clearly.
How do we save the thread id and test whether it was already created?
You can write a simple thread pool that creates the threads once and puts them to sleep. When a thread is required, instead of calling pthread_create you ask the thread pool to pick a thread and do the required work. This also gives you control over the number of threads.
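A bare-bones sketch of such a pool, written with std::thread for brevity (a pthread version would have the same shape). ThreadPool, submit, wait_idle and run_iterations are made-up names; wait_idle takes the place of the per-iteration pthread_join loop:

```cpp
#include <atomic>
#include <cassert>
#include <condition_variable>
#include <functional>
#include <mutex>
#include <queue>
#include <thread>
#include <vector>

// Hypothetical minimal pool: threads are created once and reused for every batch.
class ThreadPool {
public:
    explicit ThreadPool(unsigned n) {
        assert(n > 0);
        for (unsigned i = 0; i < n; ++i)
            workers_.emplace_back([this] { run(); });
    }

    ~ThreadPool() {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            stopping_ = true;
        }
        work_cv_.notify_all();
        for (auto& w : workers_) w.join();
    }

    void submit(std::function<void()> job) {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            jobs_.push(std::move(job));
            ++pending_;
        }
        work_cv_.notify_one();
    }

    // Blocks until every submitted job has finished.
    void wait_idle() {
        std::unique_lock<std::mutex> lock(mutex_);
        idle_cv_.wait(lock, [this] { return pending_ == 0; });
    }

private:
    void run() {
        std::unique_lock<std::mutex> lock(mutex_);
        for (;;) {
            work_cv_.wait(lock, [this] { return stopping_ || !jobs_.empty(); });
            if (jobs_.empty()) return;  // stopping and nothing left to do
            std::function<void()> job = std::move(jobs_.front());
            jobs_.pop();
            lock.unlock();
            job();  // the actual work runs without the lock held
            lock.lock();
            if (--pending_ == 0) idle_cv_.notify_all();
        }
    }

    std::mutex mutex_;
    std::condition_variable work_cv_, idle_cv_;
    std::queue<std::function<void()>> jobs_;
    unsigned pending_ = 0;
    bool stopping_ = false;
    std::vector<std::thread> workers_;
};

// Example: three iterations of four jobs, all on the same four threads.
int run_iterations() {
    std::atomic<int> sum{0};
    ThreadPool pool(4);
    for (int iter = 0; iter < 3; ++iter) {
        for (int i = 1; i <= 4; ++i)
            pool.submit([&sum, i] { sum += i; });
        pool.wait_idle();  // all four jobs of this iteration are done here
    }
    return sum.load();  // 3 * (1 + 2 + 3 + 4)
}
```

An instrumentation tool should then see only the four pool threads, however many iterations the outer loop runs.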
An easy thing you can do with minimal code changes is to write some wrappers for pthread_create and _join. Basically you can do something like:
typedef struct {
    volatile int go;
    volatile int done;
    pthread_t h;
    void* (*fn)(void*);
    void* args;
} pthread_w_t;
void* pthread_w_fn(void* args) {
    pthread_w_t* p = (pthread_w_t*)args;
    // just let the thread be killed at the end
    for(;;) {
        // sched_yield() (from <sched.h>) replaces the non-standard pthread_yield()
        while (!p->go) { sched_yield(); }; // yields are good
        p->go = 0; // don't want to go again until told to
        p->fn(p->args); // run the real thread code
        p->done = 1;
    }
}
int pthread_create_w(pthread_w_t* th, pthread_attr_t* a,
                     void* (*fn)(void*), void* args) {
    if (!th->h) {
        th->done = 0;
        th->go = 0;
        th->fn = fn;
        th->args = args;
        pthread_create(&th->h, a, pthread_w_fn, th); // only on the first call
    }
    th->done = 0; //make sure join won't return too soon
    th->go = 1; //and let the wrapper function start the real thread code
    return 0;
}
int pthread_join_w(pthread_w_t*th) {
    while (!th->done) { sched_yield(); };
    return 0;
}
And then you'll have to change your calls and pthread_t declarations, or create some #define macros to map pthread_create to pthread_create_w and so on, and you'll have to zero-initialize your pthread_w_t objects.
Messing with those volatiles can be troublesome, though. You'll probably need to spend some time getting my rough outline to work properly.
To ensure something that several threads might try to do only happens once, use pthread_once(). To ensure something only happens once that might be done by a single thread, just use a bool (likely one in static storage).
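For illustration, a minimal pthread_once() sketch compiled as C++. once_control, init_once, worker and run_workers are hypothetical names: four threads all reach the same initialization call, and only one of them executes it:

```cpp
#include <cassert>
#include <pthread.h>

static pthread_once_t once_control = PTHREAD_ONCE_INIT;
static int init_calls = 0;  // written only inside init_once()

// The one-time setup; pthread_once guarantees a single execution.
static void init_once() { ++init_calls; }

static void* worker(void*) {
    // Every thread calls this; the others wait until the first call has finished.
    pthread_once(&once_control, init_once);
    return nullptr;
}

// Starts four threads and returns how many times init_once() actually ran.
int run_workers() {
    pthread_t threads[4];
    for (pthread_t& t : threads) {
        int rc = pthread_create(&t, nullptr, worker, nullptr);
        assert(rc == 0);
        (void)rc;  // silence the unused warning when NDEBUG is defined
    }
    for (pthread_t& t : threads) pthread_join(t, nullptr);
    return init_calls;  // safe to read: all workers have been joined
}
```

This only guards the one-time setup; keeping the threads alive between calls of the routine still needs something like a pool.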
Honestly, it would be far easier to answer your question for everyone if you would edit your question – not comment, since that destroys formatting – to contain the real code in question, including the OpenMP pragmas.