Using C++11 shared_ptr in multiple threads

Recently I have been designing a high-performance, event-driven, multithreaded framework in C++11. It mainly relies on C++11 facilities such as std::thread, std::condition_variable, std::mutex and std::shared_ptr. In general, the framework has three basic components: job, worker and streamline; it works much like a real factory. When users build their business model on the server side, they only need to think about the data and its processor. Once the model is established, a user only needs to write a data class that inherits from job and a processor class that inherits from worker.
For example:
class Data : public job {};
class Processor : public worker {};
When the server receives data, it simply creates a Data object with auto data = std::make_shared<Data>() in the data-source callback thread and calls job_dispatch to hand the processor and the data over to another thread. Of course, the user doesn't have to think about freeing memory. job_dispatch mainly does the following:
void evd_thread_pool::job_dispatch(std::shared_ptr<evd_thread_job> job) {
    auto task = std::make_shared<evd_task_wrap>(job);
    task->worker = streamline.worker;
    // worker has been registered in streamline first of all
    {
        std::unique_lock<std::mutex> lck(streamline.mutex);
        streamline.task_list.push_back(task);
    }
    streamline.cv.notify_all();
}
The evd_task_wrap used in job_dispatch is defined as:
struct evd_task_wrap {
    std::shared_ptr<evd_thread_job> order;
    std::shared_ptr<evd_thread_processor> worker;
    evd_task_wrap(std::shared_ptr<evd_thread_job>& o)
        :order(o) {}
};
Finally, the task wrap is dispatched to the processing thread through task_list, which is a std::list. The processing thread mainly does the following:
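void evd_factory_impl::thread_proc() {
    std::shared_ptr<evd_task_wrap> wrap = nullptr;
    while (true) {
        std::unique_lock<std::mutex> lck(streamline.mutex);
        if (streamline.task_list.empty())
            streamline.cv.wait(lck, [&]()->bool{return !streamline.task_list.empty();});
        wrap = std::move(streamline.task_list.front());
        streamline.task_list.pop_front();
        lck.unlock();
        if (-1 == wrap->order->get_type())
            break;  // assumed: type -1 marks a shutdown job
        wrap->worker->process_task(wrap->order);
        wrap.reset();
    }
}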
But I don't know why the process often crashes in thread_proc. The core dump shows that sometimes wrap is an empty shared_ptr, or that a segmentation fault happened in _Sp_counted_ptr_inplace::_M_dispose, called from wrap.reset(). I suspect shared_ptr has a thread-synchronization problem in this scenario, although I know the control block of shared_ptr is thread-safe. And of course the shared_ptr objects in job_dispatch and thread_proc are different shared_ptr objects, even though they point to the same storage. Does anyone have more specific suggestions on how to solve this problem? Or is there a similar lightweight framework with automatic memory management in C++11?
An example of process_task:
void log_handle::process_task(std::shared_ptr<crx::evd_thread_job> job) {
    auto j = std::dynamic_pointer_cast<log_job>(job);
    j->log->Printf(0, j->print_str.c_str());
    write(STDOUT_FILENO, j->print_str.c_str(), j->print_str.size());
}
class log_factory {
public:
    log_factory(const std::string& name);
    virtual ~log_factory();
    void print_ts(const char *format, ...) { //here dispatch the job
        char log_buf[4096] = {0};
        va_list args;
        va_start(args, format);
        vsnprintf(log_buf, sizeof(log_buf), format, args);  // bounded, unlike vsprintf
        va_end(args);
        auto job = std::make_shared<log_job>(log_buf, &m_log);
        m_log_th.job_dispatch(job);
    }
private:
    E15_Log m_log;
    std::shared_ptr<log_handle> m_log_handle;
    crx::evd_thread_pool m_log_th;
};

I detected a problem in your code, which may or may not be related:
You use notify_all on your condition variable, which wakes ALL waiting threads. That is OK as long as every woken thread re-checks the condition before touching the list, for example by wrapping the wait in a while loop:
while (streamline.task_list.empty())
    streamline.cv.wait(lck, [&]()->bool{return !streamline.task_list.empty();});
The predicate overload of wait already performs this re-check internally, so the danger lies in any path that reaches front() without it, such as an if around a plain wait(lck). In that case, if you dispatch a single job while several consumer threads are waiting, all but one thread will call wrap = std::move(streamline.task_list.front()); while the task list is empty, which is UB.
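A minimal sketch of a shared_ptr job queue in which every consumer re-checks the list under the lock (via the predicate overload of wait) and pops the element before releasing the lock, so no consumer can reach front() on an empty list and no shared_ptr object is touched by two threads at once. Job, TaskQueue, push, pop, close and sum_with_consumers are hypothetical names, not part of the asker's framework; close() plays the role that the type -1 job appears to play in thread_proc.

```cpp
#include <atomic>
#include <cassert>
#include <condition_variable>
#include <list>
#include <memory>
#include <mutex>
#include <thread>
#include <vector>

// Hypothetical job type standing in for evd_thread_job.
struct Job {
    int value;
    explicit Job(int v) : value(v) {}
};

// Hypothetical queue standing in for streamline.task_list + mutex + cv.
class TaskQueue {
public:
    void push(std::shared_ptr<Job> job) {
        assert(job && "a null job is reserved to signal shutdown");
        {
            std::lock_guard<std::mutex> lck(mutex_);
            tasks_.push_back(std::move(job));
        }
        cv_.notify_one();  // one job: wake one consumer
    }

    // Blocks until a job is available; returns nullptr once closed and drained.
    std::shared_ptr<Job> pop() {
        std::unique_lock<std::mutex> lck(mutex_);
        // The predicate overload re-checks after every wake-up, so a thread
        // that loses the race for the last job simply goes back to sleep.
        cv_.wait(lck, [this] { return closed_ || !tasks_.empty(); });
        if (tasks_.empty()) return nullptr;
        std::shared_ptr<Job> job = std::move(tasks_.front());
        tasks_.pop_front();  // removed while the lock is still held
        return job;
    }

    void close() {
        {
            std::lock_guard<std::mutex> lck(mutex_);
            closed_ = true;
        }
        cv_.notify_all();  // every consumer must see the shutdown
    }

private:
    std::mutex mutex_;
    std::condition_variable cv_;
    std::list<std::shared_ptr<Job>> tasks_;
    bool closed_ = false;
};

// Pushes jobs 1..jobs and lets `consumers` threads sum their values.
long sum_with_consumers(int consumers, int jobs) {
    TaskQueue queue;
    std::atomic<long> sum(0);
    std::vector<std::thread> workers;
    for (int i = 0; i < consumers; ++i) {
        workers.emplace_back([&queue, &sum] {
            // Each consumer holds its own shared_ptr copy; the Job is freed
            // when the last copy goes out of scope, with no manual reset().
            while (std::shared_ptr<Job> job = queue.pop()) {
                sum += job->value;
            }
        });
    }
    for (int i = 1; i <= jobs; ++i) queue.push(std::make_shared<Job>(i));
    queue.close();
    for (auto& t : workers) t.join();
    return sum.load();
}
```

Waking one consumer per pushed job (notify_one) avoids waking every consumer for a single item; notify_all is kept for close(), where every consumer has to observe the shutdown.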


How to start a thread in a class in C++14?

class ThreadOne {
public:
    ThreadOne();
    void RealThread();
    void EnqueueJob(s_info job);
    std::queue<s_info> q_jobs;
    H5::H5File* targetFile = new H5::H5File("file.h5", H5F_ACC_TRUNC);
    std::condition_variable cv_condition;
    std::mutex m_job_q_;
};

ThreadOne::ThreadOne() {
}

void ThreadOne::RealThread() {
    while (true) {
        std::unique_lock<std::mutex> lock(m_job_q_);
        cv_condition.wait(lock, [this]() { return !this->q_jobs.empty(); });
        s_info info = std::move(q_jobs.front());
        q_jobs.pop();
        lock.unlock();
        //* DO THE JOB *//
    }
}

void ThreadOne::EnqueueJob(s_info job) {
    {
        std::lock_guard<std::mutex> lock(m_job_q_);
        q_jobs.push(std::move(job));
    }
    cv_condition.notify_one();
}

ThreadOne *tWrite = new ThreadOne();
I want to create a thread, send it a pointer to an array together with its name as a struct (s_info), and have the thread write it to a file. I think that's better than creating a thread every time a write is needed.
I could make a thread pool and allocate jobs to it, but in my situation the same file must not be written concurrently. I think a single thread will be enough, and the program can still do CPU-bound work while a write is in progress.
To sum up, this class (hopefully) receives array pointers and their dataset names, puts them in q_jobs, and RealThread writes the arrays to a file.
I referred to a C++ thread pool program, which starts its threads like this:
std::vector<std::thread> vec_worker_threads;
vec_worker_threads.emplace_back([this]() { this->RealThread(); });
I'm new to C++ and I understand what the code above does, but I don't know how to start RealThread in my class without a vector. How can I create an instance of the class that already has a running thread (RealThread) inside it?
From what I can gather, and as already discussed in the comments, you simply want a std::thread member for ThreadOne:
class ThreadOne {
    // ...
    ~ThreadOne();
    std::thread thread;
};
ThreadOne::ThreadOne() {
    // ...
    thread = std::thread{&ThreadOne::RealThread, this};
}
ThreadOne::~ThreadOne() {
    // (potentially) notify thread to finish first
    thread.join();
}
ThreadOne tWrite;
Note that I did not start the thread in the member-initializer-list of the constructor in order to avoid the thread accessing other members that have not been initialized yet. (The default constructor of std::thread does not start any thread.)
I also wrote a destructor that waits for the thread to finish and joins it. You must always join (or detach) a thread before the std::thread object attached to it is destroyed; otherwise your program will call std::terminate and abort.
Finally, I replaced tWrite, which was a pointer, with an object of class type. There is probably no reason for you to use dynamic allocation there, and even if you do need it, you should be using
auto tWrite = std::make_unique<ThreadOne>();
or equivalent instead, so that you don't rely on manually calling delete on the pointer at the correct place.
Also note that your current RealThread function seems to never finish. It must return at some point, probably after receiving a notification from the main thread, otherwise thread.join() will wait forever.
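A self-contained sketch of the approach described above, assuming C++11 or later. To keep it runnable without HDF5, s_info is a hypothetical stand-in (a name plus a std::vector<double> instead of a raw array pointer), and the "file" is a std::vector<std::string> passed through a hypothetical constructor parameter. A stop flag set by the destructor is the notification that lets RealThread return, so thread.join() does not wait forever.

```cpp
#include <cassert>
#include <condition_variable>
#include <mutex>
#include <queue>
#include <string>
#include <thread>
#include <utility>
#include <vector>

// Hypothetical job description; the real s_info holds an array pointer and a name.
struct s_info {
    std::string name;
    std::vector<double> data;
};

class ThreadOne {
public:
    // `sink` is a hypothetical stand-in for the HDF5 file: the worker appends
    // each dataset name to it. Read it only after the ThreadOne is destroyed.
    explicit ThreadOne(std::vector<std::string>& sink) : sink_(sink) {
        // Started in the body, so every member above is already initialized.
        thread = std::thread(&ThreadOne::RealThread, this);
    }

    ~ThreadOne() {
        {
            std::lock_guard<std::mutex> lock(m_job_q_);
            stop_ = true;
        }
        cv_condition.notify_one();
        thread.join();  // must join before the std::thread member is destroyed
    }

    ThreadOne(const ThreadOne&) = delete;
    ThreadOne& operator=(const ThreadOne&) = delete;

    void EnqueueJob(s_info job) {
        {
            std::lock_guard<std::mutex> lock(m_job_q_);
            q_jobs.push(std::move(job));
        }
        cv_condition.notify_one();
    }

private:
    void RealThread() {
        while (true) {
            s_info info;
            {
                std::unique_lock<std::mutex> lock(m_job_q_);
                cv_condition.wait(lock, [this] { return stop_ || !q_jobs.empty(); });
                if (q_jobs.empty()) return;  // stop requested and nothing left
                info = std::move(q_jobs.front());
                q_jobs.pop();
            }
            // "DO THE JOB" outside the lock, so EnqueueJob never waits on I/O.
            sink_.push_back(info.name);
        }
    }

    std::vector<std::string>& sink_;
    std::queue<s_info> q_jobs;
    std::condition_variable cv_condition;
    std::mutex m_job_q_;
    bool stop_ = false;
    std::thread thread;
};

std::vector<std::string> write_two_datasets() {
    std::vector<std::string> file;
    {
        ThreadOne tWrite(file);  // no new/delete needed
        tWrite.EnqueueJob({"dset1", {1.0, 2.0}});
        tWrite.EnqueueJob({"dset2", {3.0}});
    }  // destructor drains the queue and joins the thread
    return file;
}
```

Because the worker only returns once the queue is empty, destroying tWrite flushes every job that was enqueued before it.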

std::async analogue for a specified thread

I need to work with several objects, where each operation may take a lot of time.
The processing cannot be placed in the GUI (main) thread, where I start it.
I need all communication with some objects to go through asynchronous operations, similar to std::async with std::future, or QtConcurrent::run() with QFuture in my main framework (Qt 5), but neither provides thread selection. I always need to work with a selected object (objects == devices) in exactly one additional thread. I need a universal solution and don't want to make each class thread-safe.
For example, even if I make a thread-safe container for QSerialPort, a serial port in Qt cannot be accessed from more than one thread:
Note: The serial port is always opened with exclusive access (that is, no other process or thread can access an already opened serial port).
Usually, communication with a device consists of transmitting a command and receiving an answer. I want to process each answer exactly where the request was sent, and I don't want to use purely event-driven logic.
So, my question: how can the following function be implemented?
MyFuture<T> fut = myAsyncStart(func, &specificLiveThread);
The same live thread must be usable for many calls.
Let me answer without referring to the Qt library, since I don't know its threading API.
In the C++11 standard library there is no straightforward way to reuse a created thread. A thread executes a single function and can only be joined or detached. However, you can implement reuse with the producer-consumer pattern: the consumer thread executes tasks (represented as std::function objects, for instance) that the producer thread places in a queue. So, if I understand correctly, you need a single-threaded thread pool.
I can recommend my C++14 implementation of thread pools as task queues. It isn't commonly used (yet!), but it is covered by unit tests and has been checked with ThreadSanitizer multiple times. The documentation is sparse, but feel free to ask anything in the GitHub issues!
Library repository: https://github.com/Ravirael/concurrentpp
And your use case:
#include <future>
#include <task_queues.hpp>

int main() {
    // The single threaded task queue object - creates one additional thread.
    concurrent::n_threaded_fifo_task_queue queue(1);
    // Add tasks to queue, task is executed in created thread.
    std::future<int> future_result = queue.push_with_result([] { return 4; });
    // Blocks until task is completed.
    int result = future_result.get();
    // Executes task on the same thread as before.
    std::future<int> second_future_result = queue.push_with_result([] { return 4; });
}
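If you would rather not depend on a library, the single-threaded task queue described above can be sketched with std::packaged_task. SerialExecutor, submit, same_thread_demo and value_demo are hypothetical names: submit plays the role of the asker's myAsyncStart, one SerialExecutor object plays the role of specificLiveThread, and std::future stands in for MyFuture<T>. Assumes C++11 or later.

```cpp
#include <cassert>
#include <condition_variable>
#include <functional>
#include <future>
#include <memory>
#include <mutex>
#include <queue>
#include <thread>
#include <utility>

// Hypothetical executor: owns exactly one thread and runs submitted tasks on
// it in FIFO order, so every call on a device happens on the same thread.
class SerialExecutor {
public:
    SerialExecutor() { worker_ = std::thread(&SerialExecutor::run, this); }

    ~SerialExecutor() {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            done_ = true;
        }
        cv_.notify_one();
        worker_.join();  // tasks already queued still run before this returns
    }

    SerialExecutor(const SerialExecutor&) = delete;
    SerialExecutor& operator=(const SerialExecutor&) = delete;

    // Queues f and returns a future for its result (the myAsyncStart role).
    template <typename F>
    auto submit(F f) -> std::future<decltype(f())> {
        using R = decltype(f());
        // packaged_task is move-only, but std::function needs a copyable
        // callable, so the task is shared through a shared_ptr.
        auto task = std::make_shared<std::packaged_task<R()>>(std::move(f));
        std::future<R> fut = task->get_future();
        {
            std::lock_guard<std::mutex> lock(mutex_);
            tasks_.push([task] { (*task)(); });
        }
        cv_.notify_one();
        return fut;
    }

private:
    void run() {
        while (true) {
            std::function<void()> task;
            {
                std::unique_lock<std::mutex> lock(mutex_);
                cv_.wait(lock, [this] { return done_ || !tasks_.empty(); });
                if (tasks_.empty()) return;  // shutting down and drained
                task = std::move(tasks_.front());
                tasks_.pop();
            }
            task();  // run outside the lock so submit() never waits on work
        }
    }

    std::mutex mutex_;
    std::condition_variable cv_;
    std::queue<std::function<void()>> tasks_;
    bool done_ = false;
    std::thread worker_;
};

// Both tasks report the id of the thread they ran on.
bool same_thread_demo() {
    SerialExecutor device_thread;
    auto a = device_thread.submit([] { return std::this_thread::get_id(); });
    auto b = device_thread.submit([] { return std::this_thread::get_id(); });
    std::thread::id first = a.get();
    std::thread::id second = b.get();
    return first == second && first != std::this_thread::get_id();
}

int value_demo() {
    SerialExecutor executor;
    return executor.submit([] { return 4; }).get();
}
```

Since each SerialExecutor owns exactly one thread, passing the same executor to any number of call sites gives the "one live thread, passed many times" behaviour. An exception thrown by a task is stored in its future by std::packaged_task and rethrown from get().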
If you want to follow the Active Object approach, here is an example using templates:
The WorkPackage and its interface are just for storing functions of different return types in a vector (see the ActiveObject::async member function below):
class IWorkPackage {
public:
    virtual void execute() = 0;
    virtual ~IWorkPackage() {}
};
template <typename R>
class WorkPackage : public IWorkPackage {
    std::packaged_task<R()> task;
public:
    WorkPackage(std::packaged_task<R()> t) : task(std::move(t)) {}
    void execute() final {
        task();
    }
    std::future<R> get_future() {
        return task.get_future();
    }
};
Here's the ActiveObject class, which expects your device type as a template parameter. It also has a vector to store the method requests for the device and a thread to execute those methods one after another. Finally, the async function is used to request a method call on the device:
template <typename Device>
class ActiveObject {
    Device servant;
    std::thread worker;
    std::vector<std::unique_ptr<IWorkPackage>> work_queue;
    std::atomic<bool> done;
    std::mutex queue_mutex;
    std::condition_variable cv;
    void worker_thread() {
        while(done.load() == false) {
            std::unique_ptr<IWorkPackage> wp;
            {
                std::unique_lock<std::mutex> lck {queue_mutex};
                cv.wait(lck, [this] {return !work_queue.empty() || done.load() == true;});
                if(done.load() == true) continue;
                // take the oldest request so calls run in the order they were made
                wp = std::move(work_queue.front());
                work_queue.erase(work_queue.begin());
            }
            if(wp) wp->execute();
        }
    }
public:
    ActiveObject(): done(false) {
        worker = std::thread {&ActiveObject::worker_thread, this};
    }
    ~ActiveObject() {
        {
            std::unique_lock<std::mutex> lck{queue_mutex};
            done = true;
        }
        cv.notify_one();
        worker.join();
    }
    template<typename R, typename ...Args, typename ...Params>
    std::future<R> async(R (Device::*function)(Params...), Args... args) {
        std::unique_ptr<WorkPackage<R>> wp {new WorkPackage<R> {std::packaged_task<R()> { std::bind(function, &servant, args...) }}};
        std::future<R> fut = wp->get_future();
        {
            std::unique_lock<std::mutex> lck{queue_mutex};
            work_queue.push_back(std::move(wp));
        }
        cv.notify_one();
        return fut;
    }
    // In case you want to call some functions directly on the device
    Device* operator->() {
        return &servant;
    }
};
You can use it as follows:
ActiveObject<QSerialPort> ao_serial_port;
// direct call:
//async call:
std::future<void> buf_future = ao_serial_port.async(&QSerialPort::setReadBufferSize, size);
std::future<Parity> parity_future = ao_serial_port.async(&QSerialPort::parity);
// Maybe do some other work here
buf_future.get(); // wait until calculations are ready
Parity p = parity_future.get(); // blocks if result not ready yet, i.e. if method has not finished execution yet
EDIT, to answer the question in the comments: the AO is mainly a concurrency pattern for multiple readers/writers. As always, whether to use it depends on the situation. This pattern is commonly used in distributed systems and network applications, for example when multiple clients request a service from a server. The clients benefit from the AO pattern because they are not blocked while waiting for the server to answer.
One reason why this pattern is not used so often in fields other than network apps might be the thread overhead: creating a thread for every active object results in a lot of threads, and thus thread contention, if the number of CPUs is low and many active objects are used at once.
I can only guess why people think it is a strange issue: as you already found out, it does require some additional programming. Maybe that's the reason, but I'm not sure.
But I think the pattern is also very useful for other reasons and uses, as in your example, where the main thread (and other background threads) require a service from singletons, such as devices or hardware interfaces, which are only available in small numbers, are slow in their computations, and require concurrent access without the caller being blocked waiting for a result.
It's Qt. Its signal-slot mechanism is thread-aware. On your secondary (non-GUI) thread, create a QObject-derived class with an execute slot. Signals connected to this slot will marshal the event to that thread.
Note that this QObject can't be a child of a GUI object, since children need to live in their parents thread, and this object explicitly does not live in the GUI thread.
You can handle the result using existing std::promise logic, just like std::future does.

Synchronizing method calls on shared object from multiple threads

I am thinking about how to implement a class that will contain private data that will eventually be modified by multiple threads through method calls. For synchronization (using the Windows API), I am planning on using a CRITICAL_SECTION object, since all the threads will be spawned from the same process.
Given the following design, I have a few questions.
template <typename T> class Shareable
{
    const LPCRITICAL_SECTION sync; //Can be read and used by multiple threads
    T *data;
public:
    Shareable(LPCRITICAL_SECTION cs, unsigned elems) : sync{cs}, data{new T[elems]} { }
    ~Shareable() { delete[] data; }
    void sharedModify(unsigned index, T &datum) //<-- Can this be validly called
    //by multiple threads with synchronization being implicit?
    {
        EnterCriticalSection(sync);
        /*
            The critical section of code involving reads & writes to 'data'
        */
        LeaveCriticalSection(sync);
    }
};
// Somewhere else ...
DWORD WINAPI ThreadProc(LPVOID lpParameter)
{
    Shareable<ActualType> *ptr = static_cast<Shareable<ActualType>*>(lpParameter);
    T copyable = /* initialization */;
    ptr->sharedModify(validIndex, copyable); //<-- OK, synchronized?
    return 0;
}
The way I see it, the API calls will be made in the context of the current thread. That is, I assume this is the same as if I had obtained the critical section object from the pointer and called the API from within ThreadProc(). However, I am worried that if the object is created in the main/initial thread, there will be something funky about the API calls.
1. When sharedModify() is called on the same object concurrently from multiple threads, will the synchronization be implicit, in the way I described above?
2. Should I instead get a pointer to the critical section object and use that?
3. Is there some other synchronization mechanism that is better suited to this scenario?
When sharedModify() is called on the same object concurrently, from multiple threads, will the synchronization be implicit, in the way I described it above?
It's not implicit, it's explicit. There's only one CRITICAL_SECTION, and only one thread can hold it at a time.
Should I instead get a pointer to the critical section object and use that instead?
No. There's no reason to use a pointer here.
Is there some other synchronization mechanism that is better suited to this scenario?
It's hard to say without seeing more code, but this is definitely the "default" solution. It's like a singly-linked list -- you learn it first, it always works, but it's not always the best choice.
When sharedModify() is called on the same object concurrently, from multiple threads, will the synchronization be implicit, in the way I described it above?
Implicit from the caller's perspective, yes.
Should I instead get a pointer to the critical section object and use that instead?
No. In fact, I would suggest giving the Shareable object ownership of its own critical section instead of accepting one from the outside (and embracing RAII concepts to write safer code), e.g.:
template <typename T>
class Shareable
{
    CRITICAL_SECTION sync;
    std::vector<T> data;
    struct SyncLocker
    {
        CRITICAL_SECTION &sync;
        SyncLocker(CRITICAL_SECTION &cs) : sync(cs) { EnterCriticalSection(&sync); }
        ~SyncLocker() { LeaveCriticalSection(&sync); }
    };
public:
    Shareable(unsigned elems) : data(elems) { InitializeCriticalSection(&sync); }
    Shareable(const Shareable&) = delete;
    Shareable(Shareable&&) = delete;
    ~Shareable()
    {
        { SyncLocker lock(sync); data.clear(); }
        DeleteCriticalSection(&sync);
    }
    void sharedModify(unsigned index, const T &datum)
    {
        SyncLocker lock(sync);
        data[index] = datum;
    }
    Shareable& operator=(const Shareable&) = delete;
    Shareable& operator=(Shareable&&) = delete;
};
Is there some other synchronization mechanism that is better suited to this scenario?
That depends. Will multiple threads be accessing the same index at the same time? If not, there is not really a need for the critical section at all: one thread can safely access one index while another thread accesses a different index, as long as nothing resizes the vector concurrently.
If multiple threads need to access the same index at the same time, a critical section might still not be the best choice. Locking the entire array might be a big bottleneck if you only need to lock portions of it at a time. Things like the Interlocked API or Slim Reader/Writer (SRW) locks might make more sense. It really depends on your thread design and what you are actually trying to protect.
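For the reader/writer case, a portable sketch that combines the owned-lock design with std::shared_mutex (C++17), the standard-library reader/writer lock comparable to SRW locks. sharedRead, sharedAdd and concurrent_increment_demo are hypothetical additions; whether this beats a plain mutex depends on your read/write ratio, so measure before switching.

```cpp
#include <cassert>
#include <mutex>
#include <shared_mutex>
#include <thread>
#include <vector>

template <typename T>
class Shareable {
public:
    explicit Shareable(unsigned elems) : data(elems) {}
    Shareable(const Shareable&) = delete;
    Shareable& operator=(const Shareable&) = delete;

    // Any number of readers may hold the shared lock at the same time.
    T sharedRead(unsigned index) const {
        std::shared_lock<std::shared_mutex> lock(sync);
        return data[index];
    }

    // Writers take the lock exclusively.
    void sharedModify(unsigned index, const T& datum) {
        std::unique_lock<std::shared_mutex> lock(sync);
        data[index] = datum;
    }

    // A read-modify-write must hold the exclusive lock for the whole update.
    void sharedAdd(unsigned index, const T& delta) {
        std::unique_lock<std::shared_mutex> lock(sync);
        data[index] += delta;
    }

private:
    mutable std::shared_mutex sync;  // owned by the object, not passed in
    std::vector<T> data;
};

// Eight threads each add 1 to element 0 a thousand times.
int concurrent_increment_demo() {
    Shareable<int> shared(4);
    std::vector<std::thread> threads;
    for (int t = 0; t < 8; ++t) {
        threads.emplace_back([&shared] {
            for (int i = 0; i < 1000; ++i) shared.sharedAdd(0, 1);
        });
    }
    for (auto& th : threads) th.join();
    return shared.sharedRead(0);
}
```

Calling sharedRead followed by sharedModify is not atomic as a pair; a read-modify-write has to hold the exclusive lock for its whole duration, which is what sharedAdd does.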

Static Class variable for Thread Count in C++

I am writing a thread-based application in C++. The following sample code shows how I am checking the thread count. I need to ensure that at any point in time, no more than 20 worker threads are spawned from my application:
#include <boost/thread.hpp>
using namespace std;

class ThreadWorkerClass
{
    static int threadCount;
public:
    ThreadWorkerClass()
    {
        threadCount ++;
    }
    static int getThreadCount()
    {
        return threadCount;
    }
    void run()
    {
        /* The worker thread execution
         * logic is to be written here */
        //Reduce count by 1 as worker thread would finish here
        threadCount --;
    }
};

int ThreadWorkerClass::threadCount = 0;

int main()
{
    ThreadWorkerClass twObj;
    //Use Boost to start Worker Thread
    //Assume max 20 worker threads need to be spawned
    if(ThreadWorkerClass::getThreadCount() <= 20)
        boost::thread *wrkrThread = new boost::thread(&ThreadWorkerClass::run, &twObj);
    //Wait for the threads to join
    //Something like (*wrkrThread).join();
    return 0;
}
Will this design require me to take a lock on the variable threadCount? Assume that I will be running this code in a multi-processor environment.
The design is not good enough. The problem is that you exposed the constructor, so whether you like it or not, people will be able to create as many instances of your object as they want. You should do some sort of thread pooling, i.e. have a class that maintains a pool of threads and hands them out when available. Something like:
class MyThreadClass {
    //the method obtaining that thread is responsible for returning it
};
class ThreadPool {
    //create 20 instances of your Threadclass
public:
    //This is a blocking function
    MyThreadClass getInstance() {
        //if a thread from the pool is free give it, else wait
    }
};
So everything is maintained internally by the pooling class. Never give control over that class to others. You can also add query functions to the pooling class, like hasFreeThreads(), numFreeThreads(), etc.
You can also enhance this design by handing out smart pointers, so you can keep track of how many callers still own the thread.
Making the callers that obtain a thread responsible for releasing it is sometimes dangerous: if a process crashes, it never gives the thread back. There are many solutions to that; the simplest one is to keep a clock on each thread, and when time runs out the thread is taken back by force.
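On the locking question itself: yes. threadCount is incremented on the main thread and decremented on worker threads, so plain ++ and -- are a data race (undefined behaviour). Making it std::atomic<int> alone is still not enough, because checking getThreadCount() and then starting a thread are two separate steps that other threads can interleave. A minimal C++11 sketch of a counting gate that does the check and the increment under one lock; ConcurrencyGate and max_observed are hypothetical names, and C++20's std::counting_semaphore provides the same idea in the standard library.

```cpp
#include <atomic>
#include <cassert>
#include <chrono>
#include <condition_variable>
#include <mutex>
#include <thread>
#include <vector>

// Hypothetical gate: at most `limit` slots can be held at any time.
class ConcurrencyGate {
public:
    explicit ConcurrencyGate(int limit) : limit_(limit) {}

    // Blocks until a slot is free, then takes it: check and increment
    // happen under one lock, so no other thread can slip in between.
    void acquire() {
        std::unique_lock<std::mutex> lock(mutex_);
        cv_.wait(lock, [this] { return active_ < limit_; });
        ++active_;
    }

    void release() {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            --active_;
        }
        cv_.notify_one();  // let one blocked acquire() proceed
    }

private:
    std::mutex mutex_;
    std::condition_variable cv_;
    int active_ = 0;
    const int limit_;
};

// Starts `total` workers but never more than `limit` at once;
// returns the highest number of workers observed running together.
int max_observed(int total, int limit) {
    ConcurrencyGate gate(limit);
    std::atomic<int> running(0), peak(0);
    std::vector<std::thread> threads;
    for (int i = 0; i < total; ++i) {
        gate.acquire();  // the spawning thread waits here while all slots are taken
        threads.emplace_back([&gate, &running, &peak] {
            int now = ++running;
            int prev = peak.load();
            while (now > prev && !peak.compare_exchange_weak(prev, now)) {
                // prev now holds the current peak; retry while it is lower
            }
            std::this_thread::sleep_for(std::chrono::milliseconds(1));
            --running;
            gate.release();  // the worker hands its slot back as it finishes
        });
    }
    for (auto& t : threads) t.join();
    return peak.load();
}
```

The main thread blocks in acquire() while all slots are taken, so no more than `limit` workers are ever running. In real code release() should also run if the work throws, for example from a small RAII guard.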

How to use C++11 <thread> designing a system which pulls data from sources

This question comes from:
C++11 thread doesn't work with virtual member function
As suggested in a comment, my question in the previous post may not be the right one to ask, so here is the original question:
I want to make a capturing system that queries a few sources at a constant or dynamic frequency (varying by source, say 10 times per second) and pulls data into each source's queue. The sources are not fixed; they may be added or removed at run time.
There is also a monitor that pulls from the queues at a constant frequency and displays the data.
So what is the best design pattern or structure for this problem?
I'm trying to make a list of all the source pullers, where each puller holds a thread and a specified pulling function (the pulling function may need to interact with the puller; for example, if the source is drained, it will ask to stop the pulling process on that thread).
Unless the operation where you query a source is blocking (or you have lots of them), you don't need to use threads for this. We could start with a Producer which will work with either synchronous or asynchronous (threaded) dispatch:
template <typename OutputType>
class Producer
{
    std::list<OutputType> output;
protected:
    int poll_interval; // seconds? milliseconds?
    virtual OutputType query() = 0;
public:
    virtual ~Producer() {}
    int next_poll_interval() const { return poll_interval; }
    void poll() { output.push_back(this->query()); }
    std::size_t size() { return output.size(); }
    // whatever accessors you need for the queue here:
    // pop_front, swap entire list, etc.
};
Now we can derive from this Producer and just implement the query method in each subtype. You can set poll_interval in the constructor and leave it alone, or change it on every call to query. There's your general producer component, with no dependency on the dispatch mechanism.
template <typename OutputType>
class ThreadDispatcher
{
    Producer<OutputType> *producer;
    bool shutdown;
    std::thread thread;
    static void loop(ThreadDispatcher *self)
    {
        Producer<OutputType> *producer = self->producer;
        while (!self->shutdown)
        {
            producer->poll();
            // some mechanism to pass the produced values back to the owner
            auto delay = // assume millis for sake of argument
                std::chrono::milliseconds(producer->next_poll_interval());
            std::this_thread::sleep_for(delay);
        }
    }
public:
    explicit ThreadDispatcher(Producer<OutputType> *p)
    : producer(p), shutdown(false), thread(loop, this)
    {}
    ~ThreadDispatcher()
    {
        shutdown = true;
        thread.join();
    }
    // again, the accessors you need for reading produced values go here
    // Producer::output isn't synchronised, so you can't expose it directly
    // to the calling thread
};
This is a quick sketch of a simple dispatcher that would run your producer in a thread, polling it however often you ask it to. Note that passing produced values back to the owner isn't shown, because I don't know how you want to access them.
Also note I haven't synchronized access to the shutdown flag. It should probably be atomic, since unsynchronized access to a plain bool from two threads is a data race, but it might be implicitly synchronized by whatever you choose to do with the produced values.
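A runnable variant of the dispatcher sketch above, assuming C++11 and treating poll_interval as milliseconds. Because Producer::output isn't synchronised, the produced values are kept in a mutex-protected buffer owned by the dispatcher and handed out by a hypothetical take_all() accessor. The shutdown flag is guarded by the same mutex and paired with a condition variable, so the destructor does not have to wait out a full poll interval. CounterProducer and dispatcher_demo are hypothetical as well.

```cpp
#include <cassert>
#include <chrono>
#include <condition_variable>
#include <list>
#include <mutex>
#include <thread>
#include <utility>
#include <vector>

template <typename OutputType>
class Producer {
public:
    virtual ~Producer() {}
    virtual OutputType query() = 0;
    int next_poll_interval() const { return poll_interval; }
protected:
    int poll_interval = 100;  // assumed to be milliseconds
};

template <typename OutputType>
class ThreadDispatcher {
public:
    explicit ThreadDispatcher(Producer<OutputType>* p) : producer(p) {
        thread = std::thread(&ThreadDispatcher::loop, this);
    }

    ~ThreadDispatcher() {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            shutdown = true;
        }
        wake_.notify_one();  // cut the current sleep short
        thread.join();
    }

    ThreadDispatcher(const ThreadDispatcher&) = delete;
    ThreadDispatcher& operator=(const ThreadDispatcher&) = delete;

    // Hypothetical accessor: hands over every value produced so far.
    std::vector<OutputType> take_all() {
        std::lock_guard<std::mutex> lock(mutex_);
        std::vector<OutputType> out(output.begin(), output.end());
        output.clear();
        return out;
    }

private:
    void loop() {
        std::unique_lock<std::mutex> lock(mutex_);
        while (!shutdown) {
            lock.unlock();
            OutputType value = producer->query();  // may be slow: no lock held
            lock.lock();
            output.push_back(std::move(value));
            std::chrono::milliseconds delay(producer->next_poll_interval());
            // Sleeps for `delay`, or less if shutdown is requested meanwhile.
            wake_.wait_for(lock, delay, [this] { return shutdown; });
        }
    }

    Producer<OutputType>* producer;
    std::mutex mutex_;
    std::condition_variable wake_;
    bool shutdown = false;
    std::list<OutputType> output;
    std::thread thread;
};

// Hypothetical producer returning 1, 2, 3, ... every 5 ms.
class CounterProducer : public Producer<int> {
public:
    CounterProducer() { poll_interval = 5; }
    int query() override { return ++count; }
private:
    int count = 0;
};

std::vector<int> dispatcher_demo() {
    CounterProducer source;
    std::vector<int> seen;
    {
        ThreadDispatcher<int> dispatcher(&source);
        std::this_thread::sleep_for(std::chrono::milliseconds(50));
        seen = dispatcher.take_all();
    }  // destructor stops and joins the polling thread
    return seen;
}
```

Querying outside the lock means a slow source never blocks take_all(); only the short push into the buffer is serialized.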
With this organization, it'd also be easy to write a synchronous dispatcher to query multiple producers in a single thread, for example from a select/poll loop, or using something like Boost.Asio and a deadline timer per producer.
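The single-threaded alternative mentioned in the last paragraph can be sketched with a priority queue of due times: one loop sleeps until the earliest source is due, polls it, and reschedules it. Source, Due, poll_all and scheduler_demo are hypothetical; sources are modelled as a std::function plus a fixed interval for brevity, and a real system might replace sleep_until with select/poll or Boost.Asio deadline timers.

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>
#include <functional>
#include <queue>
#include <string>
#include <thread>
#include <vector>

using Clock = std::chrono::steady_clock;

// Hypothetical source description: a query function and its poll interval.
struct Source {
    std::string name;
    std::function<int()> query;
    std::chrono::milliseconds interval;
};

// One scheduled poll: when it is due and which source it belongs to.
struct Due {
    Clock::time_point when;
    std::size_t index;
    bool operator>(const Due& other) const { return when > other.when; }
};

// Polls every source from the calling thread until `run_for` has elapsed;
// returns how many times each source was polled.
std::vector<int> poll_all(std::vector<Source>& sources, std::chrono::milliseconds run_for) {
    std::vector<int> polls(sources.size(), 0);
    // std::greater turns the priority_queue into a min-heap on `when`.
    std::priority_queue<Due, std::vector<Due>, std::greater<Due>> schedule;
    const Clock::time_point start = Clock::now();
    const Clock::time_point stop = start + run_for;
    for (std::size_t i = 0; i < sources.size(); ++i) schedule.push(Due{start, i});

    while (!schedule.empty() && schedule.top().when < stop) {
        Due next = schedule.top();
        schedule.pop();
        std::this_thread::sleep_until(next.when);  // wait until this source is due
        sources[next.index].query();               // result would go to the monitor's queue
        ++polls[next.index];
        // Reschedule from the nominal due time so late polls don't drift.
        schedule.push(Due{next.when + sources[next.index].interval, next.index});
    }
    return polls;
}

std::vector<int> scheduler_demo() {
    std::vector<Source> sources = {
        {"fast", [] { return 1; }, std::chrono::milliseconds(10)},
        {"slow", [] { return 2; }, std::chrono::milliseconds(25)},
    };
    return poll_all(sources, std::chrono::milliseconds(100));
}
```

Because each source is rescheduled from its nominal due time rather than from "now", every source keeps its own cadence even when a query runs late, and adding a source at run time amounts to pushing one more entry onto the schedule.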