I'm writing a multithreaded app.
I was using the boost::interprocess classes (version 1.36.0).
Essentially, I have worker threads that need to be notified when work is available for them to do.
I tried both the "semaphore" and "condition" approaches.
In both cases, the context switch count (CSwitch) for the worker threads seemed very high, around 600 switches per second.
I had a look at the code, and it seems to just check a flag (atomically, using a mutex) and then yield the timeslice before trying again.
I was expecting the code to use WaitForSingleObject or something similar.
Ironically, this was exactly how I was doing it before deciding to do it "properly" and use Boost! (i.e. using a mutex to check the status of a flag regularly). The only difference was, in my approach I was sleeping like 50ms between checks so I didn't have the high CSwitch problem (and yes it's fine for me that work won't start for up to 50ms).
Several questions:
Does this "high" CSwitch value matter?
Would this occur if the Boost library were using CRITICAL_SECTION objects instead of semaphores (I don't care about inter-process syncing; all threads are in the same process)?
Would this occur if Boost was using WaitForSingleObject?
Is there another approach in the Boost libs that uses the aforementioned Win32 wait methods (WaitForXXX), which I assume won't suffer from this CSwitch issue?
Update: Here is a pseudo code sample. I can't add the real code because it would be a bit complex. But this is pretty much what I'm doing. This just starts a thread to do a one-off asynchronous activity.
NOTE: These are just illustrations! There is loads missing from this sample, e.g. if you call injectWork() before the thread has hit the "wait" it just won't work. I just wanted to illustrate my use of boost.
The usage is something like:
int main(int argc, char** args)
MyWorkerThread thread;
thread.injectWork("hello world");
Here is the example using boost.
class MyWorkerThread
/// Do work asynchronously
void injectWork(string blah)
this->blah = blah;
// Notify semaphore
void startThread()
// Start the thread (Pseudo code)
CreateThread(threadHelper, this, ...);
static void threadHelper(void* param)
/// The thread method
void thread()
// Wait for semaphore to be invoked
cout << blah << endl;
string blah;
boost::interprocess::interprocess_semaphore* semaphore;
And here was my "naive" polling code:
class MyWorkerThread_NaivePolling
workReady = false;
/// Do work asynchronously
void injectWork(string blah)
this->blah = blah;
this->workReady = true;
void startThread()
// Start the thread (Pseudo code)
CreateThread(threadHelper, this, ...);
/// Uses Win32 CriticalSection
class MyCriticalSection
void lock();
void unlock();
MyCriticalSection section;
static void threadHelper(void* param)
/// The thread method
void thread()
while (true)
bool myWorkReady = false;
string myBlah;
// See if work set
if (this->workReady)
myWorkReady = true;
myBlah = this->blah;
if (myWorkReady)
cout << blah << endl;
// No work so sleep for a while
string blah;
bool workReady;

On non-POSIX systems, it seems that interprocess_condition is emulated using a loop, as you describe in your question. And interprocess_semaphore is emulated with a mutex and an interprocess_condition, so wait()-ing ends up in the same loop.
Since you mention that you don't need the interprocess synchronization, you should look at Boost.Thread, which offers a portable implementation of condition variables. Amusingly, it seems that it is implemented on Windows in the "classical" way, using a... Semaphore.
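A minimal sketch of the condition-variable approach, assuming C++11 is available and using std::condition_variable (boost::condition_variable in Boost.Thread offers essentially the same interface). The names WorkSignal, waitForWork and runOnce are hypothetical and not taken from the question's code. The worker blocks inside wait() instead of polling, and because the flag is checked under the mutex, a notification sent before the worker starts waiting is not lost.

```cpp
#include <condition_variable>
#include <mutex>
#include <string>
#include <thread>
#include <utility>

// Hypothetical helper: hands one piece of work to a waiting thread.
class WorkSignal {
public:
    // Called by the producer: store the work and wake the worker.
    void injectWork(std::string work) {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            work_ = std::move(work);
            ready_ = true;
        }
        cond_.notify_one();
    }

    // Called by the worker: block until work is available, then take it.
    std::string waitForWork() {
        std::unique_lock<std::mutex> lock(mutex_);
        cond_.wait(lock, [this] { return ready_; });
        ready_ = false;
        return std::move(work_);
    }

private:
    std::mutex mutex_;
    std::condition_variable cond_;
    std::string work_;
    bool ready_ = false;
};

// Usage: the worker thread blocks in waitForWork() until the main thread posts.
std::string runOnce(const std::string& input) {
    WorkSignal signal;
    std::string result;
    std::thread worker([&] { result = signal.waitForWork(); });
    signal.injectWork(input);
    worker.join();
    return result;
}
```

Calling runOnce("hello world") hands the string to the worker and returns it once the worker has picked it up, regardless of whether injectWork() runs before or after the worker reaches wait().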

If you do not mind a Windows-specific solution (Windows Vista and newer), look at the Win32 condition variables (CONDITION_VARIABLE), which are lightweight like critical sections and are used together with them.


Running a task in a separate thread which should be able to stop on request

I am trying to design an infinite (or a user-defined length) loop that would be independent of my GUI process. I know how to start that loop in a separate thread, so the GUI process is not blocked. However, I would like to have a possibility to interrupt the loop at a press of a button. The complete scenario may look like this:
GUI::startButton->myClass::runLoop... ---> starts a loop in a new thread
GUI::stopButton->myClass::terminateLoop ---> should be able to interrupt the started loop
The problem I have is figuring out how to provide the stop functionality. I am sure there is a way to achieve this in C++. I was looking at a number of multithreading related posts and articles, as well as some lectures on how to use async and futures. Most of the examples did not fit my intended use and/or were too complex for my current state of skills.
MyClass *myClass = new MyClass;
void MyWidget::on_pushButton_start_clicked()
void MyWidget::on_pushButton_stop_clicked()
myClass->stop(); // TBD: how to implement the stop functionality?
std::thread MyClass::start()
return std::thread(&MyClass::runLoop, this);
void MyClass::runLoop()
for(int i = 0; i < 999999; i++)
// do some work
As far as I know, there is no standard way to forcibly terminate a std::thread. Even if it were possible, it would not be advisable, since it can leave your application in an undefined state.
It would be better to add a check to your MyClass::runLoop method that stops execution in a controlled way as soon as an external condition is fulfilled. This might, for example, be a control variable like this:
void MyClass::start()
{
    if (_thread.joinable()) // If a previous thread is still attached...
    {
        // ...join it before (re)starting the thread
        _thread.join();
    }
    _threadRunning = true;
    _thread = std::thread(&MyClass::runLoop, this);
}
void MyClass::runLoop()
{
    for (int i = 0; i < MAX_ITERATION_COUNT; i++)
    {
        if (!_threadRunning) { break; }
        // do some work
    }
}
Then you can end the thread with:
void MyClass::stopLoop()
{
    _threadRunning = false;
}
Note that start() returns void: the std::thread object is owned by MyClass and cannot be copied out of it.
_threadRunning should be a member variable of type std::atomic<bool>. A plain bool that one thread writes while another thread reads is a data race, which is undefined behavior in C++ on any architecture. Even on x86, x86_64, ARM and ARM64, where the store itself is atomic in hardware, the compiler is allowed to hoist the read out of the loop, so the worker might never see the change. The atomic type also hints at the fact that the variable is used in a multithreading context.
Possible MyClass.h:
#include <atomic>
#include <thread>
class MyClass
{
public:
    MyClass() : _threadRunning(false) {}
    void start();
    void stopLoop();
private:
    void runLoop();
    std::thread _thread;
    std::atomic<bool> _threadRunning;
};
It might be important to note that, depending on the code in your loop, it might take a while before the thread really stops.
Therefore it might be wise to std::thread::join the thread before restarting it, to make sure only one thread runs at a time.
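The following is a self-contained sketch of the same idea, assuming C++11. The class name StoppableLoop, the iteration counter and the 1 ms sleep are hypothetical stand-ins for the real work. Unlike the snippet above, stop() also joins the thread, and the destructor calls stop(), because destroying a std::thread that is still joinable terminates the program.

```cpp
#include <atomic>
#include <chrono>
#include <thread>

// Hypothetical wrapper around a background loop that can be stopped on request.
class StoppableLoop {
public:
    StoppableLoop() : running_(false), iterations_(0) {}
    ~StoppableLoop() { stop(); }  // never destroy a joinable std::thread

    void start() {
        stop();  // make sure a previous run has finished
        running_ = true;
        thread_ = std::thread(&StoppableLoop::runLoop, this);
    }

    // Asks the loop to finish and waits until it has.
    void stop() {
        running_ = false;
        if (thread_.joinable()) {
            thread_.join();
        }
    }

    long iterations() const { return iterations_.load(); }

private:
    void runLoop() {
        while (running_) {
            ++iterations_;  // stand-in for "do some work"
            std::this_thread::sleep_for(std::chrono::milliseconds(1));
        }
    }

    std::thread thread_;
    std::atomic<bool> running_;
    std::atomic<long> iterations_;
};
```

In a GUI, the start button handler would call start() and the stop button handler would call stop(). Keep in mind that stop() blocks the calling thread until the current iteration has finished, so each iteration should be short.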

c++ thread worker failure under high load

I have been working on an idea for a system where I can have many workers that are triggered on a regular basis by a central timer class. The part I'm concerned about here is a TriggeredWorker which, in a loop, uses the mutex & conditionVariable approach to wait to be told to do work. It has a method trigger that is called (by a different thread) to trigger work to be done. It is an abstract class that has to be subclassed for the actual work method to be implemented.
I have a test that shows that this mechanism works. However, as I increase the load by reducing the trigger interval, the test starts to fail. When I delay 20 microseconds between triggers, the test is 100% reliable. As I reduce the delay down to 1 microsecond, I start to get failures: the count of work performed drops from 1000 (expected) to values like 986, 933, 999, etc.
My questions are: (1) what is it that is going wrong and how can I capture what is going wrong so I can report it or do something about it? And, (2) is there some better approach that I could use that would be better? I have to admit that my experience with c++ is limited to the last 3 months, although I have worked with other languages for several years.
Many thanks for reading...
Here are the key bits of code:
Triggered worker header file:
#include <thread>
#include <plog/Log.h>
class TriggeredWorker {
std::mutex mutex_;
std::condition_variable condVar_;
std::atomic<bool> running_{false};
std::atomic<bool> ready_{false};
void workLoop();
virtual void work() {};
void start();
void stop();
void trigger();
Triggered worker implementation:
#include "TriggeredWorker.h"
void TriggeredWorker::workLoop() {
PLOGD << "workLoop started...";
while(true) {
std::unique_lock<std::mutex> lock(mutex_);
condVar_.wait(lock, [this]{
bool ready = this->ready_;
bool running = this->running_;
return ready | !running; });
this->ready_ = false;
if (!this->running_) {
PLOGD << "Calling work()...";
PLOGD << "Worker thread completed.";
void TriggeredWorker::start() {
PLOGD << "Worker start...";
this->running_ = true;
auto thread = std::thread(&TriggeredWorker::workLoop, this);
void TriggeredWorker::stop() {
PLOGD << "Worker stop.";
this->running_ = false;
void TriggeredWorker::trigger() {
PLOGD << "Trigger.";
std::unique_lock<std::mutex> lock(mutex_);
ready_ = true;
and the test:
#include "catch.hpp"
#include "TriggeredWorker.h"
#include <thread>
TEST_CASE("Simple worker performs work when triggered") {
static std::atomic<int> twt_count{0};
class SimpleTriggeredWorker : public TriggeredWorker {
void work() override {
PLOGD << "Incrementing counter.";
SimpleTriggeredWorker worker;
for (int i = 0; i < 1000; i++) {
CHECK(twt_count == 1000);
What happens when worker.trigger() is called twice before workLoop acquires the lock? You lose one of those triggers. A smaller time gap means a higher probability of test failure, because it is more likely that several consecutive worker.trigger() calls happen before workLoop wakes up. Note that nothing guarantees that workLoop will acquire the lock after one worker.trigger() but before the next, even when those calls happen one after another (i.e. not in parallel). This is governed by the OS scheduler, and we have no control over it.
Anyway, the core problem is that setting ready_ = true twice loses information, unlike incrementing an integer twice. So the simplest solution is to replace the bool with an int, increment it in trigger(), and have the worker decrement it and wait while it is 0. This construct is also known as a semaphore. A more advanced (and potentially better, especially when you need to pass some data to the worker) approach is to use a (bounded?) thread-safe queue. Which one fits depends on what exactly you are trying to achieve.
BTW 1: all your reads and updates, except for those in stop() (and start(), but this isn't really relevant), happen under the lock. I suggest you make stop() take the lock as well (it is rarely called anyway) and turn the atomics into plain variables. At the moment the atomics only add unnecessary overhead.
BTW 2: I suggest not using thread.detach(). You should store the std::thread object in TriggeredWorker and add a destructor that calls stop() and then join(). The worker object and its thread are not independent beings, so without detach() you make your code safer (one should never die without the other).
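The suggestions in this answer can be combined into a sketch like the one below. It is an illustration under assumptions, not the asker's class: CountingWorker, pending_, done_ and workDone() are hypothetical names, and a counter increment stands in for the virtual work() call. Triggers are counted instead of stored in a bool, all shared state is plain data guarded by the mutex, and the thread is stored and joined rather than detached.

```cpp
#include <condition_variable>
#include <mutex>
#include <thread>

// Hypothetical variant of the worker that counts triggers instead of
// remembering a single "ready" flag, so back-to-back triggers are not merged.
class CountingWorker {
public:
    ~CountingWorker() { stop(); }

    void start() {
        thread_ = std::thread(&CountingWorker::workLoop, this);
    }

    void trigger() {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            ++pending_;
        }
        condVar_.notify_one();
    }

    // Lets already counted triggers drain, then joins the worker thread.
    void stop() {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            stopping_ = true;
        }
        condVar_.notify_one();
        if (thread_.joinable()) {
            thread_.join();
        }
    }

    int workDone() {
        std::lock_guard<std::mutex> lock(mutex_);
        return done_;
    }

private:
    void workLoop() {
        std::unique_lock<std::mutex> lock(mutex_);
        while (true) {
            condVar_.wait(lock, [this] { return pending_ > 0 || stopping_; });
            if (pending_ == 0) {
                return;  // stopping_ is set and nothing is left to do
            }
            --pending_;
            lock.unlock();
            // The real work() would run here, outside the lock.
            lock.lock();
            ++done_;
        }
    }

    std::mutex mutex_;
    std::condition_variable condVar_;
    std::thread thread_;
    int pending_ = 0;
    int done_ = 0;
    bool stopping_ = false;
};
```

With this design, calling trigger() 1000 times in a tight loop and then stop() results in workDone() returning 1000, however the scheduler interleaves the two threads. Whether pending triggers should be drained or discarded on stop() is a design choice; this sketch drains them.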

avoid busy waiting and mode switches between realtime and non realtime threading

I have the following problem:
We have a controller implemented with ros_control that runs on a real-time, Xenomai-patched Linux system. The control loop is executed by iteratively calling an update function. I need to communicate some of the internal state of the controller, and for this task I'm using LCM, developed at MIT. Regardless of the internal behaviour of LCM, the publication method breaks real time, so I've implemented a publication loop in C++11 that runs on a separate thread. But that loop would publish at an unbounded frequency if I didn't synchronize the secondary thread with the controller, so I'm also using condition variables.
Here's an example for the controller:
MyClass mc;
// This is called just once
void init(){
// Control loop function (e.g., called every 5 ms in RT)
void update(const ros::Time& time, const ros::Duration& period) {
double value = time.toSec();
And for the class which is trying to publish:
double myvalue;
std::mutex mutex;
std::condition_variable cond;
bool go = true;
void MyClass::init(){
std::thread thread(&MyClass::body, this);
void MyClass::setValue(double value){
myvalue = value;
std::lock_guard<std::mutex> lk(mutex);
go = true;
void MyClass::body() {
while(true) {
cond.wait(lk, [this] {return go;});
publish(myvalue); // the dangerous call
go = false;
This code produces mode switches (i.e., it breaks real time), probably because of the lock on the condition variable, which I use to synchronize the secondary thread with the main controller and which the two threads contend for. If I do something like this:
void MyClass::body() {
while(true) {
go = false;
void MyClass::setValue(double value){
myvalue = value;
go = true;
I would not produce mode switches, but also it would be unsafe and most of all I would have busy waiting for the secondary thread.
Is there a way to have non-blocking synch between main thread and secondary thread (i.e., having body doing something only when setValue is called) which is also non-busy waiting?
Use a lock free data structure.
In your case here you don't even need a data structure, just use an atomic for go. No locks necessary. You might look into using a semaphore instead of a condition variable to avoid the now-unused lock too. And if you need a semaphore to avoid using a lock you're going to end up using your base OS semaphores, not C++11 since C++11 doesn't even have them.
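A minimal sketch of the atomic hand-off, with several assumptions stated up front: std::atomic<double> is lock-free on the target (worth checking with is_lock_free()), and a short sleep between polls in the publisher thread is acceptable in place of a semaphore. AtomicMailbox, tryTake and publisherLoop are hypothetical names, and a std::vector stands in for the publish() call. Whether atomic stores are free of mode switches under Xenomai would still need to be verified on the real system.

```cpp
#include <atomic>
#include <chrono>
#include <thread>
#include <vector>

// Hypothetical mailbox: the real-time side only performs atomic stores,
// the publisher side polls with a short sleep instead of spinning.
class AtomicMailbox {
public:
    // Real-time thread: no lock is taken here.
    void setValue(double value) {
        value_.store(value, std::memory_order_relaxed);
        go_.store(true, std::memory_order_release);
    }

    // Publisher thread: returns true and fills `out` if a new value arrived.
    bool tryTake(double& out) {
        if (!go_.exchange(false, std::memory_order_acquire)) {
            return false;
        }
        out = value_.load(std::memory_order_relaxed);
        return true;
    }

    void requestStop() { stop_.store(true); }
    bool stopRequested() const { return stop_.load(); }

private:
    std::atomic<double> value_{0.0};
    std::atomic<bool> go_{false};
    std::atomic<bool> stop_{false};
};

// Publisher loop: collects values until asked to stop (stand-in for publish()).
std::vector<double> publisherLoop(AtomicMailbox& box,
                                  std::chrono::microseconds pollInterval) {
    std::vector<double> published;
    double v = 0.0;
    while (!box.stopRequested()) {
        if (box.tryTake(v)) {
            published.push_back(v);
        } else {
            std::this_thread::sleep_for(pollInterval);
        }
    }
    return published;
}
```

Only the most recent value is kept: if setValue() is called twice between polls, the earlier value is overwritten, and a value written while the publisher is in the middle of tryTake() may be published twice. That is usually acceptable for state reporting, but not for a message stream.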
This isn't perfect, but it should reduce your busy-wait frequency with only occasional loss of responsiveness.
The idea is to use a naked condition variable wake up while passing a message through an atomic.
template<class T>
struct non_blocking_poke {
    std::atomic<T> message;
    std::atomic<bool> active{false};
    std::mutex m;
    std::condition_variable v;
    void poke(T t) {
        message = t;
        active = true;
        v.notify_one(); // naked wake-up: the mutex is deliberately not taken
    }
    template<class Rep, class Period>
    T wait_for_poke(const std::chrono::duration<Rep, Period>& busy_time) {
        std::unique_lock<std::mutex> l(m);
        // load() is needed: a std::atomic cannot be returned by value
        while( !v.wait_for(l, busy_time, [&]{ return active.load(); } ))
        {}
        active = false;
        return message;
    }
};
The waiting thread wakes up every busy_time to see if it missed a message. However, it will usually get a message faster than that (there is a race condition where it misses the wake-up). In addition, multiple messages can be sent without the receiver getting all of them. However, if a message is sent, within about busy_time (1 second in the usage below) the receiver will get that message or a later one.
non_blocking_poke<double> poker;
// in realtime thread:
// in non-realtime thread:
while(true) {
using namespace std::literals::chrono_literals;
double d = poker.wait_for_poke( 1s );
std::cout << d << '\n';
In an industrial quality solution, you'll also want an abort flag or message to stop the loops.

Is it possible to get a thread-locking mechanism in C++ with a std::atomic_flag?

Using MS Visual C++2012
A class has a member of type std::atomic_flag
class A {
std::atomic_flag lockFlag;
A () { std::atomic_flag_clear (&lockFlag); }
There is an object of type A
A object;
which can be accessed by two (Boost) threads
void thr1(A* objPtr) { ... }
void thr2(A* objPtr) { ... }
The idea is to make a thread wait if the object is being accessed by the other thread.
The question is: is it possible to construct such a mechanism with an atomic_flag object? For the moment, I want something more lightweight than a boost::mutex.
By the way, the process involved in one of the threads is a very long query to a dBase that returns many rows, and I only need to suspend it in a certain zone of code where the collision occurs (when processing each row); I can't wait for the entire thread to finish with join().
I've tried something like this in each thread:
thr1 (A* objPtr) {
while (std::atomic_flag_test_and_set_explicit (&objPtr->lockFlag, std::memory_order_acquire)) {
... /* Zone to protect */
std::atomic_flag_clear_explicit (&objPtr->lockFlag, std::memory_order_release);
... /* the process continues */
But with no success, because the second thread hangs. In fact, I don't completely understand the mechanism involved in the atomic_flag_test_and_set_explicit function, nor whether it returns immediately or can block until the flag can be locked.
It is also a mystery to me how to build a lock mechanism with a function that always sets the value and returns the previous value, with no option to only read the current setting.
Any suggestions are welcome.
By the way, the process involved in one of the threads is a very long query to a dBase that returns many rows, and I only need to suspend it in a certain zone of code where the collision occurs (when processing each row); I can't wait for the entire thread to finish with join().
Such a zone is known as the critical section. The simplest way to work with a critical section is to lock by mutual exclusion.
The mutex solution suggested is indeed the way to go, unless you can prove that this is a hotspot and the lock contention is a performance problem. Lock-free programming using just atomic and intrinsics is enormously complex and cannot be recommended at this level.
Here's a simple example showing how you could do this (live on http://liveworkspace.org/code/6af945eda5132a5221db823fa6bde49a):
#include <iostream>
#include <thread>
#include <mutex>
struct A
{
    std::mutex mux;
    int x;
    A() : x(0) {}
};
void threadf(A* data)
{
    for(int i=0; i<10; ++i)
    {
        // Only this block is the critical section
        std::lock_guard<std::mutex> lock(data->mux);
        data->x++;
    }
}
int main(int argc, const char *argv[])
{
    A instance;
    auto t1 = std::thread(threadf, &instance);
    auto t2 = std::thread(threadf, &instance);
    t1.join();
    t2.join();
    std::cout << instance.x << std::endl;
    return 0;
}
It looks like you're trying to write a spinlock. Yes, you can do that with std::atomic_flag, but you are better off using std::mutex instead. Don't use atomics unless you really know what you're doing.
To actually answer the question asked: Yes, you can use std::atomic_flag to create a thread locking object called a spinlock.
#include <atomic>
class atomic_lock
{
public:
    void lock()
    {
        while ( lock_.test_and_set() ) { } // Spin until the lock is acquired.
    }
    void unlock() { lock_.clear(); } // Release the lock.
private:
    std::atomic_flag lock_ = ATOMIC_FLAG_INIT;
};
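To make the semantics asked about in the question concrete, here is a sketch of such a spinlock together with a usage function, assuming a C++11 compiler with in-class member initializers. The names SpinLock and countWithSpinLock are hypothetical. test_and_set() never blocks: it atomically sets the flag and returns the previous value right away, so a return value of true means "someone else holds the lock, try again". The loop body stays empty (or just yields), and the protected zone comes after the loop, not inside it.

```cpp
#include <atomic>
#include <mutex>
#include <thread>

// Minimal spinlock built on std::atomic_flag. It provides lock()/unlock(),
// so it can be used with std::lock_guard.
class SpinLock {
public:
    void lock() {
        // test_and_set() returns the previous value immediately:
        // true means another thread holds the lock, so try again.
        while (flag_.test_and_set(std::memory_order_acquire)) {
            std::this_thread::yield();  // give other threads a chance to run
        }
    }

    void unlock() {
        flag_.clear(std::memory_order_release);
    }

private:
    std::atomic_flag flag_ = ATOMIC_FLAG_INIT;
};

// Two threads increment a shared counter; only the increment is protected.
int countWithSpinLock(int perThread) {
    SpinLock lock;
    int counter = 0;
    auto body = [&] {
        for (int i = 0; i < perThread; ++i) {
            std::lock_guard<SpinLock> guard(lock);
            ++counter;
        }
    };
    std::thread t1(body);
    std::thread t2(body);
    t1.join();
    t2.join();
    return counter;
}
```

Because the guard is released at the end of every loop iteration, the long-running thread is only held up while the other thread is inside its own protected zone, which matches the per-row locking described in the question. A spinlock burns CPU time while waiting, so std::mutex remains the better default when the protected zone is not very short.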

Simple C++ Threading

I am trying to create a thread in C++ (Win32) to run a simple method. I'm new to C++ threading, but very familiar with threading in C#. Here is some pseudo-code of what I am trying to do:
static void MyMethod(int data)
void RunStuff(int data)
//long running operation here
I want to call RunStuff from MyMethod without it blocking. What would be the simplest way of running RunStuff on a separate thread?
Edit: I should also mention that I want to keep dependencies to a minimum. (No MFC... etc)
#include <boost/thread.hpp>
static boost::thread runStuffThread;
static void MyMethod(int data)
runStuffThread = boost::thread(boost::bind(RunStuff, data));
// elsewhere...
runStuffThread.join(); //blocks
C++11, available with more recent compilers such as Visual Studio 2013, has threads as part of the standard library, along with quite a few other nice bits and pieces such as lambdas.
The header <thread> provides the std::thread class, whose constructor is a template that accepts any callable. The thread functionality is in the std:: namespace. Some functions that act on the current thread live in the std::this_thread namespace (see Why the std::this_thread namespace? for a bit of explanation).
The following console application example using Visual Studio 2013 demonstrates some of the thread functionality of C++11, including the use of a lambda (see What is a lambda expression in C++11?). Notice that the functions used for thread sleep, such as std::this_thread::sleep_for(), take a duration from std::chrono.
// threading.cpp : Defines the entry point for the console application.
#include "stdafx.h"
#include <iostream>
#include <chrono>
#include <thread>
#include <mutex>
int funThread(const char *pName, const int nTimes, std::mutex *myMutex)
{
    // loop the specified number of times each time waiting a second.
    // we are using this mutex, which is shared by the threads to
    // synchronize and allow only one thread at a time to output.
    for (int i = 0; i < nTimes; i++) {
        myMutex->lock();
        std::cout << "thread " << pName << " i = " << i << std::endl;
        // delay this thread that is running for a second.
        // the this_thread construct allows us access to several different
        // functions such as sleep_for() and yield(). we do the sleep
        // before doing the unlock() to demo how the lock/unlock works.
        std::this_thread::sleep_for(std::chrono::seconds(1));
        myMutex->unlock();
    }
    return 0;
}
int _tmain(int argc, _TCHAR* argv[])
{
    // create a mutex which we are going to use to synchronize output
    // between the two threads.
    std::mutex myMutex;
    // create and start two threads each with a different name and a
    // different number of iterations. we provide the mutex we are using
    // to synchronize the two threads.
    std::thread myThread1(funThread, "one", 5, &myMutex);
    std::thread myThread2(funThread, "two", 15, &myMutex);
    // wait for our two threads to finish.
    myThread1.join();
    myThread2.join();
    auto fun = [](int x) {for (int i = 0; i < x; i++) { std::cout << "lambda thread " << i << std::endl; std::this_thread::sleep_for(std::chrono::seconds(1)); } };
    // create a thread from the lambda above requesting three iterations.
    std::thread xThread(fun, 3);
    // a std::thread must be joined (or detached) before it is destroyed.
    xThread.join();
    return 0;
}
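For the narrower case in the question (call RunStuff from MyMethod without blocking), a shorter sketch using std::async from C++11 may be enough. The signatures here are assumptions made for illustration: RunStuff returns an int so that there is a result to collect, and MyMethod returns the std::future so the caller can decide when to wait.

```cpp
#include <future>
#include <thread>

// Stand-in for the long-running operation from the question.
int RunStuff(int data) {
    return data * 2;
}

// Starts RunStuff on another thread and returns immediately.
// The caller collects the result later through the future.
std::future<int> MyMethod(int data) {
    return std::async(std::launch::async, RunStuff, data);
}
```

A caller would write something like `auto f = MyMethod(21);`, carry on with other work, and call `f.get()` when the result is needed. The future returned by std::async waits for the task in its destructor, so it has to be kept alive for as long as the call should stay non-blocking.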
CreateThread (Win32) and AfxBeginThread (MFC) are two ways to do it.
Either way, your MyMethod signature would need to change a bit.
Edit: as noted in the comments and by other respondents, CreateThread can be bad.
_beginthread and _beginthreadex are the C runtime library functions, and according to the docs are equivalent to System::Threading::Thread::Start
Consider using the Win32 thread pool instead of spinning up new threads for work items. Spinning up new threads is wasteful - each thread gets 1 MB of reserved address space for its stack by default, runs the system's thread startup code, causes notifications to be delivered to nearly every DLL in your process, and creates another kernel object. Thread pools enable you to reuse threads for background tasks quickly and efficiently, and will grow or shrink based on how many tasks you submit. In general, consider spinning up dedicated threads for never-ending background tasks and use the threadpool for everything else.
Before Vista, you can use QueueUserWorkItem. On Vista and later, the new thread pool APIs are more reliable and offer a few more advanced options. Either one will cause your background code to start running on some thread pool thread.
// Vista
VOID CALLBACK MyWorkerFunction(PTP_CALLBACK_INSTANCE instance, PVOID context);
// Returns true on success.
TrySubmitThreadpoolCallback(MyWorkerFunction, context, NULL);
// Pre-Vista
DWORD WINAPI MyWorkerFunction(PVOID context);
// Returns true on success
QueueUserWorkItem(MyWorkerFunction, context, WT_EXECUTEDEFAULT);
Simple threading in C++ is a contradiction in terms!
Check out boost threads for the closest thing to a simple approach available today.
For a minimal answer (which will not actually provide you with all the things you need for synchronization, but answers your question literally) see:
Also static means something different in C++.
Is this safe:
unsigned __stdcall myThread(void *ArgList) {
//Do stuff here
_beginthread(myThread, 0, &data);
Do I need to do anything to release the memory (like CloseHandle) after this call?
Another alternative is pthreads - they work on both windows and linux!
CreateThread (Win32) and AfxBeginThread (MFC) are two ways to do it.
Be careful to use _beginthread if you need to use the C run-time library (CRT) though.
For win32 only and without additional libraries you can use
CreateThread function
If you really don't want to use third-party libs (I would recommend boost::thread, as explained in the other answers), you need to use the Win32 API:
static void MyMethod(int data)
int data = 3;
HANDLE hThread = ::CreateThread(NULL,
// you can do whatever you want here
::WaitForSingleObject(hThread, INFINITE);
static DWORD WINAPI RunStuff(LPVOID param)
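// go through INT_PTR: casting a pointer straight to int fails on 64-bit builds
int data = static_cast<int>(reinterpret_cast<INT_PTR>(param));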
//long running operation here
return 0;
There exist many open-source, cross-platform C++ threading libraries you could use.
Among them are Intel TBB and Boost thread.
The way you describe it, I think either Intel TBB or Boost thread will be fine.
Intel TBB example:
class RunStuff
// TBB mandates that you supply () operator
void operator ()()
// long running operation here
// Here's sample code to instantiate it
#include <tbb/tbb_thread.h>
// the extra parentheses keep this from being parsed as a function declaration
tbb::tbb_thread my_thread((RunStuff()));
Boost thread example:
Qt example:
(I don't think this suits your needs, but I have included it here for completeness; you have to inherit QThread, implement void run(), and call QThread::start()):
If you only program on Windows and don't care about cross-platform support, perhaps you could use Windows threads directly: