C++: undetectable change of variable, artificial "volatile" mutex does not work

How is it possible that, on the line just after an if statement that checks for inequality, the variables are already equal in the pull() method? I have already added a Mutex variable, but it did not help.
int fQ::pull(void){ // pull element from the queue
MutexF = 1;
if (last != first){
MutexF = 0;
return 0;
MutexF = 0;
return 1;
STL containers are too heavy for me. I am preparing this for a tiny MCU, which is why I tried to avoid all this complex stuff (std::mutex, std::atomic, etc.). The multithreading is needed only for testing purposes, as a temporary stand-in for the tiny MCU's interrupts. I intended not to use any STL/thread libraries at all.
First, you'd better use std::atomic and/or std::mutex for synchronization purposes. At least use std::atomic_flag. volatile has issues in general - it isn't suited for atomic operations and has a different purpose altogether.
Second, there is a bug in your code, and I don't know how to solve it properly with volatile.
MutexF = 1;
Imagine, someone set MutexF to 0, then two threads simultaneously exited the while loop before setting MutexF=1. What do you think is gonna happen?
Perhaps you can synchronize two threads - one for pull and one for push - in this manner, but you'd better abandon such an approach.

#include <atomic> // std::atomic
#include <mutex> // std::mutex
typedef void(*FunctionPointer)(void);
class fQ {
std::atomic<int> first;
std::atomic<int> last;
FunctionPointer * fQueue;
int lengthQ;
std::mutex mtx;
fQ(int sizeQ);
int push(FunctionPointer);
int pull(void);
fQ::fQ(int sizeQ){ // initialization of Queue
fQueue = new FunctionPointer[sizeQ];
last = 0;
first = 0;
lengthQ = sizeQ;
fQ::~fQ(){ // destruction of Queue
delete [] fQueue;
int fQ::push(FunctionPointer pointerF){ // push element onto the queue
if ((last+1)%lengthQ == first){
return 1;
fQueue[last++] = pointerF;
last = last%lengthQ;
return 0;
int fQ::pull(void){ // pull element from the queue
if (last != first){
first = first%lengthQ;
return 0;
return 1;
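The race described above (two threads leaving the wait loop before either one sets the flag) disappears when the test and the set happen as a single atomic step. A minimal sketch of that idea with std::atomic_flag follows; the names SpinFlag and count_with_two_threads are hypothetical and not part of the code above.

```cpp
#include <atomic>
#include <cassert>
#include <thread>

// Hypothetical names: SpinFlag and count_with_two_threads are illustrative only.
class SpinFlag {
    std::atomic_flag flag = ATOMIC_FLAG_INIT;
public:
    void lock() {
        // test_and_set returns the old value and sets the flag in one atomic
        // step, so only one thread at a time can observe "was clear".
        while (flag.test_and_set(std::memory_order_acquire)) {
            // busy-wait until the holder clears the flag
        }
    }
    void unlock() { flag.clear(std::memory_order_release); }
};

// Two threads each add perThread to a plain int, always under the lock.
int count_with_two_threads(int perThread) {
    assert(perThread >= 0);
    SpinFlag guard;
    int counter = 0;
    auto work = [&] {
        for (int i = 0; i < perThread; ++i) {
            guard.lock();
            ++counter;
            guard.unlock();
        }
    };
    std::thread a(work), b(work);
    a.join();
    b.join();
    return counter;
}
```

Calling count_with_two_threads(10000) should return 20000 on every run. On a single-core MCU an interrupt handler must not spin on a flag held by the code it interrupted; there, briefly disabling interrupts around the critical section is the usual substitute.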


Deadlock occurring after multiple iterations of queries in threads (multithreading)

I encounter deadlocks while executing the code snippet below as a thread.
void thread_lifecycle(
Queue<std::tuple<int64_t, int64_t, uint8_t>, QUEUE_SIZE>& query,
Queue<std::string, QUEUE_SIZE>& output_queue,
std::vector<Object>& pgs,
bool* pgs_executed, // Initialized to array false-values
std::mutex& pgs_executed_mutex,
std::atomic<uint32_t>& atomic_pgs_finished
bool iter_bool = false;
std::tuple<int64_t, int64_t, uint8_t> next_query;
std::string output = "";
int64_t lower, upper;
while(true) {
// Get next query
next_query = query.pop_front();
// Stop Condition reached terminate thread
if (std::get<2>(next_query) == uint8_t(-1)) break;
//Set query params
lower = std::get<0>(next_query);
upper = std::get<1>(next_query);
// Scan bool array
for (uint32_t i = 0; i < pgs.size(); i++){
// first lock for reading
if (pgs_executed[i] == iter_bool) {
pgs_executed[i] = !pgs_executed[i];
// Unlock and execute the query
output = pgs.at(i).get_result(lower, upper);
// If query yielded a result, then add it to the output
if (output.length() != 0) {
// Inform main thread in case of last result
if (++atomic_pgs_finished >= pgs.size()) {
} else {
//finally flip for next query
iter_bool = !iter_bool;
I have a vector of objects containing information which can be queried (similar to a table in a database). Each thread can access the objects, and all of them iterate the vector ONCE to query the objects which have not been queried yet and return results, if any.
In the next query it goes through the vector again, and so on... I use the bool* array to denote the entries which are currently queried, so that the processes can synchronize and determine which query should be executed next.
If all have been executed, the last thread having possibly the last results will also return an identifier for the main thread in order to inform that all objects have been queried.
My Question:
Regarding the bool* as well as atomic_pgs_finished, can there be a scenario in which a deadlock can occur? As far as I can tell, there is no deadlock in this snippet. However, running this for a while results in a deadlock.
I am seriously considering that a bit (byte?) has randomly flipped causing this deadlock (on ECC-RAM), so that 1 or more objects actually were not executed. Is this even possible?
Maybe another implementation could help?
Edit, Implementation of the Queue:
template<class T, size_t MaxQueueSize>
class Queue
std::condition_variable consumer_, producer_;
std::mutex mutex_;
using unique_lock = std::unique_lock<std::mutex>;
std::queue<T> queue_;
template<class U>
void push_back(U&& item) {
unique_lock lock(mutex_);
while(MaxQueueSize == queue_.size())
T pop_front() {
unique_lock lock(mutex_);
auto full = MaxQueueSize == queue_.size();
auto item = queue_.front();
return item;
Thanks to @Ulrich Eckhardt, @PaulMcKenzie and all the other commenters (thank you for the brainstorming!), I have probably found the cause of the deadlock. I tried to reduce this example even more and thought about removing atomic_pgs_finished, a variable indicating whether all pgs have been queried. Interestingly, ++atomic_pgs_finished >= pgs.size() evaluates to true not just once but multiple times, so multiple threads end up inside this specific if-clause.
I simply fixed it by using another mutex around this if-clause. Maybe someone can explain why ++atomic_pgs_finished >= pgs.size() is not atomic and causes true for multiple threads.
Below i have updated the code (mostly the same as in the question) with comments, so that it might be more understandable.
void thread_lifecycle(
Queue<std::tuple<int64_t, int64_t, uint8_t>, QUEUE_SIZE>& query, // The input queue containing queries, in my case triples
Queue<std::string, QUEUE_SIZE>& output_queue, // The Output Queue of results
std::vector<Object>& pgs, // Objects which should be queried
bool* pgs_executed, // Initialized to an array of false-values
std::mutex& pgs_executed_mutex, // a mutex, protecting pgs_executed
std::atomic<uint32_t>& atomic_pgs_finished // atomic counter to count how many have been executed (to send a end signal)
// Initialize variables
std::tuple<int64_t, int64_t, uint8_t> next_query;
std::string output = "";
int64_t lower, upper;
// Set the first iteration to false for the very first query
// This flips on the second iteration to reuse pgs_executed with true values and so on...
bool iter_bool = false;
// Execute as long as valid queries are received
while(true) {
// Get next query
next_query = query.pop_front();
// Stop Condition reached terminate thread
if (std::get<2>(next_query) == uint8_t(-1)) break;
// "Parse query" to query the objects in pgs
lower = std::get<0>(next_query);
upper = std::get<1>(next_query);
// Now iterate through the pgs and pgs_executed (once)
for (uint32_t i = 0; i < pgs.size(); i++){
// Lock to read and write into pgs_executed
if (pgs_executed[i] == iter_bool) {
pgs_executed[i] = !pgs_executed[i];
// Unlock since we now execute the query on the object (which was not queried before)
// Query Execution
output = pgs.at(i).get_result(lower, upper);
// If the query yielded a result, then add it to the output for the main thread to read
if (output.length() != 0) {
// Here I would like to inform the main thread that we executed the query on
// every object in pgs, so that it should no longer wait for other results
if (++atomic_pgs_finished >= pgs.size()) {
// In this if-clause multiple threads are present at once!
// This is not intended and causes a deadlock: "LAST_RESULT_IDENTIFIER" is
// push_back-ed multiple times, so the main thread
// assumed that a query had finished. The main thread then simply added the next query while the
// previous one was not finished, causing threads to race each other on two queries simultaneously
// without having the same iter_bool!
} else {
// This case happens when the next element in the list was already executed (by another process),
// simply unlock pgs_executed and continue with the next element in pgs
continue; // This is unnecessary and could be removed
//finally flip for the next query in order to reuse bool* (which now has trues if a second query is incoming)
iter_bool = !iter_bool;
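On the question of why ++atomic_pgs_finished >= pgs.size() can be true for several threads: the increment itself is atomic, and every thread receives a distinct value from it. The comparison, however, holds for every value at or past the threshold, so if the counter is not back at zero before further increments arrive, more than one thread passes the test. Testing for equality fires once per crossing. The sketch below only illustrates that difference; count_hits and Hits are hypothetical names, and whether this is the exact cause here depends on where the counter is reset, which the snippet does not show.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>
#include <thread>
#include <vector>

// Hypothetical helper: `workers` threads each bump a shared counter
// `perWorker` times and record how often their own increment passed each test.
struct Hits { int greaterEqual; int equal; };

Hits count_hits(int workers, int perWorker, std::uint32_t threshold) {
    assert(workers > 0 && perWorker >= 0);
    std::atomic<std::uint32_t> counter{0};
    std::atomic<int> ge{0}, eq{0};
    std::vector<std::thread> pool;
    for (int w = 0; w < workers; ++w) {
        pool.emplace_back([&] {
            for (int i = 0; i < perWorker; ++i) {
                std::uint32_t mine = ++counter; // atomic: each value is handed out once
                if (mine >= threshold) ++ge;    // true for every value past the threshold
                if (mine == threshold) ++eq;    // true for exactly one increment
            }
        });
    }
    for (auto& t : pool) t.join();
    return {ge.load(), eq.load()};
}
```

With 4 workers, 100 increments each and a threshold of 300, the equality test succeeds once while the >= test succeeds 101 times (for the values 300 through 400).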

c++ multithreading return value

I am using 3 threads to chunk a for loop, and the 'data' is a global array so I want to lock that part in 'calculateAll' function,
std::vector< int > calculateAll(int ***data,std::vector<LineIndex> indexList)
std::vector<int> v_a=std::vector<int>();
for(int a=0;a<indexList.size();a++)
v_a.push_back(/*something related with data*/);
return v_a;
for(int i=0;i<3;i++)
int s =firstone+i*chunk;
int e = ((s+chunk)<indexList.size())? (s+chunk) : indexList.size();
for (int i = 0; i < 3; ++i)
My question is: how can I get the return value, which is a vector, from each thread and then combine them? The reason I want to do that is that if I declare 'v_a' as a global vector, it may crash when each thread tries to push_back its values into 'v_a' (or not?). So I am thinking of declaring a vector for each thread and then combining them into a new vector for further use (as I would without threads).
Or are there better methods to deal with that concurrency problem? The order of 'v_a' does not matter.
I appreciate any suggestions.
First of all, rather than the explicit lock() and unlock() seen in your code, always use std::lock_guard where possible. Secondly, it would be better to use std::future for this. Launch each task with std::async, then get the results in another loop while aggregating them at the same time. Like this:
using Vector = std::vector<int>;
using Future = std::future<Vector>;
std::vector<Future> futures;
for(int i=0;i<3;i++)
{
    int s =firstone+i*chunk;
    int e = ((s+chunk)<indexList.size())? (s+chunk) : indexList.size();
    auto fut = std::async(std::launch::async, calculateAll, data, indexList, s, e);
    futures.push_back( std::move(fut) );
}
//Combine the results
std::vector<int> result;
for(auto& fut : futures){ //Iterate through in the order the future was created
    auto vec = fut.get(); //Get the result of the future
    result.insert(result.end(), vec.begin(), vec.end()); //append it to results in order
}
Here is a minimal, complete and working example based on your code - which demonstrates what I mean: Live On Coliru
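Since the linked example is not reproduced here, a self-contained sketch of the same pattern might look like the following. The functions square_range and square_all are hypothetical stand-ins for calculateAll and its caller; the real work and data types are assumptions.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <functional>
#include <future>
#include <vector>

// Hypothetical stand-in for calculateAll: squares the values in [begin, end).
std::vector<int> square_range(const std::vector<int>& input,
                              std::size_t begin, std::size_t end) {
    std::vector<int> out;
    out.reserve(end - begin);
    for (std::size_t i = begin; i < end; ++i) out.push_back(input[i] * input[i]);
    return out;
}

// Splits the input into `tasks` chunks, runs each chunk asynchronously and
// concatenates the partial results in chunk order.
std::vector<int> square_all(const std::vector<int>& input, std::size_t tasks) {
    assert(tasks > 0);
    const std::size_t chunk = (input.size() + tasks - 1) / tasks; // ceiling division
    std::vector<std::future<std::vector<int>>> futures;
    for (std::size_t t = 0; t < tasks; ++t) {
        const std::size_t s = std::min(t * chunk, input.size());
        const std::size_t e = std::min(s + chunk, input.size());
        // std::cref avoids copying the input into every task
        futures.push_back(std::async(std::launch::async, square_range,
                                     std::cref(input), s, e));
    }
    std::vector<int> result;
    for (auto& f : futures) {
        std::vector<int> part = f.get(); // blocks until this chunk is done
        result.insert(result.end(), part.begin(), part.end());
    }
    return result;
}
```

Each task writes only to its own local vector, so no lock is needed; the only shared data is the input, which is read and never modified while the tasks run.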
Create a structure with a vector and a lock. Pass an instance of that structure, with the lock pre-locked, to each thread as it starts. Wait for all locks to become unlocked. Each thread then does its work and unlocks its lock when done.

Decrement atomic counter - but <only> under a condition

I want to realize something along these lines:
inline void DecrementPendingWorkItems()
if(this->pendingWorkItems != 0) //make sure we don't underflow and get a very high number
How can I do this so that both operations are atomic as a block, without using locks ?
You can just check the result of InterlockedDecrement() and if it happens to be negative (or <= 0 if that's more desirable) undo the decrement by calling InterlockedIncrement(). In otherwise proper code that should be just fine.
The simplest solution is just to use a mutex around the entire section (and for all other accesses to this->pendingWorkItems). If for some reason this isn't acceptable, then you'll probably need compare and exchange:
void decrementPendingWorkItems()
{
    int count = std::atomic_load( &pendingWorkItems );
    while ( count != 0
            && ! std::atomic_compare_exchange_weak(
                    &pendingWorkItems, &count, count - 1 ) ) {
    }
}
(This supposes that pendingWorkItems has type std::atomic_int.)
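The same loop can be written with the member functions of std::atomic<int>. The version below is a sketch under the assumption that the caller wants to know whether a decrement actually happened; decrement_if_nonzero is a hypothetical name.

```cpp
#include <atomic>
#include <cassert>

// Hypothetical free function: returns true if it decremented the counter,
// false if the counter was already zero.
bool decrement_if_nonzero(std::atomic<int>& counter) {
    int seen = counter.load();
    while (seen != 0) {
        assert(seen > 0); // the counter is assumed never to go negative
        // Succeeds only if the counter still holds `seen`; on failure,
        // compare_exchange_weak stores the current value into `seen`.
        if (counter.compare_exchange_weak(seen, seen - 1)) return true;
    }
    return false;
}
```

Starting from a counter of 2, two calls return true and a third returns false, leaving the counter at 0. The check and the decrement cannot be separated by another thread, because the exchange only takes effect if the value is still the one that was checked.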
There is such a thing called "SpinLock". This is a very lightweight synchronisation.
This is the idea:
// This lock should be used only when operation with protected resource
// is very short like several comparisons or assignments.
class SpinLock
__forceinline SpinLock() { body = 0; }
__forceinline void Lock()
int spin = 15;
for(;;) {
if(!InterlockedExchange(&body, 1)) break;
if(--spin == 0) { Sleep(10); spin = 29; }
__forceinline void Unlock() { InterlockedExchange(&body, 0); }
long body;
Actual numbers in the sample are not important. This lock is extremely efficient.
You can use InterlockedCompareExchange in a loop:
inline void DecrementPendingWorkItems() {
    LONG old_items = this->pendingWorkItems;
    LONG items;
    while ((items = old_items) > 0) {
        old_items = ::InterlockedCompareExchange(&this->pendingWorkItems,
                                                 items-1, items);
        if (old_items == items) break;
    }
}
What the InterlockedCompareExchange function is doing is:
if pendingWorkItems matches items, then
set the value to items-1 and return items
else return pendingWorkItems
This is done atomically, and is also called a compare and swap.
Use an atomic CAS.
You can make it lock free, but not wait free.
As Kirill suggests this is similar to a spin lock in your case.
I think this does what you need, but I'd recommend thinking through all the possibilities before going ahead and using it as I have not tested it at all:
inline bool
InterlockedSetIfEqual(volatile LONG* dest, LONG exchange, LONG comperand)
{
    return comperand == ::InterlockedCompareExchange(dest, exchange, comperand);
}

inline bool InterlockedDecrementNotZero(volatile LONG* ptr)
{
    LONG comperand;
    LONG exchange;
    do {
        comperand = *ptr;
        exchange = comperand-1;
        if (comperand <= 0) {
            return false;
        }
    } while (!InterlockedSetIfEqual(ptr,exchange,comperand));
    return true;
}
There remains the question as to why your pending work items should ever go below zero. You should really ensure that the number of increments matches the number of decrements and all will be fine. I'd perhaps add an assert or exception if this constraint is violated.

Multi-Threaded Binary Tree Algorithm

So I've tried one method that locks each node as it looks at it, but this requires a lot of locking and unlocking, which of course adds quite a bit of overhead. I was wondering if anyone knew of a more efficient algorithm. Here is my first attempt:
typedef struct _treenode{
struct _treenode *leftNode;
struct _treenode *rightNode;
int32_t data;
pthread_mutex_t mutex;
pthread_mutex_t _initMutex = PTHREAD_MUTEX_INITIALIZER;
int32_t insertNode(TreeNode **_trunk, int32_t data){
TreeNode **current;
pthread_mutex_t *parentMutex = NULL, *currentMutex = &_initMutex;
if(_trunk != NULL){
current = _trunk;
while(*current != NULL){
currentMutex = &(*current)->mutex;
if((*current)->data < data){
if(parentMutex != NULL)
parentMutex = currentMutex;
current = &(*current)->rightNode;
}else if((*current)->data > data){
if(parentMutex != NULL)
parentMutex = currentMutex;
current = &(*current)->leftNode;
if(parentMutex != NULL)
return 0;
*current = malloc(sizeof(TreeNode));
pthread_mutex_init(&(*current)->mutex, NULL);
(*current)->leftNode = NULL;
(*current)->rightNode = NULL;
(*current)->data = data;
return 1;
return 0;
int main(){
int i;
TreeNode *trunk = NULL;
for(i=0; i<1000000; i++){
insertNode(&trunk, rand() % 50000);
You don't need to lock every node you visit. You can do something like this: lock a node when you're about to do an insertion, do your insertion, and unlock. If another thread happens to need to insert at the same point and the node is locked, it should wait before traversing down any further. Once the node is unlocked, it can continue traversing the updated part of the tree.
Another straightforward way is to have one lock for the complete tree.
Access to the tree is more serialized, but you have only one mutex and you lock only once.
If the serialization is an issue, use a read/write lock, so that at least reads can be done in parallel.
Use a read-write lock. Locking on individual nodes will become exceptionally difficult if you later decide to switch your tree implementation. Here's a little demo code using pthreads:
typedef struct {
pthread_rwlock_t rwlock;
TreeNode *root_node;
} Tree;
void Tree_init(Tree *tree) {
pthread_rwlock_init(&tree->rwlock, NULL);
tree->root_node = NULL;
int32_t Tree_insert(Tree *tree, int32_t data) {
int32_t ret = _insertNode(&tree->root_node, data);
return ret;
int32_t Tree_locate(Tree *tree) {
int32_t ret = _locateNode(&tree->root_node);
return ret;
void Tree_destroy(Tree *tree) {
// yada yada
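For comparison, the same whole-tree read-write locking can be sketched in C++17 with std::shared_mutex in place of pthread_rwlock_t. This swaps in a different API than the pthreads demo above; LockedTree and its members are hypothetical names, and the tree is a plain unbalanced binary search tree.

```cpp
#include <cassert>
#include <cstdint>
#include <memory>
#include <mutex>
#include <shared_mutex>

// Hypothetical class: one read-write lock protects the whole tree.
class LockedTree {
    struct Node {
        std::int32_t data;
        std::unique_ptr<Node> left, right;
        explicit Node(std::int32_t d) : data(d) {}
    };
    std::unique_ptr<Node> root_;
    mutable std::shared_mutex rwlock_;
public:
    // Writers take the lock exclusively. Returns true if the value was inserted.
    bool insert(std::int32_t data) {
        std::unique_lock<std::shared_mutex> guard(rwlock_);
        std::unique_ptr<Node>* cur = &root_;
        while (*cur) {
            if (data == (*cur)->data) return false; // already present
            cur = data < (*cur)->data ? &(*cur)->left : &(*cur)->right;
        }
        *cur = std::make_unique<Node>(data);
        return true;
    }
    // Readers share the lock, so lookups can run in parallel with each other.
    bool contains(std::int32_t data) const {
        std::shared_lock<std::shared_mutex> guard(rwlock_);
        const Node* cur = root_.get();
        while (cur) {
            if (data == cur->data) return true;
            cur = data < cur->data ? cur->left.get() : cur->right.get();
        }
        return false;
    }
};
```

Because the lock lives in the wrapper and not in the nodes, the tree implementation underneath can be replaced without touching the locking code.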
Lock the whole tree. There's no other way that will not get you into trouble sooner or later. Of course, if there's a lot of concurrent reads and writes, you will get a lot of blocking and slow everything down horribly.
Java introduced a concurrent skip list in version 1.6. Skip lists work like trees, but are (supposedly) a bit slower. However, they are based on singly linked lists, and therefore can theoretically be modified without locking using compare-and-swap. This makes for superb multi-threaded performance.
I googled "skip list" C++ compare-and-swap and came up with some interesting info but no C++ code. However, Java is open source, so you can get the algorithm if you are desperate enough. The Java class is: java.util.concurrent.ConcurrentSkipListMap.

How to lock Queue variable address instead of using Critical Section?

I have 2 threads and a global Queue: one thread (t1) pushes the data and the other (t2) pops it. I want to synchronize these operations without wrapping the queue in a critical section from the Windows API.
The Queue is global, and I want to know how to synchronize access to it. Is it done by locking the address of the Queue?
Is it possible to use Boost Library for the above problem?
One approach is to have two queues instead of one:
The producer thread pushes items to queue A.
When the consumer thread wants to pop items, queue A is swapped with empty queue B.
The producer thread continues pushing items to the fresh queue A.
The consumer, uninterrupted, consumes items off queue B and empties it.
Queue A is swapped with queue B etc.
The only locking/blocking/synchronization happens when the queues are being swapped, which should be a fast operation since it's really a matter of swapping two pointers.
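The two-queue scheme above could be sketched roughly as follows. SwapQueue is a hypothetical name, std::vector stands in for the queues, and std::mutex is an assumption in place of whatever lock the platform offers. Note that in this sketch the producer's push also takes the lock briefly, since a push must not overlap with the swap.

```cpp
#include <cassert>
#include <mutex>
#include <utility>
#include <vector>

// Hypothetical class name; T is whatever the producer hands over.
template <class T>
class SwapQueue {
    std::mutex mutex_;
    std::vector<T> incoming_; // "queue A": filled by the producer
public:
    void push(T item) {
        std::lock_guard<std::mutex> guard(mutex_);
        incoming_.push_back(std::move(item));
    }
    // Swaps the caller's emptied buffer ("queue B") with queue A under the lock.
    // The consumer then works through `batch` without holding the lock.
    void take_all(std::vector<T>& batch) {
        batch.clear();
        std::lock_guard<std::mutex> guard(mutex_);
        incoming_.swap(batch);
    }
};
```

The consumer keeps one vector around and passes it to take_all each time; the swap exchanges the internal buffers of the two vectors, so the time spent under the lock does not grow with the number of queued items.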
I thought you could make a queue with those conditions without using any atomics or any thread-safe stuff at all?
For example, if it's just a circular buffer, one thread controls the read pointer and the other controls the write pointer. Neither updates its pointer until it has finished reading or writing. And it just works?
The only point of difficulty comes with determining, when read==write, whether the queue is full or empty, but you can overcome this by always keeping one dummy item in the queue.
class Queue
volatile Object* buffer;
int size;
volatile int readpoint;
volatile int writepoint;
void Init(int s)
size = s;
buffer = new Object[s];
readpoint = 0;
writepoint = 1;
//thread A will call this
bool Push(Object p)
if(writepoint == readpoint)
return false;
int wp = writepoint - 1;
buffer[wp] = p;
int newWritepoint = writepoint + 1;
if (newWritepoint >= size) // wrap around at the end of the buffer
    newWritepoint = 0;
writepoint = newWritepoint;
return true;
// thread B will call this
bool Pop(Object* p)
int writepointTest = writepoint; // read the volatile index once
if(readpoint+1 == writepointTest)
return false;
*p = buffer[readpoint];
int newReadpoint = readpoint + 1;
if (newReadpoint >= size) // wrap around at the end of the buffer
    newReadpoint = 0;
readpoint = newReadpoint;
return true;
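In standard C++, volatile alone does not make accesses from two threads well defined, so a portable version of this one-reader, one-writer ring usually replaces the volatile indices with std::atomic. The sketch below makes that substitution; SpscRing is a hypothetical name, and it keeps one slot empty to distinguish full from empty, as described above.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>

// Hypothetical fixed-capacity ring for exactly one producer and one consumer.
// One slot always stays empty, so head == tail always means "empty".
template <class T, std::size_t N>
class SpscRing {
    static_assert(N >= 2, "need at least one usable slot");
    T buffer_[N];
    std::atomic<std::size_t> head_{0}; // next slot to read; written only by the consumer
    std::atomic<std::size_t> tail_{0}; // next slot to write; written only by the producer
public:
    bool push(const T& value) { // producer thread only
        const std::size_t tail = tail_.load(std::memory_order_relaxed);
        const std::size_t next = (tail + 1) % N;
        if (next == head_.load(std::memory_order_acquire)) return false; // full
        buffer_[tail] = value;
        tail_.store(next, std::memory_order_release); // publish the filled slot
        return true;
    }
    bool pop(T& out) { // consumer thread only
        const std::size_t head = head_.load(std::memory_order_relaxed);
        if (head == tail_.load(std::memory_order_acquire)) return false; // empty
        out = buffer_[head];
        head_.store((head + 1) % N, std::memory_order_release); // free the slot
        return true;
    }
};
```

A ring declared as SpscRing<int, 4> holds at most three items. The release store on each index pairs with the acquire load on the other side, which is what guarantees that the consumer sees the element written before the index moved.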
Another way to handle this issue is to allocate your queue dynamically and assign it to a pointer. The pointer value is passed off between threads when items have to be dequeued, and you protect this operation with a critical section. This means locking for every push into the queue, but much less contention on the removal of items.
This works well when you have many items between enqueueing and dequeueing, and works less well with few items.
Example (I'm using some given RAII locking class to do the locking). Also note...really only safe when only one thread dequeueing.
queue* my_queue = 0;
queue* pDequeue = 0;
critical_section section;
void enqueue(stuff& item)
locker lock(section);
if (!my_queue)
my_queue = new queue;
item* dequeue()
if (!pDequeue)
{ //handoff for dequeue work
locker lock(section);
pDequeue = my_queue;
my_queue = 0;
if (pDequeue)
item* pItem = pDequeue->pop(); //remove item and return it.
if (!pItem)
delete pDequeue;
pDequeue = 0;
return pItem;
return 0;