How to prevent critical section access with atomics in C++


I have a program where 4 threads are doing random transactions between two "bank accounts"; on every 100th transaction of an account, 5% interest should be added to that account.
This is a school exercise, and in the previous lesson we used mutexes, which made locking a section of code very straightforward.
Now we have to do the same thing but using atomics.
My problem is that currently more than one thread can enter the addInterest function, so interest can be added multiple times.
This is the part of the code where I check whether it is the 100th transaction:
void transfer(Account& acc, int amount, string& thread)
{
    ++m_iCounter;
    withdraw(acc, amount);
    if (m_iCounter % 100 == 0)
    {
        addInterest(thread);
    }
}
All threads go through that function and check for % 100, and sometimes more than one thread slips into addInterest. With mutexes I could restrict access here, but how do I do it with atomics?
Maybe I could solve this somehow inside addInterest, but if so, how?
Inside addInterest is the following code (if more than one thread slips in, interest is added multiple times):
void addInterest(string& thread)
{
    float old = m_iBalance.load();
    const float interest = 0.05f;
    const float amount = old * interest;
    while (!m_iBalance.compare_exchange_weak(old, old + amount))
    {
        //
    }
    ++m_iInterest;
    cout << thread << " interest : Acc" << name << " " << m_iCounter << endl;
}

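One problem is that you increment and read the counter separately, and because of this addInterest can be called multiple times: if two threads both increment (98 -> 99, 99 -> 100) before either of them reads, both read 100. To fix this, do the write and the read as a single atomic operation: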
const int newCounter = ++m_iCounter;
Adding interest:
If it doesn't matter when you add interest, as long as it happens after incrementing the counter, then your addInterest is probably fine as is.
If you must add interest to the balance you had during the 100th transaction (even if it has already changed by the time that thread reaches addInterest), you must somehow store the old balance before you increment the counter. The only way I can think of is to synchronize the entire transfer by using an atomic flag as a replacement for a mutex:
// Member variable (must start out as false, i.e. unlocked)
std::atomic_bool flag{false};
// Before the critical section: spin until we manage to swap false -> true
bool expected;
do {
    expected = false;
} while (!flag.compare_exchange_weak(expected, true));
// Critical section here
// Unlock at the end
flag = false;

OK, two possible solutions:
The first one is to use the std::atomic template from the header <atomic>; in this case you can at least make m_iCounter an atomic variable. But for better atomicity of the operation, all the variables in the critical section have to be atomic (note that this makes each individual access atomic, not a sequence of accesses as a whole).
Further reading: std::atomic and the atomic template.
The second option is to use an experimental C++ feature, the so-called STM (Software Transactional Memory). In this case the whole content of the transfer function, or at least everything you protected with mutexes before, could be one transaction.
Further reading: Transactional Memory in C++.

Related

An unexpected hang in multithreaded application

I am developing a simple bank system. One of the functions I implemented is doTransaction.
Simplified, it looks like this:
#include <vector>
#include <mutex>
const int N = 1000;
int accounts[N];
std::vector<std::mutex> mutexes(N);
int doTransaction(int from, int to, int amount) {
    std::lock_guard<std::mutex> lock_from(mutexes[from]);
    std::lock_guard<std::mutex> lock_to(mutexes[to]);
    if (accounts[from] < amount) return -1;
    accounts[from] -= amount;
    accounts[to] += amount;
    return 0;
}
Although the function seems simple, its execution sometimes hangs.
I have already spent a lot of time trying to solve this problem. I think there is some kind of deadlock in my code.

The problem you can run into is two calls running in parallel, e.g. doTransaction(1, 2, 100) and doTransaction(2, 1, 100): each can lock its "from" mutex and then wait forever for the other's. You can use hierarchical locking (always acquiring the mutexes in one fixed global order) to avoid deadlocks.

std::lock_guard<std::mutex> lock_from(mutexes[from]);
std::lock_guard<std::mutex> lock_to(mutexes[to]);
The first line can succeed and obtain the lock (e.g. lock 10) and then attempt to get the second lock (e.g. lock 20).
However, since this function can be called from different threads, two transactions such as (10->20) and (20->10) can occur at the same time. Then one thread could hold lock 10 while waiting for lock 20, while the other holds lock 20 while waiting for lock 10.
And if there are three transactions, e.g. (10->20), (20->30) and (30->10), on three separate threads, it is possible that each thread locks its 'lock_from' mutex but never obtains its 'lock_to' mutex.

confusion with semaphore definitions

For semaphore implementations, what does "process" refer to? In the context of the producer/consumer problem, is the process the producer method or the consumer method? Or is it P() itself, when we are in P() and the value is less than 0?
P() {
    value = value - 1;
    if (value < 0) {
        add the calling process to this semaphore's list;
        block this process;
    }
}
EXAMPLE
If the consumer runs before the producer produces its first item,
the consumer decrements full -> full = -1,
and since the value is less than 0, it adds the calling process to this semaphore's list. But I'm not sure what "process" means here.
And what does it mean to block this process? Does it mean that the entire consumer method is halted and the producer method runs?
code:
#define N 100
typedef int semaphore;
semaphore full = 0;   // Initially, no item in buffer
semaphore empty = N;  // Initially, all N buffer slots are empty
semaphore mutex = 1;  // No thread updating the buffer
void producer(void) {
    int item;
    while(TRUE){
        item = produce_item();
        down(&empty);
        down(&mutex);
        insert_item(item);
        up(&mutex);
        up(&full);
    }
}
void consumer(void) {
    int item;
    while(TRUE){
        down(&full);
        down(&mutex);
        item = remove_item();
        up(&mutex);
        up(&empty);
        consume_item(item);
    }
}

A process, in this usage, is exactly like a thread. Usually, when 'multiprocess' is used instead of 'multithreaded', it implies that the kernel handles the threading, which allows the computer to take advantage of multiple cores. However, that isn't important for this specific implementation, and it is also false for this specific implementation, because nothing here is atomic.
Blocking the process here means that a process that calls P and decrements the value to anything negative halts its own execution when it reaches the 'block this process' step.
Assuming multithreading, your 'producer' function will keep decreasing the empty semaphore until it tries to decrement it below zero, at which point it is halted and only the 'consumer' function runs. At least, only 'consumer' runs until it increases the empty semaphore enough that 'producer' can run again. You can also swap 'empty'<->'full' and 'producer'<->'consumer' in the previous two sentences, and they remain correct.
Also, I suggest you read up on semaphores elsewhere, because they are a basic part of threading/multiprocessing, and other people have described them better than I ever could. (Look at the producer/consumer example there.)

Atomic thread counter

I'm experimenting with the C++11 atomic primitives to implement an atomic "thread counter" of sorts. Basically, I have a single critical section of code. Within this code block, any thread is free to READ from memory. However, sometimes, I want to do a reset or clear operation, which resets all shared memory to a default initialized value.
This seems like a great opportunity to use a read-write lock. C++11 doesn't include read-write mutexes out of the box, but maybe something simpler will do. I thought this problem would be a great opportunity to become more familiar with C++11 atomic primitives.
So I thought through this problem for a while, and it seems to me that all I have to do is:
Whenever a thread enters the critical section, increment an atomic counter variable.
Whenever a thread leaves the critical section, decrement the atomic counter variable.
If a thread wishes to reset all variables to default values, it must atomically wait for the counter to be 0, then atomically set it to some special "clearing flag" value, perform the clear, then reset the counter to 0.
Of course, threads wishing to increment and decrement the counter must also check for the clearing flag.
So, the algorithm I just described can be implemented with three functions. The first function, increment_thread_counter(), must ALWAYS be called before entering the critical section. The second function, decrement_thread_counter(), must ALWAYS be called right before leaving the critical section. Finally, the function clear() can be called from outside the critical section only when the thread counter == 0.
This is what I came up with:
Given:
A thread counter variable, std::atomic<std::size_t> thread_counter
A constant clearing_flag set to std::numeric_limits<std::size_t>::max()
...
void increment_thread_counter()
{
    std::size_t expected = 0;
    while (!std::atomic_compare_exchange_strong(&thread_counter, &expected, 1))
    {
        if (expected != clearing_flag)
        {
            thread_counter.fetch_add(1);
            break;
        }
        expected = 0;
    }
}
void decrement_thread_counter()
{
    thread_counter.fetch_sub(1);
}
void clear()
{
    std::size_t expected = 0;
    while (!thread_counter.compare_exchange_strong(expected, clearing_flag)) expected = 0;
    /* PERFORM WRITES WHICH WRITE TO ALL SHARED VARIABLES */
    thread_counter.store(0);
}
As far as I can reason, this should be thread-safe. Note that the decrement_thread_counter function shouldn't require ANY synchronization logic, because it is a given that increment() is always called before decrement(). So, when we get to decrement(), thread_counter can never equal 0 or clearing_flag.
Regardless, since THREADING IS HARD™, and I'm not an expert at lockless algorithms, I'm not entirely sure this algorithm is race-condition free.
Question: Is this code thread safe? Are any race conditions possible here?

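You have a race condition: bad things happen if another thread changes the counter between increment_thread_counter()'s test for clearing_flag and the fetch_add. For example, the remaining readers can leave and clear() can set the counter to clearing_flag inside that window; the fetch_add then wraps clearing_flag around to 0, and clear() runs while a reader is inside the critical section.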
I think this classic CAS loop should work better:
void increment_thread_counter()
{
    std::size_t expected = 0;
    std::size_t updated;
    do {
        if (expected == clearing_flag) { // don't want to succeed while clearing,
            expected = 0;                // take a chance that clearing completes before CMPEXC
        }
        updated = expected + 1;
        // if (updated == clearing_flag) TOO MANY READERS!
    } while (!std::atomic_compare_exchange_weak(&thread_counter, &expected, updated));
}

Strange problem using two threads and a boolean

(I hate having to use a title like this, but I just couldn't find anything better.)
I have two classes with two threads. The first one detects motion between two frames:
void Detector::run(){
    isActive = true;
    // will run forever
    while (isActive){
        //code to detect motion for every frame
        //.........................................
        if(isThereMotion)
        {
            if(number_of_sequence>0){
                theRecorder.setRecording(true);
                theRecorder.setup();
                // cout << " motion was detected" << endl;
            }
            number_of_sequence++;
        }
        else
        {
            number_of_sequence = 0;
            theRecorder.setRecording(false);
            // cout << " there was no motion" << endl;
            cvWaitKey (DELAY);
        }
    }
}
The second one records a video when started:
void Recorder::setup(){
    if (!hasStarted){
        this->start();
    }
}
void Recorder::run(){
    theVideoWriter.open(filename, CV_FOURCC('X','V','I','D'), 20, Size(1980,1080), true);
    if (recording){
        while(recording){
            //++++++++++++++++++++++++++++++++++++++++++++++++
            cout << recording << endl;
            hasStarted=true;
            webcamRecorder.read(matRecorder); // read a new frame from video
            theVideoWriter.write(matRecorder); // write the frame into the file
        }
    }
    else{
        hasStarted=false;
        cout << "no recording??" << endl;
        changeFilemamePlusOne();
    }
    hasStarted=false;
    cout << "finished recording" << endl;
    theVideoWriter.release();
}
The boolean recording gets changed by the function:
void Recorder::setRecording(bool x){
    recording = x;
}
The goal is to start the recording once motion was detected while preventing the program from starting the recording twice.
The really strange problem, which honestly doesn't make any sense to me, is that the code only works if I cout the boolean recording (marked with "++++++"). Otherwise recording never changes to false and the code in the else statement never gets called.
Does anyone have an idea why this is happening? I'm still just beginning with C++, but this problem seems really strange to me.

I suppose your variables isThereMotion and recording are simple class members of type bool.
Concurrent access to these members isn't thread safe by default, and you'll face race conditions, and all kinds of weird behaviors.
I'd recommend declaring these member variables like this (as long as you can use the latest standard):
class Detector {
    // ...
    std::atomic<bool> isThereMotion;
};
class Recorder {
    // ...
    std::atomic<bool> recording;
    std::atomic<bool> hasStarted;
};
etc.
The reason behind the scenes is that even reading/writing a simple boolean value may compile into several assembler instructions (or be cached in a register and never re-read), and a thread switch can happen in the middle of them. Using std::atomic<> automatically provides something like a critical section for each read/write of this variable, and guarantees that other threads eventually see the updated value.
In short: Make everything, that is purposed to be accessed concurrently from different threads, an atomic value, or use an appropriate synchronization mechanism like a std::mutex.
If you can't use the latest C++ standard, you can perhaps work around it using boost::thread to keep your code portable.
NOTE:
Judging from your comments, your question seems to be specific to the Qt framework; there are a number of mechanisms you can use for synchronization there, e.g. the mentioned QMutex.
Why doesn't volatile help in multithreaded environments?
volatile prevents the compiler from optimizing away actual read accesses based on assumptions about values that were set earlier in sequential code. It doesn't prevent threads from being interrupted while actually retrieving or writing those values, and it doesn't order memory accesses between threads.
volatile should be used for reading from addresses that can be changed independently of the sequential or threading execution model (e.g. bus-addressed peripheral HW registers, where the hardware actively changes values, such as an FPGA reporting the current data throughput at a register interface).
See more details about this misconception here:
Why is volatile not considered useful in multithreaded C or C++ programming?

You could use a pool of nodes with pointers to frame buffers as part of a linked-list FIFO messaging system, using a mutex and a semaphore to coordinate the threads. A message for each frame to be recorded would be sent to the recording thread (appended to its list and a semaphore released); otherwise the node would be returned (appended) back to the main thread's list.
Example code using Windows-based synchronization to copy a file: the main thread reads into buffers, and the spawned thread writes from the buffers it receives. The setup code is lengthy, but the actual messaging functions and the two thread functions are simple and small.
mtcopy.zip

Could be a liveness issue. The compiler could be re-ordering instructions or hoisting isActive out of the loop. Try marking it as volatile.
From MSDN docs:
Objects that are declared as volatile are not used in certain optimizations because their values can change at any time. The system always reads the current value of a volatile object when it is requested, even if a previous instruction asked for a value from the same object. Also, the value of the object is written immediately on assignment.
Simple example:
#include <iostream>
using namespace std;
int main() {
    bool active = true;
    while(active) {
        cout << "Still active" << endl;
    }
}
Compile it to assembly:
g++ -S test.cpp -o test1.a
Add volatile to active, as in volatile bool active = true.
Compile it again with g++ -S test.cpp -o test2.a and look at the difference with diff test1.a test2.a:
< testb $1, -5(%rbp)
---
> movb -5(%rbp), %al
> testb $1, %al
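Notice that the first version tests the flag in place (testb on the memory operand), while the volatile version performs an explicit load (movb) before testing it. With optimizations enabled, the compiler is free to drop the test of the non-volatile flag altogether, since the loop body never modifies it; for the volatile flag it must re-read the value on every iteration.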

C++ - Threads without coordinating mechanism like mutex_Lock

I attended an interview two days ago. The interviewer was good at C++, but not at multithreading. He asked me to write code for two threads, where one thread prints 1,3,5,... and the other prints 2,4,6,..., but the output should be 1,2,3,4,5,.... So I gave the code below (pseudocode):
mutex_Lock LOCK;
int last=2;
int last_Value = 0;
void function_Thread_1()
{
    while(1)
    {
        mutex_Lock(&LOCK);
        if(last == 2)
        {
            cout << ++last_Value << endl;
            last = 1;
        }
        mutex_Unlock(&LOCK);
    }
}
void function_Thread_2()
{
    while(1)
    {
        mutex_Lock(&LOCK);
        if(last == 1)
        {
            cout << ++last_Value << endl;
            last = 2;
        }
        mutex_Unlock(&LOCK);
    }
}
After this, he said "these threads will work correctly even without those locks. Those locks will reduce the efficiency". My point was that without the lock there can be a situation where one thread checks last (== 1 or 2) at the same time as the other thread tries to change its value to 2 or 1. So my conclusion is that it may appear to work without the lock, but that is not a correct/standard way. Now I want to know who is correct, and on what basis.

Without the lock, running the two functions concurrently would be undefined behaviour because there's a data race in the access of last and last_Value. Moreover (though not causing UB), the printing would be unpredictable.
With the lock, the program becomes essentially single-threaded, and is probably slower than the naive single-threaded code. But that's just in the nature of the problem (i.e. to produce a serialized sequence of events).

I think the interviewer might have thought about using atomic variables.
Each instantiation and full specialization of the std::atomic template defines an atomic type. Objects of atomic types are the only C++ objects that are free from data races; that is, if one thread writes to an atomic object while another thread reads from it, the behavior is well-defined.
In addition, accesses to atomic objects may establish inter-thread synchronization and order non-atomic memory accesses as specified by std::memory_order.
[Source]
By this I mean that the only things you should change are to remove the locks and to declare the last variable as std::atomic<int> last{2}; instead of int last = 2;
This should make it safe to access the last variable concurrently.
Out of curiosity, I edited your code a bit and ran it on my Windows machine:
#include <iostream>
#include <atomic>
#include <thread>
#include <Windows.h>
// brace initialization: copy-initializing a std::atomic from a value is ill-formed before C++17
std::atomic<int> last{2};
std::atomic<int> last_Value{0};
std::atomic<bool> running{true};
void function_Thread_1()
{
    while(running)
    {
        if(last == 2)
        {
            last_Value = last_Value + 1;
            std::cout << last_Value << std::endl;
            last = 1;
        }
    }
}
void function_Thread_2()
{
    while(running)
    {
        if(last == 1)
        {
            last_Value = last_Value + 1;
            std::cout << last_Value << std::endl;
            last = 2;
        }
    }
}
int main()
{
    std::thread a(function_Thread_1);
    std::thread b(function_Thread_2);
    while(last_Value != 6){}//we want to print 1 to 6
    running = false;//inform threads we are about to stop
    a.join();
    b.join();//join
    while(!GetAsyncKeyState('Q')){}//wait for 'Q' press
    return 0;
}
and the output is always:
1
2
3
4
5
6
Ideone refuses to run this code (compilation errors).
Edit: But here is a working Linux version :) (thanks to soon)

The interviewer doesn't know what he is talking about. Without the locks you get races on both last and last_Value. The compiler could, for example, reorder the assignment to last before the print and increment of last_Value, which could lead to the other thread executing on stale data. Furthermore, you could get interleaved output, meaning things like two numbers not being separated by a line break.
Another thing that could go wrong is that the compiler might decide not to reload last and (less importantly) last_Value on each iteration, since they can't (safely) change between those iterations anyway (data races are undefined behaviour under the C++11 standard and aren't acknowledged at all by previous standards). This means that the code suggested by the interviewer actually has a good chance of producing infinite loops that do absolutely nothing.
While it is possible to make that code correct without mutexes, that absolutely requires atomic operations with appropriate ordering constraints (release semantics on the assignment to last and acquire semantics on the load of last in the if condition).
Of course, your solution does lower efficiency by effectively serializing the whole execution. However, since the runtime is almost completely spent inside the stream output operation, which is almost certainly internally synchronized with locks, your solution doesn't lower the efficiency any more than it already is. Waiting on the lock in your code might actually be faster than busy-waiting, depending on the available resources (the non-locking version using atomics would absolutely tank when executed on a single-core machine).
