atomic read then write with std::atomic - c++

Assume the following code
#include <iostream>
#include <atomic>
#include <chrono>
#include <thread>

std::atomic<unsigned int> val;

void F()
{
    while (true)
    {
        ++val;
    }
}

void G()
{
    while (true)
    {
        ++val;
    }
}

void H()
{
    while (true)
    {
        std::this_thread::sleep_for(std::chrono::seconds(1));
        std::cout << "val=" << val << std::endl;
        val = 0;
    }
}

int main()
{
    std::thread ft(F);
    std::thread gt(G);
    std::thread ht(H);

    ft.join();
    gt.join();
    ht.join();

    return 0;
}
It is basically two threads incrementing val and a third thread that reports the value every second and then resets it to zero. The problem is that between the third thread reading the value and setting it to zero, further increments can happen, and those are lost (they never make it into the report). So I need an atomic read-then-write mechanism. Is there a clean way to do this that I am not aware of?
PS: I don't want to lock anything

The std::atomic::exchange method seems to be what you're after:
Atomically replaces the underlying value with desired. The operation is a read-modify-write operation.
Use as follows:
auto localValue = val.exchange(0);
std::cout << "Value = " << localValue << std::endl;
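A minimal self-contained sketch of H() rewritten this way (assuming the same global val as in the question):

#include <atomic>
#include <chrono>
#include <iostream>
#include <thread>

std::atomic<unsigned int> val;  // same global counter as in the question

void H()
{
    while (true)
    {
        std::this_thread::sleep_for(std::chrono::seconds(1));
        // Read the current count and reset it to zero in a single atomic
        // read-modify-write step, so no increment can slip in between the
        // report and the reset.
        auto localValue = val.exchange(0);
        std::cout << "val=" << localValue << std::endl;
    }
}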

As others have mentioned, std::atomic::exchange would work.
As for why your current code loses increments: between these two lines of execution:
std::cout << "val=" << val << std::endl;
val = 0;
the other two threads have time to increment the value that the ht thread is about to reset.
std::atomic::exchange performs the read and the reset as a single atomic operation.

Related

Multithreading and sequence of instructions

While learning multithread programming I've written the following code.
#include <thread>
#include <iostream>
#include <cassert>
void check() {
int a = 0;
int b = 0;
{
std::jthread t2([&](){
int i = 0;
while (a >= b) {
++i;
}
std::cout << "failed at iteration " << i << "\n"
// I know at this point a and b may have changed
<< a << " >= " << b << "\n";
std::exit(0);
});
std::jthread t1([&](){
while (true) {
++a;
++b;
}
});
}
}
int main() {
check();
}
Since ++a always happens before ++b, a should always be greater than or equal to b.
But experiment shows that sometimes b > a. Why? What causes it? And how can I enforce it?
It gets even stranger when I replace int a = 0; with int a = 1000;.
The program exits quickly, so no int overflow happens.
I found no instruction reordering in the generated assembly that might have caused this.
Since ++a always happens before ++b a should be always greater or
equal to b
Only in its execution thread. And only if that's observable by the execution thread.
C++ requires certain explicit "synchronization" in order for changes made by one execution thread be visible by other execution threads.
++a;
++b;
With these statements alone, there are no means by which this execution thread would actually be able to "tell the difference" whether a or b was incremented first. As such, C++ allows the compiler to implement whatever optimizations or code reordering steps it wants, as long as it has no observable effects in its execution thread, and if the actual generated code incremented b first there will not be any observable effects. There's no way that this execution thread could possibly tell the difference.
But if there was some intervening statement that "looked" at a, then this wouldn't hold true any more, and the compiler would be required to actually generate code that increments a before using it in some way.
And that's just this execution thread, alone. Even if it were possible to observe the relative order of changes to a and b in this execution thread, the C++ compiler is still allowed, by the standard, to increment the actual variables in any order, as long as it makes whatever other adjustments are needed to keep that unobservable in this thread. But it could be observable by another execution thread. To prevent that, it is necessary to take explicit synchronization steps, using mutexes, condition variables, and the other parts of C++'s execution thread model.
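A minimal sketch (not from the original answers) of one such synchronization: make a and b atomic, and have the observer load b before a. Because every increment of b is preceded by an increment of a, and everything below uses the default sequentially consistent ordering, the observed values can never show b ahead of a.

#include <atomic>
#include <cassert>
#include <thread>

std::atomic<int> a{0};
std::atomic<int> b{0};

int main() {
    std::jthread writer([] {
        for (int i = 0; i < 1000000; ++i) {
            ++a;   // a is always incremented first...
            ++b;   // ...so b never gets ahead of a
        }
    });
    std::jthread reader([] {
        for (int i = 0; i < 1000000; ++i) {
            int bb = b.load();   // read b first
            int aa = a.load();   // then a: aa >= bb must now hold
            assert(aa >= bb);
        }
    });
}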
There are non-trivial race conditions between the increment of these different variables and when you read them. If you want strict ordering of these reads and writes you will have to use some sort of synchronization mechanism. std::atomic<> makes it easier.
Try this instead:
#include <thread>
#include <iostream>
#include <cassert>
#include <cstdlib>
#include <atomic>

void check() {
    struct AB { int a = 0; int b = 0; };
    std::atomic<AB> ab;
    {
        std::jthread t2([&](){
            int i = 0;
            AB temp;
            while (true) {
                temp = ab;
                if (temp.a < temp.b) break;   // the invariant a >= b has been violated
                ++i;
            }
            std::cout << "failed at iteration " << i << "\n"
                      // I know at this point a and b may have changed
                      << temp.a << " >= " << temp.b << "\n";
            std::exit(0);
        });
        std::jthread t1([&](){
            while (true) {
                AB temp = ab;
                temp.a++;
                temp.b++;
                ab = temp;
            }
        });
    }
}

int main() {
    check();
}
Code: https://godbolt.org/z/Kxeb8d8or
Result:
Program returned: 143
Program stderr
Killed - processing time exceeded
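In the code above only t1 ever writes to ab, so its plain load-modify-store is enough to keep a and b moving together. If several threads had to update the pair concurrently, the read-modify-write itself would need to be a single atomic step, e.g. a compare-exchange loop (a sketch, not part of the original answer):

#include <atomic>

struct AB { int a = 0; int b = 0; };
std::atomic<AB> ab;

void increment_pair()
{
    AB expected = ab.load();
    AB desired;
    do {
        // Recompute the new value from the latest snapshot; on failure,
        // compare_exchange_weak reloads `expected` with the current value.
        desired = expected;
        ++desired.a;
        ++desired.b;
    } while (!ab.compare_exchange_weak(expected, desired));
}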

threaded for loop reaching values it's not supposed to reach

I'm messing around with multithreading in C++ and here is my code:
#include <iostream>
#include <vector>
#include <string>
#include <thread>

void read(int i);

bool isThreadEnabled;
std::thread threads[100];

int main()
{
    isThreadEnabled = true; // I change this to compare the threaded vs non-threaded method
    if (isThreadEnabled)
    {
        for (int i = 0; i < 100; i++) // this for loop is what I'm confused about
        {
            threads[i] = std::thread(read, i);
        }
        for (int i = 0; i < 100; i++)
        {
            threads[i].join();
        }
    }
    else
    {
        for (int i = 0; i < 100; i++)
        {
            read(i);
        }
    }
}

void read(int i)
{
    int w = 0;
    while (true) // wasting CPU cycles to actually see the difference between the threaded and non-threaded versions
    {
        ++w;
        if (w == 100000000) break;
    }
    std::cout << i << std::endl;
}
In the loop that uses threads, the console prints values in a random order, e.g. 5, 40, 26, ..., which is expected and totally fine, since the threads don't run in the order they were started.
But what confuses me is that the printed values are sometimes larger than anything i can reach (the loop only goes up to 99): values like 8000, 2032, 274, ... are also printed to the console, even though i never reaches those numbers. I don't understand why.
This line:
std::cout << i << std::endl;
is actually equivalent to
std::cout << i;
std::cout << std::endl;
And thus, while this is thread safe (meaning there is no undefined behaviour), the order of execution is not defined. Given two threads, the following interleaving is possible:
T20: std::cout << 20
T32: std::cout << 32
T20: std::cout << std::endl
T32: std::cout << std::endl
which results in 2032 in the console (the two numbers glued together) and an extra empty line.
The simplest (not necessarily the best) fix for that is to wrap this line with a shared mutex:
{
std::lock_guard lg { mutex };
std::cout << i << std::endl;
}
(the braces introducing a separate scope are not needed if std::cout << i << std::endl; is the last line in the function)
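A minimal self-contained sketch of this fix (the mutex name and the elided loop body are mine):

#include <iostream>
#include <mutex>
#include <thread>

std::mutex cout_mutex;   // shared by every thread that prints

void read(int i)
{
    // ... busy loop as in the question ...
    std::lock_guard<std::mutex> lg{cout_mutex};  // held until the end of the function
    std::cout << i << std::endl;                 // now printed as one uninterrupted unit
}

int main()
{
    std::thread threads[100];
    for (int i = 0; i < 100; i++) threads[i] = std::thread(read, i);
    for (int i = 0; i < 100; i++) threads[i].join();
}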

Is it safe/efficient to cancel a c++ thread by writing to an outside variable?

I have a search problem which I want to parallelize. If one thread has found a solution, I want all other threads to stop. Otherwise, if all threads exit regularly, I know that there is no solution.
The following code (which demonstrates my cancelling strategy) seems to work, but I'm not sure if it is safe and the most efficient variant:
#include <iostream>
#include <thread>
#include <cstdint>
#include <chrono>

using namespace std;

struct action {
    uint64_t* ii;

    action(uint64_t* ii) : ii(ii) {}

    void operator()() {
        uint64_t k = 0;
        for (; k < *ii; ++k) {
            // do something useful
        }
        cout << "counted to " << k << " in 2 seconds" << endl;
    }

    void cancel() {
        *ii = 0;
    }
};

int main(int argc, char** argv) {
    uint64_t ii = 1000000000;
    action a{&ii};
    thread t(a);
    cout << "start sleeping" << endl;
    this_thread::sleep_for(chrono::milliseconds(2000));
    cout << "finished sleeping" << endl;
    a.cancel();
    cout << "cancelled" << endl;
    t.join();
    cout << "joined" << endl;
}
Can I be sure that the value ii points to always gets properly reloaded? Is there a more efficient variant that doesn't require the dereference at every step? I tried to make the upper bound of the loop a member variable, but since the constructor of thread copies the instance of action, I wouldn't have access to that member later.
Also: if my code is exception safe and does not do I/O (and I am sure that my platform is Linux), is there a reason not to use pthread_cancel on the native thread?
No, there's no guarantee that this will do anything sensible. The code has one thread reading the value of ii and another thread writing to it, without any synchronization. The result is that the behavior of the program is undefined.
I'd just add a flag to the class:
std::atomic<bool> time_to_stop;
The constructor of action should set that to false, and the cancel member function should set it to true. Then change the loop to look at that value:
for(; !time_to_stop && k < *ii; ++k)
You might, instead, make ii atomic. That would work, but it wouldn't be as clear as having a named member to look at.
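A self-contained sketch of that flag-based approach (the member names and the use of std::ref are mine; std::ref lets main and the thread share one action object, which also sidesteps the copy problem mentioned in the question):

#include <atomic>
#include <chrono>
#include <cstdint>
#include <iostream>
#include <thread>

struct action {
    uint64_t limit;
    std::atomic<bool> time_to_stop;

    explicit action(uint64_t limit) : limit(limit), time_to_stop(false) {}

    void operator()() {
        uint64_t k = 0;
        for (; !time_to_stop && k < limit; ++k) {
            // do something useful
        }
        std::cout << "counted to " << k << "\n";
    }

    void cancel() { time_to_stop = true; }
};

int main() {
    action a{1000000000};
    // std::ref avoids copying a (the atomic member makes action non-copyable),
    // so the thread and main observe the same flag.
    std::thread t(std::ref(a));
    std::this_thread::sleep_for(std::chrono::seconds(2));
    a.cancel();
    t.join();
}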
First off, there is no reason to make ii a pointer. You can have it just as a plain uint64_t.
Secondly, if you have multiple threads and at least one of them writes to a shared variable, then you are going to need some sort of synchronization. In this case you could just use std::atomic<uint64_t> to get that synchronization. Otherwise you would have to use a mutex or some sort of memory fence.

Why is my program printing garbage?

My code:
#include <iostream>
#include <thread>

void function_1()
{
    std::cout << "Thread t1 started!\n";
    for (int j = 0; j > -100; j--) {
        std::cout << "t1 says: " << j << "\n";
    }
}

int main()
{
    std::thread t1(function_1); // t1 starts running
    for (int i = 0; i < 100; i++) {
        std::cout << "from main: " << i << "\n";
    }
    t1.join(); // main thread waits for t1 to finish
    return 0;
}
I create a thread that prints numbers in decreasing order while main prints in increasing order.
Sample output here. Why is my code printing garbage?
Both threads are outputting at the same time, thereby scrambling your output.
You need some kind of thread synchronization mechanism on the printing part.
See this answer for an example using a std::mutex combined with std::lock_guard for cout.
It's not "garbage" — it's the output you asked for! It's just jumbled up, because you have used a grand total of zero synchronisation mechanisms to prevent individual std::cout << ... << std::endl lines (which are not atomic) from being interrupted by similar lines (which are still not atomic) in the other thread.
Traditionally we'd lock a mutex around each of those lines.
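If C++20 is available, std::osyncstream from <syncstream> is an alternative to a hand-rolled mutex: each statement's output is accumulated and transferred to std::cout as one block. A sketch of the example above rewritten that way:

#include <iostream>
#include <syncstream>
#include <thread>

void function_1()
{
    for (int j = 0; j > -100; j--) {
        // The temporary osyncstream buffers the whole statement and hands it
        // to std::cout in one piece when it is destroyed at the end of the
        // full expression.
        std::osyncstream(std::cout) << "t1 says: " << j << "\n";
    }
}

int main()
{
    std::thread t1(function_1);
    for (int i = 0; i < 100; i++) {
        std::osyncstream(std::cout) << "from main: " << i << "\n";
    }
    t1.join();
}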

Forcing race between threads using C++11 threads

I just got started with multithreading using the C++11 threading library (and with multithreading in general), and wrote this small snippet of code.
#include <iostream>
#include <thread>

int x = 5; // variable to be affected by the race

// This function will be called from a thread
void call_from_thread1() {
    for (int i = 0; i < 5; i++) {
        x++;
        std::cout << "In Thread 1 :" << x << std::endl;
    }
}

int main() {
    // Launch a thread
    std::thread t1(call_from_thread1);

    for (int j = 0; j < 5; j++) {
        x--;
        std::cout << "In Thread 0 :" << x << std::endl;
    }

    // Join the thread with the main thread
    t1.join();

    std::cout << x << std::endl;
    return 0;
}
I was expecting to get different results every time (or nearly every time) I ran this program, due to the race between the two threads. However, the output is always 0, i.e. the two threads run as if they ran sequentially. Why am I getting the same results, and is there any way to simulate or force a race between the two threads?
Your sample size is rather small, and it somewhat self-stalls on the continuous stdout flushes. In short, you need a bigger hammer.
If you want to see a real race condition in action, consider the following. I purposely added an atomic and a non-atomic counter, passing both to each thread in the sample. Some test-run results are posted after the code:
#include <iostream>
#include <atomic>
#include <thread>
#include <vector>
#include <algorithm>
#include <iterator>

void racer(std::atomic_int& cnt, int& val)
{
    for (int i = 0; i < 1000000; ++i)
    {
        ++val;   // plain int: unsynchronized, racy
        ++cnt;   // atomic: safe
    }
}

int main(int argc, char *argv[])
{
    unsigned int N = std::thread::hardware_concurrency();
    std::atomic_int cnt{0};
    int val = 0;

    std::vector<std::thread> thrds;
    std::generate_n(std::back_inserter(thrds), N,
                    [&cnt, &val]() { return std::thread(racer, std::ref(cnt), std::ref(val)); });

    std::for_each(thrds.begin(), thrds.end(),
                  [](std::thread& thrd) { thrd.join(); });

    std::cout << "cnt = " << cnt << std::endl;
    std::cout << "val = " << val << std::endl;
    return 0;
}
Some sample runs from the above code:
cnt = 4000000
val = 1871016
cnt = 4000000
val = 1914659
cnt = 4000000
val = 2197354
Note that the atomic counter is accurate (I'm running on a dual-core i7 MacBook Air with hyper-threading, so 4 threads, thus 4 million). The same cannot be said for the non-atomic counter.
There will be significant startup overhead to get the second thread going, so its execution will almost always begin after the first thread has finished the for loop, which by comparison takes almost no time at all. To see a race condition, you will need to run a computation that takes much longer, or that includes I/O or other operations that take significant time, so that the executions of the two computations actually overlap, as in the sketch below.
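A sketch along those lines (the iteration count is arbitrary, and the per-iteration printing is dropped so the loops genuinely overlap): with a few million unsynchronized updates from each thread, the final value of x is typically not 0, most visibly in unoptimized builds where every update is a separate load, modify, and store.

#include <iostream>
#include <thread>

int x = 0;

void call_from_thread1() {
    for (int i = 0; i < 5000000; i++) x++;   // unsynchronized increments
}

int main() {
    std::thread t1(call_from_thread1);
    for (int j = 0; j < 5000000; j++) x--;   // unsynchronized decrements, racing with t1
    t1.join();
    std::cout << x << std::endl;             // usually not 0 once the loops overlap
}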