Strange atomic_flag example for C++20 at cppreference.com

The cppreference.com page for std::atomic_flag lists two spinlock examples, one for before C++20 and one for C++20. The page was last modified on 21 July 2020, at 12:58.
Before C++20:
#include <thread>
#include <vector>
#include <iostream>
#include <atomic>
std::atomic_flag lock = ATOMIC_FLAG_INIT;
void f(int n)
{
    for (int cnt = 0; cnt < 100; ++cnt) {
        while (lock.test_and_set(std::memory_order_acquire)) // acquire lock
            ; // spin
        std::cout << "Output from thread " << n << '\n';
        lock.clear(std::memory_order_release); // release lock
    }
}
int main()
{
    std::vector<std::thread> v;
    for (int n = 0; n < 10; ++n) {
        v.emplace_back(f, n);
    }
    for (auto& t : v) {
        t.join();
    }
}
C++20:
#include <thread>
#include <vector>
#include <iostream>
#include <atomic>
std::atomic_flag lock = ATOMIC_FLAG_INIT;
void f(int n)
{
    for (int cnt = 0; cnt < 100; ++cnt) {
        while (lock.test_and_set(std::memory_order_acquire)) // acquire lock
            while (lock.test(std::memory_order_relaxed)) // test lock
                ; // spin
        std::cout << "Output from thread " << n << '\n';
        lock.clear(std::memory_order_release); // release lock
    }
}
int main()
{
    std::vector<std::thread> v;
    for (int n = 0; n < 10; ++n) {
        v.emplace_back(f, n);
    }
    for (auto& t : v) {
        t.join();
    }
}
Two things bother me about the C++20 example:
(1) ATOMIC_FLAG_INIT is deprecated in C++20, and the default constructor already initializes the flag to false for us.
(2) The "optimization" of adding while (lock.test(std::memory_order_relaxed)) after the flag has been set to true does not make sense to me. Shouldn't while (lock.test(std::memory_order_relaxed)) always return immediately in that case? Why, then, is it an optimization over the pre-C++20 example?
Edit:
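C++20 introduced atomic_flag::test(), which simply checks whether the flag is set by doing an atomic load. It is placed in the inner loop that runs after test_and_set() has failed, so the thread first spins on test() (which keeps returning true while another thread holds the lock) before retrying test_and_set(). Spinning on a load is cheaper than spinning on test_and_set(): a load only needs a shared copy of the flag's cache line, whereas every test_and_set() is a read-modify-write that needs exclusive ownership of it, so waiting threads stop bouncing the line between cores while the lock is held.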

See the comment:
that edit came from stackoverflow.com/q/62318642/2945027 – Cubbi

Related

std::barrier is_nothrow_invocable_v<CompletionFunction&> shall be true error

I am trying to write a dice-roll simulation; more specifically, I want to count the number of rolls needed for n dice to all show 6 at the same time.
This is a textbook example of the fork-join model: roll the dice simultaneously, then check the result after each iteration.
#include <iostream>
#include <vector>
#include <thread>
#include <random>
#include <array>
#include <barrier>
#include <algorithm>
auto random_int(int min, int max)
{
    static thread_local auto engine = std::default_random_engine{ std::random_device{}() };
    auto dist = std::uniform_int_distribution<>(min, max);
    return dist(engine);
}
int main()
{
    constexpr auto n = 5;
    auto done = false;
    auto dice = std::array<int, n>{};
    auto threads = std::vector<std::thread>{};
    auto n_turns = 0;
    auto check_result = [&]
    {
        ++n_turns;
        auto is_six = [](int i) {return 6 == i; };
        done = std::all_of(std::begin(dice), std::end(dice), is_six);
    };
    auto bar = std::barrier{ n,check_result };
    for (int i = 0; i < n; ++i)
    {
        threads.emplace_back([&, i]{
            while (!done)
            {
                dice[i] = random_int(1, 6);
                bar.arrive_and_wait();
            }});
    }
    for (auto&& t : threads)
    {
        t.join();
    }
    std::cout << n_turns << std::endl;
}
And I am getting the following error:
error C2338: N4861 [thread.barrier.class]/5: is_nothrow_invocable_v<CompletionFunction&> shall be true
1>C:\Users\eduar\source\repos\C++20\C++20\main.cpp(114): message : see reference to class template instantiation 'std::barrier<main::<lambda_1>>' being compiled
Can someone please hint at what I am doing wrong and how to fix it?
The answer is in the error message, which is great: it even cites the exact part of the standard that has this requirement, [thread.barrier.class]/5:
CompletionFunction shall meet the Cpp17MoveConstructible (Table 28) and Cpp17Destructible (Table 32) requirements.
is_nothrow_invocable_v<CompletionFunction&> shall be true.
You're currently missing the last part: your lambda isn't nothrow-invocable. That's an easy fix, though:
auto check_result = [&]() noexcept
//                        ^~~~~~~~
{
    ++n_turns;
    auto is_six = [](int i) {return i == 6; };
    done = std::all_of(std::begin(dice), std::end(dice), is_six);
};
I also took the opportunity to flip your Yoda conditional, because there is no reason to write Yoda conditionals.

Why does std::atomic_flag cause a deadlock here?

Why does this ping-pong example with C++20 std::atomic_flag often result in a deadlock? I am using GCC 11.1 to compile.
#include <atomic>
#include <iostream>
#include <thread>
constexpr auto count_limit = 10'000;
auto atomic_flag = std::atomic_flag{};
auto counter = std::atomic<int>{};
void ping() {
    while (counter < count_limit) {
        atomic_flag.wait(true);
        ++counter;
        atomic_flag.test_and_set();
        atomic_flag.notify_one();
    }
}
void pong() {
    while (counter < count_limit) {
        atomic_flag.wait(false);
        atomic_flag.clear();
        atomic_flag.notify_one();
    }
}
int main() {
    atomic_flag.test_and_set();
    {
        auto const t1 = std::jthread{ping};
        auto const t2 = std::jthread{pong};
    }
    std::cout << "Finished\n";
}
Update: The "deadlock" does not occur on the Linux machine on godbolt.org: https://godbolt.org/z/zPb8d1bca. It does not happen on my own Linux machine either. It does happen on my Windows machine, so this might be a GCC bug that is specific to Windows.

Why can't my C++ thread pool speed up my program?

I tried to implement a C++ thread pool based on notes written by others. The code looks like this:
#include <vector>
#include <queue>
#include <functional>
#include <future>
#include <atomic>
#include <condition_variable>
#include <thread>
#include <mutex>
#include <memory>
#include <cstdint>
#include <glog/logging.h>
#include <iostream>
#include <chrono>
using std::cout;
using std::endl;
class ThreadPool {
public:
    ThreadPool(const ThreadPool&) = delete;
    ThreadPool(ThreadPool&&) = delete;
    ThreadPool& operator=(const ThreadPool&) = delete;
    ThreadPool& operator=(ThreadPool&&) = delete;
    ThreadPool(uint32_t capacity=std::thread::hardware_concurrency(),
               uint32_t n_threads=std::thread::hardware_concurrency()
              ): capacity(capacity), n_threads(n_threads) {
        init(capacity, n_threads);
    }
    ~ThreadPool() noexcept {
        shutdown();
    }
    void init(uint32_t capacity, uint32_t n_threads) {
        CHECK_GT(capacity, 0) << "task queue capacity should be greater than 0";
        CHECK_GT(n_threads, 0) << "thread pool capacity should be greater than 0";
        for (int i{0}; i < n_threads; ++i) {
            pool.emplace_back(std::thread([this] {
                std::function<void(void)> task;
                while (!this->stop) {
                    {
                        std::unique_lock<std::mutex> lock(this->q_mutex);
                        task_q_empty.wait(lock, [&] {return this->stop | !task_q.empty();});
                        if (this->stop) break;
                        task = this->task_q.front();
                        this->task_q.pop();
                        task_q_full.notify_one();
                    }
                    // auto id = std::this_thread::get_id();
                    // std::cout << "thread id is: " << id << std::endl;
                    task();
                }
            }));
        }
    }
    void shutdown() {
        stop = true;
        task_q_empty.notify_all();
        task_q_full.notify_all();
        for (auto& thread : pool) {
            if (thread.joinable()) {
                thread.join();
            }
        }
    }
    template<typename F, typename...Args>
    auto submit(F&& f, Args&&... args) -> std::future<decltype(f(args...))> {
        using res_type = decltype(f(args...));
        std::function<res_type(void)> func = std::bind(std::forward<F>(f), std::forward<Args>(args)...);
        auto task_ptr = std::make_shared<std::packaged_task<res_type()>>(func);
        {
            std::unique_lock<std::mutex> lock(q_mutex);
            task_q_full.wait(lock, [&] {return this->stop | task_q.size() <= capacity;});
            CHECK (this->stop == false) << "should not add task to stopped queue\n";
            task_q.emplace([task_ptr]{(*task_ptr)();});
        }
        task_q_empty.notify_one();
        return task_ptr->get_future();
    }
private:
    std::vector<std::thread> pool;
    std::queue<std::function<void(void)>> task_q;
    std::condition_variable task_q_full;
    std::condition_variable task_q_empty;
    std::atomic<bool> stop{false};
    std::mutex q_mutex;
    uint32_t capacity;
    uint32_t n_threads;
};
int add(int a, int b) {return a + b;}
int main() {
    auto t1 = std::chrono::steady_clock::now();
    int n_threads = 1;
    ThreadPool tp;
    tp.init(n_threads, 1024);
    std::vector<std::future<int>> res;
    for (int i{0}; i < 1000000; ++i) {
        res.push_back(tp.submit(add, i, i+1));
    }
    auto t2 = std::chrono::steady_clock::now();
    for (auto &el : res) {
        el.get();
        // cout << el.get() << endl;
    }
    tp.shutdown();
    cout << "processing: "
         << std::chrono::duration<double, std::milli>(t2 - t1).count()
         << endl;
    return 0;
}
The problem is that when I set n_threads=1, the program takes the same time as when I set n_threads=4. Since my machine has 72 CPU cores (according to htop), I expected 4 threads to be faster than 1. What is wrong with this thread pool implementation?
I found a few issues:
1) Use logical OR (||) instead of bitwise OR (|) in both condition-variable waits:
Replace this: `task_q_empty.wait(lock, [&] {return this->stop | !task_q.empty();});`
With: `task_q_empty.wait(lock, [&] {return this->stop || !task_q.empty();});`
2) Use notify_all() in place of notify_one() in init() and submit().
3) Two condition_variables are unnecessary here; use only task_q_empty.
4) Your use case is not ideal. Switching between threads may cost more than adding two integers, so adding threads can even make execution slower. Test an optimized build, and try a scenario like this to simulate a longer task:
int add(int a, int b) { std::this_thread::sleep_for(std::chrono::milliseconds(200)); return a + b; }
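Two further details in main() may also explain the measurements: t2 is read before the el.get() loop, so the printed time covers only the submit() calls, and tp.init(n_threads, 1024) passes 1024 as init's n_threads parameter (its signature is init(capacity, n_threads)) on top of the hardware_concurrency() workers the constructor already started, so n_threads = 1 never produces a one-thread pool. Below is a minimal sketch in the spirit of points 2 and 3, not the original design: it drops glog, uses a single condition variable with notify_all(), and lets workers drain the queue before stopping. SimplePool and sum_of_results are hypothetical names; for a benchmark, read the clock after the loop that calls get().

```cpp
#include <cassert>
#include <condition_variable>
#include <cstddef>
#include <functional>
#include <future>
#include <memory>
#include <mutex>
#include <queue>
#include <thread>
#include <type_traits>
#include <utility>
#include <vector>

// Hypothetical minimal pool: one mutex, one condition variable, bounded queue.
class SimplePool {
public:
    SimplePool(std::size_t n_threads, std::size_t capacity) : capacity_(capacity) {
        for (std::size_t i = 0; i < n_threads; ++i)
            workers_.emplace_back([this] { worker_loop(); });
    }
    ~SimplePool() {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            stop_ = true;
        }
        cv_.notify_all();
        for (auto& t : workers_) t.join();
    }
    template <typename F, typename... Args>
    auto submit(F&& f, Args&&... args) -> std::future<std::invoke_result_t<F, Args...>> {
        using R = std::invoke_result_t<F, Args...>;
        auto task = std::make_shared<std::packaged_task<R()>>(
            std::bind(std::forward<F>(f), std::forward<Args>(args)...));
        std::future<R> result = task->get_future();
        {
            std::unique_lock<std::mutex> lock(mutex_);
            // Block the producer while the queue is full.
            cv_.wait(lock, [this] { return stop_ || queue_.size() < capacity_; });
            queue_.emplace([task] { (*task)(); });
        }
        cv_.notify_all();  // producers and workers share cv_, so wake everyone
        return result;
    }

private:
    void worker_loop() {
        for (;;) {
            std::function<void()> job;
            {
                std::unique_lock<std::mutex> lock(mutex_);
                cv_.wait(lock, [this] { return stop_ || !queue_.empty(); });
                if (stop_ && queue_.empty()) return;  // finish queued jobs before exiting
                job = std::move(queue_.front());
                queue_.pop();
            }
            cv_.notify_all();  // a slot was freed: wake any blocked producer
            job();
        }
    }
    std::vector<std::thread> workers_;
    std::queue<std::function<void()>> queue_;
    std::mutex mutex_;
    std::condition_variable cv_;
    std::size_t capacity_;
    bool stop_ = false;
};

int add(int a, int b) { return a + b; }

// Submits n_tasks additions and returns the sum of all results.
long long sum_of_results(std::size_t n_threads, int n_tasks) {
    SimplePool pool(n_threads, 1024);
    std::vector<std::future<int>> futures;
    futures.reserve(static_cast<std::size_t>(n_tasks));
    for (int i = 0; i < n_tasks; ++i) futures.push_back(pool.submit(add, i, i + 1));
    long long total = 0;
    for (auto& f : futures) total += f.get();  // waits until every task has run
    return total;
}
```

The sum of (2i + 1) for i = 0..n-1 is n², which gives an easy correctness check independent of the thread count.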

Using condition_variable::notify_all to notify multiple threads

I have been trying to code the dining philosophers problem as a way to get better at multithreaded programming. In my code, I have a condition_variable that blocks each thread until all the threads have been created. However, it seems that when I call condition_variable::notify_all to signal that all the threads have been created and can start 'eating', only one thread is notified. For example:
I have a Philosopher class which has these member variables:
static std::condition_variable start;
static std::mutex start_mutex;
And these member functions:
static void start_eating() {
    start.notify_all();
}
void dine() {
    signal(SIGINT, ctrl_c_catch);
    std::unique_lock lk{ start_mutex };
    start.wait(lk);
    std::cout << id << "started\n";
    // see end for complete class...
Each thread waits on the condition_variable start and won't continue until I call start_eating(). The problem is that when I call start.notify_all(), only one of the threads gets notified and continues. However, when I change the code to unlock the mutex after waiting, everything runs OK (all the threads continue):
std::unique_lock lk{ start_mutex };
start.wait(lk);
lk.unlock();
I don't understand what is going on here. Why do I need to unlock the mutex?
The full code:
#include <array>
#include <chrono>
#include <cstdlib>
#include <mutex>
#include <vector>
#include <thread>
#include <condition_variable>
#include <atomic>
#include <signal.h>
#include <iostream>
#include <shared_mutex>
#include <string>
#include <ctime>
namespace clk = std::chrono;
const auto EAT_SLEEP_TIME = clk::milliseconds{1}; // 1 millisecond
const auto NUM_SEATS = 5U;
using Fork = std::mutex; // is the fork being used or not
std::mutex cout_mutex;
void ctrl_c_catch(int dummy);
class Philosopher {
    Fork& left;
    Fork& right;
    unsigned id;
    unsigned times_eaten;
    static std::condition_variable start;
    static std::mutex start_mutex;
    static std::atomic_bool end;
public:
    Philosopher(Fork& l, Fork& r, unsigned i) : left{ l }, right{ r }, id{ i }, times_eaten{} {}
    static void start_eating() {
        start.notify_all();
    }
    static void stop_eating() {
        end = true;
    }
    void dine() {
        signal(SIGINT, ctrl_c_catch);
        std::unique_lock lk{ start_mutex };
        start.wait(lk);
        // lk.unlock(); // uncommenting this fixes the issue
        std::cout << id << " started\n";
        while (!end) {
            if (&right < &left) {
                right.lock();
                left.lock();
            } else {
                left.lock();
                right.lock();
            }
            cout_mutex.lock();
            std::clog << id << " got both forks, eating\n";
            cout_mutex.unlock();
            ++times_eaten;
            std::this_thread::sleep_for(EAT_SLEEP_TIME * (rand() % 50));
            right.unlock();
            left.unlock();
            std::this_thread::sleep_for(EAT_SLEEP_TIME * (rand() % 50));
        }
        cout_mutex.lock();
        std::cout << id << " stopped, terminating thread. Eaten " << times_eaten << "\n";
        cout_mutex.unlock();
        delete this;
    }
};
std::atomic_bool Philosopher::end = false;
std::condition_variable Philosopher::start{};
std::mutex Philosopher::start_mutex{};
template <size_t N, typename T = unsigned>
constexpr std::array<T, N> range(T b = 0, T s = 1) {
    std::array<T, N> ret{};
    for (auto& i : ret) {
        i = b;
        b += s;
    }
    return ret;
}
void ctrl_c_catch(int dummy) {
    std::cout << "Caught ctrl-c or stop\nStopping Philosophers\n";
    Philosopher::stop_eating();
    std::this_thread::sleep_for(clk::seconds{5});
    exit(0);
}
int main() {
    srand(time(NULL));
    signal(SIGINT, ctrl_c_catch);
    std::vector<Fork> forks{ NUM_SEATS }; // 5 forks
    std::vector<std::thread> phil; // vector of philosophers
    for (unsigned i : range<NUM_SEATS - 1>()) {
        auto p = new Philosopher{forks[i], forks[i + 1], i};
        phil.emplace_back(&Philosopher::dine, p);
    }
    auto p = new Philosopher{forks[NUM_SEATS - 1], forks[0], NUM_SEATS - 1};
    phil.emplace_back(&Philosopher::dine, p);
    std::clog << "Waiting for 10 seconds\n";
    std::this_thread::sleep_for(clk::seconds{10});
    std::clog << "Starting Philosophers\n Type 'stop' to stop\n";
    Philosopher::start_eating();
    for (auto& t : phil)
        t.detach();
    std::this_thread::sleep_for(clk::seconds{15});
    ctrl_c_catch(0);
    std::string dummy;
    std::cin >> dummy;
    if (dummy == "stop")
        ctrl_c_catch(0);
    return 0;
}
As explained here, std::condition_variable::wait releases the lock while it waits and reacquires it before returning. In dine(), lk stays locked until the function returns, so the first thread to wake up keeps start_mutex for its entire dining loop, and every other woken thread stays blocked inside wait() trying to reacquire it. That is why you need to unlock it manually (or automatically, by letting the RAII lock go out of scope) so that other threads can lock it. Condition variables in C++ have semantics similar to non-blocking monitors, so you can read up on those to get a better intuitive understanding. Also, because spurious wake-ups cannot be prevented, you should use the overload of wait that takes a predicate (more info in the link above).
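A minimal sketch of the fix, using a hypothetical StartGate helper: it waits with the predicate overload and lets the unique_lock go out of scope as soon as wait() returns, so the mutex is held only while checking the flag. The test function count_started (also hypothetical) makes every worker wait until all of them are past the gate, which would hang if a woken thread kept holding the mutex the way dine() does.

```cpp
#include <atomic>
#include <cassert>
#include <condition_variable>
#include <mutex>
#include <thread>
#include <vector>

class StartGate {
public:
    // Blocks until open() has been called.
    void wait_for_start() {
        std::unique_lock<std::mutex> lk(mutex_);
        // The predicate handles spurious wake-ups and an open() that happened
        // before this thread started waiting.
        cv_.wait(lk, [this] { return started_; });
    }  // lk is destroyed here, so mutex_ is released before the caller's real work

    void open() {
        {
            std::lock_guard<std::mutex> lk(mutex_);
            started_ = true;
        }
        cv_.notify_all();
    }

private:
    std::mutex mutex_;
    std::condition_variable cv_;
    bool started_ = false;
};

// Starts n workers, opens the gate, and returns how many got past it.
int count_started(int n) {
    StartGate gate;
    std::atomic<int> started{0};
    std::vector<std::thread> threads;
    for (int i = 0; i < n; ++i) {
        threads.emplace_back([&] {
            gate.wait_for_start();
            ++started;
            // Only terminates if no worker still holds the gate's mutex.
            while (started.load() < n) {
                std::this_thread::yield();
            }
        });
    }
    gate.open();
    for (auto& t : threads) t.join();
    return started.load();
}
```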

Proper usage of std::atomic, pre-increment value as function param

I have code where I need a unique id (a packet id for some protocol), so I used std::atomic<int>. After reading the documentation I was confused, because it states that the increment is done as fetch_add(1) + 1.
I understand that the value inside fetch_add is incremented atomically, but the + 1 is applied to the returned pre-increment value outside the atomic operation, which I would guess is not atomic.
Can I use some_func(++atomic_value)?
I wrote simple code to check whether it works. It does, but I don't understand why.
#include <iostream>
#include <atomic>
#include <thread>
#include <vector>
#include <random>
#include <mutex>
#include <algorithm>
#include <chrono>
std::atomic<int> Index = 0;
//int Index = 0; // non-atomic Index: it generates duplicates
std::vector<int> Numbers;
std::mutex Mutex;
std::default_random_engine Generator;
std::uniform_int_distribution<int> Distribution(5, 10);
void func(int Value)
{
    std::lock_guard<std::mutex> Guard(Mutex);
    Numbers.push_back(Value);
}
void ThreadProc()
{
    int Delay;
    {
        // the engine and distribution are not thread-safe, so draw under the mutex
        std::lock_guard<std::mutex> Guard(Mutex);
        Delay = Distribution(Generator);
    }
    // portable replacement for the Windows-only Sleep(milliseconds)
    std::this_thread::sleep_for(std::chrono::milliseconds(Delay));
    func(++Index); // is this proper usage of std::atomic?
}
int main()
{
    const int ThreadCount = 1000;
    std::vector<std::thread> Threads;
    for ( int i = 0; i < ThreadCount; i++ )
    {
        Threads.push_back(std::thread(ThreadProc));
    }
    std::for_each(Threads.begin(), Threads.end(), [](std::thread& t) { t.join(); });
    std::sort(Numbers.begin(), Numbers.end());
    auto End = std::unique(Numbers.begin(), Numbers.end());
    if ( Numbers.end() == End )
    {
        std::cout << "No duplicates found." << std::endl;
    }
    else
    {
        std::cout << "Duplicates found! - " << Numbers.end() - End << std::endl;
        std::for_each(End, Numbers.end(), [](int n) { std::cout << n << ", "; });
    }
    return 0;
}
Off-topic question: when I define Index as a non-atomic int, I get duplicates, but only at the end of the range: the duplicated numbers are always 900+. Why is that?
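On the main question: fetch_add(1) is a single atomic read-modify-write that returns the value the counter held just before this thread's increment. The + 1 is then applied to that returned local copy, not to the shared counter, so no other thread can affect it, and ++Index therefore yields a distinct value on every call. A small sketch, with hypothetical helper names, that hands out ids through both spellings from several threads and checks that none repeats:

```cpp
#include <algorithm>
#include <atomic>
#include <cassert>
#include <mutex>
#include <thread>
#include <vector>

// Both spellings perform one atomic read-modify-write on the shared counter;
// the "+ 1" in the second one only touches the returned local copy.
int next_id_preincrement(std::atomic<int>& counter) { return ++counter; }
int next_id_fetch_add(std::atomic<int>& counter) { return counter.fetch_add(1) + 1; }

// Hands out ids from several threads and reports whether all ids are distinct.
bool all_ids_unique(int n_threads, int ids_per_thread) {
    std::atomic<int> counter{0};
    std::vector<int> ids;
    std::mutex ids_mutex;
    std::vector<std::thread> threads;
    for (int t = 0; t < n_threads; ++t) {
        threads.emplace_back([&, t] {
            for (int i = 0; i < ids_per_thread; ++i) {
                // Even-numbered threads use ++, odd-numbered ones fetch_add(1) + 1.
                int id = (t % 2 == 0) ? next_id_preincrement(counter)
                                      : next_id_fetch_add(counter);
                std::lock_guard<std::mutex> lock(ids_mutex);
                ids.push_back(id);
            }
        });
    }
    for (auto& th : threads) th.join();
    std::sort(ids.begin(), ids.end());
    return std::adjacent_find(ids.begin(), ids.end()) == ids.end();
}
```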