Segmentation Fault when assigning value to a pointer C++

When I run the following parallel code I get a segmentation fault at the assignment at row 18 (between the two prints). I don't really understand what is causing it.
This is a minimal working example which describes the problem:
#include <iostream>
#include <numeric>
#include <vector>
#include <thread>
struct Worker{
std::vector<int>* v;
void f(){
std::vector<int> a(20);
std::iota(a.begin(), a.end(), 1);
auto b = new std::vector<int>(a);
std::cout << "Test 1" << std::endl;
v = b;
std::cout << "Test 2" << std::endl;
}
};
int main(int argc, char** argv) {
int nw = 1;
std::vector<std::thread> threads(nw);
std::vector<std::unique_ptr<Worker>> W;
for(int i = 0; i < nw; i++){
W.push_back(std::make_unique<Worker>());
threads[i] = std::thread([&]() { W[i]->f(); } );
// Pinning threads to cores
cpu_set_t cpuset;
CPU_ZERO(&cpuset);
CPU_SET(i, &cpuset);
pthread_setaffinity_np(threads[i].native_handle(), sizeof(cpu_set_t), &cpuset);
}
for (int i = 0; i < nw; i++) {
threads[i].join();
std::cout << (*(W[i]->v))[0] << std::endl;
}
}
It seems that when I compile it with -fsanitize=address the code works fine, but I get worse performance. How can I make it work?

std::vector is not thread-safe. None of the containers in the C++ library are thread safe.
threads[i] = std::thread([&]() { W[i]->f(); } );
The new execution thread captures the vector by reference and accesses it.
W.push_back(std::make_unique<Worker>());
The original execution thread continuously modifies the vector here, without synchronizing access to the W vector with any of the new execution threads. Any push_back may invalidate the existing contents of the vector in order to reallocate it, and if a different execution thread attempts to get W[i] at the same time, while it's being reallocated, hilarity ensues.
This is undefined behavior.
You must either synchronize access to the vector using a mutex, or make sure that the vector will never be reallocated, using any number of known techniques. A sufficiently-large reserve(), in advance, should do the trick.
Additionally, it's been pointed out that i is also captured by reference, so by the time each new execution thread starts, its value could be anything.
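A minimal sketch of the "never reallocate while threads are running" option, assuming it is acceptable to build every worker before any thread starts. The helper name run_workers is hypothetical, Worker::v is held by value instead of through a raw pointer to keep the example leak-free, and the Linux-specific core pinning from the question is left out so the sketch stays portable:

```cpp
#include <memory>
#include <numeric>
#include <thread>
#include <vector>

struct Worker {
    std::vector<int> v;  // held by value here (an assumption), so nothing leaks

    void f() {
        v.resize(20);
        std::iota(v.begin(), v.end(), 1);  // fills 1, 2, ..., 20
    }
};

// Hypothetical helper: runs nw workers and returns the first element of each result.
std::vector<int> run_workers(int nw) {
    // Phase 1: build every worker before any thread exists, so W is never
    // modified while another thread might be reading it.
    std::vector<std::unique_ptr<Worker>> W;
    W.reserve(nw);
    for (int i = 0; i < nw; i++) {
        W.push_back(std::make_unique<Worker>());
    }

    // Phase 2: start the threads. Each lambda captures its own Worker pointer
    // by value, so it depends neither on the loop variable nor on W.
    std::vector<std::thread> threads;
    threads.reserve(nw);
    for (int i = 0; i < nw; i++) {
        Worker* w = W[i].get();
        threads.emplace_back([w] { w->f(); });
    }

    std::vector<int> firsts;
    for (int i = 0; i < nw; i++) {
        threads[i].join();             // results are only read after the join
        firsts.push_back(W[i]->v[0]);
    }
    return firsts;
}
```

Calling run_workers(4) should give four results, each starting at 1, because every worker fills its own vector independently.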

In addition to the vector synchronization problem mentioned by Sam, there is another problem.
This line:
threads[i] = std::thread([&]() { W[i]->f(); } );
captures i by reference. There is a good chance that i goes out of scope (and is destroyed) before the thread starts running. The statement W[i]->f(); is likely to read an invalid value of i which is negative or too large. Note that before i goes out of scope, the last value written to it is nw, so even if the memory that previously contained i is still accessible, it's likely to have the value nw, which is too large.
You could fix this problem by capturing i by value:
threads[i] = std::thread([&W, i]() { W[i]->f(); } );
// ^^^^^
// captures W by reference, and i by value

As noted by others, the capture is the problem.
I've added the i parameter to the f() call:
void f(int i){
std::vector<int> a(20);
std::iota(a.begin(), a.end(), 1);
auto b = new std::vector<int>(a);
std::cout << "Test 1 " << i << std::endl;
v = b;
std::cout << "Test 2 " << v->size() << std::endl;
}
and the output: Test 1 1
The call to f() itself goes through, but by then i is already 1, so W[i] does not refer to a valid Worker instance, and the assignment to v writes to memory that does not belong to any Worker.

Related

Debugging the use of std::string in a thread pool C++

I'm in the process of trying to figure out multithreading - I'm pretty new to it. I'm using a thread_pool type that I found here. For sufficiently large N, the following code segfaults. Could you guys help me understand why and how to fix it?
#include "thread_pool.hpp"
#include <thread>
#include <iostream>
static std::mutex mtx;
void printString(const std::string &s) {
std::lock_guard lock(mtx);
std::hash<std::thread::id> tid{};
auto id = tid(std::this_thread::get_id()) % 16;
std::cout << "thread: " << id << " " << s << std::endl;
}
TEST(test, t) {
thread_pool pool(16);
int N = 1000000;
std::vector<std::string> v(N);
for (int i = 0; i < N; i++) {
v[i] = std::to_string(i);
}
for (auto &s: v) {
pool.push_task([&s]() {
printString(s);
});
}
}
Here's the thread sanitizer output (note the ===> comments, where I point to the relevant line):
SEGV on unknown address 0x000117fbdee8 (pc 0x000102fa35b6 bp 0x7e8000186b50 sp 0x7e8000186b30 T257195)
0x102fa35b6 std::basic_string::__get_short_size const string:1514
0x102fa3321 std::basic_string::size const string:970
0x102f939e6 std::operator<<<…> ostream:1056
0x102f9380b printString RoadRunnerMapTests.cpp:37 // ==> this line: void printString(const std::string &s) {
0x102fabbd5 $_0::operator() const RoadRunnerMapTests.cpp:49 // ===> this line: v[i] = std::to_string(i);
0x102fabb3d (test_cxx_api_RoadRunnerMapTests:x86_64+0x10001eb3d) type_traits:3694
0x102fabaad std::__invoke_void_return_wrapper::__call<…> __functional_base:348
0x102faba5d std::__function::__alloc_func::operator() functional:1558
0x102fa9669 std::__function::__func::operator() functional:1732
0x102f9d383 std::__function::__value_func::operator() const functional:1885
0x102f9c055 std::function::operator() const functional:2560
0x102f9bc29 thread_pool::worker thread_pool.hpp:389 // ==> [this](https://github.com/bshoshany/thread-pool/blob/master/thread_pool.hpp#L389) line
0x102fa00bc (test_cxx_api_RoadRunnerMapTests:x86_64+0x1000130bc) type_traits:3635
0x102f9ff1e std::__thread_execute<…> thread:286
0x102f9f005 std::__thread_proxy<…> thread:297
0x1033e9a2c __tsan_thread_start_func
0x7fff204828fb _pthread_start
0x7fff2047e442 thread_start
Destructors are called in the reverse of declaration order, i.e. v is destroyed before pool. By the time some of the pool's threads call printString(), the argument string is no longer a valid object, because v and its contents have already been destroyed. To resolve this, I'd recommend declaring v before pool.
Tasks passed to the thread pool hold references to the contents of vector v, but this vector goes out of scope before pool does, leaving the tasks with dangling references. To fix this, reorder the declarations:
int N = 1000000;
std::vector<std::string> v(N);
thread_pool pool(16);
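The ordering rule itself can be checked without any thread pool. Below is a small sketch, assuming nothing about thread_pool beyond what the answers above say; Tracer and destruction_order are hypothetical names used only for illustration. Two locals are named after the variables in the corrected snippet, and each records its name when it is destroyed:

```cpp
#include <string>
#include <vector>

// Hypothetical helper type: appends its name to a log when it is destroyed.
struct Tracer {
    std::string name;
    std::vector<std::string>* log;
    ~Tracer() { log->push_back(name); }
};

// Returns the order in which the two locals were destroyed.
std::vector<std::string> destruction_order() {
    std::vector<std::string> log;
    {
        Tracer v{"v", &log};        // declared first, destroyed last
        Tracer pool{"pool", &log};  // declared last, destroyed first
    }
    return log;
}
```

With v declared first, the log reads pool, then v: the pool is torn down while the strings are still alive, which is the situation the reordering is meant to achieve.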

Switching from global static variables to static variables breaks code

I'm working on an assignment for school, one of the requirements of which is that I cannot use global variables, but I do need static variables for shared memory. The premise of the assignment is to use the pthread library and semaphores to ensure that created threads execute in reverse order. I've gotten it to work with global static semaphore/condvar/mutex as such:
#include <pthread.h>
#include <stdio.h>
#include <iostream>
#include <semaphore.h>
using namespace std;
#define NUM 5
static sem_t threadCounter;
static pthread_cond_t nextThreadCond = PTHREAD_COND_INITIALIZER;
static pthread_cond_t makingThreadCond = PTHREAD_COND_INITIALIZER;
static pthread_mutex_t makingThreadMutex = PTHREAD_MUTEX_INITIALIZER;
static pthread_mutex_t nextThreadMutex = PTHREAD_MUTEX_INITIALIZER;
void *wait_func(void *args)
{
// cout<<"Waiting"<<endl;
// pthread_cond_wait(&makingThreadCond, &makingThreadMutex);
// cout<<"Woke up"<<endl;
int tid = *((int *)args);
int val;
sem_getvalue(&threadCounter, &val);
// cout << tid << ":" << val << endl;
while (tid != val-1)
{
pthread_cond_wait(&nextThreadCond, &nextThreadMutex);
sem_getvalue(&threadCounter, &val);
// cout<<"evaluating condition in"<<tid<<", val is "<<val<<endl;
}
sem_wait(&threadCounter); // decrement threadCounter
// cout << "after decrement" << endl;
sem_getvalue(&threadCounter, &val);
// cout << "decremented val "<<val << endl;
cout<<"Exiting thread #"<<tid<<endl;
pthread_mutex_unlock(&nextThreadMutex);
// cout<<"after nextThreadMutex unlock"<<endl;
pthread_cond_broadcast(&nextThreadCond);
// cout<<"after nextThreadCond broadcast"<<endl;
}
int main()
{
pthread_t tid[NUM];
if (sem_init(&threadCounter, 0, NUM) < 0)
{
cout << "Failed to init sem" << endl;
}
for (int i = 0; i < NUM; i++)
{
int *argId = (int *)malloc(sizeof(*argId));
*argId = i;
if (pthread_create(&tid[i], NULL, wait_func, argId))
{
cout << "Couldn't make thread " << i << endl;
}
}
for (int i = 0; i < NUM; i++)
{
pthread_join(tid[i], NULL);
}
}
but this isn't allowed as I said, so I tried to convert it where I share them through a struct and passed in with pthread_create arguments as such:
#include <pthread.h>
#include <stdio.h>
#include <iostream>
#include <semaphore.h>
using namespace std;
#define NUM 5
struct args
{
int tid;
sem_t* sem;
pthread_cond_t* cond;
pthread_mutex_t* mut;
};
void *wait_func(void *args_ptr)
{
// cout<<"Waiting"<<endl;
// pthread_cond_wait(&makingThreadCond, &makingThreadMutex);
// cout<<"Woke up"<<endl;
struct args* args = (struct args*) args_ptr;
int tid = (args->tid);
pthread_cond_t cond = *(args->cond);
pthread_mutex_t mut = *(args->mut);
sem_t sem = *(args->sem);
int val;
sem_getvalue(&sem, &val);
// cout << tid << ":" << val << endl;
while (tid != val - 1)
{
pthread_cond_wait(&cond, &mut);
sem_getvalue(&sem, &val);
// cout<<"evaluating condition in"<<tid<<", val is "<<val<<endl;
}
sem_wait(&sem); // decrement threadCounter
// cout << "after decrement" << endl;
sem_getvalue(&sem, &val);
// cout << "decremented val "<<val << endl;
cout << "Exiting thread #" << tid << endl;
pthread_mutex_unlock(&mut);
// cout<<"after nextThreadMutex unlock"<<endl;
pthread_cond_broadcast(&cond);
// cout<<"after nextThreadCond broadcast"<<endl;
}
int main()
{
static sem_t threadCounter;
static pthread_cond_t nextThreadCond = PTHREAD_COND_INITIALIZER;
static pthread_mutex_t nextThreadMutex = PTHREAD_MUTEX_INITIALIZER;
pthread_t tid[NUM];
if (sem_init(&threadCounter, 0, NUM) < 0)
{
cout << "Failed to init sem" << endl;
}
for (int i = 0; i < NUM; i++)
{
int *argId = (int *)malloc(sizeof(*argId));
*argId = i;
struct args args;
args.tid = *argId;
args.sem = &threadCounter;
args.cond = &nextThreadCond;
args.mut = &nextThreadMutex;
if (pthread_create(&tid[i], NULL, wait_func, &args))
{
cout << "Couldn't make thread " << i << endl;
}
}
// cout << "Before posting sem" << endl;
// sem_post(&makingThreads);
// cout << "Sem posetd" << endl;
// cout<<"Broadcasting"<<endl;
// pthread_cond_broadcast(&makingThreadCond);
for (int i = 0; i < NUM; i++)
{
pthread_join(tid[i], NULL);
}
}
This gets stuck immediately with "Exiting thread #4" twice. I would think that the second code is equivalent to the first, just without global variables but there must be something I'm missing.
struct args args;
This declares an object inside the scope of your for loop. When execution reaches the end of the for loop, this object gets destroyed -- like any other object that's declared locally within a function or within some inner scope -- and this happens before either the loop starts again from the beginning, or if the for loop stops iterating altogether. Either way, as soon as execution reaches the next } this object goes away. It is gone for good. It gets destroyed. It is no more. It joins the choir-invisible. It becomes an ex-object.
But before that happens, before the end of this loop, the following occurs:
if (pthread_create(&tid[i], NULL, wait_func, &args))
So you start a new execution thread, and pass it a pointer to this object, which is about to meet its maker.
And as soon as pthread_create() returns, that's the end of the loop and your args object is gone, and the abovementioned happens: it gets destroyed; it is no more; it joins the choir-invisible; and it becomes an ex-object.
And the C and the C++ standards give you absolutely no guarantees whatsoever, that your new execution thread actually starts running, and reaches the point where it reads this pointer, and what it's pointing to, before the end of this loop gets reached.
And, more likely than not, each new execution thread doesn't get around to reading the pointer to the args object, in the main execution thread, until long after it gets destroyed. So it grabs stuff from a pointer to a destroyed object. Goodbye.
As such, this execution thread's actions become undefined behavior.
This explains the random, unpredictable behavior that you've observed.
The usual approach is to malloc or new everything that gets passed to your new execution thread, and pass to the execution thread a pointer to the newed or malloced object.
It is also possible to carefully write some code that will make the main execution thread stop and wait until the new execution thread retrieves whatever it needs to do, and then proceeds on its own. A bunch more code will be needed to implement that approach, if you so choose.
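A rough sketch of that second approach, under the assumption that a mutex, a condition variable and a flag are acceptable as the handshake. The names Handshake, worker and spawn_all are hypothetical and not taken from the question. The creating thread keeps args on its stack, but does not leave the loop iteration until the new thread reports that it has taken its own copy:

```cpp
#include <pthread.h>
#include <vector>

// Handshake state; it outlives every thread because spawn_all joins them all.
struct Handshake {
    pthread_mutex_t mut;
    pthread_cond_t cond;
    bool copied;
};

// Per-thread arguments; this object lives on the creating thread's stack.
struct Args {
    int tid;
    int* out;       // where the thread stores its result
    Handshake* hs;
};

void* worker(void* p) {
    // Copy everything out of *p before telling the creator it may continue.
    Args local = *static_cast<Args*>(p);

    pthread_mutex_lock(&local.hs->mut);
    local.hs->copied = true;
    pthread_cond_signal(&local.hs->cond);
    pthread_mutex_unlock(&local.hs->mut);

    // Only the private copy is used from here on; *p may already be gone.
    *local.out = local.tid * 10;
    return nullptr;
}

std::vector<int> spawn_all(int n) {
    Handshake hs;
    pthread_mutex_init(&hs.mut, nullptr);
    pthread_cond_init(&hs.cond, nullptr);

    std::vector<int> results(n, -1);
    std::vector<pthread_t> tids;
    for (int i = 0; i < n; ++i) {
        Args args{i, &results[i], &hs};  // local to this iteration
        hs.copied = false;               // no thread is using hs at this point
        pthread_t t;
        if (pthread_create(&t, nullptr, worker, &args) != 0) break;
        tids.push_back(t);

        // Do not let args go out of scope until the thread has copied it.
        pthread_mutex_lock(&hs.mut);
        while (!hs.copied) pthread_cond_wait(&hs.cond, &hs.mut);
        pthread_mutex_unlock(&hs.mut);
    }
    for (pthread_t t : tids) pthread_join(t, nullptr);

    pthread_cond_destroy(&hs.cond);
    pthread_mutex_destroy(&hs.mut);
    return results;
}
```

The cost is that thread creation becomes serialized, which is one reason allocating the arguments dynamically is usually the simpler choice.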
Your code also has evidence of your initial attempts to take this approach:
int *argId = (int *)malloc(sizeof(*argId));
*argId = i;
struct args args;
args.tid = *argId;
mallocing this pointer, assigning to it, then copying it to args.tid accomplishes absolutely nothing useful. The same thing can be done simply by:
struct args args;
args.tid = i;
The only thing that malloc does is leak memory. Furthermore, this whole args object, declared as a local variable in the for loop's inner scope, is doomed for the reasons explained above.
P.S. When taking the "malloc the entire args object" approach, this also will leak memory unless you also take measures to diligently free the malloced object, when it is appropriate to do so.
You are passing a pointer to the local variable args to pthread_create. The variable's lifetime ends when the for loop iteration ends and the pointer becomes dangling.
The thread may be accessing it later though, causing undefined behavior.
You need to allocate args dynamically (but not argId), and pass that to the thread. The thread function must then make sure the pointer is deleted. Also, don't give your variables the same name as a type. That is very confusing. The struct keyword in a variable declaration is generally not needed in C++ (as long as variables and types have different names) and may cause other issues when used without reason, so don't use it and name things differently.
struct Args
{
int tid;
sem_t* sem;
pthread_cond_t* cond;
pthread_mutex_t* mut;
};
//...
auto args = new Args{i, &threadCounter, &nextThreadCond, &nextThreadMutex};
if (pthread_create(&tid[i], NULL, wait_func, args))
{
cout << "Couldn't make thread " << i << endl;
}
and at the end of the thread function delete the pointer:
void *wait_func(void *args_ptr)
{
auto args = static_cast<Args*>(args_ptr);
//...
delete args;
}
static_cast is safer than the C style cast, since it is much more restricted in the types it can cast between and e.g. can't accidentally drop a const or anything similar.
None of the variables seem to have a reason to be static either in the global or local case.
pthread_cond_t cond = *(args->cond);
pthread_mutex_t mut = *(args->mut);
This tries to create a new condition variable and mutex and initialize it based on the value of the condition variable and mutex pointed to. That doesn't make sense and won't work.
while (tid != val - 1)
{
pthread_cond_wait(&cond, &mut);
sem_getvalue(&sem, &val);
// cout<<"evaluating condition in"<<tid<<", val is "<<val<<endl;
Here, you pass to pthread_cond_wait a pointer to the local condition variable and mutex you created above rather than a pointer to the shared one. Look at this code:
int a;
foo(&a);
void foo(int* a)
{
int b = *a;
bar (&b); // If bar changes *b, that will not affect a!
}
See the problem? You passed bar a pointer to b, not a. So if bar changes the thing the pointer points to, it won't be modifying a but the local copy of b.
Don't try to create mutexes or condition variables that are copies of other mutexes or condition variables. It doesn't make semantic sense and it won't work.
Instead, you can do this:
pthread_cond_t* cond = (args->cond);
pthread_mutex_t* mut = (args->mut);
Now you can pass cond and mut to pthread_cond_wait, and you'll be passing pointers to the shared synchronization objects.
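Putting the points from the answers together, here is a hedged sketch of a reverse-order exit with no globals: heap-allocated arguments, pointers to the shared objects instead of copies, and the mutex held around pthread_cond_wait. Two things are assumptions of this sketch and not part of the original code: a plain int guarded by the mutex stands in for the semaphore counter, and run_reverse is a hypothetical wrapper that returns the exit order instead of printing it.

```cpp
#include <pthread.h>
#include <vector>

// State shared by all threads. A plain int guarded by the mutex stands in
// for the semaphore counter (an assumption made to keep the sketch short).
struct Shared {
    pthread_mutex_t mut;
    pthread_cond_t cond;
    int remaining;           // number of threads that have not exited yet
    std::vector<int> order;  // tids in the order they exited
};

struct Args {
    int tid;
    Shared* shared;  // pointer to the shared objects, never a copy
};

void* wait_func(void* args_ptr) {
    Args* args = static_cast<Args*>(args_ptr);
    Shared* s = args->shared;

    pthread_mutex_lock(&s->mut);  // must be held around pthread_cond_wait
    while (args->tid != s->remaining - 1) {
        pthread_cond_wait(&s->cond, &s->mut);
    }
    s->order.push_back(args->tid);
    --s->remaining;
    pthread_cond_broadcast(&s->cond);  // let the other threads re-check
    pthread_mutex_unlock(&s->mut);

    delete args;  // allocated with new by the creating thread
    return nullptr;
}

std::vector<int> run_reverse(int n) {
    Shared shared;
    pthread_mutex_init(&shared.mut, nullptr);
    pthread_cond_init(&shared.cond, nullptr);

    std::vector<pthread_t> tids;
    // Hold the mutex while creating threads so they stay parked until the
    // final count is known, even if a pthread_create call fails part-way.
    pthread_mutex_lock(&shared.mut);
    for (int i = 0; i < n; ++i) {
        Args* args = new Args{i, &shared};
        pthread_t t;
        if (pthread_create(&t, nullptr, wait_func, args) != 0) {
            delete args;
            break;
        }
        tids.push_back(t);
    }
    shared.remaining = static_cast<int>(tids.size());
    pthread_mutex_unlock(&shared.mut);

    for (pthread_t t : tids) pthread_join(t, nullptr);
    pthread_cond_destroy(&shared.cond);
    pthread_mutex_destroy(&shared.mut);
    return shared.order;
}
```

With five threads, the recorded order should be 4, 3, 2, 1, 0, since a thread may only exit when its id equals remaining - 1.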

std::atomic_flag to stop multiple threads

I'm trying to stop multiple worker threads using a std::atomic_flag. Starting from Issue using std::atomic_flag with worker thread the following works:
#include <iostream>
#include <atomic>
#include <chrono>
#include <thread>
std::atomic_flag continueFlag;
std::thread t;
void work()
{
while (continueFlag.test_and_set(std::memory_order_relaxed)) {
std::cout << "work ";
std::this_thread::sleep_for(std::chrono::milliseconds(10));
}
}
void start()
{
continueFlag.test_and_set(std::memory_order_relaxed);
t = std::thread(&work);
}
void stop()
{
continueFlag.clear(std::memory_order_relaxed);
t.join();
}
int main()
{
std::cout << "Start" << std::endl;
start();
std::this_thread::sleep_for(std::chrono::milliseconds(200));
std::cout << "Stop" << std::endl;
stop();
std::cout << "Stopped." << std::endl;
return 0;
}
Trying to rewrite into multiple worker threads:
#include <iostream>
#include <atomic>
#include <chrono>
#include <thread>
#include <vector>
#include <memory>
struct thread_data {
std::atomic_flag continueFlag;
std::thread thread;
};
std::vector<thread_data> threads;
void work(int threadNum, std::atomic_flag &continueFlag)
{
while (continueFlag.test_and_set(std::memory_order_relaxed)) {
std::cout << "work" << threadNum << " ";
std::this_thread::sleep_for(std::chrono::milliseconds(10));
}
}
void start()
{
const unsigned int numThreads = 2;
for (int i = 0; i < numThreads; i++) {
////////////////////////////////////////////////////////////////////
//PROBLEM SECTOR
////////////////////////////////////////////////////////////////////
thread_data td;
td.continueFlag.test_and_set(std::memory_order_relaxed);
td.thread = std::thread(&work, i, td.continueFlag);
threads.push_back(std::move(td));
////////////////////////////////////////////////////////////////////
//PROBLEM SECTOR
////////////////////////////////////////////////////////////////////
}
}
void stop()
{
//Flag stop
for (auto &data : threads) {
data.continueFlag.clear(std::memory_order_relaxed);
}
//Join
for (auto &data : threads) {
data.thread.join();
}
threads.clear();
}
int main()
{
std::cout << "Start" << std::endl;
start();
std::this_thread::sleep_for(std::chrono::milliseconds(200));
std::cout << "Stop" << std::endl;
stop();
std::cout << "Stopped." << std::endl;
return 0;
}
My issue is "Problem Sector" in above. Namely creating the threads. I cannot wrap my head around how to instantiate the threads and passing the variables to the work thread.
The error right now is referencing this line threads.push_back(std::move(td)); with error Error C2280 'thread_data::thread_data(const thread_data &)': attempting to reference a deleted function.
Trying to use unique_ptr like this:
auto td = std::make_unique<thread_data>();
td->continueFlag.test_and_set(std::memory_order_relaxed);
td->thread = std::thread(&work, i, td->continueFlag);
threads.push_back(std::move(td));
Gives error std::atomic_flag::atomic_flag(const std::atomic_flag &)': attempting to reference a deleted function at line td->thread = std::thread(&work, i, td->continueFlag);. Am I fundamentally misunderstanding the use of std::atomic_flag? Is it really both immovable and uncopyable?
Your first approach was actually closer to the truth. The problem is that it passed a reference to an object within the local for loop scope to each thread, as a parameter. But, of course, once the loop iteration ended, that object went out of scope and got destroyed, leaving each thread with a reference to a destroyed object, resulting in undefined behavior.
Nobody cared about the fact that you moved the object into the std::vector, after creating the thread. The thread received a reference to a locally-scoped object, and that's all it knew. End of story.
Moving the object into the vector first, and then passing to each thread a reference to the object in the std::vector will not work either. As soon as the vector internally reallocates, as part of its natural growth, you'll be in the same pickle.
What needs to happen is to have the entire threads array created first, before actually starting any std::threads. If the RAII principle is religiously followed, that means nothing more than a simple call to std::vector::resize().
Then, in a second loop, iterate over the fully-cooked threads array, and go and spawn off a std::thread for each element in the array.
I was almost there with my unique_ptr solution. I just needed to pass the call as a std::ref() as such:
std::vector<std::unique_ptr<thread_data>> threads;
void start()
{
const unsigned int numThreads = 2;
for (int i = 0; i < numThreads; i++) {
auto td = std::make_unique<thread_data>();
td->continueFlag.test_and_set(std::memory_order_relaxed);
td->thread = std::thread(&work, i, std::ref(td->continueFlag));
threads.push_back(std::move(td));
}
}
However, inspired by Sam above I also figured a non-pointer way:
std::vector<thread_data> threads;
void start()
{
const unsigned int numThreads = 2;
//create new vector, resize doesn't work as it tries to assign/copy which atomic_flag
//does not support
threads = std::vector<thread_data>(numThreads);
for (int i = 0; i < numThreads; i++) {
auto& t = threads.at(i);
t.continueFlag.test_and_set(std::memory_order_relaxed);
t.thread = std::thread(&work, i, std::ref(t.continueFlag));
}
}

Parallel execution doesn't update my variable

I want to write a program where random numbers are going to be created and I am going to track down the greatest of them. Two threads are going to run in parallel. However, my best variable is stuck at its initial value. Why?
[EDIT]
I updated the code after Joachim's answer, but I am not getting the correct answer at every run! What am I missing?
#include <iostream> // std::cout
#include <thread> // std::thread
#include <mutex> // std::mutex
#include <random>
std::default_random_engine generator((unsigned int)time(0));
int random(int n) {
std::uniform_int_distribution<int> distribution(0, n);
return distribution(generator);
}
std::mutex mtx; // mutex for critical section
void update_cur_best(int& cur_best, int a, int b) {
// critical section (exclusive access to std::cout signaled by locking mtx):
mtx.lock();
if(a > b)
cur_best = a;
else
cur_best = b;
mtx.unlock();
}
void run(int max, int& best) {
for(int i = 0; i < 15; ++i) {
int a = random(max); int b = random(max);
update_cur_best(best, a, b);
mtx.lock();
std::cout << "|" << a << "| |" << b << "|" << std::endl;
mtx.unlock();
}
}
int main ()
{
int best = 0;
std::thread th1 (run, 100, std::ref(best));
std::thread th2 (run, 100, std::ref(best));
th1.join();
th2.join();
std::cout << "best = " << best << std::endl;
return 0;
}
Sample output:
|4| |21|
|80| |75|
|93| |95|
|4| |28|
|52| |92|
|96| |12|
|83| |8|
|4| |33|
|28| |35|
|59| |52|
|20| |73|
|60| |96|
|61| |34|
|67| |79|
|67| |95|
|54| |57|
|20| |75|
|40| |30|
|16| |32|
|25| |100|
|33| |36|
|69| |26|
|94| |46|
|15| |57|
|50| |68|
|9| |56|
|46| |70|
|65| |65|
|76| |73|
|16| |29|
best = 29
I am getting 29, which is not the maximum!
As an answer to the updated question, in update_cur_best the value of best is overwritten on each iteration. In the end, its value will simply be the greater of the most recent a, b pair generated. What you want to do is update it only when the current a or b is greater than best (I'm not sure why you generate two random values on each iteration...)
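A minimal sketch of that change: cur_best is only replaced when one of the new values exceeds it. To make the result predictable, the hypothetical run and best_of below feed fixed values instead of random draws; that substitution is an assumption made purely for illustration.

```cpp
#include <algorithm>
#include <cstddef>
#include <functional>
#include <mutex>
#include <thread>
#include <vector>

std::mutex mtx;  // guards the shared best value

void update_cur_best(int& cur_best, int a, int b) {
    std::lock_guard<std::mutex> lock(mtx);
    // Keep the old value unless a or b is larger.
    cur_best = std::max({cur_best, a, b});
}

// Hypothetical stand-in for run(): consumes fixed pairs instead of random ones.
void run(const std::vector<int>& values, int& best) {
    for (std::size_t i = 0; i + 1 < values.size(); i += 2) {
        update_cur_best(best, values[i], values[i + 1]);
    }
}

// Hypothetical driver: two threads share one best value through std::ref.
int best_of(const std::vector<int>& first, const std::vector<int>& second) {
    int best = 0;
    std::thread th1(run, std::cref(first), std::ref(best));
    std::thread th2(run, std::cref(second), std::ref(best));
    th1.join();
    th2.join();
    return best;
}
```

For example, best_of({4, 21, 93, 95}, {25, 100, 16, 29}) yields 100 regardless of how the two threads interleave, because a smaller later pair can no longer overwrite a larger earlier one.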
It's because you can't really pass references to the thread constructor, because they will not be passed on as references, but copied and it's those copies that are passed to your thread function. You have to use std::ref to wrap the reference.
E.g.
std::thread th1 (run, 100, std::ref(best));

Accessing random number engine from multiple threads

This is my first question, so please forgive any violations of your policy. I want to have one global random number engine per thread, and for that purpose I've devised the following scheme: each thread I start gets a unique index from an atomic global int. There is a static vector of random engines, whose i-th member is meant to be used by the thread with index i. If the index is greater than the vector size, elements are added to it in a synchronized manner. To prevent performance penalties, I check twice whether the index is greater than the vector size: once in an unsynced manner, and once more after locking the mutex. So far so good, but the following example fails with all sorts of errors (heap corruption, malloc errors, etc.).
#include<vector>
#include<thread>
#include<mutex>
#include<atomic>
#include<random>
#include<iostream>
using std::cout;
std::atomic_uint INDEX_GEN{};
std::vector<std::mt19937> RNDS{};
float f = 0.0f;
std::mutex m{};
class TestAThread {
public:
TestAThread() :thread(nullptr){
cout << "Calling constructor TestAThread\n";
thread = new std::thread(&TestAThread::run, this);
}
TestAThread(TestAThread&& source) : thread(source.thread){
source.thread = nullptr;
cout << "Calling move constructor TestAThread. My ptr is " << thread << ". Source ptr is" << source.thread << "\n";
}
TestAThread(const TestAThread& source) = delete;
~TestAThread() {
cout << "Calling destructor TestAThread. Pointer is " << thread << "\n";
if (thread != nullptr){
cout << "Deleting thread pointer\n";
thread->join();
delete thread;
thread = nullptr;
}
}
void run(){
int index = INDEX_GEN.fetch_add(1);
std::uniform_real_distribution<float> uniformRnd{ 0.0f, 1.0f };
while (true){
if (index >= RNDS.size()){
m.lock();
// add randoms in a synchronized manner.
while (index >= RNDS.size()){
cout << "index is " << index << ", size is " << RNDS.size() << std::endl;
RNDS.emplace_back();
}
m.unlock();
}
f += uniformRnd(RNDS[index]);
}
}
std::thread* thread;
};
int main(int argc, char* argv[]){
std::vector<TestAThread> threads;
for (int i = 0; i < 10; ++i){
threads.emplace_back();
}
cout << f;
}
What am I doing wrong?!
Obviously f += ... would be a race-condition regardless of the right-hand side, but I suppose you already knew that.
The main problem that I see is your use of the global std::vector<std::mt19937> RNDS. Your mutex-protected critical section only encompasses adding new elements; not accessing existing elements:
... uniformRnd(RNDS[index]);
That's not thread-safe because resizing RNDS in another thread could cause RNDS[index] to be moved into a new memory location. In fact, this could happen after the reference RNDS[index] is computed but before uniformRnd gets around to using it, in which case what uniformRnd thinks is a Generator& will be a dangling pointer, possibly to a newly-created object. In any event, uniformRnd's operator() makes no guarantee about data races [Note 1], and neither does RNDS's operator[].
You could get around this problem by:
computing a reference (or pointer) to the generator within the protected section (which cannot be contingent on whether the container's size is sufficient), and
using a std::deque instead of a std::vector, which does not invalidate references when it is resized (unless the referenced object has been removed from the container by the resizing).
Something like this (focusing on the race condition; there are other things I'd probably do differently):
std::mt19937& get_generator(int index) {
std::lock_guard<std::mutex> l(m);
if (index >= RNDS.size()) RNDS.resize(index + 1);
return RNDS[index];
}
void run(){
int index = INDEX_GEN.fetch_add(1);
auto& gen = get_generator(index);
std::uniform_real_distribution<float> uniformRnd{ 0.0f, 1.0f };
while (true) {
/* Do something with uniformRnd(gen); */
}
}
[1] The prototype for operator() of uniformRnd is template< class Generator > result_type operator()( Generator& g );. In other words, the argument must be a mutable reference, which means that it is not implicitly thread-safe; only const& arguments to standard library functions are free of data races.
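A self-contained sketch of both suggestions combined, assuming RNDS is switched to a std::deque. The per-index seed, the bounded loop, and the helper name run_threads are assumptions added for illustration. Each thread also accumulates into its own slot instead of the shared f, which sidesteps the race on f mentioned at the top of this answer.

```cpp
#include <atomic>
#include <cstddef>
#include <deque>
#include <mutex>
#include <random>
#include <thread>
#include <vector>

std::atomic_uint INDEX_GEN{0};
std::deque<std::mt19937> RNDS;  // growing a deque at the end keeps references valid
std::mutex m;

std::mt19937& get_generator(std::size_t index) {
    std::lock_guard<std::mutex> l(m);
    while (RNDS.size() <= index) {
        // Assumption: seeding each engine from its position is enough for a demo.
        RNDS.emplace_back(static_cast<std::mt19937::result_type>(RNDS.size() + 1));
    }
    return RNDS[index];  // the reference is computed while the lock is held
}

// Each thread draws a fixed number of values and writes its own sum to *out.
void run(int draws, float* out) {
    std::size_t index = INDEX_GEN.fetch_add(1);
    std::mt19937& gen = get_generator(index);
    std::uniform_real_distribution<float> uniformRnd{0.0f, 1.0f};
    float sum = 0.0f;
    for (int i = 0; i < draws; ++i) {
        sum += uniformRnd(gen);  // gen is only ever used by this thread
    }
    *out = sum;
}

// Hypothetical driver: starts n threads and collects their sums after joining.
std::vector<float> run_threads(int n, int draws) {
    std::vector<float> sums(n, 0.0f);  // sized up front, never reallocated
    std::vector<std::thread> threads;
    for (int i = 0; i < n; ++i) {
        threads.emplace_back(run, draws, &sums[i]);
    }
    for (auto& t : threads) t.join();
    return sums;
}
```

Only the lookup goes through the mutex; after that each thread works on its own engine through a reference that later growth of the deque does not invalidate.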