I'll post my code, and then tell you what I think it's doing.
#include <thread>
#include <mutex>
#include <list>
#include <iostream>
using namespace std;
...
//List of threads and ints
list<thread> threads;
list<int> intList;
//Whether or not a thread is running
bool running(false);
//Counters
int busy(0), counter(0);
//Add 10000 elements to the list
for (int i = 0; i < 10000; ++i){
//push back an int
intList.push_back(i);
counter++;
//If the thread is running, make a note of it and continue
if (running){
busy++;
continue;
}
//If we haven't yet added 10 elements before a reset, continue
if (counter < 10)
continue;
//If we've added more than 10 ints, and there's no active thread,
//reset the counter and launch
counter = 0;
threads.push_back(std::thread([&]
//These iterators are function args
(list<int>::iterator begin, list<int>::iterator end){
//mutex for the running bool
mutex m;
m.lock();
running = true;
m.unlock();
//Remove either 10 elements or every element till the end
int removed(0);
while (removed < 10 && begin != end){
begin = intList.erase(begin);
removed++;
}
//unlock the running bool
m.lock();
running = false;
m.unlock();
//Pass into the thread func the current beginning and end of the list
}, intList.begin(), intList.end()));
}
for (auto& thread : threads){
thread.join();
}
What I think this code is doing is adding 10000 elements to the end of a list. For every 10 we add, launch a (single) thread that deletes the first 10 elements of the list (at the time the thread was launched).
I don't expect this to remove every list element, I was just interested in seeing if I could add to the end of a list while removing elements from the beginning. In Visual Studio I get a "list iterators incompatible" error quite often, but I figure the problem is cross platform.
What's wrong with my thinking? I know it's something
EDIT:
So I see now that this code is very incorrect. Really I just want one auxiliary thread active at a time to delete elements, which is why I thought calling erase was OK. However, I don't know how to declare a thread without joining it, and if I wait for that then I don't really see the point of doing any of this.
Should I declare my thread before the loop and have it wait for a signal from the main thread?
To clarify, my goal here is to do the following: I want to grab keyboard presses on one thread and store them in a list, and every so often log them to a file on a separate thread while removing the things I've logged. Since I don't want to spend a lot of time writing to the disk, I'd like to write in discrete chunks (of 10).
Thanks to Christophe, and everyone else. Here's my code now... I may be using lock_guard incorrectly.
#include <thread>
#include <mutex>
#include <list>
#include <iostream>
#include <atomic>
using namespace std;
...
atomic<bool> running(false);
list<int> intList;
int busy(0), counter(0);
mutex m;
thread * t(nullptr);
for (int i = 0; i < 100000; ++i){
//Would a lock_guard here be inappropriate?
m.lock();
intList.push_back(i);
m.unlock();
counter++;
if (running){
busy++;
continue;
}
if (counter < 10)
continue;
counter = 0;
if (t){
t->join();
delete t;
}
t = new thread([&](){
running = true;
int removed(0);
while (removed < 10){
lock_guard<mutex> lock(m);
if (intList.size())
intList.erase(intList.begin());
removed++;
}
running = false;
});
}
if (t){
t->join();
delete t;
}
Your code won't work because:
your mutex is local to each thread (each thread has its own copy, used only by itself, so there is no chance of inter-thread synchronisation!)
intList is not an atomic type, but you access it from several threads, causing race conditions and undefined behaviour.
the begin and end that you pass to your threads at their creation might no longer be valid during execution.
Here are some improvements (look at the commented lines):
atomic<bool> running(false); // <=== atomic (to avoid unnecessary use of mutex)
int busy(0), counter(0);
mutex l; // define the mutex here, so that it will be the same for all threads
for (int i = 0; i < 10000; ++i){
l.lock(); // <===you need to protect each access to the list
intList.push_back(i);
l.unlock(); // <===and unlock
counter++;
if (running){
busy++;
continue;
}
if (counter < 10)
continue;
counter = 0;
threads.push_back(std::thread([&]
(){ //<====No iterator args as they might be outdated during execution of threads!!
running = true; // <=== no longer surrounded by lock/unlock as it is now atomic
int removed(0);
while (removed < 10){
l.lock(); // <====you really need to protect access to the list
if (intList.size()) // <=== check if elements exist NOW
intList.erase(intList.begin()); // <===use current data, not a prehistoric outdated local begin !!
l.unlock(); // <====end of protected section
removed++;
}
running = false; // <=== no longer surrounded by lock/unlock as it is now atomic
})); //<===No other arguments
}
...
By the way, I'd suggest that you have a look at lock_guard<mutex> for the locks, as it ensures the unlock in all circumstances (especially when there are exceptions or other surprises like that).
Edit: I've avoided the lock protection of running with a mutex, by making it atomic<bool>.
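As a rough sketch of the lock_guard suggestion combined with the "one auxiliary thread that waits for a signal" idea from the question's edit, the following uses a single long-lived consumer instead of launching a thread per chunk. The function name run_chunked_consumer, the finished flag and the batch list are hypothetical names introduced only for this illustration, and collecting into a vector stands in for writing to a log file.

```cpp
#include <algorithm>
#include <cassert>
#include <condition_variable>
#include <cstddef>
#include <iterator>
#include <list>
#include <mutex>
#include <thread>
#include <vector>

// Hypothetical helper: the calling thread pushes `total` ints while one
// long-lived consumer removes them from the front in chunks of `chunk`.
// Returns everything the consumer removed, in removal order.
std::vector<int> run_chunked_consumer(int total, std::size_t chunk) {
    assert(chunk > 0);
    std::list<int> items;       // shared list, guarded by m
    std::mutex m;               // one mutex shared by both threads
    std::condition_variable cv;
    bool finished = false;      // guarded by m
    std::vector<int> consumed;  // touched only by the consumer until join()

    std::thread consumer([&] {
        for (;;) {
            std::list<int> batch;
            {
                std::unique_lock<std::mutex> lock(m);
                // Sleep until a full chunk exists or the producer is done.
                cv.wait(lock, [&] { return items.size() >= chunk || finished; });
                if (items.empty()) return;  // only possible once finished
                std::size_t n = std::min(items.size(), chunk);
                auto last = std::next(items.begin(),
                                      static_cast<std::ptrdiff_t>(n));
                // Move the first n nodes out of the shared list.
                batch.splice(batch.begin(), items, items.begin(), last);
            }
            // Slow work (for example disk I/O) happens without the lock.
            consumed.insert(consumed.end(), batch.begin(), batch.end());
        }
    });

    for (int i = 0; i < total; ++i) {
        {
            std::lock_guard<std::mutex> lock(m);  // released even on exceptions
            items.push_back(i);
        }
        cv.notify_one();
    }
    {
        std::lock_guard<std::mutex> lock(m);
        finished = true;
    }
    cv.notify_one();
    consumer.join();
    return consumed;
}
```

Calling run_chunked_consumer(105, 10) should hand back all 105 values in order; the last partial chunk is drained once finished is set. No iterator into the list ever leaves the locked region, which is the point made about begin and end above.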
I wrote a program that writes random numbers to one file in the first thread, and another thread reads them from there and writes those that are prime numbers to another file. The third thread is needed to stop/start the work. I read that I/O streams are thread-safe. Since writing to a single shared resource is thread-safe, what could be the problem?
Output: always correct record in numbers.log, sometimes no record in numbers_prime.log when there are prime numbers, sometimes they are all written.
#include <iostream>
#include <fstream>
#include <thread>
#include <mutex>
#include <vector>
#include <condition_variable>
#include <future>
#include <random>
#include <chrono>
#include <string>
#include <atomic>
#include <cmath>
using namespace std::chrono_literals;
std::atomic_int ITER_NUMBERS = 30;
std::atomic_bool _var = false;
bool ret() { return _var; }
std::atomic_bool _var_log = false;
bool ret_log() { return _var_log; }
std::condition_variable cv;
std::condition_variable cv_log;
std::mutex mtx;
std::mutex mt;
std::atomic<int> count{0};
std::atomic<bool> _FL = 1;
int MIN = 100;
int MAX = 200;
bool is_empty(std::ifstream& pFile) // function that checks if the file is empty
{
return pFile.peek() == std::ifstream::traits_type::eof();
}
bool isPrime(int n) // function that checks if the number is prime
{
if (n <= 1)
return false;
for (int i = 2; i <= sqrt(n); i++)
if (n % i == 0)
return false;
return true;
}
void Log(int min, int max) { // function that generates random numbers and writes them to a file numbers.log
std::string str;
std::ofstream log;
std::random_device seed;
std::mt19937 gen{seed()};
std::uniform_int_distribution dist{min, max};
log.open("numbers.log", std::ios_base::trunc);
for (int i = 0; i < ITER_NUMBERS; ++i, ++count) {
std::unique_lock<std::mutex> ulm(mtx);
cv.wait(ulm,ret);
str = std::to_string(dist(gen)) + '\n';
log.write(str.c_str(), str.length());
log.flush();
_var_log = true;
cv_log.notify_one();
//_var_log = false;
//std::this_thread::sleep_for(std::chrono::microseconds(500000));
}
log.close();
_var_log = true;
cv_log.notify_one();
_FL = 0;
}
void printCheck() { // Checking function to start/stop printing
std::cout << "Log to file? [y/n]\n";
while (_FL) {
char input;
std::cin >> input;
std::cin.clear();
if (input == 'y') {
_var = true;
cv.notify_one();
}
if (input == 'n') {
_var = false;
}
}
}
void primeLog() { // a function that reads files from numbers.log and writes prime numbers to numbers_prime.log
std::unique_lock ul(mt);
int number = 0;
std::ifstream in("numbers.log");
std::ofstream out("numbers_prime.log", std::ios_base::trunc);
if (is_empty(in)) {
cv_log.wait(ul, ret_log);
}
int oldCount{};
for (int i = 0; i < ITER_NUMBERS; ++i) {
if (oldCount == count && count != ITER_NUMBERS) { // check if primeLog is faster than Log. If it is faster, then we wait to continue
cv_log.wait(ul, ret_log);
_var_log = false;
}
if (!in.eof()) {
in >> number;
if (isPrime(number)) {
out << number;
out << "\n";
}
oldCount = count;
}
}
}
int main() {
std::thread t1(printCheck);
std::thread t2(Log, MIN, MAX);
std::thread t3(primeLog);
t1.join();
t2.join();
t3.join();
return 0;
}
This has nothing to do with the I/O stream thread safety. The shown code's logic is broken.
The shown code seems to follow a design pattern of breaking up a single logical algorithm into multiple pieces, and scattering them far and wide. This makes it more difficult to understand what it's doing. So let's rewrite a little bit of it, to make the logic more clear. In primeLog let's do this instead:
cv_log.wait(ul, []{ return _var_log; });
_var_log = false;
It's now clearer that this waits for _var_log to be set before proceeding on its merry way. Once it is, it gets immediately reset.
The code that follows reads exactly one number from the file, before looping back here. So, primeLog's main loop will always handle exactly one number, on each iteration of the loop.
The problem now is very easy to see, once we head over to the other side, and do the same clarification:
std::unique_lock<std::mutex> ulm(mtx);
cv.wait(ulm, []{ return _var; });
// Code that generates one number and writes it to the file
_var_log = true;
cv_log.notify_one();
Once _var is set to true, it remains true. This loop starts running full blast, iterating continuously. On each iteration of the loop it blindly sets _var_log to true and signals the other thread's condition variable.
C++ execution threads are completely independent of each other unless they are explicitly synchronized in some way.
Nothing is preventing this loop from running full blast, getting through its entire number range, before the other execution thread wakes up and decides to read the first number from the file. It'll do that, then go back and wait for its condition variable to be signaled again, for the next number. Its hopes and dreams of the 2nd number will be left unsatisfied.
On each iteration of the generating thread's loop the condition variable, for the other execution thread, gets signaled.
Condition variables are not semaphores. If nothing is waiting on a condition variable when it's signaled -- too bad. When some execution thread decides to wait on a condition variable, it may or may not be immediately woken up.
One of these two execution threads relies on receiving a condition variable notification for every iteration of its loop.
The logic in the other execution thread fails to implement this guarantee. This may not be the only flaw; there might be others, subject to further analysis. This was just the most apparent logical flaw.
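One way to avoid depending on a notification per item is to make the consumer's predicate look at shared state that records how much has been produced. The sketch below is only an illustration under assumptions: produce_and_consume, shared, next and done are hypothetical names, and an in-memory vector stands in for numbers.log.

```cpp
#include <cassert>
#include <condition_variable>
#include <cstddef>
#include <mutex>
#include <thread>
#include <vector>

// Hypothetical stand-in for the two logging threads: the calling thread
// appends numbers, and the consumer must see every one of them exactly once.
std::vector<int> produce_and_consume(const std::vector<int>& input) {
    std::vector<int> shared;  // plays the role of numbers.log, guarded by m
    std::mutex m;
    std::condition_variable cv;
    bool done = false;        // guarded by m
    std::vector<int> seen;    // written only by the consumer until join()

    std::thread consumer([&] {
        std::size_t next = 0;  // index of the first item not yet consumed
        std::unique_lock<std::mutex> lock(m);
        for (;;) {
            // The predicate inspects state, so a notification that was sent
            // before this thread started waiting is not lost.
            cv.wait(lock, [&] { return next < shared.size() || done; });
            // Drain everything pending, however many notifications were sent.
            while (next < shared.size()) seen.push_back(shared[next++]);
            if (done) return;
        }
    });

    for (int value : input) {
        {
            std::lock_guard<std::mutex> lock(m);
            shared.push_back(value);
        }
        cv.notify_one();
    }
    {
        std::lock_guard<std::mutex> lock(m);
        done = true;
    }
    cv.notify_one();
    consumer.join();
    assert(seen.size() == input.size());
    return seen;
}
```

Here the producer can still run at full speed and finish before the consumer wakes up once; the consumer simply processes the backlog because it counts items rather than wakeups.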
Thanks to those who wrote about read-behind-write, now I know more. But that was not the problem. The main problem was that if it was a new file, when calling pFile.peek() in the is_empty function, we permanently set the file flag to eofbit. Thus, until the end of the program in.rdstate() == std::ios_base::eofbit.
Fix: reset the flag state.
if (is_empty(in)) {
cv_log.wait(ul, ret_log);
}
in.clear(); // reset state
There was also a problem caused by reading and writing one file from different threads. It was not the cause of my program error, but it led to another one.
When I run the program again, primeLog() may open std::ifstream in("numbers.log") for reading before log.open("numbers.log", std::ios_base::trunc) runs. In that case in reads the old data into its buffer before log.open erases it with the std::ios_base::trunc flag, so we read the old data and write it to numbers_prime.log.
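The sticky eofbit after peek() on an empty stream can be reproduced without files. This is a minimal sketch under assumptions: read_after_empty_peek is a hypothetical helper, a std::stringstream stands in for the ifstream, and writing through rdbuf() imitates another writer appending data later.

```cpp
#include <cassert>
#include <sstream>

// Hypothetical helper: peek() on an empty stream, optionally clear the
// state, let data "arrive" afterwards, then try to extract an int.
// Returns true if the extraction succeeded.
bool read_after_empty_peek(bool clear_state) {
    std::stringstream s;          // stand-in for the ifstream on a new file
    s.peek();                     // nothing to read yet: sets eofbit
    assert(s.eof());
    if (clear_state) s.clear();   // the fix: reset the stream state
    s.rdbuf()->sputn("7\n", 2);   // data written behind the stream's back
    int number = 0;
    // operator>> refuses to read while any error bit (including eofbit) is set.
    return static_cast<bool>(s >> number) && number == 7;
}
```

With clear_state set to false the extraction fails even though data is available, which matches the symptom of numbers_prime.log staying empty.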
Suppose that I have a program that has a worker thread that squares numbers from a queue. The problem is that if the work is too light (takes too short a time to do), the worker finishes the work and notifies the main thread before the main thread even has time to start waiting for the worker to finish.
My simple program looks as follows:
#include <atomic>
#include <condition_variable>
#include <queue>
#include <thread>
#include <mutex>
#include <vector>
#include <cstdio>
std::atomic<bool> should_end;
std::condition_variable work_to_do;
std::mutex work_to_do_lock;
std::condition_variable fn_done;
std::mutex fn_done_lock;
std::mutex data_lock;
std::queue<int> work;
std::vector<int> result;
void worker() {
while(true) {
if(should_end) return;
data_lock.lock();
if(work.size() > 0) {
int front = work.front();
work.pop();
if (work.size() == 0){
fn_done.notify_one();
}
data_lock.unlock();
result.push_back(front * front);
} else {
data_lock.unlock();
// nothing to do, so we just wait
std::unique_lock<std::mutex> lck(work_to_do_lock);
work_to_do.wait(lck);
}
}
}
int main() {
should_end = false;
std::thread t(worker); // start worker
data_lock.lock();
const int N = 10;
for(int i = 0; i <= N; i++) {
work.push(i);
}
data_lock.unlock();
work_to_do.notify_one(); // notify the worker that there is work to do
//if the worker is quick, it signals done here already
std::unique_lock<std::mutex> lck(fn_done_lock);
fn_done.wait(lck);
for(auto elem : result) {
printf("result = %d \n", elem);
}
work_to_do.notify_one(); //notify the worker so we can shut it down
should_end = true;
t.join();
return 0;
}
Your attempt to use the condition variable notification itself as the flag that the job is done is fundamentally flawed. First and foremost, std::condition_variable can have spurious wakeups, so it should not be done this way. You should use your queue size as the actual condition for the end of work, check and modify it under the same mutex in all threads, and use that same mutex for the condition variable. Then you may use std::condition_variable to wait until the work is done, but only after you have checked the queue size: if the work is already done at that moment, you do not wait at all. Otherwise you check the queue size in a loop (because of spurious wakeups) and wait while it is still not empty, or you use std::condition_variable::wait() with a predicate, which runs that loop internally.
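A minimal sketch of that advice, under some assumptions: square_all is a hypothetical wrapper, and instead of the bare queue size it waits on a pending counter (items submitted but not yet stored in result), because the worker squares each item outside the lock and the queue can be empty while the last result is still being computed.

```cpp
#include <cassert>
#include <condition_variable>
#include <cstddef>
#include <mutex>
#include <queue>
#include <thread>
#include <vector>

// Hypothetical wrapper: squares every input value on a worker thread.
std::vector<int> square_all(const std::vector<int>& input) {
    std::mutex m;                      // guards all shared state below
    std::condition_variable work_cv;   // main -> worker: work or shutdown
    std::condition_variable done_cv;   // worker -> main: everything finished
    std::queue<int> work;
    std::vector<int> result;
    std::size_t pending = 0;           // submitted but not yet finished
    bool should_end = false;

    std::thread worker([&] {
        std::unique_lock<std::mutex> lock(m);
        for (;;) {
            work_cv.wait(lock, [&] { return !work.empty() || should_end; });
            if (work.empty()) return;     // shutdown requested, nothing left
            int front = work.front();
            work.pop();
            lock.unlock();
            int squared = front * front;  // the actual work, done unlocked
            lock.lock();
            result.push_back(squared);
            assert(pending > 0);
            if (--pending == 0) done_cv.notify_one();
        }
    });

    {
        std::lock_guard<std::mutex> lock(m);
        for (int v : input) work.push(v);
        pending = input.size();
    }
    work_cv.notify_one();
    {
        std::unique_lock<std::mutex> lock(m);
        // If the worker already finished, the predicate is true and we never block.
        done_cv.wait(lock, [&] { return pending == 0; });
        should_end = true;
    }
    work_cv.notify_one();
    worker.join();
    return result;
}
```

Because both waits use predicates over state guarded by the same mutex, it does not matter whether the worker finishes before or after the main thread starts waiting.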
Suppose you wish to run a section in parallel, then merge back into the main thread, then go back to a section in parallel, and so on. Similar to the childhood game red light green light.
I've given an example of what I'm trying to do, where I'm using a condition variable to block the threads at the start, but wish to start them all in parallel and then block them at the end so they can be printed out serially. The *= operation could be a much larger operation spanning many seconds. Reusing the threads is also important. Using a task queue might be too heavy.
I need to use some kind of blocking construct that isn't just a plain busy loop, because I know how to solve this problem with busy loops.
In English:
Thread 1 creates 10 threads that are blocked
Thread 1 signals all threads to start (without blocking each other)
Thread 2-11 process their exclusive memory
Thread 1 is waiting until 2-11 are complete (can use an atomic to count here)
Thread 2-11 complete, each can notify for 1 to check its condition if necessary
Thread 1 checks its condition and prints the array
Thread 1 resignals 2-11 to process again, continuing from 2
Example code (naively adapted from an example on cplusplus.com):
// condition_variable example
#include <iostream> // std::cout
#include <thread> // std::thread
#include <mutex> // std::mutex, std::unique_lock
#include <condition_variable> // std::condition_variable
#include <atomic>
std::mutex mtx;
std::condition_variable cv;
bool ready = false;
std::atomic<int> count(0);
bool end = false;
int a[10];
void doublea (int id) {
while(!end) {
std::unique_lock<std::mutex> lck(mtx);
while (!ready) cv.wait(lck);
a[id] *= 2;
count.fetch_add(1);
}
}
void go() {
std::unique_lock<std::mutex> lck(mtx);
ready = true;
cv.notify_all();
ready = false; // Naive
while (count.load() < 10) sleep(1);
for(int i = 0; i < 10; i++) {
std::cout << a[i] << std::endl;
}
ready = true;
cv.notify_all();
ready = false;
while (count.load() < 10) sleep(1);
for(int i = 0; i < 10; i++) {
std::cout << a[i] << std::endl;
}
end = true;
cv.notify_all();
}
int main () {
std::thread threads[10];
// spawn 10 threads:
for (int i=0; i<10; ++i) {
a[i] = 0;
threads[i] = std::thread(doublea,i);
}
std::cout << "10 threads ready to race...\n";
go(); // go!
return 0;
}
This is not trivial to implement efficiently. Moreover, implementing it yourself does not make sense unless you are learning this subject. A condition variable is not a good choice here because it does not scale well.
I suggest you look at how mature run-time libraries implement fork-join parallelism and learn from them or use them in your app. See http://www.openmprtl.org/, http://opentbb.org/, https://www.cilkplus.org/ - all these are open-source.
OpenMP is the closest model for what you are looking for and it has the most efficient implementation of fork-join barriers. Though, it has its disadvantages because it is designed for HPC and lacks dynamic composability. TBB and Cilk work best for nested parallelism and usage in modules and libraries which can be used in context of external parallel regions.
You can use a barrier or a condition variable to start all threads. Then thread 1 can wait until all threads have finished their work (by calling join on each thread, which blocks) and then print their data in a single for loop.
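For the case where the threads must be reused across rounds rather than joined, a learning-oriented sketch with two condition variables and a generation counter could look like the following. It is not tuned for scalability, and doubling_rounds, generation, finished and quit are hypothetical names for this illustration; the array starts at 1 instead of 0 so the doubling is visible.

```cpp
#include <cassert>
#include <condition_variable>
#include <mutex>
#include <thread>
#include <vector>

// Hypothetical helper: `workers` reusable threads each double their own
// slot once per round; the calling thread waits between rounds.
std::vector<int> doubling_rounds(int workers, int rounds) {
    assert(workers >= 0 && rounds >= 0);
    std::vector<int> a(workers, 1);
    std::mutex m;
    std::condition_variable start_cv, done_cv;
    int generation = 0;  // bumped by the main thread to release a round
    int finished = 0;    // workers that completed the current round
    bool quit = false;

    std::vector<std::thread> pool;
    for (int id = 0; id < workers; ++id) {
        pool.emplace_back([&, id] {
            int seen = 0;  // last generation this worker processed
            for (;;) {
                {
                    std::unique_lock<std::mutex> lock(m);
                    start_cv.wait(lock, [&] { return generation != seen || quit; });
                    if (quit) return;
                    seen = generation;
                }
                a[id] *= 2;  // exclusive slot, so no lock is held here
                {
                    std::lock_guard<std::mutex> lock(m);
                    if (++finished == workers) done_cv.notify_one();
                }
            }
        });
    }

    for (int r = 0; r < rounds; ++r) {
        std::unique_lock<std::mutex> lock(m);
        finished = 0;
        ++generation;
        start_cv.notify_all();  // green light
        done_cv.wait(lock, [&] { return finished == workers; });
        // Red light: every worker is parked, so a[] can be read safely here.
    }
    {
        std::lock_guard<std::mutex> lock(m);
        quit = true;
    }
    start_cv.notify_all();
    for (auto& t : pool) t.join();
    return a;
}
```

Each worker compares its private seen value with generation, so a worker that is slow to return to the wait cannot miss a round and cannot run the same round twice.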
I have some code that is trying to run some intense matrix processing, so I thought it would be faster if I multithreaded it. However, my intention is to keep the threads alive so that they can be used in the future for more processing. Here is the problem: the multithreaded version of the code runs slower than a single thread, and I believe the problem lies with the way I signal/keep my threads alive.
I am using pthreads on Windows and C++. Here is my code for the thread, where runtest() is the function where the matrix calculations happen:
void* playQueue(void* arg)
{
while(true)
{
pthread_mutex_lock(&queueLock);
if(testQueue.empty())
break;
else
testQueue.pop();
pthread_mutex_unlock(&queueLock);
runtest();
}
pthread_exit(NULL);
}
The playQueue() function is the one passed to the pthread, and what I have as of now is a queue (testQueue) of, let's say, 1000 items, and 100 threads. Each thread will continue to run until the queue is empty (hence the stuff inside the mutex).
I believe that the reason the multithreaded version runs so slowly is something called false sharing (I think?), and that my method of signaling the thread to call runtest() and keeping the thread alive is poor.
What would be an effective way of doing this so that the multithreaded version will run faster (or at least equally as fast) as an iterative version?
HERE IS THE FULL VERSION OF MY CODE (minus the matrix stuff)
# include <cstdlib>
# include <iostream>
# include <cmath>
# include <complex>
# include <string>
# include <pthread.h>
# include <queue>
using namespace std;
# include "matrix_exponential.hpp"
# include "test_matrix_exponential.hpp"
# include "c8lib.hpp"
# include "r8lib.hpp"
# define NUM_THREADS 3
int main ( );
int counter;
queue<int> testQueue;
queue<int> anotherQueue;
void *playQueue(void* arg);
void runtest();
void matrix_exponential_test01 ( );
void matrix_exponential_test02 ( );
pthread_mutex_t anotherLock;
pthread_mutex_t queueLock;
pthread_cond_t queue_cv;
int main ()
{
counter = 0;
/* for (int i=0;i<1; i++)
for(int j=0; j<1000; j++)
{
runtest();
cout << counter << endl;
}*/
pthread_t threads[NUM_THREADS];
pthread_mutex_init(&queueLock, NULL);
pthread_mutex_init(&anotherLock, NULL);
pthread_cond_init (&queue_cv, NULL);
for(int z=0; z<1000; z++)
{
testQueue.push(1);
}
for( int i=0; i < NUM_THREADS; i++ )
{
pthread_create(&threads[i], NULL, playQueue, (void*)NULL);
}
while(anotherQueue.size()<NUM_THREADS)
{
}
cout << counter;
pthread_mutex_destroy(&queueLock);
pthread_cond_destroy(&queue_cv);
pthread_cancel(NULL);
cout << counter;
return 0;
}
void* playQueue(void* arg)
{
while(true)
{
cout<<counter<<endl;
pthread_mutex_lock(&queueLock);
if(testQueue.empty()){
pthread_mutex_unlock(&queueLock);
break;
}
else
testQueue.pop();
pthread_mutex_unlock(&queueLock);
runtest();
}
pthread_mutex_lock(&anotherLock);
anotherQueue.push(1);
pthread_mutex_unlock(&anotherLock);
pthread_exit(NULL);
}
void runtest()
{
counter++;
matrix_exponential_test01 ( );
matrix_exponential_test02 ( );
}
So in here the "matrix_exponential_tests" are taken from this website with permission, and they are where all of the matrix math occurs. The counter is just used to debug and make sure all the instances are running.
Doesn't it get stuck?
while(true)
{
pthread_mutex_lock(&queueLock);
if(testQueue.empty())
break; //<----------------you break without unlock the mutex...
else
testQueue.pop();
pthread_mutex_unlock(&queueLock);
runtest();
}
The section between lock and unlock runs slower than it would in a single thread.
Mutexes are slowing you down. You should lock only the critical section, and if you want to speed it up, try not to use a mutex at all.
You can do that by supplying the test via a function argument rather than using the queue.
One way to avoid the mutex is to use a vector without deleting from it, and a std::atomic_int (C++11) as the index (or to lock only while reading and incrementing the current index),
or use an iterator like this:
vector<test> testVector;
vector<test>::iterator it;
//when it initialized to:
it = testVector.begin();
now your loop can be like this:
while(true)
{
vector<test>::iterator it1;
pthread_mutex_lock(&queueLock);
it1 = (it==testVector.end())? it : it++;
pthread_mutex_unlock(&queueLock);
//now you outside the critical section:
if(it1==testVector.end()) // test the local copy; reading the shared it here would race and skip the last element
break;
//you don't delete or change the vector
//so you can use the it1 iterator freely
runtest();
}
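The atomic-index variant mentioned above can be sketched as follows. It uses std::thread rather than pthreads, which is a substitution made for brevity, and process_with_atomic_index plus the squaring that stands in for runtest() are hypothetical.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <thread>
#include <vector>

// Hypothetical helper: each thread claims the next unprocessed index with
// fetch_add, so no mutex is needed and the vector is never modified.
long long process_with_atomic_index(const std::vector<int>& tests,
                                    unsigned num_threads) {
    assert(num_threads > 0);
    std::atomic<std::size_t> next{0};  // shared index into tests
    std::atomic<long long> total{0};   // stand-in for the tests' output
    std::vector<std::thread> pool;
    for (unsigned t = 0; t < num_threads; ++t) {
        pool.emplace_back([&] {
            for (;;) {
                std::size_t i = next.fetch_add(1);  // claim one element
                if (i >= tests.size()) break;       // nothing left
                // Stand-in for runtest(): square the element.
                total.fetch_add(static_cast<long long>(tests[i]) * tests[i]);
            }
        });
    }
    for (auto& th : pool) th.join();
    return total.load();
}
```

Each index is handed out exactly once, so every element is processed by exactly one thread regardless of how many threads are running.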
I am relatively new to threads, and I'm still learning best techniques and the C++11 thread library. Right now I'm in the middle of implementing a worker thread which infinitely loops, performing some work. Ideally, the main thread would want to stop the loop from time to time to sync with the information that the worker thread is producing, and then start it again. My idea initially was this:
// Code run by worker thread
void thread() {
while(run_) {
// Do lots of work
}
}
// Code run by main thread
void start() {
if ( run_ ) return;
run_ = true;
// Start thread
}
void stop() {
if ( !run_ ) return;
run_ = false;
// Join thread
}
// Somewhere else
volatile bool run_ = false;
I was not completely sure about this so I started researching, and I discovered that volatile is actually not required for synchronization and is in fact generally harmful. Also, I discovered this answer, which describes a process nearly identical to the one I thought about. In the answer's comments, however, this solution is described as broken, as volatile does not guarantee that different processor cores readily (if ever) communicate changes to the volatile values.
My question is this then: Should I use an atomic flag, or something else entirely? What exactly is the property that is lacking in volatile and that is then provided by whatever construct is needed to solve my problem effectively?
Have you looked at mutexes? They're made to lock threads out of shared data and so avoid conflicts. Is that what you're looking for?
I think you want to use barrier synchronization using std::mutex?
Also take a look at Boost.Thread for a relatively high-level threading library.
Take a look at this code sample from the link:
#include <iostream>
#include <map>
#include <string>
#include <chrono>
#include <thread>
#include <mutex>
std::map<std::string, std::string> g_pages;
std::mutex g_pages_mutex;
void save_page(const std::string &url)
{
// simulate a long page fetch
std::this_thread::sleep_for(std::chrono::seconds(2));
std::string result = "fake content";
g_pages_mutex.lock();
g_pages[url] = result;
g_pages_mutex.unlock();
}
int main()
{
std::thread t1(save_page, "http://foo");
std::thread t2(save_page, "http://bar");
t1.join();
t2.join();
g_pages_mutex.lock(); // not necessary as the threads are joined, but good style
for (const auto &pair : g_pages) {
std::cout << pair.first << " => " << pair.second << '\n';
}
g_pages_mutex.unlock();
}
I would suggest using std::mutex and std::condition_variable to solve the problem. Here's an example of how it can work with C++11:
#include <condition_variable>
#include <iostream>
#include <mutex>
#include <thread>
using namespace std;
int main()
{
mutex m;
condition_variable cv;
// Tells, if the worker should stop its work
bool done = false;
// Zero means, it can be filled by the worker thread.
// Non-zero means, it can be consumed by the main thread.
int result = 0;
// run worker thread
auto t = thread{ [&]{
auto bound = 1000;
for (;;) // ever
{
auto sum = 0;
for ( auto i = 0; i != bound; ++i )
sum += i;
++bound;
auto lock = unique_lock<mutex>( m );
// wait until we can safely write the result
cv.wait( lock, [&]{ return result == 0; });
// write the result
result = sum;
// wake up the consuming thread
cv.notify_one();
// exit the loop, if flag is set. This must be
// done with mutex protection. Hence this is not
// in the for-condition expression.
if ( done )
break;
}
} };
// the main threads loop
for ( auto i = 0; i != 20; ++i )
{
auto r = 0;
{
// lock the mutex
auto lock = unique_lock<mutex>( m );
// wait until we can safely read the result
cv.wait( lock, [&]{ return result != 0; } );
// read the result
r = result;
// set result to zero so the worker can
// continue to produce new results.
result = 0;
// wake up the producer
cv.notify_one();
// the lock is released here (the end of the scope)
}
// do time consuming io at the side.
cout << r << endl;
}
// tell the worker to stop
{
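auto lock = unique_lock<mutex>( m );
result = 0;
done = true;
// wake the worker in case it is already waiting for result == 0;
// without this notification it could sleep forever and join() would hang
cv.notify_one();
// again the lock is released here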
}
// wait for the worker to finish.
t.join();
cout << "Finished." << endl;
}
You could do the same with std::atomic by essentially implementing spin locks. Spin locks can be slower than mutexes. So I repeat the advice on the Boost website:
Do not use spinlocks unless you are certain that you understand the consequences.
I believe that mutexes and condition variables are the way to go in your case.
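If all that is needed is the start/stop flag from the question, and the data exchange is handled separately (for example after join), a std::atomic<bool> is enough for the flag itself: unlike volatile, it makes the concurrent read and write well-defined and gives them a defined ordering. The class below is only a sketch; Worker, iterations_ and the yield-based loop body are hypothetical, and start() and stop() are assumed to be called from a single controlling thread.

```cpp
#include <atomic>
#include <cassert>
#include <thread>

// Hypothetical worker with an atomic run flag replacing `volatile bool run_`.
class Worker {
public:
    ~Worker() { stop(); }

    void start() {
        if (run_.exchange(true)) return;  // already running
        thread_ = std::thread([this] {
            while (run_.load()) {
                iterations_.fetch_add(1);  // stand-in for "lots of work"
                std::this_thread::yield();
            }
        });
    }

    void stop() {
        if (!run_.exchange(false)) return;  // not running
        assert(thread_.joinable());
        thread_.join();  // after join, the worker's writes are visible here
    }

    long iterations() const { return iterations_.load(); }

private:
    std::atomic<bool> run_{false};
    std::atomic<long> iterations_{0};
    std::thread thread_;
};
```

The worker only polls the flag between units of work, so this fits a loop that never needs to sleep; when the worker has to wait for the main thread, the mutex and condition variable approach above is the better fit.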