C++ threads inside a 'for' loop print wrong values - c++

I'm trying to understand multithreading in C++, but I'm stuck on this problem: if I launch threads in a for loop, they print the wrong values. This is the code:
#include <cstdio>
#include <iostream>
#include <list>
#include <thread>
void print_id(int id){
printf("Hello from thread %d\n", id);
}
int main() {
int n=5;
std::list<std::thread> threads={};
for(int i=0; i<n; i++ ){
threads.emplace_back(std::thread([&](){ print_id(i); }));
}
for(auto& t: threads){
t.join();
}
return 0;
}
I expected the values 0,1,2,3,4 to be printed, but I often got the same value twice. This is the output:
Hello from thread 2
Hello from thread 3
Hello from thread 3
Hello from thread 4
Hello from thread 5
What am I missing?

The [&] syntax causes i to be captured by reference, so by the time a thread actually runs, i has often advanced further than you might expect. More seriously, the behaviour of your code is undefined if i goes out of scope before a thread runs.
Capturing i by value - i.e. std::thread([i](){ print_id(i); }) is the fix.

Two problems:
You have no control over when the thread runs, which means the value of the variable i in the lambda might not be what you expect.
The variable i is local to the loop and the loop only. If the loop finishes before one or more threads run, those threads will hold an invalid reference to a variable whose lifetime has ended.
You can solve both these problems very simply by capturing the variable i by value instead of by reference. That means each thread will have a copy of the value, and that copy will be made uniquely for each thread.

Another thing:
Do not expect to always get an ordered sequence (0, 1, 2, 3, ...), because multithreaded execution is inherently nondeterministic.
Nondeterminism means that running the same program under the same conditions can give different results.
This is due to the fact that the OS schedules threads differently from one execution to another depending on several parameters: CPU load, priority of other processes, possible system interruptions, etc.
Your example contains only five threads, so it's simple. Try to increase the number of threads, and for example put a sleep in the processing function. You will see that the result can be different from one execution to another.

Related

why is there a race condition in this multithreading snippet

I have this code in c++ using multithreading but I am unsure why I am getting the output I am getting.
void Fun(int* var) {
int myID;
myID = *var;
std::cout << "Thread ID: " << myID << std::endl;
}
int main()
{
using ThreadVector = std::vector<std::thread>;
ThreadVector tv;
std::cout << std::thread::hardware_concurrency() << std::endl;
for (int i = 0; i < 3 ; ++i)
{
auto th = std::thread(&Fun, &i);
tv.push_back(std::move(th));
}
for (auto& elem : tv)
{
elem.join();
}
}
I am wondering if there is a race condition for the i variable, and if so, how does it interleave? I tried to compile it and I constantly got the Thread ID printout as 3, but I was surprised because I thought the variable had to be global in order to be accessed by the various new threads?
This is what I thought would happen: thread 1 is created, Fun starts to run in thread 1 with myid = 0, main thread continues running and increments i, 2nd thread is created and the myid for that would be myid=1... and so on. And so the printout would be the myID in increments, i.e. 1,2,3
I know that I can solve this with std::lock_guard but I am just wondering how is the interleaving (LOAD, INCREMENT,STORE) happening that causes this race condition for the i variable.
Kind help is appreciated thank you!
I am wondering if there is a race condition for the i variable
Yes, most definitely. The parent thread writes to i, which is a non-atomic variable, and the child threads read it, without any intervening synchronization. That's the exact definition of a data race in C++.
and if so, how does it interleave?
Data races in C++ cause undefined behavior, and any behavior you may observe does not have to be explainable by interleaving.
I tried to compile it and I constantly got the Thread ID printout as 3, but I was surprised because I thought the variable had to be global in order to be accessed by the various new threads?
No, it doesn't have to be global. Threads can access variables which are local to other threads if they are somehow passed a pointer or reference to such a variable.
This is what I thought would happen: thread 1 is created, Fun starts to run in thread 1 with myid = 0, main thread continues running and increments i, 2nd thread is created and the myid for that would be myid=1... and so on. And so the printout would be the myID in increments, i.e. 1,2,3
Well, nothing at all in your program forces those events to occur (or become observable) in that order, so there is really no basis for expecting that they will. It's entirely possible, for instance, that the three threads all get started, but don't get a chance to actually run until after the loop in main has completed, at which point i has the value 3. (Or rather, the memory where i used to be located, as it is now out of scope and its lifetime has ended - it's a separate bug that you don't prevent that from happening.)
This is the version of the code that would not exhibit a data race:
#include <iostream>
#include <thread>
#include <vector>
// since `id` is passed by value, each thread will work on its own copy and no
// data race is possible
void fun(int id) { std::cout << "thread id: " << id << "\n"; }
int main() {
std::vector<std::thread> threads;
for (auto id = 0; id < 3; ++id) {
threads.emplace_back(fun, id);
}
for (auto& thread : threads) {
thread.join();
}
}
Since each thread receives a copy of the variable id, there is no race (except for the scrambled output due to unsynchronized std::cout, but I assume that's not part of this discussion).
Variables do not need to be global for their use in multiple threads. In fact, global variables often make it more difficult or even practically impossible to write multithreaded code, since there is no guarantee that every read and write will be appropriately synchronized.

What happens when a thread is constructed, and how is the thread executed

I'm completely new to multithreading and have a little trouble understanding how multithreading actually works.
Let's consider the following example of code. The program simply takes file names as input and counts the number of lowercase letters in them.
#include <iostream>
#include <thread>
#include <mutex>
#include <memory>
#include <vector>
#include <string>
#include <fstream>
#include <ctype.h>
class LowercaseCounter{
public:
LowercaseCounter() :
total_count(0)
{}
void count_lowercase_letters(const std::string& filename)
{
int count = 0;
std::ifstream fin(filename);
char a;
while (fin >> a)
{
if (islower(a))
{
std::lock_guard<std::mutex> guard(m);
++total_count;
}
}
}
void print_num() const
{
std::lock_guard<std::mutex> guard(m);
std::cout << total_count << std::endl;
}
private:
int total_count;
mutable std::mutex m;
};
int main(){
std::vector<std::unique_ptr<std::thread>> threads;
LowercaseCounter counter;
std::string line;
while (std::cin >> line)
{
if (line == "exit")
break;
else if (line == "print")
counter.print_num(); //I think that this should print 0 every time it's called.
else
threads.emplace_back(new std::thread(&LowercaseCounter::count_lowercase_letters, &counter, line)); // pass a pointer: the counter holds a std::mutex and cannot be copied
}
for (auto& thread : threads)
thread->join();
}
At first I thought that counter.print_num() would print 0, since the threads have not yet been 'joined' to execute their functions. However, it turns out that the program works correctly and the output of counter.print_num() is not 0. So I asked myself the following questions.
What actually happens when a thread is constructed?
If the program above works fine, then a thread must start executing when it is created, so what does the std::thread::join method do?
If the thread is executed at the time of creation, then what's the point of using multithreading in this example?
Thanks in advance.
You seem to be under the impression that the program can only be running one thread at a time, and that it needs to interrupt whatever it's doing in order to execute the code of the thread. That's not the case.
You can think of a thread as a completely separate program that happens to share memory and resources with the program that created it. The function you pass as an argument is, for all intents and purposes, that program's main(). On Linux, threads are implemented as lightweight processes that share an address space, but as far as C++ is concerned, that's just an implementation detail.
So, in a modern operating system with preemptive multitasking, much like multiple programs can run at the same time, threads can also run at the same time. Note that I say can: it's up to the OS scheduler to decide when to give CPU time to each thread.
then what does std::thread::join method do?
It just waits until the thread is done.
So what would happen if I didn't call the join() method for each of the threads
It would crash upon reaching the end of main(): destroying a std::thread object that is still joinable (neither joined nor detached) calls std::terminate.
As you said, in C++ the thread starts executing when it is created; all std::thread::join does is wait for the thread to finish execution.
In your code each thread starts executing as soon as it is created in the first loop, and then the main thread waits for each thread to finish execution in the next loop.

use std::thread and join for parallelism

I'm writing a program that iterates through all chromosomes of a FASTA file and splits each one into pieces of 10 bp. The function is called chrdata, and I save these fragments into a single file. Each chromosome can be fragmented completely independently of the other chromosomes, so I'm trying to use threads.
chrdata(faidx_t *seq_ref ,int chr_no,FILE *fp)
My goal is to make this process faster. To achieve this I have tried multithreading with std::thread.
I have tried different things.
(1) First I tried to create a thread for the first chromosome, call thread.join(), then create the next thread for the next chromosome, and so on.
(2) Then I tried to create multiple threads at once, as explained in "Simultaneous Threads in C++ using <thread>".
This is the example below.
However, as far as I understand and from what I have read, I always need to call join, otherwise I end up with "terminate called without an active exception". The issue is that there is no difference in execution time between examples (1) and (2).
As I understand it, that is because despite creating a vector of thread objects, they still have to be joined, and so the program waits for all the threads to finish. This would mean the execution is concurrent rather than parallel.
So my question is: can anyone suggest what I might change in the function below to make it faster through parallel execution?
Or is my understanding of join and concurrency wrong in this instance? I'm not completely sure why we cannot just skip the whole join part: if all the threads are done, why can't we just use detach()?
void function(const char* fastafile,FILE *fp,int thread_no) {
std::vector<std::thread> threads;
//extracting the chromosome file
faidx_t *seq_ref = NULL;
seq_ref = fai_load(fastafile);
assert(seq_ref!=NULL);
int chr_total = 10; //just the first 10 chromosomes
int chr_idx = 0;
int chr_no = 0;
while(chr_idx < chr_total){
for (chr_no; chr_no < std::min(chr_idx+thread_no,chr_total);chr_no++){
threads.push_back(std::thread(chrdata,seq_ref,chr_no,fp));
}
for (auto &th : threads) { th.join(); }
threads.clear();
chr_idx = chr_idx + thread_no;
}
}
I haven't attached main() or chrdata(), to keep the code and the question clearer.
pastebin.com/iY6u9CbH

How to maintain certain frame rate in different threads

I have two different computational tasks that have to execute at certain frequencies. One has to be performed every 1ms and the other every 13.3ms. The tasks share some data.
I am having a hard time figuring out how to schedule these tasks and how to share data between them. One way that I thought might work is to create two threads, one for each task.
The first task is relatively simpler and can be handled in 1ms itself. But, when the second task (that is relatively more time-consuming) is going to launch, it will make a copy of the data that was just used by task 1, and continue to work on them.
Do you think this would work? How can it be done in c++?
There are multiple ways to do that in C++.
One simple way is to have 2 threads, as you described. Each thread does its action and then sleeps till the next period start. A working example:
#include <functional>
#include <iostream>
#include <chrono>
#include <thread>
#include <atomic>
#include <mutex>
std::mutex mutex;
std::atomic<bool> stop = {false};
unsigned last_result = 0; // Whatever thread_1ms produces.
void thread_1ms_action() {
// Do the work.
// Update the last result.
{
std::unique_lock<std::mutex> lock(mutex);
++last_result;
}
}
void thread_1333us_action() {
// Copy thread_1ms result.
unsigned last_result_copy;
{
std::unique_lock<std::mutex> lock(mutex);
last_result_copy = last_result;
}
// Do the work.
std::cout << last_result_copy << '\n';
}
void periodic_action_thread(std::chrono::microseconds period, std::function<void()> const& action) {
auto const start = std::chrono::steady_clock::now();
while(!stop.load(std::memory_order_relaxed)) {
// Do the work.
action();
// Wait till the next period start.
auto now = std::chrono::steady_clock::now();
auto iterations = (now - start) / period;
auto next_start = start + (iterations + 1) * period;
std::this_thread::sleep_until(next_start);
}
}
int main() {
std::thread a(periodic_action_thread, std::chrono::milliseconds(1), thread_1ms_action);
std::thread b(periodic_action_thread, std::chrono::microseconds(13333), thread_1333us_action);
std::this_thread::sleep_for(std::chrono::seconds(1));
stop = true;
a.join();
b.join();
}
If an action takes longer than one period to execute, the thread sleeps till the start of the next period (skipping one or more periods). I.e. each Nth action is scheduled for start_time + N * period, so that there is no time drift regardless of how long it takes to perform the action.
All access to the shared data is protected by the mutex.
So I'm thinking that task1 needs to make the copy, because it knows when it is safe to do so. Here is one simplistic model:
Shared:
std::atomic<Result*> latestResult = {nullptr};
Task1:
Perform calculation
Result* pNewResult = new Result
Copy result to *pNewResult
pNewResult = latestResult.exchange(pNewResult) // publish the new buffer, get back the old one
if (pNewResult)
delete pNewResult; // Task2 didn't take it!
Task2:
Result* pResult = latestResult.exchange(nullptr); // take ownership; may be null
process result
delete pResult;
In this model task1 and task2 only ever contend when swapping a simple atomic pointer, which is quite painless.
Note that this makes many assumptions about your calculation. Could your task1 usefully calculate the result straight into the buffer, for example? Also note that at the start Task2 may find the pointer is still null.
Also, it inefficiently allocates the buffers with new each time. You need 3 buffers to ensure there is never any significant contention between the tasks, but you could just manage three buffer pointers under mutexes, such that Task 1 has one set of data ready and is writing another set, while Task 2 is reading from a third set.
Note that even if you have task 2 copy the buffer, Task 1 still needs 2 buffers to avoid stalls.
You can use C++ threads together with clocks such as steady_clock, as described in the previous answer, but whether this solution works depends strongly on the platform your code runs on.
1ms and 13.3ms are pretty short time intervals, and if your code is running on a non-real-time OS like Windows or non-RTOS Linux, there is no guarantee that the OS scheduler will wake up your threads at the exact times.
C++11 has the class high_resolution_clock, which should use a high-resolution timer if your platform supports one, but that still depends on the implementation of this class. A bigger problem than the timer is the C++ wait functions: neither sleep_until nor sleep_for guarantees that your thread will wake up at the specified time. Here is the quote from the C++ documentation:
sleep_for - blocks the execution of the current thread for at least the specified sleep_duration.
Fortunately, most operating systems provide special facilities, such as Windows Multimedia Timers, that you can use if your threads are not woken up at the expected times.
More details are in the related question "Precise thread sleep needed. Max 1ms error".

Why does my thread not run in background?

In the listing below, I expect that since I call t.detach() right after the line where the thread is created, the thread t will run in the background while printf("quit the main function now \n") is called, and then main will exit.
#include <cstdio>
#include <thread>
#include <iostream>
void hello3(int* i)
{
for (int j = 0; j < 100; j++)
{
*i = *i + 1;
printf("From new thread %d \n", *i);
fflush(stdout);
}
char c = getchar();
}
int main()
{
int i;
i = 0;
std::thread t(hello3, &i);
t.detach();
printf("quit the main function now \n");
fflush(stdout);
return 0;
}
However from what it prints out on the screen it is not the case. It prints
From new thread 1
From new thread 2
....
From new thread 99
quit the main function now.
It looks like the main function waits until the thread finishes before it executes the command printf("quit the main function now \n"); and exits.
Can you please explain why that is? What am I missing here?
The problem is that your thread is too fast.
It's able to print all 100 strings before main gets a chance to continue.
Try making the thread slower and you'll see the main printf before that of the thread.
It depends on your OS scheduling. The speed of your thread affects the output too: if you make the thread run longer (change 100 to 500, for example), you are more likely to see the message first.
I just executed the code and the "quit the main function now." message appeared first, like this:
quit the main function now
From new thread 1
From new thread 2
From new thread 3
From new thread 4
...
You are right about detach:
Detaches the thread represented by the object from the calling thread, allowing them to execute independently from each other.
but this does not guarantee that the message "quit the main function now" will appear first, although it's very likely.
It looks like the main function waits until the thread finishes before it executes the command printf("quit the main function now \n"); and exits.
That's because when you create a thread, it gets scheduled for execution, but the order of events across threads is no longer sequential, ordered, or deterministic. In some runs of your program, the output of hello3 will occur before quit the main function now, in some runs it'll print afterwards, and in some runs the output will be interleaved. This timing-dependent behaviour is normally referred to as a "race condition" (not to be confused with a data race, which is undefined behaviour in C++). In most (but not all) cases, the output of hello3 prints last because there's some overhead (which varies by OS and processor) in setting up a thread: in the several microseconds it takes to build the thread and ready it for execution, the printf statement in your main function has already had time to execute and flush.
If you want explicit evidence that things are running concurrently, you should add more work into the main thread before the quit statement so that it becomes unlikely that the main function will finish before the thread is ready to start executing.
There are two major problems with your code:
As soon as the main function exits, you will have undefined behaviour because the variable i, referenced by the detached thread, no longer exists.
As soon as the whole process ends after the main function returns, the detached thread will also die.