How do I create a different number of threads in C++?

In my program, I want to get the number of threads from the user. For example, if the user enters 5, I want to create 5 threads. This is only needed at the beginning of the program; I don't need to change the number of threads while it runs. So I wrote code like this:
int numberOfThread;
cout << "Enter number of threads: " ;
cin >> numberOfThread;
for(int i = 0; i < numberOfThread; i++)
{
pthread_t* mythread = new pthread_t;
pthread_create(&mythread[i],NULL, myThreadFunction, NULL);
}
for(int i = 0; i < numberOfThread; i++)
{
pthread_join(mythread[i], NULL);
}
return 0;
But I get an error on the line pthread_join(mythread[i], NULL);
error: ‘mythread’ was not declared in this scope.
What is wrong with this code?
Also, is there a better way to create a user-defined number of threads?

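First, the error itself: mythread is declared inside the body of the first for loop, so it no longer exists when the second loop tries to use it. You also have a memory leak when creating threads, because each iteration allocates a new pthread_t and then loses the pointer to it. Indexing that single-element allocation with mythread[i] is also out of bounds for i > 0.
I suggest the following: create a std::vector of std::thread objects (so don't use pthread_t at all), and then you can have something like: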
std::vector<std::thread> threads;
for (std::size_t i = 0; i < numberOfThread; i++) {
threads.emplace_back(myThreadFunction, 1);
}
for (auto& thread : threads) {
thread.join();
}
if your myThreadFunction looks like:
void myThreadFunction(int n) {
std::cout << n << std::endl; // output: 1, from several different threads
}
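For completeness, here is a minimal self-contained sketch of the same idea. The helper name run_workers and the atomic counter are assumptions added for illustration; they are not part of the question. The thread count is passed as a parameter instead of being read from cin, so the result is easy to check.

```cpp
#include <atomic>
#include <cstddef>
#include <thread>
#include <vector>

// Hypothetical helper (not from the question): starts `count` threads,
// waits for all of them, and returns how many thread bodies actually ran.
std::size_t run_workers(std::size_t count) {
    std::atomic<std::size_t> finished{0};
    std::vector<std::thread> threads;
    threads.reserve(count);
    for (std::size_t i = 0; i < count; ++i) {
        // Each thread only bumps the shared atomic counter.
        threads.emplace_back([&finished] { finished.fetch_add(1); });
    }
    // The vector outlives both loops, so join() can see every thread.
    for (auto& t : threads) {
        t.join();
    }
    return finished.load();
}
```

Calling run_workers(numberOfThread) after reading the value from cin would give the behaviour asked for; with pthreads, the equivalent fix would be to declare a std::vector<pthread_t> before the first loop.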

Related

Implementation of a lock free vector

After several searches, I cannot find a lock-free vector implementation.
There is a paper that discusses one, but nothing concrete (at least nothing that I have found): http://pirkelbauer.com/papers/opodis06.pdf
There are currently 2 threads working with the arrays; there may be more later.
One thread updates the different vectors, and another thread reads them to do calculations, etc. Each thread accesses the different arrays a large number of times per second.
I protected the different vectors with a mutex, but when the reading or writing thread takes too long to unlock, all further updates are delayed.
I then thought of copying the array each time to go faster, but copying an array of thousands of elements thousands of times per second doesn't seem great to me.
So I thought of using one mutex per value in each table, to lock only the value I am working on.
A lock-free approach might be better, but I cannot find a solution, and I wonder whether the performance would really be better.
EDIT:
I have a thread that receives data and stores it in vectors.
When I instantiate the structure, I use a fixed size.
I have to do 2 different things for the updates:
-Update vector elements. (1d vector which simulates a 2d vector)
-Add a line at the end of the vector and remove the first line. The array always remains sorted. Adding elements is much much rarer than updating
The thread that is read-only walks the array and performs calculations.
To limit the time spent on the array and do as little calculation as possible, I use arrays that store the result of my calculations. Despite this, I often have to scan the table enough to do new calculations or just update them. (the application is in real-time so the calculations to be made vary according to the requests)
When a new element is added to the vector, the reading thread must directly use it to update the calculations.
When I say calculation, it is not necessarily only arithmetic, it is more a treatment to be done.
There is no perfect way to implement concurrency; each task has its own good-enough solution. My go-to method for finding a decent implementation is to allow only what is needed, and then check whether I will need something more in the future.
You described a fairly simple scenario: one thread performs one action at a time on a shared vector, so the vector only needs to tell whether the action is allowed, and std::atomic_flag is good enough for that.
This example should give you an idea of how it works and what to expect. I simply attached a flag to each array and check it before touching the array, to see whether it is safe to do so. Some people like to add a guard around the flag, just in case.
#include <iostream>
#include <thread>
#include <atomic>
#include <chrono>
#include <cstdint>
const int vector_size = 1024;
struct Element {
void some_yield(){
std::this_thread::yield();
};
void some_wait(){
std::this_thread::sleep_for(
std::chrono::microseconds(1)
);
};
};
Element ** data;
std::atomic_flag * vector_safe;
std::atomic<bool> alive{true}; // written by main, read by both worker threads
uint32_t c_down_time = 0;
uint32_t p_down_time = 0;
uint32_t c_intinerations = 0;
uint32_t p_intinerations = 0;
std::chrono::high_resolution_clock::time_point c_time_point;
std::chrono::high_resolution_clock::time_point p_time_point;
int simple_consumer_work(){
Element a_read;
uint16_t i, e;
while (alive){
// Loop through the vectors
for (i=0; i < vector_size; i++){
// Spin until the vector at index i is free:
// test_and_set() returns the previous value,
// so true means another thread still holds the flag
while (vector_safe[i].test_and_set()){}
// Do whatever work is needed
for (e=0; e < vector_size; e++){
a_read = data[i][e];
}
// And signal that this vector is done
vector_safe[i].clear();
}
}
return 0;
};
int simple_producer_work(){
uint16_t i;
while (alive){
for (i=0; i < vector_size; i++){
while (vector_safe[i].test_and_set()){} // spin while the flag was already set
data[i][i].some_wait();
vector_safe[i].clear();
}
p_intinerations++;
}
return 0;
};
int consumer_work(){
Element a_read;
uint16_t i, e;
bool waiting;
while (alive){
for (i=0; i < vector_size; i++){
waiting = false;
c_time_point = std::chrono::high_resolution_clock::now();
while (vector_safe[i].test_and_set(std::memory_order_acquire)){ // true: flag already held
waiting = true;
}
if (waiting){
c_down_time += (uint32_t)std::chrono::duration_cast<std::chrono::nanoseconds>
(std::chrono::high_resolution_clock::now() - c_time_point).count();
}
for (e=0; e < vector_size; e++){
a_read = data[i][e];
}
vector_safe[i].clear(std::memory_order_release);
}
c_intinerations++;
}
return 0;
};
int producer_work(){
bool waiting;
uint16_t i;
while (alive){
for (i=0; i < vector_size; i++){
waiting = false;
p_time_point = std::chrono::high_resolution_clock::now();
while (vector_safe[i].test_and_set(std::memory_order_acquire)){ // true: flag already held
waiting = true;
}
if (waiting){
p_down_time += (uint32_t)std::chrono::duration_cast<std::chrono::nanoseconds>
(std::chrono::high_resolution_clock::now() - p_time_point).count();
}
data[i][i].some_wait();
vector_safe[i].clear(std::memory_order_release);
}
p_intinerations++;
}
return 0;
};
void print_time(uint32_t down_time){
if ( down_time <= 1000) {
std::cout << down_time << " [nanoseconds] \n";
} else if (down_time <= 1000000) {
std::cout << down_time / 1000 << " [microseconds] \n";
} else if (down_time <= 1000000000) {
std::cout << down_time / 1000000 << " [milliseconds] \n";
} else {
std::cout << down_time / 1000000000 << " [seconds] \n";
}
};
int main(){
std::uint16_t i;
std::thread consumer;
std::thread producer;
vector_safe = new std::atomic_flag [vector_size];
// Clear every flag explicitly: before C++20 a default-constructed
// atomic_flag has an unspecified state
for(i=0; i < vector_size; i++){
vector_safe[i].clear();
}
data = new Element * [vector_size];
for(i=0; i < vector_size; i++){
// Each row holds vector_size elements, matching the data[i][e] reads
data[i] = new Element [vector_size];
}
consumer = std::thread(consumer_work);
producer = std::thread(producer_work);
std::this_thread::sleep_for(
std::chrono::seconds(10)
);
alive = false;
producer.join();
consumer.join();
std::cout << " Consumer loops > " << c_intinerations << std::endl;
std::cout << " Consumer time lost > "; print_time(c_down_time);
std::cout << " Producer loops > " << p_intinerations << std::endl;
std::cout << " Producer time lost > "; print_time(p_down_time);
for(i=0; i < vector_size; i++){
delete [] data[i];
}
delete [] vector_safe;
delete [] data;
return 0;
}
And don't forget that the compiler can and will reorder or change portions of the code; spaghetti code is really, really buggy in multithreading.
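The flag-per-array idea boils down to a spinlock. Below is a small sketch of one built on std::atomic_flag; the class name SpinLock and the helper count_with_spinlock are hypothetical names chosen for this illustration. Note that test_and_set() returns the previous value, so the loop must keep spinning while it returns true.

```cpp
#include <atomic>
#include <mutex>
#include <thread>
#include <vector>

// Hypothetical minimal spinlock built on std::atomic_flag.
class SpinLock {
public:
    void lock() {
        // true means the flag was already set, i.e. another thread holds the lock.
        while (flag_.test_and_set(std::memory_order_acquire)) {
            std::this_thread::yield();
        }
    }
    void unlock() { flag_.clear(std::memory_order_release); }

private:
    std::atomic_flag flag_ = ATOMIC_FLAG_INIT;
};

// Increments a plain (non-atomic) counter from several threads,
// using the spinlock as the only synchronization.
long count_with_spinlock(int num_threads, int iterations) {
    SpinLock lock;
    long counter = 0;
    std::vector<std::thread> threads;
    for (int t = 0; t < num_threads; ++t) {
        threads.emplace_back([&lock, &counter, iterations] {
            for (int i = 0; i < iterations; ++i) {
                std::lock_guard<SpinLock> guard(lock);  // unlocks on scope exit
                ++counter;
            }
        });
    }
    for (auto& th : threads) {
        th.join();
    }
    return counter;
}
```

Because SpinLock exposes lock() and unlock(), it works with std::lock_guard, which is one way to get the "guard around the flag" mentioned above. Whether spinning beats a std::mutex depends on how long the critical sections are, so it is worth measuring both.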

Does std::thread library in C++ support nested threading?

I want to create nested threads in C++ using std::thread library like this.
#include<iostream>
#include<thread>
#include<vector>
using namespace std;
void innerfunc(int inp)
{
cout << inp << endl;
}
void outerfunc(int inp)
{
thread * threads = new thread[inp];
for (int i = 0; i < inp; i++)
threads[i] = thread(innerfunc, i);
for (int i = 0; i < inp; i++)
threads[i].join();
delete[] threads;
}
int main()
{
int inp = 0;
thread t1 = thread(outerfunc,2);
thread t2 = thread(outerfunc,3);
t1.join();
t2.join();
}
Can I do this safely? I am worried whether join() works correctly.
There isn't really such a thing as "nested" or "children" threads in C++; the OS models don't map directly onto C++. The model for C++ is more accurately described along the lines of threads of execution being associated with thread objects.
From the linked cppreference:
The class thread represents a single thread of execution.
thread objects can be moved (std::move) around as required; it really is more an issue of ownership and who needs to join() the thread object before it goes out of scope.
In answer to the questions:
Can I do this safely?
Yes. Threads of execution (and their associated thread objects) can be created in "nested" threads and be successfully executed.
I am worried whether join() works correctly.
Yes it will. This is related to the "ownership" of the thread. So long as the thread of execution is joined before the thread object goes out of scope, it will work as you expect.
On a side note: I'm sure the innerfunc is for demonstration only, but cout will probably not synchronize as expected. Output from different threads can interleave, so it may look "garbled".
Everything works perfectly! Just add a lock for all the 'cout' statements. Otherwise, the values will get garbled.
mutex m;
void innerfunc(int inp)
{
m.lock();
cout <<"Innerfunc triggered " << inp << endl;
m.unlock();
}
void outerfunc(int inp)
{
m.lock();
cout <<"Outerfunc triggered " << inp << endl;
m.unlock();
thread * threads = new thread[inp];
for (int i = 0; i < inp; i++)
threads[i] = thread(innerfunc, i);
for (int i = 0; i < inp; i++)
threads[i].join();
delete[] threads;
}
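A variant of the same code using std::lock_guard and std::vector<std::thread>, so that neither the mutex nor the thread array has to be released by hand. This is only a sketch: the names log_line, log_lines and run_nested are hypothetical, and the lines are collected in a vector instead of being written to cout so that the result can be checked.

```cpp
#include <cstddef>
#include <mutex>
#include <string>
#include <thread>
#include <vector>

std::mutex log_mutex;                // protects log_lines
std::vector<std::string> log_lines;  // stands in for std::cout in this sketch

void log_line(const std::string& text) {
    std::lock_guard<std::mutex> lock(log_mutex);  // released automatically
    log_lines.push_back(text);
}

void innerfunc(int inp) {
    log_line("Innerfunc triggered " + std::to_string(inp));
}

void outerfunc(int inp) {
    log_line("Outerfunc triggered " + std::to_string(inp));
    std::vector<std::thread> threads;
    for (int i = 0; i < inp; ++i) {
        threads.emplace_back(innerfunc, i);
    }
    // Each outer thread joins the inner threads it created.
    for (auto& t : threads) {
        t.join();
    }
}

// Runs the same two outer threads as the question and
// returns how many lines were logged in total.
std::size_t run_nested() {
    log_lines.clear();
    std::thread t1(outerfunc, 2);
    std::thread t2(outerfunc, 3);
    t1.join();
    t2.join();
    return log_lines.size();
}
```

With outer arguments 2 and 3 there are two outer lines and five inner lines; their relative order can differ from run to run.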

Fill an array from different threads concurrently c++

First of all, I think it is important to say that I am new to multithreading and know very little about it. I was trying to write some programs in C++ using threads and ran into a problem (question) that I will try to explain to you now:
I wanted to use several threads to fill an array, here is my code:
static const int num_threads = 5;
int A[50], n;
//------------------------------------------------------------
void ThreadFunc(int tid)
{
for (int q = 0; q < 5; q++)
{
A[n] = tid;
n++;
}
}
//------------------------------------------------------------
int main()
{
thread t[num_threads];
n = 0;
for (int i = 0; i < num_threads; i++)
{
t[i] = thread(ThreadFunc, i);
}
for (int i = 0; i < num_threads; i++)
{
t[i].join();
}
for (int i = 0; i < n; i++)
cout << A[i] << endl;
return 0;
}
As a result of this program I get:
0
0
0
0
0
1
1
1
1
1
2
2
2
2
2
and so on.
As I understand, the second thread starts writing elements to an array only when the first thread finishes writing all elements to an array.
The question is: why don't the threads work concurrently? I mean, why don't I get something like this:
0
1
2
0
3
1
4
and so on.
Is there any way to solve this problem?
Thank you in advance.
Since n is accessed from more than one thread, those accesses need to be synchronized so that changes made in one thread don't conflict with changes made in another. There are (at least) two ways to do this.
First, you can make n an atomic variable. Just change its definition, and do the increment where the value is used:
std::atomic<int> n;
...
A[n++] = tid;
Or you can wrap all the accesses inside a critical section:
std::mutex mtx;
int next_n() {
std::unique_lock<std::mutex> lock(mtx);
return n++;
}
And in each thread, instead of directly incrementing n, call that function:
A[next_n()] = tid;
This is much slower than the atomic access, so not appropriate here. In more complex situations it will be the right solution.
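As a rough, self-contained sketch of the atomic variant (the function name fill_with_atomic_index and the constants are assumptions made for this example, not names from the question):

```cpp
#include <array>
#include <atomic>
#include <thread>
#include <vector>

constexpr int kThreads = 5;    // same counts as in the question
constexpr int kPerThread = 5;

// Every thread claims slots through one shared atomic index.
std::array<int, kThreads * kPerThread> fill_with_atomic_index() {
    std::array<int, kThreads * kPerThread> a{};
    a.fill(-1);
    std::atomic<int> next{0};
    std::vector<std::thread> threads;
    for (int tid = 0; tid < kThreads; ++tid) {
        threads.emplace_back([&a, &next, tid] {
            for (int q = 0; q < kPerThread; ++q) {
                // fetch_add returns the old value, so each call
                // hands out a distinct index to exactly one thread.
                a[next.fetch_add(1)] = tid;
            }
        });
    }
    for (auto& t : threads) {
        t.join();
    }
    return a;
}
```

Every slot is written exactly once and each thread id appears five times, but which slots a given thread gets is not deterministic.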
The worker function is so short, i.e., finishes executing so quickly, that it's possible that each thread is completing before the next one even starts. Also, you may need to link with a thread library to get real threads, e.g., -lpthread. Even with that, the results you're getting are purely by chance and could appear in any order.
There are two corrections you need to make for your program to be properly synchronized. Change:
int n;
// ...
A[n] = tid; n++;
to
std::atomic_int n;
// ...
A[n++] = tid;
Often it's preferable to avoid synchronization issues altogether and split the workload across threads. Since the work done per iteration is the same here, it's as easy as dividing the work evenly:
void ThreadFunc(int tid, int first, int last)
{
for (int i = first; i < last; i++)
A[i] = tid;
}
Inside main, modify the thread create loop:
for (int first = 0, i = 0; i < num_threads; i++) {
// num_threads may not evenly divide the array size; the last thread takes the remainder.
int last = (i != num_threads-1) ? std::size(A)/num_threads*(i+1) : std::size(A);
t[i] = thread(ThreadFunc, i, first, last);
first = last;
}
Of course by doing this, even though the array may be written out of order, the values will be stored to the same locations every time.
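Put together as one runnable sketch: the function name fill_by_ranges and the use of std::vector instead of the global array are assumptions made to keep the example self-contained. No atomics or mutexes are needed because the index ranges never overlap.

```cpp
#include <cstddef>
#include <thread>
#include <vector>

// Splits [0, size) into num_threads contiguous ranges;
// thread tid writes its own id into its range only.
std::vector<int> fill_by_ranges(std::size_t size, int num_threads) {
    std::vector<int> a(size, -1);
    std::vector<std::thread> threads;
    std::size_t first = 0;
    for (int tid = 0; tid < num_threads; ++tid) {
        // The last thread also takes whatever is left after integer division.
        std::size_t last = (tid != num_threads - 1)
                               ? size / num_threads * (tid + 1)
                               : size;
        threads.emplace_back([&a, tid, first, last] {
            for (std::size_t i = first; i < last; ++i) {
                a[i] = tid;
            }
        });
        first = last;
    }
    for (auto& t : threads) {
        t.join();
    }
    return a;
}
```

For example, with 52 elements and 5 threads, the first four threads get 10 elements each and the last one gets 12.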

threading program in C++ not faster

I have a program which reads a file line by line and then stores each possible substring of length 50 in a hash table along with its frequency. I tried to use threads so that the program reads 5 lines and then uses five different threads to do the processing. The processing involves reading each substring of a line and putting it into the hash map with its frequency. But something seems to be wrong that I cannot figure out, because the program is not faster than the serial approach. Also, for a large input file it aborts. Here is the piece of code I am using:
unordered_map<string, int> m;
mutex mtx;
void parseLine(char *line, int subLen){
int no_substr = strlen(line) - subLen;
for(int i = 0; i <= no_substr; i++) {
char *subStr = (char*) malloc(sizeof(char)* subLen + 1);
strncpy(subStr, line+i, subLen);
subStr[subLen]='\0';
mtx.lock();
string s(subStr);
if(m.find(s) != m.end()) m[s]++;
else {
pair<string, int> ret(s, 1);
m.insert(ret);
}
mtx.unlock();
}
}
int main(){
char **Array = (char **) malloc(sizeof(char *) * num_thread +1);
int num = 0;
while (NOT END OF FILE) {
if(num < num_th) {
if(num == 0)
for(int x = 0; x < num_th; x++)
Array[x] = (char*) malloc(sizeof(char)*strlen(line)+1);
strcpy(Array[num], line);
num++;
}
else {
vector<thread> threads;
for(int i = 0; i < num_th; i++) {
threads.push_back(thread(parseLine, Array[i]);
}
for(int i = 0; i < num_th; i++){
if(threads[i].joinable()) {
threads[i].join();
}
}
for(int x = 0; x < num_th; x++) free(seqArray[x]);
num = 0;
}
}
}
It's a myth that just by the virtue of using threads, the end result must be faster. In general, in order to take advantage of multithreading, two conditions must be met(*):
1) You actually have to have sufficient physical CPU cores, that can run the threads at the same time.
2) The threads have independent tasks to do, that they can do on their own.
From a cursory examination of the shown code, it seems to fail on the second part. It seems to me that, most of the time all of these threads will be fighting each other in order to acquire the same mutex. There's little to be gained from multithreading, in this situation.
(*) Of course, you don't always use threads purely for performance reasons. Multithreading is also useful in many other situations. For example, in a program with a GUI, having a separate thread update the GUI keeps the UI responsive even while the main execution thread is chewing on something for a while.
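One common way to give the threads independent work is to let each thread count into its own private map and merge the maps once at the end, so that no mutex is needed in the hot loop. This is a different technique from the code in the question, shown only as a sketch; the names Counts, count_substrings and count_parallel are hypothetical, and the lines are passed in as a vector instead of being read from a file.

```cpp
#include <cstddef>
#include <string>
#include <thread>
#include <unordered_map>
#include <vector>

using Counts = std::unordered_map<std::string, int>;

// Counts every substring of length sub_len of one line into `out`.
void count_substrings(const std::string& line, std::size_t sub_len, Counts& out) {
    for (std::size_t i = 0; i + sub_len <= line.size(); ++i) {
        ++out[line.substr(i, sub_len)];
    }
}

Counts count_parallel(const std::vector<std::string>& lines,
                      std::size_t sub_len, std::size_t num_threads) {
    std::vector<Counts> partial(num_threads);  // one private map per thread
    std::vector<std::thread> threads;
    for (std::size_t t = 0; t < num_threads; ++t) {
        threads.emplace_back([&, t] {
            // Thread t handles lines t, t + num_threads, t + 2 * num_threads, ...
            for (std::size_t i = t; i < lines.size(); i += num_threads) {
                count_substrings(lines[i], sub_len, partial[t]);
            }
        });
    }
    for (auto& th : threads) {
        th.join();
    }
    // Single-threaded merge after all workers have finished.
    Counts total;
    for (const auto& p : partial) {
        for (const auto& kv : p) {
            total[kv.first] += kv.second;
        }
    }
    return total;
}
```

Whether this ends up faster than the serial version still depends on the input size and the number of cores; the merge step and the extra memory for the private maps are the price paid for removing the lock.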

How to launch multiple operations in a loop using multithreading c++

Introduction
I am trying to launch 4 functions in parallel (func1, func2, func3 and func4) on a 6-core machine. Every function iterates 1000 times and fills the vector entities. The main function is do_operations(). I have two versions of do_operations(), which I posted in the Source code section.
Problem
By using the first version I get the following error:
std::system_error'
what(): Resource temporarily unavailable
To solve that problem, I added a condition in version 2: if the number of threads is equal to 6, which is the number of cores I have, I join the threads and clear vector<thread> threads.
Am I writing the threading function correctly? What am I doing wrong?
Source code
void my_class::func1(std::vector<std::string> & entities)
{
for(int i = 0; i < 1000;i++)
{
mtx.lock();
entities.push_back(to_string(i));
mtx.unlock();
}
}
void my_class::func2(std::vector<std::string> & entities)
{
for(int i = 0; i < 1000;i++)
{
mtx.lock();
entities.push_back(to_string(i));
mtx.unlock();
}
}
void my_class::func3(std::vector<std::string> & entities)
{
for(int i = 0; i < 1000;i++)
{
mtx.lock();
entities.push_back(to_string(i));
mtx.unlock();
}
}
void my_class::func4(std::vector<std::string> & entities)
{
for(int i = 0; i < 1000;i++)
{
mtx.lock();
entities.push_back(to_string(i));
mtx.unlock();
}
}
Version 1
void my_class::do_operations()
{
//get number of CPUS
int concurentThreadsSupported = std::thread::hardware_concurrency();
std::vector<std::thread> threads;
std::vector<std::string> entities;
for(int i =0; i < 1000; i++)
{
threads.push_back(std::thread(&my_class::func1, this, ref(entities)));
threads.push_back(std::thread(&my_class::func2, this, ref(entities)));
threads.push_back(std::thread(&my_class::func3, this, ref(entities)));
threads.push_back(std::thread(&my_class::func4, this, ref(entities)));
}
for(auto &t : threads){ t.join(); }
threads.clear();
}
Version 2
void my_class::do_operations()
{
//get number of CPUS
int concurentThreadsSupported = std::thread::hardware_concurrency();
std::vector<std::thread> threads;
std::vector<std::string> entities;
for(int i =0; i < 1000; i++)
{
threads.push_back(std::thread(&my_class::func1, this, ref(entities)));
threads.push_back(std::thread(&my_class::func2, this, ref(entities)));
threads.push_back(std::thread(&my_class::func3, this, ref(entities)));
threads.push_back(std::thread(&my_class::func4, this, ref(entities)));
if((threads.size() == concurentThreadsSupported) || (i == 999))
{
for(auto &t : threads){ t.join(); }
threads.clear();
}
}
}
You are launching 4000 threads in total. If every thread gets 1 MB of stack space, then the thread stacks alone will occupy 4000 MB of address space. My assumption is that you do not have the full 4 GB of 32-bit address space available to applications (something must be left for the kernel and hardware). My second assumption is that when there is not enough space to allocate a new stack, thread creation fails with the message you are seeing.
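Note also that in Version 2 the vector grows in steps of 4, so if hardware_concurrency() returns 6 the size check never matches and all 4000 threads are still created before the join at i == 999. If the intent is simply to run the four functions in parallel once, each doing its own 1000 iterations (an assumption about what the question wants), a sketch could look like the one below. The free function fill is a hypothetical stand-in for func1 to func4, and do_operations returns the vector size here only so the result can be checked.

```cpp
#include <cstddef>
#include <functional>
#include <mutex>
#include <string>
#include <thread>
#include <vector>

std::mutex mtx;  // protects the shared entities vector

// Hypothetical stand-in for func1..func4: appends 1000 strings.
void fill(std::vector<std::string>& entities) {
    for (int i = 0; i < 1000; ++i) {
        std::lock_guard<std::mutex> lock(mtx);
        entities.push_back(std::to_string(i));
    }
}

// Starts exactly four threads, one per function, and waits for them.
std::size_t do_operations() {
    std::vector<std::string> entities;
    std::vector<std::thread> threads;
    for (int t = 0; t < 4; ++t) {
        // std::ref is required: std::thread copies its arguments by default.
        threads.emplace_back(fill, std::ref(entities));
    }
    for (auto& t : threads) {
        t.join();
    }
    return entities.size();
}
```

Only four threads ever exist at the same time, so the stack-space limit described above is not reached.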