Introduction
I am trying to launch 4 functions in parallel: func1, func2, func3 and func4 on a 6-core machine. Each function iterates 1000 times and fills the vector entities. The main function is do_operations(). I have two versions of do_operations(), which I posted in the Source code section.
Problem
By using the first version I get the following error:
std::system_error'
what(): Resource temporarily unavailable
To solve that problem, I added a condition in version 2: if the number of threads is equal to 6, which is the number of cores I have, then I join the threads and clear vector<thread> threads.
Am I writing the threading function correctly? What am I doing wrong?
Source code
void my_class::func1(std::vector<std::string> & entities)
{
    for(int i = 0; i < 1000; i++)
    {
        mtx.lock();
        entities.push_back(to_string(i));
        mtx.unlock();
    }
}

void my_class::func2(std::vector<std::string> & entities)
{
    for(int i = 0; i < 1000; i++)
    {
        mtx.lock();
        entities.push_back(to_string(i));
        mtx.unlock();
    }
}

void my_class::func3(std::vector<std::string> & entities)
{
    for(int i = 0; i < 1000; i++)
    {
        mtx.lock();
        entities.push_back(to_string(i));
        mtx.unlock();
    }
}

void my_class::func4(std::vector<std::string> & entities)
{
    for(int i = 0; i < 1000; i++)
    {
        mtx.lock();
        entities.push_back(to_string(i));
        mtx.unlock();
    }
}
Version 1
void my_class::do_operations()
{
    //get number of CPUs
    int concurentThreadsSupported = std::thread::hardware_concurrency();
    std::vector<std::thread> threads;
    std::vector<std::string> entities;

    for(int i = 0; i < 1000; i++)
    {
        threads.push_back(std::thread(&my_class::func1, this, ref(entities)));
        threads.push_back(std::thread(&my_class::func2, this, ref(entities)));
        threads.push_back(std::thread(&my_class::func3, this, ref(entities)));
        threads.push_back(std::thread(&my_class::func4, this, ref(entities)));
    }

    for(auto &t : threads){ t.join(); }
    threads.clear();
}
Version 2
void my_class::do_operations()
{
    //get number of CPUs
    int concurentThreadsSupported = std::thread::hardware_concurrency();
    std::vector<std::thread> threads;
    std::vector<std::string> entities;

    for(int i = 0; i < 1000; i++)
    {
        threads.push_back(std::thread(&my_class::func1, this, ref(entities)));
        threads.push_back(std::thread(&my_class::func2, this, ref(entities)));
        threads.push_back(std::thread(&my_class::func3, this, ref(entities)));
        threads.push_back(std::thread(&my_class::func4, this, ref(entities)));

        if((threads.size() == concurentThreadsSupported) || (i == 999))
        {
            for(auto &t : threads){ t.join(); }
            threads.clear();
        }
    }
}
You are launching 4000 threads in total. If every thread gets 1 MB of stack space, then the threads alone will occupy 4000 MB of address space. My assumption is that you do not have the full 4 GB of 32-bit address space reserved for applications (something must be left for the kernel and hardware). My second assumption is that when there is not enough space to allocate a new stack, thread creation fails with the message you are seeing.
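If the intent is that each function runs its 1000 iterations once, you don't need 4000 threads at all. A minimal sketch (reusing the class, mutex and member functions from the question) launches each function exactly once and joins:

void my_class::do_operations()
{
    std::vector<std::thread> threads;
    std::vector<std::string> entities;

    // One thread per function: at most 4 threads alive at any time.
    threads.emplace_back(&my_class::func1, this, std::ref(entities));
    threads.emplace_back(&my_class::func2, this, std::ref(entities));
    threads.emplace_back(&my_class::func3, this, std::ref(entities));
    threads.emplace_back(&my_class::func4, this, std::ref(entities));

    for(auto &t : threads){ t.join(); }
}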
Related
I have some class objects and want to hand them over to several threads. The number of threads is given by the command line.
When I write it the following way, it works fine:
thread t1(thread(thread(tasks[0], parts[0])));
thread t2(thread(thread(tasks[1], parts[1])));
thread t3(thread(thread(tasks[2], parts[2])));
thread t4(thread(thread(tasks[3], parts[3])));
t1.join();
t2.join();
t3.join();
t4.join();
But as I mentioned, the number of threads shall be given by the command line, so it must be more dynamic. I tried the following code, which doesn't work, and I have no idea what is wrong with it:
for(size_t i=0; i < threads.size(); i++) {
    threads.push_back(thread(tasks[i], parts[i]));
}
for(auto &t : threads) {
    t.join();
}
I hope someone has an idea on how to correct it.
In this statement:
thread t1(thread(thread(tasks[0], parts[0])));
You don't need to move a thread into another thread and then move that into another thread. Just pass your task parameters directly to t1's constructor:
thread t1(tasks[0], parts[0]);
Same with t2, t3, and t4.
As for your loop:
for(size_t i=0; i < threads.size(); i++) {
    threads.push_back(thread(tasks[i], parts[i]));
}
Assuming you are using std::vector<std::thread> threads, your loop is populating threads incorrectly. At best, if threads is initially empty, the loop does nothing at all, because i < threads.size() is already false when size() == 0. At worst, if threads is not initially empty, every call to threads.push_back() increases threads.size(), so i < threads.size() never becomes false and the loop keeps pushing more and more threads into threads until memory blows up.
Try something more like this instead:
size_t numThreads = ...; // taken from cmd line...
std::vector<std::thread> threads(numThreads);
for(size_t i = 0; i < numThreads; i++) {
    threads[i] = std::thread(tasks[i], parts[i]);
}
for(auto &t : threads) {
    t.join();
}
Or this:
size_t numThreads = ...; // taken from cmd line...
std::vector<std::thread> threads;
threads.reserve(numThreads);
for(size_t i = 0; i < numThreads; i++) {
    threads.emplace_back(tasks[i], parts[i]);
}
for(auto &t : threads) {
    t.join();
}
Threads are not copyable; try this:
threads.emplace_back(std::thread(task));
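If task is directly callable, the temporary std::thread is not even needed; emplace_back can forward the constructor arguments itself (a small sketch using the same names as above):

threads.emplace_back(task);                 // constructs the std::thread in place
threads.emplace_back(tasks[i], parts[i]);   // same idea when the task takes an argument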
Emplace back thread on vector
In my program, I want to get the number of threads from the user. For example, if the user enters 5, I want to create 5 threads. This is only needed at the beginning of the program; I don't need to change the number of threads while it runs. So I wrote code such as:
int numberOfThread;
cout << "Enter number of threads: " ;
cin >> numberOfThread;
for(int i = 0; i < numberOfThread; i++)
{
    pthread_t* mythread = new pthread_t;
    pthread_create(&mythread[i], NULL, myThreadFunction, NULL);
}
for(int i = 0; i < numberOfThread; i++)
{
    pthread_join(mythread[i], NULL);
}
return 0;
but I have an error on this line: pthread_join(mythread[i], NULL);
error: ‘mythread’ was not declared in this scope.
What is wrong in this code?
And do you have a better idea for creating a user-defined number of threads?
First, you have a memory leak when creating threads, because you allocate memory but then lose the reference to it.
I suggest the following: create a std::vector of std::thread (so, don't use pthread_t at all), and then you can have something like:
std::vector<std::thread> threads;
for (std::size_t i = 0; i < numberOfThread; i++) {
    threads.emplace_back(myThreadFunction, 1);
}
for (auto& thread : threads) {
    thread.join();
}
if your myThreadFunction looks like:
void myThreadFunction(int n) {
    std::cout << n << std::endl; // output: 1, from several different threads
}
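If you want to stay with pthreads instead, the scope error and the leak both disappear once the handles live in a single container declared before both loops; a rough sketch, assuming myThreadFunction has the usual void* myThreadFunction(void*) signature:

std::vector<pthread_t> mythreads(numberOfThread);   // all handles outlive both loops

for (int i = 0; i < numberOfThread; i++) {
    pthread_create(&mythreads[i], NULL, myThreadFunction, NULL);
}
for (int i = 0; i < numberOfThread; i++) {
    pthread_join(mythreads[i], NULL);
}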
I want to learn how to adapt some multithreading pseudocode I have, line by line, to C++. I understand the pseudocode, but I am not very experienced with C++ or with std::thread.
This is the pseudocode I have and that I've often used:
myFunction
{
    int threadNr=previous;
    int numberProcs = countProcessors();

    // Every thread calculates a different line
    for (y = y_start+threadNr; y < y_end; y+=numberProcs) {
        // Horizontal lines
        for (int x = x_start; x < x_end; x++) {
            psetp(x,y,RGB(255,128,0));
        }
    }
}
int numberProcs = countProcessors();

// Launch threads: e.g. for 1 processor launch no other thread, for 2 processors launch 1 thread, for 4 processors launch 3 threads
for (i=0; i<numberProcs-1; i++)
    triggerThread(50,FME_CUSTOMEVENT,i);            //The last parameter is the thread number
triggerEvent(50,FME_CUSTOMEVENT,numberProcs-1);     //The last thread used for progress

// Wait for all threads to finish
waitForThread(0,0xffffffff,-1);
I know I can call my current function using one thread via std::thread like this:
std::thread t1(FilterImage,&size_param, cdepth, in_data, input_worldP, output_worldP);
t1.join();
But this is not efficient, as it is calling the entire function over and over again per thread.
I would expect every processor to tackle a horizontal line on its own.
Any example code would be highly appreciated, as I tend to learn best through example.
Invoking thread::join() forces the calling thread to wait for the child thread to finish executing. For example, if I use it to create a number of threads in a loop, and call join() on each one, it'll be the same as though everything happened in sequence.
Here's an example. I have two methods that print out the numbers 1 through n. The first one does it single threaded, and the second one joins each thread as they're created. Both have the same output, but the threaded one is slower because you're waiting for each thread to finish before starting the next one.
#include <iostream>
#include <thread>
void printN_nothreads(int n) {
    for(int i = 0; i < n; i++) {
        std::cout << i << "\n";
    }
}

void printN_threaded(int n) {
    for(int i = 0; i < n; i++) {
        std::thread t([=](){ std::cout << i << "\n"; });
        t.join(); //This forces synchronization
    }
}
Doing threading better.
To gain benefit from using threads, you have to start all the threads before joining them. In addition, to avoid false sharing, each thread should work on a separate region of the image (ideally a section that's far away in memory).
Let's look at how this would work. I don't know what library you're using, so instead I'm going to show you how to write a multi-threaded transform on a vector.
auto transform_section = [](auto func, auto begin, auto end) {
    for(; begin != end; ++begin) {
        func(*begin);
    }
};
This transform_section function will be called once per thread, each on a different section of the vector. Let's write transform so it's multithreaded.
template<class Func, class T>
void transform(Func func, std::vector<T>& data, int num_threads) {
    size_t size = data.size();

    auto section_start = [size, num_threads](int thread_index) {
        return size * thread_index / num_threads;
    };
    auto section_end = [size, num_threads](int thread_index) {
        return size * (thread_index + 1) / num_threads;
    };

    std::vector<std::thread> threads(num_threads);

    // Each thread works on a different section
    for(int i = 0; i < num_threads; i++) {
        T* start = data.data() + section_start(i);
        T* end   = data.data() + section_end(i);   // data.data() + size is a valid one-past-the-end pointer
        threads[i] = std::thread(transform_section, func, start, end);
    }

    // We only join AFTER all the threads are started
    for(std::thread& t : threads) {
        t.join();
    }
}
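For example, doubling every element of a vector on four threads might look like this (a usage sketch only; the vector contents and the lambda are made up for illustration):

#include <vector>

int main() {
    std::vector<int> data(1000000, 1);

    // Double every element; each of the four threads works on its own quarter of the vector.
    transform([](int& x) { x *= 2; }, data, 4);
}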
I have a program which reads a file line by line and then stores each possible substring of length 50 in a hash table along with its frequency. I tried to use threads in my program so that it reads 5 lines and then uses five different threads to do the processing. The processing involves reading each substring of a line and putting it into the hash map with its frequency. But there seems to be something wrong which I could not figure out: the program is not faster than the serial approach, and for a large input file it aborts. Here is the piece of code I am using:
unordered_map<string, int> m;
mutex mtx;
void parseLine(char *line, int subLen){
    int no_substr = strlen(line) - subLen;
    for(int i = 0; i <= no_substr; i++) {
        char *subStr = (char*) malloc(sizeof(char) * subLen + 1);
        strncpy(subStr, line+i, subLen);
        subStr[subLen] = '\0';

        mtx.lock();
        string s(subStr);
        if(m.find(s) != m.end()) m[s]++;
        else {
            pair<string, int> ret(s, 1);
            m.insert(ret);
        }
        mtx.unlock();
    }
}
int main(){
    char **Array = (char **) malloc(sizeof(char *) * num_th + 1);
    int num = 0;

    while (NOT END OF FILE) {
        if(num < num_th) {
            if(num == 0)
                for(int x = 0; x < num_th; x++)
                    Array[x] = (char*) malloc(sizeof(char) * strlen(line) + 1);
            strcpy(Array[num], line);
            num++;
        }
        else {
            vector<thread> threads;
            for(int i = 0; i < num_th; i++) {
                threads.push_back(thread(parseLine, Array[i]));
            }
            for(int i = 0; i < num_th; i++){
                if(threads[i].joinable()) {
                    threads[i].join();
                }
            }
            for(int x = 0; x < num_th; x++) free(Array[x]);
            num = 0;
        }
    }
}
It's a myth that just by virtue of using threads, the end result must be faster. In general, in order to take advantage of multithreading, two conditions must be met(*):
1) You actually have to have sufficient physical CPU cores, that can run the threads at the same time.
2) The threads have independent tasks to do, that they can do on their own.
From a cursory examination of the shown code, it seems to fail on the second part. It seems to me that, most of the time, all of these threads will be fighting each other to acquire the same mutex. There's little to be gained from multithreading in this situation.
(*) Of course, you don't always use threads for purely performance reasons. Multithreading also comes in useful in many other situations; for example, in a program with a GUI, having a separate thread update the GUI keeps the UI responsive even while the main execution thread is chewing on something for a while...
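One way to give the threads genuinely independent work here is to let each thread count substrings in its own local map and merge into the shared map only once per line; a rough sketch of that idea (it reuses m and mtx from the question, and parseLineLocal is just an illustrative name):

void parseLineLocal(const char *line, int subLen){
    std::unordered_map<std::string, int> local;      // private to this thread, no locking needed
    int no_substr = (int)strlen(line) - subLen;
    for(int i = 0; i <= no_substr; i++) {
        local[std::string(line + i, subLen)]++;      // count the substring locally
    }

    std::lock_guard<std::mutex> lk(mtx);             // lock once per line instead of once per substring
    for(const auto &kv : local) {
        m[kv.first] += kv.second;
    }
}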
I'm trying to write some code that creates threads that can modify different parts of memory concurrently. I read that a mutex is usually used to lock code, but I'm not sure if I can use that in my situation. Example:
using namespace std;
mutex m;
void func(vector<vector<int> > &a, int b)
{
    lock_guard<mutex> lk(m);
    for (int i = 0; i < 10E6; i++) { a[b].push_back(1); }
}
int main()
{
    vector<thread> threads;
    vector<vector<int> > ints(4);

    for (int i = 0; i < 10; i++)
    {
        threads.push_back(thread(func, ref(ints), i % 4));
    }
    for (int i = 0; i < 10; i++) { threads[i].join(); }

    return 0;
}
Currently, the mutex just locks the code inside func, so (I believe) every thread has to wait until the previous one is finished.
I'm trying to get the program to edit the 4 vectors of ints at the same time, but still have a thread wait when some other thread is editing the vector it needs before it starts.
I think you want the following (one std::mutex per std::vector<int>):
std::mutex m[4];
void func(std::vector<std::vector<int> > &a, int index)
{
    std::lock_guard<std::mutex> lock(m[index]);
    for (int i = 0; i < 10E6; i++) {
        a[index].push_back(1);
    }
}
Have you considered using a semaphore instead of a mutex?
The following questions might help you:
Semaphore Vs Mutex
When should we use mutex and when should we use semaphore
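For completeness, a C++20 binary semaphore can stand in for the mutex in func; in this particular code it behaves just like the lock it replaces, and the links above cover when a semaphore actually buys you something more. A sketch only:

#include <semaphore>
#include <vector>

std::binary_semaphore sem(1);   // starts available, so it acts like an unlocked mutex

void func(std::vector<std::vector<int> > &a, int b)
{
    sem.acquire();              // take the "lock" for the whole loop, like the original lock_guard
    for (int i = 0; i < 10E6; i++) {
        a[b].push_back(1);
    }
    sem.release();              // give it back
}

Note that, unlike lock_guard, acquire/release do not release automatically if an exception is thrown, which is one practical difference between the two.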
try:
void func(vector<vector<int> > &a, int b)
{
    for (int i = 0; i < 10E6; i++) {
        lock_guard<mutex> lk(m);
        a[b].push_back(1);
    }
}
You only need to lock your mutex while accessing the shared object (a). The way you implemented func means that one thread must finish running the entire loop before the next can start running.