Locking a part of memory for multithreading - c++

I'm trying to write some code that creates threads that can modify different parts of memory concurrently. I read that a mutex is usually used to lock code, but I'm not sure if I can use that in my situation. Example:
using namespace std;
mutex m;
void func(vector<vector<int> > &a, int b)
{
lock_guard<mutex> lk(m);
for (int i = 0; i < 10E6; i++) { a[b].push_back(1); }
}
int main()
{
vector<thread> threads;
vector<vector<int> > ints(4);
for (int i = 0; i < 10; i++)
{
threads.push_back(thread (func, ref(ints), i % 4));
}
for (int i = 0; i < 10; i++) { threads[i].join(); }
return 0;
}
Currently, the mutex locks all of the code inside func, so (I believe) every thread has to wait until the previous one is finished.
I'm trying to get the program to edit the 4 vectors of ints at the same time, while still making a thread wait if some other thread is editing the vector it wants before it starts on that vector.

I think you want the following (one std::mutex per std::vector<int>):
std::mutex m[4];
void func(std::vector<std::vector<int> > &a, int index)
{
std::lock_guard<std::mutex> lock(m[index]);
for (int i = 0; i < 10E6; i++) {
a[index].push_back(1);
}
}
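For reference, here is a minimal self-contained sketch of the per-vector mutex idea. The names fill_bucket and run_workers are hypothetical, and the mutexes live in a std::vector instead of a global array so the number of vectors is not hard-coded. Threads that target different vectors can run concurrently, and threads that target the same vector take turns.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <mutex>
#include <thread>
#include <vector>

// Hypothetical helper: appends `count` ones to buckets[index] while holding
// only the mutex that belongs to that bucket.
void fill_bucket(std::vector<std::vector<int>>& buckets,
                 std::vector<std::mutex>& locks,
                 std::size_t index, int count) {
    assert(index < buckets.size() && buckets.size() == locks.size());
    std::lock_guard<std::mutex> guard(locks[index]);
    for (int i = 0; i < count; ++i) {
        buckets[index].push_back(1);
    }
}

// Hypothetical driver: worker i targets bucket i % num_buckets, as in the question.
std::vector<std::vector<int>> run_workers(int num_threads, std::size_t num_buckets, int count) {
    assert(num_buckets > 0);
    std::vector<std::vector<int>> buckets(num_buckets);
    std::vector<std::mutex> locks(num_buckets);  // one mutex per inner vector
    std::vector<std::thread> threads;
    for (int i = 0; i < num_threads; ++i) {
        threads.emplace_back(fill_bucket, std::ref(buckets), std::ref(locks),
                             static_cast<std::size_t>(i) % num_buckets, count);
    }
    for (auto& t : threads) {
        t.join();
    }
    return buckets;
}
```

Calling run_workers(10, 4, 1000) mirrors the question's setup with a smaller loop count: buckets 0 and 1 are each filled by three threads, and buckets 2 and 3 by two threads each.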

Have you considered using a semaphore instead of a mutex?
The following questions might help you:
Semaphore Vs Mutex
When should we use mutex and when should we use semaphore

try:
void func(vector<vector<int> > &a, int b)
{
for (int i=0; i<10E6; i++) {
lock_guard<mutex> lk(m);
a[b].push_back(1);
}
}
You only need to lock your mutex while accessing the shared object (a). The way you implemented func means that one thread must finish running the entire loop before the next can start running.
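A related variation, offered only as a sketch and not as what the answer above does: if locking on every push_back turns out to be too costly, each thread can build its values in a local vector without any lock and then merge them into the shared vector under a single short lock. The names append_batch and run_appenders are hypothetical, and the example uses one shared vector instead of the question's four to keep it short.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <mutex>
#include <thread>
#include <vector>

std::mutex m;  // guards the shared vector

// Hypothetical helper: builds `count` values locally, then merges them under the lock.
void append_batch(std::vector<int>& shared, int count) {
    assert(count >= 0);
    std::vector<int> local(static_cast<std::size_t>(count), 1);  // no lock held here
    std::lock_guard<std::mutex> lk(m);                            // lock only for the merge
    shared.insert(shared.end(), local.begin(), local.end());
}

// Hypothetical driver: runs num_threads appenders against one shared vector.
std::vector<int> run_appenders(int num_threads, int count) {
    std::vector<int> shared;
    std::vector<std::thread> threads;
    for (int i = 0; i < num_threads; ++i) {
        threads.emplace_back(append_batch, std::ref(shared), count);
    }
    for (auto& t : threads) {
        t.join();
    }
    return shared;
}
```

The trade-off is extra memory for the local copies in exchange for far fewer lock acquisitions.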

Related

C/C++ - Single semaphore of type sem_t to print numbers in order

Problem: Let's say we have n threads where each thread receives a random unique number between 1 and n. And we want the threads to print the numbers in sorted order.
Trivial Solution (using n semaphore/mutex): We can use n mutex locks (or similarly semaphores) where thread i waits to acquire mutex lock number i and unlocks number i + 1. Also, thread 1 has no wait.
However, I'm wondering if it's possible to simulate a similar logic using a single semaphore (of type sem_t) to implement the following logic: (i is a number between 1 to n inclusive)
Thread with number i as input, waits to acquire a count of (i-1) on the semaphore, and after
printing, releases a count of i. Needless to say, thread one does not
wait.
I know that unlike Java, sem_t does not support arbitrary increase/decrease in the semaphore value. Moreover, writing a for loop to do (i-1) wait and i release won't work because of asynchrony.
I've been looking for the answer for so long but couldn't find any. Is this possible in plain C? If not, is it possible in C++ using only one variable or semaphore? Overall, what is the least wasteful way to do this with ONE semaphore.
Please feel free to edit the question since I'm new to multi-threaded programming.
You can do this with a condition_variable in C++, which is equivalent to a pthread_cond_t with the pthreads library in C.
What you want to share between threads is a pointer to a condition_variable, number, and a mutex to guard access to the number.
struct GlobalData
{
std::condition_variable cv;
int currentValue;
std::mutex mut;
};
Each thread simply invokes a function that waits for its number to be set:
void WaitForMyNumber(std::shared_ptr<GlobalData> gd, int number)
{
std::unique_lock<std::mutex> lock(gd->mut);
while (gd->currentValue != number)
{
gd->cv.wait(lock);
}
std::cout << number << std::endl;
gd->currentValue++;
gd->cv.notify_all(); // notify all other threads that it can wake up and check
}
And then a program to test it all out. This one uses 10 threads. You can modify it to use more and then have your own randomization algorithm of the numbers list.
int main()
{
int numbers[10] = { 9, 1, 0, 7, 5, 3, 2, 8, 6, 4 };
std::shared_ptr<GlobalData> gd = std::make_shared<GlobalData>();
// gd->currentValue is value-initialized to 0 by make_shared.
std::thread threads[10];
for (int i = 0; i < 10; i++)
{
int num = numbers[i];
auto fn = [gd, num] {WaitForMyNumber(gd, num); };
threads[i] = std::thread(fn); // a temporary is already an rvalue, so std::move is not needed
}
// wait for all the threads to finish
for (int i = 0; i < 10; i++)
{
threads[i].join();
}
return 0;
}
All of the above is in C++. But it would be easy to transpose the above solution to C using pthreads. But I'll leave that as an exercise for the OP.
I'm not sure if this satisfies your "one semaphore requirement". The mutex technically has a semaphore. Not sure if the condition_variable itself has a semaphore for its implementation.
That's a good question, although I fear you might have an XY problem, since I cannot imagine a good reason for this scenario. Nevertheless, after a minute or two I came up with two solutions, each with pros and cons, and I think one of them is perfect for you:
A. When your threads finish at almost the same time and/or need their output printed ASAP, you could use a shared std::atomic<T> with T = unsigned, int, size_t, uint32_t or whatever you like (or any of the integer atomics in the C standard library when using C). Initialise it with 0, and have every thread i busy-wait until its value is i-1. The thread then prints and adds 1 to the atomic. Of course, because of the busy wait, you will have a high CPU load when threads wait for a long time, and a slowdown when many are waiting. But you get your print ASAP.
B. You just store the result of thread i in a container, maybe along with its index (since I guess you want to do more than just print i), and after all threads are finished, or periodically, sort this container and then print it.
A.:
#include <iostream>
#include <atomic>
#include <thread>
#include <vector>
#include <functional>
void thread_function(unsigned i, std::atomic<unsigned>& atomic) {
while (atomic < i - 1) {}
std::cout << i << " ";
atomic += 1;
}
int main() {
std::atomic<unsigned> atomic{0}; // brace-init also compiles before C++17, where "= 0" does not
std::vector<std::thread> threads;
for (auto i : {3,1,2}) {
threads.push_back(std::thread(thread_function, i, std::ref(atomic)));
}
for (auto& t : threads) {
t.join();
}
std::cout << "\n";
}
Works also in C, just use the atomics there.
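Option B above has no code, so here is a rough sketch of it. The function name collect_sorted is hypothetical, and the "result" of each thread is just its number. Every thread appends to a shared vector under a mutex, and the sort happens once after all threads have been joined.

```cpp
#include <algorithm>
#include <cassert>
#include <mutex>
#include <thread>
#include <vector>

// Hypothetical helper: each thread records its number in arrival order;
// the caller gets the numbers back sorted.
std::vector<unsigned> collect_sorted(const std::vector<unsigned>& inputs) {
    std::vector<unsigned> results;
    std::mutex results_mutex;
    std::vector<std::thread> threads;
    for (unsigned value : inputs) {
        threads.emplace_back([&results, &results_mutex, value] {
            std::lock_guard<std::mutex> lk(results_mutex);
            results.push_back(value);  // arrival order is arbitrary
        });
    }
    for (auto& t : threads) {
        t.join();
    }
    std::sort(results.begin(), results.end());  // order is imposed only here
    assert(results.size() == inputs.size());
    return results;
}
```

Printing is then a plain loop over the returned vector, for example over collect_sorted({3, 1, 2}). No thread ever waits for another thread's turn, at the cost of delaying all output until the end.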
The following code uses pthread_cond_t and works in C.
#include <pthread.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#define n 100
int counter = 0;
int used[n];
pthread_cond_t cond = PTHREAD_COND_INITIALIZER;
pthread_mutex_t mutex = PTHREAD_MUTEX_INITIALIZER;
/* Thread entry point: pthread_create expects a void *(*)(void *). */
void *foo(void *given_number){
    /* The number travels by value inside the pointer argument. */
    int number = (int)(intptr_t)given_number;
    pthread_mutex_lock(&mutex);
    /* Sleep until it is this thread's turn. */
    while(counter != number){
        pthread_cond_wait(&cond, &mutex);
    }
    printf("%d\n", number);
    counter++;
    /* Wake all waiters so the thread holding the next number can proceed. */
    pthread_cond_broadcast(&cond);
    pthread_mutex_unlock(&mutex);
    return NULL;
}
/* Returns a number in [0, n) that has not been handed out yet. */
int get_random_number(){
    while(1){
        int x = rand()%n;
        if(!used[x]){
            used[x] = 1;
            return x;
        }
    }
}
int main(){
    pthread_t threads[n];
    for(int i = 0; i < n; i++){
        int num = get_random_number();
        pthread_create(&threads[i], NULL, foo, (void *)(intptr_t)num);
    }
    for(int i = 0; i < n; i++){
        pthread_join(threads[i], NULL);
    }
    return 0;
}

How to multithread line by line pixels using std::thread?

I want to learn how to adapt pseudocode I have for multithreading line by line to C++. I understand the pseudocode but I am not very experienced with C++ nor the std::thread function.
This is the pseudocode I have and that I've often used:
myFunction
{
int threadNr=previous;
int numberProcs = countProcessors();
// Every thread calculates a different line
for (y = y_start+threadNr; y < y_end; y+=numberProcs) {
// Horizontal lines
for (int x = x_start; x < x_end; x++) {
psetp(x,y,RGB(255,128,0));
}
}
}
int numberProcs = countProcessors();
// Launch threads: e.g. for 1 processor launch no other thread, for 2 processors launch 1 thread, for 4 processors launch 3 threads
for (i=0; i<numberProcs-1; i++)
triggerThread(50,FME_CUSTOMEVENT,i); //The last parameter is the thread number
triggerEvent(50,FME_CUSTOMEVENT,numberProcs-1); //The last thread used for progress
// Wait for all threads to finished
waitForThread(0,0xffffffff,-1);
I know I can call my current function using one thread via std::thread like this:
std::thread t1(FilterImage,&size_param, cdepth, in_data, input_worldP, output_worldP);
t1.join();
But this is not efficient as it is calling the entire function over and over again per thread.
I would expect every processor to tackle a horizontal line on its own.
Any example code would be highly appreciated, as I tend to learn best through example.
Invoking thread::join() forces the calling thread to wait for the child thread to finish executing. For example, if I use it to create a number of threads in a loop, and call join() on each one, it'll be the same as though everything happened in sequence.
Here's an example. I have two methods that print out the numbers 1 through n. The first one does it single threaded, and the second one joins each thread as they're created. Both have the same output, but the threaded one is slower because you're waiting for each thread to finish before starting the next one.
#include <iostream>
#include <thread>
void printN_nothreads(int n) {
for(int i = 0; i < n; i++) {
std::cout << i << "\n";
}
}
void printN_threaded(int n) {
for(int i = 0; i < n; i++) {
std::thread t([=](){ std::cout << i << "\n"; });
t.join(); //This forces synchronization
}
}
Doing threading better.
To gain benefit from using threads, you have to start all the threads before joining them. In addition, to avoid false sharing, each thread should work on a separate region of the image (ideally a section that's far away in memory).
Let's look at how this would work. I don't know what library you're using, so instead I'm going to show you how to write a multi-threaded transform on a vector.
auto transform_section = [](auto func, auto begin, auto end) {
for(; begin != end; ++begin) {
func(*begin);
}
};
This transform_section function will be called once per thread, each on a different section of the vector. Let's write transform so it's multithreaded.
template<class Func, class T>
void transform(Func func, std::vector<T>& data, int num_threads) {
size_t size = data.size();
auto section_start = [size, num_threads](int thread_index) {
return size * thread_index / num_threads;
};
auto section_end = [size, num_threads](int thread_index) {
return size * (thread_index + 1) / num_threads;
};
std::vector<std::thread> threads(num_threads);
// Each thread works on a different section
for(int i = 0; i < num_threads; i++) {
// data.data() + offset avoids indexing one past the end for the last section
T* start = data.data() + section_start(i);
T* end = data.data() + section_end(i);
threads[i] = std::thread(transform_section, func, start, end);
}
// We only join AFTER all the threads are started
for(std::thread& t : threads) {
t.join();
}
}
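For comparison, the question's row-interleaved pseudocode could be sketched with std::thread as follows. This is only an illustration under assumptions: the image is taken to be a row-major std::vector with one 32-bit value per pixel, and fill_rows_interleaved and the packed color value are hypothetical stand-ins for the question's psetp and RGB calls.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <thread>
#include <vector>

// Hypothetical image layout: row-major, one 32-bit value per pixel.
void fill_rows_interleaved(std::vector<std::uint32_t>& pixels,
                           std::size_t width, std::size_t height,
                           unsigned num_threads, std::uint32_t color) {
    assert(num_threads > 0 && pixels.size() == width * height);
    std::vector<std::thread> threads;
    for (unsigned t = 0; t < num_threads; ++t) {
        // Thread t handles rows t, t + num_threads, t + 2 * num_threads, ...
        threads.emplace_back([&pixels, width, height, num_threads, color, t] {
            for (std::size_t y = t; y < height; y += num_threads) {
                for (std::size_t x = 0; x < width; ++x) {
                    pixels[y * width + x] = color;
                }
            }
        });
    }
    // Join only after every thread has been started.
    for (auto& th : threads) {
        th.join();
    }
}
```

A caller would typically pass std::thread::hardware_concurrency() as num_threads, falling back to 1 when it returns 0. No locking is needed because no two threads write the same pixel. Interleaving rows follows the pseudocode, whereas the contiguous sections used by transform above keep each thread's memory further apart.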

Does std::thread library in C++ support nested threading?

I want to create nested threads in C++ using std::thread library like this.
#include<iostream>
#include<thread>
#include<vector>
using namespace std;
void innerfunc(int inp)
{
cout << inp << endl;
}
void outerfunc(int inp)
{
thread * threads = new thread[inp];
for (int i = 0; i < inp; i++)
threads[i] = thread(innerfunc, i);
for (int i = 0; i < inp; i++)
threads[i].join();
delete[] threads;
}
int main()
{
int inp = 0;
thread t1 = thread(outerfunc,2);
thread t2 = thread(outerfunc,3);
t1.join();
t2.join();
}
Can I do this safely? I am worried whether join() works correctly.
There isn't really such a thing as "nested" or "child" threads in C++; the OS models don't map directly onto C++. The C++ model is more accurately described as threads of execution being associated with thread objects.
From the linked cppreference;
The class thread represents a single thread of execution.
thread objects can be moved (std::move) around as required; it really is more an issue of ownership and who needs to join() the thread object before it goes out of scope.
In answer to the questions;
Can I do this safely?
Yes. Threads of execution (and their associated thread objects) can be created in "nested" threads and be successfully executed.
I am worried whether join() works correctly.
Yes it will. This is related to the "ownership" of the thread. So long as the thread of execution is joined before the thread object goes out of scope, it will work as you expect.
On a side note; I'm sure the innerfunc is for demonstration only, but cout will probably not synchronize as expected. The output will be "garbled".
Everything works perfectly! Just add a lock for all the 'cout' statements. Otherwise, the values will get garbled.
mutex m;
void innerfunc(int inp)
{
m.lock();
cout <<"Innerfunc triggered " << inp << endl;
m.unlock();
}
void outerfunc(int inp)
{
m.lock();
cout <<"Outerfunc triggered " << inp << endl;
m.unlock();
thread * threads = new thread[inp];
for (int i = 0; i < inp; i++)
threads[i] = thread(innerfunc, i);
for (int i = 0; i < inp; i++)
threads[i].join();
delete[] threads;
}
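The same program can also be written without new[]/delete[] and without manual lock()/unlock() calls. The following is a sketch only: the names print_mutex, inner_calls, inner, outer and run_nested are hypothetical, and the counter exists just to make the result checkable.

```cpp
#include <cassert>
#include <iostream>
#include <mutex>
#include <thread>
#include <vector>

std::mutex print_mutex;  // serializes access to std::cout
int inner_calls = 0;     // hypothetical counter, also guarded by print_mutex

void inner(int inp) {
    std::lock_guard<std::mutex> lk(print_mutex);  // released automatically at scope exit
    std::cout << "inner " << inp << '\n';
    ++inner_calls;
}

void outer(int inp) {
    assert(inp >= 0);
    std::vector<std::thread> threads;  // owns the thread objects
    for (int i = 0; i < inp; ++i) {
        threads.emplace_back(inner, i);
    }
    for (auto& t : threads) {
        t.join();  // every thread is joined before the vector is destroyed
    }
}

// Hypothetical driver mirroring the question's main(): 2 + 3 inner threads.
int run_nested() {
    std::thread t1(outer, 2);
    std::thread t2(outer, 3);
    t1.join();
    t2.join();
    return inner_calls;  // safe to read: all threads have been joined
}
```

std::lock_guard releases the mutex even if the guarded code throws, which the manual lock()/unlock() pair does not.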

How to launch multiple operations in a loop using multithreading c++

Introduction
I am trying to launch 4 functions in parallel (func1, func2, func3 and func4) on a 6-core machine. Every function will iterate 1000 times and fill the vector entities. The main function is do_operations(). I have two versions of do_operations(), which I posted in the Source code section.
Problem
By using the first version I get the following error:
std::system_error'
what(): Resource temporarily unavailable
To solve that problem, I added a condition in version 2: if the number of threads is equal to 6, which is the number of cores that I have, I join the threads and clear vector<thread> threads.
Am I writing the threading function correctly? What am I doing wrong?
Source code
void my_class::func1(std::vector<std::string> & entities)
{
for(int i = 0; i < 1000;i++)
{
mtx.lock();
entities.push_back(to_string(i));
mtx.unlock();
}
}
void my_class::func2(std::vector<std::string> & entities)
{
for(int i = 0; i < 1000;i++)
{
mtx.lock();
entities.push_back(to_string(i));
mtx.unlock();
}
}
void my_class::func3(std::vector<std::string> & entities)
{
for(int i = 0; i < 1000;i++)
{
mtx.lock();
entities.push_back(to_string(i));
mtx.unlock();
}
}
void my_class::func4(std::vector<std::string> & entities)
{
for(int i = 0; i < 1000;i++)
{
mtx.lock();
entities.push_back(to_string(i));
mtx.unlock();
}
}
Version 1
void my_class::do_operations()
{
//get number of CPUS
int concurentThreadsSupported = std::thread::hardware_concurrency();
std::vector<std::thread> threads;
std::vector<std::string> entities;
for(int i =0; i < 1000; i++)
{
threads.push_back(std::thread(&my_class::func1, this, ref(entities)));
threads.push_back(std::thread(&my_class::func2, this, ref(entities)));
threads.push_back(std::thread(&my_class::func3, this, ref(entities)));
threads.push_back(std::thread(&my_class::func4, this, ref(entities)));
}
for(auto &t : threads){ t.join(); }
threads.clear();
}
Version 2
void my_class::do_operations()
{
//get number of CPUS
int concurentThreadsSupported = std::thread::hardware_concurrency();
std::vector<std::thread> threads;
std::vector<std::string> entities;
for(int i =0; i < 1000; i++)
{
threads.push_back(std::thread(&my_class::func1, this, ref(entities)));
threads.push_back(std::thread(&my_class::func2, this, ref(entities)));
threads.push_back(std::thread(&my_class::func3, this, ref(entities)));
threads.push_back(std::thread(&my_class::func4, this, ref(entities)));
if((threads.size() == concurentThreadsSupported) || (i == 999))
{
for(auto &t : threads){ t.join(); }
threads.clear();
}
}
}
You are launching 4000 threads in total. If every thread gets 1 MB of stack space, then the thread stacks alone will occupy 4000 MB of address space. My assumption is that you do not have the full 4 GB of 32-bit address space available to applications (something must be left for the kernel and hardware). My second assumption is that when there is not enough space to allocate a new stack, thread creation fails with the message you are seeing.
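One way to keep the number of live threads bounded is to start them in batches and join each batch before starting the next. The sketch below is an assumption-laden illustration, not the asker's class: run_in_batches and its parameters are hypothetical, and each task simply appends strings under a mutex like func1 to func4 do. Note also that version 2 pushes four threads per iteration, so threads.size() takes the values 4, 8, 12 and so on. If hardware_concurrency() returns 6, the == comparison never matches before the last iteration, which is why a batch size is computed explicitly here.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <mutex>
#include <string>
#include <thread>
#include <vector>

// Hypothetical helper: runs total_tasks tasks with at most max_threads
// threads alive at any one time.
std::vector<std::string> run_in_batches(std::size_t total_tasks,
                                        std::size_t max_threads,
                                        int per_task) {
    assert(max_threads > 0);
    std::vector<std::string> entities;
    std::mutex mtx;
    std::size_t started = 0;
    while (started < total_tasks) {
        const std::size_t batch = std::min(max_threads, total_tasks - started);
        std::vector<std::thread> threads;
        for (std::size_t k = 0; k < batch; ++k) {
            threads.emplace_back([&entities, &mtx, per_task] {
                for (int i = 0; i < per_task; ++i) {
                    std::lock_guard<std::mutex> lk(mtx);
                    entities.push_back(std::to_string(i));
                }
            });
        }
        for (auto& t : threads) {
            t.join();  // finish this batch before starting the next
        }
        started += batch;
    }
    return entities;
}
```

With max_threads taken from std::thread::hardware_concurrency() (falling back to 1 if it returns 0), the peak stack usage stays proportional to the core count and not to the total number of tasks.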

OpenMP filling array with two threads in series

I have an array, and I need to fill it using two threads that write the values consecutively, using omp_set_lock and omp_unset_lock. The first thread should write the first value, then the second thread should write the second value, and so on. I have no idea how to do that, because in OpenMP you can't explicitly make one thread wait for another. Any ideas?
Why not try the omp_set_lock/omp_unset_lock functions?
omp_lock_t lock;
omp_init_lock(&lock);
bool thread1 = true; // shared flag, toggled only while the lock is held
#pragma omp parallel for
for (int i = 0; i < arr.size(); ++i) {
omp_set_lock(&lock);
if (thread1 == true) {
arr[i] = fromThread1();
thread1 = false;
} else {
arr[i] = fromThread2();
thread1 = true;
}
omp_unset_lock(&lock);
}