C++ Simple multithreading program memory leak - c++

I wrote a simple program that should create 1000 threads, do some work, join them, and repeat the whole thing 1000 times.
I have a memory leak with this piece of code and I don't understand why. I've been looking for a solution pretty much everywhere and can't find one.
#include <iostream>
#include <thread>
#include <string>
#include <windows.h>
#define NUM_THREADS 1000
std::thread t[NUM_THREADS];
using namespace std;
//This function will be called from a thread
void checkString(string str)
{
//some stuff to do
}
void START_THREADS(string text)
{
//Launch a group of threads
for (int i = 0; i < NUM_THREADS; i++)
{
t[i] = std::thread(checkString, text);
}
//Join the threads with the main thread
for (int i = 0; i < NUM_THREADS; i++) {
if (t[i].joinable())
{
t[i].join();
}
}
system("cls");
}
int main()
{
for(int i = 0; i < 1000; i++)
{
system("cls");
cout << i << "/1000" << endl;
START_THREADS("anything");
}
cout << "Launched from the main\n";
return 0;
}

I'm not sure about memory leaks, but you certainly have a memory error. You shouldn't be doing this:
delete &t[i];
t[i] was not allocated with new and it can't be deleted. You can safely remove that line.
As for memory consumption, you need to ask yourself whether you really need to spawn 1 million threads. Spawning threads isn't cheap, and it is unlikely that your platform will be able to run more than a handful of them concurrently.

Related

Can't tell if Mutex Lock is kicking in or not?

I'm working on a college assignment and have been tasked with showing a basic mutex lock example. I've never worked with threads in any form, so I'm a total beginner working with POSIX threads in C++.
What I'm trying to get the program to do is create 1000 threads that increment a global integer by 1000.
#include <iostream>
#include <stdlib.h>
#include <pthread.h>
#include <sys/types.h>
#include <unistd.h>
#include <thread>
pthread_t threadArr[1000];
pthread_mutex_t lock;
// Global int to increment
int numberToInc = 0;
void* incByTwo(void*)
{
pthread_mutex_lock(&lock);
for(int j = 0; j < 1000; j++){
numberToInc += 1;
}
pthread_mutex_unlock(&lock);
return NULL;
}
int main()
{
//Creates 1000 threads with incByTwo func
for(int i = 0; i < 1000; i++){
pthread_create(&threadArr[i], NULL, incByTwo, NULL);
}
std::cout << "\n" << numberToInc << "\n";
return 0;
}
The code above produces a different result on each run, presumably because the threads are executing concurrently, right?
Now, I've gotten it to work correctly by inserting
for(int i = 0; i < 1000; i++){
pthread_join(threadArr[i], NULL);
}
After the thread creation loop, but then removing the mutex locks, it still works. I've been trying to piece out how pthread_join works but I'm a little lost. Any advice?
I found a way to show the mutex lock in action. When I print the global variable inside the thread function, the results can appear out of order without the mutex lock.
Running the number range with mutex locks, the output looks like:
1000
2000
3000
... (etc)
10000
With mutex locks removed, the output can vary in the order.
E.g.
1000
2000
4000
6000
3000
5000
7000
8000
9000
10000
While the final result of the ten threads is correct, the sequence is out of order. In the context of this program it doesn't really matter, but I'd imagine that passing inconsistently sequenced values could cause problems elsewhere?
pthread_t threadArr[10];
pthread_mutex_t lock;
int numberToInc = 0;
void* incByTwo(void*)
{
pthread_mutex_lock(&lock);
for(int j = 0; j < 1000; j++){
numberToInc += 1;
}
std::cout << numberToInc << "\n";
pthread_mutex_unlock(&lock);
return NULL;
}
int main()
{
if (pthread_mutex_init(&lock, NULL) != 0)
{
printf("\n mutex init failed\n");
return 1;
}
for(int i = 0; i < 10; i++){
pthread_create(&threadArr[i], NULL, incByTwo, NULL);
}
// Wait for every thread, not only the first, before main returns
for(int i = 0; i < 10; i++){
pthread_join(threadArr[i], NULL);
}
return 0;
}

print 3 4 5 in sequence for 50 times using semaphore synchronization using three threads in C Linux

My program is running into a deadlock. I am trying to print the three numbers 3 4 5 in sequence 50 times, using three threads synchronized with semaphores.
Please help me.
Below is the code:
#include <iostream>
#include <pthread.h>
#include <semaphore.h>
using namespace std;
sem_t sem1;
sem_t sem2;
sem_t sem3;
void * fun1(void *)
{
for(int i = 0; i < 50 ; i++)
{
sem_wait(&sem1);
sem_wait(&sem3);
cout<<"3";
sem_post(&sem2);
sem_post(&sem3);
}
return NULL;
}
void * fun2(void *)
{
for(int i = 0; i < 50 ; i++)
{
sem_wait(&sem2);
sem_wait(&sem3);
cout<<"4";
sem_post(&sem3);
sem_post(&sem1);
}
return NULL;
}
void * fun3 (void *)
{
for(int i = 0; i< 50; i++)
{
sem_wait(&sem2);
sem_wait(&sem3);
cout<<"5";
sem_post(&sem1);
sem_post(&sem2);
}
return NULL;
}
int main()
{
pthread_t t1;
pthread_t t2;
pthread_t t3;
sem_init(&sem1,0,1);
sem_init(&sem2,0,0);
sem_init(&sem3,0,1);
pthread_create(&t1,NULL,&fun1,NULL);
pthread_create(&t2,NULL,&fun2,NULL);
pthread_create(&t3,NULL,&fun3,NULL);
pthread_join(t1,NULL);
pthread_join(t2,NULL);
pthread_join(t3,NULL);
return 1;
}
Please help me to understand and solve this deadlock. Please also suggest how I can extend this, for example to print 3 4 5 6 using four threads.
Please help me to understand and solve this deadlock.
There is indeed a deadlock in your code. Consider the start: thread 1 acquires both of its semaphores and calls cout << "3". After it posts sem2 and sem3, thread 3 may immediately acquire those two semaphores and call cout << "5". However, once thread 3 has posted sem1 and sem2, no thread can reach a cout << statement, because sem3's value is 0 and every thread must first pass a wait on sem3.
If you are wondering why there is no output at all, it is because of the buffer inside iostream. When writing to a console, the output is typically flushed at a "\n", so if you replace "3" with "3\n" you can see the output.
Please also suggest how I can extend this, for example to print 3 4 5 6 using four threads.
In the following code you should see the symmetry, which generalizes easily to any number of threads. You should also always call sem_destroy once you are done with a semaphore; otherwise you might leak system-level resources.
#include <iostream>
#include <pthread.h>
#include <semaphore.h>
using namespace std;
sem_t sem1;
sem_t sem2;
sem_t sem3;
void * fun1(void *)
{
for(int i = 0; i < 50 ; i++)
{
sem_wait(&sem1);
cout<<"3\n";
sem_post(&sem2);
}
return NULL;
}
void * fun2(void *)
{
for(int i = 0; i < 50 ; i++)
{
sem_wait(&sem2);
cout<<"4\n";
sem_post(&sem3);
}
return NULL;
}
void * fun3 (void *)
{
for(int i = 0; i< 50; i++)
{
sem_wait(&sem3);
cout<<"5\n";
sem_post(&sem1);
}
return NULL;
}
int main()
{
pthread_t t1;
pthread_t t2;
pthread_t t3;
sem_init(&sem1,0,1);
sem_init(&sem2,0,0);
sem_init(&sem3,0,0);
pthread_create(&t1,NULL,&fun1,NULL);
pthread_create(&t2,NULL,&fun2,NULL);
pthread_create(&t3,NULL,&fun3,NULL);
pthread_join(t1,NULL);
pthread_join(t2,NULL);
pthread_join(t3,NULL);
sem_destroy(&sem1);
sem_destroy(&sem2);
sem_destroy(&sem3);
return 0;
}

How to create a certain number of threads based on a value a variable contains?

I have an integer variable that contains the number of threads to execute. Let's call it myThreadVar. I want to execute myThreadVar threads, and cannot think of any way to do it without a ton of if statements. Is there any way I can create myThreadVar threads, no matter what myThreadVar is?
I was thinking:
for (int i = 0; i < myThreadVar; ++i) { std::thread t_i(myFunc); }, but that obviously won't work.
Thanks in advance!
Make an array or vector of threads, put the threads in, and then if you want to wait for them to finish have a second loop go over your collection and join them all:
std::vector<std::thread> myThreads;
myThreads.reserve(myThreadVar);
for (int i = 0; i < myThreadVar; ++i)
{
myThreads.push_back(std::thread(myFunc));
}
While other answers use vector::push_back(), I prefer vector::emplace_back(), which constructs the thread in place and is possibly more efficient. Also use vector::reserve().
#include <thread>
#include <vector>
void func() {}
int main() {
int num = 3;
std::vector<std::thread> vec;
vec.reserve(num);
for (auto i = 0; i < num; ++i) {
vec.emplace_back(func);
}
for (auto& t : vec) t.join();
}
Obviously, the best solution is not to wait for the previous thread to finish before starting the next one. You need to run all of them in parallel.
In this case you can use a vector to store all of the thread instances and then join each of them afterwards.
Take a look at my example.
#include <thread>
#include <vector>
void myFunc() {
/* Some code */
}
int main()
{
int myThreadVar = 50;
std::vector<std::thread> threadsToJoin;
threadsToJoin.resize(myThreadVar);
for (int i = 0; i < myThreadVar; ++i) {
threadsToJoin[i] = std::thread(myFunc);
}
for (std::size_t i = 0; i < threadsToJoin.size(); i++) {
threadsToJoin[i].join();
}
}
#include <iostream>
#include <thread>
void myFunc(int n) {
std::cout << "myFunc " << n << std::endl;
}
int main(int argc, char *argv[]) {
int myThreadVar = 5;
for (int i = 0; i < myThreadVar; ++i) {
std::cout << "Launching " << i << std::endl;
std::thread t_i(myFunc,i);
t_i.detach();
}
}
g++ -std=c++11 -o 35106568 35106568.cpp
./35106568
Launching 0
myFunc 0
Launching 1
myFunc 1
Launching 2
myFunc 2
Launching 3
myFunc 3
Launching 4
myFunc 4
You need to store the thread so you can send it to join.
std::thread t[myThreadVar]; // a runtime-sized array is a compiler extension in C++; std::vector<std::thread> t(myThreadVar) is the portable form
for (int i = 0; i < myThreadVar; ++i) { t[i] = std::thread(myFunc); }//Start all threads
for (int i = 0; i < myThreadVar; ++i) { t[i].join(); }//Wait for all threads to finish
I think this is valid syntax, but I'm more used to C, so I am unsure whether I initialized the array correctly.

OpenMP vs C++11 threads

In the following example the C++11 threads take about 50 seconds to execute, but the OMP threads only 5 seconds. Any ideas why? (I can assure you it still holds true if you are doing real work instead of doNothing, or if you do it in a different order, etc.) I'm on a 16 core machine, too.
#include <iostream>
#include <omp.h>
#include <chrono>
#include <vector>
#include <thread>
using namespace std;
void doNothing() {}
int run(int algorithmToRun)
{
auto startTime = std::chrono::system_clock::now();
for(int j=1; j<100000; ++j)
{
if(algorithmToRun == 1)
{
vector<thread> threads;
for(int i=0; i<16; i++)
{
threads.push_back(thread(doNothing));
}
for(auto& thread : threads) thread.join();
}
else if(algorithmToRun == 2)
{
#pragma omp parallel for num_threads(16)
for(unsigned i=0; i<16; i++)
{
doNothing();
}
}
}
auto endTime = std::chrono::system_clock::now();
std::chrono::duration<double> elapsed_seconds = endTime - startTime;
return elapsed_seconds.count();
}
int main()
{
int cppt = run(1);
int ompt = run(2);
cout<<cppt<<endl;
cout<<ompt<<endl;
return 0;
}
OpenMP implementations typically use thread pools for their pragmas. Spinning up and tearing down threads is expensive. OpenMP avoids this overhead, so all it does is the actual work plus the minimal shared-memory shuttling of the execution state. In your std::thread code you spin up and tear down a new set of 16 threads on every iteration.
I tried the code from "Choosing the right threading framework" with a loop of 100 iterations, and it took
OpenMP 0.0727, Intel TBB 0.6759, and the C++ thread library 0.5962 milliseconds.
I also applied what AruisDante suggested:
void nested_loop(int max_i, int band)
{
for (int i = 0; i < max_i; i++)
{
doNothing(band);
}
}
...
else if (algorithmToRun == 5)
{
thread bristle(nested_loop, max_i, band);
bristle.join();
}
This code appears to take less time than your original C++11 thread section.

Why don't threads seem to run in parallel in this code?

This is the first time I am working with threads, so I am sorry if this is a bad question. Shouldn't the output consist of "randomized" mains and foos? What I get seems to be a column of foos and a column of mains.
#include <iostream>
#include <thread>
void foo() {
for (int i = 0; i < 20; ++i) {
std::cout << "foo" << std::endl;
}
}
int main(int argc, char** argv) {
std::thread first(foo);
for (int i = 0; i < 20; ++i) {
std::cout << "main" << std::endl;
}
first.join();
return 0;
}
There is an overhead to starting a thread, so in this simple example the output is completely unpredictable. Both for loops run for a very short time, so if the thread starts even a millisecond late, the two code segments are executed one after the other instead of in parallel. But if the operating system schedules the thread first, the "foo" sequence appears before the "main" sequence.
Insert some sleep calls into the thread and the main function to see whether they really run in parallel.
#include <iostream>
#include <thread>
#include <unistd.h>
void foo() {
for (int i = 0; i < 20; ++i) {
std::cout << "foo" << std::endl;
sleep(1);
}
}
int main(int argc, char** argv) {
std::thread first(foo);
for (int i = 0; i < 20; ++i) {
std::cout << "main" << std::endl;
sleep(1);
}
first.join();
return 0;
}
Using threads does not automatically enforce parallel execution of code segments. If, for example, your system has only one CPU, execution is switched between all processes and threads, and code segments never run in parallel.
There is a good Wikipedia article about threads; read the section about "Multithreading" in particular.
After each cout, try calling std::this_thread::yield(). This may give a waiting thread the chance to run, although the behavior is implementation dependent.