I suspect a memory leak, so I wrote this small program to test for it. With the following line commented out:
printf("Calc index: %d\n", ArrLength);
the code runs fine. But when I uncomment it, the program crashes after a couple of thousand threads. Wrapping the call in try/catch doesn't help; the program still crashes inside the try block. Can anyone help me out here?
#include "stdafx.h"
#include <process.h>
#include <iostream>
#include <mutex>
#include <windows.h>
using namespace std;
typedef struct {
int StartNode;
int EndNode;
int GangID;
int MemberID;
int ArrLength;
int arr[10000];
}t;
t *arg;
mutex m;
void myFunc(void *param) {
m.lock();
printf("Calculate thread started\n");
t *args = (t*)param;
int StartNode = args->StartNode;
int EndNode = args->EndNode;
int GangID = args->GangID;
int MemberID = args->MemberID;
int ArrLength = args->ArrLength;
printf("Calc index: %d\n", ArrLength);
free(args);
m.unlock();
}
int main()
{
for (int i = 0; i < 1000000; i++)
{
HANDLE handle;
arg = (t *)malloc(sizeof(t));
arg->StartNode = 2;
arg->EndNode = 1;
arg->GangID = 1;
arg->MemberID = 1;
arg->ArrLength = 5;
for (int j = 0; j < 10000; j++)
{
arg->arr[j] = j;
}
handle = (HANDLE)_beginthread(myFunc, 0, (void*)arg);
}
cin.get();
return 0;
}
Well, let's do the math. Your t struct takes 40020 bytes per instance (5 ints plus a 10000-int array, at 4 bytes each). You allocate it 1M times, which adds up to about 40 GB. And that is not all the memory required, because threads are not free: by default, Windows reserves 1 MB of stack per thread, which comes to 1 TB (one terabyte) just to keep the threads alive if they all exist at once.
So the total is something like 1040 GB. Do you really intend that?
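One possible way to keep that total bounded is to cap how many threads exist at the same time. The sketch below is only an illustration and not the original poster's code: it uses portable std::thread instead of _beginthread, and the helper name run_in_batches and its parameters are hypothetical.

```cpp
#include <algorithm>
#include <atomic>
#include <cassert>
#include <cstddef>
#include <thread>
#include <vector>

// Hypothetical helper: runs `total` jobs but never keeps more than
// `max_live` threads (and therefore thread stacks) alive at once.
// Returns how many jobs actually ran.
std::size_t run_in_batches(std::size_t total, std::size_t max_live) {
    assert(max_live > 0);  // a batch size of 0 would never make progress
    std::atomic<std::size_t> done{0};
    for (std::size_t first = 0; first < total; first += max_live) {
        const std::size_t count = std::min(max_live, total - first);
        std::vector<std::thread> batch;
        batch.reserve(count);
        for (std::size_t k = 0; k < count; ++k) {
            // Placeholder job: a real job would do its work here.
            batch.emplace_back([&done] { ++done; });
        }
        // Join before starting the next batch so the stacks are released.
        for (std::thread& th : batch) th.join();
    }
    return done.load();
}
```

Calling run_in_batches(1000000, 64), for example, would still run a million jobs, but with at most 64 thread stacks reserved at any moment.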
I'm currently working on a program that takes an array of 10000 random numbers, splits it evenly with five threads, and those five threads split further into 20 threads each evenly. The purpose is to find the ultimate minimum number from that original 10000 random number array.
I believe I'm on the right track. Upon further examination, something is wrong with my array traversal on level 1. I can only see the 20 level-2 threads of one of the five level-1 threads return. The information that comes back appears correct, but when I print it with cout as it is produced, I get blank output for the minimums returned by the L2 threads of the other four L1 threads. (Sorry if that's a bit confusing.)
Would anyone be able to give me any tips on what to look out for? I will provide the code below.
Bear in mind, efficiency is not the goal of this program. Demonstrating multithreading is.
#include <sys/time.h>
#include <sys/wait.h>
#include <stdio.h>
#include <stdlib.h>
#include <ctime>
#include <iostream>
#include <pthread.h>
#include <cstdlib>
#include <unistd.h>
/*
Macros were used to make modulation easier and to ease reading of the code
*/
#define L1_THREADS 5 //# of level 1 threads
#define L2_THREADS 20 //# of level 2 threads
//#define L1_ARRAY 5 //eventually unused
#define L2_ARRAY 2000 //size of level 2 array
using namespace std;
//structure to house the Thread Parameters required
struct ThreadParameters{
int* array;
int start;
int end;
int smallest;
};
//function to find the minimum value at the bottom level of the thread spread
void* find_min(void* args){
struct ThreadParameters* specs = (struct ThreadParameters*)args;
int *array = specs->array;
int start = specs->start;
int end = specs->end;
int smallest = array[start];
for(int i = start; i < end; i++){
if(array[i] < smallest){
smallest = array[i];
}
}
specs->smallest = smallest;
return NULL;
}
//function to find the minimum value of the values returned by the threads it creates
void* find_first(void* args){
pthread_t threads2[L2_THREADS] = {0};
struct ThreadParameters thread2_parameters[L2_THREADS] = {0};
struct ThreadParameters* specs = (struct ThreadParameters*)args;
int *array = specs->array;
int start = specs->start;
int end = specs ->end;
int smallest = array[start];
//Level 1, creates the 20 threads for level 2
for(int i = 0; i < L2_THREADS; i++){
thread2_parameters[i].array = array;
thread2_parameters[i].start = i*(L2_ARRAY/L2_THREADS);
thread2_parameters[i].end = (i+1)*(L2_ARRAY/L2_THREADS);
pthread_create(&threads2[i], NULL, find_min, &thread2_parameters[i]);
}
for(int i = 0; i < L2_THREADS; i++){
pthread_join(threads2[i], NULL);
}
cout << "Minimums from L2 threads: ";
for(int i = start; i < L2_THREADS; i++){
cout << "[" << thread2_parameters[i].smallest << "]";
if(thread2_parameters[i].smallest < smallest){
smallest = thread2_parameters[i].smallest;
}
}
cout << endl;
specs->smallest = smallest;
return NULL;
}
int main(){
time_t t;
int n = 10000;
int randnum[n]; /* array of random numbers */
/* Initialize random number generator */
srand((unsigned) time(&t));
/* Generate random numbers */
for (int i = 0; i < n; i++) {
randnum[i] = rand()%10000 +1;
}
/* end of random number generation code */
pthread_t threads1[L1_THREADS] = {0};
struct ThreadParameters thread1_parameters[L1_THREADS] = {0};
//int min[L1_ARRAY];
int smallest;
smallest = randnum[0];
//Level 0, creates the first five threads for level 1
for(int i = 0; i < L1_THREADS; i++){
thread1_parameters[i].array = randnum;
thread1_parameters[i].start = i*(n/L1_THREADS);
thread1_parameters[i].end = (i+1)*(n/L1_THREADS);
pthread_create(&threads1[i], NULL, find_first, &thread1_parameters[i]);
}
for(int i = 0; i < L1_THREADS; i++){
pthread_join(threads1[i], NULL);
}
cout << "Minimums from L1 threads: ";
//finds the ultimate minimum after L1 threads return with their values
for(int i = 0; i < L1_THREADS; i++){
cout << "[" << thread1_parameters[i].smallest << "]";
if(thread1_parameters[i].smallest < smallest){
smallest = thread1_parameters[i].smallest;
}
}
cout << "\nThe ultimate minimum is " << smallest << endl;
return 0;
}
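Reading the posted find_first, two things stand out: the level-2 start and end are computed from i alone, without adding the level-1 start, and the loop that prints the level-2 minimums begins at i = start rather than i = 0, so it never runs for level-1 threads whose start is 20 or more. The sketch below shows one way the two-level split could look with both offsets handled. It is an illustration under assumptions, not the poster's program: it swaps pthreads for std::thread, and the names range_min, split_min, and two_level_min are hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <thread>
#include <vector>

// Minimum of data[start, end); the range must be non-empty.
int range_min(const std::vector<int>& data, std::size_t start, std::size_t end) {
    assert(start < end && end <= data.size());
    return *std::min_element(data.begin() + start, data.begin() + end);
}

// Splits data[start, end) into `parts` equal slices, runs `worker` on each
// slice in its own thread, and returns the smallest of the results.
template <typename Worker>
int split_min(const std::vector<int>& data, std::size_t start, std::size_t end,
              std::size_t parts, Worker worker) {
    assert(parts > 0 && start < end && (end - start) % parts == 0);
    const std::size_t chunk = (end - start) / parts;
    std::vector<int> results(parts);
    std::vector<std::thread> threads;
    for (std::size_t i = 0; i < parts; ++i) {
        // Offsets are relative to this range's own start, not to index 0.
        const std::size_t lo = start + i * chunk;
        const std::size_t hi = lo + chunk;
        threads.emplace_back([&, i, lo, hi] { results[i] = worker(lo, hi); });
    }
    for (std::thread& th : threads) th.join();
    // Scan every result slot, 0 .. parts-1.
    return *std::min_element(results.begin(), results.end());
}

// Two levels: 5 level-1 threads, each starting 20 level-2 threads.
int two_level_min(const std::vector<int>& data) {
    return split_min(data, 0, data.size(), 5,
        [&](std::size_t lo, std::size_t hi) {
            return split_min(data, lo, hi, 20,
                [&](std::size_t a, std::size_t b) { return range_min(data, a, b); });
        });
}
```

The sketch assumes the array length divides evenly by 5 and then by 20, as it does for 10000 elements.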
I'd like to use a memory manager, so I tried boost::pool as shown below, but ordered_free() doesn't free all the elements.
Sample:
#include <iostream>
#include <boost\pool\pool.hpp>
using namespace std;
int main()
{
boost::pool<> p(sizeof(int));
int* ptr_1= (int*)p.ordered_malloc(3);
for (int i = 0; i < 3; i++)
{
ptr_1[i] = i;
}
p.ordered_free(ptr_1);
int* ptr_2 = (int*)p.ordered_malloc(3);
for (int i = 0; i < 3; i++)
{
ptr_2[i] = i;
}
p.ordered_free(ptr_2);
return 0;
}
In this case, p.ordered_free(ptr_1); frees only 5 bytes, and ptr_2 does not point to the same memory location as ptr_1. Is there any way to free all the elements with boost::pool?
Based on the Boost documentation, it looks like ordered_free(ptr_1) frees just one chunk of memory, but you can use ordered_free(ptr_1, 3) to free the whole array.
My program is deadlocking. I am trying to print the three numbers 3 4 5 in sequence 50 times, using three threads synchronized with semaphores.
Please help me.
Below is the code:
#include <iostream>
#include <pthread.h>
#include <semaphore.h>
using namespace std;
sem_t sem1;
sem_t sem2;
sem_t sem3;
void * fun1(void *)
{
for(int i = 0; i < 50 ; i++)
{
sem_wait(&sem1);
sem_wait(&sem3);
cout<<"3";
sem_post(&sem2);
sem_post(&sem3);
}
}
void * fun2(void *)
{
for(int i = 0; i < 50 ; i++)
{
sem_wait(&sem2);
sem_wait(&sem3);
cout<<"4";
sem_post(&sem3);
sem_post(&sem1);
}
}
void * fun3 (void *)
{
for(int i = 0; i< 50; i++)
{
sem_wait(&sem2);
sem_wait(&sem3);
cout<<"5";
sem_post(&sem1);
sem_post(&sem2);
}
}
int main()
{
pthread_t t1;
pthread_t t2;
pthread_t t3;
sem_init(&sem1,0,1);
sem_init(&sem2,0,0);
sem_init(&sem3,0,1);
pthread_create(&t1,NULL,&fun1,NULL);
pthread_create(&t2,NULL,&fun2,NULL);
pthread_create(&t3,NULL,&fun3,NULL);
pthread_join(t1,NULL);
pthread_join(t2,NULL);
pthread_join(t3,NULL);
return 1;
}
Please help me to understand and solve this deadlock. Please also suggest how I could extend this, for example to print 3 4 5 6 using 4 threads, and so on.
Please help me to understand and solve this deadlock.
There is indeed a deadlock in your code. Consider the start of the run: thread 1 acquires its two semaphores and calls cout << "3". After it posts sem2 and sem3, thread 3 may immediately acquire both of them and call cout << "5". However, thread 3 then posts sem1 and sem2 but never posts sem3, so no thread can reach a cout << statement again: sem3's value is 0 and every thread has to get past a wait on sem3.
If you are wondering why there is no output at all, it's because of buffering inside iostream. When output goes to a console it is typically line-buffered, so "\n" flushes the buffer; if you replace "3" with "3\n", you can see the output.
Provide suggestions also i can do this for example 3 4 5 6 using 4 etc threads
In the following code, you should see the symmetry, which generalizes easily to any number of threads. You should also always call sem_destroy once you are done with a semaphore; otherwise you might leak system-level resources.
#include <iostream>
#include <pthread.h>
#include <semaphore.h>
using namespace std;
sem_t sem1;
sem_t sem2;
sem_t sem3;
// Each thread waits on its own semaphore, prints, then signals the next thread.
void * fun1(void *)
{
    for(int i = 0; i < 50 ; i++)
    {
        sem_wait(&sem1);
        cout<<"3\n";
        sem_post(&sem2);
    }
    return NULL; // a function declared to return void* must return a value
}
void * fun2(void *)
{
    for(int i = 0; i < 50 ; i++)
    {
        sem_wait(&sem2);
        cout<<"4\n";
        sem_post(&sem3);
    }
    return NULL;
}
void * fun3 (void *)
{
    for(int i = 0; i< 50; i++)
    {
        sem_wait(&sem3);
        cout<<"5\n";
        sem_post(&sem1);
    }
    return NULL;
}
int main()
{
    pthread_t t1;
    pthread_t t2;
    pthread_t t3;
    // Only sem1 starts at 1, so fun1 goes first.
    sem_init(&sem1,0,1);
    sem_init(&sem2,0,0);
    sem_init(&sem3,0,0);
    pthread_create(&t1,NULL,&fun1,NULL);
    pthread_create(&t2,NULL,&fun2,NULL);
    pthread_create(&t3,NULL,&fun3,NULL);
    pthread_join(t1,NULL);
    pthread_join(t2,NULL);
    pthread_join(t3,NULL);
    sem_destroy(&sem1);
    sem_destroy(&sem2);
    sem_destroy(&sem3);
    return 0; // 0 signals success to the shell
}
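To make the generalization concrete, here is a rough sketch of the same ring for any number of threads, so that 3 4 5 6 with four threads is just n = 4. It assumes POSIX unnamed semaphores as in the code above, uses std::thread for brevity, and collects the digits in a string instead of printing them so the order is easy to check. The function name print_in_turn and its parameters are hypothetical.

```cpp
#include <semaphore.h>
#include <cassert>
#include <string>
#include <thread>
#include <vector>

// Hypothetical generalization: `n` threads take turns appending the digits
// first, first+1, ... to a string, `rounds` times. Thread k waits on
// semaphore k and then posts semaphore (k + 1) % n.
std::string print_in_turn(int n, int first, int rounds) {
    assert(n > 0 && first >= 0 && first + n <= 10);  // keep to single digits
    std::vector<sem_t> sems(n);
    for (int k = 0; k < n; ++k) {
        // Only semaphore 0 starts at 1, so thread 0 goes first.
        sem_init(&sems[k], 0, k == 0 ? 1 : 0);
    }
    std::string out;  // only touched by the thread whose turn it is
    std::vector<std::thread> threads;
    for (int k = 0; k < n; ++k) {
        threads.emplace_back([&, k] {
            for (int i = 0; i < rounds; ++i) {
                sem_wait(&sems[k]);                          // wait for my turn
                out += static_cast<char>('0' + first + k);   // "print"
                sem_post(&sems[(k + 1) % n]);                // hand over
            }
        });
    }
    for (std::thread& th : threads) th.join();
    for (sem_t& s : sems) sem_destroy(&s);
    return out;
}
```

With this sketch, print_in_turn(4, 3, 50) would produce the sequence 3456 repeated 50 times.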
I have the following code with one dynamically allocated array, "data". I pass the array size as a command-line argument. The program works fine up to datasize = 33790, but it segfaults if I provide a value > 33790.
"33790" might be machine-specific. I am trying to understand why dynamically allocated memory would cause a segfault beyond a particular size. Any help is welcome. :)
#include "iostream"
#include <stdlib.h>
#include "iomanip"
#include "ctime"
#define N 100000
using namespace std;
int main(int argc, char* argv[])
{
int a;
cout<<"Size of int : "<<sizeof(int)<<endl;
long int datasize = strtol(argv[1],NULL,0);
cout<<"arg1 : "<<datasize<<endl;
double sum = 0;
int *data;
data = new int(datasize);
clock_t begin = clock();
for(int i = 0; i < N; i++) //repeat the inner loop N times
{
//fetch the data into the cache
//access it multiple times in order to amortize the compulsory miss latency
for (long int j = 0; j < datasize; j++)
{
sum += data[j]; //get entire array of data inside cache
}
}
clock_t end = clock();
double time_spent = (double) (end - begin);
cout<<"sum = "<<sum<<endl;
cout<<"Time Spent for data size = "<<argv[1]<<" is "<<time_spent<<endl;
delete[] data;
return 0;
}
You are not allocating an array at all: new int(datasize) allocates a single int initialized to the value datasize, so every data[j] with j > 0 reads past the allocation.
Use new int[datasize] instead of new int(datasize) to allocate an array of datasize ints.
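As a small illustration of the difference, the sketch below allocates a real array and sums it. Two details are choices made for this example rather than part of the original program: the trailing () zero-initializes the elements (the posted loop sums values that were never written), and std::unique_ptr<int[]> is used so that delete[] is called automatically. The function name sum_zeroed_array is hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>

// Sums `count` ints from a freshly allocated, zero-initialized array.
long sum_zeroed_array(std::size_t count) {
    assert(count > 0);
    // new int[count]() allocates `count` ints and value-initializes them to 0.
    // By contrast, new int(count) would allocate ONE int holding `count`.
    std::unique_ptr<int[]> data(new int[count]());
    long sum = 0;
    for (std::size_t j = 0; j < count; ++j) {
        sum += data[j];  // in bounds for every j < count
    }
    return sum;  // unique_ptr<int[]> releases the array with delete[]
}
```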
I'm trying to benchmark my implementation of merge sort using OpenMP. I have written the following code.
#include <iostream>
#include <vector>
#include <cstdlib>
#include <ctime>
#include <omp.h>
using namespace std;
class Sorter {
private:
int* data;
int size;
bool isSorted;
public:
Sorter(int* data, int size){
this->data = data;
this->size = size;
this->isSorted = false;
}
void sort(){
vector<int> v(data,data+size);
vector<int> ans = merge_sort(v);
copy(ans.begin(),ans.end(),data);
isSorted = true;
}
vector<int> merge_sort(vector<int>& vec){
if(vec.size() == 1){
return vec;
}
std::vector<int>::iterator middle = vec.begin() + (vec.size() / 2);
vector<int> left(vec.begin(), middle);
vector<int> right(middle, vec.end());
#pragma omp parallel sections
{
#pragma omp section
{left = merge_sort(left);}
#pragma omp section
{right = merge_sort(right);}
}
return merge(vec,left, right);
}
vector<int> merge(vector<int> &vec,const vector<int>& left, const vector<int>& right){
vector<int> result;
unsigned left_it = 0, right_it = 0;
while(left_it < left.size() && right_it < right.size()) {
if(left[left_it] < right[right_it]){
result.push_back(left[left_it]);
left_it++;
}else{
result.push_back(right[right_it]);
right_it++;
}
}
while(left_it < left.size()){
result.push_back(left[left_it]);
left_it++;
}
while(right_it < right.size()){
result.push_back(right[right_it]);
right_it++;
}
return result;
}
int* getSortedData(){
if(!isSorted){
sort();
}
return data;
}
};
void printArray(int* array, int size){
for(int i=0;i<size;i++){
cout<<array[i]<<", ";
}
cout<<endl;
}
bool isSorted(int* array, int size){
for(int i=0;i<size-1;i++){
if(array[i] > array[i+1]) {
cout<<array[i]<<" > "<<array[i+1]<<endl;
return false;
}
}
return true;
}
int main(int argc, char** argv){
if(argc<3){
cout<<"Specify size and threads"<<endl;
return -1;
}
int size = atoi(argv[1]);
int threads = atoi(argv[2]);
//omp_set_nested(1);
omp_set_num_threads(threads);
cout<<"Merge Sort of "<<size<<" with "<<omp_get_max_threads()<<endl;
int *array = new int[size];
srand(time(NULL));
for(int i=0;i<size;i++){
array[i] = rand() % 100;
}
//printArray(array,size);
Sorter* s = new Sorter(array, size);
cout<<"Starting sort"<<endl;
double start = omp_get_wtime();
s->sort();
double stop = omp_get_wtime();
cout<<"Time: "<<stop-start<<endl;
int* array2 = s->getSortedData();
if(size<=10)
printArray(array2,size);
cout<<"Array sorted: "<<(isSorted(array2,size)?"yes":"no")<<endl;
return 0;
}
The program runs correctly, but when I specify the number of threads to be, say, 4, the program still creates only 2 threads. I tried calling omp_set_nested(1) before omp_set_num_threads(threads), but that hangs the whole terminal until the program crashes and says "libgomp: Thread creation failed: Resource temporarily unavailable". I think that is because too many threads are created. I haven't found a workaround yet.
Edit:
After the program crashes, I check the system load and it shows the load to be over 1000!
I have a 4-core AMD A8 CPU and 10GB RAM
If I uncomment omp_set_nested(1) and run the program
$ ./mergeSort 10000000 4
Merge Sort of 10000000 with 4
Starting sort
libgomp: Thread creation failed: Resource temporarily unavailable
libgomp: Thread creation failed: Resource temporarily unavailable
$ uptime
02:14:12 up 1 day, 11:13, 4 users, load average: 482.21, 522.87, 338.75
Watching the processes, I can spot 4 threads being launched. If I comment out omp_set_nested(1), the program runs normally but only uses 2 threads.
Edit:
If I use tasks and remove omp_set_nested, it launches the threads correctly, but there is no speedup: execution with 1 thread is faster than with 4. With sections it does speed up, but only by a factor of less than two (as it launches only 2 threads at a time).
I tested your code and it did create 4 or more threads, so I didn't quite get what you meant. I also suggest changing omp section to omp task: by definition, each section is handled by exactly one thread, so in your recursive calls you would never make use of your idle threads.
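A task-based variant could look roughly like the sketch below. It is not the poster's Sorter class: it opens a single parallel region for the whole sort, lets one thread seed the task tree, and sorts in place with one scratch buffer instead of copying vectors at every level. The cutoff below which the recursion stays serial is an assumed tuning value meant to limit task overhead, which may be one reason a naive task version runs slower with 4 threads than with 1. The names merge_sort_tasks, parallel_merge_sort, and kTaskCutoff are hypothetical. Without OpenMP enabled the pragmas are ignored and the code simply runs serially.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Assumed tuning value, not from the original post: ranges smaller than this
// are sorted serially so that task overhead does not dominate.
const std::size_t kTaskCutoff = 1 << 14;

// Sorts v[lo, hi) in place, using `tmp` (same size as v) as merge scratch.
void merge_sort_tasks(std::vector<int>& v, std::vector<int>& tmp,
                      std::size_t lo, std::size_t hi) {
    assert(lo <= hi && hi <= v.size() && tmp.size() == v.size());
    if (hi - lo < 2) return;
    const std::size_t mid = lo + (hi - lo) / 2;
    if (hi - lo >= kTaskCutoff) {
        // Each half becomes a task that any idle thread in the team may run.
        #pragma omp task shared(v, tmp)
        merge_sort_tasks(v, tmp, lo, mid);
        #pragma omp task shared(v, tmp)
        merge_sort_tasks(v, tmp, mid, hi);
        // Wait for both halves before merging them.
        #pragma omp taskwait
    } else {
        merge_sort_tasks(v, tmp, lo, mid);
        merge_sort_tasks(v, tmp, mid, hi);
    }
    // Sibling calls work on disjoint ranges of v and tmp, so this is race-free.
    std::merge(v.begin() + lo, v.begin() + mid,
               v.begin() + mid, v.begin() + hi, tmp.begin() + lo);
    std::copy(tmp.begin() + lo, tmp.begin() + hi, v.begin() + lo);
}

// One parallel region for the whole sort; a single thread seeds the task tree.
void parallel_merge_sort(std::vector<int>& v) {
    std::vector<int> tmp(v.size());
    #pragma omp parallel
    #pragma omp single
    merge_sort_tasks(v, tmp, 0, v.size());
}
```

Because the thread team is created once, the number of threads stays at whatever omp_set_num_threads requested, and nesting does not need to be enabled.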