Multithreaded matrix multiplication in C++ - c++

I've been having trouble with this parallel matrix multiplication code: I keep getting an error when trying to access a data member in my structure.
This is my main function:
struct arg_struct
{
int* arg1;
int* arg2;
int arg3;
int* arg4;
};
int main()
{
pthread_t allthreads[4];
int A [N*N];
int B [N*N];
int C [N*N];
randomMatrix(A);
randomMatrix(B);
printMatrix(A);
printMatrix(B);
struct arg_struct *args = (arg_struct*)malloc(sizeof(struct arg_struct));
args.arg1 = A;
args.arg2 = B;
int x;
for (int i = 0; i < 4; i++)
{
args.arg3 = i;
args.arg4 = C;
x = pthread_create(&allthreads[i], NULL, &matrixMultiplication, (void*)args);
if(x!=0)
exit(1);
}
return 0;
}
and the matrixMultiplication method used from another C file:
void *matrixMultiplication(void* arguments)
{
struct arg_struct* args = (struct arg_struct*) arguments;
int block = args.arg3;
int* A = args.arg1;
int* B = args.arg2;
int* C = args->arg4;
free(args);
int startln = getStartLineFromBlock(block);
int startcol = getStartColumnFromBlock(block);
for (int i = startln; i < startln+(N/2); i++)
{
for (int j = startcol; j < startcol+(N/2); j++)
{
setMatrixValue(C,0,i,j);
for(int k = 0; k < N; k++)
{
C[i*N+j] += (getMatrixValue(A,i,k) * getMatrixValue(B,k,j));
usleep(1);
}
}
}
}
Another error I am getting is when creating the thread: "invalid conversion from ‘void ()(int, int*, int, int*)’ to ‘void* ()(void)’ [-fpermissive]"
Can anyone please tell me what I'm doing wrong?

First, you mix C and C++ very badly. Either use plain C or use C++; in C++ you can simply use new and delete.
But the reason for your error is that you allocate arg_struct in one place and free it in 4 threads. You should allocate one arg_struct for each thread.

Big Boss has correctly identified the problem; this answer adds to his reply.
Option 1:
Just create an arg_struct in the loop and set the members, then pass it through:
for(...)
{
struct arg_struct *args = (arg_struct*)malloc(sizeof(struct arg_struct));
args->arg1 = A;
args->arg2 = B; //set up args as now...
...
x = pthread_create(&allthreads[i], NULL, &matrixMultiplication, (void*)args);
....
}
Keep the free call in the thread. You could then use the passed struct directly rather than copying its members into locals in your thread (in which case, free it only once the thread is done with it).
Option 2:
It looks like you want to copy the params from the struct internally to the thread anyway so you don't need to dynamically allocate.
Just create an arg_struct and set the members, then pass it through:
arg_struct args;
//set up args as now...
for(...)
{
...
x = pthread_create(&allthreads[i], NULL, &matrixMultiplication, (void*)&args);
}
Then remove the free call.
However, as James pointed out, you would need to synchronize between the thread and the parent on the structure to make sure that it isn't changed before the thread has read it. That would mean a mutex or some other mechanism, so moving the allocation into the for loop is probably easier to begin with.
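Option 1 could look roughly like the sketch below. This is only an illustration under some assumptions: BlockArgs, multiplyBlock and multiplyInQuadrants are made-up names, the quadrant mapping is a guess at what getStartLineFromBlock / getStartColumnFromBlock do, and new/delete replace malloc/free since the file is compiled as C++. Each thread gets its own heap-allocated argument struct and deletes it when it has finished.

```cpp
#include <pthread.h>
#include <vector>

// Hypothetical argument bundle; the field names are illustrative only.
struct BlockArgs {
    const int* a;  // left operand, n*n, row-major
    const int* b;  // right operand, n*n, row-major
    int* c;        // result, n*n, row-major
    int n;         // matrix dimension (assumed even)
    int block;     // quadrant index 0..3 (assumed layout: row-major quadrants)
};

// Thread entry point with the signature pthread_create expects: void* f(void*).
void* multiplyBlock(void* raw) {
    BlockArgs* args = static_cast<BlockArgs*>(raw);
    const int half = args->n / 2;
    const int rowStart = (args->block / 2) * half;
    const int colStart = (args->block % 2) * half;
    for (int i = rowStart; i < rowStart + half; ++i) {
        for (int j = colStart; j < colStart + half; ++j) {
            int sum = 0;
            for (int k = 0; k < args->n; ++k)
                sum += args->a[i * args->n + k] * args->b[k * args->n + j];
            args->c[i * args->n + j] = sum;
        }
    }
    delete args;  // this thread owns its own struct, so freeing it here is safe
    return nullptr;
}

// Starts four threads, one BlockArgs per thread, then joins them.
// Returns false if any thread could not be created.
bool multiplyInQuadrants(const std::vector<int>& a, const std::vector<int>& b,
                         std::vector<int>& c, int n) {
    pthread_t threads[4];
    int started = 0;
    bool ok = true;
    for (int i = 0; i < 4; ++i) {
        BlockArgs* args = new BlockArgs{a.data(), b.data(), c.data(), n, i};
        if (pthread_create(&threads[i], nullptr, &multiplyBlock, args) != 0) {
            delete args;  // the thread never started, so ownership stays here
            ok = false;
            break;
        }
        ++started;
    }
    for (int i = 0; i < started; ++i)
        pthread_join(threads[i], nullptr);  // wait before the caller reads c
    return ok;
}
```

Note that the members are reached with -> because args is a pointer; args.arg1 on a pointer is a compile error. With GCC or Clang you may need to build with -pthread.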
Part 2:
I'm working on Windows (so I can't experiment currently), but the third parameter of pthread_create refers to the thread function matrixMultiplication, which is defined as void* matrixMultiplication(void*). From the man pages online, that signature (void* fn(void*)) looks correct to me.
I think I'll have to defer to someone else on your second error. I've made this post a community wiki entry so that an answer can be added if desired.

It's not clear to me what you are trying to do. You start some threads,
then you return from main (exiting the process) before getting any
results from them.
In this case, I'd probably not use any dynamic allocation, directly.
(I would use std::vector for the matrices, which would use dynamic
allocation internally.) There's no reason to dynamically allocate the
arg_struct, since it can safely be copied. Of course, you'll have to
wait until each thread has successfully extracted its data before
looping to construct the next thread. This would normally be done using
a condition variable: the new thread would signal it once it has
extracted the arguments from the arg_struct (or even better, you could
use boost::thread, which does this part for you). Alternatively, you
could use an array of arg_struct, but there is absolutely no reason to
allocate them dynamically. (If for some reason you cannot use
std::vector for A, B and C, you will want to allocate these
dynamically, in order to avoid any risk of stack overflow. But
std::vector is a much better solution.)
Finally, of course, you must wait for all of the threads to finish
before leaving main. Otherwise, the threads will continue working on
data that doesn't exist any more. In this case, you should
pthread_join all of the threads before exiting main. Presumably,
too, you want to do something with the results of the multiplication,
but in any case, exiting main before all of the threads have finished
accessing the matrices will cause undefined behavior.
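As a rough sketch of the "array of arg_struct plus pthread_join" variant described above: the names QuadrantTask, runQuadrant and multiplyJoined are invented for the example, and the quadrant layout is an assumption. Nothing is allocated with malloc or new directly; the matrices live in std::vector and the per-thread arguments live in a local array that outlives the threads because they are joined before the function returns.

```cpp
#include <pthread.h>
#include <array>
#include <cstddef>
#include <vector>

// Hypothetical per-thread argument struct; one element of the array per thread.
struct QuadrantTask {
    const std::vector<int>* a;
    const std::vector<int>* b;
    std::vector<int>* c;
    int n;         // matrix dimension (assumed even)
    int quadrant;  // 0..3, assumed row-major quadrant numbering
};

void* runQuadrant(void* raw) {
    const QuadrantTask& t = *static_cast<const QuadrantTask*>(raw);
    const int half = t.n / 2;
    const int rowStart = (t.quadrant / 2) * half;
    const int colStart = (t.quadrant % 2) * half;
    for (int i = rowStart; i < rowStart + half; ++i) {
        for (int j = colStart; j < colStart + half; ++j) {
            int sum = 0;
            for (int k = 0; k < t.n; ++k)
                sum += (*t.a)[i * t.n + k] * (*t.b)[k * t.n + j];
            (*t.c)[i * t.n + j] = sum;  // each thread writes distinct elements
        }
    }
    return nullptr;
}

// Multiplies two n*n row-major matrices using four threads.
// If a thread cannot be created, the corresponding quadrants stay zero.
std::vector<int> multiplyJoined(const std::vector<int>& a,
                                const std::vector<int>& b, int n) {
    std::vector<int> c(static_cast<std::size_t>(n) * n, 0);
    std::array<QuadrantTask, 4> tasks;
    std::array<pthread_t, 4> threads;
    int started = 0;
    for (int q = 0; q < 4; ++q) {
        tasks[q] = QuadrantTask{&a, &b, &c, n, q};
        if (pthread_create(&threads[q], nullptr, &runQuadrant, &tasks[q]) != 0)
            break;
        ++started;
    }
    // Join every started thread before tasks and c go out of scope.
    for (int q = 0; q < started; ++q)
        pthread_join(threads[q], nullptr);
    return c;
}
```

Because each thread only reads its own array element, no condition variable is needed to hand the arguments over.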

Related

Intermittent application crash when execute pthread_join

Sometimes my application crashes when executing pthread_join and sometimes it is OK. Can someone please advise what could be the problem with my code below?
FunctionA takes some arguments and creates a thread that does some calculation and stores the result in ResultPool (a global) for later use. FunctionA is called a few times; each time it is passed different arguments and creates a new thread. Every thread id is stored in a global variable, ThreadIdPool. At the end of execution, each thread id is retrieved from ThreadIdPool, the thread's completion is checked, and the result is output from ResultPool. The thread status check and the result output are in a different class.
threadCnt is initialized to -1 before the first call to FunctionA; it is defined elsewhere in my code.
int threadCnt;
struct ThreadData
{
int td_tnum;
float td_Freq;
bool td_enablePlots;
int td_ifBin;
int td_RAT;
};
typedef struct ThreadData structThreadDt;
void *thread_A(void *td);
map<int, float> ResultPool;
map<int, pthread_t> ThreadIdPool;
pthread_mutex_t mutex2 = PTHREAD_MUTEX_INITIALIZER;
pthread_t thread_id[10];
void FunctionA(int tnum, float msrFrequency, bool enablePlots)
{
//Pass the value to the variables.
int ifBin;
int RAT;
/*
Some calculation here and the results are assigned to ifBin and RAT
*/
structThreadDt *td;
td =(structThreadDt *)malloc(sizeof(structThreadDt));
td->td_tnum = tnum;
td->td_Freq = msrFrequency;
td->td_enablePlots = enablePlots;
td->td_ifBin = ifBin;
td->td_RAT = RAT;
threadCnt = threadCnt+1;
pthread_create(&thread_id[threadCnt], NULL, thread_A, (void*) td);
//Store the thread id to be check for the status later.
ThreadIdPool[tnum]=thread_id[threadCnt];
}
void* thread_A(void* td)
{
int ifBin;
int RAT;
bool enablePlots;
float msrFrequency;
int tnum;
structThreadDt *tds;
tds=(structThreadDt*)td;
enablePlots = tds->td_enablePlots;
msrFrequency = tds->td_Freq;
tnum = tds->td_tnum;
ifBin = tds->td_ifBin ;
RAT = tds->td_RAT;
/*
Do some calculation here with those ifBIN, RAT, TNUM and frequency.
*/
//Store the result to shared variable with mutex lock
pthread_mutex_lock( &mutex2 );
ResultPool[tnum] = results;
pthread_mutex_unlock( &mutex2 );
free(tds);
return NULL;
}
And here is the threadId status checking. It will first iterate the ThreadIdPool to retrieve the threadID and check the completion of the thread. If the thread is completed, it will output the result. The pthread_join execution will sometimes crash my application.
void StatusCheck()
{
int tnum;
pthread_t threadiD;
map<int, pthread_t>::iterator itr;
float res;
int ret;
//Just to make sure it has been done
for (itr = ThreadIdPool.begin(); itr != ThreadIdPool.end(); ++itr) {
tnum = itr->first;
threadiD = itr->second;
//Check if the thread is completed before get the results.
ret=pthread_join(threadiD, NULL);
if (ret!=0)
{
cout<<"Tnum="<<tnum<<":Error in joining thread."<<endl;
}
res = ResultPool[tnum];
cout<<"Results="<<res<<endl;
}
}
This will be a general answer:
First of all, your code is 99% C and 1% C++. I don't know why, but if you want to write C++, write C++, not C-like code. Or write C, if that is what you need.
For example, you are using a ton of global functions, static arrays, raw pointers, etc. Replace them with classes and methods, std::array, smart pointers, etc. The STL is there to be used. You can write a class to wrap your pthread object and, instead of having free functions, use a constructor. If smart pointers are not available, replace your malloc / free calls with (at least) new and delete. By the way, the C++ equivalent of NULL is nullptr.
Secondly, DO NOT USE GLOBAL VARIABLES. They are unnecessary in 99.99% of cases, since a variable can be declared locally and then passed to functions by pointer or reference.
As for what can crash your program, there are several things to check:
Are your variables correctly initialized?
You said that threadCnt is initialized to -1. Why? If it is a count it should start at 0; or maybe it is an index and not a count.
If you can, give us more information:
Where are these functions used, how, and by whom?
What is your compiler, and which version are you using?
Which C++ version are you using?
What is the goal of this project? Maybe there are better ways of doing it.
One problem I see is that when you collect the data, ResultPool is accessed without a lock. One of the threads that is still running could be adding more keys to ResultPool at the same time you're reading it to collect the data.
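To illustrate the locking point, here is a small sketch in which every access to the shared map, reads included, goes through the same mutex. ResultStore and runWorkers are hypothetical names, std::thread stands in for the raw pthread calls, and the stored value is a placeholder for the real calculation. Note that std::map::operator[] inserts a default element when the key is missing, so ResultPool[tnum] in the reader is itself a write; the sketch uses find instead.

```cpp
#include <map>
#include <mutex>
#include <thread>
#include <vector>

// Hypothetical wrapper that keeps the map and its mutex together.
class ResultStore {
public:
    void put(int key, float value) {
        std::lock_guard<std::mutex> lock(mutex_);
        results_[key] = value;
    }

    // Returns true and fills `out` if the key is present; never inserts.
    bool get(int key, float& out) const {
        std::lock_guard<std::mutex> lock(mutex_);
        auto it = results_.find(key);
        if (it == results_.end())
            return false;
        out = it->second;
        return true;
    }

private:
    mutable std::mutex mutex_;
    std::map<int, float> results_;
};

// Starts one worker per id and joins them all before returning.
// Each worker stores id * 0.5f as a placeholder "result".
void runWorkers(ResultStore& store, const std::vector<int>& ids) {
    std::vector<std::thread> workers;
    for (int id : ids)
        workers.emplace_back([&store, id] { store.put(id, id * 0.5f); });
    for (std::thread& w : workers)
        w.join();
}
```

Keeping the thread handles in a local vector also avoids the fixed-size global thread_id array and the separate counter.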

C++ threads and variables

I have a problem within the program that I write. I have functions returning pointers and within the main() I want to run them in threads.
I'm able to execute the functions in threads:
double* SplitFirstArray_1st(double *arr0){
const UI arrSize = baseElements/4;
std::cout << "\n1st split: \n";
double *arrSplited1=nullptr;
arrSplited1 = new double [arrSize];
for(UI i=0; i<arrSize; i++){
arrSplited1 = arr0;
}
for(UI j=0; j< arrSize; ++j){
std::cout << arrSplited1[j] << " ";
}
return arrSplited1;
delete [] arrSplited1, arr0;
}
in main()
std::thread _th1(SplitFirstArray_1st, rootArr);
_th1.join();
The above is not what I'm after. I have another pointer:
double *arrTh1=nullptr;
I would like to use it with a thread so that it is assigned the value returned by my function SplitFirstArray_1st:
arrTh1 = SplitFirstArray_1st(xxx);
Is it possible to do this in a thread?
Don't return the variable; pass a pointer to the variable and set the value that it points to.
i.e.:
void set_int(int* toset) {
*toset = 4;
}
This works fine with things that are already pointers:
void set_ptr(int** toset) {
*toset = new int[4];
// ...
(*toset)[0] = 2; // parentheses needed: [] binds tighter than *
}
You can know the data is safe to use if the function has returned.
Completely unrelated note:
return foo;
// No point placing code here unless you used goto as it won't get executed.
// Also: don't use goto.
}
Something like this:
std::thread _th1([&]() { arrTh1 = SplitFirstArray_1st(rootArr); });
A function that runs as a thread's entry point cannot return a value in the normal way: std::thread ignores the return value. Therefore it should be declared void.
A common way is to assign to a global variable protected by a mutex (or another mechanism) to avoid races.
mutex m;
double *arrTh1 = nullptr;
void SplitFirstArray_1st(double *arr0){
...
m.lock();
arrTh1 = arrSplited1;
m.unlock();
}
When you use the pointer in other threads (including the main one), you need to protect that usage with the same mutex as well (or choose another method).
And please, do not delete arrSplited1 and arr0; doing so would make the arrTh1 pointer unusable.
Note, if you use async functions, you could use futures to return values.
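A minimal sketch of the future-based approach, assuming std::vector<double> in place of the raw double* arrays; splitFirstQuarter and splitInBackground are made-up names. std::async runs the function on another thread and the returned std::future hands the result back when get() is called, so no global or mutex is needed.

```cpp
#include <cstddef>
#include <future>
#include <vector>

// Copies the first quarter of `source` into a new vector.
std::vector<double> splitFirstQuarter(const std::vector<double>& source) {
    const std::ptrdiff_t count =
        static_cast<std::ptrdiff_t>(source.size() / 4);
    return std::vector<double>(source.begin(), source.begin() + count);
}

// Runs splitFirstQuarter on another thread and returns its result.
std::vector<double> splitInBackground(const std::vector<double>& source) {
    // std::launch::async requests a separate thread rather than deferred execution.
    std::future<std::vector<double>> pending = std::async(
        std::launch::async, [&source] { return splitFirstQuarter(source); });
    // get() blocks until the task has finished, then moves the result out.
    return pending.get();
}
```

In real code you would do other work between std::async and get(); calling get() immediately, as here, just keeps the example short.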

Creating an array of objects causes an issue

I create the two following objects:
bool Reception::createNProcess()
{
for (int y = 0; y < 3; ++y)
{
Process *pro = new Process; // forks() at construction
Thread *t = new Thread[5];
this->addProcess(pro); // Adds the new process to a vector
if (pro->getPid() == 0)
{
for (int i = 0; i < 5; ++i)
{
pro->addThread(&t[i]); // Adds the new thread to a vector
t[i].startThread();
}
}
}
Where I create 3 processes (that I have encapsulated in Process) and create 5 threads in each of these processes.
But I'm not sure the following line is correct:
Thread *t = new Thread[5];
Because my two functions addProcess and addThread both take a pointer to Process and Thread respectively and yet the compiler asks for a reference to t[i] for addThread and I don't understand why.
void Process::addThread(Thread *t)
{
this->threads_.push_back(t);
}
void Reception::addProcess(Process *p)
{
this->createdPro.push_back(p);
}
createdPro is defined in the Reception class as follows:
std::vector<Process *> createdPro;
and threads_ in the Process class like such:
std::vector<Thread *> threads_;
And the error message (as obvious as it is) is as follows:
error: no matching function for call to ‘Process::addThread(Thread&)’
pro->addThread(t[i]);
process.hpp:29:10: note: candidate: void Process::addThread(Thread*)
void addThread(Thread *);
process.hpp:29:10: note: no known conversion for argument 1 from ‘Thread’ to ‘Thread*’
Even though I defined my Thread as a pointer.
You have defined the member function to take a pointer:
void Process::addThread(Thread *t)
{
...
}
You then invoke this function for &t[i], which is a pointer and should work perfectly:
pro->addThread(&t[i]); // Adds the new thread to a vector
You could also invoke it with t+i and it would still be ok. However your error message tells us something different: the compiler doesn't find a match for pro->addThread(t[i]); (i.e. the & is missing).
Either you made a typo in your question, or you made a typo in your code, or you have another invocation somewhere in which you've forgotten the ampersand. t[i] designates an object (it's equivalent to *(t+i)), not a pointer, and would cause the error message you have.
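The difference between the three spellings can be seen in a stripped-down sketch; Worker, Pool and registerAll are hypothetical stand-ins for Thread, Process and the loop in the question.

```cpp
#include <vector>

struct Worker { int id = 0; };  // hypothetical stand-in for Thread

// Hypothetical stand-in for Process: stores non-owning pointers.
class Pool {
public:
    void add(Worker* w) { workers_.push_back(w); }
    const std::vector<Worker*>& workers() const { return workers_; }

private:
    std::vector<Worker*> workers_;
};

// Registers every element of a raw array with the pool.
void registerAll(Pool& pool, Worker* array, int count) {
    for (int i = 0; i < count; ++i) {
        pool.add(&array[i]);     // address of the i-th element: a Worker*
        // pool.add(array + i);  // the same pointer, via pointer arithmetic
        // pool.add(array[i]);   // error: a Worker object is not a Worker*
    }
}
```

Uncommenting the last line should reproduce a "no matching function" error of the kind shown in the question.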

OLE macro in a for loop

According to MSDN documentation for OLE conversion macros, if we use a macro in a for loop for example, it may end up allocating more memory on stack leading to stack overflow.
This is the example provided on MSDN
void BadIterateCode(LPCTSTR* lpszArray)
{
USES_CONVERSION;
for (int ii = 0; ii < 10000; ii++)
pI->SomeMethod(ii, T2COLE(lpszArray[ii]));
}
In the above example T2COLE is used inside a for loop, which may lead to stack overflow. To avoid this, the method call is encapsulated in a function like this:
void CallSomeMethod(int ii, LPCTSTR lpsz)
{
USES_CONVERSION;
pI->SomeMethod(ii, T2COLE(lpsz));
}
void MuchBetterIterateCode2(LPCTSTR* lpszArray)
{
for (int ii = 0; ii < 10000; ii++)
CallSomeMethod(ii, lpszArray[ii]);
}
Can we just send the LPCTSTR to another function instead of encapsulating the whole method call, like this?
LPCOLESTR CallSomeMethod(LPCTSTR lpsz)
{
USES_CONVERSION;
return T2COLE(lpsz);
}
void BadIterateCode(LPCTSTR* lpszArray)
{
for (int ii = 0; ii < 10000; ii++)
pI->SomeMethod(ii, CallSomeMethod(lpszArray[ii]));
}
Can anyone tell me whether this is a safe use of the OLE macro, or whether we may still run into a stack overflow?
Are there any other issues with the above approach?
The third example will not work because the T2COLE object created in the method will be destroyed as soon as you return from the function. As you note in your question, the object is created on the stack, and the usual stack rules apply in this situation - the object will be destroyed as soon as you go out of scope, and you'll be accessing garbage data in the 3rd case.
The second case is the correct mechanism to use for using the data without triggering a stack overflow as upon return from the function, the memory that was allocated by the T2COLE will be freed.
I'm not aware of how the implementation of T2COLE works, but in C, you could achieve the same behaviour by using the alloca function which suffers from the same issue - as soon as you return from the function, you should consider the pointer and the data that it points at as invalid.
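The lifetime rule can be illustrated without ATL at all. The sketch below is not the OLE macros: widenAscii and totalLength are hypothetical, and std::wstring merely stands in for the converted string. The helper returns an owning object by value instead of a pointer into its own stack frame, so the caller can use the result safely, and the buffer is released at the end of every loop iteration rather than piling up.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical conversion helper: widens a narrow ASCII string.
// Returning by value transfers ownership to the caller, so nothing points
// into this function's stack frame once it has returned.
std::wstring widenAscii(const std::string& narrow) {
    return std::wstring(narrow.begin(), narrow.end());
}

// Converts each item inside the loop and sums the converted lengths.
std::size_t totalLength(const std::vector<std::string>& items) {
    std::size_t total = 0;
    for (const std::string& item : items) {
        const std::wstring wide = widenAscii(item);  // destroyed each iteration
        total += wide.size();
    }
    return total;
}
```

The trade-off is one allocation per conversion, compared with the stack allocation that the answer above suggests the macro relies on.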

Accessing Function Variable After calling it while Being in main()

I want to access variable v1 & v2 in Func() while being in main()
int main(void)
{
Func();
int k = ? //How to access variable 'v1' which is in Func()
int j = ? //How to access variable 'v2' which is in Func()
}
void Func()
{
int v1 = 10;
int v2 = 20;
}
I have heard that we can access them from the stack, but how?
Thank you.
You can't legally do that. Automatic variables disappear once execution leaves the scope they're declared in.
I'm sure there are tricks, like inspecting the stack and going "backwards" in time, but all such tricks are platform-dependent, and might break if you, for instance, cause the stack to be overwritten in main().
Why do you want to do that? Do you want those values as return values? I would introduce a struct for that, named according to the meaning of the values:
struct DivideResult {
int div;
int rem;
};
DivideResult Func() {
DivideResult r = { 10, 20 };
return r;
}
int main() {
DivideResult r = Func();
}
Otherwise, such variables are for managing local state in the function while it is activated. They don't have any meaning or life anymore after the function terminated.
Some ways you can do this are:
Declare the variables in main() and pass them by pointer or reference into Func()
Return the variable, or vector< int >, or a struct that you made, etc. of the variables to main()
Dynamically allocate the variables in Func(), and return a pointer to them. You would then have to remember to delete the allocated memory later as well.
But there is no access to the stack of Func() from main() that is standard.
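The three approaches listed above might look like this. The function names are invented for the example, and std::unique_ptr is used for the dynamically allocated case so that the memory is released automatically instead of by a manual delete.

```cpp
#include <memory>
#include <utility>

// 1. Caller declares the variables; the function fills them by reference.
void fillValues(int& first, int& second) {
    first = 10;
    second = 20;
}

// 2. Return both values by value.
std::pair<int, int> makeValues() {
    return {10, 20};
}

// 3. Allocate dynamically and return ownership to the caller.
std::unique_ptr<int[]> allocateValues() {
    return std::unique_ptr<int[]>(new int[2]{10, 20});
}
```

In main() that would be used as fillValues(k, j); or auto values = makeValues(); and so on; none of these touch Func()'s stack after it has returned.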
You can't do that portably. When Func()'s stack frame disappears, there's no reliable way to access it. It's free to be trampled. However, in x86-64, there is something known as the red zone, which is a 128B area below the stack pointer that is safe from trampling, and theoretically you might still be able to access it, but this is not portable, easy, nor correct. Simply put, don't do it.
Here's how I would do it:
int main(void)
{
int k, j;
Func(&k, &j);
}
void Func(int *a, int *b)
{
*a = 10;
*b = 20;
}
You're in C/C++ land. There is little you cannot do.
If this is your own code, you shouldn't even try to do that. As others suggested, pass an output parameter by reference (or by pointer in C) or return the values in a struct.
However, since you asked the question, I assume you are attempting to look into something you only have binary access to. If it is just a one-time thing, using a debugger will be easier.
Anyway, to answer your original question, try the following code. You have to compile it for an x86 CPU, with optimization and any stack debug flags turned off.
#include <iostream>
void f() {
int i = 12345;
int j = 54321;
}
int main()
{
int* pa = 0;
int buf[16] = {0};
f();
// get the stack pointer
__asm {
mov dword ptr [pa],ESP
}
// copy the stack, try not to do anything that "use" the stack
// before here
for (int i = 0; i < 16; ++i, --pa) {
buf[i] = *pa;
}
// print out the stack, assuming what you want to see
// are aligned at sizeof(int)
for (int i = 0; i < 16; ++i) {
std::cout << i << ":" << buf[i] << std::endl;
}
return 0;
}