parallelizing (openmp) a function call causes memory error - c++

I'm calling a class member function in parallel using OpenMP. It works in serial, and it also works if I put the my_func part in a critical section (which is slow, as expected). Running it in parallel, however, keeps giving the following error message:
malloc: *** error for object 0x7f961dcef750: pointer being freed was not allocated
I think the problem is with the new operators in my_func, i.e., the pointer myDyna seems to be shared among threads. My questions are:
1) Isn't everything inside the parallel region private, including all the pointers in my_func? That is, each thread should have its own copy of myDyna, so why does the error occur?
2) Without changing my_func too much, what can be done to make the parallelization work at the main level? For example, would adding a copy constructor to the class of myDyna help?
void my_func() {
    Variables *theVars = new Variables();
    // trDynaPS: constructor for the class.
    trDynaPS *myDyna = new trDynaPS(theVars, starttime, stepsize, stoptime);
    ResultRate = myDyna->trDynaPS_Drive(1, 1);
    if (theVars != nullptr) {
        myDyna->theVars = nullptr;
        delete theVars;
    }
    delete myDyna;
}
int main()
{
    // "parallel for" must be followed directly by the for loop, not by a brace block
    #pragma omp parallel for
    for (int i = 0; i < 10; i++) {
        // I have multiple copies of myclass as a vector
        myclass[i]->run(); // my_func is in the run operation
    }
    return 0;
}
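On question 1, the general rule is that variables declared inside the loop body, or inside a function called from it, are private to each thread, while anything declared outside the parallel region (globals, statics, members of an object that several threads touch) is shared. A minimal sketch of that distinction follows. The names `simulate` and `run_all` are hypothetical and not taken from the code above, and the sketch illustrates the rule only; it does not diagnose the actual crash.

```cpp
#include <memory>
#include <vector>

// Hypothetical stand-in for my_func: every object it touches is local,
// so each call (and therefore each thread) works on its own copies.
double simulate(int i) {
    auto state = std::make_unique<std::vector<double>>(100, 1.0 * i);
    double sum = 0.0;
    for (double v : *state) sum += v;
    return sum;  // state is freed here, by the same thread that allocated it
}

std::vector<double> run_all(int n) {
    std::vector<double> results(n);  // shared, but iteration i writes only results[i]
    #pragma omp parallel for
    for (int i = 0; i < n; ++i) {
        results[i] = simulate(i);
    }
    return results;
}
```

If a function like `simulate` wrote to a global or static instead of returning its result, that write would be shared between threads and would need protection.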

Related

How allocate dynamic memory in OpenMP C++

I tried to parallelize a function that allocates memory, but I got a bad-heap exception. Several threads must be using the same memory at the same time.
void GetDoubleParameters( CInd *ci )
{
for(int i=0;i<ci->size();i++)
{
void *tmp;
#pragma omp parallel private (tmp)
{
for(int j=0;j<ci[i].getValues().size();j++)
{
tmp = (void*)new double(ci[i].getValues()[j]);
ci->getParameters().push_back(tmp);
}
}
}
}
The problem is the line:
ci->getParameters().push_back(tmp);
ci is accessed by all parallel threads at once, and the container returned by getParameters() (probably a std::vector) is not safe for concurrent push_back calls.
You will have to put a guard around this code. Something like:
omp_lock_t my_lock;
...
// initialize lock
omp_init_lock (&my_lock);
...
#pragma omp parallel
{
// do something sensible in parallel
...
{
// omp_guard is a user-written RAII wrapper around omp_set_lock/omp_unset_lock,
// not part of the OpenMP API
omp_guard my_guard (my_lock);
// protected region starts here
// only one thread at a time works here
}
// more parallel work
...
}
omp_destroy_lock (&my_lock);
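As a simpler alternative to an explicit lock, the same protection can be sketched with `#pragma omp critical`. This swaps the lock-and-guard approach for a critical section; the helper name `copy_to_heap` and its signature are hypothetical, chosen only to mirror the loop in the question.

```cpp
#include <vector>

// Hypothetical helper, not from the question: copies every value onto the
// heap and stores the pointers in one shared vector.
std::vector<double*> copy_to_heap(const std::vector<double>& values) {
    std::vector<double*> out;
    const int n = static_cast<int>(values.size());
    #pragma omp parallel for
    for (int j = 0; j < n; ++j) {
        double* tmp = new double(values[j]);  // declared in the loop body, so private
        #pragma omp critical
        {
            out.push_back(tmp);  // only one thread at a time mutates the vector
        }
    }
    return out;
}
```

Note that the order of the pointers in the result is not deterministic when the loop runs in parallel. If order matters, writing to a pre-sized vector at index `j` avoids both the ordering issue and the critical section.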

Creating std threads in C++ crashes the program

Whenever I execute the following piece of code using threads, the program has this error:
Debug Error!
Program: ... /path/to/.exe
abort() has been called
I want to create a thread that calls a member function. Here is the function I am using:
void ServerVote::createConnexionThreads()
{
for (int i = 0; i <= 50; ++i)
{
m_connexionThreads.push_back(&(std::thread(&ServerVote::acceptConnection,*this, i)));
}
for (int i = 0; i <= 50; ++i)
{
m_connexionThreads[i]->join();
}
}
I can provide additional code if required. When using the debugger, I find that the program crashes right after the first thread is created, after the thread is pushed_back. ~thread() is then called and it crashes inside this function. Here is the vector declaration:
std::vector<std::thread*> m_connexionThreads;
I am using Visual Studio 2015. The acceptConnection function has a while(true) inside it and is planned to be terminated later.
Edit:
Thank you for your answers, but I cannot compile when using a thread object instead of a pointer. So when I try to push into this vector:
std::vector<std::thread> m_connexionThreads;
for (int i = 0; i <= 50; ++i)
{
m_connexionThreads.push_back((std::thread(&ServerVote::acceptConnection,*this, i)));
}
I get this error while compiling:
error C2280: 'std::thread::thread(const std::thread &)': attempting to reference a deleted function
You should not take the address of a temporary in any context. MSVC accepts this code only because of a non-standard extension; a standard-conforming compiler would reject it.
Instead, you should store the thread objects themselves, like this (see the explanation below the code for why this is preferred):
#include <thread>
#include <vector>
void acceptConnection(int);
void foo() {
std::vector<std::thread> vec;
for (int i = 0; i <= 50; ++i)
vec.push_back(std::thread(acceptConnection, i));
}
Why is this approach preferred over storing a pointer to an allocated thread object? There are multiple benefits:
It is less typing, and all other things being equal (though they are not!), less typing wins over more typing.
Pointers take caution. For instance, you shouldn't use a raw pointer as the vector's element type; you should use unique_ptr to ensure automatic cleanup, which makes the syntax even uglier!
Dynamic allocation is a drag on performance. You are hit twice: first when you allocate the memory, and again when you free it. Why suffer this penalty?
You are creating a temporary thread object, taking its address, and pushing that address into the vector. The temporary is destroyed at the end of the push_back statement, so the vector is left holding a pointer to a destroyed object. Worse, the temporary is still joinable when its destructor runs, and ~thread() on a joinable thread calls std::terminate(), which is the abort() you see.
You should either create the thread objects with new so that they outlive the statement, or not use pointers to thread objects at all.
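A minimal sketch of the member-function case, using a hypothetical `Server` class as a cut-down stand-in for `ServerVote`. It constructs the threads in place with `emplace_back` and passes `this` rather than `*this`. A likely cause of the C2280 error in the question's edit is that `*this` copies the whole object, including its vector of non-copyable `std::thread` members, though that is an inference from the code shown.

```cpp
#include <thread>
#include <vector>

// Hypothetical reduction of ServerVote; acceptConnection just records its
// index instead of looping forever.
class Server {
public:
    explicit Server(int n) : m_seen(n, 0) {}

    void runAll() {
        const int n = static_cast<int>(m_seen.size());
        m_threads.reserve(n);
        for (int i = 0; i < n; ++i) {
            // Pass `this`, not `*this`: `*this` would copy the whole object,
            // including the vector of threads.
            m_threads.emplace_back(&Server::acceptConnection, this, i);
        }
        for (std::thread& t : m_threads) {
            t.join();  // a joinable thread must be joined or detached before it is destroyed
        }
        m_threads.clear();
    }

    const std::vector<int>& seen() const { return m_seen; }

private:
    void acceptConnection(int i) { m_seen[i] = i + 1; }  // each thread writes only its own slot

    std::vector<int> m_seen;
    std::vector<std::thread> m_threads;
};
```

Calling `Server s(8); s.runAll();` starts eight threads and joins them all before returning.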

Boost threads running serially, not in parallel

I'm a complete newbie to multi-threading in C++, and decided to start with the Boost Libraries. Also, I'm using Intel's C++ Compiler (from Parallel Studio 2011) with VS2010 on Vista.
I'm coding a genetic algorithm, and want to exploit the benefits of multi-threading: I want to create a thread for each individual (object) in the population, in order for them to calculate their fitness (heavy operations) in parallel, to reduce total execution time.
As I understand it, whenever I launch a child thread it starts working "in the background", and the parent thread continues to execute the next instruction, right? So, I thought of creating and launching all the child threads I need (in a for loop), and then waiting for them to finish (calling each thread's join() in another for loop) before continuing.
The problem I'm facing is that the first loop won't continue to the next iteration until the newly created thread is done working. Then, the second loop is as good as gone, since all the threads are already joined by the time that loop is hit.
Here are (what I consider to be) the relevant code snippets. Tell me if there is anything else you need to know.
class Poblacion {
// Constructors, destructor and other members
// ...
list<Individuo> _individuos;
void generaInicial() { // This method sets up the initial population.
int i;
// First loop
for(i = 0; i < _tamano_total; i++) {
Individuo nuevo(true);
nuevo.Start(); // Create and launch new thread
_individuos.push_back(nuevo);
}
// Second loop
list<Individuo>::iterator it;
for(it = _individuos.begin(); it != _individuos.end(); it++) {
it->Join();
}
_individuos.sort();
}
};
And, the threaded object Individuo:
class Individuo {
private:
// Other private members
// ...
boost::thread _hilo;
public:
// Other public members
// ...
void Start() {
_hilo = boost::thread(&Individuo::Run, this);
}
void Run() {
// These methods operate with/on each instance's own attributes,
// so they *can't* be static
generaHoc();
calculaAptitud();
borraArchivos();
}
void Join() {
if(_hilo.joinable()) _hilo.join();
}
};
Thank you! :D
If that's your real code then you have a problem.
for(i = 0; i < _tamano_total; i++) {
Individuo nuevo(true);
nuevo.Start(); // Create and launch new thread
_individuos.push_back(nuevo);
}
void Start() {
_hilo = boost::thread(&Individuo::Run, this);
}
This code creates a new Individuo object on the stack, then starts a thread, passing the this pointer of that stack object to the new thread. It then copies the object into the list and destroys the stack object at the end of the iteration, leaving the new thread with a dangling pointer. This gives you undefined behaviour.
Since list never moves an object in memory once it has been inserted, you could start the thread after inserting into the list:
for(i = 0; i < _tamano_total; i++) {
_individuos.push_back(Individuo(true)); // add new entry to list
_individuos.back().Start(); // start a thread for that entry
}
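The insert-then-start pattern can be sketched in a self-contained form as follows. This uses `std::thread` in place of `boost::thread`, and the class `Individual`, its members, and `total_fitness` are hypothetical simplifications of the question's code.

```cpp
#include <list>
#include <thread>

// Hypothetical reduction of Individuo, with std::thread in place of boost::thread.
class Individual {
public:
    explicit Individual(int seed) : m_seed(seed) {}

    void start() { m_thread = std::thread(&Individual::run, this); }
    void join() { if (m_thread.joinable()) m_thread.join(); }
    int fitness() const { return m_fitness; }

private:
    void run() { m_fitness = m_seed * m_seed; }  // placeholder for the heavy work

    int m_seed;
    int m_fitness = 0;
    std::thread m_thread;
};

int total_fitness(int count) {
    std::list<Individual> population;
    for (int i = 0; i < count; ++i) {
        population.emplace_back(i);   // construct in place: no copy, address is final
        population.back().start();    // the `this` seen by the thread stays valid
    }
    for (Individual& ind : population) ind.join();

    int total = 0;
    for (const Individual& ind : population) total += ind.fitness();
    return total;
}
```

Because the object is built directly inside the list node, the pointer handed to the thread refers to the element that will later be joined, not to a temporary.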

OpenMP: Causes for heap corruption, anyone?

EDIT: I can run the same program twice, simultaneously without any problem - how can I duplicate this with OpenMP or with some other method?
This is the basic framework of the problem.
//Defined elsewhere
class SomeClass
{
public:
void Function()
{
// Allocate some memory
float *Data;
Data = new float[1024];
// Declare a struct which will be used by functions defined in the DLL
SomeStruct Obj;
Obj = MemAllocFunctionInDLL(Obj);
// Call it
FunctionDefinedInDLL(Data,Obj);
// Clean up
MemDeallocFunctionInDLL(Obj);
delete [] Data;
}
};
void Bar()
{
#pragma omp parallel for
for(int j = 0;j<10;++j)
{
SomeClass X;
X.Function();
}
}
I've verified that the _CrtIsValidHeapPointer() assertion fails when MemDeallocFunctionInDLL() tries to deallocate memory.
Is this because both threads are writing to the same memory?
To fix this, I thought I'd make the SomeClass object private (this is totally alien to me, so any help is appreciated).
void Bar()
{
SomeClass X;
#pragma omp parallel for default(shared) private(X)
for(int j = 0;j<10;++j)
{
X.Function();
}
}
And now it fails when it tries to allocate memory in the beginning for Data.
Note: I can make changes to the DLL if required
Note: It runs perfectly without #pragma omp parallel for
EDIT: Now Bar looks like this:
void Bar()
{
int j;
#pragma omp parallel for default(none) private(j)
for(j = 0;j<10;++j)
{
SomeClass X;
X.Function();
}
}
Still no luck.
Check whether MemAllocFunctionInDLL, FunctionDefinedInDLL, and MemDeallocFunctionInDLL are thread-safe, or re-entrant. In other words, do these functions use static or shared variables? If so, you need to make sure these variables are not corrupted by other threads.
The fact that it works without omp-for suggests that some of these functions were not written to be thread-safe.
I'd like to see what kind of memory allocation/free functions are used in Mem(Alloc|Dealloc)FunctionInDLL.
Added: I'm pretty sure the functions in your DLL are not thread-safe. You say you can run the program twice concurrently without a problem. Yes, that should be fine unless your program uses system-wide shared resources (such as memory shared among processes), which is very rare. Separate processes share no variables, so that works.
But invoking these functions from multiple threads (that is, within a single process) crashes your program. It means some variables are shared among the threads and are being corrupted.
It's not an OpenMP problem, just a multithreading bug. It may be simple to solve. Please check whether the DLL functions are safe to call concurrently from many threads.
How to privatize static variables
Say that we have such global variables:
static int g_data;
static int* g_vector = new int[100];
Privatization is nothing but creating a private copy for each thread.
int g_data[num_threads];
int* g_vector[num_threads];
for (int i = 0; i < num_threads; ++i)
g_vector[i] = new int[100];
Then any reference to such a variable becomes:
// Thread: tid
g_data[tid] = ...
.. = g_vector[tid][..]
Yes, it's pretty simple. However, this sort of code may have a false sharing problem. But false sharing is a matter of performance, not correctness.
First, just try to privatize any static and global variables. Then check correctness. Next, look at the speedup you get. If it scales (say 3.7x faster on a quad core), then it's okay. But if the speedup is low (such as 2x on a quad core), then you are probably looking at a false sharing problem. To solve it, all you need to do is add some padding to the data structures so that each thread's data sits on its own cache line.
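Privatization with padding can be sketched as below. The 64-byte cache-line size is an assumption (common on x86, not universal), the names `PaddedSlot` and `sum_privatized` are hypothetical, and `std::thread` with an explicit `tid` stands in for OpenMP threads so the example has no OpenMP dependency.

```cpp
#include <cstddef>
#include <thread>
#include <vector>

// Assumed cache-line size; adjust for the target hardware.
constexpr std::size_t kCacheLine = 64;

// One private accumulator per thread. alignas pads the struct to 64 bytes,
// so neighbouring values sit at least one assumed cache line apart.
struct alignas(kCacheLine) PaddedSlot {
    long value = 0;
};

long sum_privatized(int num_threads, int iterations) {
    std::vector<PaddedSlot> slots(num_threads);  // the privatized "g_data"
    std::vector<std::thread> workers;
    for (int tid = 0; tid < num_threads; ++tid) {
        workers.emplace_back([&slots, tid, iterations] {
            // Each thread touches only slots[tid], so no locking is needed.
            for (int k = 0; k < iterations; ++k) slots[tid].value += 1;
        });
    }
    for (std::thread& w : workers) w.join();

    long total = 0;
    for (const PaddedSlot& s : slots) total += s.value;
    return total;
}
```

Removing the `alignas` would leave the result unchanged, since false sharing affects only speed; the padding is there purely to keep the threads from contending for the same cache line.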
Instead of
delete Data
you must write
delete [] Data;
Wherever you do new [], make sure to use delete [].
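One way to avoid the new[]/delete[] pairing altogether is to let a `std::vector` own the buffer. A minimal sketch, in which `fill_buffer` is a hypothetical stand-in for a C-style function such as FunctionDefinedInDLL (its real signature is not shown in the question):

```cpp
#include <cstddef>
#include <vector>

// Hypothetical stand-in for a C-style API that fills a raw buffer.
void fill_buffer(float* data, std::size_t n) {
    for (std::size_t i = 0; i < n; ++i) data[i] = static_cast<float>(i);
}

float last_value() {
    std::vector<float> data(1024);          // replaces new float[1024]
    fill_buffer(data.data(), data.size());  // raw pointer for the C-style call
    return data.back();                     // storage is released automatically
}
```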
It looks like your problem is not specific to OpenMP. Did you try to run your application without the #pragma omp parallel for?
default(shared) means all variables declared outside the parallel region are shared between threads unless a clause says otherwise, which is not what you want. Change that to default(none), which forces you to state the sharing of each such variable explicitly.
private(X) gives each thread its own X, but the value of the original X is not copied into it: a class-type copy is default-constructed, and a built-in type is left uninitialised. Use firstprivate(X) if each copy should start from the original's value.
I think you'd be better off with your initial approach: put a breakpoint on the Dealloc call and see what the memory pointer is and what it contains. You can inspect the guard bytes to tell whether the memory has been overwritten at the end of a single call, or after a thread.
Incidentally, I am assuming this works if you run it once, without the omp loop?

Deleting a pointer at different places results in different behaviors (crash or not)

This question is a refinement of this one, which went in a different direction than expected.
In my multithreaded application, the main thread creates parameters and stores them:
typedef struct {
int parameter1;
double parameter2;
float* parameter3;
} jobParams;
typedef struct {
int ID;
void* params;
} jobData;
std::vector<jobData> jobs;
// main thread
for (int i = 0; i < nbJobs; ++i) {
jobParams* p = new jobParams;
// fill and store params
jobData data;
data.ID = i;
data.params = p;
jobs.push_back(data);
}
// start threads and wait for their execution
// delete parameters
for (int i = 0; i < jobs.size(); ++i) {
delete jobs[i].params;
}
Then, each thread gets a pointer to a set of parameters, and calls a job function with it:
// thread (generic for any job function and any type of params)
jobData* job = main->getNextParams();
jobFunction(job->ID, job->params);
The whole thing takes void* as argument to be able to use any structure for the parameters, but then the job function casts it back to the right struct:
void* jobFunction(void* param) {
jobParams* params = (jobParams*) param;
// do stuff
return 0;
}
My problem is the following: if I delete params at the end of jobFunction(), it works perfectly. However, I'd prefer to have the deletion taken care of by the threads or the main thread, such that I don't have to remember to delete the params for each jobFunction() that I write.
If I try to delete params just after calling jobFunction() in the threads, or even in the main thread after being sure that all threads are done (and thus the params are not needed anymore), I get a heap corruption error:
HEAP[prog]: Invalid Address specified to RtlFreeHeap( 02E90000, 03C2EE38 )
I'm using Visual Studio 2008 Pro, so I can't use valgrind or other *nix tools for debugging. All accesses to the main thread from the "child threads" are synchronized using a mutex, so the problem is not that I delete the same parameters twice.
In fact, by using the VS memory viewer, I know that the memory pointed to by the jobParams pointer does not change between the end of jobFunction() and the point where I try to delete it (either in the main thread or in the "child threads").
I added the definition of both structures, as well as the way I'd like to delete the params.
Just as a thought .. can you try
for (int i = 0; i < jobs.size(); ++i) {
delete (jobParams*)jobs[i].params;
}
Using new on a jobParams and then delete on a void* might be the cause of your problems: deleting through a void* is undefined behaviour, because the compiler no longer knows the type of the object being destroyed.
Is there any reason you store params as a void* in jobData? I'd argue that if you wish to have different types of jobParams, you should use an inheritance hierarchy rather than blindly casting to void*.
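A minimal sketch of the inheritance-based alternative, under stated assumptions: it needs a C++14 compiler (newer than the Visual Studio 2008 mentioned in the question), and the names `JobParamsBase`, `ImageJobParams`, `make_jobs`, and `job_function` are hypothetical. The virtual destructor is what lets the owner delete the parameters without knowing their concrete type.

```cpp
#include <memory>
#include <utility>
#include <vector>

// Hypothetical base class: the virtual destructor makes deletion through a
// base pointer run the destructor of the actual derived type.
struct JobParamsBase {
    virtual ~JobParamsBase() = default;
};

struct ImageJobParams : JobParamsBase {
    int parameter1 = 0;
    double parameter2 = 0.0;
    std::vector<float> parameter3;  // owns its data, unlike a raw float*
};

struct JobData {
    int ID = 0;
    std::unique_ptr<JobParamsBase> params;  // typed owner instead of void*
};

std::vector<JobData> make_jobs(int nbJobs) {
    std::vector<JobData> jobs;
    for (int i = 0; i < nbJobs; ++i) {
        auto p = std::make_unique<ImageJobParams>();
        p->parameter1 = i;
        JobData data;
        data.ID = i;
        data.params = std::move(p);
        jobs.push_back(std::move(data));
    }
    return jobs;  // params are destroyed with the correct type when jobs goes away
}

// A job function recovers its concrete type with a checked cast.
int job_function(const JobData& job) {
    auto* params = dynamic_cast<const ImageJobParams*>(job.params.get());
    return params ? params->parameter1 : -1;
}
```

With this layout no job function has to remember to delete anything; the vector of jobs releases every parameter object when it is destroyed or cleared.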
That sort of bug generally means you have a data race somewhere. Does main->getNextParams() do the right thing even if it's called by several threads at once? If it gives the same params to two threads, you could have a double free on your hands.
Also, instead of
jobFunction(jobData->ID, jobData->params);
You probably meant
jobFunction(job->ID, job->params);
To debug it, you could add a deleted member to the jobParams class and set it to true instead of actually deleting the object. Then check the deleted flag in every method of jobParams and throw an exception if it's true. Then see where the exception gets thrown.