Intermittent application crash when execute pthread_join - c++

Sometimes my application crash when executing pthread_join and sometime it is OK. Can someone please advise what could be the problem with my code below?
functionA will pass some arguments and create a thread that do some calculation and store the result into ResultPool (global) for later use. The functionA will be called few times and each time it passes different arguments and create a new thread. All the thread_id will be store in global variable and at the end of the execution, the thread_id will be retrieved from the ThreadIdPool and check the completion of the thread, and then output the result from the ResultPool. The thread status checking and output the result are at different class and the ThreadIdPool is a global variable.
The threadCnt will be initialized to -1 before start of functionA and it is defined somewhere in my code.
int threadCnt;
struct ThreadData
{
int td_tnum;
float td_Freq;
bool td_enablePlots;
int td_ifBin;
int td_RAT;
};
typedef struct ThreadData structThreadDt;
void *thread_A(void *td);
map<int, float> ResultPool;
map<int, pthread_t> ThreadIdPool;
pthread_mutex_t mutex2 = PTHREAD_MUTEX_INITIALIZER;
pthread_t thread_id[10];
void FunctionA(int tnum, float msrFrequency, bool enablePlots)
{
//Pass the value to the variables.
int ifBin;
int RAT;
/*
Some calculation here and the results are assigned to ifBin and RAT
*/
structThreadDt *td;
td =(structThreadDt *)malloc(sizeof(structThreadDt));
td->td_tnum = tnum;
td->td_Freq = msrFrequency;
td->td_enablePlots = enablePlots;
td->td_ifBin = ifBin;
td->td_RAT = RAT;
threadCnt = threadCnt+1;
pthread_create(&thread_id[threadCnt], NULL, thread_A, (void*) td);
//Store the thread id to be check for the status later.
ThreadIdPool[tnum]=thread_id[threadCnt];
}
void* thread_A(void* td)
{
int ifBin;
int RAT;
bool enablePlots;
float msrFrequency;
int tnum;
structThreadDt *tds;
tds=(structThreadDt*)td;
enablePlots = tds->td_enablePlots;
msrFrequency = tds->td_Freq;
tnum = tds->td_tnum;
ifBin = tds->td_ifBin ;
RAT = tds->td_RAT;
/*
Do some calculation here with those ifBIN, RAT, TNUM and frequency.
*/
//Store the result to shared variable with mutex lock
pthread_mutex_lock( &mutex2 );
ResultPool[tnum] = results;
pthread_mutex_unlock( &mutex2 );
free(tds);
return NULL;
}
And here is the threadId status checking. It will first iterate the ThreadIdPool to retrieve the threadID and check the completion of the thread. If the thread is completed, it will output the result. The pthread_join execution will sometimes crash my application.
void StatusCheck()
{
int tnum;
pthread_t threadiD;
map<int, pthread_t>::iterator itr;
float res;
int ret;
//Just to make sure it has been done
for (itr = ThreadIdPool.begin(); itr != ThreadIdPool.end(); ++itr) {
tnum = itr->first;
threadiD = itr->second;
//Check if the thread is completed before get the results.
ret=pthread_join(threadiD, NULL);
if (ret!=0)
{
cout<<"Tnum="<<tnum<<":Error in joining thread."<<endl;
}
res = ResultPool[tnum];
cout<<"Results="<<res<<endl;
}
}

This will be a global answer :
First of all, your code is 99% C and 1% C++. I don't know why, but if you want to write C++ write C++, not C like code. Or do C, that can be what you need.
For example, you are using ton of global function, static array, raw pointers etc. Replace them with classes and methods, std::array, smart_pointers etc. The STL is here to be used. You can write a class to wrap your pthread object and, instead of having free functions, use a constructor. If smart pointers are not available, replace your mallocs / free with (at least) new and delete. By the way, NULL as its equivalent in nullptr in C++.
Secondly, DO NOT USE GLOBAL VARIABLES. It is not necessary in 99.99% of the case as it can be variable declared then passed as pointers / references to functions.
For what can crash your program there are several things to test :
Are you variables correctly initialized ?
You said that threadCount is initialized with -1. Why ? Since it is a count it should has start at 0, or maybe it is an index and not a count.
If you can, give use more informations :
Where are these functions used, and how, by who ?
What is you compiler and which version are you using ?
What is the C++ version you are using ?
What is the goal of this projet ? Maybe there are better ways of doing it.

One problem I see is that when you collect the data there is an access to result_pool without a lock. One of the threads that is still running could be accessing result_pool adding more keys to it at the same time you're accessing it to collect the data.

Related

Reset thread-local variables in OpenMP

I need a consistent way of resetting all thread-local variables my program creates. The problem lies in that the thread-local data is created in places different from where they are used.
My program outline is the following:
struct data_t { /* ... */ };
// 1. Function that fetches the "global" thread-local data
data_t& GetData()
{
static data_t *d = NULL;
#pragma omp threadprivate(d); // !!!
if (!d) { d = new data_t(); }
return *d;
}
// 2 example function that uses the data
void user(int *elements, int num, int *output)
{
#pragma omp parallel for shared(elements, output) if (num > 1000)
for (int i = 0; i < num; ++i)
{
// computation is a heavy calculation, on memoized data
computation(GetData());
}
}
Now, my problem is I need a function that resets data, i.e. every thread-local object created must be accounted for.
For now, my solution, is to use a parallel region, that hopefully uses equal or more threads than the "parallel for" so every object is "iterated" through:
void ClearThreadLocalData()
{
#pragma omp parallel
{
// assuming data_t has a "clear()" method
GetData().clear();
}
}
Is there a more idiomatic / safe way to implement ClearThreadLocalData() ?
You can create and use a global version number for your data. Increment it every time you need to clear the existing caches. Then modify GetData to check the version number if there is an existing data object, discarding the existing one and creating a new one if it is out of date. (The version number for the allocated data_t object can be stored within data_t if you can modify the class, or in a second thread local variable if not.) You'd end up with something like
static int dataVersion;
data_t& GetData()
{
static data_t *d = NULL;
#pragma omp threadprivate(d); // !!!
if (d && d->myDataVersion != dataVersion) {
delete d;
d = nullptr;
}
if (!d) {
d = new data_t();
d->myDataVersion = dataVersion;
}
return *d;
}
This doesn't depend on the existence of a Clear method in data_t, but if you have one replace the delete-and-reset with a call to Clear. I'm using d = nullptr to avoid duplicating the call to new data_t().
The global dataVersion could be a static member of data_t if you want to avoid the global variable, and it can be atomic if necessary although GetData would need changes to handle that.
When it comes time to reset the data, just change the global version number:
++dataVersion;

Multithreading with member functions and constructor carrying argument(s)

I have a situation in which i need to instantiate a vector of boost::threads to solve the following:
I have a class called Instrument to hold Symbol information, which looks something like below:
class Instrument
{
public:
Instrument(StringVector symbols, int i);
virtual ~Instrument();
const Instrument& operator= (const Instrument& inst)
{
return *this;
}
String GetSymbol() { return Symbol_; }
LongToSymbolInfoPairVector GetTS() { return TS_; }
bool OrganiseData(TimeToSymbolsInfoPairVector& input, int i);
static int getRandomNumber(const int low, const int high);
static double getProbability();
bool ConstructNewTimeSeries(const int low, const int high);
bool ReconstructTimeSeries(TimeToSymbolsInfoPairVector& reconstructeddata, int i);
private:
LongToSymbolInfoPairVector TS_;
String Symbol_;
const int checkWindow_;
String start_, end_;
long numberofsecsinaday_;
static std::default_random_engine generator_;
};
This class will have as many objects as the number of symbols. These symbols shall be accessed in another class Analysis for further work, whose constructor accepts the vector of the above Instrument class, as shown below.
class Analysis
{
public:
Analysis(std::vector<Instrument>::iterator start, std::vector<Instrument>::iterator end);
virtual ~Analysis();
bool buildNewTimeSeries(TimeToSymbolsInfoPairVector& reconstructeddata);
bool printData(TimeToSymbolsInfoPairVector& reconstructeddata);
private:
std::vector<Instrument> Instruments_;
};
Now i want to multithread this process so that i can separate out say 7 symbols per thread and spawn out, say, 4 threads.
Following is the updated main.
std::vector<Instrument>::iterator block_start = Instruments.begin();
int first = 0, last = 0;
for (unsigned long i=0; i<MAX_THREADS; i++)
{
std::vector<Instrument>::iterator block_end = block_start;
std::advance(block_end, block_size);
last = (i+1)*block_size;
Analysis* analyzed = new Analysis(block_start, block_end /*first, last*/);
analyzed->setData(output, first, last);
threads.push_back(std::thread(std::bind(&Analysis::buildNewTimeSeries, std::ref(*analyzed))));
block_start = block_end;
first = last;
}
for (int i=0; i<MAX_THREADS; i++)
{
(threads[i]).join();
}
This is evidently incorrect, although i know how to instantiate a thread's constructor to pass a class constructor an argument or a member function an argument, but i seem to be facing an issue when my purpose is:
a) Pass the constructor of class Analysis a subset of vector and
b) Call the buildNewTimeSeries(TimeToSymbolsInfoPairVector& reconstructeddata)
for each of the 4 threads and then later on join them.
Can anyone suggest a neat way of doing this please ?
The best way to go about partitioning a vector of resources (like std::vector in ur case) on to limited number of threads is by using a multi-threaded design paradigm called threadpools. There is no standard thread-pool in c++ and hence you might have to build one yourself(or use open source libraries). You can have a look at one of the many good opensource implementations here:- https://github.com/progschj/ThreadPool
Now, I am not going to be using threadpools, but will just give you a couple of suggestions to help u fix ur problem without modifying ur core functionality/idea.
In main you are dynamically creating vectors using new and are passing on the reference of the vector by dereferencing the pointer. Analysis* analyzed = new. I understand that your idea here, is to use the same vector analysis* in both main and the thread function. In my opinion this is not a good design. There is a better way to do it.
Instead of using std::thread use std::async. std::async creates tasks as opposed to threads. There are numerous advantages using tasks by using async. I do not want to make this a long answer by describing thread/tasks. But, one main advantage of tasks which directly helps you in your case is that it lets you return values(called future) from tasks back to the main function.
No to rewrite your main function async, tweak your code as follows,
Do not dynamically create a vector using new, instead just create a
local vector and just move the vector using std::move to the task
while calling async.
Modify Analysis::buildNewTimeSeries to accept rvalue reference.
Write a constructor for analysis with rvalue vector
The task will then modify this vector locally and then
return this vector to main function.
while calling async store the return value of the async
calls in a vector < future < objectType > >
After launching all the tasks using async, you can call the .get() on each of the element of this future vector.
This .get() method will return the vector modified and returned from
thread.
Merge these returned vectors into the final result vector.
By moving the vector from main to thread and then returning it back, you are allowing only one owner to have exclusive access on the vector. So you can not access a vector from main after it gets moved to thread. This is in contrast to your implementation, where both the main function and the thread function can access the newly created vector that gets passed by reference to the thread.

An attempt to create atomic reference counting is failing with deadlock. Is this the right approach?

So I'm attempting to create copy-on-write map that uses an attempt at atomic reference counting on the read-side to not have locking.
Something isn't quite right. I see some references getting over-incremented and some are going down negative, so something isn't really atomic. In my tests I have 10 reader threads looping 100 times each doing a get() and 1 writer thread doing 100 writes.
It gets stuck in the writer because some of the references never go down to zero, even though they should.
I'm attempting to use the 128-bit DCAS technique laid explained by this blog.
Is there something blatantly wrong with this or is there an easier way to debugging this rather than playing with it in the debugger?
typedef std::unordered_map<std::string, std::string> StringMap;
static const int zero = 0; //provides an l-value for asm code
class NonBlockingReadMapCAS {
public:
class OctaWordMapWrapper {
public:
StringMap* fStringMap;
//std::atomic<int> fCounter;
int64_t fCounter;
OctaWordMapWrapper(OctaWordMapWrapper* copy) : fStringMap(new StringMap(*copy->fStringMap)), fCounter(0) { }
OctaWordMapWrapper() : fStringMap(new StringMap), fCounter(0) { }
~OctaWordMapWrapper() {
delete fStringMap;
}
/**
* Does a compare and swap on an octa-word - in this case, our two adjacent class members fStringMap
* pointer and fCounter.
*/
static bool inline doubleCAS(OctaWordMapWrapper* target, StringMap* compareMap, int64_t compareCounter, StringMap* swapMap, int64_t swapCounter ) {
bool cas_result;
__asm__ __volatile__
(
"lock cmpxchg16b %0;" // cmpxchg16b sets ZF on success
"setz %3;" // if ZF set, set cas_result to 1
: "+m" (*target),
"+a" (compareMap), //compare target's stringmap pointer to compareMap
"+d" (compareCounter), //compare target's counter to compareCounter
"=q" (cas_result) //results
: "b" (swapMap), //swap target's stringmap pointer with swapMap
"c" (swapCounter) //swap target's counter with swapCounter
: "cc", "memory"
);
return cas_result;
}
OctaWordMapWrapper* atomicIncrementAndGetPointer()
{
if (doubleCAS(this, this->fStringMap, this->fCounter, this->fStringMap, this->fCounter +1))
return this;
else
return NULL;
}
OctaWordMapWrapper* atomicDecrement()
{
while(true) {
if (doubleCAS(this, this->fStringMap, this->fCounter, this->fStringMap, this->fCounter -1))
break;
}
return this;
}
bool atomicSwapWhenNotReferenced(StringMap* newMap)
{
return doubleCAS(this, this->fStringMap, zero, newMap, 0);
}
}
__attribute__((aligned(16)));
std::atomic<OctaWordMapWrapper*> fReadMapReference;
pthread_mutex_t fMutex;
NonBlockingReadMapCAS() {
fReadMapReference = new OctaWordMapWrapper();
}
~NonBlockingReadMapCAS() {
delete fReadMapReference;
}
bool contains(const char* key) {
std::string keyStr(key);
return contains(keyStr);
}
bool contains(std::string &key) {
OctaWordMapWrapper *map;
do {
map = fReadMapReference.load()->atomicIncrementAndGetPointer();
} while (!map);
bool result = map->fStringMap->count(key) != 0;
map->atomicDecrement();
return result;
}
std::string get(const char* key) {
std::string keyStr(key);
return get(keyStr);
}
std::string get(std::string &key) {
OctaWordMapWrapper *map;
do {
map = fReadMapReference.load()->atomicIncrementAndGetPointer();
} while (!map);
//std::cout << "inc " << map->fStringMap << " cnt " << map->fCounter << "\n";
std::string value = map->fStringMap->at(key);
map->atomicDecrement();
return value;
}
void put(const char* key, const char* value) {
std::string keyStr(key);
std::string valueStr(value);
put(keyStr, valueStr);
}
void put(std::string &key, std::string &value) {
pthread_mutex_lock(&fMutex);
OctaWordMapWrapper *oldWrapper = fReadMapReference;
OctaWordMapWrapper *newWrapper = new OctaWordMapWrapper(oldWrapper);
std::pair<std::string, std::string> kvPair(key, value);
newWrapper->fStringMap->insert(kvPair);
fReadMapReference.store(newWrapper);
std::cout << oldWrapper->fCounter << "\n";
while (oldWrapper->fCounter > 0);
delete oldWrapper;
pthread_mutex_unlock(&fMutex);
}
void clear() {
pthread_mutex_lock(&fMutex);
OctaWordMapWrapper *oldWrapper = fReadMapReference;
OctaWordMapWrapper *newWrapper = new OctaWordMapWrapper(oldWrapper);
fReadMapReference.store(newWrapper);
while (oldWrapper->fCounter > 0);
delete oldWrapper;
pthread_mutex_unlock(&fMutex);
}
};
Maybe not the answer but this looks suspicious to me:
while (oldWrapper->fCounter > 0);
delete oldWrapper;
You could have a reader thread just entering atomicIncrementAndGetPointer() when the counter is 0 thus pulling the rug underneath the reader thread by deleting the wrapper.
Edit to sum up the comments below for potential solution:
The best implementation I'm aware of is to move fCounter from OctaWordMapWrapper to fReadMapReference (You don't need the OctaWordMapWrapper class at all actually). When the counter is zero swap the pointer in your writer. Because you can have high contention of reader threads which essentially blocks the writer indefinitely you can have highest bit of fCounter allocated for reader lock, i.e. while this bit is set the readers spin until the bit is cleared. The writer sets this bit (__sync_fetch_and_or()) when it's about to change the pointer, waits for the counter to fall down to zero (i.e. existing readers finish their work) and then swap the pointer and clears the bit.
This approach should be waterproof, though it's obviously blocking readers upon writes. I don't know if this is acceptable in your situation and ideally you would like this to be non-blocking.
The code would look something like this (not tested!):
class NonBlockingReadMapCAS
{
public:
NonBlockingReadMapCAS() :m_ptr(0), m_counter(0) {}
private:
StringMap *acquire_read()
{
while(1)
{
uint32_t counter=atom_inc(m_counter);
if(!(counter&0x80000000))
return m_ptr;
atom_dec(m_counter);
while(m_counter&0x80000000);
}
return 0;
}
void release_read()
{
atom_dec(m_counter);
}
void acquire_write()
{
uint32_t counter=atom_or(m_counter, 0x80000000);
assert(!(counter&0x80000000));
while(m_counter&0x7fffffff);
}
void release_write()
{
atom_and(m_counter, uint32_t(0x7fffffff));
}
StringMap *volatile m_ptr;
volatile uint32_t m_counter;
};
Just call acquire/release_read/write() before & after accessing the pointer for read/write. Replace atom_inc/dec/or/and() with __sync_fetch_and_add(), __sync_fetch_and_sub(), __sync_fetch_and_or() and __sync_fetch_and_and() respectively. You don't need doubleCAS() for this actually.
As noted correctly by #Quuxplusone in a comment below this is single producer & multiple consumer implementation. I modified the code to assert properly to enforce this.
Well, there are probably lots of problems, but here are the obvious two.
The most trivial bug is in atomicIncrementAndGetPointer. You wrote:
if (doubleCAS(this, this->fStringMap, this->fCounter, this->fStringMap, this->fCounter +1))
That is, you're attempting to increment this->fCounter in a lock-free way. But it doesn't work, because you're fetching the old value twice with no guarantee that the same value is read each time. Consider the following sequence of events:
Thread A fetches this->fCounter (with value 0) and computes argument 5 as this->fCounter +1 = 1.
Thread B successfully increments the counter.
Thread A fetches this->fCounter (with value 1) and computes argument 3 as this->fCounter = 1.
Thread A executes doubleCAS(this, this->fStringMap, 1, this->fStringMap, 1). It succeeds, of course, but we've lost the "increment" we were trying to do.
What you wanted is more like
StringMap* oldMap = this->fStringMap;
int64_t oldCounter = this->fCounter;
if (doubleCAS(this, oldMap, oldValue, oldMap, oldValue+1))
...
The other obvious problem is that there's a data race between get and put. Consider the following sequence of events:
Thread A begins to execute get: it fetches fReadMapReference.load() and prepares to execute atomicIncrementAndGetPointer on that memory address.
Thread B finishes executing put: it deletes that memory address. (It is within its rights to do so, because the wrapper's reference count is still at zero.)
Thread A starts executing atomicIncrementAndGetPointer on the deleted memory address. If you're lucky, you segfault, but of course in practice you probably won't.
As explained in the blog post:
The garbage collection interface is omitted, but in real applications you would need to scan the hazard pointers before deleting a node.
Another user has suggested a similar approach, but if you are compiling with gcc (and perhaps with clang), you could use the intrinsic __sync_add_and_fetch_4 which does something similar to what your assembly code does, and is likely much more portable.
I have used it when I implemented refcounting in an Ada library (but the algorithm remains the same).
int __sync_add_and_fetch_4 (int* ptr, int value);
// increments the value pointed to by ptr by value, and returns the new value
Although I'm not sure how your reader threads work, I suspect your problem is that you are not catching and handling possible out_of_range exceptions in your get() method that might arise from this line: std::string value = map->fStringMap->at(key);. Note that if key is not found in the map, this will throw, and exit the function without decrementing the counter, which would lead to the condition you describe (of getting stuck in the while-loop within the writer thread while waiting for the counters to decrement).
In any event, whether this is the cause of the issues you're seeing or not, you definitely need to either handle this exception (and any others) or modify your code such that there's no risk of a throw. For the at() method, I would probably just use find() instead, and then check the iterator it returns. However, more generally, I would suggest using the RAII pattern to ensure that you don't let any unexpected exceptions escape without unlocking/decrementing. For example, you might check out boost::scoped_lock to wrap your fMutex and then write something simple like this for the OctaWordMapWrapper increment/decrement:
class ScopedAtomicMapReader
{
public:
explicit ScopedAtomicMapReader(std::atomic<OctaWordMapWrapper*>& map) : fMap(NULL) {
do {
fMap = map.load()->atomicIncrementAndGetPointer();
} while (NULL == fMap);
}
~ScopedAtomicMapReader() {
if (NULL != fMap)
fMap->atomicDecrement();
}
OctaWordMapWrapper* map(void) {
return fMap;
}
private:
OctaWordMapWrapper* fMap;
}; // class ScopedAtomicMapReader
With something like that, then for example, your contains() and get() methods would simplify to (and be immune to exceptions):
bool contains(std::string &key) {
ScopedAtomicMapReader mapWrapper(fReadMapReference);
return (mapWrapper.map()->fStringMap->count(key) != 0);
}
std::string get(std::string &key) {
ScopedAtomicMapReader mapWrapper(fReadMapReference);
return mapWrapper.map()->fStringMap->at(key); // Now it's fine if this throws...
}
Finally, although I don't think you should have to do this, you might also try declaring fCounter as volatile as well (given your access to it in the while-loop in the put() method will be on a different thread than the writes to it on the reader threads.
Hope this helps!
By the way, one other minor thing: fReadMapReference is leaking. I think you should delete this in your destructor.

Multithreaded matrix multiplication in C++

I've been having trouble with this parallel matrix multiplication code, I keep getting an error when trying to access a data member in my structure.
This is my main function:
struct arg_struct
{
int* arg1;
int* arg2;
int arg3;
int* arg4;
};
int main()
{
pthread_t allthreads[4];
int A [N*N];
int B [N*N];
int C [N*N];
randomMatrix(A);
randomMatrix(B);
printMatrix(A);
printMatrix(B);
struct arg_struct *args = (arg_struct*)malloc(sizeof(struct arg_struct));
args.arg1 = A;
args.arg2 = B;
int x;
for (int i = 0; i < 4; i++)
{
args.arg3 = i;
args.arg4 = C;
x = pthread_create(&allthreads[i], NULL, &matrixMultiplication, (void*)args);
if(x!=0)
exit(1);
}
return 0;
}
and the matrixMultiplication method used from another C file:
void *matrixMultiplication(void* arguments)
{
struct arg_struct* args = (struct arg_struct*) arguments;
int block = args.arg3;
int* A = args.arg1;
int* B = args.arg2;
int* C = args->arg4;
free(args);
int startln = getStartLineFromBlock(block);
int startcol = getStartColumnFromBlock(block);
for (int i = startln; i < startln+(N/2); i++)
{
for (int j = startcol; j < startcol+(N/2); j++)
{
setMatrixValue(C,0,i,j);
for(int k = 0; k < N; k++)
{
C[i*N+j] += (getMatrixValue(A,i,k) * getMatrixValue(B,k,j));
usleep(1);
}
}
}
}
Another error I am getting is when creating the thread: "invalid conversion from ‘void ()(int, int*, int, int*)’ to ‘void* ()(void)’ [-fpermissive]
"
Can anyone please tell me what I'm doing wrong?
First you mix C and C++ very badly, either use plain C or use C++, in C++ you can simply use new and delete.
But the reason of your error is you allocate arg_struct in one place and free it in 4 threads. You should allocate one arg_struct for each thread
Big Boss is right in the sense that he has identified the problem, but to add to/augment the reply he made.
Option 1:
Just create an arg_struct in the loop and set the members, then pass it through:
for(...)
{
struct arg_struct *args = (arg_struct*)malloc(sizeof(struct arg_struct));
args->arg1 = A;
args->arg2 = B; //set up args as now...
...
x = pthread_create(&allthreads[i], NULL, &matrixMultiplication, (void*)args);
....
}
keep the free call in the thread, but now you could then use the passed struct directly rather than creating locals in your thread.
Option 2:
It looks like you want to copy the params from the struct internally to the thread anyway so you don't need to dynamically allocate.
Just create an arg_struct and set the members, then pass it through:
arg_struct args;
//set up args as now...
for(...)
{
...
x = pthread_create(&allthreads[i], NULL, &matrixMultiplication, (void*)&args);
}
Then remove the free call.
However as James pointed out you would need to synchronize in the thread/parent on the structure to make sure that it wasn't changed. That would mean a Mutex or some other mechanism. So probably the move of the allocation to the for loop is easier to begin with.
Part 2:
I'm working on windows (so I can't experiment currently), but pthread_create param 3 is referring to the thread function matrixMultiplication which is defined as void* matrixMultiplication( void* ); - it looks correct to me (signature wise) from the man pages online, void* fn (void* )
I think I'll have to defer to someone else on your second error. Made this post a comunnity wiki entry so answer can be put into this if desired.
It's not clear to me what you are trying to do. You start some threads,
then you return from main (exiting the process) before getting any
results from them.
In this case, I'ld probably not use any dynamic allocation, directly.
(I would use std::vector for the matrices, which would use dynamic
allocation internally.) There's no reason to dynamically allocate the
arg_struct, since it can safely be copied. Of course, you'll have to
wait until each thread has successfully extracted its data before
looping to construct the next thread. This would normally be done using
a conditional: the new thread would unblock the conditional once it has
extracted the arguments from the arg_struct (or even better, you could
use boost::thread, which does this part for you). Alternatively, you
could use an array of arg_struct, but there is absolutely no reason to
allocate them dynamically. (If for some reason you cannot use
std::vector for A, B and C, you will want to allocate these
dynamically, in order to avoid any risk of stack overflow. But
std::vector is a much better solution.)
Finally, of course, you must wait for all of the threads to finish
before leaving main. Otherwise, the threads will continue working on
data that doesn't exist any more. In this case, you should
pthread_join all of the threads before exiting main. Presumably,
too, you want to do something with the results of the multiplication,
but in any case, exiting main before all of the threads have finished
accessing the matrices will cause undefined behavior.

Boost, Shared Memory and Vectors

I need to share a stack of strings between processes (possibly more complex objects in the future). I've decided to use boost::interprocess but I can't get it to work. I'm sure it's because I'm not understanding something. I followed their example, but I would really appreciate it if someone with experience with using that library can have a look at my code and tell me what's wrong. The problem is it seems to work but after a few iterations I get all kinds of exceptions both on the reader process and sometimes on the writer process. Here's a simplified version of my implementation:
using namespace boost::interprocess;
class SharedMemoryWrapper
{
public:
SharedMemoryWrapper(const std::string & name, bool server) :
m_name(name),
m_server(server)
{
if (server)
{
named_mutex::remove("named_mutex");
shared_memory_object::remove(m_name.c_str());
m_segment = new managed_shared_memory (create_only,name.c_str(),65536);
m_stackAllocator = new StringStackAllocator(m_segment->get_segment_manager());
m_stack = m_segment->construct<StringStack>("MyStack")(*m_stackAllocator);
}
else
{
m_segment = new managed_shared_memory(open_only ,name.c_str());
m_stack = m_segment->find<StringStack>("MyStack").first;
}
m_mutex = new named_mutex(open_or_create, "named_mutex");
}
~SharedMemoryWrapper()
{
if (m_server)
{
named_mutex::remove("named_mutex");
m_segment->destroy<StringStack>("MyStack");
delete m_stackAllocator;
shared_memory_object::remove(m_name.c_str());
}
delete m_mutex;
delete m_segment;
}
void push(const std::string & in)
{
scoped_lock<named_mutex> lock(*m_mutex);
boost::interprocess::string inStr(in.c_str());
m_stack->push_back(inStr);
}
std::string pop()
{
scoped_lock<named_mutex> lock(*m_mutex);
std::string result = "";
if (m_stack->size() > 0)
{
result = std::string(m_stack->begin()->c_str());
m_stack->erase(m_stack->begin());
}
return result;
}
private:
typedef boost::interprocess::allocator<boost::interprocess::string, boost::interprocess::managed_shared_memory::segment_manager> StringStackAllocator;
typedef boost::interprocess::vector<boost::interprocess::string, StringStackAllocator> StringStack;
bool m_server;
std::string m_name;
boost::interprocess::managed_shared_memory * m_segment;
StringStackAllocator * m_stackAllocator;
StringStack * m_stack;
boost::interprocess::named_mutex * m_mutex;
};
EDIT Edited to use named_mutex. Original code was using interprocess_mutex which is incorrect, but that wasn't the problem.
EDIT2 I should also note that things work up to a point. The writer process can push several small strings (or one very large string) before the reader breaks. The reader breaks in a way that the line m_stack->begin() does not refer to a valid string. It's garbage. And then further execution throws an exception.
EDIT3 I have modified the class to use boost::interprocess::string rather than std::string. Still the reader fails with invalid memory address. Here is the reader/writer
//reader process
SharedMemoryWrapper mem("MyMemory", true);
std::string myString;
int x = 5;
do
{
myString = mem.pop();
if (myString != "")
{
std::cout << myString << std::endl;
}
} while (1); //while (myString != "");
//writer
SharedMemoryWrapper mem("MyMemory", false);
for (int i = 0; i < 1000000000; i++)
{
std::stringstream ss;
ss << i; //causes failure after few thousand iterations
//ss << "AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA" << i; //causes immediate failure
mem.push(ss.str());
}
return 0;
There are several things that leaped out at me about your implementation. One was the use of a pointer to the named mutex object, whereas the documentation of most boost libraries tends to bend over backwards to not use a pointer. This leads me to ask for a reference to the program snippet you worked from in building your own test case, as I have had similar misadventures and sometimes the only way out was to go back to the exemplar and work forward one step at a time until I come across the breaking change.
The other thing that seems questionable is your allocation of a 65k block for shared memory, and then in your test code, looping to 1000000000, pushing a string onto your stack each iteration.
With a modern PC able to execute 1000 instructions per microsecond and more, and operating systems like Windows still doling out execution quanta in 15 millisecond. chunks, it won't take long to overflow that stack. That would be my first guess as to why things are haywire.
P.S.
I just returned from fixing my name to something resembling my actual identity. Then the irony hit that my answer to your question has been staring us both in the face from the upper left hand corner of the browser page! (That is, of course, presuming I was correct, which is so often not the case in this biz.)
Well maybe shared memory is not the right design for your problem to begin with. However we would not know, because we don't know what you try to achieve in the first place.