How to return a class object from threads or how to persist its state?
struct DataStructure
{
    MapSmoother *m1;
    std::vector<Vertex*> v1;
    std::vector<Vertex*>::iterator vit;

    DataStructure() {
        m1 = NULL;
    }
};
DWORD WINAPI thread_fun(void* p)
{
    DataStructure *input = (DataStructure*)p;
    for( ; (input->vit) != (input->v1).end(); ){
        Vertex *v = *input->vit++;
        (*(input->m1)).relax(v);
    }
    return 0;
}
int main()
{
    //Reading srcMesh
    //All the vertices in srcMesh will be encoded with color
    MapSmoother msmoother(srcMesh,dstMesh); // initial dstMesh will be created with no edge weights
    DataStructure* input = new DataStructure; // struct which holds the msmoother object and the vector "verList"; I am passing this one to the threads as a function argument
    for(int color = 1; color <= 7; color++)
    {
        srcMesh.reportVertex(color,verList); // all the vertices in srcMesh with the same color index will be stored in verList (a vector)
        std::vector<Vertex *>::iterator vit = verList.begin();
        input->vit = vit;
        HANDLE hThread[100];
        for(int i = 0; i < 100; i++)
            hThread[i] = CreateThread(0,0,&thread_fun,input,0,NULL);
        WaitForMultipleObjects(100,hThread,TRUE,INFINITE);
        for(int i = 0; i < 100; i++)
            CloseHandle(hThread[i]);
    }
    msmoother.computeEnergy(); // compute harmonic energy based on edge weights
}
In thread_fun, I am calling a method on the msmoother object in order to update it with edge weights, as well as dstMesh. dstMesh is updated perfectly by the thread function. In order to perform computeEnergy on the msmoother object, the object should be returned to the main thread or its state should be persisted. But it returns the energy as '0'. How can I achieve this?
Memory is shared between threads, so all modifications they make to shared data eventually become visible without any additional effort (there is nothing to return or persist).
Your problem is, apparently, that you don't wait for the threads to complete before attempting to use the data they should have prepared. As you already have an array of thread handles, WaitForMultipleObjects is a convenient way to wait for all of them to complete (note the bWaitAll parameter). Be aware that WaitForMultipleObjects can't wait for more than 64 objects at once, so you need two calls if you have 100 threads.
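For example, the wait could be split like this (a sketch, assuming hThread is the HANDLE hThread[100] array from main):

// WaitForMultipleObjects is limited to MAXIMUM_WAIT_OBJECTS (64) handles per call,
// so wait for the first 64 threads, then the remaining 36.
WaitForMultipleObjects(64, hThread, TRUE, INFINITE);
WaitForMultipleObjects(36, hThread + 64, TRUE, INFINITE);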
If computeEnergy() requires all threads to have completed, you can pass the thread handles to WaitForMultipleObjects, which supports waiting for threads to finish. Within each thread you can add or modify a value in the msmoother object (passed by pointer to thread_fun).
The msmoother object will live until the threads all return, so passing a pointer to it is acceptable.
Suppose I have some global:
std::atomic_int next_free_block;
and a number of threads each with access to a
std::atomic_int child_offset;
that may be shared between threads. I would like to allocate free blocks to child offsets in a contiguous manner, that is, I want to perform the following operation atomically:
if (child_offset != 0) child_offset = next_free_block++;
Obviously the above implementation does not work as multiple threads may enter the body of the if statement and then try to assign different blocks to child_offset.
I have also considered the following:
int expected = child_offset;
int updated;
do {
    if (expected == 0) break;
    updated = next_free_block++;
} while (!child_offset.compare_exchange_weak(expected, updated));
But this also doesn't work because if the CAS fails, the side effect of incrementing next_free_block remains even if nothing is assigned to child_offset. This leaves gaps in the allocation of free blocks.
I am aware that I could do this with a mutex (or some kind of spin lock) around each child_offset and potentially DCLP, but I would like to know if this is possible to implement efficiently with atomic operations.
The use case for this is as follows: I have a large tree that I'm building in parallel. The tree is an array of the following:
struct tree_page {
    atomic<uint32_t> allocated;
    uint32_t child_offset[8];
    uint32_t nodes[1015];
};
The tree is built level by level: first the nodes at depth 0 are created, then at depth 1, etc. A separate thread is dispatched for each non-leaf node at the previous step. If no more space is left in a page, a new page is allocated from the global next_free_page which points to the first unused page in the array of struct tree_page and is assigned to an element of child_ptr. A bit field is then set in the node word that indicates which element of the child_ptr array should be used to find the node's children.
The code I am trying to write looks like this:
int expected = allocated.load(std::memory_order_relaxed), updated;
do {
    updated = expected + num_children;
    if (updated > NODES_PER_PAGE) {
        expected = -1;
        break;
    }
} while (!allocated.compare_exchange_weak(expected, updated));
if (expected != -1) {
    // successfully allocated in the same page
} else {
    for (int i = 0; i < 8; ++i) {
        // this is the operation I would like to be atomic
        if (child_offset[i] == 0)
            child_offset[i] = next_free_block++;
        int offset = try_allocating_at_page(pages[child_offset[i]]);
        if (offset != -1) {
            // successfully allocated at child_offset i
            // ...
            break;
        }
    }
}
As far as I understood from your description, your child_offset array is initially filled with 0 and is then filled with concrete values concurrently by different threads.
In this case you can atomically "tag" the value first and, if you succeed, assign the valid value. Something like this:
constexpr int INVALID_VALUE = -1;

for (int i = 0; i < 8; ++i) {
    int expected = 0;
    // this is the operation I would like to be atomic
    // (strong rather than weak: a spurious failure here would wrongly skip the slot)
    if (child_offset[i].compare_exchange_strong(expected, INVALID_VALUE)) {
        child_offset[i] = next_free_block++;
    }
    // Not sure if this is needed in your environment, but just in case
    if (child_offset[i] == INVALID_VALUE) continue;
    ...
}
This doesn't guarantee that the values in the child_offset array will be in ascending order. But if you need that, why not fill the array without involving multithreading?
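For reference, here is a self-contained sketch of the tag-then-assign idea above; the array size, the claim_slot helper, and the zero-initialisation assumption are all illustrative, not taken from the question:

#include <atomic>

constexpr int INVALID_VALUE = -1;

std::atomic<int> next_free_block{1};
std::atomic<int> child_offset[8]; // assumed to start out all zero before the parallel phase

// Returns true if this thread won the race to assign slot i.
bool claim_slot(int i)
{
    int expected = 0;
    if (child_offset[i].compare_exchange_strong(expected, INVALID_VALUE)) {
        // Only the winning thread advances the counter, so no gaps are created.
        child_offset[i] = next_free_block++;
        return true;
    }
    // Another thread owns this slot; it may still read as INVALID_VALUE briefly.
    return false;
}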
I have 8 datasets stored as pairs in a vector and want to pass them one by one into a class and do some work with a function inside that class. A segmentation fault is generated. Here is my code:
vector<thread> threads;
for (int i = 0; i < 8; i++) {                       // generate 8 threads
    LogOdd CEB;                                     // create LogOdd obj
    CEB.set_data(coord[i].second, coord[i].first);  // pass parameters to private members
    threads.push_back(thread(&LogOdd::scan, &CEB));
}
for (int i = 0; i < 8; i++){
    threads[i].join();
}
The class looks like this:
class LogOdd {
private:
    string sequence;
    string chromosome;
public:
    void scan() { // function to be threaded
        ...
    }
    void set_data(string SEQUENCE, string CHROMOSOME) { // set parameters
        sequence = SEQUENCE;
        chromosome = CHROMOSOME;
    }
};
I'm pretty sure the segmentation fault is generated in the first threading for loop, but I have no idea why... I know this topic might be a duplicate but I have already done a lot of searching. Please help!
UPDATE
Thanks for answering my question. I edited my code in 2 ways and they work!
vector<thread> threads;
for (int i = 0; i < 8; i++) {        // generate 8 threads
    LogOdd * CEB = new LogOdd;       // create LogOdd obj
    CEB->sequence = coord[i].second;
    CEB->chromosome = coord[i].first;
    threads.push_back(thread(&LogOdd::scan, CEB));
}
Another way is to store all 8 objects in a vector first and then assign them to the threads:
vector<thread> threads;
vector<LogOdd> LogOddvec;
for (int i = 0; i < 8; i++) {
    LogOdd CEB;
    CEB.sequence = coord[i].second;
    CEB.chromosome = coord[i].first;
    LogOddvec.push_back(CEB);
}
for (int i = 0; i < 8; i++) {
    threads.push_back(thread(&LogOdd::scandinuc, &LogOddvec[i]));
}
Let's look at these lines:
for (int i = 0; i < 8; i++) { // generate 8 threads
    LogOdd CEB;               // create LogOdd obj
    ...
    threads.push_back(thread(&LogOdd::scan, &CEB));
    ...
}
Inside the loop you define the variable CEB. You pass a pointer to this variable to the thread. Then the loop iterates, and CEB goes out of scope and is destructed.
That means the threads are passed a pointer to a destructed object. Dereferencing that pointer in the threads leads to undefined behavior, which is a very common cause of crashes like yours.
The simplest solution is to allocate the LogOdd object dynamically with new. A possibly better solution would be to pass CEB by value to the thread functions. Another solution would be to pass coord[i].second and coord[i].first as arguments to the thread function (or possibly a constant reference to coord[i]), and have the thread function create its own LogOdd object.
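For instance, the pass-by-value variant might look like this (a sketch, assuming the same LogOdd and coord as in the question):

std::vector<std::thread> threads;
for (int i = 0; i < 8; i++) {
    LogOdd CEB;
    CEB.set_data(coord[i].second, coord[i].first);
    threads.push_back(std::thread(&LogOdd::scan, CEB)); // a copy of CEB lives inside the thread
}
for (auto& t : threads)
    t.join();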
I believe the problem is in LogOdd CEB. You allocate it with automatic storage duration inside a block, so it gets destroyed at the end of the block, which is the end of the iteration in which it was created.
You are then using a pointer to an object that no longer exists, which is undefined behaviour. The easiest solution would be to use new to allocate it dynamically.
I am using 3 threads to chunk a for loop, and 'data' is a global array, so I want to lock that part in the 'calculateAll' function:
std::vector<int> calculateAll(int ***data, std::vector<LineIndex> indexList, int s, int e)
{
    std::vector<int> v_a = std::vector<int>();
    for(int a = s; a < e; a++)
    {
        mylock.lock();   // mylock and v_b are globals shared by all threads (not shown)
        v_b.push_back(/*something related with data*/);
        mylock.unlock();
        v_a.push_back(a);
    }
    return v_a;
}
for(int i = 0; i < 3; i++)
{
    int s = firstone + i*chunk;
    int e = ((s+chunk) < indexList.size()) ? (s+chunk) : indexList.size();
    t[i] = std::thread(calculateAll, data, indexList, s, e);
}
for (int i = 0; i < 3; ++i)
{
    t[i].join();
}
My question is: how can I get the return value, which is a vector, from each thread and then combine them? The reason I want to do that is that if I declare 'v_a' as a global vector, each thread trying to push_back its values into 'v_a' could cause a crash (or not?). So I am thinking of declaring a vector for each thread and then combining them into a new vector for further use (as I would do without threads).
Or is there some better method to deal with this concurrency problem? The order of 'v_a' does not matter.
I would appreciate any suggestion.
First of all, rather than the explicit lock() and unlock() seen in your code, always use std::lock_guard where possible. Secondly, it would be better to use std::future for this. Launch each thread with std::async, then get the results in another loop, aggregating them at the same time. Like this:
using Vector = std::vector<int>;
using Future = std::future<Vector>;

std::vector<Future> futures;
for(int i = 0; i < 3; i++)
{
    int s = firstone + i*chunk;
    int e = ((s+chunk) < indexList.size()) ? (s+chunk) : indexList.size();
    auto fut = std::async(std::launch::async, calculateAll, data, indexList, s, e);
    futures.push_back( std::move(fut) );
}

//Combine the results
std::vector<int> result;
for(auto& fut : futures){                                  //Iterate through in the order the futures were created
    auto vec = fut.get();                                  //Get the result of the future
    result.insert(result.end(), vec.begin(), vec.end());   //append it to the results in order
}
Here is a minimal, complete and working example based on your code - which demonstrates what I mean: Live On Coliru
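As for the std::lock_guard point, the locked section of calculateAll could be written like this (a sketch; mylock is assumed to be a std::mutex):

for (int a = s; a < e; a++)
{
    {
        std::lock_guard<std::mutex> guard(mylock); // unlocks automatically, even on exceptions
        v_b.push_back(/*something related with data*/);
    }
    v_a.push_back(a);
}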
Create a structure with a vector and a lock. Pass an instance of that structure, with the lock pre-locked, to each thread as it starts. Wait for all locks to become unlocked. Each thread then does its work and unlocks its lock when done.
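A sketch of that idea; std::binary_semaphore (C++20) stands in here for the pre-locked lock, since a std::mutex may not be unlocked from a thread other than the one that locked it:

#include <semaphore>
#include <thread>
#include <vector>

struct WorkSlot {
    std::vector<int> results;
    std::binary_semaphore done{0}; // starts "locked"
};

void worker(WorkSlot& slot)
{
    slot.results.push_back(42);    // this thread's work
    slot.done.release();           // "unlock" when finished
}

int main()
{
    WorkSlot slots[3];
    std::vector<std::thread> threads;
    for (auto& slot : slots)
        threads.emplace_back(worker, std::ref(slot));
    for (auto& slot : slots)
        slot.done.acquire();       // wait until this slot's thread has finished
    // every slots[i].results is now safe to read
    for (auto& t : threads)
        t.join();
}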
I have a problem: I need to use a struct of OpenCV Mat images to pass multiple arguments to a thread.
I have a struct like this:
struct Args
{
    Mat in[6];
    Mat out[6];
};
And a void function called by the thread, like this:
void grey (void *param){
    while (TRUE)
    {
        WaitForSingleObject(mutex, INFINITE);
        Args* arg = (Args*)param;
        cvtColor(*arg->in, *arg->out, CV_BGR2GRAY);
        ReleaseMutex(mutex);
        _endthread();
    }
}
To launch the grey function as a thread with the two Mat arrays as arguments, I use the following lines in main:
Args dati;
*dati.in = *inn;
*dati.out = *ou;
handle1 = (HANDLE) _beginthread(grey,0,&dati);
Now, my problem is: I need to access all 6 elements of the two arrays "in" and "out" of the struct passed to the thread, from the thread itself, or otherwise find a way to step through the arrays from 0 to 5 so that all elements are processed by the "grey" function.
How can I do this from the thread or from main? I mean using the grey function to process all 6 elements of the Mat in[6] array of the struct Args that I pass to the thread.
Can someone help me or give me an idea? I don't know how to do this.
Before you create the thread, you assign the array like this:
*dati.in = *inn;
*dati.out = *ou;
This will only assign the first entry in the array. The rest of the array will be untouched.
You need to copy all of the source array into the destination array. You can use std::copy for this:
std::copy(std::begin(inn), std::end(inn), std::begin(dati.in));
Of course, that requires that the source "array" inn contains at least as many items as the destination array.
Then in the thread simply loop over the items:
for (int i = 0; i < 6; i++)
{
    cvtColor(arg->in[i], arg->out[i], CV_BGR2GRAY);
}
When you launch your thread, this code:
Args dati;
*dati.in = *inn;
*dati.out = *ou;
is only initialising one of the six elements. If inn and ou are actually 6 element arrays, you will need a loop to initialise all 6.
Args dati;
for (int i = 0; i < 6; i++) {
    dati.in[i] = inn[i];
    dati.out[i] = ou[i];
}
Similarly, in your thread, you're only processing the first element in the array. So this code:
Args* arg = (Args*)param;
cvtColor(*arg->in,*arg->out,CV_BGR2GRAY);
would need to become something like this:
Args* arg = (Args*)param;
for (int i = 0; i < 6; i++) {
    cvtColor(arg->in[i], arg->out[i], CV_BGR2GRAY);
}
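Putting both pieces together, the whole grey function might end up looking like this (a sketch that keeps the mutex and _endthread from the question but drops the while (TRUE), which only ever ran one pass because of _endthread):

void grey(void *param)
{
    Args* arg = (Args*)param;
    WaitForSingleObject(mutex, INFINITE);
    for (int i = 0; i < 6; i++)
        cvtColor(arg->in[i], arg->out[i], CV_BGR2GRAY); // convert each of the 6 images
    ReleaseMutex(mutex);
    _endthread();
}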
I have 2 threads and a global Queue; one thread (t1) pushes data and the other (t2) pops data. I want to synchronize this operation without a wrapper function, by using that queue with a critical section from the Windows API.
The Queue is global, and I want to know how the synchronization is done; is it done by locking the address of the Queue?
Is it possible to use the Boost library for the above problem?
One approach is to have two queues instead of one:
The producer thread pushes items to queue A.
When the consumer thread wants to pop items, queue A is swapped with empty queue B.
The producer thread continues pushing items to the fresh queue A.
The consumer, uninterrupted, consumes items off queue B and empties it.
Queue A is swapped with queue B etc.
The only locking/blocking/synchronization happens when the queues are being swapped, which should be a fast operation since it's really a matter of swapping two pointers.
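A minimal sketch of that scheme, using std::mutex and std::queue (the names are illustrative, not from the question; a Windows CRITICAL_SECTION or a Boost mutex would work the same way):

#include <mutex>
#include <queue>
#include <utility>

std::queue<int> queueA;   // the producer pushes here
std::queue<int> queueB;   // the consumer drains this one
std::mutex swapMutex;

void produce(int item)
{
    std::lock_guard<std::mutex> lock(swapMutex);
    queueA.push(item);
}

void consumeAll()
{
    {
        std::lock_guard<std::mutex> lock(swapMutex);
        std::swap(queueA, queueB); // the only synchronized step
    }
    while (!queueB.empty())        // no lock held while the items are processed
    {
        int item = queueB.front();
        queueB.pop();
        // ... use item ...
    }
}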
I thought you could make a queue under those conditions without using any atomics or any other thread-safe machinery at all?
Like, if it's just a circular buffer, one thread controls the read pointer and the other controls the write pointer, and neither updates its pointer until it has finished reading or writing, and it just works?
The only point of difficulty is determining, when read == write, whether the queue is full or empty, but you can overcome this by always keeping one dummy item in the queue:
class Queue
{
public:
    volatile Object* buffer;
    int size;
    volatile int readpoint;
    volatile int writepoint;

    void Init(int s)
    {
        size = s;
        buffer = new Object[s];
        readpoint = 0;
        writepoint = 1;
    }

    // thread A will call this
    bool Push(Object p)
    {
        if(writepoint == readpoint)
            return false;
        int wp = writepoint - 1;
        if(wp < 0)
            wp += size;
        buffer[wp] = p;
        int newWritepoint = writepoint + 1;
        if(newWritepoint == size)
            newWritepoint = 0;
        writepoint = newWritepoint;
        return true;
    }

    // thread B will call this
    bool Pop(Object* p)
    {
        int writepointTest = writepoint;
        if(writepointTest < readpoint)
            writepointTest += size;
        if(readpoint + 1 == writepointTest)
            return false;
        *p = buffer[readpoint];
        int newReadpoint = readpoint + 1;
        if(newReadpoint == size)
            newReadpoint = 0;
        readpoint = newReadpoint;
        return true;
    }
};
Another way to handle this issue is to allocate your queue dynamically and assign it to a pointer. The pointer value is passed off between threads when items have to be dequeued, and you protect this operation with a critical section. This means locking for every push into the queue, but much less contention on the removal of items.
This works well when you have many items between enqueueing and dequeueing, and works less well with few items.
Example (I'm using some given RAII locking class to do the locking). Also note that this is really only safe when only one thread is dequeueing.
queue* my_queue = 0;
queue* pDequeue = 0;
critical_section section;

void enqueue(stuff& item)
{
    locker lock(section);
    if (!my_queue)
    {
        my_queue = new queue;
    }
    my_queue->add(item);
}

item* dequeue()
{
    if (!pDequeue)
    {   // handoff for dequeue work
        locker lock(section);
        pDequeue = my_queue;
        my_queue = 0;
    }
    if (pDequeue)
    {
        item* pItem = pDequeue->pop(); // remove item and return it
        if (!pItem)
        {
            delete pDequeue;
            pDequeue = 0;
        }
        return pItem;
    }
    return 0;
}