I want to parallelize a loop (using tbb) which contains some expensive but vectorizable iterations (randomly spread). My idea was to buffer those and flush the buffer whenever it reaches the vector size. Such a buffer must be thread-local. For example,
// dummy for testing
void do_vectorized_work(size_t k, size_t*indices)
{}
// dummy for testing
bool requires_expensive_work(size_t k)
{ return (k&7)==0; }
struct buffer
{
size_t K=0, B[vector_size];
void load(size_t i)
{
B[K++]=i;
if(K==vector_size)
flush();
}
void flush()
{
do_vectorized_work(K,B);
K=0;
}
};
void do_work_in_parallel(size_t N)
{
tbb::enumerable_thread_specific<buffer> tl_buffer;
tbb::parallel_for(size_t(0),N,[&](size_t i)
{
if(requires_expensive_work(i))
tl_buffer.local().load(i);
});
}
However, this leaves the buffers non-empty, so I still have to flush each of them a final time
for(auto&b:tl_buffer)
b.flush();
but this is serial! Of course, I can also try to do this in parallel
using tl_range = typename tbb::enumerable_thread_specific<buffer>::range_type;
tbb::parallel_for(tl_buffer.range(),[](tl_range const&range)
{
for(auto&b:range)
b.flush();
});
But I'm not sure this is efficient (since there are only as many buffers as there are threads). I was wondering whether it is possible to avoid this final flush after the event. I.e. is it possible to use tbb::tasks (replacing tbb::parallel_for) in such a way that each thread's final task is to flush its buffer?
No. A worker thread does not have complete information about whether a particular task is the last task of the given work (this is how work stealing works). It is therefore not possible to implement such a hook at the level of parallel_for or the scheduler itself, so I'd recommend going with one of the two approaches you describe.
There are two other things you can do about this, though.
Make it asynchronous, i.e. enqueue a task which flushes everything. This removes the code from the hot path on the main thread. Just be careful if any dependencies need to be set on completion of this task.
Use tbb::task_scheduler_observer to initialize thread-specific data and release it lazily when threads shut down or when no work remains for some time. The latter requires the local observer feature, which is not yet officially supported but has been stable for a few years.
Example:
#define TBB_PREVIEW_LOCAL_OBSERVER 1
#include <tbb/tbb.h>
#include <assert.h>
#include <stdio.h>  // printf
#include <stdlib.h> // malloc, free
#include <unistd.h> // usleep
typedef void * buffer_t;
const static int bufsz = 1024;
class thread_buffer_allocator: public tbb::task_scheduler_observer {
tbb::enumerable_thread_specific<buffer_t> _buf;
public:
thread_buffer_allocator( )
: tbb::task_scheduler_observer( /*local=*/ true ) {
observe(true); // activate the observer
}
~thread_buffer_allocator( ) {
observe(false); // deactivate the observer
for(auto &b : _buf) {
printf("destructor: cleared: %p\n", b);
free(b);
}
}
/*override*/ void on_scheduler_entry( bool worker ) {
assert(_buf.local() == nullptr);
_buf.local() = malloc(bufsz);
printf("on entry: %p\n", _buf.local());
}
/*override*/ void on_scheduler_exit( bool worker ) {
printf("on exit\n");
if(_buf.local()) {
printf("on exit: cleared %p\n", _buf.local());
free(_buf.local());
_buf.local() = nullptr;
}
}
};
int main() {
thread_buffer_allocator buffers_scope;
tbb::parallel_for(0, 1024*1024*1024, [&](auto i){
usleep(i%3);
});
return 0;
}
It occurred to me that this can be solved by reduction.
struct buffer
{
std::size_t K=0, B[vector_size];
void load(std::size_t i)
{
B[K++]=i;
if(K==vector_size) flush();
}
void flush()
{
do_vectorized_work(K,B);
K=0;
}
buffer() = default;
buffer(buffer const&, tbb::split)
{}
void operator()(tbb::blocked_range<std::size_t> const&range)
{
for(std::size_t i=range.begin(); i!=range.end(); ++i)
if(requires_expensive_work(i)) load(i);
}
bool empty()
{ return K==0; }
std::size_t pop()
{ return K? B[--K] : 0; }
void join(buffer&rhs)
{ while(!rhs.empty()) load(rhs.pop()); }
};
void do_work_in_parallel(std::size_t N)
{
buffer buff;
tbb::parallel_reduce(tbb::blocked_range<std::size_t>(0,N,vector_size),buff);
if(!buff.empty())
buff.flush();
}
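If a fixed partition of the index range is acceptable, the "each thread flushes its own buffer as its last step" idea can also be had without a work-stealing scheduler. The following is only a minimal sketch using plain std::thread instead of TBB, so it gives up TBB's load balancing. The value vector_size = 8, the processed counter and the name do_work_statically are assumptions made for illustration; do_vectorized_work is a stand-in that merely counts indices.

```cpp
#include <algorithm>
#include <atomic>
#include <cassert>
#include <cstddef>
#include <thread>
#include <vector>

constexpr std::size_t vector_size = 8;    // assumed batch width
std::atomic<std::size_t> processed{0};    // hypothetical counter, only used for checking

// Stand-in for the real vectorized kernel: counts the indices it receives.
void do_vectorized_work(std::size_t k, const std::size_t* /*indices*/)
{ processed += k; }

bool requires_expensive_work(std::size_t k)
{ return (k & 7) == 0; }

struct buffer
{
    std::size_t K = 0, B[vector_size];
    void load(std::size_t i) { B[K++] = i; if (K == vector_size) flush(); }
    void flush() { if (K) do_vectorized_work(K, B); K = 0; }
};

// Each thread owns one contiguous chunk and one buffer, and flushes that
// buffer as the last step of its own work, so no serial flush pass remains.
void do_work_statically(std::size_t N, unsigned num_threads)
{
    if (num_threads == 0) num_threads = 1;
    const std::size_t chunk = (N + num_threads - 1) / num_threads;
    std::vector<std::thread> pool;
    for (unsigned t = 0; t < num_threads; ++t) {
        const std::size_t begin = std::min(N, t * chunk);
        const std::size_t end = std::min(N, begin + chunk);
        pool.emplace_back([begin, end] {
            buffer local;
            for (std::size_t i = begin; i < end; ++i)
                if (requires_expensive_work(i)) local.load(i);
            local.flush();   // final flush happens on the owning thread
        });
    }
    for (auto& th : pool) th.join();
}
```

The trade-off is that the expensive iterations are randomly spread, so a static split may leave some threads idle while others are still working.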
Related
I am asking this question from an Unreal Engine C++ point of view, but I wonder whether my problem has more to do with the nuances of how C++ itself operates.
I have an Unreal actor: a simple class that holds an array of my own structs and runs a timer which triggers my own function. This function passes a reference to the actor's array to an asynchronous task.
The async task then goes to work: it creates a new struct, adds two floats to the struct's internal TArray of floats, and then adds that struct to the main actor's array.
The problem:
After the async task has completed and I delete the actor from the level editor window, system RAM usage decreases because I call Empty() on the main actor's array in Destroyed(), but the RAM used by the contents of the structs (i.e. the float array inside each struct) is left in memory and never released.
Observations:
If I do not use an async task and run the same function on the main thread, ALL of the memory is cleared successfully.
If I do not create the structs inside the async task, and instead initialize the array on the main thread with structs that each hold N floats, then pass it by reference to the async task which works on the data, the memory is also cleared successfully.
What I would like to happen
I would like to pass a reference of the main actors array of structs to the async task. The async task would then go to work creating the data. Once it is complete, the main actor would then have access to the data and when the actor is deleted in the level editor window, ALL of the memory would be freed.
The code:
The definition of the data struct I am using:
struct FMyDataStruct
{
TArray<float> ArrayOfFloats;
FMyDataStruct()
{
ArrayOfFloats.Empty();
ArrayOfFloats.Shrink();
}
FMyDataStruct(int32 FloatCount)
{
ArrayOfFloats.Init(0.f, FloatCount);
}
~FMyDataStruct()
{
ArrayOfFloats.Empty();
ArrayOfFloats.Shrink();
}
};
The main actors definition of the array I am using:
TArray<FMyDataStruct> MyMainArray;
The main actors custom function I am running:
//CODE 1: This part DOES empty the RAM when run (ie: Run on main thread)
/*for (int32 Index = 0; Index < 50000000; Index++)
{
FMyDataStruct MyDataStruct;
MyDataStruct.ArrayOfFloats.Add(FMath::Rand());
MyDataStruct.ArrayOfFloats.Add(FMath::Rand());
MyMainArray.Add(MyDataStruct);
}*/
//CODE 2: This does NOT empty the RAM when run. The two floats * 50,000,000 are left in system memory after the actor is deleted.
auto Result = Async(EAsyncExecution::Thread, [&]()
{
for (int32 Index = 0; Index < 50000000; Index++)
{
FMyDataStruct MyDataStruct;
MyDataStruct.ArrayOfFloats.Add(FMath::Rand());
MyDataStruct.ArrayOfFloats.Add(FMath::Rand());
MyMainArray.Add(MyDataStruct);
}
});
An example of initializing the array in the main thread, then working on it inside the async task:
//Initialize the array and its structs (plus the float array inside the struct)
MyMainArray.Init(FMyDataStruct(2), 50000000);
//TFuture/Async task
auto Result = Async(EAsyncExecution::Thread, [Self]()
{
for (int32 Index = 0; Index < 50000000; Index++)
{
Self->MyMainArray[Index].ArrayOfFloats[0] = FMath::Rand();
Self->MyMainArray[Index].ArrayOfFloats[1] = FMath::Rand();
}
//Call the main threads task completed function
AsyncTask(ENamedThreads::GameThread, [Self]()
{
if (Self != nullptr)
{
Self->MyTaskComplete();
}
});
});
Final thoughts:
Ultimately what I am asking is can anyone explain to me why from a C++ point of view the structs and their data would be removed from memory successfully when created/added from the main thread but then not removed from memory if created inside the async task/thread?
Update #1:
Here is a minimum reproducible example:
Create a new project in either Unreal Engine 4.23, 4.24 or 4.25.
Add a new C++ actor to the project and name it "MyActor".
Edit the source with the following:
MyActor.h
#pragma once
#include "CoreMinimal.h"
#include "GameFramework/Actor.h"
#include "MyActor.generated.h"
struct FMyDataStruct
{
FMyDataStruct()
{
//Default Constructor
}
FMyDataStruct(const FMyDataStruct& other)
: ArrayOfFloats(other.ArrayOfFloats)
{
//Copy constructor
}
FMyDataStruct(FMyDataStruct&& other)
{
//Move constructor
if (this != &other)
{
ArrayOfFloats = MoveTemp(other.ArrayOfFloats);
}
}
FMyDataStruct& operator=(const FMyDataStruct& other)
{
//Copy assignment operator
if (this != &other) //avoid self assignment
{
ArrayOfFloats = other.ArrayOfFloats; //UE4 TArray deep copy
}
return *this;
}
FMyDataStruct& operator=(FMyDataStruct&& other)
{
//Move assignment operator
if (this != &other) //avoid self assignment
{
ArrayOfFloats = MoveTemp(other.ArrayOfFloats);
}
return *this;
}
FMyDataStruct(int32 FloatCount)
{
//Custom constructor to initialize the float array
if (FloatCount > 0)
{
ArrayOfFloats.Init(0.f, FloatCount);
}
}
~FMyDataStruct()
{
//Destructor
ArrayOfFloats.Empty();
ArrayOfFloats.Shrink();
}
public:
TArray<float> ArrayOfFloats;
};
UCLASS()
class BASICPROJECT1_API AMyActor : public AActor
{
GENERATED_BODY()
public:
AMyActor();
protected:
virtual void Destroyed() override;
public:
bool IsEditorOnly() const override;
bool ShouldTickIfViewportsOnly() const override;
virtual void Tick(float DeltaTime) override;
void DoSomething();
void AsyncTaskComplete();
bool bShouldCount = true;
float TimeCounter = 0.f;
TArray<FMyDataStruct> MyMainArray;
};
MyActor.cpp
#include "MyActor.h"
AMyActor::AMyActor()
{
PrimaryActorTick.bCanEverTick = true;
}
void AMyActor::Tick(float DeltaTime)
{
if (!HasAnyFlags(RF_ClassDefaultObject)) //Check for not CDO. We only want to run in the instance
{
if (bShouldCount)
{
TimeCounter += DeltaTime;
if (TimeCounter >= 5.f)
{
bShouldCount = false;
DoSomething();
}
}
}
}
void AMyActor::Destroyed()
{
Super::Destroyed();
MyMainArray.Empty();
MyMainArray.Shrink();
UE_LOG(LogTemp, Warning, TEXT("Actor got Destroyed!"));
}
bool AMyActor::IsEditorOnly() const
{
return true;
}
bool AMyActor::ShouldTickIfViewportsOnly() const
{
return true;
}
void AMyActor::DoSomething()
{
//Change the code that is run:
//1 = Main thread only
//2 = Async only
//3 = Init on main thread and process in async task
//======================
int32 CODE_SAMPLE = 1;
UE_LOG(LogTemp, Warning, TEXT("Actor is running DoSomething()"));
TWeakObjectPtr<AMyActor> Self = this;
if (CODE_SAMPLE == 1)
{
//CODE 1: Run on main thread. This part DOES empty the RAM when run. BLOCKS the editor window.
//=========================================================================
MyMainArray.Empty();
MyMainArray.Shrink();
MyMainArray.Reserve(50000000);
for (int32 Index = 0; Index < 50000000; Index++)
{
FMyDataStruct MyDataStruct;
MyDataStruct.ArrayOfFloats.Reserve(2);
MyDataStruct.ArrayOfFloats.Emplace(FMath::Rand());
MyDataStruct.ArrayOfFloats.Emplace(FMath::Rand());
MyMainArray.Emplace(MyDataStruct);
}
UE_LOG(LogTemp, Warning, TEXT("Main thread array fill is complete!"));
}
else if (CODE_SAMPLE == 2)
{
//CODE 2: Run on async task. This does NOT empty the RAM when run
//(4 bytes per float * 2 floats * 50,000,000 structs = 400 MB is left in system memory after the actor is deleted)
//=========================================================================
auto Result = Async(EAsyncExecution::Thread, [Self]()
{
if (Self != nullptr)
{
Self->MyMainArray.Empty();
Self->MyMainArray.Shrink();
Self->MyMainArray.Reserve(50000000);
for (int32 Index = 0; Index < 50000000; Index++)
{
FMyDataStruct MyDataStruct;
MyDataStruct.ArrayOfFloats.Reserve(2);
MyDataStruct.ArrayOfFloats.Emplace(FMath::Rand());
MyDataStruct.ArrayOfFloats.Emplace(FMath::Rand());
Self->MyMainArray.Emplace(MyDataStruct);
}
AsyncTask(ENamedThreads::GameThread, [Self]()
{
if (Self != nullptr)
{
Self->AsyncTaskComplete();
}
});
}
});
}
else if (CODE_SAMPLE == 3)
{
//CODE 3: Initialize the array in the main thread and work on the data in the async task
//=========================================================================
MyMainArray.Init(FMyDataStruct(2), 50000000);
auto Result = Async(EAsyncExecution::Thread, [Self]()
{
if (Self != nullptr)
{
for (int32 Index = 0; Index < 50000000; Index++)
{
Self->MyMainArray[Index].ArrayOfFloats[0] = FMath::Rand();
Self->MyMainArray[Index].ArrayOfFloats[1] = FMath::Rand();
}
AsyncTask(ENamedThreads::GameThread, [Self]()
{
if (Self != nullptr)
{
Self->AsyncTaskComplete();
}
});
}
});
}
}
void AMyActor::AsyncTaskComplete()
{
UE_LOG(LogTemp, Warning, TEXT("Async task is complete!"));
}
Compile and run the project.
Drag the actor into the level editor window.
After 5 seconds the code will run and the RAM usage will increase to 1750 MB.
Select the actor in the outliner window and delete it.
The RAM usage will behave like this:
CODE 1: RAM is cleared out all the way to the starting RAM usage of 650 MB.
CODE 2: RAM is cleared down to 1000 MB and never returns to starting usage.
CODE 3: RAM is cleared out all the way to the starting RAM usage of 650 MB.
I thank you for your help.
I am having issues in code whose structure is similar to the following minimal example. There is only one instance of MainClass. It creates a new instance of Classlet on each call to MainClass::makeclasslet().
I have multiple classlets writing to a single list buffer. After some time I need to copy/dump the values from the list buffer (FIFO).
The problem is that I am getting the following output in MainClass::clearbuffer():
>>>>>>>>>> 704 >>>>>>>>>>>>>>>>>>> Buffer size: 65363..... 1
I am unable to understand why std::list::empty() returns true even though the buffer is locked with an atomic bool flag.
I have tried moving the call to clearbuffer() (in addval()) to the main application thread, so that not every Classlet event calls clearbuffer().
I have also tried adding a delay, QThread::msleep(10);, after setting busy = true;.
But some time after the application starts, I get the output shown above. Instead of popping all 65363+704 values in the list, it popped only 704 and left the loop because std::list::empty() was (apparently) true.
class MainClass : public QObject {
Q_OBJECT
private:
std::list<int> alist;
std::atomic<bool> busy;
MainClass() {
busy = false;
}
~MainClass() {
// delete all classlets
}
void makeclasslet() {
Classlet* newclasslet = new Classlet();
// store the reference
}
void addval(int val) {
alist.push_back(val);
if (alist.size() > 100)
{
if (!busy)
{
clearbuffer();
}
}
}
void clearbuffer() {
if (!busy)
{
busy = true;
int i = 0;
while (!alist.empty())
{
i = i + 1;
// save alist.front() to file
alist.pop_front();
}
printf(">>>>>>>>>> %d >>>>>>>>>>> Buffer size: %zu ..... %d\n", i, alist.size(), (int)alist.empty());
busy = false;
}
}
};
class Classlet {
private:
MainClass* parent;
void onsomeevent(int val) {
parent->addval(val);
}
};
I am using qt5.9 on Ubuntu 18.04. GCC/ G++ 7.5.0
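A note on the pattern above, assuming addval() really is called from more than one thread: the std::atomic<bool> flag does not protect the list. The check `if (!busy)` and the later `busy = true` are two separate steps, and alist.push_back() runs without looking at the flag at all, so one thread can modify the list while another is popping from it. That is a data race on std::list, after which size() and empty() can report inconsistent values. The sketch below shows one common alternative, not the poster's actual code: a std::mutex guards every access, and the consumer swaps the whole list out so the slow file writing happens without holding the lock. The class name SharedBuffer and the method drain() are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <list>
#include <mutex>

// Hypothetical stand-in for the buffer handling inside MainClass.
class SharedBuffer {
public:
    // May be called from any producer thread.
    void addval(int val) {
        std::lock_guard<std::mutex> lock(m_);
        list_.push_back(val);
    }

    // Detaches everything buffered so far in FIFO order. The caller can
    // write the returned list to a file without blocking the producers.
    std::list<int> drain() {
        std::list<int> out;
        std::lock_guard<std::mutex> lock(m_);
        out.swap(list_);   // O(1); list_ is left empty
        return out;
    }

private:
    std::mutex m_;
    std::list<int> list_;
};
```

With this shape, the "more than 100 entries" decision can stay in the caller: it calls drain() periodically and iterates over the returned list.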
The purpose of the following code is to have various classes publish data to an observable. Some classes will observe every data, some will observe periodically with buffer_with_time().
This works well until the program exits, then it crashes, probably because the observer using buffer_with_time() is still hanging on to some thread.
struct Data
{
Data() : _subscriber(_subject.get_subscriber()) { }
~Data() { _subscriber.on_completed(); }
void publish(std::string data) { _subscriber.on_next(data); }
rxcpp::observable<std::string> observable() { return _subject.get_observable(); }
private:
rxcpp::subjects::subject<std::string> _subject;
rxcpp::subscriber<std::string> _subscriber;
};
void foo()
{
Data data;
auto period = std::chrono::milliseconds(30);
auto s1 = data.observable()
.buffer_with_time(period , rxcpp::observe_on_new_thread())
.subscribe([](std::vector<std::string>& data)
{ std::cout << data.size() << std::endl; });
data.publish("test 1");
data.publish("test 2");
std::this_thread::sleep_for(std::chrono::milliseconds(100));
// hope to call something here so s1's thread can be joined.
// program crashes upon exit
}
I tried calling "s1.unsubscribe()", and various as_blocking(), from(), merge(), but still can't get the program to exit gracefully.
Note that I used "subjects" here because "publish" can then be called from different places (which can be from different threads). I am not sure if this is the best mechanism to do that, I am open to other ways to accomplish that.
Advice?
This is very close to working.
However, having the Data destructor complete the input while also wanting the subscription to block the exit of foo until input is completed makes this more complex.
Here is a way to ensure that foo blocks after Data destructs. This is using the existing Data contract.
void foo1()
{
rxcpp::observable<std::vector<std::string>> buffered;
{
Data data;
auto period = std::chrono::milliseconds(30);
buffered = data.observable()
.buffer_with_time(period , rxcpp::observe_on_new_thread())
.publish().ref_count();
buffered
.subscribe([](const std::vector<std::string>& data)
{ printf("%lu\n", data.size()); },
[](){printf("data complete\n");});
data.publish("test 1");
data.publish("test 2");
// leaving this scope destroys data, whose destructor completes the subject
}
buffered.as_blocking().subscribe();
printf("exit foo1\n");
}
Alternatively, changing the shape of Data (adding a complete method) would allow the following code:
struct Data
{
Data() : _subscriber(_subject.get_subscriber()) { }
~Data() { complete(); }
void publish(std::string data) { _subscriber.on_next(data); }
void complete() {_subscriber.on_completed();}
rxcpp::observable<std::string> observable() { return _subject.get_observable(); }
private:
rxcpp::subjects::subject<std::string> _subject;
rxcpp::subscriber<std::string> _subscriber;
};
void foo2()
{
printf("foo2\n");
Data data;
auto newthread = rxcpp::observe_on_new_thread();
auto period = std::chrono::milliseconds(30);
auto buffered = data.observable()
.buffer_with_time(period , newthread)
.tap([](const std::vector<std::string>& data)
{ printf("%lu\n", data.size()); },
[](){printf("data complete\n");});
auto emitter = rxcpp::sources::timer(std::chrono::milliseconds(0), newthread)
.tap([&](long) {
data.publish("test 1");
data.publish("test 2");
data.complete();
});
// block here until both the buffered stream and the emitter have completed
buffered.combine_latest(newthread, emitter).as_blocking().subscribe();
printf("exit foo2\n");
}
I think that this better expresses the dependencies.
This code builds queues for an operating-system scheduler.
I use structures to represent my processes,
the arr_processes array to hold all of them,
and the new_processes array to sort them by arrival time.
But when I run this code in Visual Studio 2010,
it produces this run-time error:
Run-Time Check Failure #2 - Stack around the variable arr_processes was corrupted!
This is the code:
#include <stdio.h>
#include <stdlib.h>
typedef struct
{
int id;
int arr_time;
int serv_time;
int deadline;
} process;
void print_process(process n);
int main()
{
process arr_processes[8];
process new_processes[8];
process real_processes[3];
process ready_processes[5];
process tmp_process[1];
int length_ready;
int i,length,j;
int length_real;
arr_processes[0].id=1;
arr_processes[0].arr_time=12;
arr_processes[0].serv_time=4;
arr_processes[0].deadline=0;
arr_processes[1].id=2;
arr_processes[1].arr_time=10;
arr_processes[1].serv_time=5;
arr_processes[1].deadline=0;
arr_processes[2].id=3;
arr_processes[2].arr_time=9;
arr_processes[2].serv_time=2;
arr_processes[2].deadline=0;
arr_processes[3].id=4;
arr_processes[3].arr_time=8;
arr_processes[3].serv_time=4;
arr_processes[3].deadline=10;
arr_processes[4].id=5;
arr_processes[4].arr_time=5;
arr_processes[4].serv_time=2;
arr_processes[4].deadline=8;
arr_processes[5].id=6;
arr_processes[5].arr_time=3;
arr_processes[5].serv_time=3;
arr_processes[5].deadline=0;
arr_processes[6].id=7;
arr_processes[6].arr_time=2;
arr_processes[6].serv_time=3;
arr_processes[6].deadline=0;
arr_processes[7].id=8;
arr_processes[7].arr_time=1;
arr_processes[7].serv_time=1;
arr_processes[7].deadline=28;
length=sizeof(arr_processes)/sizeof(arr_processes[0]);
printf("\t length of the processes=%i\n\n",length);
printf("\t The Original processes \n\n");
for(i=0;i<8;i++)
print_process(arr_processes[i]);
// now we want to sort the processes according to their arrival time
for(i=0;i<8;i++)
{
new_processes[i]=arr_processes[i];
}
for(i=0;i<length;i++)
{
for(j=0;j<length-i;j++)
{
if((new_processes[j].arr_time)>(new_processes[j+1].arr_time))
{
tmp_process[0]=new_processes[j];
new_processes[j]=new_processes[j+1];
new_processes[j+1]=tmp_process[0];
}
}
}
printf("\t The New processes \n\n");
for(i=0;i<8;i++)
print_process(new_processes[i]); // the new queue
ready_processes[0]=arr_processes[0];
ready_processes[1]=arr_processes[1];
ready_processes[2]=arr_processes[2];
ready_processes[3]=arr_processes[5];
ready_processes[4]=arr_processes[6];
length_ready=sizeof(ready_processes)/sizeof(ready_processes[0]);
// now we want to design the ready queue
for(i=0;i<length_ready;i++)
{
for(j=0;j<length_ready-i;j++)
{
if((ready_processes[j].arr_time)>ready_processes[j+1].arr_time)
{
tmp_process[0]=ready_processes[j];
ready_processes[j]=ready_processes[j+1];
ready_processes[j+1]=tmp_process[0];
}
}
}
printf("\t The ready processes \n\n");
for(i=0;i<length_ready;i++)
print_process(ready_processes[i]); // the ready queue
// now we want to design the ready real queue for the shortest deadline first
// we donnot need to check for the new proesses at each instant of time
//but we need to check for the service time from now
real_processes[0]=arr_processes[3];
real_processes[1]=arr_processes[4];
real_processes[2]=arr_processes[7];
length_real=sizeof(real_processes)/sizeof(real_processes[0]);
for(i=0;i<length_real;i++)
{
for(j=0;j<length_real-i;j++)
{
if((real_processes[j].deadline)>real_processes[j+1].deadline)
{
tmp_process[0]=real_processes[j];
real_processes[j]=real_processes[j+1];
real_processes[j+1]=tmp_process[0];
}
}
}
printf("\t The real processes \n\n");
for(i=0;i<length_real;i++)
print_process(real_processes[i]); // the ready real queue
// removed real process
process removed_real;
removed_real.id=0;
removed_real.arr_time=0;
removed_real.serv_time=0;
removed_real.deadline=0;
process running_process;
running_process.id=0;
running_process.arr_time=0;
running_process.serv_time=0;
running_process.deadline=0;
int counter=0;
int start_time;
while(counter<=28)
{
printf("when time = %i\n\n",counter);
// printf("\t The real processes when the counter=%i \n\n",counter);
// for(i=0;i<length_real;i++)
// print_process(real_processes[i]); // the ready real queue
// first we must check for the real processes
for(i=0;i<length_real;i++)
{
if((counter==real_processes[i].arr_time)
&&((real_processes[i].deadline)-counter)>=(real_processes[i].serv_time))
{
running_process=real_processes[i];
printf("The non zero deadline process is:%i\n",running_process.id);
real_processes[i]=removed_real;
start_time=counter; // real process
while(counter!=(start_time+running_process.serv_time))
{
printf("At time = %i,The Running Process is...\n",counter);
print_process(running_process);
counter++;
}
}
}
counter++;
}
return 0;
}
void print_process(process n)
{
if(n.deadline!=0)
printf("ID=%i\narr_time=%i\nserv_time=%i\ndeadline=%i\n\n\n",n.id,n.arr_time,n.serv_time,n.deadline);
else if(n.deadline==0)
printf("ID=%i\narr_time=%i\nserv_time=%i\n\n\n",n.id,n.arr_time,n.serv_time);
}
Your loops index past the end of the array. Here is a sort example that stays in bounds:
for(i=0; i<length - 1; i++)
{
for(j=i + 1;j<length;j++)
{
if((new_processes[j].arr_time)<(new_processes[i].arr_time)) // ascending arrival time, as in the question
{
tmp_process[0]=new_processes[j];
new_processes[j]=new_processes[i] ;
new_processes[i]=tmp_process[0] ;
}
}
}
Or you can use the standard qsort function (declared in <stdlib.h>):
void qsort(void *base, size_t nmemb, size_t size,
int (*compar)(const void *, const void *));
Define a comparison function:
int compare_by_arr_time(const void* a, const void* b)
{
int a_int = ((const process*)a)->arr_time;
int b_int = ((const process*)b)->arr_time;
return a_int - b_int; // or b_int - a_int
}
And use it as follows:
qsort(new_processes,
sizeof(new_processes)/sizeof(new_processes[0]),
sizeof(new_processes[0]),
compare_by_arr_time);
You get these kinds of errors when you go out of the bounds of an array.
for(i=0;i<length;i++)
{
for(j=0;j<length-i;j++)
{
if((new_processes[j].arr_time)>(new_processes[j+1].arr_time))
{
tmp_process[0]=new_processes[j];
new_processes[j]=new_processes[j+1] ;
new_processes[j+1]=tmp_process[0] ;
}
}
}
In the first iteration of the outer loop, i = 0, so j runs from 0 up to (but not including) 8 - i, which is 8.
Notice the expression j+1. During that iteration it takes values in the range [1 ... 8], and index 8 is past the end of the 8-element new_processes array (valid indices are 0 to 7), so you go out of the bounds of the array.
There's your problem.
Edit: the same problem is present in the sort loops that follow the first one (for ready_processes and real_processes).
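For reference, here is a minimal sketch of the same bubble sort with the inner bound tightened so that j+1 never exceeds the last valid index. The function name sort_by_arrival is an assumption; the struct mirrors the one in the question. It is written so that it compiles as C++ (the loop body is plain C).

```cpp
#include <cassert>
#include <cstddef>

struct process { int id, arr_time, serv_time, deadline; };

// Bubble sort by ascending arr_time. The inner loop stops while j + 1 is
// still a valid index, which is the bound the original loops were missing.
void sort_by_arrival(process* p, std::size_t length)
{
    for (std::size_t i = 0; i + 1 < length; ++i) {
        for (std::size_t j = 0; j + 1 < length - i; ++j) {
            if (p[j].arr_time > p[j + 1].arr_time) {
                process tmp = p[j];
                p[j] = p[j + 1];
                p[j + 1] = tmp;
            }
        }
    }
}
```

The same function can be reused for the other arrays by passing their own lengths, for example sort_by_arrival(ready_processes, 5).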
My app consists of the main process and two threads, all running concurrently and making use of three FIFO queues:
The FIFO queues are Qmain, Q1 and Q2. Internally, each queue uses a counter that is incremented when an item is put into the queue and decremented when an item is taken ('get') from the queue.
The processing involves two threads,
QMaster, which gets from Q1 and Q2 and puts into Qmain,
Monitor, which puts into Q2,
and the main process, which gets from Qmain and puts into Q1.
The QMaster thread loop continuously checks the counts of Q1 and Q2, and if any items are in the queues, it gets them and puts them into Qmain.
The Monitor thread loop obtains data from external sources, packages it and puts it into Q2.
The main process of the app also runs a loop checking the count of Qmain; if there are any items, it gets one item
from Qmain on each iteration of the loop and processes it further. During this processing it occasionally
puts an item into Q1 to be processed later (when it is in turn taken from Qmain).
The problem:
I've implemented everything as described above; it works for a random (short) time and then hangs.
I've managed to trace the hang to the increment/decrement of the
count of a FIFO queue (it may happen in any of them).
What I've tried:
Using three mutexes, QMAIN_LOCK, Q1_LOCK and Q2_LOCK, which I lock whenever any get/put operation
is done on the relevant FIFO queue. Result: the app doesn't get going, it just hangs.
The main-process must continue running all the time, must not be blocked on a 'read' (named-pipes fail, socketpair fail).
Any advice?
I think I'm not implementing the mutexes properly; how should it be done?
(Any comments on improving the above design also welcome)
[edit] below are the processes and the fifo-q-template:
Where and how in this should I place the mutexes to avoid the problems described above?
main-process:
...
start thread QMaster
start thread Monitor
...
while (!quit)
{
...
if (Qmain.count() > 0)
{
X = Qmain.get();
process(X)
delete X;
}
...
//at some random time:
Q1.put(Y);
...
}
Monitor:
{
while (1)
{
//obtain & package data
Q2.put(data)
}
}
QMaster:
{
while(1)
{
if (Q1.count() > 0)
Qmain.put(Q1.get());
if (Q2.count() > 0)
Qmain.put(Q2.get());
}
}
fifo_q:
template < class X > class fifo_q
{
struct item
{
X* data;
item *next;
item() { data=NULL; next=NULL; }
};
item *head, *tail;
int count;
public:
fifo_q() { head=tail=NULL; count=0; }
~fifo_q() { clear(); /*deletes all items*/ }
void put(X* x) { item *i=new item(); (... adds to tail...); count++; }
X* get() { X *d = head->data; (...deletes head ...); count--; return d; }
void clear() {...}
};
Here is an example of how I would adapt the design and lock the queue access the POSIX way.
Note that I would normally wrap the mutex in an RAII guard or use Boost.Thread, and that I would use std::deque or std::queue as the queue, but I am staying as close as possible to your code:
main-process:
...
start thread Monitor
...
while (!quit)
{
...
if (Qmain.count() > 0)
{
X = Qmain.get();
process(X)
delete X;
}
...
//at some random time:
QMain.put(Y);
...
}
Monitor:
{
while (1)
{
//obtain & package data
QMain.put(data)
}
}
fifo_q:
template < class X > class fifo_q
{
struct item
{
X* data;
item *next;
item() { data=NULL; next=NULL; }
};
item *head, *tail;
int count;
pthread_mutex_t m;
public:
fifo_q() { head=tail=NULL; count=0; }
~fifo_q() { clear(); /*deletes all items*/ }
void put(X* x)
{
pthread_mutex_lock(&m);
item *i=new item();
(... adds to tail...);
count++;
pthread_mutex_unlock(&m);
}
X* get()
{
pthread_mutex_lock(&m);
X *d = head->data;
(...deletes head ...);
count--;
pthread_mutex_unlock(&m);
return d;
}
void clear() {...}
};
Note too that the mutex still needs to be initialized (for example with pthread_mutex_init in the constructor) and that count() should also take the mutex.
Use the debugger. When your solution with mutexes hangs look at what the threads are doing and you will get a good idea about the cause of the problem.
What is your platform? In Unix/Linux you can use POSIX message queues (you can also use System V message queues, sockets, FIFOs, ...) so you don't need mutexes.
Learn about condition variables. By your description it looks like your Qmaster-thread is busy looping, burning your CPU.
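To make the condition-variable suggestion concrete, here is a minimal sketch built on the C++11 standard library rather than raw pthreads. The class name blocking_queue and its interface are assumptions for illustration, not the poster's fifo_q. A consumer calling get() sleeps until a producer calls put(), so a forwarding thread such as QMaster would no longer spin on count().

```cpp
#include <cassert>
#include <condition_variable>
#include <mutex>
#include <queue>
#include <thread>    // only needed by code that uses the queue from several threads
#include <utility>

// Hypothetical queue type; name and interface are illustrative only.
template <class T>
class blocking_queue {
public:
    void put(T value) {
        {
            std::lock_guard<std::mutex> lock(m_);
            q_.push(std::move(value));
        }
        cv_.notify_one();   // wake one waiting consumer, if any
    }

    // Blocks (without burning CPU) until an item is available.
    T get() {
        std::unique_lock<std::mutex> lock(m_);
        cv_.wait(lock, [this] { return !q_.empty(); });  // handles spurious wakeups
        T value = std::move(q_.front());
        q_.pop();
        return value;
    }

private:
    std::mutex m_;
    std::condition_variable cv_;
    std::queue<T> q_;
};
```

A blocking get() suits the worker threads. The main process, which must never block, would need a non-blocking variant instead.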
One of your responses suggests you are doing something like:
Q2_mutex.lock()
Qmain_mutex.lock()
Qmain.put(Q2.get())
Qmain_mutex.unlock()
Q2_mutex.unlock()
but you probably want to do it like:
Q2_mutex.lock()
X = Q2.get()
Q2_mutex.unlock()
Qmain_mutex.lock()
Qmain.put(X)
Qmain_mutex.unlock()
and as Gregory suggested above, encapsulate the logic into the get/put.
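The second ordering above can be sketched in compilable form as follows. This uses std::mutex and std::queue<int> as stand-ins for the poster's own types; locked_queue and transfer_one are hypothetical names. The point is that the source lock is released before the destination lock is taken, so no thread ever holds two locks at once.

```cpp
#include <cassert>
#include <mutex>
#include <queue>

// Hypothetical pairing of a queue with the mutex that guards it.
struct locked_queue {
    std::mutex m;
    std::queue<int> q;
};

// Moves one item from src to dst without ever holding both mutexes.
// Returns false if src was empty.
bool transfer_one(locked_queue& src, locked_queue& dst)
{
    int item;
    {
        std::lock_guard<std::mutex> lock(src.m);
        if (src.q.empty()) return false;   // check and pop under the same lock
        item = src.q.front();
        src.q.pop();
    }   // src.m is released here
    std::lock_guard<std::mutex> lock(dst.m);
    dst.q.push(item);
    return true;
}
```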
EDIT: Now that you have posted your code, I wonder: is this a learning exercise?
Because I see that you are coding your own FIFO queue class instead of using the C++ standard std::queue. I suppose you have tested your class really well and the problem is not there.
Also, I don't understand why you need three different queues. It seems that the Qmain queue would be enough, and then you will not need the Qmaster thread that is indeed busy waiting.
About the encapsulation: you can create a synch_fifo_q class that wraps the fifo_q class. Add a private mutex member, and then the public methods (put, get, clear, count, ...) should look like put(X) { lock m_mutex; m_fifo_q.put(X); unlock m_mutex; }
Question: what would happen if you had more than one reader of the queue? Is it guaranteed that after a "count() > 0" check you can do a "get()" and get an element?
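One way to sidestep that last question is to fold the emptiness check into the get itself, so check and removal happen under a single lock. The following is only a sketch of such a wrapper: it uses std::deque in place of the poster's fifo_q, and the method name try_get is an assumption. Because try_get never waits for data, it also fits a main loop that must not block.

```cpp
#include <cassert>
#include <cstddef>
#include <deque>
#include <mutex>

// Sketch of the wrapper idea; std::deque stands in for the original fifo_q.
template <class T>
class synch_fifo_q {
public:
    void put(const T& x) {
        std::lock_guard<std::mutex> lock(m_mutex);
        m_q.push_back(x);
    }

    // Check and removal are one atomic step, so a second reader cannot
    // empty the queue between "is there an item?" and "take it".
    bool try_get(T& out) {
        std::lock_guard<std::mutex> lock(m_mutex);
        if (m_q.empty()) return false;
        out = m_q.front();
        m_q.pop_front();
        return true;
    }

    // Only a snapshot: the value may be stale as soon as it is returned.
    std::size_t count() const {
        std::lock_guard<std::mutex> lock(m_mutex);
        return m_q.size();
    }

private:
    mutable std::mutex m_mutex;
    std::deque<T> m_q;
};
```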
I wrote a simple application below:
#include <queue>
#include <stdlib.h>  // rand
#include <tchar.h>   // _tmain, _TCHAR
#include <windows.h>
#include <process.h>
using namespace std;
queue<int> QMain, Q1, Q2;
CRITICAL_SECTION csMain, cs1, cs2;
unsigned __stdcall TMaster(void*)
{
while(1)
{
if( Q1.size() > 0)
{
::EnterCriticalSection(&cs1);
::EnterCriticalSection(&csMain);
int i1 = Q1.front();
Q1.pop();
//use i1;
i1 = 2 * i1;
//end use;
QMain.push(i1);
::LeaveCriticalSection(&csMain);
::LeaveCriticalSection(&cs1);
}
if( Q2.size() > 0)
{
::EnterCriticalSection(&cs2);
::EnterCriticalSection(&csMain);
int i1 = Q2.front();
Q2.pop();
//use i1;
i1 = 3 * i1;
//end use;
QMain.push(i1);
::LeaveCriticalSection(&csMain);
::LeaveCriticalSection(&cs2);
}
}
return 0;
}
unsigned __stdcall TMoniter(void*)
{
while(1)
{
int irand = ::rand();
if ( irand % 6 >= 3)
{
::EnterCriticalSection(&cs2);
Q2.push(irand % 6);
::LeaveCriticalSection(&cs2);
}
}
return 0;
}
unsigned __stdcall TMain(void)
{
while(1)
{
if (QMain.size() > 0)
{
::EnterCriticalSection(&cs1);
::EnterCriticalSection(&csMain);
int i = QMain.front();
QMain.pop();
i = 4 * i;
Q1.push(i);
::LeaveCriticalSection(&csMain);
::LeaveCriticalSection(&cs1);
}
}
return 0;
}
int _tmain(int argc, _TCHAR* argv[])
{
::InitializeCriticalSection(&cs1);
::InitializeCriticalSection(&cs2);
::InitializeCriticalSection(&csMain);
unsigned threadID;
::_beginthreadex(NULL, 0, &TMaster, NULL, 0, &threadID);
::_beginthreadex(NULL, 0, &TMoniter, NULL, 0, &threadID);
TMain();
return 0;
}
You should not lock a second mutex while you already hold one.
Since the question is tagged C++, I suggest implementing the locking inside the get/add logic of the queue class (e.g. using Boost locks), or writing a wrapper if your queue is not a class.
This allows you to simplify the locking logic.
Regarding the sources you have added: the queue size check and the following put/get should be done in one transaction; otherwise another thread can modify the queue in between.
Are you acquiring multiple locks simultaneously? This is generally something you want to avoid. If you must, ensure you are always acquiring the locks in the same order in each thread (this is more restrictive to your concurrency and why you generally want to avoid it).
Other concurrency advice: Are you acquiring the lock prior to reading the queue sizes? If you're using a mutex to protect the queues, then your queue implementation isn't concurrent and you probably need to acquire the lock before reading the queue size.
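If two locks really must be held at the same time, the C++11 standard library can take care of the ordering: std::lock acquires several mutexes using a deadlock-avoidance algorithm, so the order in which callers name them does not matter. A minimal sketch, with guarded_queue and move_all as hypothetical names:

```cpp
#include <cassert>
#include <mutex>
#include <queue>

// Hypothetical pairing of a queue with its mutex.
struct guarded_queue {
    std::mutex m;
    std::queue<int> q;
};

// Moves every item from `from` to `to` while holding both locks.
void move_all(guarded_queue& from, guarded_queue& to)
{
    if (&from == &to) return;            // locking the same mutex twice would be an error
    std::lock(from.m, to.m);             // acquires both without risk of deadlock
    std::lock_guard<std::mutex> l1(from.m, std::adopt_lock);  // take ownership for unlock
    std::lock_guard<std::mutex> l2(to.m, std::adopt_lock);
    while (!from.q.empty()) {
        to.q.push(from.q.front());
        from.q.pop();
    }
}
```

Holding one lock at a time, as suggested elsewhere in this thread, remains the simpler option when the operation allows it.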
One problem may come from this rule: "The main-process must continue running all the time, must not be blocked on a 'read'". How did you implement it? What is the difference between 'get' and 'read'?
The problem seems to be in your implementation, not in the logic. And as you stated, you should not be in any deadlock, because you never acquire another lock while already holding one.