I have a list of class pointers and a function that calls a method through them. Each pointer in the list points to an object of a class derived from a common base class. What I am currently doing is iterating through the list: I call the method on the 1st pointer, wait for it to finish, then go to the 2nd pointer, call the method, and so on.
Now I have about 20 derived classes and it is taking forever to get through the list. So I wanted to use fork to execute maybe 4-5 class methods at once so that the whole process is that much faster.
list<Myclass *> check;
myfunc(list<Myclass *> check)
{
for(list<Myclass*>::iterator a= check.begin();a!=check.end();a++)
(*a)->run();
}
This is kind of a skeleton of what I have...
What I want is for it to fork each time, creating a child process to execute the method, and then move on to the next one...
Yes, you can use fork() to do some work in a child process. However, once the child process is done doing its work, it exits, and you are not sharing data between them. I am not clear on your implementation, but if the intent is to spawn off some processes to do some extra work, then that seems OK; you probably want a thread, though, not fork.
You are more likely to want to start a thread than fork a process. It is easier when there are pointers involved, since pointers can be shared inside a process but not outside.
Also, forking a process has some performance overhead.
So I wanted to use fork to execute maybe 4-5 class methods at once so that the whole process is that much faster.
As many others have already mentioned, you probably want to use threads rather than fork here. There is a lot more overhead with fork than there is with spawning a new thread.
What others have not said is that spawning a thread or a process does not guarantee a speedup. For example, you might get a slowdown rather than a speedup if you spawn many more CPU-bound threads at once than the number of available CPUs. What happens is that each of those threads competes with the others for its turn on the limited number of CPUs. A thread will run for a little bit of time and then be swapped out for another.
It's a good idea to make the number of active threads less than the number of CPUs available. Even if you do that, you can still run into trouble when some other CPU-bound application happens to be running at the same time.
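As a rough sketch of that advice, assuming C++11 is available: the code below runs every object's run() on a fixed number of worker threads instead of one thread per object. Myclass and run() come from the question; CountingTask, run_all and default_thread_count are hypothetical names made up for this illustration, and a std::vector is used instead of the question's std::list so that workers can claim items by index.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <thread>
#include <vector>

// Stand-in for the question's base class; only run() is assumed.
struct Myclass {
    virtual ~Myclass() {}
    virtual void run() = 0;
};

// Hypothetical derived class used only to demonstrate the idea.
struct CountingTask : Myclass {
    explicit CountingTask(std::atomic<int>* counter) : m_counter(counter) {}
    void run() override { ++*m_counter; }
    std::atomic<int>* m_counter;
};

// Picks a worker count from the hardware, falling back to 1 when unknown.
unsigned default_thread_count()
{
    unsigned n = std::thread::hardware_concurrency(); // may return 0
    return n == 0 ? 1 : n;
}

// Calls run() on every object using at most max_threads worker threads.
// Each worker repeatedly claims the next unprocessed index.
void run_all(const std::vector<Myclass*>& check, unsigned max_threads)
{
    assert(max_threads > 0);
    std::atomic<std::size_t> next_index(0);
    auto worker = [&]() {
        for (;;) {
            std::size_t i = next_index.fetch_add(1);
            if (i >= check.size())
                return;
            check[i]->run();
        }
    };
    std::vector<std::thread> pool;
    for (unsigned t = 0; t < max_threads; ++t)
        pool.emplace_back(worker);
    for (std::thread& th : pool)
        th.join(); // wait until all work is done
}
```

Calling run_all(check, default_thread_count()) keeps the number of active threads tied to the machine. This only helps if the run() implementations do not write to shared data without synchronization.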
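You're not passing any memory back with fork though. You probably want a thread. Here's how to do it with fork anyway:
#include <cstdlib>
#include <list>
#include <sys/wait.h>
#include <unistd.h>

const int n = 4; // or 5: maximum number of children running at once

// Each child gets its own copy of the parent's memory, so the pointers stay
// valid in the child, but nothing run() changes is visible to the parent.
void myfunc(std::list<Myclass*>& check)
{
    int running = 0;
    for (std::list<Myclass*>::iterator a = check.begin(); a != check.end(); ++a) {
        if (running >= n) {
            wait(NULL); // block until one child has finished
            --running;
        }
        pid_t pid = fork();
        if (pid == 0) {
            (*a)->run();
            _exit(0); // leave the child without running the parent's cleanup
        } else if (pid > 0) {
            ++running;
        } // if fork() failed (pid < 0), this item is skipped
    }
    // Prevent a voodoo priest from making zombies of these processes.
    while (running-- > 0) wait(NULL);
}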
I have an application which has a couple of processing levels like:
InputStream->Pre-Processing->Computation->OutputStream
Each of these entities runs in a separate thread.
So in my code I have the general thread, which owns the
std::vector<ImageRead> m_readImages;
and then it passes this member variable to each thread:
InputStream input{&m_readImages};
std::thread threadStream{&InputStream::start, &input};
PreProcess pre{&m_readImages};
std::thread preStream{&PreProcess::start, &pre};
...
And each of these classes owns a pointer member to this data:
std::vector<ImageRead>* m_ptrReadImages;
I also have a global mutex defined, which I lock and unlock on each read/write operation to that shared container.
What bothers me is that this mechanism is pretty obscure and sometimes I get confused whether the data is used by another thread or not.
So what is the more straightforward way to share this container between those threads?
The process you described as "Input-->preprocessing-->computation-->Output" is sequential by design: each step depends on the previous one so parallelization in this particular manner is not beneficial as each thread just has to wait for another to complete. Try to find out which step takes most time and parallelize that. Or try to set up multiple parallel processing pipelines that operate sequentially on independent, individual data sets. A usual approach for that would employ a processing queue which distributes the tasks among a set of threads.
It would seem to me that your reading and preprocessing could be done independently of the container.
Naively, I would structure this as a fan-out and then fan-in network of tasks.
First, make a dispatch task (a task is a unit of work that is given to a thread to actually operate on) that will create input-and-preprocess tasks.
Use futures as a means for the sub-tasks to communicate back a pointer to the completely loaded image.
Make a second task, the std::vector builder task, that just calls get() on the futures to get the results when they are done and adds them to the std::vector.
I suggest you structure things this way because I suspect that any IO and preprocessing you are doing will take longer than setting a value in the vector. Using tasks instead of threads directly lets you tune the parallel portion of your work.
I hope that's not too abstracted away from the concrete elements. This is a pattern I find to be well balanced between saturating available hardware, reducing thrash / lock contention, and is understandable by future-you debugging it later.
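A minimal sketch of that fan-out and fan-in shape, assuming C++11 and std::async. The ImageRead fields, load_and_preprocess and load_all are hypothetical stand-ins, since the real types are not shown in the question.

```cpp
#include <cassert>
#include <future>
#include <memory>
#include <string>
#include <vector>

// Hypothetical image type; the real ImageRead is not shown in the question.
struct ImageRead {
    std::string name;
    bool preprocessed;
};

// Stand-in for "read and preprocess one image".
std::unique_ptr<ImageRead> load_and_preprocess(const std::string& name)
{
    std::unique_ptr<ImageRead> img(new ImageRead{name, false});
    img->preprocessed = true; // real I/O and preprocessing would go here
    return img;
}

// Fan out: one asynchronous task per input.
// Fan in: a single place collects the results, so the vector needs no mutex.
std::vector<std::unique_ptr<ImageRead>> load_all(const std::vector<std::string>& names)
{
    std::vector<std::future<std::unique_ptr<ImageRead>>> pending;
    for (const std::string& n : names)
        pending.push_back(std::async(std::launch::async, load_and_preprocess, n));

    std::vector<std::unique_ptr<ImageRead>> images;
    for (auto& f : pending)
        images.push_back(f.get()); // blocks until that task has finished
    assert(images.size() == names.size());
    return images;
}
```

With std::launch::async each call may get its own thread, so for a large number of images you would probably want to cap how many tasks are in flight at once.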
I would use 3 separate queues, ready_for_preprocessing which is fed by InputStream and consumed by Pre-processing, ready_for_computation which is fed by Pre-Processing and consumed by Computation, and ready_for_output which is fed by Computation and consumed by OutputStream.
You'll want each queue to be in a class, which has an access mutex (to control actually adding and removing items from the queue) and an "image available" semaphore (to signal that items are available) as well as the actual queue. This would allow multiple instances of each thread. Something like this:
class imageQueue
{
std::deque<ImageRead> m_readImages;
std::mutex m_changeQueue;
Semaphore m_imagesAvailable;
public:
bool addImage( ImageRead );
ImageRead getNextImage();
};
addImage() takes the m_changeQueue mutex, adds the image to m_readImages, then signals m_imagesAvailable;
getNextImage() waits on m_imagesAvailable. When it becomes signaled, it takes m_changeQueue, removes the next image from the list, and returns it.
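One possible way to fill in those two functions, sketched under the assumption that C++11 is available. Standard C++11 has no semaphore type, so this version swaps in a std::condition_variable for the "image available" signal; the ImageRead struct here is a hypothetical placeholder.

```cpp
#include <cassert>
#include <condition_variable>
#include <deque>
#include <mutex>
#include <utility>

// Hypothetical payload; the real ImageRead is not shown in the question.
struct ImageRead { int id; };

class ImageQueue {
    std::deque<ImageRead> m_readImages;
    std::mutex m_changeQueue;
    std::condition_variable m_imagesAvailable; // used in place of a semaphore
public:
    void addImage(ImageRead image)
    {
        {
            std::lock_guard<std::mutex> lock(m_changeQueue);
            m_readImages.push_back(std::move(image));
        }
        m_imagesAvailable.notify_one(); // wake one waiting consumer
    }

    ImageRead getNextImage()
    {
        std::unique_lock<std::mutex> lock(m_changeQueue);
        // The predicate guards against spurious wakeups.
        m_imagesAvailable.wait(lock, [this] { return !m_readImages.empty(); });
        assert(!m_readImages.empty());
        ImageRead image = std::move(m_readImages.front());
        m_readImages.pop_front();
        return image;
    }
};
```

One such queue would sit between each pair of stages. Note that getNextImage() as written blocks forever on an empty queue, so a real pipeline also needs some way to signal shutdown.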
cf. http://en.cppreference.com/w/cpp/thread
Ignoring the question of "Should each operation run in an individual thread", it appears that the objects that you want to process move from thread to thread. In effect, they are uniquely owned by only one thread at a time (no thread ever needs to access any data from other threads). There is a way to express just that in C++: std::unique_ptr.
Each step then only works on its owned image. All you have to do is find a thread-safe way to move the ownership of your images through the process steps one by one, which means the critical sections are only at the boundaries between tasks. Since you have multiple of these, abstracting it away would be reasonable:
class ProcessBoundary
{
public:
    ProcessBoundary() : running(true) {}

    void setImage(std::unique_ptr<ImageRead> newImage)
    {
        while (running)
        {
            {
                std::lock_guard<std::mutex> guard(m_mutex);
                if (m_imageToTransfer == nullptr)
                {
                    // Image has been transferred to next step, so we can place this one here.
                    m_imageToTransfer = std::move(newImage);
                    return;
                }
            }
            std::this_thread::yield();
        }
    }
    std::unique_ptr<ImageRead> getImage()
    {
        while (running)
        {
            {
                std::lock_guard<std::mutex> guard(m_mutex);
                if (m_imageToTransfer != nullptr)
                {
                    // An image is waiting, so hand its ownership to the caller.
                    return std::move(m_imageToTransfer);
                }
            }
            std::this_thread::yield();
        }
        return nullptr; // stopped before an image arrived
    }
    void stop()
    {
        running = false;
    }
private:
    std::mutex m_mutex;
    std::unique_ptr<ImageRead> m_imageToTransfer;
    std::atomic<bool> running;
};
The process steps would then ask for an image with getImage(), which they uniquely own once that function returns. They process it and pass it to the setImage of the next ProcessBoundary.
You could probably improve on this with condition variables, or adding a queue in this class so that threads can get back to processing the next image. However, if some steps are faster than others they will necessarily be stalled by the slower ones eventually.
This is a design pattern problem. I suggest reading about concurrency design patterns to see if there is anything that would help you out.
If you want to add concurrency to the following sequential process:
InputStream->Pre-Processing->Computation->OutputStream
Then I suggest using the active object design pattern. This way each process is not blocked by the previous step and can run concurrently. It is also very simple to implement. Here is an implementation:
http://www.drdobbs.com/parallel/prefer-using-active-objects-instead-of-n/225700095
As to your question about each thread sharing a DTO: this is easily solved with a wrapper on the DTO. The wrapper will contain write and read functions. The write functions block on a mutex and the read functions return const data.
However, I think your problem lies in the design. If the process is sequential as you described, then why is each process sharing the data? The data should be passed into the next process once the current one completes. In other words, each process should be decoupled.
You are correct in using mutexes and locks. For C++11, this is really the most elegant way of accessing complex data between threads.
Suppose an object X is supposed to run forever. X is running threads with infinite loops inside, so the program will never exit.
My question is this: is it good practice to use the join() method at all, for example in the destructor, or would it make more sense to do something like
int main() {
X myX;
while(1) {
}
return 0;
}
Are there any differences between the two approaches?
Sometimes it is required, often it is not. If you can design your app so that it does not, so much the better.
You would want to call join() if some part of your program needed to wait to run until a thread exited. It also makes the thread object destroyable so you don't create a memory leak. Threads that haven't been joined are like zombie processes and waste resources.
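As an illustration of the destructor approach, here is a minimal sketch assuming C++11 std::thread (the question does not say which thread library is used). With std::thread, destroying a thread object that is still joinable calls std::terminate, so the destructor first asks the loop to stop and then joins. The class layout and the ticks counter are hypothetical.

```cpp
#include <atomic>
#include <cassert>
#include <thread>

// Hypothetical stand-in for the question's X: owns one worker thread.
class X {
public:
    X() : m_stop(false), m_ticks(0), m_worker(&X::loop, this) {}
    ~X()
    {
        m_stop = true;               // ask the loop to finish
        assert(m_worker.joinable());
        m_worker.join();             // wait for it before the thread object dies
    }
    long ticks() const { return m_ticks.load(); }
private:
    void loop()
    {
        while (!m_stop) {
            ++m_ticks;               // placeholder for the real work
            std::this_thread::yield();
        }
    }
    std::atomic<bool> m_stop;
    std::atomic<long> m_ticks;
    std::thread m_worker; // declared last so the flags exist before the thread starts
};
```

The "infinite" loop becomes a loop on a stop flag, which lets main() return normally instead of spinning in while(1).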
I'm programming in C++, but I'm only using pthread.h, no boost or C++11 threads.
So I'm trying to use threads but based on one of my previous questions (link), this doesn't seem feasible since threads terminate right after completion of its task, and one of the more prevalent reasons to use a thread-pool implementation is to reduce thread-creation overhead by reusing these threads for multiple tasks.
So is the only other way to implement this in C to use fork(), and create a pipe from the main to child processes? Or is there a way to set up a pipe between threads and their parent that I don't know about?
Many thanks in advance!
Yes, you can create a thread-safe queue between the threads. Then the threads in the pool will sit in a loop retrieving an item from the queue, executing whatever it needs, then going back and getting another.
That's generally a bit easier/simpler in C++ because it's a little easier to agree on some of the interface (e.g., overload operator() to execute the code for a task), but at a fundamental level you can do all the same things in C (e.g., each task struct you put in the queue will contain a pointer to a function to carry out the work for that task).
In your case, since you are using C++, it's probably easier to use an overload of operator() to do the work though. The rest of the task struct (or whatever you choose to call it) will contain any data needed, etc.
From the POSIX standard:
int pthread_create(pthread_t *restrict thread,
const pthread_attr_t *restrict attr,
void *(*start_routine)(void*), void *restrict arg);
(...) The thread is created executing start_routine with arg as its sole argument.
So, you should create a bunch of threads with this function, and have them all execute a function that goes something like
void *consumer(void *arg)
{
WorkQueue *queue = static_cast<WorkQueue *>(arg);
for (task in queue) {
if (task == STOP_WORKING)
break;
do work;
}
return WHATEVER;
}
(At the end of input, push n STOP_WORKING items to the queue where n is the number of threads.)
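The pseudocode above could be fleshed out roughly as follows, using only pthread.h plus a standard container. This is a sketch under assumptions: WorkQueue's interface, the Task struct and mark_done are invented for the example, and a task with a NULL function pointer plays the role of STOP_WORKING.

```cpp
#include <cassert>
#include <pthread.h>
#include <queue>

// A task is a function pointer plus its argument; fn == NULL means "stop working".
struct Task {
    void (*fn)(void*);
    void* arg;
};

// Hypothetical minimal queue guarded by a pthread mutex and condition variable.
class WorkQueue {
public:
    WorkQueue() {
        int rc = pthread_mutex_init(&m_lock, NULL);
        assert(rc == 0);
        rc = pthread_cond_init(&m_ready, NULL);
        assert(rc == 0);
        (void)rc;
    }
    ~WorkQueue() {
        pthread_cond_destroy(&m_ready);
        pthread_mutex_destroy(&m_lock);
    }
    void push(Task t) {
        pthread_mutex_lock(&m_lock);
        m_tasks.push(t);
        pthread_mutex_unlock(&m_lock);
        pthread_cond_signal(&m_ready); // wake one waiting consumer
    }
    Task pop() {
        pthread_mutex_lock(&m_lock);
        while (m_tasks.empty())        // loop guards against spurious wakeups
            pthread_cond_wait(&m_ready, &m_lock);
        Task t = m_tasks.front();
        m_tasks.pop();
        pthread_mutex_unlock(&m_lock);
        return t;
    }
private:
    std::queue<Task> m_tasks;
    pthread_mutex_t m_lock;
    pthread_cond_t m_ready;
};

// Thread entry point: keep taking tasks until a stop marker arrives.
void* consumer(void* arg)
{
    WorkQueue* queue = static_cast<WorkQueue*>(arg);
    for (;;) {
        Task t = queue->pop();
        if (t.fn == NULL)
            break;
        t.fn(t.arg);
    }
    return NULL;
}

// Example task body: sets the int it is given to 1.
void mark_done(void* p)
{
    *static_cast<int*>(p) = 1;
}
```

The threads are created once with pthread_create(&th, NULL, consumer, &queue) and reused for every task, which is the point of a pool: a thread only terminates when its start routine returns.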
Mind you, pthreads is a very low-level API that offers very little type-safety (all data is passed as void pointers). If you're trying to parallelize CPU-intensive tasks, you might want to look at OpenMP instead.
'doesn't seem feasible since threads terminate right after completion of its task' what??
for(;;){
Task *myTask=theCommonProducerConsumerQueue->pop();
myTask->run();
}
.. never return anything, in fact, never return.
You may find it helpful to look at the source code for libdispatch, which is the basis for Apple's Grand Central Dispatch and uses thread pools.
I would suggest using Threading Building Blocks from Intel to accomplish work-queue/threadpool like tasks. A fairly contrived example using TBB 3.0:
class PoorExampleTask : public tbb::task {
public:
    PoorExampleTask(int foo, tbb::concurrent_queue<float>& results)
        : _bar(foo), _results(results)
    { }
    tbb::task* execute() {
        _results.push(pow(2.0, _bar));
        return NULL;
    }
private:
    int _bar;
    tbb::concurrent_queue<float>& _results;
};
Used later on like so:
tbb::concurrent_queue<float> powers;
for (int ww = 0; ww < LotsOfWork; ++ww) {
PoorExampleTask* tt
= new (tbb::task::allocate_root()) PoorExampleTask(ww, powers);
tbb::task::enqueue(*tt);
}
http://people.clarkson.edu/~jmatthew/cs644.archive/cs644.fa2001/proj/locksmith/code/ExampleTest/threadpool.c
I used google a couple months ago, you should try it.
Edit: it seems maybe you want a group instead. I was able to create one with some minor alteration of the above so that the worker didn't perform work, but just joined threads.
I am not sure how best to phrase this question for this forum, but I will ask anyway and hopefully get some input.
I am writing a thread pool for my project, with the following design.
I maintain a vector of threads: std::vector<ThreadWrapper <threadFuncParam>* > m_vecThreads;
and push the threads into it: m_vecThreads.push_back(pThreadWrapper);
When a new request comes in, I take a thread from the pool as below:
if(!m_vecThreads.empty() )
{
ThreadWrapper <threadFuncParam>* pWrapper = m_vecThreads.back();
m_vecThreads.pop_back();
//... Awake thread
}
When the thread's job is done, it is pushed back into the pool of threads.
Now, for a graceful shutdown I have to stop the threads gracefully, but with the design above I have a problem: since I pop a thread from the vector when a request is serviced, I lose the pointer until the service is completed, so how can I stop those threads?
Is there a better way to do this, or to handle this scenario with a map or another container supported by standard C++?
Another question:
During shutdown, some threads may still be processing (in my case, reading from a database), which may take time, so I cannot wait until they complete,
and I want to send a reply to the clients whose pending requests are being processed by the threads I am about to kill, saying that the value is bad.
Thanks!
If you still need access to what you pass out from your pool, then you should store the items in a "used" container.
However, at that moment you are sharing your pointers, so you should use shared_ptr and pass out weak_ptr, so the threads can also be deleted and the users don't have a dangling pointer.
The best container for the used items would be a set, so the returned thread can be found and removed easily.
To solve your first problem, push it onto another vector, say m_vecBusyThreads, and when it's done, take it off there (note, you'll have to have some mechanism to search for the finished thread).
For your second problem, the cleanest solution is to join each thread until it has shut down; any other approach could end up with some undesired side effects (especially if, for example, it's connected to a db). Now that you have the busy container, iterate through it and tell each thread to shut down, then iterate through your free container, shutting down and joining each thread. Then go back to the busy container and attempt to join each thread. This may give the busy threads a little time to shut down cleanly.
boost::threads supports this concept of interrupt points, and the idea is that you can interrupt a thread at any of these points, however some calls are not interruptible (typically blocking calls), you need to find the best way to stop each type (socket read for example may be to send a dummy packet etc.)
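To make the bookkeeping concrete, here is a small sketch assuming C++11. The pool owns every worker for its whole lifetime, and the free and busy containers only hold non-owning pointers, so shutdown can always reach a thread that is currently servicing a request. Worker, Pool and their member names are hypothetical and are not the ThreadWrapper from the question. The sketch is not thread-safe by itself: acquire() and release() would need a mutex if several threads call them.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <memory>
#include <set>
#include <thread>
#include <vector>

// Hypothetical worker: a thread that idles until asked to stop.
class Worker {
public:
    Worker() : m_stop(false), m_thread(&Worker::loop, this) {}
    void requestStop() { m_stop = true; }
    void join() { if (m_thread.joinable()) m_thread.join(); }
private:
    void loop() { while (!m_stop) std::this_thread::yield(); }
    std::atomic<bool> m_stop;
    std::thread m_thread; // declared last so m_stop exists before the thread starts
};

class Pool {
public:
    explicit Pool(std::size_t n) {
        for (std::size_t i = 0; i < n; ++i) {
            m_all.push_back(std::unique_ptr<Worker>(new Worker));
            m_free.push_back(m_all.back().get());
        }
    }
    ~Pool() { shutdown(); }

    // Hands out a free worker and records it as busy; nullptr if none is free.
    Worker* acquire() {
        if (m_free.empty()) return nullptr;
        Worker* w = m_free.back();
        m_free.pop_back();
        m_busy.insert(w);
        return w;
    }
    // Returns a worker to the free list.
    void release(Worker* w) {
        std::size_t erased = m_busy.erase(w);
        assert(erased == 1); // must have come from acquire()
        (void)erased;
        m_free.push_back(w);
    }
    std::size_t busyCount() const { return m_busy.size(); }

    // Tell every worker to stop first, then wait for each one.
    void shutdown() {
        for (auto& w : m_all) w->requestStop();
        for (auto& w : m_all) w->join();
    }
private:
    std::vector<std::unique_ptr<Worker>> m_all; // owns all workers, never popped
    std::vector<Worker*> m_free;
    std::set<Worker*> m_busy;
};
```

Requesting a stop on all workers before joining any of them gives the busy ones time to finish while the idle ones are being joined.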
I have done it in C, so the solution is not "C++"ish, but I was using two arrays: one containing the threads, and the other containing a representation of used / unused (~boolean).
It would be something like:
pthread_t thread_pool[INITIAL_SIZE];
int threads_availability[INITIAL_SIZE]; /* 1 = available, 0 = in use; set all to 1 at start */
int first_available = 0;

pthread_t *get_thread(void) {
    int ind;
    if (first_available >= INITIAL_SIZE)
        return NULL; /* nothing available */
    ind = first_available;
    threads_availability[ind] = 0;
    /* find the next available spot */
    while (first_available < INITIAL_SIZE && !threads_availability[first_available])
        first_available++;
    return &thread_pool[ind];
}

void put_thread(pthread_t *thethread)
{
    int i = 0;
    while (i < INITIAL_SIZE && !pthread_equal(thread_pool[i], *thethread))
        i++;
    if (i == INITIAL_SIZE)
        return; /* not a thread from this pool */
    threads_availability[i] = 1;
    if (i < first_available)
        first_available = i;
}
please keep in mind that this is pseudo code, and this is not optimal.
But this is an idea.
This is not a direct answer to your problem as other people already answered your original question.
I just wanted to say that you could look into boost::asio and/or boost::thread.
I would probably go for boost::asio because it has everything you need to do asynchronous operations based on timers and whatnot. You could use shared_ptr and boost::enable_shared_from_this in order to let your "jobs" go and be destroyed automatically when they finish their job.
Example:
boost::shared_ptr<async_job> aj( new async_job(
io_, boost::bind(&my_job::handle_completion, shared_from_this(), _1, _2)));
This code would execute your custom async_job on a thread pool (io_ is boost::asio::io_service). Your 'my_job' instance will be automatically destroyed when the async_job finishes and invokes handle_completion on it. Or you can let it live if you take shared_from_this() again inside handle_completion.
HTH,
Alex
I'm working on a graphical application which looks something like this:
while (Simulator.simulating)
{
Simulator.update();
InputManager.processInput();
VideoManager.draw();
}
I do this several times a second, and in the vast majority of cases my computation will be taking up 90 - 99% of my processing time. What I would like to do is take out the processInput and draw functions and have each one run independently.
That way, I can have the input thread always checking for input (at a reasonable rate), and the draw thread attempting to redraw at a given frame rate.
The simulator is already (internally) multithreaded and there are no issues with multiple threads writing to the same data (each one processes a segment).
My issue is I'm not sure how I can properly do this. How would I properly initialize my pthread_t and associated pthread_attr_t so that the thread runs without blocking what I'm doing? In other words, how can I create two threads, each of which run an infinite loop?
To generalize even more, I'm trying to figure out how to do this:
for (int i = 0; i < threads; i++)
pthread_create(&th[i], NULL, func[i], NULL)
for (int i = 0; i < threads; i++)
pthread_join(th[i], NULL);
Where func[i] is some arbitrary function which runs in an infinite loop doing some arbitrary thing.
Any help or even a link is appreciated, thanks!
Edit: I should mention it is an interactive simulator, so I do need to have two infinite loops running independently of each other. I can only seem to run one at a time.
Double buffering is your friend here. Have 2 buffers of data. One is the drawing buffer and one is the calculating buffer. When you have finished calculating, wait for the current draw to finish and then swap the buffers over. Now it will continue drawing the newly calculated data while you are calculating the next frame's worth of data. Drawing and simulation are now almost completely decoupled ...
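A minimal sketch of such a double buffer, assuming C++11 (with pthreads only, the std::mutex would become a pthread_mutex_t). Frame, DoubleBuffer and the member names are hypothetical; the frame contents stand in for whatever the simulator produces for drawing.

```cpp
#include <cassert>
#include <cstddef>
#include <mutex>
#include <vector>

// Hypothetical frame data: whatever the simulator produces for drawing.
typedef std::vector<float> Frame;

class DoubleBuffer {
public:
    explicit DoubleBuffer(std::size_t size) : m_front(size), m_back(size) {}

    // Simulator side: the back buffer is written without a lock,
    // because only the simulator thread ever touches it.
    Frame& back() { return m_back; }

    // Simulator side: publish the finished frame.
    // Blocks while a draw is in progress, then swaps the two buffers.
    void swap()
    {
        std::lock_guard<std::mutex> lock(m_mutex);
        assert(m_front.size() == m_back.size());
        m_front.swap(m_back);
    }

    // Draw side: run draw_fn on the front buffer while holding the lock.
    template <typename DrawFn>
    void draw(DrawFn draw_fn)
    {
        std::lock_guard<std::mutex> lock(m_mutex);
        draw_fn(m_front);
    }
private:
    std::mutex m_mutex;
    Frame m_front; // read by the draw thread
    Frame m_back;  // written by the simulator thread
};
```

The lock is held only for the swap and for the duration of one draw, so the simulator never waits longer than a single frame's drawing time.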
First I would suggest using boost::thread as opposed to pthreads since you are using C++. With boost::thread you can do something like this:
#include <boost/thread.hpp>
void input_thread()
{
//...
}
void draw_thread()
{
//...
}
int main()
{
boost::thread input_th(&input_thread);
boost::thread draw_th(&draw_thread);
input_th.join();
draw_th.join();
return 0;
}
The constructor of a boost::thread automatically spawns a new thread and calls the function passed in. You can also use member functions as threads by using boost::bind. The join function blocks until the thread exits. This is necessary because if main() exits, all of your threads are killed.
Hopefully that will get you started, but the hard part is synchronizing (sharing data among threads). I suggest you look at the documentation for mutexes and condition variables. Remember that you need to make sure that only one thread is writing to the same memory location at once. Mutexes help solve this problem. Condition variables help by allowing you to signal and wait for signals between threads.
For instance, in the input thread you may fill a buffer with input then use a condition variable to signal to the draw thread that input is ready. In each thread, a mutex should be locked when accessing the buffer so that it is not overwritten by the input thread while the draw thread is trying to read it. As Goz suggested, a double buffer would make this easier and probably more efficient.