Is there much overhead with AfxBeginThread? - c++

How much overhead is there when AfxBeginThread does its thing?
I have an embarrassingly parallel project, and I want to launch batches of 4-15 threads with AfxBeginThread, wait for each to finish naturally, compare the results, then repeat zillions of times.
What has me concerned is that each worker thread is going to do much less than a second's worth of work, maybe 1/50th of a second or less, and frankly I don't know how many cycles go into the voodoo AfxBeginThread does to register the new thread, set it up, enter it and exit it naturally when the function ends.
Any thoughts?

As a general principle, you probably want to avoid starting and stopping threads all the time. Create the worker threads once, and then feed them data zillions of times. Then you don't have to worry about the thread creation and destruction overhead (which is small but nontrivial).
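For example, a minimal sketch of that pattern with AfxBeginThread might look like the following; the WorkerCtx struct, the event names, and the buffer handling are illustrative assumptions, not code from the question. The workers are created once, park on an auto-reset event, and get fed a new batch each iteration:

    #include <afxwin.h>   // AfxBeginThread, CWinThread

    // Illustrative context handed to each worker; not part of any MFC API.
    struct WorkerCtx {
        HANDLE goEvent;     // main thread signals this when input is ready
        HANDLE doneEvent;   // worker signals this when its slice is finished
        volatile bool quit; // set to true to let the worker exit
        // ... per-worker input/output buffers ...
    };

    UINT WorkerProc(LPVOID pParam) {
        WorkerCtx* ctx = static_cast<WorkerCtx*>(pParam);
        for (;;) {
            ::WaitForSingleObject(ctx->goEvent, INFINITE);  // sleep until fed
            if (ctx->quit)
                return 0;
            // ... do the ~1/50th-of-a-second slice of work here ...
            ::SetEvent(ctx->doneEvent);
        }
    }

    // One-time setup per worker (auto-reset events):
    //   ctx.quit      = false;
    //   ctx.goEvent   = ::CreateEvent(NULL, FALSE, FALSE, NULL);
    //   ctx.doneEvent = ::CreateEvent(NULL, FALSE, FALSE, NULL);
    //   AfxBeginThread(WorkerProc, &ctx);
    // Each batch: fill the buffers, ::SetEvent(goEvent) on every worker,
    // then ::WaitForMultipleObjects on all the doneEvents.

This way the per-iteration cost is a couple of event signals rather than thread creation and teardown.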

Best way to handle job cancellation on thread

I wrote a simple job queue that uses a thread to run the jobs in the queue one by one. The thread itself comes from a pool, so its lifetime lasts as long as the job queue object is around. A job is popped off the queue, run() is called on it, and it's discarded once finished.
I'm wondering what paradigms I could use to abort a job mid-process. The naive approach is an abort flag that the job checks at regular intervals. The problem is that some jobs take a while, either because they block on I/O or because of some computationally heavy task.
Another option I considered was to kill the thread entirely, but that is a potentially dirty and error-prone solution.
Are there other ways of doing this?
EDIT: Since I'm in C++ land, is there a way to inject an exception into the other thread? It would immediately break execution and return to the thread main. This would be ideal, I think.
Depending on which thread implementation you use, there may be different ways to manipulate an abort flag. I would suggest looking at boost.threads and its interruption points.
UPDATE: that mechanism injects an exception into the thread when it reaches an interruption point, which is what you wanted.
But if you have a big, unsplittable block of heavy calculation, then I believe it has to run to completion. Put another way: if you can see any moments inside the block where it could safely stop, then you can split the block into parts and check an abort flag at those moments.
So if it really is a monolithic block, there are no such moments, you can't interrupt the calculation cleanly, and you have to wait for it to finish.
You can avoid the waiting problem, though, by running the heavy block not in a separate thread but in a separate process. Then you can kill it without worrying about corrupting your main process's memory; if need be, you can even leave it calculating for hours after your main process has long since exited, and let it die silently when done. No problems.
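To make the "split the block into parts" idea concrete, here is a minimal sketch of the cooperative abort flag; the Job class, processChunk, and kChunks are illustrative names, not the asker's actual code:

    #include <atomic>

    class Job {
    public:
        void run() {
            for (int chunk = 0; chunk < kChunks; ++chunk) {
                if (aborted_.load(std::memory_order_relaxed))
                    return;               // bail out between chunks
                processChunk(chunk);      // one bounded piece of the work
            }
        }
        void abort() { aborted_.store(true, std::memory_order_relaxed); }

    private:
        static const int kChunks = 1000;
        std::atomic<bool> aborted_{false};
        void processChunk(int) { /* ... one slice of the heavy calculation ... */ }
    };

The latency of the abort is then bounded by the longest single chunk, which is exactly why a truly monolithic block can't be interrupted this way.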

sub-threads of a thread?

Assume you have a multi-threaded program in which each thread may call a function from a DLL, and the function in that DLL processes data in a multi-threaded manner itself. In general, is there any performance benefit or hit from doing this, rather than making the functions in the DLL single-threaded?
Maybe, maybe not. It depends on many things.
Firstly, spawning new threads is pretty expensive, so the amount of work that will be done in parallel will need to offset this cost.
Secondly, there need to be spare CPU cycles for those threads to actually run in parallel, and not be time-sliced onto the same core.
Lastly, the threads will need to be able to use those spare CPU cycles and not, for example, spend most of their time waiting for each other.
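As a rough illustration of the first two points, the DLL function could fan out only when spare cores are likely and the input is big enough to amortize the spawn cost. This sketch uses std::thread, and both the names and the size threshold are assumptions, not a prescription:

    #include <algorithm>
    #include <thread>
    #include <vector>

    void processData(std::vector<double>& data, unsigned threadsAlreadyRunning) {
        const unsigned cores = std::max(1u, std::thread::hardware_concurrency());
        // Stay single-threaded if the caller already saturates the CPU,
        // or if the work is too small to pay for spawning threads.
        if (threadsAlreadyRunning >= cores || data.size() < 100000) {
            for (double& x : data) x *= 2.0;   // stand-in for the real work
            return;
        }
        const unsigned n = cores - threadsAlreadyRunning;  // use only spare cores
        std::vector<std::thread> pool;
        const size_t chunk = data.size() / n;
        for (unsigned i = 0; i < n; ++i) {
            const size_t lo = i * chunk;
            const size_t hi = (i + 1 == n) ? data.size() : lo + chunk;
            pool.emplace_back([&data, lo, hi] {
                for (size_t j = lo; j < hi; ++j) data[j] *= 2.0;
            });
        }
        for (std::thread& t : pool) t.join();
    }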

Fastest method to wait under thread contention

I'm using pthreads on Linux. I have a circular buffer to pass data from one thread to another. Maybe a circular buffer is not the best structure to use here, but changing that would not make my problem go away, so we'll just refer to it as a queue.
Whenever my queue is either full or empty, pop/push operations return NULL. This is problematic because my threads fire periodically, and waiting for the other thread's next loop iteration would take too long.
I've tried using semaphores (sem_post, sem_wait), but unlocking under contention takes up to 25 ms, which is about the period of my loop. I've tried waiting on a pthread_cond_t, but the unlocking takes between 10 and 15 ms.
Is there a faster mechanism I could use to wait for data?
EDIT: OK, I went with condition variables. I'm on an embedded device, so adding "more cores or CPU power" is not an option. This also made me realise I had all sorts of thread priorities set all over the place, so I'll sort those out before going further.
You should use condition variables. The only faster ways are platform-specific, and they're only negligibly faster.
You're seeing what you think is poor performance simply because your threads are being de-scheduled. You're seeing long "delays" when your thread is near the end of its timeslice and the scheduler allows the unblocked thread to pre-empt the running thread. If you have more cores than threads or set your thread to a higher priority, you won't see these delays.
But these delays are actually a good thing, and you shouldn't be concerned about them. Other threads just get a chance to run too.
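For reference, a blocking push/pop with a condition variable follows the classic predicate-loop pattern. This minimal fixed-size queue is a sketch, not the asker's actual ring buffer:

    #include <pthread.h>

    enum { CAP = 64 };
    static int buf[CAP];
    static int head = 0, tail = 0, count = 0;
    static pthread_mutex_t mtx       = PTHREAD_MUTEX_INITIALIZER;
    static pthread_cond_t  not_empty = PTHREAD_COND_INITIALIZER;
    static pthread_cond_t  not_full  = PTHREAD_COND_INITIALIZER;

    void push(int v) {
        pthread_mutex_lock(&mtx);
        while (count == CAP)                 // full: block instead of failing
            pthread_cond_wait(&not_full, &mtx);
        buf[tail] = v;
        tail = (tail + 1) % CAP;
        ++count;
        pthread_cond_signal(&not_empty);
        pthread_mutex_unlock(&mtx);
    }

    int pop(void) {
        pthread_mutex_lock(&mtx);
        while (count == 0)                   // loop guards against spurious wakeups
            pthread_cond_wait(&not_empty, &mtx);
        int v = buf[head];
        head = (head + 1) % CAP;
        --count;
        pthread_cond_signal(&not_full);
        pthread_mutex_unlock(&mtx);
        return v;
    }

The while loop around pthread_cond_wait is essential: the wait can return spuriously, so the predicate must be re-checked before proceeding.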

fastest way to wake up a thread without using condition variable

I am trying to speed up a piece of code by having background threads already set up to solve one specific task. When it is time to solve my task, I would like to wake these threads up, do the job, and block them again while waiting for the next task. The task is always the same.
I tried using condition variables (and the mutex that has to go with them), but I ended up slowing my code down instead of speeding it up, mostly because all of the calls involved (pthread_cond_wait/pthread_cond_signal/pthread_mutex_lock/pthread_mutex_unlock) are very expensive.
There is no point in using a thread pool (which I don't have anyway), because it is too generic a construct; here I want to address only my specific task. Depending on the implementation, I would also pay a performance penalty for the queue.
Do you have any suggestion for a quick wake-up without using mutex or con_var?
I was thinking of setting the threads up like timers that read an atomic variable: if the variable is set to 1, they do the job; if it is set to 0, they sleep for a few microseconds (I would start with microsecond sleeps, since I want to avoid spinlocks, which might be too expensive for the CPU). What do you think about that? Any suggestion is very much appreciated.
I am using Linux, gcc, C and C++.
These functions should be fast. If they are taking a large fraction of your time, it is quite possible that you are trying to switch threads too often.
Try buffering up a work queue, and send the signal once a significant amount of work has accumulated.
If this is impossible due to dependencies between the tasks, then your application is not amenable to multithreading at all.
In order to gain performance in a multithreaded application, spawn as many threads as there are CPUs, not a separate thread for each task. Otherwise you end up with a lot of overhead from context switching.
You may also consider making your algorithm more linear (e.g. by using non-blocking calls).
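A minimal sketch of the batching suggestion, assuming a simple vector of work items and a made-up batch size; real code would also flush partial batches on a timeout:

    #include <pthread.h>
    #include <vector>

    static pthread_mutex_t  mtx = PTHREAD_MUTEX_INITIALIZER;
    static pthread_cond_t   cv  = PTHREAD_COND_INITIALIZER;
    static std::vector<int> items;     // stand-in for real work items
    static const size_t kBatch = 64;   // tune: bigger batch, fewer wakeups

    void produce(int item) {
        pthread_mutex_lock(&mtx);
        items.push_back(item);
        bool ready = items.size() >= kBatch;
        pthread_mutex_unlock(&mtx);
        if (ready)
            pthread_cond_signal(&cv);  // one signal amortized over kBatch items
    }

    void consumeBatch(std::vector<int>& out) {
        pthread_mutex_lock(&mtx);
        while (items.size() < kBatch)  // real code: add a timed wait for stragglers
            pthread_cond_wait(&cv, &mtx);
        out.swap(items);               // take the whole batch in one go
        pthread_mutex_unlock(&mtx);
    }

The point is that the expensive signal/wake round trip happens once per batch instead of once per item.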

do sem_waiting threads cause more switching

I have several threads that act as backups for the main one, spending most of their life blocked in sem_wait(). Is it OK to keep them around, or is it better to spawn new threads only when they need to do actual work? Does the kernel switch to threads blocked in sem_wait() and "waste" CPU cycles?
Thanks.
No, blocked threads are never switched in by any common thread library or operating system (an implementation that did so would be extremely badly designed). They will still use memory, of course.
Choose option A: keep them.
The wasted cycles are minor; the threads will simply sit in the wait state.
On the other hand, the complexity of starting and stopping threads, instead of keeping them all alive, may seriously harm your program logic.
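For what it's worth, a sketch of option A: long-lived backup workers parked in sem_wait() cost stack memory but no CPU until sem_post() wakes one. The names and the worker count here are illustrative:

    #include <pthread.h>
    #include <semaphore.h>

    static sem_t work_ready;

    static void* backup_worker(void*) {
        for (;;) {
            sem_wait(&work_ready);  // blocked here: no CPU used, only stack memory
            // ... handle one unit of backup work ...
        }
        return NULL;                // not reached in this sketch
    }

    int main() {
        sem_init(&work_ready, 0, 0);             // no pending work initially
        pthread_t tids[4];
        for (int i = 0; i < 4; ++i)
            pthread_create(&tids[i], NULL, backup_worker, NULL);

        sem_post(&work_ready);                   // wakes exactly one parked worker
        pthread_exit(NULL);                      // sketch only: no orderly shutdown
    }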