I've created a multi-threaded application using C++ and POSIX threads. In it, I now need to block a thread (the main thread) until a boolean flag is set (becomes true).
I've found two ways to get this done.
Spinning through a loop without sleep.
while(!flag);
Spinning through a loop with sleep.
while(!flag){
sleep(some_int);
}
If the first way is correct, why do some people write code the second way? If the second way should be used, why should we make the current thread sleep? And what are the disadvantages of this approach?
The first option (a "busy wait") wastes an entire core for the duration of the wait, preventing other useful work from being done and/or wasting energy.
The second option is less wasteful - your waiting thread uses very little CPU and allows other threads to run. But it is still wasteful to keep switching back to the thread to check the flag.
Far better than either would be to use a condition variable, which allows the waiting thread to block without consuming any resources until it is able to proceed.
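A minimal sketch of the condition-variable approach with POSIX threads, assuming the flag only ever goes from false to true. The names flag_mutex, flag_cond, set_flag, wait_for_flag, worker and run_example are illustrative, not from the question; the key points are that the flag is only read and written under the mutex, and that the wait sits in a loop because pthread_cond_wait may wake up spuriously.

```cpp
#include <cassert>
#include <pthread.h>

// Hypothetical shared state: the flag plus the mutex/condvar that protect it.
static bool flag = false;
static pthread_mutex_t flag_mutex = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t flag_cond = PTHREAD_COND_INITIALIZER;

// Called by whichever thread makes the condition true.
void set_flag() {
    pthread_mutex_lock(&flag_mutex);
    flag = true;
    pthread_cond_broadcast(&flag_cond);  // wake every thread waiting on the flag
    pthread_mutex_unlock(&flag_mutex);
}

// Blocks the caller, without using CPU, until set_flag() has run.
void wait_for_flag() {
    pthread_mutex_lock(&flag_mutex);
    while (!flag) {  // re-check after each wakeup: spurious wakeups are allowed
        pthread_cond_wait(&flag_cond, &flag_mutex);  // atomically unlocks and sleeps
    }
    pthread_mutex_unlock(&flag_mutex);
}

// Example worker: does its job, then releases the waiting main thread.
void* worker(void*) {
    // ... real work would go here ...
    set_flag();
    return nullptr;
}

// Usage from the main thread: start the worker, then block on the flag.
bool run_example() {
    pthread_t tid;
    if (pthread_create(&tid, nullptr, worker, nullptr) != 0) return false;
    wait_for_flag();  // sleeps here instead of spinning on while(!flag);
    pthread_join(tid, nullptr);
    return flag;
}
```

The main thread consumes no CPU while blocked in pthread_cond_wait, and it is woken as soon as set_flag() broadcasts, so there is no sleep interval to tune.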
while(!flag); will cause your thread to use all of its allocated time checking the condition. This wastes a lot of CPU cycles checking something that has most likely not changed.
Sleeping for a bit causes the thread to pause and give up the CPU to programs that actually need it.
You shouldn't do either though; you should use a threading library to create a flag object and call its wait function, so that the kernel will pause the thread until the flag is set.
The first way (just the plain while) is wasting resources, specifically the processor time of your process.
When a thread is put to sleep on a system with preemptive multitasking, the OS may decide to use the processor for other tasks. In theory, if you had as many processors/cores as threads, there would be no difference.
Whether a solution is good or not depends on the operating system used, and sometimes on the architecture the program runs on. You should consult your OS's system call reference to find out more about this.
Related
I'm working on an embedded Linux system (3.12.something), and our application, after some random amount of time, starts hogging the CPU. I've run strace on our application, and right when the problem happens, I see a lot of lines similar to this in the strace output:
[48530666] futex(0x485f78b8, FUTEX_WAIT_PRIVATE, 2, NULL) = -1 EAGAIN (Resource temporarily unavailable) <0.009002>
I'm pretty sure this is the smoking gun I'm looking for and there is a race of some sort. However, I now need to figure out how to identify the place in the code that's trying to get this mutex. How can I do that? Our code is compiled with GCC and has debugging symbols in it.
My current thinking (that I haven't tried yet) is to print out a string to stdout and flush before trying to grab any mutex in our system, with the expectation that the string will print right before strace complains about getting the lock ... but there are a LOT of places in the code that would have to be instrumented like this.
EDIT: Another strange thing that I just realized is that our program doesn't start hogging the CPU until some random time has passed since it was run (5 minutes to 5 hours and anywhere in between). During that time, there are zero futex syscalls happening. Why do they suddenly start? From what I've read, I think maybe they are being used properly in userspace until something fails and falls back to making a futex() syscall...
Any suggestions?
If you frequently lock a mutex for short periods from different threads, e.g. one protecting a global logger, you might cause a so-called lock convoy (also called a thread convoy). The problem doesn't occur until two threads compete for the lock. The first gets the lock and holds it for a short time; then, when it needs the lock a second time, it gets preempted because the second one is already waiting. The second one does the same. The timeslice available to each thread is suddenly reduced to the time between two lock attempts, causing many context switches and a corresponding slowdown. Furthermore, all but one thread are always blocked on the mutex, effectively disabling any parallel execution.
In order to fix this, make your threads cooperate instead of competing for resources. For the logger example above, consider e.g. a lock-free queue for the entries, or separate queues for each thread using thread-local storage.
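A rough sketch of the thread-local variant, assuming a logger where the relative order of entries from different threads does not matter: each thread collects entries in its own thread_local buffer and takes the shared lock once per batch rather than once per line. ThreadLogBuffer, kBatchSize, g_sink and run_logging_example are hypothetical names, and the vector sink stands in for a real log file.

```cpp
#include <cassert>
#include <cstddef>
#include <mutex>
#include <string>
#include <thread>
#include <utility>
#include <vector>

// Shared sink, protected by a mutex (stands in for a real log file).
std::mutex g_sink_mutex;
std::vector<std::string> g_sink;

// Per-thread buffer: adding a line never touches the shared lock.
class ThreadLogBuffer {
 public:
  static constexpr std::size_t kBatchSize = 64;  // arbitrary batch size

  void log(std::string line) {
    pending_.push_back(std::move(line));
    if (pending_.size() >= kBatchSize) flush();
  }

  // One lock acquisition per batch instead of one per line.
  void flush() {
    if (pending_.empty()) return;
    std::lock_guard<std::mutex> guard(g_sink_mutex);
    for (auto& line : pending_) g_sink.push_back(std::move(line));
    pending_.clear();
  }

  ~ThreadLogBuffer() { flush(); }  // don't lose entries when the thread exits

 private:
  std::vector<std::string> pending_;
};

thread_local ThreadLogBuffer t_log;  // one buffer per thread

// Example: several threads log concurrently; returns the number of lines written.
std::size_t run_logging_example(int threads, int lines_per_thread) {
  std::vector<std::thread> workers;
  for (int t = 0; t < threads; ++t) {
    workers.emplace_back([lines_per_thread, t] {
      for (int i = 0; i < lines_per_thread; ++i)
        t_log.log("thread " + std::to_string(t) + " line " + std::to_string(i));
    });  // t_log's destructor flushes the remainder as the thread exits
  }
  for (auto& w : workers) w.join();
  std::lock_guard<std::mutex> guard(g_sink_mutex);
  return g_sink.size();
}
```

The trade-off is latency: entries only reach the shared sink once a batch fills up or the thread exits, which may or may not be acceptable for a given logger.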
Concerning the futex() calls, the idea is to spin on an atomic flag and, after a number of unsuccessful attempts, fall back to the actual OS mutex. The atomic flag can be checked without the expensive switch between user space and kernel space. For longer waits, blocking in the kernel (via futex()) avoids tying up the CPU with polling. This explains why the program doesn't need any futex() calls in normal operation.
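A simplified illustration of this spin-then-block idea, built from std::atomic, std::mutex and std::condition_variable rather than raw futex() calls, so it is not how glibc actually implements pthread_mutex_t. The uncontended path is a single compare-and-swap in user space; only a thread that keeps losing the race registers as a waiter and blocks. SpinThenBlockLock, kSpinLimit and run_lock_example are hypothetical names, and the spin limit is arbitrary.

```cpp
#include <atomic>
#include <cassert>
#include <condition_variable>
#include <mutex>
#include <thread>
#include <vector>

class SpinThenBlockLock {
 public:
  void lock() {
    // Fast path: a bounded number of attempts purely in user space.
    for (int i = 0; i < kSpinLimit; ++i) {
      if (!locked_.load(std::memory_order_relaxed) && try_lock()) return;
    }
    // Slow path: register as a waiter and block until an unlock wakes us.
    std::unique_lock<std::mutex> guard(m_);
    waiters_.fetch_add(1);
    cv_.wait(guard, [this] { return try_lock(); });
    waiters_.fetch_sub(1);
  }

  bool try_lock() {
    bool expected = false;
    return locked_.compare_exchange_strong(expected, true);  // seq_cst
  }

  void unlock() {
    locked_.store(false);  // seq_cst, pairs with the waiter's fetch_add + CAS
    // Only touch the mutex/condvar when some thread is actually blocked.
    if (waiters_.load() > 0) {
      std::lock_guard<std::mutex> guard(m_);
      cv_.notify_one();
    }
  }

 private:
  static constexpr int kSpinLimit = 100;  // arbitrary spin budget
  std::atomic<bool> locked_{false};
  std::atomic<int> waiters_{0};
  std::mutex m_;
  std::condition_variable cv_;
};

// Example: several threads increment a shared counter under the lock.
long run_lock_example(int threads, int iterations) {
  SpinThenBlockLock lock;
  long counter = 0;
  std::vector<std::thread> workers;
  for (int t = 0; t < threads; ++t) {
    workers.emplace_back([&] {
      for (int i = 0; i < iterations; ++i) {
        std::lock_guard<SpinThenBlockLock> guard(lock);  // lock()/unlock() suffice
        ++counter;
      }
    });
  }
  for (auto& w : workers) w.join();
  return counter;
}
```

As long as the lock is never contended, no thread ever reaches cv_.wait(), which mirrors the observation that no futex() calls show up in normal operation.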
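Basically, you need to generate a core file at the moment the problem occurs.
Then you can load the program and the core file into GDB and look at it, e.g. with thread apply all bt to see where each thread is blocked.
To get the core file, either run gcore <pid> from a shell (see man gcore),
or attach GDB to the running process and use its
generate-core-file command.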
During that time, there are zero futex syscalls happening. Why do they suddenly start?
This is because an uncontested mutex, implemented via futex, doesn't make a system call; it only performs an atomic increment, purely in user space. Only a CONTESTED lock is visible as a system call.
In a multi threaded app, is
while (result->Status == Result::InProgress) Sleep(50);
//process results
better than
while (result->Status == Result::InProgress);
//process results
?
By that, I'm asking will the first method be polite to other threads while waiting for results rather than spinning constantly? The operation I'm waiting for usually takes about 1-2 seconds and is on a different thread.
I would suggest using semaphores for such case instead of polling. If you prefer active waiting, the sleep is much better solution than evaluating the loop condition constantly.
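A small sketch of the semaphore approach using POSIX unnamed semaphores (sem_init, sem_wait, sem_post); this assumes a POSIX system rather than the Windows Sleep() in the question, where the analogous calls would be CreateSemaphore and WaitForSingleObject. The Result struct, produce, wait_for_result and run_semaphore_example are hypothetical stand-ins for the question's code.

```cpp
#include <cassert>
#include <cerrno>
#include <semaphore.h>
#include <thread>

// Hypothetical stand-in for the question's result object.
struct Result {
  int value = 0;
  sem_t done;  // posted exactly once, when value is ready
  Result() { sem_init(&done, /*pshared=*/0, /*initial value=*/0); }
  ~Result() { sem_destroy(&done); }
  Result(const Result&) = delete;
  Result& operator=(const Result&) = delete;
};

// Producer: computes the result, then wakes the waiting thread.
void produce(Result* result) {
  result->value = 42;       // ... the 1-2 second operation goes here ...
  sem_post(&result->done);  // releases the waiter
}

// Consumer: blocks without spinning or sleeping in a loop.
int wait_for_result(Result* result) {
  // Retry only if a signal interrupted the wait.
  while (sem_wait(&result->done) != 0 && errno == EINTR) {
  }
  return result->value;
}

int run_semaphore_example() {
  Result result;
  std::thread worker(produce, &result);
  int value = wait_for_result(&result);
  worker.join();
  return value;
}
```

Compared with Sleep(50), the waiting thread wakes up as soon as the result is posted instead of up to 50 ms later.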
It's better, but not by much.
As long as result->Status is not volatile, the compiler is allowed to reduce
while(result->Status == Result::InProgress);
to
if(result->Status == Result::InProgress) for(;;) ;
as the condition does not change inside the loop.
Calling the external function Sleep changes this: since the compiler cannot see inside it, it must assume (much as for a volatile access) that the call may modify the result structure, unless the compiler knows that Sleep never modifies data. Thus, depending on the compiler, the version that calls Sleep is a lot less likely to turn into an endless loop.
There is also no guarantee that accesses to result->Status will be atomic. For specific memory layouts and processor architectures, reading and writing this variable may consist of multiple steps, which means that the scheduler may decide to step in in the middle.
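If polling really cannot be avoided, a C++11 std::atomic addresses both points above: every load is a real load that the compiler cannot hoist out of the loop, and each read or write is indivisible. This sketch assumes Status is an enum like the question's Result::InProgress; the names are illustrative, and std::this_thread::sleep_for plays the role of Sleep(50).

```cpp
#include <atomic>
#include <cassert>
#include <chrono>
#include <thread>

// Hypothetical shape of the question's result object.
struct Result {
  enum Status { InProgress, Done };
  std::atomic<Status> status{InProgress};
  int value = 0;
};

// Worker thread: fill in the payload first, then publish the status.
void finish(Result* result) {
  result->value = 7;
  result->status.store(Result::Done, std::memory_order_release);
}

// Waiting thread: each iteration performs a fresh atomic load, and the
// acquire ordering guarantees that value is visible once Done is seen.
int poll_until_done(Result* result) {
  while (result->status.load(std::memory_order_acquire) == Result::InProgress) {
    std::this_thread::sleep_for(std::chrono::milliseconds(50));  // like Sleep(50)
  }
  return result->value;
}

int run_polling_example() {
  Result result;
  std::thread worker(finish, &result);
  int value = poll_until_done(&result);
  worker.join();
  return value;
}
```

This fixes correctness, not efficiency: the thread still wakes every 50 ms, which is why a proper synchronisation primitive remains the better choice.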
As all you are communicating at this point is a simple yes/no, and the receiving thread should also wait on a negative reply, the best way is to use the appropriate thread synchronisation primitive provided by your OS that achieves this effect. This has the advantage that your thread is woken up immediately when the condition changes, and that it uses no CPU in the meantime as the OS is aware what your thread is waiting for.
On Windows, use CreateEvent and co. to communicate using an event object; on Unix, use a pthread_cond_t object.
Yes, sleep and variants give up the processor. Other threads can take over. But there are better ways to wait on other threads.
Don't use the empty loop.
That depends on your OS scheduling policy too. For example, Linux uses the CFS scheduler by default, which distributes the processor fairly among all tasks. But if you make this thread a real-time thread with the SCHED_FIFO policy, the code without sleep will never relinquish the processor unless a higher-priority thread becomes runnable; threads of the same or lower priority will never get scheduled on that CPU until you break out of the loop. If you apply SCHED_RR, threads of the same and higher priority will get scheduled, but lower-priority ones will not.
I have a program with a main thread and a diagnostics thread. The main thread is basically a while(1) loop that performs various tasks. One of these tasks is to provide a diagnostics engine with information about the system and then check back later (i.e. in the next loop) to see if there are any problems that should be dealt with. An iteration of the main loop should take no longer than 0.1 seconds. If all is well, then the diagnostic engine takes almost no time to come back with an answer. However, if there is a problem, the diagnostic engine can take seconds to isolate the problem. For this reason each time the diagnostic engine receives new information it spins up a new diagnostics thread.
The problem we're having is that the diagnostics thread is stealing time away from the main thread. Effectively, even though we have two threads, the main thread is not able to run as often as I would like because the diagnostic thread is still spinning.
Using Boost threads, is it possible to limit the amount of time that a thread can run before moving on to another thread? Also of importance here is that the diagnostic algorithm we are using is blackbox, so we can't put any threading code inside of it. Thanks!
If you run multiple threads they will indeed consume CPU time. If you only have a single processor, and one thread is doing processor intensive work then that thread will slow down the work done on other threads. If you use OS-specific facilities to change the thread priority then you can make the diagnostic thread have a lower priority than the main thread. Also, you mention that the diagnostic thread is "spinning". Do you mean it literally has the equivalent of a spin-wait like this:
while(!check_done()) ; // loop until done
If so, I would strongly suggest that you try and avoid such a busy-wait, as it will consume CPU time without achieving anything.
However, though multiple threads can cause each other to slow-down, if you are seeing an actual delay of several seconds this would suggest there is another problem, and that the main thread is actually waiting for the diagnostic thread to complete. Check that the call to join() for the diagnostic thread is outside the main loop.
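One way to keep the wait out of the main loop, sketched with std::async and std::future instead of Boost (an assumption on my part; the idea is the same with Boost's futures): the loop checks whether the diagnostic result is ready with a zero timeout and only collects it once it is. run_diagnostics, main_loop_example and the loop body are hypothetical placeholders, and the loop exits once the answer arrives only so the example terminates.

```cpp
#include <cassert>
#include <chrono>
#include <future>
#include <thread>

// Hypothetical stand-in for the black-box diagnostic engine.
int run_diagnostics(int input) {
  std::this_thread::sleep_for(std::chrono::milliseconds(200));  // the slow case
  return input * 2;
}

// Main loop that never blocks on the diagnostic thread. Returns the
// diagnostic result and reports how many iterations ran in the meantime.
int main_loop_example(int* iterations_out) {
  std::future<int> diagnosis = std::async(std::launch::async, run_diagnostics, 21);
  int iterations = 0;
  int answer = -1;
  while (answer < 0) {
    ++iterations;
    // wait_for with a zero timeout only checks readiness; it does not wait.
    if (diagnosis.wait_for(std::chrono::seconds(0)) == std::future_status::ready) {
      answer = diagnosis.get();
    } else {
      // Stand-in for one iteration of the main loop's regular work.
      std::this_thread::sleep_for(std::chrono::milliseconds(10));
    }
  }
  *iterations_out = iterations;
  return answer;
}
```

If iterations stays at 1 while the diagnostics run, something in the loop is still waiting on the other thread.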
Another possibility is that the diagnostic thread is locking a mutex needed by the main thread loop. Check which mutexes are locked and where.
To really help, I'd need to see some code.
Looks like your threads are interlocked, so your main thread waits until the background thread has finished its work. Check for any multithreading synchronization that could cause this.
To check that it's nothing related to OS scheduling, run your program on a dual-core system, so both threads can really execute in parallel.
From the way you've worded your question, it appears that you're not quite sure how threads work. I assume that by "the amount of time that a thread can run before moving on to another thread" you mean the CPU time each thread gets before the scheduler switches to another one. That switching already happens many times per second.
Boost.Thread does not have support for thread priorities, although your OS-specific thread API will. However, your problem seems to indicate the necessity for a fundamental redesign -- or at least heavy profiling to find bottlenecks.
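As a Linux-specific sketch (assumption: glibc on Linux), a thread can lower its own scheduling class with pthread_setschedparam; from outside, the pthread_t of a std::thread or boost::thread is available via native_handle(). Under the default SCHED_OTHER policy the static priority must be 0, so this sketch moves the diagnostic thread to SCHED_IDLE, a Linux extension under which it only gets CPU time that no normal thread wants. make_background and run_priority_example are hypothetical names, and whether SCHED_IDLE (rather than, say, a higher nice value) suits this workload is a judgment call.

```cpp
#include <cassert>
#include <cerrno>
#include <pthread.h>
#include <sched.h>
#include <thread>

// Linux-only: move a thread to SCHED_IDLE. Returns 0 on success or an
// errno value such as EPERM on failure.
int make_background(pthread_t thread) {
  sched_param param{};
  param.sched_priority = 0;  // SCHED_IDLE requires a static priority of 0
  return pthread_setschedparam(thread, SCHED_IDLE, &param);
}

// Example: the stand-in diagnostic thread demotes itself before its heavy
// work. Returns make_background()'s result; *finished reports the work ran.
int run_priority_example(bool* finished) {
  int rc = -1;
  *finished = false;
  std::thread diagnostics([&rc, finished] {
    rc = make_background(pthread_self());  // or native_handle() from outside
    long sum = 0;
    for (long i = 0; i < 1000000; ++i) sum += i % 7;  // CPU-bound placeholder
    *finished = (sum > 0);
  });
  diagnostics.join();
  return rc;
}
```

Because the black-box algorithm can't be modified, the demotion has to happen in the wrapper thread before the algorithm is called, as above.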
You can't do this generally at the OS level, so I doubt boost has anything specific for limiting execution time. You can kinda fake it with small-block operations and waits, but it's not clean.
I would suggest looking into processor affinity, either at the thread or the process level (this will be OS-specific). If you can isolate your diagnostic processing to a limited subset of [logical] processors on a multi-core machine, it will give you a very coarse mechanism for limiting how much CPU it can use relative to the main process. That's the best solution I have found when trying to do a similar type of thing.
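A Linux/glibc-specific sketch of thread-level affinity (pthread_getaffinity_np and pthread_setaffinity_np are GNU extensions; Windows has SetThreadAffinityMask instead). Called at the start of the diagnostic thread, it restricts that thread to the highest-numbered CPU it is currently allowed to use, leaving the others to the main thread. Which CPU to reserve is an assumption you would tune per machine, and pin_to_last_allowed_cpu is a hypothetical name.

```cpp
#include <cassert>
#include <pthread.h>
#include <sched.h>

// Restrict the calling thread to one CPU taken from its current allowed set.
// Returns the chosen CPU index, or -1 on failure.
int pin_to_last_allowed_cpu() {
  cpu_set_t allowed;
  CPU_ZERO(&allowed);
  if (pthread_getaffinity_np(pthread_self(), sizeof(allowed), &allowed) != 0) return -1;

  for (int cpu = CPU_SETSIZE - 1; cpu >= 0; --cpu) {
    if (!CPU_ISSET(cpu, &allowed)) continue;
    cpu_set_t only_this;
    CPU_ZERO(&only_this);
    CPU_SET(cpu, &only_this);
    return pthread_setaffinity_np(pthread_self(), sizeof(only_this), &only_this) == 0
               ? cpu
               : -1;
  }
  return -1;
}
```

On a single-core machine this changes nothing, which matches the point that affinity only helps when there are spare cores to isolate the diagnostics on.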
Hope that helps.
I have a program with several threads: one thread changes a global flag just before it suspends itself, and another thread repeatedly polls that global. There is no protection on the globals.
The program works fine on a uniprocessor. On a dual-core machine, it works for a while and then hangs, either in Sleep(0) or in SuspendThread(). Would anyone be able to help me out with this?
The code would be like this:
Thread 1:
do something...
while(1)
{
.....
flag_thread1_running=false;
SuspendThread(GetCurrentThread());
continue;
}
Thread 2
flag_thread1_running=true;
ResumeThread(thread1);
.....do some other work here....
while(flag_thread1_running) Sleep(0);
....
The fact that you don't see any problem on a uniprocessor machine, but see problems on a multiproc machine is an artifact of the relatively large granularity of thread context switching on a uniprocessor machine. A thread will execute for N amount of time (milliseconds, nanoseconds, whatever) before the thread scheduler switches execution to a different thread. A lot of CPU instructions can execute in the typical thread timeslice. You can think of it as having a fairly large chunk of "free play" exclusive processor time during which you probably won't run into resource collisions because nothing else is executing on the processor.
When running on a multiproc machine, though, CPU instructions in two threads execute exactly at the same time. The size of the "free play" chunk of time is near zero.
To reproduce a resource contention issue between two threads, you need to get thread 1 to be accessing the resource and thread 2 to be accessing the resource at the same time, or very nearly the same time.
In the large-granularity thread switching that takes place on a uniprocessor machine, the chances that a thread switch will happen exactly in the right spot are slim, so the program may never exhibit a failure under normal use on a uniproc machine.
In a multiproc machine, the instructions are executing at the same time in the two threads, so the chances of thread 1 and thread 2 accessing the same resource at the same time are much, much greater - thousands of times more likely than the uniprocessor scenario.
I've seen it happen many times: an app that has been running fine for years on uniproc machines suddenly starts failing all over the place when executed on a new multiproc machine. The cause is a latent threading bug in the original code that simply never hit the right coincidence of timeslicing to repro on the uniproc machines.
When working with multithreaded code, it is absolutely imperative to test the code on multiproc hardware. If you have thread collision issues in your code, they will quickly present themselves on a multiproc machine.
As others have noted, don't use SuspendThread() unless you are a debugger. Use mutexes or other synchronization objects to coordinate between threads.
Try using something more like WaitForSingleObjectEx instead of SuspendThread.
You are hitting a race condition. Thread 2 may execute flag_thread1_running=true;
before thread 1 executes flag_thread1_running=false.
This is not likely to happen on a single CPU, because with the usual scheduling quantum of 10-20 ms you are unlikely to hit the problem. It can happen there as well, but very rarely.
Using proper synchronization primitives is a must here. Instead of bool, use event. Instead of checking the bool in a loop, use WaitForSingleObject (or WaitForMultipleObjects for more elaborate stuff later).
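The same handshake can be sketched portably with std::mutex and std::condition_variable in place of Windows event objects (an assumption: the real code could equally use CreateEvent, SetEvent and WaitForSingleObject). Thread 2 posts a work request, thread 1 blocks between requests instead of suspending itself, and thread 2 blocks until the request is finished instead of polling with Sleep(0). Handshake and run_handshake_example are hypothetical names.

```cpp
#include <cassert>
#include <condition_variable>
#include <mutex>
#include <thread>

// Replaces flag_thread1_running + SuspendThread/ResumeThread/Sleep(0).
class Handshake {
 public:
  // Thread 2: ask thread 1 to run one round (replaces ResumeThread).
  void request() {
    std::lock_guard<std::mutex> lock(m_);
    ++requested_;
    cv_.notify_all();
  }
  // Thread 2: block until every requested round is done (replaces the Sleep(0) loop).
  void wait_done() {
    std::unique_lock<std::mutex> lock(m_);
    cv_.wait(lock, [this] { return completed_ == requested_; });
  }
  // Thread 1: block until there is work or shutdown; false means "exit".
  bool wait_for_work() {
    std::unique_lock<std::mutex> lock(m_);
    cv_.wait(lock, [this] { return stop_ || completed_ < requested_; });
    return completed_ < requested_;
  }
  // Thread 1: mark the round finished (replaces flag = false + SuspendThread).
  void mark_done() {
    std::lock_guard<std::mutex> lock(m_);
    ++completed_;
    cv_.notify_all();
  }
  void stop() {
    std::lock_guard<std::mutex> lock(m_);
    stop_ = true;
    cv_.notify_all();
  }

 private:
  std::mutex m_;
  std::condition_variable cv_;
  int requested_ = 0;
  int completed_ = 0;
  bool stop_ = false;
};

// Example: thread 2 (the caller) drives thread 1 through several rounds.
int run_handshake_example(int rounds) {
  Handshake hs;
  int work_done = 0;  // written only by thread 1, read after join()
  std::thread thread1([&] {
    while (hs.wait_for_work()) {
      ++work_done;  // "do something..."
      hs.mark_done();
    }
  });
  for (int i = 0; i < rounds; ++i) {
    hs.request();
    // ... thread 2 does some other work here ...
    hs.wait_done();
  }
  hs.stop();
  thread1.join();
  return work_done;
}
```

Because the request and completion counts live under one mutex, there is no window in which a "resume" can arrive before the matching "suspend", which is exactly the race the flag-based version is exposed to.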
It is possible to perform synchronization between threads using plain variables, but it is rarely a good idea and it is quite hard to do it right - cf. How can I write a lock free structure?. It is definitely not a good idea to perform scheduling using Sleep, Suspend or Resume.
I guess that you already know that polling a global flag is a "Bad Idea™" so I'll skip that little speech. Try adding volatile to the flag declaration. That should force each read of it to read from memory. Without volatile, the implementation could be reading the flag into a register and not fetching it from memory.
I have several threads which act as backup for the main one spending most of their life blocked by sem_wait(). Is it OK to keep them or is it better to spawn new threads only when they need to do actual work? Does kernel switch to threads waiting on sem_wait() and "waste" CPU cycles?
Thanks.
No, blocked threads are never switched in for any common thread library and operating system (it would be an extremely badly designed one where they were). But they will still use memory, of course.
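A small sketch of the "keep them parked" option, assuming POSIX unnamed semaphores as in the question: a few backup threads sit in sem_wait() and are not scheduled until sem_post() releases one of them. kWorkers, the jobs counter, wait_retrying and run_parked_pool_example are hypothetical; a real pool would take jobs from a protected queue.

```cpp
#include <atomic>
#include <cassert>
#include <cerrno>
#include <semaphore.h>
#include <thread>
#include <vector>

// sem_wait() that simply retries if a signal interrupts it.
void wait_retrying(sem_t* sem) {
  while (sem_wait(sem) != 0 && errno == EINTR) {
  }
}

// Keeps kWorkers backup threads parked in sem_wait(); while parked they are
// not scheduled and use no CPU. Returns how many jobs were processed.
int run_parked_pool_example(int jobs) {
  constexpr int kWorkers = 4;  // hypothetical pool size
  sem_t work_available, job_finished;
  sem_init(&work_available, 0, 0);
  sem_init(&job_finished, 0, 0);
  std::atomic<int> jobs_done{0};
  std::atomic<bool> shutting_down{false};

  std::vector<std::thread> workers;
  for (int i = 0; i < kWorkers; ++i) {
    workers.emplace_back([&] {
      for (;;) {
        wait_retrying(&work_available);    // parked here between jobs
        if (shutting_down.load()) return;  // woken only so it can exit
        jobs_done.fetch_add(1);            // the actual work would go here
        sem_post(&job_finished);
      }
    });
  }

  for (int j = 0; j < jobs; ++j) sem_post(&work_available);  // one wakeup per job
  for (int j = 0; j < jobs; ++j) wait_retrying(&job_finished);

  shutting_down.store(true);
  for (int i = 0; i < kWorkers; ++i) sem_post(&work_available);  // release parked threads
  for (auto& w : workers) w.join();
  sem_destroy(&work_available);
  sem_destroy(&job_finished);
  return jobs_done.load();
}
```

The cost of the parked threads is their stacks and kernel bookkeeping, not CPU time, which is the trade-off described above.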
Choose the first option and keep the threads.
The wasted cycles are minor. Your threads will always be in a wait state.
On the other hand, the complexity of starting and stopping threads, instead of having them all up may seriously harm your program logic.