Switching between pthreads on a timer


I want to make two pthreads in a c++ program using a single processor. Thread1 will be endlessly running but needs to be interrupted every 5 microseconds to allow thread2 to do one iteration of a while loop before switching back to thread1. I know how to make the pthreads and whatnot, but I can't figure out how to make the switching between threads based on a timer occur. Is there a way to do this?

If you don't want the two threads to run simultaneously, you should not use two threads. The work of the second thread can be written as a function that the first thread calls every 5 microseconds.

The problem with what you want is the timescale of the interrupts. A typical OS does not switch threads that often; the normal rate is on the millisecond scale. To achieve 5-microsecond intervals you need one of the following:
1. A real-time OS with low-cost, high-precision interrupts, or your own kernel module written for the same purpose.
2. Proactive (cooperative) scheduling of the threads, more or less along the lines of what Samer suggested.
I'm not familiar with RTOSes or drivers, and I doubt you need such a specialized solution. As for the second approach:
pthreads
You can still do it with pthreads, but make sure thread1 calls into the OS at least every 5 microseconds to give the scheduler a chance to switch to thread2. You would also need to give thread2 a higher priority or select a different scheduling policy, such as round-robin instead of fair scheduling: see here.
However, this threads approach suffers from significant context-switch overhead: assuming roughly 1300 ns per context switch and 2 switches per 5-microsecond period, that is 2600 ns out of every 5000 ns, or about 50% of the time spent on overhead!
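As a rough illustration of the scheduling-policy part, the sketch below prepares pthread attributes that request the round-robin policy. The helper name make_round_robin_attr is made up for this example; the pthread and sched calls are the standard POSIX ones. It only builds the attribute object, since actually creating a real-time thread usually needs extra privileges.

```cpp
#include <cassert>
#include <pthread.h>
#include <sched.h>

// Fills `attr` so that a thread created with it would use the round-robin
// real-time policy at the given priority. Returns 0 on success, otherwise the
// error code from the failing pthread call.
// (make_round_robin_attr is a hypothetical helper name, not a library call.)
int make_round_robin_attr(pthread_attr_t* attr, int priority) {
    assert(attr != nullptr);
    int rc = pthread_attr_init(attr);
    if (rc != 0) return rc;
    // Without PTHREAD_EXPLICIT_SCHED the new thread inherits the creator's
    // policy and the settings below are ignored.
    rc = pthread_attr_setinheritsched(attr, PTHREAD_EXPLICIT_SCHED);
    // Set the policy before the priority: the valid priority range depends on it.
    if (rc == 0) rc = pthread_attr_setschedpolicy(attr, SCHED_RR);
    if (rc == 0) {
        sched_param param{};
        param.sched_priority = priority;
        rc = pthread_attr_setschedparam(attr, &param);
    }
    if (rc != 0) pthread_attr_destroy(attr);
    return rc;
}
```

The resulting attr would be passed as the second argument to pthread_create for thread2. On Linux that call typically fails with EPERM unless the process is allowed to use real-time policies, and none of this removes the context-switch cost discussed above.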
coroutines
Another approach is user-level thread (coroutine/fiber) scheduling. It has significantly lower overhead and lets thread2 be written as a loop that suspends itself periodically instead of returning from its function completely. It still requires thread1 to hit checkpoints often enough to allow a switch every 5 microseconds.
function
And finally, if thread2 can easily be implemented as a function that keeps the state it needs to handle the loop, that is the best choice in terms of both overhead and complexity.
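A minimal sketch of the function approach, assuming thread1's work can be split into small units with a checkpoint after each one. Thread2Task and run_thread1 are names invented for this example and the "work" is a dummy addition.

```cpp
#include <cassert>
#include <chrono>
#include <cstdint>

// Hypothetical stand-in for thread2: one call performs one loop iteration,
// and the loop state lives in the object between calls.
struct Thread2Task {
    std::uint64_t iterations = 0;
    void step() { ++iterations; }
};

// Hypothetical stand-in for thread1: performs `work_units` units of dummy work
// and checks the clock after each unit. Whenever `interval` has elapsed it
// runs one iteration of the task. Returns the task's iteration count.
std::uint64_t run_thread1(std::uint64_t work_units,
                          std::chrono::nanoseconds interval,
                          Thread2Task& task) {
    using clock = std::chrono::steady_clock;
    assert(interval.count() > 0);
    volatile std::uint64_t sink = 0;  // keeps the dummy work from being optimized away
    auto next = clock::now() + interval;
    for (std::uint64_t i = 0; i < work_units; ++i) {
        sink = sink + i;              // one small unit of thread1's work
        if (clock::now() >= next) {   // checkpoint
            task.step();
            next = clock::now() + interval;
        }
    }
    return task.iterations;
}
```

Calling run_thread1(2000000, std::chrono::microseconds(5), task) interleaves the two pieces of work without any context switch. Reading the clock at every checkpoint has its own cost, so in practice the checkpoint might only look at the clock every few units; the actual interval is only as precise as the spacing of the checkpoints.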

Related

Is it really impossible to suspend two std/posix threads at the same time?

I want to briefly suspend multiple C++ std threads, running on Linux, at the same time.
It seems this is not supported by the OS.
The threads work on tasks that take an uneven and unpredictable amount of time (several seconds).
I want to suspend them when the CPU temperature rises above a threshold.
It is impractical to check for suspension within the tasks, only in between tasks.
I would like to simply have all workers suspend operation for a few milliseconds.
How could that be done?
What I'm currently doing
I'm currently using a condition variable in a slim, custom binary semaphore class (think C++20 Semaphore).
A worker checks for suspension before starting the next task by acquiring and immediately releasing the semaphore.
A separate control thread occupies the control semaphore for a few milliseconds if the temperature is too high.
This often works well and the CPU temperature is stable.
I do not care much about a slight delay in suspending the threads.
However, when one task takes some seconds longer than the others, its thread will continue to run alone.
This activates CPU turbo mode, which is the opposite of what I want to achieve (it is comparatively power inefficient, thus bad for thermals).
I cannot deactivate CPU turbo as I do not control the hardware.
In other words, the tasks take too long to complete.
So I want to forcefully pause them from outside.
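For reference, the cooperative scheme described under "What I'm currently doing" could look roughly like the sketch below. PauseGate is a made-up name, and this is an assumption about the shape of the asker's class, not their actual code. It shares the limitation the question describes: a worker only notices the pause between tasks.

```cpp
#include <condition_variable>
#include <mutex>

// A gate that workers pass through between tasks. While the gate is closed,
// wait_if_paused() blocks; otherwise it returns immediately.
// (PauseGate is a hypothetical class for illustration.)
class PauseGate {
public:
    // Called by the control thread when the temperature is too high.
    void pause() {
        std::lock_guard<std::mutex> lock(m_);
        paused_ = true;
    }
    // Called by the control thread to let the workers continue.
    void resume() {
        {
            std::lock_guard<std::mutex> lock(m_);
            paused_ = false;
        }
        cv_.notify_all();  // wake every worker waiting at the gate
    }
    // Called by each worker before it starts its next task.
    void wait_if_paused() {
        std::unique_lock<std::mutex> lock(m_);
        cv_.wait(lock, [this] { return !paused_; });
    }
    bool paused() const {
        std::lock_guard<std::mutex> lock(m_);
        return paused_;
    }

private:
    mutable std::mutex m_;
    std::condition_variable cv_;
    bool paused_ = false;
};
```

A worker loop would call gate.wait_if_paused() before taking each task, and the control thread would call gate.pause(), sleep for a few milliseconds, then call gate.resume().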

I want to suspend them when the CPU temperature rises above a threshold.
In general, that is putting the cart before the horse.
Properly designed hardware should have adequate cooling for maximum load and your program should not be able to exceed that cooling capacity.
In addition, since you are talking about Turbo, we can assume an Intel CPU, which will thermally throttle on its own, making your program run slower without you doing anything.
In other words, the tasks take too long to complete
You could break the tasks into smaller parts, and check the semaphore more often.
A separate control thread occupies the control semaphore for a few milliseconds
It's really unlikely that your hardware can react to millisecond delays -- that's too short a timescale for anything thermal. You will probably be better off monitoring the temperature and simply reducing the number of tasks you are scheduling when the temperature is rising and getting close to your limits.
I've now implemented it with pthread_kill and SIGRT.
Note that suspending threads in unknown state (whatever the target task was doing at the time of signal receipt) is a recipe for deadlocks. The task may be inside malloc, may be holding arbitrary locks, etc. etc.
If your "control thread" also needs that lock, it will block and you lose. Your control thread must execute only direct system calls, may not call into libc, etc. etc.
This solution is ~impossible to test, and ~impossible to implement correctly.

Can I guarantee that Sleep() would not sleep for more than 10 ms?

I know that Sleep() is not accurate, but is there a way to make it not sleep for more than 10 ms (i.e. only sleep between 1 ms and 10 ms)? Or does Sleep(1) already guarantee that?

If you really want guaranteed timings, you will not be using Windows at all.
To answer your question, Sleep() does not provide any means of guaranteeing an upper bound on the sleep time.
In Windows, this is because Sleep() relinquishes the thread's time slice, and it is not guaranteed that the system scheduler will schedule the sleeping thread (i.e. allocate another time slice) to execute immediately after the sleep time is up. That depends on the priorities of competing threads, scheduling policies, and things like that.
In practice, the actual sleep interval depends a lot on what other programs are running on the system, configuration of the system, whether other programs are accessing slow drives, etc etc.
With a lightly loaded system, it is a fair bet that Sleep(1) will sleep between 1 and 2 ms on any modern CPU (GHz frequency or better). However, it is not impossible for your program to experience greater delays.
With a heavily loaded system (lots of other programs executing, using CPU and timer resources), it is a fair bet your program will experience substantially greater delays than 1ms, and even more than 10ms.
In short: no guarantees.

There is no way to guarantee it.
This is what real time OS are for.
In the general case, if your OS isn't under high load, sleep will be pretty accurate, but the more load you put on it, the more inaccurate it will get.

No. Or, yes, depending on your perspective.
According to the documentation:
After the sleep interval has passed, the thread is ready to run. If
you specify 0 milliseconds, the thread will relinquish the remainder
of its time slice but remain ready. Note that a ready thread is not
guaranteed to run immediately. Consequently, the thread may not run
until some time after the sleep interval elapses. For more
information, see Scheduling Priorities.
What this means is that the problem isn't Sleep. Rather, when Sleep ends, your thread may still need to wait to become active again.

You cannot count on 10 milliseconds, that's too low. Sleep() accuracy is affected by:
1. The clock tick interrupt frequency. In general, the processor tends to be in a quiescent state, not consuming any power and turned off by the HLT instruction. It is dead to the world, unaware that time is passing and unaware that your sleep interval has expired. A periodic hardware interrupt generated by the chipset wakes it up and makes it pay attention again. By default, this interrupt is generated 64 times per second, or once every 15.625 milliseconds.
2. The thread scheduler, which runs at every clock interrupt. It is the one that notices that your sleep interval has expired and puts the thread back into the ready-to-run state. It also boosts the thread's priority so that it is more likely to acquire a processor core, which happens when no other threads with higher priority are ready to run.
There isn't much you can do about the 2nd bullet, you have to compete with everybody else and take your fair share. If the thread does a lot of sleeping and little computation then it is not unreasonable to claim more than your fair share, call SetThreadPriority() to boost your base priority and make it more likely that your sleep interval is accurate. If that isn't good enough then the only way to claim a high enough priority that will always beat everybody else is by writing ring 0 code, a driver.
You can mess with the 1st bullet, it is pretty common to do so. Also the reason why many programmers think that the default accuracy is 10 msec. Or if they use Chrome that it might be 1 msec, that browser jacks up the interrupt rate sky-high. A fairly unreasonable thing to do, bad for power consumption, unless you are in the business of making your mobile operating system products look good :)
Call timeBeginPeriod() before you need short sleep intervals and timeEndPeriod() when you're done. Use NtSetTimerResolution() if you need to go lower than 1 msec.
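One way to keep the timeBeginPeriod()/timeEndPeriod() pair balanced is a small RAII wrapper, sketched below. ScopedTimerResolution is a name invented for this example; on Windows it relies on the documented winmm functions, and on other platforms the class compiles to a no-op so the sketch can still be built.

```cpp
#include <cassert>
#ifdef _WIN32
#include <windows.h>
#include <mmsystem.h>  // timeBeginPeriod / timeEndPeriod; link with winmm.lib
#endif

// Requests a finer system timer resolution for the lifetime of the object and
// releases the request in the destructor. On non-Windows builds it does nothing.
// (ScopedTimerResolution is a hypothetical helper, not a Windows API.)
class ScopedTimerResolution {
public:
    explicit ScopedTimerResolution(unsigned period_ms) : period_ms_(period_ms) {
        assert(period_ms > 0);
#ifdef _WIN32
        active_ = (timeBeginPeriod(period_ms_) == TIMERR_NOERROR);
#endif
    }
    ~ScopedTimerResolution() {
#ifdef _WIN32
        // Each successful timeBeginPeriod must be matched by a timeEndPeriod
        // call with the same value.
        if (active_) timeEndPeriod(period_ms_);
#endif
    }
    ScopedTimerResolution(const ScopedTimerResolution&) = delete;
    ScopedTimerResolution& operator=(const ScopedTimerResolution&) = delete;

    bool active() const { return active_; }
    unsigned period_ms() const { return period_ms_; }

private:
    unsigned period_ms_;
    bool active_ = false;
};
```

Declaring ScopedTimerResolution res(1); at the top of the timing-sensitive scope would request a 1 ms period for as long as res is alive. This narrows the tick granularity only; the scheduling caveats from the second item still apply.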

Sleep won't guarantee that.
The only way I know of doing that is to have a thread wait for a fast timer event and free a synchronization object every 10 ms or so.
You will pass a semaphore to this "wait server task", and it will free it on the next timer tick, thus giving you a response time between 0 and 10 ms.
Of course, if you want extreme precision you will have to boost this thread's priority above other tasks that might preempt it, and at any rate you might still be preempted by system processes and/or interrupt handlers, which will add some noise to your timer.

Threads: How to calculate precisely the execution time of an algorithm (duration of function) in C or C++?

There is an easy way to calculate the duration of any function, which is described here: How to Calculate Execution Time of a Code Snippet in C++
start_timestamp = get_current_uptime();
// measured algorithm
duration_of_code = get_current_uptime() - start_timestamp;
But it does not give a clean duration, because time spent executing other threads is included in the measured time.
So the question is: how do I exclude the time that the CPU spends in other threads?
OS X code is preferred, although it would be great to see Windows or Linux code as well...
Update: the ideal concept, in code:
start_timestamp = get_this_thread_current_uptime();
// measured algorithm
duration_of_code = get_this_thread_current_uptime() - start_timestamp;
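One POSIX facility that resembles the hypothetical get_this_thread_current_uptime() is clock_gettime() with CLOCK_THREAD_CPUTIME_ID, which reports CPU time charged to the calling thread only. This is not taken from the answers below. Whether it is available and how fine-grained it is on a given OS is an assumption to check, and thread_cpu_seconds and measured_algorithm are names made up for the sketch.

```cpp
#include <cassert>
#include <time.h>

// CPU time consumed so far by the calling thread, in seconds.
// Time during which other threads ran on the core is not counted.
double thread_cpu_seconds() {
    timespec ts{};
    int rc = clock_gettime(CLOCK_THREAD_CPUTIME_ID, &ts);
    assert(rc == 0);  // fails only if the clock is unsupported
    (void)rc;
    return static_cast<double>(ts.tv_sec) +
           static_cast<double>(ts.tv_nsec) * 1e-9;
}

// Hypothetical algorithm under test: sum of squares of 0..n-1.
unsigned long long measured_algorithm(unsigned n) {
    unsigned long long sum = 0;
    for (unsigned i = 0; i < n; ++i) {
        sum += static_cast<unsigned long long>(i) * i;
    }
    return sum;
}
```

Usage follows the pattern from the question: read thread_cpu_seconds() before and after the call to measured_algorithm() and subtract. The result is CPU time rather than wall-clock time, so time the thread spends blocked or sleeping is not included either.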

I'm sorry to say that in the general case there is no way to do what you want. You are looking for worst-case execution time, and there are several methods to get a good approximation for this, but there is no perfect way as WCET is equivalent to the Halting problem.

If you want to exclude the time spent in other threads then you could disable task context switches upon entering the function that you want to measure. This is RTOS dependent but one possibility is to raise the priority of the current thread to the maximum. If this thread is max priority then other threads won't be able to run. Remember to reset the thread priority again at the end of the function. This measurement may still include the time spent in interrupts, however.
Another idea is to disable interrupts altogether. This could remove other threads and interrupts from your measurement. But with interrupts disabled the timer interrupt may not function properly, so you'll need to set up a hardware timer appropriately and rely on the timer's counter value register (rather than any time value derived from a timer interrupt) to measure the time. Also make sure your function doesn't call any RTOS routines that allow for a context switch. And remember to restore interrupts at the end of your function.
Another idea is to run the function many times and record the shortest duration measured over those many times. Longer durations probably include time spent in other threads but the shortest duration may be just the function with no other threads.
Another idea is to set a GPIO pin upon entry to and clear it upon exit from the function. Then monitor the GPIO pin with an oscilloscope (or logic analyzer). Use the oscilloscope to measure the period for when the GPIO pin is high. In order to remove the time spent in other threads you would need to modify the RTOS scheduler routine that selects the thread to run. Clear the GPIO pin in the scheduler when another thread runs and set it when the scheduler returns to your function's thread. You might also consider clearing the GPIO pin in interrupt handlers.
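The "run it many times and keep the shortest duration" idea above is the easiest one to try on a desktop OS. A minimal sketch, with shortest_duration being a name chosen for this example:

```cpp
#include <cassert>
#include <chrono>

// Runs `fn` `runs` times and returns the shortest wall-clock duration observed.
// Longer runs are assumed to include preemption by other threads or interrupts.
// (shortest_duration is a hypothetical helper for illustration.)
template <typename Fn>
std::chrono::nanoseconds shortest_duration(Fn&& fn, int runs) {
    using clock = std::chrono::steady_clock;
    assert(runs > 0);
    auto best = std::chrono::nanoseconds::max();
    for (int i = 0; i < runs; ++i) {
        auto start = clock::now();
        fn();
        auto elapsed = std::chrono::duration_cast<std::chrono::nanoseconds>(
            clock::now() - start);
        if (elapsed < best) best = elapsed;
    }
    return best;
}
```

For example, shortest_duration([] { measured_code(); }, 1000) would time a hypothetical measured_code() a thousand times. The minimum is a best-case figure: it also tends to reflect warm caches, so it can understate the typical cost.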

Your question is entirely OS dependent. The only way you can accomplish this is to somehow get a guarantee from the OS that it won't preempt your process to perform some other task, and to my knowledge this is simply not possible in most consumer OS's.
RTOS often do provide ways to accomplish this though. With Windows CE, anything running at priority 0 (in theory) won't be preempted by another thread unless it makes a function/os api/library call that requires servicing from another thread.
I'm not super familiar with OS X, but after glancing at the documentation, OS X is a "soft" real-time operating system. This means that technically what you want can't be guaranteed. The OS may decide that there is "something" more important than your process that NEEDS to be done.
OSX does however allow you to specify a Real-time process which means the OS will make every effort to honor your request to not be interrupted and will only do so if it deems absolutely necessary.
Mac OS X Scheduling documentation provides examples on how to set up real-time threads

OSX is not an RTOS, so the question is mistitled and mistagged.
In a true RTOS you can lock the scheduler, disable interrupts or raise the task to the highest priority (with round-robin scheduling disabled if other tasks share that priority) to prevent preemption - although only interrupt disable will truly prevent preemption by interrupt handlers. In a GPOS, even if it has a priority scheme, that normally only controls the number of timeslices allowed to a process in what is otherwise round-robin scheduling, and does not prevent preemption.
One approach is to make many repeated tests and take the smallest value obtained, since that is likely to be the one where the fewest preemptions occurred. It will also help to set the process to the highest priority in order to minimise the number of preemptions. But bear in mind that on a GPOS many interrupts from devices such as the mouse, keyboard, and system clock will occur and consume a small (and possibly negligible) amount of time.

Fastest method to wait under thread contention

I'm using pthreads on Linux. I have a circular buffer to pass data from one thread to another. Maybe the circular buffer is not the best structure to use here, but changing that would not make my problem go away, so we'll just refer to it as a queue.
Whenever my queue is either full or empty, pop/push operations return NULL. This is problematic since my threads fire periodically. Waiting for another thread loop would take too long.
I've tried using semaphores (sem_post, sem_wait), but unlocking under contention takes up to 25 ms, which is about the period of my loop. I've tried waiting with pthread_cond_t, but the unlocking takes between 10 and 15 ms.
Is there a faster mechanism I could use to wait for data?
EDIT*
Ok I used condition variables. I'm on an embedded device so adding "more cores or cpu power" is not an option. This made me realise I had all sorts of thread priorities set all over the place so I'll sort this out before going further

You should use condition variables. The only faster ways are platform-specific, and they're only negligibly faster.
You're seeing what you think is poor performance simply because your threads are being de-scheduled. You're seeing long "delays" when your thread is near the end of its timeslice and the scheduler allows the unblocked thread to pre-empt the running thread. If you have more cores than threads or set your thread to a higher priority, you won't see these delays.
But these delays are actually a good thing, and you shouldn't be concerned about them. Other threads just get a chance to run too.
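A minimal sketch of the condition-variable approach, using the C++ standard library wrappers rather than raw pthread_cond_t. BlockingQueue is a name made up for this example and a deque stands in for the asker's circular buffer. Instead of returning NULL, push and pop block until the operation can proceed.

```cpp
#include <cassert>
#include <condition_variable>
#include <cstddef>
#include <deque>
#include <mutex>
#include <utility>

// Bounded queue whose push() waits while it is full and whose pop() waits
// while it is empty. (BlockingQueue is a hypothetical class for illustration.)
template <typename T>
class BlockingQueue {
public:
    explicit BlockingQueue(std::size_t capacity) : capacity_(capacity) {
        assert(capacity > 0);
    }

    void push(T value) {
        std::unique_lock<std::mutex> lock(m_);
        not_full_.wait(lock, [this] { return items_.size() < capacity_; });
        items_.push_back(std::move(value));
        lock.unlock();             // unlock first so the woken thread can proceed
        not_empty_.notify_one();
    }

    T pop() {
        std::unique_lock<std::mutex> lock(m_);
        not_empty_.wait(lock, [this] { return !items_.empty(); });
        T value = std::move(items_.front());
        items_.pop_front();
        lock.unlock();
        not_full_.notify_one();
        return value;
    }

    std::size_t size() const {
        std::lock_guard<std::mutex> lock(m_);
        return items_.size();
    }

private:
    const std::size_t capacity_;
    mutable std::mutex m_;
    std::condition_variable not_full_;
    std::condition_variable not_empty_;
    std::deque<T> items_;
};
```

The producer thread calls push() and the consumer calls pop(). How quickly a woken thread actually runs still depends on the scheduler and on thread priorities, as described above.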

Could someone explain this interesting behaviour with Sleep(1)?

I was testing how long various Win32 API calls wait when asked to wait for 1 ms. I tried:
::Sleep(1)
::WaitForSingleObject(handle, 1)
::GetQueuedCompletionStatus(handle, &bytes, &key, &overlapped, 1)
I was measuring the elapsed time using QueryPerformanceCounter and QueryPerformanceFrequency. The elapsed time was about 15 ms most of the time, which is expected and documented all over the Internet. However, for a short period of time the waits were taking about 2 ms! It happened consistently for a few minutes, but now it is back to 15 ms. I did not use the timeBeginPeriod() and timeEndPeriod() calls! Then I tried the same app on another machine, and there the waits constantly take about 2 ms! Both machines have Windows XP SP2 and the hardware should be identical. Is there something that explains why the wait times vary by so much? TIA
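The asker's test program is not shown. A portable analogue of the measurement, using std::this_thread::sleep_for and std::chrono::steady_clock in place of ::Sleep and QueryPerformanceCounter, could be sketched as follows (measure_sleep is a made-up name):

```cpp
#include <chrono>
#include <thread>

// Sleeps for `requested` and returns how long the call actually took,
// measured with a monotonic clock.
// (measure_sleep is a hypothetical helper, not the asker's code.)
std::chrono::microseconds measure_sleep(std::chrono::milliseconds requested) {
    using clock = std::chrono::steady_clock;
    auto start = clock::now();
    std::this_thread::sleep_for(requested);
    return std::chrono::duration_cast<std::chrono::microseconds>(
        clock::now() - start);
}
```

Calling measure_sleep(std::chrono::milliseconds(1)) in a loop and printing the results makes the kind of variation described in the question visible. The returned value should never be below the requested time, only above it.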

Thread.Sleep(0) will let any threads of the same priority execute. Thread.Sleep(1) will let any threads of the same or lower priority execute.
Each thread is given an interval of time to execute in, before the scheduler lets another thread execute. As Billy ONeal states, calling Thread.Sleep will give up the rest of this interval to other threads (subject to the priority considerations above).
Windows balances threads over the entire OS, not just in your process. This means that other threads on the OS can also cause your thread to be pre-empted (i.e. interrupted, with the rest of the time interval given to another thread).
There is an article that might be of interest on the topic of Thread.Sleep(x) at:
Priority-induced starvation: Why Sleep(1) is better than Sleep(0) and the Windows balance set manager

Changing the timer's resolution can be done by any process on the system, and the effect is seen globally. See this article on how the HotSpot Java VM deals with time on Windows, specifically:
Note that any application can change the timer interrupt and that it affects the whole system. Windows only allows the period to be shortened, thus ensuring that the shortest requested period by all applications is the one that is used. If a process doesn't reset the period then Windows takes care of it when the process terminates. The reason why the VM doesn't just arbitrarily change the interrupt rate when it starts - it could do this - is that there is a potential performance impact to everything on the system due to the 10x increase in interrupts. However other applications do change it, typically multi-media viewers/players.

The biggest thing Sleep(1) does is give up the rest of your thread's quantum. How long the wait lasts therefore depends entirely upon how much of your thread's quantum remains when you call Sleep.

To aggregate what was said before:
CPU time is assigned in quanta (time slices).
The thread scheduler picks the thread to run. This thread may run for the entire time slice, even if threads of higher priority become ready to run.
Typical time slices are 8..15ms, depending on architecture.
The thread can "give up" the time slice, typically with Sleep(0) or Sleep(1). Sleep(0) allows another thread of the same or higher priority to run for the next time slice. Sleep(1) allows "any" thread.
The time slice is global and can be affected by all processes
Even if you don't change the time slice, someone else could.
Even if the time slice doesn't change, you may "jump" between the two different times.
For simplicity, assume a single core, your thread and another thread X.
If Thread X runs at the same priority as yours, crunching numbers, Your Sleep(1) will take an entire time slice, 15ms being typical on client systems.
If Thread X runs at a lower priority, and gives up its own time slice after 4 ms, your Sleep(1) will take 4 ms.

I would say it just depends on how loaded the CPU is. If there aren't many other processes/threads, it could get back to the calling thread a lot faster.
