Best way to slow down a thread? Is using Sleep() OK?


I've written a C++ library that does some seriously heavy CPU work (all of it math and calculations) and if left to its own devices, will easily consume 100% of all available CPU resources (it's also multithreaded to the number of available logical cores on the machine).
As such, I have a callback inside the main calculation loop that software using the library is supposed to call:
while(true)
{
    //do math here
    callback(percent_complete);
}
In the callback, the client calls Sleep(x) to slow down the thread.
Originally, the client-side code was a fixed Sleep(100) call, but this led to unreliable performance because some machines finish the math faster than others while the sleep is the same on all of them. So now the client checks the system time, and if more than 1 second has passed (which amounts to several iterations), it sleeps for half a second.
Is this an acceptable way of slowing down a thread? Should I be using a semaphore/mutex instead of Sleep() in order to maximize performance? Is sleeping x milliseconds for each 1 second of processing work fine or is there something wrong that I'm not noticing?
The reason I ask is that the machine still gets heavily bogged down even though taskman shows the process taking up ~10% of the CPU. I've already explored hard disk and memory contention to no avail, so now I'm wondering if the way I'm slowing down the thread is causing this problem.
Thanks!

Why don't you use a lower priority for the calculation threads? That will ensure other threads are scheduled while allowing your calculation threads to run as fast as possible if no other threads need to run.

What is wrong with the CPU at 100%? That's what you should strive for, not try to avoid. These math calculations are important, no? Unless you're trying to avoid hogging some other resource not explicitly managed by the OS (a mutex, the disk, etc) and used by the main thread, generally trying to slow your thread down is a bad idea. What about on multicore systems (which almost all systems will be, going forward)? You'd be slowing down a thread for absolutely no reason.
The OS has a concept of a thread quantum. It will take care of ensuring that no important thread on your system is starved. And, as I mentioned, on multicore systems spiking one thread on one CPU does not hurt performance for other threads on other cores at all.
I also see in another comment that this thread is also doing a lot of disk I/O - these operations will already cause your thread to yield while it's waiting for the results, so the sleeps will do nothing.
In general, if you're calling Sleep(x), there is something wrong/lazy with your design, and if x==0, you're opening yourself up to live locks (the thread calling Sleep(0) can actually be rescheduled immediately, making it a noop).

Sleep should be fine for throttling an app, which from your comments is what you're after. Perhaps you just need to be more precise how long you sleep for.
The only software in which I use a feature like this is the BOINC client. I don't know what mechanism it uses, but it's open-source and multi-platform, so help yourself.
It has a configuration option ("limit CPU use to X%"). The way I'd expect to implement that is to use platform-dependent APIs like clock() or GetSystemTimes(), and compare processor time against elapsed wall clock time. Do a bit of real work, check whether you're over or under par, and if you're over par sleep for a while to get back under.
The BOINC client plays nicely with priorities, and doesn't cause any performance issues for other apps even at 100% max CPU. The reason I use the throttle is that otherwise the client runs the CPU flat-out all the time, and drives up the fan speed and noise. So I run it at the level where the fan stays quiet. With better cooling maybe I wouldn't need it :-)
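The "compare processor time against wall clock time" idea could be sketched roughly as follows. This is only an illustration, not how BOINC does it: CpuThrottle and sleep_needed are hypothetical names, and the sketch assumes std::clock() reports CPU time for the whole process (true on POSIX systems; as far as I know the Microsoft runtime reports elapsed wall time instead, so on Windows something like GetProcessTimes() would be needed).

```cpp
#include <cassert>
#include <chrono>
#include <ctime>
#include <thread>

using Seconds = std::chrono::duration<double>;

// Hypothetical helper: how long to sleep so that
// cpu_used / (wall_elapsed + sleep) drops back to max_fraction.
// Returns zero when the process is already under par.
Seconds sleep_needed(Seconds cpu_used, Seconds wall_elapsed, double max_fraction) {
    assert(max_fraction > 0.0 && max_fraction <= 1.0);
    const Seconds target_wall = cpu_used / max_fraction;  // wall time that puts us exactly at par
    if (target_wall <= wall_elapsed) return Seconds::zero();
    return target_wall - wall_elapsed;
}

// Hypothetical throttle object; call pace() periodically from the work loop,
// for example from the progress callback.
class CpuThrottle {
public:
    explicit CpuThrottle(double max_fraction)
        : max_fraction_(max_fraction),
          cpu_start_(std::clock()),
          wall_start_(std::chrono::steady_clock::now()) {}

    void pace() {
        // Assumption: std::clock() measures CPU time (POSIX behaviour).
        const Seconds cpu(static_cast<double>(std::clock() - cpu_start_) / CLOCKS_PER_SEC);
        const Seconds wall = std::chrono::steady_clock::now() - wall_start_;
        const Seconds pause = sleep_needed(cpu, wall, max_fraction_);
        if (pause.count() > 0.0) std::this_thread::sleep_for(pause);
    }

private:
    double max_fraction_;
    std::clock_t cpu_start_;
    std::chrono::steady_clock::time_point wall_start_;
};
```

Note that on POSIX the CPU time covers all threads of the process, so max_fraction is relative to one core; a process using four cores flat-out accumulates four CPU seconds per wall second.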

Another, less elaborate, method could be to time one iteration and let the thread sleep for (x * t) milliseconds before the next iteration, where t is the time for one iteration in milliseconds and x is the chosen sleep time fraction (between 0 and 1).
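A minimal sketch of that "sleep x * t" scheme, using only standard C++. The names throttle_delay and run_throttled are made up for illustration. With this scheme the thread is busy for t out of every t + x*t, so the duty cycle works out to 1 / (1 + x), i.e. somewhere between 100% (x = 0) and 50% (x = 1).

```cpp
#include <cassert>
#include <chrono>
#include <thread>

using Clock = std::chrono::steady_clock;

// Sleep time for one iteration: fraction x of the measured iteration time t.
Clock::duration throttle_delay(Clock::duration iteration_time, double fraction) {
    assert(fraction >= 0.0 && fraction <= 1.0);
    return std::chrono::duration_cast<Clock::duration>(iteration_time * fraction);
}

// Runs `iterations` rounds of `work`, timing each round and sleeping x * t after it.
template <typename Work>
void run_throttled(int iterations, double fraction, Work work) {
    for (int i = 0; i < iterations; ++i) {
        const auto start = Clock::now();
        work();
        const auto elapsed = Clock::now() - start;
        std::this_thread::sleep_for(throttle_delay(elapsed, fraction));
    }
}
```

Because the delay is derived from the measured time, a fast machine and a slow machine end up with the same duty cycle, which is what the fixed Sleep(100) could not provide.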

Have a look at cpulimit. It sends SIGSTOP and SIGCONT as required to keep a process below a given CPU usage percentage.
Even still, WTF at "crazy complaints and outlandish reviews about your software killing PC performance". I'd be more likely to complain that your software was slow and not making the best use of my hardware, but I'm not your customer.
Edit: on Windows, SuspendThread() and ResumeThread() can probably produce similar behaviour.

Related

Futex throughput on Linux

I have an async API which wraps an IO library. The library uses C-style callbacks and the API is C++, so the natural choice (IMHO) was to build the API on std::future/std::promise, with something like std::future<void> Read(uint64_t addr, byte* buff, uint64_t buffSize). However, when testing the implementation I saw that the bottleneck is the future/promise, or more precisely the futex used to implement promise/future. Since the futex, AFAIK, works in user space and is the fastest mechanism I know to sync two threads, I switched to raw futexes, which improved the situation somewhat but not drastically. Performance floats somewhere around 200k futex WAKEs per second. Then I stumbled upon this article, Futex Scaling for Multi-core Systems, which matches the effect I observe with futexes quite well. My question is: since the futex is too slow for me, what is the fastest mechanism on Linux I can use to wake the waiting side? I don't need anything more sophisticated than a binary semaphore, just to signal IO operation completion. Since the IO operations are very fast (tens of microseconds), switching to kernel mode is not an option. Busy waiting is not an option either, since CPU time is precious in my case.
Bottom line, user space, simple synchronization primitive, shared between two threads only, only one thread sets the completion, only one thread waits for completion.
EDIT001:
What if... Previously I said no spinning in a busy wait. But the futex already spins in a busy wait, right? Its implementation covers the more general case, though, which requires a global hash table to hold the futexes, queues for all subscribers, etc. Is it a good idea to mimic the same behavior on some simple entity (like an int), with no locks, no atomics and no global data structures, and busy wait on it like the futex already does?

In my experience, the bottleneck is due to Linux's poor support for IPC. This probably isn't a multicore scaling issue, unless you have a large number of threads.
When one thread wakes another (by futex or any other mechanism), the system tries to run the 'wakee' thread immediately. But the waker thread is still running and using a core, so the system will usually put the wakee thread on a different core. If that core was previously idle, then the system will have to wake the core up from a power-down state, which takes some time. Any data shared between the threads must now be transferred between the cores.
Then, the waker thread will usually wait for a response from the wakee (it sounds like this is what you are doing). So it immediately goes to sleep, and puts its core to idle.
Then a similar thing happens again when the response comes. The continuous CPU wakes and migrations cause the slowdown. You may well discover that if you launch many instances of your process simultaneously, so that all your cores are busy, you see increased performance as the CPUs no longer have to wake up, and the threads may stop migrating between cores. You can get a similar performance increase if you pin the two threads to one core - it will do more than 1 million 'pings'/sec in this case.
So isn't there a way of saying 'put this thread to sleep and then wake that one'? Then the OS could run the wakee on the same core as the waiter? Well, Google proposed a solution to this with a FUTEX_SWAP api that does exactly this, but has yet to be accepted into the linux kernel. The focus now seems to be on user-space thread control via User Managed Concurrency Groups which will hopefully be able to do something similar. However at the time of writing this is yet to be merged into the kernel.
Without these changes to the kernel, as far as I can tell there is no way around this problem. 'You are on the fastest route'! UNIX sockets, TCP loopback, pipes all suffer from the same issue. Futexes have the lowest overhead, which is why they go faster than the others. (with TCP you get about 100k pings per sec, about half the speed of a futex impl). Fixing this issue in a general way would benefit a lot of applications/deployments - anything that uses connections to localhost could benefit.
(I did try a DIY approach where the waker thread pins the wakee thread to the same core that the waker is on, but if you don't want to pin the waker, then every time you post the futex you need to pin the wakee to the current core, and the system call to do this has too much overhead.)
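For the simpler case of pinning a thread once, a rough Linux-only sketch might look like this. The function name is made up, and it assumes glibc's sched_getaffinity()/sched_setaffinity(), where pid 0 refers to the calling thread. Calling it from both the waker and the wakee, with the same set of allowed CPUs, would put them on the same core.

```cpp
#include <cassert>
#include <sched.h>

// Pins the calling thread to the first CPU it is currently allowed to run on.
// Returns that CPU index, or -1 on failure. Linux-specific (glibc).
int pin_calling_thread_to_first_allowed_cpu() {
    cpu_set_t allowed;
    CPU_ZERO(&allowed);
    if (sched_getaffinity(0, sizeof(allowed), &allowed) != 0) return -1;

    for (int cpu = 0; cpu < CPU_SETSIZE; ++cpu) {
        if (CPU_ISSET(cpu, &allowed)) {
            cpu_set_t one;
            CPU_ZERO(&one);
            CPU_SET(cpu, &one);
            // pid 0 means "the calling thread".
            return sched_setaffinity(0, sizeof(one), &one) == 0 ? cpu : -1;
        }
    }
    return -1;  // no allowed CPU found
}
```

This only covers the pin-once case; it does nothing about the per-wake re-pinning overhead described above.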

Reducing the CPU usage of a thread or process

There are a bunch of other questions like this, but the only substantial answer I've seen is the one where you use SetPriorityClass to give priority to other processes. This is not what I want. I want to explicitly limit the CPU usage of my thread/process.
How can I do this?
Edit: I can't improve the efficiency of the process itself, because I'm not controlling it. I'm injecting my code into a game which I'd like to 'automate' in the background.

The best solution to limiting the cpu usage for a process or thread is to make sure that the thread or process uses less cpu.
That can best be done by improving the efficiency of the code, or by calling it less often.
The aim is to make sure that the process doesn't continually consume all of its available time slice.
Things to try:
Work out what is actually taking up all of the CPU. Optimize heavy processing areas - ideally with a change of algorithm.
Minimise polling wherever possible.
Try to rely on the operating system's ability to wake your process when necessary. eg. By waiting on files/sockets/fifos/mutexes/semaphores/message queue etc.
Have processes self regulate their processor usage. If your process is doing a lot of work in an endless loop insert a sched_yield() or sleep() after every N loops. If there are no other processes waiting for CPU usage then your process will get rescheduled almost immediately, but will allow the rest of the system to use cpu time when necessary.
Rearrange your processing to allow lower priority activities to be run when your process is at idle.
Carefully adjust thread or process priorities. But be aware, as @Mooing Duck has said, that by doing this you may just shift the CPU usage from one place to a different place without seeing an overall improvement.
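The "yield after every N loops" suggestion from the list above could look something like this. run_with_yield is a hypothetical helper; it uses std::this_thread::yield(), which on POSIX systems is typically implemented with sched_yield().

```cpp
#include <cassert>
#include <cstddef>
#include <thread>

// Runs work(i) for i in [0, total) and yields the CPU after every n iterations.
// Returns how many times it yielded.
template <typename Work>
std::size_t run_with_yield(std::size_t total, std::size_t n, Work work) {
    assert(n > 0);
    std::size_t yields = 0;
    for (std::size_t i = 0; i < total; ++i) {
        work(i);
        if ((i + 1) % n == 0) {
            // Gives other ready threads a chance to run; if nothing else is
            // waiting, this thread is normally rescheduled almost immediately.
            std::this_thread::yield();
            ++yields;
        }
    }
    return yields;
}
```

A yield only helps when other threads are ready to run; it does not reduce CPU usage on an otherwise idle machine, for which a sleep would be needed instead.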

How about issuing a sleep command at regular intervals?

Your question is broad -- I don't know what it's doing. You can certainly track the thread's I/O and force it to give up the cpu after a certain threshold is passed.

I ended up enumerating a list of threads, then having a 100ms timer that suspended the list of threads two out of every five iterations (which in theory reduces CPU usage by 40%).
Thanks for all the answers.

Linux, need accurate program timing. Scheduler wake up program

I have a thread running on a Linux system which I need to execute at intervals that are as accurate as possible, e.g. once every ms.
Currently this is done by creating a timer with
timerfd_create(CLOCK_MONOTONIC, 0)
, and then passing the desired sleep time in a struct with
timerfd_settime (fd, 0, &itval, NULL);
A blocking read call is performed on this timer which halts thread execution and reports lost wakeup calls.
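For reference, the setup described above can be put together roughly like this. It is a sketch, not the asker's actual code: run_periodic is a made-up name, the period is assumed to be below one second, and EINTR handling is left out.

```cpp
#include <cassert>
#include <cstdint>
#include <ctime>
#include <sys/timerfd.h>
#include <unistd.h>

// Waits for `wakeups` wakeups of a periodic timer with the given period.
// Returns the total number of timer expirations seen, or -1 on error.
long long run_periodic(long period_ns, int wakeups) {
    assert(period_ns > 0 && period_ns < 1000000000L);
    const int fd = timerfd_create(CLOCK_MONOTONIC, 0);
    if (fd < 0) return -1;

    itimerspec itval{};
    itval.it_value.tv_nsec = period_ns;     // first expiration
    itval.it_interval.tv_nsec = period_ns;  // periodic afterwards
    if (timerfd_settime(fd, 0, &itval, nullptr) != 0) {
        close(fd);
        return -1;
    }

    long long total = 0;
    for (int i = 0; i < wakeups; ++i) {
        std::uint64_t expirations = 0;
        // Blocks until the timer has expired at least once since the last read.
        if (read(fd, &expirations, sizeof(expirations)) !=
            static_cast<ssize_t>(sizeof(expirations))) {
            close(fd);
            return -1;
        }
        // expirations > 1 means (expirations - 1) deadlines were missed.
        total += static_cast<long long>(expirations);
    }
    close(fd);
    return total;
}
```

If the returned total is larger than the number of wakeups, the difference is the number of periods that passed without the thread being woken in time.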
The problem is that at higher frequencies the system starts missing deadlines, even though CPU usage is below 10%. I think this is due to the scheduler not waking the thread often enough to check the blocking call. Is there a command I can use to tell the scheduler to wake the thread at certain intervals, as far as that is possible?
Busy-waiting is a bad option since the system handles many other tasks.
Thank you.

You need to get RT Linux*, and then increase the RT priority of the process that you want to wake up at regular intervals.
Other than that, I do not see problems in your code, and if your process is not getting blocked, it should work fine.
(*) RT Linux - an OS with some real-time scheduling patches applied.

One way to reduce scheduler latency is to run your process using the realtime scheduler such as SCHED_FIFO. See sched_setscheduler .
This will generally improve latency a lot, but there's still little guarantee. To further reduce latency spikes, you'll need to move to the realtime branch of Linux, or a realtime OS such as VxWorks, RTEMS or QNX.
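A small sketch of switching to SCHED_FIFO with sched_setscheduler(). The wrapper name try_enable_fifo is made up. On Linux the valid SCHED_FIFO priorities are normally 1 to 99 (see sched_get_priority_min()/sched_get_priority_max()), and the call usually fails with EPERM unless the process runs as root or has CAP_SYS_NICE.

```cpp
#include <cassert>
#include <cerrno>
#include <sched.h>

// Tries to switch the calling thread to the SCHED_FIFO realtime policy.
// Returns 0 on success, otherwise the errno value from sched_setscheduler().
int try_enable_fifo(int priority) {
    sched_param param{};
    param.sched_priority = priority;  // higher value = higher realtime priority
    // pid 0 means "the calling thread".
    if (sched_setscheduler(0, SCHED_FIFO, &param) != 0) return errno;
    return 0;
}
```

Be careful with this on a real system: a SCHED_FIFO thread that never blocks can starve everything else on its core, so it only makes sense for threads that spend most of their time waiting on a timer or an event.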

You won't be able to do what you want unless you run it on an actual "Real Time OS".

If this is only for Linux on an x86 system, I would choose the HPET timer. I think all modern PCs have this hardware timer built in, and it is very, very accurate. It allows you to define a callback that will be called every millisecond, and in this callback you can do your calculations (if they are simple) or just trigger work in another thread using some synchronization object (a condition variable, for example).
Here is an example of how to use this timer: http://blog.fpmurphy.com/2009/07/linux-hpet-support.html

Along with other advice such as setting the scheduling class to SCHED_FIFO, you will need to use a Linux kernel compiled with a high enough tick rate that it can meet your deadline.
For example, a kernel compiled with CONFIG_HZ of 100 or 250 Hz (timer interrupts per second) can never respond to timer events faster than that.
You must also set your timer to be just a little bit faster than you actually need: timers are allowed to go beyond their requested time but never expire early, so this will give you better results. If you need 1 ms, then I'd recommend asking for 999 us instead.

Can my thread help the OS decide when to context switch it out?

I am working on a threaded application on Linux in C++ which attempts to be real time, doing an action on a heartbeat, or as close to it as possible.
In practice, I find the OS is swapping out my thread and causing delays of up to a tenth of a second while it is switched out, causing the heartbeat to be irregular.
Is there a way my thread can hint to the OS that now is a good time to context switch it out? I could make this call right after doing a heartbeat, and thus minimize the delay due to an ill timed context switch.

It is hard to say what the main problem is in your case, but it is most certainly not something that can be corrected with a call to sched_yield() or pthread_yield(). The only well-defined use for yielding, in Linux, is to allow a different ready thread to preempt the currently CPU-bound running thread at the same priority on the same CPU under SCHED_FIFO scheduling policy. Which is a poor design decision in almost all cases.
If you're serious about your goal of "attempting to be real-time" in Linux, then first of all, you should be using a real-time sched_setscheduler setting (SCHED_FIFO or SCHED_RR, FIFO preferred).
Second, get the full preemption patch for Linux (from kernel.org if your distro does not supply one). It will also give you the ability to reschedule device driver threads and to execute your thread at a higher priority than, say, hard disk or ethernet driver threads.
Third, see RTWiki and other resources for more hints on how to design and set up a real-time application.
This should be enough to get you under 10 microseconds response time, regardless of system load on any decent desktop system. I have an embedded system where I only squeeze out 60 us response idle and 150 us under heavy disk/system load, but it's still orders of magnitude faster than what you're describing.

You can tell the current executing thread to pause execution with various commands such as yield.
Just telling the thread to pause is non-deterministic: 999 times it might provide good intervals and 1 time it doesn't.
You will probably want to look at real-time scheduling for consistent results. This site http://www2.net.in.tum.de/~gregor/docs/pthread-scheduling.html seems to be a good starting spot for researching thread scheduling.

Use sched_yield.
And for threads there is pthread_yield: http://www.kernel.org/doc/man-pages/online/pages/man3/pthread_yield.3.html

I'm a bit confused by the question. If your program is just waiting on a periodic heartbeat and then doing some work, then the OS should know to schedule other things when you go back to waiting on the heartbeat.
You aren't spinning on a flag to get your "heartbeat" are you?

You are using a timer function such as setitimer(), right? RIGHT???
If not, then you are doing it all wrong.
You may need to specify a timer interval that is just a little shorter than what you really need. If you are using a real-time scheduler priority and a timer, your process will almost always be woken up on time.
I would say always on time, but Linux isn't a perfect real-time OS yet.

I'm not too sure for Linux, but on Windows it's been explained that you can't ask the system to not interrupt you for several reasons (first paragraph mostly). Off my head, one of the reasons is hardware interrupts that can occur at any time and over which you have no control.
EDIT Some guy just suggested the use of sched_yield then deleted his answer. It'll relinquish time for your whole process though. You can also use sched_setscheduler to hint the kernel about what you need.

Allocate more processor cycles to my program

I've been working on Win32 in C and C++ for a while, and I code in Visual Studio. Most of the time I see that the System Idle Process shows the highest CPU utilization. Is there a way to allocate more processor cycles to my program to run it faster? I understand there might be limitations from I/O; in those cases this question doesn't make any sense.
OR
did I misunderstand the task manager numbers? I'm confused, please help me out.
And I want to do something in program itself, btw I will be happy if answers are specific to windows.
Thanks in advance
~calvin

If your program is the only program that has something to do (not wait for IO), its thread will always be assigned to a processor core.
However, if you have a multi-core processor and a single-threaded program, the CPU usage of your process displayed in the task manager will always be limited to 100/Ncores percent.
For example, if you have a quad-core machine, your process will be at 25% (using one core), and the idle process at around 75%. You can only gain additional CPU power by dividing your tasks into chunks that can be worked on by separate threads, which will then be run on the idle cores.
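As a rough illustration of dividing work into chunks, here is a sketch that sums a vector using one thread per hardware thread. parallel_sum is a made-up example, and the sum stands in for whatever your real computation is; it only works this simply because the chunks are independent of each other.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <numeric>
#include <thread>
#include <vector>

// Sums `data` by splitting it into one chunk per hardware thread.
long long parallel_sum(const std::vector<int>& data) {
    // hardware_concurrency() may return 0 when the count is unknown.
    const std::size_t workers =
        std::max<std::size_t>(1, std::thread::hardware_concurrency());
    const std::size_t chunk = (data.size() + workers - 1) / workers;

    std::vector<long long> partial(workers, 0);  // one result slot per thread
    std::vector<std::thread> threads;
    for (std::size_t w = 0; w < workers; ++w) {
        const std::size_t begin = std::min(data.size(), w * chunk);
        const std::size_t end = std::min(data.size(), begin + chunk);
        threads.emplace_back([&data, &partial, w, begin, end] {
            const auto first = data.begin() + static_cast<std::ptrdiff_t>(begin);
            const auto last = data.begin() + static_cast<std::ptrdiff_t>(end);
            partial[w] = std::accumulate(first, last, 0LL);
        });
    }
    for (auto& t : threads) t.join();
    return std::accumulate(partial.begin(), partial.end(), 0LL);
}
```

With enough work per chunk, the task manager should then show the process well above the 100/Ncores limit of a single thread.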

The idle process only "runs" when no other process needs to. If you want to use more CPU cycles, then use them.

If your program is idling, it doesn't do anything, i.e. there is nothing that could be done any faster. So the CPU is probably not the bottle-neck in your case.
Are you maybe waiting for data coming from the disk or network?
In case your processor has multiple cores and your program uses only one core to its full extent, making your program multi-threaded could work.

In a multitasking / multithreaded OS the processor time is split among threads.
If you want a specific thread to get a bigger time chunk you can set its priority with the SetThreadPriority function, though it is not wise to do so.
Only special software (should) mess with those settings.
It's common for window applications to have a low cpu usage percent (which we see in the task manager)
because most of the time they just wait for messages.

Use threads to:
abstract away all the I/O waits.
assign work to all cores.
also, remove all sleep-wait states from main thread.
Defer all I/O to a thread, so that wait states are confined within it. Keep the actual computations in the foreground thread, and use synchronization mechanisms that make the I/O slave thread wait for your main thread when communicating.
If your CPU is multi-core, and your problem is parallelizable, create as many threads as you have cores, research "set affinity" functions to assign them between the cores and still keep a separate thread for all I/O.
Also pay attention not to wait in your main thread - usleep(1) doesn't send you into background for 1 microsecond, but for "no less than..." and that may mean anything between 1ms and 100ms but hardly ever less than that, and never anything close to a microsecond.
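You can check the "no less than" behaviour on your own machine with a small measurement like the one below. measured_sleep is a hypothetical helper; it uses std::this_thread::sleep_for rather than usleep so that it stays portable. The actual overshoot depends heavily on the OS, timer resolution and system load, so no particular figure is assumed here.

```cpp
#include <cassert>
#include <chrono>
#include <thread>

// Measures how long a requested sleep actually takes.
std::chrono::nanoseconds measured_sleep(std::chrono::nanoseconds requested) {
    assert(requested.count() >= 0);
    const auto start = std::chrono::steady_clock::now();
    std::this_thread::sleep_for(requested);  // sleeps for at least `requested`
    return std::chrono::duration_cast<std::chrono::nanoseconds>(
        std::chrono::steady_clock::now() - start);
}
```

Calling measured_sleep(std::chrono::microseconds(1)) a few hundred times and looking at the minimum and maximum gives a quick picture of how coarse sleeping really is on a given system.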
