Why does an empty loop use so much processor time? - c++

If I have an empty while loop in my code such as:
while(true);
It will drive the processor usage up to about 25%. However if I do the following:
while(true)
Sleep(1);
It will only use about 1%.
So why is that?
Update: Thanks for all the great replies, but I guess I really should have asked this question: What's the algorithm behind sleep()? That is more of what I wanted to know.

With the former, the condition true must be checked by the processor as often as the application gets processor time. As soon as the application gets processor attention, it checks true, just like any other loop condition, to see if the next instruction in the loop can be executed. Of course, the next instruction in the loop is also true. Basically, you are forcing the processor to constantly determine whether or not true is true, which, although trivial, bogs the processor down when performed constantly.
In the latter, the processor checks true once, and then executes the next statement. In this case, that is a 1ms wait. Since 1 ms is far greater than the amount of time required to check true, and the processor knows it can do other things during this wait, you free up a lot of power.
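To see the difference for yourself, you can compare the CPU time a spinning loop and a sleeping loop consume over the same wall-clock interval. This is only a sketch: cpu_seconds_used is a name made up for the example, std::this_thread::sleep_for stands in for Sleep(), and it assumes std::clock() reports processor time (it does on Linux; Microsoft's runtime reports wall-clock time instead, so the numbers would not be meaningful there).

```cpp
#include <cassert>
#include <chrono>
#include <ctime>
#include <thread>

using Clock = std::chrono::steady_clock;

// Run a loop for `wall` of wall-clock time and return the CPU seconds the
// process consumed while doing so. If `nap` is non-zero, the loop sleeps that
// long on every iteration; otherwise it spins on the clock check.
double cpu_seconds_used(std::chrono::milliseconds wall, std::chrono::milliseconds nap) {
    assert(wall.count() > 0);
    const std::clock_t cpu_start = std::clock();
    const auto deadline = Clock::now() + wall;
    while (Clock::now() < deadline) {
        if (nap.count() > 0) {
            std::this_thread::sleep_for(nap);  // thread is blocked, not running
        }
    }
    return static_cast<double>(std::clock() - cpu_start) / CLOCKS_PER_SEC;
}
```

Calling it once with a nap of 0 ms and once with a nap of 1 ms should show the spinning version using CPU time close to the whole interval and the sleeping version using only a small fraction of it.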

I am going to guess that you have four cores on your multicore processor, which would explain the 25%, as you are completely tying up one processor doing your busy loop, as the only break in it is when the application is delayed so another application can run (but that may not happen depending on the load).
When you put a thread to sleep, it allows the OS to do other operations, and the OS knows when, at a minimum, to come back and wake up the thread so it can continue its work.

The first continuously uses CPU operations. The latter switches the context of the currently running thread, putting it in sleep mode, thus allowing other processes to be scheduled.

You have a quad-core machine, am I right? If so,
while(true);
is actually using 100% of one of your cores.
To the operating system, it seems your program has a lot of work to do. So the operating system lets the program go ahead with this work. It can't tell the difference between your program number crunching like crazy, and doing a useless infinite loop.
Sleep(1);
on the other hand explicitly tells the operating system that you have no work to do for the next millisecond. The OS will thus stop running your program and let other programs do work.
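The 25% figure is just one fully busy thread divided by the number of logical cores. A small sketch of that arithmetic (the function names are made up for the example):

```cpp
#include <cassert>
#include <thread>

// Percentage of total CPU capacity that one fully busy thread represents
// on a machine with `logical_cores` logical processors.
double one_thread_share_percent(unsigned logical_cores) {
    assert(logical_cores > 0);
    return 100.0 / logical_cores;
}

// Same figure for the machine the program is running on.
// hardware_concurrency() may return 0 when the count is unknown; fall back to 1.
double one_thread_share_here() {
    const unsigned n = std::thread::hardware_concurrency();
    return one_thread_share_percent(n == 0 ? 1 : n);
}
```

With four logical cores this gives 25, which matches what the question reports for while(true);.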

An empty loop isn't actually empty. A loop in itself is at least a comparison and a jump back to the comparison. Modern CPUs can do hundreds of millions of these operations per second, if not more.
The sleep statement in the second loop relinquishes control to the operating system for at least 1 millisecond. In this state, the application is effectively halted and does not continue processing. Halting for x amount of time reduces the number of comparisons executed per second, and hence the percentage of CPU clock cycles the loop consumes.
Concerning the 25%, Intel processors that support Hyperthreading or multi core processors might taint the performance statistics. The empty loop is effectively topping off at least one processor core.
Back in the day when multicore CPUs didn't exist, the users did have the need for multi processing/tasking. There are a couple of ways the illusion of running multiple processes at the same time was achieved.
One way was to design applications in such a way that they needed to relinquish control to the system every so often, so as to let other processes run for a while. This was the case in the old Windows versions. If a given application was badly designed so that it didn't relinquish control, or got stuck in an endless loop, your entire PC effectively froze up.
Needless to say this wasn't the best way, and it was replaced by preemptive multitasking. Here a Programmable Interval Timer is instructed to interrupt the running process at a given interval to execute some scheduler code that lets the other processes have a go.

Basically, you've got several "process scheduler" states. I'll name three of them.
One: Ready
Two: Running
Three: Blocked
These states / queues only exist because of the limited number of cores on your processor. In Ready, processes are scheduled that are totally ready for execution. They don't have to wait for input, time, or whatever. On Running, processes actually "have" the processor and thus ARE running. State Blocked means your process is waiting for an event to happen before queueing for the processor.
When you keep on testing for while(true) you keep your process in the "ready" queue. Your process scheduler gives it a certain amount of time, and after a while, removes it from the processor (placing it on the back of the "ready" queue). And thus your process will keep coming back "on" the processor, keeping it busy.
When you execute a "sleep" statement, your process moves to the "blocked" state and will not be scheduled on the processor until the prerequisite is fulfilled; in this particular case, until at least 1 ms has passed since the "sleep" call.
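The three states can be put into a toy simulation, which also gives a rough idea of the "algorithm behind sleep()": a sleeping task records a wake-up time, leaves the ready queue, and is put back when the timer tick reaches that time. This is a simplified model under made-up assumptions (one CPU, round-robin, abstract "ticks", names like Task and simulate invented for the example), not how any real kernel is written.

```cpp
#include <cassert>
#include <cstddef>
#include <deque>
#include <vector>

enum class State { Ready, Running, Blocked };

struct Task {
    int sleep_ticks = 0;   // ticks to block after each run; 0 models while(true);
    State state = State::Ready;
    long wake_at = 0;      // tick at which a blocked task becomes ready again
    long ticks_run = 0;    // how many ticks this task occupied the CPU
};

// Simulate one CPU for `total_ticks` ticks with round-robin scheduling.
// Returns the number of ticks during which the CPU had nothing to run.
long simulate(std::vector<Task>& tasks, long total_ticks) {
    assert(total_ticks >= 0);
    std::deque<std::size_t> ready;
    for (std::size_t i = 0; i < tasks.size(); ++i) ready.push_back(i);
    long idle = 0;
    for (long now = 0; now < total_ticks; ++now) {
        // "Timer interrupt": move tasks whose sleep has expired back to Ready.
        for (std::size_t i = 0; i < tasks.size(); ++i) {
            if (tasks[i].state == State::Blocked && tasks[i].wake_at <= now) {
                tasks[i].state = State::Ready;
                ready.push_back(i);
            }
        }
        if (ready.empty()) { ++idle; continue; }  // nobody wants the CPU
        const std::size_t current = ready.front();
        ready.pop_front();
        Task& task = tasks[current];
        task.state = State::Running;
        ++task.ticks_run;
        if (task.sleep_ticks > 0) {
            // The task called "sleep": it is not queued again until wake_at.
            task.state = State::Blocked;
            task.wake_at = now + 1 + task.sleep_ticks;
        } else {
            // The busy loop is always ready: back of the ready queue.
            task.state = State::Ready;
            ready.push_back(current);
        }
    }
    return idle;
}
```

In this model a single task with sleep_ticks = 0 occupies all 1000 of 1000 ticks, while a single task with sleep_ticks = 9 occupies 100 of them and leaves the CPU idle for the other 900.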

Sleep() is not really doing anything during the period that the thread is sleeping. It hands its time over to other processes to use. A loop on the other hand is continuously checking to see if the condition is true or false.

Because the Sleep is basically telling the processor to switch contexts and let some other programs get more CPU time.

The sleep in the second one is kinda like a 'yield' to the OS's process scheduler.

Because you're keeping the processor busy evaluating the loop over and over.
Using Sleep actually lets other threads execute on the CPU and along with a very short context switch looks as if the CPU is free for a short while.

A CPU can do a few billion operations per second. This means the empty loop runs on the order of a billion times per second. The loop with the sleep statement runs at most 1000 times per second. In this case the CPU has plenty of operations per second left to do other things.
Say we have a 3 GHz CPU. 3 GHz = 3 000 000 000 Hz, so the CPU can run the loop three billion times a second (simplified).
With the sleep statement the loop is executed 1000 times a second. This means the CPU load is
1000 / 3 000 000 000 * 100 ≈ 0.00003%
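The same back-of-the-envelope estimate as a function, with the cost per wake-up made explicit. One cycle per iteration gives roughly 0.00003%, far below the ~1% the question reports; the likely reason is that every Sleep() call goes through a system call and a context switch, which cost far more than one cycle. The 10 000 cycles used in the note below is an assumed round number for illustration, not a measurement.

```cpp
#include <cassert>

// Estimated CPU load in percent for a loop that wakes up `wakeups_per_second`
// times, spends `cycles_per_wakeup` cycles each time, on a core at `clock_hz`.
double estimated_load_percent(double wakeups_per_second,
                              double cycles_per_wakeup,
                              double clock_hz) {
    assert(clock_hz > 0.0);
    return wakeups_per_second * cycles_per_wakeup / clock_hz * 100.0;
}
```

estimated_load_percent(1000, 1, 3e9) reproduces the figure above; with an assumed 10 000 cycles per wake-up the estimate rises to about 0.33%, which is at least in the neighbourhood of what Task Manager shows.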

Because that would run instructions all the time.
Normal programs don't madly run instructions all the time.
For instance, GUI programs just sit idle waiting for events, (such as keyboard input),
Note: sitting idle != while(true);
They only run instructions when events arrive, and the event-handling code is usually small and runs very quickly, (otherwise, the program will appear to be not-responding). Imagine your app gets 5 keystrokes per second, how much CPU time would it take?
That's why normal processes don't take that much CPU.
Now, earlier I said that sitting idle is not the same as an infinite empty loop. Why is that? Sitting idle means telling the OS that you don't have anything to run.
An infinite loop actually is something to run (a repeating jump instruction).
On the other hand, having nothing to run means basically that the OS won't even bother giving you any processor time at all, even if it's your turn.
Another example where programs sit idle is loading files: when you need to load a file, you basically send a signal to the disk and wait for it to find the data and load it into memory. While the data is being loaded (several milliseconds), the process just sits idle, doing nothing.
Yet another instance of a process sitting idle, is Sleep(1), here it's explicitly telling the OS not to give it any cpu-time before the specified time has passed.
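"Sitting idle waiting for events" usually means blocking on something the OS (or the threading library) can wake you from. A minimal sketch with standard C++ primitives, assuming an int is enough to represent an event; EventQueue is a made-up class for the example, not what any GUI toolkit actually uses:

```cpp
#include <cassert>
#include <condition_variable>
#include <mutex>
#include <queue>

// A minimal blocking queue: pop() puts the calling thread to sleep until an
// event is available, so a waiting consumer is not scheduled at all.
class EventQueue {
public:
    void push(int event) {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            events_.push(event);
        }
        ready_.notify_one();  // wake one thread blocked in pop()
    }

    int pop() {
        std::unique_lock<std::mutex> lock(mutex_);
        // wait() releases the mutex and blocks; the predicate guards
        // against spurious wakeups.
        ready_.wait(lock, [this] { return !events_.empty(); });
        const int event = events_.front();
        events_.pop();
        return event;
    }

private:
    std::mutex mutex_;
    std::condition_variable ready_;
    std::queue<int> events_;
};
```

A thread looping on pop() only runs when something was pushed, unlike while(true); which runs whether or not there is anything to do.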

Related

Is it really impossible to suspend two std/posix threads at the same time?

I want to briefly suspend multiple C++ std threads, running on Linux, at the same time.
It seems this is not supported by the OS.
The threads work on tasks that take an uneven and unpredictable amount of time (several seconds).
I want to suspend them when the CPU temperature rises above a threshold.
It is impractical to check for suspension within the tasks, only in between tasks.
I would like to simply have all workers suspend operation for a few milliseconds.
How could that be done?
What I'm currently doing
I'm currently using a condition variable in a slim, custom binary semaphore class (think C++20 Semaphore).
A worker checks for suspension before starting the next task by acquiring and immediately releasing the semaphore.
A separate control thread occupies the control semaphore for a few milliseconds if the temperature is too high.
This often works well and the CPU temperature is stable.
I do not care much about a slight delay in suspending the threads.
However, when one task takes some seconds longer than the others, its thread will continue to run alone.
This activates CPU turbo mode, which is the opposite of what I want to achieve (it is comparatively power inefficient, thus bad for thermals).
I cannot deactivate CPU turbo as I do not control the hardware.
In other words, the tasks take too long to complete.
So I want to forcefully pause them from outside.
I want to suspend them when the CPU temperature rises above a threshold.
In general, that is putting the cart before the horse.
Properly designed hardware should have adequate cooling for maximum load and your program should not be able to exceed that cooling capacity.
In addition, since you are talking about Turbo, we can assume an Intel CPU, which will thermally throttle on its own, making your program run slower without you doing anything.
In other words, the tasks take too long to complete
You could break the tasks into smaller parts, and check the semaphore more often.
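The gate the question describes (workers check it, a control thread holds it closed for a while) could be sketched with a mutex and a condition variable, with the check moved between small chunks of a task rather than between whole tasks. PauseGate, run_chunks and pause_for are names invented for this example; it is not the asker's actual class, and the "work" is a stand-in counter.

```cpp
#include <atomic>
#include <cassert>
#include <chrono>
#include <condition_variable>
#include <mutex>
#include <thread>

class PauseGate {
public:
    void pause() {
        std::lock_guard<std::mutex> lock(mutex_);
        paused_ = true;
    }
    void resume() {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            paused_ = false;
        }
        open_.notify_all();  // release every worker blocked at the gate
    }
    // Returns immediately while the gate is open; blocks (without spinning)
    // while it is closed.
    void wait_if_paused() {
        std::unique_lock<std::mutex> lock(mutex_);
        open_.wait(lock, [this] { return !paused_; });
    }

private:
    std::mutex mutex_;
    std::condition_variable open_;
    bool paused_ = false;
};

// Hypothetical worker: handles `chunks` small pieces of one task and checks
// the gate before each piece.
void run_chunks(PauseGate& gate, int chunks, std::atomic<int>& done) {
    for (int i = 0; i < chunks; ++i) {
        gate.wait_if_paused();
        ++done;  // stand-in for one chunk of real work
    }
}

// Hypothetical controller step: hold the gate closed for `cooldown`.
void pause_for(PauseGate& gate, std::chrono::milliseconds cooldown) {
    gate.pause();
    std::this_thread::sleep_for(cooldown);
    gate.resume();
}
```

How small the chunks need to be depends on how quickly the workers should react; a worker in the middle of a chunk still finishes that chunk before it stops.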
A separate control thread occupies the control semaphore for a few milliseconds
It's really unlikely that your hardware can react to millisecond delays -- that's too short a timescale for anything thermal. You will probably be better off monitoring the temperature and simply reducing the number of tasks you are scheduling when the temperature is rising and getting close to your limits.
I've now implemented it with pthread_kill and SIGRT.
Note that suspending threads in unknown state (whatever the target task was doing at the time of signal receipt) is a recipe for deadlocks. The task may be inside malloc, may be holding arbitrary locks, etc. etc.
If your "control thread" also needs that lock, it will block and you lose. Your control thread must execute only direct system calls, may not call into libc, etc. etc.
This solution is ~impossible to test, and ~impossible to implement correctly.

Suspend/Resume Thread, Specifically execution of code by the processor, Linux

I'm working on a program that may be run under PBS with a specific CPU count, less than the total number of CPUs. I have previously seen that PBS enforces this limit by terminating the program if it exceeds the limit.
My program will use threads; however, they will have several blocking commands, and I want to run other threads during this time. Hence I would have the thread be suspended and placed in a queue while another thread is resumed.
However, Linux doesn't implement pthread_suspend and the like. The workaround is to use mutexes; however, this works at a low level by a loop that repeatedly checks the state and locks it if possible, i.e. it uses CPU cycles even though the thread isn't running. Hence if I use this method, even though only N threads are running, more than N CPUs may be in use and PBS would terminate the program.
Does anyone have a workaround that wouldn't cause this problem?
Ultimately my goal is to prevent my program from using more than N processors without wasting processor time during blocking calls, and I would be happy with any solution to this (for Linux and Windows).

Could someone explain this interesting behaviour with Sleep(1)?

I was testing how long various win32 API calls will wait when asked to wait for 1ms. I tried:
::Sleep(1)
::WaitForSingleObject(handle, 1)
::GetQueuedCompletionStatus(handle, &bytes, &key, &overlapped, 1)
I was measuring the elapsed time using QueryPerformanceCounter and QueryPerformanceFrequency. The elapsed time was about 15ms most of the time, which is expected and documented all over the Internet. However, for a short period of time the waits were taking about 2ms!!! It happened consistently for a few minutes, but now it is back to 15ms. I did not use the timeBeginPeriod() and timeEndPeriod() calls! Then I tried the same app on another machine and the waits are constantly taking about 2ms! Both machines have Windows XP SP2 and the hardware should be identical. Is there something that explains why wait times vary by so much? TIA
Thread.Sleep(0) will let any threads of the same priority execute. Thread.Sleep(1) will let any threads of the same or lower priority execute.
Each thread is given an interval of time to execute in, before the scheduler lets another thread execute. As Billy ONeal states, calling Thread.Sleep will give up the rest of this interval to other threads (subject to the priority considerations above).
Windows balances threads over the entire OS, not just in your process. This means that other threads on the OS can also cause your thread to be pre-empted (i.e. interrupted and the rest of the time interval given to another thread).
There is an article that might be of interest on the topic of Thread.Sleep(x) at:
Priority-induced starvation: Why Sleep(1) is better than Sleep(0) and the Windows balance set manager
Changing the timer's resolution can be done by any process on the system, and the effect is seen globally. See this article on how the Hotspot Java compiler deals with times on windows, specifically:
Note that any application can change the timer interrupt and that it affects the whole system. Windows only allows the period to be shortened, thus ensuring that the shortest requested period by all applications is the one that is used. If a process doesn't reset the period then Windows takes care of it when the process terminates. The reason why the VM doesn't just arbitrarily change the interrupt rate when it starts - it could do this - is that there is a potential performance impact to everything on the system due to the 10x increase in interrupts. However other applications do change it, typically multi-media viewers/players.
The biggest thing Sleep(1) does is give up the rest of your thread's quantum. How long that takes depends entirely upon how much of your thread's quantum remains when you call Sleep.
To aggregate what was said before:
CPU time is assigned in quantums (time slices)
The thread scheduler picks the thread to run. This thread may run for the entire time slice, even if threads of higher priority become ready to run.
Typical time slices are 8..15ms, depending on architecture.
The thread can "give up" the time slice, typically with Sleep(0) or Sleep(1). Sleep(0) allows another thread of the same or higher priority to run for the next time slice. Sleep(1) allows "any" thread.
The time slice is global and can be affected by all processes
Even if you don't change the time slice, someone else could.
Even if the time slice doesn't change, you may "jump" between the two different times.
For simplicity, assume a single core, your thread and another thread X.
If Thread X runs at the same priority as yours, crunching numbers, Your Sleep(1) will take an entire time slice, 15ms being typical on client systems.
If Thread X runs at a lower priority, and gives up its own time slice after 4 ms, your Sleep(1) will take 4 ms.
I would say it just depends on how loaded the CPU is; if there aren't many other processes/threads, it could get back to the calling thread a lot faster.
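If you want to repeat the experiment from the question in a portable way, something along these lines would do. It is a sketch, not the asker's code: std::chrono::steady_clock replaces QueryPerformanceCounter, std::this_thread::sleep_for replaces ::Sleep, and measure_sleep_ms is a name made up here.

```cpp
#include <cassert>
#include <chrono>
#include <thread>
#include <vector>

// Request a sleep of `requested` `samples` times and return how long each
// one actually took, in milliseconds.
std::vector<double> measure_sleep_ms(std::chrono::milliseconds requested, int samples) {
    assert(samples > 0);
    using Clock = std::chrono::steady_clock;
    std::vector<double> elapsed;
    elapsed.reserve(samples);
    for (int i = 0; i < samples; ++i) {
        const auto start = Clock::now();
        std::this_thread::sleep_for(requested);
        const std::chrono::duration<double, std::milli> took = Clock::now() - start;
        elapsed.push_back(took.count());
    }
    return elapsed;
}
```

Running it with a 1 ms request while other programs start and stop should make the effect described above visible: the requested time is only a lower bound, and the actual wait follows whatever timer period and scheduling load the system currently has.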

Best way to slow down a thread? Is using Sleep() OK?

I've written a C++ library that does some seriously heavy CPU work (all of it math and calculations) and if left to its own devices, will easily consume 100% of all available CPU resources (it's also multithreaded to the number of available logical cores on the machine).
As such, I have a callback inside the main calculation loop that software using the library is supposed to call:
while(true)
{
//do math here
callback(percent_complete);
}
In the callback, the client calls Sleep(x) to slow down the thread.
Originally, the clientside code was a fixed Sleep(100) call, but this led to bad unreliable performance because some machines finish the math faster than others, but the sleep is the same on all machines. So now the client checks the system time, and if more than 1 second has passed (which == several iterations), it will sleep for half a second.
Is this an acceptable way of slowing down a thread? Should I be using a semaphore/mutex instead of Sleep() in order to maximize performance? Is sleeping x milliseconds for each 1 second of processing work fine or is there something wrong that I'm not noticing?
The reason I ask is that the machine still gets heavily bogged down even though taskman shows the process taking up ~10% of the CPU. I've already explored hard disk and memory contention to no avail, so now I'm wondering if the way I'm slowing down the thread is causing this problem.
Thanks!
Why don't you use a lower priority for the calculation threads? That will ensure other threads are scheduled while allowing your calculation threads to run as fast as possible if no other threads need to run.
What is wrong with the CPU at 100%? That's what you should strive for, not try to avoid. These math calculations are important, no? Unless you're trying to avoid hogging some other resource not explicitly managed by the OS (a mutex, the disk, etc) and used by the main thread, generally trying to slow your thread down is a bad idea. What about on multicore systems (which almost all systems will be, going forward)? You'd be slowing down a thread for absolutely no reason.
The OS has a concept of a thread quantum. It will take care of ensuring that no important thread on your system is starved. And, as I mentioned, on multicore systems spiking one thread on one CPU does not hurt performance for other threads on other cores at all.
I also see in another comment that this thread is also doing a lot of disk I/O - these operations will already cause your thread to yield while it's waiting for the results, so the sleeps will do nothing.
In general, if you're calling Sleep(x), there is something wrong/lazy with your design, and if x==0, you're opening yourself up to live locks (the thread calling Sleep(0) can actually be rescheduled immediately, making it a noop).
Sleep should be fine for throttling an app, which from your comments is what you're after. Perhaps you just need to be more precise how long you sleep for.
The only software in which I use a feature like this is the BOINC client. I don't know what mechanism it uses, but it's open-source and multi-platform, so help yourself.
It has a configuration option ("limit CPU use to X%"). The way I'd expect to implement that is to use platform-dependent APIs like clock() or GetSystemTimes(), and compare processor time against elapsed wall clock time. Do a bit of real work, check whether you're over or under par, and if you're over par sleep for a while to get back under.
The BOINC client plays nicely with priorities, and doesn't cause any performance issues for other apps even at 100% max CPU. The reason I use the throttle is that otherwise the client runs the CPU flat-out all the time, and drives up the fan speed and noise. So I run it at the level where the fan stays quiet. With better cooling maybe I wouldn't need it :-)
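The "check whether you're over or under par" step boils down to a little arithmetic. This is a sketch of the approach as described above, not BOINC's actual mechanism; sleep_needed_seconds is a made-up name, and how you obtain the CPU and wall-clock times (clock(), GetSystemTimes(), ...) is left to the platform.

```cpp
#include <cassert>

// Given CPU seconds consumed and wall-clock seconds elapsed since the start,
// return how many seconds to sleep so that cpu / wall drops to
// `target_fraction`. Returns 0 when usage is already at or under the target.
double sleep_needed_seconds(double cpu_seconds, double wall_seconds, double target_fraction) {
    assert(target_fraction > 0.0 && target_fraction <= 1.0);
    // Wall-clock time at which the consumed CPU time equals the target share.
    const double wall_at_par = cpu_seconds / target_fraction;
    return wall_at_par > wall_seconds ? wall_at_par - wall_seconds : 0.0;
}
```

For example, after 1 s of CPU time in 1 s of wall time with a 25% limit, the thread would need to sleep 3 s to get back to par.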
Another, not so elaborate, method could be to time one iteration and let the thread sleep for (x * t) milliseconds before the next iteration, where t is the time for one iteration in milliseconds and x is the chosen sleep time fraction (between 0 and 1).
Have a look at cpulimit. It sends SIGSTOP and SIGCONT as required to keep a process below a given CPU usage percentage.
Even still, WTF at "crazy complaints and outlandish reviews about your software killing PC performance". I'd be more likely to complain that your software was slow and not making the best use of my hardware, but I'm not your customer.
Edit: on Windows, SuspendThread() and ResumeThread() can probably produce similar behaviour.

Allocate more processor cycles to my program

I've been working on win32, c,c++ for a while. I code on visual studio. Most of the time I see system idle process uses more cpu utilization. Is there a way to allocate more processor cycles to my program to run it faster? I understand there might be limitations from i/o, in those cases this question doesn't make any sense.
OR
Did I misunderstand the Task Manager numbers? I'm confused, please help me out.
And I want to do something in program itself, btw I will be happy if answers are specific to windows.
Thanks in advance
~calvin
If your program is the only program that has something to do (not waiting for IO), its thread will always be assigned to a processor core.
However, if you have a multi-core processor, and a single-threaded program, the CPU usage of your process displayed in the task manager will always be limited by 100/Ncores.
For example, if you have a quad-core machine, your process will be at 25% (using one core), and the idle process at around 75%. You can only use additional CPU power by dividing your tasks into chunks that can be worked on by separate threads, which will then be run on the idle cores.
The idle process only "runs" when no other process needs to. If you want to use more CPU cycles, then use them.
If your program is idling, it doesn't do anything, i.e. there is nothing that could be done any faster. So the CPU is probably not the bottle-neck in your case.
Are you maybe waiting for data coming from the disk or network?
In case your processor has multiple cores and your program uses only one core to its full extent, making your program multi-threaded could work.
In a multitasking / multithreaded OS the processor time is split among threads.
If you want a specific thread to get bigger time chunk you can set its priority with the SetThreadPriority function, not wise to do it though.
Only special software (should) mess with those settings.
It's common for window applications to have a low cpu usage percent (which we see in the task manager)
because most of the time they just wait for messages.
Use threads to:
abstract away all the I/O waits.
assign work to all cores.
also, remove all sleep-wait states from main thread.
Defer all I/O to a thread, so that wait states are confined within it. Keep the actual computations in the foreground thread, and use synchronization mechanisms that make the I/O thread wait for your main thread when communicating.
If your CPU is multi-core and your problem is parallelizable, create as many threads as you have cores, research "set affinity" functions to assign them to the cores, and still keep a separate thread for all I/O.
Also pay attention not to wait in your main thread - usleep(1) doesn't send you into background for 1 microsecond, but for "no less than..." and that may mean anything between 1ms and 100ms but hardly ever less than that, and never anything close to a microsecond.
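As an illustration of "divide the work into chunks and give each core a thread", here is a sketch that sums a vector in parallel with std::thread. parallel_sum is a made-up example; it leaves out affinity and the separate I/O thread mentioned above, and real workloads rarely split this cleanly.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <numeric>
#include <thread>
#include <vector>

// Sum `data` using `workers` threads, each handling one contiguous slice.
long long parallel_sum(const std::vector<int>& data, unsigned workers) {
    assert(workers > 0);
    std::vector<long long> partial(workers, 0);  // one result slot per thread
    std::vector<std::thread> threads;
    const std::size_t chunk = (data.size() + workers - 1) / workers;
    for (unsigned w = 0; w < workers; ++w) {
        const std::size_t begin = std::min(data.size(), w * chunk);
        const std::size_t end = std::min(data.size(), begin + chunk);
        threads.emplace_back([&data, &partial, w, begin, end] {
            // Each thread writes only its own slot, so no locking is needed.
            partial[w] = std::accumulate(data.data() + begin, data.data() + end, 0LL);
        });
    }
    for (auto& t : threads) t.join();
    return std::accumulate(partial.begin(), partial.end(), 0LL);
}
```

A typical call would pass std::thread::hardware_concurrency() as the worker count (falling back to 1 if it returns 0), so that a CPU-bound job can go past the 100/Ncores ceiling a single thread is stuck at.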