I've looked around a fair amount and can't seem to find what I'm looking for, but let me first stress that I'm not looking for a high-precision sleep function.
Here's the background for the problem I'm trying to solve:
I've made a memory mapping library that operates a lot like a named pipe. You can put bytes into it, get bytes out of it, and query how many bytes are available to read/write, all that good stuff.
It's fast (mostly): processes communicating through it average 4 GB/s if they're passing chunks of 8 KB or larger. Performance drops to around 300 MB/s as the chunk size approaches 512 B.
The problem:
Very occasionally, on heavily loaded servers, very large lag times occur (upwards of 5 s). My running theory for the cause is that when large transfers take place (larger than the size of the mapped memory), the writing process tight-polls, waiting for more space to become available in the circular buffer implemented on top of the memory map. There are no calls to sleep, so the polling process could be hogging the CPU for no good reason! The issue is that even the smallest call to sleep (1 ms) would absolutely demolish performance: the memmap size is 16 KB, so if it slept for 1 ms every 16 KB, performance would drop to a best-case scenario of 16 MB/s.
The solution:
I want a function that I can call that will relinquish the CPU but places no restrictions on when the thread gets rescheduled by the operating system (Windows 7 in this case).
Does anyone know if such a function exists, or has anyone got any bright alternatives?
Thanks.
According to the MSDN documentation, on XP or newer, calling Sleep with a timeout of 0 will yield to other threads of equal priority:
A value of zero causes the thread to relinquish the remainder of its time slice to any other thread of equal priority that is ready to run. If there are no other threads of equal priority ready to run, the function returns immediately, and the thread continues execution.
http://msdn.microsoft.com/en-us/library/windows/desktop/ms686298(v=vs.85).aspx
Another option that requires more work, but will behave more reliably, is to share an event handle between the producer and consumer processes. You can use CreateEvent to create your event and DuplicateHandle to get it into your other process. As the producer fills the buffer, it calls ResetEvent on the event handle and then WaitForSingleObject with it. When the consumer has removed some data from the full shared buffer, it calls SetEvent, which wakes the producer waiting in WaitForSingleObject.
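A minimal sketch of that pattern. To keep the example short it uses a named auto-reset event rather than DuplicateHandle (both processes calling CreateEvent with the same name get handles to the same kernel object); the names here are illustrative, not from any particular library:

#include <windows.h>

// Hypothetical shared event: "space is available in the ring buffer".
// Auto-reset means a successful wait resets it, so no explicit ResetEvent is needed.
HANDLE hSpace = CreateEventW(NULL, FALSE /*auto-reset*/, FALSE, L"Local\\MyMap_SpaceAvailable");

// Producer, when the shared buffer is full: block without spinning.
void WaitForSpace()
{
    WaitForSingleObject(hSpace, INFINITE);  // consumes no CPU while waiting
}

// Consumer, after draining some bytes: wake the producer.
void SignalSpace()
{
    SetEvent(hSpace);
}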
std::this_thread::yield() probably does what you want. I believe it just calls Sleep with 0 in most implementations.
You need the SwitchToThread() function (which will only relinquish its time slice if something else can run), not Sleep(0) (which would relinquish its time slice even if nothing else can run).
If you're writing code that's designed to take advantage of hyperthreading, YieldProcessor might do something for you too, but I doubt that'll be helpful.
You're incorrectly assuming a binary choice: right now you always busy-wait, on the assumption that sleeping would always be a bad idea.
The better solution is to try a few times without sleeping. If that still fails (because the map is full, and the other thread isn't running), then you can issue a true sleep. This will be sufficiently rare that on average you'll be sleeping microseconds. You could even check the realtime clock (RDTSC) to determine how long you've spent busy-waiting before surrendering your timeslice.
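A sketch of that escalation strategy, with hypothetical names for the buffer state (the spin thresholds are illustrative and would need tuning):

#include <windows.h>

// Hypothetical: wait until the ring buffer has 'needed' free bytes.
void WaitForSpace(volatile LONG* bytesFree, LONG needed)
{
    int spins = 0;
    while (*bytesFree < needed)
    {
        if (spins < 1000)
            YieldProcessor();   // cheap pause hint; the consumer usually catches up fast
        else if (spins < 2000)
            SwitchToThread();   // give up the timeslice if another thread can run
        else
            Sleep(1);           // rare worst case: truly get off the CPU
        ++spins;
    }
}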
If you're operating under .Net, you can look into the Thread::Yield() method.
It may or may not help with your specific scenario, but it's the correct way to notify the scheduler that you want to relinquish the remainder of your timeslice.
If you're running in a pre-.Net environment (seems unlikely if you're on Windows 7), you can look into the Win32 SwitchToThread() function instead.
Related
I've created a multi-threaded application using C++ and POSIX threads, in which I now need to block a thread (the main thread) until a boolean flag is set (becomes true).
I've found two ways to get this done.
Spinning through a loop without sleep.
while(!flag);
Spinning through a loop with sleep.
while(!flag){
    sleep(some_int);
}
If the first way works, why do some people write code the second way? If the second way should be used, why should we make the current thread sleep, and what are the disadvantages of that approach?
The first option (a "busy wait") wastes an entire core for the duration of the wait, preventing other useful work from being done and/or wasting energy.
The second option is less wasteful - your waiting thread uses very little CPU and allows other threads to run. But it is still wasteful to keep switching back to the thread to check the flag.
Far better than either would be to use a condition variable, which allows the waiting thread to block without consuming any resources until it is able to proceed.
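A minimal C++11 sketch of the condition-variable approach (with POSIX threads directly, pthread_cond_wait/pthread_cond_signal give the same structure):

#include <condition_variable>
#include <mutex>

std::mutex m;
std::condition_variable cv;
bool flag = false;

// Main thread: blocks without consuming CPU until the flag is set.
void wait_for_flag()
{
    std::unique_lock<std::mutex> lock(m);
    cv.wait(lock, []{ return flag; });  // predicate form handles spurious wakeups
}

// Worker thread: set the flag under the lock, then wake the waiter.
void set_flag()
{
    {
        std::lock_guard<std::mutex> lock(m);
        flag = true;
    }
    cv.notify_one();
}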
while(!flag); will cause your thread to use all of its allocated time checking the condition. This wastes a lot of CPU cycles checking something which has likely not changed.
Sleeping for a bit causes the thread to pause and give up the CPU to programs that actually need it.
You shouldn't do either though; you should use a threading library to create a flag object and call its wait function, so that the kernel will pause the thread until the flag is set.
The first way (just the plain while) is wasting resources, specifically the processor time of your process.
When a thread is put to sleep, the OS may decide that the processor will be used for different tasks (we're talking about systems with preemptive multitasking). In theory, if you had as many processors/cores as threads, there would not have to be any difference.
Whether a solution is good or not depends on the operating system used, and sometimes on the architecture the program is running on. You should consult your syscall reference to find out more about this.
I know that Sleep() is not accurate, but is there's a way to make it not sleep for more than 10 ms (i.e. only sleep between 1 ms and 10 ms)? Or does Sleep(1) already guarantee that?
If you really want guaranteed timings, you will not be using Windows at all.
To answer your question, Sleep() does not provide any means of guaranteeing an upper bound on the sleep time.
In Windows, this is because Sleep() relinquishes the thread's time slice, and it is not guaranteed that the system scheduler will schedule the sleeping thread (i.e. allocate another time slice) to execute immediately after the sleep time is up. That depends on the priorities of competing threads, scheduling policies, and things like that.
In practice, the actual sleep interval depends a lot on what other programs are running on the system, configuration of the system, whether other programs are accessing slow drives, etc etc.
With a lightly loaded system, it is a fair bet Sleep(1) will sleep between 1 and 2 ms on any modern machine (GHz-frequency CPU or better). However, it is not impossible for your program to experience greater delays.
With a heavily loaded system (lots of other programs executing, using CPU and timer resources), it is a fair bet your program will experience substantially greater delays than 1ms, and even more than 10ms.
In short: no guarantees.
There is no way to guarantee it.
This is what real-time OSes are for.
In the general case, if your OS isn't under high load, sleep will be pretty accurate, but the more load you put on the system, the less accurate sleep becomes.
No. Or, yes, depending on your perspective.
According to the documentation:
After the sleep interval has passed, the thread is ready to run. If you specify 0 milliseconds, the thread will relinquish the remainder of its time slice but remain ready. Note that a ready thread is not guaranteed to run immediately. Consequently, the thread may not run until some time after the sleep interval elapses. For more information, see Scheduling Priorities.
What this means is that the problem isn't Sleep. Rather, when Sleep ends, your thread may still need to wait to become active again.
You cannot count on 10 milliseconds, that's too low. Sleep() accuracy is affected by:
The clock tick interrupt frequency. In general, the processor tends to be in a quiescent state, not consuming any power, idled by the HLT instruction. It is dead to the world, unaware that time is passing and unaware that your sleep interval has expired. A periodic hardware interrupt generated by the chipset wakes it up and makes it pay attention again. By default, this interrupt is generated 64 times per second, or once every 15.625 milliseconds.
The thread scheduler runs at every clock interrupt. It is the one that notices that your sleep interval has expired, it will put the thread back into the ready-to-run state. And boosts its priority so that it is more likely to acquire a processor core. It will do so when no other threads with higher priority are ready to run.
There isn't much you can do about the 2nd bullet; you have to compete with everybody else and take your fair share. If the thread does a lot of sleeping and little computation, then it is not unreasonable to claim more than your fair share: call SetThreadPriority() to boost your base priority and make it more likely that your sleep interval is accurate. If that isn't good enough, then the only way to claim a priority high enough to always beat everybody else is by writing ring 0 code, a driver.
You can mess with the 1st bullet; it is pretty common to do so, and also the reason why many programmers think the default accuracy is 10 msec. Or, if they use Chrome, that it might be 1 msec; that browser jacks up the interrupt rate sky-high. A fairly unreasonable thing to do, bad for power consumption, unless you are in the business of making your mobile operating system products look good :)
Call timeBeginPeriod() before you need your sleep intervals to be short, and timeEndPeriod() when you're done. Use NtSetTimerResolution() if you need to go lower than 1 msec.
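A minimal sketch of that pairing (link with winmm.lib):

#include <windows.h>
#include <mmsystem.h>  // timeBeginPeriod / timeEndPeriod

void PreciseSleep()
{
    timeBeginPeriod(1);  // request a 1 ms clock tick instead of the default 15.625 ms
    Sleep(1);            // now typically wakes after ~1-2 ms instead of ~15 ms
    timeEndPeriod(1);    // always pair with timeBeginPeriod, with the same argument
}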
Sleep won't guarantee that.
The only way I know of doing that is to have a thread wait for a fast timer event and free a synchronization object every 10 ms or so.
You will pass a semaphore to this "wait server task", and it will free it on the next timer tick, thus giving you a response time between 0 and 10 ms.
Of course, if you want extreme precision you will have to boost this thread's priority above other tasks that might preempt it, and at any rate you might still be preempted by system processes and/or interrupt handlers, which will add some noise to your timer.
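A rough sketch of such a wait server, using a periodic waitable timer and a semaphore (the names and the 10 ms period are illustrative):

#include <windows.h>

HANDLE g_tick = CreateSemaphoreW(NULL, 0, 1, NULL);

DWORD WINAPI WaitServer(LPVOID)
{
    HANDLE timer = CreateWaitableTimerW(NULL, FALSE, NULL);
    LARGE_INTEGER due;
    due.QuadPart = -100000;  // first fire in 10 ms (negative = relative, 100 ns units)
    SetWaitableTimer(timer, &due, 10 /*period in ms*/, NULL, NULL, FALSE);
    for (;;)
    {
        WaitForSingleObject(timer, INFINITE);  // wake on every 10 ms tick
        ReleaseSemaphore(g_tick, 1, NULL);     // free one waiter (fails harmlessly if none)
    }
}

// Caller: blocks until the next tick, i.e. between 0 and ~10 ms.
void BoundedWait()
{
    WaitForSingleObject(g_tick, INFINITE);
}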
While profiling my code to find out what is slow, I have 3 functions that are apparently taking forever; well, that's what Very Sleepy says.
These functions are:
ZwDelayExecution 20.460813 20.460813 19.987685 19.987685
MsgWaitForMultipleObjects 20.460813 20.460813 19.987685 19.987685
WaitForSingleObject 20.361805 20.361805 19.890967 19.890967
Can anybody tell me what these functions are, why they are taking so long, and how to fix this?
Thanks
Those functions are probably being used to make threads sleep via the Win32 API. They might also be used for thread synchronization, so check for that.
They show up with so much time because they are designed to wait.
The WaitForSingleObject function can wait for the following objects:
Change notification
Console input
Event
Memory resource notification
Mutex
Process
Semaphore
Thread
Waitable timer
So another possible use is waiting for console user input.
ZwDelayExecution is an internal function of Windows. As can be seen, it is used to implement the Sleep function. Here is the call stack for Sleep so you can see it with your own eyes:
0 ntdll.dll ZwDelayExecution
1 kernel32.dll SleepEx
2 kernel32.dll Sleep
It probably uses low-level features to achieve this, allowing it to delay a thread with a precision of 100 ns.
MsgWaitForMultipleObjects has a goal similar to WaitForSingleObject's.
Judging on the names, all 3 functions seem to block, so they take a long time because they are designed to do so, but they shouldn't use any CPU while waiting.
One of the first steps should always be to check the documentation:
WaitForSingleObject:
http://msdn.microsoft.com/en-us/library/windows/desktop/ms687032.aspx
Waits for an object like a thread, process, mutex.
MsgWaitForMultipleObjects:
http://msdn.microsoft.com/en-us/library/windows/desktop/ms684242.aspx
Simply waits for multiple objects, just like WaitForSingleObject.
ZwDelayExecution:
There doesn't seem to be documentation for ZwDelayExecution, but I think it is an internal method which gets called when you call Sleep.
Anyway, the name already reveals part of it. "Wait" and "Delay"-functions are supposed to take time. If you want to reduce the waiting time you have to find out what is calling these functions.
To give you an example:
If you start a new thread and then wait for it to finish in your main thread, you will call WaitForSingleObject one way or another in WINAPI programming. It doesn't even have to be you who starts the thread; it could be the runtime itself. The function will wait until the thread finishes. Therefore it will take time and block the program in WaitForSingleObject until the thread is done or a timeout occurs. This is nothing bad; this is intended behaviour.
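For example, a minimal sketch of exactly that situation:

#include <windows.h>

DWORD WINAPI Worker(LPVOID)
{
    // ... do the actual work ...
    return 0;
}

int main()
{
    HANDLE h = CreateThread(NULL, 0, Worker, NULL, 0, NULL);
    // The main thread blocks here until Worker returns. A profiler will
    // attribute this wall-clock time to WaitForSingleObject, but no CPU
    // is consumed while waiting.
    WaitForSingleObject(h, INFINITE);
    CloseHandle(h);
    return 0;
}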
Before you start zooming in on these functions, you might first want to determine what kind of slowness your program is suffering from. It is pretty normal for a Windows program to have one or more threads spending most of their time in blocking functions.
You would first need to determine whether your actual critical thread is CPU bound. In that case you don't want to zoom in on the functions that take a lot of wall-clock time; you want to find the functions that take CPU time.
I don't have much experience with Very Sleepy, but IIRC it is a sampling profiler, and those are typically not so good at measuring CPU usage.
Only after you've determined that your program is not CPU bound, then you should zoom in on the functions that wait a lot.
I was testing how long various Win32 API calls actually wait when asked to wait for 1 ms. I tried:
::Sleep(1)
::WaitForSingleObject(handle, 1)
::GetQueuedCompletionStatus(handle, &bytes, &key, &overlapped, 1)
I was measuring the elapsed time using QueryPerformanceCounter and QueryPerformanceFrequency. The elapsed time was about 15 ms most of the time, which is expected and documented all over the Internet. However, for a short period of time the waits were taking about 2 ms! That happened consistently for a few minutes, but now it is back to 15 ms. I did not use timeBeginPeriod() and timeEndPeriod() calls! Then I tried the same app on another machine and the waits consistently take about 2 ms! Both machines have Windows XP SP2 and the hardware should be identical. Is there something that explains why wait times vary by so much? TIA
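For reference, a minimal sketch of the kind of measurement described above:

#include <windows.h>
#include <stdio.h>

int main()
{
    LARGE_INTEGER freq, t0, t1;
    QueryPerformanceFrequency(&freq);
    for (int i = 0; i < 10; ++i)
    {
        QueryPerformanceCounter(&t0);
        Sleep(1);
        QueryPerformanceCounter(&t1);
        printf("Sleep(1) took %.3f ms\n",
            (t1.QuadPart - t0.QuadPart) * 1000.0 / freq.QuadPart);
    }
    return 0;
}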
Thread.Sleep(0) will let any threads of the same priority execute. Thread.Sleep(1) will let any threads of the same or lower priority execute.
Each thread is given an interval of time to execute in, before the scheduler lets another thread execute. As Billy ONeal states, calling Thread.Sleep will give up the rest of this interval to other threads (subject to the priority considerations above).
Windows balances threads over the entire OS, not just in your process. This means that other threads on the OS can also cause your thread to be pre-empted (i.e. interrupted, with the rest of the time interval given to another thread).
There is an article that might be of interest on the topic of Thread.Sleep(x) at:
Priority-induced starvation: Why Sleep(1) is better than Sleep(0) and the Windows balance set manager
Changing the timer's resolution can be done by any process on the system, and the effect is seen globally. See this article on how the Hotspot Java compiler deals with times on windows, specifically:
Note that any application can change the timer interrupt and that it affects the whole system. Windows only allows the period to be shortened, thus ensuring that the shortest requested period by all applications is the one that is used. If a process doesn't reset the period then Windows takes care of it when the process terminates. The reason why the VM doesn't just arbitrarily change the interrupt rate when it starts - it could do this - is that there is a potential performance impact to everything on the system due to the 10x increase in interrupts. However other applications do change it, typically multi-media viewers/players.
The biggest thing Sleep(1) does is give up the rest of your thread's quantum. How much that is depends entirely upon how much of the quantum remains when you call Sleep.
To aggregate what was said before:
CPU time is assigned in quantums (time slices)
The thread scheduler picks the thread to run. This thread may run for the entire time slice, even if threads of higher priority become ready to run.
Typical time slices are 8..15ms, depending on architecture.
The thread can "give up" the time slice, typically via Sleep(0) or Sleep(1). Sleep(0) allows another thread of the same or higher priority to run for the next time slice. Sleep(1) allows "any" thread.
The time slice is global and can be affected by all processes
Even if you don't change the time slice, someone else could.
Even if the time slice doesn't change, you may "jump" between the two different wait times.
For simplicity, assume a single core, your thread and another thread X.
If Thread X runs at the same priority as yours, crunching numbers, your Sleep(1) will take an entire time slice, 15 ms being typical on client systems.
If Thread X runs at a lower priority, and gives up its own time slice after 4 ms, your Sleep(1) will take 4 ms.
I would say it just depends on how loaded the CPU is; if there aren't many other processes/threads, it could get back to the calling thread a lot faster.
I've written a C++ library that does some seriously heavy CPU work (all of it math and calculations) and if left to its own devices, will easily consume 100% of all available CPU resources (it's also multithreaded to the number of available logical cores on the machine).
As such, I have a callback inside the main calculation loop that software using the library is supposed to supply:
while(true)
{
    // do math here
    callback(percent_complete);
}
In the callback, the client calls Sleep(x) to slow down the thread.
Originally, the client-side code was a fixed Sleep(100) call, but this led to unreliable performance because some machines finish the math faster than others, yet the sleep is the same on all machines. So now the client checks the system time, and if more than 1 second has passed (which == several iterations), it sleeps for half a second.
Is this an acceptable way of slowing down a thread? Should I be using a semaphore/mutex instead of Sleep() in order to maximize performance? Is sleeping x milliseconds for each 1 second of processing work fine or is there something wrong that I'm not noticing?
The reason I ask is that the machine still gets heavily bogged down even though taskman shows the process taking up ~10% of the CPU. I've already explored hard disk and memory contention to no avail, so now I'm wondering if the way I'm slowing down the thread is causing this problem.
Thanks!
Why don't you use a lower priority for the calculation threads? That will ensure other threads are scheduled while allowing your calculation threads to run as fast as possible if no other threads need to run.
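A minimal sketch of what that looks like from inside one of the calculation threads:

#include <windows.h>

DWORD WINAPI CalcThread(LPVOID)
{
    // A below-normal thread still gets 100% of an otherwise idle core,
    // but steps aside as soon as a normal-priority thread becomes ready.
    SetThreadPriority(GetCurrentThread(), THREAD_PRIORITY_BELOW_NORMAL);
    // ... math loop ...
    return 0;
}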
What is wrong with the CPU at 100%? That's what you should strive for, not try to avoid. These math calculations are important, no? Unless you're trying to avoid hogging some other resource not explicitly managed by the OS (a mutex, the disk, etc) and used by the main thread, generally trying to slow your thread down is a bad idea. What about on multicore systems (which almost all systems will be, going forward)? You'd be slowing down a thread for absolutely no reason.
The OS has a concept of a thread quantum. It will take care of ensuring that no important thread on your system is starved. And, as I mentioned, on multicore systems spiking one thread on one CPU does not hurt performance for other threads on other cores at all.
I also see in another comment that this thread is also doing a lot of disk I/O - these operations will already cause your thread to yield while it's waiting for the results, so the sleeps will do nothing.
In general, if you're calling Sleep(x), there is something wrong/lazy with your design, and if x==0, you're opening yourself up to live locks (the thread calling Sleep(0) can actually be rescheduled immediately, making it a noop).
Sleep should be fine for throttling an app, which from your comments is what you're after. Perhaps you just need to be more precise about how long you sleep for.
The only software in which I use a feature like this is the BOINC client. I don't know what mechanism it uses, but it's open-source and multi-platform, so help yourself.
It has a configuration option ("limit CPU use to X%"). The way I'd expect to implement that is to use platform-dependent APIs like clock() or GetSystemTimes(), and compare processor time against elapsed wall clock time. Do a bit of real work, check whether you're over or under par, and if you're over par sleep for a while to get back under.
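A rough sketch of that idea on Windows, comparing this thread's CPU time to elapsed wall time and sleeping when over par (the helper name and target fraction are illustrative, not BOINC's actual mechanism):

#include <windows.h>

// Call periodically from the work loop. target is the desired CPU share (0..1).
void ThrottleTo(double target)
{
    static ULONGLONG wallStart = GetTickCount64();
    FILETIME createT, exitT, kernelT, userT;
    GetThreadTimes(GetCurrentThread(), &createT, &exitT, &kernelT, &userT);
    // User-mode CPU time in 100 ns units (kernel time ignored for brevity).
    ULONGLONG cpu100ns = ((ULONGLONG)userT.dwHighDateTime << 32) | userT.dwLowDateTime;
    double cpuMs  = cpu100ns / 10000.0;
    double wallMs = (double)(GetTickCount64() - wallStart);
    if (wallMs > 0.0 && cpuMs / wallMs > target)
        Sleep((DWORD)(cpuMs / target - wallMs));  // sleep just enough to get back under par
}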
The BOINC client plays nicely with priorities, and doesn't cause any performance issues for other apps even at 100% max CPU. The reason I use the throttle is that otherwise the client runs the CPU flat-out all the time, which drives up the fan speed and noise. So I run it at the level where the fan stays quiet. With better cooling maybe I wouldn't need it :-)
Another, less elaborate, method could be to time one iteration and let the thread sleep for (x * t) milliseconds before the next iteration, where t is the time in milliseconds for one iteration and x is the chosen sleep-time fraction (between 0 and 1).
Have a look at cpulimit. It sends SIGSTOP and SIGCONT as required to keep a process below a given CPU usage percentage.
Even still, WTF at "crazy complaints and outlandish reviews about your software killing PC performance". I'd be more likely to complain that your software was slow and not making the best use of my hardware, but I'm not your customer.
Edit: on Windows, SuspendThread() and ResumeThread() can probably produce similar behaviour.
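A rough sketch of that duty-cycle idea, assuming a controlling thread holds a handle to the worker (note that SuspendThread is risky if the worker can be suspended while holding a lock):

#include <windows.h>

// Keep 'worker' at roughly 10% CPU: run ~10 ms out of every ~100 ms.
void DutyCycle(HANDLE worker)
{
    for (;;)
    {
        ResumeThread(worker);
        Sleep(10);              // let it run for ~10 ms
        SuspendThread(worker);
        Sleep(90);              // keep it suspended for ~90 ms
    }
}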