I developed a class for calculations on multithreads and only one instance of this class is used by a thread. Also I want to measure the duration of calculations by iterating over a container of this class from another thread. The application is win32. The thing is I have read QueryPerformanceCounter is useful when comparing the measuremnts on a single thread. Because I can not use it my problem, I think of clock() or GetSystemTime(). It is sad that both methods have a 'resolution' of milliseconds (since CLOCKS_PER_SEC is 1000 on win32). Which method should I use or to generalize, is there a better option for me?
As a rule I have to take the measurements outside the working thread.
Here is some code as an example.
unsinged long GetCounter()
{
SYSTEMTIME ww;
GetSystemTime(&ww);
return ww.wMilliseconds + 1000 * ww.wSeconds;
// or
return clock();
}
class WorkClass
{
bool is_working;
unsigned long counter;
HANDLE threadHandle;
public:
DoWork()
{
threadHandle = GetCurrentThread();
is_working = true;
counter = GetCounter();
// Do some work
is_working = false;
}
};
void CheckDurations() // will work on another thread;
{
for(size_t i =0;i < vector_of_workClass.size(); ++i)
{
WorkClass & wc = vector_of_workClass[i];
if(wc.is_working)
{
unsigned long dur = GetCounter() - wc.counter;
ReportDuration(wc,dur);
if( dur > someLimitValue)
TerminateThread(wc.threadHandle);
}
}
}
QueryPerformanceCounter is fine for multithreaded applications. The processor instruction that may be used (rdtsc) can potentially provide invalid results when called on different processors.
I recommend reading "Game Timing and Multicore Processors".
For your specific application, the problem it appears you are trying to solve is using a timeout on some potentially long-running threads. The proper solution to this would be to use the WaitForMultipleObjects function with a timeout value. If the time expires, then you can terminate any threads that are still running - ideally by setting a flag that each thread checks, but TerminateThread may be suitable.
both methods have a precision of milliseconds
They don't. They have a resolution of a millisecond, the precision is far worse. Most machines increment the value only at intervals of 15.625 msec. That's a heckofalot of CPU cycles, usually not good enough to get any reliable indicator of code efficiency.
QPF does much better, no idea why you couldn't use it. A profiler is a the standard tool to measure code efficiency. Beats taking dependencies you don't want.
QueryPerformanceCounter should give you the best precision, but there is issues when the function get run on different processors (you get a different result for each processor). So when running in a thread you will experience shifts when the thread switch processor. To solve this you can set processor affinity for the thread that measures time.
GetSystemTime gets an absolute time, clock is a relative time but both measure elapsed time, not CPU time related to the actual thread/process.
Of course clock() is more portable. Having said that I use clock_gettime on Linux because I can get both elapsed and thread CPU time with that call.
boost has some time functions that you could use that will run on multiple platforms if you want platform independent code.
Related
I need to execute some function accurate in 20 milliseconds (for RTP packets sending) after some event. I have tried next variants:
std::this_thread::sleep_for(std::chrono::milliseconds(20));
boost::this_thread::sleep_for(std::chrono::milliseconds(20));
Sleep(20);
Also different perversions as:
auto a= GetTickCount();
while ((GetTickCount() - a) < 20) continue;
Also tried micro and nanoseconds.
All this methods have error in range from -6ms to +12ms but its not acceptable. How to make it work right?
My opinion, that +-1ms is acceptable, but no more.
UPDATE1: to measure time passed i use std::chrono::high_resolution_clock::now();
Briefly, because of how OS kernels manage time and threads, you won't get accuracy much better with that method. Also, you can't rely on sleep alone with a static interval or your stream will quickly drift off your intended send clock rate, because the thread could be interrupted or it could be scheduled again well after your sleep time... for this reason you should check the system clock to know how much to sleep for at each iteration (i.e. somewhere between 0ms and 20ms). Without going into too much detail, this is also why there's a jitter buffer in RTP streams... to account for variations in packet reception (due to network jitter or send jitter). Because of this, you likely won't need +/-1ms level accuracy anyway.
Using std::chrono::steady_clock, I got about 0.1ms accuracy on windows 7.
That is, simply:
auto a = std::chrono::steady_clock::now();
while ((std::chrono::steady_clock::now() - a) < WAIT_TIME) continue;
This should give you accurate "waiting" (about 0.1ms, as I said), at least. We all know that this kind of waiting is "ugly" and should be avoided, but it's a hack that might still do the trick just fine.
You could use high_resolution_clock, which might give even better accuracy for some systems, but it is not guaranteed not to be adjusted by the OS, and you don't want that. steady_clock is supposed to be guaranteed not to be adjusted, and often has the same accuracy as high_resolution_clock.
As for "sleep()" functions that are very accurate, I don't know. Perhaps someone else knows more about that.
In C we have a nanosleep function in time.h.
The nanosleep() function causes the current thread to be suspended from execution until either the time interval specified by the rqtp argument has elapsed or a signal is delivered to the calling thread and its action is to invoke a signal-catching function or to terminate the process.
This below program sleeps for 20 milli seconds.
int main()
{
struct timespec tim, tim2;
tim.tv_sec = 0;
tim.tv_nsec =20000000;//20 milliseconds converted to nano seconds
if(nanosleep(&tim , NULL) < 0 )
{
printf("Nano sleep system call failed \n");
return -1;
}
printf("Nano sleep successfull \n");
return 0;
}
Recently i've been trying to create a wait function that waits for 25 ms using the wall clock as reference. I looked around and found "gettimeofday", but i've been having problems with it. My code (simplified):
while(1)
{
timeval start, end;
double t_us;
bool release = false;
while (release == false)
{
gettimeofday(&start, NULL);
DoStuff();
{
gettimeofday(&end, NULL);
t_us = ( (end.tv_sec - start.tv_sec) * 1000*1000) + (end.tv_usec - start.tv_usec);
if (t_us >= 25000) //25 ms
{
release = true;
}
}
}
}
This code runs in a thread (Posix) and, on it's its own, works fine. DoStuff() is called every 25ms. It does however eat all the CPU if it can (as you might expect) so obviously this isn't a good idea.
When I tried throttling it by adding a Sleep(1); in the wait loop after the if statement, the entire thing slows by about 50% (that is, it called DoStuff every 37ms or so. This makes no sense to me - assuming DoStuff and any other threads complete their tasks in under (25 - 1) ms the called rate of DoStuff shouldn't be affected (allowing a 1ms error margin)
I also tried Sleep(0), usleep(1000) and usleep(0) but the behaviour is the same.
The same behaviour occurs whenever another higher priority thread needs CPU time (without the sleep). It's as if the clock stops counting when the thread reliqnuishes runtime.
I'm aware that gettimeofday is vulnerable to things like NTP updates etc... so I tried using clock_gettime but linking -ltr on my system causes problems so i don't think that is an option.
Does anyone know what i'm doing wrong?
The part that's missing here is how the kernel does thread scheduling based on time slices. In rough numbers, if you sleep at the beginning of your time slice for 1ms and the scheduling is done on a 35ms clock rate, your thread may not execute again for 35ms. If you sleep for 40ms, your thread may not execute again for 70ms. You can't really change that without changing the scheduling, but that's not recommended due to overall performance implications of the system. You could use a "high-resolution" timer, but often that's implemented in a tight cycle-wasting loop of "while it's not time yet, chew CPU" so that's not really desirable either.
If you used a high-resolution clock and queried it frequently inside of your DoStuff loop, you could potentially play some tricks like run for 30ms, then do a sleep(1) which could effectively relinquish your thread for the remainder of your timeslice (e.g. 5ms) to let other threads run. Kind of a cooperative/preemptive multitasking if you will. It's still possible you don't get back to work for an extended period of time though...
All variants of sleep()/usleep() involve yielding the CPU to other runnable tasks. Your programm can then run only after it is rescheduled by the kernel, which seems to last about 37 ms in your case.
I'm using QueryPerformanceCounter to do some timing in my application. However, after running it for a few days the application seems to stop functioning properly. If I simply restart the application it starts working again. This makes me a believe I have an overflow problem in my timing code.
// Author: Ryan M. Geiss
// http://www.geisswerks.com/ryan/FAQS/timing.html
class timer
{
public:
timer()
{
QueryPerformanceFrequency(&freq_);
QueryPerformanceCounter(&time_);
}
void tick(double interval)
{
LARGE_INTEGER t;
QueryPerformanceCounter(&t);
if (time_.QuadPart != 0)
{
int ticks_to_wait = static_cast<int>(static_cast<double>(freq_.QuadPart) * interval);
int done = 0;
do
{
QueryPerformanceCounter(&t);
int ticks_passed = static_cast<int>(static_cast<__int64>(t.QuadPart) - static_cast<__int64>(time_.QuadPart));
int ticks_left = ticks_to_wait - ticks_passed;
if (t.QuadPart < time_.QuadPart) // time wrap
done = 1;
if (ticks_passed >= ticks_to_wait)
done = 1;
if (!done)
{
// if > 0.002s left, do Sleep(1), which will actually sleep some
// steady amount, probably 1-2 ms,
// and do so in a nice way (cpu meter drops; laptop battery spared).
// otherwise, do a few Sleep(0)'s, which just give up the timeslice,
// but don't really save cpu or battery, but do pass a tiny
// amount of time.
if (ticks_left > static_cast<int>((freq_.QuadPart*2)/1000))
Sleep(1);
else
for (int i = 0; i < 10; ++i)
Sleep(0); // causes thread to give up its timeslice
}
}
while (!done);
}
time_ = t;
}
private:
LARGE_INTEGER freq_;
LARGE_INTEGER time_;
};
My question is whether the code above should work deterministically for weeks of running continuously?
And if not where the problem is? I thought the overflow was handled by
if (t.QuadPart < time_.QuadPart) // time wrap
done = 1;
But maybe thats not enough?
EDIT: Please observe that I did not write the original code, Ryan M. Geiss did, the link to the original source of the code is in the code.
QueryPerformanceCounter is notorious for its unreliability. It's fine to use for individual short-interval timing, if you're prepared to handle abnormal results. It is not exact - It's typically based on the PCI bus frequency, and a heavily loaded bus can lead to lost ticks.
GetTickCount is actually more stable, and can give you 1ms resolution if you've called timeBeginPeriod. It will eventually wrap, so you need to handle that.
__rdtsc should not be used, unless you're profiling and have control of which core you're running on and are prepared to handle variable CPU frequency.
GetSystemTime is decent for longer periods of measurements, but will jump when the system time is adjusted.
Also, Sleep(0) does not do what you think it does. It will yield the cpu if another context wants it - otherwise it'll return immediately.
In short, timing on windows is a mess. One would think that today it'd be possible to get accurate long-term timing from a computer without going through hoops - but this isn't the case. In our game framework we're using several time sources and corrections from the server to ensure all connected clients have the same game time, and there's a lot of bad clocks out there.
Your best bet would likely be to just use GetTickCount or GetSystemTime, wrap it into something that adjusts for time jumps/wrap arounds.
Also, you should convert your double interval to an int64 milliseconds and then use only integer math - this avoids problems due to floating point types' varying accuracy based on their contents.
Based on your comment, you probably should be using Waitable Timers instead.
See the following examples:
Using Waitable Timer Objects
Using Waitable Timers with an Asynchronous Procedure Call
Performance counters are 64-bit, so they are large enough for years of running continuously. For example, if you assume the performance counter increments 2 billion times each second (some imaginary 2 GHz processor) it will overflow in about 290 years.
Using a nanosecond-scale timer to control something like Sleep() that at best is precise to several milliseconds (and usually, several dozen milliseconds) is somewhat controversary anyway.
A different approach you might consider would be to use WaitForSingleObject or a similar function. This burns less CPU cycles, causes a trillion fewer context switches over the day, and is more reliable than Sleep(0), too.
You could for example create a semapore and never touch it in normal operation. The semaphore exists only so you can wait on something, if you don't have anything better to wait on. Then you can specify a timeout in milliseconds up to 49 days long with a single syscall. And, it will not only be less work, it will be much more accurate too.
The advantage is that if "something happens", so you want to break up earlier than that, you only need to signal the semaphore. The wait call will return instantly, and you will know from the WAIT_OBJECT_0 return value that it was due to being signaled, not due to time running out. And all that without complicated logic and counting cycles.
The problem you asked about most directly:
if (t.QuadPart < time_.QuadPart)
should instead be this:
if (t.QuadPart - time_.QuadPart < 0)
The reason for that is that you want to look for wrapping in relative time, not absolute time. Relative time will wrap (1ull<<63) time units after the reference call to QPC. Absolute time might wrap (1ull<<63) time units after reboot, but it could wrap at any other time it felt like it, that's undefined.
QPC is a little bugged on some systems (older RDTSC-based QPCs on early multicore CPUs, for instance) so it may be desirable to allow small negative time deltas like so:
if (t.QuadPart - time_.QuadPart < -1000000) //time wrap
An actual wrap will produce a very large negative time deltas, so that's safe. It shouldn't be necessary on modern systems, but trusting microsoft is rarely a good idea.
...
However, the bigger problem there with time wrapping is in the fact that ticks_to_wait, ticks_passed, and ticks_left are all int, not LARGE_INT or long long like they should be. This makes most of that code wrap if any significant time periods are involved - and "significant" in this context is platform dependent, it can be on the order of 1 second in a few (rare these days) cases, or even less on some hypothetical future system.
Other issues:
if (time_.QuadPart != 0)
Zero is not a special value there, and should not be treated as such. My guess is that the code is conflating QPC returning a time of zero with QPCs return value being zero. The return value is not the 64 bit time passed by pointer, it's the BOOL that QPC actually returns.
Also, that loop of Sleep(0) is foolish - it appears to be tuned to behave correctly only on a particular level of contention and a particular per-thread CPU performance. If you need resolution that's a horrible idea, and if you don't need resolution then that entire function should have just been a single call to Sleep.
I am running some profiling tests, and usleep is an useful function. But while my program is sleeping, this time does not appear in the profile.
eg. if I have a function as :
void f1() {
for (i = 0; i < 1000; i++)
usleep(1000);
}
With profile tools as gprof, f1 does not seems to consume any time.
What I am looking is a method nicer than an empty while loop for doing an active sleep, like:
while (1) {
if (gettime() == whatiwant)
break;
}
What kind of a system are you on? In UNIX-like systems you can use setitimer() to send a signal to a process after a specified period of time. This is the facility you would need to implement the type of "active sleep" you're looking for.
Set the timer, then loop until you receive the signal.
Because when you call usleep the CPU is put to work to something else for 1 second. So the current thread does not use any processor resources, and that's a very clever thing to do.
An active sleep is something to absolutely avoid because it's a waste of resources (ultimately damaging the environment by converting electricity to heat ;) ).
Anyway if you really want to do that you must give some real work to do to the processor, something that will not be factored out by compiler optimizations. For example
for (i = 0; i < 1000; i++)
time(NULL);
I assume you want to find out the total amount of time (wall-clock time, real-world time, the time you are sitting watching your app run) f1() is taking, as opposed to CPU time. I'd investigate to see if gprof can give you a wall-clock-time instead of a processing-time.
I imagine it depends upon your OS, but the reason you aren't seeing usleep as taking any process time in the profile is because it technically isn't using any during that time - other running processes are (assuming this is running on a *nix platform).
for (int i = i; i < SOME_BIG_NUMBER; ++i);
The entire point in "sleep" functions is that your application is not running. It is put in a sleep queue, and the OS transfers control to another process. If you want your application to run, but do nothing, an empty loop is a simple solution. But you lose all the benefits of sleep (letting other applications run, saving CPU usage/power consumption)
So what you're asking makes no sense. You can't have your application sleep, but still be running.
AFAIK the only option is to do a while loop. The operating system generally assumes that if you want to wait for a period of time that you will want to be yielding to the operating system.
Being able to get a microsecond accurate timer is also a potential issue. AFAIK there isn't a cross-platform way of doing timing (please someone correct me on this because i'd love a cross-platform sub-microsecond timer! :D). Under Win32, You could surround a loop with some QueryPerformanceCounter calls to work out when you have spent enough time in the loop and then exit.
e.g
void USleepEatCycles( __int64 uSecs )
{
__int64 frequency;
QueryPerformanceFrequency( (LARGE_INTEGER*)&frequency );
__int64 counter;
QueryPerformanceCounter( (LARGE_INTEGER*)&counter );
double dStart = (double)counter / (double)frequency;
double dEnd = dStart;
while( (dEnd - dStart) < uSecs )
{
QueryPerformanceCounter( (LARGE_INTEGER*)&counter );
dEnd = (double)counter / (double)frequency;
}
}
That's why it's important when profiling to look at the "Switched Out %" time. Basically, while your function's exclusive time may be little, if it performs e.g. I/O, DB, etc, waiting for external resources, then "Switched Out %" is the metric to watch out.
This is the kind of confusion you get with gprof, since what you care about is wall-clock time. I use this.
On Windows I have a problem I never encountered on Unix. That is how to get a thread to sleep for less than one millisecond. On Unix you typically have a number of choices (sleep, usleep and nanosleep) to fit your needs. On Windows, however, there is only Sleep with millisecond granularity.
On Unix, I can use the use the select system call to create a microsecond sleep which is pretty straightforward:
int usleep(long usec)
{
struct timeval tv;
tv.tv_sec = usec/1000000L;
tv.tv_usec = usec%1000000L;
return select(0, 0, 0, 0, &tv);
}
How can I achieve the same on Windows?
This indicates a mis-understanding of sleep functions. The parameter you pass is a minimum time for sleeping. There's no guarantee that the thread will wake up after exactly the time specified. In fact, threads don't "wake up" at all, but are rather chosen for execution by the OS scheduler. The scheduler might choose to wait much longer than the requested sleep duration to activate a thread, especially if another thread is still active at that moment.
As Joel says, you can't meaningfully 'sleep' (i.e. relinquish your scheduled CPU) for such short periods. If you want to delay for some short time, then you need to spin, repeatedly checking a suitably high-resolution timer (e.g. the 'performance timer') and hoping that something of high priority doesn't pre-empt you anyway.
If you really care about accurate delays of such short times, you should not be using Windows.
Use the high resolution multimedia timers available in winmm.lib. See this for an example.
#include <Windows.h>
static NTSTATUS(__stdcall *NtDelayExecution)(BOOL Alertable, PLARGE_INTEGER DelayInterval) = (NTSTATUS(__stdcall*)(BOOL, PLARGE_INTEGER)) GetProcAddress(GetModuleHandle("ntdll.dll"), "NtDelayExecution");
static NTSTATUS(__stdcall *ZwSetTimerResolution)(IN ULONG RequestedResolution, IN BOOLEAN Set, OUT PULONG ActualResolution) = (NTSTATUS(__stdcall*)(ULONG, BOOLEAN, PULONG)) GetProcAddress(GetModuleHandle("ntdll.dll"), "ZwSetTimerResolution");
static void SleepShort(float milliseconds) {
static bool once = true;
if (once) {
ULONG actualResolution;
ZwSetTimerResolution(1, true, &actualResolution);
once = false;
}
LARGE_INTEGER interval;
interval.QuadPart = -1 * (int)(milliseconds * 10000.0f);
NtDelayExecution(false, &interval);
}
Works very well for sleeping extremely short times. Remember though that at a certain point the actual delays will never be consistent because the system can't maintain consistent delays of such a short time.
Yes, you need to understand your OS' time quantums. On Windows, you won't even be getting 1ms resolution times unless you change the time quantum to 1ms. (Using for example timeBeginPeriod()/timeEndPeriod()) That still won't really guarantee anything. Even a little load or a single crappy device driver will throw everything off.
SetThreadPriority() helps, but is quite dangerous. Bad device drivers can still ruin you.
You need an ultra-controlled computing environment to make this ugly stuff work at all.
Generally a sleep will last at least until the next system interrupt occurs. However, this
depends on settings of the multimedia timer resources. It may be set to something close to
1 ms, some hardware even allows to run at interrupt periods of 0.9765625 (ActualResolution provided by NtQueryTimerResolution will show 0.9766 but that's actually wrong. They just can't put the correct number into the ActualResolution format. It's 0.9765625ms at 1024 interrupts per second).
There is one exception wich allows us to escape from the fact that it may be impossible to sleep for less than the interrupt period: It is the famous Sleep(0). This is a very powerful
tool and it is not used as often as it should! It relinquishes the reminder of the thread's time slice. This way the thread will stop until the scheduler forces the thread to get cpu service again. Sleep(0) is an asynchronous service, the call will force the scheduler to react independent of an interrupt.
A second way is the use of a waitable object. A wait function like WaitForSingleObject() can wait for an event. In order to have a thread sleeping for any time, also times in the microsecond regime, the thread needs to setup some service thread which will generate an event at the desired delay. The "sleeping" thread will setup this thread and then pause at the wait function until the service thread will set the event signaled.
This way any thread can "sleep" or wait for any time. The service thread can be of big complexity and it may offer system wide services like timed events at microsecond resolution. However, microsecond resolution may force the service thread to spin on a high resolution time service for at most one interrupt period (~1ms). If care is taken, this can
run very well, particulary on multi-processor or multi-core systems. A one ms spin does not hurt considerably on multi-core system, when the affinity mask for the calling thread and the service thread are carefully handled.
Code, description, and testing can be visited at the Windows Timestamp Project
As several people have pointed out, sleep and other related functions are by default dependent on the "system tick". This is the minimum unit of time between OS tasks; the scheduler, for instance, will not run faster than this. Even with a realtime OS, the system tick is not usually less than 1 ms. While it is tunable, this has implications for the entire system, not just your sleep functionality, because your scheduler will be running more frequently, and potentially increasing the overhead of your OS (amount of time for the scheduler to run, vs. amount of time a task can run).
The solution to this is to use an external, high-speed clock device. Most Unix systems will allow you to specify to your timers and such a different clock to use, as opposed to the default system clock.
What are you waiting for that requires such precision? In general if you need to specify that level of precision (e.g. because of a dependency on some external hardware) you are on the wrong platform and should look at a real time OS.
Otherwise you should be considering if there is an event you can synchronize on, or in the worse case just busy wait the CPU and use the high performance counter API to measure the elapsed time.
If you want so much granularity you are in the wrong place (in user space).
Remember that if you are in user space your time is not always precise.
The scheduler can start your thread (or app), and schedule it, so you are depending by the OS scheduler.
If you are looking for something precise you have to go:
1) In kernel space (like drivers)
2) Choose an RTOS.
Anyway if you are looking for some granularity (but remember the problem with user space ) look to
QueryPerformanceCounter Function and QueryPerformanceFrequency function in MSDN.
Actually using this usleep function will cause a big memory/resource leak. (depending how often called)
use this corrected version (sorry can't edit?)
bool usleep(unsigned long usec)
{
struct timeval tv;
fd_set dummy;
SOCKET s = socket(PF_INET, SOCK_STREAM, IPPROTO_TCP);
FD_ZERO(&dummy);
FD_SET(s, &dummy);
tv.tv_sec = usec / 1000000ul;
tv.tv_usec = usec % 1000000ul;
bool success = (0 == select(0, 0, 0, &dummy, &tv));
closesocket(s);
return success;
}
I have the same problem and nothing seems to be faster than a ms, even the Sleep(0). My problem is the communication between a client and a server application where I use the _InterlockedExchange function to test and set a bit and then I Sleep(0).
I really need to perform thousands of operations per second this way and it doesn't work as fast as I planned.
Since I have a thin client dealing with the user, which in turn invokes an agent which then talks to a thread, I will move soon to merge the thread with the agent so that no event interface will be required.
Just to give you guys an idea how slow this Sleep is, I ran a test for 10 seconds performing an empty loop (getting something like 18,000,000 loops) whereas with the event in place I only got 180,000 loops. That is, 100 times slower!
Try using SetWaitableTimer...
Like everybody mentioned, there is indeed no guarantees about the sleep time.
But nobody wants to admit that sometimes, on an idle system, the usleep command can be very precise. Especially with a tickless kernel. Windows Vista has it and Linux has it since 2.6.16.
Tickless kernels exists to help improve laptops batterly life: c.f. Intel's powertop utility.
In that condition, I happend to have measured the Linux usleep command that respected the requested sleep time very closely, down to half a dozen of micro seconds.
So, maybe the OP wants something that will roughly work most of the time on an idling system, and be able to ask for micro second scheduling!
I actually would want that on Windows too.
Also Sleep(0) sounds like boost::thread::yield(), which terminology is clearer.
I wonder if Boost-timed locks have a better precision. Because then you could just lock on a mutex that nobody ever releases, and when the timeout is reached, continue on...
Timeouts are set with boost::system_time + boost::milliseconds & cie (xtime is deprecated).
If your goal is to "wait for a very short amount of time" because you are doing a spinwait, then there are increasing levels of waiting you can perform.
void SpinOnce(ref Int32 spin)
{
/*
SpinOnce is called each time we need to wait.
But the action it takes depends on how many times we've been spinning:
1..12 spins: spin 2..4096 cycles
12..32: call SwitchToThread (allow another thread ready to go on time core to execute)
over 32 spins: Sleep(0) (give up the remainder of our timeslice to any other thread ready to run, also allows APC and I/O callbacks)
*/
spin += 1;
if (spin > 32)
Sleep(0); //give up the remainder of our timeslice
else if (spin > 12)
SwitchTothread(); //allow another thread on our CPU to have the remainder of our timeslice
else
{
int loops = (1 << spin); //1..12 ==> 2..4096
while (loops > 0)
loops -= 1;
}
}
So if your goal is actually to wait only for a little bit, you can use something like:
int spin = 0;
while (!TryAcquireLock())
{
SpinOne(ref spin);
}
The virtue here is that we wait longer each time, eventually going completely to sleep.
Just use Sleep(0). 0 is clearly less than a millisecond. Now, that sounds funny, but I'm serious. Sleep(0) tells Windows that you don't have anything to do right now, but that you do want to be reconsidered as soon as the scheduler runs again. And since obviously the thread can't be scheduled to run before the scheduler itself runs, this is the shortest delay possible.
Note that you can pass in a microsecond number to your usleep, but so does void usleep(__int64 t) { Sleep(t/1000); } - no guarantees to actually sleeping that period.
Sleep function that is way less than a millisecond-maybe
I found that sleep(0) worked for me. On a system with a near 0% load on the cpu in task manager, I wrote a simple console program and the sleep(0) function slept for a consistent 1-3 microseconds, which is way less than a millisecond.
But from the above answers in this thread, I know that the amount sleep(0) sleeps can vary much more wildly than this on systems with a large cpu load.
But as I understand it, the sleep function should not be used as a timer. It should be used to make the program use the least percentage of the cpu as possible and execute as frequently as possible. For my purposes, such as moving a projectile across the screen in a videogame much faster than one pixel a millisecond, sleep(0) works, I think.
You would just make sure the sleep interval is way smaller than the largest amount of time it would sleep. You don't use the sleep as a timer but just to make the game use the minimum amount of cpu percentage possible. You would use a separate function that has nothing to do is sleep to get to know when a particular amount of time has passed and then move the projectile one pixel across the screen-at a time of say 1/10th of a millisecond or 100 microseconds.
The pseudo-code would go something like this.
while (timer1 < 100 microseconds) {
sleep(0);
}
if (timer2 >=100 microseconds) {
move projectile one pixel
}
//Rest of code in iteration here
I know the answer may not work for advanced issues or programs but may work for some or many programs.
If the machine is running Windows 10 version 1803 or later then you can use CreateWaitableTimerExW with the CREATE_WAITABLE_TIMER_HIGH_RESOLUTION flag.
On Windows the use of select forces you to include the Winsock library which has to be initialized like this in your application:
WORD wVersionRequested = MAKEWORD(1,0);
WSADATA wsaData;
WSAStartup(wVersionRequested, &wsaData);
And then the select won't allow you to be called without any socket so you have to do a little more to create a microsleep method:
int usleep(long usec)
{
struct timeval tv;
fd_set dummy;
SOCKET s = socket(PF_INET, SOCK_STREAM, IPPROTO_TCP);
FD_ZERO(&dummy);
FD_SET(s, &dummy);
tv.tv_sec = usec/1000000L;
tv.tv_usec = usec%1000000L;
return select(0, 0, 0, &dummy, &tv);
}
All these created usleep methods return zero when successful and non-zero for errors.