Windows timing drift of the performance counter in C++

I am using the QueryPerformanceCounter() function to measure time in my application. This function is also used to supply the timeline of my application's life. Recently I noticed that its time drifts relative to other timing functions. Finally, I wrote a small test to check whether the drift is real or not. (I use the VS2013 compiler.)
#include <Windows.h>
#include <tchar.h>   // _tmain, _TCHAR
#include <chrono>
#include <thread>
#include <cstdio>
static LARGE_INTEGER s_freq; // performance-counter ticks per second
using namespace std::chrono;
// Seconds according to the performance counter
inline double now_WinPerfCounter()
{
    LARGE_INTEGER tt;
    if (!::QueryPerformanceCounter(&tt)) // returns nonzero on success, not necessarily TRUE
    {
        printf("Error in QueryPerformanceCounter() - Err=%lu\n", GetLastError());
        return -1.;
    }
    return (double)tt.QuadPart / s_freq.QuadPart;
}
// Seconds since boot according to the system tick count
inline double now_WinTick64()
{
    return (double)GetTickCount64() / 1000.;
}
// Seconds since 1970-01-01 according to the system time
inline double now_WinFileTime()
{
    FILETIME ft;
    ::GetSystemTimeAsFileTime(&ft);
    // Copy into a ULARGE_INTEGER rather than casting FILETIME* to long long*,
    // which can cause alignment faults on 64-bit Windows
    ULARGE_INTEGER val;
    val.LowPart = ft.dwLowDateTime;
    val.HighPart = ft.dwHighDateTime;
    // 100 ns units since 1601-01-01; 11644473600 s separate 1601 from 1970
    return (double)val.QuadPart / 10000000. - 11644473600LL;
}
int _tmain(int argc, _TCHAR* argv[])
{
    if (!::QueryPerformanceFrequency(&s_freq))
    {
        printf("Error in QueryPerformanceFrequency() - Err=%lu\n", GetLastError());
        return -1;
    }
    // save all timetags at the beginning
    double t1_0 = now_WinPerfCounter();
    double t2_0 = now_WinTick64();
    double t3_0 = now_WinFileTime();
    steady_clock::time_point t4_0 = steady_clock::now();
    for (int i = 0;; ++i) // forever
    {
        double t1 = now_WinPerfCounter();
        double t2 = now_WinTick64();
        double t3 = now_WinFileTime();
        steady_clock::time_point t4 = steady_clock::now();
        // elapsed seconds: perf counter, tick count, file time, steady_clock
        printf("%03d\t %.3lf %.3lf %.3lf %.3lf \n",
            i,
            t1 - t1_0,
            t2 - t2_0,
            t3 - t3_0,
            duration_cast<nanoseconds>(t4 - t4_0).count() * 1.e-9
        );
        std::this_thread::sleep_for(std::chrono::seconds(10));
    }
    return 0;
}
The output was confusing:
000 0.000 0.000 0.000 0.000
...
001 10.001 10.000 10.002 10.002
...
015 150.006 150.010 150.010 150.010
...
024 240.009 240.007 240.015 240.015
...
025 250.010 250.007 250.015 250.015
...
026 260.010 260.007 260.016 260.016
...
070 700.027 700.039 700.041 700.041
Why is there a difference? It looks like one second does not have the same duration when measured with different API functions. Also, the difference is not constant over the course of a day...

This is normal: affordable clocks never have infinite accuracy. GetSystemTime() and GetTickCount() are periodically re-calibrated from an Internet time server, time.windows.com by default, which uses an unaffordable atomic clock to keep time. Adjustments to catch up or slow down are gradual, to avoid giving software a heart attack. The time they report will be off from the Earth's rotation by at most a few seconds; the periodic re-calibration limits long-term drift error.
But the performance counter (QPC/QPF) is not calibrated: its resolution is far too high for such adjustments not to affect the short-interval measurements it is normally used for. It is derived from a frequency source available on the chipset, picked by the system builder, and is subject to typical electronic part tolerances; temperature in particular affects accuracy and causes variable drift over longer periods. QPC is therefore only suitable for measuring relatively short intervals.
Inevitably both clocks can't be the same, and you will see them drift apart. You'll have to pick one or the other to be the "master" clock. If long-term accuracy is important, pick the calibrated source; if high resolution but not accuracy is important, pick QPC. If both are important, you need custom hardware, like a GPS clock.

Edit: The main reason for the drift you observe is an inaccuracy in the performance counter frequency. The system gives you a constant value, returned by QueryPerformanceFrequency, that is only close to the true frequency: it carries an offset and possibly a drift. Modern platforms (Windows > 7, invariant TSC) have little offset and little drift.
Example: A 5 ppm offset would cause the time to move forward by 5 us/s or 432 ms/day faster than expected if you use the performance counter frequency to scale performance counter values to times.
General:
QueryPerformanceCounter and GetSystemTimeAsFileTime use different resources (hardware). Modern platforms derive QueryPerformanceCounter from the CPU's time stamp counter (TSC), while GetSystemTimeAsFileTime uses the PIT timer, ACPI PM timer, or HPET hardware. See the Intel® 64 and IA-32 Architectures Software Developer's Manual, Volume 3B: System Programming Guide, Part 2 for details.
It is unavoidable to deal with drift when the two APIs are using different hardware. However, you may extend your test code to calibrate the drift.
Frequency-generating hardware is typically temperature-sensitive, so the drift may vary depending on the load. See The Windows Timestamp Project for details.

Probably the different functions measure time in different ways.
QueryPerformanceCounter retrieves a high-resolution time stamp that can be used to measure time intervals.
GetSystemTimeAsFileTime retrieves the current system date and time in UTC format.
So it is not quite correct to use these functions interchangeably and compare the times they return.

It's probably a bad idea to use multiple methods of measurement: the different functions have different resolutions, so you'll effectively be seeing noise from values being rounded to their precision. And of course, a small amount of time passes while setting t1..t4, since the functions take time to execute, so some drift is unavoidable.

You may want to look at GetSystemTimePreciseAsFileTime API. It is both high-precision and can adjust periodically from external time servers.

Related

I'm looking to improve or replace my current delay/sleep method (C++)

I am coding a project that requires precise delay times across a number of computers. This is the code I am currently using, which I found on a forum:
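#include <windows.h>
// Busy-waits for `ms` milliseconds by polling the performance counter.
// (The function name and signature are assumed; the forum snippet was a bare block using `ms`.)
void delayMs(LONGLONG ms)
{
    LONGLONG timerResolution; // becomes counter ticks per millisecond
    LONGLONG wantedTime;
    LONGLONG currentTime;
    QueryPerformanceFrequency((LARGE_INTEGER*)&timerResolution);
    timerResolution /= 1000;
    QueryPerformanceCounter((LARGE_INTEGER*)&currentTime);
    wantedTime = currentTime / timerResolution + ms; // target time, in ms
    currentTime = 0;
    while (currentTime < wantedTime)
    {
        QueryPerformanceCounter((LARGE_INTEGER*)&currentTime);
        currentTime /= timerResolution;
    }
}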
Basically, the issue I am having is that this uses a lot of CPU (around 16-20%) once I start calling the function. The usual Sleep() uses zero CPU, but from what I have read on multiple forums it is extremely inaccurate; that's the trade-off between accuracy and CPU usage. Still, I thought I had better raise the question before I settle on this sleep method.
It is likely using 15-20% CPU because it is using 100% of one core: nothing in the loop ever slows it down.
In general, this is a "hard" problem to solve, because PCs (more specifically, the OSes running on them) are not made for running real-time applications. If real-time behavior is absolutely required, you should look into real-time kernels and OSes.
For this reason, the only guarantee usually made about sleep times is that the system will sleep for at least the specified amount of time.
If you are running Linux, you could try the nanosleep function (http://man7.org/linux/man-pages/man2/nanosleep.2.html), though I don't have any experience with it.
Alternatively you could go with a hybrid approach where you use sleeps for long delays, but switch to polling when it's almost time:
#include <thread>
#include <chrono>
using namespace std::chrono_literals; // C++14
...
wantedTime = currentTime / timerResolution + ms;
currentTime = 0;
while (currentTime < wantedTime)
{
    QueryPerformanceCounter((LARGE_INTEGER*)&currentTime);
    currentTime /= timerResolution;
    if (wantedTime - currentTime > 100) // if more than 100 ms remain
    {
        // Sleep for significantly less than 100 ms, to ensure that we don't "oversleep"
        std::this_thread::sleep_for(50ms);
    }
}
This is a bit race-condition prone, as it assumes that the OS hands control back to the program within 50 ms after the sleep_for is done. To reduce that risk you could shorten the sleep (to, say, 1 ms).
You can set the Windows timer resolution to its minimum (usually 1 ms) to make Sleep() accurate to about 1 ms. By default it is only accurate to about 15 ms (see the Sleep() documentation).
Note that your execution can be delayed if other programs are consuming CPU time, but this could also happen if you were waiting with a timer.
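#include <windows.h>
#include <timeapi.h>
#pragma comment(lib, "winmm.lib") // timeGetDevCaps/timeBeginPeriod live in winmm (MSVC)
int main()
{
    // Sleep() takes ~15 ms (or whatever the default timer period is)
    Sleep(1);
    TIMECAPS caps_;
    if (timeGetDevCaps(&caps_, sizeof(caps_)) == MMSYSERR_NOERROR)
    {
        timeBeginPeriod(caps_.wPeriodMin);
        // Sleep() now takes about 1 ms
        Sleep(1);
        // Every timeBeginPeriod() must be matched by timeEndPeriod() with the same value
        timeEndPeriod(caps_.wPeriodMin);
    }
    return 0;
}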

How to use clock_gettime() correctly in VxWorks to get accurate time

I am trying to measure the time taken by processes in a C++ program on Linux and VxWorks. I have noticed that clock_gettime(CLOCK_REALTIME, timespec) is accurate enough (resolution of about 1 ns) to do the job on many OSes. For portability I am using this function and running it on both VxWorks 6.2 and Linux 3.7.
I've tried to measure the time taken by a simple print:
#include <time.h>     // clock_gettime(), available on both Linux and VxWorks
#include <iostream>
#define BILLION 1000000000LL
int main(){
    struct timespec start, end;
    long long diff; // 64-bit: a uint32_t overflows for intervals above ~4.29 s
    for(int i=0; i<1000; i++){
        clock_gettime(CLOCK_REALTIME, &start);
        std::cout<<"Do stuff"<<std::endl;
        clock_gettime(CLOCK_REALTIME, &end);
        diff = BILLION*(end.tv_sec-start.tv_sec)+(end.tv_nsec-start.tv_nsec);
        std::cout<<diff<<std::endl;
    }
    return 0;
}
I compiled this on Linux and VxWorks. On Linux the results seemed logical (20 µs on average). But on VxWorks I got a lot of zeros, then 5000000 ns, then a lot of zeros again...
PS: on VxWorks I ran this app on an ARM Cortex-A8, and the results seemed random.
Has anyone seen the same bug before?
In VxWorks, the clock resolution is defined by the system scheduler frequency. By default this is typically 60 Hz, but it may differ depending on the BSP, kernel configuration, or runtime configuration.
The VxWorks kernel configuration parameters SYS_CLK_RATE_MAX and SYS_CLK_RATE_MIN define the maximum and minimum values supported, and SYS_CLK_RATE defines the default rate, applied at boot.
The actual clock rate can be modified at runtime using sysClkRateSet, either within your code, or from the shell.
You can check the current rate by using sysClkRateGet.
Given that you are seeing either 0 or 5000000 ns, which is 5 ms, I would expect your system clock rate to be ~200 Hz (1 / 5 ms = 200 Hz).
To get greater resolution, you can increase the system clock rate. However, this may have undesired side effects, as this will increase the frequency of certain system operations.
A better method of timing code may be to use sysTimestamp which is typically driven from a high frequency timer, and can be used to perform high-res timing of short-lived activities.
I think in VxWorks the default clock resolution is 16.66 ms (one tick at 60 Hz), which you can get by calling the clock_getres() function. You can change the resolution by calling the sysClkRateSet() function (I believe the finest resolution supported is 200 µs, obtained by passing 5000 as the argument to sysClkRateSet()). You can then calculate the difference between the two timespec timestamps; note that difftime() only works on whole-second time_t values, so on its own it cannot resolve sub-second differences.

Why does Sleep() slow down subsequent code for 40ms?

I originally asked about this at coderanch.com, so if you've tried to assist me there, thanks, and don't feel obliged to repeat the effort. coderanch.com is mostly a Java community, though, and this appears (after some research) to really be a Windows question, so my colleagues there and I thought this might be a more appropriate place to look for help.
I have written a short program that either spins on the Windows performance counter until 33ms have passed, or else calls Sleep(33). The former exhibits no unexpected effects, but the latter appears to (inconsistently) slow subsequent processing for about 40ms (either that, or it has some effect on the values returned from the performance counter for that long). After the spin or Sleep(), the program calls a routine, runInPlace(), that spins for 2ms, counting the number of times it queries the performance counter, and returning that number.
When the initial 33 ms delay is done by spinning, the number of iterations of runInPlace() tends to be (on my Windows 10, XPS-8700) about 250,000. It varies, probably due to other system overhead, but it varies smoothly around 250,000.
Now, when the initial delay is done by calling Sleep(), something strange happens. A lot of the calls to runInPlace() return a number near 250,000, but quite a few of them return a number near 50,000. Again, the range varies around 50,000, fairly smoothly. But the results clearly cluster around one value or the other, with nearly no returns anywhere between 80,000 and 150,000. If I call runInPlace() 100 times after each delay, instead of just once, it never returns a number of iterations in the smaller range after the 20th call. As runInPlace() runs for 2 ms, this means the behavior I'm observing disappears after 40 ms. If I have runInPlace() run for 4 ms instead of 2 ms, it never returns a number of iterations in the smaller range after the 10th call, so, again, the behavior disappears after 40 ms (likewise if I have runInPlace() run for only 1 ms: the behavior disappears after the 40th call).
Here's my code:
#include <windows.h>
#include <stdio.h>
#include <tchar.h>
// Spins until `ticks` performance-counter ticks have passed and returns how
// many times the counter was queried in that time.
int runInPlace(LONGLONG ticks)
{
    LARGE_INTEGER t0, t1;
    int n = 0;
    QueryPerformanceCounter(&t0);
    do
    {
        QueryPerformanceCounter(&t1);
        n++;
    } while (t1.QuadPart - t0.QuadPart < ticks);
    return n;
}
int _tmain(int argc, _TCHAR* argv[])
{
    LARGE_INTEGER t0, t1;
    LARGE_INTEGER frequency;
    int n;
    QueryPerformanceFrequency(&frequency);
    LONGLONG msDelay = 2 * frequency.QuadPart / 1000;    // 2 ms, in ticks
    LONGLONG spinDelay = 33 * frequency.QuadPart / 1000; // 33 ms, in ticks
    for (int i = 0; i < 100; i++)
    {
        if (argc > 1)
            Sleep(33); // any command-line argument selects Sleep()
        else
        {
            // otherwise spin on the counter for 33 ms
            QueryPerformanceCounter(&t0);
            do
            {
                QueryPerformanceCounter(&t1);
            } while (t1.QuadPart - t0.QuadPart < spinDelay);
        }
        n = runInPlace(msDelay);
        printf("%d \n", n);
    }
    getchar();
    return 0;
}
Here's some output typical of what I get when using Sleep() for the delay:
56116
248936
53659
34311
233488
54921
47904
45765
31454
55633
55870
55607
32363
219810
211400
216358
274039
244635
152282
151779
43057
37442
251658
53813
56237
259858
252275
251099
And here's some output typical of what I get when I spin to create the delay:
276461
280869
276215
280850
188066
280666
281139
280904
277886
279250
244671
240599
279697
280844
159246
271938
263632
260892
238902
255570
265652
274005
273604
150640
279153
281146
280845
248277
Can anyone help me understand this behavior? (Note, I have tried this program, compiled with Visual C++ 2010 Express, on five computers. It only shows this behavior on the two fastest machines I have.)
This sounds like it is due to the reduced clock speed that the CPU will run at when the computer is not busy (SpeedStep). When the computer is idle (like in a sleep) the clock speed will drop to reduce power consumption. On newer CPUs this can be 35% or less of the listed clock speed. Once the computer gets busy again there is a small delay before the CPU will speed up again.
You can turn off this feature, either in the BIOS or by setting "Minimum processor state" under "Processor power management" in your power plan's advanced settings to 100%.
Besides what @1201ProgramAlarm said (which may well be the cause; modern processors are extremely fond of downclocking whenever they can), it may also be a cache warm-up problem.
When you ask to sleep for a while the scheduler typically schedules another thread/process for the next CPU time quantum, which means that the caches (instruction cache, data cache, TLB, branch predictor data, ...) relative to your process are going to be "cold" again when your code regains the CPU.

Getting the current time with millisecond accuracy in Qt

Qt documentation about QTime::currentTime() says :
Note that the accuracy depends on the accuracy of the underlying
operating system; not all systems provide 1-millisecond accuracy.
But is there any way to get the time with millisecond accuracy on Windows 7?
You can use QDateTime class and convert the current time with the appropriate format:
QDateTime::currentDateTime().toString("yyyy/MM/dd hh:mm:ss,zzz")
where 'zzz' gives the milliseconds (three digits, zero-padded).
You can use the functionality provided by the time.h header in C/C++.
#include <time.h>
clock_t start, end;
double cpu_time_used;
int main()
{
start = clock();
/* Do the work. */
end = clock();
cpu_time_used = ((double)(end-start)/ CLOCKS_PER_SEC);
}
Timer resolution may vary on different platforms and readings may not be accurate. If you need to get high-resolution, accurate timestamps on Windows 7, it provides QPC API:
https://msdn.microsoft.com/en-us/library/windows/desktop/dn553408%28v=vs.85%29.aspx
GetSystemTimePreciseAsFileTime is claimed to provide system time with <1us resolution.
But that's only about accurate timestamp. If you need to actually do something with 1 ms latency (ex. handle an event), you need a RTOS, not a desktop clunker.
One common approach is to scale up whatever you are doing and run it 10-100 times in a row; that way you get a more accurate reading of how long one run takes by dividing the total by 10-100.
But getting millisecond precise readings of your time is pretty much useless because you don't have 100% of the cpu time, which means that your readings will have much greater variance than just 1 millisecond if the OS gives another process computing time while you are doing your actions.

Can I retrieve microseconds or very accurate milliseconds in C++ on Windows?

So I made a game loop that uses the SDL_Delay function to cap the frames per second; it looks like this:
//While the user hasn't quit
while( stateID != STATE_EXIT )
{
//Start the frame timer
fps.start();
//Do state event handling
currentState->handle_events();
//Do state logic
currentState->logic();
//Change state if needed
change_state();
//Do state rendering
currentState->render();
//Update the screen
if( SDL_Flip( screen ) == -1 )
{
return 1;
}
//Cap the frame rate
if( fps.get_ticks() < 1000 / FRAMES_PER_SECOND )
{
SDL_Delay( ( 1000 / FRAMES_PER_SECOND ) - fps.get_ticks() );
}
}
So when I run my games at 60 frames per second (which I assume is the "eye cap"), I can still see laggy motion: the frames appear individually, so the motion isn't smooth.
This is apparently because the SDL_Delay function is not very accurate, making the time between frames differ by roughly ±15 ms from what I want.
(All of these are just my assumptions.)
So I am searching for a good, accurate timer that will help me with this problem.
Any suggestions?
I think there is a similar question in How to make thread sleep less than a millisecond on Windows
But as a game programmer myself, I don't rely on sleep functions to manage frame-rate (the parameter they take is just a minimum). I just draw stuff on screen as fast as I can. I have a bunch of function calls in my game loop, and then I keep track of how often I'm calling them. For instance, I check input quite often (1000x/second) to make the game more responsive, but I don't check the network inbox more than 100x/second.
For example:
#define NW_CHECK_INTERVAL 10
#define INPUT_CHECK_INTERVAL 1
uint32_t last_nw_check = 0, last_input_check = 0;
while (game_running) {
uint32_t now = SDL_GetTicks();
if (now - last_nw_check > NW_CHECK_INTERVAL) {
check_network();
last_nw_check = now;
}
if (now - last_input_check > INPUT_CHECK_INTERVAL) {
check_input();
last_input_check = now;
}
check_video();
// and so on...
}
Use the QueryPerformanceCounter / Frequency for that.
LARGE_INTEGER start, end, tps; //tps = ticks per second
QueryPerformanceFrequency( &tps );
QueryPerformanceCounter( &start );
// ... code being timed ...
QueryPerformanceCounter( &end );
LONGLONG usPassed = (end.QuadPart - start.QuadPart) * 1000000 / tps.QuadPart; // 64-bit, so long intervals don't overflow an int
Here's a small wait function I had created for timing midi sequences using QueryPerformanceCounter:
void wait(int waitTime) {
LARGE_INTEGER time1, time2, freq;
if(waitTime == 0)
return;
QueryPerformanceCounter(&time1);
QueryPerformanceFrequency(&freq);
do {
QueryPerformanceCounter(&time2);
} while((time2.QuadPart - time1.QuadPart) * 1000000ll / freq.QuadPart < waitTime);
}
To convert ticks to microseconds, calculate the difference in ticks, multiply by 1,000,000 (microseconds/second) and divide by the frequency of ticks per second.
Note that some things may throw this off, for instance the precision of the high-resolution counter is not likely to be down to a single microsecond. For example, if you want to wait 10 microseconds and the precision/frequency is one tick every 6 microseconds, your 10 microsecond wait will actually be no less than 12 microseconds. Again, this frequency is system dependent and will vary from system to system.
Also, Windows is not a real-time operating system. A process may be preempted at any time and it is up to Windows to decide when the process is rescheduled. The application may be preempted in the middle of this function and not restarted again until long after the expected wait time has elapsed. There really isn't much you can do about it but you'll probably never notice it if it happens.
60 frames per second is just the frequency of mains power in the US (50 Hz in Europe; Africa and Asia are mixed), and it is the video refresh frequency for hardware-convenience reasons (it can be an integer multiple on more sophisticated monitors). It was a mandatory constraint for CRT displays, and it is still a convenient reference for LCDs (that's how frequently the frame buffer is uploaded to the display).
The eye cap is no more than 20-25 fps (not to be confused with retinal persistence, which is about half of that), and that's why TV interlaces two fields on every refresh.
Independently of timing accuracy, no hardware device can be updated during its buffer scan (otherwise the image changes while it is shown, resulting in half-drawn, broken frames). Hence, if you go faster than one half of the device refresh, you are queued behind it and forced to wait for it.
60 fps in a game loop serves only to help CPU manufacturers sell new, faster CPUs. Slow down to under 25 and everything will look more fluid.
SDL_Delay:
This function waits a specified number of milliseconds before returning. It waits at least the specified time, but possible longer due to OS scheduling. The delay granularity is at least 10 ms. Some platforms have shorter clock ticks but this is the most common.
The actual delays observed with this function depend on OS settings. I'd suggest looking into the Multimedia Timer API, particularly the timeBeginPeriod function, to adapt the interrupt frequency to your requirements.
Obtaining and Setting Timer Resolution shows an example of how to change the interrupt period to about 1 ms. This way you don't get the 15 ms hiccup anymore. BTW: the eye-catch period is about 40 ms.
Obtaining fixed-period timing can also be addressed with Waitable Timer Objects, but the use of multimedia timers is mandatory to obtain decent resolution, no matter what.
Using other tools to improve the timing capabilities is discussed here.