Is QueryPerformanceFrequency accurate when using HPET? - c++

I'm playing around with QueryPerformanceFrequency.
It used to return 3.6 Mhz, but it was not enough for what I was trying to do.
I've enabled HPET using this command bcdedit /set useplatformclock true. Now it returns 14.3 Mhz. It's great it's more precise... excepted it's not. I quickly realized that I did not get the granularity I expected.
If I try to poll QueryPerformanceCounter until it ticks, the smallest increment I can get is 11, which means 1.27Mhz. If I try to count the number of different values I can get from QueryPerformanceCounter in one second, I get 1.26Mhz.
So I was wondering is there was a way to really use the 14.3 Mhz to their full extent ?
I'm using windows 7, 64 bit system, visual studio 2008.

Using the HPET hardware as a source for QueryPerformanceCounter (QPC) is known to be assosiated with large overheads.
QPC is an expensive call when configured with HPET.
It provides 14.3 MHz which suggests high accuracy but as you found, it can't be called fast enough to actually resolve that frequency.
Therefore Microsoft has turned into the CPUs time stamp counter (TSC) as a source for QPC whenever the hardware is capable doing so. TSC queries have much lower overhead. The associated frequency used for QPC is typically the CPU frequency divided by 1024; also typically a few MHz.
The call of QPC in TSC mode is so fast that a lot of consecutive calls may show the same result (typically approx. 20-30 calls or 15 - 20 ns/call).
This way you may obtain typical resolutions of approx. 0.3 us (on a 3.4 GHz CPU).
You observed 3.6 MHz before you switched to HPET. That's likely the signiture of the systems ACPI PM timer (3579545 Hz), which indicates that you were not operating on TSC based QPC before switching to HPET.
So either way, running HPET or ACPI PM timer results in a usable resoluion in the range of a few MHz. Both cannot expose the full resolution given by the performance counter frequency (PCF) because the call to QPC is too expensive. Only The TSC based QPC is fast enough and capable to actually oversample the QPC.
Microsoft has just recently released more detailed information about this matter:
See Acquiring high-resolution time stamps (MSDN 2014) for the details.
This is a comprehensive article with lots of examples and detailed description. A must read for users of QPC.
...a way to really use the 14.3 Mhz to their full extent ?
Unfortunately not.
You can run Coreinfo.exe utility from Windows Sysinternals. Sysinternals has moved to Microsoft technet. Here is the link: Sysinternals System Information Utilities. This will give you an answer to the question: How can I check if my system has a non-invariant TSC?
Summary: The best resolution/accuracy/granularty is obtained by QPC based on TSC.
BTW: The proper choice of hardware as resource for QPC also influences the call expense of the new GetSystemTimePreciseAsFileTime function (Windows 8 desktop and upwards) because it internally uses QPC.

Related

Similar functionality to CLOCK_MONOTONIC in Windows (and Windows Embedded CE)

I have some C++ Windows code which needs to compute time intervals. For that, it uses GetCurrentFT if it detects that is running in WinCE and GetSystemTimeAsFileTime for other Windows platforms. However, if I'm not mistaken, this might be vulnerable to system clock manipulations (i.e. someone changing the system clock would make the measured time intervals unreliable).
Is there something similar to UNIX's CLOCK_MONOTONIC for these platforms (both WinCE and the other Windows platforms) which would make use of a monotonically increasing counter and not the system clock?
std::chrono::steady_clock is monotonic and not vulnerable to system time changes. I don't know if Microsoft supports C++11 for WinCE though.
I did end up using GetTickCount(), which worked just fine.
As per Microsoft's documentation:
The number of milliseconds that have elapsed since the system was
started indicates success.
It is a monotonic counter, which is what I was looking for. The granularity of the returned value depends on the hardware, but seems to be enough for my purposes in the hardware I'm using.

Why is there no boost::date_time with microsec resolution on Windows?

On Win32 system boost::date_time::microsec_clock() is implemented using ftime, which provides only millisecond resolution: Link to doc
There are some questions/answers on Stackoverflow stating this and linking the documentation, but not explaining why that is the case:
Stackoverflow #1
Stackoverflow #2
There seemingly are ways to implement microsecond resolution on Windows:
GetSystemTimePreciseAsFileTime (Win8++)
QueryPerformanceCounter
What I'm interested in is why Boost implemented it that way, when in turn there are possibly solutions that would be more fitting?
QueryPerformanceCounter can't help you on this problem. It gives you a timestamp, but as you don't know when the counter starts there is no reliable way to calculate an absolute time point out of it. boost::date_time is such a (user-understandable) time point.
The other difference is that a counter like QueryPerformanceCounter gives you a steadily increasing timer, while the system time can be influenced by the user and can therefore jump.
So the 2 things are for different use cases. One for representing a real time, the other one for getting precise timing in the software and for benchmarking.
GetSystemTimePreciseAsFileTime seems to fit the bill for a high resolution absolute time. I guess it wasn't used because it requires Windows8.
GetSystemTimePreciseAsFileTime only became available with Windows 8 Desktop applications. It mimics Linuxes GetTimeOfDay. The implementation uses QueryPerformanceCounter to achieve the microsecond resolution. Timestamps are taken at the time of a system time increment. Subsequent calls to GetSystemTimePreciseAsFileTime will take the system time and add the elapsed "performance counter time" (elapsed ticks / performance counter frequency) as the high resolution part.
The functionallity of QueryPerformanceCounter again depends on platform specific details (HPET, ACPI PM timer, invariant TSC etc.). See MSDN: Acquiring high-resolution time stamps and SO: Is QueryPerformanceFrequency acurate when using HPET? for details.
The various versions of Windows do have specific schemes to update the system time. Windows XP has a fixed file time granularty which is independent of the systems timer resolution. Only post Windows XP versions allow to modify the system time granularity by changing the system timer resolution.
This can be accomplished by means of the multimedia timer API timeBeginPeriod and/or the hidden API NtSetTimerResolution (See this SO answer for more details about using `
timeBeginPeriod and NtSetTimerResolution).
As stated, GetSystemTimePreciseAsFileTime is only available for desktop applications. The reason for this is the need for specific hardware.
What I'm interested in is why Boost implemented it that way, when in turn there are possibly solutions that would be more fitting?
Taking the facts stated above will make the implementation very complex and the result very platform specific. Every (!) Windows version has undergone severe changes of time keeping. Even the latest small step from 8 to 8.1 has changed the time keeping procedure considerably. However, there is still room to further improve time matters on Windows.
I should mention that GetSystemTimePreciseAsFileTime is, as of Windows 8.1, not giving results as accurate as expected or specified at MSDN: GetSystemTimePreciseAsFileTime function. It combines the system file time with the result of QueryPerformanceCounter to fill the gap between consecutive file time increments but it does not take system time adjustments into account. An active system time adjustement, e.g. done by SetSystemTimeAdjustment, modifies the system time granularity and the progress of the system time. However, the used performance counter frequency to build the result of GetSystemTimePreciseAsFileTime is kept constant. As a result, the microseconds part is off by the adjustment gain set by SetSystemTimeAdjustment.

Finding out the CPU clock frequency (per core, per processor)

Programs like CPUz are very good at giving in depth information about the system (bus speed, memory timings, etc.)
However, is there a programmatic way of calculating the per core (and per processor, in multi processor systems with multiple cores per CPU) frequency without having to deal with CPU specific info.
I am trying to develop a anti cheating tool (for use with clock limited benchmark competitions) which will be able to record the CPU clock during the benchmark run for all the active cores in the system (across all processors.)
I'll expand on my comments here. This is too big and in-depth for me to fit in the comments.
What you're trying to do is very difficult - to the point of being impractical for the following reasons:
There's no portable way to get the processor frequency. rdtsc does NOT always give the correct frequency due to effects such as SpeedStep and Turbo Boost.
All known methods to measure frequency require an accurate measurement of time. However, a determined cheater can tamper with all the clocks and timers in the system.
Accurately reading either the processor frequency as well as time in a tamper-proof way will require kernel-level access. This implies driver signing for Windows.
There's no portable way to get the processor frequency:
The "easy" way to get the CPU frequency is to call rdtsc twice with a fixed time-duration in between. Then dividing out the difference will give you the frequency.
The problem is that rdtsc does not give the true frequency of the processor. Because real-time applications such as games rely on it, rdtsc needs to be consistent through CPU throttling and Turbo Boost. So once your system boots, rdtsc will always run at the same rate (unless you start messing with the bus speeds with SetFSB or something).
For example, on my Core i7 2600K, rdtsc will always show the frequency at 3.4 GHz. But in reality, it idles at 1.6 GHz and clocks up to 4.6 GHz under load via the overclocked Turbo Boost multiplier at 46x.
But once you find a way to measure the true frequency, (or you're happy enough with rdtsc), you can easily get the frequency of each core using thread-affinities.
Getting the True Frequency:
To get the true frequency of the processor, you need to access either the MSRs (model-specific registers) or the hardware performance counters.
These are kernel-level instructions and therefore require the use of a driver. If you're attempting this in Windows for the purpose of distribution, you will therefore need to go through the proper driver signing protocol. Furthermore, the code will differ by processor make and model so you will need different detection code for each processor generation.
Once you get to this stage, there are a variety of ways to read the frequency.
On Intel processors, the hardware counters let you count raw CPU cycles. Combined with a method of precisely measuring real time (next section), you can compute the true frequency. The MSRs give you access to other information such as the CPU frequency multiplier.
All known methods to measure frequency require an accurate measurement of time:
This is perhaps the bigger problem. You need a timer to be able to measure the frequency. A capable hacker will be able to tamper with all the clocks that you can use in C/C++.
This includes all of the following:
clock()
gettimeofday()
QueryPerformanceCounter()
etc...
The list goes on and on. In other words, you cannot trust any of the timers as a capable hacker will be able to spoof all of them. For example clock() and gettimeofday() can be fooled by changing the system clock directly within the OS. Fooling QueryPerformanceCounter() is harder.
Getting a True Measurement of Time:
All the clocks listed above are vulnerable because they are often derived off of the same system base clock in some way or another. And that system base clock is often tied to the system base clock - which can be changed after the system has already booted up by means of overclocking utilities.
So the only way to get a reliable and tamper-proof measurement of time is to read external clocks such as the HPET or the ACPI. Unfortunately, these also seem to require kernel-level access.
To Summarize:
Building any sort of tamper-proof benchmark will almost certainly require writing a kernel-mode driver which requires certificate signing for Windows. This is often too much of a burden for casual benchmark writers.
This has resulted in a shortage of tamper-proof benchmarks which has probably contributed to the overall decline of the competitive overclocking community in recent years.
I realise this has already been answered. I also realise this is basically a black art, so please take it or leave it - or offer feedback.
In a quest to find the clock rate on throttled (thanks microsft,hp, and dell) HyperV hosts (unreliable perf counter), and HyperV guests (can only get stock CPU speed, not current), I have managed, through trial error and fluke, to create a loop that loops exactly once per clock.
Code as follows - C# 5.0, SharpDev, 32bit, Target 3.5, Optimize on (crucial), no debuger active (crucial)
long frequency, start, stop;
double multiplier = 1000 * 1000 * 1000;//nano
if (Win32.QueryPerformanceFrequency(out frequency) == false)
throw new Win32Exception();
Process.GetCurrentProcess().ProcessorAffinity = new IntPtr(1);
const int gigahertz= 1000*1000*1000;
const int known_instructions_per_loop = 1;
int iterations = int.MaxValue;
int g = 0;
Win32.QueryPerformanceCounter(out start);
for( i = 0; i < iterations; i++)
{
g++;
g++;
g++;
g++;
}
Win32.QueryPerformanceCounter(out stop);
//normal ticks differs from the WMI data, i.e 3125, when WMI 3201, and CPUZ 3199
var normal_ticks_per_second = frequency * 1000;
var ticks = (double)(stop - start);
var time = (ticks * multiplier) /frequency;
var loops_per_sec = iterations / (time/multiplier);
var instructions_per_loop = normal_ticks_per_second / loops_per_sec;
var ratio = (instructions_per_loop / known_instructions_per_loop);
var actual_freq = normal_ticks_per_second / ratio;
Console.WriteLine( String.Format("Perf counhter freq: {0:n}", normal_ticks_per_second));
Console.WriteLine( String.Format("Loops per sec: {0:n}", loops_per_sec));
Console.WriteLine( String.Format("Perf counter freq div loops per sec: {0:n}", instructions_per_loop));
Console.WriteLine( String.Format("Presumed freq: {0:n}", actual_freq));
Console.WriteLine( String.Format("ratio: {0:n}", ratio));
Notes
25 instructions per loop if debugger is active
Consider running a 2 or 3 seconds loop before hand to spin up the processor (or at least attempt to spin up, knowing how heavily servers are throttled these days)
Tested on a 64bit Core2 and Haswell Pentium and compared against CPU-Z
One of the most simple ways to do it is using RDTSC, but seeing as this is for anti-cheating mechanisms, I'd put this in as a kernel driver or a hyper-visor resident piece of code.
You'd probably also need to roll your own timing code**, which again can be done with RDTSC (QPC as used in the example below uses RDTSC, and its in fact very simple to reverse engineer and use a local copy of, which means to tamper with it, you'd need to tamper with your driver).
void GetProcessorSpeed()
{
CPUInfo* pInfo = this;
LARGE_INTEGER qwWait, qwStart, qwCurrent;
QueryPerformanceCounter(&qwStart);
QueryPerformanceFrequency(&qwWait);
qwWait.QuadPart >>= 5;
unsigned __int64 Start = __rdtsc();
do
{
QueryPerformanceCounter(&qwCurrent);
}while(qwCurrent.QuadPart - qwStart.QuadPart < qwWait.QuadPart);
pInfo->dCPUSpeedMHz = ((__rdtsc() - Start) << 5) / 1000000.0;
}
** I this would be for security as #Mystical mentioned, but as I've never felt the urge to subvert low level system timing mechanisms, there might be more involved, would be nice if Mystical could add something on that :)
I've previously posted on this subject (along with a basic algorithm): here. To my knowledge the algorithm (see the discussion) is very accurate. For example, Windows 7 reports my CPU clock as 2.00 GHz, CPU-Z as 1994-1996 MHz and my algorithm as 1995025-1995075 kHz.
The algorithm performs a lot of loops to do this which causes the CPU frequency to increase to maximum (as it also will during benchmarks) so speed-throttling software won't come into play.
Additional info here and here.
On the question of speed-throttling I really don't see it as a problem unless an application uses the speed values to determine elapsed times and that the times themselves are extremely important. For example, if a division requires x clock cycles to complete it doesn't matter if the CPU is running at 3 GHz or 300 MHz: it will still need x clock cycles and the only difference is that it will complete the division in a tenth of the time at # 3 GHz.
You need to use CallNtPowerInformation. Here's a code sample from putil project.
With this you can get current and max CPU frequency. As far as I know it's not possible to get per-CPU frequency.
One should refer to this white paper: Intel® Turbo Boost Technology in Intel® Core™ Microarchitecture (Nehalem) Based Processors. Basically, produce several reads of the UCC fixed performance counter over a sample period T.
Relative.Freq = Delta(UCC) / T
Where:
Delta() = UCC # period T
- UCC # period T-1
Starting with Nehalem architecture, UCC increase and decrease the number of click ticks relatively to the Unhalted state of the core.
When SpeedStep or Turbo Boost are activated, the estimated frequency using UCC will be measured accordingly; while TSC remains constant. For instance, Turbo Boost in action reveals that Delta(UCC) is greater or equal to Delta(TSC)
Example in function Core_Cycle function at Cyring | CoreFreq GitHub.

How to realise long-term high-resolution timing on windows using C++?

I need to get exact timestamps every couple of ms (20, 30, 40ms) over a long period of time (a couple of hours). The function in which the timestamp is taken is invoked as a callback by a 3rd-party library.
Using GetSystemTime() one can get the correct system timestamp but only with milliseconds accuracy, which is not precise enough for me. Using QueryPerformanceTimer() yields more accurate timestamps but is not synchronous to the system timestamp over a long period of time (see http://msdn.microsoft.com/en-us/magazine/cc163996.aspx).
The solution provided at the site linked above somehow works only on older computers, it hangs while synchronizing when i try to use it with newer computers.
It seems to me like boost is also only working on milliseconds accuracy.
If possible, I'd like to avoid using external libraries, but if there's no other choice I'll go with it.
Any suggestions?
Deleted article from CodeProject, this seems to be the copy: DateTimePrecise C# Class The idea is to use QueryPerformanceCounter API for accurate small increments and periodically adjust it in order to keep long term accuracy. This is about to give microsecond accuracy ("about" because it's still not exactly precise, but still quite usable).
See also: Microsecond resolution timestamps on Windows
Which language are you using?
In Java (1.5 or above) I'd suggest 'System.nanoTime()' which requires no import.
Remember in Windows that time-slice granularity is 1000ms / 64 = 15.625ms.
This will affect inter-process communication, especially on uni-processor machines, or machines that run several heavy CPU usage processes 'concurrently'*.
In fact, I just got DOS 6.22 and Windows for Workgroups 3.11/3.15 via eBay, so I can screenshot the original timeslice configuration for uni-processor Windows machines of the era when I started to get into it. (Although it might not be visible in versions above 3.0).
You'll be hard pressed to find anything better than QueryPerformanceTimer() on Windows.
On modern hardware it uses the HPET as a source which replaces the RTC interrupt controller. I would expect QueryPerformanceTimer() and the System clock to be synchronous.
There is no such QueryPerformanceTimer() on windows. The resource is named QueryPerformanceCounter(). It provides a counter value counting at some higher frequency.
Its incrementing frequency can be retrieved by a call to QueryPerformanceFrequency().
Since this frequency is typically in the MHz range, microsecond resolution can be observed.
There are some implementations around, i.e. this thread or at the Windows Timestamp Project

Timers to measure latency

When measuring network latency (time ack received - time msg sent) in any protocol over TCP, what timer would you recommend to use and why? What resolution does it have? What are other advantages/disadvantages?
Optional: how does it work?
Optional: what timer would you NOT use and why?
I'm looking mostly for Windows / C++ solutions, but if you'd like to comment on other systems, feel free to do so.
(Currently we use GetTickCount(), but it's not a very accurate timer.)
This is a copy of my answer from: C++ Timer function to provide time in nano seconds
For Linux (and BSD) you want to use clock_gettime().
#include <sys/time.h>
int main()
{
timespec ts;
// clock_gettime(CLOCK_MONOTONIC, &ts); // Works on FreeBSD
clock_gettime(CLOCK_REALTIME, &ts); // Works on Linux
}
For windows you want to use the QueryPerformanceCounter. And here is more on QPC
Apparently there is a known issue with QPC on some chipsets, so you may want to make sure you do not have those chipset. Additionally some dual core AMDs may also cause a problem. See the second post by sebbbi, where he states:
QueryPerformanceCounter() and
QueryPerformanceFrequency() offer a
bit better resolution, but have
different issues. For example in
Windows XP, all AMD Athlon X2 dual
core CPUs return the PC of either of
the cores "randomly" (the PC sometimes
jumps a bit backwards), unless you
specially install AMD dual core driver
package to fix the issue. We haven't
noticed any other dual+ core CPUs
having similar issues (p4 dual, p4 ht,
core2 dual, core2 quad, phenom quad).
You mentioned that you use GetTickCount(), so I'm going to recommend that you take a look at QueryPerformanceCounter().
There is really no substitute for the rdtsc instruction. You cannot be sure of what resolution the QueryPerformanceCounter will support. Some have a very large granularity (low increment rate/frequency), some return nothing at all.
Instead, I recommend you use the rdtsc instruction. It does not require any OS implementation and returns the number of CPU internal clock cycles that have elapsed since the computer/processor/core was powered up. For a 3 GHz processor that's 3 billion increments per second - it doesn't get more precise than that, now does it? This instruction is available for x86-32 and -64 beginning with the Pentium or Pentium MMX. It should therefore be accessible from x86 Linuxes as well.
There are plenty of posts about it here on stackoverflow.com. I've written a few myself ...