Timer accuracy: C clock() vs. WinAPI's QPC or timeGetTime() - C++

I'd like to characterize the accuracy of a software timer. I'm not concerned so much about HOW accurate it is, but do need to know WHAT the accuracy is.
I've investigated the C function clock() and the WinAPI functions QueryPerformanceCounter (QPC) and timeGetTime(), and I know that they're all hardware dependent.
I'm measuring a process that could take around 5-10 seconds, and my requirements are simple: I only need 0.1 second precision (resolution). But I do need to know what the accuracy is, worst-case.
While more accuracy would be preferred, I would rather know that the accuracy was poor (500 ms) and account for it than believe that the accuracy was better (1 ms) but not be able to document it.
Does anyone have suggestions on how to characterize software clock accuracy?
Thanks

You'll need to distinguish accuracy, resolution and latency.
clock(), GetTickCount() and timeGetTime() are derived from a calibrated hardware clock. Resolution is not great: they are driven by the clock tick interrupt, which by default ticks 64 times per second, or once every 15.625 msec. You can use timeBeginPeriod() to drive that down to 1.0 msec. Accuracy is very good: the clock is calibrated from an NTP server, and you can usually count on it not being off by more than a second over a month.
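A minimal sketch of what that looks like in practice (my example, not from the answer): request 1 ms tick resolution around the measurement and undo it afterwards. Assumes Windows and linking against winmm.lib; the 50 ms Sleep() stands in for the work being timed.

#include <windows.h>
#include <mmsystem.h>   // timeBeginPeriod, timeEndPeriod, timeGetTime
#include <stdio.h>

int main()
{
    if (timeBeginPeriod(1) != TIMERR_NOERROR)   // request 1 ms tick resolution
        return 1;

    DWORD start = timeGetTime();                // now updates roughly every 1 ms
    Sleep(50);                                  // placeholder for the work being measured
    DWORD elapsed = timeGetTime() - start;
    printf("Elapsed: %lu ms\n", (unsigned long)elapsed);

    timeEndPeriod(1);                           // always undo the request
    return 0;
}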
QPC has a much higher resolution, always better than one microsecond and as little as half a nanosecond on some machines. However, it has poor accuracy: the clock source is a frequency picked up from the chipset somewhere. It is not calibrated and has typical electronic tolerances. Use it only to time short intervals.
Latency is the most important factor when you deal with timing. You have no use for a highly accurate timing source if you can't read it fast enough. And that's always an issue when you run code in user mode on a protected-mode operating system, which always has code that runs with higher priority than your code. Device drivers in particular are trouble-makers, video and audio drivers especially. Your code is also subject to being swapped out of RAM, requiring a page fault to get loaded back. On a heavily loaded machine, not being able to run your code for hundreds of milliseconds is not unusual. You'll need to factor this failure mode into your design. If you need guaranteed sub-millisecond accuracy, then only a kernel thread with real-time priority can give you that.
A pretty decent timer is the multimedia timer you get from timeSetEvent(). It was designed to provide good service for the kind of programs that require a reliable timer. You can make it tick at 1 msec, and it will catch up with delays when possible. Do note that it is an asynchronous timer: the callback is made on a separate worker thread, so you have to take care of proper thread synchronization.
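To make the threading point concrete, here is a minimal sketch (my example, assuming Windows and winmm.lib) of a 1 ms periodic multimedia timer whose callback only bumps an atomic counter; anything richer needs explicit synchronization because the callback runs on a worker thread.

#include <windows.h>
#include <mmsystem.h>
#include <atomic>
#include <cstdio>

static std::atomic<long> g_ticks{0};

void CALLBACK OnTick(UINT, UINT, DWORD_PTR, DWORD_PTR, DWORD_PTR)
{
    g_ticks.fetch_add(1, std::memory_order_relaxed);   // called roughly every 1 ms
}

int main()
{
    timeBeginPeriod(1);                                // ask for 1 ms resolution
    MMRESULT id = timeSetEvent(1, 1, OnTick, 0,
                               TIME_PERIODIC | TIME_CALLBACK_FUNCTION);
    Sleep(2000);                                       // let it run for about 2 seconds
    timeKillEvent(id);
    timeEndPeriod(1);
    std::printf("Ticks in ~2 s: %ld (ideally ~2000)\n", g_ticks.load());
    return 0;
}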

Since you've asked for hard facts, here they are:
A typical frequency device controlling HPETs is the CB3LV-3I-14M31818
which specifies a frequency stability of +/- 50ppm between -40 °C and +85 °C.
A cheaper chip is the CB3LV-3I-66M6660. This device has a frequency stability of +/- 100 ppm between -20°C and 70°C.
As you see, 50 to 100 ppm will result in a drift of 50 to 100 µs/s, 180 to 360 ms/hour, or 4.32 to 8.64 s/day!
Devices controlling the RTC are typically somewhat better: The RV-8564-C2 RTC
module provides tolerances of +/- 10 to 20 ppm. Tighter tolerances are typically available in military versions or on request. The deviation of this source is a factor of 5
less than that of the HPET. However, it is still 0.86 s/day.
All of the above values are maximum values as specified in the data sheet. Typical values may be considerably less; as mentioned in my comment, they are in the few-ppm range.
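To relate these tolerances to the 5-10 second measurement in the question, the worst-case error is simply ppm x 10^-6 x interval. A tiny sketch of that arithmetic (my example, plain C++, no OS dependency):

#include <cstdio>

int main()
{
    const double ppm[] = {10, 50, 100};   // RTC, good HPET, cheap HPET (data-sheet maxima)
    const double interval_s = 10.0;       // the ~10-second measurement from the question
    for (double p : ppm)
        std::printf("%6.0f ppm -> worst-case error over %.0f s: %.3f ms\n",
                    p, interval_s, p * 1e-6 * interval_s * 1000.0);
    // Even 100 ppm over 10 s is only 1 ms - far inside the 0.1 s requirement.
    return 0;
}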
These frequency tolerances are also accompanied by thermal drift. The result of QueryPerformanceCounter() may be heavily influenced by thermal drift on systems operating with the ACPI Power Management Timer chip (example).
More information about timers: Clock and Timer Circuits.

For QPC, you can call QueryPerformanceFrequency to get the rate it updates at. Unless you are using time() itself (which has one-second resolution), you will get better than 0.5 s timing accuracy anyway, but clock() isn't all that accurate - quite often it advances in 10 ms steps [although CLOCKS_PER_SEC is apparently standardized at 1 million, making the numbers APPEAR more precise].
If you do something along these lines, you can figure out how small a gap you can measure [although at REALLY high resolution you may not be able to tell how small, e.g. a timestamp counter that updates every clock cycle, when reading it takes 20-40 clock cycles]:
#include <ctime>
#include <iostream>
using namespace std;

int main()
{
    time_t t, t1;
    t = time(NULL);
    // Wait for the next "second" to tick over.
    while (t == (t1 = time(NULL))) /* do nothing */ ;

    clock_t old = 0;
    clock_t min_diff = 1000000000;
    clock_t start, end;
    start = clock();
    int count = 0;
    while (t1 == time(NULL))
    {
        clock_t c = clock();
        if (old != 0 && c != old)
        {
            count++;
            clock_t diff = c - old;
            if (min_diff > diff) min_diff = diff;
        }
        old = c;
    }
    end = clock();
    cout << "Clock changed " << count << " times" << endl;
    cout << "Smallest difference " << min_diff << " ticks" << endl;
    cout << "One second ~= " << end - start << " ticks" << endl;
    return 0;
}
Obviously, you can apply the same principle to other time sources.
(Not compile-tested, but hopefully not too full of typos and mistakes)
Edit:
So, if you are measuring times in the range of 10 seconds, a timer that runs at 100 Hz would give you 1000 "ticks". But it could be 999 or 1001, depending on your luck and whether you catch it just right or just wrong, so that's 2000 ppm right there - and then the clock input may vary too, but that's a much smaller variation, ~100 ppm at most. On Linux, clock() is updated at 100 Hz (the actual timer that runs the OS may run at a higher frequency, but clock() on Linux updates at 100 Hz, i.e. 10 ms intervals) [and it only counts time when the CPU is being used, so sitting for 5 seconds waiting for user input is 0 time].
In Windows, clock() measures the actual time, same as your wristwatch does, not just the CPU time being used, so 5 seconds waiting for user input is counted as 5 seconds of time. I'm not sure how accurate it is.
The other problem you will find is that modern systems are not very good at repeatable timing in general - no matter what you do, the OS, the CPU and the memory all conspire to make life a misery for getting the same amount of time for two runs. CPUs these days often run with an intentionally variable clock (it's allowed to drift about 0.1-0.5%) to reduce the electromagnetic radiation spikes that can "sneak out" of that nicely sealed computer box during EMC (electromagnetic compatibility) testing.
In other words, even if you can get a very standardized clock, your test results will vary up and down a bit, depending on OTHER factors that you can't do anything about...
In summary, unless you are looking for a number to fill into a form that requires a ppm figure for your clock accuracy - and it's a government form that you can't NOT fill that information into - I'm not entirely convinced it's very useful to know the accuracy of the timer used to measure the time itself, because other factors will play AT LEAST as big a role.

Related

How accurate is std::chrono?

std::chrono advertises that it can report results down to the nanosecond level. On a typical x86_64 Linux or Windows machine, how accurate would one expect this to be? What would be the error bars for a measurement of 10 ns, 10 µs, 10 ms, and 10 s, for example?
It's most likely hardware and OS dependent. For example, when I ask Windows what the clock frequency is using QueryPerformanceFrequency(), I get 3903987; taking the inverse of that gives a clock period, or resolution, of about 256 nanoseconds. This is the value that my operating system reports.
With std::chrono according to the docs the minimum representable duration is high_resolution_clock::period::num / high_resolution_clock::period::den.
The num and den are numerator and denominator. std::chrono::high_resolution_clock tells me the numerator is 1, and the denominator is 1 billion, supposedly corresponding to 1 nanosecond:
std::cout << (double)std::chrono::high_resolution_clock::period::num /
std::chrono::high_resolution_clock::period::den; // Results in a nanosecond.
So according to std::chrono I have one-nanosecond resolution, but I don't believe it, because the native OS system call is more likely to be reporting the true frequency/period.
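One way to check is to measure what the clock actually delivers rather than what period::num/den advertises. A minimal sketch (my example) that spins until now() changes and reports the smallest step it ever observes:

#include <chrono>
#include <iostream>

int main()
{
    using clk = std::chrono::high_resolution_clock;
    auto best = clk::duration::max();
    for (int i = 0; i < 1000; ++i)
    {
        auto t0 = clk::now();
        auto t1 = clk::now();
        while (t1 == t0) t1 = clk::now();   // spin until the clock ticks over
        if (t1 - t0 < best) best = t1 - t0;
    }
    std::cout << "Smallest observed step: "
              << std::chrono::duration_cast<std::chrono::nanoseconds>(best).count()
              << " ns\n";
    return 0;
}

On a machine whose QueryPerformanceFrequency is 3903987, you would expect this to print something near 256 ns rather than 1 ns.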
The accuracy will depend upon the application and how this application interacts with the operating system. I am not familiar with chrono specifically, but there are limitations at a lower level you must account for.
For example, if you timestamp network packets using the CPU, the measurement accuracy is very noisy. Even though the precision of the time measurement may be 1 nanosecond, the context switch time for the interrupt corresponding to the packet arrival may be ~1 microsecond. You can accurately measure when your application processes the packet, but not what time the packet arrived.
Short answer: Not accurate in microseconds and below.
Long answer:
I was interested in how much time my two DP programs take to execute. So I used the chrono library, but when I ran it, it reported 0 microseconds, so technically I was unable to compare. I can't increase the array size, because it will not be possible to extend it to 1e8.
So I wrote a sort program to test it and ran it 100 times, and the following is the result:
[screenshot of the timing results]
It is clearly visible that the results are not consistent for the same input, so I would not recommend relying on it for higher precision.

Accurate C/C++ clock on a multi-core processor with auto-overclock?

I have looked into several topics to try to get some ideas on how to make a reliable clock with C or C++. However, I also saw that some functions use the processor's ticks and ticks-per-second to calculate the end result, which I think could be a problem on a CPU with auto-overclock like the one I have. I also saw that one of them resets after a while, so it is not really reliable.
The idea is to make a (preferably cross-platform) clock like an in-game one, with a precision better than a second in order to be able to add the elapsed time in the "current session" with the saved time at the end of the program. This would be to count the time spent on a console game that does not have an in-game clock, and on the long run to perhaps integrate it to actual PC games.
It should be able to run without taking too much or all of the CPU's time (or a single core's time for multi-core CPUs) as it would be quite bad to use all these resources just for the clock, and also on systems with auto-overclock (which could otherwise cause inaccurate results).
The program I would like to implement this feature into currently looks like this, but I might re-code it in C (since I have to get back to learning how to code in C++):
#include <iostream>
#include <cstdlib>
using namespace std;
int main()
{
cout << "In game" << endl;
system("PAUSE");
return 0;
}
On a side note, I still need to get rid of the PAUSE feature, which is Windows-specific, but I think that can be taken care of with a simple "while (c != '\n')" loop.
What I have already skimmed through:
Using clock() to measure execution time
Calculating elapsed time in a C program in milliseconds
Time stamp in the C programming language
Execution time of C program
C: using clock() to measure time in multi-threaded programs
Is gettimeofday() guaranteed to be of microsecond resolution?
How to measure time in milliseconds using ANSI C?
C++ Cross-Platform High-Resolution Timer
Timer function to provide time in nano seconds using C++
How to measure cpu time and wall clock time?
How can I measure CPU time and wall clock time on both Linux/Windows?
how to measure time?
resolution of std::chrono::high_resolution_clock doesn't correspond to measurements
C++ How to make timer accurate in Linux
http://gameprogrammingpatterns.com/game-loop.html
clock() accuracy
std::chrono doesn't seem to be giving accurate clock resolution/frequency
clock function in C++ with threads
(Edit: Extra research, in particular for a C implementation:
Cross platform C++ High Precision Event Timer implementation (no real answer)
Calculating Function time in nanoseconds in C code (Windows)
How to print time difference in accuracy of milliseconds and nanoseconds? (could be the best answer for a C implementation)
How to get duration, as int milli's and float seconds from <chrono>? (C++ again) )
The problem is that it is not clear whether some of the mentioned methods, like Boost or SDL2, behave properly with auto-overclock in particular.
TL;DR : What cross-platform function should I use to make an accurate, sub-second precise counter in C/C++ that could work on multi-core and/or auto-overclocking processors please?
Thanks in advance.
The std::chrono::high_resolution_clock seems to be what you are looking for. On most modern CPUs it is going to be a steady, monotonically increasing clock which would not be affected by overclocking of the CPU.
Just keep in mind that it can't be used to tell the time of day. It is only good for measuring time intervals, which is a great difference. For example:
using clock = std::chrono::high_resolution_clock;
auto start = clock::now();
perform_operation();
auto end = clock::now();
auto us = std::chrono::duration_cast<std::chrono::microseconds>(end - start).count();
std::cout << "Operation took " << us << " microseconds.\n";
If checking the clock is itself a performance-sensitive operation, you will have to resort to platform-specific tricks, of which the most popular is reading the CPU tick counter directly (RDTSC in the Intel family). This is a very fast, and on modern CPUs very accurate, way of measuring time intervals.
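For reference, here is a minimal sketch (my example, x86/x64 only) of reading the tick counter with the compiler intrinsic; MSVC declares __rdtsc in <intrin.h>, GCC/Clang in <x86intrin.h>:

#include <cstdint>
#include <cstdio>
#ifdef _MSC_VER
#include <intrin.h>
#else
#include <x86intrin.h>
#endif

int main()
{
    uint64_t t0 = __rdtsc();
    volatile double x = 1.0;
    for (int i = 0; i < 1000000; ++i) x *= 1.0000001;   // stand-in for the work being timed
    uint64_t t1 = __rdtsc();
    std::printf("Work took %llu TSC ticks\n", (unsigned long long)(t1 - t0));
    // Converting ticks to seconds requires knowing the (invariant) TSC rate,
    // which is not portably available from user mode.
    return 0;
}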

Measuring an algorithms running time in terms of clock ticks

I'm using the following code snippet to measure my algorithms running time in terms of clock ticks:
#include <stdio.h>
#include <time.h>

clock_t t;
t = clock();
// run algorithm
t = clock() - t;
printf("It took me %ld clicks (%f seconds).\n", (long)t, ((float)t) / CLOCKS_PER_SEC);
However, this returns 0 when the input size is small. How is this even possible?
The clock has some granularity, dependent on several factors such as your OS.
Therefore, it may happen that your algorithm runs so fast that the clock did not have time to update. Hence the measured duration of 0.
You can try to run your algorithm n times and divide the measured time by n to get a better idea of the time taken on small inputs.
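A minimal sketch of that idea (my example; run_algorithm() is a stand-in for the code being measured):

#include <ctime>
#include <cstdio>

// Placeholder for the algorithm under test - replace with the real thing.
void run_algorithm()
{
    volatile int sink = 0;
    for (int i = 0; i < 1000; ++i) sink += i;
}

double average_seconds(int repetitions)
{
    clock_t start = clock();
    for (int i = 0; i < repetitions; ++i)
        run_algorithm();
    clock_t end = clock();
    return (double)(end - start) / CLOCKS_PER_SEC / repetitions;
}

int main()
{
    std::printf("Average: %.9f s per run\n", average_seconds(100000));
    return 0;
}

Note that for a very small algorithm the loop overhead itself becomes part of the measurement, a point the next answer returns to.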
The resolution of the standard C clock() function can vary heavily between systems and is likely too small to measure your algorithm. You have 2 options:
Use operating system specific functions
Repeat your algorithm several times until it takes long enough to be measured with clock()
For 1) you can use QueryPerformanceCounter() and QueryPerformanceFrequency() if your program runs under Windows, or use clock_gettime() if it runs on Linux.
Refer to these pages for further details:
QueryPerformanceCounter()
clock_gettime()
For 2) you have to execute your algorithm a given number of times sequentially so that the time reported by clock() is several magnitudes above the minimum granularity of clock(). Let's say clock() only works in steps of 12 microseconds; then the total test run should take at least 1.2 milliseconds so your time measurement has at most 1% deviation. Otherwise, if you measure a time of 12 µs you never know whether it ran for 12.0 µs or maybe 23.9 µs and the next bigger clock() tick just didn't happen yet. The more often your algorithm executes sequentially inside the time measurement, the more exact your time measurement will be. Also be sure to copy-paste the call to your algorithm for sequential executions; if you just use a loop counter in a for-loop, this may severely influence your measurement!

Finding out the CPU clock frequency (per core, per processor)

Programs like CPUz are very good at giving in depth information about the system (bus speed, memory timings, etc.)
However, is there a programmatic way of calculating the per-core (and per-processor, in multi-processor systems with multiple cores per CPU) frequency without having to deal with CPU-specific info?
I am trying to develop an anti-cheating tool (for use with clock-limited benchmark competitions) which will be able to record the CPU clock during the benchmark run for all the active cores in the system (across all processors).
I'll expand on my comments here. This is too big and in-depth for me to fit in the comments.
What you're trying to do is very difficult - to the point of being impractical for the following reasons:
There's no portable way to get the processor frequency. rdtsc does NOT always give the correct frequency due to effects such as SpeedStep and Turbo Boost.
All known methods to measure frequency require an accurate measurement of time. However, a determined cheater can tamper with all the clocks and timers in the system.
Accurately reading either the processor frequency or the time in a tamper-proof way will require kernel-level access. This implies driver signing for Windows.
There's no portable way to get the processor frequency:
The "easy" way to get the CPU frequency is to call rdtsc twice with a fixed time-duration in between. Then dividing out the difference will give you the frequency.
The problem is that rdtsc does not give the true frequency of the processor. Because real-time applications such as games rely on it, rdtsc needs to be consistent through CPU throttling and Turbo Boost. So once your system boots, rdtsc will always run at the same rate (unless you start messing with the bus speeds with SetFSB or something).
For example, on my Core i7 2600K, rdtsc will always show the frequency at 3.4 GHz. But in reality, it idles at 1.6 GHz and clocks up to 4.6 GHz under load via the overclocked Turbo Boost multiplier at 46x.
But once you find a way to measure the true frequency, (or you're happy enough with rdtsc), you can easily get the frequency of each core using thread-affinities.
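As a rough illustration of the thread-affinity part (my sketch, Windows/MSVC assumed), you can pin the current thread to each logical core in turn and measure the TSC rate there against a QPC-timed window. As explained above, this reports the constant rdtsc rate, not the true throttled/boosted frequency of the core:

#include <windows.h>
#include <intrin.h>
#include <cstdio>

int main()
{
    SYSTEM_INFO si;
    GetSystemInfo(&si);
    LARGE_INTEGER freq;
    QueryPerformanceFrequency(&freq);

    for (DWORD core = 0; core < si.dwNumberOfProcessors; ++core)
    {
        SetThreadAffinityMask(GetCurrentThread(), (DWORD_PTR)1 << core);
        Sleep(0);                               // give the scheduler a chance to move us

        LARGE_INTEGER t0, t1;
        QueryPerformanceCounter(&t0);
        unsigned long long c0 = __rdtsc();
        Sleep(100);                             // roughly 100 ms sample window
        unsigned long long c1 = __rdtsc();
        QueryPerformanceCounter(&t1);

        double seconds = (double)(t1.QuadPart - t0.QuadPart) / (double)freq.QuadPart;
        std::printf("Core %lu: ~%.0f MHz (rdtsc rate)\n",
                    (unsigned long)core, (c1 - c0) / seconds / 1e6);
    }
    return 0;
}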
Getting the True Frequency:
To get the true frequency of the processor, you need to access either the MSRs (model-specific registers) or the hardware performance counters.
These are kernel-level instructions and therefore require the use of a driver. If you're attempting this in Windows for the purpose of distribution, you will therefore need to go through the proper driver signing protocol. Furthermore, the code will differ by processor make and model so you will need different detection code for each processor generation.
Once you get to this stage, there are a variety of ways to read the frequency.
On Intel processors, the hardware counters let you count raw CPU cycles. Combined with a method of precisely measuring real time (next section), you can compute the true frequency. The MSRs give you access to other information such as the CPU frequency multiplier.
All known methods to measure frequency require an accurate measurement of time:
This is perhaps the bigger problem. You need a timer to be able to measure the frequency. A capable hacker will be able to tamper with all the clocks that you can use in C/C++.
This includes all of the following:
clock()
gettimeofday()
QueryPerformanceCounter()
etc...
The list goes on and on. In other words, you cannot trust any of the timers as a capable hacker will be able to spoof all of them. For example clock() and gettimeofday() can be fooled by changing the system clock directly within the OS. Fooling QueryPerformanceCounter() is harder.
Getting a True Measurement of Time:
All the clocks listed above are vulnerable because they are often derived, in one way or another, from the same system base clock - which can be changed after the system has already booted up by means of overclocking utilities.
So the only way to get a reliable and tamper-proof measurement of time is to read external clocks such as the HPET or the ACPI. Unfortunately, these also seem to require kernel-level access.
To Summarize:
Building any sort of tamper-proof benchmark will almost certainly require writing a kernel-mode driver which requires certificate signing for Windows. This is often too much of a burden for casual benchmark writers.
This has resulted in a shortage of tamper-proof benchmarks which has probably contributed to the overall decline of the competitive overclocking community in recent years.
I realise this has already been answered. I also realise this is basically a black art, so please take it or leave it - or offer feedback.
In a quest to find the clock rate on throttled (thanks Microsoft, HP, and Dell) Hyper-V hosts (unreliable perf counter) and Hyper-V guests (can only get the stock CPU speed, not the current one), I have managed, through trial, error, and fluke, to create a loop that loops exactly once per clock.
Code as follows - C# 5.0, SharpDev, 32-bit, Target 3.5, Optimize on (crucial), no debugger attached (crucial)
long frequency, start, stop;
double multiplier = 1000 * 1000 * 1000;//nano
if (Win32.QueryPerformanceFrequency(out frequency) == false)
throw new Win32Exception();
Process.GetCurrentProcess().ProcessorAffinity = new IntPtr(1);
const int gigahertz= 1000*1000*1000;
const int known_instructions_per_loop = 1;
int iterations = int.MaxValue;
int g = 0;
Win32.QueryPerformanceCounter(out start);
for (int i = 0; i < iterations; i++)
{
g++;
g++;
g++;
g++;
}
Win32.QueryPerformanceCounter(out stop);
//normal ticks differ from the WMI data, i.e. 3125 when WMI says 3201 and CPU-Z 3199
var normal_ticks_per_second = frequency * 1000;
var ticks = (double)(stop - start);
var time = (ticks * multiplier) /frequency;
var loops_per_sec = iterations / (time/multiplier);
var instructions_per_loop = normal_ticks_per_second / loops_per_sec;
var ratio = (instructions_per_loop / known_instructions_per_loop);
var actual_freq = normal_ticks_per_second / ratio;
Console.WriteLine( String.Format("Perf counhter freq: {0:n}", normal_ticks_per_second));
Console.WriteLine( String.Format("Loops per sec: {0:n}", loops_per_sec));
Console.WriteLine( String.Format("Perf counter freq div loops per sec: {0:n}", instructions_per_loop));
Console.WriteLine( String.Format("Presumed freq: {0:n}", actual_freq));
Console.WriteLine( String.Format("ratio: {0:n}", ratio));
Notes
25 instructions per loop if debugger is active
Consider running a 2 or 3 seconds loop before hand to spin up the processor (or at least attempt to spin up, knowing how heavily servers are throttled these days)
Tested on a 64bit Core2 and Haswell Pentium and compared against CPU-Z
One of the most simple ways to do it is using RDTSC, but seeing as this is for anti-cheating mechanisms, I'd put this in as a kernel driver or a hyper-visor resident piece of code.
You'd probably also need to roll your own timing code**, which again can be done with RDTSC (QPC, as used in the example below, uses RDTSC, and it's in fact very simple to reverse engineer and use a local copy of, which means that to tamper with it, you'd need to tamper with your driver).
void GetProcessorSpeed()
{
CPUInfo* pInfo = this;
LARGE_INTEGER qwWait, qwStart, qwCurrent;
QueryPerformanceCounter(&qwStart);
QueryPerformanceFrequency(&qwWait);
qwWait.QuadPart >>= 5;
unsigned __int64 Start = __rdtsc();
do
{
QueryPerformanceCounter(&qwCurrent);
}while(qwCurrent.QuadPart - qwStart.QuadPart < qwWait.QuadPart);
pInfo->dCPUSpeedMHz = ((__rdtsc() - Start) << 5) / 1000000.0;
}
** I think this would be for security, as @Mystical mentioned, but as I've never felt the urge to subvert low-level system timing mechanisms, there might be more involved; it would be nice if Mystical could add something on that :)
I've previously posted on this subject (along with a basic algorithm): here. To my knowledge the algorithm (see the discussion) is very accurate. For example, Windows 7 reports my CPU clock as 2.00 GHz, CPU-Z as 1994-1996 MHz and my algorithm as 1995025-1995075 kHz.
The algorithm performs a lot of loops to do this which causes the CPU frequency to increase to maximum (as it also will during benchmarks) so speed-throttling software won't come into play.
Additional info here and here.
On the question of speed throttling, I really don't see it as a problem unless an application uses the speed values to determine elapsed times and the times themselves are extremely important. For example, if a division requires x clock cycles to complete, it doesn't matter if the CPU is running at 3 GHz or 300 MHz: it will still need x clock cycles, and the only difference is that it will complete the division in a tenth of the time at 3 GHz.
You need to use CallNtPowerInformation. Here's a code sample from putil project.
With this you can get current and max CPU frequency. As far as I know it's not possible to get per-CPU frequency.
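For reference, a minimal sketch of calling that API (my example, not the putil sample; assumes Windows and linking PowrProf.lib; the struct below mirrors the documented PROCESSOR_POWER_INFORMATION layout, which the output buffer receives one entry of per logical processor):

#include <windows.h>
#include <powrprof.h>
#include <vector>
#include <cstdio>

// Mirrors the documented PROCESSOR_POWER_INFORMATION layout.
typedef struct _PROC_POWER_INFO {
    ULONG Number;
    ULONG MaxMhz;
    ULONG CurrentMhz;
    ULONG MhzLimit;
    ULONG MaxIdleState;
    ULONG CurrentIdleState;
} PROC_POWER_INFO;

int main()
{
    SYSTEM_INFO si;
    GetSystemInfo(&si);
    std::vector<PROC_POWER_INFO> info(si.dwNumberOfProcessors);

    if (CallNtPowerInformation(ProcessorInformation, nullptr, 0,
                               info.data(),
                               (ULONG)(info.size() * sizeof(PROC_POWER_INFO))) == 0)
    {
        for (const auto& p : info)
            std::printf("CPU %lu: current %lu MHz, max %lu MHz\n",
                        p.Number, p.CurrentMhz, p.MaxMhz);
    }
    return 0;
}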
One should refer to this white paper: Intel® Turbo Boost Technology in Intel® Core™ Microarchitecture (Nehalem) Based Processors. Basically, produce several reads of the UCC fixed performance counter over a sample period T.
Relative.Freq = Delta(UCC) / T
where Delta(UCC) = UCC @ period T - UCC @ period T-1
Starting with the Nehalem architecture, UCC increases and decreases its count of clock ticks according to the unhalted state of the core.
When SpeedStep or Turbo Boost is active, the frequency estimated from UCC changes accordingly, while the TSC remains constant. For instance, with Turbo Boost in action, Delta(UCC) is greater than or equal to Delta(TSC).
See the Core_Cycle function in Cyring | CoreFreq on GitHub for an example.

How to calculate an operation's time with microsecond precision

I want to calculate the performance of a function with microsecond precision on the Windows platform.
Now, Windows itself has millisecond granularity, so how can I achieve this?
I tried the following sample, but I am not getting correct results.
#include <windows.h>
#include <stdio.h>

LARGE_INTEGER ticksPerSecond = {0};
LARGE_INTEGER tick_1 = {0};
LARGE_INTEGER tick_2 = {0};
double uSec = 1000000;

// Get the frequency
QueryPerformanceFrequency(&ticksPerSecond);
// Calculate ticks per microsecond
double uFreq = ticksPerSecond.QuadPart / uSec;
// Get the counter before the start of the operation
QueryPerformanceCounter(&tick_1);
// The operation itself
Sleep(10);
// Get the counter after the operation finished
QueryPerformanceCounter(&tick_2);
// And now the operation time in microseconds
double diff = (tick_2.QuadPart / uFreq) - (tick_1.QuadPart / uFreq);
Run the operation in a loop a million times or so and divide the result by that number. That way you'll get the average execution time over that many executions. Timing one (or even a hundred) executions of a very fast operation is very unreliable, due to multitasking and whatnot.
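A minimal sketch of that approach with QueryPerformanceCounter (my example; operation_under_test() is a stand-in for the code being measured):

#include <windows.h>
#include <cstdio>

// Placeholder for the operation being measured - replace with the real thing.
void operation_under_test()
{
    volatile int sink = 0;
    for (int i = 0; i < 100; ++i) sink += i;
}

int main()
{
    const int N = 1000000;
    LARGE_INTEGER freq, t0, t1;
    QueryPerformanceFrequency(&freq);
    QueryPerformanceCounter(&t0);
    for (int i = 0; i < N; ++i)
        operation_under_test();
    QueryPerformanceCounter(&t1);
    double total_us = (double)(t1.QuadPart - t0.QuadPart) * 1e6 / (double)freq.QuadPart;
    std::printf("Average: %.3f microseconds per call\n", total_us / N);
    return 0;
}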
compile it
look at the assembler output
count the number of each instruction in your function
apply the cycles per instruction on your target processor
end up with a cycle count
multiply by the clock speed you are running at
apply arbitrary scaling factors to account for cache misses and branch mis-predictions lol
(man I am so going to get down-voted for this answer)
No, you are probably getting an accurate result; QueryPerformanceCounter() works well for timing short intervals. What's wrong is your expectation of the accuracy of Sleep(). It has a resolution of 1 millisecond; its accuracy is far worse - no better than about 15.625 milliseconds on most Windows machines.
To get it anywhere close to 1 millisecond, you'll have to call timeBeginPeriod(1) first. That probably will improve the match, ignoring the jitter you'll get from Windows being a multi-tasking operating system.
If you're doing this for offline profiling, a very simple way is to run the function 1000 times, measure to the closest millisecond and divide by 1000.
To get finer resolution than 1 ms, you will have to consult your OS documentation. There may be APIs that report time with microsecond resolution. If so, run your application many times and take the averages.
I like Matti Virkkunen's answer. Check the time, call the function a large number of times, check the time when you finish, and divide by the number of times you called the function. He did mention you might be off due to OS interrupts. You might vary the number of times you make the call and see a difference. Can you raise the priority of the process? Can you get it so all the calls happen within a single OS time slice?
Since you don't know when the OS might swap you out, you can put this all inside a larger loop to do the whole measurement a large number of times, and save the smallest number as that is the one that had the fewest OS interrupts. This still may be greater than the actual time for the function to execute because it may still contain some OS interrupts.
Sanjeet,
It looks (to me) like you're doing this exactly right. QueryPerformanceCounter is a perfectly good way to measure short periods of time with a high degree of precision. If you're not seeing the result you expected, it's most likely because the sleep isn't sleeping for the amount of time you expected it to! However, it is likely being measured correctly.
I want to go back to your original question about how to measure the time on windows with microsecond precision. As you already know, the high performance counter (i.e. QueryPerformanceCounter) "ticks" at the frequency reported by QueryPerformanceFrequency. That means that you can measure time with precision equal to:
1/frequency seconds
On my machine, QueryPerformanceFrequency reports 2337910 (counts/sec). That means that my computer's QPC can measure with precision 4.277e-7 seconds, or 0.427732 microseconds. That means that the smallest bit of time I can measure is 0.427732 microseconds. This, of course, gives you the precision that you originally asked for :) Your machine's frequency should be similar, but you can always do the math and check it.
Or you can use gettimeofday(), which gives you a timeval struct containing a timestamp with microsecond (µs) resolution.
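A minimal POSIX sketch of that alternative (my example; the commented gap stands in for the operation being measured):

#include <sys/time.h>
#include <stdio.h>

int main(void)
{
    struct timeval t0, t1;
    gettimeofday(&t0, NULL);
    /* ... the operation being measured ... */
    gettimeofday(&t1, NULL);
    long usec = (t1.tv_sec - t0.tv_sec) * 1000000L + (t1.tv_usec - t0.tv_usec);
    printf("Elapsed: %ld microseconds\n", usec);
    return 0;
}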