Get nanoseconds since midnight with the lowest latency - C++

I'd like to get the current number of nanoseconds since midnight, with the lowest latency.
My platform is Linux/Centos 7 with Clang. I do not care about portability.
I found this <chrono> approach,
but it divides by seconds/milliseconds etc. to get the result.
I also found this which could be modified for nanoseconds:
struct timeval tv;
int msec = -1;
if (gettimeofday(&tv, NULL) == 0)
{
    msec = ((tv.tv_sec % 86400) * 1000 + tv.tv_usec / 1000);
}
https://stackoverflow.com/a/10499119/997112
but again they are using a division. Is there anything quicker, avoiding modulus and divisions?
I would assume the fastest way would be:
Get the time now
Multiply the number of hours, minutes, and seconds by the necessary nanosecond factors and then add the current number of nanoseconds to the total?

There isn't any hardware that provides a nanoseconds counter; therefore hardware that provides something else (e.g. "CPU cycles") must be used and scaled by software somewhere.
The clock_gettime() function on Linux will scale to nanoseconds for you. More importantly (depending on security vs. performance compromises) this may be done purely in user-space, avoiding the overhead of calling the kernel API (which is likely to be at least 10 times more expensive than a measly division).
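For illustration, here is a minimal sketch of that (the function name is mine, CLOCK_REALTIME is one possible choice, and "midnight" here means midnight UTC; leap-second smearing and local time zones are ignored). On glibc this call is normally serviced from the vDSO without entering the kernel:
#include <stdint.h>
#include <time.h>

int64_t nanos_since_midnight(void)
{
    struct timespec ts;
    clock_gettime(CLOCK_REALTIME, &ts);
    /* one modulus and one multiply; both are cheap next to reading the clock */
    return (int64_t)(ts.tv_sec % 86400) * 1000000000LL + ts.tv_nsec;
}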
However, at these scales you need to be extremely specific about what you actually want. For example, what is expected during leap seconds? Two computers can disagree simply because one is configured to smear leap seconds and the other isn't.
For another example, if you want to calculate latency (e.g. "latency = current_time_at_receiver - time_packet_says_it_was_sent"), then two computers can be out of sync (e.g. the sender's clock being a few seconds behind the receiver's, so latency ends up being negative); to deal with that you'll probably need a training phase (a bit like the NTP protocol) where you try to estimate the initial difference between the two computers' time sources, followed by monitoring/tracking (to try to compensate for any long-term drift).

Related

chrono: can I validate system clock with steady clock on a time scale of an hour?

My application needs an absolute timestamp (i.e. including date and hour) with an error below 0.5 s. The server synchronises via NTP, but I still want to detect if the server clock is not well synchronised for whatever reason.
My idea is to use the steady clock to validate the system clock. I assume that within a period of, say, 1 hour the steady clock should deviate very little from real time (well below 0.5 s). I compare time measured with the steady and system clocks periodically. If the difference between the two grows or jumps, it may suggest NTP is adjusting the system clock, which may mean that some of the time values were incorrect.
This is an example code:
#include <iostream>
#include <chrono>
#include <thread>

int main() {
    const int test_time = 3600; // seconds, approximate
    const int delay = 100;      // milliseconds
    const int iterations = test_time * 1000 / delay;
    int64_t system_clock = std::chrono::system_clock::now().time_since_epoch().count();
    int64_t steady_clock = std::chrono::steady_clock::now().time_since_epoch().count();
    const int64_t offset = system_clock - steady_clock;
    for (int i = 0; i < iterations; i++) {
        system_clock = std::chrono::system_clock::now().time_since_epoch().count();
        steady_clock = std::chrono::steady_clock::now().time_since_epoch().count();
        int64_t deviation = system_clock - offset - steady_clock;
        std::cout << deviation / 1e3 << " µs" << std::endl;
        /**
         * Here I put code making use of system_clock
         */
        std::this_thread::sleep_for(std::chrono::milliseconds(delay));
    }
}
Does this procedure make sense? What I'm not sure about in particular is the stability of the steady clock. I assume that it might be subject only to a slight deviation due to the imperfection of whatever the internal server clock is, but maybe I'm missing something?
I was very positively surprised by the test results with the code above. Even when I set it to run for 8 hours, the maximum deviation I saw was only –22 µs, and around 1 µs for the vast majority of the time.
This question has little to do with C++.
1) Whether this method has a chance to work depends on the accuracy of your computer's internal clock. A cheap clock might drift a minute per day, which is way over 0.5 s per hour.
2) The method is unable to identify a systematic offset. Say you are constantly behind by a second due to network lag, ping, or some other issue. The method will display a negligible deviation in this case.
Basically, it can only tell whether the time measured is precise, but it provides little knowledge of its accuracy (google: accuracy vs precision). Issues with the algorithm regarding general clock adjustments were also mentioned in the comments.
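If you do keep this approach, one small improvement is to stay in the chrono type system instead of subtracting raw count() values, which silently assumes both clocks use the same tick period. A minimal sketch of the same measurement (the loop count and sleep interval here are arbitrary):
#include <chrono>
#include <iostream>
#include <thread>

int main() {
    using namespace std::chrono;
    // chrono picks a common tick period for the mixed-clock subtraction
    const auto offset = system_clock::now().time_since_epoch()
                      - steady_clock::now().time_since_epoch();
    for (int i = 0; i < 10; ++i) {
        auto deviation = system_clock::now().time_since_epoch()
                       - steady_clock::now().time_since_epoch()
                       - offset;
        std::cout << duration_cast<microseconds>(deviation).count() << " µs\n";
        std::this_thread::sleep_for(milliseconds(100));
    }
}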

Timers differences between Win7 & Win10

I have an application where I use the MinGW implementation of gettimeofday to achieve "precise" timing (~1 ms precision) on Win7. It works fine.
However, when using the same code (and even the same *.exe) on Win10, the precision drops drastically to the famous 15.6 ms granularity, which is not enough for me.
Two questions:
- do you know what can be the root cause of such discrepancies? (is it an OS configuration/"feature"?)
- how can I fix it? Or, better, is there a precise timer agnostic to the OS configuration?
NB: std::chrono::high_resolution_clock seems to have the same issue (at least it shows the 15.6 ms limit on Win10).
From Hans Passant's comments and additional tests on my side, here is a sounder answer:
The 15.6 ms (1/64 second) limit is well known on Windows and is the default behavior. It is possible to lower the limit (e.g. to 1 ms, through a call to timeBeginPeriod()), though we are not advised to do so, because this affects the global system timer resolution and the resulting power consumption. For instance, Chrome is notorious for doing this. Hence, due to the global aspect of the timer resolution, one may observe a 1 ms precision without explicitly asking for it, because of third-party programs.
Besides, be aware that std::chrono::high_resolution_clock does not have a valid behavior on Windows (in both the Visual Studio and MinGW contexts). So you cannot expect this interface to be a cross-platform solution, and the 15.625 ms limit still applies.
Knowing that, how can we deal with it? Well, one can use timeBeginPeriod() to increase the precision of some timers but, again, we are not advised to do so: it seems better to use QueryPerformanceCounter() (QPC), which is, according to Microsoft, the primary API for native code that needs to acquire high-resolution time stamps or measure time intervals. Note that QPC counts elapsed time (and not CPU cycles). Here is a usage example:
#include <windows.h>

LARGE_INTEGER StartingTime, EndingTime, ElapsedMicroseconds;
LARGE_INTEGER Frequency;

QueryPerformanceFrequency(&Frequency);
QueryPerformanceCounter(&StartingTime);

// Activity to be timed

QueryPerformanceCounter(&EndingTime);
ElapsedMicroseconds.QuadPart = EndingTime.QuadPart - StartingTime.QuadPart;

//
// We now have the elapsed number of ticks, along with the
// number of ticks-per-second. We use these values
// to convert to the number of elapsed microseconds.
// To guard against loss-of-precision, we convert
// to microseconds *before* dividing by ticks-per-second.
//
ElapsedMicroseconds.QuadPart *= 1000000;
ElapsedMicroseconds.QuadPart /= Frequency.QuadPart;
According to Microsoft, QPC is also suitable in a multicore/multithread context, though it can be less precise/ambiguous:
When you compare performance counter results that are acquired from different threads, consider values that differ by ± 1 tick to have an ambiguous ordering. If the time stamps are taken from the same thread, this ± 1 tick uncertainty doesn't apply. In this context, the term tick refers to a period of time equal to 1 ÷ (the frequency of the performance counter obtained from QueryPerformanceFrequency).
As additional resources, MS also provides an FAQ on how/why to use QPC and an explanation of clocks/timing in Windows.

Consistent Timestamping in C++ with std::chrono

I'm logging timestamps in my program with the following block of code:
// Taken at the relevant time
m.timestamp = std::chrono::high_resolution_clock::now().time_since_epoch();

// After the work is done
std::size_t secs  = std::chrono::duration_cast<std::chrono::seconds>(m.timestamp).count();
std::size_t nanos = std::chrono::duration_cast<std::chrono::nanoseconds>(m.timestamp).count() % 1000000000;
std::time_t tp = (std::time_t) secs;
char ts[] = "yyyymmdd HH:MM:SS";
char format[] = "%Y%m%d %H:%M:%S";
strftime(ts, sizeof(ts), format, std::localtime(&tp));   // sizeof(ts), not 80: ts is only 18 bytes
std::stringstream s;
s << ts << "." << std::setfill('0') << std::setw(9) << nanos
  << " - " << message << std::endl;
return s.str();
I'm comparing these to timestamps recorded by an accurate remote source. When the difference in timestamps is graphed and NTP is not enabled, there is a linear-looking drift through the day (700 microseconds every 30 seconds or so).
After correcting for the linear drift, I find that there's a non-linear component. It can drift in and out by hundreds of microseconds over the course of hours.
The second graph looks similar to graphs taken with the same methodology as above, but with NTP enabled. The large vertical spikes are expected in the data, but the wiggle in the minimum is surprising.
Is there a way to get a more precise timestamp, but retain microsecond/nanosecond resolution? It's okay if the clock drifts from the actual time in a predictable way, but the timestamps would need to be internally consistent over long stretches of time.
high_resolution_clock has no guaranteed relationship with "current time". Your system may or may not alias high_resolution_clock to system_clock. That means you may or may not get away with using high_resolution_clock in this manner.
Use system_clock. Then tell us if the situation has changed (it may not).
Also, better style:
using namespace std::chrono;
auto timestamp = ... // however, as long as it is based on system_clock
auto secs = duration_cast<seconds>(timestamp);
timestamp -= secs;
auto nanos = duration_cast<nanoseconds>(timestamp);
std::time_t tp = system_clock::to_time_t(system_clock::time_point{secs});
Stay in the chrono type system as long as possible.
Use the chrono type system to do the conversions and arithmetic for you.
Use system_clock::to_time_t to convert to time_t.
But ultimately, none of the above is going to change any of your results. system_clock is just going to talk to the OS (e.g. call gettimeofday or whatever).
If you can devise a more accurate way to tell time on your system, you can wrap that solution up in a "chrono-compatible clock" so that you can continue to make use of the type safety and conversion factors of chrono durations and time_points.
struct my_super_accurate_clock
{
    using rep        = long long;
    using period     = std::nano;  // or whatever?
    using duration   = std::chrono::duration<rep, period>;
    using time_point = std::chrono::time_point<my_super_accurate_clock>;
    static const bool is_steady = false;
    static time_point now();  // do super accurate magic here
};
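For example, if the "super accurate magic" were simply the kernel's raw monotonic clock (an assumption for illustration only; substitute whatever time source you actually trust), now() could be defined like this:
#include <time.h>

my_super_accurate_clock::time_point my_super_accurate_clock::now()
{
    timespec ts;
    clock_gettime(CLOCK_MONOTONIC_RAW, &ts);   // stand-in for the real source
    return time_point(duration(static_cast<rep>(ts.tv_sec) * 1000000000LL + ts.tv_nsec));
}

// Usage stays inside the chrono type system:
//   auto t0 = my_super_accurate_clock::now();
//   ... work ...
//   auto us = std::chrono::duration_cast<std::chrono::microseconds>(my_super_accurate_clock::now() - t0);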
The problem is that unless your machine is very unusual, the underlying hardware simply isn't capable of providing a particularly reliable measurement of time (at least on the scales you are looking at).
Whether on your digital wristwatch or your workstation, most electronic clock signals are internally generated by a crystal oscillator. Such crystals have both long (years) and short-term (minutes) variation around their "ideal" frequency, with the largest short-term component being variation with temperature. Fancy lab equipment is going to have something like a crystal oven which tries to keep the crystal at a constant temperature (above ambient) to minimize temperature related drift, but I've never seen anything like that on commodity computing hardware.
You see the effects of crystal inaccuracy in a different way in both of your graphs. The first graph simply shows that your crystal ticks at a somewhat large offset from true time, either due to variability at manufacturing (it was always that bad) or long-term drift (it got like that over time). Once you enable NTP, the "constant" or average offset from true time is easily corrected, so you'd expect to average zero offset over some large period of time (indeed the line traced by the minimum dips above and below zero).
At this scale, however, you'll see the smaller short term variations in effect. NTP kicks in periodically and tries to "fix them", but the short term drift is always there and always changing direction (you can probably even check the effect of increasing or decreasing ambient temperature and see it in the graph).
You can't avoid the wiggle, but you could perhaps increase the NTP adjustment frequency to keep it more tightly coupled to real time. Your exact requirements aren't totally clear though. For example you mention:
It's okay if the clock drifts from the actual time in a predictable
way, but the timestamps would need to be internally consistent over
long stretches of time.
What does "internally consistent" mean? If you are OK with arbitrary drift, just use your existing clock without NTP adjustments. If you want something like time that tracks real time "over large timeframes" (i.e,. it doesn't get too out of sync), why could use your internal clock in combination with periodic polling of your "external source", and change the adjustment factor in a smooth way so that you don't have "jumps" in the apparent time. This is basically reinventing NTP, but at least it would be fully under application control.

C++ fine granular time

The following piece of code gives 0 as the runtime of the function. Can anybody point out the error?
struct timeval start,end;
long seconds,useconds;
gettimeofday(&start, NULL);
int optimalpfs=optimal(n,ref,count);
gettimeofday(&end, NULL);
seconds = end.tv_sec - start.tv_sec;
useconds = end.tv_usec - start.tv_usec;
long opt_runtime = ((seconds) * 1000 + useconds/1000.0) + 0.5;
cout<<"\nOptimal Runtime is "<<opt_runtime<<"\n";
I get both the start and end times as the same. I get the following output:
Optimal Runtime is 0
Tell me the error please.
POSIX 1003.1b-1993 specifies interfaces for clock_gettime() (and clock_getres()), and offers that with the MON option there can be a type of clock with a clockid_t value of CLOCK_MONOTONIC (so that your timer isn't affected by system time adjustments). If available on your system then these functions return a structure which has potential resolution down to one nanosecond, though the latter function will tell you exactly what resolution the clock has.
struct timespec {
    time_t tv_sec;  /* seconds */
    long   tv_nsec; /* and nanoseconds */
};
You may still need to run your test function in a loop many times for the clock to register any time elapsed beyond its resolution, and perhaps you'll want to run your loop enough times to last at least an order of magnitude more time than the clock's resolution.
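A sketch of that pattern, assuming CLOCK_MONOTONIC is available and assuming the signature of the optimal() function from the question (the iteration count is arbitrary; tune it until the total elapsed time is well above the reported resolution):
#include <stdio.h>
#include <time.h>

extern int optimal(int n, int *ref, int count);    /* assumed signature */

void benchmark(int n, int *ref, int count)
{
    struct timespec res, start, end;
    clock_getres(CLOCK_MONOTONIC, &res);           /* how coarse is this clock really? */
    printf("clock resolution: %ld ns\n", res.tv_nsec);

    const long iterations = 1000000;               /* tune so total time >> resolution */
    clock_gettime(CLOCK_MONOTONIC, &start);
    for (long i = 0; i < iterations; i++)
        (void)optimal(n, ref, count);
    clock_gettime(CLOCK_MONOTONIC, &end);

    long long elapsed_ns = (end.tv_sec - start.tv_sec) * 1000000000LL
                         + (end.tv_nsec - start.tv_nsec);
    printf("per call: %lld ns\n", elapsed_ns / iterations);
}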
Note though that apparently the Linux folks mis-read the POSIX.1b specifications and/or didn't understand the definition of a monotonically increasing time clock, and their CLOCK_MONOTONIC clock is affected by system time adjustments, so you have to use their invented non-standard CLOCK_MONOTONIC_RAW clock to get a real monotonic time clock.
Alternately one could use the related POSIX.1 timer_settime() call to set a timer running, a signal handler to catch the signal delivered by the timer, and timer_getoverrun() to find out how much time elapsed between the queuing of the signal and its final delivery, and then set your loop to run until the timer goes off, counting the number of iterations in the time interval that was set, plus the overrun.
Of course on a preemptive multi-tasking system these clocks and timers will run even while your process is not running, so they are not really very useful for benchmarking.
Slightly more rare is the optional POSIX.1-1999 clockid_t value of CLOCK_PROCESS_CPUTIME_ID, indicated by the presence of the _POSIX_CPUTIME macro in <time.h>, which represents the CPU-time clock of the calling process, giving values representing the amount of execution time of the invoking process. (Even rarer is the TCT option's clockid_t value of CLOCK_THREAD_CPUTIME_ID, indicated by the _POSIX_THREAD_CPUTIME macro, which represents the CPU-time clock giving values for the amount of execution time of the invoking thread.)
Unfortunately POSIX makes no mention of whether these so-called CPUTIME clocks count just user time, or both user and system (and interrupt) time, accumulated by the process or thread, so if your code under profiling makes any system calls then the amount of time spent in kernel mode may, or may not, be represented.
Even worse, on multi-processor systems, the values of the CPUTIME clocks may be completely bogus if your process happens to migrate from one CPU to another during its execution. The timers implementing these CPUTIME clocks may also run at different speeds on different CPU cores, and at different times, further complicating what they mean. I.e. they may not mean anything related to real wall-clock time, but only be an indication of the number of CPU cycles (which may still be useful for benchmarking so long as relative times are always used and the user is aware that execution time may vary depending on external factors). Even worse it has been reported that on Linux CPU TimeStampCounter-based CPUTIME clocks may even report the time that a process has slept.
If your system has a good working getrusage() system call then it will hopefully be able to give you a struct timeval for each of the actual user and system times separately consumed by your process while it was running. However, since this puts you back to a microsecond clock at best, you'll need to run your test code enough times repeatedly to get a more accurate timing, calling getrusage() once before the loop and again afterwards, and then calculating the differences between the times given. For simple algorithms this might mean running them millions of times, or more. Note also that on many systems the division between user time and system time is done somewhat arbitrarily, and if examined separately in a repeated loop one or the other can even appear to run backwards. However, if your algorithm makes no system calls then summing the time deltas should still be a fair total time for your code execution.
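A small helper for the getrusage() variant, summing user and system time (call it once before the repetition loop and once after, then subtract):
#include <sys/resource.h>

/* User + system CPU time consumed by this process so far, in microseconds. */
static long long cpu_micros(void)
{
    struct rusage ru;
    getrusage(RUSAGE_SELF, &ru);
    return (ru.ru_utime.tv_sec + ru.ru_stime.tv_sec) * 1000000LL
         + (ru.ru_utime.tv_usec + ru.ru_stime.tv_usec);
}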
BTW, take care when comparing time values such that you don't overflow or end up with a negative value in a field, either as @Nim suggests, or perhaps like this (from NetBSD's <sys/time.h>):
#define timersub(tvp, uvp, vvp)                               \
    do {                                                      \
        (vvp)->tv_sec = (tvp)->tv_sec - (uvp)->tv_sec;        \
        (vvp)->tv_usec = (tvp)->tv_usec - (uvp)->tv_usec;     \
        if ((vvp)->tv_usec < 0) {                             \
            (vvp)->tv_sec--;                                  \
            (vvp)->tv_usec += 1000000;                        \
        }                                                     \
    } while (0)
(you might even want to be more paranoid that tv_usec is in range)
One more important note about benchmarking: make sure your function is actually being called, ideally by examining the assembly output from your compiler. Compiling your function in a separate source module from the driver loop usually convinces the optimizer to keep the call. Another trick is to have it return a value that you assign inside the loop to a variable defined as volatile.
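The volatile-sink trick usually looks something like this (again assuming the optimal() signature from the question); because writes to a volatile object are observable behaviour, the optimizer has to keep every call:
extern int optimal(int n, int *ref, int count);   // assumed signature

volatile int sink;                                // writes here cannot be optimized away

void timed_loop(long iterations, int n, int *ref, int count)
{
    for (long i = 0; i < iterations; i++)
        sink = optimal(n, ref, count);            // so the call itself must be kept
}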
You've got a weird mix of floats and ints here:
long opt_runtime = ((seconds) * 1000 + useconds/1000.0) + 0.5;
Try using:
long opt_runtime = (long)(seconds * 1000 + (float)useconds/1000);
This way you'll get your results in milliseconds.
The execution time of optimal(...) is less than the granularity of gettimeofday(...). This likely happens on Windows. On Windows the typical granularity is up to 20 ms. I've answered a related gettimeofday(...) question here.
For Linux I asked How is the microsecond time of linux gettimeofday() obtained and what is its accuracy? and got a good result.
More information on how to obtain accurate timing is described in this SO answer.
I normally do such a calculation as:
long long ss = start.tv_sec * 1000000LL + start.tv_usec;
long long es = end.tv_sec * 1000000LL + end.tv_usec;
Then do a difference
long long microsec_diff = es - ss;
Now convert as required:
double seconds = microsec_diff / 1000000.;
Normally, I don't bother with the last step; I do all timings in microseconds.

Linux/C++ timing method to guarantee execution every N seconds despite drift?

I have a program that needs to execute every X seconds to write some output. The program will do some intermittent polling and processing between each output. So for example I may be outputting every 5 seconds, but I wake up to poll every 0.1 seconds until I hit the next 5-second mark. The program will in theory run for months between restarts, possibly even longer.
I need the execution at every X seconds to stay consistent with the wall clock. In other words, I can't allow clock drift to cause me to drift away from the X-second mark. I don't need quite the same level of accuracy in the intermittent polling, but I would like to poll more often than once a second, so I need a structure that can represent sub-second precision.
I realize that by the very nature of running on an OS there will be a certain inconsistency/latency in the execution of any timer, such that I can't guarantee that I'll execute at exactly every X seconds. But that is fine, so long as it stays a small normal distribution; what I don't want is for drift to allow the time to consistently get further and further off.
I would also prefer to minimize the CPU cost of the polling as much as possible, but that's a secondary concern.
What timing constructs are available for Linux that can best provide me this level of precision? I'm trying to avoid including Boost in the application due to hassles in distribution, but can use it if I have to. So methods using the standard C++ libraries are preferred, but if Boost can do it better I would like to know that as well.
Thank you.
PS: I can't use C++11. It's not an option, so there is no way I can use any constructs from it.
clock_gettime and clock_nanosleep both operate on sub-second times, and use of CLOCK_MONOTONIC should prevent any skew due to adjustments to system time (such as NTP or adjtimex). For example,
#include <time.h>

long delay_ns = 250000000;
struct timespec next;

clock_gettime(CLOCK_MONOTONIC, &next);
while (1) {
    /* advance the absolute wake-up time by the delay, carrying into tv_sec */
    next.tv_nsec += delay_ns;
    next.tv_sec  += next.tv_nsec / 1000000000;
    next.tv_nsec %= 1000000000;
    clock_nanosleep(CLOCK_MONOTONIC, TIMER_ABSTIME, &next, NULL);
    /* do something after the wait */
}
In reality, you should check whether you've returned early due to a signal and whether you should skip an interval because too much time passed while you were sleeping.
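For the signal case specifically: clock_nanosleep() returns the error number directly rather than setting errno, and because TIMER_ABSTIME uses an absolute deadline you can simply retry with the same timespec:
#include <errno.h>

int rc;
do {
    rc = clock_nanosleep(CLOCK_MONOTONIC, TIMER_ABSTIME, &next, NULL);
} while (rc == EINTR);   /* interrupted by a signal: sleep again until the same deadline */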
Other methods of sleeping, such as nanosleep and select, only allow for specifying a time interval and use CLOCK_REALTIME, which is system time and may change.
If you need some exact timing without jitter or clock drift, make sure to have an external time source like a GPS, an atomic clock, or a radio time signal.
See for example the NTP integration documented in this PDF.
All NTP tricks work as well on Linux as on NanoBSD (which is in the PDF).
I would record the time t_0 when I started; then at any given time t, if i = (t - t_0)/X (integer division) is greater than the same quantity the last time you executed, you execute again.
For example, suppose you are running every 5 seconds. If you started at time 23 you record i = 0, t_0 = 23. Then at time 27, (27 - 23)/5 = 0, so you don't yet execute. But the next time around, at time 28, (28 - 23)/5 = 1, which is greater than 0, so you execute.
This way you're not adding to a counter each tick, which could accumulate a huge drift; you're computing the time since the start absolutely, so you know the exact times at which to execute.
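A sketch of that bookkeeping on top of a CLOCK_MONOTONIC polling loop (kept C++98-friendly since C++11 is off the table; names are illustrative):
#include <time.h>

// Returns true when a new X-second boundary (measured from t_0) has been crossed.
// last_i starts at 0 and is updated in place; X is the output period in seconds.
bool due_for_output(const struct timespec &t_0, long X, long &last_i)
{
    struct timespec now;
    clock_gettime(CLOCK_MONOTONIC, &now);
    long i = (now.tv_sec - t_0.tv_sec) / X;   // integer division, as described above
    if (i > last_i) {
        last_i = i;
        return true;    // crossed into a new interval: emit output
    }
    return false;       // still inside the current interval: keep polling
}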