Fastest way to get a timestamp - C++

I am implementing a data structure in which I need to invalidate entries after some time, so for each entry I need to maintain its insertion timestamp. When I look up an entry I need to get a timestamp again and calculate the elapsed time since insertion (if it's too old, I can't use it).
This data structure is highly contended by many threads, so I must get this timestamp (on insert and find) in the most efficient way possible. Efficiency is extremely important here.
If it matters, I am working on a Linux machine, developing in C++.
What is the most efficient way to retrieve a timestamp?
BTW, in some old project I worked on, I remember seeing an assembly instruction that reads a timestamp directly from the CPU (I can't remember the instruction).
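For reference, the usage pattern I have in mind looks roughly like this (the names here are illustrative, not my real code):
#include <chrono>

struct Entry
{
    // ... payload ...
    std::chrono::steady_clock::time_point insertedAt; // stamped on insert
};

// On find: reject the entry if it is older than maxAge.
bool isExpired(const Entry& e, std::chrono::milliseconds maxAge)
{
    return std::chrono::steady_clock::now() - e.insertedAt > maxAge;
}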

I have created the following benchmark to test several methods of retrieving a timestamp. The benchmark was compiled with GCC with -O2, and tested on my Mac. I measured the time it takes to get 1M timestamps with each method, and from the results it looks like rdtsc is faster than the others.
EDIT: The benchmark was modified to support multiple threads.
Benchmark code:
#include <iostream>
#include <string>
#include <chrono>
#include <ctime>
#include <sys/time.h>
#include <unistd.h>
#include <vector>
#include <thread>
#include <atomic>

#define NUM_SAMPLES 1000000
#define NUM_THREADS 4

static inline unsigned long long getticks(void)
{
    unsigned int lo, hi;
    // RDTSC copies the contents of the 64-bit TSC into EDX:EAX
    asm volatile("rdtsc" : "=a" (lo), "=d" (hi));
    return (unsigned long long)hi << 32 | lo;
}

std::atomic<bool> g_start(false);
std::atomic<unsigned int> totalTime(0);

template<typename Method>
void measureFunc(Method method)
{
    // warmup
    for (unsigned int i = 0; i < NUM_SAMPLES; i++)
    {
        method();
    }
    auto start = std::chrono::system_clock::now();
    for (unsigned int i = 0; i < NUM_SAMPLES; i++)
    {
        method();
    }
    auto end = std::chrono::system_clock::now();
    totalTime += std::chrono::duration_cast<std::chrono::milliseconds>(end - start).count();
}

template<typename Method>
void measureThread(Method method)
{
    // spin until all threads have been created, so they start together
    while (!g_start.load());
    measureFunc(method);
}

template<typename Method>
void measure(const std::string& methodName, Method method)
{
    std::vector<std::thread> threads;
    totalTime.store(0);
    g_start.store(false);
    for (unsigned int i = 0; i < NUM_THREADS; i++)
    {
        threads.push_back(std::thread(measureThread<Method>, method));
    }
    g_start.store(true);
    for (std::thread& th : threads)
    {
        th.join();
    }
    double timePerThread = (double)totalTime / (double)NUM_THREADS;
    std::cout << methodName << ": " << timePerThread << "ms per thread" << std::endl;
}

int main(int argc, char** argv)
{
    measure("gettimeofday", [](){ timeval tv; return gettimeofday(&tv, 0); });
    measure("time", [](){ return time(NULL); });
    measure("std chrono system_clock", [](){ return std::chrono::system_clock::now(); });
    measure("std chrono steady_clock", [](){ return std::chrono::steady_clock::now(); });
    measure("clock_gettime monotonic", [](){ timespec tp; return clock_gettime(CLOCK_MONOTONIC, &tp); });
    measure("clock_gettime cpu time", [](){ timespec tp; return clock_gettime(CLOCK_PROCESS_CPUTIME_ID, &tp); });
    measure("rdtsc", [](){ return getticks(); });
    return 0;
}
Results (in milliseconds) for a single thread:
gettimeofday: 54ms per thread
time: 260ms per thread
std chrono system_clock: 62ms per thread
std chrono steady_clock: 60ms per thread
clock_gettime monotonic: 102ms per thread
clock_gettime cpu time: 493ms per thread
rdtsc: 8ms per thread
With 4 threads:
gettimeofday: 55.25ms per thread
time: 292.5ms per thread
std chrono system_clock: 69.25ms per thread
std chrono steady_clock: 68.5ms per thread
clock_gettime monotonic: 118.25ms per thread
clock_gettime cpu time: 2975.75ms per thread
rdtsc: 10.25ms per thread
From the results, it looks like std::chrono adds some small overhead when called from multiple threads, while the gettimeofday method stays stable as the number of threads increases.
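Note that rdtsc returns raw ticks rather than wall-clock time, so to use it for expiry checks you would have to calibrate it once against a real clock. A minimal sketch of such a calibration, building on the getticks() above (it assumes a CPU with an invariant TSC, and the helper names are mine):
#include <chrono>

// Measure how many TSC ticks elapse per nanosecond, once, lazily.
double ticksPerNs()
{
    static const double tpns = []{
        auto t0 = std::chrono::steady_clock::now();
        unsigned long long c0 = getticks();
        // busy-wait ~10ms to collect a reasonable sample
        while (std::chrono::steady_clock::now() - t0 < std::chrono::milliseconds(10)) {}
        unsigned long long c1 = getticks();
        auto t1 = std::chrono::steady_clock::now();
        auto ns = std::chrono::duration_cast<std::chrono::nanoseconds>(t1 - t0).count();
        return double(c1 - c0) / double(ns);
    }();
    return tpns;
}

// Elapsed nanoseconds between two getticks() samples.
double elapsedNs(unsigned long long start, unsigned long long end)
{
    return double(end - start) / ticksPerNs();
}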

Related

How could one delay a function without the use of sleep / suspending the code?

I need to delay a function by x amount of time. The problem is that I can't use sleep nor any function that suspends the function (that's because the function is a loop that contains other functions; sleeping/suspending one would sleep/suspend them all).
Is there a way I could do it?
If you want to execute some specific code at a certain time interval and don't want to use threads (to be able to suspend), then you have to keep track of time and execute the specific code once the delay has been exceeded.
Example (pseudo):
timestamp = getTime();
while (true) {
    if (getTime() - timestamp > delay) {
        // main functionality
        // reset timer
        timestamp = getTime();
    }
    // the other functionality you mentioned
}
With this approach, you invoke a specific function every time interval specified by delay. The other functions will be invoked at each iteration of the loop.
In other words, it makes no difference if you delay a function or execute it at specific time intervals.
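A concrete version of that pseudocode with std::chrono could look like the sketch below (steady_clock so system-time changes don't distort the delay; the 500ms value is just an example):
#include <chrono>

void loop()
{
    using clock = std::chrono::steady_clock;
    const auto delay = std::chrono::milliseconds(500);
    auto timestamp = clock::now();
    while (true) {
        if (clock::now() - timestamp > delay) {
            // main functionality
            timestamp = clock::now(); // reset timer
        }
        // the other functionality you mentioned
    }
}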
Assuming that you need to run functions with their own arguments inside a loop, with a custom delay, and wait for them to finish before each iteration:
#include <cstdio>
#include <chrono>
#include <future>

void func_to_be_delayed(const int &idx = -1, const unsigned &ms = 0)
{
    printf("Delayed response[%d] by %u ms!\n", idx, ms);
}

template<typename T, typename ... Ta>
void delay(const unsigned &ms_delay, T &func, Ta ... args)
{
    // busy-wait until ms_delay milliseconds have elapsed, then call func
    std::chrono::time_point<std::chrono::high_resolution_clock> start = std::chrono::high_resolution_clock::now();
    double elapsed;
    do {
        std::chrono::time_point<std::chrono::high_resolution_clock> end = std::chrono::high_resolution_clock::now();
        elapsed = std::chrono::duration<double, std::milli>(end - start).count();
    } while (elapsed <= ms_delay);
    func(args...);
}

int main()
{
    func_to_be_delayed();
    const short iterations = 5;
    for (int i = iterations; i >= 0; --i)
    {
        auto i0 = std::async(std::launch::async, [i]{ delay((i+1)*1000, func_to_be_delayed, i, (i+1)*1000); } );
        // Will arrive with a difference from the previous one
        auto i1 = std::async(std::launch::async, [i]{ delay(i*1000, func_to_be_delayed, i, i*1000); } );
        func_to_be_delayed();
        // Loop will wait for all calls: the futures returned by std::async block in their destructors
    }
}
Notice: with the std::launch::async policy, this method will potentially spawn an additional thread on each call.
The standard solution is to implement an event loop.
If you use some library, framework, or system API, then most probably there is something similar provided to solve this kind of problem.
For example, Qt has QApplication, which provides this loop, and there is QTimer.
boost::asio has io_context, which provides an event loop in which a timer such as boost::asio::deadline_timer can be run.
You can also try to implement such an event loop yourself.
Example with Boost:
#include <boost/asio.hpp>
#include <boost/date_time.hpp>
#include <exception>
#include <iostream>
#include <string>

void printTime(const std::string& label)
{
    auto timeLocal = boost::posix_time::second_clock::local_time();
    boost::posix_time::time_duration durObj = timeLocal.time_of_day();
    std::cout << label << " time = " << durObj << '\n';
}

int main() {
    boost::asio::io_context io_context;
    try {
        boost::asio::deadline_timer timer{io_context};
        timer.expires_from_now(boost::posix_time::seconds(5));
        timer.async_wait([](const boost::system::error_code& error){
            if (!error) {
                printTime("boom");
            } else {
                std::cerr << "Error: " << error << '\n';
            }
        });
        printTime("start");
        io_context.run();
    } catch (const std::exception& e) {
        std::cerr << e.what() << '\n';
    }
    return 0;
}
https://godbolt.org/z/nEbTvMhca
C++20 introduces coroutines; these could be a good solution too.
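A rough sketch of that coroutine approach with Boost.Asio (assuming Boost 1.70+ built with C++20 coroutine support; untested here):
#include <boost/asio.hpp>
#include <chrono>
#include <iostream>

boost::asio::awaitable<void> delayed()
{
    auto executor = co_await boost::asio::this_coro::executor;
    boost::asio::steady_timer timer(executor, std::chrono::seconds(5));
    co_await timer.async_wait(boost::asio::use_awaitable);
    std::cout << "boom\n"; // fires 5 seconds later without blocking the loop
}

int main()
{
    boost::asio::io_context io_context;
    boost::asio::co_spawn(io_context, delayed(), boost::asio::detached);
    io_context.run();
}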

Measuring execution time when using threads

I would like to measure the execution time of some code. The code starts in the main() function and finishes in an event handler.
I have C++11 code that looks like this:
#include <iostream>
#include <time.h>
...
volatile clock_t t;

void EventHandler()
{
    // when this function is called, it is the end of the part that I want to measure
    t = clock() - t;
    std::cout << "time in seconds: " << ((float)t)/CLOCKS_PER_SEC;
}

int main()
{
    MyClass* instance = new MyClass(EventHandler); // this constructor starts a new std::thread
    instance->start(...); // this only passes some data to the thread's working data; later the thread will call EventHandler()
    t = clock();
    return 0;
}
So it is guaranteed that EventHandler() will be called exactly once, and only after the instance->start() call.
It works and gives me some output, but it is horrible code: it uses a global variable that different threads access. However, I can't change the API being used (the constructor, or the way the thread calls EventHandler).
I would like to ask if a better solution exists.
Thank you.
A global variable is unavoidable as long as MyClass expects a plain function and there's no way to pass a context pointer along with the function...
You could write the code in a slightly more tidy way, though:
#include <future>
#include <thread>
#include <chrono>
#include <iostream>

struct MyClass
{
    typedef void (CallbackFunc)();

    constexpr explicit MyClass(CallbackFunc* handler)
        : m_handler(handler)
    {
    }

    void Start()
    {
        std::thread(&MyClass::ThreadFunc, this).detach();
    }

private:
    void ThreadFunc()
    {
        std::this_thread::sleep_for(std::chrono::seconds(5)); // simulate some work
        m_handler();
    }

    CallbackFunc* m_handler;
};

std::promise<std::chrono::time_point<std::chrono::high_resolution_clock>> gEndTime;

void EventHandler()
{
    gEndTime.set_value(std::chrono::high_resolution_clock::now());
}

int main()
{
    MyClass task(EventHandler);
    auto trigger = gEndTime.get_future();
    auto startTime = std::chrono::high_resolution_clock::now();
    task.Start();
    trigger.wait(); // block until EventHandler() fulfils the promise
    std::chrono::duration<double> diff = trigger.get() - startTime;
    std::cout << "Duration = " << diff.count() << " secs." << std::endl;
    return 0;
}
A clock() call will not filter out time consumed by other processes and threads that the scheduler runs in parallel with the program's event-handler thread. There are alternatives like times() and getrusage() which report the CPU time of the process. The thread behaviour of these calls is not clearly documented, but since this is Linux, where threads are treated much like processes, it is worth investigating.
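In fact, on Linux getrusage() accepts the Linux-specific RUSAGE_THREAD flag, which restricts the numbers to the calling thread; a minimal sketch (RUSAGE_THREAD needs _GNU_SOURCE, which g++ defines by default on Linux):
#include <sys/resource.h>
#include <cstdio>

void printThreadCpuTime()
{
    struct rusage ru;
    if (getrusage(RUSAGE_THREAD, &ru) == 0) // RUSAGE_THREAD: Linux >= 2.6.26
    {
        std::printf("user: %ld.%06lds  sys: %ld.%06lds\n",
                    (long)ru.ru_utime.tv_sec, (long)ru.ru_utime.tv_usec,
                    (long)ru.ru_stime.tv_sec, (long)ru.ru_stime.tv_usec);
    }
}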
clock() is the wrong tool here because it does not count the time actually consumed by your thread alone; for example, if your thread is not running at all, time spent by the process's other threads is still counted.
Instead, you have to use platform-specific APIs, such as pthread_getcpuclockid for POSIX-compliant systems (check whether _POSIX_THREAD_CPUTIME is defined), which counts the actual time spent by a specific thread.
You can take a look at a benchmarking library I wrote for C++ that supports thread-aware measuring (see struct thread_clock implementation).
Or, you can use the code snippet from the man page:
/* Link with "-lrt" */
#include <time.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <pthread.h>
#include <string.h>
#include <errno.h>

#define handle_error(msg) \
    do { perror(msg); exit(EXIT_FAILURE); } while (0)
#define handle_error_en(en, msg) \
    do { errno = en; perror(msg); exit(EXIT_FAILURE); } while (0)

static void *
thread_start(void *arg)
{
    printf("Subthread starting infinite loop\n");
    for (;;)
        continue;
}

static void
pclock(char *msg, clockid_t cid)
{
    struct timespec ts;

    printf("%s", msg);
    if (clock_gettime(cid, &ts) == -1)
        handle_error("clock_gettime");
    printf("%4ld.%03ld\n", ts.tv_sec, ts.tv_nsec / 1000000);
}

int
main(int argc, char *argv[])
{
    pthread_t thread;
    clockid_t cid;
    int j, s;

    s = pthread_create(&thread, NULL, thread_start, NULL);
    if (s != 0)
        handle_error_en(s, "pthread_create");

    printf("Main thread sleeping\n");
    sleep(1);

    printf("Main thread consuming some CPU time...\n");
    for (j = 0; j < 2000000; j++)
        getppid();

    pclock("Process total CPU time: ", CLOCK_PROCESS_CPUTIME_ID);

    s = pthread_getcpuclockid(pthread_self(), &cid);
    if (s != 0)
        handle_error_en(s, "pthread_getcpuclockid");
    pclock("Main thread CPU time:   ", cid);

    /* The preceding 4 lines of code could have been replaced by:
       pclock("Main thread CPU time:   ", CLOCK_THREAD_CPUTIME_ID); */

    s = pthread_getcpuclockid(thread, &cid);
    if (s != 0)
        handle_error_en(s, "pthread_getcpuclockid");
    pclock("Subthread CPU time:     ", cid);

    exit(EXIT_SUCCESS); /* Terminates both threads */
}

Why can't I get a loop to spin for less than a millisecond in C++ on Windows with Chrono?

EDIT: I am using VS2013 on Windows 7.
With the code below I'd expect to be able to measure a time difference of at least one microsecond; however, when executed it builds up to at least 1000 microseconds (one millisecond). Why am I not able to get a time difference lower than one millisecond? Is there any way around this?
// SleepTesting.cpp : Defines the entry point for the console application.
//
#include <chrono>
#include "windows.h"
#include <iostream>

int _tmain(int argc, _TCHAR* argv[])
{
    FILETIME startFileTime, endFileTime;
    uint64_t ullStartTime, ullEndTime;
    bool sleep = true;
    auto start = std::chrono::system_clock::now();
    auto now = std::chrono::system_clock::now();
    auto elapsedTime = std::chrono::duration_cast<std::chrono::microseconds>(now - start);
    GetSystemTimeAsFileTime(&startFileTime);
    ullStartTime = static_cast<uint64_t>(startFileTime.dwHighDateTime) << 32 | startFileTime.dwLowDateTime;
    while (sleep)
    {
        now = std::chrono::system_clock::now();
        elapsedTime = std::chrono::duration_cast<std::chrono::microseconds>(now - start);
        if (elapsedTime.count() > 0)
        {
            sleep = false;
        }
    }
    GetSystemTimeAsFileTime(&endFileTime);
    ullEndTime = static_cast<uint64_t>(endFileTime.dwHighDateTime) << 32 | endFileTime.dwLowDateTime;
    uint64_t timeDifferenceHundredsOfNano = ullEndTime - ullStartTime;
    std::cout << "Elapsed time with Chrono library: " << elapsedTime.count() << " micro-seconds" << std::endl;
    std::cout << "Elapsed time with Windows.h FILETIME: " << timeDifferenceHundredsOfNano << " hundreds of nanoseconds" << std::endl;
    return 0;
}
Since you're using system_clock, I think you can't get microsecond resolution on Windows 7 (at least from what I've seen).
Try high_resolution_clock, but even that won't always work, since Windows doesn't guarantee that the time elapsed between two consecutive operations is less than one millisecond, even without sleeping.
IIRC, in VS2013 system_clock (and high_resolution_clock) tick in milliseconds. If you need higher resolution, you can go all-Windows and take a look at QueryPerformanceCounter.
LARGE_INTEGER startCount;
LARGE_INTEGER endCount;
LARGE_INTEGER frequency;
QueryPerformanceFrequency(&frequency);
QueryPerformanceCounter(&startCount);
{...}
QueryPerformanceCounter(&endCount);
double startTimeInMicroSec = startCount.QuadPart * (1000000.0 / frequency.QuadPart);
double endTimeInMicroSec = endCount.QuadPart * (1000000.0 / frequency.QuadPart);
// elapsed time in microseconds = endTimeInMicroSec - startTimeInMicroSec
disclaimer: ocular compilation
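If you want that wrapped up in a reusable helper, something along these lines should work (equally eyeball-compiled; the arithmetic is split to avoid 64-bit overflow):
#include <windows.h>
#include <cstdint>

// Microseconds derived from QueryPerformanceCounter, suitable for taking differences.
int64_t qpcMicros()
{
    static const int64_t freq = []{
        LARGE_INTEGER f;
        QueryPerformanceFrequency(&f);
        return f.QuadPart;
    }();
    LARGE_INTEGER count;
    QueryPerformanceCounter(&count);
    // Split whole seconds from the remainder so QuadPart * 1000000 can't overflow.
    return (count.QuadPart / freq) * 1000000
         + (count.QuadPart % freq) * 1000000 / freq;
}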

Calculate time to execute a function

I need to calculate the execution time of a function.
Currently, I use time.h
At the beginning of the function:
time_t tbegin,tend;
double texec=0.000;
time(&tbegin);
Before the return:
time(&tend);
texec = difftime(tend,tbegin);
It works fine, but gives me the result in texec as an integer number of seconds.
How can I get my execution time in milliseconds?
Most simple programs have computation times measured in milliseconds, so I suppose you will find this useful.
#include <time.h>
#include <stdio.h>

int main() {
    clock_t start = clock();
    // Executable code
    clock_t stop = clock();
    double elapsed = (double)(stop - start) * 1000.0 / CLOCKS_PER_SEC;
    printf("Time elapsed in ms: %f\n", elapsed);
}
If you want to compute the run-time of the entire program and you are on a Unix system, run your program using the time command, like this: time ./a.out
You can use a lambda with auto parameters in C++14 to time your other functions. You can pass the parameters of the timed function to your lambda. I'd do it like this:
// Timing in C++14 with auto lambda parameters
#include <iostream>
#include <chrono>
#include <utility>

// need C++14 for auto lambda parameters
auto timing = [](auto && F, auto && ... params)
{
    auto start = std::chrono::steady_clock::now();
    std::forward<decltype(F)>(F)
        (std::forward<decltype(params)>(params)...); // execute the function
    return std::chrono::duration_cast<std::chrono::milliseconds>(
        std::chrono::steady_clock::now() - start).count();
};

void f(std::size_t numsteps) // we'll measure how long this function runs
{
    // need volatile, otherwise the compiler optimizes the loop away
    for (volatile std::size_t i = 0; i < numsteps; ++i);
}

int main()
{
    auto taken = timing(f, 500'000'000); // measure the time taken to run f()
    std::cout << "Took " << taken << " milliseconds" << std::endl;
    taken = timing(f, 100'000'000); // measure again
    std::cout << "Took " << taken << " milliseconds" << std::endl;
}
The advantage is that you can pass any callable object to the timing lambda.
But if you want to keep it simple, you can just do:
auto start = std::chrono::steady_clock::now();
your_function_call_here();
auto end = std::chrono::steady_clock::now();
auto taken = std::chrono::duration_cast<std::chrono::milliseconds>(end - start).count();
std::cout << taken << " milliseconds";
If you know you're not going to change the system time during the run, you can use a std::chrono::high_resolution_clock instead, which may be more precise. std::chrono::steady_clock, however, is insensitive to system time changes during the run.
PS: In case you need to time a member function, you can do:
// time member functions
template<class Return, class Object, class... Params1, class... Params2>
auto timing(Return (Object::*fp)(Params1...), Params2... params)
{
    auto start = std::chrono::steady_clock::now();
    (Object{}.*fp)(std::forward<decltype(params)>(params)...); // invoked on a default-constructed Object
    return std::chrono::duration_cast<std::chrono::milliseconds>(
        std::chrono::steady_clock::now() - start).count();
}
then use it as
// measure the time taken to run X::f()
auto taken = timing(&X::f, 500'000'000);
std::cout << "Took " << taken << " milliseconds" << std::endl;
to time e.g. X::f() member function.
You can create a function like this (note: despite being used for millisecond timing, it returns the timestamp in microseconds):
#include <sys/time.h>

typedef unsigned long long timestamp_t;

// Returns the current time in microseconds since the epoch.
static timestamp_t get_timestamp()
{
    struct timeval now;
    gettimeofday(&now, NULL);
    return now.tv_usec + (timestamp_t)now.tv_sec * 1000000;
}
Then you can use this to get the time difference.
timestamp_t time1 = get_timestamp();
// Your function
timestamp_t time2 = get_timestamp();
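Since the timestamps are in microseconds, the difference converts to milliseconds like this:
double elapsed_ms = (time2 - time1) / 1000.0;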
For windows you can use this function:
#ifdef WIN32
#include <Windows.h>
#else
#include <sys/time.h>
#include <ctime>
#endif
typedef long long int64; typedef unsigned long long uint64;
/* Returns the amount of milliseconds elapsed since the UNIX epoch. Works on both
* windows and linux. */
int64 GetTimeMs64()
{
#ifdef WIN32
/* Windows */
FILETIME ft;
LARGE_INTEGER li;
/* Get the amount of 100 nano seconds intervals elapsed since January 1, 1601 (UTC) and copy it
* to a LARGE_INTEGER structure. */
GetSystemTimeAsFileTime(&ft);
li.LowPart = ft.dwLowDateTime;
li.HighPart = ft.dwHighDateTime;
uint64 ret = li.QuadPart;
ret -= 116444736000000000LL; /* Convert from file time to UNIX epoch time. */
ret /= 10000; /* From 100 nano seconds (10^-7) to 1 millisecond (10^-3) intervals */
return ret;
#else
/* Linux */
struct timeval tv;
gettimeofday(&tv, NULL);
uint64 ret = tv.tv_usec;
/* Convert from micro seconds (10^-6) to milliseconds (10^-3) */
ret /= 1000;
/* Adds the seconds (10^0) after converting them to milliseconds (10^-3) */
ret += (tv.tv_sec * 1000);
return ret;
#endif
}
Source
In the header <chrono> there is the class std::chrono::high_resolution_clock that does what you want. It's a bit involved to use, though:
#include <chrono>
using namespace std;
using namespace chrono;
auto t1 = high_resolution_clock::now();
// do calculation here
auto t2 = high_resolution_clock::now();
auto diff = duration_cast<duration<double>>(t2 - t1);
// now elapsed time, in seconds, as a double can be found in diff.count()
long ms = (long)(1000*diff.count());

how to reduce the latency from one boost strand to another boost strand

Suppose there are several boost strand shared_ptrs stored in a vector m_poStrands, and tJobType is an enum indicating the different types of job.
I found that the time difference from posting a job in one strand (JOBA) to the call of onJob in another strand (JOBB) is around 50 milliseconds.
I want to know if there is any way to reduce that time difference.
void postJob(tJobType oType, UINT8* pcBuffer, size_t iSize)
{
    //...
    m_poStrands[oType]->post(boost::bind(&onJob, this, oType, pcDestBuffer, iSize));
}

void onJob(tJobType oType, UINT8* pcBuffer, size_t iSize)
{
    if (oType == JOBA)
    {
        //....
        struct timeval sTV;
        gettimeofday(&sTV, 0);
        memcpy(pcDestBuffer, &sTV, sizeof(sTV));
        pcDestBuffer += sizeof(sTV);
        iSize += sizeof(sTV);
        memcpy(pcDestBuffer, pcBuffer, iSize);
        m_poStrands[JOBB]->post(boost::bind(&onJob, this, JOBB, pcDestBuffer, iSize));
    }
    else if (oType == JOBB)
    {
        // get the time from the buffer
        // and calculate the time diff
        struct timeval eTV;
        gettimeofday(&eTV, 0);
    }
}
Your latency is probably coming from the memcpys between your gettimeofday calls. Here's an example program I ran on my machine (2 GHz Core 2 Duo). I'm getting thousands of nanoseconds, so a few microseconds. I doubt that your system is running four orders of magnitude slower than mine. The worst I ever saw it run was 100 microseconds for one of the two tests. I tried to make the code as close to the posted code as possible.
#include <boost/asio.hpp>
#include <boost/chrono.hpp>
#include <boost/bind.hpp>
#include <boost/thread.hpp>
#include <boost/shared_ptr.hpp>
#include <iostream>

struct Test {
    boost::shared_ptr<boost::asio::strand>* strands;
    boost::chrono::high_resolution_clock::time_point start;
    int id;

    Test(int i, boost::shared_ptr<boost::asio::strand>* strnds)
        : strands(strnds),
          id(i)
    {
        strands[0]->post(boost::bind(&Test::callback, this, 0));
    }

    void callback(int i) {
        if (i == 0) {
            // stamp the time in the first strand, then hop to the second
            start = boost::chrono::high_resolution_clock::now();
            strands[1]->post(boost::bind(&Test::callback, this, 1));
        } else {
            boost::chrono::nanoseconds sec = boost::chrono::high_resolution_clock::now() - start;
            std::cout << "test " << id << " took " << sec.count() << " ns" << std::endl;
        }
    }
};

int main() {
    boost::asio::io_service io_service_;
    boost::shared_ptr<boost::asio::strand> strands[2];
    strands[0] = boost::shared_ptr<boost::asio::strand>(new boost::asio::strand(io_service_));
    strands[1] = boost::shared_ptr<boost::asio::strand>(new boost::asio::strand(io_service_));

    // Create the tests (which post the first callbacks) before starting the
    // threads, so io_service::run() doesn't return early for lack of work.
    Test test1(1, strands);
    Test test2(2, strands);

    boost::thread t1(boost::bind(&boost::asio::io_service::run, &io_service_));
    boost::thread t2(boost::bind(&boost::asio::io_service::run, &io_service_));

    t1.join();
    t2.join();
}
}