Why are gettimeofday() intervals occasionally negative? (C++)

I have an experimental library whose performance I'm trying to measure. To do this, I've written the following:
struct timeval begin;
gettimeofday(&begin, NULL);
{
// Experiment!
}
struct timeval end;
gettimeofday(&end, NULL);
// Print the time it took!
std::cout << "Time: " << 100000 * (end.tv_sec - begin.tv_sec) + (end.tv_usec - begin.tv_usec) << std::endl;
Occasionally, my results include negative timings, which make no sense to me. For instance:
Time: 226762
Time: 220222
Time: 210883
Time: -688976
What's going on?

You've got a typo. Corrected last line (note the number of 0s):
std::cout << "Time: " << 1000000 * (end.tv_sec - begin.tv_sec) + (end.tv_usec - begin.tv_usec) << std::endl;
By the way, timersub() is a built-in macro for getting the difference between two timevals.
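A minimal sketch using it on the begin/end values above (timersub is provided as a macro in glibc's <sys/time.h>):
struct timeval diff;
timersub(&end, &begin, &diff);   // diff = end - begin, with tv_usec kept in range
std::cout << "Time: " << diff.tv_sec * 1000000 + diff.tv_usec << std::endl;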

The POSIX realtime libraries are better suited to measuring high-accuracy intervals. You don't really want to know the current time; you just want to know how long it has been between two points. That is what the monotonic clock is for.
struct timespec begin;
clock_gettime( CLOCK_MONOTONIC, &begin );
{
// Experiment!
}
struct timespec end;
clock_gettime(CLOCK_MONOTONIC, &end );
// Print the time it took!
std::cout << "Time: " << double(end.tv_sec - begin.tv_sec) + (end.tv_nsec - begin.tv_nsec)/1000000000.0 << std::endl;
When you link you need to add -lrt.
Using the monotonic clock has several advantages. It often uses the hardware timers (Hz crystal or whatever), so it is often a faster call than gettimeofday(). Also monotonic timers are guaranteed to never go backwards even if ntpd or a user is goofing with the system time.

You took care of the negative value, but it still isn't correct. The difference between the microsecond fields is erroneous: say we have begin and end times of 1.100 s and 2.051 s; by the accepted answer this would be an elapsed time of 1.049 s, which is incorrect.
The code below handles the case where the times differ only in microseconds (not in seconds) and the case where the microsecond value wraps.
if (end.tv_sec == begin.tv_sec)
    printf("Total Time =%ldus\n", (end.tv_usec - begin.tv_usec));
else
    printf("Total Time =%ldus\n",
           (end.tv_sec - begin.tv_sec - 1) * 1000000 + (1000000 - begin.tv_usec) + end.tv_usec);

std::cout << "Time: " << 100000 * (end.tv_sec - begin.tv_sec) + (end.tv_usec - begin.tv_usec) << std::endl;
As noted, there are 1000000 usec in a sec, not 100000.
More generally, you may need to be aware of the instability of timing on computers. Processes such as ntpd can change clock times, leading to incorrect delta times. You might be interested in POSIX facilities such as timer_create.
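For reference, a minimal timer_create sketch (my addition, not from the original answer): a one-shot timer on the monotonic clock, polled rather than signalled; link with -lrt on older glibc.
#include <signal.h>   // struct sigevent, SIGEV_NONE
#include <time.h>     // timer_create, timer_settime, timer_gettime
#include <string.h>   // memset
#include <iostream>

int main()
{
    // SIGEV_NONE: no signal is delivered, we just poll with timer_gettime().
    struct sigevent sev;
    memset(&sev, 0, sizeof sev);
    sev.sigev_notify = SIGEV_NONE;

    timer_t timerid;
    if (timer_create(CLOCK_MONOTONIC, &sev, &timerid) == -1)
        return 1;

    struct itimerspec its;
    memset(&its, 0, sizeof its);
    its.it_value.tv_sec = 1;                 // expire once, 1 second from now
    timer_settime(timerid, 0, &its, NULL);

    struct itimerspec left;
    timer_gettime(timerid, &left);           // time remaining until expiry
    std::cout << "remaining: " << left.it_value.tv_sec << "s "
              << left.it_value.tv_nsec << "ns\n";

    timer_delete(timerid);
    return 0;
}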

Next time, just do:
$ time ./proxy-application

Related

'Cheapest' way of getting time stamp in Linux (c++)

I am wondering what the cheapest way of getting a timestamp in Linux (in C++) is.
I assume it's an accuracy trade-off, so I believe there is more than one possibility.
I need milliseconds but not necessarily microseconds, so std::localtime isn't an option and gettimeofday is probably too costly (due to its microsecond accuracy).
1: fprintf(stdout, "%u\n", (unsigned)time(NULL));
2: struct timeval tv;
gettimeofday(&tv,NULL);
tv.tv_sec // seconds
tv.tv_usec // microseconds
3: std::time_t result = std::time(nullptr);
std::cout << std::asctime(std::localtime(&result))
<< result << " seconds since the Epoch\n";
4:
using namespace std::chrono;
milliseconds ms = duration_cast< milliseconds >(
high_resolution_clock::now().time_since_epoch()
);
I would suggest the <ctime> library and the following code:
std::time_t result = std::time(nullptr);
std::cout << std::asctime(std::localtime(&result))
<< result << " seconds since the Epoch\n";
This returns, in the "result" variable, the number of seconds since the Epoch (pretty straightforward). There are ways to convert this into a readable format, but that, as you said, is costly. Getting the raw number is very efficient because it only reads a value from the system instead of converting/concatenating and calculating a readable date.
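If you need millisecond rather than second resolution, a minimal std::chrono sketch (essentially option 4 from the question; it yields a plain integer and involves no string formatting):
#include <chrono>
#include <cstdint>
#include <iostream>

int main()
{
    using namespace std::chrono;
    // Milliseconds since the clock's epoch (the Unix epoch in practice),
    // as a plain integer -- roughly one clock read plus a division.
    std::int64_t ms = duration_cast<milliseconds>(
        system_clock::now().time_since_epoch()).count();
    std::cout << ms << '\n';
}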

Odd results when adding artificial delays to C++ code. Embedded Linux

I have been looking at the performance of our C++ server application running on embedded Linux (ARM). The pseudo code for the main processing loop of the server is this -
for i = 1 to 1000
Process item i
Sleep for 20 ms
The processing for one item takes about 2 ms. The "Sleep" here is really a call to the Poco library to do a "tryWait" on an event. If the event is fired (which it never is in my tests) or the time expires, it returns. I don't know what system call this equates to. Although we ask for a 2 ms block, it turns out to be roughly 20 ms. I can live with that - that's not the problem. The sleep is just an artificial delay so that other threads in the process are not starved.
The loop takes about 24 seconds to go through 1000 items.
The problem is, we changed the way the sleep is used so that we had a bit more control. I mean - 20ms delay for 2ms processing doesn't allow us to do much processing. With this new parameter set to a certain value it does something like this -
For i = 1 to 1000
Process item i
if i % 50 == 0 then sleep for 1000ms
That's the rough code, in reality the number of sleeps is slightly different and it happens to work out at a 24s cycle to get through all the items - just as before.
So we are doing exactly the same amount of processing in the same amount of time.
Problem 1 - the CPU usage for the original code is reported at around 1% (it varies a little but that's about average) and the CPU usage reported for the new code is about 5%. I think they should be the same.
Well, perhaps this CPU reporting isn't accurate, so I thought I'd sort a large text file at the same time and see how much it's slowed down by our server. This is a CPU-bound process (98% CPU usage according to top). The results are very odd. With the old code, the time taken to sort the file goes up by 21% when our server is running.
Problem 2 - If the server is only using 1% of the CPU then wouldn't the time taken to do the sort be pretty much the same?
Also, the time taken to go through all the items doesn't change - it's still 24 seconds with or without the sort running.
Then I tried the new code; it only slows the sort down by about 12%, but it now takes about 40% longer to get through all the items it has to process.
Problem 3 - Why do the two ways of introducing an artificial delay cause such different results? It seems that the server which sleeps more frequently but for a minimum time is getting more priority.
I have a half-baked theory on the last one - whatever system call is used to do the "sleep" switches back to the server process when the time has elapsed, which gives the process another bite at the time slice on a regular basis.
Any help appreciated. I suspect I'm just not understanding it correctly and that things are more complicated than I thought. I can provide more details if required.
Thanks.
Update: replaced tryWait(2) with usleep(2000) - no change. In fact, sched_yield() does the same.
Well I can at least answer problem 1 and problem 2 (as they are the same issue).
After trying out various options in the actual server code, we came to the conclusion that the CPU reporting from the OS is incorrect. It's quite a surprising result, so to make sure, I wrote a standalone program that doesn't use Poco or any of our code - just plain Linux system calls and standard C++ features. It implements the pseudo code above. The processing is replaced with a tight loop that just checks the elapsed time to see if 2 ms is up. The sleeps are proper sleeps.
The small test program shows exactly the same problem, i.e. doing the same amount of processing but splitting up the way the sleep function is called produces very different results for CPU usage. In the case of the test program, the reported CPU usage was 0.0078 seconds using 1000 20 ms sleeps but 1.96875 seconds when a less frequent 1000 ms sleep was used. The amount of processing done is the same.
Running the test on a Linux PC did not show the problem. Both ways of sleeping produced exactly the same CPU usage.
So it is clearly a problem with our embedded system and the way it measures CPU time when a process is yielding so often (you get the same problem with sched_yield() instead of a sleep).
Update: Here's the code. RunLoop is where the main bit is done -
#include <sys/time.h>   // gettimeofday, timeval
#include <time.h>       // clock_gettime, timespec, CLOCK_PROCESS_CPUTIME_ID
#include <unistd.h>     // usleep
#include <iostream>

int sleepCount;

double getCPUTime()
{
    clockid_t id = CLOCK_PROCESS_CPUTIME_ID;
    struct timespec ts;
    if (id != (clockid_t)-1 && clock_gettime(id, &ts) != -1)
        return (double)ts.tv_sec +
               (double)ts.tv_nsec / 1000000000.0;
    return -1;
}

double GetElapsedMilliseconds(const timeval& startTime)
{
    timeval endTime;
    gettimeofday(&endTime, NULL);
    double elapsedTime = (endTime.tv_sec - startTime.tv_sec) * 1000.0; // sec to ms
    elapsedTime += (endTime.tv_usec - startTime.tv_usec) / 1000.0;     // us to ms
    return elapsedTime;
}

void SleepMilliseconds(int milliseconds)
{
    timeval startTime;
    gettimeofday(&startTime, NULL);
    usleep(milliseconds * 1000);
    double elapsedMilliseconds = GetElapsedMilliseconds(startTime);
    if (elapsedMilliseconds > milliseconds + 0.3)
        std::cout << "Sleep took longer than it should " << elapsedMilliseconds << "\n";
    sleepCount++;
}

void DoSomeProcessingForAnItem()
{
    timeval startTime;
    gettimeofday(&startTime, NULL);
    double processingTimeMilliseconds = 2.0;
    double elapsedMilliseconds;
    do
    {
        elapsedMilliseconds = GetElapsedMilliseconds(startTime);
    } while (elapsedMilliseconds <= processingTimeMilliseconds);
    if (elapsedMilliseconds > processingTimeMilliseconds + 0.1)
        std::cout << "Processing took longer than it should " << elapsedMilliseconds << "\n";
}

void RunLoop(bool longSleep)
{
    int numberOfItems = 1000;
    timeval startTime;
    gettimeofday(&startTime, NULL);
    timeval startMainLoopTime;
    gettimeofday(&startMainLoopTime, NULL);
    for (int i = 0; i < numberOfItems; i++)
    {
        DoSomeProcessingForAnItem();
        double elapsedMilliseconds = GetElapsedMilliseconds(startTime);
        if (elapsedMilliseconds > 100)
        {
            std::cout << "Item count = " << i << "\n";
            if (longSleep)
            {
                SleepMilliseconds(1000);
            }
            gettimeofday(&startTime, NULL);
        }
        if (longSleep == false)
        {
            // Does 1000 * 20 ms sleeps.
            SleepMilliseconds(20);
        }
    }
    double elapsedMilliseconds = GetElapsedMilliseconds(startMainLoopTime);
    std::cout << "Main loop took " << elapsedMilliseconds / 1000 << " seconds\n";
}

void DoTest(bool longSleep)
{
    timeval startTime;
    gettimeofday(&startTime, NULL);
    double startCPUtime = getCPUTime();
    sleepCount = 0;
    int runLoopCount = 1;
    for (int i = 0; i < runLoopCount; i++)
    {
        RunLoop(longSleep);
        std::cout << "**** Done one loop of processing ****\n";
    }
    double endCPUtime = getCPUTime();
    std::cout << "Elapsed time is " << GetElapsedMilliseconds(startTime) / 1000 << " seconds\n";
    std::cout << "CPU time used is " << endCPUtime - startCPUtime << " seconds\n";
    std::cout << "Sleep count " << sleepCount << "\n";
}

void testLong()
{
    std::cout << "Running testLong\n";
    DoTest(true);
}

void testShort()
{
    std::cout << "Running testShort\n";
    DoTest(false);
}

precise time measurement

I'm using time.h in C++ to measure the timing of a function.
clock_t t = clock();
someFunction();
printf("\nTime taken: %.4fs\n", (float)(clock() - t)/CLOCKS_PER_SEC);
However, I'm always getting the time taken as 0.0000; clock() and t, when printed separately, have the same value. I would like to know if there is a way to measure time precisely (maybe on the order of nanoseconds) in C++. I'm using VS2010.
C++11 introduced the chrono API, which you can use to get nanosecond resolution:
auto begin = std::chrono::high_resolution_clock::now();
// code to benchmark
auto end = std::chrono::high_resolution_clock::now();
std::cout << std::chrono::duration_cast<std::chrono::nanoseconds>(end-begin).count() << "ns" << std::endl;
For a more meaningful value, it is good to run the function several times and compute the average:
auto begin = std::chrono::high_resolution_clock::now();
uint32_t iterations = 10000;
for(uint32_t i = 0; i < iterations; ++i)
{
// code to benchmark
}
auto end = std::chrono::high_resolution_clock::now();
auto duration = std::chrono::duration_cast<std::chrono::nanoseconds>(end-begin).count();
std::cout << duration << "ns total, average : " << duration / iterations << "ns." << std::endl;
But remember that the for loop and the assignments to begin and end use some CPU time too.
I usually use the QueryPerformanceCounter function.
example:
LARGE_INTEGER frequency; // ticks per second
LARGE_INTEGER t1, t2; // ticks
double elapsedTime;
// get ticks per second
QueryPerformanceFrequency(&frequency);
// start timer
QueryPerformanceCounter(&t1);
// do something
...
// stop timer
QueryPerformanceCounter(&t2);
// compute and print the elapsed time in millisec
elapsedTime = (t2.QuadPart - t1.QuadPart) * 1000.0 / frequency.QuadPart;
The following text, which I completely agree with, is quoted from Optimizing Software in C++ (good reading for any C++ programmer):
The time measurements may require a very high resolution if time intervals are short. In Windows, you can use the GetTickCount or QueryPerformanceCounter functions for millisecond resolution. A much higher resolution can be obtained with the time stamp counter in the CPU, which counts at the CPU clock frequency.
There is a problem in that "the clock frequency may vary dynamically and that measurements are unstable due to interrupts and task switches."
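Since the quote mentions the CPU's time stamp counter, here is a minimal sketch of reading it via the __rdtsc intrinsic (my addition; x86 only, and it yields raw cycle counts, not wall-clock time):
#if defined(_MSC_VER)
#include <intrin.h>      // __rdtsc on MSVC
#else
#include <x86intrin.h>   // __rdtsc on GCC/Clang
#endif
#include <iostream>

int main()
{
    unsigned long long start = __rdtsc();   // read the CPU cycle counter
    // code to benchmark
    unsigned long long end = __rdtsc();
    // Raw cycle count: converting it to time needs the TSC frequency, which,
    // as the quote above warns, may vary and is disturbed by task switches.
    std::cout << (end - start) << " cycles\n";
}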
In C or C++ I usually do something like the code below. If that still isn't precise enough, you may consider using the rdtsc functions:
struct timeval time;
gettimeofday(&time, NULL); // Start Time
long totalTime = (time.tv_sec * 1000) + (time.tv_usec / 1000);
//........ call your functions here
gettimeofday(&time, NULL); //END-TIME
totalTime = (((time.tv_sec * 1000) + (time.tv_usec / 1000)) - totalTime);

boost chrono endTime - startTime returns negative in boost 1.51

The boost chrono library v1.51 on my MacBook Pro returns negative times when I subtract endTime - startTime. If you print the time points, you see that the end time is earlier than the start time. How can this happen?
typedef boost::chrono::steady_clock clock_t;
clock_t clock;
// Start time measurement
boost::chrono::time_point<clock_t> startTime = clock.now();
short test_times = 7;
// Spend some time...
for ( int i=0; i<test_times; ++i )
{
    xnodeptr spResultDoc=parser.parse(inputSrc);
    xstring sXmlResult = spResultDoc->str();
    const char16_t* szDbg = sXmlResult.c_str();
    BOOST_CHECK(spResultDoc->getNodeType()==xnode::DOCUMENT_NODE && sXmlResult == sXml);
}
// Stop time measurement
boost::chrono::time_point<clock_t> endTime = clock.now();
clock_t::duration elapsed( endTime - startTime);
std::cout << std::endl;
std::cout << "Now time: " << clock.now() << std::endl;
std::cout << "Start time: " << startTime << std::endl;
std::cout << "End time: " << endTime << std::endl;
std::cout << std::endl << "Total Parse time: " << elapsed << std::endl;
std::cout << "Avarage Parse time per iteration: " << (boost::chrono::duration_cast<boost::chrono::milliseconds>(elapsed) / test_times) << std::endl;
I tried different clocks but no difference.
Any help would be appreciated!
EDIT: Forgot to add the output:
Now time: 1 nanosecond since boot
Start time: 140734799802912 nanoseconds since boot
End time: 140734799802480 nanoseconds since boot
Total Parse time: -432 nanoseconds
Avarage Parse time per iteration: 0 milliseconds
Hyperthreading or just scheduling interference; the Boost implementation punts monotonic support to the OS:
POSIX: clock_gettime (CLOCK_MONOTONIC) although it still may fail due to kernel errors handling hyper-threading when calibrating the system.
WIN32: QueryPerformanceCounter() which on anything but Nehalem architecture or newer is not going to be monotonic across cores and threads.
OSX: mach_absolute_time(), i.e. the steady & high resolution clocks are the same. The source code shows that it uses RDTSC thus strict dependency upon hardware stability: i.e. no guarantees.
Disabling hyperthreading is a recommended way to go, but on Windows, say, you are really limited. Aside from dropping the timer resolution, the only available method is direct access to the underlying hardware timers whilst ensuring thread affinity.
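A minimal sketch of the thread-affinity part on Windows (my addition; error handling omitted):
#include <windows.h>

// Pin the calling thread to logical processor 0 so that successive
// QueryPerformanceCounter reads always come from the same core's counter.
void pinTimingThreadToFirstCore()
{
    SetThreadAffinityMask(GetCurrentThread(), 1);
}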
It looks like a good time to submit a bug to Boost. I would recommend:
Win32: Use GetTickCount64(), as discussed here (see the sketch after this list).
OSX: Use clock_get_time (SYSTEM_CLOCK) according to this question.
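For reference, a minimal GetTickCount64() sketch (my addition; millisecond resolution, counts up from boot and never goes backwards):
#include <windows.h>
#include <iostream>

int main()
{
    ULONGLONG start = GetTickCount64();   // milliseconds since boot, 64-bit
    // code to benchmark
    ULONGLONG elapsed = GetTickCount64() - start;
    std::cout << elapsed << " ms\n";
}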

boost::date_time (boost-145) using a 64-bit uint with microsec calculations, without truncation

I am using date_time to abstract away platform peculiarities, and I need to produce a 64-bit, microsecond-resolution uint64_t which will be used in serialization. I do not understand what is going wrong below.
#include <boost/date_time/posix_time/posix_time.hpp>
#include <boost/cstdint.hpp>
#include <iostream>
using namespace boost::posix_time;
using boost::uint64_t;
ptime UNIX_EPOCH(boost::gregorian::date(1970,1,1));
int main() {
    ptime current_time = microsec_clock::universal_time();
    std::cout << "original time: " << current_time << std::endl;
    long microsec_since_epoch = ((current_time - UNIX_EPOCH).total_microseconds());
    ptime output_ptime = UNIX_EPOCH + microseconds(microsec_since_epoch);
    std::cout << "Deserialized time : " << output_ptime << std::endl;
    std::cout << "Microsecond output: " << microsec_since_epoch << std::endl;
    std::cout << "Microsecond to second arithmetic: "
              << microsec_since_epoch/(10*10*10*10*10*10) << std::endl;
    std::cout << "Microsecond to tiume_duration, back to microsecond : "
              << microseconds(microsec_since_epoch).total_microseconds() << std::endl;
    return 0;
}
Here is the output I get.
original time: 2010-Dec-17 09:52:06.737123
Deserialized time : 1970-Jan-16 03:10:41.577454
Microsecond output: 1292579526737123
Microsecond to second arithmetic: 1292579526
Microsecond to tiume_duration, back to microsecond : 1307441577454
When I switch to using total_seconds() and + seconds(..), the problems disappear -- i.e., the output changes to:
2010-Dec-15 18:26:22.606978
2010-Dec-15 18:26:22
date_time claims to use a 64-bit type internally, and 2^64 ÷ (10^6 × 3600 × 24 × 365) ≈ 584942 years; even 2^60 ÷ (10^6 × 3600 × 24 × 365) ≈ 36558 years.
The opening lines from Wikipedia have this to say about POSIX time:
Unix time, or POSIX time, is a system for describing points in time, defined as the number of seconds elapsed since midnight Coordinated Universal Time (UTC) of January 1, 1970
Why is such massive truncation going on 40 years down the line?
How do I use the full 64-bit space with microsecond resolution using boost::date_time ?
--edit1 in response to hans--
The post has been changed to reflect the integer output of the duration.total_microseconds() part. Note that 1292576572566904 ÷ (10^6 × 3600 × 24 × 365) ≈ 40.98 years. The output from seconds has not been updated.
--edit2--
Downscaling the microseconds to seconds before the "deserialization" step also works well. This approach solved my problem; I only need the microsecond resolution at creation, and I can live without it at deserialization.
I do still want to know the what and why of the problem.
This seems to be a problem with microseconds() not being able to handle such a large microsecond input. The following snippet is a fix for this problem:
#define MICROSEC 1000000
uint64_t sec_epoch = microsec_since_epoch / MICROSEC;
uint64_t mod_micro_epoch= microsec_since_epoch % MICROSEC;
ptime new_method = UNIX_EPOCH + seconds(sec_epoch) + microseconds(mod_micro_epoch);
std::cout << "Deserialization with new method: " << new_method << std::endl;
The return type of total_microseconds() is tick_type, not long. It looks like you're compiling this with a compiler that has a 32-bit long type - much too small to store 40 years' worth of microseconds.
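A minimal sketch of that fix (my addition, following the diagnosis above): store the tick count in an explicitly 64-bit integer instead of long.
// total_microseconds() returns time_duration::tick_type, which is a 64-bit
// integer when microsecond resolution is enabled, so keep all 64 bits.
boost::int64_t microsec_since_epoch =
    (current_time - UNIX_EPOCH).total_microseconds();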