Retargeting newlib for c++ chrono


I am using the arm-none-eabi toolchain with newlib to target a custom board with an ARM Cortex-M0+ (specifically the MCU-on-eclipse version of the toolchain). I am compiling/linking with -nostartfiles and --specs=nano.specs and have retargeted stdout and stderr to USB and a serial port respectively. I have created implementations for most of the C system calls.
I am using the chrono library with two custom clocks whose now() functions get the RTC time or my SysTick timer. It seems like this mirrors the purpose of the standard steady_clock and system_clock, so I thought I could try using them.
To do so I had to implement the gettimeofday syscall, which I did:
#include <sys/time.h>

// returning a set time of one second just for testing
int _gettimeofday(struct timeval* tv, void* tz) {
    (void)tz;  // timezone argument is unused
    tv->tv_sec = 1;
    tv->tv_usec = 255;
    return 0;
}
My main code is as follows:
#include <chrono>
#include <cstdint>
#include <cstdio>

int main(void)
{
    HWInit();
    static std::chrono::steady_clock::time_point t1 = std::chrono::steady_clock::now();
    static std::chrono::system_clock::time_point t2 = std::chrono::system_clock::now();
    int64_t count1 = t1.time_since_epoch().count();
    int64_t count2 = t2.time_since_epoch().count();
    printf("Time 1: %lld\n Time 2: %lld\n", count1, count2);
    for(;;){}
    return 0;
}
Using the debugger I can see that both steady_clock::now() and system_clock::now() call my _gettimeofday() function, and both end up with the exact same time point.
Of course, if I try to do the following I get multiple definition errors:
using SysClock = std::chrono::system_clock;
SysClock::time_point SysClock::now() noexcept {
return SysClock::time_point( SysClock::duration(1983) );
}
So can I somehow override the now() functions of the standard chrono clocks? Or maybe replace the entire clock implementation with my own duration and rep typedefs that match the hardware better? I can replace new and delete for my embedded system (and should), so doing this for chrono would also be nice.

From GCC's libstdc++ chrono.cc:
system_clock::now() uses clock_gettime(CLOCK_REALTIME, &tp), gettimeofday(&tv, 0), or a raw syscall, depending on how libstdc++ was configured. If your gettimeofday is being called, that is the path your build uses.
steady_clock::now() uses clock_gettime(CLOCK_MONOTONIC, &tp) when libstdc++ was built with monotonic clock support (_GLIBCXX_USE_CLOCK_MONOTONIC); otherwise it simply returns system_clock::now(), which is why both of your clocks report the same time point. With monotonic support, you should provide your own clock_gettime and handle the CLOCK_MONOTONIC argument.
There is no _clock_gettime_r function provided by newlib, like the _gettimeofday_r that passes newlib's struct _reent around. If you want to handle multithreading within newlib, it's good to write a similar wrapper of your own that sets the errno value in _reent. But the best bet would be to implement _gettimeofday_r, as you are targeting only newlib.
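Whichever hook ends up being called (_gettimeofday, _gettimeofday_r, or clock_gettime for CLOCK_MONOTONIC), its core job is to turn a hardware tick count into the seconds/sub-seconds pair the C structs expect. A minimal sketch, assuming a hypothetical millisecond counter maintained by the SysTick interrupt; the helper names are illustrative and not part of newlib:

```cpp
#include <cstdint>
#include <sys/time.h>  // struct timeval
#include <time.h>      // struct timespec

// Hypothetical helpers: a retargeted _gettimeofday() would call
// ms_to_timeval(<millisecond tick count>, tv). On a 32-bit MCU, read a
// 64-bit tick counter with interrupts disabled so the read is not torn.

// Split a millisecond count into what gettimeofday() reports.
inline void ms_to_timeval(std::uint64_t ms, struct timeval* tv) {
    tv->tv_sec = ms / 1000;
    tv->tv_usec = (ms % 1000) * 1000;      // remainder as microseconds
}

// Same split for clock_gettime(): remainder as nanoseconds.
inline void ms_to_timespec(std::uint64_t ms, struct timespec* ts) {
    ts->tv_sec = ms / 1000;
    ts->tv_nsec = (ms % 1000) * 1000000;
}
```

libstdc++ then rebuilds a time_point from tv_sec plus the sub-second field, so any tick resolution finer than the struct's field is simply lost at this boundary.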

Instead of trying to change the behavior of system_clock and steady_clock, I recommend just writing your own custom clocks and using them. That way you can better tailor them to your hardware and needs. If you have some way to get the current time, creating a custom chrono clock to wrap that function is very easy.
#include <chrono>
#include <ratio>

class SysClock
{
public:
    // 500MHz, or whatever you need
    using period = std::ratio<1, 500'000'000>;
    using rep = long long;
    using duration = std::chrono::duration<rep, period>;
    using time_point = std::chrono::time_point<SysClock>;
    static constexpr bool is_steady = true;
    static time_point now() noexcept
    {
        return time_point{duration{
            /*turn SysTick_getValue() into the number of ticks since epoch*/}};
    }
};
Now use SysClock::now() in your code instead of system_clock::now(). This gives you SysClock::time_points, and subtracting two of them yields a SysClock::duration, which interoperates with the other chrono::durations.
If you can turn your low-level "now" into a count of ticks against some epoch, and you can describe those ticks as a compile-time fraction of a second with period, then you're good to go.
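To see the pieces working off-target, here is a sketch of the same clock shape where a plain variable stands in for the hardware counter. Both `g_fake_ticks` and `FakeTickClock` are hypothetical names for illustration; on the board, now() would read SysTick or the RTC instead:

```cpp
#include <chrono>
#include <cstdint>
#include <ratio>

// Hypothetical stand-in for a hardware tick counter.
static std::int64_t g_fake_ticks = 0;

class FakeTickClock {
public:
    using period = std::ratio<1, 500'000'000>;  // 500 MHz ticks, as in the answer
    using rep = long long;
    using duration = std::chrono::duration<rep, period>;
    using time_point = std::chrono::time_point<FakeTickClock>;
    static constexpr bool is_steady = true;

    // Ticks since this clock's epoch become the time_point's count.
    static time_point now() noexcept {
        return time_point{duration{g_fake_ticks}};
    }
};
```

After advancing `g_fake_ticks` by 500'000'000, the difference of two now() calls compares equal to `std::chrono::seconds(1)`, and `duration_cast<milliseconds>` turns it into 1000 ms, with no hand-written unit conversion.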

Related

Clock timing changes on different computers

I'm working on an emulator for the DMG-01 (a.k.a. the Game Boy, 1989) on my GitHub.
I've already implemented both the APU and the PPU, with (almost) perfect timing on my PC (and my friends' PCs).
However, when I run the emulator on one of my friends' PCs, it runs twice as fast as on mine or the rest of my friends'.
The code for synchronizing the clock (between the Game Boy and the PC it's running on) is as follows:
Clock.h Header File:
class Clock
{
// ...
public:
void SyncClock();
private:
/* API::LR35902_HZ_CLOCK is 4'194'304 */
using lr35902_clock_period = std::chrono::duration<int64_t, std::ratio<1, API::LR35902_HZ_CLOCK>>;
static constexpr lr35902_clock_period one_clock_period{1};
using clock = std::chrono::high_resolution_clock;
private:
decltype(clock::now()) _last_tick{std::chrono::time_point_cast<clock::duration>(clock::now() + one_clock_period)};
};
Clock.cpp file
void Clock::SyncClock()
{
// Sleep until one tick has passed.
std::this_thread::sleep_until(this->_last_tick);
// Use time_point_cast to convert (via truncation towards zero) back to
// the "native" duration of high_resolution_clock
this->_last_tick = std::chrono::time_point_cast<clock::duration>(this->_last_tick + one_clock_period);
}
Which gets called in main.cpp like this:
int main()
{
// ...
while (true)
{
// processor.Clock() returns the number of clocks it took for the processor to run the
// current instruction. We need to sleep this thread for each clock passed.
for (std::size_t current_clock = processor.Clock(); current_clock > 0; --current_clock)
{
clock.SyncClock();
}
}
// ...
}
Is there a reason why chrono in this case would behave differently on other computers? Time is absolute; I would understand if the emulator ran slower on one PC, but why faster?
I checked out the type of my clock (high_resolution_clock) but I don't see why this would be the case.
Thanks!

I think you may be running into overflow under the hood of <chrono>.
The expression:
clock::now() + one_clock_period
is problematic. clock is high_resolution_clock, and it is common for this to have nanoseconds resolution. one_clock_period has units of 1/4'194'304. The resultant expression will be a time_point with a period of 1/8'192'000'000'000.
Using signed 64-bit integral types, the max() at such a precision is slightly over 13 days. So if clock::now() returns a .time_since_epoch() greater than 13 days, _last_tick is going to overflow, and may sometimes be negative (depending on how far clock::now() is beyond 13 days).
To correct this, convert one_clock_period to the precision of clock immediately (declaring it after the using clock alias, so that clock is already known):
static constexpr clock::duration one_clock_period{
std::chrono::duration_cast<clock::duration>(lr35902_clock_period{1})};
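The arithmetic above can be checked at compile time. This sketch assumes the clock has nanosecond resolution, which is common for high_resolution_clock but not guaranteed; `mixed` and `one_clock_period_ns` are illustrative names:

```cpp
#include <chrono>
#include <cstdint>
#include <ratio>
#include <type_traits>

// One LR35902 clock period (4'194'304 Hz), as in the question.
using lr35902_clock_period = std::chrono::duration<std::int64_t, std::ratio<1, 4'194'304>>;

// Type of clock::now() + one_clock_period, assuming a nanosecond clock.
using mixed = std::common_type_t<std::chrono::nanoseconds, lr35902_clock_period>;

// The common period is 1 / lcm(10^9, 2^22) seconds.
static_assert(mixed::period::num == 1, "");
static_assert(mixed::period::den == 8'192'000'000'000, "");

// 2^63 - 1 ticks of that period is only about 13 days.
constexpr auto max_days =
    std::chrono::duration_cast<std::chrono::hours>(mixed::max()).count() / 24;
static_assert(max_days == 13, "");

// The fix: convert once to the clock's unit. 10^9 / 2^22 = 238.4186... ns,
// which duration_cast truncates to 238 ns.
constexpr std::chrono::nanoseconds one_clock_period_ns =
    std::chrono::duration_cast<std::chrono::nanoseconds>(lr35902_clock_period{1});
static_assert(one_clock_period_ns.count() == 238, "");
```

One consequence of the truncation: each emulated cycle is about 0.18% shorter than real, so the emulator runs very slightly fast. If that matters, keep counting cycles in lr35902_clock_period and convert the accumulated total, rather than adding the truncated 238 ns per tick.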

Discern between draw time and swap wait

I have some routines that draw a scene, then I swap the buffers, and since I have swap wait set to 1, the call could block waiting for vertical sync.
Is it possible to measure how much time is spent in drawing the scene, and how much in waiting for the vertical sync? I tried to do the following:
start = clock();
drawstuff();
glFinish();
end = clock();
SwapBuffers();
swapend = clock();
but it doesn't seem to work, at least with my hardware and driver, because end and swapend are always the same.

Your clock's resolution is not enough. Use std::chrono, boost::chrono, or platform-specific clocks.
Example (ideone):
#include <chrono>
#include <thread>
#include <iostream>
using std::chrono::high_resolution_clock;
using std::chrono::duration_cast;
using std::chrono::nanoseconds;
using std::chrono::microseconds;
typedef std::chrono::high_resolution_clock myclock;
typedef myclock::time_point time_point;
typedef myclock::duration duration;
auto time_to_wait = microseconds(1000);
inline void doStuff()
{
std::this_thread::sleep_for(time_to_wait);
}
int main()
{
time_point start, end;
start = myclock::now();
doStuff();
end = myclock::now();
auto elapsed = duration_cast<microseconds>(end - start);
std::cout << elapsed.count() << " microseconds elapsed\n";
}
Notes:
Better: use a profiler.
Even better: use a graphics profiler.
You will get very different results on different platforms, different vendors and even driver versions. Does it really make sense to measure?
You don't really need to call glFinish().
If you're on Windows with the MSVC compiler, don't use std::chrono. Use boost::chrono or QueryPerformanceCounter instead. (This applies to Visual Studio 2012 and 2013, whose high_resolution_clock had poor resolution; Visual Studio 2015 fixed it.)
AFAIK whether the swap blocks depends on the driver implementation. Typically it is non-blocking, because vendors implement it in a threaded way. If you're in doubt, it is always a good idea to move your calculations (but not rendering) to a separate thread.
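Applied to the original question, the same idea splits one frame into a draw phase and a swap phase. A sketch, assuming the draw callable wraps `drawstuff(); glFinish();` and the swap callable wraps `SwapBuffers()`; `FrameTimes` and `measure_frame` are hypothetical names, and steady_clock is used so the two readings cannot go backwards:

```cpp
#include <chrono>
#include <utility>

// Durations of the two phases of one frame.
struct FrameTimes {
    std::chrono::microseconds draw;
    std::chrono::microseconds swap;
};

// Times draw() and swap() separately with one monotonic clock:
// three readings bracket the two phases.
template <class Draw, class Swap>
FrameTimes measure_frame(Draw&& draw, Swap&& swap) {
    using clock = std::chrono::steady_clock;
    const auto start = clock::now();
    std::forward<Draw>(draw)();
    const auto drawn = clock::now();
    std::forward<Swap>(swap)();
    const auto swapped = clock::now();
    return {std::chrono::duration_cast<std::chrono::microseconds>(drawn - start),
            std::chrono::duration_cast<std::chrono::microseconds>(swapped - drawn)};
}
```

As the notes say, the swap figure only shows vsync waiting if the driver actually blocks in the swap call.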

c++ get elapsed time platform independent

For a game I want to measure the time that has passed since the last frame.
I used glutGet(GLUT_ELAPSED_TIME) to do that. But after including GLEW the compiler can't find the glutGet function anymore (strange). So I need an alternative.
Most sites I found so far suggest using clock() from <ctime>, but that function only measures the CPU time of the program, not the real time! The time() function in <ctime> is only accurate to seconds. I need at least millisecond accuracy.
I can use C++11.

I don't think there is a high-resolution clock built into C++ before C++11. If you are unable to use C++11, you have to either fix your error with GLUT and GLEW or use the platform-dependent timer functions.
#include <chrono>
class Timer {
public:
Timer() {
reset();
}
void reset() {
m_timestamp = std::chrono::high_resolution_clock::now();
}
float diff() {
std::chrono::duration<float> fs = std::chrono::high_resolution_clock::now() - m_timestamp;
return fs.count();
}
private:
std::chrono::high_resolution_clock::time_point m_timestamp;
};

Boost provides std::chrono like clocks: boost::chrono
You should consider using std::chrono::steady_clock (or the Boost equivalent) rather than std::chrono::high_resolution_clock, or at least ensure that std::chrono::high_resolution_clock::is_steady == true, if you want to use it to calculate durations, as the time returned by a non-steady clock might even decrease as physical time moves forward.
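Note that is_steady is a static constexpr data member, not a function, so the check can happen at compile time. A sketch of the Timer from the earlier answer rebuilt on a clock chosen that way; `FrameClock` and `SteadyTimer` are illustrative names:

```cpp
#include <chrono>
#include <type_traits>

// Use high_resolution_clock only if it is steady; otherwise steady_clock.
using FrameClock = std::conditional<std::chrono::high_resolution_clock::is_steady,
                                    std::chrono::high_resolution_clock,
                                    std::chrono::steady_clock>::type;
static_assert(FrameClock::is_steady, "frame timer needs a clock that never goes backwards");

class SteadyTimer {
public:
    SteadyTimer() { reset(); }
    void reset() { m_timestamp = FrameClock::now(); }
    // Seconds elapsed since construction or the last reset().
    float diff() const {
        std::chrono::duration<float> fs = FrameClock::now() - m_timestamp;
        return fs.count();
    }
private:
    FrameClock::time_point m_timestamp;
};
```

Usage per frame: read `timer.diff()` for the frame delta, then call `timer.reset()`.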

program supporting real and non-real time modes

I am attempting to transition an existing program to use the new time facilities in C++11 from (homegrown) existing time classes. For real-time processing it is clear how to map the C++11 functionality into the homegrown time classes. It is less clear how the C++11 chrono time facilities can be used to support a non-real time mode (e.g., a "run as fast as you can batch mode", a "run at quarter speed demonstration mode", etc.) which the homegrown classes support. Is this accomplished via defining special clocks that are mapping the wall time to the "playback" speed properly? Any help appreciated and an example would be fantastic.
For example, the code I will be transitioning has constructs such as
MessageQueue::poll( Seconds( 1 ) );
or
sleep( Minutes( 2 ) );
where the Seconds or Minutes object is aware of the speed at which the program is being run, to avoid having to use a multiplier or conversion function all over the place, like
MessageQueue::poll( PlaybackSpeed * Seconds( 1 ) );
or
MessageQueue::poll( PlaybackSpeed( Seconds( 1 ) ) );
What I was hoping was possible was to obtain the same sort of behavior with std::chrono::duration and std::chrono::time_point by providing a custom clock.

Whether or not making your own clock will be sufficient depends on how you use the time durations you create. For example if you wanted to run at half speed but somewhere called:
std::this_thread::sleep_for(std::chrono::minutes(2));
The duration would not be adjusted. Instead you'd need to use sleep_until and provide a time point that uses your 'slow' clock. But making a clock that runs slow is pretty easy:
#include <chrono>
#include <iostream>
#include <thread>

template<typename Clock,int slowness>
struct slow_clock {
    using rep = typename Clock::rep;
    using period = typename Clock::period;
    using duration = typename Clock::duration;
    using time_point = std::chrono::time_point<slow_clock>;
    constexpr static bool is_steady = Clock::is_steady;
    static time_point now() {
        return time_point(start_time.time_since_epoch() + ((Clock::now() - start_time)/slowness));
    }
    static const typename Clock::time_point start_time;
};
template<typename Clock,int slowness>
const typename Clock::time_point
slow_clock<Clock,slowness>::start_time = Clock::now();
The time_points returned from now() will appear to advance at a slower rate relative to the clock you give it. For example here's a program so you can watch nanoseconds slowly tick by:
int main() {
using Clock = slow_clock<std::chrono::high_resolution_clock,500000000>;
for(int i=0;i<10;++i) {
std::this_thread::sleep_until(Clock::now()
+ std::chrono::nanoseconds(1));
std::cout << "tick\n";
}
}
All of the functions you implement, like MessageQueue::poll(), will probably need to be implemented in terms of a global clock typedef.
Of course none of this has anything to do with how fast the program actually runs, except insofar as you're slowing down the program based on them. Functions that time out will take longer and sleep_until will take longer, but operations that don't wait for some time point in the future will simply appear to be faster.
// appears to run a million times faster than normal according to (finish-start)
using std::chrono::steady_clock;
auto start = slow_clock<steady_clock,1000000>::now();
do_slow_operation();
auto finish = slow_clock<steady_clock,1000000>::now();

For this case:
MessageQueue::poll( Seconds( 1 ) );
You could easily use the standard time classes if you just make your MessageQueue understand what "speed" it's supposed to run at. Just call something like MessageQueue::setPlaybackSpeed(0.5) if you want to run at half-speed, and have the queue use that factor from then on when someone gives it an amount of time.
As for this:
sleep( Minutes( 2 ) );
What was your old code doing? I guess whatever object Minutes() created had an implicit conversion operator to int that returned the number of seconds? This seems too magical to me--better to just make a sleep() method on your MessageQueue or some other class, and then you can use the same solution as above.
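The setPlaybackSpeed() idea can be sketched with standard chrono types alone. `PlaybackScaler` and `wallTime` are hypothetical names; a MessageQueue::poll() or sleep() would wait for `wallTime(d)` instead of `d`:

```cpp
#include <chrono>
#include <ratio>

// Hypothetical helper: stores the playback speed and converts
// "program time" durations into real (wall-clock) waiting time.
class PlaybackScaler {
public:
    // 1.0 = real time, 0.5 = half speed, 2.0 = double speed.
    // Must be > 0; a real implementation should reject 0.
    void setPlaybackSpeed(double speed) { speed_ = speed; }

    // At half speed, one second of program time takes two seconds of wall time.
    template <class Rep, class Period>
    std::chrono::nanoseconds wallTime(std::chrono::duration<Rep, Period> d) const {
        const std::chrono::duration<double, std::nano> scaled = d;
        return std::chrono::duration_cast<std::chrono::nanoseconds>(scaled / speed_);
    }

private:
    double speed_ = 1.0;
};
```

For example, `std::this_thread::sleep_for(scaler.wallTime(std::chrono::minutes(2)))` waits four minutes at half speed. Because wallTime() accepts any duration, call sites keep passing Seconds or Minutes as before, and the speed factor lives in one place.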

Fastest timing resolution system

What is the fastest timing system a C/C++ programmer can use?
For example:
time() will give the seconds since Jan 01 1970 00:00.
GetTickCount() on Windows will give the time, in milliseconds, since the system's start-up time, but is limited to 49.7 days (after that it simply wraps back to zero).
I want to get the current time, or ticks since system/app start-up time, in milliseconds.
The biggest concern is the method's overhead - I need the lightest one, because I'm about to call it many many times per second.
My case is that I have a worker thread, and to that worker thread I post pending jobs. Each job has an "execution time". So, I don't care if the time is the current "real" time or the time since the system's uptime - it just must be linear and light.
Edit:
unsigned __int64 GetTickCountEx()
{
static DWORD dwWraps = 0;
static DWORD dwLast = 0;
DWORD dwCurrent = 0;
timeMutex.lock();
dwCurrent = GetTickCount();
if(dwLast > dwCurrent)
dwWraps++;
dwLast = dwCurrent;
// each wrap is 2^32 ms, so shift by 32 (multiplying by 0xFFFFFFFF is off by 1 ms per wrap)
unsigned __int64 timeResult = ((unsigned __int64)dwWraps << 32) + dwCurrent;
timeMutex.unlock();
return timeResult;
}
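For comparison, with C++11 the same "milliseconds since start-up, linear, no wrap" counter can be written portably on top of std::chrono::steady_clock. This is a sketch with a hypothetical function name; its per-call cost has not been benchmarked against GetTickCount here:

```cpp
#include <chrono>
#include <cstdint>

// Milliseconds since the first call, from a monotonic clock.
// With a 64-bit nanosecond representation (as in common implementations)
// the count covers roughly 292 years, so there is no wrap to track.
inline std::uint64_t MillisecondsSinceStart() {
    using clock = std::chrono::steady_clock;
    static const clock::time_point start = clock::now();  // initialized once, thread-safe since C++11
    return static_cast<std::uint64_t>(
        std::chrono::duration_cast<std::chrono::milliseconds>(clock::now() - start).count());
}
```

No mutex is needed because the only shared state is the function-local static, which the language initializes exactly once.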

For timing, the current Microsoft recommendation is to use QueryPerformanceCounter & QueryPerformanceFrequency.
This will give you better-than-millisecond timing. If the system doesn't support a high-resolution timer, then it will default to milliseconds (the same as GetTickCount).
Here is a short Microsoft article with examples of why you should use it :)

I recently had this question and did some research. The good news is that all three of the major operating systems provide some sort of high-resolution timer. The bad news is that it is a different API call on each system. For POSIX operating systems you want to use clock_gettime(). If you're on Mac OS X, however, this is not supported; you have to use mach_absolute_time(). For Windows, use QueryPerformanceCounter. Alternatively, with compilers that support OpenMP, you can use omp_get_wtime(), but it may not provide the resolution that you are looking for.
I also found cycle.h from fftw.org (www.fftw.org/cycle.h) to be useful.
Here is some code that calls a timer on each OS, using some ugly #ifdef statements. The usage is very simple: Timer t; t.tic(); SomeOperation(); t.toc("Message"); And it will print out the elapsed time in seconds.
#ifndef TIMER_H
#define TIMER_H
#include <cstdint>
#include <iostream>
#include <string>
#include <vector>
# if (defined(__MACH__) && defined(__APPLE__))
# define _MAC
# elif (defined(_WIN32) || defined(WIN32) || defined(__CYGWIN__) || defined(__MINGW32__) || defined(_WIN64))
# define _WINDOWS
# ifndef WIN32_LEAN_AND_MEAN
# define WIN32_LEAN_AND_MEAN
# endif
#endif
# if defined(_MAC)
# include <mach/mach_time.h>
# elif defined(_WINDOWS)
# include <windows.h>
# else
# include <time.h>
# endif
#if defined(_MAC)
typedef uint64_t timer_val_t;
typedef double timer_c;
#elif defined(_WINDOWS)
typedef LONGLONG timer_val_t;
typedef LARGE_INTEGER timer_c;
#else
// not named timer_t: POSIX <time.h> already declares a timer_t type
typedef double timer_val_t;
typedef timespec timer_c;
#endif
//==============================================================================
// Timer
// A quick class to do benchmarking.
// Example: Timer t; t.tic(); SomeSlowOp(); t.toc("Some Message");
class Timer {
public:
Timer();
inline void tic();
inline void toc();
inline void toc(const std::string &msg);
void print(const std::string &msg);
void print();
void reset();
double getTime();
private:
timer_val_t start;
double duration;
timer_c ts;
double conv_factor;
double elapsed_time;
};
Timer::Timer() {
#if defined(_MAC)
mach_timebase_info_data_t info;
mach_timebase_info(&info);
conv_factor = (static_cast<double>(info.numer))/
(static_cast<double>(info.denom));
conv_factor = conv_factor*1.0e-9;
#elif defined(_WINDOWS)
timer_c freq;
QueryPerformanceFrequency(&freq);
conv_factor = 1.0/(static_cast<double>(freq.QuadPart));
#else
conv_factor = 1.0;
#endif
reset();
}
inline void Timer::tic() {
#if defined(_MAC)
start = mach_absolute_time();
#elif defined(_WINDOWS)
QueryPerformanceCounter(&ts);
start = ts.QuadPart;
#else
// CLOCK_MONOTONIC measures elapsed real time like the Mac and Windows branches;
// CLOCK_PROCESS_CPUTIME_ID would measure only the CPU time used by the process
clock_gettime(CLOCK_MONOTONIC, &ts);
start = static_cast<double>(ts.tv_sec) + 1.0e-9 *
static_cast<double>(ts.tv_nsec);
#endif
}
inline void Timer::toc() {
#if defined(_MAC)
duration = static_cast<double>(mach_absolute_time() - start);
#elif defined(_WINDOWS)
QueryPerformanceCounter(&ts);
duration = static_cast<double>(ts.QuadPart - start);
#else
clock_gettime(CLOCK_MONOTONIC, &ts);
duration = (static_cast<double>(ts.tv_sec) + 1.0e-9 *
static_cast<double>(ts.tv_nsec)) - start;
#endif
elapsed_time = duration*conv_factor;
}
inline void Timer::toc(const std::string &msg) { toc(); print(msg); };
void Timer::print(const std::string &msg) {
std::cout << msg << " "; print();
}
void Timer::print() {
if(elapsed_time) {
std::cout << "elapsed time: " << elapsed_time << " seconds\n";
}
}
void Timer::reset() { start = 0; duration = 0; elapsed_time = 0; }
double Timer::getTime() { return elapsed_time; }
#if defined(_WINDOWS)
# undef WIN32_LEAN_AND_MEAN
#endif
#endif // TIMER_H
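With a C++11 compiler, the same tic()/toc() interface can sit on std::chrono::steady_clock, which removes the per-platform #ifdefs and the conversion factor. A sketch with a hypothetical class name; it mirrors the usage above (`t.tic(); SomeOperation(); t.toc("Message");`):

```cpp
#include <chrono>
#include <iostream>
#include <string>

// tic()/toc() benchmarking on a monotonic clock, no platform #ifdefs.
class ChronoTimer {
    using clock_type = std::chrono::steady_clock;
public:
    void tic() { start_ = clock_type::now(); }
    // Elapsed seconds since the last tic(); also stored for getTime().
    double toc() {
        elapsed_ = std::chrono::duration<double>(clock_type::now() - start_).count();
        return elapsed_;
    }
    void toc(const std::string& msg) {
        toc();
        std::cout << msg << " elapsed time: " << elapsed_ << " seconds\n";
    }
    double getTime() const { return elapsed_; }
private:
    clock_type::time_point start_ = clock_type::now();
    double elapsed_ = 0.0;
};
```

Because the class is defined entirely in-class, it can live in a header without the multiple-definition problems that out-of-class non-inline member definitions cause when the header is included in several source files.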

GetSystemTimeAsFileTime is the fastest resource. Its granularity can be obtained by a call to GetSystemTimeAdjustment, which fills lpTimeIncrement. The system time as filetime has 100ns units and increments by TimeIncrement.
TimeIncrement can vary and it depends on the setting of the multimedia timer interface.
A call to timeGetDevCaps will disclose the capabilities of the time services. It returns a value wPeriodMin for the minimum supported interrupt period. A call to timeBeginPeriod with wPeriodMin as argument will set up the system to operate at the highest possible interrupt frequency (typically ~1ms). This will also force the time increment of the system filetime returned by GetSystemTimeAsFileTime to be smaller. Its granularity will be in the range of 1ms (10000 100ns units).
For your purpose, I'd suggest going with this approach.
The QueryPerformanceCounter choice is questionable since its frequency is not accurate in two ways. Firstly, it deviates from the value given by QueryPerformanceFrequency by a hardware-specific offset. This offset can easily be several hundred ppm, which means that a conversion into time will contain an error of several hundred microseconds per second. Secondly, it has thermal drift. The drift of such devices can easily be several ppm, so another, heat-dependent error of several us/s is added.
So as long as a resolution of ~1ms is sufficient and the main question is the overhead, GetSystemTimeAsFileTime is by far the best solution.
When microseconds matter, you'd have to go a longer way and see more details. Sub-millisecond time services are described at the Windows Timestamp Project
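Turning a FILETIME reading into something usable is plain arithmetic: it counts 100 ns intervals since 1601-01-01 (UTC), which is 11'644'473'600 seconds before the Unix epoch. A portable sketch of that conversion with chrono types; the names are illustrative, and on Windows the raw value would be assembled from the dwHighDateTime and dwLowDateTime fields of the FILETIME that GetSystemTimeAsFileTime fills:

```cpp
#include <chrono>
#include <cstdint>
#include <ratio>

// FILETIME unit: 100 ns intervals since 1601-01-01 (UTC).
using filetime_ticks = std::chrono::duration<std::int64_t, std::ratio<1, 10'000'000>>;

// Seconds between 1601-01-01 and the Unix epoch 1970-01-01.
constexpr std::chrono::seconds kFiletimeToUnixEpoch{11'644'473'600};

// raw = (uint64_t(dwHighDateTime) << 32) | dwLowDateTime
constexpr std::chrono::milliseconds filetime_to_unix_ms(std::uint64_t raw) {
    return std::chrono::duration_cast<std::chrono::milliseconds>(
        filetime_ticks{static_cast<std::int64_t>(raw)} - kFiletimeToUnixEpoch);
}
```

This also makes the granularity figure above concrete: one millisecond is exactly 10000 FILETIME units.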

If you're just worried about GetTickCount() overflowing, then you can just wrap it like this:
DWORDLONG GetLongTickCount(void)
{
static DWORDLONG last_tick = 0;
DWORD tick = GetTickCount();
if (tick < (last_tick & 0xffffffff))
last_tick += 0x100000000;
last_tick = (last_tick & 0xffffffff00000000) | tick;
return last_tick;
}
If you want to call this from multiple threads you'll need to lock access to the last_tick variable. As long as you call GetLongTickCount() at least once every 49.7 days, it'll detect the overflow.
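The same wrap-extension logic, with the lock the answer mentions, can be written so the 32-bit reading is passed in; that makes it testable anywhere (on Windows the caller would pass GetTickCount()). A sketch with a hypothetical class name:

```cpp
#include <cstdint>
#include <mutex>

// Extends a wrapping 32-bit millisecond tick into a 64-bit count.
// Must be updated at least once per wrap period (49.7 days).
class TickExtender {
public:
    std::uint64_t update(std::uint32_t tick) {
        std::lock_guard<std::mutex> lock(mutex_);          // safe to call from several threads
        if (tick < static_cast<std::uint32_t>(last_))      // low 32 bits went backwards: wrapped
            last_ += 0x100000000ULL;                       // one wrap is 2^32 ms
        last_ = (last_ & 0xFFFFFFFF00000000ULL) | tick;    // keep the wrap count, take the new low part
        return last_;
    }
private:
    std::mutex mutex_;
    std::uint64_t last_ = 0;
};
```

Feeding it 0xFFFFFFF0 and then 5 yields 0x100000005, one full 2^32 step later rather than 2^32 - 1.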

I'd suggest that you use the GetSystemTimeAsFileTime API if you're specifically targeting Windows. It's generally faster than GetSystemTime and has the same precision (which is some 10-15 milliseconds - don't look at the resolution); when I did a benchmark some years ago under Windows XP it was somewhere in the range of 50-100 times faster.
The only disadvantage is that you might have to convert the returned FILETIME structures to a clock time using e.g. FileTimeToSystemTime if you need to access the returned times in a more human-friendly format. On the other hand, as long as you don't need those converted times in real-time you can always do this off-line or in a "lazy" fashion (e.g. only convert the time stamps you need to display/process, and only when you actually need them).
QueryPerformanceCounter can be a good choice as others have mentioned, but the overhead can be rather large depending on the underlying hardware support. In the benchmark I mentioned above, QueryPerformanceCounter calls were 25-200 times slower than calls to GetSystemTimeAsFileTime. Also, there are some reliability problems, as e.g. reported here.
So, in summary: If you can cope with a precision of 10-15 milliseconds I'd recommend you to use GetSystemTimeAsFileTime. If you need anything better than that I'd go for QueryPerformanceCounter.
Small disclaimer: I haven't performed any benchmarking under Windows versions later than XP SP3. I'd recommend doing some benchmarking on your own.

POSIX supports clock_gettime() which uses a struct timespec which has nanosecond resolution. Whether your system really supports that fine-grained a resolution is more debatable, but I believe that's the standard call with the highest resolution. Not all systems support it, and it is sometimes well hidden (library '-lposix4' on Solaris, IIRC).
Update (2016-09-20):
Mac OS X 10.6.4 did not support clock_gettime(), and neither did any other version of Mac OS X up to and including Mac OS X 10.11.6 El Capitan. However, starting with macOS Sierra 10.12 (released September 2016), macOS does have the function clock_gettime() and manual pages for it at long last. The actual resolution (on CLOCK_MONOTONIC) is still microseconds; the smaller units are all zeros. This is confirmed by clock_getres(), which reports that the resolution is 1000 nanoseconds, aka 1 µs.
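A minimal POSIX sketch of reading CLOCK_MONOTONIC and asking the system for its resolution, as described above; the helper names are illustrative, and both return -1 if the call fails:

```cpp
#include <cstdint>
#include <time.h>  // clock_gettime, clock_getres, CLOCK_MONOTONIC (POSIX)

// Current CLOCK_MONOTONIC reading in nanoseconds, or -1 on failure.
inline std::int64_t monotonic_ns() {
    struct timespec ts;
    if (clock_gettime(CLOCK_MONOTONIC, &ts) != 0)
        return -1;
    return static_cast<std::int64_t>(ts.tv_sec) * 1000000000 + ts.tv_nsec;
}

// Resolution the system reports for CLOCK_MONOTONIC, in nanoseconds, or -1 on failure.
inline std::int64_t monotonic_resolution_ns() {
    struct timespec res;
    if (clock_getres(CLOCK_MONOTONIC, &res) != 0)
        return -1;
    return static_cast<std::int64_t>(res.tv_sec) * 1000000000 + res.tv_nsec;
}
```

Printing monotonic_resolution_ns() is a quick way to check the "nanosecond" claim on a given system: a result of 1000 means the low three digits of monotonic_ns() carry no information.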
The manual page for clock_gettime() on macOS Sierra mentions mach_absolute_time() as a way to get high-resolution timing. For more information, amongst other places, see Technical Q&A QA1398: Mach Absolute Time Units and (on SO) What is mach_absolute_time() based on on iPhone?

On Linux you get microseconds:
struct timeval tv;
int res = gettimeofday(&tv, NULL);
double tmp = (double) tv.tv_sec + 1e-6 * (double) tv.tv_usec;
On Windows, only milliseconds are available:
SYSTEMTIME st;
GetSystemTime(&st);
tmp += 1e-3 * st.wMilliseconds;
return tmp;
This came from R's datetime.c (and was edited down for brevity).
Then there is of course Boost's Date_Time which can have nanosecond resolution on some systems (details here and here).

If you are targeting a late enough version of the OS then you could use GetTickCount64() which has a much higher wrap around point than GetTickCount(). You could also simply build a version of GetTickCount64() on top of GetTickCount().

Have you reviewed the code in this MSDN article?
http://msdn.microsoft.com/en-us/magazine/cc163996.aspx
I have this code compiling on a Windows 7 64bit machine using both VC2005 and C++ Builder XE but when executing, it locks up my machine; have not debugged far enough to figure out why yet. It seems overly complicated. Templates of templates of templates of UG...

On Mac OS X, you can simply use UInt32 TickCount(void) to get the ticks (one tick is about 1/60 of a second).

