Discern between draw time and swap wait - OpenGL

I have some routines that draw a scene, then I swap the buffers, and since I have the swap interval set to 1, the call can block waiting for vertical sync.
Is it possible to measure how much time is spent drawing the scene and how much waiting for the vertical sync? I tried the following:
start = clock();
drawstuff();        // issue the draw calls
glFinish();         // block until the GPU has finished executing them
end = clock();
SwapBuffers();      // may block waiting for the vertical sync
swapend = clock();
but it doesn't seem to work, at least with my hardware and driver, because end and swapend are always the same.

Your clock's resolution is not sufficient. Use std::chrono, boost::chrono, or platform-specific clocks.
Example:
#include <chrono>
#include <thread>
#include <iostream>

using std::chrono::duration_cast;
using std::chrono::microseconds;

typedef std::chrono::high_resolution_clock myclock;
typedef myclock::time_point time_point;
typedef myclock::duration duration;

auto time_to_wait = microseconds(1000);

inline void doStuff()
{
    std::this_thread::sleep_for(time_to_wait);
}

int main()
{
    time_point start, end;
    start = myclock::now();
    doStuff();
    end = myclock::now();
    auto elapsed = duration_cast<microseconds>(end - start);
    std::cout << elapsed.count() << " microseconds elapsed\n";
}
Notes:
Better: use a profiler.
Even better: use a graphics profiler (see the timer-query sketch after these notes).
You will get very different results on different platforms, from different vendors, and even across driver versions. Does it really make sense to measure?
You don't really need to call glFinish().
If you're on Windows with an older MSVC compiler, be careful with std::chrono: its high_resolution_clock was not actually high-resolution there. Use boost::chrono or QueryPerformanceCounter instead.
AFAIK, whether the swap blocks depends on the driver implementation. Typically it is non-blocking, because vendors implement it in a threaded way. If you're in doubt, it is always a good idea to move your calculations (but not the rendering) to a separate thread.
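If you do want numbers from inside the application rather than a profiler, here is a hedged sketch: OpenGL timer queries (GL 3.3 / ARB_timer_query) measure the GPU draw time directly, and a steady_clock around the swap gives the CPU-side wait. drawstuff() and SwapBuffers() are the functions from the question; everything else is illustrative:
GLuint query;
GLuint64 gpuDrawNs = 0;
glGenQueries(1, &query);

glBeginQuery(GL_TIME_ELAPSED, query);
drawstuff();                            // scene drawing only
glEndQuery(GL_TIME_ELAPSED);

auto t0 = std::chrono::steady_clock::now();
SwapBuffers();                          // may block on vsync
auto swapWait = std::chrono::steady_clock::now() - t0;

// Retrieving the result blocks until the GPU has finished, so do it after the swap.
glGetQueryObjectui64v(query, GL_QUERY_RESULT, &gpuDrawNs);
glDeleteQueries(1, &query);
// gpuDrawNs holds the GPU time spent drawing (nanoseconds);
// swapWait approximates the CPU time spent inside the swap call.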

Related

sleep_until() and steady_clock loop drifting from real time in macOS

Good evening everyone,
I'm trying to learn concurrency using the C++ Concurrency Book by Anthony Williams. Having read the first 2 chapters I thought about coding a simple metronome working in its own thread:
#include <iostream>
#include <thread>
#include <chrono>
#include <vector>

class Metro
{
public:
    // beats per minute
    Metro(int bpm_in);
    void start();
private:
    // In milliseconds
    int mPeriod;
    std::vector<std::thread> mThreads;
private:
    void loop();
};

Metro::Metro(int bpm_in):
    mPeriod(60000/bpm_in)
{}

void Metro::start()
{
    mThreads.push_back(std::thread(&Metro::loop, this));
    mThreads.back().detach();
}

void Metro::loop()
{
    auto x = std::chrono::steady_clock::now();
    while(true)
    {
        x += std::chrono::milliseconds(mPeriod);
        std::cout << "\a" << std::flush;
        std::this_thread::sleep_until(x);
    }
}
Now, this code seems to work properly, except for the time interval: the period (assuming bpm = 60, so mPeriod = 1000 ms) is more than 1100 ms. I read that sleep_until is not guaranteed to wake the process up exactly at the correct time (cppreference.com), but that lack of precision should not change the average period, only delay a single "tic" inside the time grid; am I understanding that correctly? I assumed that storing the steady_clock::now() time only the first time, and then using only the increment, would be the correct way to avoid adding drift at every cycle. Nevertheless, I also tried changing the update of x in the while loop to
x = std::chrono::steady_clock::now() + std::chrono::milliseconds(mPeriod);
but the period increases even more. I also tried std::chrono::system_clock and high_resolution_clock, but the period didn't improve. Also, I think the properties I'm interested in for this application are monotonicity and steadiness, which steady_clock has. My question is: is there anything completely wrong in my code? Am I missing something about how to use std::chrono clocks and sleep_until? Or is this kind of method inherently imprecise?
I started analyzing the period by simply comparing it against some known metronomes (Logic Pro, Ableton Live, some mobile apps), and then recorded the output sound to get a better measurement. Maybe the sound buffer has some delay of its own, but the same problem occurs when the program outputs a plain character. Also, the problem I'm concerned about is the drift, not a single tic being slightly out of time.
I'm compiling from the macOS 10.15 terminal with g++ --std=c++11 -pthread and running on an Intel i7-4770HQ.
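For what it's worth, a minimal way to check how much each tick actually drifts, independent of any audio output, is to log the wall-clock delta of each iteration with steady_clock. This is a diagnostic sketch, not part of the metronome above:
#include <chrono>
#include <iostream>
#include <thread>

int main()
{
    using namespace std::chrono;
    auto deadline = steady_clock::now();
    auto prev = deadline;
    for (int i = 0; i < 10; ++i)
    {
        deadline += milliseconds(1000);
        std::this_thread::sleep_until(deadline);
        auto now = steady_clock::now();
        // print the real period of this tick in milliseconds
        std::cout << duration_cast<milliseconds>(now - prev).count() << " ms\n";
        prev = now;
    }
}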

std::this_thread::sleep_for sleeps for too long

Can anyone tell me what the problem with the following example is?
It produces 65 instead of 300 frames per second.
#define WIN32_LEAN_AND_MEAN
#include <Windows.h>
#include <thread>
#include <chrono>
#include <string>

int main(int argc, const char* argv[]) {
    using namespace std::chrono_literals;
    constexpr unsigned short FPS_Limit = 300;
    std::chrono::duration<double, std::ratio<1, FPS_Limit>> FrameDelay =
        std::chrono::duration<double, std::ratio<1, FPS_Limit>>(1.0f);

    unsigned int FPS = 0;
    std::chrono::steady_clock SecondTimer;
    std::chrono::steady_clock ProcessTimer;
    std::chrono::steady_clock::time_point TpS = SecondTimer.now();
    std::chrono::steady_clock::time_point TpP = ProcessTimer.now();

    while (true) {
        // ...

        // Count FPS
        FPS++;
        if ((TpS + (SecondTimer.now() - TpS)) > (TpS + 1s)) {
            OutputDebugString(std::to_string(FPS).c_str());
            OutputDebugString("\n");
            FPS = 0;
            TpS = SecondTimer.now();
        }

        // Sleep
        std::this_thread::sleep_for(FrameDelay - (ProcessTimer.now() - TpP)); // FrameDelay minus time needed to execute other things
        TpP = ProcessTimer.now();
    }

    return 0;
}
I guess it has something to do with std::chrono::duration<double, std::ratio<1, FPS_Limit>>, but when the delay is multiplied by FPS_Limit, the expected period of one second is produced.
Note that the limit of 300 frames per second is just an example; it can be replaced by any other number and the program will still sleep for far too long.
In short, the problem is that you use std::this_thread::sleep_for at all. Or, any kind of "sleep" for that matter. Sleeping to limit the frame rate is just utterly wrong.
The purpose of sleep functionality is, well... I don't know to be honest. There are very few good uses for it at all, and in practically every situation, a different mechanism is better.
What std::this_thread::sleep_for does (give or take a few lines of sanity tests and error checking) is, it calls the Win32 Sleep function (or, on a different OS, a different, similar function such as nanosleep).
Now, what does Sleep do? It makes a note somewhere in the operating system's little red book that your thread needs to be made ready again at some future time, and then renders your thread not-ready. Being not-ready simply means that your thread is not on the list of candidates to be scheduled for CPU time.
Sometimes, eventually, a hardware timer will fire an interrupt. That can be a periodic timer (pre Windows 8) with an embarrassingly bad default resolution, or a programmable one-shot interrupt. You can even adjust that timer's resolution, but doing so is a global setting that greatly increases the number of context switches, and it doesn't solve the actual problem anyway. When the OS handles the interrupt, it looks in its book to see which threads need to be made ready, and it makes them ready.
That, however, is not the same as running your thread. It is merely a candidate for being run again (maybe, some time).
So, there's timer granularity, inaccuracy in your measurement, plus scheduling... which altogether is very, very unsuitable for short, periodic intervals. Also, different Windows versions are known to round differently to the scheduler's granularity.
Solution: Do not sleep. Enable vertical sync, or leave it to the user to enable it.
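To make the granularity argument concrete, here is a minimal measurement sketch; the results vary with OS, timer resolution, and load, and the numbers in the comment are typical Windows defaults, not guarantees:
#include <chrono>
#include <iostream>
#include <thread>

int main()
{
    using namespace std::chrono;
    for (int i = 0; i < 5; ++i)
    {
        auto start = steady_clock::now();
        std::this_thread::sleep_for(milliseconds(3));   // ask for 3 ms
        auto slept = steady_clock::now() - start;
        // with the default ~15.6 ms Windows timer this often prints 15000+ us
        std::cout << duration_cast<microseconds>(slept).count() << " us\n";
    }
}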

Retargeting newlib for C++ chrono

I am using arm-none-eabi toolchain with newlib to target a custom board with an ARM Cortex-M0+ (specifically the MCU-on-eclipse version of the toolchain). I am compiling/linking with -nostartfiles and --specs=nano.specs and have re-targeted stdout and stderr to USB and a serial port respectively. I have created implementations for most of the C system calls.
I am using the chrono library with two custom clocks whose now() functions read the RTC time or my SysTick timer. This seems to mirror the purpose of the standard steady_clock and system_clock, so I thought I could try using those instead.
To do so I had to implement the gettimeofday syscall, which I did:
// returning a set time of one second just for testing
int _gettimeofday(struct timeval* tv, void* tz) {
    tv->tv_sec = 1;
    tv->tv_usec = 255;
    return 0;
}
My main code is as follows:
int main(void)
{
    HWInit();
    static std::chrono::steady_clock::time_point t1 = std::chrono::steady_clock::now();
    static std::chrono::system_clock::time_point t2 = std::chrono::system_clock::now();
    int64_t count1 = t1.time_since_epoch().count();
    int64_t count2 = t2.time_since_epoch().count();
    printf("Time 1: %lld\nTime 2: %lld\n", count1, count2);
    for(;;){}
    return 0;
}
Using the debugger I can see that both steady_clock::now() and system_clock::now() call my _gettimeofday() function, and both end up with the exact same time point.
Of course, if I try the following I get multiple-definition errors:
using SysClock = std::chrono::system_clock;
SysClock::time_point SysClock::now() noexcept {
    return SysClock::time_point( SysClock::duration(1983) );
}
So, can I somehow overload the now() functions of the standard chrono clocks? Or maybe replace the entire clock implementation, with my own duration and rep typedefs that match the hardware better? I can overload new and delete for my embedded system (and should), so doing the same for chrono would be nice.
From gcc's libstdc++ chrono.cc:
system_clock::now() uses gettimeofday(&tv, 0), clock_gettime(CLOCK_REALTIME, &tp), or a syscall. Since gettimeofday works for you, that means your build uses it.
steady_clock::now() uses clock_gettime(CLOCK_MONOTONIC, &tp), so you would have to overload clock_gettime and handle the CLOCK_MONOTONIC argument.
There is no _clock_gettime_r function provided by newlib, unlike _gettimeofday_r, which passes newlib's struct reent around. If you want to handle multithreading within newlib, it's a good idea to write your own similar wrapper that handles the _reent->errno value. But the best bet would be to overload the _gettimeofday_r function, since you are targeting newlib only.
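A sketch of what that override could look like; the signature follows newlib's reent.h (check your newlib version), and RTC_getSeconds()/SysTick_getMicros() are hypothetical placeholders for your own time sources:
#include <reent.h>
#include <stdint.h>
#include <sys/time.h>

// Hypothetical time sources; replace with your RTC / SysTick reads.
extern "C" uint32_t RTC_getSeconds(void);
extern "C" uint32_t SysTick_getMicros(void);

// Reentrant override: newlib routes gettimeofday() through this when present.
extern "C" int _gettimeofday_r(struct _reent* r, struct timeval* tv, void* tz)
{
    (void)r;    // on failure, set r->_errno and return -1 instead
    (void)tz;
    tv->tv_sec  = RTC_getSeconds();
    tv->tv_usec = SysTick_getMicros();
    return 0;
}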
Instead of trying to change the behavior of system_clock and steady_clock, I recommend just writing your own custom clocks and using them. That way you can better tailor them to your hardware and needs. If you have some way to get the current time, creating a custom chrono clock to wrap that function is very easy.
class SysClock
{
public:
    // 500MHz, or whatever you need
    using period = std::ratio<1, 500'000'000>;
    using rep = long long;
    using duration = std::chrono::duration<rep, period>;
    using time_point = std::chrono::time_point<SysClock>;
    static constexpr bool is_steady = true;

    static time_point now() noexcept
    {
        return time_point{duration{
            /*turn SysTick_getValue() into the number of ticks since epoch*/}};
    }
};
Now use SysClock::now() in your code instead of system_clock::now(). This gives you SysClock::time_points, and subtracting two of them yields a chrono::duration.
If you can turn your low-level "now" into a count of ticks against some epoch, and you can describe those ticks as a compile-time fraction of a second with period, then you're good to go.
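Usage then looks like any other chrono clock. A minimal sketch, assuming the SysClock above and a hypothetical do_work() to be timed:
auto t0 = SysClock::now();
do_work();                          // placeholder for the code being measured
auto dt = SysClock::now() - t0;     // SysClock::duration (2 ns ticks at 500 MHz)
auto us = std::chrono::duration_cast<std::chrono::microseconds>(dt);
printf("took %lld us\n", static_cast<long long>(us.count()));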

c++ get elapsed time platform independent

For a game I want to measure the time that has passed since the last frame.
I used glutGet(GLUT_ELAPSED_TIME) to do that, but after including glew the compiler can't find the glutGet function anymore (strange), so I need an alternative.
Most sites I found so far suggest using clock from ctime, but that function only measures the CPU time of the program, not the real time! The time function in ctime is only accurate to seconds; I need at least millisecond accuracy.
I can use C++11.
I don't think there is a high-resolution clock built into C++ before C++11. If you are unable to use C++11, you have to either fix your error with glut and glew or use the platform-dependent timer functions.
#include <chrono>

class Timer {
public:
    Timer() {
        reset();
    }
    void reset() {
        m_timestamp = std::chrono::high_resolution_clock::now();
    }
    float diff() {
        std::chrono::duration<float> fs = std::chrono::high_resolution_clock::now() - m_timestamp;
        return fs.count();
    }
private:
    std::chrono::high_resolution_clock::time_point m_timestamp;
};
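Typical use in a frame loop might look like this (a sketch; running, update(), and render() are hypothetical):
Timer frameTimer;
while (running) {
    float dt = frameTimer.diff();   // seconds elapsed since the last reset
    frameTimer.reset();
    update(dt);                     // hypothetical per-frame logic
    render();                       // hypothetical draw call
}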
Boost provides std::chrono-like clocks: boost::chrono.
You should consider using std::chrono::steady_clock (or the boost equivalent) rather than std::chrono::high_resolution_clock, or at least ensure the clock you use has is_steady == true, if you want to use it to calculate durations: the time returned by a non-steady clock can even decrease as physical time moves forward.

program supporting real and non-real time modes

I am attempting to transition an existing program from (homegrown) time classes to the new time facilities in C++11. For real-time processing it is clear how to map the C++11 functionality onto the homegrown time classes. It is less clear how the C++11 chrono facilities can support a non-real-time mode (e.g., a "run as fast as you can" batch mode, a "run at quarter speed" demonstration mode, etc.), which the homegrown classes do support. Is this accomplished by defining special clocks that map wall time to the "playback" speed properly? Any help is appreciated, and an example would be fantastic.
For example, the code I will transitioning has constructs such as
MessageQueue::poll( Seconds( 1 ) );
or
sleep( Minutes( 2 ) );
where the Seconds or Minutes object is aware of the speed at which the program is being run, to avoid having to use a multiplier or conversion function all over the place, like
MessageQueue::poll( PlaybackSpeed * Seconds( 1 ) );
or
MessageQueue::poll( PlaybackSpeed( Seconds( 1 ) ) );
What I was hoping was possible was to obtain the same sort of behavior with std::chrono::duration and std::chrono::time_point by providing a custom clock.
Whether or not making your own clock will be sufficient depends on how you use the durations you create. For example, if you wanted to run at half speed but somewhere called:
std::this_thread::sleep_for(std::chrono::minutes(2));
The duration would not be adjusted. Instead you'd need to use sleep_until and provide a time point that uses your 'slow' clock. But making a clock that runs slow is pretty easy:
template<typename Clock, int slowness>
struct slow_clock {
    using rep = typename Clock::rep;
    using period = typename Clock::period;
    using duration = typename Clock::duration;
    using time_point = std::chrono::time_point<slow_clock>;
    constexpr static bool is_steady = Clock::is_steady;

    static time_point now() {
        return time_point(start_time.time_since_epoch() +
                          ((Clock::now() - start_time) / slowness));
    }

    static const typename Clock::time_point start_time;
};

template<typename Clock, int slowness>
const typename Clock::time_point
slow_clock<Clock, slowness>::start_time = Clock::now();
The time_points returned from now() will appear to advance at a slower rate relative to the clock you give it. For example, here's a program that lets you watch nanoseconds slowly tick by:
int main() {
    using Clock = slow_clock<std::chrono::high_resolution_clock, 500000000>;
    for (int i = 0; i < 10; ++i) {
        std::this_thread::sleep_until(Clock::now()
                                      + std::chrono::nanoseconds(1));
        std::cout << "tick\n";
    }
}
All of the functions you implement, like MessageQueue::poll(), will probably need to be implemented in terms of a global clock typedef.
Of course, none of this has anything to do with how fast the program actually runs, except insofar as you're slowing the program down based on these clocks. Functions that time out will take longer, sleep_until will wait longer, but operations that don't wait for some time point in the future will simply appear to be faster.
// appears to run a million times faster than normal according to (finish - start)
auto start = slow_clock<std::chrono::steady_clock, 1000000>::now();
do_slow_operation();
auto finish = slow_clock<std::chrono::steady_clock, 1000000>::now();
For this case:
MessageQueue::poll( Seconds( 1 ) );
You could easily use the standard time classes if you just make your MessageQueue understand what "speed" it's supposed to run at. Just call something like MessageQueue::setPlaybackSpeed(0.5) if you want to run at half speed, and have the queue apply that factor from then on whenever someone gives it an amount of time.
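A sketch of that approach; the MessageQueue internals here (m_speed, wait_for_message()) are hypothetical:
#include <chrono>

class MessageQueue
{
public:
    void setPlaybackSpeed(double speed) { m_speed = speed; }

    template <class Rep, class Period>
    void poll(std::chrono::duration<Rep, Period> d)
    {
        // Scale once, here, instead of at every call site:
        // at speed 0.25, poll(Seconds(1)) waits 4 real seconds.
        auto scaled = std::chrono::duration_cast<std::chrono::milliseconds>(d / m_speed);
        wait_for_message(scaled);
    }

private:
    void wait_for_message(std::chrono::milliseconds);   // the real blocking wait
    double m_speed = 1.0;
};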
As for this:
sleep( Minutes( 2 ) );
What was your old code doing? I guess whatever object Minutes() created had an implicit conversion operator to int that returned the number of seconds? This seems too magical to me; better to just make a sleep() method on your MessageQueue or some other class, and then you can use the same solution as above.