Is it possible to wait until glRender() and glSwapBuffers() have finished? - opengl

I noticed that when a key is pressed and the redraw is initiated from the keyboard event function, the previous draw sometimes is not completely finished. The result is a choppy "animation". I am basically scrolling the contents of the window. When I measure my draw() function I can see it takes 5 ms, which should be more than enough for smooth scrolling. But my guess is that the actual drawing is done asynchronously by the OpenGL driver somewhere under the hood. So the question:
Can I get notified when the actual rendering and screen update is finished?
function draw(ev) {
    var gl = GLX.renderPipeline();

    gl.Viewport(0, 0, width, height);
    gl.MatrixMode(GL_PROJECTION);
    gl.LoadIdentity();
    gl.Ortho(0, width, height, 0, -1, 1);
    gl.MatrixMode(GL_MODELVIEW);
    gl.LoadIdentity();

    gl.ClearColor(0.3, 0.3, 0.8, 0.0);
    gl.Clear(0x00004000 | 0x00000100); // GL_COLOR_BUFFER_BIT | GL_DEPTH_BUFFER_BIT
    gl.Color4f(0, 0, 0.9, 0.5);
    rect(gl, 100, 100, 400, 300);

    var s = 'ATARI 65 XE FOREVER ATARI 65 XE FOREVER ATARI 65 XE FOREVER ATARI 65 XE FOREVER!';
    s = s + s;
    s = s.split('');
    for (var i = 0; i < s.length; i++) s[i] = s[i].charCodeAt(0) + charListBegin;

    for (var y = 0; y < 60; y++) {
        gl.LoadIdentity();
        gl.Translatef(charsX, charsY + y * fontSize * 8, 0);
        gl.Color4f(colors[y][0], colors[y][1], colors[y][2], 0.5);
        gl.CallLists(s);
    }

    gl.Render(ctx);
    GLX.SwapBuffers(ctx, win);
}

In general, your commands are placed into a queue. The command queue is flushed at distinct points, for example when you call SwapBuffers or glFlush (but also on some other occasions, e.g. when the queue is full), and the commands are worked off asynchronously. Most calls simply post a command and return immediately, unless they have some lengthy work to do that cannot be postponed, like glBufferData copying a few hundred kilobytes into a buffer object (this has to happen immediately, because OpenGL cannot know whether the data will still be valid later). The time it takes to post commands is what you measure, but it's not what you are interested in.
If your GL version is at least 3.2, you can be "kind of notified" by calling glFenceSync, which inserts a fence object into the command stream, and then block until the fence has been signaled using glClientWaitSync. When glClientWaitSync returns, all commands up to the fence have completed.
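For illustration, a minimal sketch of the fence approach in plain C/C++ rather than the binding used above (assuming a 3.2+ context; the 16 ms timeout is arbitrary):

GLsync fence = glFenceSync(GL_SYNC_GPU_COMMANDS_COMPLETE, 0);
GLenum status = glClientWaitSync(fence, GL_SYNC_FLUSH_COMMANDS_BIT,
                                 16 * 1000 * 1000); // timeout in nanoseconds
if (status == GL_ALREADY_SIGNALED || status == GL_CONDITION_SATISFIED) {
    // every command submitted before the fence has now completed on the GPU
}
glDeleteSync(fence);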
If you have at least version 3.3, you can measure the time your OpenGL commands take to render by inserting a query of type GL_TIME_ELAPSED. This works without blocking and is therefore by far the preferable option. This is the actual time it takes to draw your stuff.
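A sketch of such a query (again assuming a plain C/C++ 3.3 context; draw() stands in for your own drawing code):

GLuint query;
glGenQueries(1, &query);
glBeginQuery(GL_TIME_ELAPSED, query);
draw();                               // the commands you want to time
glEndQuery(GL_TIME_ELAPSED);
// fetch the result later (it becomes available once the GPU is done)
GLuint64 ns = 0;
glGetQueryObjectui64v(query, GL_QUERY_RESULT, &ns);
printf("GPU time: %.3f ms\n", ns / 1.0e6);
glDeleteQueries(1, &query);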
SwapBuffers, like most commands, does mostly nothing. It will call glFlush, insert the equivalent of a fence, and mark the framebuffer as "locked, and ready to be swapped".
Eventually, when all draw commands have finished and when the driver or window manager can be bothered (think of vertical sync and compositing window managers!), the driver will unlock and swap buffers. This is when your stuff actually gets visible.
If you perform any other command in the meantime that would alter the locked framebuffer, that command blocks. This is what gives SwapBuffers the illusion of blocking.
You don't have much control over that (other than modifying the swap interval, if the implementation lets you) and you can't make it any faster -- but by playing with things like glFlush or glFinish you can make it slower.

Usually you queue a redraw instead of calling draw directly; e.g. when using Qt, you'd call QWidget::update().
As for waiting until all commands in the pipeline have been processed, you can call glFinish(); see https://www.opengl.org/sdk/docs/man4/xhtml/glFinish.xml .
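As a rough Qt-flavoured sketch (MyGLWidget, scrollOffset, step and drawScene are made-up names; the pattern is what matters): queue the repaint in the key handler, and only block with glFinish() if you really have to.

void MyGLWidget::keyPressEvent(QKeyEvent* event)
{
    scrollOffset += step;   // update your scroll state
    update();               // schedules a repaint; Qt coalesces multiple requests
}

void MyGLWidget::paintGL()
{
    drawScene();            // your actual GL drawing
    glFinish();             // optional: blocks until the pipeline is drained
}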

Related

Is there any way to save the path and restore it in Cairo?

I have two graphs drawing signals in a gtkmm application.
The problem comes when I have to paint a graph with many points (around 300-350k) connected by lines, since repainting all the points on each iteration slows things down a lot.
bool DrawArea::on_draw(const Cairo::RefPtr<Cairo::Context>& c)
{
    cairo_t* cr = c->cobj();
    // xSignal.size() = ySignal.size() = 350000
    for (int j = 0; j < xSignal.size() - 1; ++j)
    {
        cairo_move_to(cr, xSignal[j], ySignal[j]);
        cairo_line_to(cr, xSignal[j + 1], ySignal[j + 1]);
    }
    cairo_stroke(cr);
    return true;
}
I know that cairo_stroke_preserve exists, but I think it is not valid for me because when I switch between graphs, it disappears.
I've been researching how to save the path and restore it in the Cairo documentation, but I don't see anything. In 2007, a Cairo user suggested the same thing in the documentation's 'to do' list, but apparently it has not been done.
Any suggestion?
It's not necessary to draw everything in on_draw. What I understand from your post is that you have a real-time waveform drawing application where samples are available at fixed periods (every few milliseconds, I presume). There are three approaches you can follow.
Approach 1
This works particularly well when you have limited memory and do not care about retaining the plot if the window is resized or uncovered. The following could be the function that receives samples (one by one).
NOTE: Variables prefixed with m_ are class members.
void DrawingArea::PlotSample(int nSample)
{
    Cairo::RefPtr<Cairo::Context> refCairoContext;
    double dNewY;

    // Get window's cairo context
    refCairoContext = get_window()->create_cairo_context();

    // TODO Scale and transform sample to new Y coordinate
    dNewY = nSample;

    // Clear area for new waveform segment
    {
        refCairoContext->rectangle(m_dPreviousX + 1, // see note below on m_dPreviousX + 1
                                   m_dPreviousY,
                                   ERASER_WIDTH,
                                   get_allocated_height());
        refCairoContext->set_source_rgb(0, 0, 0);
        refCairoContext->fill();
    }

    // Set up Cairo context for the trace
    {
        refCairoContext->set_source_rgb(1, 1, 1);
        refCairoContext->set_antialias(Cairo::ANTIALIAS_SUBPIXEL); // this is up to you
        refCairoContext->set_line_width(1); // it's 2 by default and better that way with anti-aliasing
    }

    // Add sub-path and stroke
    refCairoContext->move_to(m_dPreviousX, m_dPreviousY);
    m_dPreviousX += m_dXStep;
    refCairoContext->line_to(m_dPreviousX, dNewY);
    refCairoContext->stroke();

    // Update coordinates
    if (m_dPreviousX >= get_allocated_width())
    {
        m_dPreviousX = 0;
    }
    m_dPreviousY = dNewY;
}
While clearing the area, the X coordinate has to be offset by 1, because otherwise the 'eraser' will clear off the anti-aliasing on the last column and your trace will have jagged edges. It may need to be more than 1 depending on your line thickness.
Like I said before, with this method your trace will get cleared if the widget is resized or 'revealed'.
Approach 2
Here, too, the samples are plotted the same way as before. The only difference is that each sample received is also pushed into a buffer. When the window is resized or 'revealed', the widget's on_draw is called and there you can plot all the samples in one go, as sketched below. Of course you'll need some memory (quite a lot if you keep 350K samples queued), but the trace stays on screen no matter what.
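A rough sketch of that idea (PushSample, m_ySamples and m_dXStep are assumed names; PlotSample is the function from Approach 1):

void DrawingArea::PushSample(int nSample)
{
    m_ySamples.push_back(nSample);   // std::vector<double> class member
    PlotSample(nSample);             // incremental drawing exactly as in Approach 1
}

bool DrawingArea::on_draw(const Cairo::RefPtr<Cairo::Context>& refCairoContext)
{
    // Replay the whole buffer when the widget is resized or exposed
    refCairoContext->set_source_rgb(1, 1, 1);
    refCairoContext->set_line_width(1);
    double dX = 0;
    for (std::size_t i = 1; i < m_ySamples.size(); ++i)
    {
        refCairoContext->move_to(dX, m_ySamples[i - 1]);
        dX += m_dXStep;
        refCairoContext->line_to(dX, m_ySamples[i]);
    }
    refCairoContext->stroke();
    return true;
}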
Approach 3
This one also takes up some memory (possibly much more, depending on the size of your widget) and uses an off-screen buffer. Here, instead of storing samples, we store the rendered result. Override the widget's on_map and on_size_allocate methods to create an off-screen buffer.
void DrawingArea::CreateOffscreenBuffer(void)
{
    Glib::RefPtr<Gdk::Window> refWindow = get_window();
    Gtk::Allocation oAllocation = get_allocation();

    if (refWindow)
    {
        Cairo::RefPtr<Cairo::Context> refCairoContext;

        m_refOffscreenSurface =
            refWindow->create_similar_surface(Cairo::CONTENT_COLOR,
                                              oAllocation.get_width(),
                                              oAllocation.get_height());
        refCairoContext = Cairo::Context::create(m_refOffscreenSurface);

        // TODO paint the background (grids, maybe?)
    }
}
Now when you receive samples, instead of drawing into the window directly, draw into the off-screen surface. Then blit the off-screen surface by setting it as the source of your window's cairo context and filling a rectangle over the newly plotted sample. Also, in your widget's on_draw, just set this surface as the source of the widget's cairo context and do a Cairo::Context::paint(); a sketch follows. This approach is particularly useful if your widget doesn't get resized often, and its advantage is that blitting (transferring the contents of one surface to another) is much faster than plotting individual line segments.
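A sketch of that flow (method names other than on_draw are assumptions; the erase/stroke code is the same as in Approach 1 and is elided here):

void DrawingArea::PlotSampleOffscreen(int nSample)
{
    // 1. Draw the new segment into the off-screen surface
    Cairo::RefPtr<Cairo::Context> refOffscreenContext =
        Cairo::Context::create(m_refOffscreenSurface);
    // ... erase strip, move_to/line_to/stroke as in Approach 1 ...

    // 2. Blit only the updated strip from the off-screen surface to the window
    Cairo::RefPtr<Cairo::Context> refWindowContext =
        get_window()->create_cairo_context();
    refWindowContext->set_source(m_refOffscreenSurface, 0, 0);
    refWindowContext->rectangle(m_dPreviousX, 0,
                                m_dXStep + ERASER_WIDTH, get_allocated_height());
    refWindowContext->fill();
}

bool DrawingArea::on_draw(const Cairo::RefPtr<Cairo::Context>& refCairoContext)
{
    // On expose/resize, just paint the cached surface
    refCairoContext->set_source(m_refOffscreenSurface, 0, 0);
    refCairoContext->paint();
    return true;
}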
To answer your question:
There are cairo_copy_path() and cairo_append_path() (there are also cairo_copy_path_flat() and cairo_path_destroy()).
Thus, you can save a path with cairo_copy_path() and later append it to the current path with cairo_append_path().
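For illustration, a minimal sketch applied to the loop from the question (assuming the signal itself does not change between redraws):

static cairo_path_t* savedPath = NULL;   // keep this somewhere persistent

if (savedPath == NULL)
{
    // Build the path once and keep a copy
    for (int j = 0; j < (int) xSignal.size() - 1; ++j)
    {
        cairo_move_to(cr, xSignal[j], ySignal[j]);
        cairo_line_to(cr, xSignal[j + 1], ySignal[j + 1]);
    }
    savedPath = cairo_copy_path(cr);
}
else
{
    // Replay the saved path instead of rebuilding it
    cairo_append_path(cr, savedPath);
}
cairo_stroke(cr);
// call cairo_path_destroy(savedPath) when you no longer need it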
To answer your not-question:
I doubt that this will speed up your drawing. Appending these lines to the current path is unlikely to be slow. Rather, I would expect the actual drawing of these lines to be slow.
You write "it slows down a lot to paint all the points each iteration.". I am not sure what "each iteration" refers to, but why are you drawing all these points all the time? Wouldn't it make more sense to only draw them once and then to re-use the drawn result?

OpenGL Precompute Vertices/Matrices for Particle System / Optimization

I have a particle system which I want to make as fast as possible without affecting the main display function. I basically placed all particle calculations on a separate, infinitely running thread which I keep synchronized with WaitForEvent() (Windows), DataLock flags, etc.
I use glColorPointer, glNormalPointer, glVertexPointer, etc. to point to the buffered data on the GPU (glGenBuffers, glBufferData) and then glDrawElements to render it.
At the moment I don't have the code, so I hope that won't be a problem, but I'll try my best to describe the infrastructure:
Main [Init]
Create a pre-calc queue sized at 30% of the N particles and do the sequential calculations (Thread 1, step 2)
Thread 1
Wait for Calculate Event signal or if pre-calc queue is not full then continue
Loop through N particles and update position / velocity, storing it in pUpdate
If pre-calc queue is not full, add pUpdate to it
Main [Render]
glActiveTexture(TEXTURE0)
glCol/glNorm/glTex/glVertexPointer
If the pre-calc queue is empty, use the most recent pUpdate
OR take one entry from the pre-calc queue and remove it
Store item in buffer using glBufferSubData()
DrawElements() to draw them
SwapBuffers
The problem is that the Render function consumes about 50 pre-calc entries per second (which speeds up rendering while there are enough left) before even one can be added. In short order the pre-calc queue is empty, so everything slows down and the program falls back to Main [Render] step 3.
Any ideas?

How to reduce OpenGL CPU usage and/or how to use OpenGL properly

I'm working on a Micromouse simulation application built with OpenGL, and I have a hunch that I'm not doing things properly. In particular, I'm suspicious about the way I am getting my (mostly static) graphics to refresh at a close-to-constant framerate (60 FPS). My approach is as follows:
1) Start a timer
2) Draw my shapes and text (about a thousand of them):
glBegin(GL_POLYGON);
for (Cartesian vertex : polygon.getVertices()) {
    std::pair<float, float> coordinates = getOpenGlCoordinates(vertex);
    glVertex2f(coordinates.first, coordinates.second);
}
glEnd();
and
glPushMatrix();
glScalef(scaleX, scaleY, 0);
glTranslatef(coordinates.first * 1.0/scaleX, coordinates.second * 1.0/scaleY, 0);
for (int i = 0; i < text.size(); i += 1) {
    glutStrokeCharacter(GLUT_STROKE_MONO_ROMAN, text.at(i));
}
glPopMatrix();
3) Call
glFlush();
4) Stop the timer
5) Sleep for (1/FPS - duration) seconds
6) Call
glutPostRedisplay();
The "problem" is that the above approach really hogs my CPU - the process is using something like 96-100%. I know that there isn't anything inherently wrong with using lots of CPU, but I feel like I shouldn't be using that much all of the time.
The kicker is that most of the graphics don't change from frame to frame. It's really just a single polygon moving over (and covering up) some static shapes. Is there any way to tell OpenGL to only redraw what has changed since the previous frame (with the hope it would reduce the number of glxxx calls, which I've deemed to be the source of the "problem")? Or, better yet, is my approach to getting my graphics to refresh even correct?
First and foremost, the biggest CPU hog with OpenGL is immediate mode… and you're using it (glBegin, glEnd). The problem with immediate mode is that every single vertex requires a couple of OpenGL calls; and because OpenGL uses thread-local state, each and every OpenGL call must go through some indirection. So the first step would be getting rid of that.
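As a hedged sketch of that first step (client-side vertex arrays; Cartesian, polygon and getOpenGlCoordinates come from your code, the rest is an assumption), the polygon above could be drawn with a handful of calls instead of one call per vertex:

std::vector<float> verts;                      // x0, y0, x1, y1, ...
for (Cartesian vertex : polygon.getVertices()) {
    std::pair<float, float> c = getOpenGlCoordinates(vertex);
    verts.push_back(c.first);
    verts.push_back(c.second);
}
glEnableClientState(GL_VERTEX_ARRAY);
glVertexPointer(2, GL_FLOAT, 0, verts.data());
glDrawArrays(GL_POLYGON, 0, (GLsizei) (verts.size() / 2));
glDisableClientState(GL_VERTEX_ARRAY);

Moving the data into a vertex buffer object would be the next step, so the vertices do not have to be re-uploaded every frame.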
The next issue is with how you're timing your display. If low latency between user input and display is not your ultimate goal, the standard approach would be setting up the window for double buffering, enabling V-Sync with a swap interval of 1, and doing a buffer swap (glutSwapBuffers) once the frame is rendered. The exact timings of what blocks where are implementation dependent (unfortunately), but you're more or less guaranteed to hit your screen refresh frequency exactly, as long as your renderer is able to keep up (i.e. rendering a frame takes less time than a screen refresh interval).
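A rough GLUT-flavoured sketch of that structure (drawScene stands in for your drawing code; how the swap interval is set is platform specific, so that line is only a comment):

void display() {
    drawScene();             // your polygons and text
    glutSwapBuffers();       // replaces glFlush(); paced by V-Sync
    glutPostRedisplay();     // request the next frame
}

int main(int argc, char** argv) {
    glutInit(&argc, argv);
    glutInitDisplayMode(GLUT_DOUBLE | GLUT_RGBA);   // double buffered
    glutCreateWindow("micromouse");
    // set the swap interval to 1 here, e.g. via wglSwapIntervalEXT / glXSwapIntervalEXT
    glutDisplayFunc(display);
    glutMainLoop();
}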
glutPostRedisplay merely sets a flag for the main loop to call the display function if no further events are pending, so timing a frame redraw through that is not very accurate.
Last but not least, you may simply be misled by the way Windows accounts CPU time: time spent in driver context, which includes blocking while waiting for V-Sync, is accounted to the consumed CPU time, while it's in fact an interruptible sleep. However, you wrote that you already sleep in your code, which would rule that out, because the go-to approach to get a more reasonable accounting would be adding a Sleep(1) before or after the buffer swap.
I found that putting the render thread to sleep helps reduce CPU usage, in my case from 26% to around 8%:
#include <chrono>
#include <thread>

void render_loop(){
    ...
    auto const start_time = std::chrono::steady_clock::now();
    auto const wait_time = std::chrono::milliseconds{ 17 };
    auto next_time = start_time + wait_time;
    while(true){
        ...
        // executes once after the thread wakes up every 17 ms, which is
        // theoretically 60 frames per second
        auto then = std::chrono::high_resolution_clock::now();
        std::this_thread::sleep_until(next_time);
        ...rendering jobs
        auto elapsed_time =
            std::chrono::duration_cast<std::chrono::milliseconds>(std::chrono::high_resolution_clock::now() - then);
        std::cout << "ms: " << elapsed_time.count() << '\n';
        next_time += wait_time;
    }
}
I thought about attempting to measure the frame rate while the thread is asleep, but there isn't any reason for my use case to attempt that. The result averaged around 16 ms, so I thought it was good enough.
Inspired by this post

Which one is the proper method of writing this GL code?

I have been doing some experiments with OpenGL and handling textures.
In my experiment I have a 2D array of ints which is randomly generated:
int mapskeleton[300][300];
Then I have my own OBJ file loader for loading OBJ files with textures:
m2d wall,floor;//i initialize and load those files at start
For recording render-time statistics I used:
bool Once = 1;
int secs = 0;
Now to the render code, where I did my experiment.
// Code A: Benchmarked on Radeon 8670D
// Takes 232 ms (average) for drawing 300*300 tiles
if (Once)
    secs = glutGet(GLUT_ELAPSED_TIME);

for (int i = 0; i < mapHeight; i++) {
    for (int j = 0; j < mapWidth; j++) {
        if (mapskeleton[j][i] == skel_Wall) {
            glBindTexture(GL_TEXTURE_2D, wall.texture);
            glPushMatrix();
            glTranslatef(j * 10, i * 10, 0);
            wall.Draw(); // draws 10 textured triangles
            glPopMatrix();
        }
        if (mapskeleton[j][i] == skel_floor) {
            glBindTexture(GL_TEXTURE_2D, floor.texture);
            glPushMatrix();
            glTranslatef(j * 10, i * 10, 0);
            floor.Draw(); // draws 2 textured triangles
            glPopMatrix();
        }
    }
}

if (Once) {
    secs = glutGet(GLUT_ELAPSED_TIME) - secs;
    printf("time taken for rendering %i msecs\n", secs);
    Once = 0;
}
and the other code is:
// Code B: Benchmarked on Radeon 8670D
// Takes 206 ms (average) for drawing 300*300 tiles
if (Once)
    secs = glutGet(GLUT_ELAPSED_TIME);

glBindTexture(GL_TEXTURE_2D, floor.texture);
for (int i = 0; i < mapHeight; i++) {
    for (int j = 0; j < mapWidth; j++) {
        if (mapskeleton[j][i] == skel_floor) {
            glPushMatrix();
            glTranslatef(j * 10, i * 10, 0);
            floor.Draw();
            glPopMatrix();
        }
    }
}

glBindTexture(GL_TEXTURE_2D, wall.texture);
for (int i = 0; i < mapHeight; i++) {
    for (int j = 0; j < mapWidth; j++) {
        if (mapskeleton[j][i] == skel_Wall) {
            glPushMatrix();
            glTranslatef(j * 10, i * 10, 0);
            wall.Draw();
            glPopMatrix();
        }
    }
}

if (Once) {
    secs = glutGet(GLUT_ELAPSED_TIME) - secs;
    printf("time taken for rendering %i msecs\n", secs);
    Once = 0;
}
To me, code A looks better from the point of view of a person (a beginner) reading the code, but the benchmarks say otherwise.
My GPU seems to like code B. I don't understand why code B takes less time to render.
Changes to OpenGL state can generally be expensive: the driver's and/or GPU's data structures and caches can become invalidated. In your case, the change in question is binding a different texture. In code B, you're doing it twice. In code A, you're easily doing it thousands of times.
When programming OpenGL rendering, you'll generally want to set up the pipeline for settings A, render everything which needs settings A, re-set the pipeline for settings B, render everything which needs settings B, and so on.
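If you prefer to keep a single pass over the map, you can split "deciding what to draw" from "drawing it". A sketch along those lines (the position buckets are an addition; the other names come from the question):

std::vector<std::pair<int, int>> wallTiles, floorTiles;
for (int i = 0; i < mapHeight; i++) {
    for (int j = 0; j < mapWidth; j++) {
        if (mapskeleton[j][i] == skel_Wall)  wallTiles.push_back({j, i});
        if (mapskeleton[j][i] == skel_floor) floorTiles.push_back({j, i});
    }
}

auto drawTiles = [](m2d& model, const std::vector<std::pair<int, int>>& tiles) {
    glBindTexture(GL_TEXTURE_2D, model.texture);   // exactly one bind per texture
    for (const auto& t : tiles) {
        glPushMatrix();
        glTranslatef(t.first * 10, t.second * 10, 0);
        model.Draw();
        glPopMatrix();
    }
};
drawTiles(floor, floorTiles);
drawTiles(wall, wallTiles);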
@Angew covered why one option is more efficient than the other. But there is an important point that needs to be stated very clearly. Based on the text of your question, particularly here:
for recording statistics of render times
my gpu seems to like code B
you seem to attempt to measure rendering/GPU performance.
You are NOT AT ALL measuring GPU performance!
You measure the time for setting up the state and making the draw calls. OpenGL lets the GPU operate asynchronously from the code executed on the CPU. The picture you should keep in mind when you make (most) OpenGL calls is that you're submitting work to the GPU for later execution. There's no telling when the GPU completes that work. It most definitely (except for very few calls that you want to avoid in speed critical code) does not happen by the time the call returns.
What you're measuring in your code is purely the CPU overhead for making these calls. This includes what's happening in your own code, and what happens in the driver code for handling the calls and preparing the work for later submission to the GPU.
I'm not saying that the measurement is not useful. Minimizing CPU overhead is very important. You just need to be very aware of what you are in fact measuring, and make sure that you draw the right conclusions.
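If you do want a number that includes GPU execution time, one blunt (benchmark-only) option is to force the queue to drain before stopping the timer; a GL_TIME_ELAPSED query would be the non-blocking alternative. A sketch (renderMap stands in for the tile loops above):

int start = glutGet(GLUT_ELAPSED_TIME);
renderMap();            // the tile-drawing loops from code A or B
glFinish();             // blocks until the GPU has actually executed everything
int elapsed = glutGet(GLUT_ELAPSED_TIME) - start;
printf("CPU submit + GPU execute: %i msecs\n", elapsed);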

Linear movement stutter

I have created simple, frame-independent, variable-time-step linear movement in Direct3D 9 using ID3DXSprite. Most users can't notice it, but on some computers (including mine) it happens often and sometimes it stutters a lot.
Stuttering occurs with VSync enabled and disabled.
I figured out that the same happens with an OpenGL renderer.
It's not a floating point problem.
It seems like the problem only exists in Aero Transparent Glass windowed mode (it's fine, or at least much less noticeable, in fullscreen, in a borderless full-screen window, or with Aero disabled), and it's even worse when the window loses focus.
EDIT:
Frame delta time doesn't leave the bounds of 16-17 ms even when stuttering occurs.
It seems my frame delta time measurement/logging code was bugged. I have fixed it now.
Normally with VSync enabled a frame renders in 17 ms, but sometimes (probably when stuttering happens) it jumps to 25-30 ms.
(I dump the log only once at application exit, not while running/rendering, so it does not affect performance.)
device->Clear(0, 0, D3DCLEAR_TARGET, D3DCOLOR_ARGB(255, 255, 255, 255), 0, 0);
device->BeginScene();
sprite->Begin(D3DXSPRITE_ALPHABLEND);
QueryPerformanceCounter(&counter);
float time = counter.QuadPart / (float) frequency.QuadPart;
float deltaTime = time - currentTime;
currentTime = time;
position.x += velocity * deltaTime;
if (position.x > 640)
    velocity = -250;
else if (position.x < 0)
    velocity = 250;
position.x = (int) position.x;
sprite->Draw(texture, 0, 0, &position, D3DCOLOR_ARGB(255, 255, 255, 255));
sprite->End();
device->EndScene();
device->Present(0, 0, 0, 0);
Fixed the timer thanks to Eduard Wirch and Ben Voigt (although it doesn't fix the initial problem):
float time()
{
    static LARGE_INTEGER start = {0};
    static LARGE_INTEGER frequency;
    if (start.QuadPart == 0)
    {
        QueryPerformanceFrequency(&frequency);
        QueryPerformanceCounter(&start);
    }
    LARGE_INTEGER counter;
    QueryPerformanceCounter(&counter);
    return (float) ((counter.QuadPart - start.QuadPart) / (double) frequency.QuadPart);
}
EDIT #2:
So far I have tried three update methods:
1) Variable time step
x += velocity * deltaTime;
2) Fixed time step
x += 4;
3) Fixed time step + Interpolation
accumulator += deltaTime;
float updateTime = 0.001f;
while (accumulator > updateTime)
{
    previousX = x;
    x += velocity * updateTime;
    accumulator -= updateTime;
}
float alpha = accumulator / updateTime;
float interpolatedX = x * alpha + previousX * (1 - alpha);
All methods work pretty much the same; the fixed time step looks better, but depending on the frame rate is not really an option, and it doesn't solve the problem completely (it still jumps (stutters) from time to time, though rarely).
So far, disabling Aero Transparent Glass or going full screen is the only change with a significant positive effect.
I am using the latest NVIDIA GeForce 332.21 driver and Windows 7 x64 Ultimate.
Part of the problem was a simple data-type precision issue. Replace the speed calculation with a constant, and you'll see extremely smooth movement. Analysing the calculation showed that you're storing the result of QueryPerformanceCounter() in a float. QueryPerformanceCounter() returns a number which looks like this on my computer: 724032629776. This number requires at least 5 bytes to be stored. However, a float uses 4 bytes (and only 24 bits for the actual mantissa) to store the value. So precision is lost when you convert the result of QueryPerformanceCounter() to float, and sometimes this leads to a deltaTime of zero, causing stuttering.
This partly explains why some users do not experience this problem: it all depends on whether the result of QueryPerformanceCounter() fits into a float.
The solution for this part of the problem is: use double (or, as Ben Voigt suggested, store the initial performance counter and subtract it from new values before converting to float; this gives you at least more headroom, but might eventually hit the float resolution limit again when the application runs for a long time, depending on how fast the performance counter grows).
After fixing this, the stuttering was much less but did not disappear completely. Analyzing the runtime behaviour showed that a frame is skipped now and then. The application's GPU command buffer is flushed by Present, but the present command remains in the application context queue until the next vsync (even though Present was invoked long before vsync, 14 ms earlier). Further analysis showed that a background process (f.lux) told the system to set the gamma ramp once in a while. This command required the complete GPU queue to run dry before it was executed, probably to avoid side effects. This GPU flush was started just before the 'present' command was moved to the GPU queue. The system blocked video scheduling until the GPU ran dry, which took until the next vsync. So the present packet was not moved to the GPU queue until the next frame. The visible effect of this: stutter.
It's unlikely that you're running f.lux on your computer too. But you're probably experiencing a similar background intervention. You'll need to look for the source of the problem on your system yourself. I've written a blog post about how to diagnose frame skips: Diagnose frame skips and stutter in DirectX applications. You'll also find the whole story of diagnosing f.lux as the culprit there.
But even if you find the source of your frame skip, I doubt that you'll achieve a stable 60 fps while DWM window composition is enabled. The reason is that you're not drawing to the screen directly; instead, you draw to a shared surface of DWM. Since it's a shared resource, it can be locked by others for an arbitrary amount of time, making it impossible for you to keep your application's frame rate stable. If you really need a stable frame rate, go full screen, or disable window composition (on Windows 7; Windows 8 does not allow disabling window composition):
#include <dwmapi.h>
...
HRESULT hr = DwmEnableComposition(DWM_EC_DISABLECOMPOSITION);
if (!SUCCEEDED(hr)) {
    // log message or react in a different way
}
I took a look at your source code and noticed that you only process one window message every frame. For me this caused stuttering in the past.
I would recommend looping on PeekMessage until it returns zero, indicating that the message queue is exhausted, and rendering a frame after that.
So change:
if (PeekMessageW(&message, 0, 0, 0, PM_REMOVE))
to
while (PeekMessageW(&message, 0, 0, 0, PM_REMOVE))
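For clarity, a sketch of the resulting loop (running and renderFrame are placeholder names):

MSG message;
bool running = true;
while (running) {
    // Drain the entire message queue first
    while (PeekMessageW(&message, 0, 0, 0, PM_REMOVE)) {
        if (message.message == WM_QUIT)
            running = false;
        TranslateMessage(&message);
        DispatchMessageW(&message);
    }
    renderFrame();   // your Clear/BeginScene/.../Present code
}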
Edit:
I compiled and ran your code (with another texture) and it displayed the movement smoothly for me. I don't have Aero though (Windows 8).
One thing I noticed: you set D3DCREATE_SOFTWARE_VERTEXPROCESSING. Have you tried setting this to D3DCREATE_HARDWARE_VERTEXPROCESSING?