Understanding buffer swapping in more detail - OpenGL

This is more a theoretical question. This is what I understand regarding buffer swapping and vsync:
I - When vsync is off, whenever the developer swaps the front/back buffers, the buffer the GPU is reading from and sending to the monitor is changed to the new one immediately, regardless of whether the old buffer was still being read (i.e. no vblank is needed).
II - When vsync is on, the buffers are not swapped immediately; they are only changed once the old buffer has been completely read (i.e. a vblank is needed).
III - Turning vsync off can push the frame rate above the monitor refresh rate, but screen tearing can appear when the buffers are swapped while one of them is being read.
IV - Turning vsync on prevents tearing, but the monitor refresh rate limits the FPS.
Based on this I tried the following experiment: I disabled vsync, and every frame I filled all pixels with a solid color using glClearColor + glClear, choosing a new random color per frame. I got ~2400 FPS on a 60 Hz monitor. Since I swapped the buffers every frame, and since the monitor takes 1/60 s for each full screen refresh, I expected the buffers to be swapped roughly 40 times during each refresh (2400 / 60 ≈ 40 swap calls per refresh). Since the clear color is different every time the buffers are swapped, I expected to see a really messy image, with lots of different colors, because of the tearing. Instead, in the screenshots I took I didn't see any tearing; every pixel had the same solid color.
Could someone point out which of my assumptions are wrong and why I see this behavior?
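For reference, a minimal reconstruction of such a test might look like this (a sketch assuming GLFW for window and context creation; the post does not name a toolkit):

#include <GLFW/glfw3.h>
#include <cstdlib>

int main() {
    glfwInit();
    GLFWwindow* win = glfwCreateWindow(800, 600, "tearing test", nullptr, nullptr);
    glfwMakeContextCurrent(win);
    glfwSwapInterval(0);                          // vsync off
    while (!glfwWindowShouldClose(win)) {
        // New random clear color every frame, then swap immediately.
        glClearColor(std::rand() / (float)RAND_MAX,
                     std::rand() / (float)RAND_MAX,
                     std::rand() / (float)RAND_MAX, 1.0f);
        glClear(GL_COLOR_BUFFER_BIT);
        glfwSwapBuffers(win);
        glfwPollEvents();
    }
    glfwTerminate();
    return 0;
}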
Thanks in advance!

The problem was related to the window manager. I could see the expected behavior when I ran in full screen.

Related

OSX pushing pixels to screen with minimum latency

I'm trying to develop some very low-latency graphics applications and am getting really frustrated by how long it takes to draw to screen through OpenGL. Every discussion I find about it online addresses optimizing the OpenGL pipeline, but doesn't get anywhere near the results that I need.
Check this out:
https://www.dropbox.com/s/dbz4bq67cxluhs7/MouseLatency.MOV?dl=0
You've probably noticed this before: with a C++ OpenGL app, dragging the mouse around the screen and drawing the mouse location in OpenGL, the OpenGL-drawn position lags behind by 3 or 4 frames. Clearly OSX CAN draw [the cursor] to screen with very low latency, but OpenGL is much slower. So let's say I don't need to do any fancy OpenGL rendering. I just want to push pixels to screen somehow. Is there a way for me to bypass OpenGL completely and draw to screen faster? Or is this kind of functionality going to be locked inside the kernel somewhere that I can't reach it?
datenwolf's answer is excellent. I just wanted to add one thing to this discussion regarding triple buffering at the compositor level, since I am very familiar with the Microsoft Windows desktop compositor.
I know you are asking about OS X here, but the implementation details I am going to discuss are the most sensible way of implementing this stuff and I would expect to see other systems work this way too.
Triple buffering as you might enable at the application level adds a third buffer to the swap-chain that is synchronized to refresh. That way of doing triple buffering does add latency, because that third buffer has to be displayed and nothing is allowed to touch it until this happens (this is D3D's mandated behavior -- the behavior and feature itself are undefined in OpenGL); but the way the Desktop Window Manager (Windows) works is slightly different.
The behavior I have seen most drivers implement for desktop composition is frame dropping. In any situation where multiple frames are finished between refreshes, all but one of those frames are discarded. You actually get lower latency using a window rather than fullscreen + triple buffering, because windowed (composited) swaps are not blocked when the third buffer (owned by the compositor) already has a finished frame waiting to be displayed.
It creates a whole different set of visual issues if framerate is not reasonably consistent. Technically, pixels belonging to dropped frames have infinite latency, so the benefits from latency reduction done this way might be worthless if you needed every single frame drawn to appear on screen.
I believe you can get this behavior on OS X (if you want it) by disabling VSYNC and drawing in a window. VSYNC basically only serves as a form of frame pacing (trade latency for consistency) in this scenario, and tearing is eliminated by the compositor itself regardless of what rate you draw at.
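For what it's worth, disabling the swap interval on a CGL context looks roughly like this (a sketch using the CGL API; whether the compositor then drops or displays your extra frames is up to the system):

#include <OpenGL/OpenGL.h>

// Ask the context not to wait for the vertical retrace when swapping.
void disable_vsync(CGLContextObj ctx) {
    GLint interval = 0;   // 0 = swap immediately, 1 = wait for vblank
    CGLSetParameter(ctx, kCGLCPSwapInterval, &interval);
}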
Regarding mouse cursor latency:
The cursor in any modern window system will always track with minimum latency. There is literally a feature on graphics hardware called a "hardware cursor," where the driver stores the cursor position and then once per-refresh, has the hardware overlay the cursor on top of whatever is sitting in the framebuffer waiting to be scanned-out. So even if your application is drawing at 30 FPS on a 60 Hz display, the cursor is updated every 16 ms when the hardware cursor's used.
This bypasses all graphics APIs altogether, but is quite limited (e.g. it uses the OS-defined cursor).
TL;DR: Latency comes in many forms.
If your problem is input latency, then you can mitigate that by reducing the number of pre-rendered frames and avoiding triple buffering. I could not begin to tell you how to reduce the number of driver pre-rendered frames on OS X.
Minimize length of time before something shows up on screen
If your problem is the amount of time that passes between executions of your render loop, you would go the other way. Increase pre-rendered frames, draw in a window and disable VSYNC. You may run into a lot of frames that are drawn but never displayed in this scenario.
Minimize time spent blocking (increase FPS); some frames will never be displayed
Pre-rendered frames are a powerful little feature that you do not get control over at the OpenGL API level. It sets up how deeply the driver is allowed to pipeline everything and depending on the desired task you will trade different types of latency by fiddling with it. Many gamers swear by setting this value to 1 to minimize input latency at the cost of overall framerate "smoothness."
UPDATE:
Pre-rendered frames are one reason for your multi-frame delay. Fixing this in a cross-platform way is difficult (it's a driver setting), but if you have access to Fence Sync Objects you can produce the same behavior as forcing this to 1.
I can explain this in more detail if need be; the general idea is that you insert a fence sync after the buffer swap and then wait for it to be signaled before the first command of the next frame is allowed to begin. Performance may take a nosedive, but latency will be minimized since the CPU won't be rendering ahead of the GPU anymore.
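A minimal sketch of that idea (assuming sync objects are available, i.e. OpenGL 3.2+ or ARB_sync, and GLFW for the swap call; window is your window handle):

// End of frame N: swap, then drop a fence into the command stream.
glfwSwapBuffers(window);
GLsync frameFence = glFenceSync(GL_SYNC_GPU_COMMANDS_COMPLETE, 0);

// Start of frame N+1: block until the GPU has consumed everything up to
// the fence, so the CPU never runs more than one frame ahead.
glClientWaitSync(frameFence, GL_SYNC_FLUSH_COMMANDS_BIT, 1000000000ull); // 1 s timeout
glDeleteSync(frameFence);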
There are a number of latencies at play here.
Input event → drawing state latency
In your typical interactive application you have an event loop that usually goes:
1. collect user input
2. process user input
3. determine what's to be drawn
4. draw to the back buffer
5. swap back to front buffer
With the usual ways in which event–update–display loops are written, there's almost no delay between step 5 of one iteration and step 1 of the next, which means that steps 2, 3, and 4 operate on data that lags about one frame period behind.
So this is the first source of latency.
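As a concrete illustration, such a loop might look like this (a sketch assuming GLFW; window, state, update_input, update_scene and render_scene are hypothetical names):

while (!glfwWindowShouldClose(window)) {
    glfwPollEvents();            // 1. collect user input
    update_input(state);         // 2. process user input
    update_scene(state);         // 3. determine what's to be drawn
    render_scene(state);         // 4. draw to the back buffer
    glfwSwapBuffers(window);     // 5. swap back to front buffer
    // The next iteration starts right away, so steps 2-4 always work
    // with input that is roughly one frame period old.
}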
Triple buffering / composition latency
Many graphics pipelines enable triple buffering for smoother display updates. Instead of keeping only a back and a front buffer around, there's also a third buffer in between. The average rate at which these buffers are drawn to matches the display refresh rate, and the buffers themselves are stepped forward at exactly the display refresh period. So this adds another frame period of latency.
If you're running on a system with a window compositor (which is the default on Mac OS X), this effectively adds another buffer stage: if you've got a double-buffer mode it gives you a triple buffer, and if you had a triple buffer it gives you a "quad" buffer (quotes here, because quad buffering is a term usually used for stereoscopic rendering).
What can you do about this:
Turn off composition
Windows (through the DWM API) and Mac OS X allow you to turn composition off or to bypass the compositor.
Reducing input lag
Try to collect and integrate the user input as late as possible (use high-resolution sleeps). If you've got only a very simple scene you can push the drawing quite close to the V-Sync deadline; in fact the NVidia OpenGL implementation has a vendor-specific extension that allows you to sleep until a specific amount of time before the next V-Sync.
If your scene is complex but can be separated into parts that require low-latency user input and parts where it doesn't matter so much, you can draw the higher-latency stuff earlier and only integrate the user input at the very last moment. Of course, if the mouse is used to control the viewing direction, or, even worse, you're rendering for a VR head-mounted display, things are going to become difficult.
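A sketch of that "integrate input as late as possible" idea using a high-resolution sleep (the 60 Hz frame period and the 2 ms drawing margin are illustrative guesses; running, poll_input, draw_frame and swap_buffers are hypothetical names):

#include <chrono>
#include <thread>

void low_latency_loop(bool& running) {
    using hr_clock = std::chrono::steady_clock;
    const auto frame_period = std::chrono::microseconds(16667); // 60 Hz display
    const auto draw_margin  = std::chrono::microseconds(2000);  // time to draw the simple scene (guess)
    auto next_vsync = hr_clock::now() + frame_period;
    while (running) {
        std::this_thread::sleep_until(next_vsync - draw_margin); // sleep until just before the deadline
        poll_input();        // sample input as late as possible
        draw_frame();        // quick draw using the fresh input
        swap_buffers();      // with V-Sync on, this blocks until the retrace
        next_vsync += frame_period;
    }
}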

Allocating a new buffer per each frame to prevent screen tearing

When I use the SDL library to set the pixel values in the memory and update the screen, screen tearing occurs whenever the update is too fast. I don't know much about the SDL internals, but my understanding from what I see is that:
The update function returns right after signalling the graphics hardware to read the pixel data from (say) buffer1.
The next frame is painted on buffer2, and update is called again, but this is too fast and the reading from buffer1 still hasn't completed;
My program doesn't know anything about the hardware and assumes that it's okay to paint again in buffer1, while this buffer is being sent to the monitor.
The screen is torn.
This isn't a big problem when the object being painted doesn't move too fast. The screen still tears, but it's almost invisible to the human eye; still, I'd be happy if this tearing did not occur at all. I dislike vertical sync, as it introduces a consistent latency on every frame.
My idea is that a new screen buffer could be allocated for each frame to be painted on. When the monitor wants to display something, it should read from the newest buffer.
Is this approach already used in practice? If I do want to test my idea, what kind of low-level, cross-platform library or API could I use? SDL? OpenGL?
Do you think that updating the screen faster than the human eye can perceive is productive? If you really must have your engine 100% independent of the retrace, use a triple-buffer system: one buffer to display, and two buffers to update back and forth until the screen is ready for the next one. Triple is as high as you need to go, because if you fill the 2nd back buffer you can simply overwrite the now-defunct 1st back buffer instead. No GPU lag and only 3 buffers.
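Conceptually, the rotation described above can be sketched like this (illustrative only, not tied to SDL or any particular API; Frame stands in for whatever pixel data you render into):

#include <utility>

struct Frame { /* pixel data for one full screen */ };

struct TripleBuffer {
    Frame buffers[3];
    int displaying = 0;   // buffer currently being scanned out
    int ready      = 1;   // newest completed frame, waiting to be shown
    int drawing    = 2;   // buffer the renderer is writing to right now
};

// Renderer: a frame just finished; it becomes the "ready" one and the
// previously ready (now stale) frame is simply reused for drawing.
void frame_finished(TripleBuffer& tb) {
    std::swap(tb.ready, tb.drawing);
}

// Display side: at each retrace, present the newest completed frame.
void on_retrace(TripleBuffer& tb) {
    std::swap(tb.displaying, tb.ready);
}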
Here is a nice link describing this technique along with some warnings about using it on modern GPUs...

OpenGL framerate: connection with the size of the window

I was in the process of tracking down and eliminating those parts of my C++/OpenGL/GLUT code that were inefficient and slow, and in doing so, I watched my frames per second counter to know if I was actually making progress. I noticed that my frame rate dropped from about 120 to 60 if I maximized the window.
Further experimentation revealed that this was a linear thing: I could change the frame rate by changing the size of the window.
Does this mean that my bottleneck is in the GPU rendering? Surely GPUs these days are more than powerful enough not to notice the difference between a 300x300 and a 1920x1080 window? Or am I asking too much from my graphics card?
The alternative is that there is some bug in my code that is causing the system to slow down on larger renders.
What I am asking is this: is it reasonable to expect a halving of framerate when changing the window size, or is there something very wrong?
Further experimentation revealed that this was a linear thing: I could change the frame rate by changing the size of the window.
Congratulations: you've discovered fill rate.
Does this mean that my bottleneck is in the GPU rendering?
Yes, pretty much. To be specific the bottleneck is either the bandwidth from/to the graphics memory, or the complexity of the fragment shader, or a combination of both.
Surely GPUs these days are more than powerful enough not to notice the difference between a 300x300 and 1920x1080?
300×300 = 90000
1920×1080 = 2073600
Or in other words: you ask the GPU to fill about 23 times as many pixels, which means about 23 times as much data must be flung around and processed.
That drop from 120Hz to 60Hz comes from V-Sync. If you disabled V-Sync you'd find that your program would probably reach way higher rates than 60Hz for 1920×1080, but for 300×300 it will be something below 180Hz.
The reason for that is simple: when synced to the display's vertical retrace, your GPU can "put out" the next frame only at the moment the display is v-syncing. If your display can do 120Hz (as yours obviously can) and your rendering takes less than 1/120 s to complete, it will make the deadline and your framerate synchronizes to the display. If, however, drawing a frame takes more than 1/120 s, it will sync with only every 2nd frame displayed. If rendering takes more than 1/60 s it syncs with every 3rd, more than 1/40 s with every 4th, and so on.
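Put as a formula: with V-Sync the effective frame rate is refresh_rate / ceil(frame_time × refresh_rate). A tiny helper showing the numbers above (illustrative only):

#include <cmath>

// Effective displayed frame rate when buffer swaps are synced to the retrace.
double vsynced_fps(double frame_time_s, double refresh_hz) {
    return refresh_hz / std::ceil(frame_time_s * refresh_hz);
}
// vsynced_fps(1.0 / 200, 120.0) == 120  (frame fits into one refresh interval)
// vsynced_fps(1.0 / 100, 120.0) == 60   (needs two intervals)
// vsynced_fps(1.0 /  50, 120.0) == 40   (needs three intervals)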

Perfect V-sync implementation for a lightweight OpenGL game: need one tidbit of information

In the game our Internet-assembled team is programming, we're assuming everybody in our audience will be able to run the game at WAY over full speed.
So, to save video RAM, and hopefully give a little more idle time to the graphics card, using V-sync without double buffering would be our best option. So, in OpenGL, we need to know how to do that.
From my understanding, V-sync is when the graphics card is paused once it's done rendering a single frame until that frame has finished being sent to the display device. Double buffering doesn't pause render operations (or maybe it does, or maybe it's implementation-specific; not sure), because it instead draws to a second buffer before copying to the framebuffer, so that the monitor either gets the full frame or no new frame at all (specifically, the last stored image in the framebuffer). Well, we don't need that feature, as long as the graphics card just writes to the framebuffer ONLY when it damn needs to.
This is a pretty slow online game (But it's VERY creative ^_^). There's very little realtime action. Therefore, extremely precise user input is not a necessity; it can be captured from the OS as a single unit any time before rendering a frame.
So, in order to do EXACTLY this, I need to be able to get a "Frame has finished sending to monitor" message from OpenGL. Is it possible? If not, what is the best alternative?
The game is being programmed for Windows only at the moment but should have work done for Linux in a few months.
You suffer from a misconception about what V-Sync does. There's a part of video RAM that's continuously sent to the display device at a constant rate, the frame refresh rate. So immediately after a full frame has been sent, the next frame gets sent, after a very short blank time. But the time between sending frames is far shorter than the time it takes to send the full frame.
What happens without V-Sync is that operations on the contents of the framebuffer become visible; for example, if the frame is filled alternating with red and green and there's no V-Sync, you'll see red and green bands on the monitor. To avoid this, V-Sync swaps the pointer the display driver uses to access the framebuffer just after a full frame has been sent.
Which brings us to what double buffering does. Without double buffering there's little use for V-Sync. The action triggered by V-Sync must happen very, very fast, so it boils down to swapping a pointer or a very fast blit operation (potentially by simply setting copy-on-write attributes for the GPU's MMU).
Without double buffering and without V-Sync, the effect is that one can watch the picture being rendered piece by piece into the framebuffer. Of course, if rendering happens faster than a frame period, the effect is that toward the top you'll see an only sparsely populated image, with more and more content visible toward the bottom; somewhere in between, the rendering will hit the lower screen edge and wrap around to the top, and that intersection line will keep moving.
TL;DR: Just use double buffering and enable V-Sync for the buffer swap. Don't be afraid of memory consumption. All GPUs in circulation today have more than enough RAM to easily provide the memory for double-buffered colour planes. Just do the math: 1920×1200 × RGB ≈ 6.6 MiB, and even the smallest GPUs in PCs today come with at least 128 MiB of RAM. For mobile devices, say the iPad: 1024×768 × RGB ≈ 2.25 MiB vs. 32 MiB for graphics. The UI of the iPad is double buffered anyway.
You can use wglGetProcAddress to get the address of wglSwapIntervalEXT, and then call wglSwapIntervalEXT(1); to synchronize updates with the vertical sync. When you do this, you don't get a message at the vertical sync -- instead glFlush simply doesn't return until a vertical retrace has happened and the screen has been updated. So, you have a WM_PAINT handler that looks something like this:
case WM_PAINT: {
    PAINTSTRUCT ps;
    HDC hdc = BeginPaint(hwnd, &ps);
    wglMakeCurrent(hdc, hglrc);
    // ... do drawing ...
    glFlush();              // with swap interval 1, returns only after the retrace
    EndPaint(hwnd, &ps);
} break;
The glFlush is needed in any case, to ensure the drawing you've done gets sent to the screen.
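For completeness, obtaining and enabling the swap interval might look like this (a sketch assuming a current GL context on Windows and that WGL_EXT_swap_control is supported; error handling omitted):

#include <windows.h>

typedef BOOL (WINAPI *PFNWGLSWAPINTERVALEXTPROC)(int interval);

void enable_vsync() {
    PFNWGLSWAPINTERVALEXTPROC wglSwapIntervalEXT =
        (PFNWGLSWAPINTERVALEXTPROC)wglGetProcAddress("wglSwapIntervalEXT");
    if (wglSwapIntervalEXT)
        wglSwapIntervalEXT(1);   // 1 = synchronize buffer swaps with the vertical retrace
}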

OpenGL Hardware Accelerator Sleeping After Period of Inactivity

I am working on an OpenGL-based 2D CAD software which requires heavy use of the hardware OpenGL accelerator (pushing 250 million vertices per second at times). Here is my problem: whenever the viewport is stagnant for more than 10 seconds, the OpenGL accelerator (a GeForce 9800 GT in this case) goes into an inactive mode. When the viewport is rendered again after the inactive period, I get 1/4th of the normal framerate, and this lasts for 3-4 seconds before the 3D accelerator wakes up and kicks into full speed.
Question :
How do I prevent this from happening ?
Is there an OpenGL way to prevent GPUs from going into an inactive mode?
Thank you for your replies.
Gary
There are several ways you can keep a GPU busy, but the most sure-fire way to guarantee it is doing something, and not just deferring your commands, is to actually draw something. glClear() and every glDraw* command constitute actual drawing commands. Throw in a glFinish() at the end of the draw to guarantee execution of the GL command stream.
Presumably you don't want to see this drawing, so create a new framebuffer object, create a small RGBA texture (say 256 pixels on a side), and attach the texture to color attachment point 0.
When you want to keep the GPU busy draw to this offscreen buffer.
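A sketch of that keep-alive drawing (assuming a context with framebuffer object support, e.g. OpenGL 3.0+; the 256×256 size follows the suggestion above):

// One-time setup: small offscreen render target on color attachment 0.
GLuint fbo, tex;
glGenTextures(1, &tex);
glBindTexture(GL_TEXTURE_2D, tex);
glTexImage2D(GL_TEXTURE_2D, 0, GL_RGBA8, 256, 256, 0, GL_RGBA, GL_UNSIGNED_BYTE, nullptr);
glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_MIN_FILTER, GL_NEAREST);
glGenFramebuffers(1, &fbo);
glBindFramebuffer(GL_FRAMEBUFFER, fbo);
glFramebufferTexture2D(GL_FRAMEBUFFER, GL_COLOR_ATTACHMENT0, GL_TEXTURE_2D, tex, 0);

// Run this periodically while the viewport is idle:
glBindFramebuffer(GL_FRAMEBUFFER, fbo);
glViewport(0, 0, 256, 256);
glClear(GL_COLOR_BUFFER_BIT);   // any real draw command will do
glFinish();                     // force the GPU to execute the command stream now
glBindFramebuffer(GL_FRAMEBUFFER, 0);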
This is all with the assumption that you can't, for instance, just change your boot-args or control panel settings to modulate power management behavior on the card. Every OS has different semantics here.