Weird results when running my program in gDEBugger - C++

I'm trying to run an OpenGL program through gDEBugger (http://www.gremedy.com) and I'm seeing a couple of strange things:
1) The frames seem to render MUCH faster with gDEBugger. For example, if I update some object's position every frame, it just flies across the screen really fast, but when the program is run without gDEBugger, it moves at a much slower speed.
2) Strangely, gDEBugger reports 8 GL frames/second, which doesn't seem realistic: clearly, the FPS is higher than 8 (by the way, I have checked every possible OpenGL Render Frame Terminator in the Debug Settings dialog).
My program uses SDL to create an OpenGL rendering context:
Uint32 flags = SDL_HWSURFACE | SDL_DOUBLEBUF | SDL_OPENGL;
if(fullscreen) flags |= SDL_FULLSCREEN;

// Initialize SDL's video subsystem and check for failure
if (SDL_Init(SDL_INIT_VIDEO) == -1) {
    // handle initialization failure
}

// Request a double-buffered context, then attempt to set the video mode
SDL_GL_SetAttribute(SDL_GL_DOUBLEBUFFER, 1);
SDL_Surface* s = SDL_SetVideoMode(width, height, 0, flags);
I'm using Windows 7 and an NVIDIA graphics card (GeForce GTX 660M).
My question is: how does one explain the strange behavior I'm seeing in 1) and 2)? Could it be that for some reason the rendering is being performed in software instead of on the graphics card?
UPD: Obviously, I'm calling SDL_GL_SwapBuffers (which isn't listed as one of the render frame terminators) at the end of each frame, but I assume it should just call the Windows SwapBuffers function.
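For reference, the end of my frame loop looks roughly like this (a trimmed sketch; handle_events() and draw_scene() stand in for my actual event and rendering code):

while (running)
{
    handle_events();       // poll SDL events (sketch)
    draw_scene();          // all GL draw calls for the frame (sketch)
    SDL_GL_SwapBuffers();  // SDL 1.2 call; wraps the platform's SwapBuffers
}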

Regarding issue 1: apparently gDEBugger disables wait-for-vsync, which is why the framerate is much higher than 60 fps.
Regarding issue 2: for some reason, when working with SDL, two OpenGL contexts are created. The correct frame rate can be seen by adding performance counters for the second context.

Related

Is there a way to remove 60 fps cap in GLFW?

I'm writing a game with OpenGL / GLFW in C++.
My game always runs at 60 fps without any screen tearing. After doing some research, it seems that the glfwSwapInterval() function should be able to enable/disable V-sync and the 60 fps cap.
However, no matter what value I pass to the function, the framerate stays locked at 60 and there is no tearing whatsoever. I have also checked the compositor settings on Linux and the NVIDIA panel, and they have no effect.
I assume this is a common thing; is there a way to get around this fps cap?
The easiest way is to use single buffering instead of double buffering. Since single buffering always uses the same buffer, there is no buffer swap and hence no "vsync".
Use the glfwWindowHint to disable double buffering:
glfwWindowHint(GLFW_DOUBLEBUFFER, GLFW_FALSE);
GLFWwindow *wnd = glfwCreateWindow(w, h, "OGL window", nullptr, nullptr);
Note: when you use single buffering, you have to explicitly force the execution of the GL commands with glFlush, instead of the buffer swap (glfwSwapBuffers).
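For illustration, a minimal single-buffered frame loop could look like this (a sketch only, assuming glfwInit has already succeeded; the actual drawing is omitted):

glfwWindowHint(GLFW_DOUBLEBUFFER, GLFW_FALSE);
GLFWwindow *wnd = glfwCreateWindow(w, h, "OGL window", nullptr, nullptr);
glfwMakeContextCurrent(wnd);
while (!glfwWindowShouldClose(wnd))
{
    glClear(GL_COLOR_BUFFER_BIT);
    // ... draw the frame ...
    glFlush();         // force execution of the GL commands; there is no buffer swap
    glfwPollEvents();
}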
Another possibility is to set the swap interval, i.e. the number of screen updates to wait after glfwSwapBuffers has been called before swapping the buffers, to 0. This can be done with glfwSwapInterval, after making the OpenGL context current (glfwMakeContextCurrent):
glfwMakeContextCurrent(wnd);
glfwSwapInterval(0);
But note that whether or not this solution works may depend on the hardware and the driver.
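Putting the second variant together, a sketch of the uncapped double-buffered loop (window creation as usual; note that the driver control panel can still force vsync on):

glfwMakeContextCurrent(wnd);
glfwSwapInterval(0);       // request no vsync; the driver may override this
while (!glfwWindowShouldClose(wnd))
{
    glClear(GL_COLOR_BUFFER_BIT);
    // ... draw the frame ...
    glfwSwapBuffers(wnd);
    glfwPollEvents();
}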

Reliable windowed vsync with OpenGL on Windows?

SUMMARY
It seems that vsync with OpenGL is broken on Windows in windowed mode. I've tried different APIs (SDL, GLFW, SFML), all with the same result: while the framerate is limited (frame times are consistently around 16-17 ms according to CPU measurements on multiple 60 Hz setups I've tried) and the CPU is in fact sleeping most of the time, frames are very often skipped. Depending on the machine and the CPU usage for things other than rendering, this can be as bad as effectively cutting the frame rate in half. This problem does not seem to be driver related.
How can I get working vsync on Windows with OpenGL in windowed mode, or a similar effect with these properties (if I forgot something notable, or if something is not sensible, please comment):
CPU can sleep most of the time
No tearing
No skipped frames (under the assumption that the system is not overloaded)
CPU gets to know when a frame has actually been displayed
DETAILS / SOME RESEARCH
When I googled "opengl vsync stutter" or "opengl vsync frame drop" or similar queries, I found that many people are having this issue (or a very similar one), yet there seems to be no coherent solution to the actual problem (many inadequately answered questions on the gamedev Stack Exchange, too; also many low-effort forum posts).
To summarize my research: It seems that the compositing window manager (DWM) used in newer versions of Windows forces triple buffering, and that interferes with vsync. People suggest disabling DWM, not using vsync, or going fullscreen, all of which are not a solution to the original problem (FOOTNOTE1). I have also not found a detailed explanation why triple buffering causes this issue with vsync, or why it is technologically not possible to solve the problem.
However: I've also tested that this does not occur on Linux, even on VERY weak PCs. Therefore it must be technically possible (at least in general) for OpenGL-based hardware acceleration to have functional vsync enabled without skipping frames.
Also, this is not a problem when using D3D instead of OpenGL on Windows (with vsync enabled). Therefore it must be technically possible to have working vsync on Windows (I have tried new, old, and very old drivers and different (old and new) hardware, although all the hardware setups I have available are Intel + NVidia, so I don't know what happens with AMD/ATI).
And lastly, there surely must be software for Windows, be it games, multimedia applications, creative production, 3D modeling/rendering programs or whatever, that uses OpenGL and works properly in windowed mode while still rendering accurately, without busy-waiting on the CPU, and without frame drops.
I've noticed something when using a traditional rendering loop like this:
while (true)
{
    poll_all_events_in_event_queue();
    process_things();
    render();
}
The amount of work the CPU has to do in that loop affects the behavior of the stuttering. However, this is most definitely not an issue of the CPU being overloaded, as the problem also occurs in one of the most simple programs one could write (see below), and on a very powerful system that does nothing else (the program being nothing other than clearing the window with a different color on each frame, and then displaying it).
I've also noticed that it never seems to get worse than skipping every other frame (i.e., in my tests, the visible framerate was always somewhere between 30 and 60 on a 60 Hz system). You can observe somewhat of a Nyquist sampling theorem violation when running a program that changes the background color between two colors on odd and even frames, which makes me believe that something is not synchronized properly (i.e. a software bug in Windows or its OpenGL implementation). Again, the framerate as far as the CPU is concerned is rock solid. Also, timeBeginPeriod has had no noticeable effect in my tests.
(FOOTNOTE1) It should be noted though that, because of the DWM, tearing does not occur in windowed mode (which is one of the two main reasons to use vsync, the other reason being making the CPU sleep for the maximum amount of time possible without missing a frame). So it would be acceptable for me to have a solution that implements vsync in the application layer.
However, the only way I see that being possible is if there is a way to explicitly (and accurately) wait for a page flip to occur (with the possibility of timeout or cancellation), or to query a non-sticky flag that is set when the page is flipped (in a way that doesn't force flushing the entire asynchronous render pipeline, like glGetError does, for example), and I have not found a way to do either.
Here is some code to get a quick example running that demonstrates this problem (using SFML, which I found to be the least painful to get to work).
You should see homogeneous flashing. If you ever see the same color (black or purple) for more than one frame, it's bad.
(This flashes the screen with the display's refresh rate, so maybe epilepsy warning):
// g++ TEST_TEST_TEST.cpp -lsfml-system -lsfml-window -lsfml-graphics -lGL
#include <SFML/System.hpp>
#include <SFML/Window.hpp>
#include <SFML/Graphics.hpp>
#include <SFML/OpenGL.hpp>
#include <iostream>
int main()
{
    // create the window
    sf::RenderWindow window(sf::VideoMode(800, 600), "OpenGL");
    window.setVerticalSyncEnabled(true);
    // activate the window
    window.setActive(true);

    int frame_counter = 0;
    sf::RectangleShape rect;
    rect.setSize(sf::Vector2f(10, 10));
    sf::Clock clock;

    while (true)
    {
        // handle events
        sf::Event event;
        while (window.pollEvent(event))
        {
            if (event.type == sf::Event::Closed)
            {
                return 0;
            }
        }

        ++frame_counter;
        if (frame_counter & 1)
        {
            glClearColor(0, 0, 0, 1);
        }
        else
        {
            glClearColor(60.0/255.0, 50.0/255.0, 75.0/255.0, 1);
        }

        // clear the buffers
        glClear(GL_COLOR_BUFFER_BIT | GL_DEPTH_BUFFER_BIT);

        // Enable this to display a column of rectangles on each frame
        // All colors (and positions) should pop up the same amount
        // This shows that apparently, 1 frame is skipped at most
#if 0
        int fc_mod = frame_counter % 8;
        int color_mod = fc_mod % 4;
        for (int i = 0; i < 30; ++i)
        {
            rect.setPosition(fc_mod * 20 + 10, i * 20 + 10);
            rect.setFillColor(
                sf::Color(
                    (color_mod == 0 || color_mod == 3) ? 255 : 0,
                    (color_mod == 0 || color_mod == 2) ? 255 : 0,
                    (color_mod == 1) ? 155 : 0,
                    255
                )
            );
            window.draw(rect);
        }
#endif

        int elapsed_ms = clock.restart().asMilliseconds();
        // NOTE: These numbers are only valid for 60 Hz displays
        if (elapsed_ms > 17 || elapsed_ms < 15)
        {
            // Ideally you should NEVER see this message, but it does tend to stutter
            // a bit for a second or so upon program startup - doesn't matter as long
            // as it stops eventually
            std::cout << elapsed_ms << std::endl;
        }

        // end the current frame (internally swaps the front and back buffers)
        window.display();
    }
    return 0;
}
System info:
Verified this problem on these systems:
Windows 10 x64 i7-4790K + GeForce 970 (verified that problem does not occur on Linux here) (single 60 Hz monitor)
Windows 7 x64 i5-2320 + GeForce 560 (single 60 Hz monitor)
Windows 10 x64 Intel Core2 Duo T6400 + GeForce 9600M GT (verified that problem does not occur on Linux here) (single 60 Hz laptop display)
And 2 other people using Windows 10 x64 and 7 x64 respectively, both "beefy gaming rigs", can request specs if necessary
UPDATE 20170815
Some additional testing I've done:
I tried adding explicit sleeps (via the SFML library, which basically just calls Sleep from the Windows API while ensuring that timeBeginPeriod is minimal).
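Concretely, that looks something like this in the SFML loop above (a sketch, with the sleep placed just before the buffer swap):

// just before window.display():
sf::sleep(sf::milliseconds(17));  // 17 / 16 / 15 ms variants were tested
window.display();                 // vsync still enabled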
With my 60 Hz setup, a frame should ideally take 16 2/3 ms. According to QueryPerformanceCounter measurements, my system is, most of the time, very accurate with those sleeps.
Adding a sleep of 17 ms causes me to render slower than the refresh rate. When I do this, some frames are displayed twice (this is expected), but NO frames are dropped, ever. The same is true for even longer sleeps.
Adding a sleep of 16 ms sometimes causes a frame to be displayed twice, and sometimes causes a frame to be dropped. This is plausible in my opinion, considering a more or less random combination of the result at 17 ms, and the result at no sleep at all.
Adding a sleep of 15 ms behaves very similarly to having no sleep at all. It's fine for a short moment, then about every 2nd frame is dropped. The same is true for all values from 1 ms to 15 ms.
This reinforced my theory that the problem might be nothing other than some plain old concurrency bug in the vsync logic in the OpenGL implementation or the operating system.
I also did more tests on Linux. I hadn't really looked much into it before - I merely verified that the frame drop problem didn't exist there and that the CPU was, in fact, sleeping most of the time. I realised that, depending on several factors, I can make tearing consistently occur on my test machine, despite vsync. As of yet, I do not know whether that issue is connected to the original problem, or if it is something entirely different.
It seems like the better approach would be some gnarly workarounds and hacks, and ditching vsync altogether and implementing everything in the application (because apparently in 2017 we can't get the most basic frame rendering right with OpenGL).
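For example, one candidate for such an application-layer approach (which I have not tested as part of the experiments above) would be to block on the DWM compositor with DwmFlush(), which returns after the next composition pass. A rough sketch, assuming DWM composition is enabled and the driver's swap interval is set to 0 so the two mechanisms don't fight each other:

#include <windows.h>
#include <dwmapi.h>
#pragma comment(lib, "dwmapi.lib")

// Call once per frame after presenting.
void wait_for_next_composition()
{
    DwmFlush();  // blocks until the DWM has composed the next frame
}

// in the render loop (sketch):
//   render();
//   SwapBuffers(hdc);   // or window.display() / SDL_GL_SwapWindow(...)
//   wait_for_next_composition();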
UPDATE 20170816
I have tried to "reverse-engineer" a bunch of open source 3D engines (got hung up on obbg (https://github.com/nothings/obbg) in particular).
First, I checked that the problem does not occur there. The frame rate is butter smooth. Then, I added my good old flashing purple/black with the colored rects and saw that the stuttering was indeed minimal.
I started ripping out the guts of the program until I ended up with a simple program like mine. I found that there is some code in obbg's rendering loop that, when removed, causes heavy stutter (namely, rendering the main part of the obbg ingame world). Also, there is some code in the initialization that also causes stutter when removed (namely, enabling multisampling). After a few hours of fiddling around it seems that OpenGL needs a certain amount of workload to function properly, but I have yet to find out what exactly needs to be done. Maybe rendering a million random triangles or something will do.
I also realised that all my existing tests behave slightly differently today. It seems that I have overall fewer, but more randomly distributed, frame drops today than the days before.
I also created a better demo project that uses OpenGL more directly, and since obbg used SDL, I also switched to that (although I briefly looked over the library implementations and it would surprise me if there was a difference, but then again this entire ordeal is a surprise anyway). I wanted to approach the "working" state from both the obbg-based side and the blank-project side so I can be really sure what the problem is. I just put all the required SDL binaries inside the project; as long as you have Visual Studio 2017 there should be no additional dependencies and it should build right away. There are many #ifs that control what is being tested.
https://github.com/bplu4t2f/sdl_test
During the creation of that thing I also took another look how SDL's D3D implementation behaves. I had tested this previously, but perhaps not quite extensively enough. There were still no duplicate frames and no frame drops at all, which is good, but in this test program I implemented a more accurate clock.
To my surprise I realised that, when using D3D instead of OpenGL, many (but not the majority) loop iterations take somewhere between 17.0 and 17.2 ms (I would not have caught that with my previous test programs). This does not happen with OpenGL. The OpenGL rendering loop is consistently in the range 15.0 .. 17.0. If it is true that sometimes there needs to be a slightly longer waiting period for the vertical blank (for whatever reason), then OpenGL seems to miss that. That might be the root cause of the entire thing?
Yet another day of literally staring at a flickering computer screen. I have to say I really did not expect to spend that amount of time on rendering nothing but a flickering background and I'm not particularly fond of that.

OpenGL / SDL2 : stencil buffer bits always 0 on PC

I'm writing an app using SDL2 / OpenGL, and doing some stencil operations.
Everything works as expected on Mac, however on PC the stenciling doesn't work.
Upon closer inspection I realized that the following code provides different outcomes on my Mac and PC:
SDL_Init(SDL_INIT_VIDEO);
SDL_GL_SetAttribute( SDL_GL_STENCIL_SIZE, 8 );
SDL_CreateWindow( ... );
SDL_CreateRenderer( ... )
... do stuff ...
When I print out the stencil bits ( SDL_GL_STENCIL_SIZE ) on Mac I get 8. When I do the same on PC, I get 0.
The same happens whether I run it on an actual PC, or on a PC emulator on the Mac.
What am I missing? How can I force SDL2 to request a context with a stencil buffer?
It looks to me like the Mac's OpenGL implementation has different defaults than the PC one, so I'm probably forgetting to do something to specifically request a stencil buffer, but I can't find any good information online ...
Help ^_^' ?
Never mind, I found the answer:
On the PC, SDL2 was defaulting to Direct3D (which I guess would explain why my OpenGL stencil buffer was not there ^_^').
To force SDL2 to use a specific driver, you can use the second parameter of SDL_CreateRenderer (the index of the render driver).
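For example (a sketch with error handling omitted; window is the SDL_Window created above), you can either set the render driver hint or look up the index of the "opengl" driver yourself:

// #include <SDL.h>
// Option 1: ask SDL to prefer the OpenGL driver (set this before SDL_CreateRenderer)
SDL_SetHint(SDL_HINT_RENDER_DRIVER, "opengl");

// Option 2: find the "opengl" driver's index and pass it as the second parameter
int openglIndex = -1;
for (int i = 0; i < SDL_GetNumRenderDrivers(); ++i)
{
    SDL_RendererInfo info;
    if (SDL_GetRenderDriverInfo(i, &info) == 0 && SDL_strcmp(info.name, "opengl") == 0)
        openglIndex = i;
}
SDL_Renderer* renderer = SDL_CreateRenderer(window, openglIndex, SDL_RENDERER_ACCELERATED);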
Problem solved :D
StackOverflow, the biggest rubber duck available... ^-^'

Why do DirectX fullscreen applications give black screenshots?

You may know that trying to capture DirectX fullscreen applications the GDI way (using BitBlt()) gives a black screenshot.
My question is rather simple but I couldn't find any answer: why? I mean technically, why does it give a black screenshot?
I'm reading a DirectX tutorial here: http://www.directxtutorial.com/Lesson.aspx?lessonid=9-4-1. It's written:
[...] the function BeginScene() [...] does something called locking, where the buffer in the video RAM is 'locked', granting you exclusive access to this memory.
Is this the reason? VRAM is locked so GDI can't access it and it gives a black screenshot?
Or is there another reason? Like DirectX directly "talks" to the graphic card and GDI doesn't get it?
Thank you.
The reason is simple: performance.
The idea is to render a scene as much as possible on the GPU, out of lock-step with the CPU. You use the CPU to send the rendering buffers to the GPU (vertices, indices, shaders, etc.), which is overall really cheap because they're small, then you do whatever you want: physics, multiplayer sync, etc. The GPU can just crunch the data and render it on its own.
If you require the scene to be drawn on the window, you have to interrupt the GPU, ask for the rendering buffer bytes (LockRect), ask for the graphics object for the window (more interference with the GPU), render it, and free every lock. You just lost any sort of gain you had by rendering on the GPU out of sync with the CPU. Even worse when you think of all the different CPU cores just sitting idle because you're busy "rendering" (more like waiting on buffer transfers).
So what graphics drivers do is paint the rendering area with a magic color and tell the GPU the position of the scene; the GPU takes care of overlaying the scene over the displayed screen based on the magic color pixels (sort of a multi-pass pixel shader that takes from the second texture when the first texture has a certain color at x,y, but not that slow). You get completely out-of-sync rendering, but when you ask the OS for its video memory, you get the magic color where the scene is, because that's what it actually uses.
Reference: http://en.wikipedia.org/wiki/Hardware_overlay
I believe it is actually due to double buffering. I'm not 100% sure, but that was actually the case when I tested screenshots in OpenGL. I would notice that the DC on my window was not the same; it was using two different DCs for this one game. For other games I wasn't sure what it was doing. The DC was the same, but SwapBuffers was called so many times that I don't think GDI was even fast enough to screenshot it. Sometimes I would get half a screenshot and half black.
However, when I hooked into the client, I was able to just ask for the pixels like normal. No GDI or anything. I think there is a reason why we don't use GDI when drawing in games that use DirectX or OpenGL.
You can always look at ways to capture the screen here: http://www.codeproject.com/Articles/5051/Various-methods-for-capturing-the-screen
Anyway, I use the following for grabbing data from DirectX:
HRESULT DXI_Capture(IDirect3DDevice9* Device, const char* FilePath)
{
    // Grab the back buffer and write it out as a PNG
    IDirect3DSurface9* RenderTarget = nullptr;
    HRESULT result = Device->GetBackBuffer(0, 0, D3DBACKBUFFER_TYPE_MONO, &RenderTarget);
    if (SUCCEEDED(result))
    {
        result = D3DXSaveSurfaceToFile(FilePath, D3DXIFF_PNG, RenderTarget, nullptr, nullptr);
        SafeRelease(RenderTarget);
    }
    return result;
}
Then in my hooked EndScene I call it like so:
HRESULT Direct3DDevice9Proxy::EndScene()
{
    DXI_Capture(ptr_Direct3DDevice9, "C:/Users/School/Desktop/Screenshot.png");
    return ptr_Direct3DDevice9->EndScene();
}
You can either use Microsoft Detours for hooking the EndScene of some external application, or you can use a wrapper .dll.

Render a unique video stream in two separate opengl windows

I render this video stream in one OpenGL window, created from the main window (UnitMainForm.cpp; I am using Borland C++ Builder 6.0).
In this first OpenGL window there is a timer; on each tick a boolean "lmutex" is toggled and a "DrawScene" function is called, followed by a "Yield" function.
In this "DrawScene" function, the video stream frames are drawn by a function called "paintgl".
How can I render this video stream in another Borland Builder window, preferably using pixel buffers?
This second Borland Builder window is intended to be a preview window, so it can be smaller (mipmap?) and use a slower timer (or the same size and the same timer; that is fine too).
Here are the results I had with different techniques:
With pixel buffers: I managed (all in the DrawScene function) to render paintgl into a back buffer and, with wglShareLists, to render that buffer to a texture mapped onto a quad; but I can't manage to use this texture in another window. wglShareLists works in the first window but fails in the second window when I try to share the back buffer's objects with the new window's RC (pixel format problem? or perhaps a C++ problem: how do I keep the buffer from being released, and render it onto a quad in a different DC (or the same RC)?). The errors I ran into:
Access violation on wglBindTexImageARB; due to WGL_FRONT_LEFT_ARB not being defined although wglext.h is included?
wglShareLists fails with error 6: ERROR_INVALID_HANDLE (the handle is invalid)
With two instances of the same class (the OpenGL window): one time in three I see both video streams rendered correctly; one time in three there is constant flicker on one or both windows; and one time in three one or the other window stays blank or black. Perhaps I should synchronize the timers, or is there a way to avoid the flicker? This solution seems sketchy to me anyway: the video stream sometimes slows down in one of the two windows, and calling the video stream capture twice seems heavy.
I tried to use FBOs, with GLEW or with wgl functions, but I got stuck on access violations on glGenFramebuffers; perhaps Borland 6 (2002) is too old to support FBOs (~2004?). I updated the drivers of my really recent NVIDIA card (9800 GT) and downloaded the NVIDIA OpenGL SDK (which is just an exe file: strange):
Using Frame Buffer Objects (FBO) in Borland C++ Builder 6
Is there a C++ example program, or pieces of code, which would clarify how I can display in a second window the video that I already display perfectly in one window?
First of all, the left and right draw buffers are not meant to be used for rendering to two different render contexts, but to allow for stereoscopic rendering in one render context, signalled to some 3D hardware (e.g. shutter glasses) by the driver. Apart from that, your graphics hardware/driver does not support that extension, whether or not the identifiers are defined in glew.
What you want to do is render your video frames into a texture and share that texture between the two render contexts. A texture can be used both as a render target (attached to a framebuffer/renderbuffer) and as a render source when texturing a quad.
There are numerous render-to-texture and context-sharing examples out there, most coded in C though. If you can read German, you may however want to check DelphiGL.com; the people there have very good OpenGL knowledge and quite a useful Wiki with docs, examples and tutorials.
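To give a rough idea of the sharing part, a wgl sketch (not Builder-specific; hDC1/hDC2 are the device contexts of the two windows, both assumed to already have compatible pixel formats set, and error handling is omitted):

// #include <windows.h> and <GL/gl.h>
// Create one context per window and share objects between them BEFORE any
// textures are created; incompatible pixel formats are a common reason for
// wglShareLists to fail (e.g. ERROR_INVALID_HANDLE).
HGLRC rcMain    = wglCreateContext(hDC1);
HGLRC rcPreview = wglCreateContext(hDC2);
wglShareLists(rcMain, rcPreview);

GLuint videoTex = 0;
wglMakeCurrent(hDC1, rcMain);
glGenTextures(1, &videoTex);              // receives the video frames

// per frame, in the main window:
wglMakeCurrent(hDC1, rcMain);
// upload the new frame with glTexSubImage2D(...), then draw the textured quad

// per frame (or at a slower rate), in the preview window:
wglMakeCurrent(hDC2, rcPreview);
glBindTexture(GL_TEXTURE_2D, videoTex);   // the same object, visible here via wglShareLists
// draw a (smaller) textured quad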