I am building a Windows MFC application. During some animations where objects collide at high speeds, my physics engine behaves unpredictably. I believe it has something to do with me dropping frames somehow. I was told that I'm not using double buffering. I thought I was, but I am still fairly new to this. Here is how I draw to the screen in OnPaint:
#include "pch.h"
#include "framework.h"
#include "ChildView.h"
#include "DoubleBufferDC.h"

void CChildView::OnPaint()
{
    CPaintDC paintDC(this);        // device context for painting
    CDoubleBufferDC dc(&paintDC);  // double-buffered device context built on paintDC
    Graphics graphics(dc.m_hDC);   // Create GDI+ graphics context

    if (mFirstDraw)
    {
        mFirstDraw = false;
        SetTimer(1, FrameDuration, nullptr);

        // Initialize the elapsed time system
        LARGE_INTEGER time, freq;
        QueryPerformanceCounter(&time);
        QueryPerformanceFrequency(&freq);
        mLastTime = time.QuadPart;
        mTimeFreq = double(freq.QuadPart);
    }

    // Compute the elapsed time since the last draw
    LARGE_INTEGER time;
    QueryPerformanceCounter(&time);
    long long diff = time.QuadPart - mLastTime;
    double elapsed = double(diff) / mTimeFreq;
    mLastTime = time.QuadPart;
    // ...
}

void CChildView::OnTimer(UINT_PTR nIDEvent)
{
    // ...
}
When I create the Graphics object from the CDoubleBufferDC object, is this not creating a back buffer? I then pass this Graphics object to OnDraw where it is drawn on. If it is creating a back buffer, I'm confused about where the front buffer is created and when it is drawn on the screen.
Here are my current thoughts on how I think this works:
The CPaintDC object is the front buffer
The CDoubleBufferDC object is the back buffer
A graphics object is created from the CDoubleBufferDC object which I draw the current state of the game on
If this is the case, when is the front buffer ever replaced with the new buffer created in the back? Can someone help me understand, and use double buffering if I'm not already?

To answer your actual question: what is probably happening is that your CDoubleBufferDC class has a destructor that swaps out the DCs. This is a common idiom (IIRC, in 'modern' MFC versions CMemDC does that too). So yes, I think you are using double buffering here, even if accidentally. You can do the drawing in GDI(+) for simple games; if this whole thing is a learning exercise, this way is much easier to understand than using DirectX.
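To make the "destructor swaps the buffers" idiom concrete, here is a minimal, platform-independent sketch. I have not seen the source of CDoubleBufferDC, so this is only an assumption about how such a class typically behaves: BackBuffer, Pixels and SetPixel are hypothetical names, and a plain vector stands in for the screen.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

using Pixels = std::vector<int>;  // stand-in for the visible surface ("front buffer")

// Hypothetical analogue of a double-buffering DC wrapper:
// all drawing goes to an off-screen copy, and the destructor
// transfers the finished frame to the front buffer in one step.
class BackBuffer {
public:
    explicit BackBuffer(Pixels& front) : mFront(front), mBack(front.size(), 0) {}
    ~BackBuffer() { mFront = mBack; }  // "present" when the object goes out of scope

    BackBuffer(const BackBuffer&) = delete;
    BackBuffer& operator=(const BackBuffer&) = delete;

    void SetPixel(std::size_t index, int color) { mBack[index] = color; }

private:
    Pixels& mFront;  // what the user currently sees
    Pixels mBack;    // what is being drawn for the next frame
};
```

In OnPaint terms, paintDC plays the role of the front buffer and the wrapper object plays the role of the back buffer: nothing becomes visible until the wrapper is destroyed at the end of the function.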
However, you do need to decouple your collision detection from your drawing routines, so that any dropped frames don't mess up your timing (that is, if you're doing things so complex that they take that much time; in which case, you probably shouldn't be using GDI...). In other words, if your collision detection assumes that each OnDraw() call completes in less than 1/framerate seconds but that doesn't always happen, you'll run into problems at some point. There are many articles online about how to structure your game loop; I don't have enough information to link you to a specific one. Also, it won't be easy to find one using the technology you're using, in 2020... But I do think that using a simple OnPaint()/GDI one is great for learning, as it will hide much of the complexity you need for more modern approaches.
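One common way to do that decoupling is a fixed-timestep accumulator: the physics always advances in steps of the same size, and a slow frame simply runs more steps. The sketch below is an assumption about how that could look, not code from the question; World, FixedStepper and Advance are hypothetical names, and the single moving object stands in for the real game state.

```cpp
#include <cassert>

// Hypothetical stand-in for the game's physics state.
struct World {
    double position = 0.0;
    double velocity = 100.0;  // units per second
    int steps = 0;            // number of physics steps run so far
};

// Advances the physics in fixed-size steps, independent of frame duration.
class FixedStepper {
public:
    explicit FixedStepper(double stepSeconds) : mStep(stepSeconds) {}

    // Call once per drawn frame with the measured elapsed time.
    // Returns how many physics steps were run for this frame.
    int Advance(World& world, double elapsedSeconds) {
        mAccumulator += elapsedSeconds;
        int ran = 0;
        while (mAccumulator >= mStep) {
            world.position += world.velocity * mStep;  // one fixed-size step
            ++world.steps;
            mAccumulator -= mStep;
            ++ran;
        }
        return ran;  // leftover time stays in the accumulator for the next frame
    }

private:
    double mStep;
    double mAccumulator = 0.0;
};
```

In the OnPaint from the question, the `elapsed` value would be passed to Advance before drawing. A frame that takes twice as long then runs two physics steps instead of one oversized step, so fast objects are less likely to tunnel through each other.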


OpenGL render loop

I have an application which renders a 3d object using OpenGL, allowing the user to rotate and zoom and inspect the object. Currently, this is driven directly by received mouse messages (it's a Windows MFC MDI application). When a mouse movement is received, the viewing matrix is updated, and the scene re-rendered into the back buffer, and then SwapBuffers is called. For a spinning view, I start a 20ms timer and render the scene on the timer, with small updates to the viewing matrix each frame. This is OK, but is not perfectly smooth. It sometimes pauses or skips frames, and is not linked to vsync. I would love to make it smoother and smarter with the rendering.
It's not like a game where it needs to be rendered every frame though. There are long periods where the object is not moved, and does not need to be re-rendered.
I have come across GLFW library and the glfwSwapInterval function. Is this a commonly used solution?
Should I create a separate thread for the render loop, rather than being message/timer driven?
Are there other solutions I should investigate?
Are there any good references for how to structure a suitable render loop? I'm OK with all the rendering code - just looking for a better structure around the rendering code.
I'll assume you are using GLFW to create and operate your window.
If you don't have to update your window on each frame, I suggest using glfwWaitEvents() or glfwWaitEventsTimeout(). The first one puts the calling thread (not just the window) to sleep until any event happens (mouse press, resize event, etc.). The second one is similar, but you can specify a timeout for the sleep: the function waits until an event happens OR until the specified time runs out.
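The wait-for-events approach pairs naturally with a "needs redraw" flag, so that the scene is only rendered after something actually changed. Here is a small sketch of that structure with a simulated event queue instead of a real window; Event, Viewer and Pump are hypothetical names, and in a GLFW program the draining step would be replaced by glfwWaitEvents() plus your callbacks.

```cpp
#include <cassert>
#include <queue>

enum class Event { MouseMove, Resize, Quit };

struct Viewer {
    bool needsRedraw = true;  // draw once at startup
    bool running = true;
    int framesRendered = 0;

    // Input handlers only mark the view as dirty; they never render directly.
    void Handle(Event e) {
        switch (e) {
            case Event::MouseMove:
            case Event::Resize:
                needsRedraw = true;
                break;
            case Event::Quit:
                running = false;
                break;
        }
    }

    void RenderIfNeeded() {
        if (!needsRedraw) return;
        ++framesRendered;  // a real viewer would draw and swap buffers here
        needsRedraw = false;
    }
};

// Drains all pending events, then renders at most once.
void Pump(Viewer& viewer, std::queue<Event>& pending) {
    while (!pending.empty() && viewer.running) {
        viewer.Handle(pending.front());
        pending.pop();
    }
    if (viewer.running) viewer.RenderIfNeeded();
}
```

A burst of mouse messages then costs one render instead of one render per message, and an idle view costs nothing.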
As for glfwSwapInterval(), this is probably not the solution you are looking for. This function sets the number of screen refreshes the video card waits for before swapping buffers when glfwSwapBuffers() is called.
If you, for example, use glfwSwapInterval(1) (assuming you have a valid OpenGL context), this will sync your context to the refresh rate of your monitor (aka v-sync, but I'm not sure if it is valid to call it so).
If you use glfwSwapInterval(0), this will basically turn off synchronization with the monitor, and the video card will swap buffers with glfwSwapBuffers() instantly, without waiting.
If you use glfwSwapInterval(2), this will double the time that glfwSwapBuffers() waits after (or before?) flushing the framebuffer to the screen. So, if you have, for instance, a 60 Hz display, using glfwSwapInterval(2) will result in 30 fps in your program (assuming you use glfwSwapBuffers() to flush the framebuffer).
glfwSwapInterval(3) will give you 20 fps, glfwSwapInterval(4) 15 fps, and so on.
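The arithmetic behind those numbers is just the refresh rate divided by the swap interval. A tiny helper makes that explicit; MaxFrameRate is a hypothetical name, and the result is only an upper bound, since a slow renderer can always fall below it.

```cpp
#include <cassert>
#include <cmath>
#include <limits>

// Upper bound on the frame rate for a given monitor refresh rate and swap interval.
// An interval of 0 means "do not wait for the monitor", so there is no bound.
double MaxFrameRate(double refreshHz, int swapInterval) {
    if (swapInterval <= 0) {
        return std::numeric_limits<double>::infinity();
    }
    return refreshHz / swapInterval;
}
```

For a 60 Hz display this gives 60, 30, 20 and 15 fps for intervals 1 through 4, matching the values listed above.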
As for separate render thread, this is good if you want to divide your "thinking" and rendering processes, but it comes with its own advantages, disadvantages and difficulties. Tip: some window events can't be handled "properly" without having separate thread (See this question).
The usual render loop looks like this (as far as I've learned from learnopengl lessons):
// Setup process before...
while (!window_has_to_close) // <-- Run game loop until window is marked "has to
                             //     close". In GLFW this is done using glfwWindowShouldClose()
{
    // Prepare for handling input events (e. g. callbacks in GLFW)
    // Handle events (if there are none, this is just skipped)
    glfwPollEvents(); // <-- You can also use glfwWaitEvents()
    // "Thinking step" of your program
    // Clear window framebuffer (better also put this in separate func)
    glClearColor(0.f, 0.f, 0.f, 1.f);
    glClear(GL_COLOR_BUFFER_BIT);
    // Render everything
    // Swap buffers (you can also put this in separate function)
    glfwSwapBuffers(window); // <-- Flush framebuffer to screen
}
// Exiting operations after...
See this ("Ready your engines" part) for additional info. Wish you luck!

How to capture windows screen at 60 frames per second using api? [duplicate]

I want to write a screencasting program for the Windows platform, but am unsure of how to capture the screen. The only method I'm aware of is to use GDI, but I'm curious whether there are other ways to go about this, and, if there are, which incurs the least overhead? Speed is a priority.
The screencasting program will be for recording game footage, although, if this does narrow down the options, I'm still open for any other suggestions that fall out of this scope. Knowledge isn't bad, after all.
Edit: I came across this article: Various methods for capturing the screen. It has introduced me to the Windows Media API way of doing it and the DirectX way of doing it. It mentions in the Conclusion that disabling hardware acceleration could drastically improve the performance of the capture application. I'm curious as to why this is. Could anyone fill in the missing blanks for me?
Edit: I read that screencasting programs such as Camtasia use their own capture driver. Could someone give me an in-depth explanation on how it works, and why it is faster? I may also need guidance on implementing something like that, but I'm sure there is existing documentation anyway.
Also, I now know how FRAPS records the screen. It hooks the underlying graphics API to read from the back buffer. From what I understand, this is faster than reading from the front buffer, because you are reading from system RAM, rather than video RAM. You can read the article here.
This is what I use to collect single frames, but if you modify this and keep the two targets open all the time then you could "stream" it to disk using a static counter for the file name. - I can't recall where I found this, but it has been modified, thanks to whoever!
void dump_buffer()
{
    IDirect3DSurface9* pRenderTarget = NULL;
    IDirect3DSurface9* pDestTarget = NULL;
    const char file[] = "Pickture.bmp";
    // sanity checks.
    if (Device == NULL)
        return;
    // get the render target surface.
    HRESULT hr = Device->GetRenderTarget(0, &pRenderTarget);
    // get the current adapter display mode.
    //hr = pDirect3D->GetAdapterDisplayMode(D3DADAPTER_DEFAULT,&d3ddisplaymode);
    // create a destination surface in system memory.
    hr = Device->CreateOffscreenPlainSurface(DisplayMde.Width,
                                             DisplayMde.Height,
                                             DisplayMde.Format,
                                             D3DPOOL_SYSTEMMEM,
                                             &pDestTarget,
                                             NULL);
    //copy the render target to the destination surface.
    hr = Device->GetRenderTargetData(pRenderTarget, pDestTarget);
    //save its contents to a bitmap file.
    hr = D3DXSaveSurfaceToFile(file,
                               D3DXIFF_BMP,
                               pDestTarget,
                               NULL,
                               NULL);
    // clean up.
    pRenderTarget->Release();
    pDestTarget->Release();
}
EDIT: I can see that this is listed under your first edit link as "the GDI way". This is still a decent way to go even with the performance advisory on that site, you can get to 30fps easily I would think.
From this comment (I have no experience doing this, I'm just referencing someone who does):
HDC hdc = GetDC(NULL); // get the desktop device context
HDC hDest = CreateCompatibleDC(hdc); // create a device context to use yourself
// get the height and width of the screen
int height = GetSystemMetrics(SM_CYVIRTUALSCREEN);
int width = GetSystemMetrics(SM_CXVIRTUALSCREEN);
// create a bitmap
HBITMAP hbDesktop = CreateCompatibleBitmap( hdc, width, height);
// use the previously created device context with the bitmap
SelectObject(hDest, hbDesktop);
// copy from the desktop device context to the bitmap device context
// call this once per 'frame'
BitBlt(hDest, 0,0, width, height, hdc, 0, 0, SRCCOPY);
// after the recording is done, release the desktop context you got..
ReleaseDC(NULL, hdc);
// ..delete the bitmap you were using to capture frames..
DeleteObject(hbDesktop);
// ..and delete the context you created
DeleteDC(hDest);
I'm not saying this is the fastest, but the BitBlt operation is generally very fast if you're copying between compatible device contexts.
For reference, Open Broadcaster Software implements something like this as part of their "dc_capture" method, although rather than creating the destination context hDest using CreateCompatibleDC they use an IDXGISurface1, which works with DirectX 10+. If there is no support for this they fall back to CreateCompatibleDC.
To change it to use a specific application, you need to change the first line to GetDC(game) where game is the handle of the game's window, and then set the right height and width of the game's window too.
Once you have the pixels in hDest/hbDesktop, you still need to save it to a file, but if you're doing screen capture then I would think you would want to buffer a certain number of them in memory and save to the video file in chunks, so I will not point to code for saving a static image to disk.
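As a rough illustration of the "buffer some frames in memory, then write them out in chunks" idea, here is a platform-independent sketch. FrameBatcher, Push and TakeChunk are hypothetical names, and the raw byte vector stands in for whatever pixel data you copy out of hbDesktop; how the chunk is then encoded or written is left open.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <utility>
#include <vector>

using Frame = std::vector<std::uint8_t>;  // raw pixel bytes of one captured frame

// Collects captured frames in memory and hands them out in fixed-size chunks.
class FrameBatcher {
public:
    explicit FrameBatcher(std::size_t framesPerChunk) : mChunkSize(framesPerChunk) {}

    // Stores one frame; returns true once a full chunk is ready to be written out.
    bool Push(Frame frame) {
        mPending.push_back(std::move(frame));
        return mPending.size() >= mChunkSize;
    }

    // Hands the buffered frames to the caller (e.g. an encoder) and starts a new chunk.
    std::vector<Frame> TakeChunk() {
        std::vector<Frame> chunk;
        chunk.swap(mPending);
        return chunk;
    }

private:
    std::size_t mChunkSize;
    std::vector<Frame> mPending;
};
```

The capture loop would call Push after every BitBlt and only touch the disk when it returns true. Keep in mind that uncompressed frames are large, so the chunk size has to be chosen with available memory in mind.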
I wrote a video capture software, similar to FRAPS for DirectX applications. The source code is available and my article explains the general technique. Look at
With respect to your questions about performance:
DirectX should be faster than GDI, except when you are reading from the front buffer, which is very slow. My approach is similar to FRAPS (reading from the back buffer). I intercept a set of methods from Direct3D interfaces.
For video recording in realtime (with minimal application impact), a fast codec is essential. FRAPS uses its own lossless video codec. Lagarith and HUFFYUV are generic lossless video codecs designed for realtime applications. You should look at them if you want to output video files.
Another approach to recording screencasts could be to write a Mirror Driver. According to Wikipedia: When video mirroring is active, each time the system draws to the primary video device at a location inside the mirrored area, a copy of the draw operation is executed on the mirrored video device in real-time. See mirror drivers at MSDN:
I use d3d9 to get the backbuffer, and save that to a png file using the d3dx library:
IDirect3DSurface9 *surface ;
// GetBackBuffer
idirect3ddevice9->GetBackBuffer(0, 0, D3DBACKBUFFER_TYPE_MONO, &surface ) ;
// save the surface
D3DXSaveSurfaceToFileA( "filename.png", D3DXIFF_PNG, surface, NULL, NULL ) ;
SAFE_RELEASE( surface ) ;
To do this you should create your swap chain with
d3dpps.SwapEffect = D3DSWAPEFFECT_COPY ; // for screenshots.
(So you guarantee the backbuffer isn't mangled before you take the screenshot).
In my impression, the GDI approach and the DX approach are different in nature.
Painting with GDI uses the flush method: it draws a frame, then clears it and redraws the next frame in the same buffer, which results in flickering in games that require a high frame rate.
Why is DX quicker?
In DX (or the graphics world in general), a more mature method called double-buffered rendering is used. Two buffers are present: while the front buffer is being presented to the hardware, you can render into the other buffer. Once that frame has finished rendering, the system swaps to the other buffer (locking it for presentation to the hardware and releasing the previous buffer). In this way rendering efficiency is greatly improved.
Why is it quicker with hardware acceleration turned down?
Although double-buffered rendering improves the FPS, the time available for rendering is still limited. Modern graphics hardware usually applies a lot of extra processing during rendering, typically things like anti-aliasing, which is very computation intensive. If you don't require graphics of that quality, you can of course disable this option, and that will save you some time.
I think what you really need is a replay system, and I totally agree with what people have discussed on that point.
I wrote a class that implemented the GDI method for screen capture. I too wanted extra speed so, after discovering the DirectX method (via GetFrontBuffer) I tried that, expecting it to be faster.
I was dismayed to find that GDI performs about 2.5x faster. After 100 trials capturing my dual monitor display, the GDI implementation averaged 0.65s per screen capture, while the DirectX method averaged 1.72s. So GDI is definitely faster than GetFrontBuffer, according to my tests.
I was unable to get Brandrew's code working to test DirectX via GetRenderTargetData. The screen copy came out purely black. However, it could copy that blank screen super fast! I'll keep tinkering with that and hope to get a working version to see real results from it.
For C++ you can use:
This may, however, not work on all types of 3D applications/video apps. In that case this link may be more useful, as it describes 3 different methods you can use.
Old answer (C#):
You can use System.Drawing.Graphics.CopyFromScreen, but it is not very fast.
A sample project I wrote doing exactly this:
I'm planning to update this sample using a faster method like Direct3D:
And here is a link for capturing to video: How to capture screen to be video using C# .Net?
You want the Desktop Duplication API (available since Windows 8). That is the officially recommended way of doing it, and it's also the most CPU efficient.
One nice feature it has for screencasting is that it detects window movement, so you can transmit block deltas when windows get moved around, instead of raw pixels. Also, it tells you which rectangles have changed, from one frame to the next.
The Microsoft example code is quite complex, but the API is actually simple and easy to use. I've put together an example project that is much simpler:
Simplified Sample Code
Microsoft References
Desktop Duplication API
Official example code (my example above is a stripped down version of this)
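To show why the changed-rectangle information is useful, here is a small sketch that updates a stored copy of the previous frame by copying only the dirty regions. It does not call the Desktop Duplication API itself (as far as I know the rectangles come from IDXGIOutputDuplication::GetFrameDirtyRects); Rect and CopyDirtyRects are hypothetical names, and frames are plain arrays of 32-bit pixels.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Same convention as a Win32 RECT: right and bottom are exclusive.
struct Rect { int left; int top; int right; int bottom; };

// Copies only the changed regions of `src` into `dst`.
// Both frames hold `width` pixels per row and have the same size.
void CopyDirtyRects(const std::vector<std::uint32_t>& src,
                    std::vector<std::uint32_t>& dst,
                    int width,
                    const std::vector<Rect>& dirty) {
    assert(src.size() == dst.size());
    for (const Rect& r : dirty) {
        for (int y = r.top; y < r.bottom; ++y) {
            const std::ptrdiff_t row = static_cast<std::ptrdiff_t>(y) * width;
            std::copy(src.begin() + row + r.left,
                      src.begin() + row + r.right,
                      dst.begin() + row + r.left);
        }
    }
}
```

For screencasting, the same loop shape applies to what you transmit or encode: only the pixels inside the dirty rectangles need to be touched for each frame.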
A few things I've been able to glean: apparently using a "mirror driver" is fast though I'm not aware of an OSS one.
Why is RDP so fast compared to other remote control software?
Also apparently using some convolutions of StretchRect are faster than BitBlt
And the one you mentioned (fraps hooking into the D3D dll's) is probably the only way for D3D applications, but won't work with Windows XP desktop capture. So now I just wish there were a fraps equivalent speed-wise for normal desktop windows...anybody?
(I think with aero you might be able to use fraps-like hooks, but XP users would be out of luck).
Also apparently changing screen bit depths and/or disabling hardware accel. might help (and/or disabling aero). includes a reasonably fast BitBlt based capture utility, and a benchmarker as part of its install, which can let you benchmark BitBlt speeds to optimize them.
VirtualDub also has an "opengl" screen capture module that is said to be fast and do things like change detection
You can try the c++ open source project WinRobot #git, a powerful screen capturer
CComPtr<IWinRobotService> pService;
hr = pService.CoCreateInstance(__uuidof(ServiceHost) );
//get active console session
CComPtr<IUnknown> pUnk;
hr = pService->GetActiveConsoleSession(&pUnk);
CComQIPtr<IWinRobotSession> pSession = pUnk;
// capture screen
pUnk = 0;
hr = pSession->CreateScreenCapture(0,0,1280,800,&pUnk);
// get screen image data(with file mapping)
CComQIPtr<IScreenBufferStream> pBuffer = pUnk;
Support :
UAC Window
Screen Recording can be done in C# using VLC API. I have done a sample program to demonstrate this. It uses LibVLCSharp and VideoLAN.LibVLC.Windows libraries. You could achieve many more features related to video rendering using this cross platform API.
For API documentation see: LibVLCSharp API Github
using System;
using System.IO;
using System.Reflection;
using System.Threading;
using LibVLCSharp.Shared;
namespace ScreenRecorderNetApp
{
    class Program
    {
        static void Main(string[] args)
        {
            using (var libVlc = new LibVLC())
            using (var mediaPlayer = new MediaPlayer(libVlc))
            {
                var media = new Media(libVlc, "screen://", FromType.FromLocation);
                // ...
            }
        }
    }
}
This might not be the fastest method, but it is lightweight and easy to use. The image is returned as an integer array containing the RGB colors.
#include <Windows.h>
int* screenshot(int& width, int& height) {
    HDC hdc = GetDC(NULL); // get the desktop device context
    HDC cdc = CreateCompatibleDC(hdc); // create a device context to use yourself
    height = (int)GetSystemMetrics(SM_CYVIRTUALSCREEN); // get the width and height of the screen
    width = 16*height/9; // only capture left monitor for dual screen setups, for both screens use (int)GetSystemMetrics(SM_CXVIRTUALSCREEN);
    HBITMAP hbitmap = CreateCompatibleBitmap(hdc, width, height); // create a bitmap
    SelectObject(cdc, hbitmap); // use the previously created device context with the bitmap
    BITMAPINFOHEADER bmi = { 0 };
    bmi.biSize = sizeof(BITMAPINFOHEADER);
    bmi.biPlanes = 1;
    bmi.biBitCount = 32;
    bmi.biWidth = width;
    bmi.biHeight = -height; // flip image upright
    bmi.biCompression = BI_RGB;
    bmi.biSizeImage = 4*width*height; // 32 bits per pixel = 4 bytes per pixel
    BitBlt(cdc, 0, 0, width, height, hdc, 0, 0, SRCCOPY); // copy from desktop device context to bitmap device context
    ReleaseDC(NULL, hdc);
    int* image = new int[width*height];
    GetDIBits(cdc, hbitmap, 0, height, image, (BITMAPINFO*)&bmi, DIB_RGB_COLORS);
    DeleteObject(hbitmap); // free the GDI objects created above
    DeleteDC(cdc);
    return image;
}
The above code combines this answer and this answer.
Example on how to use it:
int main() {
    int width=0, height=0;
    int* image = screenshot(width, height);
    // access pixel colors for position (x|y)
    const int x=0, y=0;
    const int color = image[x+y*width];
    const int red = (color>>16)&255;
    const int green = (color>> 8)&255;
    const int blue = color &255;
    delete[] image;
}
I do it with DirectX myself and think it's as fast as you would want it to be. I don't have a quick code sample, but I found this, which should be useful. The DirectX 11 version should not differ a lot, DirectX 9 maybe a little more, but that's the way to go.
DXGI Desktop Capture
Project that captures the desktop image with DXGI duplication. Saves the captured image to the file in different image formats (*.bmp; *.jpg; *.tif).
This sample is written in C++. You also need some experience with DirectX (D3D11, D2D1).
What the Application Can Do
If you have more than one desktop monitor, you can choose.
Resize the captured desktop image.
Choose different scaling modes.
You can show or hide the mouse icon in the output image.
You can rotate the image for the output picture, or leave it as default.
I realize the following suggestion doesn't answer your question, but the simplest method I have found to capture a rapidly-changing DirectX view, is to plug a video camera into the S-video port of the video card, and record the images as a movie. Then transfer the video from the camera back to an MPG, WMV, AVI etc. file on the computer.
Enables apps to capture environments, application windows, and displays in a secure, easy to use way with the use of a system picker UI control.

Qt: vsync - missing rendered frames

For a scientific task, flickering areas with a stable frequency (max. 60 Hz) need to be displayed on the screen. I tried to achieve a stable stimulus visualization using Qt 5.6.
Following this blog entry and many other online recommendations, I implemented three different approaches: inheriting from the QWindow class, the QOpenGLWindow class, and the QRasterWindow class. I wanted to take advantage of vsync and avoid using QTimer.
The flickering area can be displayed, and the measured time between frames is stable at 16 to 17 ms.
But every few seconds some missed frames are visible, and it is very clear that the stimulus is not displayed stably. The same effect occurs with all three approaches.
Have I implemented my code properly, or do better solutions exist? If the code is adequate for its purpose, do I have to assume that it is a hardware problem? Can it really be that difficult to display a simple flickering area?
Thank you very much for helping me!
As an example, here is my code for the QWindow class:
Window::Window(QWindow *parent)
    : m_context(0)
    , m_paintDevice(0)
    , m_bFlickerState(true){
    QSurfaceFormat format;
    // ...
}
The render() function, which is called by the overridden event functions, is:
void Window::render(){
    // calculate the elapsed time between frames
    m_t1 = QTime::currentTime();
    int curDelta = m_t0.msecsTo(m_t1);
    m_t0 = m_t1;
    qDebug() << curDelta;
    if (!m_paintDevice)
        m_paintDevice = new QOpenGLPaintDevice;
    if (m_paintDevice->size() != size())
        m_paintDevice->setSize(size());
    QPainter p(m_paintDevice);
    // draw using QPainter
    m_bFlickerState = !m_bFlickerState;
    // animate continuously: schedule an update
    QCoreApplication::postEvent(this, new QEvent(QEvent::UpdateRequest));
}
I got help of some experts from the qt-forum. You can follow the whole discussion here. At the end, this was the result:
V-sync is hard ;) Basically it's fighting with the inherent noisiness of the system. If the output shows 16-17 ms then that's the problem. 17 ms is too much. That's the skipping you see.
Couple of things to reduce that noise:
Don't do I/O in the render loop! qDebug() is I/O and it can block on all kinds of buffering shenanigans.
Testing V-sync under a debugger is useless. Debugging introduces all kinds of noise into your app. You should be testing v-sync in Release mode without debugger attached.
Try not to use signals/slots/events if you can help it. They can be noisy; i.e., call update() manually at the end of paintGL. You skip some overhead this way (not much, but every bit counts).
If all you need is a flickering screen avoid QPainter. It's not exactly slow, but drop into the begin() method of it and see how much it actually does. OpenGL has fast, dedicated facilities to fill the buffer with a color. You might as well use it.
Not directly related, but it will make your code cleaner:
Use QElapsedTimer instead of manually calculating time intervals. Why re-invent the wheel?
Applying these bits, I was able to remove the skipping from your example. Note that the skipping will still occur in some circumstances, e.g. when you move/resize the window or when the OS or other apps are busy doing something. You have no control over that.
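Two of the points above (no I/O inside the render loop, and a proper elapsed timer instead of hand-rolled time arithmetic) can be combined by recording frame intervals in memory and inspecting them after the run. The sketch below uses std::chrono::steady_clock as a stand-in for QElapsedTimer so that it compiles without Qt; FrameTimer and its methods are hypothetical names.

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>
#include <vector>

// Records the time between frames in memory so nothing is printed while rendering.
class FrameTimer {
public:
    using Clock = std::chrono::steady_clock;  // monotonic, unaffected by wall-clock changes

    explicit FrameTimer(std::size_t expectedFrames) {
        mIntervalsMs.reserve(expectedFrames);  // avoid reallocations inside the render loop
    }

    // Call once per rendered frame.
    void Tick() { Tick(Clock::now()); }

    // Overload with an explicit timestamp, mainly useful for testing.
    void Tick(Clock::time_point now) {
        if (mHasLast) {
            std::chrono::duration<double, std::milli> delta = now - mLast;
            mIntervalsMs.push_back(delta.count());
        }
        mLast = now;
        mHasLast = true;
    }

    // Number of recorded intervals longer than the threshold (likely skipped frames).
    std::size_t CountLongerThan(double thresholdMs) const {
        std::size_t count = 0;
        for (double ms : mIntervalsMs) {
            if (ms > thresholdMs) ++count;
        }
        return count;
    }

    const std::vector<double>& Intervals() const { return mIntervalsMs; }

private:
    std::vector<double> mIntervalsMs;
    Clock::time_point mLast{};
    bool mHasLast = false;
};
```

Call Tick() at the same point in every frame and print or analyze Intervals() only after the test has finished. For a 60 Hz stimulus, an interval around 33 ms suggests that a frame was skipped.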

Problem with CreateDC and wglMakeCurrent

PIXELFORMATDESCRIPTOR pfd = { /* otherwise fine for a window with 32-bit color */ };
HDC hDC = CreateDC(TEXT("Display"),NULL,NULL,NULL); // always OK
int ipf = ChoosePixelFormat(hDC,&pfd); // always OK
SetPixelFormat(hDC,ipf,&pfd); // always OK
HGLRC hRC = wglCreateContext(hDC); // always OK
wglMakeCurrent(hDC,hRC); // ! read error: 0xbaadf039 (debug, obviously)
But the following works with the same hRC:
wglMakeCurrent(hSomeWindowDC,hRC);
The above is part of an OpenGL 3.0+ initialization system for Windows.
I am trying to avoid creating a dummy window for the sake of aesthetics.
I have never used CreateDC before, so perhaps I've missed something.
edit: hSomeWindowDC would point to a window DC with an appropriate pixel format.
More info:
I wish to create a window-independent OpenGL rendering context.
Due to the answer selected, it seems I need to use a dummy window (not really a big deal, just a handle to pass around all the same).
Why I would want to do this: Since it is possible to use the same rendering context for multiple windows with the same pixel format in the same thread, it is possible to create a rendering context (really, just a container for gl-related objects) that is independent of a particular window. In this way, one can create a clean separation between the graphics and UI initializations.
The purpose of the context initially isn't for rendering (although I believe one could render into textures using it). If one wanted to change the contents of a buffer within a particular context, the desired context object itself would just need to be made current (since it's carrying the dummy window around with it, this is possible). Rendering into a window is simple: As implied by the above, the window's DC only needs to have the same pixel format. Simply make the rendering context and the window's DC current, and render.
Please note that, at the time of this writing, this idea is still in testing. I will update this post should this change (or if I can remember :P ).
I've got a dormant brain cell from reading Petzold 15 years ago that just sprang back to life. The DC from CreateDC() is restricted. Good for getting info about the display device, measurement, that sort of stuff. Not good to use as a regular painting DC. You almost certainly need GetDC().
My current OpenGL 3+ initialization routine doesn't require a dummy window. You can simply attempt to make a second RC and make it current using the DC of the real window. Take a look at the OpenGL wiki Tutorial: OpenGL 3.1 The First Triangle (C++/Win)