Why do DirectX fullscreen applications give black screenshots? - c++

You may know that trying to capture DirectX fullscreen applications the GDI way (using BitBlt()) gives a black screenshot.
My question is rather simple but I couldn't find any answer: why? I mean technically, why does it give a black screenshot?
I'm reading a DirectX tutorial here: http://www.directxtutorial.com/Lesson.aspx?lessonid=9-4-1. It's written:
[...] the function BeginScene() [...] does something called locking, where the buffer in the video RAM is 'locked', granting you exclusive access to this memory.
Is this the reason? VRAM is locked so GDI can't access it and it gives a black screenshot?
Or is there another reason? Like DirectX directly "talks" to the graphic card and GDI doesn't get it?
Thank you.

The reason is simple: performance.
The idea is to render a scene as much as possible on the GPU out of lock-step with the CPU. You use the CPU to send the rendering buffers to the GPU (vertex, indices, shaders etc), which is overall really cheap because they're small, then you do whatever you want, physics, multiplayer sync etc. The GPU can just crunch the data and render it on its own.
If you require the scene to be drawn on the window, you have to interrupt the GPU, ask for the rendering buffer bytes (LockRect), ask for the graphics object for the window (more interference with the GPU), render it and free every lock. You just lost any sort of gain you had by rendering on the GPU out of sync with the CPU. Even worse when you think of all the different CPU cores just sitting idle because you're busy "rendering" (more like waiting on buffer transfers).
So what graphics drivers do is they paint the rendering area with a magic color and tell the GPU the position of the scene, and the GPU takes care of overlaying the scene over the displayed screen based on the magic color pixels (sort of a multi-pass pixel shader that takes from the 2nd texture when the 1st texture has a certain color for x,y, but not that slow). You get completely out of sync rendering, but when you ask the OS for its video memory, you get the magic color where the scene is because that's what it actually uses.
Reference: http://en.wikipedia.org/wiki/Hardware_overlay

I believe it is actually due to double buffering. I'm not 100% sure but that was actually the case when I tested screenshots in OpenGL. I would notice that the DC on my window was not the same. It was using two different DC's for this one game.. For other games I wasn't sure what it was doing. The DC was the same but swapbuffers was called so many times that I don't think GDI was even fast enough to screenshot it.. Sometimes I would get half a screenshot and half black..
However, when I hooked into the client, I was able to just ask for the pixels like normal. No GDI or anything. I think there is a reason why we don't use GDI when drawing in games that use DirectX or OpenGL..
You can always look at ways to capture the screen here:http://www.codeproject.com/Articles/5051/Various-methods-for-capturing-the-screen
Anyway, I use the following for grabbing data from DirectX:
HRESULT DXI_Capture(IDirect3DDevice9* Device, const char* FilePath)
{
IDirect3DSurface9* RenderTarget = nullptr;
HRESULT result = Device->GetBackBuffer(0, 0, D3DBACKBUFFER_TYPE_MONO, &RenderTarget);
result = D3DXSaveSurfaceToFile(FilePath, D3DXIFF_PNG, RenderTarget, nullptr, nullptr);
SafeRelease(RenderTarget);
return result;
}
Then in my hooked Endscene I call it like so:
HRESULT Direct3DDevice9Proxy::EndScene()
{
DXI_Capture(ptr_Direct3DDevice9, "C:/Ssers/School/Desktop/Screenshot.png");
return ptr_Direct3DDevice9->EndScene();
}
You can either use microsoft detours for hooking EndScene of some external application or you can use a wrapper .dll.

Related

How to best render to a window, when pixels could be written directly to display buffer?

I have used OpenGL pretty exclusively for all my rendering, to the point that I'm unaware of any other way to write pixels to a window unfortunately.
This is a problem because my current project is a work tool that emulates an LCD display (pixel perfect, 2D, very few pixels are touched each frame, all 'drawing' can be done with memcpy() to a pixel buffer) and I feel that OpenGL might be too heavy for this, but I could absolutely be wrong in that assumption.
My goal is to borrow as little CPU time as possible. What's the best way to draw pixels to a window, in this limited way, on a modern typical machine running windows 10 circa 2019? Is OpenGL suited for this type of rendering, or should I adopt another rendering method in this case? And if so, what would that method be?
edit:
I should also mention, OpenGL can be used right away for me. If rendering textured triangles with an optimal setup is the fastest method, then I can already do that. Anything that just acts as an API over OpenGL or DirectX will likely be worse in my case.
edit2:
After some research, and thanks to the comments, I think I may just use OpenGL with Pixel Buffer Objects to optimize pixel uploads and keep rendering inexpensive.

Real time drawing in GDI

I'm currently writing a 3D renderer (for fun and research), so I need a way to draw my framebuffer to a window. Since I'm doing all of my calculations on CPU, the drawing needs to be as fast as possible.
One of my goals is to use no existing graphics library (OpenGL/DirectX) so the drawing to the screen is pure Win32. In my research I've found a couple of ways to create and draw bitmaps and now I'm looking for the best one.
My current implementation uses a bitmap created with CreateDIBSection(), which is drawn to my window DC using BitBlt().
CreateDIBSection() give me a pointer to my bitmap bytes so I can manipulate it without copying. Using this method I achieve an update rate of about 260 FPS (without any rendering done).
This seems a bit slow, so I'm looking for optimizations.
I've read something about that if you don't create a bitmap with the same palette as the system palette, some slow color conversions are done.
How can I make sure my DIB bitmap and window are compatible?
Are there methods of drawing an bitmap which are faster than my current implementation?
I've also read something about DrawDibDraw(), can anyone confirm that this is faster?
I've read something about that if you don't create a bitmap with the same palette as the system palette, some slow color conversions are done.
Very few systems run in a palette mode any more, so it seems unlikely this is an issue for you.
Aside from palettes, some GDI functions also cause a color matching conversion to be applied if the source bitmap and the destination have different gamuts. BitBlt, however, does not do this type of color matching, so you're not paying a price for that.
How can I make sure my DIB bitmap and window are compatible?
You don't. You can use DIBs (which are Device-Independent Bitmaps) or compatible (device-dependent) bitmaps. It's possible that your DIB bitmap matches the current mode of your device. For example, if you're using a 32 bpp DIB, and your display is in that same mode, then no conversion is necessary. If you want a bitmap that's guaranteed to be in the same mode as your device, then you can't use a DIB and all the nice properties it provides for predictable pixel layout and format.
Are there methods of drawing an bitmap which are faster than my current implementation?
The limitation is most likely in getting the data from system memory to graphics adapter memory. To get around that limitation, you need a faster graphics bus, or you need to render directly into graphic memory, which means you'd need to do your computation on the GPU rather than the CPU.
If you're rendering a 1920 x 1080 pixel image at 24 bits per pixel, that's close to 6 MB for your frame buffer. That's an awful lot of data. If you're doing that 260 times per second, that's actually pretty impressive.
I've also read something about DrawDibDraw(), can anyone confirm that this is faster?
It's conceivable, but the only way to know would be to measure it. And the results might vary from machine to machine because of differences in the graphics adapter (and which bus they use).

Perfect V-sync implementation for a lightweight OpenGL game: need one tidbit of information

In the game our Internet-assembled team is programming, we're assuming everybody from our audience will have WAY over fullspeed in the game.
So, to save video RAM, and hopefully give a little more idle time to the graphics card, using V-sync without double buffering would be our best option. So, in OpenGL, we need to know how to do that.
From my understanding, V-sync is when the graphics card is paused once it's done rendering a single frame until that frame has finished being sent to the display device. Double buffering doesn't pause render operations (or maybe it does, or maybe it's implementation-specific; not sure), because it instead draws to a second buffer before copying to the framebuffer, so that the monitor either gets the full frame or no new frame at all (specifically, the last stored image in the framebuffer). Well, we don't need that feature, as long as the graphics card just writes to the framebuffer ONLY when it damn needs to.
This is a pretty slow online game (But it's VERY creative ^_^). There's very little realtime action. Therefore, extremely precise user input is not a necessity; it can be captured from the OS as a single unit any time before rendering a frame.
So, in order to do EXACTLY this, I need to be able to get a "Frame has finished sending to monitor" message from OpenGL. Is it possible? If not, what is the best alternative?
The game is being programmed for Windows only at the moment but should have work done for Linux in a few months.
You suffer from a misconception what V-Sync does. There's a part in video RAM that's continously sent to the display device at a constant rate, the frame refresh rate. So immediately after a full frame has been sent the next frame gets sent, after a very short blank time. But the time between sending frames is far shorter than the time it takes to send the full frame.
What happens without V-Sync is, that operations on the contents of the framebuffer get visible, for example if the frame is filled alternating with red and green and there's no V-Sync you'll see red and green bands on the monitor. To avoid this, V-Sync swaps the pointer the display driver uses to access the framebuffer just after a full frame has been sent.
Which brings us to what doublebuffering does. Without doublebuffering there's little use for a V-Sync. The action triggered by V-Sync must happen very, very fast. So this boils down to swapping a pointer or a very fast blitting operation (potentially by simply setting CoW attributes for the GPU's MMU).
Without doublebuffering and no V-Sync the effect is, that one can see the process in which the picture is rendered piece by piece to the framebuffer. Of course if rendering happens faster than a frame period this has the effect that top-down you'll see a only sparsely populated image with more and more content being visible toward the bottem, and somewhere inbetween it'll hit the lower screen edge, wapping around to the top. The intersection line will be moving.
TL;DR: Just use double buffering and enable V-Sync for buffer swap. Don't be afraid of memory consumption. All GPUs in circulation today have more than enough RAM to easily provide the memory for doublebuffered colour planes. Just do the math: 1920x1200 * RGB = 6MiB, even the smallest GPUs in PCs today deliver at least 128MiB of RAM. Mobile devices, let's say iPad 1024*768 * RGB = 2MiB vs. 32MiB for graphics. The UI of the iPad is doublebuffered anyway.
You can use wglGetProcAddress to get the address of wglSwapIntervalEXT, and then call wglSwapIntervalEXT(1); to synchronize updates with the vertical synch. When you do this, you don't get a message at the vertical synch -- instead glFlush simply doesn't return until a vertical retrace has happened, and the screen has been updated. So, you have a WM_PAINT handler that looks something like this:
BeginPaint
wglMakeCurrent
do drawing
glFlush
EndPaint
The glFlush is needed in any case, to ensure the drawing you've done gets sent to the screen.

Which of these is faster?

I was wondering if it was faster to render a single quad the size of the window with a texture the size of a window than to draw the bitmap directly to the window using double buffering coupled with the platform specific way of drawing to a window.
The initial setup for textures tends to be relatively slow, but once that's done the drawing is quite fast -- in a typical case where graphics memory is available, it'll upload the texture to the memory on the graphics cards during initial setup, and after that, all the drawing will happen from there. At the same time, that initial upload will also typically include full a full mipmap down to 1x1 resolution, so you're uploading a bit more than just the full-resolution texture.
With platform specific drawing, you usually don't have quite as much work up-front. If only part of the bitmap is visible, only the visible part will be uploaded. If the bitmap is going to be scaled, it'll typically scale it on the CPU and send it to the card at the current scale (and never upload anything resembling a mipmap). OTOH, virtually every time something needs to be redrawn, it'll end up re-sending the bitmap data for the newly exposed area. It doesn't take much of that to lose the (often minor anyway) advantage of minimizing what was sent to start with.
Using textures is usually a lot faster, since most native drawing APIs aren't hardware accelerated.
It will very probably depend on the graphics card and driver.

How does Photoshop (Or drawing programs) blit?

I'm getting ready to make a drawing application in Windows. I'm just wondering, do drawing programs have a memory bitmap which they lock, then set each pixel, then blit?
I don't understand how Photoshop can move entire layers without lag or flicker without using hardware acceleration. Also in a program like Expression Design, I could have 200 shapes and move them around all at once with no lag. I'm really wondering how this can be done without GPU help.
Also, I don't think super efficient algorithms could justify that?
Look at this question:
Reduce flicker with GDI+ and C++
All you can do about DC drawing without GPU is to reduce flickering. Anything else depends on the speed of filling your memory bitmap. And here you can use efficient algorithms, multithreading and whatever you need.
Certainly modern Photoshop uses GPU acceleration if available. Another possible tool is DMA. You may also find it helpful to read the source code of existing programs like GIMP.
Double (or more) buffering is the way it's done in games, where we're drawing a ton of crap into a "back" buffer while the "front" buffer is being displayed. Then when the draw is done, the buffers are swapped (a pointer swap, not copies!) and the process continues in the new front and back buffers.
Triple buffering offers another bonus, in that you can start drawing two-frames-from-now when next-frame is done, but without forcing a buffer swap in the middle of the screen refresh. Many games do the buffer swap in the middle of the refresh, but you can sometimes see it as visible artifacts (tearing) on the screen.
Anyway- for an app drawing bitmaps into a window, if you've got some "slow" operation, do it into a not-displayed buffer while presenting the displayed version to the rendering API, e.g. GDI. Let the system software handle all of the fancy updating.