Draw on DeviceContext from COLORREF[] - c++

I have a pointer to a COLORREF buffer, something like: COLORREF* buf = new COLORREF[x*y];
A subroutine fills this buffer with color-information. Each COLORREF represents one pixel.
Now I want to draw this buffer to a device context. My current approach works, but is pretty slow (== ~200ms per drawing, depending on the size of the image):
for (size_t i = 0; i < pixelpos; ++i)
{
// Get X and Y coordinates from 1-dimensional buffer.
size_t y = i / wnd_size.cx;
size_t x = i % wnd_size.cx;
::SetPixelV(hDC, x, y, buf[i]);
}
Is there a way to do this faster; all at once, not one pixel after another?
I am not really familiar with the GDI. I heard about al lot of APIs like CreateDIBitmap(), BitBlt(), HBITMAP, CImage and all that stuff, but have no idea how to apply it. It seems all pretty complicated...
MFC is also welcome.
Any ideas?
Thanks in advance.
(Background: the subroutine I mentioned above is an OpenCL kernel - the GPU calculates an Mandelbrot image and safes it in the COLORREF buffer.)
EDIT:
Thank you all for your suggestions. The answers (and links) gave me some insight into Windows graphics programming. The performance is now acceptable (semi-realtime-scrolling into the Mandelbrot works :)
I ended up with the following solution (MFC):
...
CDC dcMemory;
dcMemory.CreateCompatibleDC(pDC);
CBitmap mandelbrotBmp;
mandelbrotBmp.CreateBitmap(clientRect.Width(), clientRect.Height(), 1, 32, buf);
CBitmap* oldBmp = dcMemory.SelectObject(&mandelbrotBmp);
pDC->BitBlt(0, 0, clientRect.Width(), clientRect.Height(), &dcMemory, 0, 0, SRCCOPY);
dcMemory.SelectObject(oldBmp);
mandelbrotBmp.DeleteObject();
So basically CBitmap::CreateBitmap() saved me from using the raw API (which I still do not fully understand). The example in the documentation of CDC::CreateCompatibleDC was also helpful.
My Mandelbrot is now blue - using SetPixelV() it was red. But I guess that has something to do with CBitmap::CreateBitmap() interpreting my buffer, not really important.
I might try the OpenGL suggestion because it would have been the much more logical choice and I wanted to try OpenCL under Linux anyway.

Under the circumstances, I'd probably use a DIB section (which you create with CreateDIBSection). A DIB section is a bitmap that allows you to access the contents directly as an array, but still use it with all the usual GDI functions.
I think that'll give you the best performance of anything based on GDI. If you need better, then #Kornel is basically correct -- you'll need to switch to something that has more direct support for hardware acceleration (DirectX or OpenGL -- though IMO, OpenGL is a much better choice for the job than DirectX).
Given that you're currently doing the calculation in OpenCL and depositing the output in a color buffer, OpenGL would be the really obvious choice. In particular, you can have OpenCL deposit the output in an OpenGL texture, then you have OpenGL draw a quad using that texture. Alternatively, since you're just putting the output on screen anyway, you could just do the calculation in an OpenGL fragment shader (or, of course, a DirectX pixel shader), so you wouldn't put the output into memory off-screen just so you can copy the result onto the screen. If memory serves, the Orange book has a Mandelbrot shader as one of its examples.

Yes, sure, that's slow. You are making a round-trip through the kernel and video device driver for each individual pixel. You make it fast by drawing to memory first, then update the screen in one fell swoop. That takes, say, CreateDIBitmap, CreateCompatibleDc() and BitBlt().
This isn't a good time and place for an extensive tutorial on graphics programming. It is well covered by any introductory text on GDI and/or Windows API programming. Everything you'll need to know you can find in Petzold's seminal Programming Windows.

Since you already have an array of pixels, you can directly use BitBlt to transfer it to the window's DC. See this link for a partial example:
http://msdn.microsoft.com/en-us/library/aa928058.aspx

Related

How to best render to a window, when pixels could be written directly to display buffer?

I have used OpenGL pretty exclusively for all my rendering, to the point that I'm unaware of any other way to write pixels to a window unfortunately.
This is a problem because my current project is a work tool that emulates an LCD display (pixel perfect, 2D, very few pixels are touched each frame, all 'drawing' can be done with memcpy() to a pixel buffer) and I feel that OpenGL might be too heavy for this, but I could absolutely be wrong in that assumption.
My goal is to borrow as little CPU time as possible. What's the best way to draw pixels to a window, in this limited way, on a modern typical machine running windows 10 circa 2019? Is OpenGL suited for this type of rendering, or should I adopt another rendering method in this case? And if so, what would that method be?
edit:
I should also mention, OpenGL can be used right away for me. If rendering textured triangles with an optimal setup is the fastest method, then I can already do that. Anything that just acts as an API over OpenGL or DirectX will likely be worse in my case.
edit2:
After some research, and thanks to the comments, I think I may just use OpenGL with Pixel Buffer Objects to optimize pixel uploads and keep rendering inexpensive.

Why is glReadPixels so slow and are there any alternative?

I need to take sceenshots at every frame and I need very high performance (I'm using freeGlut). What I figured out is that it can be done like this inside glutIdleFunc(thisCallbackFunction)
GLubyte *data = (GLubyte *)malloc(3 * m_screenWidth * m_screenHeight);
glReadPixels(0, 0, m_screenWidth, m_screenHeight, GL_RGB, GL_UNSIGNED_BYTE, data);
// and I can access pixel values like this: data[3*(x*512 + y) + color] or whatever
It does work indeed but I have a huge issue with it, it's really slow. When my window is 512x512 it runs no faster than 90 frames per second when only cube is being rendered, without these two lines it runs at 6500 FPS! If we compare it to irrlicht graphics engine, there I can do this
// irrlicht code
video::IImage *screenShot = driver->createScreenShot();
const uint8_t *data = (uint8_t*)screenShot->lock();
// I can access pixel values from data in a similar manner here
and 512x512 window runs at 400 FPS even with a huge mesh (Quake 3 Map) loaded! Take into account that I'm using openGL as driver inside irrlicht. To my inexperienced eye it seems like glReadPixels is copying every pixel data from one place to another while (uint8_t*)screenShot->lock() is just copying a pointer to already existent array. Can I do something similar to latter using freeGlut? I expect it to be faster than irrlicht.
Note that irrlicht uses openGL too (well it offers directX and other options as well but in the example I gave above I used openGL and by the way it was the fastest compared to other options)
OpenGL methods are used to manage the rendering pipeline. In its nature, while the graphics card is showing image to the viewer, computations of the next frame are being done. When you call glReadPixels; graphics card wait for the current frame to be done, reads the pixels and then starts computing the next frame. Therefore pipeline becomes stalled and becomes sequential.
If you can hold two buffers and tell to the graphics card to read data into these buffers interchanging each frame; then you can read-back from your buffer 1-frame late but without stalling the pipeline. This is called double buffering. You can also do triple buffering with 2 frame late read-back and so on.
There is a relatively old web page describing the phenomenon and implementation here: http://www.songho.ca/opengl/gl_pbo.html
Also there are a lot of tutorials about framebuffers and rendering into a texture on the web. One of them is here: http://www.opengl-tutorial.org/intermediate-tutorials/tutorial-14-render-to-texture/

Why do DirectX fullscreen applications give black screenshots?

You may know that trying to capture DirectX fullscreen applications the GDI way (using BitBlt()) gives a black screenshot.
My question is rather simple but I couldn't find any answer: why? I mean technically, why does it give a black screenshot?
I'm reading a DirectX tutorial here: http://www.directxtutorial.com/Lesson.aspx?lessonid=9-4-1. It's written:
[...] the function BeginScene() [...] does something called locking, where the buffer in the video RAM is 'locked', granting you exclusive access to this memory.
Is this the reason? VRAM is locked so GDI can't access it and it gives a black screenshot?
Or is there another reason? Like DirectX directly "talks" to the graphic card and GDI doesn't get it?
Thank you.
The reason is simple: performance.
The idea is to render a scene as much as possible on the GPU out of lock-step with the CPU. You use the CPU to send the rendering buffers to the GPU (vertex, indices, shaders etc), which is overall really cheap because they're small, then you do whatever you want, physics, multiplayer sync etc. The GPU can just crunch the data and render it on its own.
If you require the scene to be drawn on the window, you have to interrupt the GPU, ask for the rendering buffer bytes (LockRect), ask for the graphics object for the window (more interference with the GPU), render it and free every lock. You just lost any sort of gain you had by rendering on the GPU out of sync with the CPU. Even worse when you think of all the different CPU cores just sitting idle because you're busy "rendering" (more like waiting on buffer transfers).
So what graphics drivers do is they paint the rendering area with a magic color and tell the GPU the position of the scene, and the GPU takes care of overlaying the scene over the displayed screen based on the magic color pixels (sort of a multi-pass pixel shader that takes from the 2nd texture when the 1st texture has a certain color for x,y, but not that slow). You get completely out of sync rendering, but when you ask the OS for its video memory, you get the magic color where the scene is because that's what it actually uses.
Reference: http://en.wikipedia.org/wiki/Hardware_overlay
I believe it is actually due to double buffering. I'm not 100% sure but that was actually the case when I tested screenshots in OpenGL. I would notice that the DC on my window was not the same. It was using two different DC's for this one game.. For other games I wasn't sure what it was doing. The DC was the same but swapbuffers was called so many times that I don't think GDI was even fast enough to screenshot it.. Sometimes I would get half a screenshot and half black..
However, when I hooked into the client, I was able to just ask for the pixels like normal. No GDI or anything. I think there is a reason why we don't use GDI when drawing in games that use DirectX or OpenGL..
You can always look at ways to capture the screen here:http://www.codeproject.com/Articles/5051/Various-methods-for-capturing-the-screen
Anyway, I use the following for grabbing data from DirectX:
HRESULT DXI_Capture(IDirect3DDevice9* Device, const char* FilePath)
{
IDirect3DSurface9* RenderTarget = nullptr;
HRESULT result = Device->GetBackBuffer(0, 0, D3DBACKBUFFER_TYPE_MONO, &RenderTarget);
result = D3DXSaveSurfaceToFile(FilePath, D3DXIFF_PNG, RenderTarget, nullptr, nullptr);
SafeRelease(RenderTarget);
return result;
}
Then in my hooked Endscene I call it like so:
HRESULT Direct3DDevice9Proxy::EndScene()
{
DXI_Capture(ptr_Direct3DDevice9, "C:/Ssers/School/Desktop/Screenshot.png");
return ptr_Direct3DDevice9->EndScene();
}
You can either use microsoft detours for hooking EndScene of some external application or you can use a wrapper .dll.

Real time drawing in GDI

I'm currently writing a 3D renderer (for fun and research), so I need a way to draw my framebuffer to a window. Since I'm doing all of my calculations on CPU, the drawing needs to be as fast as possible.
One of my goals is to use no existing graphics library (OpenGL/DirectX) so the drawing to the screen is pure Win32. In my research I've found a couple of ways to create and draw bitmaps and now I'm looking for the best one.
My current implementation uses a bitmap created with CreateDIBSection(), which is drawn to my window DC using BitBlt().
CreateDIBSection() give me a pointer to my bitmap bytes so I can manipulate it without copying. Using this method I achieve an update rate of about 260 FPS (without any rendering done).
This seems a bit slow, so I'm looking for optimizations.
I've read something about that if you don't create a bitmap with the same palette as the system palette, some slow color conversions are done.
How can I make sure my DIB bitmap and window are compatible?
Are there methods of drawing an bitmap which are faster than my current implementation?
I've also read something about DrawDibDraw(), can anyone confirm that this is faster?
I've read something about that if you don't create a bitmap with the same palette as the system palette, some slow color conversions are done.
Very few systems run in a palette mode any more, so it seems unlikely this is an issue for you.
Aside from palettes, some GDI functions also cause a color matching conversion to be applied if the source bitmap and the destination have different gamuts. BitBlt, however, does not do this type of color matching, so you're not paying a price for that.
How can I make sure my DIB bitmap and window are compatible?
You don't. You can use DIBs (which are Device-Independent Bitmaps) or compatible (device-dependent) bitmaps. It's possible that your DIB bitmap matches the current mode of your device. For example, if you're using a 32 bpp DIB, and your display is in that same mode, then no conversion is necessary. If you want a bitmap that's guaranteed to be in the same mode as your device, then you can't use a DIB and all the nice properties it provides for predictable pixel layout and format.
Are there methods of drawing an bitmap which are faster than my current implementation?
The limitation is most likely in getting the data from system memory to graphics adapter memory. To get around that limitation, you need a faster graphics bus, or you need to render directly into graphic memory, which means you'd need to do your computation on the GPU rather than the CPU.
If you're rendering a 1920 x 1080 pixel image at 24 bits per pixel, that's close to 6 MB for your frame buffer. That's an awful lot of data. If you're doing that 260 times per second, that's actually pretty impressive.
I've also read something about DrawDibDraw(), can anyone confirm that this is faster?
It's conceivable, but the only way to know would be to measure it. And the results might vary from machine to machine because of differences in the graphics adapter (and which bus they use).

Simple bitmap graphics in c++?

I'm a Flash programmer and I'm currently exploring C++. In flash, you can create a bitmap and place it on the screen, and then use methods like getPixel(x, y), setPixel(x, y, c) ect. Press ctrl+enter and you can get started with what ever you want to do.
I use Visual C++ 2010. Since I've used Flash alot I'm used to simple and short commands. In C++ though, it's harder to figure out how to get a bitmap where you can manipulate pixels.
I don't know much about graphics enginges or 3D engines, it would be very useful information, but first I'd like to see what I can create with pixels, so do you know a simple way to create a manipulatable bitmap in C++? As optimized as possible, then I can write my own drawLine, drawCurve ect functions. :)
Because you mentioned Visual C++ 2010, I will assume that you are using Vista or higher and you want to first draw 2D graphics using the native Windows C++ approach. If this is the case, you want to use Direct2D. You may find old articles that use GDI, but this is the old way, so don't use it. Here is the link to MSDN introducing Direct2D.
If you really want to use C++ you should look at using GDI+ which is the standard windows/VS way of doing graphics. This is a slight highter level (friendlier) api than the older GDI api for doing graphics that dates back to (pre?) MFC days. You will need to get a basic grasp of device contexts etc... in order to understand how to load your bitmap from a file and get it on the screen.
CGI+ allows easy manipulation of a bitmap on a per pixel basis using the LockBits method. It can read most common image formats (bmp, jpg, png etc).
The example code below shows a typical load bitmap and read some pixels type code (it is taken verbatim from this msdn gdi+ article
#include <windows.h>
#include <gdiplus.h>
#include <stdio.h>
using namespace Gdiplus;
INT main() {
GdiplusStartupInput gdiplusStartupInput;
ULONG_PTR gdiplusToken;
GdiplusStartup(&gdiplusToken, &gdiplusStartupInput, NULL);
Bitmap* bitmap = new Bitmap(L"LockBitsTest1.bmp");
BitmapData* bitmapData = new BitmapData;
Rect rect(20, 30, 5, 3);
// Lock a 5x3 rectangular portion of the bitmap for reading.
bitmap->LockBits(
&rect,
ImageLockModeRead,
PixelFormat32bppARGB,
bitmapData);
printf("The stride is %d.\n\n", bitmapData->Stride);
// Display the hexadecimal value of each pixel in the 5x3 rectangle.
UINT* pixels = (UINT*)bitmapData->Scan0;
for(UINT row = 0; row < 3; ++row) {
for(UINT col = 0; col < 5; ++col)
printf("%x\n", pixels[row * bitmapData->Stride / 4 + col]);
printf("- - - - - - - - - - \n");
}
bitmap->UnlockBits(bitmapData);
delete bitmapData;
delete bitmap;
GdiplusShutdown(gdiplusToken);
return 0;
}
As for draw line, draw curve etc routines - these are all found on the Graphics object in GDI+. Its the main object that sits between your code and the screen. A Graphics object would be used to render the above bitmap using Graphics.DrawImage.
Well the answer that is about to come is probably one you won't like, but I'll give it anyways :)
C++ has no notion of graphics output at all. Luckily, we have gotten a standardized way of printing text to the screen - but thats it. Really. No joking here.
However, most operating systems provide a means of graphics output (mostly via a C interface, as C is binary compatible to almost everything), and there are also C++ wrapper libraries that incorporate access to those. Dealing with OS issues is, however, a very different beast than C++ is (and thats kinda beast, I can tell ;).
Luckily, if you don't want to/have to understand the full story, I can recommend Qt (http://qt.nokia.com/products/) as a very decent library for GUI programming. And once you have a GUI window, you will also have the possibility to draw a bitmap into that GUI (I believe Qt has direct bitmap support and also the possibility to load images).
However, I will have to refer you to the tutorial docs of Qt, as even a simple introduction would go way over what would be a still understandable forum answer.
And this, I'm afraid, will be as easy as it can get drawing something onto the screen using C++. Nowhere near the simplicity of Flash for that purpose but in the end way more powerful.
Good luck.
You need external library to read images from file. Two libraries I like are SDL_image and stb_image. Both give you image in a format that lets you access and manipulate pixels.
Also to display the image you will need external library. SDL is popular and simple one.
Use PixelToaster if you just want easy access to a framebuffer.