Real-time drawing in GDI - C++

I'm currently writing a 3D renderer (for fun and research), so I need a way to draw my framebuffer to a window. Since I'm doing all of my calculations on CPU, the drawing needs to be as fast as possible.
One of my goals is to use no existing graphics library (OpenGL/DirectX) so the drawing to the screen is pure Win32. In my research I've found a couple of ways to create and draw bitmaps and now I'm looking for the best one.
My current implementation uses a bitmap created with CreateDIBSection(), which is drawn to my window DC using BitBlt().
CreateDIBSection() gives me a pointer to the bitmap bytes, so I can manipulate them without copying. Using this method I achieve an update rate of about 260 FPS (without any rendering done).
This seems a bit slow, so I'm looking for optimizations.
I've read that if you don't create a bitmap with the same palette as the system palette, slow color conversions are performed.
How can I make sure my DIB bitmap and window are compatible?
Are there methods of drawing a bitmap that are faster than my current implementation?
I've also read about DrawDibDraw(); can anyone confirm that this is faster?

I've read that if you don't create a bitmap with the same palette as the system palette, slow color conversions are performed.
Very few systems run in a palette mode any more, so it seems unlikely this is an issue for you.
Aside from palettes, some GDI functions also cause a color matching conversion to be applied if the source bitmap and the destination have different gamuts. BitBlt, however, does not do this type of color matching, so you're not paying a price for that.
How can I make sure my DIB bitmap and window are compatible?
You don't. You can use DIBs (which are device-independent bitmaps) or compatible (device-dependent) bitmaps. It's possible that your DIB matches the current mode of your device: for example, if you're using a 32 bpp DIB and your display is in that same mode, then no conversion is necessary. If you want a bitmap that's guaranteed to be in the same mode as your device, then you can't use a DIB, and you give up the nice properties it provides for a predictable pixel layout and format.
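As an illustration, here is a minimal sketch of that matching case: a top-down 32 bpp DIB section blitted with BitBlt(), as the question describes (hdcWindow, width, and height are assumed names, not from the original post):

BITMAPINFO bmi = {};
bmi.bmiHeader.biSize = sizeof(BITMAPINFOHEADER);
bmi.bmiHeader.biWidth = width;
bmi.bmiHeader.biHeight = -height;               // negative height = top-down rows
bmi.bmiHeader.biPlanes = 1;
bmi.bmiHeader.biBitCount = 32;                  // same depth as a 32 bpp display mode
bmi.bmiHeader.biCompression = BI_RGB;

void* bits = nullptr;                           // CreateDIBSection fills this in
HBITMAP hbm = CreateDIBSection(nullptr, &bmi, DIB_RGB_COLORS, &bits, nullptr, 0);
HDC hdcMem = CreateCompatibleDC(hdcWindow);
SelectObject(hdcMem, hbm);

// ... render into 'bits' on the CPU ...
BitBlt(hdcWindow, 0, 0, width, height, hdcMem, 0, 0, SRCCOPY);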
Are there methods of drawing a bitmap that are faster than my current implementation?
The limitation is most likely in getting the data from system memory to graphics adapter memory. To get around that limitation, you need a faster graphics bus, or you need to render directly into graphic memory, which means you'd need to do your computation on the GPU rather than the CPU.
If you're rendering a 1920 x 1080 pixel image at 24 bits per pixel, that's close to 6 MB per frame. That's an awful lot of data: doing that 260 times per second means pushing roughly 1.6 GB/s across the bus, which is actually pretty impressive.
I've also read about DrawDibDraw(); can anyone confirm that this is faster?
It's conceivable, but the only way to know would be to measure it. And the results might vary from machine to machine because of differences in the graphics adapter (and which bus they use).

Related

How are pixels drawn at the lowest level

I can use SetPixel (GDI) to set any pixel on the screen to a colour.
So how would I reproduce SetPixel at the lowest assembly level? What actually happens that triggers the instructions that say: OK, send a byte to position x in the framebuffer?
SetPixel most probably just calculates the address of the given pixel using a formula like:
pixel = (frame_start + y * frame_width) + x
and then simply does *pixel = COLOR.
You can actually use CreateDIBSection to create your own buffer and associate it with a device context; then you can modify pixels at the low level using the formula above. This is useful if you have your own graphics library, like AGG.
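A minimal sketch of that, assuming a top-down 32 bpp buffer obtained from CreateDIBSection (buffer, width, x, y, and color are placeholder names, not from the original post):

#include <cstdint>

// Write one pixel into a top-down 32 bpp buffer that is 'width' pixels per row.
inline void PutPixel32(uint32_t* buffer, int width, int x, int y, uint32_t color)
{
    uint32_t* pixel = buffer + y * width + x;   // frame_start + y * frame_width + x
    *pixel = color;                             // *pixel = COLOR
}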
When learning about GDI I like to look into the WINE source code; there you can see how complicated it actually is (dibdrv_SetPixel):
http://fossies.org/dox/wine-1.6.1/gdi32_2dibdrv_2graphics_8c_source.html
It must also take into account clipping regions and different pixel sizes, and probably other features. It is also possible that some drivers accelerate this in hardware, but I have not heard of it.
If you want to recreate SetPixel you need to know how your graphics hardware works. Most hardware manufacturers follow at least the VESA standard, see here. This standard specifies that you can set the display mode using interrupt 0x10.
Once the display mode is set, the memory region that is displayed is defined by the standard, and you can simply write directly to display memory.
Advanced graphics hardware deviates from the standard (which only covers the basics), so the above does not work for advanced features; you'll have to resort to the GPU documentation.
The "how" is always depends on "what", what I mean is that for different setups there are different methods, different systems different methods, what is common is that they are usually not allowing you to do it directly i.e. write to a memory address that will be displayed.
Some devices with a dedicated setup may allow you to do that (like some consoles do, as far as I know), but even there you will have to do some locking or other utility work to make it work as it should.
Since in modern PCs the graphics accelerator is fused into the video card (one counterexample is the Voodoo 1, which needed a separate video card to operate, since it was just a 3D accelerator), the GPU usually holds the framebuffer it draws from in its own memory, making it inaccessible from the outside.
So generally the model is: here is a memory address, "download" the data into your own GPU memory and show it on screen; and this is where desktop composition comes in. Since video cards suffer from this transfer all the time, it is in fact faster to send just the data required to draw and let the GPU do the drawing. So Aero is just the visual style; as far as I know the desktop compositor works regardless of Aero, making the drawing GPU-dependent.
So technically, low-level functions such as SetPixel have been software-only since Windows 7, for the reasons mentioned above: you simply can't access the memory directly. What it probably does is keep a bitmap for every HDC, so when you call SetPixel you simply set a pixel in that bitmap, which is later sent to the GPU for display.
In the case of DOS or other old technology, it is probably just emulated, the same way it is done for GDI.
So, in light of all this:
So how would I reproduce SetPixel at the lowest assembly level?
It is probably just a copy to a memory location, but Windows integrates the window surfaces into its framebuffer, so you will never get direct access. One way to emulate what it does is to create a bitmap, get its memory pointer, set the pixel manually, and then tell Windows to show this bitmap on screen.
What actually happens that triggers the instructions that say: OK, send a byte to position x in the framebuffer?
Like I said before, what happens at the moment you make this call really depends on the environment. The code that gets executed comes from different places, some written by Microsoft and some by the GPU's manufacturer, and all of these together produce that pixel you see on your screen.
To set a pixel in the framebuffer using a video mode with 32-bit color, we need the address of the pixel and the color of the pixel.
With the address and the color we can simply use a move instruction to write the color to the framebuffer.
Sample using the EDI register as a 32-bit address register (the default segment register is DS) to address the framebuffer with the move instruction.
x86 Intel syntax:
mov edi, Framebuffer   ; load the address (upper left corner) into the EDI register
mov DWORD [edi], Color ; write the color to the address DS:EDI
The first instruction loads the EDI register with the address of the framebuffer and the second writes the color to the framebuffer.
A hint for calculating the address of a pixel inside the framebuffer: some video modes use a longer scanline, with more bytes than the horizontal resolution requires and a part that lies outside of the visible view.
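In C terms, with pitch as an assumed name for the byte length of one scanline (framebuffer, x, y, and color are placeholders as well), the calculation looks like this:

uint8_t* row = framebuffer + y * pitch;   // pitch may be larger than width * 4
uint32_t* pixel = (uint32_t*)row + x;     // 4 bytes per pixel at 32-bit color
*pixel = color;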
Dirk

How to create memory DC with 24 bits per pixel?

I need it to work with RGB24 data using GDI functions (specifically StretchBlt(), which is pretty fast), and I can't use CreateCompatibleDC(), since it can only create a memory DC with the color depth of another DC. Usually it's used with the screen DC (by passing NULL to the function), and the screen usually has a color depth of 32. Besides, I can't rely on that, because if the screen settings change, my application probably won't work.
So I need some way to create a memory DC with a specific color depth. So far I've found only one way, using the CreateDC() function, but it requires many device-specific parameters and seems somewhat unreliable to me; there are too many fields that must be filled with appropriate values just to call CreateDC().
Is there an easier way to create a memory DC with a specific color depth, without relying on a particular device? Or at least a way to create a memory DC with 24 bpp?
P.S. I need it for some fast graphics. I've tried manually adding an alpha channel to the bitmap so I could use it with a screen-compatible 32 bpp memory DC, and it worked, but it was too slow. And as I said above, I can't rely on screen settings, which can be changed.
Bits-per-pixel does not really depend on the DC, but on the bitmap selected into it. Create a 24 bpp bitmap with CreateDIBSection, then select it into a memory DC.
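A sketch of that, including the StretchBlt() the question mentions (hdcTarget, width, height, dstW, and dstH are assumed names):

BITMAPINFO bmi = {};
bmi.bmiHeader.biSize = sizeof(BITMAPINFOHEADER);
bmi.bmiHeader.biWidth = width;
bmi.bmiHeader.biHeight = -height;           // negative height = top-down rows
bmi.bmiHeader.biPlanes = 1;
bmi.bmiHeader.biBitCount = 24;              // RGB24
bmi.bmiHeader.biCompression = BI_RGB;

void* bits = nullptr;                       // will point to the raw RGB24 rows
HBITMAP hbm24 = CreateDIBSection(nullptr, &bmi, DIB_RGB_COLORS, &bits, nullptr, 0);
HDC hdcMem = CreateCompatibleDC(nullptr);   // the depth comes from the selected bitmap
HGDIOBJ hOld = SelectObject(hdcMem, hbm24);

// Note: each 24 bpp scanline is padded to a DWORD (4-byte) boundary.
StretchBlt(hdcTarget, 0, 0, dstW, dstH, hdcMem, 0, 0, width, height, SRCCOPY);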

C++ GUI Development - Bitmap vs. Vector Graphics CPU Usage

I'm currently in the process of designing and developing GUI's for some audio applications made in C++ (using the Juce framework).
So far I've been playing with bitmap graphics to create custom sliders and dials, using 'film strip' style images to animate the components (meaning that when the user interacts with a slider, it triggers a method that changes the offset of the film-strip image to change the component's appearance). Depending on the size of the original image and the number of 'frames', the CPU usage level changes quite dramatically.
Firstly, what would be the most efficient bitmap file format to use in terms of CPU consumption? At the moment I'm using PNG images.
Secondly, would it be more efficient to use vector graphics for these kind of graphical components? I understand the main differences between bitmap and vector graphics, but I haven't found any information regarding their CPU usage levels with regard to GUI interaction.
Or would CPU usage be down to the particular methods/functions/libraries/frameworks being used?
Thanks!
Or would CPU usage be down to the particular methods/functions/libraries/frameworks being used?
Any of these things could influence it.
Pixel-based images might take a while to read off of disk, the bigger they are. Compressed types might take more time to decompress. Vector graphics might take more time to render once loaded.
That being said, I would definitely not expect your choice of image type to have any impact on performance. Since you didn't provide a code example, it is hard to speculate beyond that.
In general, you would expect the run-time cost of the images to be paid when they are loaded, i.e. whenever you create an image object. If you create images all over the place, then maybe that's what is expensive. It is possible that your film strip is recreating the images instead of loading them once and caching them.
Before choosing bitmap vs. vector graphics, investigate if your graphics processor supports vector or bitmap graphics. Some things take a long time to draw as vectors.
Have you tried double-buffering?
This is where you draw into a buffer in memory while the display (graphics processor) is showing the other.
Load your bitmaps from the resource once. Store them as in-memory snapshots to avoid the repeated cost of translating them from a file format.
Does your graphics processor support "blitting"?
Blitting is where the graphics processor copies a rectangular area of memory (a bitmap) and displays it, optionally applying operations along the way (such as XOR with existing bits).
Summary:
To improve your rendering speed, convert images from file into bitmap form only once, store the result somewhere, and refer to the converted bitmap as needed. Next, investigate and implement double buffering. Lastly, investigate and use bit-blitting, or blitting (see the sketch below).
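In straight Win32/GDI terms, double buffering plus blitting looks roughly like this (the question is about Juce, so treat this purely as a hypothetical illustration; clientW, clientH, and DrawControls are made-up names):

case WM_PAINT:
{
    PAINTSTRUCT ps;
    HDC hdc = BeginPaint(hwnd, &ps);

    // Draw the whole frame into an offscreen back buffer first...
    HDC hdcBack = CreateCompatibleDC(hdc);
    HBITMAP hbmBack = CreateCompatibleBitmap(hdc, clientW, clientH);
    HGDIOBJ hOld = SelectObject(hdcBack, hbmBack);
    DrawControls(hdcBack);                  // hypothetical: render the sliders/dials

    // ...then blit the finished frame to the window in one operation.
    BitBlt(hdc, 0, 0, clientW, clientH, hdcBack, 0, 0, SRCCOPY);

    SelectObject(hdcBack, hOld);
    DeleteObject(hbmBack);
    DeleteDC(hdcBack);
    EndPaint(hwnd, &ps);
    return 0;
}

In real code you would cache the back buffer between paints rather than recreating it every time; the point is only that the window never sees a half-drawn frame.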
Other optimization rules apply here too, such as reviewing the design, removing requirements, loop unrolling, passing images by pointer instead of copying them, and reducing "if" statements by using Boolean logic and Karnaugh maps.
In general, calculations for rendering vector graphics are going to take longer than blitting a rectangular region of a bitmap to the screen. But for basic UI stuff, neither should be particularly intensive.
You probably should do some profiling. Perhaps you're redrawing much more frequently than necessary. Or perhaps the PNG is being decoded each time you try to draw from it. (I'm not familiar with Juce.)
For a straight Windows app, I'd probably render vector graphics into a device-dependent bitmap once on startup and then just blit from the bitmap to the screen. Using vector gives you DPI independence, and blitting from a device-dependent bitmap is about the fastest way to paint a block of pixels. I believe the color matching is done when you render to the device-dependent bitmap, so you don't even have the ICM overhead on the screen drawing.
Vector graphics were ditched long ago; bitmap graphics are more performant. The thing is that you can send a bitmap to the GPU once and then render it forever more with a simple copy.
Secondly, the GPU uses its own texture compression (DXT5 under DirectX, I believe), but once the GPU has the texture, it doesn't care what you loaded it from.
However, a modern CPU even with a crappy integrated GPU should have absolutely no problem with simple GUI rendering. If you're struggling, then it's time to look again at the technique you're using. Perhaps your framework is slow or your use of it is suboptimal.

Which of these is faster?

I was wondering if it is faster to render a single quad the size of the window, with a texture the size of the window, than to draw the bitmap directly to the window using double buffering coupled with the platform-specific way of drawing to a window.
The initial setup for textures tends to be relatively slow, but once that's done the drawing is quite fast. In the typical case where graphics memory is available, the texture is uploaded to the memory on the graphics card during initial setup, and after that all the drawing happens from there. That initial upload will also typically include a full mipmap chain down to 1x1 resolution, so you're uploading a bit more than just the full-resolution texture.
With platform-specific drawing, you usually don't have quite as much work up front. If only part of the bitmap is visible, only the visible part is uploaded. If the bitmap is going to be scaled, it will typically be scaled on the CPU and sent to the card at the current scale (and nothing resembling a mipmap is ever uploaded). OTOH, virtually every time something needs to be redrawn, the bitmap data for the newly exposed area gets re-sent. It doesn't take much of that to lose the (often minor anyway) advantage of minimizing what was sent to start with.
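For example, the "upload once, then just draw" pattern looks roughly like this in legacy fixed-function OpenGL (a sketch only; w, h, and pixels are placeholder names):

// One-time setup: pay the upload cost once.
GLuint tex;
glGenTextures(1, &tex);
glBindTexture(GL_TEXTURE_2D, tex);
glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_MIN_FILTER, GL_LINEAR);
glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_MAG_FILTER, GL_LINEAR);
glTexImage2D(GL_TEXTURE_2D, 0, GL_RGBA, w, h, 0, GL_RGBA, GL_UNSIGNED_BYTE, pixels);

// Every frame: no pixel data crosses the bus, just a quad referencing the texture.
glEnable(GL_TEXTURE_2D);
glBegin(GL_QUADS);
glTexCoord2f(0, 0); glVertex2f(-1, -1);
glTexCoord2f(1, 0); glVertex2f( 1, -1);
glTexCoord2f(1, 1); glVertex2f( 1,  1);
glTexCoord2f(0, 1); glVertex2f(-1,  1);
glEnd();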
Using textures is usually a lot faster, since most native drawing APIs aren't hardware accelerated.
It will very probably depend on the graphics card and driver.

Draw scaled images using CImageList

If you have images stored in a CImageList, is there an easy way to render them (with proper transparency) scaled to fit a given target rectangle? CImageList::DrawEx takes size information, but I don't believe it does scaling, only cropping.
I guess you could render them to an offscreen bitmap, then StretchBlt() them to either your device or another offscreen bitmap, letting StretchBlt() do the scaling. Getting the transparency to carry over correctly will require some fiddling, though; depending on your circumstances you may need to use AlphaBlend() instead.
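An untested sketch of that idea, scaling with per-pixel alpha via AlphaBlend() (imgList, index, srcW/srcH, target, and pDC are placeholders; the fiddling mentioned above applies, since the DIB's alpha channel has to come out premultiplied for AC_SRC_ALPHA to work):

// Render the entry into a 32 bpp DIB section, then let AlphaBlend() do the scaling.
BITMAPINFO bmi = {};
bmi.bmiHeader.biSize = sizeof(BITMAPINFOHEADER);
bmi.bmiHeader.biWidth = srcW;
bmi.bmiHeader.biHeight = -srcH;             // top-down
bmi.bmiHeader.biPlanes = 1;
bmi.bmiHeader.biBitCount = 32;
bmi.bmiHeader.biCompression = BI_RGB;
void* bits = nullptr;
HBITMAP hbm = CreateDIBSection(nullptr, &bmi, DIB_RGB_COLORS, &bits, nullptr, 0);

CDC memDC;
memDC.CreateCompatibleDC(pDC);
HGDIOBJ hOld = ::SelectObject(memDC, hbm);
imgList.Draw(&memDC, index, CPoint(0, 0), ILD_TRANSPARENT);

BLENDFUNCTION bf = { AC_SRC_OVER, 0, 255, AC_SRC_ALPHA };   // per-pixel alpha
::AlphaBlend(pDC->GetSafeHdc(), target.left, target.top,
             target.Width(), target.Height(),
             memDC, 0, 0, srcW, srcH, bf);                  // needs msimg32.lib

::SelectObject(memDC, hOld);
DeleteObject(hbm);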
My opinion is that most of the Win32 image handling code, and by extension its MFC equivalents like CImageList, CIcon, CImage, CBitmap, and so on, is inadequate for today's graphics needs. Handling per-pixel transparency in particular hardly ever works consistently. I usually store my images in a CImage and use ::AlphaBlend() everywhere to get them to a DC, or I use GetDIBits()/SetDIBits() and directly manipulate the RGBA entries (not very practical for scaling and similar operations, I admit). On the other hand, I understand what it's like having to maintain code that already uses these classes and wanting to give it a bit of a modern look...