Fastest way of plotting a point on screen in MFC C++ app - c++

I have an application that contains many millions of 3d rgb points that form an image when plotted. What is the fastest way of getting them to screen in a MFC application? I've tried CDC.SetPixelV in conjunction with a bitmap, which seems quite slow, and am looking towards a Direct3D or OpenGL window in my MFC view class. Any other good places to look?

Double buffering is your solution. There are many examples on codeproject. Check this one for example

Sounds like a point cloud. You might find some good information searching on that term.
3D hardware is the fastest way to take 3D points and get them into a 2D display, so either Direct3D or OpenGL seem like the obvious choices.
If the number of points is much greater than the number of pixels in your display, then you'll probably first want to cull points that are trivially outside the view. You put all your points in some sort of spatial partitioning structure (like an octree) and omit the points inside any node that's completely outside the viewing frustrum. This reduces the amount of data you have to push from system memory to GPU memory, which will likely be the bottleneck. (If your point cloud is static, and you're just building a fly through, and if your GPU has enough memory, you could skip the culling, send all the data at once, and then just update the transforms for each frame.)
If you don't want to use the GPU and instead write a software renderer, you'll want to render to a bitmap that's in the same pixel format as your display (to eliminate the chance of the blit need to do any pixels formatting as it blasts the bitmap to the display). For reasonable window sizes, blitting at 30 frames per second is feasible, but it might not leave much time for the CPU to do the rendering.

Related

drawing time series of millions of vertices using OpenGL

I'm working on a data visualisation application where I need to draw about 20 different time series overlayed in 2D, each consisting of a few million data points. I need to be able to zoom and pan the data left and right to look at it and need to be able to place cursors on the data for measuring time intervals and to inspect data points. It's very important that when zoomed out all the way, I can easily spot outliers in the data and zoom in to look at them. So averaging the data can be problematic.
I have a naive implementation using a standard GUI framework on linux which is way too slow to be practical. I'm thinking of using OpenGL instead (testing on a Radeon RX480 GPU), with orthogonal projection. I searched around and it seems VBOs to draw line strips might work, but I have no idea if this is the best solution (would give me the best frame rate).
What is the best way to send data sets consisting of millions of vertices to the GPU, assuming the data does not change, and will be redrawn each time the user interacts with it (pan/zoom/click on it)?
In modern OpenGL (versions 3/4 core profile) VBOs are the standard way to transfer geometric / non-image data to the GPU, so yes you will almost certainly end up using VBOs.
Alternatives would be uniform buffers, or texture buffer objects, but for the application you're describing I can't see any performance advantage in using them - might even be worse - and it would complicate the code.
The single biggest speedup will come from having all the data points stored on the GPU instead of being copied over each frame as a 2D GUI might be doing. So do the simplest thing that works and then worry about speed.
If you are new to OpenGL, I recommend the book "OpenGL SuperBible" 6th or 7th edition. There are many good online OpenGL guides and references, just make sure you avoid the older ones written for OpenGL 1/2.
Hope this helps.

Infinite cube world engine (like Minecraft) optimization suggestions?

Voxel engine (like Minecraft) optimization suggestions?
As a fun project (and to get my Minecraft-adict son excited for programming) I am building a 3D Minecraft-like voxel engine using C# .NET4.5.1, OpenGL and GLSL 4.x.
Right now my world is built using chunks. Chunks are stored in a dictionary, where I can select them based on a 64bit X | Z<<32 key. This allows to create an 'infinite' world that can cache-in and cache-out chunks.
Every chunk consists of an array of 16x16x16 block segments. Starting from level 0, bedrock, it can go as high as you want (unlike minecraft where the limit is 256, I think).
Chunks are queued for generation on a separate thread when they come in view and need to be rendered. This means that chunks might not show right away. In practice you will not notice this. NOTE: I am not waiting for them to be generated, they will just not be visible immediately.
When a chunk needs to be rendered for the first time a VBO (glGenBuffer, GL_STREAM_DRAW, etc.) for that chunk is generated containing the possibly visible/outside faces (neighboring chunks are checked as well). [This means that a chunk potentially needs to be re-tesselated when a neighbor has been modified]. When tesselating first the opaque faces are tesselated for every segment and then the transparent ones. Every segment knows where it starts within that vertex array and how many vertices it has, both for opaque faces and transparent faces.
Textures are taken from an array texture.
When rendering;
I first take the bounding box of the frustum and map that onto the chunk grid. Using that knowledge I pick every chunk that is within the frustum and within a certain distance of the camera.
Now I do a distance sort on the chunks.
After that I determine the ranges (index, length) of the chunks-segments that are actually visible. NOW I know exactly what segments (and what vertex ranges) are 'at least partially' in view. The only excess segments that I have are the ones that are hidden behind mountains or 'sometimes' deep underground.
Then I start rendering ... first I render the opaque faces [culling and depth test enabled, alpha test and blend disabled] front to back using the known vertex ranges. Then I render the transparent faces back to front [blend enabled]
Now... does anyone know a way of improving this and still allow dynamic generation of an infinite world? I am currently reaching ~80fps#1920x1080, ~120fps#1024x768 (screenshots: http://i.stack.imgur.com/t4k30.jpg, http://i.stack.imgur.com/prV8X.jpg) on an average 2.2Ghz i7 laptop with a ATI HD8600M gfx card. I think it must be possible to increase the number of frames. And I think I have to, as I want to add entity AI, sound and do bump and specular mapping. Could using Occlusion Queries help me out? ... which I can't really imagine based on the nature of the segments. I already minimized the creation of objects, so there is no 'new Object' all over the place. Also as the performance doesn't really change when using Debug or Release mode, I don't think it's the code but more the approach to the problem.
edit: I have been thinking of using GL_SAMPLE_ALPHA_TO_COVERAGE but it doesn't seem to be working?
gl.Enable(GL.DEPTH_TEST);
gl.Enable(GL.BLEND); // gl.Disable(GL.BLEND);
gl.Enable(GL.MULTI_SAMPLE);
gl.Enable(GL.SAMPLE_ALPHA_TO_COVERAGE);
To render a lot of similar objects, I strongly suggest you take a look into instanced draw : glDrawArraysInstanced and/or glDrawElementsInstanced.
It made a huge difference for me. I'm talking from 2 fps to over 60 fps to render 100000 similar icosahedrons.
You can parametrize your cubes by using Attribs ( glVertexAttribDivisor and friends ) to make them differents. Hope this helps.
It's on ~200fps currently, should be OK. The 3 main things that I've done are:
1) generation of both chunks on a separate thread.
2) tessellation the chunks on a separate thread.
3) using a Deferred Rendering Pipeline.
Don't really think the last one contributed much to the overall performance but had to start using it because of some of the shaders. Now the CPU is sort of falling asleep # ~11%.
This question is pretty old, but I'm working on a similar project. I approached it almost exactly the same way as you, however I added in one additional optimization that helped out a lot.
For each chunk, I determine which sides are completely opaque. I then use that information to do a flood fill through the chunks to cull out the ones that are underground. Note, I'm not checking individual blocks when I do the flood fill, only a precomputed bitmask for each chunk.
When I'm computing the bitmask, I also check to see if the chunk is entirely empty, since empty chunks can obviously be ignored.

C++ GUI Development - Bitmap vs. Vector Graphics CPU Usage

I'm currently in the process of designing and developing GUI's for some audio applications made in C++ (using the Juce framework).
So far I've been playing with using bitmap graphics to create custom sliders and dials, by using 'film strip' style images to animate the components (meaning when the user interacts with a slider it triggers a method that changes the offset of a film-strip image to change the components appearance). Depending on the size of the original image and the number of 'frames', the CPU usage level changes quite dramatically.
Firstly, what would be the most efficient bitmap file format to use in terms of CPU consumption? At the moment I'm using PNG images.
Secondly, would it be more efficient to use vector graphics for these kind of graphical components? I understand the main differences between bitmap and vector graphics, but I haven't found any information regarding their CPU usage levels with regard to GUI interaction.
Or would CPU usage be down to the particular methods/functions/libraries/frameworks being used?
Thanks!
Or would CPU consumption be down to the particular methods/functions/libraries/frameworks being used?
Any of these things could influence it.
Pixel based images might take a while to read off of disk the bigger they are. Compressed types might take more time to uncompress. Vector might take more time to render when are loaded.
That being said, I would definitely not expect that your choice of image type to have any impact on its performance. Since you didn't provide a code example it is hard to speculate beyond that.
In general, you would expect that the run-time costs of the images to happen when they are loaded. So whenever you create an image object. If you create an images all over the place, then maybe its expensive. It is possible that your film strip is recreating the images instead of loading them once and caching them.
Before choosing bitmap vs. vector graphics, investigate if your graphics processor supports vector or bitmap graphics. Some things take a long time to draw as vectors.
Have you tried double-bufferring?
This is where you write to a buffer in memory while the display (graphics processor) is loading another.
Load your bitmaps from the resource once. Store them as memory snapshots to avoid the additional cost of translating them from a format.
Does your graphic processor support "blitting"?
Blitting is where the graphics processor can copy a rectangular area in memory (bitmap) and display it along with apply optional operations before displaying (such as XOR with existing bits).
Summary:
To improve your rendering speed, only convert images from the file into a bitmap form once. Store this somewhere. Refer to this converted bitmap as needed. Next, investigate and implement double buffering. Lastly, investigate and use bit-blitting or blitting.
Other optimization rules apply here too, such as reviewing the design, removing requirements, loop unrolling, passing images via pointer vs. copying them, and reduce "if" statements by using boolean logic and Karnaugh (sp?) maps.
In general, calculations for rendering vector graphics are going to take longer than blitting a rectangular region of a bitmap to the screen. But for basic UI stuff, neither should be particularly intensive.
You probably should do some profiling. Perhaps you're redrawing much more frequently than necessary. Or perhaps the PNG is being decoded each time you try to draw from it. (I'm not familiar with Juce.)
For a straight Windows app, I'd probably render vector graphics into a device-dependent bitmap once on startup and then just blit from the bitmap to the screen. Using vector gives you DPI independence, and blitting from a device-dependent bitmap is about the fastest way to paint a block of pixels. I believe the color matching is done when you render to the device-dependent bitmap, so you don't even have the ICM overhead on the screen drawing.
Vector graphics was ditched long ago - bitmap graphics are more performant. The thing is that you can send a bitmap to the GPU once and then render it forever more by a simple copy.
Secondly, the GPU uses it's own texture compression. DirectX is DXT5, I believe, but when the GPU sees the texture, it doesn't care what you loaded it from.
However, a modern CPU even with a crappy integrated GPU should have absolutely no problem with simple GUI rendering. If you're struggling, then it's time to look again at the technique you're using. Perhaps your framework is slow or your use of it is suboptimal.

Which of these is faster?

I was wondering if it was faster to render a single quad the size of the window with a texture the size of a window than to draw the bitmap directly to the window using double buffering coupled with the platform specific way of drawing to a window.
The initial setup for textures tends to be relatively slow, but once that's done the drawing is quite fast -- in a typical case where graphics memory is available, it'll upload the texture to the memory on the graphics cards during initial setup, and after that, all the drawing will happen from there. At the same time, that initial upload will also typically include full a full mipmap down to 1x1 resolution, so you're uploading a bit more than just the full-resolution texture.
With platform specific drawing, you usually don't have quite as much work up-front. If only part of the bitmap is visible, only the visible part will be uploaded. If the bitmap is going to be scaled, it'll typically scale it on the CPU and send it to the card at the current scale (and never upload anything resembling a mipmap). OTOH, virtually every time something needs to be redrawn, it'll end up re-sending the bitmap data for the newly exposed area. It doesn't take much of that to lose the (often minor anyway) advantage of minimizing what was sent to start with.
Using textures is usually a lot faster, since most native drawing APIs aren't hardware accelerated.
It will very probably depend on the graphics card and driver.

How does Photoshop (Or drawing programs) blit?

I'm getting ready to make a drawing application in Windows. I'm just wondering, do drawing programs have a memory bitmap which they lock, then set each pixel, then blit?
I don't understand how Photoshop can move entire layers without lag or flicker without using hardware acceleration. Also in a program like Expression Design, I could have 200 shapes and move them around all at once with no lag. I'm really wondering how this can be done without GPU help.
Also, I don't think super efficient algorithms could justify that?
Look at this question:
Reduce flicker with GDI+ and C++
All you can do about DC drawing without GPU is to reduce flickering. Anything else depends on the speed of filling your memory bitmap. And here you can use efficient algorithms, multithreading and whatever you need.
Certainly modern Photoshop uses GPU acceleration if available. Another possible tool is DMA. You may also find it helpful to read the source code of existing programs like GIMP.
Double (or more) buffering is the way it's done in games, where we're drawing a ton of crap into a "back" buffer while the "front" buffer is being displayed. Then when the draw is done, the buffers are swapped (a pointer swap, not copies!) and the process continues in the new front and back buffers.
Triple buffering offers another bonus, in that you can start drawing two-frames-from-now when next-frame is done, but without forcing a buffer swap in the middle of the screen refresh. Many games do the buffer swap in the middle of the refresh, but you can sometimes see it as visible artifacts (tearing) on the screen.
Anyway- for an app drawing bitmaps into a window, if you've got some "slow" operation, do it into a not-displayed buffer while presenting the displayed version to the rendering API, e.g. GDI. Let the system software handle all of the fancy updating.