How much does glCallList boost performance?

I'm new to OpenGL. I wrote a program before getting to know OpenGL display lists. By now it's pretty difficult to retrofit them, as my code contains many non-'gl' lines everywhere in between. So I'm curious how much performance I've lost.

It depends. For static geometry they can help quite a lot, but display lists are deprecated in favor of VBOs (vertex buffer objects) and similar techniques that let you upload geometry data to the card once and then just re-issue draw calls that use that data.
Display lists (a nice idea back in OpenGL 1.0) turned out to be difficult to implement correctly in all corner cases and are being phased out in favor of more straightforward approaches.
However, if all you cache is geometry data, then by all means, lists are fine. But VBOs are the future, so I'd suggest learning those.

Display lists do boost performance, provided your user's graphics card has enough memory and supports them (most do).
There are several performance bottlenecks, one of which is the speed of pushing data to the graphics card. When you call an OpenGL function, the work is done either on the CPU or on the GPU (the graphics card's processor). The GPU is optimized for graphics operations, but the CPU has to send its data over to the GPU first, and that takes a lot of time.
If you use display lists, the data is sent to the GPU ahead of time where possible. Then, when you call a list, the CPU doesn't have to push the data to the GPU again.
That is a huge performance boost if you use them.
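As a rough sketch of that mechanism (legacy OpenGL 1.x), the geometry is recorded once into a display list and only glCallList is issued per frame:

    /* record the geometry once; the driver can keep it on the GPU */
    GLuint list = glGenLists(1);
    glNewList(list, GL_COMPILE);       /* compile the commands, don't execute yet */
    glBegin(GL_TRIANGLES);
    glVertex3f(-1.0f, -1.0f, 0.0f);
    glVertex3f( 1.0f, -1.0f, 0.0f);
    glVertex3f( 0.0f,  1.0f, 0.0f);
    glEnd();
    glEndList();

    /* ...in the render loop: no vertex data is pushed again... */
    glCallList(list);

    /* ...on shutdown... */
    glDeleteLists(list, 1);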

Corner case:
If you plan to run your app remotely over GLX, then display lists can be a huge win, because "pushing to the card" involves a costly libgl->network->xserver->libgl round trip.

drawing time series of millions of vertices using OpenGL

I'm working on a data visualisation application where I need to draw about 20 different time series overlaid in 2D, each consisting of a few million data points. I need to be able to zoom and pan left and right to look at the data, and to place cursors on it for measuring time intervals and inspecting data points. It's very important that when zoomed all the way out I can easily spot outliers and zoom in to look at them, so averaging the data can be problematic.
I have a naive implementation using a standard GUI framework on Linux which is way too slow to be practical. I'm thinking of using OpenGL instead (testing on a Radeon RX480 GPU), with an orthographic projection. I searched around and it seems that using VBOs to draw line strips might work, but I have no idea whether that is the best solution (i.e. would give me the best frame rate).
What is the best way to send data sets consisting of millions of vertices to the GPU, assuming the data does not change, and will be redrawn each time the user interacts with it (pan/zoom/click on it)?
In modern OpenGL (version 3/4 core profile), VBOs are the standard way to transfer geometric / non-image data to the GPU, so yes, you will almost certainly end up using VBOs.
Alternatives would be uniform buffers or texture buffer objects, but for the application you're describing I can't see any performance advantage in using them - they might even be worse - and they would complicate the code.
The single biggest speedup will come from having all the data points stored on the GPU instead of being copied over each frame, as a 2D GUI might be doing. So do the simplest thing that works and only then worry about speed.
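For illustration, a minimal sketch of that idea in OpenGL 3.3 core, where points (x,y float pairs) and numPoints are placeholders for one of your series: upload the data once as a static VBO, then just re-issue the draw call when the user pans or zooms.

    /* one-time setup: upload the series to GPU memory */
    GLuint vao, vbo;
    glGenVertexArrays(1, &vao);
    glBindVertexArray(vao);

    glGenBuffers(1, &vbo);
    glBindBuffer(GL_ARRAY_BUFFER, vbo);
    glBufferData(GL_ARRAY_BUFFER, numPoints * 2 * sizeof(float),
                 points, GL_STATIC_DRAW);          /* hint: the data won't change */

    glEnableVertexAttribArray(0);                  /* attribute 0 = vec2 position */
    glVertexAttribPointer(0, 2, GL_FLOAT, GL_FALSE, 0, (void *)0);

    /* per frame: update the orthographic projection uniform for pan/zoom, then draw */
    glDrawArrays(GL_LINE_STRIP, 0, numPoints);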
If you are new to OpenGL, I recommend the book "OpenGL SuperBible" 6th or 7th edition. There are many good online OpenGL guides and references, just make sure you avoid the older ones written for OpenGL 1/2.
Hope this helps.

How can I draw to the display, without OpenGL?

I've been learning OpenGL, and as I sit trying to write my VBOs, PBOs, VAOs, textures, quads, bindings, fragment shaders, vertex shaders, and a whole suite of other modern abstractions upon abstractions built after decades of evolution, I wonder: Isn't the display nothing but a large block of memory?
I've heard tales that in the "good ol' days" (such as on the Commodore 64), all you had to do was assign a value to an arbitrary byte in memory and the screen would change a pixel. Extremely simple and elegant. In the modern day this has changed, with layers upon layers of abstractions and safeguards, such that changing a pixel on your display feels several hundred feet away.
This begs the question, is it possible in the modern day to just "update a pixel of the screen"? Is it possible to write my own graphics driver or something, where I can send commands to some C wrapper which interfaces with the GPU to change those pixels? This is an extremely broad question, but I'm curious. The answer I'm looking for to this question would provide a rough outline of what you'd have to do in order to be able to arbitrarily get some C code to set a pixel on the screen, as well as a rough outline of why OpenGL has progressed the way it has - what problems did VBOs, PBOs, VAOs, bindings, shaders, etc. solve, and how we got to where we are today.
Isn't the display nothing but a large block of memory?
Yes, it is called a framebuffer.
I've heard of tales, that in the "good ol' days" (such as the Commodore 64)
Your current PC works like that right when you power it up! If you use the CPU to write into video memory, that is called a software renderer.
In the modern day, this has changed with layers upon layers of abstractions and safeguards, such that changing a pixel on your display is several hundred feet away.
No, they are not abstractions/safeguards for "changing pixels". Software renderers are hardly used anymore. Instead, you have to tell the GPU (which is another computer of its own) how to draw. That "talk" is what APIs like OpenGL do for you.
Now, the GPUs are meant to be fast at drawing, and that requires specialized code and data structures. Those are all the things you mention: VBOs, PBOs, VAOs, shaders, etc. (in OpenGL parlance). There is no way around that, because GPUs are different hardware.
is it possible in the modern day to just "update a pixel of the screen"?
Yes, but that will end up being drawn somehow by the GPU, even if it looks to you like a memory write.
Is it possible to write my own graphics driver or something, where I can send commands to some C wrapper which interfaces with the GPU to change those pixels?
Yes, but that "C wrapper" is the graphics driver. A graphics driver for a modern GPU is very complex.
what you'd have to do in order to be able to arbitrarily get some C code to set a pixel on the screen
You cannot write a portable "C program" that draws to a graphical screen, because the C standard does not concern itself with graphical displays.
So it depends on your operating system, your hardware, whether you want 2D or 3D acceleration support, the API you choose...
as well as a rough outline of why OpenGL has progressed the way it has - what problems did VBOs, PBOs, VAOs, bindings, shaders, etc. solve, and how we got to where we are today.
See above.
You can make your own frame buffer - that is just an integer array - and do rasterization on it, then use for example the Windows GDI function SetBitmapBits() to draw it to the display in one go. The final draw-to-display command depends on the operating system.
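For example, here is a minimal sketch using StretchDIBits, a close relative of the SetBitmapBits approach; it assumes the framebuffer is an array of 0x00RRGGBB pixels:

    #include <windows.h>
    #include <stdint.h>

    /* present a CPU-side framebuffer (top-down rows of 0x00RRGGBB pixels)
       to a window device context in one call */
    void present(HDC hdc, const uint32_t *pixels, int width, int height)
    {
        BITMAPINFO bmi = {0};
        bmi.bmiHeader.biSize        = sizeof(BITMAPINFOHEADER);
        bmi.bmiHeader.biWidth       = width;
        bmi.bmiHeader.biHeight      = -height;   /* negative height = top-down rows */
        bmi.bmiHeader.biPlanes      = 1;
        bmi.bmiHeader.biBitCount    = 32;
        bmi.bmiHeader.biCompression = BI_RGB;

        StretchDIBits(hdc, 0, 0, width, height,  /* destination rectangle */
                           0, 0, width, height,  /* source rectangle      */
                      pixels, &bmi, DIB_RGB_COLORS, SRCCOPY);
    }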
How you do the rasterization on your framebuffer is completely up to you. You can use the CPU to draw individual pixels or rasterize lines and triangles, see for example this demo of my old CPU graphics engine using Windows GDI: https://youtu.be/GFzisvhtRS4.
Using the CPU is fine as long as you do not rasterize large datasets. In my experience, the limit for real-time 60 fps rendering on the CPU is ~50k lines per frame.
If you want to rasterize really large datasets, you have to use a GPU in some way. Since the framebuffer is just an integer array, you can transfer it to/from the GPU using OpenCL or CUDA and do all the rasterization on the GPU, extremely fast and in parallel - especially if your dataset already happens to be in video memory. For this you will need an additional z-buffer to decide which pixels may be overdrawn by occluding geometry. This way you can rasterize approximately 30 million lines per frame at 60 fps. This demo is rendered on the GPU in real time using OpenCL: https://youtu.be/lDsz2maaZEo
Is it possible in the modern day to just "update a pixel of the screen"?
Yes. In Windows, for example, you can use SetPixel() to draw a pixel or BitBlt() to draw in bulk. See this Q/A.
This works fine, but it means you're using the CPU for rendering, and you'll find the GPU is much more effective for this task, especially if you require a decent frame rate and non-trivial graphics. The reason this "whole suite of other modern abstractions upon abstractions" exists is to serve as an interface to the GPU, since it has an independent set of memory and a totally different execution model. Other GPU APIs (OpenCL, DirectX, Vulkan, etc.) all have the same kind of abstractions.
I've glossed over many nuances but I hope the point gets across.

OpenGL rendering/updating loop issues

I'm wondering how e.g. graphics (/game) engines do their job with lots of heterogeneous data, while a customized simple rendering loop turns into a nightmare when you have some small changes.
Example:
First, let's say we have some blocks in our scene.
Graphic-Engine: create cubes and move them
Customized: create a cube template (vertices, normals, etc.), copy and translate it to each position, and copy everything into e.g. a VBO. One glDraw* call does the job.
Second, some weird logic. We want blocks 1, 4, 7, ... to rotate on the x-axis, 2, 5, 8, ... on the y-axis and 3, 6, 9, ... on the z-axis, with a rotation speed proportional to the camera distance.
Graphic-Engine: manipulate the object's matrix and it works.
Customized: (I think) one glDraw* call per object with a changing model-matrix uniform is not a good idea, so the transformation matrix should be something like an attribute? I'd have to update them every frame.
Third, a block should disappear if the distance to the camera is lower than any const value Q.
Graphic-Engine: if (object.distance(camera) < Q) scene.drop(object);
Customized: (I think) our VBO is invalid and we have to recreate it?
Back to the very first sentence: it feels like engines do those manipulations for free, while we have to rethink how to provide and update the data. And while we do so, the engine might (but I actually don't know) just say: 'update whatever you want, I'm going to send all the matrices anyway'.
Another example: what about a voxel-based world (e.g. Minecraft) where we only draw the visible surface, and we are able to throw a bomb and destroy many voxels? If the world's view data is in one huge buffer, we only have one glDraw* call but have to recreate the whole buffer every time. If there are smaller chunks, we have many glDraw* calls and still have to manipulate buffers, just smaller ones.
So is it a good deal to send, let's say, 10 MB of buffer update data instead of making 2 gl* calls with 1 MB each? How many updates are okay? Should a rendering loop deal with lazy updates?
I'm searching for a guide on what a 60 fps application should be able to update/draw per frame, to get a feeling for what is possible. In my tests, every optimization attempt just becomes another bottleneck.
And I don't want those tutorials that say: hey, there is a cool new gl*Instanced call which is super fast, buuuuut you have to check whether your GPU supports it. I'd rather consider that an optimization than a meaningful first implementation.
Do you have any ideas, sources, best practices or rule of thumb how a rendering/updating routine best play together?
My questions are all nearly the same:
How many updates per frame are okay on today's hardware?
Can I lazy-load data so it's available after a few frames, without freezing my application?
Do I have to do small updates and profile my loop to see whether there are a few microseconds left before the next render?
Maybe I should implement a real-time profiler that gets a feeling over time for how expensive updates are and can determine the number of updates per frame?
Thank you.
It's unclear how any of your questions relate to your "graphics engine" vs "customized" examples. All the updates you do with a "graphics engine" are translated into those OpenGL calls in the end.
In brief:
How many updates per frame are okay on today's hardware?
Today's PCIe bandwidth is huge (it can go as high as 30 GB/s). However, to utilize it fully you have to reduce the number of transactions by consolidating OpenGL calls. The exact number of updates you can afford depends entirely on the hardware, the drivers, and the way you use them, and graphics hardware is diverse.
This is the kind of answer you didn't want to hear, but unfortunately you have to face the truth: to reduce the number of OpenGL calls you have to use the newer APIs. E.g. instead of setting each uniform individually, you are better off submitting a bunch of them through uniform buffer objects. Instead of submitting the MVP matrix of each model individually, it's better to use instanced rendering (see the sketch below). And so on.
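As a sketch of the instancing idea (OpenGL 3.3+; the buffer name and attribute locations 3..6 are only illustrative), all per-object model matrices go into one VBO and a single instanced draw call renders every object:

    /* matrices = numInstances * 16 floats (column-major model matrices) */
    GLuint instanceVBO;
    glGenBuffers(1, &instanceVBO);
    glBindBuffer(GL_ARRAY_BUFFER, instanceVBO);
    glBufferData(GL_ARRAY_BUFFER, numInstances * 16 * sizeof(float),
                 matrices, GL_DYNAMIC_DRAW);

    /* a mat4 attribute occupies four consecutive vec4 locations (here 3..6) */
    for (int i = 0; i < 4; ++i) {
        glEnableVertexAttribArray(3 + i);
        glVertexAttribPointer(3 + i, 4, GL_FLOAT, GL_FALSE, 16 * sizeof(float),
                              (void *)(sizeof(float) * 4 * i));
        glVertexAttribDivisor(3 + i, 1);   /* advance once per instance, not per vertex */
    }

    /* one call draws every object; each instance reads its own matrix */
    glDrawElementsInstanced(GL_TRIANGLES, indexCount, GL_UNSIGNED_INT, 0, numInstances);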
An even more radical approach would be to move to a lower-level (and newer) API, namely Vulkan, which aims to solve exactly this problem: the cost of submitting work to the GPU.
Can I lazy-load data to have it after a few frames, but without freezing my application
Yes, you can upload buffer objects asynchronously. See Buffer Object Streaming for details.
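One common pattern from that page is buffer "orphaning"; a rough sketch (vbo, size and newData are placeholders):

    glBindBuffer(GL_ARRAY_BUFFER, vbo);
    /* re-specifying the store with NULL "orphans" the old memory, so the
       driver can hand out a fresh block instead of stalling on a buffer
       the GPU may still be reading */
    glBufferData(GL_ARRAY_BUFFER, size, NULL, GL_STREAM_DRAW);
    glBufferSubData(GL_ARRAY_BUFFER, 0, size, newData);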
Do I have to do small updates and profile my loop if there are some microseconds left till next rendering?
Maybe I should implement a real-time profiler which gets a feeling over time, how expensive updates are and can determine the amount of updates per frame?
You don't need any of these if you do it asynchronously.

Blitting surfaces in OpenGL

Both SDL and Game Maker have the concept of surfaces: images that you can modify on the fly and display. I'm using OpenGL 1 and I'd like to know whether OpenGL has this concept of a surface.
The only ways I came up with were:
Every frame, create/destroy a new texture as needed.
Every frame, update said texture as needed.
These approaches don't seem very performant, but I see no alternative. Maybe this is how surfaces are implemented in the mentioned engines.
Yes, those two are the ways you would do it in OpenGL 1.0. I don't think there are any other means as far as the 1.0 spec is concerned.
Link : https://www.opengl.org/registry/doc/glspec10.pdf
Do note that textures are stored in device (GPU) memory, which is fast to access for shading, while the above approaches copy data between host (CPU) memory and device memory. Hence the performance hit is the speed of the host-device copy.
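For the second approach, a minimal sketch (glTexSubImage2D needs OpenGL 1.1; tex, w, h and surfacePixels are placeholders): allocate the texture once, then overwrite its contents each frame instead of re-creating the texture object.

    glBindTexture(GL_TEXTURE_2D, tex);

    /* one-time allocation, no initial data */
    glTexImage2D(GL_TEXTURE_2D, 0, GL_RGBA, w, h, 0,
                 GL_RGBA, GL_UNSIGNED_BYTE, NULL);

    /* per frame: copy the CPU-side surface pixels into the existing texture */
    glTexSubImage2D(GL_TEXTURE_2D, 0, 0, 0, w, h,
                    GL_RGBA, GL_UNSIGNED_BYTE, surfacePixels);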
Why are you limited to the OpenGL 1.0 spec? Once you go higher, you start getting more options:
Use GLSL shaders to directly edit the content of one texture and output the result to another texture. The processing is done on the GPU, and a device-device copy is as fast as it gets.
Use CUDA: map the texture to a CUDA array and use a kernel to modify its content. Or use OpenCL for non-NVIDIA cards.
Either would be the better scenario, as long as the modification can be executed in parallel and so benefits from the GPU.
I would suggest trying the CPU copy method, as it might be fast enough for your needs. The host-device copy is getting faster with the latest hardware. You might be able to get real-time 60 fps or higher even with this copy, unless you plan to do this for a lot of textures.

mipmap generation in opengl - is it hardware accelerated?

The purpose here isn't rendering but GPGPU; it's for image blurring:
given an image, I need to blur it with a fixed, given separable kernel (see e.g. Separable 2D Blur Kernel).
For GPU processing, a good, popular method would be to first filter the rows, then filter the columns, using the vertex shader and the fragment shader to do so (*).
However, if I have a fixed-sized kernel, I think I can use a fast-calculated mipmap that is close to the level I want, and then upsample it (as was suggested here) .
The question is therefore: will an OpenGL-created mipmap be faster than a mipmap I create myself using the method of (*)?
Put another way: is mipmap creation optimized on the GPU itself? Will it always outperform (speed-wise) user-written GLSL code, or does it depend on the graphics card?
Edit:
Thanks for the replies (Kahler, Jean-Simon Brochu). However, I still haven't seen any resources that explicitly say whether mipmap generation by the GPU is faster than user-created mipmaps because of dedicated mipmap-generation hardware...
OpenGL does not care how the functions are implemented.
OpenGL is a set of specifications, and glGenerateMipmap is among them.
Anyone can write a software renderer or develop a video card compliant with the specification. If it passes the tests, it's ~OpenGL certified~.
That means no function is required to be performed on the CPU or on the GPU, or anywhere in particular; implementations just have to produce the results OpenGL expects.
Now for the practical side:
Nowadays you can just assume that mipmap generation is done by the video card, because the major vendors have adopted this approach.
If you really want to know, you will have to check the specific video card you are programming for.
As for performance, assume you can't beat the video card.
Even if you come up with some highly optimized code running on a high-tech, full-of-features CPU, you will have to upload the mipmaps you generated to the GPU, and that operation alone will probably take more time than letting the GPU do the work after you've uploaded the full-resolution texture.
And if you implement the mipmapping as a shader, it is still unlikely to beat the hard-coded (maybe even hard-wired) built-in function (and that's the code alone, not counting that the built-in path may schedule better, run separately, etc.).
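For reference, a minimal sketch of the built-in path (OpenGL 3.0+; width, height and pixels are placeholders). Timing this against your own shader-based downsample on the target card is the only way to know for sure:

    GLuint tex;
    glGenTextures(1, &tex);
    glBindTexture(GL_TEXTURE_2D, tex);
    glTexImage2D(GL_TEXTURE_2D, 0, GL_RGBA8, width, height, 0,
                 GL_RGBA, GL_UNSIGNED_BYTE, pixels);
    glGenerateMipmap(GL_TEXTURE_2D);   /* the driver/GPU builds all mip levels */
    glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_MIN_FILTER, GL_LINEAR_MIPMAP_LINEAR);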
This site explains the glGenerateMipmap history better =))