I am implementing a map renderer for Quake. I am currently running through the arrays of vertices and sending them one at a time. I was told that by using vertex arrays, I can greatly speed up the rendering process by sending vertices in a batch. Now, I have just looked at display lists and finally VBO's or vertex buffer objects. VBO's mention a great advantage in relation to client/server communication. IF I am just going to be developing a client and not a server, are VBO's still applicable to what I am doing?
What are games using currently in the OpenGL spectrum for quick vertex processing?
When they say "client/server" communication, they are not talking about a internet network.
Client = The CPU
Server = The GFX hardware
These are two separate pieces of hardware. While they are (usually) attached to the same motherboard, they still need to communicate with each other. A graphics card doesn't usually have access to the main memory on your motherboard, so vertices (and textures, indices etc.) need to actually be sent to the graphics device.
Short story: Use Vertex Buffer Objects. For static data (like a Quake map) there is nothing better. The vertices are sent to the graphics device (server) once and stay there. When you draw things vertex by vertex (using something like glBegin(GL_TRIANGLES)) the vertices are sent across every frame, which, as you can imagine, is pretty inefficient.
VBOs are the modern answer.
Do not misunderstand the terms "client" and "server" in the OpenGL-sense. This is not the difference between Apache (HTTP server) and Chrome (HTTP client, among other things). In OpenGL terms, the client is you/your code. The server is the graphics driver/hardware.
VBOs allow you to store vertex information on the graphics hardware, which can let you avoid sending each vertex to the card every time you use it. This is a huge advantage.
You ask about vertex processing, but to answer your question we'd have to clarify what you meant. If you mean getting vertices to the graphics card, the answer is VBOs. If we're talking about manipulating vertex data this can efficiently be achieved through Shaders.
Related
I am not a graphics programmer, I use C++ and C mainly, and every time I try to go into OpenGL, every book, and every resource starts like this:
GLfloat Vertices[] = {
some, numbers, here,
some, more, numbers,
numbers, numbers, numbers
};
Or they may even be vec4.
But then you do something like this:
for(int i = 0; i < 10000; i++)
for(int j = 0; j < 10000; j++)
make_vertex();
And you get a problem. That loop is going to take a significant amount of time to finish- and if the make_vertex() function is anything like a saxpy or something of the sort, it is not just a problem... it is a big problem. For example, let us assume I wish to create fractal terrain. For any modern graphic card this would be trivial.
I understand the paradigm goes like this: Write the vertices manually -> Send them over to the GPU -> GPU does vertex processing, geometry, rasterization all the good stuff. I am sure it all makes sense. But why do I have to do the entire 'Send it over' step? Is there no way to skip that entire intermediary step, and just create vertices on the GPU, and draw them, without the obvious bottleneck?
I would very much appreciate at least a point in the right direction.
I also wonder if there is a possible solution without delving into compute shaders or CUDA? Does openGL or GLSL not provide a suitable random function which can be executed in parallel?
I think what you're asking for could work by generating height maps with a compute shader, and mapping that onto a grid with fixed spacing which can be generated trivially. That's a possible solution off the top of my head. You can use GL Compute shaders, OpenCL, or CUDA. Details can be generated with geometry and tessellation shaders.
As for preventing the camera from clipping, you'd probably have to use transform feedback and do a check per frame to see if the direction you're moving in will intersect the geometry.
Your entire question seems to be built on a huge misconception, that vertices are the only things which need to be "crunched" by the GPU.
First, you should understand that GPUs are far more superior than CPUs when it comes to parallelism (heck, GPUs sacrifice conditional control jumping for the sake of parallelism). Second, shaders and these buffers you make are all stored on the GPU after being uploaded by the CPU. The reason you don't just create all vertices on the GPU? It's the same reason for why you load an image from the hard drive instead of creating a raw 2D array and start filling it up with your pixel data inline. Even then, your image would be stored in the executable program file, which is stored on the hard disk and only loaded to memory when you run it. In an actual application, you'll want to load your graphics off assets stored somewhere (usually the hard drive). Why not let the GPU load the assets from the hard drive by itself? The GPU isn't connected to a hardware's storage directly, but barely to the system's main memory via some BUS. That's because to connect to any storage directly, the GPU will have to deal with the file system which is managed by the OS. That's one of the things the CPU would be faster at doing since we're dealing with serialized data.
Now what shaders deal with is this data you upload to the GPU (vertices, texture coordinates, textures..etc). In ancient OpenGL, no one had to write any shaders. Graphics drivers came with a builtin pipeline which handles regular rendering requests for you. You'd provide it with 4 vertices, 4 texture coordinates and a texture among other things (transformation matrices..etc), and it'd draw your graphics for you on the screen. You could go a bit farther and add some lights to your scene and maybe customize a few things about it, but things were still pretty tight. New OpenGL specifications gave more freedom to the developer by allowing them to rewrite parts of the pipeline with shaders. The developer becomes responsible for transforming vertices into place and doing all sort of other calculations related to lighting etc.
I would very much appreciate at least a point in the right direction.
I am guessing it has something to do with uniforms, but really, with
me skipping pages, I really cannot understand how a shader program
runs or what the lifetime of the variables is.
uniforms are variables you can send to the shaders from the CPU every frame before you use it to render graphics. When you use the saturation slider in Photoshop or Gimp, it (probably) sends the saturation factor value to the shader as a uniform of type float. uniforms are what you use to communicate little settings like these to your shaders from your application.
To use a shader program, you first have to set it up. A shader program consists of at least 2 types of shaders linked together, a fragment shader and a vertex shader. You use some OpenGL functions to upload your shader sources to the GPU, issue an order of compilation followed by linking, and it'll give you the program's ID. To use this program, you simply glUseProgram(programId) and everything following this call will use it for drawing. The vertex shader is the code that runs on the vertices you send to position them on the screen correctly. This is where you can do transformations on your geometry like scaling, rotation etc. A fragment shader runs at some stage afterwards using interpolated (transitioned) values outputted from the vertex shader to define the color and the depth of every unit fragment on what you're drawing. This is where you can do post-processing effects on your pixels.
Anyway, I hope I've helped making a few things clearer to you, but I can only tell you that there are no shortcuts. OpenGL has quite a steep learning curve, but it all connects and things start to make sense after a while. If you're getting so bored of books and such, then consider maybe taking code snippets of every lesson, compile them, and start messing around with them while trying to rationalize as you go. You'll have to resort to written documents eventually, but hopefully then things will fit easier into your head when you have some experience with the implementation components. Good luck.
Edit:
If you're trying to generate vertices on the fly using some algorithm, then try looking into Geometry Shaders. They may give you what you want.
You probably want to use CUDA for the things you are used to do in C or C++, and let OpenGL access the rasterizer and other graphics stuff.
OpenGL an CUDA interact somehow nicely. A good entry point to customize the contents of a buffer object is here: http://docs.nvidia.com/cuda/cuda-runtime-api/group__CUDART__OPENGL.html#group__CUDART__OPENGL_1g0fd33bea77ca7b1e69d1619caf44214b , with cudaGraphicsGLRegisterBuffer method.
You may also want to have a look at the nbody sample from NVIDIA GPU SDK samples the come with current CUDA installs.
I want to find a way to send all the geometry from an opengl framebuffer to a remote computer, who would do the rendering. This would allow me to have very complex simulations running on some kind of a big supercomputer, and rendered on a small mobile or simply cheap client machine doing the rendering.
Before starting digging in my code, I though it would be relatively easy: let's copy the vertex arrays and send it through the network, using boost::serialisation for example, and that's it. But my geometry are encapsulated, which prevents me from accessing it from where I want to.
I have been able to render into a framebuffer instead of rendering directly on screen though, and I was wondering if there is a way to retrieve data from OpenGL's fbo's in anyway?
First your terminology is wrong. Frame Buffer Objects are encapsulations of off-screen images/surfaces and don't hold geometry.
Second: What you imagine has been implemented already by the VirtualGL project (however it's stuck at a rather old OpenGL profile and doesn't support modern GPUs).
Also X11/GLX always supported indirect OpenGL operation, i.e. a remote machine would send OpenGL commands to the local display server, which is exactly what you probably think of. But this has a major drawback: Network bandwidth becomes the major bottleneck.
Maya promo video explains how GPU Cache affects user making application run faster. In frameworks like Cinder we redraw all geopetry we want to be in the scene on each frame update sending it to video card. So I worder what is behind GPU Caching from a programmer prespective? What OpenGL/DirectX APIs are behind such technology? How to "Cache" my mesh in GPU memory?
There is, to my knowledge, no way in OpenGL or DirectX to directly specify what is to be, and not to be, stored and tracked on the GPU cache. There are however methodologies that should be followed and maintained in order to make best use of the cache. Some of these include:
Batch, batch, batch.
Upload data directly to the GPU
Order indices to maximize vertex locality across the mesh.
Keep state changes to a minimum.
Keep shader changes to a minimum.
Keep texture changes to a minimum.
Use maximum texture compression whenever possible.
Use mipmapping whenever possible (to maximize texel sampling locality)
It is also important to keep in mind that there is no single GPU cache. There are multiple (vertex, texture, etc.) independent caches.
Sources:
OpenGL SuperBible - Memory Bandwidth and Vertices
GPU Gems - Graphics Pipeline Performance
GDC 2012 - Optimizing DirectX Graphics
First off, the "GPU cache" terminology that Maya uses probably refers to graphics data that is simply stored on the card refers to optimizing a mesh for device-independent storage and rendering in Maya . For card manufacturer's the notion of a "GPU cache" is different (in this case it means something more like the L1 or L2 CPU caches).
To answer your final question: Using OpenGL terminology, you generally create vertex buffer objects (VBO's). These will store the data on the card. Then, when you want to draw, you can simply instruct the card to use those buffers.
This will avoid the overhead of copying the mesh data from main (CPU) memory into graphics (GPU) memory. If you need to draw the mesh many times without changing the mesh data, it performs much better.
I'm writing a 3d application for the Playbook, which has a PowerVR SGX 540. I noticed that if I stuff enough data in the VBO through opengl, I can cause the device to crash (not just the application, but the entire device, requiring a hard reboot). To cause the crash, I sent data for a model with ~300k triangles and ~150k vertices. I sent normal data for the vertices as well.
I found that the problem doesn't occur if I send less data (tried another model with half the triangles and vertices). Also, the issue doesn't occur if I use vertex arrays (though it's incredibly slow).
I'd like to know:
Is what I'm seeing a common result for mobile hardware? That is, is a 300k tri model with 150k vertices and normals overkill?
Can I check how much memory I have available for VBO usage outside of testing a bunch of different model sizes (it takes a good five minutes to recover the device from a crash)?
Could anything else be causing this issue? I've provided some additional information:
I'm using Qt for my GUI, and drawing the 3d scene to an FBO before it's painted to the GUI (I haven't checked if redoing all this without a UI by creating an EGL window and drawing to that recreates the problem yet -- that'll take awhile).
To verify it wasn't me using OpenGL poorly, I tried both using raw OpenGL calls for all the 3d stuff, and also doing everything with OpenSceneGraph. Both methods fail in the exact same way (VBO works with less data, vertex arrays work, increased VBO data causes a crash).
The program works fine on my desktop. Unfortunately, I don't have any other mobile devices I can test my application out on.
OpenGL ES only supports unsigned short (16 bit) as data type for indices, so if you're using an index array, you're over that limit.
I'm new to OpenGL. I have written a programm before getting to know OpenGL display lists. By now, it's pretty difficult to exploit them as my code contains many not 'gl' lines everywhere in between. So I'm curious how much I've lost in performance..
It depends, for static geometry, it can help quite a lot, but display lists are being deprecated in favor of VBOs (vertex buffer objects) and similar techniques that allow you to upload geometry data to the card once then just re-issue draw calls that use that data.
Display-lists (which was a nice idea in OpenGL 1.0) have turned out to be difficult to implement correctly in all corner cases, and are being phased out for more straightforward approaches.
However, if all you cache are geometry data, then by all means, lists are ok. But VBOs are the future so I'd suggest to learn those.
Call lists are boosting performance. If your users graphics card has enough memory, and supports Call lists (most of them):
There are several "performance bottlenecks", of which one of them is the pushing speed of data to the graphics card. When you call a OpenGL function, it will either calculate on the CPU or on the GPU (graphics card's processor). The GPU is optimized for graphics operations, but, the CPU must send it's data over to the GPU. This requires a lot of time.
If you use call lists; the data is sent (prematurely) to the GPU, if possible. Then when you call a call list, the CPU won't have to push data to the GPU.
Huge performance boost if you use them.
Corner case:
If you plan to run your app remotely over GLX then display lists can be a huge win because "pushing to the card" involves a costly libgl->network->xserver->libgl trip.