C++, OpenGL - Rendering a large number of... teapots

I am quite new to OpenGL programming. My goal was to set up object-oriented graphics programming, and I can proudly say I've made some progress. Now I have a different problem.
Let's say we have a working program that can draw one, two, or many rotating teapots. I did this by using a list inside my class. The raw code of the drawing function is here:
void Draw(void)
{
    glClear(GL_COLOR_BUFFER_BIT | GL_DEPTH_BUFFER_BIT);
    glMatrixMode(GL_MODELVIEW);
    glPushMatrix();
    for (list<teapot>::iterator it = teapots.begin(); it != teapots.end(); ++it) {
        glTranslatef(it->pos.x, it->pos.y, it->pos.z);
        glRotatef(angle, it->ang.x, it->ang.y, it->ang.z);
        glutSolidTeapot(it->size);
        // undo the rotation and translation so the next teapot starts from the same matrix
        glRotatef(angle, -it->ang.x, -it->ang.y, -it->ang.z);
        glTranslatef(-it->pos.x, -it->pos.y, -it->pos.z);
    }
    glPopMatrix();
    glutSwapBuffers();
}
Everything is great, but when I draw a large number of teapots - say, 128 in two rows - my FPS drops. I don't know whether it is just a hardware limit or whether I am doing something wrong. Maybe glPushMatrix() and glPopMatrix() should happen more often? Or less often?

You're using an old, deprecated part of OpenGL (called "immediate mode") in which all the graphics data is sent from the CPU to the GPU every frame: inside glutSolidTeapot() is code that does something like glBegin(GL_TRIANGLES) followed by lots of glVertex3f(...) and finally glEnd(). The reason that's deprecated is because it's a bottleneck. GPUs are highly parallel and are capable of processing many triangles at the same time, but they can't do that if your program is sending the vertices one-at-a-time with glVertex3f.
You should learn about the modern OpenGL API, in which you start by creating a "buffer object" and loading your vertex data into it — basically uploading your shape into the GPU's memory once, up-front — and then you can issue lots of calls telling the GPU to draw triangles using the vertices in that buffer object, instead of having to send all the vertices again every time.
(Unfortunately, this means you won't be able to use glutSolidTeapot(), since that draws in immediate mode and doesn't know how to produce vertex data for a buffer object. But I'm sure you can find a teapot model somewhere on the web.)
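For illustration, here is a minimal sketch of that buffer-object workflow, assuming an OpenGL 3+ context with an extension loader already initialized; teapotVertices and teapotVertexCount are hypothetical names for vertex data you would load from a model:
// Minimal sketch, not a drop-in replacement for the code above.
GLuint vao, vbo;
glGenVertexArrays(1, &vao);
glGenBuffers(1, &vbo);

glBindVertexArray(vao);
glBindBuffer(GL_ARRAY_BUFFER, vbo);
// Upload the vertex data to GPU memory once, up front.
glBufferData(GL_ARRAY_BUFFER, teapotVertexCount * 3 * sizeof(float),
             teapotVertices, GL_STATIC_DRAW);
// Attribute 0: three floats (position) per vertex.
glVertexAttribPointer(0, 3, GL_FLOAT, GL_FALSE, 3 * sizeof(float), (void*)0);
glEnableVertexAttribArray(0);

// Per frame: only a draw command is issued, no vertex data is resent.
glBindVertexArray(vao);
glDrawArrays(GL_TRIANGLES, 0, teapotVertexCount);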
Open.gl is a decent tutorial that I know of for modern-style OpenGL, but I'm sure there are others as well.

Wyzard is right, partially. Besides the fact that you are using the old, deprecated API, where on each draw call you submit all your data from the CPU to the GPU again and again, you also expect to maintain a decent frame rate while rendering the same geometry multiple times. So in fact, keeping such an approach to geometry rendering while using the programmable pipeline will not gain you much either. You will start noticing the FPS drop after roughly 40-60 objects (it depends on your GPU).
What you really need is called batched drawing. Batched drawing can use different techniques, all of which imply using modern OpenGL, as we are talking here about data buffers (arrays of vertices, in your case, which you upload to the GPU). You can either push all the geometry into a single vertex buffer or use instanced rendering commands. In your case, if all you are after is drawing the same mesh multiple times, the second technique is a perfect solution. There are more complex techniques, like indirect multi-draw commands, which allow you to draw very large quantities of different geometry with a single draw call, but those are pretty advanced for beginners. Anyway, the bottom line is that you must move to modern OpenGL and start using geometry batching if you want to keep your app's FPS high while drawing large numbers of meshes.
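To make the instanced-rendering suggestion concrete, here is a hedged sketch (OpenGL 3.3 for glVertexAttribDivisor); offsets is a hypothetical std::vector<float> holding one x,y,z triple per teapot, and the mesh VAO, index buffer and indexCount are assumed to be set up already:
GLuint instanceVbo;
glGenBuffers(1, &instanceVbo);
glBindBuffer(GL_ARRAY_BUFFER, instanceVbo);
glBufferData(GL_ARRAY_BUFFER, offsets.size() * sizeof(float), offsets.data(), GL_STATIC_DRAW);

// Attribute 1 holds the per-instance position: advance it once per instance, not per vertex.
glVertexAttribPointer(1, 3, GL_FLOAT, GL_FALSE, 3 * sizeof(float), (void*)0);
glEnableVertexAttribArray(1);
glVertexAttribDivisor(1, 1);

// One draw call renders every copy; the vertex shader adds the per-instance offset.
glDrawElementsInstanced(GL_TRIANGLES, indexCount, GL_UNSIGNED_INT, 0,
                        (GLsizei)(offsets.size() / 3));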

Related

C++ OpenGL array of coordinates to draw lines/borders and filled rectangles?

I'm working on a simple GUI for my application in OpenGL, and all I need is to draw a bunch of rectangles and a 1px border around them, instead of going with glBegin and glEnd for each widget that has to be drawn (which can reduce performance). I need to know if this can be done with some sort of arrays/lists (batched data) of coordinates and their colors.
Requirements:
Rectangles are simply filled, either with one color for all corners or with a color per corner (mainly to form gradients).
Lines/borders are simple, with one color and 1px thick, but they may not always be closed (i.e. they do not form a loop).
Use of textures/images is excluded. Only geometry data.
Must be compatible with older OpenGL versions (down to version 1.3)
Is there a way to achieve this with some sort of arrays and not glBegin and glEnd? I'm not sure how to do this for lines/borders.
I've seen this kind of implementation in Gwen GUI but it uses textures.
Example: jQuery EasyUI Metro Theme
In any case, in modern OpenGL you should refrain from using old-fashioned API calls like glBegin and the like. You should use the purer approach that was introduced with core contexts in OpenGL 3.0. The philosophy behind it is to get much closer to the way modern hardware actually functions. DirectX 10 took this approach, and so, to some extent, did OpenGL ES.
It means no more display lists, no more immediate mode, no more glVertex or glTexCoord. In any case, the drivers were already constructing VBOs behind this API, because the hardware only understands that. So the OpenGL core "initiative" is to reduce OpenGL implementation complexity in order to let the vendors focus on the hardware and stop producing bad drivers with buggy support.
Considering that, you should go with VBOs: make one interleaved buffer, or multiple separate buffers, to store position and color information, then bind them to attributes and use a shader combination to render the whole thing. The attributes you declare in the vertex shader are the attributes you bound using glBindVertexBuffer.
good explanation here:
http://www.opengl.org/wiki/Vertex_Specification
The recommended way is then to make one vertex buffer for the whole GUI, putting every element one after another in the buffer; you can then render the whole GUI in one draw call. This is how you will get the best performance.
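As a rough sketch of that idea (illustrative names only, assuming a core-profile context and a shader program with position at attribute 0 and color at attribute 1):
struct GuiVertex { float x, y; float r, g, b, a; };
std::vector<GuiVertex> guiVertices;            // append 6 vertices (2 triangles) per rectangle

GLuint guiVbo;
glGenBuffers(1, &guiVbo);
glBindBuffer(GL_ARRAY_BUFFER, guiVbo);
glBufferData(GL_ARRAY_BUFFER, guiVertices.size() * sizeof(GuiVertex),
             guiVertices.data(), GL_STATIC_DRAW);

// Both attributes read from the same interleaved buffer.
glVertexAttribPointer(0, 2, GL_FLOAT, GL_FALSE, sizeof(GuiVertex), (void*)0);
glVertexAttribPointer(1, 4, GL_FLOAT, GL_FALSE, sizeof(GuiVertex), (void*)(2 * sizeof(float)));
glEnableVertexAttribArray(0);
glEnableVertexAttribArray(1);

// The entire GUI in one draw call.
glDrawArrays(GL_TRIANGLES, 0, (GLsizei)guiVertices.size());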
Then, if your GUI has dynamic elements, this is no longer possible, except by using glBufferSubData or the like, which has complex performance implications. You are better off cutting your vertex buffer into as many buffers as are needed to compose the independent parts; then you can render with uniforms modified between each draw call at will, to configure whatever change of look is necessary in the dynamic parts.

What is the point of an SDL2 Texture?

I'm kind of stuck on the logic behind an SDL2 texture. To me, they are pointless since you cannot draw to them.
In my program, I have several surfaces (or what were surfaces before I switched to SDL2) that I just blitted together to form layers. Now, it seems, I have to create several renderers and textures to create the same effect since SDL_RenderCopy takes a texture pointer.
Not only that, but all renderers have to come from a window, which I understand, but still fouls me up a bit more.
This all seems extremely bulky and slow. Am I missing something? Is there a way to draw directly to a texture? What is the point of textures, and am I safe to have multiple (if not hundreds of) renderers in place of what were surfaces?
SDL_Texture objects are stored as close as possible to video card memory and can therefore easily be accelerated by your GPU. Resizing, alpha blending, anti-aliasing and almost any compute-heavy operation can benefit greatly from this performance boost. If your program needs to run per-pixel logic on your textures, you are encouraged to convert your textures into surfaces temporarily. A workaround with streaming textures is also possible.
Edit:
Since this answer receives quite a lot of attention, I'd like to elaborate on my suggestion.
If you prefer to use the Texture -> Surface -> Texture workflow to apply your per-pixel operations, make sure you cache your final texture unless you need to recalculate it on every render cycle. Textures in this solution are created with the SDL_TEXTUREACCESS_STATIC flag.
Streaming textures (creation flag SDL_TEXTUREACCESS_STREAMING) are encouraged for use cases where the source of the pixel data is the network, a device, a frame server or some other source that is beyond the SDL application's full reach, and where it is apparent that caching frames from the source is inefficient or would not work.
It is possible to render on top of textures if they are created with the SDL_TEXTUREACCESS_TARGET flag. This limits the source of the draw operation to other textures, although this might already be what you required in the first place. "Textures as render targets" is one of the newest and least widely supported features of SDL2.
Nerd info for curious readers:
Due to the nature of the SDL implementation, the first two methods depend on application-level read and copy operations, though they are optimized for the suggested scenarios and are fast enough for realtime applications.
Copying data from the application level is almost always slow compared to post-processing on the GPU. If your requirements are stricter than what SDL can provide and your logic does not depend on some external pixel data source, it would be sensible to allocate raw OpenGL textures painted from your SDL surfaces and apply shaders (GPU logic) to them.
Shaders are written in GLSL, a language which compiles into GPU assembly. Hardware/GPU acceleration actually refers to code parallelized across GPU cores, and using shaders is the preferred way to achieve that for rendering purposes.
Attention! Using raw OpenGL textures and shaders in conjunction with SDL rendering functions and structures might cause some unexpected conflicts or loss of flexibility provided by the library.
TLDR;
It is faster to render and operate on textures than on surfaces, although modifying them can sometimes be cumbersome.
By creating an SDL2 texture with the STREAMING access type, one can lock and unlock the entire texture, or just an area of pixels, to perform direct pixel operations. One must first create an SDL2 surface and link it to the lock/unlock calls as follows:
SDL_Surface *surface = SDL_CreateRGBSurface(..);   // sized and formatted to match the texture
SDL_LockTexture(texture, &rect, &surface->pixels, &surface->pitch);
// paint into surface->pixels
SDL_UnlockTexture(texture);
The key is: if you draw to a texture of larger size and the drawing is incremental (e.g. a data graph updated in real time), be sure to lock and unlock only the actual area to update. Otherwise the operations will be slow, with heavy memory copying.
I have experienced reasonable performance and the usage model is not too difficult to understand.
In SDL2 it is possible to render off-screen / render directly to a texture. The function to use is:
int SDL_SetRenderTarget(SDL_Renderer *renderer, SDL_Texture *texture);
This only works if the renderer enables SDL_RENDERER_TARGETTEXTURE.
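A short usage sketch, assuming renderer was created with that flag and the texture size of 256x256 is just an example:
SDL_Texture *target = SDL_CreateTexture(renderer, SDL_PIXELFORMAT_RGBA8888,
                                        SDL_TEXTUREACCESS_TARGET, 256, 256);
SDL_SetRenderTarget(renderer, target);     // subsequent draw calls go into the texture
SDL_SetRenderDrawColor(renderer, 255, 0, 0, 255);
SDL_RenderClear(renderer);
SDL_RenderDrawLine(renderer, 0, 0, 255, 255);
SDL_SetRenderTarget(renderer, NULL);       // back to drawing on the window
SDL_RenderCopy(renderer, target, NULL, NULL);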

OpenGL 2D performance tips

I'm currently developing a Touhou-esque bullet-hell shooter game. The screen will be absolutely filled with bullets (so instancing is what I want here), but I want this to work on older hardware, so I'm doing something along the lines of the following at the moment; there are no colors, textures, etc. yet until I figure this out.
glVertexPointer(3, GL_FLOAT, 0, SQUARE_VERTICES);
for (int i = 0; i < info.centers.size(); i += 3) {
    glPushMatrix();
    glTranslatef(info.centers.get(i), info.centers.get(i + 1), info.centers.get(i + 2));
    glScalef(info.sizes.get(i), info.sizes.get(i + 1), info.sizes.get(i + 2));
    glDrawElements(GL_QUADS, 4, GL_UNSIGNED_SHORT, SQUARE_INDICES);
    glPopMatrix();
}
Because I want this to work on old hardware, I'm trying to avoid shaders and whatnot. The setup above fails me at about 80 polygons, and I'm looking to get at least a few hundred out of this. info is a struct which has all the goodies for rendering; nothing much to it besides a few vectors.
I'm pretty new to OpenGL, but I've at least heard of and tried out everything that can be done - not saying I'm good at it at all, though. This game is a 2D game; I switched from SDL to OpenGL because it would make some fancier effects easier. Obviously SDL works differently, but I never had this problem using it.
It boils down to this, I'm clearly doing something wrong here, so how can I implement instancing for old hardware (OpenGL 1.x) correctly? Also, give me any tips for increasing performance.
Also, give me any tips for increasing performance.
If you're going to use sprites....
Load all sprites into a single huge texture. If they don't fit, use several textures, but keep the number of textures low, to avoid texture switching.
Switch textures and change OpenGL state as infrequently as possible. Ideally, you should set texture once, and draw everything you can with it.
Use texture fonts for text. FTGL font might look nice, but it can hit performance very hard with complex fonts.
Avoid alpha-blending when possible and use alpha-testing.
When alpha-blending, always use alpha-testing to reduce the number of pixels you draw. When your texture has many pixels with alpha == 0, cut them out with the alpha test (see the sketch after this list).
Reduce the number of very big sprites. A huge screen-aligned/pixel-aligned sprite (1024*1024) will drop FPS even on very good hardware.
Don't use non-power-of-2 sized textures. They (used to) produce a huge performance drop on certain ATI cards.
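A minimal alpha-test sketch for the fixed-function pipeline, which fits the old-hardware requirement in the question (the 0.5 threshold is just an example):
// Reject fragments whose alpha is at or below the threshold before they reach the blender.
glEnable(GL_ALPHA_TEST);
glAlphaFunc(GL_GREATER, 0.5f);

// When blending is still needed, combine both so fully transparent texels are discarded early.
glEnable(GL_BLEND);
glBlendFunc(GL_SRC_ALPHA, GL_ONE_MINUS_SRC_ALPHA);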
glTranslatef
For a 2D sprite-based (that's important) game you could avoid matrices completely (with the exception of camera/projection matrices, perhaps). I don't think that matrices will benefit you very much in a 2D game.
In a 2D game your main bottleneck will be GPU memory transfer speed - transferring data from textures to the screen. So "use as few draw calls as possible" and "put everything in a VA" won't help you much - you can kill performance with a single sprite.
However, if you're going to use vector graphics (see area2048 (youtube) or rez) that do not use textures, then most of the advice above will not apply, and such a game won't be very different from a 3D game. In that case it is reasonable to use vertex arrays, vertex buffer objects or display lists (depending on what is available) and to make use of the matrix functions, because your bottleneck will be vertex processing. You'll still have to minimize the number of state switches.
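For completeness, here is a hedged sketch of batching on OpenGL 1.1-class hardware: rebuild a client-side vertex array each frame and draw all bullets with one call. Bullet and its fields (x, y, half) are illustrative, not from the question:
std::vector<float> verts;                     // x,y per vertex, 4 vertices per bullet
verts.reserve(bullets.size() * 8);
for (const Bullet &b : bullets) {
    verts.push_back(b.x - b.half); verts.push_back(b.y - b.half);
    verts.push_back(b.x + b.half); verts.push_back(b.y - b.half);
    verts.push_back(b.x + b.half); verts.push_back(b.y + b.half);
    verts.push_back(b.x - b.half); verts.push_back(b.y + b.half);
}
glEnableClientState(GL_VERTEX_ARRAY);
glVertexPointer(2, GL_FLOAT, 0, verts.data());
glDrawArrays(GL_QUADS, 0, (GLsizei)(verts.size() / 2));
glDisableClientState(GL_VERTEX_ARRAY);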

OpenGL voxel engine slow

I'm making a voxel engine in C++ and OpenGL (à la Minecraft) and can't get a decent FPS on my 3 GHz machine with an ATI X1600... I'm all out of ideas.
When I have about 12000 cubes on the screen it falls to under 20fps - pathetic.
So far the optimizations I have are: frustum culling, back face culling (via OpenGL's glEnable(GL_CULL_FACE)), the engine draws only the visible faces (except the culled ones of course) and they're in an octree.
I've tried VBOs; I don't like them, and they do not significantly increase the FPS.
How can Minecraft's engine be so fast? I struggle with 10000 cubes, whereas Minecraft can easily draw far more at a higher FPS.
Any ideas?
#genpfault: I analyze the connectivity and just generate faces for the outer, visible surface. The VBO had a single cube that I glTranslate()d
I'm not an expert at OpenGL, but as far as I understand this is going to save very little time because you still have to send every cube to the card.
Instead what you should do is generate faces for all of the outer visible surface, put that in a VBO, and send it to the card and continue to render that VBO until the geometry changes. This saves you a lot of the time your card is actually waiting on your processor to send it the geometry information.
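A hedged sketch of that approach; the surface-extraction step that fills faceVerts is assumed to exist elsewhere:
std::vector<float> faceVerts;                 // x,y,z for every vertex of every visible face
// ... append 4 vertices per visible face during surface extraction ...

GLuint chunkVbo;
glGenBuffers(1, &chunkVbo);
glBindBuffer(GL_ARRAY_BUFFER, chunkVbo);
glBufferData(GL_ARRAY_BUFFER, faceVerts.size() * sizeof(float),
             faceVerts.data(), GL_STATIC_DRAW);   // re-upload only when a block changes

// Per frame: bind and draw, no per-cube glTranslate calls.
glBindBuffer(GL_ARRAY_BUFFER, chunkVbo);
glEnableClientState(GL_VERTEX_ARRAY);
glVertexPointer(3, GL_FLOAT, 0, (void*)0);
glDrawArrays(GL_QUADS, 0, (GLsizei)(faceVerts.size() / 3));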
You should profile your code to find out whether the bottleneck in your application is on the CPU or the GPU. For instance, it might be that your culling/octree algorithms are slow, and in that case it is not an OpenGL problem at all.
I would also keep count of the number of cubes you draw on each frame and display that on screen. Just so you know your culling routines work as expected.
Finally you don't mention if your cubes are textured. Try using smaller textures or disable textures and see how much the framerate increases.
gDEBugger is a great tool that will help you find bottlenecks with OpenGL.
I don't know if it's OK here to "bump" an old question, but a few things came to my mind:
If your voxels are static, you can speed up the whole rendering process by using an octree for frustum culling, etc. Furthermore, you can also compile a static scene into a potential visibility set (PVS) in the octree. The main principle of a PVS is to precompute, for every node in the tree, which other nodes are potentially visible from it, and to store pointers to them in a vector. When it comes to rendering, you first check in which node the camera is placed and then run frustum culling against all nodes in the PVS vector of that node. (Carmack used something like that in the Quake engines, but with binary space partitioning trees.)
If the shading of your voxels is somewhat complex, it is also fast to do a depth-only pre-pass, without writing into the color buffer, just to fill the depth buffer. After that you render a second pass: disable writing to the depth buffer and render only to the color buffer while testing against the depth buffer. That way you avoid expensive shader computations which would later be overwritten by a new fragment that is closer to the viewer. (Carmack used that in Quake 3.)
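A minimal sketch of such a depth-only pre-pass; drawScene() stands in for whatever draw routine you already have:
// Pass 1: fill the depth buffer only, write nothing to the color buffer.
glColorMask(GL_FALSE, GL_FALSE, GL_FALSE, GL_FALSE);
glDepthMask(GL_TRUE);
glDepthFunc(GL_LESS);
drawScene();

// Pass 2: full shading; only the frontmost fragments pass the depth test,
// so hidden fragments never run the expensive shader.
glColorMask(GL_TRUE, GL_TRUE, GL_TRUE, GL_TRUE);
glDepthMask(GL_FALSE);      // depth buffer already contains the final values
glDepthFunc(GL_LEQUAL);
drawScene();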
Another thing which will definitely speed things up is the use of instancing. You store only the position of each voxel and, if necessary, its scale and other parameters in a texture buffer object (TBO). In the vertex shader you can then read the positions of the voxels to be spawned and create an instance of the voxel (i.e. a cube which is given to the shader in a vertex buffer object). So you send the 8 vertices + 8 normals (3 * sizeof(float) * 8 + 3 * sizeof(float) * 8 + floats for color/texture etc.) only once to the card in the VBO, and then only the positions of the instances of the cube (3 * sizeof(float) * number of voxels) in the TBO.
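A rough sketch of the TBO setup on the C++ side (OpenGL 3.1+); voxelPositions is a hypothetical std::vector<float> with four floats per voxel (xyz plus one unused pad, because GL_RGBA32F is used as the buffer-texture format here):
GLuint tbo, tboTex;
glGenBuffers(1, &tbo);
glBindBuffer(GL_TEXTURE_BUFFER, tbo);
glBufferData(GL_TEXTURE_BUFFER, voxelPositions.size() * sizeof(float),
             voxelPositions.data(), GL_DYNAMIC_DRAW);   // updated whenever voxels change

glGenTextures(1, &tboTex);
glBindTexture(GL_TEXTURE_BUFFER, tboTex);
glTexBuffer(GL_TEXTURE_BUFFER, GL_RGBA32F, tbo);         // expose the buffer as texels

// In the vertex shader you would read the per-instance offset with something like
//   vec3 offset = texelFetch(voxelPositionsSampler, gl_InstanceID).xyz;
// and issue a single glDrawElementsInstanced call for all cubes.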
Maybe it is possible to parallelize things between GPU and CPU by combining all three steps across two threads: in the CPU thread you check the octree's PVS and update a TBO for instancing for the next frame, while the GPU thread renders the two passes using the TBO for instancing which was created by the CPU thread in the previous step. After that you switch TBOs. If the camera has not moved, you don't even have to do the CPU calculations again.
Another kind of tree you may be interested in is the so-called k-d tree, which is more general than octrees.
PS: sorry for my english, it's not the clearest....
There are 3rd-party libraries you could use to make the rendering more efficient. For example the C++ PolyVox library can take a volume and generate the mesh for you in an efficient way. It has built-in methods for reducing triangle count and helping to generate things like ambient occlusion. It's got a good community around it so getting support on the forum should be easy.
Have you used a common display list for all your cubes?
Do you skip the drawing code for cubes which are not visible to the user?

What is the most efficient way to draw voxels (cubes) in OpenGL?

I would like to draw voxels using OpenGL, but it doesn't seem to be directly supported. I made a cube-drawing function that uses 24 vertices (4 vertices per face), but the frame rate drops when you draw 2500 cubes. I was hoping there was a better way. Ideally I would just like to send a position, an edge size, and a color to the graphics card. I'm not sure if I can do this by using GLSL to compile instructions as part of the fragment shader or vertex shader.
I searched Google and found out about point sprites and billboard sprites (same thing?). Could those be used as an alternative to draw a cube more quickly? If I use 6, one for each face, it seems like that would send much less information to the graphics card and hopefully gain me a better frame rate.
Another thought is that maybe I can draw multiple cubes using one glDrawElements call?
Maybe there is a better method altogether that I don't know about? Any help is appreciated.
Drawing voxels with cubes is almost always the wrong way to go (the exceptional case is ray-tracing). What you usually want to do is put the data into a 3D texture and render slices depending on camera position. See this page: https://developer.nvidia.com/gpugems/GPUGems/gpugems_ch39.html and you can find other techniques by searching for "volume rendering gpu".
EDIT: When writing the above answer I didn't realize that the OP was, most likely, interested in how Minecraft does that. For techniques to speed-up Minecraft-style rasterization check out Culling techniques for rendering lots of cubes. Though with recent advances in graphics hardware, rendering Minecraft through raytracing may become the reality.
What you're looking for is called instancing. You could take a look at glDrawElementsInstanced and glDrawArraysInstanced for a couple of possibilities. Note that these were only added as core operations relatively recently (OGL 3.1), but have been available as extensions quite a while longer.
nVidia's OpenGL SDK has an example of instanced drawing in OpenGL.
First, you really should be looking at OpenGL 3+ using GLSL; this has been the standard for quite some time. Second, most Minecraft-esque implementations use mesh creation on the CPU side. This technique involves looking at all of the block positions and creating a vertex buffer object that renders the triangles of all of the exposed faces, as in the sketch below. The VBO is only regenerated when the voxels change and is persisted between frames. An ideal implementation would combine coplanar faces of the same texture into larger faces.
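A hedged sketch of that CPU-side face extraction; CHUNK, solidAt and emitFace are illustrative helpers, not a real API:
for (int x = 0; x < CHUNK; ++x)
    for (int y = 0; y < CHUNK; ++y)
        for (int z = 0; z < CHUNK; ++z) {
            if (!solidAt(x, y, z)) continue;          // empty block: nothing to emit
            // Emit a face only where the neighbouring block is empty, i.e. the face is exposed.
            if (!solidAt(x + 1, y, z)) emitFace(x, y, z, POS_X);
            if (!solidAt(x - 1, y, z)) emitFace(x, y, z, NEG_X);
            if (!solidAt(x, y + 1, z)) emitFace(x, y, z, POS_Y);
            if (!solidAt(x, y - 1, z)) emitFace(x, y, z, NEG_Y);
            if (!solidAt(x, y, z + 1)) emitFace(x, y, z, POS_Z);
            if (!solidAt(x, y, z - 1)) emitFace(x, y, z, NEG_Z);
        }
// The emitted vertices go into one VBO that persists until a block changes.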