Reducing opengl draw calls vs binding smaller textures - opengl

I'm making an isometric (2D) game with SFML. I handle the drawing order (depth) by sorting all drawables by their Y position and it works perfectly well.
The game uses an enourmous amount of art assets, such that the npcs, monsters and player graphics alone are contained in their own 4k texture atlas. It is logistically not possible for me to put everything into one atlas. The target devices would not be able to handle textures of that size. Please do not focus on WHY it's impossible, and understand that I simply MUST use seperate files for my textures in this case.
This causes a problem. Let's say I have a level with 2 npcs and 2 pillars. The npcs are in NPCs.png and the pillars are in CastleLevel.png. Depending on where the npcs move, the drawing order (hence the opengly texture binding order) can be different. Let's say the Y positions are sorted like this:
npc1, pillar1, npc2, pillar 2
This would mean that opengl has to switch between the 2 textures twice. My question is, should I:
a) keep the texture atlasses OR
b) divide them all into smaller png files (1 png per npc, 1 png per pillar etc). Since the textures must be changed multiple times anyway, would it improve performance if opengl had to bind smaller textures instead?
Is it worth keeping the texture atlasses because it will SOMETIMES reduce the number of draw calls?

Since the textures must be changed multiple times anyway, would it improve performance if opengl had to bind smaller textures instead?
Almost certainly not. The cost of a texture bind is fixed; it isn't based on the texture's size.
It would be better for you to either:
Properly batch your rendering. That is, when you say "draw NPC1", you don't actually draw it yet. You stick some data in an array, and later on, you execute "draw NPCs", which draws all of the NPCs you've buffered in one go.
Use a bigger texture atlas, probably involving array textures. Each layer of the array texture would be one of the atlases you load. This way, you only ever bind one texture to render your scene.
Deal with it. 2D games aren't exactly stressful on the GPU or CPU. The overhead from the additional state changes will not be what knocks you down from 60FPS to 30FPS.

Related

OpenGL - How to render many different models?

I'm currently struggling with finding a good approach to render many (thousands) slightly different models. The model itself is a simple cube with some vertex offset, think of a skewed quad face. Each 'block' has a different offset of its vertices, so basically I have a voxel engine on steroids as each block is not a perfect cube but rather a skewed cuboid. To render this shape 48 vertices are needed but can be cut to 24 vertices as only 3 faces are visible. With indexing we are at 12 vertices (4 for each face).
But, now that I have the vertices for each block in the world, how do I render them?
What I've tried:
Instanced Rendering. Sounds good, doesn't work as my models are not the same.
I could simplify distant blocks to a cube and render them with glDrawArraysInstanced/glDrawElementsInstanced.
Put everything in one giant VBO. This has a better performance than rendering each cube individually, but has the downside of having one large mesh. This is not desireable as I need every cube to have different textures, lighting, etc... Selecting a single cube within that huge mesh is not possible.
I am aware of frustum culling and occlusion culling, but I already have problems with some cubes in front of me (tested with a 128x128 world).
My requirements:
Draw some thousand models.
Each model has vertices offsets to make the block less cubic, stored in another VBO.
Each block has to be an individual object, as you should be able to place/remove blocks.
Any good performance advices?
This is not desireable as I need every cube to have different textures, lighting, etc... Selecting a single cube within that huge mesh is not possible.
Programmers should avoid declaring that something is "impossible"; it limits your thinking.
Giving each face of these cubes different textures has many solutions. The Minecraft approach uses texture atlases. Each "texture" is really just a sub-section of one large texture, and you use texture coordinates to select which sub-section a particular face uses. But you can get more complex.
Array textures allow for a more direct way to solve this problem. Here, the texture coordinates would be the same, but you use a per-vertex integer to select the correct texture for a face. All of the vertices for a particular face would have an index. And if you're clever, you don't even really need texture coordinates. You can generate them in your vertex shader, based on per-vertex values like gl_VertexID and the like.
Lighting parameters would work the same way: use some per-vertex data to select parameters from a UBO or SSBO.
As for the "individual object" bit, that's merely a matter of how you're thinking about the problem. Do not confuse what happens in the player's mind with what happens in your code. Games are an elaborate illusion; just because something appears to the user to be an "individual object" doesn't mean it is one to your rendering engine.
What you need is the ability to modify your world's data to remove and add new blocks. And if you need to show a block as "selected" or something, then you simply need another per-block value (like the lighting parameters and index for the texture) which tells you whether to draw it as a "selected" block or as an "unselected" one. Or you can just redraw that specific selected block. There are many ways of handling it.
Any decent graphics card (since about 2010) is able to render a few millions vertices in a blinking.
The approach is different depending on how many changes per frame. In other words, how many data must be transferred to the GPU per frame.
For the case of small number of changes, storing the data in one big VBO or many smaller VBOs (and their VAOs), sending the changes by uniforms, and calling several glDraw***, shows similar performance. Different hardwares behave with little difference. Indexed data may improve the speed.
When most of the data changes in every frame and these changes are hard or impossible to do in the shaders, then your app is memory-transfer bound. Streaming is a good advise.

How to do particle binning in OpenGL?

Currently I'm creating a particle system and I would like to transfer most of the work to the GPU using OpenGL, for gaining experience and performance reasons. At the moment, there are multiple particles scattered through the space (these are currently still created on the CPU). I would more or less like to create a histogram of them. If I understand correctly, for this I would first translate all the particles from world coordinates to screen coordinates in a vertex shader. However, now I want to do the following:
So, for each pixel a hit count of how many particles are inside. Each particle will also have several properties (e.g. a colour) and I would like to sum them for every pixel (as shown in the lower-right corner). Would this be possible using OpenGL? If so, how?
The best tool I recomend for having the whole data (if it fits on GPU memory) is the use of SSBO.
Nevertheless, you need data after transforming them (e.g. by a projection). Still SSBO is your best option:
In the fragment shader you read the properties of already handled particles (let's say, the rendered pixel) and write modified properties (number of particles at this pixel, color, etc) to the same index in the buffer.
Due to parallel nature of GPU, several instances coming from different particles may be doing concurrently the work for the same index. Thus you need to handle this on your own. Read Memory model and Atomic operations
Another approach, but limited, is using Blending
The idea is that each fragment increments the actual color value of the frame buffer. This can be done using GL_FUNC_ADD for glBlendEquationSeparate and using as fragment-output-color a value of 1/255 (normalized integer) for each RGB/a component.
Limitations come from the [0-255] range: Only up to 255 particles in the same pixel, the rest amount is clamped to this range and so "lost".
You have four components RGBA, thus four properties can be handled. But can have several renderbuffers in a FBO.
You can read the FBO by glReadPixels. Use glReadBuffer first with a GL_COLOR_ATTACHMENTi if you use a FBO instead of the default frame buffer.

Texturing Opengl Terrain?

What is the best way to texture terrain made from quads in OpenGL? I have around 30 different textures I want to have for my terrains (1 texture per terrain type, so 30 terrain types) and would like to have smooth transitions between any two of the terrains.
I have been doing some browsing on the web and found that there are many different methods, including 3d texturing, Alpha channels, blending, and using shaders. However, which of these is the most efficient and can handle the amount of textures I am looking to use? For example: This popular answer describes how to use some techniques, but since the mixmap only has 4 properties (RGBA) and so can only support 4 textures.
I should also note that I know nothing about shaders, so non-shader required techniques would be preferable.
Since you linked to an answer that describes texture splatting, and its question mentions the game Oblivion, I can provide some additional insight into that.
Basic texture splatting with an RGBA mixmap only supports four textures per terrain quad, but you can use different sets of textures for different quads. Oblivion divides its terrain into squares (called "cells") of 32 grid points (192 feet) per side, and each cell defines its own set of four terrain textures. So you can't have lots of texture diversity within a small area, but you can easily vary your textures over larger regions. If you prefer, you can define texture sets for smaller regions, even individual quads, at the expense of using more memory.
If you really need more than four textures in a quad, you can use multiple mixmaps. For each additional one, you just do another texture lookup to get four more blending factors, and blend in four more textures on top of the results from the previous mixmap. You can scale up to as many textures as you want, again at the expense of memory.
Texture splatting can be tricky to combine with with LOD techniques on the height map, because when a single low-detail terrain quad represents a group of high-detail quads, you have to sample several different mixmaps for different regions of the big quad. Oblivion sidesteps that problem by using texture splatting only for full-detail terrain; distant cells, rendered at lower resolution, use precomputed textures produced by the editor, which does the splatting and downscaling in advance.
One alternative to texture splatting is to use a clipmap to render a "megatexture". With this approach, you have a single large texture that represents your entire terrain, and you avoid filling up your RAM by loading different parts of it with only as much detail as is actually needed to render it based on the viewer's current position. (Distant parts of the terrain can't be seen at full detail, so there's no need to load them at full detail.)
The advantage of this approach is its artistic freedom: you can place fine details anywhere you want in the texture, without regard to the vertex grid. The disadvantage is that it's rather complex to implement, and the entire clipmap has to be stored somewhere, probably in a big file on disk, so that you can load parts of it into RAM as needed.

Using Vertex Buffer Objects for a tile-based game and texture atlases

I'm creating a tile-based game in C# with OpenGL and I'm trying to optimize my code as best as possible.
I've read several articles and sections in books and all come to the same conclusion (as you may know) that use of VBOs greatly increases performance.
I'm not quite sure, however, how they work exactly.
My game will have tiles on the screen, some will change and some will stay the same. To use a VBO for this, I would need to add the coordinates of each tile to an array, correct?
Also, to texture these tiles, I would have to create a separate VBO for this?
I'm not quite sure what the code would look like for tiling these coordinates if I've got tiles that are animated and tiles that will be static on the screen.
Could anyone give me a quick rundown of this?
I plan on using a texture atlas of all of my tiles. I'm not sure where to begin to use this atlas for the textured tiles.
Would I need to compute the coordinates of the tile in the atlas to be applied? Is there any way I could simply use the coordinates of the atlas to apply a texture?
If anyone could clear up these questions it would be greatly appreciated. I could even possibly reimburse someone for their time & help if wanted.
Thanks,
Greg
OK, so let's split this into parts. You didn't specify which version of OpenGL you want to use - I'll assume GL 3.3.
VBO
Vertex buffer objects, when considered as an alternative to client vertex arrays, mostly save the GPU bandwidth. A tile map is not really a lot of geometry. However, in recent GL versions the vertex buffer objects are the only way of specifying the vertices (which makes a lot of sense), so we cannot really talked about "increasing performance" here. If you mean "compared to deprecated vertex specification methods like immediate mode or client-side arrays", then yes, you'll get a performance boost, but you'd probably only feel it with 10k+ vertices per frame, I suppose.
Texture atlases
The texture atlases are indeed a nice feature to save on texture switching. However, on GL3 (and DX10)-enabled GPUs you can save yourself a LOT of trouble characteristic to this technique, because a more modern and convenient approach is available. Check the GL reference docs for TEXTURE_2D_ARRAY - you'll like it. If GL3 cards are your target, forget texture atlases. If not, have a google which older cards support texture arrays as an extension, I'm not familiar with the details.
Rendering
So how to draw a tile map efficiently? Let's focus on the data. There are lots of tiles and each tile has the following infromation:
grid position (x,y)
material (let's call it "material" not "texture" because as you said the image might be animated and change in time; the "material" would then be interpreted as "one texture or set of textures which change in time" or anything you want).
That should be all the "per-tile" data you'd need to send to the GPU. You want to render each tile as a quad or triangle strip, so you have two alternatives:
send 4 vertices (x,y),(x+w,y),(x+w,y+h),(x,y+h) instead of (x,y) per tile,
use a geometry shader to calculate the 4 points along with texture coords for every 1 point sent.
Pick your favourite. Also note that directly corresponds to what your VBO is going to contain - the latter solution would make it 4x smaller.
For the material, you can pass it as a symbolic integer, and in your fragment shader - basing on current time (passed as an uniform variable) and the material ID for a given tile - you can decide on the texture ID from the texture array to use. In this way you can make a simple texture animation.

How to create textures within GPU

Can anyone pls tell me how to use hardware memory to create textures in OpenGL ? Currently I'm running my game in window mode, do I need to switch to fullscreen to get the use of hardware ?
If I can create textures in hardware, is there a limit for no of textures (other than the hardware memory) ? and then how can I cache my textures into hardware ? Thanks.
This should be covered by almost all texture tutorials for OpenGL. For example here, here and here.
For every texture you first need a texture name. A texture name is like a unique index for a single texture. Every name points to a texture object that can have its own parameters, data, etc. glGenTextures is used to get new names. I don't know if there is any limit besides the uint range (2^32). If there is then you will probably get 0 for all new texture names (and a gl error).
The next step is to bind your texture (see glBindTexture). After that all operations that use or affect textures will use the texture specified by the texture name you used as parameter for glBindTexture. You can now set parameters for the texture (glTexParameter) and upload the texture data with glTexImage2D (for 2D textures). After calling glTexImage you can also free the system memory with your texture data.
For static textures all this has to be done only once. If you want to use the texture you just need to bind it again and enable texturing (glEnable(GL_TEXTURE_2D)).
The size (width/height) for a single texture is limited by GL_MAX_TEXTURE_SIZE. This is normally 4096, 8192 or 16384. It is also limited by the available graphics memory because it has to fit into it together with some other resources like the framebuffer or vertex buffers. All textures together can be bigger then the available memory but then they will be swapped.
In most cases the graphics driver should decide which textures are stored in system memory and which in graphics memory. You can however give certain textures a higher priority with either glPrioritizeTextures or with glTexParameter.
Edit:
I wouldn't worry too much about where textures are stored because the driver normally does a very good job with that. Textures that are used often are also more likely to be stored in graphics memory. If you set a priority that's just a "hint" for the driver on how important it is for the texture to stay on the graphics card. It's also possible the the priority is completely ignored. You can also check where textures currently are with glAreTexturesResident.
Usually when you talk about generating a texture on the GPU, you're not actually creating texture images and applying them like normal textures. The simpler and more common approach is to use Fragment shaders to procedurally calculate the colors of for each pixel in real time from scratch for every single frame.
The canonical example for this is to generate a Mandelbrot pattern on the surface of an object, say a teapot. The teapot is rendered with its polygons and texture coordinates by the application. At some stage of the rendering pipeline every pixel of the teapot passes through the fragment shader which is a small program sent to the GPU by the application. The fragment shader reads the 2D texture coordinates and calculates the Mandelbrot set color of the 2D coordinates and applies it to the pixel.
Fullscreen mode has nothing to do with it. You can use shaders and generate textures even if you're in window mode. As I mentioned, the textures you create never actually occupy space in the texture memory, they are created on the fly. One could probably think of a way to capture and cache the generated texture but this can be somewhat complex and require multiple rendering passes.
You can learn more about it if you look up "GLSL" in google - the OpenGL shading language.
This somewhat dated tutorial shows how to create a simple fragment shader which draws the Mandelbrot set (page 4).
If you can get your hands on the book "OpenGL Shading Language, 2nd Edition", you'll find it contains a number of simple examples on generating sky, fire and wood textures with the help of an external 3D Perlin noise texture from the application.
To create a texture on GPU look into "render to texture" tutorials. There are two common methods: Binding a PBuffer context as texture, or using Frame Buffer Objects. PBuffer render to textures are the older method, and have the wider support. Frame Buffer Objects are easier to use.
Also you don't have to switch to "fullscreen" mode for OpenGL to be HW accelerated. In fact OpenGL doesn't know about windows at all. A fullscreen OpenGL window is just that: A toplvel window on top of all other windows with no decorations and the input focus grabed. Some drivers bypass window masking and clipping code, and employ a simpler, faster buffer swap method if the window with the active OpenGL context covers the whole screen, thus gaining a little performance, but with current hard- and software the effect is very small compared to other influences.