Performance of GL_POINTS on modern hardware - opengl

Is there any difference in performance between drawing a scene with full triangles (GL_TRIANGLES) instead of just drawing their vertices (GL_POINTS), on modern hardware?
Where GL_POINTS is initialized like this:
glPointSize(1.0);
glDisable(GL_POINT_SMOOTH);
I have a somewhat low-end graphics card (9600 GT), and drawing only the vertices can bring a 2x fps increase on certain scenes. I'm not sure whether that also applies to more recent GPUs.

2x fps increase on
You lose 98% of the picture and get only a 2x fps increase. That's not impressive. If you take into account that you should be able to easily render at 300..500 fps on any decent hardware (with vsync disabled and minor optimizations), it's probably not worth it.
Is there any difference in performance between drawing a scene with full triangles (GL_TRIANGLES) instead of just drawing their vertices (GL_POINTS), on modern hardware?
Well, if your scene has a LOT of alpha blending and very "heavy" pixel shaders, then displaying the scene as a point cloud will obviously speed things up, because there are fewer pixels to fill.
On the other hand, this kind of "optimization" will be completely useless for any practical task. If you're using blending and shaders, you probably wouldn't want to display your scene as a point list in the first place, unless you're doing some kind of debug render (using glPolygonMode). And for a debug render you'll probably turn shaders off (because a shaded/lit point is hard to see) and disable lighting.
Even if you're using point sprites as particles or something, I'd stick with triangles: they give more control and, unlike point sprites, have no maximum size limit.
I can display more objects?
If you want more objects, you should probably try to optimize things elsewhere first. Not drawing invisible objects (those outside the field of view, etc.) is a good start for improving performance.
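Skipping objects outside the field of view is usually done with a bounding-volume test against the view frustum. Below is a minimal sketch of one common variant, a sphere-versus-six-planes test. The Plane and Sphere structs and the function names are made up for illustration, and extracting the planes from your camera matrices is assumed to happen elsewhere.

```cpp
#include <array>
#include <cstddef>
#include <vector>

// A plane nx*x + ny*y + nz*z + d = 0 whose normal points towards the
// inside of the view volume (hypothetical representation).
struct Plane { float nx, ny, nz, d; };

// Bounding sphere of one object (hypothetical representation).
struct Sphere { float x, y, z, radius; };

// Returns true if the sphere is at least partly inside all six planes.
bool sphereVisible(const std::array<Plane, 6>& frustum, const Sphere& s) {
    for (const Plane& p : frustum) {
        float dist = p.nx * s.x + p.ny * s.y + p.nz * s.z + p.d;
        if (dist < -s.radius) return false;  // completely behind this plane
    }
    return true;
}

// Collects the indices of the objects worth submitting for drawing.
std::vector<std::size_t> visibleObjects(const std::array<Plane, 6>& frustum,
                                        const std::vector<Sphere>& bounds) {
    std::vector<std::size_t> out;
    for (std::size_t i = 0; i < bounds.size(); ++i)
        if (sphereVisible(frustum, bounds[i])) out.push_back(i);
    return out;
}
```

The test is conservative: a sphere near a frustum corner may be reported as visible even though it is not, which costs a little extra drawing but never drops a visible object.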
you have a mesh which is very far away from the camera. 1 million triangles and you know it is always in view. At this density ratio, triangles can't be bigger than a pixel,
When triangles are smaller than a pixel and there are many of them, your mesh starts looking like garbage and turns into a pixelated mess of points. It will be ugly. It is roughly the same effect as disabling mipmapping and texture filtering and then rendering a checkerboard pattern. Using points instead of triangles might even aggravate the effect.
If you have a 1-million-triangle mesh that is always visible, you already need a different kind of optimization. Reduce the number of triangles (level of detail, dynamic tessellation, or some other solution that can simplify geometry on the fly), use bump mapping (maybe parallax mapping) to simulate extra geometric detail that isn't actually there, or even turn it into a static background or a sprite. That will work much better. Trying to render it using points will simply make it look ugly.

No. If the number of triangles is similar to the number of their shared vertices (assuming glDrawElements is used for rendering), the geometry part of the rendering pipeline runs at roughly the same speed in both modes. The only benefit you can get from drawing GL_POINTS comes from the screen space left empty by not drawing faces, so it shows up only at the fragment shader level.

Related

Texture tiling with continuous random offset?

I have a texture and a mesh, if I apply the texture on the mesh, it tiles it continuously as one would expect. The offset for each tile is equal.
The problem:
Non-tilable textures, or textures with some outstanding elements, look repetitive and cheap.
Example:
Solution Attempt
My first attempt was to programmatically generate a texture the size of the mesh, with randomised offsets for each tile. Of course the size of the texture became a problem, not to mention the GPU's limit on the maximum size of a single texture.
What I would like to do
I would like to know if there's a way to make a Unity shader or a material that would load a single texture and tile it with random offsets for each tile and do it only once to keep the performance high?
I believe you might try one of the techniques invented by Inigo Quilez (http://www.iquilezles.org/www/articles/texturerepetition/texturerepetition.htm).
Basically, non-tilable textures and textures with some outstanding elements are different problems.
Non-tilable textures
There are 2 ways of solving it:
Fixing the texture itself;
Mirrored repeat can be used in some cases (see GL_MIRRORED_REPEAT)
Textures with some outstanding elements
This can be solved in the following ways (or conjunction of them):
Modifying the texture (this includes enlargement as well);
Using multitexturing;
Well, maybe mirrored repeat can be used as well in some cases.
Shifting texture coordinates randomly
Unfortunately, I can't think of any case of these 2 problems (except, maybe, white noise textures) where shifting texture coordinates is a solution.
You are looking at this problem the wrong way. All games face this issue. They hide it simply by a) varying textures a lot instead of texturing large areas with the same texture and b) through level design. Imagine this plane filled with barns, grass, trees, fences and what not: suddenly the mono-textured surface blends in with its surroundings. Camera angle also plays a huge role in this. Try moving your camera close to the ground and the repeating texture is much less noticeable.
Your plane is just a very extreme example. You should not try to fix it at this point but rather continue to build your game. Or design your textures to repeat well without showing clear patterns. The extreme would be a flat-colored texture. But generally, large outdoor terrain textures simply have very little structure, almost like noise, and they don't use colors with any contrast, just shades of the same color.
Your offset idea won't work. It might work technically (though it may be inefficient), but random offsets can't cover up the patterns. Instead they will create new ones, because the textures won't line up smoothly at the tile edges anymore, so you would clearly see a grid of squares. I guess that would be even uglier and more noticeable.
Lastly, you can increase the texture size or scale (the blurriness may need to be covered up as explained above). In relation to camera angle this would be the easiest, most effective fix. Or at least an improvement.
Old thread, but relevant to many, I think. You can do this in a shader by randomizing the vertex position on the XZ plane or (better) the UV coordinates, based on the world-space position.
The texture will still tile, but instead of repeating along a straight line it will repeat along a random wiggly line. This is great for stuff like terrain, grass, etc., but obviously no good if you want to maintain straight lines in your textures.
A second option is a diffuse-detail shader. It tiles one texture close to the camera and another one further away (which you can make softer / more blurry).
A third option: blend 2 textures together, each with a different UV tiling scale (not evenly divisible, e.g. not scales 2 and 4, but 1 and 2.334556), so the pattern is harder to see.
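For reference, the per-tile random offset asked about in the question can be sketched on the CPU as below. This is only an illustration of the general idea, not code from the linked article. The hash constants and the names hashTile and offsetTileUV are arbitrary choices, and in practice the same logic would live in a shader.

```cpp
#include <cmath>
#include <cstdint>

struct Vec2 { float x, y; };

// Small integer hash (an arbitrary choice; any decent hash would do).
// Maps a tile index plus a seed to a pseudo-random value in [0, 1).
float hashTile(std::int32_t ix, std::int32_t iy, std::uint32_t seed) {
    std::uint32_t h = static_cast<std::uint32_t>(ix) * 374761393u
                    + static_cast<std::uint32_t>(iy) * 668265263u
                    + seed * 2246822519u;
    h = (h ^ (h >> 13)) * 1274126177u;
    h ^= h >> 16;
    return static_cast<float>(h >> 8) / 16777216.0f;  // 24 bits -> [0, 1)
}

// Given a continuous UV, returns the UV to sample with: the position inside
// the current tile plus an offset that is constant for that tile.
Vec2 offsetTileUV(Vec2 uv) {
    float fx = std::floor(uv.x);
    float fy = std::floor(uv.y);
    auto ix = static_cast<std::int32_t>(fx);
    auto iy = static_cast<std::int32_t>(fy);
    Vec2 offset{ hashTile(ix, iy, 1u), hashTile(ix, iy, 2u) };
    return Vec2{ (uv.x - fx) + offset.x, (uv.y - fy) + offset.y };
}
```

The result lies in [0, 2), so it assumes a repeating wrap mode on the sampler. As pointed out above, neighbouring tiles no longer line up at their shared edge, so without some blending across tile borders the grid of squares stays visible.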

Deferred Rendering with Tile-Based culling Concept Problems

EDIT: I'm still looking for some help with the use of OpenCL or compute shaders. I would prefer to keep using OGL 3.3 and not have to deal with the bad driver support for OGL 4.3 and OpenCL 1.2, but I can't think of any way to do this type of shading without using one of the two (to match lights and tiles). Is it possible to implement tile-based culling without using GPGPU?
I wrote a deferred renderer in OpenGL 3.3. Right now I don't do any culling for the light pass (I just render a full-screen quad for every light). This (obviously) has a ton of overdraw (sometimes it is ~100%). Because of this I've been looking into ways to improve performance during the light pass. It seems like the best way, in (almost) everyone's opinion, is to cull the scene using screen-space tiles. This was the method used in Frostbite 2. I read the presentation from Andrew Lauritzen at SIGGRAPH 2010 (http://download-software.intel.com/sites/default/files/m/d/4/1/d/8/lauritzen_deferred_shading_siggraph_2010.pdf), and I'm not sure I fully understand the concept (or, for that matter, why it's better than anything else, and whether it is better for me).
In the presentation Lauritzen goes over deferred shading with light volumes, quads, and tiles for culling the scene. According to his data, the tile-based deferred renderer was the fastest (by far). I don't understand why, though. I'm guessing it has something to do with the fact that for each tile, all the lights are batched together. The presentation says to read the G-Buffer once and then compute the lighting, but this doesn't make sense to me. In my mind, I would implement it like this:
for each tile {
    for each light affecting the tile {
        render quad (the tile) and compute lighting
        blend with previous tiles (GL_ONE, GL_ONE)
    }
}
This would still involve sampling the G-Buffer a lot. I would think that doing that would have the same (if not worse) performance than rendering a screen aligned quad for every light. From how it's worded though, it seems like this is what's happening:
for each tile {
    render quad (the tile) and compute all lights
}
But I don't see how one would do this without exceeding the instruction limit for the fragment shader on some GPUs. Can anyone help me with this? It also seems like almost every tile-based deferred renderer uses compute shaders or OpenCL (to batch the lights). Why is this, and what would happen if I didn't use them?
But I don't see how one would do this without exceeding the instruction limit for the fragment shader on some GPUs .
It rather depends on how many lights you have. The "instruction limits" are pretty high; it's generally not something you need to worry about outside of degenerate cases. Even if 100+ lights affect a tile, odds are fairly good that your lighting computations aren't going to exceed instruction limits.
Modern GL 3.3 hardware can run at least 65536 dynamic instructions in a fragment shader, and likely more. For 100 lights, that's still 655 instructions per light. Even if you take 2000 instructions to compute the camera-space position, that still leaves 635 instructions per light. Even if you were doing Cook-Torrance directly in the GPU, that's probably still sufficient.
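As for matching lights to tiles without GPGPU, one option is to build the per-tile light lists on the CPU and hand them to the fragment shader, which then loops over the lights of its tile as in the second pseudocode loop above. The sketch below shows only that CPU-side binning step. It is an assumption about how this could be done, not the method from the presentation; LightBounds, TileGrid and binLights are hypothetical names, and projecting each light volume to a screen-space circle is assumed to happen elsewhere.

```cpp
#include <algorithm>
#include <cmath>
#include <cstddef>
#include <vector>

// Screen-space bounds of a light's influence, in pixels (hypothetical input).
struct LightBounds { float cx, cy, radius; };

// For every tile, the indices of the lights whose bounds overlap it.
struct TileGrid {
    int tilesX = 0, tilesY = 0;
    std::vector<std::vector<std::size_t>> lights;  // tilesX * tilesY entries
};

TileGrid binLights(const std::vector<LightBounds>& lights,
                   int screenW, int screenH, int tileSize) {
    TileGrid grid;
    grid.tilesX = (screenW + tileSize - 1) / tileSize;  // round up
    grid.tilesY = (screenH + tileSize - 1) / tileSize;
    grid.lights.resize(static_cast<std::size_t>(grid.tilesX) * grid.tilesY);

    for (std::size_t i = 0; i < lights.size(); ++i) {
        const LightBounds& l = lights[i];
        // Conservative test: use the bounding square of the circle.
        int x0 = static_cast<int>(std::floor((l.cx - l.radius) / tileSize));
        int x1 = static_cast<int>(std::floor((l.cx + l.radius) / tileSize));
        int y0 = static_cast<int>(std::floor((l.cy - l.radius) / tileSize));
        int y1 = static_cast<int>(std::floor((l.cy + l.radius) / tileSize));
        if (x1 < 0 || y1 < 0 || x0 >= grid.tilesX || y0 >= grid.tilesY)
            continue;  // entirely off screen
        x0 = std::max(x0, 0);
        y0 = std::max(y0, 0);
        x1 = std::min(x1, grid.tilesX - 1);
        y1 = std::min(y1, grid.tilesY - 1);
        for (int ty = y0; ty <= y1; ++ty)
            for (int tx = x0; tx <= x1; ++tx)
                grid.lights[static_cast<std::size_t>(ty) * grid.tilesX + tx].push_back(i);
    }
    return grid;
}
```

The resulting lists would still have to be uploaded every frame (for example through a buffer texture or uniform arrays). This version also ignores depth, so a light can end up in a tile whose visible geometry it never reaches.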

Blurry Skybox Texture

I have a problem when I render my skybox. I am using DirectX 11 with C++. The picture is too blurry. I think it might be that I'm using textures with too low a resolution. Currently the resolution of every face of the skybox is 1024x1024. My screen resolution is 1920x1080. On average I will be staring into one face of the skybox at all times, which means the 1024x1024 picture will be stretched to fill my screen, which is why it is blurry. I'm considering using 2048x2048 textures. I created a simple skybox texture at that size and it is not blurry anymore. But my problem is that it takes too much memory! Almost 100MB loaded to the GPU just for the background.
My question is: is there a better way to render skyboxes? I've looked around on the internet without much luck. Some say that the norm is 512x512 per face. The blurriness then is unacceptable. I'm wondering how commercial games do their skyboxes. Do they use huge texture sizes? In particular, for those who have seen it, I love the Dead Space 3 space environment. I would like to create something like that. So how did they do it?
Firstly, the pixel density will depend not only on the resolution of your texture and the screen, but also the field of view. A narrow field of view will result in less of the skybox filling the screen, and thus will zoom into the texture more, requiring higher resolution. You don't say exactly what FOV you're using, but I'm a little surprised a 1k texture is particularly blurry, so maybe it's a bit on the narrow side?
Oh, and before I forget - you should be using compressed textures... 2k textures shouldn't be that scary.
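To put rough numbers on that, the arithmetic below reproduces the "almost 100MB" figure and shows what block compression would save. The specific formats are an assumption, since neither the question nor this answer names one: uncompressed RGBA8 is 32 bits per texel, BC1 is 4 bits per texel and BC7 is 8 bits per texel. Mipmaps are left out.

```cpp
#include <cstdint>

// Bytes for a cube map with six square faces, without mipmaps.
// bitsPerTexel: 32 for uncompressed RGBA8, 4 for BC1, 8 for BC7.
constexpr std::uint64_t cubeMapBytes(std::uint64_t faceSize,
                                     std::uint64_t bitsPerTexel) {
    return 6u * faceSize * faceSize * bitsPerTexel / 8u;
}

// Converts a byte count to mebibytes.
constexpr double toMiB(std::uint64_t bytes) {
    return static_cast<double>(bytes) / (1024.0 * 1024.0);
}
```

For example, cubeMapBytes(2048, 32) is 100,663,296 bytes (96 MiB), while cubeMapBytes(2048, 4) is 12,582,912 bytes (12 MiB), which is half of what the uncompressed 1024x1024 version takes (24 MiB).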
However, aside from changing the resolution, which obviously does start to burn through memory fairly quickly, I generally always combine the skybox with some simple distant objects.
For example, in a space scene I would probably render a fairly simple skybox which only contained things like nebula, etc., where resolution wasn't critical. I'd perhaps render at least some of the stars as sprites, where the texture density can be locally higher. A planet could be textured geometry.
If I was rendering a more traditional outdoor scene, I could render sky and clouds on the skybox, but a distant horizon as geometry. A moon in the sky might be an overlay.
There is no one standard answer - a variety of techniques can be employed, depending on the situation.

OpenGL voxel engine slow

I'm making a voxel engine in C++ and OpenGL (à la Minecraft) and can't get decent fps on my 3GHz with ATI X1600... I'm all out of ideas.
When I have about 12000 cubes on the screen it falls to under 20fps - pathetic.
So far the optimizations I have are: frustum culling, back face culling (via OpenGL's glEnable(GL_CULL_FACE)), the engine draws only the visible faces (except the culled ones of course) and they're in an octree.
I've tried VBOs; I don't like them and they do not significantly increase the fps.
How can Minecraft's engine be so fast? I struggle with 10000 cubes, whereas Minecraft can easily draw many more at a higher fps.
Any ideas?
@genpfault: I analyze the connectivity and just generate faces for the outer, visible surface. The VBO had a single cube that I glTranslate()d
I'm not an expert at OpenGL, but as far as I understand this is going to save very little time because you still have to send every cube to the card.
Instead what you should do is generate faces for all of the outer visible surface, put that in a VBO, and send it to the card and continue to render that VBO until the geometry changes. This saves you a lot of the time your card is actually waiting on your processor to send it the geometry information.
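A minimal sketch of that face generation step is below, assuming a dense boolean grid. VoxelGrid, Face and buildVisibleFaces are hypothetical names, and expanding each face into vertices and uploading them to a VBO is left out.

```cpp
#include <cstddef>
#include <vector>

// Dense voxel grid; true means the cell is solid (hypothetical layout).
struct VoxelGrid {
    int sx, sy, sz;
    std::vector<bool> solid;  // sx * sy * sz entries, x varies fastest

    bool at(int x, int y, int z) const {
        if (x < 0 || y < 0 || z < 0 || x >= sx || y >= sy || z >= sz)
            return false;  // outside the grid counts as empty
        return solid[(static_cast<std::size_t>(z) * sy + y) * sx + x];
    }
};

// One face to draw: the voxel it belongs to and the direction it faces.
struct Face { int x, y, z, dir; };  // dir indexes kDirs below

constexpr int kDirs[6][3] = {
    {1, 0, 0}, {-1, 0, 0}, {0, 1, 0}, {0, -1, 0}, {0, 0, 1}, {0, 0, -1}};

// Emits a face only where a solid voxel borders an empty cell.
std::vector<Face> buildVisibleFaces(const VoxelGrid& g) {
    std::vector<Face> faces;
    for (int z = 0; z < g.sz; ++z)
        for (int y = 0; y < g.sy; ++y)
            for (int x = 0; x < g.sx; ++x) {
                if (!g.at(x, y, z)) continue;
                for (int d = 0; d < 6; ++d)
                    if (!g.at(x + kDirs[d][0], y + kDirs[d][1], z + kDirs[d][2]))
                        faces.push_back(Face{x, y, z, d});
            }
    return faces;
}
```

A solid 2x2x2 block yields 24 faces instead of the 48 you would get by drawing every cube separately, and the list only needs to be rebuilt (and the VBO re-uploaded) when the voxels change. Note that faces around fully enclosed air pockets are still emitted.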
You should profile your code to find out whether the bottleneck in your application is on the CPU or the GPU. For instance, it might be that your culling/octree algorithms are slow, in which case it is not an OpenGL problem at all.
I would also keep count of the number of cubes you draw on each frame and display that on screen. Just so you know your culling routines work as expected.
Finally you don't mention if your cubes are textured. Try using smaller textures or disable textures and see how much the framerate increases.
gDEBugger is a great tool that will help you find bottlenecks with OpenGL.
I don't know if it's ok here to "bump" an old question but a few things came up my mind:
If your voxels are static, you can speed up the whole rendering process by using an octree for frustum culling, etc. Furthermore, you can compile a static scene into a potentially visible set (PVS) in the octree. The main principle of a PVS is to precompute, for every node in the tree, which other nodes are potentially visible from it, and to store pointers to them in a vector. When rendering, you first find the node the camera is in and then run frustum culling against all nodes in that node's PVS vector. (Carmack used something like that in the Quake engines, but with binary space partitioning trees.)
If the shading of your voxels is fairly complex, it is also faster to do a depth-only pre-pass first, without writing to the color buffer, just to fill the depth buffer. After that you render a second pass: disable writes to the depth buffer and render only to the color buffer while testing against the depth buffer. This avoids expensive shader computations that would later be overwritten by a new fragment closer to the viewer. (Carmack used that in Quake3.)
Another thing which will definitely speed things up is instancing. You store only the position of each voxel and, if necessary, its scale and other parameters in a texture buffer object (TBO). In the vertex shader you can then read the positions of the voxels to be spawned and create an instance of the voxel (i.e. a cube that is given to the shader in a vertex buffer object). So you send the 8 vertices + 8 normals (3 * sizeof(float) * 8 + 3 * sizeof(float) * 8, plus floats for color/texture etc.) to the card only once in the VBO, and then only the positions of the cube instances (3 * sizeof(float) * number of voxels) in the TBO.
Maybe it is possible to parallelize the work between GPU and CPU by combining all 3 steps in 2 threads: in the CPU thread you check the octree's PVS and update a TBO for instancing in the next frame, while the GPU thread meanwhile renders the 2 passes using the TBO that the CPU thread created in the previous step. After that you swap TBOs. If the camera has not moved, you don't even have to redo the CPU calculations.
Another kind of tree you may be interested in is the so-called k-d tree, which is more general than an octree.
PS: sorry for my english, it's not the clearest....
There are 3rd-party libraries you could use to make the rendering more efficient. For example the C++ PolyVox library can take a volume and generate the mesh for you in an efficient way. It has built-in methods for reducing triangle count and helping to generate things like ambient occlusion. It's got a good community around it so getting support on the forum should be easy.
Have you used a common display list for all your cubes?
Do you skip the drawing code for cubes which are not visible to the user?

2D OpenGL scene slows down with lots of overlapping shapes

I'm drawing 2D shapes with OpenGL. They don't use that many polygons. I notice that I can have lots and lots of shapes as long as they don't overlap. If I get a shape behind a shape behind a shape, etc., it really starts lagging. I feel like I might be doing something wrong. Is this normal, and is there a way to fix it? (I can't omit rendering because I do blending for alpha.) I also have CW backface culling enabled.
Thanks
Are your two cases (overlapping and non-overlapping) using the exact same set of shapes? Because if the overlapping case involves a total area of all your shapes that is larger than the first case, then it would be expected to be slower. If it's the same set of shapes that slows down if some of them overlap, then that would be very unusual and shouldn't happen on any standard hardware OpenGL implementation (what platform are you using?). Backface culling won't be causing any problem.
Whenever a shape is drawn, the GPU has to do some work for each pixel that it covers on the screen. If you draw the same shape 100 times in the same place, then that's 100-times the pixel work. Depth buffering can reduce some of the extra cost for opaque objects if you draw objects in depth-sorted order, but that trick can't work for things that use transparency.
When using transparency, it's the sum of the area of each rendered shape that matters. Not the amount of the screen that's covered after everything is rendered.
You need to order your shapes front-to-back if they are opaque. Then the depth test can quickly and easily reject each pixel.
Then, you need to order them back-to-front if they are transparent. Rendering transparency out-of-order is very slow.
Edit: Hmm, I (somehow) missed the fact that this is 2D, despite the fact that the OP mentioned it repeatedly.
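For a scene that does have a depth order (in 2D this could simply be the layer index), the ordering described in this answer can be sketched as follows. The Shape struct and sortForDrawing are hypothetical, and depth is assumed to grow with distance from the viewer.

```cpp
#include <algorithm>
#include <vector>

// Hypothetical draw item: depth grows with distance from the viewer.
struct Shape { int id; float depth; bool transparent; };

// Reorders shapes so that opaque ones come first, nearest first, followed by
// transparent ones, farthest first.
void sortForDrawing(std::vector<Shape>& shapes) {
    auto firstTransparent = std::stable_partition(
        shapes.begin(), shapes.end(),
        [](const Shape& s) { return !s.transparent; });
    // Opaque: front-to-back, so the depth test can reject hidden pixels early.
    std::sort(shapes.begin(), firstTransparent,
              [](const Shape& a, const Shape& b) { return a.depth < b.depth; });
    // Transparent: back-to-front, so blending composites in the right order.
    std::sort(firstTransparent, shapes.end(),
              [](const Shape& a, const Shape& b) { return a.depth > b.depth; });
}
```

Drawing the opaque group first with depth writes enabled also lets the depth test skip transparent pixels that end up hidden behind opaque shapes.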