Alternative to Geometry Shader in OpenGL ES 3.0 - C++

Objective
I want to render a NIfTI file in the browser with a better appearance and better frame rates. I have to use OpenGL ES 3.0 in C++. This code is then compiled to JavaScript and WebAssembly using Emscripten. All I need are solutions that work in GLES 3.0.
Expected Outcome
I expect to pass some vertices (as in desktop OpenGL), and each of these vertices is expected to become the corner of a cube. I think you could call it a voxel.
Ways to solve this in OpenGL 3.0
I could use a geometry shader to create more primitives from each point. This makes each vertex a voxel, but it is less preferred than the next method because of the absence of interpolation. It does have the advantage of offering great frame rates.
I could create a GL_ELEMENT_ARRAY_BUFFER. It would take 9 * (number of elements in the vertex array) * 4 bytes. I ordered the indices so that a GL_TRIANGLE_FAN would create only the 3 adjacent faces of a cube at each vertex, so each vertex corresponds to 9 indices (the 9th index being 0xFFFFFFFF, the GL_PRIMITIVE_RESTART_FIXED_INDEX value). I think there is a much better way using GL_TRIANGLE_STRIP.
The third method is using glDrawArraysInstanced. I'm not so sure whether this supports interpolation between pixels.
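As a sketch of that third, instanced approach under GLES 3.0 (WebGL 2 via Emscripten): one unit cube is stored once and drawn voxelCount times, with a per-instance offset and colour advanced once per cube. The names cubeVbo, instanceVbo and the attribute locations 0-2 are illustrative, and this assumes a bound VAO. Per-instance attributes are constant within a cube, but any varying computed in the vertex shader (per-corner colour, lighting, etc.) is interpolated across each face as usual.

    #include <GLES3/gl3.h>   // Emscripten maps this onto WebGL 2

    // cubeVbo: 36 unit-cube vertices (GL_TRIANGLES).
    // instanceVbo: {vec3 offset, vec3 color} per voxel.
    void DrawVoxelsInstanced(GLuint cubeVbo, GLuint instanceVbo, GLsizei voxelCount)
    {
        glBindBuffer(GL_ARRAY_BUFFER, cubeVbo);
        glEnableVertexAttribArray(0);                                   // per-vertex position
        glVertexAttribPointer(0, 3, GL_FLOAT, GL_FALSE, 0, nullptr);

        glBindBuffer(GL_ARRAY_BUFFER, instanceVbo);
        glEnableVertexAttribArray(1);                                   // per-voxel offset
        glVertexAttribPointer(1, 3, GL_FLOAT, GL_FALSE, 6 * sizeof(float), (void*)0);
        glVertexAttribDivisor(1, 1);                                    // advance once per instance
        glEnableVertexAttribArray(2);                                   // per-voxel colour
        glVertexAttribPointer(2, 3, GL_FLOAT, GL_FALSE, 6 * sizeof(float),
                              (void*)(3 * sizeof(float)));
        glVertexAttribDivisor(2, 1);

        // One draw call: 36 cube vertices, repeated voxelCount times.
        glDrawArraysInstanced(GL_TRIANGLES, 0, 36, voxelCount);
    }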
The Actual Problem
Rendering with glDrawArrays(GL_POINTS) by itself gives me 25 fps, which is not enough. My probable hurdles are:
Could I just adjust gl_PointSize in the shader code to eliminate the space between the vertices I draw, and hence get a good 3D image without gaps? (I would need to zoom into the object.) If so, how would I code that? (A sketch is included after these points.)
I can't use a GL_ELEMENT_ARRAY_BUFFER because it takes far too much memory, and I have already built with the maximum available (4294967296 bytes, or 4 GB). So I have to get creative somehow (without a geometry shader). If possible, I also need to interpolate color for each vertex to form individual cubes.
Any other, much better alternative is also very welcome.
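Regarding the gl_PointSize question above, here is a minimal sketch of a GLES 3.0 vertex shader (embedded as a C++ string) that scales the point size so neighbouring voxels roughly touch, assuming a perspective projection. The uniform names (uMVP, uVoxelSize, uPointScale) are illustrative. Caveats: points are clipped by their centre, each point is a flat screen-aligned square rather than a shaded cube, and the maximum size is implementation-dependent (query GL_ALIASED_POINT_SIZE_RANGE).

    // uPointScale is computed on the CPU as 0.5f * viewportHeightPx * proj[1][1].
    const char* kVoxelPointVS = R"glsl(#version 300 es
    layout(location = 0) in vec3 aPosition;
    uniform mat4  uMVP;
    uniform float uVoxelSize;    // voxel edge length in world units
    uniform float uPointScale;   // 0.5 * viewport height (px) * projection[1][1]

    void main() {
        gl_Position  = uMVP * vec4(aPosition, 1.0);
        // Classic point-sprite sizing: world-space size projected to pixels.
        gl_PointSize = uVoxelSize * uPointScale / gl_Position.w;
    }
    )glsl";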
THANKS IN ADVANCE.
I expect to get a solution that might enlighten me. For that to happen, it must be something I have missed in the points listed above.

Related

Is there an efficient way to exceed GL_MAX_VIEWPORTS?

I am currently implementing the pose estimation algorithm proposed in Oikonomidis et al., 2011, which involves rendering a mesh in N different hypothesised poses (N will probably be about 64). Section 2.5 suggests speeding up the computation by using instancing to generate multiple renderings simultaneously (after which they reduce each rendering to a single number on the GPU), and from their description, it sounds like they found a way to produce N renderings simultaneously.
In my implementation's setup phase, I use an OpenGL viewport array to define GL_MAX_VIEWPORTS viewports. Then in the rendering phase, I transfer an array of GL_MAX_VIEWPORTS model-pose matrices to a mat4 uniform array in GPU memory (I am only interested in estimating position and orientation), and use gl_InvocationID in my geometry shader to select the appropriate pose matrix and viewport for each polygon of the mesh.
GL_MAX_VIEWPORTS is 16 on my machine (I have a GeForce GTX Titan), so this method will allow me to render up to 16 hypotheses at a time on the GPU. This may turn out to be fast enough, but I am nonetheless curious about the following:
Is there a workaround for the GL_MAX_VIEWPORTS limitation that is likely to be faster than calling my render function ceil(double(N)/GL_MAX_VIEWPORTS) times?
I only started learning the shader-based approach to OpenGL a couple of weeks ago, so I don't yet know all the tricks. I initially thought of replacing my use of the built-in viewport support with a combination of:
a geometry shader that adds h*gl_InvocationID to the y coordinates of the vertices after perspective projection (where h is the desired viewport height) and passes gl_InvocationID onto the fragment shader; and
a fragment shader that discards fragments with y coordinates that satisfy y<gl_InvocationID*h || y>=(gl_InvocationID+1)*h.
But I was put off investigating this idea further by the fear that branching and discard would be very detrimental to performance.
The authors of the paper above released a technical report describing some of their GPU acceleration methods, but it's not detailed enough to answer my question. Section 3.2.3 says "During geometry instancing, viewport information is attached to every vertex... A custom pixel shader clips pixels that are outside their pre-defined viewports". This sounds similar to the workaround that I've described above, but they were using Direct3D, so it's not easy to compare what they were able to achieve with that in 2011 to what I can achieve today in OpenGL.
I realise that the only definitive answer to my question is to implement the workaround and measure its performance, but it's currently a low-priority curiosity, and I haven't found answers anywhere else, so I hoped that a more experienced GLSL user might be able to offer their time-saving wisdom.
From a cursory glance at the paper, it seems to me that the actual viewport doesn't change. That is, you're still rendering to the same width/height and X/Y positions, with the same depth range.
What you want is to change which image you're rendering to. Which is what gl_Layer is for; to change which layer within the layered array of images attached to the framebuffer you are rendering to.
So just set the gl_ViewportIndex to 0 for all vertices. Or more specifically, don't set it at all.
The number of GS instancing invocations does not have to be a restriction; that's your choice. GS invocations can write multiple primitives, each to a different layer. So you could have each instance write, for example, 4 primitives, each to 4 separate layers.
Your only limitations should be the number of layers you can use (governed by GL_MAX_ARRAY_TEXTURE_LAYERS and GL_MAX_FRAMEBUFFER_LAYERS, both of which must be at least 2048), and the number of primitives and vertex data that a single GS invocation can emit (which is kind of complicated).
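A minimal sketch of what that could look like (desktop GL 4.x GLSL, shown here as a C++ raw string): each of 16 invocations writes the incoming triangle to 4 layers of a layered framebuffer attachment, covering 64 hypotheses. The uniform name uPose, the 16x4 split, and the assumption that the vertex shader passes object-space positions through are all illustrative choices, not taken from the paper.

    const char* kLayeredPoseGS = R"glsl(#version 430
    layout(triangles, invocations = 16) in;
    layout(triangle_strip, max_vertices = 12) out;

    uniform mat4 uPose[64];   // one model-pose matrix per hypothesis

    void main() {
        for (int i = 0; i < 4; ++i) {
            int hyp = gl_InvocationID * 4 + i;
            for (int v = 0; v < 3; ++v) {
                // Assumes the vertex shader forwarded untransformed positions.
                gl_Position = uPose[hyp] * gl_in[v].gl_Position;
                gl_Layer    = hyp;    // selects the layer of the layered FBO attachment
                EmitVertex();
            }
            EndPrimitive();
        }
    }
    )glsl";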

GLSL Shaders: blending, primitive-specific behavior, and discarding a vertex

Criteria: I’m using OpenGL with shaders (GLSL) and trying to stay with modern techniques (e.g., trying to stay away from deprecated concepts).
My questions, in a very general sense (see below for more detail), are as follows:
Do shaders allow you to do custom blending that help eliminate z-order transparency issues found when using GL_BLEND?
Is there a way for a shader to know what type of primitive is being drawn without “manually” passing it some sort of flag?
Is there a way for a shader to “ignore” or “discard” a vertex (especially when drawing points)?
Background: My application draws points connected with lines in an ortho projection (vertices have varying depth in the projection). I've only recently started using shaders in the project (trying to get away from deprecated concepts). I understand that standard blending has ordering issues with alpha testing and depth testing: basically, if a "translucent" pixel at a higher z level is drawn first (thus blending with whatever colors were already drawn to that pixel at a lower z level), and an opaque object is then drawn at that pixel but at a lower z level, depth testing prevents changing the pixel that was already drawn for the "higher" z level, thus causing blending issues. To overcome this, you need to draw opaque items first, then translucent items sorted from back to front. My gut feeling is that shaders wouldn't provide an (efficient) way to change this behavior; am I wrong?
Further, for speed and convenience, I pass information for each vertex (along with a couple of uniform variables) to the shaders, and they use that information to find a subset of the vertices that need special attention. Without doing a similar set of logic in the app itself (and slowing things down), I can't know a priori what subset of vertices that is, so I send all vertices to the shader. However, when I draw "points" I'd like the shader to ignore all the vertices that aren't in the subset it determines. I think I can get the effect by setting alpha to zero and using an alpha function in the GL context that will prevent drawing anything with alpha less than, say, 0.01. However, is there a better or more "correct" GLSL way for a shader to say "just ignore this vertex"?
Do shaders allow you to do custom blending that help eliminate z-order transparency issues found when using GL_BLEND?
Sort of. If you have access to GL 4.x-class hardware (Radeon HD 5xxx or better, or GeForce 4xx or better), then you can perform order-independent transparency. Earlier versions have techniques like depth peeling, but they're quite expensive.
The GL 4.x-class version uses essentially a series of "linked lists" of transparent samples, which you do a full-screen pass to resolve into the final sample color. It's not free of course, but it isn't as expensive as other OIT methods. How expensive it would be for your case is uncertain; it is proportional to how many overlapping pixels you have.
You still have to draw opaque stuff first, and you have to draw transparent stuff using special shader code.
Is there a way for a shader to know what type of primitive is being drawn without “manually” passing it some sort of flag?
No.
Is there a way for a shader to “ignore” or “discard” a vertex (especially when drawing points)?
No in general, but yes for points. A Geometry shader can conditionally emit vertices, thus allowing you to discard any vertex for arbitrary reasons.
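For example, a sketch of a point-filtering geometry shader (GLSL shown as a C++ string; the vKeep varying name is illustrative): the vertex shader forwards a per-vertex flag, and the geometry shader simply emits nothing when the flag is unset, which effectively discards the vertex.

    const char* kPointFilterGS = R"glsl(#version 150
    layout(points) in;
    layout(points, max_vertices = 1) out;

    in float vKeep[];   // forwarded from the vertex shader: 1.0 = keep, 0.0 = drop

    void main() {
        if (vKeep[0] > 0.5) {            // emit only the vertices we decide to keep
            gl_Position  = gl_in[0].gl_Position;
            gl_PointSize = gl_in[0].gl_PointSize;
            EmitVertex();
            EndPrimitive();
        }
    }
    )glsl";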
Discarding a vertex in non-point primitives is possible, but it will also affect the interpretation of that primitive. The reason it's simple for points is because a vertex is a primitive, while a vertex in a triangle isn't a whole primitive. You can discard lines, but discarding a vertex within a line is... of dubious value.
That being said, your explanation for why you want to do this is of dubious merit. You want to update vertex data with what is essentially a boolean value that says "do stuff with me" or not. That means that, every frame, you have to modify your data to say which points should be rendered and which shouldn't.
The simplest and most efficient way to do this is to simply not render with them. That is, arrange your data so that the only thing on the GPU are the points you want to render. Thus, there's no need to do anything special at all. If you're going to be constantly updating your vertex data, then you're already condemned to dealing with streaming vertex data. So you may as well stream it in a way that makes rendering efficient.

Using Vertex Buffer Objects for a tile-based game and texture atlases

I'm creating a tile-based game in C# with OpenGL and I'm trying to optimize my code as best as possible.
I've read several articles and sections in books and all come to the same conclusion (as you may know) that use of VBOs greatly increases performance.
I'm not quite sure, however, how they work exactly.
My game will have tiles on the screen, some will change and some will stay the same. To use a VBO for this, I would need to add the coordinates of each tile to an array, correct?
Also, to texture these tiles, I would have to create a separate VBO for this?
I'm not quite sure what the code would look like for tiling these coordinates if I've got tiles that are animated and tiles that will be static on the screen.
Could anyone give me a quick rundown of this?
I plan on using a texture atlas of all of my tiles. I'm not sure where to begin to use this atlas for the textured tiles.
Would I need to compute the coordinates of the tile in the atlas to be applied? Is there any way I could simply use the coordinates of the atlas to apply a texture?
If anyone could clear up these questions it would be greatly appreciated. I could even possibly reimburse someone for their time & help if wanted.
Thanks,
Greg
OK, so let's split this into parts. You didn't specify which version of OpenGL you want to use - I'll assume GL 3.3.
VBO
Vertex buffer objects, when considered as an alternative to client vertex arrays, mostly save GPU bandwidth. A tile map is not really a lot of geometry. However, in recent GL versions vertex buffer objects are the only way of specifying vertices (which makes a lot of sense), so we cannot really talk about "increasing performance" here. If you mean "compared to deprecated vertex specification methods like immediate mode or client-side arrays", then yes, you'll get a performance boost, but you'd probably only feel it with 10k+ vertices per frame, I suppose.
Texture atlases
Texture atlases are indeed a nice feature for saving on texture switching. However, on GL3 (and DX10)-enabled GPUs you can save yourself a LOT of the trouble characteristic of this technique, because a more modern and convenient approach is available. Check the GL reference docs for TEXTURE_2D_ARRAY - you'll like it. If GL3 cards are your target, forget texture atlases. If not, have a Google for which older cards support texture arrays as an extension; I'm not familiar with the details.
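A minimal sketch of creating such an array texture under GL 3.3, shown as raw GL calls in C++ (equivalent entry points exist in the usual C# bindings); tileSize, layerCount, i and pixels are placeholders for your own data.

    GLuint tex;
    glGenTextures(1, &tex);
    glBindTexture(GL_TEXTURE_2D_ARRAY, tex);

    // Allocate every layer at once: tileSize x tileSize x layerCount.
    glTexImage3D(GL_TEXTURE_2D_ARRAY, 0, GL_RGBA8,
                 tileSize, tileSize, layerCount,
                 0, GL_RGBA, GL_UNSIGNED_BYTE, nullptr);

    // Upload one tile image into layer i (repeat for each tile / animation frame).
    glTexSubImage3D(GL_TEXTURE_2D_ARRAY, 0,
                    0, 0, i,                       // xoffset, yoffset, layer
                    tileSize, tileSize, 1,         // width, height, depth (one layer)
                    GL_RGBA, GL_UNSIGNED_BYTE, pixels);

    glTexParameteri(GL_TEXTURE_2D_ARRAY, GL_TEXTURE_MIN_FILTER, GL_NEAREST);
    glTexParameteri(GL_TEXTURE_2D_ARRAY, GL_TEXTURE_MAG_FILTER, GL_NEAREST);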
Rendering
So how do you draw a tile map efficiently? Let's focus on the data. There are lots of tiles and each tile has the following information:
grid position (x,y)
material (let's call it "material" rather than "texture", because, as you said, the image might be animated and change over time; the "material" would then be interpreted as "one texture, or a set of textures which change over time", or anything you want).
That should be all the "per-tile" data you'd need to send to the GPU. You want to render each tile as a quad or triangle strip, so you have two alternatives:
send 4 vertices (x,y),(x+w,y),(x+w,y+h),(x,y+h) instead of (x,y) per tile,
use a geometry shader to calculate the 4 points along with texture coords for every 1 point sent.
Pick your favourite. Also note that this directly corresponds to what your VBO is going to contain - the latter solution would make it 4x smaller.
For the material, you can pass it as a symbolic integer, and in your fragment shader - based on the current time (passed as a uniform variable) and the material ID for a given tile - you can decide which texture layer from the texture array to use. In this way you can make a simple texture animation.
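A hypothetical fragment shader sketch of that idea (GLSL 3.30, embedded as a C++ string); uTime, uFramesPerMaterial and the 8 fps rate are illustrative assumptions, and the matching vertex shader would output vMaterial as a flat int.

    const char* kTileFS = R"glsl(#version 330 core
    uniform sampler2DArray uTiles;
    uniform float uTime;               // seconds, passed as a uniform
    uniform int   uFramesPerMaterial;  // animation frames owned by each material

    flat in int vMaterial;             // per-tile material ID
    in vec2     vTexCoord;
    out vec4    fragColor;

    void main() {
        // Simple animation: step through the material's frames over time.
        int frame = int(uTime * 8.0) % uFramesPerMaterial;   // 8 fps, arbitrary
        int layer = vMaterial * uFramesPerMaterial + frame;
        fragColor = texture(uTiles, vec3(vTexCoord, float(layer)));
    }
    )glsl";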

GLSL dynamically indexed arrays

I've been using DirectX (with XNA) for a while now, and have recently switched to OpenGL. I'm really loving it, but one thing has got me annoyed.
I've been trying to implement something that requires dynamic indexing in the vertex shader, but I've been told that this requires the equivalent of SM 4.0. However, I know that this works in DX even with SM 2.0, possibly even 1.0. XNA's instancing sample uses this to do instancing on SM 2.0-only cards http://create.msdn.com/en-US/education/catalog/sample/mesh_instancing.
The compiler can't have been "unrolling" it into a giant list of if statements, since this would surely exceed the instruction limit on SM2 for our 250 instances.
So is DX doing some trickery that I can't do with OpenGL, can I manipulate OpenGL to do the same, or is it a hardware feature that OpenGL doesn't expose?
You can upload an array for your light directions with something like glUniform3fv, then (assuming I understand what you're trying to do correctly) you just need your vertex format to include an index into this array (so there will be lots of duplication of these indices if the index only changes once per mesh or something). If you don't already know, you can use glGetAttribLocation + glVertexAttribPointer to send arbitrary vertex attributes like this to the shader (as opposed to using the deprecated built-in attributes like gl_Vertex, gl_Normal, etc.).
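A minimal C++ sketch of that setup (the names uLightDirs and aLightIndex, the Vertex struct, and NUM_LIGHTS are placeholders); the index is sent as a float and cast to int in the shader, which keeps it workable on SM2/SM3-level hardware.

    #include <cstddef>   // offsetof

    struct Vertex { float pos[3]; float normal[3]; float lightIndex; };
    const int NUM_LIGHTS = 4;

    // With the program bound (glUseProgram), upload the whole array once:
    GLfloat lightDirs[3 * NUM_LIGHTS] = { /* ... your packed directions ... */ };
    glUniform3fv(glGetUniformLocation(program, "uLightDirs"), NUM_LIGHTS, lightDirs);

    // Per-vertex index into that array, sent through a generic attribute:
    GLint idx = glGetAttribLocation(program, "aLightIndex");
    glEnableVertexAttribArray(idx);
    glVertexAttribPointer(idx, 1, GL_FLOAT, GL_FALSE, sizeof(Vertex),
                          (void*)offsetof(Vertex, lightIndex));

    // In the vertex shader: uniform vec3 uLightDirs[4];
    //                       attribute float aLightIndex;   // or `in float` in newer GLSL
    //                       vec3 dir = uLightDirs[int(aLightIndex)];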
From your link:
Note that there is no single perfect instancing technique. This must be implemented in a different way on Windows compared to Xbox 360, and on Windows the ideal technique requires shader 3.0, but there is also a fallback approach that will work with shader 2.0. This sample implements several different instancing techniques, so it can work on both platforms and shader versions.
Note the emboldened part. So on that basis you should be able to do similar instancing on Shader Model 3. Shader Model 2's instancing is usually performed using a matrix palette. It simply means you can render multiple meshes in one call by uploading a batch of transformation matrices in one go. This reduces draw calls and improves speed.
Anyway, for OpenGL there were a lot of troubles finalising this extension, and hence you need Shader Model 4. You CAN, however, still stick a per-vertex matrix palette index in your vertex structure and do matrix palette rendering using a shader...

What is the most efficient way to draw voxels (cubes) in opengl?

I would like to draw voxels by using opengl but it doesn't seem like it is supported. I made a cube drawing function that had 24 vertices (4 vertices per face) but it drops the frame rate when you draw 2500 cubes. I was hoping there was a better way. Ideally I would just like to send a position, edge size, and color to the graphics card. I'm not sure if I can do this by using GLSL to compile instructions as part of the fragment shader or vertex shader.
I searched google and found out about point sprites and billboard sprites (same thing?). Could those be used as an alternative to drawing a cube quicker? If I use 6, one for each face, it seems like that would be sending much less information to the graphics card and hopefully gain me a better frame rate.
Another thought is: maybe I can draw multiple cubes using one glDrawElements call?
Maybe there is a better method altogether that I don't know about? Any help is appreciated.
Drawing voxels with cubes is almost always the wrong way to go (the exceptional case is ray-tracing). What you usually want to do is put the data into a 3D texture and render slices depending on camera position. See this page: https://developer.nvidia.com/gpugems/GPUGems/gpugems_ch39.html and you can find other techniques by searching for "volume rendering gpu".
EDIT: When writing the above answer I didn't realize that the OP was, most likely, interested in how Minecraft does it. For techniques to speed up Minecraft-style rasterization, check out Culling techniques for rendering lots of cubes. Though with recent advances in graphics hardware, rendering Minecraft through raytracing may become a reality.
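A minimal sketch of the 3D-texture setup mentioned above (dimX, dimY, dimZ and voxels are placeholders for your own volume data); the actual view-aligned slicing or raymarching shader is beyond a short snippet.

    GLuint volumeTex;
    glGenTextures(1, &volumeTex);
    glBindTexture(GL_TEXTURE_3D, volumeTex);
    glTexImage3D(GL_TEXTURE_3D, 0, GL_R8,            // one 8-bit density channel
                 dimX, dimY, dimZ, 0,
                 GL_RED, GL_UNSIGNED_BYTE, voxels);
    glTexParameteri(GL_TEXTURE_3D, GL_TEXTURE_MIN_FILTER, GL_LINEAR);
    glTexParameteri(GL_TEXTURE_3D, GL_TEXTURE_MAG_FILTER, GL_LINEAR);
    // At draw time, render view-aligned slices back to front with blending enabled,
    // sampling a sampler3D at each slice's volume coordinates in the fragment shader.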
What you're looking for is called instancing. You could take a look at glDrawElementsInstanced and glDrawArraysInstanced for a couple of possibilities. Note that these were only added as core operations relatively recently (OGL 3.1), but have been available as extensions quite a while longer.
nVidia's OpenGL SDK has an example of instanced drawing in OpenGL.
First you really should be looking at OpenGL 3+ using GLSL. This has been the standard for quite some time. Second, most Minecraft-esque implementations use mesh creation on the CPU side. This technique involves looking at all of the block positions and creating a vertex buffer object that renders the triangles of all of the exposed faces. The VBO is only generated when the voxels change and is persisted between frames. An ideal implementation would combine coplanar faces of the same texture into larger faces.
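A rough sketch of that CPU-side face-culling idea, under assumed names (grid is a flat occupancy array; each surviving Face would then be expanded into two triangles, or used as per-instance data):

    #include <cstdint>
    #include <vector>

    struct Face { int x, y, z, dir; };   // voxel position + which of its 6 faces is exposed

    // Neighbour offsets: -X, +X, -Y, +Y, -Z, +Z
    static const int kDir[6][3] = {{-1,0,0},{1,0,0},{0,-1,0},{0,1,0},{0,0,-1},{0,0,1}};

    std::vector<Face> CollectExposedFaces(const std::vector<uint8_t>& grid,
                                          int nx, int ny, int nz)
    {
        auto solid = [&](int x, int y, int z) {
            if (x < 0 || y < 0 || z < 0 || x >= nx || y >= ny || z >= nz) return false;
            return grid[(z * ny + y) * nx + x] != 0;
        };

        std::vector<Face> faces;
        for (int z = 0; z < nz; ++z)
            for (int y = 0; y < ny; ++y)
                for (int x = 0; x < nx; ++x) {
                    if (!solid(x, y, z)) continue;
                    for (int f = 0; f < 6; ++f)
                        if (!solid(x + kDir[f][0], y + kDir[f][1], z + kDir[f][2]))
                            faces.push_back({x, y, z, f});   // face borders empty space
                }
        return faces;   // rebuild the VBO from these only when the voxels change
    }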