Transferring large voxel data to a GLSL shader - c++

I'm working a program which renders a dynamic high resolution voxel landscape.
Currently I am storing the voxel data in 32x32x32 blocks with 4 bits each:
struct MapData {
char data[32][32][16];
}
MapData *world = new MapData[(width >> 5) * (height >> 5) * (depth >> 5)];
What I'm trying to do with this, is send it to my vertex and fragment shaders for processing and rendering. There are several different methods I've seen to do this, but I have no idea which one will be best for this.
I started with a sampler1D format, but that results in floating point output between 0 and 1. I also had the hinting suspicion that it was storing it as 16 bits per voxel.
As for Uniform Buffer Objects I tried and failed to implement this.
My biggest concern with all of this is not having to send the whole map to the GPU every frame. I want to be able to load maps up to ~256MB (1024x2048x256 voxels) in size, so I need to be able to send it all once, and then resend only the blocks that were changed.
What is the best solution for this short of writing OpenCL to handle the video memory for me. If there's a better way to store my voxels that makes this easier, I'm open to other formats.

If you just want a large block of memory to access from in a shader, you can use a buffer texture. This obviously requires a semi-recent GL version (3.0 or better), so you need DX10 hardware or better.
The concept is pretty straightforward. You make a buffer object that stores your data. You create a buffer texture using the typical glGenTextures command, then glBindTexture it to the GL_TEXTURE_BUFFER target. Then you use glTexBuffer to associate your buffer object with the texture.
Now, you seem to want to use 4 bits per voxel. So your image format needs to be a single-channel, unsigned 8-bit integral format. Your glTexBuffer call should be something like this:
glTexBuffer(GL_TEXTURE_BUFFER, GL_RUI8, buffer);
where buffer is the buffer object that stores your voxel data.
Once this is done, you can change the contents of this buffer object using the usual mechanisms.
You bind the buffer texture for rendering just like any other texture.
You use a usamplerBuffer sampler type in your shader, because it's an unsigned integral buffer texture. You must use the texelFetch command to access data from it, which takes integer texture coordinates and ignores filtering. Which is of course exactly what you want.
Note that buffer textures do have size limits. However, the size limits are often some large percentage of video memory.

Related

OpenGL: efficient way to read sparce pixel data from many framebuffer textures?

I'm writing a program that uses the GPU to calculate stuff, and I want to read data from the framebuffers to be used in my client code. The framebuffers I'm using are about 40 textures, all 1024x1024 in size, all of which contain data that needs read, but only very sparcely, like 50 or so pixels in arbitrary x/y coordinates from each texture. Using glReadPixels for each texture, for each frame, is proving too costly for me to do though...
I only need to read a few select pixels from each texture, is there a way to quickly gather their data without needing to download every entire texture from the GPU?
This sounds fairly expensive no matter how you slice it. A couple of approaches come to mind:
What I would try first is glReadPixels(), but with using a PBO. Bind a buffer large enough to hold all the pixels to the GL_PIXEL_PACK_BUFFER target, and then submit the glReadPixels() calls, with offsets to place the results in distinct sections of the buffer. Then call glMapBufferRange() to read back the values.
An alternate approach is that you copy all the pixels you want to read into a single texture. You could use glBlitFramebuffer() or glCopyTexSubImage2D(). Then use a single glReadPixels() or glGetTexImage() call to get all the data from this texture.
Both of these approaches should result in about the same amount of work and synchronization overhead. But one or the other could be more efficient, depending on which paths in the driver are better optimized.
As the earlier answer already suggested, I would make very sure that you really need this, and there isn't any way to keep and process the data on the GPU. Any time you read back data, you introduce synchronization between GPU and CPU, which is mostly harmful to performance.
Do you have any restrictions on what OpenGL version you can use? If not, it sounds like you should look into compute shaders. You say that you are calculating data, so I assume that you are "abusing" the rendering pipeline for your application, especially the fragment shader, and store fragment data in the framebuffer that is interpreted as something else than color.
If this is the case, then all you need is a shader storage buffer and an atomic counter. At some point right now you are deciding that fragment (x, y, z [z being the texture index]) should have value v. So in your compute shader, you do your calculation as you would in the fragment shader, but as output, you store a tuple (x, y, z, v). You store this tuple in the shader storage buffer at the index of the atomic counter which you increment after each written element. In the end, you have your data stored compactly in the buffer and only need to read back these elements. The exact number is the value the atomic counter holds after termination. Download the buffer with glGetBufferSubData into an array of location-value pairs, iterate over it and do your CPU magic.
If you need to copy the data from the GPU to the CPU memory, there is no way (AFAIK) around using glReadPixels.
Depending on what platform you're using, and the specific of your programs, you can try several optimizations, using FBOs:
Copy only part of the texture, assuming you know the locations of the pixels. Note that in most cases it still faster to copy the entire texture instead of issuing several small reads
If you don't need 32 bit textures, you can render to a lower color resolution. The specific depends on your platform extensions.
Maybe you don't really need to copy the pixels since you plan to use them as a texture input to the next stage? In that case you copy the pixels directly on the GPU using glCopyTexImage2D

I need my GLSL fragment shader to return the distance calculation

I'm using some standard GLSL (version 120) vertex and fragment shaders to simulate LIDAR. In other words, instead of just returning a color at each x,y position (each pixel, via the fragment shader), it should return color and distance.
I suppose I don't actually need all of the color bits, since I really only want the intensity; so I could store the distance in gl_FragColor.b, for example, and use .rg for the intensity. But then I'm not entirely clear on how I get the value back out again.
Is there a simple way to return values from the fragment shader? I've tried varying, but it seems like the fragment shader can't write variables other than gl_FragColor.
I understand that some people use the GLSL pipeline for general-purpose (non-graphics) GPU processing, and that might be an option — except I still do want to render my objects normally.
OpenGL already returns this "distance calculation" via the depth buffer, although it's not linear. You can simply create a frame buffer object (FBO), attach colour and depth buffers, render to it, and you have the result sitting in the depth buffer (although you'll have to undo the depth transformation). This is the easiest option to program provided you are familiar with the depth calculations.
Another method, as you suggest, is storing the value in a colour buffer. You don't have to use the main colour buffer because then you'd lose your colour or have to render twice. Instead, attach a second render target (texture) to your FBO (GL_COLOR_ATTACHMENT1) and use gl_FragData[0] for normal colour and gl_FragData[1] for your distance (for newer GL versions you should be declaring out variables in the fragment shader). It depends on the precision you need, but you'll probably want to make the distance texture 32 bit float (GL_R32F and write to gl_FragData[1].r).
- This is a decent place to start: http://www.opengl.org/wiki/Framebuffer_Object
Yes, GLSL can be used for compute purposes. Especially with ARB_image_load_store and nvidia's bindless graphics. You even have access to shared memory via compute shaders (though I've never got one faster than 5 times slower). As #Jherico says, fragment shaders generally output to a single place in a framebuffer attachment/render target, and recent features such as image units (ARB_image_load_store) allow you to write to arbitrary locations from a shader. It's probably overkill and slower but you could also write your distances to a buffer via image units .
Finally, if you want the data back on the host (CPU accessible) side, use glGetTexImage with your distance texture (or glMapBuffer if you decided to use image units).
Fragment shaders output to a rendering buffer. If you want to use the GPU for computing and fetching data back into host memory you have a few options
Create a framebuffer and attach a texture to it to hold your data. Once the image has been rendered you can read back information from the texture into host memory.
Use an CUDA, OpenCL or an OpenGL compute shader to write the memory into an arbitrary bound buffer, and read back the buffer contents

Using IBO for animation - good or bad?

I just watching my animated sprite code, and get some idea.
Animation was made by altering tex coords. It have buffer object, which holds current frame texture coords, as new frame requested, new texture coords feed up in buffer by glBufferData().
And what if we pre-calculate all animation frames texture coords, put them in BO and create Index Buffer Object with just a number of frame, which we need to draw
GLbyte cur_frames = 0; //1,2,3 etc
Now then as we need to update animation, all we need is update 1 byte (instead of 4 /quad vertex count/ * 2 /s, t/ * sizeof(GLfloat) bytes for quad drawing with TRIANGLE_STRIP) frame of our IBO with glBufferData, we don't need hold any texture coords after init of our BO.
I am missing something? What are contras?
Edit: of course your vertex data may be not gl_float just for example.
As Tim correctly states, this depends on your application, let us talk some numbers, you mention both IBO's and inserting texture coordinates for all frames into one VBO, so let us take a look at the impact of each.
Suppose a typical vertex looks like this:
struct vertex
{
float x,y,z; //position
float tx,ty; //Texture coordinates
}
I added a z-component but the calculations are similar if you don't use it, or if you have more attributes. So it is clear this attribute takes 20 bytes.
Let's assume a simple sprite: a quad, consisting of 2 triangles. In a very naive mode you just send 2x3 vertices and send 6*20=120 bytes to the GPU.
In comes indexing, you know you have actually only four vertices: 1,2,3,4 and two triangles 1,2,3 and 2,3,4. So we send two buffers to the GPU: one containing 4 vertices (4*20=80 byte) and one containing the list of indices for the triangles ([1,2,3,2,3,4]), let's say we can do this in 2 byte (65535 indices should be enough), so this comes down to 6*2=12 byte. In total 92 byte, we saved 28 byte or about 23%. Also, when rendering the GPU is likely to only process each vertex once in the vertex shader, it saves us some processing power also.
So, now you want to add all texture coordinates for all animations at once. First thing you have to note is that a vertex in indexed rendering is defined by all it's attributes, you can't split it in an index for positions and an index for texture coordinates. So if you want to add extra texture coordinates, you will have to repeat the positions. So each 'frame' that you add will add 80 byte to the VBO and 12 byte to the IBO. Suppose you have 64 frames, you end up with 64*(80+12)=5888byte. Let's say you have 1000 sprites, then this would become about 6MB. That does not seem too bad, but note that it scales quite rapidly, each frame adds to the size, but also each attribute (because they have to be repeated).
So, what does it gain you?
You don't have to send data to the GPU dynamically. Note that updating the whole VBO would require sending 80 bytes or 640 bits. Suppose you need to do this for 1000 sprites per frame at 30 frames per second, you get to 19200000bps or 19.2Mbps (no overhead included). This is quite low (e.g. 16xPCI-e can handle 32Gbps), but it could be worth wile if you have other bandwidth issues (e.g. due to texturing). Also, if you construct your VBO's carefully (e.g. separate VBO's or non-interleaved), you could reduce it to only updating the texture-part, which is only 16 byte per sprite in the above example, this could reduce bandwidth even more.
You don't have to waste time computing the next frame position. However, this is usually just a few additions and few if's to handle the edges of your textures. I doubt you will gain much CPU power here.
Finally, you also have the possibility to simply split the animation image over a lot of textures. I have absolutely no idea how this scales, but in this case you don't even have to work with more complex vertex attributes, you just activate another texture for each frame of animation.
edit: another method could be to pass the frame number in a uniform and do the calculations in your fragment shader, before sampling. Setting a single integer uniform should be that much of an overhead.
For a modern GPU, accessing/unpacking single bytes is not necessarily faster than accessing integer types or even vectors (register sizes & load instructions, etc.). You can just save memory and therefore memory bandwidth, but I wouldn't expect this to give much of a difference in relation to all other vertex attribute array accesses.
I think, the fastest way to supply a frame index for animated sprites is either an uniform, or if multiple sprites have to be rendered with one draw call, the usage of instanced vertex attrib arrays. With the latter, you could provide a single index for fixed-size subsequences of vertices.
For example, when drawing 'sprite-quads', you'd have one frame index fetch per 4 vertices.
A third approach would be a buffer-texture, when using instanced rendering.
I recommend a global (shared) uniform for time/frame index calculation, so you can calculate the animation index on the fly within you shader, which doesn't require you to update the index buffer (which then just represents the relative animation state among sprites)

Texture buffer objects or regular textures?

The OpenGL SuperBible discusses texture buffer objects, which are textures formed from data inside VBOs. It looks like there are benefits to using them, but all the examples I've found create regular textures. Does anyone have any advice regarding when to use one over the other?
According to the extension registry, texture buffers are only 1-dimensional, cannot do any filtering and have to be accessed by accessing explicit texels (by index), instead of normalized [0,1] floating point texture coordinates. So they are not really a substitution for regular textures, but for large uniform arrays (for example skinning matrices or per instance data). It would make much more sense to compare them to uniform buffers than to regular textures, like done here.
EDIT: If you want to use VBO data for regular, filtered, 2D textures, you won't get around a data copy (best done by means of PBOs). But when you just want plain array access to VBO data and attributes won't suffice for this, then a texture buffer should be the method of choice.
EDIT: After checking the corresponding chapter in the SuperBible, I found that they on the one hand mention, that texture buffers are always 1-dimensional and accessed by discrete integer texel offsets, but on the other hand fail to mention explicitly the lack of filtering. It seems to me they more or less advertise them as textures just sourcing their data from buffers, which explains the OP's question. But as mentioned above this is just the wrong comparison. Texture buffers just provide a way for directly accessing buffer data in shaders in the form of a plain array (though with an adjustable element type), not more (making them useless for regular texturing) but also not less (they are still a great feature).
Buffer textures are unique type of texture that allow a buffer object to be accessed from a shader like a texture. They are completely unique from normal OpenGL textures, including Texture1D, Texture2D, and Texture3D. There are two main reasons why you would use a Buffer Texture instead of a normal texture:
Since Texture Buffers are read like textures, you can read their contents from every vertex freely using texelFetch. This is something that you cannot do with vertex attributes, as those are only accessable on a per-vertex basis.
Buffer Textures can be useful as an alternative to uniforms when you need to pass in large arrays of data. Uniforms are limited in the size, while Buffer Textures can be massive in size.
Buffer Textures are supported in older versions of OpenGL than Shader Storage Buffer Objects (SSBO), making them good for use as a fallback if SSBOs are not supported on a GPU.
Meanwhile, regular textures in OpenGL work differently and are designed for actual texturing. These have the following features not shared by Texture Buffers:
Regular textures can have filters applied to them, so that when you sample pixels from them in your shaders, your GPU will automatically interpolate colors based on nearby pixels. This prevents pixelation when textures are upscaled heavily, though they will get progressively more blurry.
Regular textures can use mipmaps, which are lower quality versions of the same texture used at further view distances. OpenGL has built in functionality to generate mipmaps, or you can supply your own. Mipmaps can be helpful for performance in large 3d scenes. Mipmaps also can help prevent flickering in textures that are rendered further away.
In summary of these points, you could say that normal textures are good for actual texturing, while Buffer Textures are good as a method for passing in raw arrays of values.
Regular textures are used when VBOs are not supported.

OpenGL: Buffer object performance issue

I have a question related to Buffer object performance. I have rendered a mesh using standard Vertex Arrays (not interleaved) and I wanted to change it to Buffer Object to get some performance boost. When I introduce buffers object I was in shock when I find out that using Buffers object lowers performance four times. I think that buffers should increase performance. Does it true? So, I think that I am doing something wrong...
I have render 3d tiled map and to reduce amount of needed memory I use only a single tile (vertices set) to render whole map. I change only texture coordinates and y value in vertex position for each tile of map. Buffers for position and texture coords are created with GL_DYNAMIC_DRAW parameter. The buffer for indices is created with GL_STATIC_DRAW because it doesn't change during map rendering. So, for each tile of map buffers are mapped and unmapped at least one time. Should I use only one buffer for texture coords and positions?
Thanks,
Try moving your vertex/texture coordinates with GL_MODELVIEW/GL_TEXTURE matrices, and leave buffer data alone (GL_STATIC_DRAW alone). e.g. if tile is of size 1x1, create rect (0, 0)-(1, 1) and set it's position in the world with glTranslate. Same with texture coordinates.
VBOs are not there to increase performance of drawing few quads. Their true power is seen when drawing meshes with thousands of polygons using shaders. If you don't need any forward compatibility with newer opengl versions, I see little use in using them to draw dynamically changing data.
If you need to update the buffer(s) each frame you should use GL_STREAM_DRAW (which hints that the buffer contents will likely be used only once) rather than GL_DYNAMIC_DRAW (which hints that they will be but used a couple of times before being updated).
As far as my experience goes, buffers created with GL_STREAM_DRAW will be treated similarly to plain ol' arrays, so you should expect about the same performance as for arrays when using it.
Also make sure that you call glMapBuffer with the access parameter set to GL_WRITE_ONLY, assuming you don't need to read the contents of the buffer. Otherwise, if the buffer is in video memory, it will have to be transferred from video memory to main memory and then back again (well, that's up to the driver really...) for each map call. Transferring to much data over the bus is a very real bottleneck that's quite easy to bump into.