I have a PBO that CUDA updates each frame. Afterwards I also want to update a texture from this PBO, which I do with glTexSubImage2D. I'm afraid updating the whole texture is expensive, so I would like to update only the visible region of the texture while my PBO holds the whole data.
The problem is that, although glTexSubImage2D accepts an offset, width and height as parameters, they are only used when writing to the texture; my buffer data still has to be laid out linearly. Preparing the buffer data myself would be extremely expensive, since my PBO resides in GPU memory.
Is there any alternative to glTexSubImage2D which also takes parameters for the buffer offset, or should I keep updating the whole texture at once?
Please read up on the pixel store parameters, set with glPixelStore. The parameters GL_UNPACK_ROW_LENGTH, GL_UNPACK_SKIP_PIXELS and GL_UNPACK_SKIP_ROWS are of most interest for you:
These values are provided as a convenience to the programmer; they provide no functionality that cannot be duplicated by incrementing the pointer passed to glDrawPixels, glTexImage1D, glTexImage2D, glTexSubImage1D, glTexSubImage2D, glBitmap, or glPolygonStipple. Setting GL_UNPACK_SKIP_PIXELS to i is equivalent to incrementing the pointer by i × n components or indices, where n is the number of components or indices in each pixel. Setting GL_UNPACK_SKIP_ROWS to j is equivalent to incrementing the pointer by j × k components or indices, where k is the number of components or indices per row, as just computed in the GL_UNPACK_ROW_LENGTH section.
You're still going to use glTexImage and/or glTexSubImage for data transfer.
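To make the quoted arithmetic concrete, here is a minimal sketch of the offset those parameters imply. It assumes tightly packed RGBA8 pixels (4 bytes each, so the default unpack alignment adds no row padding). The names SubRect and unpackByteOffset are made up for illustration, and the GL calls in the trailing comment are only a rough outline of how the upload might look with a PBO bound.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical description of a sub-rectangle inside a larger source image.
struct SubRect {
    int x;       // left edge of the region, in pixels
    int y;       // first row of the region
    int width;   // region width, in pixels
    int height;  // region height, in rows
};

// Byte offset of the first pixel GL would read when
//   GL_UNPACK_ROW_LENGTH  = imageWidth
//   GL_UNPACK_SKIP_PIXELS = r.x
//   GL_UNPACK_SKIP_ROWS   = r.y
// Assumption: rows are tightly packed (no alignment padding).
std::size_t unpackByteOffset(int imageWidth, const SubRect& r, int bytesPerPixel) {
    assert(imageWidth >= r.x + r.width);  // the region must fit in a source row
    return (static_cast<std::size_t>(r.y) * imageWidth + r.x) * bytesPerPixel;
}

// With the PBO bound to GL_PIXEL_UNPACK_BUFFER, the upload would look roughly like:
//   glPixelStorei(GL_UNPACK_ROW_LENGTH,  imageWidth);
//   glPixelStorei(GL_UNPACK_SKIP_PIXELS, r.x);
//   glPixelStorei(GL_UNPACK_SKIP_ROWS,   r.y);
//   glTexSubImage2D(GL_TEXTURE_2D, 0, r.x, r.y, r.width, r.height,
//                   GL_RGBA, GL_UNSIGNED_BYTE, nullptr);  // offset 0 into the PBO
```

For a 1024-pixel-wide source and a region starting at (16, 8), the first pixel read sits (8 × 1024 + 16) × 4 bytes into the buffer. GL_UNPACK_ROW_LENGTH then makes every following row advance by the full source width rather than the region width.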
glTexSubImage2D produces errors when reading data from the PBO if the selected region of interest (ROI) in the texture does not cover the whole texture.
That is a known issue which may not be fixed (e.g. OpenGL forum thread).
Context
I have a fragment shader that processes a 2D image. Sometimes a pixel may be considered "invalid" (RGB value 0/0/0) for a few frames, while being valid for the rest of the frames. This causes temporal noise as these pixels flicker.
I'd like to implement a sort of temporal filter where, in each rendering loop, a pixel is "shown" (RGB value not 0/0/0) if and only if it was "valid" in the last X loops, where X might be 5, 10, etc. I figured that if I could have an array of the same size as the image, I could set the element corresponding to a pixel to 0 when that pixel is invalid and increment it otherwise. If the value is >= X, the pixel can be displayed.
Image latency caused by the temporal filter is not an issue, but I want to minimize performance costs.
The question
So that's the context. I'm looking for a mechanism that lets me read and write data (uniforms are therefore out) across different rendering loops of the same fragment shader. Reading the data back from my OpenGL application is a plus but not necessary.
I came across Shader Storage Buffer Object, would it fit my needs?
Are there other concerns I should be aware of? Performance? Coherency/memory barriers?
Yes, SSBOs are a suitable tool to have persistent memory between shader loops.
As I couldn't find a reason why it wouldn't work, I implemented it, and I was indeed able to use an SSBO as an array with each element mapped to a pixel in order to do temporal filtering on each pixel.
I had to do a few things to avoid artifacts in the image:
Use GL_DYNAMIC_COPY as the usage hint when allocating the data with glBufferData.
Declare my SSBO as volatile in the shader.
Use a barrier (memoryBarrierBuffer();) in my shader to separate the writing and reading of the SSBO.
As mentioned by @user253751 in a comment, I had to convert texture coordinates to array indices.
I checked the performance costs of using the SSBO and they were negligible in my case: <0.1 ms for a 848x480 frame.
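For reference, the per-pixel logic can be modelled on the CPU as below. This is only a sketch of the counter scheme and the coordinate-to-index conversion, not the actual shader. The TemporalFilter type and its member names are hypothetical, and the vector stands in for the SSBO array.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// CPU-side model of the per-pixel counter array the shader keeps in the SSBO.
struct TemporalFilter {
    int width;
    int height;
    std::uint32_t threshold;              // X: frames a pixel must stay valid
    std::vector<std::uint32_t> counters;  // one element per pixel

    TemporalFilter(int w, int h, std::uint32_t x)
        : width(w), height(h), threshold(x),
          counters(static_cast<std::size_t>(w) * h, 0u) {}

    // Convert normalized texture coordinates in [0, 1) to a flat array index.
    std::size_t indexOf(float u, float v) const {
        assert(u >= 0.0f && u < 1.0f && v >= 0.0f && v < 1.0f);
        int px = static_cast<int>(u * width);
        int py = static_cast<int>(v * height);
        return static_cast<std::size_t>(py) * width + px;
    }

    // Update the counter for one pixel and report whether it may be shown.
    bool update(float u, float v, bool validThisFrame) {
        std::uint32_t& c = counters[indexOf(u, v)];
        c = validThisFrame ? c + 1u : 0u;  // reset as soon as the pixel is invalid
        return c >= threshold;
    }
};
```

With a threshold of 3, a pixel only becomes visible on its third consecutive valid frame, and a single invalid frame hides it again until the counter climbs back up.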
I'm writing a program that uses the GPU to calculate stuff, and I want to read data from the framebuffers to be used in my client code. The framebuffers I'm using are about 40 textures, all 1024x1024 in size, all of which contain data that needs to be read, but only very sparsely: 50 or so pixels at arbitrary x/y coordinates from each texture. Using glReadPixels for each texture, for each frame, is proving too costly for me though...
I only need to read a few select pixels from each texture, is there a way to quickly gather their data without needing to download every entire texture from the GPU?
This sounds fairly expensive no matter how you slice it. A couple of approaches come to mind:
What I would try first is glReadPixels(), but using a PBO. Bind a buffer large enough to hold all the pixels to the GL_PIXEL_PACK_BUFFER target, and then submit the glReadPixels() calls, with offsets to place the results in distinct sections of the buffer. Then call glMapBufferRange() to read back the values.
An alternate approach is that you copy all the pixels you want to read into a single texture. You could use glBlitFramebuffer() or glCopyTexSubImage2D(). Then use a single glReadPixels() or glGetTexImage() call to get all the data from this texture.
Both of these approaches should result in about the same amount of work and synchronization overhead. But one or the other could be more efficient, depending on which paths in the driver are better optimized.
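As a rough illustration of the first approach, the bookkeeping for "distinct sections of the buffer" could look like this. It is a sketch that assumes RGBA8 pixels (4 bytes each) and one 1x1 read per pixel. PixelRequest, PlannedRead and planReads are hypothetical names, and the GL calls appear only in the trailing comment.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// One pixel to fetch: which texture (framebuffer attachment) and where.
struct PixelRequest {
    int texture;
    int x;
    int y;
};

// Where the result of one request lands inside the pack buffer.
struct PlannedRead {
    PixelRequest request;
    std::size_t byteOffset;
};

// Give every request its own slot in the buffer, in submission order.
std::vector<PlannedRead> planReads(const std::vector<PixelRequest>& requests,
                                   std::size_t bytesPerPixel = 4) {
    std::vector<PlannedRead> plan;
    plan.reserve(requests.size());
    for (std::size_t i = 0; i < requests.size(); ++i) {
        plan.push_back({requests[i], i * bytesPerPixel});
    }
    return plan;
}

// With a buffer of plan.size() * bytesPerPixel bytes bound to GL_PIXEL_PACK_BUFFER,
// each entry p would become roughly:
//   glReadPixels(p.request.x, p.request.y, 1, 1, GL_RGBA, GL_UNSIGNED_BYTE,
//                reinterpret_cast<void*>(p.byteOffset));
// followed by a single glMapBufferRange() over the whole buffer.
```

For 40 textures with about 50 pixels each, the whole buffer would only be around 8 KB, so the cost is dominated by the number of calls and the synchronization rather than by the amount of data.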
As the earlier answer already suggested, I would make very sure that you really need this, and there isn't any way to keep and process the data on the GPU. Any time you read back data, you introduce synchronization between GPU and CPU, which is mostly harmful to performance.
Do you have any restrictions on what OpenGL version you can use? If not, it sounds like you should look into compute shaders. You say that you are calculating data, so I assume that you are "abusing" the rendering pipeline for your application, especially the fragment shader, and storing fragment data in the framebuffer that is interpreted as something other than color.
If this is the case, then all you need is a shader storage buffer and an atomic counter. At some point right now you are deciding that fragment (x, y, z [z being the texture index]) should have value v. So in your compute shader, you do your calculation as you would in the fragment shader, but as output, you store a tuple (x, y, z, v). You store this tuple in the shader storage buffer at the index of the atomic counter which you increment after each written element. In the end, you have your data stored compactly in the buffer and only need to read back these elements. The exact number is the value the atomic counter holds after termination. Download the buffer with glGetBufferSubData into an array of location-value pairs, iterate over it and do your CPU magic.
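The append pattern can be modelled on the CPU like this. It is a sketch only: std::atomic stands in for the GLSL atomic counter, the vector stands in for the shader storage buffer, and Entry and AppendBuffer are hypothetical names.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// One compacted result: location (x, y), texture index z, and value v.
struct Entry {
    std::uint32_t x, y, z;
    float v;
};

// CPU model of a shader storage buffer paired with an atomic counter.
class AppendBuffer {
public:
    explicit AppendBuffer(std::size_t capacity) : entries_(capacity), count_(0) {}

    // Mirrors what each invocation would do: reserve a slot, then write to it.
    bool append(const Entry& e) {
        std::uint32_t slot = count_.fetch_add(1);   // returns the pre-increment value
        if (slot >= entries_.size()) return false;  // buffer full, entry is dropped
        entries_[slot] = e;
        return true;
    }

    // Number of valid entries, i.e. how much would need to be read back.
    std::size_t size() const {
        std::size_t n = count_.load();
        return n < entries_.size() ? n : entries_.size();
    }

    const Entry& operator[](std::size_t i) const { return entries_[i]; }

private:
    std::vector<Entry> entries_;
    std::atomic<std::uint32_t> count_;
};
```

Because every writer reserves its slot atomically before storing, the order of entries is arbitrary but no two writers share a slot. The buffer needs a fixed capacity up front, so a worst-case estimate of the number of results is required.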
If you need to copy the data from the GPU to the CPU memory, there is no way (AFAIK) around using glReadPixels.
Depending on what platform you're using, and the specifics of your program, you can try several optimizations, using FBOs:
Copy only part of the texture, assuming you know the locations of the pixels. Note that in most cases it is still faster to copy the entire texture than to issue several small reads.
If you don't need 32-bit textures, you can render to a lower color resolution. The specifics depend on your platform's extensions.
Maybe you don't really need to copy the pixels, since you plan to use them as a texture input to the next stage? In that case you can copy the pixels directly on the GPU using glCopyTexImage2D.
I have a compute shader that is dispatched iteratively and uses a 2d texture to temporarily store values. Each invocation id accesses a particular row in the texture.
The problem is, this texture must be initialized to 0's before each shader dispatch.
Currently I use a loop at the end of the shader code that uses imageStore() to reset all pixels in the respective row back to 0.
for (uint i = 0; i < CONSTANT_SIZE; i++)
{
    imageStore( myTexture, ivec2( i, global_invocation_id ), vec4( 0, 0, 0, 0) );
}
I was wondering if there is a faster way of doing this, a way to set more than one pixel with a single call (preferably an entire row)? I've looked at the GLSL 4.3 specification on image operations but I can't find one that doesn't require a specific pixel location.
If there is a faster way to achieve this on the CPU I would be open to that as well. I've tried re-uploading the texture using glTexImage2D(), but there was no noticeable performance difference compared to using imageStore for each individual pixel.
The "faster way" would be to clear the texture from OpenGL, rather than in your shader. OpenGL 4.4 provides a direct texture clearing function (glClearTexImage), but even something as simple as a pixel transfer via glTexSubImage2D (after a barrier, of course) would probably be faster than what you're doing.
Alternatively, if all you're using this texture for is scratch memory for invocations... why are you using a texture? It'd be better to use shared variables for that. Just create an array of arrays of vec4s, where each local invocation accesses one array of the arrays. Access to those are going to be loads faster.
Given 32KB of storage for shared variables (the bare minimum allowed), if you have 8 invocations per work group, that gives each one 4KB to work with. That gives each one 256 vec4s to play with. If you move up to 16 invocations, you reduce this to 128 vec4s.
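The budget arithmetic is simple enough to write down. This is a small sketch assuming 4-byte floats (so a vec4 is 16 bytes) and an even split of the shared storage between invocations; vec4sPerInvocation is a made-up helper name. The actual limit on a given implementation can be queried via GL_MAX_COMPUTE_SHARED_MEMORY_SIZE.

```cpp
#include <cassert>
#include <cstddef>

// Size of one vec4, assuming 4-byte floats.
constexpr std::size_t kVec4Bytes = 4 * sizeof(float);

// Number of vec4s each invocation gets if shared storage is split evenly.
constexpr std::size_t vec4sPerInvocation(std::size_t sharedBytes,
                                         std::size_t invocations) {
    return sharedBytes / invocations / kVec4Bytes;
}
```

With 32 KB, vec4sPerInvocation(32 * 1024, 8) gives 256 and vec4sPerInvocation(32 * 1024, 16) gives 128, matching the numbers above. If CONSTANT_SIZE exceeds that per-invocation figure, the work group size has to shrink or the scratch data has to stay in a texture or buffer.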
I'm working on a program which renders a dynamic high resolution voxel landscape.
Currently I am storing the voxel data in 32x32x32 blocks with 4 bits each:
struct MapData {
    char data[32][32][16]; // 32x32x32 voxels, two 4-bit voxels per byte
};
MapData *world = new MapData[(width >> 5) * (height >> 5) * (depth >> 5)];
What I'm trying to do with this, is send it to my vertex and fragment shaders for processing and rendering. There are several different methods I've seen to do this, but I have no idea which one will be best for this.
I started with a sampler1D format, but that results in floating point output between 0 and 1. I also had the sneaking suspicion that it was storing it as 16 bits per voxel.
As for Uniform Buffer Objects I tried and failed to implement this.
My biggest concern with all of this is not having to send the whole map to the GPU every frame. I want to be able to load maps up to ~256MB (1024x2048x256 voxels) in size, so I need to be able to send it all once, and then resend only the blocks that were changed.
What is the best solution for this, short of writing OpenCL to handle the video memory for me? If there's a better way to store my voxels that makes this easier, I'm open to other formats.
If you just want a large block of memory to access from a shader, you can use a buffer texture. This obviously requires a semi-recent GL version (buffer textures are core as of 3.1), so you need DX10 hardware or better.
The concept is pretty straightforward. You make a buffer object that stores your data. You create a buffer texture using the typical glGenTextures command, then glBindTexture it to the GL_TEXTURE_BUFFER target. Then you use glTexBuffer to associate your buffer object with the texture.
Now, you seem to want to use 4 bits per voxel. So your image format needs to be a single-channel, unsigned 8-bit integral format. Your glTexBuffer call should be something like this:
glTexBuffer(GL_TEXTURE_BUFFER, GL_R8UI, buffer);
where buffer is the buffer object that stores your voxel data.
Once this is done, you can change the contents of this buffer object using the usual mechanisms.
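For example, to resend only the blocks that changed, you need the byte range of each block inside the buffer. Here is a minimal sketch, assuming the world array is laid out with x as the fastest-varying block coordinate (the question does not say, so that ordering is an assumption). ByteRange and blockRange are hypothetical names, and the GL calls are shown only as a comment.

```cpp
#include <cassert>
#include <cstddef>

// sizeof(MapData): 32 * 32 * 32 voxels at 4 bits each.
constexpr std::size_t kBlockBytes = 32 * 32 * 16;

struct ByteRange {
    std::size_t offset;
    std::size_t size;
};

// Byte range of one 32x32x32 block inside the buffer object.
// blocksX and blocksY are the world width and height in blocks
// (width >> 5 and height >> 5 in the question's terms).
ByteRange blockRange(std::size_t bx, std::size_t by, std::size_t bz,
                     std::size_t blocksX, std::size_t blocksY) {
    assert(bx < blocksX && by < blocksY);
    std::size_t index = (bz * blocksY + by) * blocksX + bx;  // assumed block order
    return {index * kBlockBytes, kBlockBytes};
}

// A changed block would then be re-sent with something like:
//   glBindBuffer(GL_TEXTURE_BUFFER, buffer);
//   glBufferSubData(GL_TEXTURE_BUFFER, r.offset, r.size, &world[index]);
```

Each update then moves 16 KB per dirty block instead of the whole map.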
You bind the buffer texture for rendering just like any other texture.
You use a usamplerBuffer sampler type in your shader, because it's an unsigned integral buffer texture. You must use the texelFetch command to access data from it, which takes integer texture coordinates and ignores filtering. Which is of course exactly what you want.
Note that buffer textures do have size limits. However, the size limits are often some large percentage of video memory.
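Since each 8-bit texel holds two 4-bit voxels, the shader has to pick the right nibble after the fetch. Here is a CPU-side sketch of that lookup, assuming the even voxel sits in the low nibble (the question does not specify the packing, so this is an assumption). voxelAt and setVoxel are hypothetical helpers.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Read one 4-bit voxel. In the shader, the byte would come from something like
// texelFetch(voxels, int(voxelIndex >> 1)).r on a usamplerBuffer.
std::uint32_t voxelAt(const std::vector<std::uint8_t>& texels, std::size_t voxelIndex) {
    std::uint8_t byte = texels[voxelIndex >> 1];        // two voxels per texel
    return (voxelIndex & 1) ? static_cast<std::uint32_t>(byte >> 4)     // odd: high nibble
                            : static_cast<std::uint32_t>(byte & 0x0F);  // even: low nibble
}

// Write one 4-bit voxel on the CPU side without disturbing its neighbour.
void setVoxel(std::vector<std::uint8_t>& texels, std::size_t voxelIndex, std::uint8_t value) {
    std::uint8_t& byte = texels[voxelIndex >> 1];
    if (voxelIndex & 1) {
        byte = static_cast<std::uint8_t>((byte & 0x0F) | ((value & 0x0F) << 4));
    } else {
        byte = static_cast<std::uint8_t>((byte & 0xF0) | (value & 0x0F));
    }
}
```

The same shift-and-mask works unchanged in GLSL on the uint returned by texelFetch.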
I have a question about buffer object performance. I rendered a mesh using standard vertex arrays (not interleaved) and wanted to switch to buffer objects to get a performance boost. To my shock, introducing buffer objects made performance four times worse. I thought buffers were supposed to increase performance. Is that true? If so, I must be doing something wrong...
I render a 3D tiled map and, to reduce the amount of memory needed, I use only a single tile (one set of vertices) to render the whole map. For each tile of the map I change only the texture coordinates and the y value of the vertex positions. The buffers for positions and texture coordinates are created with GL_DYNAMIC_DRAW. The index buffer is created with GL_STATIC_DRAW because it doesn't change during map rendering. So, for each tile of the map, the buffers are mapped and unmapped at least once. Should I use only one buffer for texture coordinates and positions?
Thanks,
Try moving your vertex/texture coordinates with the GL_MODELVIEW/GL_TEXTURE matrices and leave the buffer data alone (so GL_STATIC_DRAW is enough). E.g. if a tile is of size 1x1, create the rect (0, 0)-(1, 1) and set its position in the world with glTranslate. Do the same with the texture coordinates.
VBOs are not there to increase performance of drawing few quads. Their true power is seen when drawing meshes with thousands of polygons using shaders. If you don't need any forward compatibility with newer opengl versions, I see little use in using them to draw dynamically changing data.
If you need to update the buffer(s) each frame you should use GL_STREAM_DRAW (which hints that the buffer contents will likely be used only once) rather than GL_DYNAMIC_DRAW (which hints that they will be used a couple of times before being updated).
As far as my experience goes, buffers created with GL_STREAM_DRAW will be treated similarly to plain ol' arrays, so you should expect about the same performance as for arrays when using it.
Also make sure that you call glMapBuffer with the access parameter set to GL_WRITE_ONLY, assuming you don't need to read the contents of the buffer. Otherwise, if the buffer is in video memory, it will have to be transferred from video memory to main memory and then back again (well, that's up to the driver really...) for each map call. Transferring too much data over the bus is a very real bottleneck that's quite easy to bump into.