I have a texture that is generated every frame on the CPU and I was wondering about the best way to render it in OpenGL. It is simply pixel data in RGBA8 format (32-bit, 8 bits per component); I need to transfer it to the GPU and draw it to the screen. I remember there being some sort of pixel buffer or framebuffer that does this without having to generate a new texture every frame in association with glTexImage2D?
Pixel Buffer Objects do not change the fact that you need to call glTexImage2D (...) to (re-)allocate texture storage and copy your image. PBOs provide a means of asynchronous pixel transfer - basically making it so that a call to glTexImage2D (...) does not have to block until it finishes copying your memory from the client (CPU) to the server (GPU).
The only way this is really going to improve performance for you is if you map the memory in a PBO (Pixel Unpack Buffer) and write to that mapped memory every frame while you are computing the image on the CPU.
While that buffer is bound to GL_PIXEL_UNPACK_BUFFER, call glTexImage2D (...) with NULL for the data parameter and this will upload your texture using memory that is already owned by the server, so it avoids an immediate client->server copy. You might get a marginal improvement in performance by doing this, but do not expect anything huge. It depends on how much work you do between the time you map/unmap the buffer's memory and when you upload the buffer to your texture and use said texture.
Moreover, if you call glTexSubImage2D (...) every frame instead of allocating new texture image storage with glTexImage2D (...) (do not worry -- the old storage is reclaimed once no pending command uses it anymore), you may introduce a new source of synchronization overhead that could reduce your performance. What you are looking for here is known as buffer object streaming.
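To make the streaming idea concrete, here is a minimal sketch of a PBO upload path. It assumes a GL_RGBA8 texture `tex` whose storage was already allocated once with glTexImage2D, plus placeholder names `w`, `h` and `GeneratePixels` standing in for your own code:

```cpp
GLuint pbo;
glGenBuffers(1, &pbo);
glBindBuffer(GL_PIXEL_UNPACK_BUFFER, pbo);
glBufferData(GL_PIXEL_UNPACK_BUFFER, w * h * 4, NULL, GL_STREAM_DRAW);

// Every frame:
glBindBuffer(GL_PIXEL_UNPACK_BUFFER, pbo);
// Orphan the old storage so the driver need not stall if the previous
// transfer is still in flight.
glBufferData(GL_PIXEL_UNPACK_BUFFER, w * h * 4, NULL, GL_STREAM_DRAW);
void* dst = glMapBuffer(GL_PIXEL_UNPACK_BUFFER, GL_WRITE_ONLY);
GeneratePixels(dst);               // CPU writes the new frame here
glUnmapBuffer(GL_PIXEL_UNPACK_BUFFER);

// With a PBO bound, the data pointer is an *offset* into the buffer.
glBindTexture(GL_TEXTURE_2D, tex);
glTexSubImage2D(GL_TEXTURE_2D, 0, 0, 0, w, h,
                GL_RGBA, GL_UNSIGNED_BYTE, (void*)0);
glBindBuffer(GL_PIXEL_UNPACK_BUFFER, 0);
```

In practice you would use two or more PBOs and alternate between them so the CPU writes one while the GPU consumes the other.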
You are more likely to improve performance by using a pixel format that requires no conversion. Newer versions of GL (4.2+) let you query the optimal pixel transfer format using glGetInternalformativ (...).
On a final, mostly pedantic note, glTexImage2D (...) does not generate textures. It allocates storage for their images and optionally transfers pixel data. Texture Objects (and OpenGL objects in general) are actually generated the first time they are bound (e.g. glBindTexture (...)). From that point on, glTexImage2D (...) merely manages the memory belonging to said texture object.
I render into a texture via FBO. I want to copy the texture data into a PBO so I use glGetTexImage. I will use glMapBuffer on this PBO but only in the next frame (or later) so it should not cause a stall.
However, can I use the texture immediately after the glGetTexImage call without causing a stall? Can I bind it to a texture unit and render from it? Can I render to it again via FBO?
However, can I use the texture immediately after the glGetTexImage call without causing a stall?
That's implementation dependent behavior. It may or may not cause a stall, depending on how the implementation does actual data transfers.
Can I bind it to a texture unit and render from it?
Yes.
Can I render to it again via FBO?
Yes. This, however, might or might not cause a stall, depending on how the implementation internally deals with data consistency requirements. That is, before modifying the data, the texture data either must be completely transferred into the PBO, or, if the implementation can detect that the entire thing will be overwritten (e.g. by a glClear call covering the texture's attachment), it might simply orphan the internal data structure and start with a fresh memory region, avoiding that stall.
This is one of those corner cases that are nigh impossible to predict. You'll have to profile the performance and see for yourself. The surefire way to avoid stalls is to use a fresh texture object.
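The delayed-map pattern from the question can be sketched as a two-frame pipeline. The names `pbo[2]`, `tex`, `w`, `h` and `ProcessPixels` are assumptions; the pair of GL_PIXEL_PACK buffers would be allocated up front with glBufferData(..., w*h*4, NULL, GL_STREAM_READ):

```cpp
int write_idx = frame % 2;
int read_idx  = (frame + 1) % 2;

// Kick off the transfer into this frame's PBO; with a pack buffer bound,
// glGetTexImage returns without waiting for the copy to finish.
glBindBuffer(GL_PIXEL_PACK_BUFFER, pbo[write_idx]);
glBindTexture(GL_TEXTURE_2D, tex);
glGetTexImage(GL_TEXTURE_2D, 0, GL_RGBA, GL_UNSIGNED_BYTE, (void*)0);

// Map the PBO filled during the *previous* frame; by now that copy has
// usually completed, so the map should not stall.
glBindBuffer(GL_PIXEL_PACK_BUFFER, pbo[read_idx]);
const void* pixels = glMapBuffer(GL_PIXEL_PACK_BUFFER, GL_READ_ONLY);
if (pixels) {
    ProcessPixels(pixels);         // placeholder for your CPU-side work
    glUnmapBuffer(GL_PIXEL_PACK_BUFFER);
}
glBindBuffer(GL_PIXEL_PACK_BUFFER, 0);
```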
I am creating a simple framework for teaching fundamental graphics concepts under C++/D3D11. The framework is required to enable direct manipulation of the screen raster contents via a simple interface function (e.g. Putpixel( x,y,r,g,b )).
Under D3D9 this was a relatively simple goal achieved by allocating a surface buffer on the heap where the CPU would compose a surface. Then the backbuffer would be locked and the heap buffer's contents transferred to the backbuffer. As I understand it, it is not possible to access the backbuffer directly from the CPU under D3D11. One must prepare a texture resource and then draw it to the backbuffer via some fullscreen geometry.
I have considered two systems for such a procedure. The first comprises a D3D11_USAGE_DEFAULT texture and a D3D11_USAGE_STAGING texture. The staging texture is first mapped and then drawn to from the CPU. When the scene is complete, the staging texture is unmapped and copied to the default texture with CopyResource (which uses the GPU to perform the copy if I am not mistaken), and then the default texture is drawn to the backbuffer via a fullscreen textured quad.
The second system comprises a D3D11_USAGE_DYNAMIC texture and a frame buffer allocated on the heap. When the scene is composed, the dynamic texture is mapped, the contents of the heap buffer are copied over to the dynamic texture via the CPU, the dynamic texture is unmapped, and then it is drawn to the backbuffer via a fullscreen textured quad.
I was under the impression that textures created with read and write access and D3D11_USAGE_STAGING would reside in system memory, but the performance tests I have run seem to indicate that this is not the case. Namely, drawing a simple 200x200 filled rectangle via CPU is about 3x slower with the staging texture than with the heap buffer (exact same disassembly for both cases (a tight rep stos loop)), strongly hinting that the staging texture resides in the graphics adapter memory.
I would prefer to use the staging texture system, since it would allow both the work of rendering to the backbuffer and the work of copying from system memory to graphics memory to be offloaded onto the GPU. However, I would like to prioritize CPU access speed over such an ability in any case.
So what method would be optimal for this use case? Any hints, modifications to my two approaches, or suggestions of altogether different approaches would be greatly appreciated.
The dynamic and staging textures are both likely to be in system memory, but there is a good chance that your issue is write-combined memory. It is a cache mode where individual writes are coalesced together, but because the memory is uncached, every read pays the price of a full memory access. You even have to be very careful, because a C++ `*data = something;` may sometimes also lead to unwanted reads.
There is nothing wrong with a dynamic texture; the GPU can read system memory. But you need to be careful: create a few of them and cycle through them each frame, mapping with NO_OVERWRITE to avoid the costly driver-side buffer renaming that a DISCARD map triggers. And never map for read and write, only write, or you will introduce GPU/CPU synchronization and kill the parallelism.
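A sketch of the texture-ring idea follows. Note one caveat: on plain D3D11.0, Map with D3D11_MAP_WRITE_NO_OVERWRITE is only guaranteed for buffers (dynamic textures gained it in D3D11.1), so this version cycles a small ring of dynamic textures mapped with DISCARD, which achieves the same goal of never touching a texture the GPU may still be reading. `device`, `context`, `w`, `h` and `ComposeFrame` are assumed to exist in your framework:

```cpp
const int kRing = 3;
ID3D11Texture2D* ring[kRing];

D3D11_TEXTURE2D_DESC desc = {};
desc.Width = w;  desc.Height = h;
desc.MipLevels = 1;  desc.ArraySize = 1;
desc.Format = DXGI_FORMAT_B8G8R8A8_UNORM;
desc.SampleDesc.Count = 1;
desc.Usage = D3D11_USAGE_DYNAMIC;
desc.BindFlags = D3D11_BIND_SHADER_RESOURCE;
desc.CPUAccessFlags = D3D11_CPU_ACCESS_WRITE;   // write-only: never read back
for (int i = 0; i < kRing; ++i)
    device->CreateTexture2D(&desc, nullptr, &ring[i]);

// Every frame: map, write the whole surface, unmap, then draw the quad.
ID3D11Texture2D* tex = ring[frame % kRing];
D3D11_MAPPED_SUBRESOURCE mapped;
context->Map(tex, 0, D3D11_MAP_WRITE_DISCARD, 0, &mapped);
ComposeFrame(static_cast<uint8_t*>(mapped.pData), mapped.RowPitch);
context->Unmap(tex, 0);
```

Writing via `mapped.RowPitch` rather than assuming a tightly packed surface matters here, since the driver may pad rows.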
Last, if you want a persistent surface and only a few putpixel calls per frame (or even a lot of them), I would go with an unordered access view and a compute shader that consumes a buffer of pixel positions and colors to update. That buffer would again be a dynamic buffer with NO_OVERWRITE mapping. With that solution, the main surface will reside in video memory.
On a personal note, I would not even bother teaching CPU surface manipulation; it is almost always bad practice and a performance killer, and not the way to go on a modern GPU architecture. It was already no longer a fundamental graphics concept a decade ago.
I have a working prototype that tests bindless textures. I have a camera that pans over 6 GB of texture data, while I only have 2 GB of VRAM. I have an inner frustum that is used to get the list of objects in the viewport for rendering, and an outer frustum that is used to queue in (make resident) the textures that will soon be rendered; all other textures, if they are resident, are made non-resident using glMakeTextureHandleNonResidentARB.
The program runs, but the GPU's VRAM behaves as if it has a GC step that clears VRAM at seemingly random intervals. When it does this, my rendering freezes completely, then skips to the proper frame and eventually gets back up to 60 FPS. I suspect that glMakeTextureHandleNonResidentARB doesn't actually pull the texture out of VRAM when it is called. Does anyone know exactly what the GPU is doing with that call?
GPU: Nvidia 750GT M
Bindless textures essentially expose a translation table on the hardware so that you can reference textures using an arbitrary integer (handle) in a shader rather than GL's traditional bind-to-image-unit mechanics; they don't allow you to directly control GPU memory residency.
Sparse textures actually sound more like what you want. Note that both of these things can be used together.
Making a handle non-resident does not necessarily evict the texture memory from VRAM; it just removes the handle from said translation table. Eviction of texture memory can be deferred until some future time, exactly as you have discovered.
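For reference, the residency calls in question look like this; `tex` is assumed to be a fully allocated texture object:

```cpp
// From GL_ARB_bindless_texture:
GLuint64 handle = glGetTextureHandleARB(tex);

// Makes the handle usable in shaders. This does NOT let you pin or page
// texture data yourself -- the driver decides what lives in VRAM and when.
glMakeTextureHandleResidentARB(handle);

// ... render frames that sample the texture via the 64-bit handle ...

// Removes the handle from the translation table. Eviction of the backing
// texture memory may happen later, at a time the driver chooses.
glMakeTextureHandleNonResidentARB(handle);
```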
You can read more about this in the extension specification for GL_ARB_bindless_texture.
void glMakeImageHandleResidentARB (GLuint64 handle, GLenum access):
"When an image handle is resident, the texture it references is not necessarily considered resident for the purposes of the AreTexturesResident command."
Issues:
(18) Texture and image handles may be made resident or non-resident. How
does handle residency interact with texture residency queries from
OpenGL 1.1 (glAreTexturesResident or GL_TEXTURE_RESIDENT)?
RESOLVED:
The residency state for texture and image handles in this
extension is completely independent from OpenGL 1.1's GL_TEXTURE_RESIDENT
query. Residency for texture handles is a function of whether the
glMakeTextureHandleResidentARB has been called for the handle. OpenGL 1.1
residency is typically a function of whether the texture data are
resident in GPU-accessible memory.
When a texture handle is not made resident, the texture that it refers
to may or may not be stored in GPU-accessible memory. The
GL_TEXTURE_RESIDENT query may return GL_TRUE in this case. However, it does
not guarantee that the texture handle may be used safely.
When a texture handle is made resident, the texture that it refers to is
also considered resident for the purposes of the old GL_TEXTURE_RESIDENT
query. When an image handle is resident, the texture that it refers to
may or may not be considered resident for the query -- the resident
image handle may refer only to a single layer of a single mipmap level
of the full texture.
Is there any way to attach a texture buffer object (ARB_texture_buffer_object) to a framebuffer (EXT_framebuffer_object), so that I can directly render into the texture buffer object?
I need this to make an exact, bit-wise copy of a multisample framebuffer (color buffer, depth buffer and stencil buffer), and have this copy reside in main memory rather than VRAM.
UPDATE:
The problem is that I cannot call glReadPixels directly on a multisampled framebuffer to copy its contents. Instead, I have to blit the multisampled framebuffer to an intermediate framebuffer and then call glReadPixels on that. During this process, multiple samples are averaged and written to the intermediate buffer, so there is a loss of precision if I later restore this buffer with glDrawPixels.
I realize that I can use a multisample texture as the backing storage for a framebuffer object, but this texture will reside in VRAM and there appears to be no way to copy it to main memory without the same loss of precision. Specifically, I am worried about precision loss in the multisampled depth buffer attachment, rather than the color buffer.
Is there another way to make an exact copy (and later restore that copy) of a multisampled framebuffer in OpenGL?
TL;DR: How do I copy the exact contents of a multisample framebuffer (specifically, the depth buffer) to main memory and restore those contents later, without a loss of precision?
OpenGL does not allow you to bind a buffer texture as a render target. However, I don't see what is stopping you from making "an exact, bit-wise copy of a multisample framebuffer". What problem are you encountering that you believe buffer textures can solve?
How do I copy the exact contents of a multisample framebuffer (specifically, the depth buffer) to main memory and restore those contents later, without a loss of precision?
No.
And you don't need to copy the contents of an image to main memory to be able to save and restore it later. If you need to preserve the contents of a multisample image, simply blit it to another multisample image. You can blit it back to restore it. Or better yet, render to a multisample texture that you don't erase until you're done with it. That way, there's no need for any copying.
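The blit-to-preserve approach above can be sketched as follows. It assumes `sceneFBO` and `saveFBO` are complete FBOs with identically sized, identically multisampled attachments; matching sample counts is what makes the blit a sample-exact copy rather than a resolve:

```cpp
// Save: blit the live framebuffer into the spare one.
glBindFramebuffer(GL_READ_FRAMEBUFFER, sceneFBO);
glBindFramebuffer(GL_DRAW_FRAMEBUFFER, saveFBO);
glBlitFramebuffer(0, 0, w, h, 0, 0, w, h,
                  GL_COLOR_BUFFER_BIT | GL_DEPTH_BUFFER_BIT |
                  GL_STENCIL_BUFFER_BIT,
                  GL_NEAREST);   // NEAREST is required for depth/stencil

// Restore later by blitting in the opposite direction.
glBindFramebuffer(GL_READ_FRAMEBUFFER, saveFBO);
glBindFramebuffer(GL_DRAW_FRAMEBUFFER, sceneFBO);
glBlitFramebuffer(0, 0, w, h, 0, 0, w, h,
                  GL_COLOR_BUFFER_BIT | GL_DEPTH_BUFFER_BIT |
                  GL_STENCIL_BUFFER_BIT,
                  GL_NEAREST);
```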
What is the difference between FBO and PBO? Which one should I use for off-screen rendering?
What is the difference between FBO and PBO?
A better question is how they are similar. The only thing similar about them is their names.
A Framebuffer Object (note the capitalization: framebuffer is one word, not two) is an object that contains multiple images which can be used as render targets.
A Pixel Buffer Object is:
A Buffer Object. FBOs are not buffer objects. Again: framebuffer is one word.
A buffer object that is used for asynchronous uploading/downloading of pixel data to/from images.
If you want to render to a texture or just a non-screen framebuffer, then you use FBOs. If you're trying to read pixel data back to your application asynchronously, or you're trying to transfer pixel data to OpenGL images asynchronously, then you use PBOs.
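For the render-to-texture case, a minimal FBO setup looks roughly like this, assuming a GL 3.x+ context and placeholder dimensions `w`/`h`:

```cpp
GLuint fbo, color;
glGenTextures(1, &color);
glBindTexture(GL_TEXTURE_2D, color);
glTexImage2D(GL_TEXTURE_2D, 0, GL_RGBA8, w, h, 0,
             GL_RGBA, GL_UNSIGNED_BYTE, NULL);
glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_MIN_FILTER, GL_LINEAR);

glGenFramebuffers(1, &fbo);
glBindFramebuffer(GL_FRAMEBUFFER, fbo);
glFramebufferTexture2D(GL_FRAMEBUFFER, GL_COLOR_ATTACHMENT0,
                       GL_TEXTURE_2D, color, 0);

if (glCheckFramebufferStatus(GL_FRAMEBUFFER) != GL_FRAMEBUFFER_COMPLETE) {
    // handle an incomplete framebuffer here
}
// Subsequent draw calls now render into `color` instead of the screen.
```

No PBO is involved anywhere in this path, which is the point of the distinction drawn above.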
They're nothing alike.
A FBO (Framebuffer object) is a target where you can render images other than the default frame buffer or screen.
A PBO (Pixel Buffer Object) allows asynchronous transfers of pixel data to and from the device. This can be helpful to improve overall performance when rendering if you have other things that can be done while waiting for the pixel transfer.
I would read VBOs, PBOs and FBOs:
Apple has posted two very nice bits of sample code demonstrating PBOs and FBOs. Even though these are Mac-specific, as sample code they're good on any platform because PBOs and FBOs are OpenGL extensions, not windowing system extensions.
I want to highlight something.
An FBO is not a block of memory. Think of it as a struct of pointers: you must attach a texture (or renderbuffer) to the FBO before you can use it. After attaching a texture you can draw into it, for off-screen rendering or for a second-pass effect. Conceptually:

struct FBO {
    AttachColor0 *ptr0;
    AttachColor1 *ptr1;
    AttachColor2 *ptr2;
    AttachDepth  *ptr3;
};
A PBO, on the other hand, is a block of memory. Try to think of it as a malloc of x bytes; you can then copy data from it to a texture/FBO, or into it, much like memcpy.
Why use a PBO? It creates an intermediate memory buffer that interfaces with host memory, so uploading a texture to or from the host does not stall OpenGL drawing.