This is for realtime graphics.
Let's say that there is a single mesh that we are rendering. We place a 1k (1024x1024) texture on it and it renders fine. Now let's say that we place a 4k texture on it but render only a 1k section of the texture by using different UVs on the same mesh.
In both cases, the visible surface has 1k worth of texture on it, but one comes from a 1k texture map and the other from a 4k texture map. Would there be a difference in performance, not counting the increased VRAM usage from the 4k map?
For all intents and purposes, no, there will be no difference.
By restricting the UVs to the top left 1024x1024 you'll be pulling in the same amount of texture data as if the texture were 1024x1024 and you read the entire thing. The number of texture samples remains the same as well.
It's impossible to rule it out completely of course without having low-level knowledge of every GPU past, present and future, but you should assume the performance will be the same.
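As a rough illustration of the point above, here is a minimal sketch that computes the UV rectangle for a pixel region and the number of texels it covers. The helper names `uv_rect` and `texels_covered` are made up for this example, and the region is assumed to start at the texture's UV origin.

```python
def uv_rect(x, y, w, h, tex_w, tex_h):
    """Return (u0, v0, u1, v1) for a pixel rectangle inside a texture."""
    return (x / tex_w, y / tex_h, (x + w) / tex_w, (y + h) / tex_h)


def texels_covered(uv, tex_w, tex_h):
    """Number of texels spanned by a UV rectangle in a texture of the given size."""
    u0, v0, u1, v1 = uv
    return round((u1 - u0) * tex_w) * round((v1 - v0) * tex_h)


# Whole 1k texture versus a 1k corner of a 4k texture.
full_1k = uv_rect(0, 0, 1024, 1024, 1024, 1024)  # (0.0, 0.0, 1.0, 1.0)
sub_4k = uv_rect(0, 0, 1024, 1024, 4096, 4096)   # (0.0, 0.0, 0.25, 0.25)

print(texels_covered(full_1k, 1024, 1024))
print(texels_covered(sub_4k, 4096, 4096))
```

Both calls report the same texel count (1024 * 1024), which is why the amount of texture data read is expected to match.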
Related
I'm making an isometric (2D) game with SFML. I handle the drawing order (depth) by sorting all drawables by their Y position and it works perfectly well.
The game uses an enormous amount of art assets, such that the npcs, monsters and player graphics alone are contained in their own 4k texture atlas. It is logistically not possible for me to put everything into one atlas. The target devices would not be able to handle textures of that size. Please do not focus on WHY it's impossible, and understand that I simply MUST use separate files for my textures in this case.
This causes a problem. Let's say I have a level with 2 npcs and 2 pillars. The npcs are in NPCs.png and the pillars are in CastleLevel.png. Depending on where the npcs move, the drawing order (and hence the OpenGL texture binding order) can be different. Let's say the Y positions are sorted like this:
npc1, pillar1, npc2, pillar 2
This would mean that opengl has to switch between the 2 textures twice. My question is, should I:
a) keep the texture atlases OR
b) divide them all into smaller png files (1 png per npc, 1 png per pillar etc). Since the textures must be changed multiple times anyway, would it improve performance if OpenGL had to bind smaller textures instead?
Is it worth keeping the texture atlases because it will SOMETIMES reduce the number of draw calls?
Since the textures must be changed multiple times anyway, would it improve performance if opengl had to bind smaller textures instead?
Almost certainly not. The cost of a texture bind is fixed; it isn't based on the texture's size.
It would be better for you to either:
Properly batch your rendering. That is, when you say "draw NPC1", you don't actually draw it yet. You stick some data in an array, and later on, you execute "draw NPCs", which draws all of the NPCs you've buffered in one go.
Use a bigger texture atlas, probably involving array textures. Each layer of the array texture would be one of the atlases you load. This way, you only ever bind one texture to render your scene.
Deal with it. 2D games aren't exactly stressful on the GPU or CPU. The overhead from the additional state changes will not be what knocks you down from 60FPS to 30FPS.
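To make the batching idea concrete, here is a minimal sketch that groups consecutive draws sharing a texture while preserving the Y-sorted order. It is one possible interpretation, not SFML's API: `batch_by_texture` and the (name, texture) tuples are hypothetical, and each resulting batch is assumed to cost one texture bind and one draw call.

```python
from itertools import groupby


def batch_by_texture(draws):
    """Group consecutive draws that share a texture, preserving draw order.

    `draws` is a list of (sprite_name, texture_name) tuples, already sorted
    by depth. Returns a list of (texture_name, [sprite_name, ...]) batches.
    """
    return [(tex, [name for name, _ in group])
            for tex, group in groupby(draws, key=lambda d: d[1])]


# The order from the question: textures alternate, so nothing can be merged.
interleaved = [("npc1", "NPCs.png"), ("pillar1", "CastleLevel.png"),
               ("npc2", "NPCs.png"), ("pillar2", "CastleLevel.png")]

# A different Y order where sprites from the same atlas happen to be adjacent.
adjacent = [("npc1", "NPCs.png"), ("npc2", "NPCs.png"),
            ("pillar1", "CastleLevel.png"), ("pillar2", "CastleLevel.png")]

print(len(batch_by_texture(interleaved)))  # one batch per sprite
print(len(batch_by_texture(adjacent)))     # one batch per atlas
```

With atlases, adjacent sprites from the same file collapse into one batch; with one png per sprite, every sprite would be its own batch regardless of order.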
Question:
Why does the same number of pixels take dramatically less video memory when stored in a square texture than in a long, thin rectangular texture?
Example:
I'm creating 360 4x16384 size textures with the glTexImage2D command. Internal format is GL_RGBA. Video memory: 1328 MB.
If I'm creating 360 256x256 textures with the same data, the memory usage is less than 100MB.
Using an integrated Intel HD4000 GPU.
It's not about the texture being rectangular. It's about one of the dimensions being extremely small.
In order to select texels from textures in an optimal fashion, hardware will employ what's known as swizzling. The general idea is that it will restructure the bytes in the texture so that pixels that neighbor each other in 2 dimensions will be neighbors in memory too. But doing this requires that the texture be of a certain minimum size in both dimensions.
Now, the texture filtering hardware can ignore this minimum size and only fetch from pixels within the texture's actual size. But that extra storage is still there, taking up space to no useful purpose.
Given what you're seeing, there's a good chance that Intel's swizzling hardware has a base minimum size of 32 or 64 pixels.
In OpenGL, there's not much you can do to detect this incongruity other than what you've done here.
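The arithmetic behind that guess can be sketched as follows. This is a simplified model, not Intel's actual layout: it assumes each dimension is rounded up to a multiple of a hypothetical `tile` size and that GL_RGBA is stored as 4 bytes per texel, with no mipmaps.

```python
def padded_texture_bytes(width, height, tile, bytes_per_texel=4):
    """Storage for one texture if each dimension is rounded up to a multiple of `tile`."""
    padded_w = -(-width // tile) * tile   # ceiling to a multiple of tile
    padded_h = -(-height // tile) * tile
    return padded_w * padded_h * bytes_per_texel


MIB = 1024 * 1024
COUNT = 360  # number of textures in the question

for tile in (1, 32, 64):
    thin = COUNT * padded_texture_bytes(4, 16384, tile) / MIB
    square = COUNT * padded_texture_bytes(256, 256, tile) / MIB
    print(f"tile={tile:2d}: 4x16384 -> {thin:.0f} MiB, 256x256 -> {square:.0f} MiB")
```

Under this model the 256x256 textures stay at 90 MiB for every tile size, while the 4x16384 textures grow from 90 MiB (no padding) to 720 MiB (tile of 32) or 1440 MiB (tile of 64). The 64-texel estimate is in the same range as the reported 1328 MB, though the real layout is driver-specific.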
I have a working prototype that tests bindless textures. I have a camera that pans over 6 gigs of texture, while I only have 2 gigs of VRAM. I have an inner frustum that is used to get the list of objects in the viewport for rendering, and an outer frustum that is used to queue in (make resident) the textures that will soon be rendered. All other textures, if they are resident, are made non-resident using the function glMakeTextureHandleNonResident.
The program runs, but the GPU's VRAM behaves as if it has a GC step that clears VRAM at random intervals. When this happens, my rendering freezes completely, then skips ahead to the proper frame, eventually getting back up to 60 FPS. It appears that glMakeTextureHandleNonResident doesn't actually pull the texture out of VRAM "when" it is called. Does anyone know EXACTLY what the GPU is doing with that call?
GPU: Nvidia 750GT M
Bindless textures essentially expose a translation table on the hardware so that you can reference textures using an arbitrary integer (handle) in a shader rather than GL's traditional bind-to-image-unit mechanics; they don't allow you to directly control GPU memory residency.
Sparse textures actually sound more like what you want. Note that both of these things can be used together.
Making a handle non-resident does not necessarily evict the texture memory from VRAM; it just removes the handle from said translation table. Eviction of texture memory can be deferred until some future time, exactly as you have discovered.
You can read more about this in the extension specification for GL_ARB_bindless_texture.
void glMakeImageHandleResidentARB (GLuint64 handle, GLenum access):
"When an image handle is resident, the texture it references is not necessarily considered resident for the purposes of the AreTexturesResident command."
Issues:
(18) Texture and image handles may be made resident or non-resident. How
does handle residency interact with texture residency queries from
OpenGL 1.1 (glAreTexturesResident or GL_TEXTURE_RESIDENT)?
RESOLVED:
The residency state for texture and image handles in this
extension is completely independent from OpenGL 1.1's GL_TEXTURE_RESIDENT
query. Residency for texture handles is a function of whether the
glMakeTextureHandleResidentARB has been called for the handle. OpenGL 1.1
residency is typically a function of whether the texture data are
resident in GPU-accessible memory.
When a texture handle is not made resident, the texture that it refers
to may or may not be stored in GPU-accessible memory. The
GL_TEXTURE_RESIDENT query may return GL_TRUE in this case. However, it does
not guarantee that the texture handle may be used safely.
When a texture handle is made resident, the texture that it refers to is
also considered resident for the purposes of the old GL_TEXTURE_RESIDENT
query. When an image handle is resident, the texture that it refers to
may or may not be considered resident for the query -- the resident
image handle may refer only to a single layer of a single mipmap level
of the full texture.
I am using GLScene to view images on planes.
I have a hard time figuring out why a 1.2 megabyte photo uses over 50mb of memory
when loaded to a texture which is applied to a plane.
I have tried setting the Texture Compression to tcHighSpeed since render quality
isn't that important to me, no difference as far as I can see.
When adding an "empty" plane with no texture set to it, it uses 1 to 2 megabytes of memory.
Is this an OpenGL thing, or is GLScene very inefficient when it comes to memory management?
I am looking into using a VBO instead of immediate mode for performance reasons. I am creating a 2D orthographic scene filled with sprites. I do not want to draw sprites that are off-screen. I do this by checking their position against the screen size and position of the camera.
In immediate mode this is simple; there is a draw method for each sprite. Using a VBO this seems non-trivial; I render an entire section of a VBO at one time. There would be no way for me (that I can think of) to opt out of rendering sprites that are off-screen.
I'll just assume that you do indeed animate the sprites on the CPU, because that's the only thing that makes sense in the light of your question (otherwise, how would you draw them in immediate mode initially, and how would you skip drawing some).
AGP/PCIe behaves much like a hard disk from a performance point of view. Bandwidth is huge, but access time is quite noticeable. In other words, doing a transfer at all is painful, but once you do it, a few kilobytes more don't really make any difference. Uploading 500 sprites costs about the same as uploading 1000 sprites.
Since you animate the sprites on the CPU, you already must do one transfer (glBufferSubData or glMapBuffer/glUnmapBuffer) every frame, there is no other way.
Be sure to use a "fresh" buffer e.g. by applying the glBufferData(null) idiom. This avoids pipeline stalls by allowing OpenGL to continue using (drawing from) the buffer while giving you a different buffer (without you knowing) at the same time. Later when it is done drawing, it just secretly flips buffers and throws the old one away. That way, you achieve good parallelism (which is key to performance and much more important than culling a few thousand vertices).
Also, graphics cards are reasonably good at culling geometry (this includes discarding entire triangles that are off-screen before fragments are generated). Hundreds? Thousands? Hundreds of thousands? No issue. Let the graphics card do it.
Unless you have a million sprites of which one half is visible at a time and the other half isn't, it is quite likely that writing the entire buffer continuously and without branches is not merely just as fast, but even faster due to cache and pipeline effects.
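Because the vertex data is rebuilt and uploaded every frame anyway, the choice between the two approaches comes down to how the CPU-side array is filled. Here is a minimal sketch of both, with hypothetical helpers `quad_vertices` and `pack_sprites`; sprites and the view are assumed to be (x, y, width, height) tuples, and the actual upload (for example via glBufferSubData) is left out.

```python
def quad_vertices(x, y, w, h):
    """Two triangles (6 vertices, interleaved x/y) for an axis-aligned sprite."""
    return [x, y,  x + w, y,  x + w, y + h,
            x, y,  x + w, y + h,  x, y + h]


def pack_sprites(sprites, view=None):
    """Build the flat vertex list to upload this frame.

    With view=None every sprite is written (the branch-free approach the
    answer recommends). With a view rectangle, sprites that lie entirely
    outside it are skipped on the CPU.
    """
    data = []
    for (x, y, w, h) in sprites:
        if view is not None:
            vx, vy, vw, vh = view
            if x + w < vx or x > vx + vw or y + h < vy or y > vy + vh:
                continue  # entirely off-screen, do not emit vertices
        data.extend(quad_vertices(x, y, w, h))
    return data


sprites = [(0, 0, 16, 16), (500, 500, 16, 16)]
everything = pack_sprites(sprites)                     # both sprites
visible = pack_sprites(sprites, view=(0, 0, 320, 240))  # only the first one
print(len(everything) // 2, len(visible) // 2)  # vertex counts
```

Either list can be drawn with a single call covering its whole length, so culling with a VBO is possible; the answer's point is that the unculled version is usually fast enough and simpler.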