I am using GLScene to view images on planes.
I have a hard time figuring out why a 1.2 megabyte photo uses over 50mb of memory
when loaded to a texture which is applied to a plane.
I have tried setting the Texture Compression to tcHighSpeed since render quality
isn't that important to me, no difference as far as I can see.
When adding an "empty" plane with no texture set to it. it uses 1 to 2 megabytes of memory.
Is this an OpenGL thing or is GLScene very innefficient when it comes to memory management?
Related
Question:
Why does the same amount of pixels take dramatically less video memory if stored in a square texture than in a long rectangular texture?
Example:
I'm creating 360 4x16384 size textures with the glTexImage2D command. Internal format is GL_RGBA. Video memory: 1328 MB.
If I'm creating 360 256x256 textures with the same data, the memory usage is less than 100MB.
Using an integrated Intel HD4000 GPU.
It's not about the texture being rectangular. It's about one of the dimensions being extremely small.
In order to select texels from textures in an optimal fashion, hardware will employ what's known as swizzling. The general idea is that it will restructure the bytes in the texture so that pixels that neighbor each other in 2 dimensions will be neighbors in memory too. But doing this requires that the texture be of a certain minimum size in both dimensions.
Now, the texture filtering hardware can ignore this minimum size and only fetch from pixels within the texture's actual size is. But that extra storage is still there, taking up space to no useful purpose.
Given what you're seeing, there's a good chance that Intel's swizzling hardware has a base minimum size of 32 or 64 pixels.
In OpenGL, there's not much you can do to detect this incongruity other than what you've done here.
I am creating a simple framework for teaching fundamental graphics concepts under C++/D3D11. The framework is required to enable direct manipulation of the screen raster contents via a simple interface function (e.g. Putpixel( x,y,r,g,b )).
Under D3D9 this was a relatively simple goal achieved by allocating a surface buffer on the heap where the CPU would compose a surface. Then the backbuffer would be locked and the heap buffer's contents transferred to the backbuffer. As I understand it, it is not possible to access the backbuffer directly from the CPU under D3D11. One must prepare a texture resource and then draw it to the backbuffer via some fullscreen geometry.
I have considered two systems for such a procedure. The first comprises a D3D11_USAGE_DEFAULT texture and a D3D11_USAGE_STAGING texture. The staging texture is first mapped and then drawn to from the CPU. When the scene is complete, the staging texture is unmapped and copied to the default texture with CopyResource (which uses the GPU to perform the copy if I am not mistaken), and then the default texture is drawn to the backbuffer via a fullscreen textured quad.
The second system comprises a D3D11_USAGE_DYNAMIC texture and a frame buffer allocated on the heap. When the scene is composed, the dynamic texture is mapped, the contents of the heap buffer are copied over to the dynamic texture via the CPU, the dynamic texture is unmapped, and then it is drawn to the backbuffer via a fullscreen textured quad.
I was under the impression that textures created with read and write access and D3D11_USAGE_STAGING would reside in system memory, but the performance tests I have run seem to indicate that this is not the case. Namely, drawing a simple 200x200 filled rectangle via CPU is about 3x slower with the staging texture than with the heap buffer (exact same disassembly for both cases (a tight rep stos loop)), strongly hinting that the staging texture resides in the graphics adapter memory.
I would prefer to use the staging texture system, since it would allow both the work of rendering to the backbuffer and the work of copying from system memory to graphics memory to be offloaded onto the GPU. However, I would like to prioritize CPU access speed over such an ability in any case.
So what method method would be optimal for this usage case? Any hints, modifications of my two approaches, or suggestions of altogether different approaches would be greatly appreciated.
The dynamic and staging are both likely to be in system memory, but their is good chance that your issue, is write combined memory. It is a cache mode where single writes are coalesced together, but if you attempt to read, because it is un-cached, each load pay the price of a full memory access. You even have to be very careful, because a c++ *data=something; may sometime also leads to unwanted reads.
There is nothing wrong with a dynamic texture, the GPU can read system memory, but you need to be careful, create a few of them, and cycle each frame with a map_nooverwrite, to inhibit the costly driver buffer renaming of the discard. Of course, never do a map in read and write, only write, or you will introduce gpu/cpu sync and kill the parallelism.
Last, if you want a persistent surface and only a few putpixel a frame (or even a lot of them), i would go with an unordered access view and a compute shader that consume a buffer of pixel position with colors to update. That buffer would be a dynamic buffer with nooverwrite mapping, once again. With that solution, the main surface will reside in video memory.
On a personal note, i would not even bother to teach cpu surface manipulation, this is almost always a bad practice and a performance killer, and not the way to go in a modern gpu architecture. This was not a fundamental graphic concept a decade ago already.
This is for realtime graphics.
Let's say that there is a single mesh that we are rendering. We place a 1k (1024x1024) texture on it and it renders fine. Now let's say that we place a 4k texture on it but render only a 1k section of the texture by using different UVs on the same mesh.
Now both times, the visible surface has 1k texture on it. But one comes from 1k texture map the other from 4k texture map. Would there be a difference in performance, not counting increased VRAM usage from 4k map.
For all intents and purposes, no, there will be no difference.
By restricting the UVs to the top left 1024x1024 you'll be pulling in the same amount of texture data as if the texture were 1024x1024 and you read the entire thing. The number of texture samples remains the same as well.
It's impossible to rule it out completely of course without having low-level knowledge of every GPU past, present and future, but you should assume the performance will be the same.
What does gl.glTexImage2D do? The docs say it "uploads texture data". But does this mean the whole image is in GPU memory? I'd like to use one large image file for texture mapping. Further: can I simply use a VBO for uv and position coordinates to draw the texture?
Right, I am using words the wrong way here. What I meant was carrying a 2D array of UV coordinates and a 2D array of model to subsample a larger PNG image (in texture memory) onto individual tile models. My confusion here lies in not knowing how fast these fetches can take. Lets say I have a 5000x5000 pixel image. I load it as a texture. Then I create my own algorithm for fetching portions of it to draw. Where do I save myself the bandwidth for drawing these tiles? If I implement an LOD algorithm to determine which tiles are close, which are far and which are out of the camera frustum how do manage each these tiles in memory? Loaded question I know but I am struggling to find the best implementation to get started. I am developing for mobile devices with OpenGL ES 2.0.
What exactly happens when you call glTexImage2D() is system dependent, and there's no way for you to know, unless you have developer tools that allow you to track GPU and memory usage.
The only thing guaranteed is that the data you pass to the call has been consumed by the time the call returns (since the API definition allows you to modify/free the data after the call), and that the data is accessible to the GPU when it's used for rendering. Between that, anything is fair game. Keep in mind that OpenGL is a very asynchronous API. When you make API calls, the corresponding work is mostly queued up for later execution by the GPU, and is generally not completed by the time the calls return. This can include calls for uploading data.
Also, not all GPUs have "GPU memory". In fact, if you look at them by quantity, very few of them do. Mobile GPUs have caches, but mostly not VRAM in the sense of traditional discrete GPUs. How VRAM and caches are managed is highly system dependent.
With all the caveats above, and picturing a GPU that has VRAM: While it's possible that they can load the data into VRAM in the glTexImage2D() call, I would be surprised if that was commonly done. It just wouldn't make much sense to me. When a texture is loaded, you have no idea how soon it will be used for rendering. Since you don't know if all textures will fit in VRAM (and they often will not), you might have to evict it from VRAM before it was ever used. Which would obviously be very wasteful. As a general strategy, I think it will be much more efficient to load the texture data into VRAM only when you have a draw call that uses it.
Things would be somewhat different if the driver could be very confident that all texture data will fit in VRAM. But with OpenGL, there's really no reasonable way to know this ahead of time. And things get even more complicated since at least on desktop computers, you can have multiple applications running at the same time, while VRAM is a shared resource.
You are correct.
glteximage2d is the function that actually moves the texture data across to the gpu.
you will need to create the texture object first using glGenTextures() and then bind it using glBindTexture().
there is a good example of this process in the opengl redbook
example
you can then use this texture with a VBO. There are many ways to accomplish this, but interleaving your vertex coordinates, texture coordinates, and vertex normals and then telling the GPU how to unpack them with several calls to glVertexAttribPointer is the best bet as far as performance.
you are on the right track with VBOs, the old fixed pipeline GL stuff is depricated so you should just learn VBO from the outset.
this book is not 100% up to date, but it is complete and free and should serve as a great place to start learning VBO Open GL Book
I try to implement a Volume Renderer with OpenGL and ray casting. Everything works well but I get a performance problem when I look in a negative direction. This means if I look in positive x direction (viewing vector 1, 0, 0) is the performance ok. But if I look in negative x direction (-1, 0, 0) the framerate goes down to 2-3 fps.
I use a 3D texture to hold the data of im dataset. Is there maybe a problem with the caching of the texture on the GPU? Or what could be the problem that the framerate goes down, when I look in a negative direction?
It would be great if I get some tipps, what the problem could be.
There are two things to consider in this situation:
memory access pattern
and
texture data swapping
The performance of a GPU is strongly influenced by the pattern in which data is addressed in accessed from memory. A ray caster casts its ray from front to back of the view (or in the opposite direction, depending on the implementation and internal blending mode) so depending on from which side you look at a 3D texture you get completely different access patterns. And that can have a very large influence.
3D textures consume very large amounts of memory. OpenGL uses an abstract memory model, where there's no explicit limit on the size of objects like textures. Either loading them works or not. And if a driver can manage it, you can load textures larger than what fits into GPU memory. Effectively the GPU memory is a cache for OpenGL data. And if your 3D texture happens to be larger than the available GPU memory it might be, that the OpenGL implementation swaps out texture data while its being accessed at rendering. If your access pattern is lucky, this swapping processed nicely fits "into the gaps" and can sort of stream the texture data into the cache (by which I mean GPU memory) right before its required, without stalling the rendering process. However a different access pattern (other view) will not work well with on demand swapping of data, trashing performance.
I suggest you reduce the size of your 3D texture.