OpenGL uses power-of-two textures.
This is because some GPUs only accept power-of-two textures due to MipMapping. Using these power-of-two textures causes problems when drawing a texture larger than it is.
I had thought of one way to work around this, which is to use the PO2 ratios only when we're making the texture smaller than it actually is, and a 1:1 ratio when we're making it bigger. Will this create compatibility issues with some GPUs?
If anybody knows whether issues would occur (I cannot check this as my GPU accepts NPO2 Textures), or a better workaround, I would be grateful.
Your information is outdated. Arbitrary-dimension textures have been supported since OpenGL 2.0, which was released in 2004. All contemporary GPUs support NPOT textures very well, and without any significant performance penalty.
There's no need for any workarounds.
Since imageAtomicAdd (which seems to be the only real atomic "read-modify-store" function that operates on images) is only available for 32-bit integers, I don't see any sensible way to accumulate multiple color values from different shader invocations in one pixel.
The only somewhat reasonable way to do this that I can see is to use 32 bits per color channel (128 bits per RGBA pixel), add up the 8-bit color values, hope that the sum doesn't overflow, and clamp to 8 bits afterwards.
This seems wasteful and restrictive (only pure additive blending?).
Accumulating in other data structures doesn't solve the issue either, since shared variables and SSBOs also only seem to support atomicAdd, and again only on integers.
There are two reasons that make me think I am probably missing something:
1. Every pathtracer that allows for concurrent intersection testing (for example for shadow rays) has to solve this issue so it seems like there must be a solution.
2. All kinds of fancy blending can be done in fragment shaders, so the hardware is definitely capable of doing this.
Is everyone just writing pathtracers that have a 1:1 shader invocation:pixel mapping?
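To put a number on the "hope that it doesn't overflow" part, here is a minimal CPU-side sketch in Python. It only models the arithmetic of a 32-bit unsigned accumulator per channel; it is not shader code, and the function names are made up for illustration.

```python
# Sketch only: models one channel of a 32-bit unsigned accumulator on the CPU.
# The function names are hypothetical and not part of any OpenGL API.

def max_samples_without_overflow(accumulator_bits=32, sample_max=255):
    """Number of worst-case (all-white) 8-bit samples that fit before wrapping."""
    return (2 ** accumulator_bits - 1) // sample_max


def accumulate_and_clamp(samples, accumulator_bits=32, out_max=255):
    """Add samples with unsigned wrap-around, then clamp to the 8-bit output range."""
    mask = 2 ** accumulator_bits - 1
    total = 0
    for s in samples:
        total = (total + s) & mask  # wraps the way an unsigned integer add would
    return min(total, out_max)


print(max_samples_without_overflow())
print(accumulate_and_clamp([200, 100]))
```

Under these assumptions, overflow is unlikely to be the practical limit for pure additive accumulation; the memory cost and the restriction to additive blending remain.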
The naive interpretation of multisampling would imply that, for instance, 8x MSAA would require a framebuffer that takes 8 times the space of a non-multisampled framebuffer, for all the duplicated samples. Since the latest video cards support even 32x MSAA, that would mean that just the color buffer of a 1600x1200 output would use 1600·1200·4·32 = ~245 MB.
Is this actually the case? I mean, I realize that potential memory optimizations are likely to be implementation-dependent, but is there any information on this? Should I be extremely conscious of, for instance, allocating multisampled textures? (This is my main question.)
I'm asking in the context of OpenGL, but I don't reckon this would be different between DirectX and OpenGL.
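The naive estimate above can be reproduced with a few lines of Python. This is only the worst-case arithmetic, assuming every sample is stored in full at 4 bytes per sample; it says nothing about what a particular driver actually allocates.

```python
# Worst-case estimate: every sample stored uncompressed.
# The function name is illustrative; real implementations may compress or share samples.

def naive_msaa_color_bytes(width, height, bytes_per_sample, samples):
    """Size of a color buffer if each pixel stores `samples` full samples."""
    return width * height * bytes_per_sample * samples


for samples in (1, 8, 32):
    size = naive_msaa_color_bytes(1600, 1200, 4, samples)
    print(f"{samples:2d}x: {size} bytes ({size / 1e6:.2f} MB)")
```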
I have been hearing conflicting opinions on whether it is safe to use non-power-of-two textures in OpenGL applications. Some say all modern hardware supports NPOT textures perfectly; others say it doesn't, or that there is a big performance hit.
The reason I'm asking is because I want to render something to a frame buffer the size of the screen (which may not be a power of two) and use it as a texture. I want to understand what is going to happen to performance and portability in this case.
Arbitrary texture sizes have been specified as a core part of OpenGL ever since OpenGL 2.0, which was a long time ago (2004). All GPUs designed since then support NP2 textures just fine. The only question is how good the performance is.
However, ever since GPUs became programmable, optimizations based on the predictable access patterns of fixed-function texture fetches have become largely obsolete. GPUs now have caches optimized for general data locality, so performance is not much of an issue either. In fact, with P2 textures you may need to upscale the data to fit the power-of-two dimensions, which increases the required memory bandwidth, and memory bandwidth is the #1 bottleneck of modern GPUs. So using a slightly smaller NP2 texture may actually improve performance.
In short: you can use NP2 textures safely, and performance is not much of an issue either.
All modern APIs (except some versions of OpenGL ES, I believe) on modern graphics hardware (the last 10 or so generations from ATi/AMD/nVidia and the last couple from Intel) support NP2 textures just fine. They've been in use, particularly for post-processing, for quite some time.
However, that's not to say they're as convenient as power-of-2 textures. One major case is memory packing; drivers can often pack textures into memory far better when they are powers of two. If you look at a texture with mipmaps, the base and all mips can be packed into an area 150% the original width and 100% the original height. It's also possible that certain texture sizes will line up memory pages with stride (texture row size, in bytes), which would provide an optimal memory access situation. NP2 makes this sort of optimization harder to perform, and so memory usage and addressing may be a hair less efficient. Whether you'll notice any effect is very much driver and application-dependent.
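As a rough illustration of the packing claim, the Python sketch below lists the mip chain of a texture and checks that all mips below the base fit in a single column half as wide as the base, which gives the 150% x 100% footprint mentioned above. It assumes each level is half the previous one, rounded down and never smaller than 1; the packing layout is a simplification, not a description of what any specific driver does.

```python
# Sketch: mip chain sizes and a simple "column next to the base" packing check.
# Assumes each level halves the previous one (rounded down, minimum 1).

def mip_chain(width, height):
    """Return [(w, h), ...] from the base level down to 1x1."""
    levels = [(width, height)]
    while width > 1 or height > 1:
        width = max(1, width // 2)
        height = max(1, height // 2)
        levels.append((width, height))
    return levels


def mips_fit_beside_base(width, height):
    """True if levels 1..n stack vertically in a column of width/2 next to the base."""
    smaller = mip_chain(width, height)[1:]
    column_width = max((w for w, _ in smaller), default=0)
    column_height = sum(h for _, h in smaller)
    return column_width <= width // 2 and column_height <= height


levels = mip_chain(256, 256)
total_texels = sum(w * h for w, h in levels)
print(len(levels), total_texels, total_texels / (256 * 256))
print(mips_fit_beside_base(256, 256))
```

The total texel count comes out at roughly 4/3 of the base level, so the 150% x 100% rectangle has some slack in it.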
Offscreen effects are perhaps the most common use case for NP2 textures, especially screen-sized textures. Almost every game on the market now that performs any kind of post-processing or deferred rendering has 1-15 offscreen buffers, many of which are the same size as the screen (for some effects, half or quarter-size are useful). These are generally well-supported, even with mipmaps.
Because NP2 textures are widely supported and almost a sure bet on desktops and consoles, using them should work just fine. If you're worried about platforms or hardware where they may not be supported, easy fallbacks include using the nearest power-of-2 size (may cause slightly lower quality, but will work) or dropping the effect entirely (with obvious consequences).
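For the power-of-2 fallback, the size computation itself is small. The Python sketch below shows one way to round a dimension up or down to a power of two; the helper names are made up for illustration, and whether to round up (more memory) or down (lower quality) is an application decision.

```python
# Sketch of the size computation for a power-of-two fallback.
# Helper names are hypothetical; they are not part of OpenGL.

def is_power_of_two(n):
    return n > 0 and (n & (n - 1)) == 0


def next_power_of_two(n):
    """Smallest power of two >= n (n must be >= 1)."""
    if n < 1:
        raise ValueError("size must be at least 1")
    return 1 << (n - 1).bit_length()


def previous_power_of_two(n):
    """Largest power of two <= n (n must be >= 1)."""
    if n < 1:
        raise ValueError("size must be at least 1")
    return 1 << (n.bit_length() - 1)


screen = (1920, 1080)
print([next_power_of_two(d) for d in screen])
print([previous_power_of_two(d) for d in screen])
```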
I have a lot of experience (4+ years) making games and using texture atlases for iOS and Android through cross-platform development using OpenGL 2.0.
Stick with PoT textures with a maximum size of 2048x2048, because some devices (especially the cheap ones with cheap hardware) still don't support arbitrary texture sizes. I know this from real-life testers and from seeing it first hand. There are so many devices out there now that you never know what sort of GPU you'll be facing.
Your iOS devices will also show black squares and artefacts if you are not using PoT textures.
Just a tip.
Even if arbitrary texture sizes are required by OpenGL X, certain video cards are still not fully compliant with OpenGL. I had a friend with an Intel card who had problems with NPOT textures (I assume Intel cards are fully compliant by now).
Do you have a reason for using NPOT textures? Then do it, but remember that some old hardware may not support them, so you will probably need a software fallback that converts your textures to POT.
No reason for using NPOT textures? Then just use POT textures (certain compressed formats still require POT textures).
I'm building an OpenGL app with many small textures. I estimate that I will have a few hundred textures on the screen at any given moment.
Can anyone recommend best practices for storing all these textures in memory so as to avoid potential performance issues?
I'm also interested in understanding how OpenGL manages textures. Will OpenGL try to store them into GPU memory? If so, how much GPU memory can I count on? If not, how often does OpenGL pass the textures from application memory to the GPU, and should I be worried about latency when this happens?
I'm working with OpenGL 3.3. I intend to use only modern features, i.e. no immediate mode stuff.
If you have a large number of small textures, you would be best off combining them into a single large texture with each of the small textures occupying known sub-regions (a technique sometimes called a "texture atlas"). Switching which texture is bound can be expensive, in that it will limit how much of your drawing you can batch together. By combining into one you can minimize the number of times you have to rebind. Alternatively, if your textures are very similarly sized, you might look into using an array texture (introduction here).
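To make the "known sub-regions" idea concrete, here is a small Python sketch that converts a pixel rectangle inside an atlas into normalized texture coordinates. It assumes a simple grid of equally sized tiles; the function names and the layout are illustrative only, and real atlases often add padding between tiles to avoid filtering bleed.

```python
# Sketch: texture coordinates for sub-regions of an atlas.
# Assumes a plain grid of equal tiles with no padding; names are hypothetical.

def atlas_uv_rect(x, y, w, h, atlas_w, atlas_h):
    """Pixel rectangle -> (u0, v0, u1, v1) in the 0..1 range."""
    return (x / atlas_w, y / atlas_h, (x + w) / atlas_w, (y + h) / atlas_h)


def grid_tile_uv(index, tile_size, atlas_w, atlas_h):
    """UV rectangle of the tile at `index`, counting row by row."""
    columns = atlas_w // tile_size
    x = (index % columns) * tile_size
    y = (index // columns) * tile_size
    return atlas_uv_rect(x, y, tile_size, tile_size, atlas_w, atlas_h)


# Tile 5 of a 256x256 atlas made of 64x64 tiles (4 columns).
print(grid_tile_uv(5, 64, 256, 256))
```

With the coordinates precomputed like this, many sprites can share one bound texture and be drawn in a single batch.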
OpenGL does try to store your textures in GPU memory insofar as possible, but I do not believe that it is guaranteed to actually reside on the graphics card.
The amount of GPU memory you have available will depend on the hardware you run on and the other demands on the system at the time you run. What exactly "GPU memory" means will vary across machines: it can be discrete and used only by the GPU, shared with main memory, or some combination of the two.
Assuming your application is not constantly modifying the textures you should not need to be particularly concerned about latency issues. You will provide OpenGL with the textures once and from that point forward it will manage their location in memory. Assuming you don't need more texture data than can easily fit in GPU memory every frame, it shouldn't be cause for concern. If you do need to use a large amount of texture data, try to ensure that you batch all use of a certain texture together to minimize the number of round trips the data has to make. You can also look into the built-in texture compression facilities, supplying something like GL_COMPRESSED_RGBA to your call to glTexImage2D, see the man page for more details.
Of course, as always, your best bet will be to test these things yourself in a situation close to your expected use case. OpenGL provides a good number of guarantees, but much will vary depending on the particular implementation.
Can you explain why hardware acceleration required textures to be power-of-two for such a long time? On PCs, we have had NPOT textures without mipmaps and with simplified filtering since the GeForce 6. OpenGL ES 2.0 also supports NPOT textures without mipmaps, etc. What is the hardware restriction behind this? Just simplified arithmetic?
I imagine it has to do with being able to use bitwise shift-left operations, rather than multiplication, to convert an (x, y) coordinate to a memory offset from the start of the texture. So yes, simplified arithmetic, from a processor point of view.
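The shift idea can be sketched in a few lines of Python. This assumes a plain row-major layout with power-of-two width and bytes per texel, which is a simplification; actual GPUs typically use tiled or swizzled layouts, so treat it as an illustration of the arithmetic only.

```python
# Sketch: texel address computation with and without power-of-two sizes.
# Assumes a linear row-major layout; real hardware layouts differ.

def texel_offset_multiply(x, y, width, bytes_per_texel):
    """General case: needs two multiplications."""
    return (y * width + x) * bytes_per_texel


def texel_offset_shift(x, y, log2_width, log2_bytes_per_texel):
    """Power-of-two case: shifts and an OR replace the multiplications.

    Valid only when 0 <= x < 2**log2_width, so the bits of x and the
    shifted y never overlap.
    """
    return ((y << log2_width) | x) << log2_bytes_per_texel


# 256-wide texture, 4 bytes per texel: both forms agree.
print(texel_offset_multiply(3, 2, 256, 4))
print(texel_offset_shift(3, 2, 8, 2))
```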
I'm guessing that it was to make mipmap generation easier, because it allows you to just average 2x2 pixels into one pixel all the way from NxN down to 1x1.
Now that doesn't matter if you're not using mipmapping, but it's easier to have just one rule, and I think that mipmapping was the more common use case.
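The 2x2 averaging described above looks roughly like this in Python. The sketch assumes a square, power-of-two, single-channel image stored as a list of rows, which is exactly the case where every level halves cleanly; it is not how a GPU or driver implements mipmap generation.

```python
# Sketch: box-filter mipmap generation for a square power-of-two grayscale image.
# The image is a list of rows of numbers; function names are illustrative.

def downsample_2x2(image):
    """Average each 2x2 block of texels into one texel."""
    height, width = len(image), len(image[0])
    return [
        [
            (image[2 * y][2 * x] + image[2 * y][2 * x + 1]
             + image[2 * y + 1][2 * x] + image[2 * y + 1][2 * x + 1]) / 4
            for x in range(width // 2)
        ]
        for y in range(height // 2)
    ]


def build_mips(image):
    """Return all levels from the base image down to 1x1."""
    mips = [image]
    while len(mips[-1]) > 1:
        mips.append(downsample_2x2(mips[-1]))
    return mips


base = [
    [0, 4, 8, 12],
    [0, 4, 8, 12],
    [4, 8, 12, 16],
    [4, 8, 12, 16],
]
for level in build_mips(base):
    print(level)
```

With a non-power-of-two size, some level would have an odd dimension and the 2x2 blocks would no longer tile it exactly, which is the complication the single rule avoided.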