I am confused as to how OpenGL stores single-component textures (like GL_RED).
The GL converts it to floating point and assembles it into an RGBA element by attaching 0 for green and blue, and 1 for alpha.
Does this mean that my texture will take 32 bpp in graphics memory even though I only give it 8 bpp?
Also, I would like to know how OpenGL converts bytes to floats for operations in the shader. It doesn't seem logical to simply divide by 255.
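For reference, the OpenGL specification defines the conversion of an unsigned normalized value c with b bits as f = c / (2^b - 1). For 8-bit data that really is division by 255, which is what makes 255 map to exactly 1.0 and 0 to exactly 0.0. A minimal C sketch of that rule (the function name is made up for illustration; the GPU does this in hardware, not through code like this):

```c
#include <assert.h>
#include <stdint.h>

/* Unsigned normalized (UNORM) fixed-point to float: f = c / (2^bits - 1).
 * For bits = 8 this is c / 255, so 0 -> 0.0f and 255 -> 1.0f exactly. */
float unorm_to_float(uint32_t c, unsigned bits)
{
    assert(bits >= 1 && bits <= 16);
    uint32_t max = (1u << bits) - 1u;   /* largest representable code */
    assert(c <= max);
    return (float)c / (float)max;
}
```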
You don't know, and you have no way of knowing (OK, OK, I kind of lied: there is documentation that gives those details for some particular hardware. But in general you have no way of knowing, because you don't know in advance what hardware your program will run on).
OpenGL stores textures roughly as you request, but it ultimately chooses something the hardware supports. If that means converting your input data into something completely different, it does so silently.
For example, most implementations convert RGB to RGBA because that's more convenient for memory accesses. The same goes for 5-5-5 data being converted to 8-8-8 and similar.
Usually, an 8 bpp texture will take only 1 byte per pixel nowadays (since pretty much every card supports that, and for software implementations it does not matter), though this is not something you can 100% rely on. You shouldn't worry about it either, though: the driver will make sure it works somehow.
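To put numbers on the 8 bpp versus 32 bpp question: a sketch of the storage arithmetic for a single mip level. The bytes-per-texel value is an assumption, since the driver picks the real internal layout; a 1024x1024 GL_RED texture costs 1 MiB if stored at 1 byte per texel and 4 MiB if expanded to RGBA8. The function name is hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Storage for one mip level, given the bytes per texel the driver is assumed to use.
 * Real drivers may add row padding or tiling on top of this. */
size_t level_bytes(size_t width, size_t height, size_t bytes_per_texel)
{
    assert(bytes_per_texel > 0);
    return width * height * bytes_per_texel;
}
```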
The same can happen with non-power-of-two textures, by the way. All modern versions of OpenGL support them (beginning with 2.0, if I remember right), though at least in theory some older cards might not support this feature.
In that case, OpenGL would just silently make the texture the next bigger power-of-two size and only use a part of it (without telling you!).
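To put a number on "only use a part of it": a sketch of padding each side to the next power of two (helper names are made up for illustration, and no particular driver is claimed to do exactly this). A 1920x1080 image padded to 2048x2048 uses just under half of the allocation.

```c
#include <assert.h>
#include <stdint.h>

/* Smallest power of two >= v, for 1 <= v <= 2^31 (bit-smearing trick). */
uint32_t next_pow2(uint32_t v)
{
    assert(v >= 1 && v <= (UINT32_C(1) << 31));
    v--;
    v |= v >> 1;
    v |= v >> 2;
    v |= v >> 4;
    v |= v >> 8;
    v |= v >> 16;
    return v + 1;
}

/* Fraction of a padded power-of-two allocation that a w x h image actually covers. */
double padded_fraction_used(uint32_t w, uint32_t h)
{
    return ((double)w * h) / ((double)next_pow2(w) * next_pow2(h));
}
```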
Since imageAtomicAdd (which seems to be the only real atomic read-modify-write function that operates on images) is only available for 32-bit integers, I don't see any sensible way to accumulate multiple color values from different shader invocations in one pixel.
The only somewhat reasonable way I can see is to use 32 bits per color channel (128 bits per RGBA pixel), add the 8-bit color values up, hope it doesn't overflow, and clamp to 8 bits afterwards.
This seems wasteful and restrictive (only pure additive blending?).
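The overflow bound in that scheme is easy to compute: 2^32 - 1 is exactly 255 x 16,843,009, so a 32-bit unsigned channel can absorb 16,843,009 full-intensity 8-bit adds before it could wrap. A CPU-side C sketch of the add-then-clamp scheme described above (helper names are hypothetical; in a shader each per-channel add would be an imageAtomicAdd on a 32-bit integer image):

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Full-intensity 8-bit adds a 32-bit unsigned channel can take without wrapping:
 * (2^32 - 1) / 255 = 16843009, with no remainder (0xFFFFFFFF = 0xFF * 0x01010101). */
uint32_t max_safe_u8_adds(void)
{
    return UINT32_MAX / 255u;
}

/* CPU stand-in for one invocation adding its 8-bit RGBA color into four 32-bit channels. */
void accumulate_rgba(uint32_t acc[4], const uint8_t color[4])
{
    assert(acc != NULL && color != NULL);
    for (int c = 0; c < 4; ++c)
        acc[c] += color[c];
}

/* Resolve step: clamp an accumulated channel back to 8 bits (pure additive blending). */
uint8_t resolve_clamped(uint32_t sum)
{
    return sum > 255u ? (uint8_t)255 : (uint8_t)sum;
}
```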
Accumulating in other data structures doesn't solve the issue either, since shared variables and SSBOs also seem to support atomicAdd only on integers.
There are two reasons that make me think I am probably missing something:
1. Every path tracer that allows for concurrent intersection testing (for example, for shadow rays) has to solve this issue, so it seems like there must be a solution.
2. All kinds of fancy blending can be done in fragment shaders, so the hardware is definitely capable of doing this.
Is everyone just writing path tracers with a 1:1 mapping between shader invocations and pixels?
As the title says, I have a dynamic texture (updated every frame) fed from an RGB565 color buffer, and I don't know which approach will perform better:
1. Create a texture in RGB565 format and upload the RGB565 color buffer to the GPU every frame.
2. Create a texture in RGBA8888 format and convert the RGB565 color buffer to RGBA8888 before uploading it to the GPU.
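The per-frame CPU work in the second option looks roughly like the following C sketch. It assumes the common RGB565 layout with red in the top 5 bits (the layout GL_UNSIGNED_SHORT_5_6_5 uses with GL_RGB) and expands each channel by bit replication, which is one common choice among several; the function name is hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Expand RGB565 texels (red in bits 15..11, green 10..5, blue 4..0) to R, G, B, A bytes.
 * Replicating the top bits into the low bits maps 31 -> 255 and 63 -> 255 exactly. */
void rgb565_to_rgba8888(const uint16_t *src, uint8_t *dst, size_t count)
{
    assert(src != NULL && dst != NULL);
    for (size_t i = 0; i < count; ++i) {
        uint16_t p = src[i];
        uint8_t r5 = (uint8_t)((p >> 11) & 0x1F);
        uint8_t g6 = (uint8_t)((p >> 5) & 0x3F);
        uint8_t b5 = (uint8_t)(p & 0x1F);
        dst[4 * i + 0] = (uint8_t)((r5 << 3) | (r5 >> 2));
        dst[4 * i + 1] = (uint8_t)((g6 << 2) | (g6 >> 4));
        dst[4 * i + 2] = (uint8_t)((b5 << 3) | (b5 >> 2));
        dst[4 * i + 3] = 255;   /* RGB565 has no alpha, so write opaque */
    }
}
```

Note that the output buffer is twice the size of the input, which is the extra upload cost this option pays on top of the conversion itself.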
My thinking is that if OpenGL/DirectX converts other formats to RGBA8888 internally anyway, creating an RGBA8888 texture and converting the data myself before uploading may be faster.
Which one is more performant?
Benchmark it.
That said, 5-6-5 is laid out this oddly for a reason: it's exactly 16 bits. GPUs typically support such formats in hardware, so if a format is available, you can assume the hardware instructions for handling it are there.
It may also depend on the overall workload you put on your GPU, and on the GPU's characteristics: putting a 565 texture into video memory and reading it back in a shader will consume half the memory bandwidth of its RGBA8888 counterpart, but it might (not for sure) consume a bit more processing power.
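A rough sketch of the upload side of that bandwidth difference, assuming the texture is re-uploaded every frame (the function name and the 1920x1080 at 60 fps figures are illustrative only). Sampling traffic in the shader depends on access patterns and caches and is not modeled here.

```c
#include <assert.h>
#include <stdint.h>

/* Bytes sent per second for a w x h texture re-uploaded every frame. */
uint64_t upload_bytes_per_second(uint32_t w, uint32_t h, uint32_t bytes_per_texel, uint32_t fps)
{
    assert(bytes_per_texel > 0);
    return (uint64_t)w * h * bytes_per_texel * fps;
}
```

At 1920x1080 and 60 fps this comes to 248,832,000 bytes per second for RGB565 versus 497,664,000 for RGBA8888.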
So benchmark it, if possible on multiple configurations :)
I doubt that converting the RGB565 data to RGBA8888 yourself would ever be faster.
First of all, RGB565 is a format that's pretty widely used, and there is a high likelihood that your hardware supports it directly. If the precision is high enough for your use case, it will use half the memory of RGBA8888, and most likely be at least as efficient, due to the reduced memory bandwidth and correspondingly higher cache hit rates.
Even if the hardware does not support it, I still don't think converting it to RGBA8888 yourself will be more efficient. Any driver worth its money will have highly optimized code for format conversion. And even more importantly, it might be able to apply the format conversion during a data copy it will have to make anyway, which avoids one copy of the data compared to your code doing the conversion.
I'm currently doing some GPGPU on my GPU. I've written a shader that performs all the calculations I want it to do and this gives the right results. However, the engine I'm using (Unity), requires me to use a slow and cumbersome way to load the values from the GPU to the CPU, which is also memory-inefficient and loses precision. In short, it works, but it also sucks.
However, Unity also gives me the option to retrieve the texture's ID (OpenGL-specific?) or the texture's pointer (apparently not platform-specific), after which I can write a DLL in native code (C++) to get the data from the GPU to the CPU. On the GPU it's a texture in RGBAFloat (4 floats per pixel, but I could easily change this to 1 float per pixel if necessary), and on the CPU I just want a two-dimensional array of floats. It seems to me that this should be pretty trivial, yet I can't find any useful information.
Does anyone have any ideas how I can retrieve the floats in the texture using the pointer, and let C++ output it as an array of floats?
Please ask for clarification if needed.
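A hedged sketch of the CPU side only. It assumes the texture has already been read back on the thread where the GL context is current, into a tightly packed RGBA float buffer, for example with desktop OpenGL's glGetTexImage(GL_TEXTURE_2D, 0, GL_RGBA, GL_FLOAT, buffer) after binding the texture; the function name and the flip_y option are illustrative. If only one channel matters, asking glGetTexImage for GL_RED instead of GL_RGBA should return one float per texel and make this extraction unnecessary. OpenGL returns rows starting at texture coordinate t = 0; whether that is the top or the bottom of your data depends on how it was produced, hence the flip option.

```c
#include <assert.h>
#include <stddef.h>

/* Copy the red channel of a tightly packed RGBA float image into a row-major
 * width x height array. With flip_y nonzero, row order is reversed. */
void rgba_float_to_red_2d(const float *rgba, size_t width, size_t height,
                          int flip_y, float *out)
{
    assert(rgba != NULL && out != NULL);
    for (size_t y = 0; y < height; ++y) {
        size_t src_row = flip_y ? (height - 1 - y) : y;
        for (size_t x = 0; x < width; ++x)
            out[y * width + x] = rgba[(src_row * width + x) * 4];   /* R is component 0 */
    }
}
```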
I have been hearing conflicting opinions on whether it is safe to use non-power-of-two textures in OpenGL applications. Some say all modern hardware supports NPOT textures perfectly; others say it doesn't, or that there is a big performance hit.
The reason I'm asking is because I want to render something to a frame buffer the size of the screen (which may not be a power of two) and use it as a texture. I want to understand what is going to happen to performance and portability in this case.
Arbitrary texture sizes have been a core part of OpenGL ever since OpenGL 2.0, which was released a long time ago (2004). All GPUs designed since then support NP2 textures just fine. The only question is how good the performance is.
However, ever since GPUs became programmable, optimizations based on the predictable access patterns of fixed-function texture fetching have become largely obsolete. GPUs now have caches optimized for general data locality, so performance is not much of an issue either. In fact, with P2 textures you may need to upscale the data to a power-of-two size, which increases the required memory bandwidth, and memory bandwidth is the #1 bottleneck of modern GPUs. So using a slightly smaller NP2 texture may actually improve performance.
In short: you can use NP2 textures safely, and performance isn't a big issue either.
All modern APIs (except some versions of OpenGL ES, I believe) on modern graphics hardware (the last 10 or so generations from ATi/AMD/nVidia and the last couple from Intel) support NP2 textures just fine. They've been in use, particularly for post-processing, for quite some time.
However, that's not to say they're as convenient as power-of-2 textures. One major case is memory packing; drivers can often pack textures into memory far better when they are powers of two. If you look at a texture with mipmaps, the base and all mips can be packed into an area 150% of the original width and 100% of the original height. It's also possible that certain texture sizes will line up memory pages with the stride (texture row size, in bytes), which would provide an optimal memory access situation. NP2 makes this sort of optimization harder to perform, so memory usage and addressing may be a hair less efficient. Whether you'll notice any effect is very much driver- and application-dependent.
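The 150% by 100% packing works because a power-of-two mip chain shrinks by exactly a quarter per level, so the whole chain is roughly 4/3 of the base level. The C sketch below (hypothetical helper names) counts levels and total texels using OpenGL's rule that each level is max(1, floor(previous / 2)) per axis; for NPOT sizes the floor means levels stop halving exactly, e.g. 1080 becomes 540, 270, 135, 67, and so on. A 1024x1024 chain comes to 1,398,101 texels.

```c
#include <assert.h>
#include <stdint.h>

/* Number of mip levels for a w x h texture: floor(log2(max(w, h))) + 1. */
uint32_t mip_level_count(uint32_t w, uint32_t h)
{
    assert(w >= 1 && h >= 1);
    uint32_t m = w > h ? w : h;
    uint32_t levels = 1;
    while (m > 1) {
        m >>= 1;
        ++levels;
    }
    return levels;
}

/* Total texels over the full chain; each axis becomes max(1, floor(size / 2)) per level. */
uint64_t mip_chain_texels(uint32_t w, uint32_t h)
{
    assert(w >= 1 && h >= 1);
    uint64_t total = 0;
    for (;;) {
        total += (uint64_t)w * h;
        if (w == 1 && h == 1)
            break;
        w = w > 1 ? w / 2 : 1;
        h = h > 1 ? h / 2 : 1;
    }
    return total;
}
```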
Offscreen effects are perhaps the most common use case for NP2 textures, especially screen-sized textures. Almost every game on the market now that performs any kind of post-processing or deferred rendering has 1 to 15 offscreen buffers, many of which are the same size as the screen (for some effects, half- or quarter-size buffers are useful). These are generally well supported, even with mipmaps.
Because NP2 textures are widely supported and almost a sure bet on desktops and consoles, using them should work just fine. If you're worried about platforms or hardware where they may not be supported, easy fallbacks include using the nearest power-of-2 size (may cause slightly lower quality, but will work) or dropping the effect entirely (with obvious consequences).
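If you go the nearest power-of-2 route, picking the size is simple. A C sketch (the helper name and the round-up-on-ties rule are assumptions for illustration); the image would then be resampled to that size, for example 1920x1080 to 2048x1024.

```c
#include <assert.h>
#include <stdint.h>

/* Power of two closest to v (1 <= v <= 2^31); ties round up. */
uint32_t nearest_pow2(uint32_t v)
{
    assert(v >= 1 && v <= (UINT32_C(1) << 31));
    uint32_t lo = 1;
    while (lo <= v / 2)        /* grow to the largest power of two <= v */
        lo <<= 1;
    if (lo == v)
        return v;
    uint32_t hi = lo << 1;     /* smallest power of two > v */
    return (v - lo < hi - v) ? lo : hi;
}
```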
I have a lot of experience (4+ years) making games and using texture atlases for iOS and Android through cross-platform development with OpenGL ES 2.0.
Stick with PoT textures with a maximum size of 2048x2048, because some devices (especially cheap ones with cheap hardware) still don't support arbitrary texture sizes. I know this from real-life testers and from seeing it first hand. There are so many devices out there now that you never know what sort of GPU you'll be facing.
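A load-time check following this advice could look like the C sketch below. The 2048 ceiling is this answer's recommendation rather than a device limit; in practice you would also query the real limit with glGetIntegerv(GL_MAX_TEXTURE_SIZE, &value) and pass the smaller of the two. The function names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* True if v is a nonzero power of two: exactly one bit set. */
static bool is_pow2(uint32_t v)
{
    return v != 0 && (v & (v - 1)) == 0;
}

/* Conservative rule: both sides power of two and no larger than max_side. */
bool fits_conservative_limits(uint32_t w, uint32_t h, uint32_t max_side)
{
    assert(max_side > 0);
    return is_pow2(w) && is_pow2(h) && w <= max_side && h <= max_side;
}
```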
Your iOS devices will also show black squares and artefacts if you are not using PoT textures.
Just a tip.
Even if arbitrary texture sizes are required by OpenGL X, certain video cards are still not fully compliant with OpenGL. A friend of mine with an Intel card had problems with NPOT2 textures (I assume Intel cards are fully compliant now).
Do you have a reason to use NPOT2 textures? Then do it, but remember that some old hardware may not support them, and you'll probably need a software fallback that can make your textures POT2.
No reason to use NPOT2 textures? Then just use POT2 textures (certain compressed formats still require POT2 textures).
Is there any way in OpenGL to load and read a 10-bit image? It doesn't have to be optimally efficient on the GPU side. I just want to offload my CPU from converting everything to 8-bit before shuffling it to the GPU.
I noticed that the only 10-bit texture format supported is RGB10, which isn't what I'm looking for.
Vendor specific extensions are alright.
I just want to offload my CPU from converting everything to 8-bit before shuffling it to the GPU.
Well, that's not going to happen. The GPU never does format conversions (except maybe swizzling, but that's really part of the DMA). The CPU does format conversions, which is why it is so important to avoid format mismatches.
So even if OpenGL had a way to describe 10-bit single-channel data, you'd still be relying on the CPU to decode it into the format the GPU actually uses (i.e., 8-bit). It just wouldn't be your code doing the conversion; it'd be driver code. Either way, it's eating CPU resources.
But that's irrelevant to your needs, since OpenGL does not have a way to upload 10-bit single-channel data. How would you even store that? The pixels aren't byte-aligned.
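To see what "not byte-aligned" means in practice, here is a C sketch that unpacks tightly packed 10-bit samples into 16-bit values. It assumes an MSB-first bitstream (four samples in five bytes), which is only one possible layout; the function name is hypothetical. Every sample straddles a byte boundary, so each one needs a two-byte read, a shift and a mask.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Unpack `count` 10-bit samples from an MSB-first bitstream into values 0..1023.
 * src must hold at least (count * 10 + 7) / 8 bytes. */
void unpack10_msb(const uint8_t *src, size_t count, uint16_t *dst)
{
    assert(src != NULL && dst != NULL);
    for (size_t i = 0; i < count; ++i) {
        size_t bit = i * 10;                    /* first bit of sample i */
        size_t byte = bit / 8;
        unsigned shift = (unsigned)(bit % 8);   /* always 0, 2, 4 or 6 */
        assert(shift <= 6);
        /* The sample lies entirely within these two bytes. */
        uint32_t two = ((uint32_t)src[byte] << 8) | src[byte + 1];
        dst[i] = (uint16_t)((two >> (6u - shift)) & 0x3FFu);
    }
}
```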
In general, you are advised to do this kind of conversion off-line where possible and store the data in the formats where it makes the most sense.
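When the conversion to 8 bits is done offline, the rescale itself is short. A C sketch that rounds to nearest, computing round(v * 255 / 1023) in integers as (v * 255 + 511) / 1023 (the function name is hypothetical):

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Rescale 10-bit samples (0..1023) to 8-bit (0..255) with round-to-nearest. */
void convert10_to_8(const uint16_t *src, uint8_t *dst, size_t count)
{
    assert(src != NULL && dst != NULL);
    for (size_t i = 0; i < count; ++i) {
        assert(src[i] <= 1023);
        dst[i] = (uint8_t)((src[i] * 255u + 511u) / 1023u);
    }
}
```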