opengl tall texture vs wide texture for memory locality - opengl

A 2D Texture has two coordinates, x and y. To store a 2D array in 1D memory, the two possible formats are [x + y * width] and [x * height + y]. OpenGL has various confusing row-major/column-major conventions so I am unsure which of the two formats it uses. This is relevant because if a texture is used to store multiple images, such as in a sprite sheet or atlas, it is better to have the parts of an image located close together in memory. For example, if the format is [x + y * width] and we are using a very wide texture, then the GPU will have to skip through long parts of memory to find the texels it needs.
Thus: is a tall texture atlas superior to a wide texture atlas, or is it the other way around? Or do GPUs have no memory locality benefits?

The most important aspect of a texture atlas is how many images can fit inside it. Even when it comes to texture atlases, you are far more likely to access adjacent texels than distant ones.
Think about it. Say you render two 32x32 sprites. So that's 2 quads, in a single rendering call. Each quad will take up 32x32 pixels on the screen; that's 1024 pixels.
Locality matters; you're rendering from 1024 locally adjacent texels, then rendering from a different set of 1024 locally adjacent texels.
In any case, OpenGL does not expose you to the details of the GPU's image formats. You can ask for a particular size of texel and a number of channels. But you don't get any more details than that. The data you provide will be appropriately converted by the driver into the actual internal GPU data.
Typically, GPUs will swizzle textures in memory. This means rearranging data so that locality is preserved. That is, instead of storing texels as either x + y * width or x * height + y, they get stored in a more complex arrangement.
For example, the first 4 values would be texels 0,0; 0,1; 1,0; and 1,1. So a 2x2 block of texels is stored in a single contiguous region of memory. That's an example of how swizzled texture storage works.
But this is all an implementation detail; there's nothing you can do to influence or affect this, and not even a low-level API like Vulkan allows you to directly load pre-swizzled texel data.
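To make the 2x2 block example concrete, here is a minimal sketch in C that compares the plain x + y * width layout with a Morton (Z-order) layout. Morton order is only one well-known swizzle pattern and is used here as an assumption; actual GPU layouts are vendor-specific and not visible through OpenGL. The function names are hypothetical, and the order of texels inside a 2x2 block is a matter of convention.

```c
#include <stdint.h>

/* Linear (row-major) layout: vertically adjacent texels are `width` apart. */
static uint32_t linear_index(uint32_t x, uint32_t y, uint32_t width)
{
    return x + y * width;
}

/* Spread the low 16 bits of v so that they occupy the even bit positions. */
static uint32_t spread_bits(uint32_t v)
{
    v &= 0x0000FFFFu;
    v = (v | (v << 8)) & 0x00FF00FFu;
    v = (v | (v << 4)) & 0x0F0F0F0Fu;
    v = (v | (v << 2)) & 0x33333333u;
    v = (v | (v << 1)) & 0x55555555u;
    return v;
}

/* Morton (Z-order) layout: interleave the bits of x and y, so every aligned
   2x2 block of texels occupies 4 consecutive indices. */
static uint32_t morton_index(uint32_t x, uint32_t y)
{
    return spread_bits(x) | (spread_bits(y) << 1);
}
```

In a 4096-texel-wide linear texture, the texel directly below (0,0) sits 4096 entries away, while in the Morton layout it sits at index 2, regardless of whether the atlas is tall or wide.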

Related

Create texture array [duplicate]

If I understand correctly, if I was to set TEXTURE_MIN_FILTER to NEAREST then there's not much difference between sampler2DArray/TEXTURE_2D_ARRAY and sampler3D/TEXTURE_3D
The differences seem to be
GenerateMipmap will blend across layers with 3D textures but not 2D arrays
the Z coordinate passed to texture in GLSL is 0 to 1 with 3D textures but 0 to N (depth) in 2D arrays.
If filtering is not NEAREST 3D will blend across layers, 2D array will not.
Correct?
Incorrect. There's one more difference: mipmap sizes.
In a 3D texture, the width, height, and depth all decrease at lower mipmap sizes. In a 2D array texture, every mipmap level has the same number of array layers; only width and height decrease.
It's not just a matter of blending and some texture coordinate oddities; the very size of the texture data is different. It is very much a different kind of texture, as different from 3D textures as 2D textures are from 1D textures.
This is also why you cannot create a view texture of a 3D texture that is a 2D array, or vice-versa.
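The size difference can be checked with a little arithmetic. The following C sketch counts the texels in a complete mipmap chain for both texture types, assuming each dimension halves (rounding down, with a minimum of 1) per level and all sizes are at least 1. The function names are hypothetical.

```c
#include <stdint.h>

static uint32_t half_or_one(uint32_t v)
{
    return v > 1 ? v / 2 : 1;
}

/* 3D texture: width, height, and depth all shrink from level to level. */
static uint64_t texels_3d(uint32_t w, uint32_t h, uint32_t d)
{
    uint64_t total = 0;
    for (;;) {
        total += (uint64_t)w * h * d;
        if (w == 1 && h == 1 && d == 1)
            break;
        w = half_or_one(w);
        h = half_or_one(h);
        d = half_or_one(d);
    }
    return total;
}

/* 2D array texture: only width and height shrink; the layer count is fixed. */
static uint64_t texels_2d_array(uint32_t w, uint32_t h, uint32_t layers)
{
    uint64_t total = 0;
    for (;;) {
        total += (uint64_t)w * h * layers;
        if (w == 1 && h == 1)
            break;
        w = half_or_one(w);
        h = half_or_one(h);
    }
    return total;
}
```

For a 4 x 4 x 4 texture, the 3D chain holds 64 + 8 + 1 = 73 texels, while a 4 x 4 array with 4 layers holds 64 + 16 + 4 = 84.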
Apart from the answer already given, there is another difference worth noting: The size limits are also quite different. A single layer of an array texture may be as big as a standard 2D texture, and there is an extra limit on the number of layers, while for 3D textures, there is a limit constraining the maximum size in all dimensions.
For example, OpenGL 4.5 guarantees the following minimal values:
GL_MAX_TEXTURE_SIZE 16384
GL_MAX_ARRAY_TEXTURE_LAYERS 2048
GL_MAX_3D_TEXTURE_SIZE 2048
So a 16384 x 16384 x 16 array texture is fine (and should also fit into memory for every GL 4.5 capable GPU found in the real world), while a 3D texture of the same dimensions would be unsupported on most of today's implementations (even though the complete mipmap pyramid would consume less memory in the 3D texture case).

"Interleaved rendering" in fragment shader

P.S. Yes, I posted this question on Computer Graphics Stack Exchange. But I'm also posting it here in the hope that more people will see it.
Intro
I'm trying to render multi-channel images (more than 4 channels, for the purposes of feeding it to a Neural Network). Since OpenGL doesn't support it natively, I have multiple 4-channel render buffers, into which I render a corresponding portion of channels.
For example, I need a multi-channel image of size 512 x 512 x 16; in OpenGL I have 4 render buffers of size 512 x 512 x 4. Now the problem is that the Neural Network expects the data with strides 512 x 512 x 16, i.e. the 16 channel values of one pixel are followed by the 16 channel values of the next pixel. However, currently I can only efficiently read my 4 render buffers via 4 calls to glReadPixels, which gives the data strides of 4 x 512 x 512 x 4. Manually reordering the data on the client side is not an option, as it is too slow.
Main question
I've got an idea to render to a single 4-channel render buffer of size 512*4 x 512 x 4, because stride-wise it's equivalent to 512 x 512 x 16, we just treat a combination of 4 pixels in a row as a single pixel of 16-channel output image. Let's call it an "interleaved rendering"
But this requires me to magically adjust my fragment shader, so that every group of 4 consecutive fragments would have exactly the same interpolation of vertex attributes. Is there any way to do that?
This bad illustration, with 1 render buffer holding a 1024 x 512 4-channel image, is an example of how it should be rendered. With that, I can extract the data with stride 512 x 512 x 8 in a single glReadPixels call.
EDIT: better pictures
What I have now (4 render buffers)
What I want to do natively in OpenGL (this image is done in Python offline)
But this requires me to magically adjust my fragment shader, so that every group of 4 consecutive fragments would have exactly the same interpolation of vertex attributes.
No, it would require a bit more than that. You have to fundamentally change how rasterization works.
Rendering at 4x the width is rendering at 4x the width. That means stretching the resulting primitives, relative to a square area. But that's not the effect you want. You need the rasterizer to rasterize at the original resolution, then replicate the rasterization products.
That's not possible.
From the comments:
It just got to me, that I can try to get a 512 x 512 x 2 image of texture coordinates from vertex+fragment shaders, then stitch it with itself to make 4 times wider (thus we'll get the same interpolation) and from that form the final image
This is a good idea. You'll need to render whatever interpolated values you need to the original size texture, similar to how deferred rendering works. So it may be more than just 2 values. You could just store the gl_FragCoord.xy values, and then use them to compute whatever you need, but it's probably easier to store the interpolated values directly.
I would suggest doing a texelFetch when reading the texture, as you can specify exact integer texel coordinates. The integer coordinates you need can be computed from gl_FragCoord as follows:
ivec2 texCoords = ivec2(int(gl_FragCoord.x * 0.25f), int(gl_FragCoord.y));
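The same mapping can be written on the CPU to check the arithmetic. This C sketch is only an illustration, with hypothetical names: for each pixel of the 4x-wide target it returns the source texel to fetch and, as an assumption about how the 16 channels are split, which group of 4 channels that pixel holds.

```c
typedef struct {
    int src_x;  /* x of the texel in the original-size texture */
    int src_y;  /* y of the texel in the original-size texture */
    int group;  /* assumed: 0..3, selects channels 4*group .. 4*group + 3 */
} WideMapping;

/* x and y are integer pixel coordinates in the 4x-wide render target.
   For non-negative x, x / 4 equals int(gl_FragCoord.x * 0.25), because
   gl_FragCoord.x is x + 0.5 at the pixel center. */
static WideMapping map_wide_pixel(int x, int y)
{
    WideMapping m;
    m.src_x = x / 4;
    m.src_y = y;
    m.group = x % 4;
    return m;
}
```

In the shader, the group index could be computed the same way from gl_FragCoord.x to decide which 4 of the 16 channels to write.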

Flip back textures loaded with SDL_image and used in OpenGL

I am using SDL2 to create a context for OpenGL. I use SDL_image to load the images, and I bind them to OpenGL textures. But because the coordinate system isn't the same the textures are flipped.
I found two ways to correct this:
Modify the texture after loading it
Advantage: Only done once per texture
Disadvantage: Done using the CPU which slows down the loading of each texture
Apply a rotation of 180° on the Y and Z axis when rendering
Advantage: Using super fast functions
Disadvantage: Needs to be done multiple times per frame
Is there another way to flip back the textures after they have been loaded with SDL_Image? And if not, which method is usually used?
There are a bunch of options. Some that come to mind:
Edit original assets
You can flip the image files upside down with an image processing tool, and use the flipped images as your assets. They will look upside down when viewed in an image viewer, but will then turn out correct when used as textures.
This is the ideal solution if you're in full control of the images. It obviously won't work if you get images from external sources at runtime.
Flip during image load
Some image loading libraries allow you to flip the image during loading. From the documentation of SDL_image I could find, I did not see this option there. But you might be able to find an alternate library that supports it. And of course you can do this if you write your own image loading.
This is a good solution. The overhead is minimal since you do the flipping while you're touching the data anyway. One common approach is that you read the data row by row, and store in the texture in the opposite order, using glTexSubImage2D().
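As a rough sketch of the row-by-row idea, the C function below copies the rows of a loaded image into a second buffer in reverse order, which could then be uploaded in one call. The function name and the separate source pitch parameter are assumptions for illustration; the destination is assumed to be tightly packed.

```c
#include <stddef.h>
#include <string.h>

/* Copy `height` rows from src to dst in reverse vertical order.
   Source rows start src_pitch bytes apart; destination rows are tightly
   packed, row_bytes each. Only row_bytes bytes per row are copied. */
static void copy_rows_flipped(unsigned char *dst, const unsigned char *src,
                              size_t row_bytes, size_t src_pitch, int height)
{
    for (int y = 0; y < height; ++y) {
        const unsigned char *src_row = src + (size_t)y * src_pitch;
        unsigned char *dst_row = dst + (size_t)(height - 1 - y) * row_bytes;
        memcpy(dst_row, src_row, row_bytes);
    }
}
```

If row_bytes is not a multiple of 4, the tightly packed result needs glPixelStorei(GL_UNPACK_ALIGNMENT, 1) before the upload, since the default unpack alignment is 4.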
Flip between loading and first use
You can create a flipped copy of the texture after you already loaded it. The typical way to do this would be by drawing a screen sized quad while sampling the original texture and rendering to an FBO that has the resulting flipped texture as a rendering target. Or, more elegantly, use glBlitFramebuffer().
This is not very appealing because it involves copying the memory. While it should be quite efficient if you let the GPU create the copy, extra copying is always undesirable. Even if it happens only once for each texture, it can increase your startup/loading time.
Apply transformation to texture coordinates
You can apply a transformation to the texture coordinates in either the vertex or fragment shader. You're talking about rotations in your question, but the transformation you need is in fact trivial. You basically just map the y of the texture coordinate to 1.0 - y, and leave the x unchanged.
This adds a small price to shader execution. But the operation is very simple and fast compared to the texture sampling operation it goes along with. In reality, the added overhead is probably insignificant. While I don't think it's very pretty, it's a perfectly fine solution.
Invert the texture coordinates
This is similar to the previous option, but instead of inverting the texture coordinates in the shader, you already specify them inverted in the vertex attribute data.
This is often trivial to do. For example, it is very common to texture quads by using texture coordinates of (0, 0), (1, 0), (0, 1), (1, 1) for the 4 corners. Instead, you simply replace 0 with 1 and 1 with 0 in the second components of the texture coordinates.
Or say you load a model containing texture coordinates from a file. You simply replace each y in the texture coordinates by 1.0f - y during reading, and before storing away the texture coordinates for later rendering.
IMHO, this is often the best solution. It's very simple to do, and has basically no performance penalty.
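A minimal sketch of that replacement in C, assuming the texture coordinates are stored as interleaved (u, v) float pairs; the function name and the layout are assumptions for illustration:

```c
#include <stddef.h>

/* Replace v with 1 - v for every vertex; u is left unchanged.
   uv holds vertex_count interleaved (u, v) pairs. */
static void flip_v(float *uv, size_t vertex_count)
{
    for (size_t i = 0; i < vertex_count; ++i)
        uv[2 * i + 1] = 1.0f - uv[2 * i + 1];
}
```

Applied to the quad coordinates (0, 0), (1, 0), (0, 1), (1, 1), this yields (0, 1), (1, 1), (0, 0), (1, 0).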
I would disagree with most of the previous answer's points, except for flipping the image either on load, or before first use.
The reason being that if you are following data driven software development practices, you should never allow code to dictate the nature of data. The software should be designed to support the data accurately. Anything else is not fit for purpose.
Modifying texture coordinates is hackery, despite its ease of use. What happens if you decide at some later stage, to use a different image library which doesn't flip the image? Now your image will be inverted again during rendering.
Instead, deal with the problem at the source, and flip the image during load or before first use (I advocate on load, as it can be integrated into the code that loads the image via SDL_Image, and therefore is more easily maintained).
To flip an image, here is a simple C function that illustrates how to do it:
#include <stdlib.h>
#include <string.h>
/* Flip an image vertically, in place, by swapping rows from the outside in. */
void flip_image(char *bytes, int width, int height, int bytes_per_pixel)
{
    size_t row_size = (size_t)bytes_per_pixel * width;
    char *buffer = malloc(row_size);            /* holds one row during a swap */
    if (buffer == NULL) return;
    for (int i = 0; i < height / 2; ++i) {
        char *row = bytes + row_size * i;
        char *opposite = bytes + row_size * (height - 1 - i);
        memcpy(buffer, row, row_size);          /* row i -> buffer */
        memcpy(row, opposite, row_size);        /* opposite row -> row i */
        memcpy(opposite, buffer, row_size);     /* buffer -> opposite row */
    }
    free(buffer);
}
Here is an illustration of one iteration of this loop, with rows counted from 0:
Copy current row N to buffer
Copy row (rows - 1 - N) to row N
Copy buffer to row (rows - 1 - N)
Increment N and repeat until N == rows/2
Because rows/2 rounds down, this also works for images with an odd number of rows: the middle row simply stays where it is.
It should also be noted that the rows of a loaded SDL surface may be padded. Therefore, the row size used by the function should be the pitch of the image (in bytes), not its width multiplied by the bytes per pixel.

Slicing Image to map different textures

In OpenGL fixed-function programming, can I map different textures onto different objects when those textures are all generated from a single image? For example, I have a 1024 x 1024 image and four rectangles in my scene. I would like to slice the image into four 256 x 256 pieces and map these slices as textures.
How can I do this? One option is, of course, to pre-slice the image. But can this be done using glTexSubImage2D or some similar/different API?
Yes, you can use texture coordinates, which indicate which parts of the texture you wish to be mapped onto your object, rather than mapping the whole thing.
Read more: http://www.glprogramming.com/red/chapter09.html#name6
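As a minimal sketch of how such texture coordinates could be computed, the C function below returns the normalized rectangle covered by one tile of a regular grid inside a larger texture. The struct and function names are hypothetical, and tiles are assumed to be laid out on a regular grid starting at texel (0, 0).

```c
typedef struct {
    float u0, v0;  /* corner of the tile closest to texel (0, 0) */
    float u1, v1;  /* opposite corner */
} UvRect;

/* col and row select the tile; tile_w x tile_h is the tile size in texels,
   tex_w x tex_h the size of the whole texture. */
static UvRect tile_uv(int col, int row, int tile_w, int tile_h,
                      int tex_w, int tex_h)
{
    UvRect r;
    r.u0 = (float)(col * tile_w) / (float)tex_w;
    r.v0 = (float)(row * tile_h) / (float)tex_h;
    r.u1 = (float)((col + 1) * tile_w) / (float)tex_w;
    r.v1 = (float)((row + 1) * tile_h) / (float)tex_h;
    return r;
}
```

For a 1024 x 1024 image cut into 256 x 256 tiles, tile_uv(1, 2, 256, 256, 1024, 1024) gives the rectangle from (0.25, 0.5) to (0.5, 0.75); those four values are what each rectangle would pass as its corner texture coordinates.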
EDIT : Shaders are required to use array textures : http://www.opengl.org/registry/specs/EXT/texture_array.txt. I'll leave the answer as it might still be useful info.
I guess you can use an Array Texture for this : http://www.opengl.org/wiki/Array_Texture
You make a 256 x 1024 texture. When loading the texture you specify a layer size (256 x 256) and layer count (4). Your texture will then be split up in four layers.
You can access the texture using UVs : [x, y, LayerId]
Note that if you want to generate mipmaps, you need to define the number of levels when you allocate the storage with glTexStorage3D.
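The level count for a complete mipmap chain and the position of each layer in a vertically stacked strip are simple to compute. The C sketch below shows both; the function names are hypothetical, and the strip is assumed to be tightly packed and exactly one layer wide.

```c
#include <stddef.h>

/* Number of levels in a full mipmap chain: halve the larger dimension
   until it reaches 1, counting the base level as well. */
static int mip_level_count(int width, int height)
{
    int size = width > height ? width : height;
    int levels = 1;
    while (size > 1) {
        size /= 2;
        ++levels;
    }
    return levels;
}

/* Byte offset of a layer inside a tightly packed strip in which the layers
   are stacked vertically and the strip is exactly layer_w texels wide. */
static size_t layer_byte_offset(int layer, int layer_w, int layer_h,
                                int bytes_per_texel)
{
    return (size_t)layer * layer_w * layer_h * bytes_per_texel;
}
```

For 256 x 256 layers the full chain has 9 levels, which is the value that would be passed as the levels argument of glTexStorage3D.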
I think you have three options.
Use different texture coordinates for each different rectangle.
Transform the texture coordinates using glMatrixMode(GL_TEXTURE) and a different matrix between drawing each rectangle.
Create four different OpenGL textures from your original big texture. I don't think OpenGL offers you any help here. You have to either use a paint package to do it (easiest option if you only have to do this a few times), or copy parts of the image into a new buffer before calling glTexImage2D.
I think the first option is the easiest, with the advantage that you don't have to change any state between drawing the rectangles.

Fast texel setting in OpenGL

I need to render an influence map in OpenGL. At present I have 100 x 100 quads rendering with a set color to represent the influence at each point on the map. It has been recommended that I change my rendering method to one quad with a texture, and let the rendering pipeline take over for speed.
Basic testing has shown that glTexSubImage2D is too slow for setting 10,000 texels per frame. Do you have any suggestions? Would it be better to create an entirely new texture each frame? My influence map is in normalized floats (0.0 to 1.0) and that is converted to grayscale colors (1.0f = white).
Thanks :D
Are you currently updating each of the 10000 texels separately, with 10000 calls of glTexSubImage2D?
Just use one 100x100 grayscale float texture (array of 10000 floats) in RAM, update values directly to that and then send the whole data to the GPU with one glTexImage2D call. You could also use buffer objects to allow the transfer to happen in the background, but it should be unnecessary since you are not moving very large amounts of data.
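A minimal sketch of the CPU side of this approach, in C. The names are hypothetical and the clamping is an added assumption; the upload call in the closing comment assumes a compatibility-profile context with a 2D texture already bound.

```c
#include <stddef.h>

#define MAP_W 100
#define MAP_H 100

/* CPU-side copy of the influence map: one float per texel, row-major. */
static float influence[MAP_W * MAP_H];

/* Write one influence value, clamped to [0, 1]. No GL call happens here. */
static void set_influence(int x, int y, float value)
{
    if (value < 0.0f) value = 0.0f;
    if (value > 1.0f) value = 1.0f;
    influence[(size_t)y * MAP_W + x] = value;
}

/* Once per frame, after all updates, the whole array would be sent in a
   single call, for example:
   glTexImage2D(GL_TEXTURE_2D, 0, GL_LUMINANCE, MAP_W, MAP_H, 0,
                GL_LUMINANCE, GL_FLOAT, influence); */
```

The point is that the 10,000 per-texel writes touch only system memory, and the driver sees one transfer of 40,000 bytes per frame instead of 10,000 tiny ones.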