A game uses software rendering to draw a full-screen paletted (8-bit) image in memory.
What's the fastest way to put that image on the screen, using OpenGL?
Things I've tried:
glDrawPixels with glPixelMap to specify the palette, and letting OpenGL do the palette mapping. Performance was horrendous (~5 FPS).
Doing palette mapping (indexed -> BGRX) in software, then drawing that using glDrawPixels. Performance was better, but CPU usage is still much higher than using 32-bit DirectDraw with the same software palette mapping.
Should I be using some kind of texture instead?
glDrawPixels with glPixelMap to specify the palette, and letting OpenGL do the palette mapping. Performance was horrendous (~5 FPS).
That's not a surprise. glDrawPixels is not very fast to begin with, and glPixelMap will do the index/palette → RGB conversion on the CPU, in a code path that is unlikely to be well optimized.
Doing palette mapping (indexed -> BGRX) in software, then drawing that using glDrawPixels.
glDrawPixels is one of the slowest functions OpenGL has. There are two main reasons for this: first, it's a code path that receives very little optimization; second, it writes directly into the target framebuffer, forcing the pipeline to synchronize every time it's called. Also, on most GPUs it isn't backed by any cache.
What I suggest is that you place your indexed image into a single-channel texture, e.g. GL_R8 (for OpenGL 3 or later) or GL_LUMINANCE8, and your palette into a 1D RGB texture, so that the index, used as a texture coordinate, looks up the color. Using a texture as a LUT is perfectly normal. With this combination you use a fragment shader for in-situ palette-index-to-color conversion.
The fragment shader would look something like this:
#version 330
uniform sampler2D image;
uniform sampler1D palette;
in vec2 texcoord;
out vec4 fragColor;
void main()
{
    // recover the 0..255 index from the normalized red channel
    int index = int(texture(image, texcoord).r * 255.0 + 0.5); // 255 for an 8 bit palette
    fragColor = texelFetch(palette, index, 0);
}
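On the application side, a minimal C++ sketch of the matching texture setup, assuming a GL 3.3 context; width, height, pixels (the 8-bit indices) and paletteRGB (256 RGB entries) are placeholder names:
// Sketch: upload the 8-bit indices as a single-channel texture and the
// palette as a 1D RGB texture (all variable names here are placeholders).
GLuint imageTex, paletteTex;

glGenTextures(1, &imageTex);
glBindTexture(GL_TEXTURE_2D, imageTex);
glPixelStorei(GL_UNPACK_ALIGNMENT, 1);                   // rows are tightly packed
glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_MIN_FILTER, GL_NEAREST);
glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_MAG_FILTER, GL_NEAREST);
glTexImage2D(GL_TEXTURE_2D, 0, GL_R8, width, height, 0,
             GL_RED, GL_UNSIGNED_BYTE, pixels);          // the 8-bit index image

glGenTextures(1, &paletteTex);
glBindTexture(GL_TEXTURE_1D, paletteTex);
glTexParameteri(GL_TEXTURE_1D, GL_TEXTURE_MIN_FILTER, GL_NEAREST);
glTexParameteri(GL_TEXTURE_1D, GL_TEXTURE_MAG_FILTER, GL_NEAREST);
glTexImage1D(GL_TEXTURE_1D, 0, GL_RGB8, 256, 0,
             GL_RGB, GL_UNSIGNED_BYTE, paletteRGB);      // 256 palette entries

// when drawing: image on unit 0, palette on unit 1, samplers set accordingly
glActiveTexture(GL_TEXTURE0); glBindTexture(GL_TEXTURE_2D, imageTex);
glActiveTexture(GL_TEXTURE1); glBindTexture(GL_TEXTURE_1D, paletteTex);
Each frame you then only re-upload the 8-bit indices with glTexSubImage2D, which is a quarter of the data of a 32-bit image.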
I'm trying to read a 3D texture I rendered using an FBO. This texture is so large that glGetTexImage results in GL_OUT_OF_MEMORY error due to failure of nvidia driver to allocate memory for intermediate storage* (needed, I suppose, to avoid changing destination buffer in case of error).
So I then thought of getting this texture layer by layer, using glReadPixels after I render each layer. But glReadPixels doesn't take a layer index as a parameter. The only place where a layer index appears as something that directs I/O to a particular layer is the gl_Layer output in the geometry shader, and that is for the writing stage, not for reading.
When I tried simply calling glReadPixels anyway after rendering each layer, I only got the texels of layer 0. So glReadPixels at least doesn't fail to return something.
But the question is: can I get arbitrary layer of a 3D texture using glReadPixels? And if not, what should I use instead, given the above described memory constraints? Do I have to sample the layer from 3D texture in a shader to render the result to a 2D texture, and read this 2D texture afterwards?
*It's not a guess, I've actually tracked it down to a failing malloc call (with the size of the texture as argument) from within the nvidia driver's shared library.
If you have access to GL 4.5 or ARB_get_texture_sub_image, you can employ glGetTextureSubImage. As the function name suggests, it's for querying a sub-section of a texture's image data. This allows you to read slices of the texture without having to get the whole thing in one go.
The extension seems fairly widely supported, available on any implementation that's still being supported by its IHV.
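As a rough sketch, reading the texture one layer at a time with it could look like this; texture, w, h, depth and the destination buffer subpixels are placeholders taken from the question's context, and the texture is assumed to be RGBA float:
// Sketch: read a 3D texture layer by layer with glGetTextureSubImage
// (GL 4.5 / ARB_get_texture_sub_image).
const GLsizei layerBytes = w * h * 4 * sizeof(GLfloat);
for(GLint layer = 0; layer < depth; ++layer)
{
    glGetTextureSubImage(texture, 0,      // texture object, mip level 0
                         0, 0, layer,     // x, y, z offset: start of this layer
                         w, h, 1,         // width, height, just one layer deep
                         GL_RGBA, GL_FLOAT,
                         layerBytes,      // size of the destination region
                         subpixels + layer*w*h*4);
}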
Yes, glReadPixels can read other slices from the 3D texture. You just have to use glFramebufferTextureLayer to attach the current slice to the FBO, instead of attaching the whole 3D texture as the color attachment. Here's the replacement code for glGetTexImage (a special FBO for this, fboForTextureSaving, should be generated beforehand):
GLint origReadFramebuffer=0, origDrawFramebuffer=0;
// remember the currently bound FBOs so they can be restored afterwards
gl.glGetIntegerv(GL_READ_FRAMEBUFFER_BINDING, &origReadFramebuffer);
gl.glGetIntegerv(GL_DRAW_FRAMEBUFFER_BINDING, &origDrawFramebuffer);
gl.glBindFramebuffer(GL_FRAMEBUFFER, fboForTextureSaving);
for(int layer=0; layer<depth; ++layer)
{
    // attach only the current layer of the 3D texture as the color attachment
    gl.glFramebufferTextureLayer(GL_FRAMEBUFFER, GL_COLOR_ATTACHMENT0,
                                 texture, 0, layer);
    checkFramebufferStatus("framebuffer for saving textures");
    // read this layer into its slot in the destination buffer
    gl.glReadPixels(0,0,w,h,GL_RGBA,GL_FLOAT, subpixels+layer*w*h*4);
}
// restore the original framebuffer bindings
gl.glBindFramebuffer(GL_READ_FRAMEBUFFER, origReadFramebuffer);
gl.glBindFramebuffer(GL_DRAW_FRAMEBUFFER, origDrawFramebuffer);
Anyway, this is not a long-term solution to the problem. The first reason for GL_OUT_OF_MEMORY errors with large textures is actually not a lack of RAM or VRAM. It's subtler: each texture allocated on the GPU is mapped into the process' address space (at least on Linux/nvidia). So even if your process doesn't malloc even half of the RAM available to it, its address space may already be taken up by these large mappings. Add a bit of memory fragmentation to that, and you get either GL_OUT_OF_MEMORY, a malloc failure, or std::bad_alloc somewhere even earlier than expected.
The proper long-term solution is to embrace the 64-bit reality and compile your app as 64-bit code. This is what I ended up doing, ditching all this layer-by-layer kludge and simplifying the code quite a bit.
So once you have your 3D texture you can do this:
for (z=0;z<z_resolution_of_your_txr;z++)
{
    render_textured_quad(using z slice of 3D texture);
    glReadPixels(...);
}
It's best to match the QUAD size to your 3D texture's x,y resolution and use GL_NEAREST filtering...
This will be slow, so if you are not on Intel and want more speed, you can render to a 2D texture instead and then use glGetTexImage on the target 2D texture instead of glReadPixels.
Here are example shaders for rendering slice z:
Vertex:
//------------------------------------------------------------------
#version 420 core
//------------------------------------------------------------------
uniform float aspect;
layout(location=0) in vec2 pos;
out smooth vec2 vpos;
//------------------------------------------------------------------
void main(void)
{
    vpos=pos;
    gl_Position=vec4(pos.x,pos.y*aspect,0.0,1.0);
}
//------------------------------------------------------------------
Fragment:
//------------------------------------------------------------------
#version 420 core
//------------------------------------------------------------------
uniform float slice=0.25; // <0,1> slice of txr
in smooth vec2 vpos;
uniform sampler3D vol_txr; // 3D texture unit used
out layout(location=0) vec4 frag_col;
void main()
{
    frag_col=texture(vol_txr,vec3(0.5*(vpos+1.0),slice));
}
//---------------------------------------------------------------------------
So you need to change the slice uniform before rendering each slice. The rendering itself is just a single QUAD covering the screen <-1,+1>, while the viewport matches the texture's x,y resolution...
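A rough C++ sketch of that per-slice loop; prog, vao, vol_txr, w, h, depth and subpixels are placeholder names, and the VAO is assumed to hold the <-1,+1> quad:
// Sketch: render each z slice of the 3D texture to the quad and read it back.
glViewport(0, 0, w, h);                                   // match the texture x,y resolution
glUseProgram(prog);
glBindTexture(GL_TEXTURE_3D, vol_txr);
glUniform1f(glGetUniformLocation(prog, "aspect"), 1.0f);  // viewport already matches
GLint sliceLoc = glGetUniformLocation(prog, "slice");
glBindVertexArray(vao);
for(int z = 0; z < depth; ++z)
{
    // sample the center of slice z in the <0,1> range
    glUniform1f(sliceLoc, (z + 0.5f) / float(depth));
    glClear(GL_COLOR_BUFFER_BIT);
    glDrawArrays(GL_TRIANGLE_FAN, 0, 4);                  // the fullscreen quad
    glReadPixels(0, 0, w, h, GL_RGBA, GL_FLOAT,
                 subpixels + size_t(z)*w*h*4);
}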
I'm trying to take advantage of a GPU's parallelism to make an image processing application. I have a shader which takes two textures and, based on some uniform variables, computes an output texture. But instead of a transparency alpha value, each texture pixel needs an extra metadata byte that is mandatory in the computation.
So I consider running the shader twice each frame, once to compute the Dynamic Metadata as a single-byte texture, and once to calculate the resulting Paint Texture, which I need to be 3 bytes (to limit memory usage, as there might be quite a few such textures loaded at once).
I find the above problem a bit complicated. I've used OpenGL to paint to the screen, but this time I need to paint to two different textures, which I do not know how to do. Besides, the gl_FragColor built-in variable's type is vec4, but I need different output values.
So, to sum it up a little, is it possible for the fragment shader to output anything other than a vec4?
Is it possible to save to two different textures with a single call?
Is it possible to make an editable texture to store changes, until the editing ends and the data have to be passed back to the cpu?
What OpenGL calls would be most useful for the above?
The paint texture should also be retrievable so it can be shown on the screen.
The above could very easily be done by blitting textures on the CPU: I could keep all the relevant data on the CPU, do all the work 60 times/sec, and update the relevant texture by passing the data from the CPU to the GPU.
For changing relatively small regions of a texture each frame (about ~20% of textures around 512x512 in size), would you consider the above approach worth the trouble?
It depends on which version of OpenGL you use.
The latest OpenGL 4+ does not have a gl_FragColor variable; instead it lets you write any number (up to a supported maximum) of output colors from the fragment shader, each sent to the corresponding framebuffer color attachment:
layout(location = 0) out vec4 OUT0;
layout(location = 1) out float OUT1;
That will write OUT0 to GL_COLOR_ATTACHMENT0 and OUT1 to GL_COLOR_ATTACHMENT1 of the currently bound framebuffer.
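For this to work the bound framebuffer needs two color attachments and both draw buffers enabled. A minimal sketch, assuming the two target textures (here called paintTex and metaTex, placeholder names) have already been created, e.g. as GL_RGB8 and GL_R8:
// Sketch: framebuffer with two color attachments, one for the paint texture
// and one for the single-byte metadata.
GLuint fbo;
glGenFramebuffers(1, &fbo);
glBindFramebuffer(GL_FRAMEBUFFER, fbo);
glFramebufferTexture2D(GL_FRAMEBUFFER, GL_COLOR_ATTACHMENT0,
                       GL_TEXTURE_2D, paintTex, 0);    // receives OUT0
glFramebufferTexture2D(GL_FRAMEBUFFER, GL_COLOR_ATTACHMENT1,
                       GL_TEXTURE_2D, metaTex, 0);     // receives OUT1

// enable both attachments as draw buffers for this framebuffer
const GLenum bufs[2] = { GL_COLOR_ATTACHMENT0, GL_COLOR_ATTACHMENT1 };
glDrawBuffers(2, bufs);

if(glCheckFramebufferStatus(GL_FRAMEBUFFER) != GL_FRAMEBUFFER_COMPLETE)
    ; // handle an incomplete framebuffer here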
However, considering that you use gl_FragColor, you are on some old version of OpenGL. I'm not proficient with the legacy OpenGL versions, but you can check whether your implementation supports the GL_ARB_draw_buffers extension and/or the gl_FragData[] output variable.
Also, as stated, it's unclear why you can't use a single RGBA texture and use its alpha channel for that metadata.
I've written a simple GL fragment shader which performs an RGB gamma adjustment on an image:
uniform sampler2D tex;
uniform vec3 gamma;
void main()
{
vec3 texel = texture2D(tex, gl_TexCoord[0].st).rgb;
texel = pow(texel, gamma);
gl_FragColor.rgb = texel;
}
The texture paints most of the screen and it's occurred to me that this is applying the adjustment per output pixel on the screen, instead of per input pixel on the texture. Although this doesn't change its appearance, this texture is small compared to the screen.
For efficiency, how can I make the shader process the texture pixels instead of the screen pixels? If it helps, I am changing/reloading this texture's data on every frame anyway, so I don't mind if the texture gets permanently altered.
and it's occurred to me that this is applying the adjustment per output pixel on the screen
Almost. Fragment shaders are executed per output fragment (hence the name). A fragment is the smallest unit of rasterization, before it's written into a pixel. Every pixel that's covered by a piece of visible rendered geometry is turned into one or more fragments (yes, there may even be more fragments than covered pixels, for example when drawing to an antialiased framebuffer).
For efficiency,
Modern GPUs won't even "notice" the slightly reduced load. This is the kind of micro-optimization that's on the brink of non-measurability. My advice: don't worry about it.
how can I make the shader process the texture pixels instead of the screen pixels?
You could preprocess the texture by first rendering it through a texture-sized, non-antialiased framebuffer object into an intermediate texture. However, if your change is nonlinear, and a gamma adjustment is exactly that, then you should not do this: you want to process images in a linear color space and apply nonlinear transformations only as late as possible.
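If you do want to preprocess anyway (keeping the caveat above in mind), a rough sketch of that render-to-texture pass could look like this; fbo, srcTex, dstTex, texW, texH and gammaProgram are placeholder names, and the FBO is assumed to have been created beforehand:
// Sketch: run the gamma shader once over srcTex into dstTex of the same size,
// then use dstTex for the normal on-screen rendering afterwards.
glBindFramebuffer(GL_FRAMEBUFFER, fbo);
glFramebufferTexture2D(GL_FRAMEBUFFER, GL_COLOR_ATTACHMENT0,
                       GL_TEXTURE_2D, dstTex, 0);
glViewport(0, 0, texW, texH);                           // one fragment per source texel
glUseProgram(gammaProgram);
glBindTexture(GL_TEXTURE_2D, srcTex);
glBegin(GL_QUADS);                                      // fullscreen quad, identity matrices
    glTexCoord2f(0, 0); glVertex2f(-1, -1);
    glTexCoord2f(1, 0); glVertex2f( 1, -1);
    glTexCoord2f(1, 1); glVertex2f( 1,  1);
    glTexCoord2f(0, 1); glVertex2f(-1,  1);
glEnd();
glBindFramebuffer(GL_FRAMEBUFFER, 0);                   // back to the default framebuffer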
I am loading bitmaps with OpenGL to texture a 3D mesh. Some of these bitmaps have alpha channels (transparency) for some of the pixels, and I need to figure out the best way to obtain the transparency value of each pixel and to render the bitmaps with that transparency applied.
Does anyone have a good example of this? Does OpenGL support this?
First of all, it's generally best to convert your bitmap data to 32-bit so that each channel (R,G,B,A) gets 8 bits. When you upload your texture, specify a 32bit format.
Then when rendering, you'll need to glEnable(GL_BLEND); and set the blend function, eg: glBlendFunc(GL_SRC_ALPHA, GL_ONE_MINUS_SRC_ALPHA);. This tells OpenGL to mix the RGB of the texture with that of the background, using the alpha of your texture.
If you're doing this to 3D objects, you might also want to turn off back-face culling (so that you see the back of the object through the front) and sort your triangles back-to-front (so that the blends happen in the correct order).
If your source bitmap is 8-bit (ie: using a palette with one colour specified as the transparency mask), then it's probably easiest to convert that to RGBA, setting the alpha value to 0 when the colour matches your transparency mask.
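Putting the upload and the blending state together, a minimal sketch; rgbaPixels, w and h are placeholders for your converted 32-bit data:
// Sketch: upload the converted 32-bit RGBA image and draw it with blending.
GLuint tex;
glGenTextures(1, &tex);
glBindTexture(GL_TEXTURE_2D, tex);
glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_MIN_FILTER, GL_LINEAR);
glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_MAG_FILTER, GL_LINEAR);
glTexImage2D(GL_TEXTURE_2D, 0, GL_RGBA8, w, h, 0,
             GL_RGBA, GL_UNSIGNED_BYTE, rgbaPixels);

// mix the texture's RGB with the background according to its alpha
glEnable(GL_BLEND);
glBlendFunc(GL_SRC_ALPHA, GL_ONE_MINUS_SRC_ALPHA);
glEnable(GL_TEXTURE_2D);
// ... draw the textured mesh here, sorted back-to-front ...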
Some hints to make things (maybe) look better:
Your alpha channel is going to be an all-or-nothing affair (either 0x00 or 0xff), so apply some blur algorithm to get softer edges, if that's what you're after.
For texels (texture pixels) with an alpha of zero (fully transparent), replace the RGB colour with that of the closest non-transparent texel. That way, when the texture is filtered, the edges won't be blended towards the original transparency colour from your BMP.
If your pixmaps are 8-bit single channel, they are either grayscale or use a palette. What you first need to do is convert the pixmap data into RGBA format. For this, allocate a buffer large enough to hold a 4-channel pixmap with the dimensions of the original file. Then, for each pixel of the pixmap, use that pixel's value as an index into the palette (look-up table) and put that color value into the corresponding pixel of the RGBA buffer. Once finished, upload it to OpenGL using glTexImage2D.
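A rough sketch of that conversion, assuming a tightly packed source image and a 256-entry RGBA palette; indices, palette, w and h are placeholder names:
// Sketch: expand an 8-bit paletted image to RGBA on the CPU and upload it.
std::vector<unsigned char> rgba(size_t(w) * h * 4);
for(size_t i = 0; i < size_t(w) * h; ++i)
{
    const unsigned char idx = indices[i];     // this pixel's palette index
    rgba[i*4 + 0] = palette[idx*4 + 0];       // R
    rgba[i*4 + 1] = palette[idx*4 + 1];       // G
    rgba[i*4 + 2] = palette[idx*4 + 2];       // B
    rgba[i*4 + 3] = palette[idx*4 + 3];       // A
}
glPixelStorei(GL_UNPACK_ALIGNMENT, 1);
glTexImage2D(GL_TEXTURE_2D, 0, GL_RGBA8, w, h, 0,
             GL_RGBA, GL_UNSIGNED_BYTE, rgba.data());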
If your GPU supports fragment shaders (very likely), you can instead do that LUT transformation in the shader: upload the 8-bit pixmap as a GL_RED or GL_LUMINANCE 2D texture and the palette as a 1D GL_RGBA texture. Then in the fragment shader:
uniform sampler2D texture;
uniform sampler1D palette_lut;

void main()
{
    // normalized palette index read from the single-channel texture
    float palette_index = texture2D(texture, gl_TexCoord[0].st).r;
    // use it as the coordinate into the palette look-up table
    vec4 color = texture1D(palette_lut, palette_index);
    gl_FragColor = color;
}
Blended rendering conflicts with the Z-buffer algorithm, so you must sort your geometry back-to-front for things to look right. As long as this affects objects as a whole it is rather simple, but it becomes tedious if you need to sort the faces of a mesh for rendering each and every frame. A method to avoid this is to break meshes down into convex submeshes (of course, a mesh that is already convex cannot be broken down further). Then use the following method (an OpenGL sketch of the two culling passes follows the list):
Enable face culling
for convex_submesh in sorted(meshes, far to near):
    set face culling to front faces (i.e. the backside gets rendered)
    render convex_submesh
    set face culling to back faces (i.e. the frontside gets rendered)
    render convex_submesh again
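In OpenGL calls, each loop iteration boils down to something like this; drawSubmesh is a placeholder for whatever issues the actual draw call for the current submesh:
// Sketch: two render passes for one convex submesh, using face culling,
// so the far side is blended before the near side.
glEnable(GL_CULL_FACE);

glCullFace(GL_FRONT);   // cull front faces -> only the back side is drawn
drawSubmesh();

glCullFace(GL_BACK);    // cull back faces -> only the front side is drawn
drawSubmesh();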
I would like to increase the brightness on a texture used in OpenGL rendering. Such as making it bright red or white. This is a 2D rendering environment, where every sprite is mapped as a texture to an OpenGL polygon.
I know little to nothing about manipulating texture data, and my engine works with a texture cache, so altering the whole surface would affect everything that uses the texture.
I can simulate the effect by having a "mask" and overlaying it, which lets me give the sprite solid colors, but that costs extra memory.
Is there any other solution to this?
If your requirements afford it, you can always write a very simple GLSL fragment shader which does this. It's literally a one-liner.
Something like:
uniform sampler2D tex;

void main()
{
    // add the current color on top of the texel to brighten or tint it
    gl_FragColor = texture2D(tex, gl_TexCoord[0].st) + gl_Color;
}
Perhaps GL_ADD instead of GL_MODULATE?
Use GL_MODULATE to multiply the texture color by the current color.
See the texture tutorial on this page.
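In fixed-function terms that amounts to setting the texture environment mode and the current color before drawing the sprite. A small sketch; spriteTex is a placeholder for the cached texture:
// Sketch: tint or brighten a sprite without touching the shared texture data.
glEnable(GL_TEXTURE_2D);
glBindTexture(GL_TEXTURE_2D, spriteTex);

// GL_MODULATE multiplies the texel by the current color (tint/darken);
// GL_ADD (core since GL 1.3) adds the current color on top (brighten).
glTexEnvi(GL_TEXTURE_ENV, GL_TEXTURE_ENV_MODE, GL_MODULATE);
glColor4f(1.0f, 0.3f, 0.3f, 1.0f);   // e.g. a strong red tint

// ... draw the sprite quad here ...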