I have a set of questions about non-uniform flow control in GLSL and its performance cost on modern desktop GPUs. First of all, I want to note that I have read the manual but still didn't find an answer. Let's get started.
Alpha check and zero-multiplication optimization.
Which fragment shader will work faster? (the header is the same for both)
in vec2 textureCoordIn; //interpolated texture coords from vertex shader
out vec4 outputColor; //resulted color should be here
uniform sampler2D alphaMask; // splat alpha mask for textures1-4;
uniform sampler2D mainTexture1;
uniform sampler2D mainTexture2;
uniform sampler2D mainTexture3;
uniform sampler2D mainTexture4;
void main(){
outputColor = vec4(0.0); // out variables are undefined until written; initialize before +=
vec4 maskValues = texture(alphaMask,textureCoordIn);
if (maskValues.r>0){
outputColor += maskValues.r * texture(mainTexture1,textureCoordIn);
}
if (maskValues.g>0){
outputColor += maskValues.g * texture(mainTexture2,textureCoordIn);
}
if (maskValues.b>0){
outputColor += maskValues.b * texture(mainTexture3,textureCoordIn);
}
if (maskValues.w>0){
outputColor += maskValues.w * texture(mainTexture4,textureCoordIn);
}
}
OR
void main(){
outputColor = vec4(0.0); // same initialization as above
vec4 maskValues = texture(alphaMask,textureCoordIn);
outputColor += maskValues.r * texture(mainTexture1,textureCoordIn);
outputColor += maskValues.g * texture(mainTexture2,textureCoordIn);
outputColor += maskValues.b * texture(mainTexture3,textureCoordIn);
outputColor += maskValues.w * texture(mainTexture4,textureCoordIn);
}
Let's assume that maskValues contains zeroes in 50% of cases. Which shader will perform faster? I'm also curious whether GLSL has a built-in optimization for multiplication by zero. Does anybody know?
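A third variant I can think of is to guard all four fetches behind a single branch, hoping that neighbouring fragments agree on whether the mask is all zero (an untested sketch, not a measured recommendation):
void main(){
outputColor = vec4(0.0);
vec4 maskValues = texture(alphaMask, textureCoordIn);
// One branch for all four channels: divergence only occurs where
// zero and non-zero mask regions meet, assuming the mask is
// spatially coherent.
if (any(greaterThan(maskValues, vec4(0.0)))) {
outputColor += maskValues.r * texture(mainTexture1, textureCoordIn);
outputColor += maskValues.g * texture(mainTexture2, textureCoordIn);
outputColor += maskValues.b * texture(mainTexture3, textureCoordIn);
outputColor += maskValues.w * texture(mainTexture4, textureCoordIn);
}
}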
Texture array with possibly invalid indexes: avoiding undefined behaviour?
Let's assume we have a texture array (sampler2DArray). Every vertex has an ivec4 attribute that contains 4 texture indexes into this array. In the fragment shader we need to return the sum of the texture colors for these indexes. Fairly simple. But what should we do if we want to handle the case where an index can point to a "null" texture? At init time we can set such indexes (vertex attributes) to -1, meaning the vec4(0,0,0,0) color. What is the best (and correct!) way to handle it?
in vec2 textureCoordIn; //interpolated texture coords from vertex shader
out vec4 outputColor; //resulted color should be here
uniform sampler2DArray globalTextureArray;
flat in ivec4 textureIndexes;
void main(){
outputColor = vec4(0.0); // initialize: out variables start undefined
if (textureIndexes.x > -1){
outputColor += texture(globalTextureArray, vec3(textureCoordIn,textureIndexes.x));
}
if (textureIndexes.y > -1){
outputColor += texture(globalTextureArray, vec3(textureCoordIn,textureIndexes.y));
}
if (textureIndexes.z > -1){
outputColor += texture(globalTextureArray, vec3(textureCoordIn,textureIndexes.z));
}
if (textureIndexes.w > -1){
outputColor += texture(globalTextureArray, vec3(textureCoordIn,textureIndexes.w));
}
}
OR
we could put a "fake" (transparent-black) texture into globalTextureArray and use its index to handle this case. So which is faster here: the if-else fork, or 4x unconditional texture lookups?
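For reference, with the fake-texture approach the shader body collapses to four unconditional lookups. A sketch, assuming the transparent-black texture sits at layer 0 of globalTextureArray and the application uploads 0 instead of -1 for unused slots:
void main(){
outputColor = vec4(0.0);
// Unused slots point at the reserved transparent-black layer, so
// adding their sample is a no-op and no branch is needed.
outputColor += texture(globalTextureArray, vec3(textureCoordIn, textureIndexes.x));
outputColor += texture(globalTextureArray, vec3(textureCoordIn, textureIndexes.y));
outputColor += texture(globalTextureArray, vec3(textureCoordIn, textureIndexes.z));
outputColor += texture(globalTextureArray, vec3(textureCoordIn, textureIndexes.w));
}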
Related
I am trying to assign texture unit 0 to a sampler2D uniform but the uniform's value does not change.
My program is coloring points based on their elevation (Y coordinates). Their color is looked up in a texture.
Here is my vertex shader code:
#version 330 core
#define ELEVATION_MODE
layout (location = 0) in vec3 position;
layout (location = 1) in float intensity;
uniform mat4 vpMat;
flat out vec4 f_color;
#ifdef ELEVATION_MODE
uniform sampler2D elevationTex;
#endif
#ifdef INTENSITY_MODE
uniform sampler2D intensityTex;
#endif
// texCoords are the result of calculations done on the vertex coords; I removed the calculation for clarity
vec4 elevationColor() {
return vec4(textureLod(elevationTex, elevationTexCoords, 0).rgb, 1.0);
}
vec4 intensityColor() {
return vec4(textureLod(elevationTex, intensityTexCoords, 0).rgb, 1.0);
}
void main() {
gl_Position = vpMat * vec4(position.xyz, 1.0);
#ifdef ELEVATION_MODE
f_color = elevationColor();
#endif
#ifdef COLOR_LODDEPTH
f_color = getNodeDepthColor();
#endif
}
Here is my fragment shader:
#version 330 core
out vec4 color;
flat in vec4 f_color;
void main() {
color = f_color;
}
When this shader is executed, I have 2 textures bound:
elevation texture in texture unit 0
intensity texture in texture unit 1
I am using glUniform1i to set the uniform's value:
glUniform1i(elevationTexLocation, (GLuint)0);
But when I run my program, the value of the uniform elevationTex is 1 instead of 0.
If I remove the glUniform1i call, the uniform value does not change (still 1) so I think the call is doing nothing (but generates no error).
If I change the uniform's type to float and the call from glUniform1i to:
glUniform1f(elevationTexLocation, 15.0f);
the value in the uniform is now 15.0f. So there is no problem with the location I'm passing; glUniform1i just has no effect on the uniform's value.
Any idea what I could be doing wrong?
I could give you more code, but it is not easily accessible, so if you know the answer without it, that's great. If you need the C++ part of the code, ask and I'll try to retrieve the important parts.
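For completeness, the binding sequence I believe is canonical looks like this (a minimal sketch; program and elevationTexture are illustrative names, and the program must be bound when glUniform1i is called):
glUseProgram(program); // uniforms are set on the currently bound program
GLint loc = glGetUniformLocation(program, "elevationTex");
glUniform1i(loc, 0); // the sampler reads from texture unit 0
glActiveTexture(GL_TEXTURE0); // select unit 0...
glBindTexture(GL_TEXTURE_2D, elevationTexture); // ...and bind the texture to it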
How do I determine what mipmap level was used when sampling a texture in a GLSL fragment shader?
I understand that I can manually sample a particular mipmap level of a texture using the textureLod(...) function:
uniform sampler2D myTexture;
void main()
{
float mipmapLevel = 1.0;
vec2 textureCoord = vec2(0.5, 0.5);
gl_FragColor = textureLod(myTexture, textureCoord, mipmapLevel);
}
Or I could allow the mipmap level to be selected automatically using texture(...) like
uniform sampler2D myTexture;
void main()
{
vec2 textureCoord = vec2(0.5, 0.5);
gl_FragColor = texture(myTexture, textureCoord);
}
I prefer the latter, because I trust the driver's judgment about appropriate mipmap level more than I do my own.
But I'd like to know what mipmap level was used in the automatic sampling process, to help me rationally sample nearby pixels. Is there a way in GLSL to access the information about what mipmap level was used for an automatic texture sample?
Below are three distinct approaches to this problem, depending on which OpenGL features are available to you:
As pointed out by Andon M. Coleman in the comments, the solution in OpenGL version 4.00 and above is simple; just use the textureQueryLod function:
#version 400
uniform sampler2D myTexture;
in vec2 textureCoord; // in normalized units
out vec4 fragColor;
void main()
{
float mipmapLevel = textureQueryLod(myTexture, textureCoord).x;
fragColor = textureLod(myTexture, textureCoord, mipmapLevel);
}
In earlier versions of OpenGL (2.0+?), you might be able to load an extension to similar effect. This approach worked in my case. NOTE: the function name is capitalized differently in the extension vs. the 4.00 built-in (textureQueryLOD vs. textureQueryLod).
#version 330
#extension GL_ARB_texture_query_lod : enable
uniform sampler2D myTexture;
in vec2 textureCoord; // in normalized units
out vec4 fragColor;
void main()
{
float mipmapLevel = 3; // default in case extension is unavailable...
#ifdef GL_ARB_texture_query_lod
mipmapLevel = textureQueryLOD(myTexture, textureCoord).x; // NOTE CAPITALIZATION
#endif
fragColor = textureLod(myTexture, textureCoord, mipmapLevel);
}
If loading the extension does not work, you could estimate the automatic level of detail using the approach contributed by genpfault:
#version 330
uniform sampler2D myTexture;
in vec2 textureCoord; // in normalized units
out vec4 fragColor;
// Does not take into account GL_TEXTURE_MIN_LOD/GL_TEXTURE_MAX_LOD/GL_TEXTURE_LOD_BIAS,
// nor implementation-specific flexibility allowed by OpenGL spec
float mip_map_level(in vec2 texture_coordinate) // in texel units
{
vec2 dx_vtc = dFdx(texture_coordinate);
vec2 dy_vtc = dFdy(texture_coordinate);
float delta_max_sqr = max(dot(dx_vtc, dx_vtc), dot(dy_vtc, dy_vtc));
float mml = 0.5 * log2(delta_max_sqr);
return max( 0, mml ); // Thanks #Nims
}
void main()
{
// convert normalized texture coordinates to texel units before calling mip_map_level
float mipmapLevel = mip_map_level(textureCoord * textureSize(myTexture, 0));
fragColor = textureLod(myTexture, textureCoord, mipmapLevel);
}
In any case, for my particular application, I ended up just computing the mipmap level on the host side, and passing it to the shader, because the automatic level-of-detail turned out to be not exactly what I needed.
From here:
Take a look at the OpenGL 4.2 spec, chapter 3.9.11, equation 3.21. The mipmap level is calculated from the lengths of the derivative vectors.
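In the spec's notation (ignoring LOD bias and clamping), that equation reduces to

\lambda = \log_2 \rho, \qquad \rho \approx \max\left( \left\lVert \tfrac{\partial (u,v)}{\partial x} \right\rVert,\ \left\lVert \tfrac{\partial (u,v)}{\partial y} \right\rVert \right)

and since 0.5 \cdot \log_2(d^2) = \log_2(d), the function below can compare squared lengths and skip the square roots: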
float mip_map_level(in vec2 texture_coordinate)
{
vec2 dx_vtc = dFdx(texture_coordinate);
vec2 dy_vtc = dFdy(texture_coordinate);
float delta_max_sqr = max(dot(dx_vtc, dx_vtc), dot(dy_vtc, dy_vtc));
return 0.5 * log2(delta_max_sqr);
}
I am writing some font drawing shaders in OpenGL 3.3. I will render my font into a texture atlas and then generate some display lists for some text I want to draw. I would like the rendering of text to consume the least amount of resources (CPU, GPU memory, GPU time). How can I accomplish this?
Looking at Freetype-gl, I noticed that the author generates 6 indices and 4 vertices per character.
Since I am using OpenGL 3.3, I have some additional freedom. My plan was to generate 1 vertex per character plus one integer "code" per character. The character code can be used in texelFetch operations to retrieve texture coördinates and character size information. A geometry shader turns the size information and vertex into a triangle strip.
Is texelFetch going to be slower than sending more vertices/texture coördinates? Is this worth doing, or is there a reason why it's not done in the font libraries I looked at?
Final code:
Vertex shader:
#version 330
uniform sampler2D font_atlas;
uniform sampler1D code_to_texture;
uniform mat4 projection;
uniform vec2 vertex_offset; // in view space.
uniform vec4 color;
uniform float gamma;
in vec2 vertex; // vertex in view space of each character adjusted for kerning, etc.
in int code;
out vec4 v_uv;
void main()
{
v_uv = texelFetch(
code_to_texture,
code,
0);
gl_Position = projection * vec4(vertex_offset + vertex, 0.0, 1.0);
}
Geometry shader:
#version 330
layout (points) in;
layout (triangle_strip, max_vertices = 4) out;
uniform sampler2D font_atlas;
uniform mat4 projection;
in vec4 v_uv[];
out vec2 g_uv;
void main()
{
vec4 pos = gl_in[0].gl_Position;
vec4 uv = v_uv[0];
vec2 size = vec2(textureSize(font_atlas, 0)) * (uv.zw - uv.xy);
vec2 pos_opposite = pos.xy + (mat2(projection) * size);
gl_Position = vec4(pos.xy, 0, 1);
g_uv = uv.xy;
EmitVertex();
gl_Position = vec4(pos.x, pos_opposite.y, 0, 1);
g_uv = uv.xw;
EmitVertex();
gl_Position = vec4(pos_opposite.x, pos.y, 0, 1);
g_uv = uv.zy;
EmitVertex();
gl_Position = vec4(pos_opposite.xy, 0, 1);
g_uv = uv.zw;
EmitVertex();
EndPrimitive();
}
Fragment shader:
#version 330
uniform sampler2D font_atlas;
uniform vec4 color;
uniform float gamma;
in vec2 g_uv;
layout (location = 0) out vec4 fragment_color;
void main()
{
float a = texture(font_atlas, g_uv).r;
fragment_color.rgb = color.rgb;
fragment_color.a = color.a * pow(a, 1.0 / gamma);
}
I wouldn't expect a significant performance difference between your proposed method and storing the quad vertex positions and texture coordinates in a vertex buffer. On the one hand, your method requires a smaller vertex buffer and less work from the CPU. On the other hand, the texelFetch calls will hit more-or-less random locations and won't make the best use of the cache; this last point may not matter much, as I guess that texture won't be very large. Also, the execution model of geometry shaders means they can quickly become the bottleneck of the pipeline.
To answer "is this worth doing?": I suspect not, for performance reasons. Unfortunately, you can't tell until you implement it and measure the performance. I think it's quite a cool idea though, so I don't think you'd be wasting your time trying it out.
Maybe you can use an atomic counter to track the current position in the text.
Here is an interesting paper on memory bandwidth: GPU perf...
You can cache the result in an FBO.
For really fast rendering, as you said, you may build a geometry shader that takes points as input and outputs quads, sampling a texture to get additional per-glyph info.
This effectively appears to be the best solution...
I'm trying to create a "transition" effect between two 2D scenes. I have 3 textures: before, after, and mask. before and after are self-explanatory. mask is a simple monochrome texture that defines how the first two get composited. It changes over time, to perform the transition. All 3 textures are the same size.
I've verified that all 3 textures contain the correct data, but when I try to perform the compositing, I end up with either before in its entirety, or after in its entirety, seemingly at random.
Here's what I'm doing:
Application code:
glEnable(GL_MULTISAMPLE);
glActiveTextureARB(GL_TEXTURE1_ARB);
glEnable(GL_TEXTURE_RECTANGLE_ARB);
after.handle.bind;
glActiveTextureARB(GL_TEXTURE2_ARB);
glEnable(GL_TEXTURE_RECTANGLE_ARB);
mask.handle.bind;
glActiveTextureARB(GL_TEXTURE0_ARB);
before.handle.bind;
GShaders.UseShaderProgram(maskProgramHandle); //GShaders: Global shader engine
GShaders.SetUniformValue(maskProgramHandle, 'before', 0);
GShaders.SetUniformValue(maskProgramHandle, 'after', 1);
GShaders.SetUniformValue(maskProgramHandle, 'mask', 2);
before.DrawFull; //draws the texture to the screen as a quad.
glDisable(GL_MULTISAMPLE);
Vertex shader:
varying vec4 v_color;
varying vec2 texture_coordinate;
void main()
{
gl_Position = gl_ModelViewProjectionMatrix * gl_Vertex;
texture_coordinate = vec2(gl_MultiTexCoord0);
v_color = gl_Color;
gl_FrontColor = gl_Color;
}
Fragment shader:
uniform sampler2DRect before;
uniform sampler2DRect after;
uniform sampler2DRect mask;
varying vec2 texture_coordinate;
void main()
{
vec3 maskValue = texture2DRect(mask, texture_coordinate).rgb;
float alpha = (maskValue.r + maskValue.g + maskValue.b) / 3.0;
vec4 beforeValue = texture2DRect(before, texture_coordinate);
vec4 afterValue = texture2DRect(after, texture_coordinate);
gl_FragColor = mix(beforeValue, afterValue, alpha);
}
Any idea what's going wrong?
This is only a guess, but have you tried this:
gl_FragColor = mix(beforeValue, afterValue, alpha / 255.0);
What is the correct way of doing the following:
Render a scene into a texture using a FBO (fbo-a)
Then apply an effect using the texture (tex-a) and render this into another texture (tex-b) using the same fbo (fbo-a)
Then render this second texture, with the applied effect (tex-b) as a full screen quad.
My approach is this, but it gives me a texture filled with "noise" in the window, plus the applied effect (all pixels are randomly colored red, green, blue, white, or black).
I'm using one FBO with two textures attached to GL_COLOR_ATTACHMENT0 (tex-a) and GL_COLOR_ATTACHMENT1 (tex-b).
I bind my FBO and make sure everything is rendered into tex-a using glDrawBuffer(GL_COLOR_ATTACHMENT0).
Then I apply the effect in a shader with tex-a bound as the 'sampler2D' on texture unit 1, switch to the second color attachment (glDrawBuffer(GL_COLOR_ATTACHMENT1)), and render a full-screen quad. Everything is now rendered into tex-b.
Then I switch back to the default FBO (0) and use tex-b on a full-screen quad to render the result.
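In host-code terms the sequence is roughly the following (a sketch; fbo, texA, texB, and the draw helpers are placeholders for my actual code):
// Pass 1: render the scene into tex-a (attachment 0)
glBindFramebuffer(GL_FRAMEBUFFER, fbo);
glDrawBuffer(GL_COLOR_ATTACHMENT0);
drawScene();
// Pass 2: the effect reads tex-a and writes tex-b (attachment 1);
// the sampled texture differs from the render target, so no feedback loop
glDrawBuffer(GL_COLOR_ATTACHMENT1);
glBindTexture(GL_TEXTURE_2D, texA);
drawFullScreenQuad();
// Pass 3: draw tex-b to the default framebuffer
glBindFramebuffer(GL_FRAMEBUFFER, 0);
glBindTexture(GL_TEXTURE_2D, texB);
drawFullScreenQuad();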
Example of the result when applying my shader
This is the shader I'm using. I'm not sure whether the shader could be the cause, but maybe the noise comes from an overflow?
Vertex shader
attribute vec4 a_pos;
attribute vec2 a_tex;
varying vec2 v_tex;
void main() {
mat4 ident = mat4(1.0);
v_tex = a_tex;
gl_Position = ident * a_pos;
}
Fragment shader
uniform int u_mode;
uniform sampler2D u_texture;
uniform float u_exposure;
uniform float u_decay;
uniform float u_density;
uniform float u_weight;
uniform float u_light_x;
uniform float u_light_y;
const int NUM_SAMPLES = 100;
varying vec2 v_tex;
void main() {
if (u_mode == 0) {
vec2 pos_on_screen = vec2(u_light_x, u_light_y);
vec2 delta_texc = vec2(v_tex.st - pos_on_screen.xy);
vec2 texc = v_tex;
delta_texc *= 1.0 / float(NUM_SAMPLES) * u_density;
float illum_decay = 1.0;
for(int i = 0; i < NUM_SAMPLES; i++) {
texc -= delta_texc;
vec4 sample = texture2D(u_texture, texc);
sample *= illum_decay * u_weight;
gl_FragColor += sample;
illum_decay *= u_decay;
}
gl_FragColor *= u_exposure;
}
else if(u_mode == 1) {
gl_FragColor = texture2D(u_texture, v_tex);
gl_FragColor.a = 1.0;
}
}
I've read this FBO article on opengl.org, where they describe a feedback loop at the bottom of the article. The description is not completely clear to me, and I'm wondering whether I'm doing exactly what they describe there.
Update 1:
Link to source code
Update 2:
When I set gl_FragColor.rgb = vec3(0.0, 0.0, 0.0); before starting the sampling loop (with NUM_SAMPLES), it works fine. No idea why, though.
The problem is that you're not initializing gl_FragColor, and you're modifying it with the lines
gl_FragColor += sample;
and
gl_FragColor *= u_exposure;
both of which depend on the previous value of gl_FragColor. So you're getting some random junk (whatever happened to be in the register that the shader compiler decided to use for the gl_FragColor computation) added in. This has a strong possibility of working fine on some driver/hardware combinations (because the compiler decided to use a register that was always 0 for some reason) and not on others.
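The usual fix (which matches your Update 2) is to accumulate into an explicitly initialized local variable and write gl_FragColor exactly once. A sketch of the u_mode == 0 path:
vec2 pos_on_screen = vec2(u_light_x, u_light_y);
vec2 delta_texc = (v_tex.st - pos_on_screen) * (1.0 / float(NUM_SAMPLES)) * u_density;
vec2 texc = v_tex;
float illum_decay = 1.0;
vec4 acc = vec4(0.0); // explicit zero instead of undefined gl_FragColor contents
for (int i = 0; i < NUM_SAMPLES; i++) {
texc -= delta_texc;
acc += texture2D(u_texture, texc) * illum_decay * u_weight;
illum_decay *= u_decay;
}
gl_FragColor = acc * u_exposure; // single, fully defined write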