How to avoid int->float conversion when passing data to pixel shader? - opengl

I have a pixel shader:
varying vec2 f_texcoord;
uniform vec4 mycolor_mult;
uniform sampler2D mytexture;
void main(void) {
gl_FragColor = (texture2D(mytexture, f_texcoord) * mycolor_mult);
};
and corresponding C++ code:
GLint m_attr = glGetUniformLocation(m_program, "mycolor_mult");
// ...
unsigned int myColor = ...; // 0xAARRGGBB format
float a = (myColor >> 24) / 255.f;
float r = ((myColor >> 16) & 0xFF) / 255.f;
float g = ((myColor >> 8) & 0xFF) / 255.f;
float b = (myColor & 0xFF) / 255.f;
glUniform4f(m_attr, r, g, b, a);
I keep sprite's color as unsigned int and have to convert it to 4 floats to pass them to the shader.
Can it be optimized? I mean can I pass not floats, but unsigned chars as components to the shader and avoid "divide by 255" operations? What should I change in shader and in C++ code to do it?

There are a few aspects to this question.
Is it worth optimizing?
I agree with @Nick's comment. There's a high likelihood that you're trying to optimize something that is not performance critical at all. For example, if this code is only executed once per frame, its execution time is absolutely insignificant. If it is executed many times per frame, things could look a bit different. A profiler can tell you how much time is spent in this code.
Are you optimizing the right thing?
Make sure that the glGetUniformLocation() call is only done once after linking the shader, not each time you set the uniform. Otherwise, that call will most likely be much more expensive than the rest of the code. It's not entirely clear from the code if you're already doing that.
Can you use more efficient OpenGL calls?
Not really, if you need the values as floats in the shader. There are no automatic format conversions for uniforms, so you cannot simply use a different call from the glUniform*() family. From the spec:
For all other uniform types the Uniform* command used must match the size and type of the uniform, as declared in the shader. No type conversions are done.
Can the code be optimized?
If you really want to do micro-optimizations, you can replace the divisions by multiplications. Divisions are much more expensive than multiplications on most CPUs. The code then looks like this:
const float COLOR_SCALE = 1.0f / 255.f;
float a = (myColor >> 24) * COLOR_SCALE;
float r = ((myColor >> 16) & 0xFF) * COLOR_SCALE;
float g = ((myColor >> 8) & 0xFF) * COLOR_SCALE;
float b = (myColor & 0xFF) * COLOR_SCALE;
You can't count on the compiler to perform this transformation for you, since changing the operation can affect the precision/rounding of the result. Some compilers have flags to enable these kinds of optimizations. See for example Optimizing a floating point division and conversion operation.
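To get a feel for how much the rounding can differ, here is a minimal C++ sketch that compares the two forms for every possible byte value. The helper names (byteToUnitDiv, byteToUnitMul, countMismatches) are made up for this illustration, and the count it reports depends on compiler flags and floating-point mode, so treat it as an experiment rather than a fixed result.

```cpp
#include <cassert>
#include <cmath>

// Hypothetical helpers for illustration; not part of the original code.

// Reference conversion: one division per channel.
float byteToUnitDiv(int value) {
    assert(value >= 0 && value <= 255);
    return static_cast<float>(value) / 255.0f;
}

// Micro-optimized conversion: multiply by a precomputed reciprocal.
float byteToUnitMul(int value) {
    assert(value >= 0 && value <= 255);
    constexpr float kColorScale = 1.0f / 255.0f;
    return static_cast<float>(value) * kColorScale;
}

// Counts the byte values for which the two forms round differently and
// reports the largest absolute difference through maxDiff.
int countMismatches(float& maxDiff) {
    int mismatches = 0;
    maxDiff = 0.0f;
    for (int v = 0; v <= 255; ++v) {
        const float diff = std::fabs(byteToUnitDiv(v) - byteToUnitMul(v));
        if (diff != 0.0f) ++mismatches;
        if (diff > maxDiff) maxDiff = diff;
    }
    return mismatches;
}
```

Any difference should be on the order of a single float rounding step, which is far below what an 8-bit color channel can show.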

With modern OpenGL (GLSL >= 4.1), there is the unpackUnorm4x8 GLSL function which does exactly what you want: it takes a single 32 bit uint and creates a normalized floating point vector out of it. You just have to swizzle the result to match your byte order, because the function interprets the least significant byte as the first channel.
uniform uint mycolor_packed;
//...
vec4 mycolor_mult=unpackUnorm4x8(mycolor_packed).bgra;
This is potentially the most efficient way to do the conversion in the shader itself. However, it still remains doubtful if doing this once per fragment on the GPU is more efficient vs. only once per draw call on the CPU.
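If the byte order is confusing, a small CPU-side model can help. The following C++ sketch mimics what unpackUnorm4x8 followed by the .bgra swizzle does to a 0xAARRGGBB value. The Vec4 struct and the function names are stand-ins invented for this example, not an OpenGL API.

```cpp
#include <cassert>
#include <cstdint>

// Stand-in for a GLSL vec4; hypothetical type for this sketch.
struct Vec4 { float x, y, z, w; };

// Extracts byte `index` (0 = least significant) as a normalized float.
float unormByte(std::uint32_t packed, int index) {
    assert(index >= 0 && index < 4);
    return static_cast<float>((packed >> (8 * index)) & 0xFFu) / 255.0f;
}

// CPU model of GLSL unpackUnorm4x8: the lowest byte becomes the first component.
Vec4 unpackUnorm4x8(std::uint32_t packed) {
    return {unormByte(packed, 0), unormByte(packed, 1),
            unormByte(packed, 2), unormByte(packed, 3)};
}

// CPU model of the .bgra swizzle: (b, g, r, a) == (z, y, x, w).
Vec4 swizzleBgra(const Vec4& v) {
    return {v.z, v.y, v.x, v.w};
}
```

For 0xAARRGGBB input, unpackUnorm4x8 yields (B, G, R, A), and the swizzle reorders it to (R, G, B, A), which is the order the original glUniform4f call used.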

Related

Unexpected value upon accessing an SSBO float

I am trying to calculate a morph offset for a gpu driven animation.
To that effect I have the following function (and SSBOS):
layout(std140, binding = 7) buffer morph_buffer
{
vec4 morph_targets[];
};
layout(std140, binding = 8) buffer morph_weight_buffer
{
float morph_weights[];
};
vec3 GetMorphOffset()
{
vec3 offset = vec3(0);
for(int target_index=0; target_index < target_count; target_index++)
{
float w1 = morph_weights[1];
offset += w1 * morph_targets[target_index * vertex_count + gl_VertexIndex].xyz;
}
return offset;
}
I am seeing strange behaviour so I opened renderdoc to trace the state:
As you can see, index 1 of the morph_weights SSBO is 0. However if I step over in the built in debugger for renderdoc I obtain:
Or in short, the variable I get back is 1, not 0.
So I did a little experiment and changed one of the values and now the SSBO looks like this:
And now I get this:
So it seems my SSBO of type float is being treated like an SSBO of vec4s. I am aware of alignment issues with vec3s, but IIRC floats are fair game. What is happening?
After asking around a bit, I found the issue: the SSBO is marked as std140, which pads every array element to a 16-byte stride, so a float array is laid out as if it were an array of vec4s. The correct layout for a tightly packed float array is std430.
For the Vulkan GLSL dialect, an alternative is to use the scalar layout qualifier (GL_EXT_scalar_block_layout).
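The effect of the two layouts can be reproduced on the CPU. This is a minimal C++ sketch, assuming the application uploads a tightly packed float array; the weight values and the helper names are invented for illustration and do not come from the captures above.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <vector>

// Array stride of float[] under each layout, in bytes.
constexpr std::size_t kStd140FloatStride = 16; // elements padded to vec4 size
constexpr std::size_t kStd430FloatStride = 4;  // tightly packed

// Copies floats into a byte buffer the way a plain float array is uploaded.
std::vector<unsigned char> uploadTightlyPacked(const std::vector<float>& values) {
    std::vector<unsigned char> bytes(values.size() * sizeof(float));
    if (!bytes.empty()) {
        std::memcpy(bytes.data(), values.data(), bytes.size());
    }
    return bytes;
}

// Reads element `index` the way the shader would for a given array stride.
float readElement(const std::vector<unsigned char>& bytes,
                  std::size_t index, std::size_t stride) {
    const std::size_t offset = index * stride;
    assert(offset + sizeof(float) <= bytes.size());
    float value = 0.0f;
    std::memcpy(&value, bytes.data() + offset, sizeof(float));
    return value;
}
```

With weights {0.5, 0, 0, 0, 1, 0, 0, 0} uploaded this way, readElement(bytes, 1, kStd140FloatStride) lands on the fifth float (1.0), while the std430 stride returns the second float (0.0) as intended.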

Can I put a R8G8B8A8 in a UBO, and use it as a vec4?

I am trying to optimize a working compute shader. Its purpose is to create an image: find the right color (using a small palette), and call imageStore(image, ivec2, vec4).
The colors are indexed in an array of uint, stored in a uniform buffer.
Each color in this UBO is packed inside one uint, as {0-255, 0-255, 0-255, 0-255}.
Here is the code:
struct Entry
{
*some other data*
uint rgb;
};
layout(binding = 0) uniform SConfiguration
{
Entry materials[MATERIAL_COUNT];
} configuration;
void main()
{
Entry material = configuration.materials[currentMaterialId];
float r = (material.rgb >> 16) / 255.;
float g = ((material.rgb & G_MASK) >> 8) / 255.;
float b = (material.rgb & B_MASK) / 255.;
imageStore(outImage, ivec2(gl_GlobalInvocationID.xy), vec4(r, g, b, 0.0));
}
I would like to clean/optimize a bit, because this color conversion looks bad/useless in the shader (and should be precomputed). My question is:
Is it possible to directly pack a vec4(r, g, b, 0.0) inside the UBO, using 4 bytes (like a R8G8B8A8) ?
Is it possible to do it directly? No.
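But GLSL does have a number of functions for packing/unpacking normalized values. In your case, you can pass the value as a single uint uniform, then use unpackUnorm4x8 to convert it to a vec4. Note that unpackUnorm4x8 puts the least significant byte in the first component, so with red stored in bits 16-23 as in your code, the result needs a swizzle. So your code becomes:
vec4 color = unpackUnorm4x8(material.rgb).zyxw;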
This is, of course, a memory-vs-performance tradeoff. So if memory isn't an issue, you should probably just pass a vec4 (never use vec3) directly.
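On the CPU side, the uint has to be packed in the byte order that unpackUnorm4x8 expects. Here is a minimal C++ sketch of the inverse operation (GLSL's packUnorm4x8 computes round(clamp(c, 0, 1) * 255) per component); the function names are chosen for this example and are not part of any OpenGL header.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstdint>

// Converts one normalized component to an 8-bit value.
std::uint32_t unormToByte(float c) {
    assert(!std::isnan(c));
    const float clamped = std::min(1.0f, std::max(0.0f, c));
    return static_cast<std::uint32_t>(std::lround(clamped * 255.0f));
}

// CPU counterpart of GLSL packUnorm4x8: x goes into the lowest byte,
// w into the highest, so unpackUnorm4x8 in the shader returns (x, y, z, w).
std::uint32_t packUnorm4x8(float x, float y, float z, float w) {
    return unormToByte(x)
         | (unormToByte(y) << 8)
         | (unormToByte(z) << 16)
         | (unormToByte(w) << 24);
}
```

If the palette is packed as packUnorm4x8(r, g, b, 0.0f), the shader gets the color back without any swizzle. If it keeps the layout from the question (red in bits 16-23), the unpacked components come out as (b, g, r, 0) and need reordering.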
Is it possible to directly pack a vec4(r, g, b, 0.0) inside the UBO, using 4 bytes (like a R8G8B8A8) ?
There is no way to express this directly as 4 single byte values; there is no appropriate data type in the shader that would let you declare this as a byte type.
However, why do you think you need to? Just upload it as 4 floats - it's a uniform so it's not like you are replicating it thousands of times, so the additional size is unlikely to be a problem in practice.

Pass array of floats to fragment shader via texture

I am trying to pass an array of floats (in my case an audio wave) to a fragment shader via texture. It works but I get some imperfections as if the value read from the 1px height texture wasn't reliable.
This happens with many combinations of bar widths and amounts.
I get the value from the texture with:
precision mediump float;
...
uniform sampler2D uDisp;
...
void main(){
...
float columnWidth = availableWidth / barsCount;
float barIndex = floor((coord.x-paddingH)/columnWidth);
float textureX = min( 1.0, (barIndex+1.0)/barsCount );
float barValue = texture2D(uDisp, vec2(textureX, 0.0)).r;
...
If instead of the value from the texture I use something else the issue doesn't seem to be there.
barValue = barIndex*0.1;
Any idea what could be the issue? Is using a texture for this purpose a bad idea?
I am using Pixi.JS as WebGL framework, so I don't have access to low level APIs.
With a gradient texture for the data and many bars the problem becomes pretty evident.
Update: Looks like the issue relates to the consistency of the value of textureX.
Trying different formulas like barIndex/(barsCount-1.0) results in less noise. Wrapping it on a min definitely adds more noise.
Turned out the issue wasn't in reading the values from the texture, but was in the drawing. Instead of using IFs I switched to step and the problem went away.
vec2 topLeft = vec2(
paddingH + (barIndex*columnWidth) + ((columnWidth-barWidthInPixels)*0.5),
top
);
vec2 bottomRight = vec2(
topLeft.x + barWidthInPixels,
bottom
);
vec2 tl = step(topLeft, coord);
vec2 br = 1.0-step(bottomRight, coord);
float blend = tl.x * tl.y * br.x * br.y;
I guess comparisons of floats through IFs are not very reliable in shaders.
Generally mediump is insufficient for texture coordinates for any non-trivial texture, so where possible use highp. This isn't always available on some older GPUs, so depending on the platform this may not solve your problem.
If you know you are doing 1:1 mapping then also use GL_NEAREST rather than GL_LINEAR, as the quantization effect will more likely hide some of the precision side-effects.
Given you probably know the number of columns and bars you can probably pre-compute some of the values on the CPU (e.g. precompute 1/columns and pass that as a uniform) at fp32 precision. Passing in small values between 0 and 1 is always much better at preserving floating point accuracy, rather than passing in big values and then dividing out.
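One related thing that might be worth checking, although it is an assumption and not something established in this thread, is whether the lookup coordinate lands on a texel center. A coordinate of the form (index + 0.5) / count sits in the middle of a texel, so small precision errors are less likely to flip the lookup to a neighbour. A minimal C++ sketch of the arithmetic, with hypothetical helper names:

```cpp
#include <cassert>
#include <cmath>

// Normalized coordinate of the center of texel `barIndex` in a texture
// that is `barsCount` texels wide (one texel per bar is assumed).
float texelCenterU(int barIndex, int barsCount) {
    assert(barsCount > 0 && barIndex >= 0 && barIndex < barsCount);
    return (static_cast<float>(barIndex) + 0.5f) / static_cast<float>(barsCount);
}

// Texel that nearest-neighbour sampling would pick for coordinate `u`,
// ignoring wrap and clamp modes.
int nearestTexel(float u, int width) {
    assert(width > 0);
    return static_cast<int>(std::floor(u * static_cast<float>(width)));
}
```

With these helpers, nearestTexel(texelCenterU(i, n), n) returns i for every bar, whereas a coordinate such as (i + 1) / n sits exactly on the boundary between two texels.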

GLSL Channel Selection

I have a GLSL shader that reads from one of the channels (e.g. R) of an input texture and then writes to the same channel in an output texture. This channel has to be selected by the user.
What I can think of right now is to just use an int uniform and tons of if-statements:
uniform sampler2D uTexture;
uniform int uChannelId;
varying vec2 vUv;
void main() {
//read in data from texture
vec4 t = texture2D(uTexture, vUv);
float data;
if (uChannelId == 0) {
data = t.r;
} else if (uChannelId == 1) {
data = t.g;
} else if (uChannelId == 2) {
data = t.b;
} else {
data = t.a;
}
//process the data...
float result = data * 2.0; //for example
//write out
if (uChannelId == 0) {
gl_FragColor = vec4(result, t.g, t.b, t.a);
} else if (uChannelId == 1) {
gl_FragColor = vec4(t.r, result, t.b, t.a);
} else if (uChannelId == 2) {
gl_FragColor = vec4(t.r, t.g, result, t.a);
} else {
gl_FragColor = vec4(t.r, t.g, t.b, result);
}
}
Is there any way of doing something like a dictionary access such as t[uChannelId]?
Or perhaps I should have 4 different versions of the same shader, each of which processes a different channel, so that I can avoid all the if-statements?
What is the best way to do this?
EDIT: To be more specific, I am using WebGL (Three.js)
There is such a way, and it is as simple as you actually wrote it in the question. Just use t[channelId]. To quote the GLSL Spec (This is from Version 3.30, Section 5.5, but applies to other versions as well):
Array subscripting syntax can also be applied to vectors to provide numeric indexing. So in
vec4 pos;
pos[2] refers to the third element of pos and is equivalent to pos.z. This allows variable indexing into a
vector, as well as a generic way of accessing components. Any integer expression can be used as the
subscript. The first component is at index zero. Reading from or writing to a vector using a constant
integral expression with a value that is negative or greater than or equal to the size of the vector is illegal.
When indexing with non-constant expressions, behavior is undefined if the index is negative, or greater
than or equal to the size of the vector.
Note that in the first part of your code you use this to access a specific channel of a texture. You could also use the ARB_texture_swizzle functionality. In that case, you would just use a fixed channel, say r, for access in the shader, and swizzle the actual texture channels so that whatever channel you want to access becomes r.
Update: as the target platform turned out to be webgl, these suggestions are not available. However, a simple solution would be to use a vec4 uniform in place of uChannelID which is 1.0 for the selected component and 0.0 for all others. Say this variable is called uChannelSel. You could use data=dot(t, uChannelSel) in the first part and gl_FragColor=(vec4(1.0)-uChannelSel) * t + uChannelSel*result for the second part.
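The mask arithmetic is easy to check on the CPU. Below is a minimal C++ sketch of the same two expressions, using std::array as a stand-in for vec4; channelMask, dot4 and replaceChannel are names invented for this example.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

using Vec4 = std::array<float, 4>; // stand-in for a GLSL vec4

// Builds the uChannelSel value: 1.0 for the selected channel, 0.0 elsewhere.
Vec4 channelMask(int channelId) {
    assert(channelId >= 0 && channelId < 4);
    Vec4 mask{};
    mask[static_cast<std::size_t>(channelId)] = 1.0f;
    return mask;
}

// data = dot(t, uChannelSel): picks out the selected channel.
float dot4(const Vec4& a, const Vec4& b) {
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2] + a[3] * b[3];
}

// gl_FragColor = (vec4(1.0) - uChannelSel) * t + uChannelSel * result:
// keeps the unselected channels and overwrites the selected one.
Vec4 replaceChannel(const Vec4& t, const Vec4& sel, float result) {
    Vec4 out{};
    for (std::size_t i = 0; i < 4; ++i) {
        out[i] = (1.0f - sel[i]) * t[i] + sel[i] * result;
    }
    return out;
}
```

On the application side, only the mask has to be uploaded (for example with a four-float uniform), and the shader stays free of branches.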
As I'm sure you know, branching can be expensive in shaders. However, it sounds like it'll always be the same channel in a pass (yes?), so you might maintain enough coherence to see good performance.
It's been a good while since I've used GLSL, but if you're using a newer version, maybe you could do some bitwise shifting (<< or >>) magic? You would read the texture into an int instead of a vec4, then shift it by a number of bits depending on which channel you want to read.

How to debug a GLSL shader?

I need to debug a GLSL program but I don't know how to output intermediate result.
Is it possible to make some debug traces (like with printf) with GLSL ?
You can't easily communicate back to the CPU from within GLSL. Using glslDevil or other tools is your best bet.
A printf would require trying to get back to the CPU from the GPU running the GLSL code. Instead, you can try pushing ahead to the display. Instead of trying to output text, output something visually distinctive to the screen. For example, you can paint something a specific color only if you reach the point of your code where you want to add a printf. If you need to printf a value, you can set the color according to that value.
void main(){
float bug=0.0;
vec3 tile=texture2D(colMap, coords.st).xyz;
vec4 col=vec4(tile, 1.0);
if(something) bug=1.0;
col.x+=bug;
gl_FragColor=col;
}
I have found Transform Feedback to be a useful tool for debugging vertex shaders. You can use this to capture the values of VS outputs, and read them back on the CPU side, without having to go through the rasterizer.
Here is another link to a tutorial on Transform Feedback.
GLSL Sandbox has been pretty handy to me for shaders.
Not debugging per se (which, as already answered, isn't really possible), but handy to see the changes in output quickly.
You can try this: https://github.com/msqrt/shader-printf which is an implementation called appropriately "Simple printf functionality for GLSL."
You might also want to try ShaderToy, and maybe watch a video like this one (https://youtu.be/EBrAdahFtuo) from "The Art of Code" YouTube channel, where you can see some of the techniques that work well for debugging and visualising. I can strongly recommend his channel: he writes some really good stuff and has a knack for presenting complex ideas in novel, highly engaging and easy to digest formats (his Mandelbrot video is a superb example of exactly that: https://youtu.be/6IWXkV82oyY).
I hope nobody minds this late reply, but the question ranks high on Google searches for GLSL debugging and much has of course changed in 9 years :-)
PS: Other alternatives could also be NVIDIA Nsight and AMD ShaderAnalyzer, which offer a full stepping debugger for shaders.
If you want to visualize the variations of a value across the screen, you can use a heatmap function similar to this (I wrote it in hlsl, but it is easy to adapt to glsl):
float4 HeatMapColor(float value, float minValue, float maxValue)
{
#define HEATMAP_COLORS_COUNT 6
float4 colors[HEATMAP_COLORS_COUNT] =
{
float4(0.32, 0.00, 0.32, 1.00),
float4(0.00, 0.00, 1.00, 1.00),
float4(0.00, 1.00, 0.00, 1.00),
float4(1.00, 1.00, 0.00, 1.00),
float4(1.00, 0.60, 0.00, 1.00),
float4(1.00, 0.00, 0.00, 1.00),
};
float ratio=(HEATMAP_COLORS_COUNT-1.0)*saturate((value-minValue)/(maxValue-minValue));
float indexMin=floor(ratio);
float indexMax=min(indexMin+1,HEATMAP_COLORS_COUNT-1);
return lerp(colors[indexMin], colors[indexMax], ratio-indexMin);
}
Then in your pixel shader you just output something like:
return HeatMapColor(myValue, 0.00, 50.00);
This gives you an idea of how the value varies across your pixels.
Of course you can use any set of colors you like.
At the bottom of this answer is an example of GLSL code that lets you output a full float value as a color, encoded as IEEE 754 binary32. I use it as follows (this snippet outputs the yy component of the modelview matrix):
vec4 xAsColor=toColor(gl_ModelViewMatrix[1][1]);
if(bool(1)) // put 0 here to get lowest byte instead of three highest
gl_FrontColor=vec4(xAsColor.rgb,1);
else
gl_FrontColor=vec4(xAsColor.a,0,0,1);
After you get this on screen, you can just take any color picker, format the color as HTML (appending 00 to the rgb value if you don't need higher precision, and doing a second pass to get the lower byte if you do), and you get the hexadecimal representation of the float as IEEE 754 binary32.
Here's the actual implementation of toColor():
const int emax=127;
// Input: x>=0
// Output: base 2 exponent of x if (x!=0 && !isnan(x) && !isinf(x))
// -emax if x==0
// emax+1 otherwise
int floorLog2(float x)
{
if(x==0.) return -emax;
// NOTE: there exist values of x, for which floor(log2(x)) will give wrong
// (off by one) result as compared to the one calculated with infinite precision.
// Thus we do it in a brute-force way.
for(int e=emax;e>=1-emax;--e)
if(x>=exp2(float(e))) return e;
// If we are here, x must be infinity or NaN
return emax+1;
}
// Input: any x
// Output: IEEE 754 biased exponent with bias=emax
int biasedExp(float x) { return emax+floorLog2(abs(x)); }
// Input: any x such that (!isnan(x) && !isinf(x))
// Output: significand AKA mantissa of x if !isnan(x) && !isinf(x)
// undefined otherwise
float significand(float x)
{
// converting int to float so that exp2(genType) gets correctly-typed value
float expo=float(floorLog2(abs(x)));
return abs(x)/exp2(expo);
}
// Input: x\in[0,1)
// N>=0
// Output: Nth byte as counted from the highest byte in the fraction
int part(float x,int N)
{
// All comments about exactness here assume that underflow and overflow don't occur
const float byteShift=256.;
// Multiplication is exact since it's just an increase of exponent by 8
for(int n=0;n<N;++n)
x*=byteShift;
// Cut higher bits away.
// $q \in [0,1) \cap \mathbb Q'.$
float q=fract(x);
// Shift and cut lower bits away. Cutting lower bits prevents potentially unexpected
// results of rounding by the GPU later in the pipeline when transforming to TrueColor
// the resulting subpixel value.
// $c \in [0,255] \cap \mathbb Z.$
// Multiplication is exact since it's just an increase of exponent by 8
float c=floor(byteShift*q);
return int(c);
}
// Input: any x acceptable to significand()
// Output: significand of x split to (8,8,8)-bit data vector
ivec3 significandAsIVec3(float x)
{
ivec3 result;
float sig=significand(x)/2.; // shift all bits to fractional part
result.x=part(sig,0);
result.y=part(sig,1);
result.z=part(sig,2);
return result;
}
// Input: any x such that !isnan(x)
// Output: IEEE 754 defined binary32 number, packed as ivec4(byte3,byte2,byte1,byte0)
ivec4 packIEEE754binary32(float x)
{
int e = biasedExp(x);
// sign to bit 7
int s = x<0. ? 128 : 0;
ivec4 binary32;
binary32.yzw=significandAsIVec3(x);
// clear the implicit integer bit of significand
if(binary32.y>=128) binary32.y-=128;
// put lowest bit of exponent into its position, replacing just cleared integer bit
binary32.y+=128*int(mod(float(e),2.));
// prepare high bits of exponent for fitting into their positions
e/=2;
// pack highest byte
binary32.x=e+s;
return binary32;
}
vec4 toColor(float x)
{
ivec4 binary32=packIEEE754binary32(x);
// Transform color components to [0,1] range.
// Division is inexact, but works reliably for all integers from 0 to 255 if
// the transformation to TrueColor by GPU uses rounding to nearest or upwards.
// The result will be multiplied by 255 back when transformed
// to TrueColor subpixel value by OpenGL.
return vec4(binary32)/255.;
}
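Turning the picked color back into a number amounts to reassembling four bytes into a float. Here is a minimal C++ sketch of that last step, assuming you have already read the three high bytes (R, G, B from the first pass) and the low byte (R from the second pass). decodeBinary32 is a hypothetical helper name, not part of the shader code above.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

static_assert(sizeof(float) == sizeof(std::uint32_t),
              "this sketch assumes a 32-bit IEEE 754 float");

// Reassembles a float from the four bytes shown on screen:
// byte3 = red, byte2 = green, byte1 = blue of the first pass,
// byte0 = red of the second (lowest byte) pass.
float decodeBinary32(int byte3, int byte2, int byte1, int byte0) {
    assert(byte3 >= 0 && byte3 <= 255 && byte2 >= 0 && byte2 <= 255);
    assert(byte1 >= 0 && byte1 <= 255 && byte0 >= 0 && byte0 <= 255);
    const std::uint32_t bits = (static_cast<std::uint32_t>(byte3) << 24)
                             | (static_cast<std::uint32_t>(byte2) << 16)
                             | (static_cast<std::uint32_t>(byte1) << 8)
                             |  static_cast<std::uint32_t>(byte0);
    float value = 0.0f;
    std::memcpy(&value, &bits, sizeof(value)); // bit-exact reinterpretation
    return value;
}
```

For example, the HTML color #3F8000 with a low byte of 00 corresponds to the bit pattern 0x3F800000, which is 1.0.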
Here is a fragment shader example showing how I actually debug:
#version 410 core
uniform sampler2D samp;
in VS_OUT
{
vec4 color;
vec2 texcoord;
} fs_in;
out vec4 color;
void main(void)
{
vec4 sampColor;
if( texture(samp, fs_in.texcoord).x > 0.8f) // Check whether the red channel is high
sampColor = vec4(1.0f, 1.0f, 1.0f, 1.0f); // If so, output white
else
sampColor = texture(samp, fs_in.texcoord); // Otherwise output the sampled color
color = sampColor;
}
The existing answers are all good stuff, but I wanted to share one more little gem that has been valuable in debugging tricky precision issues in a GLSL shader. With very large int numbers represented as a floating point, one needs to take care to use floor(n) and floor(n + 0.5) properly to implement round() to an exact int. It is then possible to render a float value that is an exact int by the following logic to pack the byte components into R, G, and B output values.
// Break components out of 24 bit float with rounded int value
// scaledWOB = (offset >> 8) & 0xFFFF
float scaledWOB = floor(offset / 256.0);
// c2 = (scaledWOB >> 8) & 0xFF
float c2 = floor(scaledWOB / 256.0);
// c0 = offset - (scaledWOB << 8)
float c0 = offset - floor(scaledWOB * 256.0);
// c1 = scaledWOB - (c2 << 8)
float c1 = scaledWOB - floor(c2 * 256.0);
// Normalize to byte range
vec4 pix;
pix.r = c0 / 255.0;
pix.g = c1 / 255.0;
pix.b = c2 / 255.0;
pix.a = 1.0;
gl_FragColor = pix;
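To read the value back, the three color bytes are recombined on the CPU. The following C++ sketch models the shader arithmetic above and its inverse, assuming offset is a non-negative whole number below 2^24 (the range in which a float holds integers exactly). Rgb8, encodeOffset and decodeOffset are names invented for this example.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>

struct Rgb8 { std::uint8_t r, g, b; }; // one 8-bit RGB pixel

// CPU model of the shader logic: splits an integer-valued float into bytes.
Rgb8 encodeOffset(float offset) {
    assert(offset >= 0.0f && offset < 16777216.0f);
    assert(offset == std::floor(offset));
    const float scaledWOB = std::floor(offset / 256.0f);
    const float c2 = std::floor(scaledWOB / 256.0f);
    const float c0 = offset - scaledWOB * 256.0f;
    const float c1 = scaledWOB - c2 * 256.0f;
    return {static_cast<std::uint8_t>(c0),
            static_cast<std::uint8_t>(c1),
            static_cast<std::uint8_t>(c2)};
}

// Inverse: rebuilds the integer from a pixel read back from the framebuffer.
std::uint32_t decodeOffset(const Rgb8& pixel) {
    return static_cast<std::uint32_t>(pixel.r)
         | (static_cast<std::uint32_t>(pixel.g) << 8)
         | (static_cast<std::uint32_t>(pixel.b) << 16);
}
```

Red carries the lowest byte and blue the highest, matching c0, c1 and c2 in the shader.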
The GLSL Shader source code is compiled and linked by the graphics driver and executed on the GPU.
If you want to debug the shader, then you have to use graphics debugger like RenderDoc or NVIDIA Nsight.
I found a very nice github library (https://github.com/msqrt/shader-printf)
You can use the printf function in a shader file.
Use this:
vec3 dd(vec3 finalColor,vec3 valueToDebug){
//debugging
finalColor.x = (v_uv.y < 0.3 && v_uv.x < 0.3) ? valueToDebug.x : finalColor.x;
finalColor.y = (v_uv.y < 0.3 && v_uv.x < 0.3) ? valueToDebug.y : finalColor.y;
finalColor.z = (v_uv.y < 0.3 && v_uv.x < 0.3) ? valueToDebug.z : finalColor.z;
return finalColor;
}
//on the main function, second argument is the value to debug
colour = dd(colour,vec3(0.0,1.0,1.));
gl_FragColor = vec4(clamp(colour * 20., 0., 1.),1.0);
Do offline rendering to a texture and evaluate the texture's data.
You can find related code by googling for "render to texture" opengl
Then use glReadPixels to read the output into an array and perform assertions on it (since looking through such a huge array in the debugger is usually not really useful).
You might also want to render to a floating point texture so that output values outside the range 0 to 1 are not clamped; unclamped output is only supported for floating point textures.
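The assertion step could look roughly like the following C++ sketch. It assumes the pixels were already read back as tightly packed RGBA floats, for instance with glReadPixels(0, 0, width, height, GL_RGBA, GL_FLOAT, pixels.data()); the helper firstMismatch is hypothetical and only does the comparison.

```cpp
#include <array>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Returns the index of the first RGBA pixel that differs from `expected`
// by more than `tolerance` in any channel, or -1 if every pixel matches.
// `pixels` is assumed to hold tightly packed RGBA floats from a readback.
long firstMismatch(const std::vector<float>& pixels,
                   const std::array<float, 4>& expected,
                   float tolerance) {
    assert(pixels.size() % 4 == 0);
    for (std::size_t p = 0; p < pixels.size() / 4; ++p) {
        for (std::size_t c = 0; c < 4; ++c) {
            if (std::fabs(pixels[4 * p + c] - expected[c]) > tolerance) {
                return static_cast<long>(p);
            }
        }
    }
    return -1;
}
```

Returning the index of the offending pixel rather than a plain bool makes it easier to find the spot in the image where the shader went wrong.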
I personally was bothered by the problem of properly debugging shaders for a while. There does not seem to be a good way - If anyone finds a good (and not outdated/deprecated) debugger, please let me know.