GLSL for loop acting weird - opengl

I tried to implement texture splatting in GLSL, but the for loop is acting weird and gives different results for code that should do exactly the same thing.
Code 1:
for(int i = 0; i < 5; ++i) {
if(i == 1) {
float fade = texture2D(alphaTextures[i], texCoord.st).r;
vec4 texCol = texture2D(textures[i], texCoord.ba);
texColor = mix(texColor, texCol, fade);
}
}
Code 2:
for(int i = 0; i < 6; ++i) {
if(i == 1) {
float fade = texture2D(alphaTextures[i], texCoord.st).r;
vec4 texCol = texture2D(textures[i], texCoord.ba);
texColor = mix(texColor, texCol, fade);
}
}
The if statement is only there for testing, so both versions should give the same result. The only difference is the loop bound. I really have no idea why only Code 1 gives the correct result. Here are two pictures:
Code1
Code2
The result should be like in picture 1.

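According to this answer, you can't iterate over a sampler array: indexing with a loop variable, as in alphaTextures[i], is invalid, and only a constant index such as alphaTextures[1] is allowed.
This changes in GLSL 4.00+ (OpenGL 4.0+), where the index may be a variable as long as it is dynamically uniform, i.e. it has the same value for every invocation in the draw call. It still cannot come from a per-vertex or per-fragment shader input or from a value derived from one.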
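For shaders that have to run before GLSL 4.00, one way to follow this rule is to write every sampler index as a literal and unroll the loop by hand. The sketch below is only an illustration: the array size of 6, the varying declaration, the helper name splat, and the choice of textures[0] as the base layer are assumptions, not taken from the question.

```glsl
#version 120
// Sketch only: array size, varying name and base layer are assumptions.
uniform sampler2D textures[6];
uniform sampler2D alphaTextures[6];
varying vec4 texCoord;

// Blend one splat layer over `base`. The samplers are passed in as
// parameters, so the function body contains no array indexing at all.
vec4 splat(vec4 base, sampler2D colorTex, sampler2D alphaTex) {
    float fade = texture2D(alphaTex, texCoord.st).r;
    vec4 texCol = texture2D(colorTex, texCoord.ba);
    return mix(base, texCol, fade);
}

void main() {
    // Assumed base layer; the question does not show how texColor starts.
    vec4 texColor = texture2D(textures[0], texCoord.ba);

    // Every index below is an integral constant expression.
    texColor = splat(texColor, textures[1], alphaTextures[1]);
    texColor = splat(texColor, textures[2], alphaTextures[2]);
    texColor = splat(texColor, textures[3], alphaTextures[3]);
    texColor = splat(texColor, textures[4], alphaTextures[4]);
    texColor = splat(texColor, textures[5], alphaTextures[5]);

    gl_FragColor = texColor;
}
```

The unrolled form is more verbose than a loop, but it does not depend on how a particular driver treats non-constant sampler indices.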

One reason could be that graphics processors don't like branched texture fetches.
Try this instead:
for(int i = 0; i < 6; ++i) {
float fade = texture2D(alphaTextures[i], texCoord.st).r;
vec4 texCol = texture2D(textures[i], texCoord.ba);
if(i == 1) {
texColor = mix(texColor, texCol, fade);
}
}
(Disclaimer) I am only guessing, and this error is really weird.

Related

How to iterate through an array of sampler2DShadow in GLSL 1.40 (OpenGL 3.1)?

I want to iterate through an array of sampler2DShadow for a shadow map computation (multiple lights), and it looks like the sampler can be accessed only for a constant index. I only show the relevant code for clarity:
smooth in vec4 fShadowTexCoord[5];
uniform sampler2DShadow shadowMapTexSampler[5];
void main() {
for (uint i = 0u; i < 5u; i++) { // light 'i'
float shadowCoeff = 0.0f;
// this line does not work properly
vec2 scale = 2.0f / textureSize(shadowMapTexSampler[i], 0);
float bias = 0.006f;
for (int j = -1; j <= 1; j++)
for (int k = -1; k <= 1; k++)
shadowCoeff += texture(shadowMapTexSampler[i], vec3(fShadowTexCoord[i].xy + vec2(j, k) * scale, fShadowTexCoord[i].z - bias));
shadowCoeff /= 9.0f;
// ...........
}
}
The function 'texture' works as expected, but 'textureSize' behaves as if no texture is bound.
If I replace 'i' with a constant, it works fine, but I haven't found a way to cast a 'uint' to a 'const uint' in GLSL.
Sorry for the necro post, but this seems to be a limitation of GLSL prior to 4.00 or so?
I do something ugly like this:
#define PCF_LOOP(idx) \
else if (faceIdx == idx) \
{ \
for (int i=0; i<PCF_SAMPLES; i++) \
{ \
...
} \
}
if (false);
PCF_LOOP(0)
PCF_LOOP(1)
PCF_LOOP(2)
PCF_LOOP(3)
PCF_LOOP(4)
PCF_LOOP(5)
It's ugly and (probably) slow, but it works even on ancient hardware.
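The macro above elides its body. The same idea, mapping a runtime index onto constant array indices through an if chain, might look roughly like this when applied to the shadow-map question. This is a sketch under assumptions: the output fragColor and the helper names pcf3x3 and shadowForLight are hypothetical, the final averaging is only there to produce some output, and the 3x3 kernel and bias are copied from the question.

```glsl
#version 140
// Sketch only: fragColor, pcf3x3 and shadowForLight are hypothetical names.
uniform sampler2DShadow shadowMapTexSampler[5];
smooth in vec4 fShadowTexCoord[5];
out vec4 fragColor;

// 3x3 PCF on a single shadow map. The sampler arrives as a parameter,
// so neither texture() nor textureSize() ever sees a variable index.
float pcf3x3(sampler2DShadow shadowMap, vec4 coord) {
    vec2 scale = 2.0 / vec2(textureSize(shadowMap, 0));
    const float bias = 0.006;
    float sum = 0.0;
    for (int j = -1; j <= 1; j++)
        for (int k = -1; k <= 1; k++)
            sum += texture(shadowMap,
                           vec3(coord.xy + vec2(j, k) * scale, coord.z - bias));
    return sum / 9.0;
}

// Map a runtime light index onto constant array indices.
// Any index other than 0..3 falls through to the last light.
float shadowForLight(int light) {
    if (light == 0) return pcf3x3(shadowMapTexSampler[0], fShadowTexCoord[0]);
    if (light == 1) return pcf3x3(shadowMapTexSampler[1], fShadowTexCoord[1]);
    if (light == 2) return pcf3x3(shadowMapTexSampler[2], fShadowTexCoord[2]);
    if (light == 3) return pcf3x3(shadowMapTexSampler[3], fShadowTexCoord[3]);
    return pcf3x3(shadowMapTexSampler[4], fShadowTexCoord[4]);
}

void main() {
    float lit = 0.0;
    for (int i = 0; i < 5; i++)   // light 'i'
        lit += shadowForLight(i);
    fragColor = vec4(vec3(lit / 5.0), 1.0);
}
```

Because the branch depends only on the loop counter, every fragment takes the same path, so the texture lookups stay in uniform control flow.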

glsl texture access synchronization, openCL vs glsl image processing

This might be a trivial question.
I am curious about how GLSL synchronizes access to texture data in a fragment shader.
Say I have code like the following in a fragment shader.
void main() {
vec3 texCoord = in_texCoord;
vec4 out_voxel_intensity = texture(image, vec3(texCoord.x , texCoord.y, texCoord.z));
out_voxel = float(out_voxel_intensity) ;
if(out_voxel <= threshold)
{
out_voxel = 0.0;
return;
}
for(int i = -int(kernalSize); i <= int(kernalSize);++i)
for(int j = -int(kernalSize); j <= int(kernalSize); ++j)
for(int k = -int(kernalSize); k <= int(kernalSize); ++k)
{
float x_o = texCoord.x + i / (imageSize.x);
float y_o = texCoord.y + j / (imageSize.y);
float z_o = texCoord.z + k / (imageSize.z);
if(x_o < 0.0 || x_o > 1.0
|| y_o < 0. || y_o > 1.0
|| z_o < 0. || z_o > 1.0)
continue;
if(float(texture(image, vec3(x_o, y_o, z_o))) <= threshold)
{
out_voxel = 0.0;
return;
}
}
}
As the code above accesses not only the current texture coordinate but also the values around it within the specified kernel size, how does GLSL make sure that no other parallel invocation accesses the same texture coordinates at the same time?
Related to that, does the code above perform efficiently in a fragment shader, given that it accesses neighboring texture data, or would OpenCL be better?
Thanks

Very strange behaviour with sampler handling using OpenGL and GLSL

I have implemented cubemap shadow mapping successfully with just one point light.
To render this scene, the first render pass uses a geometry shader to dispatch the 6 frustums. In the second render pass I use samplerCubeShadow in the fragment shader to compute the shadow factor.
I have OpenGL/GLSL version 4.40 with NVIDIA GeForce GTX 780M.
Here's a screenshot:
But now I want to implement multiple cubemap shadow mapping to render shadows from several point lights.
Here's a piece of code from my fragment shader:
[...]
/*
** Shadow Cube sampler array.
*/
uniform samplerCubeShadow ShadowCubeSampler[5]; //Max point light by scene = 5
[...]
float ConvertDistToClipSpace(vec3 lightDir_ws)
{
vec3 AbsVec = abs(lightDir_ws);
float LocalZcomp = max(AbsVec.x, max(AbsVec.y, AbsVec.z));
float NormZComp = (NearFar.y + NearFar.x)/(NearFar.y - NearFar.x)
- (2.0f * NearFar.y * NearFar.x)/(LocalZcomp * NearFar.y - NearFar.x);
return ((NormZComp + 1) * 0.5f);
}
float GetCubeShadowFactor(vec3 vertexPosition_ws, float shadowFactor, int idx)
{
vec3 lightToVertexDir_ws = vertexPosition_ws - LightPos_ws.xyz;
float LightToVertexClipDist = ConvertDistToClipSpace(lightToVertexDir_ws);
float LightToOccluderClipDist = texture(
ShadowCubeSampler[idx], vec4(lightToVertexDir_ws, LightToVertexClipDist));
if (LightToOccluderClipDist < LightToVertexClipDist)
{
shadowFactor = 0.0f;
}
return (shadowFactor);
}
void main(void)
{
[...]
for (int idx = 0; idx < 1; idx++) //Test first with 1 point light
{
float ShadowFactor = GetCubeShadowFactor(Position_ws.xyz, ShadowFactor, idx);
}
[...]
}
The problem is that I get error 1282 (INVALID_OPERATION). To summarize: I want to display exactly the same scene as in the picture above with a SINGLE point light, but this time using an array of samplerCubeShadow. What is surprising is that if I replace the first argument of 'texture', 'ShadowCubeSampler[idx]', with 'ShadowCubeSampler[0]', it works, even though the value of 'idx' is always 0. I tried the following code without success:
int toto = 0;
float LightToOccluderClipDist = texture(ShadowCubeSampler[toto], vec4(lightToVertexDir_ws, LightToVertexClipDist));
I still get error 1282, even though the index has the same type (int)!
I have already used arrays of 'sampler2DShadow' and 'sampler2D' without problems.
So why does it fail with 'samplerCubeShadow', and why does 'ShadowCubeSampler[0]' work when the other forms do not?
PS: If I declare an array of 2 and use 2 cubemaps (2 point lights), it works. So it fails whenever I load fewer cubemaps than the array size declared in the fragment shader!
I have no compilation or link errors. Here's the code I use to check the shader program's state:
void video::IEffectBase::Log(void) const
{
GLint errorLink = 0;
glGetProgramiv(this->m_Handle, GL_LINK_STATUS, &errorLink);
if (errorLink != GL_TRUE)
{
GLint sizeError = 0;
glGetProgramiv(this->m_Handle, GL_INFO_LOG_LENGTH, &sizeError);
char *erreur = new char[sizeError + 1];
glGetProgramInfoLog(this->m_Handle, sizeError, &sizeError, erreur); // m_Handle is a program object, so read the program log
erreur[sizeError] = '\0';
std::cerr << erreur << std::endl;
glDeleteProgram(this->m_Handle);
delete[] erreur;
}
}
And about the texture unit limits:
std::cout << GL_MAX_VERTEX_TEXTURE_IMAGE_UNITS << std::endl;
std::cout << GL_MAX_TEXTURE_IMAGE_UNITS << std::endl;
$> 35660
$> 34930
If I use 'ShadowCubeSampler[0]', with '0' written directly in the code, I get the same display as in the picture at the beginning of my post, without any error. If I use 'ShadowCubeSampler[idx]' with idx = 0, I get the following display:
As you can see, none of the geometry that uses this shader has been rendered, yet I don't have any link error. How can you explain that? Is it possible that the system unlinks the shader program?
UPDATE
Let's suppose my array of samplerCubeShadow can contain at most 2 samplers (uniform samplerCubeShadow tex_shadow[2]).
Here is what I noticed when I load just one point light, and therefore one cubemap:
CASE 1
uniform samplerCubeShadow tex_shadow[1]; //MAX POINT LIGHT = 1
for (int i=0; i < 1; i++) {tex_shadow[i];} //OK
for (int i=0; i < 1; i++) {texture(tex_shadow[i], ...);} //OK
for (int i=0; i < 1; i++) {texture(tex_shadow[0], ...);} //OK
CASE 2
uniform samplerCubeShadow tex_shadow[2]; //MAX POINT LIGHT = 2
for (int i=0; i < 1; i++) {tex_shadow[i];} //NOT OK - 1282
for (int i=0; i < 1; i++) {texture(tex_shadow[i], ...);} //NOT OK - 1282
for (int i=0; i < 1; i++) {texture(tex_shadow[0], ...);} //OK
CASE 3
uniform samplerCubeShadow tex_shadow[2]; //MAX POINT LIGHT = 2
for (int i=0; i < 2; i++) {tex_shadow[i];} //OK
for (int i=0; i < 2; i++) {texture(tex_shadow[i], ...);} //OK
for (int i=0; i < 2; i++) {texture(tex_shadow[0], ...);} //OK
Conclusion: if the declared maximum number of samplers equals the number of samplers loaded, I can loop over the samplers in my array. If fewer are loaded, it does not work! I can use a maximum of 32 texture units per shader program. I have the same problem with the samplerCube type.
It's very strange, because I don't have any problem using sampler2D or sampler2DShadow for spot light shadow computation.
I checked with NSight, putting a breakpoint in the fragment shader, and of course the breakpoint is never reached. It's as if the shader program were not linked, but that's not the case.
Do you think it could be a problem with cubemap samplers in general, or does it come from the cubemap initialization?
Can anyone help me?
I have never used an array inside GLSL, and unfortunately I don't have the equipment to try it right now,
but have you tried using an unsigned int (uint) as the index in GLSL?
float GetCubeShadowFactor(vec3 vertexPosition_ws, float shadowFactor, uint idx) {
....
}
Also note that you cannot use an unlimited number of samplers in your shaders.
OpenGL has, depending on the targeted version, special restrictions on arrays of opaque types (textures are one of them). Before OpenGL 4 looping over such arrays is not possible. You can check the details here: OpenGL Wiki - Data Types
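As a rough illustration of what GLSL 4.00 and later allow, the sketch below indexes a sampler array with a loop counter whose bound comes from a uniform, so the index is the same for every invocation. The names NumPointLights, ShadowLookup and FragColor are hypothetical, and packing the lookup direction and reference depth into an interpolated vec4 is a simplification made only to keep the example short.

```glsl
#version 400 core
// Sketch only: NumPointLights, ShadowLookup and FragColor are hypothetical.
const int MAX_POINT_LIGHTS = 5;

uniform samplerCubeShadow ShadowCubeSampler[MAX_POINT_LIGHTS];
uniform int NumPointLights;  // number of lights actually in use

// Assumed input: xyz = light-to-fragment direction, w = reference depth.
in vec4 ShadowLookup[MAX_POINT_LIGHTS];
out vec4 FragColor;

void main() {
    // The bound depends only on a uniform, so `idx` takes the same
    // sequence of values in every invocation (dynamically uniform).
    int count = clamp(NumPointLights, 0, MAX_POINT_LIGHTS);

    float lit = 0.0;
    for (int idx = 0; idx < count; ++idx)
        lit += texture(ShadowCubeSampler[idx], ShadowLookup[idx]);

    // Average the per-light results; treat "no lights" as fully lit.
    float shade = (count > 0) ? lit / float(count) : 1.0;
    FragColor = vec4(vec3(shade), 1.0);
}
```

An index computed from a per-fragment value, such as a material ID read from a texture, would not satisfy this requirement.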

Compute shaders : error in the initialization of textures

I have an image2DArray in my compute shaders with 7 slices.
I can write in it with the function imageStore without problem and also display these textures.
My problem is with the initialization: I try to initialize my textures but I can't. I use a loop for the initialization:
for(int i=0; i<N; i++){
imageStore( outputTexture , ivec3(texel, i), vec4(0));
}
When N = 7, nothing is displayed, but when N < 7 everything works well and my textures are initialized.
Can someone explain why I can't initialize my image2DArray correctly?
Edit:
How I tested this: I write to all slices of the texture and display it. That works fine, but data from the previous frame remains if I don't initialize the texture. So I initialize all pixels of the slices to 0, but then nothing is displayed anymore if N = 7.
Some code :
#version 430 compatibility
layout(rgba8) coherent uniform image2DArray outputTexture;
...
void main(){
ivec2 texel = ivec2(gl_GlobalInvocationID.xy);
ivec2 outSize = imageSize( outputTexture ).xy;
if( texel.x >= outSize.x || texel.y >= outSize.y )
return;
initializeMeshSet( meshSet );
vec4 pWorld = texelFetch(gBuffer[0],texel,0);
pWorld /= pWorld.w;
vec4 nWorld = texelFetch(gBuffer[1],texel,0);
nWorld /= nWorld.w;
if( length(nWorld.xyz) < 0.1 ){
for(int i=0; i<4; i++){
imageStore( outputTexture , ivec3(texel, i), vec4(0));
}
return;
}
if(nbFrame == 0){
float value = treatment(texel, pWorld, nWorld.xyz, outSize.x);
imageStore( outputTexture, ivec3(texel, 0), vec4(vec3(value),1.0));
imageStore( outputTexture, ivec3(texel, 1), vec4(0.0,0.0,0.0, 1.0));
}
else if(nbFrame == 1){
float value = treatment2(texel, pWorld, nWorld.xyz, outSize.x);
vec3 previousValue = imageLoad(outputTexture, ivec3(texel, 1)).xyz * (nbFrame - 1);
value += previousValue;
value /= nbFrame;
imageStore( outputTexture, ivec3(texel, 1), vec4(vec3(value), 1.0));
}
}

Weird performance drop, caused by a single for loop

I'm currently writing an OpenGL 3.1 application (with GLSL version 330) on Linux (NVIDIA 360M card, with the 313.0 NVIDIA driver) that has about 15k lines. My problem is that in one of my vertex shaders I see drastic performance drops from minimal changes in the code that should actually be no-ops.
For example:
// With this solution my program runs with 3-5 fps
for(int i = 0; i < 4; ++i) {
vout.shadowCoord[i] = uShadowCP[i] * w_pos;
}
// But with this it runs with 30+ fps
vout.shadowCoord[0] = uShadowCP[0] * w_pos;
vout.shadowCoord[1] = uShadowCP[1] * w_pos;
vout.shadowCoord[2] = uShadowCP[2] * w_pos;
vout.shadowCoord[3] = uShadowCP[3] * w_pos;
// This works with 30+ fps too
vec4 shadowCoords[4];
for(int i = 0; i < 4; ++i) {
shadowCoords[i] = uShadowCP[i] * w_pos;
}
for(int i = 0; i < 4; ++i) {
vout.shadowCoord[i] = shadowCoords[i];
}
Or consider this:
uniform int uNumUsedShadowMaps = 4; // edit: I called this "random_uniform" in the original question
// 8 fps
for(int i = 0; i < min(uNumUsedShadowMaps, 4); ++i) {
vout.shadowCoord[i] = vec4(1.0);
}
// 30+ fps
for(int i = 0; i < 4; ++i) {
if(i < uNumUsedShadowMaps) {
vout.shadowCoord[i] = vec4(1.0);
} else {
vout.shadowCoord[i] = vec4(0.0);
}
}
See the entire shader code here, where this problem appeared:
http://pastebin.com/LK5CNJPD
Any idea about what could cause this would be appreciated.
I finally managed to find the source of the problem, and also found a solution to it.
But before jumping to the solution, let me paste the most minimal shader code with which I could reproduce this 'bug'.
Vertex Shader:
#version 330
vec3 CountPosition(); // Irrelevant how it is implemented.
uniform mat4 uProjectionMatrix, uCameraMatrix;
uniform mat4 uShadowCP[4]; // used below; declaration was missing from the pasted snippet
out VertexData {
vec3 c_pos, w_pos;
vec4 shadowCoord[4];
} vout;
void main() {
vout.w_pos = CountPosition();
vout.c_pos = (uCameraMatrix * vec4(vout.w_pos, 1.0)).xyz;
vec4 w_pos = vec4(vout.w_pos, 1.0);
// 20 fps
for(int i = 0; i < 4; ++i) {
vout.shadowCoord[i] = uShadowCP[i] * w_pos;
}
// 50 fps
vout.shadowCoord[0] = uShadowCP[0] * w_pos;
vout.shadowCoord[1] = uShadowCP[1] * w_pos;
vout.shadowCoord[2] = uShadowCP[2] * w_pos;
vout.shadowCoord[3] = uShadowCP[3] * w_pos;
gl_Position = uProjectionMatrix * vec4(vout.c_pos, 1.0);
}
Fragment Shader:
#version 330
in VertexData {
vec3 c_pos, w_pos;
vec4 shadowCoord[4];
} vin;
out vec4 frag_color;
void main() {
frag_color = vec4(1.0);
}
The funny thing is that only a minimal modification of the vertex shader is needed to make both versions run at 50 fps. The main function should be changed to this:
void main() {
vec4 w_pos = vec4(CountPosition(), 1.0);
vec4 c_pos = uCameraMatrix * w_pos;
vout.w_pos = vec3(w_pos);
vout.c_pos = vec3(c_pos);
// 50 fps
for(int i = 0; i < 4; ++i) {
vout.shadowCoord[i] = uShadowCP[i] * w_pos;
}
// 50 fps
vout.shadowCoord[0] = uShadowCP[0] * w_pos;
vout.shadowCoord[1] = uShadowCP[1] * w_pos;
vout.shadowCoord[2] = uShadowCP[2] * w_pos;
vout.shadowCoord[3] = uShadowCP[3] * w_pos;
gl_Position = uProjectionMatrix * c_pos;
}
The difference is that the upper code reads from the shader's out varyings, while the lower one keeps those values in temporary variables and only writes to the out varyings.
The conclusion:
Reading a shader's out varying is often used as an optimisation to save one temporary variable, or at least I have seen it in many places on the internet. Despite that, reading an out varying might actually be an invalid OpenGL operation, and might put the GL into an undefined state in which random changes in the code can trigger bad things.
The best thing about this is that the GLSL 330 specification doesn't say anything about reading from an out varying that was previously written to. Probably because it's not something I should be doing.
P.S.
Also note that the second example in the original question might look totally different, but it behaves exactly the same way in this small snippet: if the out varyings are read, the loop with i < min(uNumUsedShadowMaps, 4) as its condition gets quite slow; if the out varyings are only written, the condition makes no difference to performance, and the i < min(uNumUsedShadowMaps, 4) version runs at 50 fps too.