How to send const data to shaders? - opengl

I want to send a const int variable from the CPU side to a shader so that I can conveniently use it to declare an array in the shader.
But if I send it with the usual glUniform1ui(programm, N), the shader compiler says that N must be const.
#version 450 core
uniform const int N;
int myArray[N];
void main() {
//...
}
Is this possible? If so, what are the workarounds?
P.S. I know that this is unrelated to the definition of uniform variables, which says that uniforms are immutable during a shader's execution.

A constant is, as the name says, constant. It cannot be changed from outside via one of the glUniform functions, and it is NOT a uniform.
If you want to change the constant's value, you have to change the shader's text and then recompile the whole shader.
Sample (pseudocode):
const GLchar* shaderSources[3];
shaderSources[0] = "#version 450 core\n";
shaderSources[1] = "#define ARRAY_LENGTH 5\n";
shaderSources[2] = myShaderCode;
// Pass all three strings; a NULL length array means each string is null-terminated.
glShaderSource(shader, 3, shaderSources, NULL);
Shader:
int myArray[ARRAY_LENGTH];
void main()
{
for (int i = 0; i < ARRAY_LENGTH; i++) myArray[i] ....;
}
A different approach is to use a texture (or an SSBO) instead of an array and to fetch values from the texture without interpolation between them.
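The recompile approach can be wrapped in a small helper on the CPU side. What follows is a minimal sketch in plain C++ string handling, with no OpenGL calls; withArrayLength is a hypothetical name, and the sketch assumes the shader body carries no #version line of its own. The returned string would be passed to glShaderSource as a single source string and compiled as usual.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper: builds complete shader source with the array length
// baked in as a preprocessor constant. Changing the length means calling this
// again and recompiling the shader.
std::string withArrayLength(const std::string& shaderBody, int arrayLength) {
    assert(arrayLength > 0);  // GLSL arrays must have a positive size
    std::string source = "#version 450 core\n";  // must stay the first line
    source += "#define ARRAY_LENGTH " + std::to_string(arrayLength) + "\n";
    source += shaderBody;
    return source;
}
```

Since every distinct length produces a distinct shader, it may be worth caching compiled shaders per length if the value changes often.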

Related

vkCreateComputePipelines takes too long

I encountered a strange problem when compiling a Vulkan compute shader.
I have this shader (which is not even all that complex):
#version 450
#extension GL_GOOGLE_include_directive : enable
//#extension GL_EXT_debug_printf : enable
#extension GL_KHR_shader_subgroup_basic : enable
#extension GL_KHR_shader_subgroup_arithmetic : enable
#define IS_AVAILABLE_BUFFER_ANN_ENTITIES
#define IS_AVAILABLE_BUFFER_GLOBAL_MUTABLES
#define IS_AVAILABLE_BUFFER_BONES
#define IS_AVAILABLE_BUFFER_WORLD
//#define IS_AVAILABLE_BUFFER_COLLISION_GRID
#include "descriptors_compute.comp"
layout (local_size_x_id = GROUP_SIZE_CONST_ID) in;
#include "utils.comp"
shared float[ANN_MAX_SIZE] tmp1;
shared float[ANN_MAX_SIZE] tmp2;
shared uint[ANN_TOUCHED_BLOCK_COUNT] touched_block_ids;
mat3 rotation_mat_from_yaw_and_pitch(vec2 yaw_and_pitch){
const vec2 Ss = sin(yaw_and_pitch); // let S denote sin(yaw) and s denote sin(pitch)
const vec2 Cc = cos(yaw_and_pitch); // let C denote cos(yaw) and c denote cos(pitch)
const vec4 Cs_cC_Sc_sS = vec4(Cc,Ss) * vec4(Ss.y,Cc,Ss.x);
return mat3(Cs_cC_Sc_sS.y,-Ss.y,-Cs_cC_Sc_sS.z,Cs_cC_Sc_sS.x,Cc.y,-Cs_cC_Sc_sS.w,Ss.x,0,Cc.x);
}
void main() {
const uint entity_id = gl_WorkGroupID.x;
const uint lID = gl_LocalInvocationID.x;
const uint entities_count = global_mutables.ann_entities;
if (entity_id < entities_count){
const AnnEntity entity = ann_entities[entity_id];
const Bone bone = bones[entity.bone_idx];
const mat3 rotation = rotation_mat_from_yaw_and_pitch(bone.yaw_and_pitch);
const uint BLOCK_TOUCH_SENSE_OFFSET = 0;
const uint LIDAR_LENGTH_SENSE_OFFSET = BLOCK_EXTENDED_SENSORY_FEATURES_LEN*ANN_TOUCHED_BLOCK_COUNT;
for(uint i=lID;i<ANN_LIDAR_COUNT;i+=GROUP_SIZE){
const vec3 rotated_lidar_direction = rotation * entity.lidars[i].direction;
const RayCastResult ray = ray_cast(bone.new_center, rotated_lidar_direction);
tmp1[LIDAR_LENGTH_SENSE_OFFSET+i] = ray.ratio_of_traversed_length;
}
for(uint i = lID;i<ANN_OUTPUT_SIZE;i+=GROUP_SIZE){
const AnnSparseOutputNeuron neuron = entity.ann_output[i];
float sum = neuron.bias;
for(uint j=0;j<neuron.incoming.length();j++){
sum += tmp1[neuron.incoming[j].src_neuron] * neuron.incoming[j].weight;
}
tmp2[i] = max(0,sum);//ReLU activation
}
vec2 rotation_change = vec2(0,0);
for(uint i = lID;i<ANN_OUTPUT_ROTATION_MUSCLES_SIZE;i+=GROUP_SIZE){
rotation_change += tmp2[ANN_OUTPUT_ROTATION_MUSCLES_OFFSET+i] * ANN_IMPULSES_OF_ROTATION_MUSCLES[i];
}
rotation_change = subgroupAdd(rotation_change);
if(lID==0){
bones[entity.bone_idx].yaw_and_pitch += rotation_change;
}
}
}
The function ray_cast is probably the most complex part of this shader, but I reuse this exact same function in many other shaders that compile instantly. I wondered whether GL_KHR_shader_subgroup_arithmetic might be slowing down vkCreateComputePipelines, but removing it makes no difference. vkCreateComputePipelines takes over a minute to finish. I also include a bunch of utility functions, but I only use ray_cast and a few constants from them, so 90% of that code is unused and should be removed by glslc. Could it be that Vulkan is quietly performing some other kind of optimisation that causes the delay? I thought that all optimisations were done by glslc and that not much post-processing was done on SPIR-V. I use Nvidia with their proprietary drivers, by the way.
It really puzzles me why this shader is so slow to create, even though I have other shaders that are ten times longer and more complex and yet they load instantly.
Is there any way to profile this?
Upon closer inspection I noticed that normally all the generated SPIR-V files for my shaders take about 10-30KB. However, this one shader takes 178KB.
With the help of spirv-dis I looked inside the generated assembly and noticed that the vast majority of the opcodes were OpConstant. This was because I had structs that looked like
struct AnnSparseOutputNeuron{
AnnSparseConnection[ANN_LATENT_CONNECTIONS_PER_OUTPUT_NEURON] incoming;
float bias;
};
They contain large arrays. As a result both
const AnnEntity entity = ann_entities[entity_id];
and
const AnnSparseOutputNeuron neuron = entity.ann_output[i];
would be compiled to lots of opcodes that write those constant values for every single element of the array. So instead of writing code of the form
const A a = buffer_of_As[i];
f(a.some_field);
it's better to use
f(buffer_of_As[i].some_field);
This seems to have solved the problem. I thought that glslc would be smart enough to perform such optimizations, but apparently it is not.
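To spot this kind of bloat quickly, the disassembly can be summarized per opcode. The sketch below is a hypothetical helper (countOpcodes is not part of any SPIR-V tool) that takes the text printed by spirv-dis and counts how often each opcode occurs. It assumes the usual disassembly layout, where an instruction is written either as OpName ... or as %id = OpName ..., and comments start with a semicolon.

```cpp
#include <cassert>
#include <map>
#include <sstream>
#include <string>

// Counts opcodes in disassembly text. The first whitespace-separated token
// that starts with "Op" on a line is taken to be that line's opcode.
std::map<std::string, int> countOpcodes(const std::string& disassembly) {
    std::map<std::string, int> counts;
    std::istringstream lines(disassembly);
    std::string line;
    while (std::getline(lines, line)) {
        std::istringstream tokens(line);
        std::string token;
        while (tokens >> token) {
            if (token[0] == ';') break;            // rest of the line is a comment
            if (token.compare(0, 2, "Op") == 0) {  // found the opcode
                ++counts[token];
                break;
            }
        }
    }
    return counts;
}
```

A shader whose histogram is dominated by OpConstant, as described above, is a candidate for the rewrite that indexes the buffer directly.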

Can't loop over a sampler2D array after specifying WebGL2 context in Three.js

I have been using a sampler2D array in my fragment shader (those are shadow maps; there can be up to 16 of them, so an array is preferable to 16 separate variables, of course). Then I passed a WebGL2 context (const context = canvas.getContext('webgl2');) to the THREE.WebGLRenderer that I'm using, and now I can't get the program to work. It says "array index for samplers must be constant integral expressions" when I access the sampler array elements in a loop like this:
uniform sampler2D samplers[MAX_SPLITS];
for (int i = 0; i < MAX_SPLITS; ++i) {
if (i >= splitCount) {
break;
}
if (func(samplers[i])) { // that's where the error happens
...
}
}
Is there really no way around this? Do I have to use sixteen separate variables?
(there is no direct #version directive in the shader but THREE.js seems to add a #version 300 es by default)
You can't use dynamic indexing with samplers in GLSL ES 3.0. From the spec:
12.29 Samplers
Should samplers be allowed as l-values? The specification already allows an equivalent behavior:
Current specification:
uniform sampler2D sampler[8];
int index = f(...);
vec4 tex = texture(sampler[index], xy); // allowed
Using assignment of sampler types:
uniform sampler2D s;
s = g(...);
vec4 tex = texture(s, xy); // not allowed
RESOLUTION: Dynamic indexing of sampler arrays is now prohibited by the specification. Restrict indexing of sampler arrays to constant integral expressions.
and
12.30 Dynamic Indexing
For GLSL ES 1.00, support of dynamic indexing of arrays, vectors and matrices was not mandated
because it was not directly supported by some implementations. Software solutions (via program
transforms) exist for a subset of cases but lead to poor performance. Should support for dynamic indexing
be mandated for GLSL ES 3.00?
RESOLUTION: Mandate support for dynamic indexing of arrays except for sampler arrays, fragment
output arrays and uniform block arrays.
Should support for dynamic indexing of vectors and matrices be mandated in GLSL ES 3.00?
RESOLUTION: Yes.
Indexing of arrays of samplers by constant-index-expressions is supported
in GLSL ES 1.00. A constant index-expression is an expression formed from
constant-expressions and certain loop indices, defined for
a subset of loop constructs. Should this functionality be included in GLSL ES 3.00?
RESOLUTION: No. Arrays of samplers may only be indexed by constant-integral-expressions.
Can you use a 2D_ARRAY texture to solve your issue? Put each of your current 2D textures into a layer of a 2D_ARRAY texture; the z coordinate is then just an integer layer index. An advantage is that you can use many more layers in a 2D_ARRAY than you get samplers. WebGL2 implementations generally only have 32 samplers but allow hundreds or thousands of layers in a 2D_ARRAY texture.
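On the upload side, a 2D_ARRAY texture takes all layers as one contiguous, layer-major block of pixels. The following is a minimal C++ sketch of that packing step only; packLayers is a hypothetical helper, and it assumes every image is tightly packed RGBA8 with the same width and height (array texture layers all share one size).

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Packs equally sized RGBA8 images into one layer-major buffer, the layout a
// single 3D upload to a 2D array texture expects. Layer k starts at byte
// offset k * width * height * 4.
std::vector<std::uint8_t> packLayers(
        const std::vector<std::vector<std::uint8_t>>& images,
        std::size_t width, std::size_t height) {
    const std::size_t layerBytes = width * height * 4;
    std::vector<std::uint8_t> packed;
    packed.reserve(layerBytes * images.size());
    for (const auto& image : images) {
        assert(image.size() == layerBytes);  // every layer must have the same size
        packed.insert(packed.end(), image.begin(), image.end());
    }
    return packed;
}
```

In the shader the maps would then be a single sampler2DArray, sampled with a vec3 whose third component selects the layer, so the loop index no longer indexes an array of samplers.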
Alternatively, use GLSL ES 1.0, which allows sampler arrays to be indexed by loop indices:
const vs1 = `
void main() { gl_Position = vec4(0); }
`;
const vs3 = `#version 300 es
void main() { gl_Position = vec4(0); }
`;
const fs1 = `
precision highp float;
#define MAX_SPLITS 4
uniform sampler2D samplers[MAX_SPLITS];
uniform int splitCount;
bool func(sampler2D s) {
return texture2D(s, vec2(0)).r > 0.5;
}
void main() {
float v = 0.0;
for (int i = 0; i < MAX_SPLITS; ++i) {
if (i >= splitCount) {
break;
}
if (func(samplers[i])) { // that's where the error happens
v += 1.0;
}
}
gl_FragColor = vec4(v);
}
`;
const fs3 = `#version 300 es
precision highp float;
#define MAX_SPLITS 4
uniform sampler2D samplers[MAX_SPLITS];
uniform int splitCount;
bool func(sampler2D s) {
return texture(s, vec2(0)).r > 0.5;
}
out vec4 color;
void main() {
float v = 0.0;
for (int i = 0; i < MAX_SPLITS; ++i) {
if (i >= splitCount) {
break;
}
if (func(samplers[i])) { // that's where the error happens
v += 1.0;
}
}
color = vec4(v);
}
`;
function main() {
const gl = document.createElement('canvas').getContext('webgl2');
if (!gl) {
return alert('need WebGL2');
}
test('glsl 1.0', vs1, fs1);
test('glsl 3.0', vs3, fs3);
function test(msg, vs, fs) {
const p = twgl.createProgram(gl, [vs, fs]);
log(msg, ':', p ? 'success' : 'fail');
}
}
main();
function log(...args) {
const elem = document.createElement('pre');
elem.textContent = [...args].join(' ');
document.body.appendChild(elem);
}
<script src="https://twgljs.org/dist/4.x/twgl.min.js"></script>

Strange behaviour using in/out block data with OpenGL/GLSL

I have implemented a normal mapping shader in my OpenGL/GLSL application. To compute the bump and shadow factors in the fragment shader, I need to send some data from the vertex shader, such as the light direction in tangent space and the vertex position in light space, for each light in my scene. To do this I need to declare 2 output variables as below (vertex shader):
#define MAX_LIGHT_COUNT 5
[...]
out vec4 ShadowCoords[MAX_LIGHT_COUNT]; //Vertex position in light space
out vec3 lightDir_TS[MAX_LIGHT_COUNT]; //light direction in tangent space
uniform int LightCount;
[...]
for (int idx = 0; idx < LightCount; idx++)
{
[...]
lightDir_TS[idx] = TBN * lightDir_CS;
ShadowCoords[idx] = ShadowInfos[idx].ShadowMatrix * VertexPosition;
[...]
}
And in the fragment shader I read these variables through the following input declarations:
in vec3 lightDir_TS[MAX_LIGHT_COUNT];
in vec4 ShadowCoords[MAX_LIGHT_COUNT];
The rest of the code is not important for explaining my problem.
Here's the result as an image:
As you can see, so far everything is OK!
But now, for the sake of simplicity, I want to use a single output declaration rather than 2! So the logical choice is to use an input/output data block like below:
#define MAX_LIGHT_COUNT 5
[...]
out LightData_VS
{
vec3 lightDir_TS;
vec4 ShadowCoords;
} LightData_OUT[MAX_LIGHT_COUNT];
uniform int LightCount;
[...]
for (int idx = 0; idx < LightCount; idx++)
{
[...]
LightData_OUT[idx].lightDir_TS = TBN * lightDir_CS;
LightData_OUT[idx].ShadowCoords = ShadowInfos[idx].ShadowMatrix * VertexPosition;
[...]
}
And in the fragment shader the input data block:
in LightData_VS
{
vec3 lightDir_TS;
vec4 ShadowCoords;
} LightData_IN[MAX_LIGHT_COUNT];
But this time, when I run my program, I get the following display:
As you can see, the specular light is not the same as in the first case above!
However I noticed if I replace the line:
for (int idx = 0; idx < LightCount; idx++) //Use 'LightCount' uniform variable
by the following one:
for (int idx = 0; idx < 1; idx++) //'1' value hard coded
or
int count = 1;
for (int idx = 0; idx < count; idx++)
the shading result is correct!
The problem seems to come from the fact that I use a uniform variable in the 'for' condition. However, this works when I use separate output variables as in the first case!
I checked: the uniform variable 'LightCount' is correct and equal to '1'. (I tried the unsigned int data type without success, and it's the same with a 'while' loop.)
How can such a result be explained?
I use:
OpenGL: 4.4.0 NVIDIA driver 344.75
GLSL: 4.40 NVIDIA via Cg compiler
I have already used input/output data blocks without problems, but they were not arrays, just simple blocks like below:
[in/out] VertexData_VS
{
vec3 viewDir_TS;
vec4 Position_CS;
vec3 Normal_CS;
vec2 TexCoords;
} VertexData_[IN/OUT];
Do you think it's not possible to use arrays of input/output data blocks in a loop with a uniform variable in the 'for' condition?
UPDATE
I tried using 2 vec4s in the data structure (for the sake of data alignment, as with uniform blocks, where data needs to be aligned on a vec4) like below:
[in/out] LightData_VS
{
vec4 lightDir_TS; //vec4((TBN * lightDir_CS), 0.0f);
vec4 ShadowCoords;
} LightData_[IN/OUT][MAX_LIGHT_COUNT];
without success...
UPDATE 2
Here's the code that prints the shader compilation log:
core::FileSystem file(filename);
std::ifstream ifs(file.GetFullName());
if (ifs)
{
GLint compilationError = 0;
std::string fileContent, line;
char const *sourceCode;
while (std::getline(ifs, line, '\n'))
fileContent.append(line + '\n');
sourceCode = fileContent.c_str();
ifs.close();
this->m_Handle = glCreateShader(this->m_Type);
glShaderSource(this->m_Handle, 1, &sourceCode, 0);
glCompileShader(this->m_Handle);
glGetShaderiv(this->m_Handle, GL_COMPILE_STATUS, &compilationError);
if (compilationError != GL_TRUE)
{
GLint errorSize = 0;
glGetShaderiv(this->m_Handle, GL_INFO_LOG_LENGTH, &errorSize);
char *errorStr = new char[errorSize + 1];
glGetShaderInfoLog(this->m_Handle, errorSize, &errorSize, errorStr);
errorStr[errorSize] = '\0';
std::cout << errorStr << std::endl;
delete[] errorStr;
glDeleteShader(this->m_Handle);
}
}
And the code concerning the program log:
GLint errorLink = 0;
glGetProgramiv(this->m_Handle, GL_LINK_STATUS, &errorLink);
if (errorLink != GL_TRUE)
{
GLint sizeError = 0;
glGetProgramiv(this->m_Handle, GL_INFO_LOG_LENGTH, &sizeError);
char *error = new char[sizeError + 1];
glGetProgramInfoLog(this->m_Handle, sizeError, &sizeError, error);
error[sizeError] = '\0';
std::cerr << error << std::endl;
glDeleteProgram(this->m_Handle);
delete[] error;
}
Unfortunately, I don't get any error log!

Very strange behaviour with sampler handling using OpenGL and GLSL

I have implemented cubemap shadow mapping successfully with just one point light.
To render this scene, I use geometry shaders in the first render pass to dispatch the 6 frustums. In the second render pass I use samplerCubeShadow in the fragment shader to compute the shadow factor.
I have OpenGL/GLSL version 4.40 with an NVIDIA GeForce GTX 780M.
Here's a screenshot:
But now I want to implement multiple cubemap shadow mapping to render shadows using several point lights.
Here's a piece of code from my fragment shader:
[...]
/*
** Shadow Cube sampler array.
*/
uniform samplerCubeShadow ShadowCubeSampler[5]; //Max point light by scene = 5
[...]
float ConvertDistToClipSpace(vec3 lightDir_ws)
{
vec3 AbsVec = abs(lightDir_ws);
float LocalZcomp = max(AbsVec.x, max(AbsVec.y, AbsVec.z));
float NormZComp = (NearFar.y + NearFar.x)/(NearFar.y - NearFar.x)
- (2.0f * NearFar.y * NearFar.x)/(LocalZcomp * NearFar.y - NearFar.x);
return ((NormZComp + 1) * 0.5f);
}
float GetCubeShadowFactor(vec3 vertexPosition_ws, float shadowFactor, int idx)
{
vec3 lightToVertexDir_ws = vertexPosition_ws - LightPos_ws.xyz;
float LightToVertexClipDist = ConvertDistToClipSpace(lightToVertexDir_ws);
float LightToOccluderClipDist = texture(
ShadowCubeSampler[idx], vec4(lightToVertexDir_ws, LightToVertexClipDist));
if (LightToOccluderClipDist < LightToVertexClipDist)
{
shadowFactor = 0.0f;
}
return (shadowFactor);
}
void main(void)
{
[...]
for (int idx = 0; idx < 1; idx++) //Test first with 1 point light
{
float ShadowFactor = GetCubeShadowFactor(Position_ws.xyz, ShadowFactor, idx);
}
[...]
}
The problem is that I get error 1282 (INVALID_OPERATION). To summarize: I want to display exactly the same scene as in the picture above with a SINGLE point light, but this time using an array of samplerCubeShadow. What is surprising is that if I replace the first argument of the 'texture' call, 'ShadowCubeSampler[idx]', with 'ShadowCubeSampler[0]', it works, even though the value of 'idx' is always '0'. I tried the following code without success:
int toto = 0;
float LightToOccluderClipDist = texture(ShadowCubeSampler[toto], vec4(lightToVertexDir_ws, LightToVertexClipDist));
I still get error 1282! The type of the index is the same (int)!
I have already used arrays of 'sampler2DShadow' or 'sampler2D' without problems.
So why does it not work with 'samplerCubeShadow', and why does 'ShadowCubeSampler[0]' work but not the other forms?
PS: If I define an array of 2 and use 2 cubemaps (so 2 point lights), it works. So it fails if I load fewer cubemaps than the number specified in the fragment shader!
I have no compilation error and no link error. Here's the code I use to check the shader program's state:
void video::IEffectBase::Log(void) const
{
GLint errorLink = 0;
glGetProgramiv(this->m_Handle, GL_LINK_STATUS, &errorLink);
if (errorLink != GL_TRUE)
{
GLint sizeError = 0;
glGetProgramiv(this->m_Handle, GL_INFO_LOG_LENGTH, &sizeError);
char *erreur = new char[sizeError + 1];
glGetProgramInfoLog(this->m_Handle, sizeError, &sizeError, erreur);
erreur[sizeError] = '\0';
std::cerr << erreur << std::endl;
glDeleteProgram(this->m_Handle);
delete[] erreur;
}
}
And about the texture unit limits:
std::cout << GL_MAX_VERTEX_TEXTURE_IMAGE_UNITS << std::endl;
std::cout << GL_MAX_TEXTURE_IMAGE_UNITS << std::endl;
$> 35660
$> 34930
If I use 'ShadowCubeSampler[0]', with '0' written directly in the code, I get the same display as the picture at the beginning of my post, without error. If I use 'ShadowCubeSampler[idx]' with idx = 0, I get the following display:
As you can see, none of the geometry using this shader has been rendered. However, I don't have any link error. How can this be explained? Is it possible that the system unlinks the shader program?
UPDATE
Let's suppose my array of samplerCubeShadow can contain at most 2 samplers (uniform samplerCubeShadow tex_shadow[2]).
I noticed the following if I load just one point light, so one cubemap:
CASE 1
uniform samplerCubeShadow tex_shadow[1]; //MAX POINT LIGHT = 1
for (int i=0; i < 1; i++) {tex_shadow[i];} //OK
for (int i=0; i < 1; i++) {texture(tex_shadow[i], ...);} //OK
for (int i=0; i < 1; i++) {texture(tex_shadow[0], ...);} //OK
CASE 2
uniform samplerCubeShadow tex_shadow[2]; //MAX POINT LIGHT = 2
for (int i=0; i < 1; i++) {tex_shadow[i];} //NOT OK - 1282
for (int i=0; i < 1; i++) {texture(tex_shadow[i], ...);} //NOT OK - 1282
for (int i=0; i < 1; i++) {texture(tex_shadow[0], ...);} //OK
CASE 3
uniform samplerCubeShadow tex_shadow[2]; //MAX POINT LIGHT = 2
for (int i=0; i < 2; i++) {tex_shadow[i];} //OK
for (int i=0; i < 2; i++) {texture(tex_shadow[i], ...);} //OK
for (int i=0; i < 2; i++) {texture(tex_shadow[0], ...);} //OK
Conclusion: if the maximum number of samplers is equal to the number of samplers loaded, I can loop over the samplers in my array. If fewer are loaded, it does not work! I can use a maximum of 32 texture units per shader program. I have the same problem with the samplerCube type.
It's very strange because I don't have any problem using sampler2D or sampler2DShadow for spot light shadow computation.
I checked with NSight, where I put a breakpoint in the fragment shader file, and of course the breakpoint is never reached. It's as if the shader program were not linked, but that's not the case.
Do you think it could be a problem with cubemap samplers in general, or does the problem come from the cubemap initialization?
Can anyone help me?
I have never used an array inside GLSL, and unfortunately I don't have the equipment to try it right now,
but have you tried using an unsigned int (uint) index in GLSL?
float GetCubeShadowFactor(vec3 vertexPosition_ws, float shadowFactor, uint idx) {
....
}
Also note that you cannot use an unlimited number of samplers in your shaders.
OpenGL has, depending on the targeted version, special restrictions on arrays of opaque types (textures are one of them). Before OpenGL 4 looping over such arrays is not possible. You can check the details here: OpenGL Wiki - Data Types
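A side note on the texture unit limits printed in the question: 35660 and 34930 are the numeric values of the enum tokens themselves (0x8B4C and 0x8872), not the limits of the implementation. The limits have to be queried with glGetIntegerv. The sketch below shows the pattern without needing a GL context: queryLimit is a hypothetical helper that accepts any callable with the shape of glGetIntegerv, and the two constants are redeclared locally with the values from the OpenGL headers.

```cpp
#include <cassert>

// Token values as defined in the OpenGL headers, redeclared here so the
// sketch compiles without including them.
constexpr unsigned int kMaxVertexTextureImageUnits = 0x8B4C;  // GL_MAX_VERTEX_TEXTURE_IMAGE_UNITS
constexpr unsigned int kMaxTextureImageUnits       = 0x8872;  // GL_MAX_TEXTURE_IMAGE_UNITS

// Printing the tokens directly yields exactly the numbers from the question.
static_assert(kMaxVertexTextureImageUnits == 35660, "token value, not a limit");
static_assert(kMaxTextureImageUnits == 34930, "token value, not a limit");

// Hypothetical helper: getIntegerv is any callable shaped like glGetIntegerv
// (token, pointer to result). With a live context, pass glGetIntegerv itself.
template <typename GetIntegerv>
int queryLimit(GetIntegerv getIntegerv, unsigned int token) {
    int value = 0;
    getIntegerv(token, &value);
    return value;
}
```

With a current context, printing queryLimit(glGetIntegerv, GL_MAX_TEXTURE_IMAGE_UNITS) would report the actual number of fragment shader texture units.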

Order independent transparency with MSAA

I have implemented OIT based on the demo in the "OpenGL Programming Guide", 8th edition (the Red Book). Now I need to add MSAA. Just enabling MSAA breaks the transparency, as the layered pixels are resolved as many times as there are samples. I have read this article on how it is done with DirectX, where they say the pixel shader should be run per sample and not per pixel. How is it done in OpenGL?
I won't post the whole implementation here, only the fragment shader chunk in which the final resolution of the layered pixels occurs:
vec4 final_color = vec4(0,0,0,0);
for (i = 0; i < fragment_count; i++)
{
/// Retrieving the next fragment from the stack:
vec4 modulator = unpackUnorm4x8(fragment_list[i].y) ;
/// Perform alpha blending:
final_color = mix(final_color, modulator, modulator.a);
}
color = final_color ;
Update:
I have tried the solution proposed here but it still doesn't work. Here are the full fragment shaders for the list build and resolve passes:
List build pass :
#version 420 core
layout (early_fragment_tests) in;
layout (binding = 0, r32ui) uniform uimage2D head_pointer_image;
layout (binding = 1, rgba32ui) uniform writeonly uimageBuffer list_buffer;
layout (binding = 0, offset = 0) uniform atomic_uint list_counter;
layout (location = 0) out vec4 color;//dummy output
in vec3 frag_position;
in vec3 frag_normal;
in vec4 surface_color;
in int gl_SampleMaskIn[];
uniform vec3 light_position = vec3(40.0, 20.0, 100.0);
void main(void)
{
uint index;
uint old_head;
uvec4 item;
vec4 frag_color;
index = atomicCounterIncrement(list_counter);
old_head = imageAtomicExchange(head_pointer_image, ivec2(gl_FragCoord.xy), uint(index));
vec4 modulator =surface_color;
item.x = old_head;
item.y = packUnorm4x8(modulator);
item.z = floatBitsToUint(gl_FragCoord.z);
item.w = int(gl_SampleMaskIn[0]);
imageStore(list_buffer, int(index), item);
frag_color = modulator;
color = frag_color;
}
List resolve :
#version 420 core
// The per-pixel image containing the head pointers
layout (binding = 0, r32ui) uniform uimage2D head_pointer_image;
// Buffer containing linked lists of fragments
layout (binding = 1, rgba32ui) uniform uimageBuffer list_buffer;
// This is the output color
layout (location = 0) out vec4 color;
// This is the maximum number of overlapping fragments allowed
#define MAX_FRAGMENTS 40
// Temporary array used for sorting fragments
uvec4 fragment_list[MAX_FRAGMENTS];
void main(void)
{
uint current_index;
uint fragment_count = 0;
current_index = imageLoad(head_pointer_image, ivec2(gl_FragCoord).xy).x;
while (current_index != 0 && fragment_count < MAX_FRAGMENTS )
{
uvec4 fragment = imageLoad(list_buffer, int(current_index));
int coverage = int(fragment.w);
//if((coverage &(1 << gl_SampleID))!=0) {
fragment_list[fragment_count] = fragment;
current_index = fragment.x;
//}
fragment_count++;
}
uint i, j;
if (fragment_count > 1)
{
for (i = 0; i < fragment_count - 1; i++)
{
for (j = i + 1; j < fragment_count; j++)
{
uvec4 fragment1 = fragment_list[i];
uvec4 fragment2 = fragment_list[j];
float depth1 = uintBitsToFloat(fragment1.z);
float depth2 = uintBitsToFloat(fragment2.z);
if (depth1 < depth2)
{
fragment_list[i] = fragment2;
fragment_list[j] = fragment1;
}
}
}
}
vec4 final_color = vec4(0,0,0,0);
for (i = 0; i < fragment_count; i++)
{
vec4 modulator = unpackUnorm4x8(fragment_list[i].y);
final_color = mix(final_color, modulator, modulator.a);
}
color = final_color;
}
Without knowing how your code actually works, you can do it very much the same way that your linked DX11 demo does, since OpenGL provides the same features needed.
So in the first shader, which just stores all the rendered fragments, you also store the sample coverage mask for each fragment (along with the color and depth, of course). This is given as the fragment shader input variable int gl_SampleMaskIn[], and for each sample with ID 32*i+j, bit j of gl_SampleMaskIn[i] is set if the fragment covers that sample (since you probably won't use >32x MSAA, you can usually just use gl_SampleMaskIn[0] and only need to store a single int as the coverage mask).
...
fragment.color = inColor;
fragment.depth = gl_FragCoord.z;
fragment.coverage = gl_SampleMaskIn[0];
...
Then the final sort and render shader is run for each sample instead of just for each fragment. This is achieved implicitly by making use of the input variable int gl_SampleID, which gives us the ID of the current sample. So what this shader does (in addition to the non-MSAA version) is make the sorting step account for the sample, by only adding a fragment to the final (to be sorted) fragment list if the current sample is actually covered by this fragment.
What was something like (beware, pseudocode extrapolated from your small snippet and the DX link):
while(fragment.next != 0xFFFFFFFF)
{
fragment_list[count++] = vec2(fragment.depth, fragment.color);
fragment = fragments[fragment.next];
}
is now
while(fragment.next != 0xFFFFFFFF)
{
if(fragment.coverage & (1 << gl_SampleID))
fragment_list[count++] = vec2(fragment.depth, fragment.color);
fragment = fragments[fragment.next];
}
Or something along those lines.
EDIT: Regarding your updated code, you have to increment fragment_count only inside the if(covered) block, since we don't want to add the fragment to the list if the sample is not covered. Always incrementing it will likely result in the artifacts you see at the edges, which are the regions where MSAA (and thus the coverage) comes into play.
On the other hand, the list pointer has to be advanced (current_index = fragment.x) in each loop iteration and not only if the sample is covered; otherwise it can result in an infinite loop, as in your case. So your code should look like:
while (current_index != 0 && fragment_count < MAX_FRAGMENTS )
{
uvec4 fragment = imageLoad(list_buffer, int(current_index));
uint coverage = fragment.w;
if ((coverage & (1u << gl_SampleID)) != 0u) // unsigned mask, so both operands of & are uint
fragment_list[fragment_count++] = fragment;
current_index = fragment.x;
}
The OpenGL 4.3 Spec states in 7.1 about the gl_SampleID builtin variable:
Any static use of this variable in a fragment shader causes the entire shader to be evaluated per-sample.
(This was already the case in ARB_sample_shading, and it also applies to gl_SamplePosition and to custom variables declared with the sample qualifier.)
Therefore it is quite automatic, because you will probably need gl_SampleID anyway.
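The corrected traversal can be checked on the CPU before touching the shader. Below is a minimal C++ sketch of the same loop; Fragment and collectForSample are hypothetical stand-ins for the uvec4 list entries and the resolve loop, and the sketch assumes at most 32 samples so that a single mask word suffices.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// CPU-side stand-in for one list entry: next = index of the next entry
// (0 terminates the list), coverage = sample coverage mask.
struct Fragment {
    std::uint32_t next;
    std::uint32_t coverage;
};

// Collects the indices of the fragments that cover sampleId, mirroring the
// corrected loop: the count only grows for covered fragments, while the list
// pointer advances on every iteration.
std::vector<std::uint32_t> collectForSample(const std::vector<Fragment>& list,
                                            std::uint32_t head,
                                            std::uint32_t sampleId,
                                            std::size_t maxFragments) {
    assert(sampleId < 32);  // a single 32-bit mask holds samples 0..31
    std::vector<std::uint32_t> covered;
    std::uint32_t current = head;
    while (current != 0 && covered.size() < maxFragments) {
        const Fragment& fragment = list[current];
        if ((fragment.coverage & (1u << sampleId)) != 0) {
            covered.push_back(current);
        }
        current = fragment.next;  // always advance, covered or not
    }
    return covered;
}
```

Calling it once per sample ID on a small hand-built list shows that fragments only partially covering a pixel end up in some per-sample lists and not in others, which is what removes the edge artifacts.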