GLSL linker error (Sampler needs to be a uniform (global or parameter to main)) - opengl

We have a GLSL fragment shader, and the problem is in this code:
vec4 TFSelection(StrVolumeColorMap volumeColorMap, vec4 textureCoordinate)
{
    vec4 finalColor = vec4(0.0);
    if (volumeColorMap.TransferFunctions[0].numberOfBits == 0)
    {
        return texture(volumeColorMap.TransferFunctions[0].TransferFunctionID, textureCoordinate.x);
    }
    if (textureCoordinate.x == 0)
        return finalColor;
    float deNormalize = textureCoordinate.x * 65535/*255*/;
    for (int i = 0; i < volumeColorMap.TransferFunctions.length(); i++)
    {
        int NormFactor = volumeColorMap.TransferFunctions[i].startBit + volumeColorMap.TransferFunctions[i].numberOfBits;
        float minval = CalculatePower(2, volumeColorMap.TransferFunctions[i].startBit);
        if (deNormalize >= minval)
        {
            float maxval = CalculatePower(2, NormFactor);
            if (deNormalize < maxval)
            {
                //float tempPower = CalculatePower(2, NormFactor);
                float coord = deNormalize / maxval/*tempPower*/;
                return texture(volumeColorMap.TransferFunctions[i].TransferFunctionID, coord);
            }
        }
    }
    return finalColor;
}
When we compile and link the shader, this message is logged:
Sampler needs to be a uniform (global or parameter to main), need to
inline function or resolve conditional expression
With a simple change the shader may link successfully, for example changing
float coord = deNormalize / maxval;
to
float coord = deNormalize;
Driver: NVIDIA 320.49


shaderc IncluderInterface, include fails?

I am trying to support #include directives for GLSL in a Vulkan project.
To my understanding, all that is required is to properly implement the IncluderInterface and set it, which I did like this:
class NEShaderIncluder : public CompileOptions::IncluderInterface
{
    shaderc_include_result* GetInclude(
        const char* requested_source,
        shaderc_include_type type,
        const char* requesting_source,
        size_t include_depth) override
    {
        cout << requested_source << endl;
        cout << to_string(type) << endl;
        cout << requesting_source << endl;
        cout << include_depth << endl;
        const string name = string(requested_source);
        const string contents = ReadFile(name);
        auto container = new std::array<std::string, 2>;
        (*container)[0] = name;
        (*container)[1] = contents;
        auto data = new shaderc_include_result;
        data->user_data = container;
        data->source_name = (*container)[0].data();
        data->source_name_length = (*container)[0].size();
        data->content = (*container)[1].data();
        data->content_length = (*container)[1].size();
        cout << "!!!!!!!!!!!!!!!!!!!" << endl;
        cout << data->content << endl;
        return data;
    }

    void ReleaseInclude(shaderc_include_result* data) override
    {
        delete static_cast<std::array<std::string, 2>*>(data->user_data);
        delete data;
    }
};
CompileOptions SetShaderCompilationOptions()
{
    CompileOptions options;
    options.SetIncluder(std::make_unique<NEShaderIncluder>());
    options.SetGenerateDebugInfo();
    return options;
}
And then I compile my shaders like this:
Compiler compiler;
CompileOptions options = SetShaderCompilationOptions();
shaderc::SpvCompilationResult result =
    compiler.CompileGlslToSpv(source, shader_type, shader_name.c_str(), options);
All the print statements I added to that function work and print what I expect, for example this is the last print:
vec4 BlinnPhong(vec3 pos, vec3 normal, vec3 camera_position)
{
    vec4 color = vec4(0);
    vec3 l = vec3(1);
    vec3 c = vec3(0, 0, 0.7);
    vec3 n = normalize(normal);
    vec3 e = camera_position - pos;
    e = normalize(e);
    vec3 h = normalize(e + l);
    color = vec4(
        c * (vec3(0.1) + 0.9 * max(0, dot(n, l))) +
        vec3(0.9) * max(0, pow(dot(h, n), 100)),
        1);
    return color;
}
However, shaderc doesn’t seem to be replacing the include; I get this as an error message from the result object:
#version 450
#extension GL_ARB_separate_shader_objects : enable
#extension GL_GOOGLE_include_directive : require
#include "shaders/Example2/phong_lighting.glsl"
layout(location = 0) out vec4 color_out;
layout(location = 0) in vec3 position;
layout(location = 1) in vec2 tex_coord;
layout(location = 2) in vec3 normal;
layout(binding = 1) uniform CameraInfo {
    vec3 camera_position;
};
void main()
{
    color_out = BlinnPhong(position, normal);
}:
fragment_shader:19: error: 'BlinnPhong' : no matching overloaded function found
fragment_shader:19: error: 'assign' : cannot convert from ' const float' to 'layout( location=0) out highp 4-component vector of float'
I am not entirely sure what is wrong with my use of the API, and I have not been able to find examples online to cross reference.
I suspect one must call PreprocessGlsl beforehand but I am not sure how you are supposed to chain preprocessing and compilation.
The problem is that you are supposed to call a preprocessing stage in addition to setting up the includer; the correct call site looks like this:
shaderc::PreprocessedSourceCompilationResult pre_result =
    compiler.PreprocessGlsl(source, shader_type, shader_name.c_str(), options);
Assert(
    pre_result.GetCompilationStatus() == shaderc_compilation_status_success,
    "Preprocess failed for file " + source + ":\n" + pre_result.GetErrorMessage());
// Use the begin()/end() pair so we don't rely on the buffer being null-terminated.
string pre_passed_source(pre_result.begin(), pre_result.end());
shaderc::SpvCompilationResult result = compiler.CompileGlslToSpv(
    pre_passed_source, shader_type, shader_name.c_str(), options);
Assert(
    result.GetCompilationStatus() == shaderc_compilation_status_success,
    pre_passed_source + ":\n" + result.GetErrorMessage());
I don't think it's a problem with the include, but rather with your code. The BlinnPhong function from your include takes 3 arguments, but you call it with only 2. That matches the first error message.

Skinning with Assimp.Net and OpenTK

I'm trying to implement skeletal animation using Assimp.Net and OpenTK and have been following this tutorial, but I cannot get it to work.
The model appears fine with identity matrices but is terribly garbled when using the transforms I generate from Assimp.
I suspect the issue is the way I am combining all of the matrices, or that there is a difference in OpenTK that I am not realising. I have made similar adjustments to the tutorial as suggested here: Matrix calculations for gpu skinning,
but it is still garbled, just differently. I have also tried converting all Assimp matrices to OpenTK matrices before performing any multiplication. These are the areas of the code related to the matrices; I can provide more if needed:
Matrix Conversion
public static OpenTK.Matrix4 TKMatrix(Assimp.Matrix4x4 input)
{
    return new OpenTK.Matrix4(input.A1, input.B1, input.C1, input.D1,
                              input.A2, input.B2, input.C2, input.D2,
                              input.A3, input.B3, input.C3, input.D3,
                              input.A4, input.B4, input.C4, input.D4);
}
Storing the Global Inverse
public class LoaderMesh
{
    public Scene mScene;
    public Mesh mMesh;
    public OpenTK.Matrix4 GlobalInverseTransform { get; set; }

    public LoaderMesh(Scene aiScene, Mesh aiMesh)
    {
        mScene = aiScene;
        mMesh = aiMesh;
        GlobalInverseTransform = Util.TKMatrix(mScene.RootNode.Transform);
        GlobalInverseTransform.Invert();
    }
Loading the bones
public void LoadBones(List<VBO.Vtx_BoneWeight.Vtx> boneData)
{
    for (uint iBone = 0; iBone < mMesh.BoneCount; ++iBone)
    {
        uint boneIndex = 0;
        String bonename = mMesh.Bones[iBone].Name;
        if (!BoneMapping.ContainsKey(bonename))
        {
            boneIndex = (uint)NumBones;
            NumBones++;
            BoneInfo bi = new BoneInfo();
            BoneInfos.Add(bi);
        }
        else
        {
            boneIndex = BoneMapping[bonename];
        }
        BoneMapping[bonename] = boneIndex;
        BoneInfos[(int)boneIndex].OffsetMatrix = Util.TKMatrix(mMesh.Bones[iBone].OffsetMatrix);
        for (uint iWeight = 0; iWeight < mMesh.Bones[iBone].VertexWeightCount; iWeight++)
        {
            uint VertexID = /*m_Entries[MeshIndex].BaseVertex*/ mMesh.Bones[iBone].VertexWeights[iWeight].VertexID;
            float Weight = mMesh.Bones[iBone].VertexWeights[iWeight].Weight;
            VBO.Vtx_BoneWeight.Vtx vtx = boneData[(int)VertexID];
            VBO.Vtx_BoneWeight.AddWeight(ref vtx, boneIndex, Weight);
            boneData[(int)VertexID] = vtx;
        }
    }
}
Calculating the Transforms
public void ReadNodeHierarchy(float animationTime, Node aiNode, ref OpenTK.Matrix4 parentTransform)
{
    String NodeName = aiNode.Name;
    Animation animation = mScene.Animations[0];
    OpenTK.Matrix4 NodeTransformation = Util.TKMatrix(aiNode.Transform);
    NodeAnimationChannel nodeAnim = FindNodeAnim(animation, NodeName);
    OpenTK.Matrix4 localTransform = OpenTK.Matrix4.Identity;
    if (nodeAnim != null)
    {
        // Interpolate scaling and generate scaling transformation matrix
        Vector3D Scaling = new Vector3D();
        CalcInterpolatedScaling(ref Scaling, animationTime, nodeAnim);
        Console.WriteLine("Scaling: " + Scaling.ToString());
        OpenTK.Matrix4 ScalingM = Util.TKMatrix(Matrix4x4.FromScaling(Scaling));
        // Interpolate rotation and generate rotation transformation matrix
        Quaternion RotationQ = new Quaternion();
        CalcInterpolatedRotation(ref RotationQ, animationTime, nodeAnim);
        Console.WriteLine("Rotation: " + RotationQ.ToString());
        OpenTK.Matrix4 RotationM = Util.TKMatrix(RotationQ.GetMatrix());
        // Interpolate translation and generate translation transformation matrix
        Vector3D Translation = new Vector3D();
        CalcInterpolatedPosition(ref Translation, animationTime, nodeAnim);
        Console.WriteLine("Transform: " + Translation.ToString());
        OpenTK.Matrix4 TranslationM = Util.TKMatrix(Matrix4x4.FromTranslation(Translation));
        // Combine the above transformations
        NodeTransformation = TranslationM * RotationM * ScalingM;
        localTransform = TranslationM * RotationM * ScalingM;
    }
    OpenTK.Matrix4 GlobalTransformation = parentTransform * NodeTransformation;
    OpenTK.Matrix4 parentPass = OpenTK.Matrix4.Identity;
    if (BoneMapping.ContainsKey(NodeName) == true)
    {
        uint BoneIndex = BoneMapping[NodeName];
        //BoneInfos[(int)BoneIndex].FinalTransformation = GlobalInverseTransform * BoneInfos[(int)BoneIndex].OffsetMatrix * GlobalTransformation;
        BoneInfos[(int)BoneIndex].NodeTransformation = parentTransform * Util.TKMatrix(aiNode.Transform) * localTransform;
        parentPass = BoneInfos[(int)BoneIndex].NodeTransformation;
        BoneInfos[(int)BoneIndex].FinalTransformation = GlobalInverseTransform * BoneInfos[(int)BoneIndex].NodeTransformation * BoneInfos[(int)BoneIndex].OffsetMatrix;
    }
    for (uint i = 0; i < aiNode.ChildCount; i++)
    {
        ReadNodeHierarchy(animationTime, aiNode.Children[i], ref parentPass);
    }
}
And this is the vertex shader code
#version 400
layout(location = 0) in vec4 vert;
layout(location = 1) in vec4 normal;
layout(location = 2) in vec4 texCoord;
layout(location = 3) in vec4 tanCoord;
layout(location = 4) in ivec4 boneIDs;
layout(location = 5) in vec4 boneWeights;
uniform mat4 projectionMtx;
uniform mat4 viewMtx;
uniform mat4 modelMtx;
const int MAX_BONES = 100;
uniform mat4 bones[MAX_BONES];
out vec3 positionFrg_CS;
out vec3 normalFrg_CS;
out vec3 tanCoordFrg_CS;
out vec3 bitCoordFrg_CS;
out vec4 texCoordFrg;
void main()
{
    mat4 BoneTransform = bones[boneIDs[0]] * boneWeights[0];
    BoneTransform += bones[boneIDs[1]] * boneWeights[1];
    BoneTransform += bones[boneIDs[2]] * boneWeights[2];
    BoneTransform += bones[boneIDs[3]] * boneWeights[3];
    gl_Position = projectionMtx * viewMtx * modelMtx * BoneTransform * vert;
}
Is there anything I am doing wrong multiplying the matrices together?
In reply to livin_amuk: I have got this working, at least well enough for my needs; however, I fixed this 6 months ago and my memory is vague...
If I remember correctly, my main issue was the bone/vertex indices; I think I messed up the BaseVertex because I was in a rush. Here is my current working LoadBones function.
public void LoadBones(List<VBO.Vtx_BoneWeight.Vtx> boneData, SubMesh mesh)
{
    for (int iBone = 0; iBone < mesh.mMesh.BoneCount; ++iBone)
    {
        uint boneIndex = 0;
        String bonename = mesh.mMesh.Bones[iBone].Name;
        if (!BoneMapping.ContainsKey(bonename))
        {
            boneIndex = (uint)NumBones;
            NumBones++;
            BoneInfo bi = new BoneInfo();
            BoneInfos.Add(bi);
            //Note, I have these two lines included inside the if statement, the original tut does not. Not sure if it makes a difference.
            BoneMapping[bonename] = boneIndex;
            BoneInfos[(int)boneIndex].OffsetMatrix = AssimpToOpenTK.TKMatrix(mesh.mMesh.Bones[iBone].OffsetMatrix);
        }
        else
        {
            boneIndex = BoneMapping[bonename];
        }
        for (int iWeight = 0; iWeight < mesh.mMesh.Bones[iBone].VertexWeightCount; iWeight++)
        {
            //My question has the mesh.BaseVertex commented out. It is important!
            long VertexID = mesh.BaseVertex + mesh.mMesh.Bones[iBone].VertexWeights[iWeight].VertexID;
            float Weight = mesh.mMesh.Bones[iBone].VertexWeights[iWeight].Weight;
            VBO.Vtx_BoneWeight.Vtx vtx = boneData[(int)VertexID];
            VBO.Vtx_BoneWeight.AddWeight(ref vtx, boneIndex, Weight);
            boneData[(int)VertexID] = vtx;
        }
    }
}
I also had the transforms backwards. Here is the ReadNodeHierarchy function.
public void ReadNodeHierarchy(float animationTime, Node aiNode, ref OpenTK.Matrix4 parentTransform)
{
    String NodeName = aiNode.Name;
    Animation animation = mScene.Animations[0];
    OpenTK.Matrix4 NodeTransformation = AssimpToOpenTK.TKMatrix(aiNode.Transform);
    NodeAnimationChannel nodeAnim = FindNodeAnim(animation, NodeName);
    if (nodeAnim != null)
    {
        // Interpolate scaling and generate scaling transformation matrix
        Vector3D Scaling = new Vector3D();
        CalcInterpolatedScaling(ref Scaling, animationTime, nodeAnim);
        OpenTK.Matrix4 ScalingM = AssimpToOpenTK.TKMatrix(Matrix4x4.FromScaling(Scaling));
        // Interpolate rotation and generate rotation transformation matrix
        Quaternion RotationQ = new Quaternion();
        CalcInterpolatedRotation(ref RotationQ, animationTime, nodeAnim);
        OpenTK.Matrix4 RotationM = AssimpToOpenTK.TKMatrix(RotationQ.GetMatrix());
        // Interpolate translation and generate translation transformation matrix
        Vector3D Translation = new Vector3D();
        CalcInterpolatedPosition(ref Translation, animationTime, nodeAnim);
        OpenTK.Matrix4 TranslationM = AssimpToOpenTK.TKMatrix(Matrix4x4.FromTranslation(Translation));
        // Combine the above transformations
        //All that local transform stuff is gone. The order of the transforms is reversed from my question AND the original tut.
        NodeTransformation = ScalingM * RotationM * TranslationM;
    }
    //Also reversed.
    OpenTK.Matrix4 GlobalTransformation = NodeTransformation * parentTransform;
    //GlobalTransformation = OpenTK.Matrix4.Identity;
    if (BoneMapping.ContainsKey(NodeName) == true)
    {
        uint BoneIndex = BoneMapping[NodeName];
        //Also, also, reversed.
        BoneInfos[(int)BoneIndex].FinalTransformation = BoneInfos[(int)BoneIndex].OffsetMatrix * GlobalTransformation * GlobalInverseTransform;
    }
    for (int i = 0; i < aiNode.ChildCount; i++)
    {
        ReadNodeHierarchy(animationTime, aiNode.Children[i], ref GlobalTransformation);
    }
}
The Matrix conversion at the top is also correct, as is the Shader code.

Strange behaviour using in/out block data with OpenGL/GLSL

I have implemented a normal mapping shader in my OpenGL/GLSL application. To compute the bump and shadow factors in the fragment shader, I need to send some data from the vertex shader, such as the light direction in tangent space and the vertex position in light space, for each light of my scene. So to do the job I need to declare 2 output variables like below (vertex shader):
#define MAX_LIGHT_COUNT 5
[...]
out vec4 ShadowCoords[MAX_LIGHT_COUNT]; //Vertex position in light space
out vec3 lightDir_TS[MAX_LIGHT_COUNT];  //Light direction in tangent space
uniform int LightCount;
[...]
for (int idx = 0; idx < LightCount; idx++)
{
    [...]
    lightDir_TS[idx] = TBN * lightDir_CS;
    ShadowCoords[idx] = ShadowInfos[idx].ShadowMatrix * VertexPosition;
    [...]
}
And in the fragment shader I recover these variables with the following input declarations:
in vec3 lightDir_TS[MAX_LIGHT_COUNT];
in vec4 ShadowCoords[MAX_LIGHT_COUNT];
The rest of the code is not important for explaining my problem.
So now here's the result as an image:
As you can see, so far everything is OK!
But now, for the sake of simplicity, I want to use a single output declaration rather than 2! So the logical choice is to use an input/output data block like below:
#define MAX_LIGHT_COUNT 5
[...]
out LightData_VS
{
    vec3 lightDir_TS;
    vec4 ShadowCoords;
} LightData_OUT[MAX_LIGHT_COUNT];
uniform int LightCount;
[...]
for (int idx = 0; idx < LightCount; idx++)
{
    [...]
    LightData_OUT[idx].lightDir_TS = TBN * lightDir_CS;
    LightData_OUT[idx].ShadowCoords = ShadowInfos[idx].ShadowMatrix * VertexPosition;
    [...]
}
And in the fragment shader the input data block:
in LightData_VS
{
    vec3 lightDir_TS;
    vec4 ShadowCoords;
} LightData_IN[MAX_LIGHT_COUNT];
But this time when I execute my program I get the following display:
As you can see, the specular light is not the same as in the first case above!
However I noticed if I replace the line:
for (int idx = 0; idx < LightCount; idx++) //Use 'LightCount' uniform variable
by the following one:
for (int idx = 0; idx < 1; idx++) //'1' value hard coded
or
int count = 1;
for (int idx = 0; idx < count; idx++)
the shading result is correct!
The problem seems to come from the fact that I use a uniform variable in the 'for' condition. However, this works when I use separate output variables as in the first case!
I checked: the uniform variable 'LightCount' is correct and equal to '1'. (I tried an unsigned int data type without success, and it's the same thing using a 'while' loop.)
How can you explain a such result?
I use:
OpenGL: 4.4.0 NVIDIA driver 344.75
GLSL: 4.40 NVIDIA via Cg compiler
I have already used input/output data blocks without problems, but those were not arrays, just simple blocks like below:
[in/out] VertexData_VS
{
    vec3 viewDir_TS;
    vec4 Position_CS;
    vec3 Normal_CS;
    vec2 TexCoords;
} VertexData_[IN/OUT];
Do you think it's not possible to use input/output data blocks as arrays in a loop with a uniform variable in the for condition?
UPDATE
I tried using 2 vec4s (for the sake of data alignment, as for uniform blocks, where data needs to be aligned on a vec4) in the data structure like below:
[in/out] LightData_VS
{
    vec4 lightDir_TS; //vec4((TBN * lightDir_CS), 0.0f);
    vec4 ShadowCoords;
} LightData_[IN/OUT][MAX_LIGHT_COUNT];
without success...
UPDATE 2
Here's the code concerning shader compilation log:
core::FileSystem file(filename);
std::ifstream ifs(file.GetFullName());
if (ifs)
{
    GLint compilationError = 0;
    std::string fileContent, line;
    char const *sourceCode;
    while (std::getline(ifs, line, '\n'))
        fileContent.append(line + '\n');
    sourceCode = fileContent.c_str();
    ifs.close();
    this->m_Handle = glCreateShader(this->m_Type);
    glShaderSource(this->m_Handle, 1, &sourceCode, 0);
    glCompileShader(this->m_Handle);
    glGetShaderiv(this->m_Handle, GL_COMPILE_STATUS, &compilationError);
    if (compilationError != GL_TRUE)
    {
        GLint errorSize = 0;
        glGetShaderiv(this->m_Handle, GL_INFO_LOG_LENGTH, &errorSize);
        char *errorStr = new char[errorSize + 1];
        glGetShaderInfoLog(this->m_Handle, errorSize, &errorSize, errorStr);
        errorStr[errorSize] = '\0';
        std::cout << errorStr << std::endl;
        delete[] errorStr;
        glDeleteShader(this->m_Handle);
    }
}
And the code concerning the program log:
GLint errorLink = 0;
glGetProgramiv(this->m_Handle, GL_LINK_STATUS, &errorLink);
if (errorLink != GL_TRUE)
{
    GLint sizeError = 0;
    glGetProgramiv(this->m_Handle, GL_INFO_LOG_LENGTH, &sizeError);
    char *error = new char[sizeError + 1];
    // Use glGetProgramInfoLog for program objects; the original code called
    // glGetShaderInfoLog here, which leaves the log empty for a program handle.
    glGetProgramInfoLog(this->m_Handle, sizeError, &sizeError, error);
    error[sizeError] = '\0';
    std::cerr << error << std::endl;
    glDeleteProgram(this->m_Handle);
    delete[] error;
}
Unfortunately, I don't have any error log!

DirectX 11 Compute Shader 5 loop

I have the following compute shader code for computing depth of field. However, very unusually, the loop executes just once, even if g_rayCount is 10. Please have a look at the for loop in the main function RaycasterCS.
//--------------------------------------------------------------------------------------
// Compute Shader
//--------------------------------------------------------------------------------------
SamplerState SSLinear
{
    Filter = Min_Mag_Linear_Mip_Point;
    AddressU = Border;
    AddressV = Border;
    AddressW = Border;
};

float3 CalculateDoF(uint seedIndex, uint2 fragPos)
{
    ;
}

[numthreads(RAYCASTER_THREAD_BLOCK_SIZE, RAYCASTER_THREAD_BLOCK_SIZE, 1)]
void RaycasterCS(in uint3 threadID : SV_GroupThreadID, in uint3 groupID : SV_GroupID, in uint3 dispatchThreadID : SV_DispatchThreadID)
{
    uint2 fragPos = groupID.xy * RAYCASTER_THREAD_BLOCK_SIZE + threadID.xy;
    float4 dstColor = g_texFinal[fragPos];
    uint seedIndex = dispatchThreadID.x * dispatchThreadID.y;
    float3 final = float3(0, 0, 0);
    float color = 0;
    [loop][allow_uav_condition]
    for (int i = 0; i < g_rayCount; ++i);
    {
        float3 dof = CalculateDoF(seedIndex, fragPos);
        final += dof;
    }
    final *= 1.0f / ((float) g_rayCount);
    g_texFinalRW[fragPos] = float4(final, 1);
}

//--------------------------------------------------------------------------------------
technique10 Raycaster
{
    pass RaycastDefault
    {
        SetVertexShader(NULL);
        SetGeometryShader(NULL);
        SetPixelShader(NULL);
        SetComputeShader(CompileShader(cs_5_0, RaycasterCS()));
    }
}
Remove the semicolon at the end of the for statement:
for (int i = 0; i < g_rayCount; ++i) // removed semicolon
{
    float3 dof = CalculateDoF(seedIndex, fragPos);
    final += dof;
}
As I guess you know, the semicolon was just running an empty for loop, and the code in braces was then executed just once afterwards.

Order independent transparency with MSAA

I have implemented OIT based on the demo in "OpenGL Programming Guide", 8th edition (the Red Book). Now I need to add MSAA. Just enabling MSAA screws up the transparency, as the layered pixels are resolved a number of times equal to the number of sample levels. I have read this article on how it is done with DirectX, where they say the pixel shader should be run per sample and not per pixel. How is it done in OpenGL?
I won't post the whole implementation here, only the fragment shader chunk in which the final resolution of the layered pixels occurs:
vec4 final_color = vec4(0,0,0,0);
for (i = 0; i < fragment_count; i++)
{
    /// Retrieving the next fragment from the stack:
    vec4 modulator = unpackUnorm4x8(fragment_list[i].y);
    /// Perform alpha blending:
    final_color = mix(final_color, modulator, modulator.a);
}
color = final_color;
Update:
I have tried the solution proposed here, but it still doesn't work. Here are the full fragment shaders for the list build and resolve passes:
List build pass :
#version 420 core
layout (early_fragment_tests) in;
layout (binding = 0, r32ui) uniform uimage2D head_pointer_image;
layout (binding = 1, rgba32ui) uniform writeonly uimageBuffer list_buffer;
layout (binding = 0, offset = 0) uniform atomic_uint list_counter;
layout (location = 0) out vec4 color; //dummy output
in vec3 frag_position;
in vec3 frag_normal;
in vec4 surface_color;
in int gl_SampleMaskIn[];
uniform vec3 light_position = vec3(40.0, 20.0, 100.0);
void main(void)
{
    uint index;
    uint old_head;
    uvec4 item;
    vec4 frag_color;
    index = atomicCounterIncrement(list_counter);
    old_head = imageAtomicExchange(head_pointer_image, ivec2(gl_FragCoord.xy), uint(index));
    vec4 modulator = surface_color;
    item.x = old_head;
    item.y = packUnorm4x8(modulator);
    item.z = floatBitsToUint(gl_FragCoord.z);
    item.w = int(gl_SampleMaskIn[0]);
    imageStore(list_buffer, int(index), item);
    frag_color = modulator;
    color = frag_color;
}
List resolve:
#version 420 core
// The per-pixel image containing the head pointers
layout (binding = 0, r32ui) uniform uimage2D head_pointer_image;
// Buffer containing linked lists of fragments
layout (binding = 1, rgba32ui) uniform uimageBuffer list_buffer;
// This is the output color
layout (location = 0) out vec4 color;
// This is the maximum number of overlapping fragments allowed
#define MAX_FRAGMENTS 40
// Temporary array used for sorting fragments
uvec4 fragment_list[MAX_FRAGMENTS];
void main(void)
{
    uint current_index;
    uint fragment_count = 0;
    current_index = imageLoad(head_pointer_image, ivec2(gl_FragCoord).xy).x;
    while (current_index != 0 && fragment_count < MAX_FRAGMENTS)
    {
        uvec4 fragment = imageLoad(list_buffer, int(current_index));
        int coverage = int(fragment.w);
        //if ((coverage & (1 << gl_SampleID)) != 0) {
        fragment_list[fragment_count] = fragment;
        current_index = fragment.x;
        //}
        fragment_count++;
    }
    uint i, j;
    if (fragment_count > 1)
    {
        for (i = 0; i < fragment_count - 1; i++)
        {
            for (j = i + 1; j < fragment_count; j++)
            {
                uvec4 fragment1 = fragment_list[i];
                uvec4 fragment2 = fragment_list[j];
                float depth1 = uintBitsToFloat(fragment1.z);
                float depth2 = uintBitsToFloat(fragment2.z);
                if (depth1 < depth2)
                {
                    fragment_list[i] = fragment2;
                    fragment_list[j] = fragment1;
                }
            }
        }
    }
    vec4 final_color = vec4(0,0,0,0);
    for (i = 0; i < fragment_count; i++)
    {
        vec4 modulator = unpackUnorm4x8(fragment_list[i].y);
        final_color = mix(final_color, modulator, modulator.a);
    }
    color = final_color;
}
Without knowing how your code actually works, you can do it very much the same way your linked DX11 demo does, since OpenGL provides the same features.
So, in the first shader that just stores all the rendered fragments, you also store the sample coverage mask for each fragment (along with the color and depth, of course). This is given as the fragment shader input variable int gl_SampleMaskIn[], and for each sample with id 32*i+j, bit j of gl_SampleMaskIn[i] is set if the fragment covers that sample (since you probably won't use more than 32x MSAA, you can usually just use gl_SampleMaskIn[0] and only need to store a single int as the coverage mask).
...
fragment.color = inColor;
fragment.depth = gl_FragCoord.z;
fragment.coverage = gl_SampleMaskIn[0];
...
Then the final sort and render shader is run for each sample instead of just for each fragment. This is achieved implicitly by making use of the input variable int gl_SampleID, which gives us the ID of the current sample. So what we do in this shader (in addition to the non-MSAA version) is that the sorting step just accounts for the sample, by only adding a fragment to the final (to be sorted) fragment list if the current sample is actually covered by this fragment:
What was something like (beware, pseudocode extrapolated from your small snippet and the DX-link):
while (fragment.next != 0xFFFFFFFF)
{
    fragment_list[count++] = vec2(fragment.depth, fragment.color);
    fragment = fragments[fragment.next];
}
is now
while (fragment.next != 0xFFFFFFFF)
{
    if (fragment.coverage & (1 << gl_SampleID))
        fragment_list[count++] = vec2(fragment.depth, fragment.color);
    fragment = fragments[fragment.next];
}
Or something along those lines.
EDIT: With your updated code, you have to increment fragment_count only inside the if(covered) block, since we don't want to add the fragment to the list if the sample is not covered. Incrementing it unconditionally will likely result in the artifacts you see at the edges, which are the regions where MSAA (and thus the coverage) comes into play.
On the other hand, the list pointer has to be advanced (current_index = fragment.x) in each loop iteration, not only when the sample is covered, as otherwise it can result in an infinite loop, as in your case. So your code should look like:
while (current_index != 0 && fragment_count < MAX_FRAGMENTS)
{
    uvec4 fragment = imageLoad(list_buffer, int(current_index));
    uint coverage = fragment.w;
    if ((coverage & (1 << gl_SampleID)) != 0)
        fragment_list[fragment_count++] = fragment;
    current_index = fragment.x;
}
The OpenGL 4.3 spec states in section 7.1 about the gl_SampleID built-in variable:
Any static use of this variable in a fragment shader causes the entire shader to be evaluated per-sample.
(This was already the case with ARB_sample_shading, and it also applies to gl_SamplePosition or any custom variable declared with the sample qualifier.)
Therefore it is quite automatic, because you will probably need the sample ID anyway.