I am compiling a GLSL file to SPIR-V using the command:
C:/VulkanSDK/1.2.148.1/Bin/glslc C:/Users/jonat/Projects/sum.comp -o C:/Users/jonat/Projects/sum.spv
Getting the error:
error: 'subgroup op' : requires SPIR-V 1.3
The error occurs on lines 32 and 45, which are both sum = subgroupAdd(sum);
The full GLSL code:
#version 450
#extension GL_KHR_shader_subgroup_arithmetic : enable
layout(std430, binding = 0) buffer Input
{
float inputs[];
};
layout(std430, binding = 1) buffer Output
{
float outputs[];
};
layout (local_size_x_id = 1) in;
layout (constant_id = 2) const int sumSubGroupSize = 64;
layout(push_constant) uniform PushConsts
{
int n;
} consts;
shared float sdata[sumSubGroupSize];
void main()
{
    float sum = 0.0;
    if (gl_GlobalInvocationID.x < consts.n)
    {
        sum = inputs[gl_GlobalInvocationID.x];
    }

    sum = subgroupAdd(sum);

    if (gl_SubgroupInvocationID == 0)
    {
        sdata[gl_SubgroupID] = sum;
    }

    memoryBarrierShared();
    barrier();

    if (gl_SubgroupID == 0)
    {
        sum = gl_SubgroupInvocationID < gl_NumSubgroups ? sdata[gl_SubgroupInvocationID] : 0;
        sum = subgroupAdd(sum);
    }

    if (gl_LocalInvocationID.x == 0)
    {
        outputs[gl_WorkGroupID.x] = sum;
    }
}
I have the latest version of the Vulkan SDK.
Looks like you need --target-env=vulkan1.1 for glslc to emit SPIR-V 1.3:
4.2.6. --target-env=
...
Generated code uses SPIR-V 1.0, except for code compiled for Vulkan 1.1, which uses SPIR-V 1.3, and code compiled for Vulkan 1.2, which uses SPIR-V 1.5.
If this option is not specified, a default of vulkan1.0 is used.
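So, for the compute shader above, an invocation along these lines (the same paths as in the question, just with the target environment added) should make glslc emit SPIR-V 1.3:
C:/VulkanSDK/1.2.148.1/Bin/glslc --target-env=vulkan1.1 C:/Users/jonat/Projects/sum.comp -o C:/Users/jonat/Projects/sum.spv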
Related
I've just downloaded the latest Vulkan SDK version (1.3.224.1) and when I try to compile a shader using shaderc I get this error: error: OpenGL compatibility profile is not supported.
I've cloned the Hazel 2D engine (https://github.com/TheCherno/Hazel) by The Cherno just to check whether it works, because it builds the same Vulkan SDK components as my project and uses the same shaderc and spirv-cross code. It works in Hazel but does not work inside my project.
shader code:
#version 450 core
layout(location = 0) in vec2 a_position;
layout(location = 1) in vec2 a_uv;
layout(location = 0) out vec2 v_uv;
void main()
{
v_uv = a_uv;
vec4 position = vec4(a_position, 0.0, 1.0);
gl_Position = position;
}
#version 450 core
layout(location = 0) out vec4 f_color;
layout(location = 0) in vec2 v_uv;
void main()
{
f_color = vec4(1.0f, 1.0f, 1.0f, 1.0f);
}
I guess it's not important to show the whole premake script since the project builds fine.
I'll show just the most important part:
dependencies are defined like this:
VULKAN_SDK = os.getenv("VULKAN_SDK")
IncludeDir["VulkanSDK"] = "%{VULKAN_SDK}/Include"
LibraryDir = {}
LibraryDir["VulkanSDK"] = "%{VULKAN_SDK}/Lib"
Library = {}
Library["ShaderC_Debug"] = "%{LibraryDir.VulkanSDK}/shaderc_sharedd.lib"
Library["SPIRV_Cross_Debug"] = "%{LibraryDir.VulkanSDK}/spirv-cross-cored.lib"
Library["SPIRV_Cross_GLSL_Debug"] = "%{LibraryDir.VulkanSDK}/spirv-cross-glsld.lib"
Library["SPIRV_Tools_Debug"] = "%{LibraryDir.VulkanSDK}/SPIRV-Toolsd.lib"
and linked like this:
includedirs {
--other things...
"%{IncludeDir.VulkanSDK}",
}
filter "configurations:Debug"
symbols "On"
links {
"%{Library.ShaderC_Debug}",
"%{Library.SPIRV_Cross_Debug}",
"%{Library.SPIRV_Cross_GLSL_Debug}",
"%{Library.SPIRV_Tools_Debug}",
}
What I do first is compile the Vulkan GLSL code into SPIR-V binaries (there is also caching, but I removed it to keep the example minimal):
std::unordered_map<uint32_t, std::vector<uint32_t>> Shader::CompileGLSLToVulkanSpirvAndCache(const std::unordered_map<uint32_t, std::string>& shaderSource)
{
    std::unordered_map<uint32_t, std::vector<uint32_t>> resultBinaries;

    shaderc::Compiler compiler;
    shaderc::CompileOptions options;
    options.SetTargetEnvironment(shaderc_target_env_vulkan, shaderc_env_version_vulkan_1_2);

    uint32_t shadersCount = static_cast<uint32_t>(shaderSource.size());
    resultBinaries.reserve(shadersCount);

    for(const auto& [type, source] : shaderSource) {
        shaderc::SpvCompilationResult compilationResult = compiler.CompileGlslToSpv(source, GLShaderTypeToShadercShaderType(type), filepath.generic_string().c_str(), options);
        if(compilationResult.GetCompilationStatus() != shaderc_compilation_status_success) {
            EngineLogError(compilationResult.GetErrorMessage());
            __debugbreak();
        }
        resultBinaries[type] = std::vector<uint32_t>(compilationResult.cbegin(), compilationResult.cend());
    }

    return resultBinaries;
}
Then, using spirv_cross::CompilerGLSL, I cross-compile the Vulkan SPIR-V binaries back to GLSL and compile that to OpenGL SPIR-V.
std::unordered_map<uint32_t, std::vector<uint32_t>> Shader::CompileFromVulkanToOpengl(const std::unordered_map<uint32_t, std::vector<uint32_t>>& vulkanBinaries)
{
    std::unordered_map<uint32_t, std::vector<uint32_t>> resultBinaries;

    shaderc::Compiler compiler;
    shaderc::CompileOptions options;
    options.SetTargetEnvironment(shaderc_target_env_opengl_compat, shaderc_env_version_opengl_4_5);

    uint32_t shadersCount = static_cast<uint32_t>(vulkanBinaries.size());
    resultBinaries.reserve(shadersCount);

    for(const auto& [type, binaries] : vulkanBinaries) {
        spirv_cross::CompilerGLSL glsl(binaries);
        std::string source = glsl.compile();

        shaderc::SpvCompilationResult compilationResult = compiler.CompileGlslToSpv(source, GLShaderTypeToShadercShaderType(type), filepath.generic_string().c_str(), options);
        if(compilationResult.GetCompilationStatus() != shaderc_compilation_status_success) {
            EngineLogError(compilationResult.GetErrorMessage());
            __debugbreak();
        }
        resultBinaries[type] = std::vector<uint32_t>(compilationResult.cbegin(), compilationResult.cend());
    }

    return resultBinaries;
}
This CompileFromVulkanToOpengl function fails with the error: error: OpenGL compatibility profile is not supported.
How can I fix it? And why does it work in Hazel 2D but not in my project?
OpenGL Info:
Vendor: NVIDIA Corporation
Renderer: NVIDIA GeForce GTX 1080/PCIe/SSE2
Version: 4.6.0 NVIDIA 516.94
I have been using a sampler2D array in my fragment shader (these are shadow maps; there can be up to 16 of them, so an array is of course preferable to 16 separate variables). Then I added the WebGL2 context (const context = canvas.getContext('webgl2');) to the THREE.WebGLRenderer that I'm using, and now I can't get the program to work: it says array index for samplers must be constant integral expressions when I attempt to access the sampler array elements in a loop like this:
uniform sampler2D samplers[MAX_SPLITS];
for (int i = 0; i < MAX_SPLITS; ++i) {
if (i >= splitCount) {
break;
}
if (func(samplers[i])) { // that's where the error happens
...
}
}
Is there really no way around this? Do I have to use sixteen separate variables?
(there is no direct #version directive in the shader but THREE.js seems to add a #version 300 es by default)
You can't use dynamic indexing with samplers in GLSL ES 3.0. From the spec
12.29 Samplers
Should samplers be allowed as l-values? The specification already allows an equivalent behavior:
Current specification:
uniform sampler2D sampler[8];
int index = f(...);
vec4 tex = texture(sampler[index], xy); // allowed
Using assignment of sampler types:
uniform sampler2D s;
s = g(...);
vec4 tex = texture(s, xy); // not allowed
RESOLUTION: Dynamic indexing of sampler arrays is now prohibited by the specification. Restrict indexing of sampler arrays to constant integral expressions.
and
12.30 Dynamic Indexing
For GLSL ES 1.00, support of dynamic indexing of arrays, vectors and matrices was not mandated
because it was not directly supported by some implementations. Software solutions (via program
transforms) exist for a subset of cases but lead to poor performance. Should support for dynamic indexing
be mandated for GLSL ES 3.00?
RESOLUTION: Mandate support for dynamic indexing of arrays except for sampler arrays, fragment
output arrays and uniform block arrays.
Should support for dynamic indexing of vectors and matrices be mandated in GLSL ES 3.00?
RESOLUTION: Yes.
Indexing of arrays of samplers by constant-index-expressions is supported
in GLSL ES 1.00. A constant index-expression is an expression formed from
constant-expressions and certain loop indices, defined for
a subset of loop constructs. Should this functionality be included in GLSL ES 3.00?
RESOLUTION: No. Arrays of samplers may only be indexed by constant-integral-expressions.
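In other words, under GLSL ES 3.00 only constant integral expressions may index a sampler array; a loop counter or a uniform no longer qualifies. A minimal illustration (idx and v_uv are placeholder names, not from the question):
#version 300 es
precision highp float;
uniform sampler2D samplers[4];
uniform int idx;
in vec2 v_uv;
out vec4 color;
void main() {
    vec4 a = texture(samplers[2], v_uv);      // OK: constant integral expression
    // vec4 b = texture(samplers[idx], v_uv); // rejected: "array index for samplers must be constant integral expressions"
    color = a;
}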
Can you use a 2D_ARRAY texture to solve your issue? Put each of your current 2D textures into a layer of a 2D_ARRAY texture; then the z coordinate is just an integer layer index. Advantage: you can use many more layers with a 2D_ARRAY texture than you get samplers. WebGL2 implementations generally only have 32 samplers but allow hundreds or thousands of layers in a 2D_ARRAY texture.
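A minimal sketch of that approach (shadowMaps, splitCount and v_uv are assumed names, not from your original shader):
#version 300 es
precision highp float;
precision highp sampler2DArray;
uniform sampler2DArray shadowMaps; // one layer per shadow map
uniform int splitCount;
in vec2 v_uv;
out vec4 color;
void main() {
    float sum = 0.0;
    for (int i = 0; i < 16; ++i) {     // 16 = maximum number of splits, as in the question
        if (i >= splitCount) {
            break;
        }
        // dynamic "indexing" is fine here: the layer is just the z coordinate
        sum += texture(shadowMaps, vec3(v_uv, float(i))).r;
    }
    color = vec4(sum);
}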
or use GLSL 1.0
const vs1 = `
void main() { gl_Position = vec4(0); }
`;
const vs3 = `#version 300 es
void main() { gl_Position = vec4(0); }
`;
const fs1 = `
precision highp float;
#define MAX_SPLITS 4
uniform sampler2D samplers[MAX_SPLITS];
uniform int splitCount;
bool func(sampler2D s) {
return texture2D(s, vec2(0)).r > 0.5;
}
void main() {
float v = 0.0;
for (int i = 0; i < MAX_SPLITS; ++i) {
if (i >= splitCount) {
break;
}
if (func(samplers[i])) { // that's where the error happens
v += 1.0;
}
}
gl_FragColor = vec4(v);
}
`;
const fs3 = `#version 300 es
precision highp float;
#define MAX_SPLITS 4
uniform sampler2D samplers[MAX_SPLITS];
uniform int splitCount;
bool func(sampler2D s) {
return texture(s, vec2(0)).r > 0.5;
}
out vec4 color;
void main() {
float v = 0.0;
for (int i = 0; i < MAX_SPLITS; ++i) {
if (i >= splitCount) {
break;
}
if (func(samplers[i])) { // that's where the error happens
v += 1.0;
}
}
color = vec4(v);
}
`;
function main() {
const gl = document.createElement('canvas').getContext('webgl2');
if (!gl) {
return alert('need WebGL2');
}
test('glsl 1.0', vs1, fs1);
test('glsl 3.0', vs3, fs3);
function test(msg, vs, fs) {
const p = twgl.createProgram(gl, [vs, fs]);
log(msg, ':', p ? 'success' : 'fail');
}
}
main();
function log(...args) {
const elem = document.createElement('pre');
elem.textContent = [...args].join(' ');
document.body.appendChild(elem);
}
<script src="https://twgljs.org/dist/4.x/twgl.min.js"></script>
I am trying to do calculations in the fragment shader in WebGL2, and I've noticed that the calculations there are not as precise as in C++. I know that a high-precision float is 32 bits, both in the fragment shader and in C++.
I am trying to compute 1.0000001^(10000000) and get around 2.8 in C++ and around 3.2 in the shader. Do you know the reason the fragment shader calculations are not as precise as the same calculations in C++?
Code in C++:
#include <iostream>

int main()
{
    const float NEAR_ONE = 1.0000001;
    float result = NEAR_ONE;

    for (int i = 0; i < 10000000; i++)
    {
        result = result * NEAR_ONE;
    }

    std::cout << result << std::endl; // result is 2.88419
}
Fragment shader code:
#version 300 es
precision highp float;
out vec4 color;
void main()
{
const float NEAR_ONE = 1.0000001;
float result = NEAR_ONE;
for (int i = 0; i < 10000000; i++)
{
result = result * NEAR_ONE;
}
if ((result > 3.2) && (result < 3.3))
{
// The screen is colored by red and this is how we know
// that the value of result is in between 3.2 and 3.3
color = vec4(1.0, 0.0, 0.0, 1.0); // Red
}
else
{
// We never come here.
color = vec4(0.0, 0.0, 0.0, 1.0); // Black
}
}
Update:
Here one can find the html file with the full code for the WebGL2 example
OpenGL ES 3.0, on which WebGL2 is based, does not require floating point on the GPU to work the same way it does in C++.
From the spec:
2.1.1 Floating-Point Computation
The GL must perform a number of floating-point operations during the course of its operation. In some cases, the representation and/or precision of such operations is defined or limited; by the OpenGL ES Shading Language Specification for operations in shaders, and in some cases implicitly limited by the specified format of vertex, texture, or renderbuffer data consumed by the GL. Otherwise, the representation of such floating-point numbers, and the details of how operations on them are performed, is not specified. We require simply that numbers' floating-point parts contain enough bits and that their exponent fields are large enough so that individual results of floating-point operations are accurate to about 1 part in 10^5. The maximum representable magnitude for all floating-point values must be at least 2^32. x·0 = 0·x = 0 for any non-infinite and non-NaN x. 1·x = x·1 = x. x + 0 = 0 + x = x. 0^0 = 1. (Occasionally further requirements will be specified.)
Most single-precision floating-point formats meet these requirements.
Just for fun, let's do it and print the results, using WebGL1 so it can be tested on more devices:
function main() {
const gl = document.createElement('canvas').getContext('webgl');
const ext = gl.getExtension('OES_texture_float');
if (!ext) { return alert('need OES_texture_float'); }
// not required - long story
gl.getExtension('WEBGL_color_buffer_float');
const fbi = twgl.createFramebufferInfo(gl, [
{ type: gl.FLOAT, minMag: gl.NEAREST, wrap: gl.CLAMP_TO_EDGE, }
], 1, 1);
const vs = `
void main() {
gl_Position = vec4(0, 0, 0, 1);
gl_PointSize = 1.0;
}
`;
const fs = `
precision highp float;
void main() {
const float NEAR_ONE = 1.0000001;
float result = NEAR_ONE;
for (int i = 0; i < 10000000; i++) {
result = result * NEAR_ONE;
}
gl_FragColor = vec4(result);
}
`;
const prg = twgl.createProgram(gl, [vs, fs]);
gl.useProgram(prg);
gl.viewport(0, 0, 1, 1);
gl.drawArrays(gl.POINTS, 0, 1);
const values = new Float32Array(4);
gl.readPixels(0, 0, 1, 1, gl.RGBA, gl.FLOAT, values);
console.log(values[0]);
}
main();
<script src="https://twgljs.org/dist/4.x/twgl.js"></script>
My results:
Intel Iris Pro : 2.884186029434204
NVidia GT 750 M : 3.293879985809326
NVidia GeForce GTX 1060 : 3.2939157485961914
Intel UHD Graphics 617 : 3.292219638824464
The difference is precision. In fact, if you run the C++ fragment accumulating the result in double (64-bit floating point, with a 53-bit mantissa) instead of float (32-bit floating point, with a 24-bit mantissa), you obtain 3.29397, which is the result you get from the shader.
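As a rough check of where those numbers come from: the nearest 32-bit float to 1.0000001 is $1 + 2^{-23} \approx 1.00000011920929$, so the exact value of the product being computed is about
$$\left(1 + 2^{-23}\right)^{10^{7}} = e^{10^{7}\,\ln\left(1 + 2^{-23}\right)} \approx e^{1.19209} \approx 3.294,$$
which matches the GPU results above; the all-float C++ loop lands lower (about 2.884) because it also accumulates a small rounding error on each of the ten million multiplications.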
We have a GLSL fragment shader, but the problem is in this code:
vec4 TFSelection(StrVolumeColorMap volumeColorMap, vec4 textureCoordinate)
{
    vec4 finalColor = vec4(0.0);

    if(volumeColorMap.TransferFunctions[0].numberOfBits == 0)
    {
        return texture(volumeColorMap.TransferFunctions[0].TransferFunctionID, textureCoordinate.x);
    }

    if(textureCoordinate.x == 0)
        return finalColor;

    float deNormalize = textureCoordinate.x * 65535/*255*/;

    for(int i = 0; i < volumeColorMap.TransferFunctions.length(); i++)
    {
        int NormFactor = volumeColorMap.TransferFunctions[i].startBit + volumeColorMap.TransferFunctions[i].numberOfBits;
        float minval = CalculatePower(2, volumeColorMap.TransferFunctions[i].startBit);
        if(deNormalize >= minval)
        {
            float maxval = CalculatePower(2, NormFactor);
            if(deNormalize < maxval)
            {
                //float tempPower = CalculatePower(2, NormFactor);
                float coord = deNormalize / maxval/*tempPower*/;
                return texture(volumeColorMap.TransferFunctions[i].TransferFunctionID, coord);
            }
        }
    }

    return finalColor;
}
When we compile and link the shader, this message is logged:
Sampler needs to be a uniform (global or parameter to main), need to
inline function or resolve conditional expression
With a simple change the shader sometimes links successfully, for example changing
float coord = deNormalize / maxval;
to
float coord = deNormalize;
Driver: NVIDIA 320.49
I have implemented OIT based on the demo in the "OpenGL Programming Guide", 8th edition (the red book). Now I need to add MSAA. Just enabling MSAA screws up the transparency, as the layered pixels are resolved a number of times equal to the number of sample levels. I have read this article on how it is done with DirectX, where they say the pixel shader should be run per sample and not per pixel. How is it done in OpenGL?
I won't post the whole implementation here, just the fragment shader chunk in which the final resolution of the layered pixels occurs:
vec4 final_color = vec4(0,0,0,0);
for (i = 0; i < fragment_count; i++)
{
/// Retrieving the next fragment from the stack:
vec4 modulator = unpackUnorm4x8(fragment_list[i].y) ;
/// Perform alpha blending:
final_color = mix(final_color, modulator, modulator.a);
}
color = final_color ;
Update:
I have tried the solution proposed here but it still doesn't work. Here are the full fragment shaders for the list build and resolve passes:
List build pass :
#version 420 core
layout (early_fragment_tests) in;
layout (binding = 0, r32ui) uniform uimage2D head_pointer_image;
layout (binding = 1, rgba32ui) uniform writeonly uimageBuffer list_buffer;
layout (binding = 0, offset = 0) uniform atomic_uint list_counter;
layout (location = 0) out vec4 color;//dummy output
in vec3 frag_position;
in vec3 frag_normal;
in vec4 surface_color;
in int gl_SampleMaskIn[];
uniform vec3 light_position = vec3(40.0, 20.0, 100.0);
void main(void)
{
uint index;
uint old_head;
uvec4 item;
vec4 frag_color;
index = atomicCounterIncrement(list_counter);
old_head = imageAtomicExchange(head_pointer_image, ivec2(gl_FragCoord.xy), uint(index));
vec4 modulator =surface_color;
item.x = old_head;
item.y = packUnorm4x8(modulator);
item.z = floatBitsToUint(gl_FragCoord.z);
item.w = int(gl_SampleMaskIn[0]);
imageStore(list_buffer, int(index), item);
frag_color = modulator;
color = frag_color;
}
List resolve :
#version 420 core
// The per-pixel image containing the head pointers
layout (binding = 0, r32ui) uniform uimage2D head_pointer_image;
// Buffer containing linked lists of fragments
layout (binding = 1, rgba32ui) uniform uimageBuffer list_buffer;
// This is the output color
layout (location = 0) out vec4 color;
// This is the maximum number of overlapping fragments allowed
#define MAX_FRAGMENTS 40
// Temporary array used for sorting fragments
uvec4 fragment_list[MAX_FRAGMENTS];
void main(void)
{
    uint current_index;
    uint fragment_count = 0;

    current_index = imageLoad(head_pointer_image, ivec2(gl_FragCoord).xy).x;

    while (current_index != 0 && fragment_count < MAX_FRAGMENTS )
    {
        uvec4 fragment = imageLoad(list_buffer, int(current_index));
        int coverage = int(fragment.w);
        //if((coverage &(1 << gl_SampleID))!=0) {
        fragment_list[fragment_count] = fragment;
        current_index = fragment.x;
        //}
        fragment_count++;
    }

    uint i, j;

    if (fragment_count > 1)
    {
        for (i = 0; i < fragment_count - 1; i++)
        {
            for (j = i + 1; j < fragment_count; j++)
            {
                uvec4 fragment1 = fragment_list[i];
                uvec4 fragment2 = fragment_list[j];

                float depth1 = uintBitsToFloat(fragment1.z);
                float depth2 = uintBitsToFloat(fragment2.z);

                if (depth1 < depth2)
                {
                    fragment_list[i] = fragment2;
                    fragment_list[j] = fragment1;
                }
            }
        }
    }

    vec4 final_color = vec4(0,0,0,0);

    for (i = 0; i < fragment_count; i++)
    {
        vec4 modulator = unpackUnorm4x8(fragment_list[i].y);
        final_color = mix(final_color, modulator, modulator.a);
    }

    color = final_color;
}
Without knowing how your code actually works, you can do it very much the same way that your linked DX11 demo does, since OpenGL provides the same features needed.
So in the first shader that just stores all the rendered fragments, you also store the sample coverage mask for each fragment (along with the color and depth, of course). This is given as the fragment shader input variable int gl_SampleMaskIn[] and for each sample with id 32*i+j, bit j of gl_SampleMaskIn[i] is set if the fragment covers that sample (since you probably won't use >32x MSAA, you can usually just use gl_SampleMaskIn[0] and only need to store a single int as the coverage mask).
...
fragment.color = inColor;
fragment.depth = gl_FragCoord.z;
fragment.coverage = gl_SampleMaskIn[0];
...
Then the final sort and render shader is run for each sample instead of just for each fragment. This is achieved implicitly by making use of the input variable int gl_SampleID, which gives us the ID of the current sample. So what we do in this shader (in addition to the non-MSAA version) is that the sorting step just accounts for the sample, by only adding a fragment to the final (to be sorted) fragment list if the current sample is actually covered by this fragment:
What was something like (beware, pseudocode extrapolated from your small snippet and the DX-link):
while(fragment.next != 0xFFFFFFFF)
{
fragment_list[count++] = vec2(fragment.depth, fragment.color);
fragment = fragments[fragment.next];
}
is now
while(fragment.next != 0xFFFFFFFF)
{
if(fragment.coverage & (1 << gl_SampleID))
fragment_list[count++] = vec2(fragment.depth, fragment.color);
fragment = fragments[fragment.next];
}
Or something along those lines.
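If you ever did go beyond 32x MSAA, the test generalizes to checking bit j of word i as described above; a tiny helper along those lines (my own sketch, not from the demo or your code):
bool sampleCovered(int s)
{
    // for sample id s = 32*i + j, test bit j of gl_SampleMaskIn[i]
    return (gl_SampleMaskIn[s >> 5] & (1 << (s & 31))) != 0;
}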
EDIT: Regarding your updated code, you have to increment fragment_count only inside the if(covered) block, since we don't want to add the fragment to the list if the sample is not covered. Incrementing it unconditionally will likely result in the artifacts you see at the edges, which are the regions where the MSAA (and thus the coverage) comes into play.
On the other hand, the list pointer has to be advanced (current_index = fragment.x) in each loop iteration, not only when the sample is covered, as otherwise it can result in an infinite loop, like in your case. So your code should look like:
while (current_index != 0 && fragment_count < MAX_FRAGMENTS )
{
    uvec4 fragment = imageLoad(list_buffer, int(current_index));
    uint coverage = fragment.w;
    if((coverage & (1 << gl_SampleID)) != 0)
        fragment_list[fragment_count++] = fragment;
    current_index = fragment.x;
}
The OpenGL 4.3 spec states in section 7.1, about the gl_SampleID built-in variable:
Any static use of this variable in a fragment shader causes the entire shader to be evaluated per-sample.
(This was already the case with ARB_sample_shading and also applies to gl_SamplePosition or any custom variable declared with the sample qualifier.)
Therefore it is quite automatic, because you will probably need the SampleID anyway.