My GLSL code:
ivec2 readCoord = ivec2(gl_FragCoord);
readCoord.x += int(sin(time) * 100);
vec2 c = imageLoad(image, readCoord).rg;
memoryBarrier();
imageStore(image, ivec2(gl_FragCoord), vec4(c, 0, 0));
time is a uniform of type float.
A frame from the animation:
And here's how it's supposed to look:
Any idea what's going on? :)
Note: In this demo they do the same thing as I do, and they don't even use memoryBarrier(). The same goes for this example.
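For what it's worth, reading neighbouring texels from the same image you are writing to creates a read-after-write hazard between fragments, and memoryBarrier() does not order stores across invocations. A common workaround (a sketch only, not necessarily what those demos do; the image names and rg32f format are assumptions) is to ping-pong between two images:

```glsl
// Hypothetical two-image variant: read only from last frame's image, write
// only to the other one, and swap the two bindings on the CPU each frame.
layout(binding = 0, rg32f) readonly  uniform image2D srcImage;
layout(binding = 1, rg32f) writeonly uniform image2D dstImage;

uniform float time;

void main()
{
    ivec2 readCoord = ivec2(gl_FragCoord);
    readCoord.x += int(sin(time) * 100.0);
    vec2 c = imageLoad(srcImage, readCoord).rg; // never reads dstImage
    imageStore(dstImage, ivec2(gl_FragCoord), vec4(c, 0.0, 0.0));
}
```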
I am trying to calculate a morph offset for a GPU-driven animation.
To that effect I have the following function (and SSBOs):
layout(std140, binding = 7) buffer morph_buffer
{
    vec4 morph_targets[];
};

layout(std140, binding = 8) buffer morph_weight_buffer
{
    float morph_weights[];
};

vec3 GetMorphOffset()
{
    vec3 offset = vec3(0);
    for (int target_index = 0; target_index < target_count; target_index++)
    {
        float w1 = morph_weights[1];
        offset += w1 * morph_targets[target_index * vertex_count + gl_VertexIndex].xyz;
    }
    return offset;
}
I am seeing strange behaviour, so I opened RenderDoc to trace the state:
As you can see, index 1 of the morph_weights SSBO is 0. However, if I step over in RenderDoc's built-in debugger I obtain:
Or in short, the variable I get back is 1, not 0.
So I did a little experiment and changed one of the values and now the SSBO looks like this:
And now I get this:
So my SSBO of floats is seemingly being treated like an SSBO of vec4s. I am aware of the alignment issues with vec3s, but IIRC floats are fair game. What is happening?
After asking around a bit, I found the issue: the SSBO is marked std140, but the correct layout for a tightly packed float array is std430.
For the Vulkan GLSL dialect, an alternative is to use the scalar layout qualifier.
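Concretely, the fix is just a change of layout qualifier on the weight buffer (a sketch of the corrected declarations):

```glsl
// std430 packs a float[] tightly (4-byte stride). Under std140, each array
// element is rounded up to a 16-byte stride, which is why every weight
// appeared to occupy a whole vec4 slot.
layout(std430, binding = 8) buffer morph_weight_buffer
{
    float morph_weights[];
};

// Vulkan GLSL alternative (requires the GL_EXT_scalar_block_layout extension):
// #extension GL_EXT_scalar_block_layout : require
// layout(scalar, binding = 8) buffer morph_weight_buffer
// {
//     float morph_weights[];
// };
```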
I'm working on a WebGL project to create isolines on a 3D surface, developed on macOS with an AMD GPU. My idea is to colour the pixels based on elevation in the fragment shader. With some optimizations I can achieve a relatively consistent line width, and I am happy about that. However, when I tested it on Windows it behaved differently.
I figured out it's because of fwidth(). I use fwidth() to prevent the fragment shader from colouring a whole horizontal plane when it happens to lie exactly at an isolevel. Please see the screenshot:
I solved this issue by adding the following GLSL line:
if (fwidth(vPositionZ) < 0.001) { /* then do not colour the isoline on these pixels */ }
It works very well on macOS, since I got this:
However, on Windows with an NVIDIA GPU, all isolines are gone because fwidth(vPositionZ) always evaluates to 0.0, which doesn't make sense to me.
What am I doing wrong? Is there a better way to solve the issue shown in the first screenshot? Thank you all!
EDIT:
Here I attach my fragment shader. It's simplified, but I think it covers everything relevant. I know the loop is slow, but I'm not worried about that for now.
uniform float zmin;       // min elevation
uniform vec3 lineColor;
varying float vPositionZ; // elevation value for each vertex

float interval;
vec3 originColor = finalColor.rgb; // original surface color

for ( int i = 0; i < COUNT; i ++ ) {
    float elevation = zmin + float( i + 1 ) * interval;
    // a uniform cannot be assigned to, so blend into a local copy instead
    vec3 color = mix( originColor, lineColor, step( 0.001, fwidth( vPositionZ ) ) );
    if ( vPositionZ <= elevation + lineWidth && vPositionZ >= elevation - lineWidth ) {
        finalColor.rgb = color;
    }
    // same thing but without the condition:
    // finalColor.rgb = mix( mix( originColor, color, step( elevation - lineWidth, vPositionZ ) ),
    //                       originColor,
    //                       step( elevation + lineWidth, vPositionZ ) );
}
gl_FragColor = finalColor;
Environment: WebGL 2.0, GLSL ES 3.00, Chrome browser.
Moving fwidth(vPositionZ) out of the loop makes it work. Inside the loop, fwidth() evaluated everything to 0.
I suspected a bug in the NVIDIA driver, though note that the GLSL spec only defines derivatives in uniform control flow, so different results inside a loop may actually be conformant behaviour.
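A sketch of the workaround, hoisting the derivative out of the loop (lineWidth, COUNT, interval, and finalColor are assumed to exist as in the original shader):

```glsl
// Compute the derivative once, in uniform control flow, then reuse the
// resulting colour inside the loop.
float zWidth = fwidth( vPositionZ );  // well-defined here, outside the loop
vec3 color = mix( originColor, lineColor, step( 0.001, zWidth ) );

for ( int i = 0; i < COUNT; i ++ ) {
    float elevation = zmin + float( i + 1 ) * interval;
    if ( vPositionZ <= elevation + lineWidth && vPositionZ >= elevation - lineWidth ) {
        finalColor.rgb = color;
    }
}
```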
While upgrading our source code from OpenInventor 9.8 to 10.4.2, I found that a color computed in the fragment shader is always black in 10.4.2, while in 9.8 everything works fine. Normally we use our own computed texture, but for debugging purposes I used an example texture from the OpenInventor examples:
SoSwitch* root = new SoSwitch;
// Fill root with geometry
root->addChild(...);

SoSeparator* localRoot = new SoSeparator;
SoFragmentShader* fragShader = new SoFragmentShader;
SoShaderProgram* shaderProgram = new SoShaderProgram;

SoTexture2* texture = new SoTexture2;
texture->filename = "pathToImage\\image.png";

SoTextureUnit* textureUnit = new SoTextureUnit;
textureUnit->unit = 1;

localRoot->addChild(textureUnit);
localRoot->addChild(texture);

fragShader->addShaderParameter1i("myTexture", 1);
shaderProgram->shaderObject.set1Value(1, fragShader);

root->addChild(localRoot);
root->addChild(shaderProgram);
This is the fragment shader, which works fine with 9.8:
#version 120
uniform sampler2D myTexture;
varying vec3 coord; // Computed in vertex-shader
void main() {
    gl_FragColor = texture2D(myTexture, coord.xy);
    // For Debugging:
    // gl_FragColor = vec4(coord.xy, 0, 1);
}
This is the fragment shader, which does not work with 10.4.2:
#version 410 compatibility
//!oiv_include <Inventor/oivShaderState.h>
//!oiv_include <Inventor/oivShapeAttribute.h>
//!oiv_include <Inventor/oivShaderVariables.h>
uniform sampler2D myTexture;
in vec3 coord;
void main() {
    OivFragmentOutput(texture(myTexture, coord.xy)); // Is the same as gl_FragColor =
    // For Debugging:
    // gl_FragColor = vec4(coord.xy, 0, 1);
}
The viewer stays completely black, so I assume the call to texture() always returns zero.
Uncommenting gl_FragColor = vec4(coord.xy, 0, 1); gives the same result, therefore I assume that coord is computed correctly.
As we are jumping from #version 120 to #version 410, I could imagine that I need to do something else to get texture() to work in our fragment shader. Were there any relevant changes in GLSL? What do I need to do to get the shader working?
If relevant, here are some system information:
Operating System: Windows 10
Graphics board: NVIDIA Quadro K2200
The issue here is in your scene graph: both the texture and textureUnit nodes are under the SoSeparator localRoot, so they are not visible to the shaderProgram, which sits outside it. Please move these nodes out of localRoot and add them as children of the root node to render correctly.
It was working for you with Open Inventor 9.8 because of a bug that has been fixed since Open Inventor 10. Hope this helps, and let us know if the issue is resolved for you.
In the future, please feel free to contact Open Inventor support (FRBOR.3d_hotline#thermofisher.com) with your questions.
Via mail the Open Inventor Support suggested another solution which is also working:
Replace
SoSeparator* localRoot = new SoSeparator;
with
SoGroup* localRoot = new SoGroup;
I have a uniform buffer like this (GLSL/GPU):
layout(std140) uniform UConstantBufferPS1
{
    float m_LuminanceHistory[8];
};
I upload my data like this (C++/CPU):
SHistoryBuffer* pHistogramHistory = static_cast<SHistoryBuffer*>(Gfx::BufferManager::MapConstantBuffer(m_BufferSetPtr->GetBuffer(1)));
pHistogramHistory->m_LuminanceHistory[0] = 1.0f;
pHistogramHistory->m_LuminanceHistory[1] = 1.0f;
pHistogramHistory->m_LuminanceHistory[2] = 1.0f;
pHistogramHistory->m_LuminanceHistory[3] = 1.0f;
pHistogramHistory->m_LuminanceHistory[4] = 1.0f;
// ...
Gfx::BufferManager::UnmapConstantBuffer(m_BufferSetPtr->GetBuffer(1));
On the GLSL side everything is 0 except the first and second float values (m_LuminanceHistory[0] and m_LuminanceHistory[1]). It seems to be packed in a certain way!?
One bad solution is to define an array of float vectors (vec4) on the CPU and GPU. Then I can iterate over this array and read the x-value of each element. But that gives me a big overhead.
Is there any good solution? Thanks for your help!
EDIT:
I used the following solution:
layout(std140) uniform UConstantBufferPS1
{
    vec4 m_LuminanceHistory[2];
};

float History[8];
History[0] = m_LuminanceHistory[0].x;
History[1] = m_LuminanceHistory[0].y;
History[2] = m_LuminanceHistory[0].z;
History[3] = m_LuminanceHistory[0].w;
History[4] = m_LuminanceHistory[1].x;
History[5] = m_LuminanceHistory[1].y;
History[6] = m_LuminanceHistory[1].z;
History[7] = m_LuminanceHistory[1].w;
This solution works as expected, but I don't know why I can't use float[8] directly.
Without having much detail on your Gfx::BufferManager, here are a couple of possibilities that might help you debug:
Try commenting out the layout(std140).
Double-check that you have called glBindBufferBase() to bind the buffer object to the CORRECT binding point (by default it starts at 0 if you didn't specify one when declaring UConstantBufferPS1 in the shader).
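For reference, this is the std140 array rule at work: the stride of a float[] in a std140 block is rounded up to 16 bytes, so the shader reads each element from its own 16-byte slot while the CPU wrote tightly packed 4-byte floats. A sketch of where each value ends up (the offsets follow the std140 rules; the CPU-side struct is a hypothetical illustration):

```glsl
layout(std140) uniform UConstantBufferPS1
{
    // std140: array stride = round_up(sizeof(float), 16) = 16 bytes,
    // so m_LuminanceHistory[i] is read from byte offset i * 16.
    float m_LuminanceHistory[8]; // block occupies 8 * 16 = 128 bytes
};

// A tightly packed CPU-side float[8] (32 bytes) only lines up at element 0;
// the 1.0f written at CPU byte offset 16 happens to land on GPU element 1,
// which is why exactly the first two values looked correct.
// A matching CPU-side layout (hypothetical) would be:
//     struct SPaddedFloat { float value; float pad[3]; }; // 16 bytes
//     SPaddedFloat m_LuminanceHistory[8];
```

This also explains why the vec4 m_LuminanceHistory[2] workaround behaves: a vec4 already has a 16-byte stride, so CPU and GPU layouts agree.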
I am writing to some 3D textures that are defined as follows:
layout (binding = 0, rgba8) coherent uniform image3D volumeGeom[MIPLEVELS];
layout (binding = 4, rgba8) coherent uniform image3D volumeNormal[MIPLEVELS];
and I am writing the following values:
float df = dot(normalize(In.WorldNormal), -gSpotLight.Direction);
if (df < 0.0) df = 0.0;
else df = 1.0;
fragmentColor = texture2D(gSampler, In.TexCoord0.xy);
imageStore(volumeGeom[In.GeomInstance], ivec3(coords1), fragmentColor);
imageStore(volumeNormal[In.GeomInstance], ivec3(coords1), vec4(normalize(In.WorldNormal), 1.0));
fragmentColor = vec4(fragmentColor.xyz * CalcShadowFactor(In.LightSpacePos) * df, 1.0);
As you can see, I store the first fragmentColor, and I purposely compute the second fragmentColor after the store even though I don't do anything with it. This configuration runs at 21 FPS. If I instead do this:
float df = dot(normalize(In.WorldNormal), -gSpotLight.Direction);
if (df < 0.0) df = 0.0;
else df = 1.0;
fragmentColor = texture2D(gSampler, In.TexCoord0.xy);
fragmentColor = vec4(fragmentColor.xyz * CalcShadowFactor(In.LightSpacePos) * df, 1.0);
imageStore(volumeGeom[In.GeomInstance], ivec3(coords1), fragmentColor);
imageStore(volumeNormal[In.GeomInstance], ivec3(coords1), vec4(normalize(In.WorldNormal), 1.0));
in which the second fragmentColor is computed before the stores and written to volumeGeom, the whole thing runs at 13 FPS. So imageStore apparently runs slower depending on the values being written. Is that because of some compiler optimization?
Basically, the second fragmentColor is 0 for surfaces that are in shadow or that face away from the light.
Not sure how complicated your CalcShadowFactor method is, but it is likely that in the first version the image stores can overlap with your calculation, while in the second version the calculation has to finish before the stores can be issued. This, plus the increased register pressure, can already be enough to make version #2 much slower than version #1.
If this is the complete source code, it is also likely that in case #1 the call to CalcShadowFactor is optimized out entirely, as its result is never used (fragmentColor is never read again and is not an output).