Slow texture fetch in fragment shader using Vulkan - glsl

I am writing an SSAO shader with a kernel size of 64.
SSAO fragment shader:
const int kernelSize = 64;
for (int i = 0; i < kernelSize; i++) {
    // Get sample position
    vec3 s = tbn * ubo.kernel[i].xyz;
    s = s * radius + origin;

    vec4 offset = vec4(s, 1.0);
    offset = ubo.projection * offset;
    offset.xy /= offset.w;
    offset.xy = offset.xy * 0.5 + 0.5;

    float sampleDepth = texture(samplerposition, offset.xy).z;

    float rangeCheck = abs(origin.z - sampleDepth) < radius ? 1.0 : 0.0;
    occlusion += (sampleDepth >= s.z ? 1.0 : 0.0) * rangeCheck;
}
The samplerposition texture has the format VK_FORMAT_R16G16B16A16_SFLOAT and is uploaded with the flag VK_MEMORY_PROPERTY_DEVICE_LOCAL_BIT.
I'm using a laptop with an NVIDIA K1100M graphics card. If I run the code in RenderDoc, this shader takes 114 ms; if I change kernelSize to 1, it takes 1 ms.
Is this texture fetch time normal, or could I have set something up incorrectly somewhere?
For example, a layout transition that did not go through, leaving the texture in VK_IMAGE_LAYOUT_COLOR_ATTACHMENT_OPTIMAL instead of VK_IMAGE_LAYOUT_SHADER_READ_ONLY_OPTIMAL.

GPU texture reads rely heavily on caching, which helps very little when fragments close to each other do not sample texels that are next to each other - also known as a lack of spatial coherence. I would expect a 10x or greater slowdown for random access into a texture versus linear, coherent access. SSAO is very prone to this when used with large radii.
I recommend using smaller radii and optimizing the texture accesses. You're sampling four 16-bit floats, but you're only using one of them. Blitting the depth to a separate 16-bit, depth-only image should give you an easy 4x speedup.
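For the depth-only route, the only change inside the kernel loop is the fetch itself. A minimal sketch, assuming a hypothetical samplerLinearDepth bound to a VK_FORMAT_R16_SFLOAT image that stores the same view-space Z the position buffer held:

// Hypothetical single-channel depth image; the rest of the loop stays as in the question.
float sampleDepth = texture(samplerLinearDepth, offset.xy).r;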

You are calculating the texture coordinates in the fragment shader, which means you are not allowing the GPU to pre-fetch the textures. It is better to calculate all texture coordinates in the vertex shader and pass them as varyings.
Updated:
I would suggest adding some advanced tricks to the SSAO rather than trying to purely calculate the AO map.
1. You can render a much smaller AO map and upscale it with a blur filter. This will give much better results (a minimal sketch follows this list).
2. If you are doing real-time rendering, the AO map does not need to be recalculated every frame. You can fake it depending on your setup.
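For point 1, a minimal sketch of the upscale step, assuming the AO pass was rendered into a lower-resolution texture (uAOHalf and uTexelSize are illustrative names, not from the question):

uniform sampler2D uAOHalf;   // AO rendered at half or quarter resolution
uniform vec2 uTexelSize;     // 1.0 / resolution of uAOHalf

// Sampled at full resolution: a cheap 4-tap blur hides most of the blockiness.
float upsampledAO(vec2 uv)
{
    float ao = 0.0;
    ao += texture(uAOHalf, uv + uTexelSize * vec2(-0.5, -0.5)).r;
    ao += texture(uAOHalf, uv + uTexelSize * vec2( 0.5, -0.5)).r;
    ao += texture(uAOHalf, uv + uTexelSize * vec2(-0.5,  0.5)).r;
    ao += texture(uAOHalf, uv + uTexelSize * vec2( 0.5,  0.5)).r;
    return ao * 0.25;
}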
Disclaimer: I do a lot of OpenGL ES based shaders, and my knowledge is mostly limited to mobile platforms.

Related

OpenGL GLSL Shadows not working correctly

I'm trying to implement shadow maps in Java/OpenGL with GLSL.
It seems almost impossible to create shadow maps with Java/OpenGL; there is hardly any working example with perspective projection.
I think the matrix calculation isn't working correctly.
Here is my shadow result (camera view/proj = shadow view/proj):
And here I have mapped the linearized depth buffer onto a rectangle; it's a little bit rotated:
It seems like the depth buffer is flipped, because on every surface I have mapped it onto, it is flipped in x and/or y. But maybe it's just a UV bug.
So the major question is: can you give me a hint about what may have happened?
Here are some code snippets:
Final Shader: Depth & Shadow calculation (uSamplerShadow is sampler2D)
float shadowValue = 0.0;
vec4 lightVertexPosition2 = vShadowCoord;
lightVertexPosition2 /= lightVertexPosition2.w;

for (float x = -0.001; x <= 0.001; x += 0.0005)
    for (float y = -0.001; y <= 0.001; y += 0.0005)
    {
        if (texture2D(uSamplerShadow, lightVertexPosition2.xy + vec2(x, y)).r >= lightVertexPosition2.z)
            shadowValue += 1.0;
    }
shadowValue /= 16.0;
float f = 100.0;
float n = 0.1;
float z = (2.0 * n) / (f + n - texture2D(uSamplerShadow, vTexCoords).x * (f - n));
outColor = vec4(vec3(z), 1.0);
Final Shader: Shadow coord calculation (no bias matrix implemented yet)
vShadowCoord = uProjectionMatrix * uShadowViewMatrix * uWorldMatrix * vec4(aPosition,1.0);
Depth Shader
fragmentdepth = gl_FragCoord.z;
You can check my texture properties too, but I have already tried all the combinations I found on Google :)
shadowTextureProperties.setMagFilter(EnumTextureFilter.NEAREST);
shadowTextureProperties.setMinFilter(EnumTextureFilter.NEAREST);
shadowTextureProperties.setWrapS(EnumTextureWrap.CLAMP_TO_EDGE);
shadowTextureProperties.setWrapT(EnumTextureWrap.CLAMP_TO_EDGE);
shadowTextureProperties.setInternalColorFormat(EnumTextureColorFormat.DEPTH_COMPONENT16);
shadowTextureProperties.setSrcColorFormat(EnumTextureColorFormat.DEPTH_COMPONENT);
shadowTextureProperties.setValueFormat(EnumValueFormat.FLOAT);
shadowTextureProperties.setPname(new int[]{GL14.GL_TEXTURE_COMPARE_MODE, GL14.GL_TEXTURE_COMPARE_FUNC});
shadowTextureProperties.setParam(new int[]{GL11.GL_NONE, GL11.GL_LEQUAL});
First thing:
shadowTextureProperties.setPname(new int[]{GL14.GL_TEXTURE_COMPARE_MODE, GL14.GL_TEXTURE_COMPARE_FUNC});
GL_TEXTURE_COMPARE_FUNC is not a valid value for GL_TEXTURE_COMPARE_MODE. According to the reference, only GL_NONE and GL_COMPARE_R_TO_TEXTURE are allowed.
GL_COMPARE_R_TO_TEXTURE
One has to use a shadow sampler (sampler2DShadow) and the corresponding texture overload:
float texture( sampler2DShadow sampler, vec3 P)
Here, the sampler is sampled at location P.xy and the read value is compared to P.z. The result of this operation is a lighting factor (0.0 when completely shadowed, 1.0 when there is no shadow).
GL_NONE
When you want to do the comparison yourself, you have to set GL_TEXTURE_COMPARE_MODE to GL_NONE.
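A minimal sketch of the GL_COMPARE_R_TO_TEXTURE path, assuming GLSL 1.30+ (where the texture overload quoted above exists) and that vShadowCoord has already been through the bias matrix so its xyz land in [0, 1]:

uniform sampler2DShadow uSamplerShadow;   // depth texture with COMPARE_MODE = COMPARE_R_TO_TEXTURE

float shadowFactor(vec4 shadowCoord)      // pass in vShadowCoord
{
    vec3 coord = shadowCoord.xyz / shadowCoord.w;  // perspective divide
    // The hardware compares coord.z against the stored depth (GL_LEQUAL here)
    // and returns 0.0 for fully shadowed, 1.0 for fully lit;
    // LINEAR filtering additionally gives free 2x2 PCF.
    return texture(uSamplerShadow, coord);
}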

GLSL - Using a 2D texture for 3D Perlin noise instead of procedural 3D noise

I implemented a shader for the sun's surface that uses simplex noise from ashima/webgl-noise, but it costs too much GPU time, especially if I'm going to use it on mobile devices. I need to achieve the same effect using a noise texture. My fragment shader is below:
#ifdef GL_ES
precision highp float;
#endif
precision mediump float;

varying vec2 v_texCoord;
varying vec3 v_normal;

uniform sampler2D u_planetDay;
uniform sampler2D u_noise; // noise texture (not used yet)
uniform float u_time;

#include simplex_noise_source from Ashima

float noise(vec3 position, int octaves, float frequency, float persistence) {
    float total = 0.0;        // Total value so far
    float maxAmplitude = 0.0; // Accumulates highest theoretical amplitude
    float amplitude = 1.0;
    for (int i = 0; i < octaves; i++) {
        // Get the noise sample
        total += ((1.0 - abs(snoise(position * frequency))) * 2.0 - 1.0) * amplitude;
        // I use the line below for 2D noise
        total += ((1.0 - abs(snoise(position.xy * frequency))) * 2.0 - 1.0) * amplitude;
        // Make the wavelength twice as small
        frequency *= 2.0;
        // Add to our maximum possible amplitude
        maxAmplitude += amplitude;
        // Reduce amplitude according to persistence for the next octave
        amplitude *= persistence;
    }
    // Scale the result by the maximum amplitude
    return total / maxAmplitude;
}

void main()
{
    vec3 position = v_normal * 2.5 + vec3(u_time, u_time, u_time);
    float n1 = noise(position.xyz, 2, 7.7, 0.75) * 0.001;
    vec3 ground = texture2D(u_planetDay, v_texCoord + n1).rgb;
    gl_FragColor = vec4(ground, 1.0);
}
How can I adapt this shader to work with a noise texture, and what should the texture look like?
As far as I know, OpenGL ES 2.0 doesn't support 3D textures. Moreover, I don't know how to create a 3D texture.
I wrote this function to get 3D noise from a 2D texture. It still uses hardware interpolation for the x/y directions and then manually interpolates for z. To get noise along the z direction, I sample the same texture at different offsets. This will probably lead to some repetition, but I haven't noticed any in my application, and my guess is that using primes helps.
The thing that had me stumped for a while on shadertoy.com was that texture mipmapping was enabled, which caused seams at changes in the value of the floor() function. A quick solution was passing a -999 bias to texture2D.
This is hard-coded for a 256x256 noise texture, so adjust accordingly.
float noise3D(vec3 p)
{
    p.z = fract(p.z) * 256.0;
    float iz = floor(p.z);
    float fz = fract(p.z);
    // Two "slices" of the 2D texture, shifted by prime-based offsets per integer z
    vec2 a_off = vec2(23.0, 29.0) * (iz) / 256.0;
    vec2 b_off = vec2(23.0, 29.0) * (iz + 1.0) / 256.0;
    // The -999.0 bias forces the base mip level and avoids seams from mipmapping
    float a = texture2D(iChannel0, p.xy + a_off, -999.0).r;
    float b = texture2D(iChannel0, p.xy + b_off, -999.0).r;
    // Manual interpolation along z
    return mix(a, b, fz);
}
Update: To extend this to Perlin noise, sum samples at different frequencies:
float perlinNoise3D(vec3 p)
{
    float x = 0.0;
    for (float i = 0.0; i < 6.0; i += 1.0)
        x += noise3D(p * pow(2.0, i)) * pow(0.5, i);
    return x;
}
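A hypothetical way to drop this into the question's main(), replacing the snoise-based noise() call. It assumes iChannel0 inside noise3D is renamed to the question's u_noise sampler (bound to a 256x256 tiling noise texture with REPEAT wrapping); the 0.1 frequency scale is only an illustrative tuning value:

vec3 position = v_normal * 2.5 + vec3(u_time);
// Texture-based fractal noise replaces the procedural snoise() octaves
float n1 = perlinNoise3D(position * 0.1) * 0.001;
vec3 ground = texture2D(u_planetDay, v_texCoord + vec2(n1)).rgb;
gl_FragColor = vec4(ground, 1.0);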
Evaluating noise at run-time is usually bad practice unless you are doing research work or want to quickly check/debug your noise function (or see what your noise parameters look like visually).
It will always consume too much of your processing budget (not worth it at all), so just forget about evaluating noise at run-time.
If you store your noise results offline, you reduce the cost (by, say, over 95%) to a simple memory access.
I suggest reducing all of this to a texture lookup into a pre-baked 2D noise image. So far you are only impacting the fragment pipeline, so a 2D noise texture is definitely the way to go (you can also use this 2D lookup for vertex position deformation).
In order to map it onto a sphere without any continuity issue, you may generate a loopable 2D image with 4D noise, feeding the function with the coordinates of two 2D circles.
As for animating it, there are various hackish tricks: either deform your lookup results with the time semantic in the fragment pipeline, or bake an image sequence in case you really need noise "animated with noise".
3D textures are just stacks of 2D textures, so they are too heavy to manipulate (even without animation) for what you want to do; since you apparently only need a decent sun surface, they would be overkill.
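As an illustration of the "time semantic" trick, a minimal sketch that animates a pre-baked, tileable 2D noise texture by scrolling two octaves in different directions (u_noise and u_time are the question's uniforms; the scroll speeds and weights are illustrative and assume REPEAT wrapping):

float bakedNoise(vec2 uv, float time)
{
    // Two scrolled lookups into the same pre-baked image; REPEAT wrapping hides the seams.
    float n = 0.0;
    n += texture2D(u_noise, uv * 1.0 + vec2( time * 0.05, 0.0)).r * 0.6667;
    n += texture2D(u_noise, uv * 2.0 + vec2(-time * 0.03, time * 0.02)).r * 0.3333;
    return n * 2.0 - 1.0;   // remap roughly to [-1, 1]
}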

The cost of texture2D in glsl

I'm quite confused about this function.
In GLSL code, I always see something like this:
uniform sampler2D source;
varying vec2 textureCordi;

void main()
{
    vec2 uv = textureCordi.xy;
    vec3 t1 = texture2D(source, vec2(uv.x - step_w, uv.y - step_h)).rgb; //2
    float average = (t1.r + t1.b + t1.g) / 3.0;
    //.....
}
At //2, t1 saves data from source (I think it is data), but how much data does it copy?
Texture coordinates are between 0 and 1. Assume the texture is an image whose size is 1024 x 768; would t1 then save 1024 x 768 pixels?
What does the GPU do for this instruction?
If t1 does a heavy copy job, could I ask texture2D to return a reference to source (like in C++)?
The operation looks up exactly one texel, the one that corresponds to the texture coordinate you passed in (taking filtering, mipmaps, etc. into account). The texture coordinate is the normalized coordinate of the texel you want to fetch.
Edit: t1 holds the RGB value of the one texel the operation requested (as a vector with three components). The normalized texture coordinate is the input to texture2D. The following line calculates the average intensity of the three channels of that one texel, not the average of the whole texture.
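To make that concrete, a small sketch using the names from the question (step_w/step_h are assumed to be one-texel offsets): each texture2D call fetches exactly one filtered texel, so averaging even a tiny neighbourhood takes one call per texel:

vec3 centre = texture2D(source, uv).rgb;                       // one texel
vec3 left   = texture2D(source, uv - vec2(step_w, 0.0)).rgb;   // one more texel
vec3 right  = texture2D(source, uv + vec2(step_w, 0.0)).rgb;   // one more texel
// Average intensity over those three texels (nine channel values in total)
float average = (centre.r + centre.g + centre.b +
                 left.r   + left.g   + left.b   +
                 right.r  + right.g  + right.b) / 9.0;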
These are per-fragment operations, and for each fragment, texture2D samples a single texel. Many parallel operations run on the GPU for the whole primitive, and in some sense, yes, all the data ends up "stored" in some output buffer, but each main() runs only for the current fragment. It is unaware of what is going on with other fragments, so every operation is per-fragment.
To clarify this further, these links should help (they might be overkill):
GPGPU - http://www.mathematik.uni-dortmund.de/~goeddeke/gpgpu/tutorial.html
Fragment shader pipeline - http://www.lighthouse3d.com/tutorials/glsl-tutorial/fragment-processor/
Hope this helps.

Is a position buffer required for deferred rendering?

I'm trying to avoid the use of a position buffer by projecting screen-space points back into view space for lighting. I have tried multiplying by the inverse projection matrix, but this does not give back the view-space point. Is the added matrix multiplication worth it to avoid the position buffer?
Final-pass Shader:
vec3 ScreenSpace = vec3(0.0, 0.0, 0.0);
ScreenSpace.xy = (texcoord.xy * 2.0) - 1.0;
ScreenSpace.z = texture2D(depthtex, texcoord.xy).r;
vec4 ViewSpace = InvProjectionMatrix * vec4(ScreenSpace, 1.0);
ViewSpace.xyz /= ViewSpace.w;
Most of your answer can be found in this answer, which is far too long and involved to repost. However, part of your problem is that you're using texcoord and not gl_FragCoord.
You want to use gl_FragCoord because OpenGL guarantees it to be the right value (assuming your deferred pass and your lighting pass use images of the same size), no matter what. It also keeps you from having to pass a value from the vertex shader to the fragment shader.
The downside is that you need the size of the output screen to interpret it. But that's easy enough, assuming again that the two passes use images of the same size:
ivec2 size = textureSize(depthtex, 0);
You can use size as the viewport size to convert gl_FragCoord.xy into texture coordinates and window-space positions.
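Putting it together, a minimal sketch of the reconstruction using gl_FragCoord (depthtex and InvProjectionMatrix are the question's names; this assumes the depth texture stores the standard non-linear depth in [0, 1]):

ivec2 size    = textureSize(depthtex, 0);
vec2  uv      = gl_FragCoord.xy / vec2(size);           // window space -> [0, 1] texture coords
float depth   = texture(depthtex, uv).r;                // stored depth in [0, 1]
vec4  ndc     = vec4(vec3(uv, depth) * 2.0 - 1.0, 1.0); // -> normalized device coordinates
vec4  view    = InvProjectionMatrix * ndc;
vec3  viewPos = view.xyz / view.w;                      // perspective divide gives view space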

OpenGL GLSL SSAO Implementation

I am trying to implement Screen Space Ambient Occlusion (SSAO) based on the R5 demo found here: http://blog.nextrevision.com/?p=76
In fact, I am trying to adapt their SSAO - Linear shader to fit into my own little engine.
1) I calculate view-space surface normals and linear depth values.
I store them in an RGBA texture using the following shader:
Vertex:
varNormalVS = normalize(vec3(vmtInvTranspMatrix * vertexNormal));
depth = (modelViewMatrix * vertexPosition).z;
depth = (-depth - nearPlane) / (farPlane - nearPlane);
gl_Position = pvmtMatrix * vertexPosition;
Fragment:
gl_FragColor = vec4(varNormalVS.x, varNormalVS.y, varNormalVS.z, depth);
For my linear depth calculation I referred to: http://www.gamerendering.com/2008/09/28/linear-depth-texture/
Is it correct?
The texture seems to be correct, but maybe it is not?
2) The actual SSAO Implementation:
As mentioned above the original can be found here: http://blog.nextrevision.com/?p=76
or faster: on pastebin http://pastebin.com/KaGEYexK
In contrast to the original, I only use two input textures, since one of my textures stores both the normals as RGB and the linear depth as alpha.
My second Texture, the random normal texture, looks like this:
http://www.gamerendering.com/wp-content/uploads/noise.png
I use almost exactly the same implementation, but my results are wrong.
Before going into detail, I want to clear up some questions first:
1) The SSAO shader uses the projectionMatrix and its inverse matrix.
Since it is a post-processing effect rendered onto a screen-aligned quad via orthographic projection, the projectionMatrix is the orthographic matrix. Correct or wrong?
2) Having a combined normal and depth texture instead of two separate ones.
In my opinion this is the biggest difference between the R5 implementation and my implementation attempt. I think this should not be a big problem; however, because of the different depth textures, this is the most likely cause of problems.
Please note that R5_clipRange looks like this
vec4 R5_clipRange = vec4(nearPlane, farPlane, nearPlane * farPlane, farPlane - nearPlane);
Original:
float GetDistance (in vec2 texCoord)
{
    //return texture2D(R5_texture0, texCoord).r * R5_clipRange.w;
    const vec4 bitSh = vec4(1.0 / 16777216.0, 1.0 / 65535.0, 1.0 / 256.0, 1.0);
    return dot(texture2D(R5_texture0, texCoord), bitSh) * R5_clipRange.w;
}
I have to admit I do not understand this code snippet. My depth is stored in the alpha of my texture, and I thought it should be enough to just do this:
return texture2D(texSampler0, texCoord).a * R5_clipRange.w;
Correct or Wrong?
Your normal texture seems wrong. My guess is that your vmtInvTranspMatrix is a model-view matrix; however, it should be a model-view-projection matrix (note that you need screen-space normals, not view-space normals). The depth calculation is correct.
I've implemented SSAO once, and the normal texture looked like this (note there is no blue here):
1) The SSAO shader uses the projectionMatrix and its inverse matrix.
Since it is a post-processing effect rendered onto a screen-aligned quad via orthographic projection, the projectionMatrix is the orthographic matrix. Correct or wrong?
If you mean the second pass, where you render a quad to compute the actual SSAO: yes. You can avoid the multiplication by the orthographic projection matrix altogether. If you render a screen quad with [x, y] coordinates ranging from -1 to 1, you can use a really simple vertex shader:
const vec2 madd = vec2(0.5, 0.5);
void main(void)
{
    gl_Position = vec4(in_Position, -1.0, 1.0);
    texcoord = in_Position.xy * madd + madd;
}
2) Having a combined normal and depth texture instead of two separate ones.
Nah, that won't cause problems. It's a common practice to do so.
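For completeness, a minimal sketch of reading the combined texture in the SSAO pass, matching the packing from the question's first pass (texSampler0 and R5_clipRange are the question's names):

vec4  nd       = texture2D(texSampler0, texCoord);
vec3  normalVS = nd.xyz;                 // stored unencoded, exactly as written in the first pass
float dist     = nd.a * R5_clipRange.w;  // same as the adapted GetDistance() above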