I am working on a 2d project and I noticed the following issue:
As you can see in the gif above, when the object is making small movements, its vertices jitter.
To render, every frame I clear the VBO, calculate the new vertex positions, and upload them to the VBO. Every frame I build the exact same structure, just from a different origin.
Is there a way to get smooth motion even when the displacement between each frame is so minor?
I am using SDL2 so double buffering is enabled by default.
This is a minor issue, but it becomes very annoying once I apply a texture to the model.
Here is the vertex shader I am using:
#version 330 core
layout (location = 0) in vec2 in_position;
layout (location = 1) in vec2 in_uv;
layout (location = 2) in vec3 in_color;
uniform vec2 camera_position, camera_size;
void main() {
gl_Position = vec4(2 * (in_position - camera_position) / camera_size, 0.0f, 1.0f);
}
What you see is caused by the rasterization algorithm. Consider the following two rasterizations of the same geometry (red lines) offset by only half a pixel:
As can be seen, shifting by just half a pixel can change the perceived spacing between the vertical lines from three pixels to two pixels. Moreover, the horizontal lines didn't shift, therefore their appearance didn't change.
This inconsistent behavior is what manifests as "wobble" in your animation.
One way to solve this is to enable anti-aliasing with glEnable(GL_LINE_SMOOTH). Make sure to have correct blending enabled. This will, however, result in blurred lines when they fall right between the pixels.
If instead you really need the crisp, jagged-line look (e.g. pixel art), then you need to make sure that your geometry only ever moves by an integer number of pixels:
vec2 scale = 2/camera_size;
vec2 offset = -scale*camera_position;
vec2 pixel_size = 2/viewport_size;
offset = round(offset/pixel_size)*pixel_size; // snap to pixels
gl_Position = vec4(scale*in_position + offset, 0.0f, 1.0f);
Add viewport_size as a uniform.
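If it helps to sanity-check the snapping, here is the same math as a standalone CPU-side sketch (plain C++, no GL; the camera and viewport values you feed it are arbitrary):

```cpp
#include <cmath>

struct Vec2 { float x, y; };

// Mirror of the shader math: compute the camera offset in NDC, then snap it
// to whole-pixel steps so the geometry never lands between pixel centers.
Vec2 snappedOffset(Vec2 cameraPos, Vec2 cameraSize, Vec2 viewportSize) {
    Vec2 scale  { 2.0f / cameraSize.x,    2.0f / cameraSize.y };
    Vec2 offset { -scale.x * cameraPos.x, -scale.y * cameraPos.y };
    Vec2 pixel  { 2.0f / viewportSize.x,  2.0f / viewportSize.y };
    offset.x = std::round(offset.x / pixel.x) * pixel.x; // snap to pixels
    offset.y = std::round(offset.y / pixel.y) * pixel.y;
    return offset;
}
```

Two camera positions less than half a pixel apart now produce the same offset, which is exactly what removes the wobble.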
I'm trying to implement omni-directional shadow mapping by following this tutorial from learnOpenGL. Its idea is very simple: in the shadow pass, we capture the scene from the light's perspective into a cubemap (the shadow map), using the geometry shader to build the depth cubemap in just one render pass. Here's the shader code for generating our shadow map:
vertex shader
#version 330 core
layout (location = 0) in vec3 aPos;
uniform mat4 model;
void main() {
gl_Position = model * vec4(aPos, 1.0);
}
geometry shader
#version 330 core
layout (triangles) in;
layout (triangle_strip, max_vertices=18) out;
uniform mat4 shadowMatrices[6];
out vec4 FragPos; // FragPos from GS (output per emitvertex)
void main() {
for (int face = 0; face < 6; ++face) {
gl_Layer = face; // built-in variable that specifies to which face we render.
for (int i = 0; i < 3; ++i) { // for each triangle vertex
FragPos = gl_in[i].gl_Position;
gl_Position = shadowMatrices[face] * FragPos;
EmitVertex();
}
EndPrimitive();
}
}
fragment shader
#version 330 core
in vec4 FragPos;
uniform vec3 lightPos;
uniform float far_plane;
void main() {
// get distance between fragment and light source
float lightDistance = length(FragPos.xyz - lightPos);
// map to [0;1] range by dividing by far_plane
lightDistance = lightDistance / far_plane;
// write this as modified depth
gl_FragDepth = lightDistance;
}
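To make the linear mapping concrete, here is the store/compare roundtrip as a plain C++ sketch (the lookup side follows the same idea the tutorial uses when sampling the cubemap; the bias value here is just an illustrative assumption):

```cpp
// Store side (the shadow-pass fragment shader): light distance mapped to [0,1].
float storeDepth(float lightDistance, float farPlane) {
    return lightDistance / farPlane;
}

// Lookup side (the lighting pass): undo the mapping and compare with a bias.
bool inShadow(float storedDepth, float fragToLightDist, float farPlane,
              float bias) {
    float closest = storedDepth * farPlane; // back to a world-space distance
    return fragToLightDist - bias > closest;
}
```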
Compared to classic shadow mapping, the main difference here is that we are explicitly writing to the depth buffer, with linear depth values between 0.0 and 1.0. Using this code I can correctly cast shadows in my own scene, but I cannot fully understand the fragment shader, and I think this code is flawed, here is why:
Imagine that we have 3 spheres sitting on a floor, and a point light above the spheres. Looking down at the floor from the point light, we can see the -z slice of the shadow map (in RenderDoc textures are displayed bottom-up, sorry for that).
If we write gl_FragDepth = lightDistance in the fragment shader, we are manually updating the depth buffer so the hardware cannot perform the early z test, as a result, every fragment will go through our shader code to update the depth buffer, no fragment is discarded early to save performance. Now what if we draw the floor after the spheres?
The sphere fragments will write to the depth buffer first (per sample), followed by the floor fragments, but since the floor is farther away from the point light, it will overwrite the depth values of the sphere with larger values, and the shadow map will be incorrect. In this case, the order of drawing is important, distant objects must be drawn first, but it's not always possible to sort depth values for complex geometry. Perhaps we need something like order-independent transparency here?
To make sure that only the closest depth values are written to the shadow map, I modified the fragment shader a little bit:
// solution 1
gl_FragDepth = min(gl_FragDepth, lightDistance);
// solution 2
if (lightDistance < gl_FragDepth) {
gl_FragDepth = lightDistance;
}
// solution 3
gl_FragDepth = 1.0;
gl_FragDepth = min(gl_FragDepth, lightDistance);
However, according to the OpenGL specification, none of them will work. Solution 2 cannot work because, if we update gl_FragDepth manually, we must update it in all execution paths. As for solution 1: when we clear the depth buffer using glClearNamedFramebufferfv(id, GL_DEPTH, 0, &clear_depth), the depth buffer is filled with the value clear_depth, which is usually 1.0, but the initial value of the gl_FragDepth variable is not clear_depth; it is actually undefined, so it could be anything between 0 and 1. On my driver the initial value is 0, so gl_FragDepth = min(0.0, lightDistance) is 0 and the shadow map will be completely black. Solution 3 also won't work because we are still overwriting the previous depth value.
I learned that for OpenGL 4.2 and above, we can enforce the early z test by redeclaring the gl_FragDepth variable using:
layout (depth_<condition>) out float gl_FragDepth;
since my depth comparison function is the default glDepthFunc(GL_LESS), the condition needs to be depth_greater in order for the hardware to do early z. Unfortunately, this also won't work, as we are writing linear depth values to the buffer, which are always less than the default non-linear depth value gl_FragCoord.z, so the condition is really depth_less. Now I'm completely stuck; the depth buffer seems to be far more difficult than I thought.
Where might my reasoning be wrong?
You said:
The sphere fragments will write to the depth buffer first (per sample),
followed by the floor fragments, but since the floor is farther away from the
point light, it will overwrite the depth values of the sphere with larger
values, and the shadow map will be incorrect.
But if your fragment shader is not using early depth tests, then the hardware will perform depth testing after the fragment shader has executed.
From the OpenGL 4.6 specification, section 14.9.4:
When...the active program was linked with early fragment tests disabled,
these operations [including depth buffer test] are performed only after
fragment program execution
So if you write to gl_FragDepth in the fragment shader, the hardware cannot take advantage of the speed gain of early depth testing, as you said, but that doesn't mean that depth testing won't occur. So long as you are using GL_LESS or GL_LEQUAL for the depth test, objects that are further away won't obscure objects that are closer.
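A tiny software model of that late depth test makes the point (plain C++, hypothetical depth values):

```cpp
// A one-pixel "framebuffer" modelling the depth test as it runs *after* the
// fragment shader (late depth test), with GL_LESS and glClearDepth(1.0).
struct DepthBuffer {
    float depth = 1.0f;
    void writeFragment(float fragDepth) {
        if (fragDepth < depth) // GL_LESS: farther fragments are discarded,
            depth = fragDepth; // closer ones replace the stored value
    }
};
```

Whether the "sphere" fragment (0.3) arrives before or after the "floor" fragment (0.8), the buffer ends up holding the closer value, so draw order doesn't matter for correctness.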
I'm trying to develop a map for a 2D tile-based game. The approach I'm using is to store the map images in a large texture (tileset) and draw only the desired tiles on screen, updating the positions through the vertex shader. However, even a 10x10 map involves 100 glDrawArrays calls; looking at the task manager, this consumes 5% of CPU and 4-5% of GPU. Imagine what a complete game with dozens of such calls would cost. Is there a way to optimize this, such as preparing the whole scene and making just 1 draw call that draws everything at once, or some other approach?
void GameMap::draw() {
    m_shader->use();
    m_texture->bind();
    glBindVertexArray(m_quadVAO);
    for (size_t r = 0; r < 10; r++) {
        for (size_t c = 0; c < 10; c++) {
            m_tileCoord->setX(c * m_tileHeight);
            m_tileCoord->setY(r * m_tileHeight);
            m_tileCoord->convert2DToIso();
            drawTile(0);
        }
    }
    glBindVertexArray(0);
}
void GameMap::drawTile(GLint index) {
    glm::mat4 position_coord = glm::mat4(1.0f);
    glm::mat4 texture_coord = glm::mat4(1.0f);
    m_srcX = index * m_tileWidth;
    GLfloat clipX = m_srcX / m_texture->m_width;
    GLfloat clipY = m_srcY / m_texture->m_height;
    texture_coord = glm::translate(texture_coord, glm::vec3(glm::vec2(clipX, clipY), 0.0f));
    position_coord = glm::translate(position_coord, glm::vec3(glm::vec2(m_tileCoord->getX(), m_tileCoord->getY()), 0.0f));
    position_coord = glm::scale(position_coord, glm::vec3(glm::vec2(m_tileWidth, m_tileHeight), 1.0f));
    m_shader->setMatrix4("texture_coord", texture_coord);
    m_shader->setMatrix4("position_coord", position_coord);
    glDrawArrays(GL_TRIANGLES, 0, 6);
}
--Vertex Shader
#version 330 core
layout (location = 0) in vec4 vertex; // <vec2 position, vec2 texCoords>
out vec4 TexCoords;
uniform mat4 texture_coord;
uniform mat4 position_coord;
uniform mat4 projection;
void main()
{
TexCoords = texture_coord * vec4(vertex.z, vertex.w, 1.0, 1.0);
gl_Position = projection * position_coord * vec4(vertex.xy, 0.0, 1.0);
}
-- Fragment Shader
#version 330 core
out vec4 FragColor;
in vec4 TexCoords;
uniform sampler2D image;
uniform vec4 spriteColor;
void main()
{
FragColor = vec4(spriteColor) * texture(image, vec2(TexCoords.x, TexCoords.y));
}
The Basic Technique
The first thing you want to do is set up your 10x10 grid vertex buffer. Each square in the grid is actually two triangles. And all the triangles will need their own vertices because the UV coordinates for adjacent tiles are not the same, even though the XY coordinates are the same. This way each triangle can copy the area out of the texture atlas that it needs to and it doesn't need to be contiguous in UV space.
Here's how the vertices of two adjacent quads in the grid will be set up:
1: xy=(0,0) uv=(Left0 ,Top0)
2: xy=(1,0) uv=(Right0,Top0)
3: xy=(1,1) uv=(Right0,Bottom0)
4: xy=(1,1) uv=(Right0,Bottom0)
5: xy=(0,1) uv=(Left0 ,Bottom0)
6: xy=(0,0) uv=(Left0 ,Top0)
7: xy=(1,0) uv=(Left1 ,Top1)
8: xy=(2,0) uv=(Right1,Top1)
9: xy=(2,1) uv=(Right1,Bottom1)
10: xy=(2,1) uv=(Right1,Bottom1)
11: xy=(1,1) uv=(Left1 ,Bottom1)
12: xy=(1,0) uv=(Left1 ,Top1)
These 12 vertices define 4 triangles. The Top, Left, Bottom, Right UV coordinates for the first square can be completely different from the coordinates of the second square, thus allowing each square to be textured by a different area of the texture atlas. E.g. see below for how the UV coordinates of each triangle map to a tile in the texture atlas.
In your case, with your 10x10 grid, you would have 100 quads, or 200 triangles. With 200 triangles at 3 vertices each, that's 600 vertices to define. But it's a single draw call of 200 triangles (600 vertices). Each vertex has its own x, y, u, v coordinates. To change which tile a quad displays, you update the uv coordinates of its 6 vertices in the vertex buffer.
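Building that buffer on the CPU might look like the sketch below (plain C++; the GL upload via glBufferData is omitted, and the atlas is assumed to be a single row of equally sized tiles, which is an assumption, not something from the question):

```cpp
#include <vector>

// Build interleaved (x, y, u, v) vertices for a grid of quads, two triangles
// (6 vertices) per tile, so the whole map can be drawn with a single
// glDrawArrays(GL_TRIANGLES, 0, vertexCount) after uploading to a VBO.
std::vector<float> buildTileGrid(int cols, int rows,
                                 const std::vector<int>& tileIndices,
                                 int atlasCols) {
    std::vector<float> verts;
    verts.reserve(cols * rows * 24);
    const float du = 1.0f / atlasCols; // one tile's width in UV space
    for (int r = 0; r < rows; ++r) {
        for (int c = 0; c < cols; ++c) {
            const int tile = tileIndices[r * cols + c];
            const float u0 = tile * du, u1 = u0 + du;
            const float x0 = float(c), x1 = float(c + 1);
            const float y0 = float(r), y1 = float(r + 1);
            const float quad[6][4] = {
                {x0, y0, u0, 0.0f}, {x1, y0, u1, 0.0f}, {x1, y1, u1, 1.0f},
                {x1, y1, u1, 1.0f}, {x0, y1, u0, 1.0f}, {x0, y0, u0, 0.0f},
            };
            for (const auto& v : quad)
                verts.insert(verts.end(), v, v + 4);
        }
    }
    return verts;
}
```

For a 10x10 map this yields exactly the 600 vertices (2400 floats) described above, uploaded once and redrawn every frame with one call.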
You will likely find that this is the most convenient and efficient approach.
Advanced Approaches
There are more memory-efficient or convenient ways of setting this up: using multiple streams to reduce vertex duplication, and leveraging shaders to do the setup work, if you're willing to trade computation time for memory or convenience. Find the balance that is right for you. But you should grasp the basic technique first before trying to optimize.
But in the multiple-stream approach, you could:
- Specify all the xy vertices separately from the uv vertices to avoid duplication.
- Specify a second set of texture coordinates holding just the top-left corner of the tile in the atlas, let the uv coordinates go from (0,0) at the top-left to (1,1) at the bottom-right of each quad, and let your shader scale and translate the uv coordinates into the final texture coordinates.
- Specify a single uv coordinate (the top-left corner of the source area) per primitive and let a geometry shader complete the squares.
- Even smarter: specify only the x,y coordinates (omitting the uv coordinates entirely) and, in your vertex shader, sample a texture that contains the "tile number" of each quad. You sample this texture at coordinates based on the grid's x,y values, and transform the value you read into uv coordinates in the atlas. To change a tile in this system, you just change one pixel in the tile-map texture.
- Finally, you could skip generating the primitives entirely and derive them from a single list sent to the geometry shader, generating the grid's x,y coordinates there and completing the triangle geometry and uv coordinates downstream. This is the most memory-efficient, but relies on the GPU to compute the setup at runtime.
With a static 6-vertices-per-quad setup, you free up GPU processing at the cost of a little extra memory. Depending on what you need for performance, you may find that using more memory to get a higher frame rate is desirable. Vertex buffers are tiny compared to textures anyway.
So as I said, you should start with the basic technique first as it's likely also the optimal solution for performance as well, especially if your map doesn't change very often.
You can upload all parameters to GPU memory in advance and draw everything with a single draw call.
This way you don't need to update vertex shader uniforms per tile, and CPU load should drop to nearly zero.
It's been 3 years since I used OpenGL, so I can only point you in the right direction.
Start reading some material like for instance:
https://ferransole.wordpress.com/2014/07/09/multidrawindirect/
https://www.khronos.org/registry/OpenGL-Refpages/gl4/html/glDrawArraysIndirect.xhtml
Also, keep in mind this is GL 4.x stuff, check your target platform (software+hardware) GL version support.
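For reference, the indirect path packs per-draw parameters into fixed-layout records in a GPU buffer; the record layout below is the one the GL 4.x spec defines for glMultiDrawArraysIndirect, while the helper function is a hypothetical sketch (one non-instanced 6-vertex quad per tile, quads packed sequentially in one VBO):

```cpp
#include <cstdint>

// Layout of one record in the buffer consumed by glMultiDrawArraysIndirect,
// as defined by the OpenGL 4.x specification.
struct DrawArraysIndirectCommand {
    uint32_t count;         // vertices in this draw
    uint32_t instanceCount; // instances (1 for a plain draw)
    uint32_t first;         // first vertex in the VBO
    uint32_t baseInstance;  // offset into instanced attributes
};

// Hypothetical helper: build the command for one tile's quad.
DrawArraysIndirectCommand makeTileCommand(uint32_t tileIndex) {
    return { 6u, 1u, tileIndex * 6u, 0u };
}
```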
I'm working on the SSAO (Screen-Space Ambient Occlusion) algorithm using Oriented-Hemisphere rendering technique.
I) The algorithm
This algorithm requires as inputs:
1 array containing precomputed samples (loaded before the main loop -> In my example I use 64 samples oriented according to the z axis).
1 noise texture containing normalized rotation vectors also oriented according to the z axis (this texture is generated once).
2 textures from the GBuffer: the 'PositionSampler' and the 'NormalSampler' containing the positions and normal vectors in view space.
Here's the fragment shader source code I use:
#version 400
/*
** Output color value.
*/
layout (location = 0) out vec4 FragColor;
/*
** Vertex inputs.
*/
in VertexData_VS
{
vec2 TexCoords;
} VertexData_IN;
/*
** Inverse Projection Matrix.
*/
uniform mat4 ProjMatrix;
/*
** GBuffer samplers.
*/
uniform sampler2D PositionSampler;
uniform sampler2D NormalSampler;
/*
** Noise sampler.
*/
uniform sampler2D NoiseSampler;
/*
** Noise texture viewport.
*/
uniform vec2 NoiseTexOffset;
/*
** Ambient light intensity.
*/
uniform vec4 AmbientIntensity;
/*
** SSAO kernel + size.
*/
uniform vec3 SSAOKernel[64];
uniform uint SSAOKernelSize;
uniform float SSAORadius;
/*
** Computes Orientation matrix.
*/
mat3 GetOrientationMatrix(vec3 normal, vec3 rotation)
{
vec3 tangent = normalize(rotation - normal * dot(rotation, normal)); //Graham Schmidt process
vec3 bitangent = cross(normal, tangent);
return (mat3(tangent, bitangent, normal)); //Orientation according to the normal
}
/*
** Fragment shader entry point.
*/
void main(void)
{
float OcclusionFactor = 0.0f;
vec3 gNormal_CS = normalize(texture(
NormalSampler, VertexData_IN.TexCoords).xyz * 2.0f - 1.0f); //Normal vector in view space from GBuffer
vec3 rotationVec = normalize(texture(NoiseSampler,
VertexData_IN.TexCoords * NoiseTexOffset).xyz * 2.0f - 1.0f); //Rotation vector required for Graham Schmidt process
vec3 Origin_VS = texture(PositionSampler, VertexData_IN.TexCoords).xyz; //Origin vertex in view space from GBuffer
mat3 OrientMatrix = GetOrientationMatrix(gNormal_CS, rotationVec);
for (int idx = 0; idx < SSAOKernelSize; idx++) //For each sample (64 iterations)
{
vec4 Sample_VS = vec4(Origin_VS + OrientMatrix * SSAOKernel[idx], 1.0f); //Sample translated in view space
vec4 Sample_HS = ProjMatrix * Sample_VS; //Sample in homogeneus space
vec3 Sample_CS = Sample_HS.xyz /= Sample_HS.w; //Perspective dividing (clip space)
vec2 texOffset = Sample_CS.xy * 0.5f + 0.5f; //Recover sample texture coordinates
vec3 SampleDepth_VS = texture(PositionSampler, texOffset).xyz; //Sample depth in view space
if (Sample_VS.z < SampleDepth_VS.z)
if (length(Sample_VS.xyz - SampleDepth_VS) <= SSAORadius)
OcclusionFactor += 1.0f; //Occlusion accumulation
}
OcclusionFactor = 1.0f - (OcclusionFactor / float(SSAOKernelSize));
FragColor = vec4(OcclusionFactor);
FragColor *= AmbientIntensity;
}
And here's the result (without blur render pass):
Until here all seems to be correct.
II) The performance
I noticed in NSight Debugger a very strange behaviour concerning performance:
If I move my camera closer and closer toward the dragon, performance is drastically impacted.
But in my mind this should not be the case, because the SSAO algorithm is applied in screen space and does not depend on the number of primitives of the dragon, for example.
Here are 3 screenshots with 3 different camera positions (in all 3 cases, all 1024x768 pixel shaders execute the same algorithm):
a) GPU idle : 40% (pixel impacted: 100%)
b) GPU idle : 25% (pixel impacted: 100%)
c) GPU idle : 2%! (pixel impacted: 100%)
My rendering engine uses exactly 2 render passes in this example:
the Material Pass (filling the position and normal samplers)
the Ambient pass (filling the SSAO texture)
I thought the problem came from running these two passes together, but that's not the case: I added a condition in my client code to skip the material pass when the camera is stationary, so when I took the 3 pictures above only the ambient pass was executing. This lack of performance is therefore not related to the material pass. Another data point I can give you: if I remove the dragon mesh (leaving just the plane in the scene), the result is the same. The closer my camera gets to the plane, the worse the performance.
For me this behaviour is not logical! As I said above, in all 3 cases every pixel shader invocation executes exactly the same code!
Now I noticed another strange behaviour if I change a little piece of code directly within the fragment shader:
If I replace the line:
FragColor = vec4(OcclusionFactor);
By the line:
FragColor = vec4(1.0f, 1.0f, 1.0f, 1.0f);
The lack of performance disappears!
It means that the SSAO code is still executed correctly (I placed breakpoints during execution to check it), but if I don't use the OcclusionFactor to fill the final output color, there is no lack of performance!
I think we can conclude that the problem does not come from the shader code before the line "FragColor = vec4(OcclusionFactor);"... I think.
How can you explain such behaviour?
I tried a lot of combinations of code, both in the client code and in the fragment shader, but I can't find a solution to this problem. I'm really lost.
Thank you very much in advance for your help!
The short answer is cache efficiency.
To understand this let's look at the following lines from the inner loop:
vec4 Sample_VS = vec4(Origin_VS + OrientMatrix * SSAOKernel[idx], 1.0f); //Sample translated in view space
vec4 Sample_HS = ProjMatrix * Sample_VS; //Sample in homogeneus space
vec3 Sample_CS = Sample_HS.xyz /= Sample_HS.w; //Perspective dividing (clip space)
vec2 texOffset = Sample_CS.xy * 0.5f + 0.5f; //Recover sample texture coordinates
vec3 SampleDepth_VS = texture(PositionSampler, texOffset).xyz; //Sample depth in view space
What you are doing here is:
Translate the original point in view space
Transform it to clip space
Sample the texture
So how does that correspond to cache efficiency?
Caches work well when accessing neighbouring pixels. For example, if you are doing a Gaussian blur you access only the neighbours, which have a high probability of already being in the cache.
So let's say your object is very far away. Then the pixels sampled in clip space are also very close to the original point -> high locality -> good cache performance.
If the camera is very close to your object, the generated sample points are much farther apart (in clip space) and you get an essentially random memory access pattern. That will decrease your performance drastically, although you aren't actually doing more operations.
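The effect is easy to quantify with a back-of-the-envelope formula: under a pinhole projection, a view-space kernel of radius r at linear depth z covers roughly r * focal / z pixels on screen. A minimal sketch (plain C++; the focal length in pixels is an assumed value, not taken from the question):

```cpp
// Approximate screen-space footprint (in pixels) of a view-space SSAO kernel
// of radius r at linear depth z, for a pinhole projection. focalPx is the
// focal length expressed in pixels (an assumed, illustrative value).
float kernelFootprintPx(float r, float z, float focalPx) {
    return r * focalPx / z;
}
```

At z = 1 the samples spread across hundreds of pixels (cache-hostile scattered reads); at z = 10 they stay within a few dozen (cache-friendly).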
Edit:
To improve performance you could reconstruct the view space position from the depth buffer of the previous pass.
If you're using a 32-bit depth buffer, that decreases the amount of data required for one sample from 12 bytes to 4 bytes.
The position reconstruction looks like this:
vec4 reconstruct_vs_pos(vec2 tc){
float depth = texture(depthTexture,tc).x;
vec4 p = vec4(tc.x,tc.y,depth,1) * 2.0f - 1.0f; //transformed to unit cube [-1,1]^3
vec4 p_cs = invProj * p; //invProj: inverse projection matrix (pass this by uniform)
return p_cs / p_cs.w;
}
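A CPU-side roundtrip check of this reconstruction can be sketched as follows (plain C++; a symmetric perspective projection and its parameters are assumptions for the test, and the inverse is done analytically instead of with an inverse matrix). Note that the [0,1] texcoord/depth values must be remapped to the [-1,1] unit cube before inverting the projection:

```cpp
#include <cmath>

struct Vec3 { float x, y, z; };

// Round-trip check for view-space position reconstruction. A symmetric
// perspective projection (fovY, aspect, near, far) is assumed; view-space
// z is negative in front of the camera (GL convention).
struct Proj {
    float f, aspect, A, B; // A, B are the two z-row coefficients
    Proj(float fovY, float asp, float n, float fr)
        : f(1.0f / std::tan(fovY * 0.5f)), aspect(asp),
          A((fr + n) / (n - fr)), B(2.0f * fr * n / (n - fr)) {}

    // View space -> (texcoord.x, texcoord.y, depth), all in [0,1].
    Vec3 project(Vec3 p) const {
        float w = -p.z;
        Vec3 ndc { p.x * (f / aspect) / w, p.y * f / w, (A * p.z + B) / w };
        return { ndc.x * 0.5f + 0.5f, ndc.y * 0.5f + 0.5f, ndc.z * 0.5f + 0.5f };
    }

    // Inverse: remap [0,1] back to [-1,1], then invert analytically.
    Vec3 reconstruct(Vec3 tcDepth) const {
        float nx = tcDepth.x * 2.0f - 1.0f;
        float ny = tcDepth.y * 2.0f - 1.0f;
        float nz = tcDepth.z * 2.0f - 1.0f;
        float z = -B / (nz + A); // analytic inverse of the projection's z row
        return { nx * -z * aspect / f, ny * -z / f, z };
    }
};
```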
While you're at it, another optimization you can make is to render the SSAO texture at a reduced size, preferably half the size of your main viewport. If you do this, be sure to copy your depth texture to another half-size texture (glBlitFramebuffer) and sample your positions from that. I'd expect this to increase performance by an order of magnitude, especially in the worst-case scenario you've given.
I've written the following shader to perform a bright pass of my scene so I can extract luminance for later blurring as part of a "glow" effect.
// "Bright" pixel shader.
#version 420
uniform sampler2D Map_Diffuse;
uniform float uniform_Threshold;
in vec2 attrib_Fragment_Texture;
out vec4 Out_Colour;
void main(void)
{
vec3 luminances = vec3(0.2126, 0.7152, 0.0722);
vec4 texel = texture(Map_Diffuse, attrib_Fragment_Texture);
float luminance = dot(luminances, texel.rgb);
luminance = max(0.0, luminance - uniform_Threshold);
texel.rgb *= sign(luminance);
texel.a = 1.0;
Out_Colour = texel;
}
The bright areas are successfully extracted, however there are some unstable features in the scene, resulting in pixels that flicker on and off for a while. When this is blurred the effect is more pronounced, with bits of glow flickering too. The artifacts occur, for example, in the third image in the screenshot I've posted, where the object is in shadow and there's far less luminance in the scene. They're mostly present in the transition from facing away to facing towards the light (during rotation of the object), where an edge is just catching the light.
My question is to ask whether there's a way you can detect and mitigate this in the shader. Note that the bright pass is part of a general down-sample, from screen resolution to 512x512.
You could read the surrounding pixels also and do your math based on that.
Kind of like is done here.
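One way to sketch that idea (plain C++ over a row-major grayscale luminance buffer; the 3x3 window and clamp-to-edge handling are assumptions, not taken from the linked example): threshold on the neighbourhood average instead of a single sample, so an isolated one-pixel spike can no longer flip the result from frame to frame.

```cpp
#include <algorithm>
#include <vector>

// Bright-pass decision based on a 3x3 neighbourhood average: a lone bright
// pixel is suppressed, while a genuinely bright region still passes.
float brightPass(const std::vector<float>& lum, int w, int h,
                 int x, int y, float threshold) {
    float sum = 0.0f;
    for (int dy = -1; dy <= 1; ++dy)
        for (int dx = -1; dx <= 1; ++dx) {
            int sx = std::min(std::max(x + dx, 0), w - 1); // clamp to edge
            int sy = std::min(std::max(y + dy, 0), h - 1);
            sum += lum[sy * w + sx];
        }
    return (sum / 9.0f > threshold) ? lum[y * w + x] : 0.0f;
}
```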
I implemented a fairly simple shadow map. I have a simple obj imported plane as ground and a bunch of trees.
I have a weird shadow on the plane which I think is the plane's self shadow. I am not sure what code to post. If it would help please tell me and I'll do so then.
First image, camera view of the scene. The weird textured lowpoly sphere is just for reference of the light position.
Second image, the depth texture stored in the framebuffer, which I used to calculate shadow coords from the light's perspective. Since I can't post more than 2 links, I'll leave this one out.
Third image, depth texture with a better view of the plane projecting the shadow from a different light position above the whole scene.
LE: the second picture http://i41.tinypic.com/23h3wqf.jpg (Depth Texture of first picture)
I tried some fixes: adding glCullFace(GL_BACK) before drawing the ground in the first pass removes it from the depth texture, but the artifact still appears in the final render (as in the first picture, on the back part of the ground). I also tried adding cull-face calls in the second pass, and all combinations of front and back facing, but the shadow still shows on the ground. Could it be caused by the values in the orthographic projection?
Shadow fragment shader:
#version 330 core
layout(location = 0) out vec3 color;
in vec2 texcoord;
in vec4 ShadowCoord;
uniform sampler2D textura1;
uniform sampler2D textura2;
uniform sampler2D textura_depth;
uniform int has_alpha;
void main(){
vec3 tex1 = texture(textura1, texcoord).xyz;
vec3 tex2 = texture(textura2, texcoord).xyz;
if(has_alpha>0.5) if((tex2.r<0.1) && (tex2.g<0.1) && (tex2.b<0.1)) discard;
//Z value of depth texture from pass 1
float hartaDepth=texture( textura_depth,(ShadowCoord.xy/ShadowCoord.w)).z;
float shadowValue=1.0;
if(hartaDepth < ShadowCoord.z-0.005)
shadowValue=0.5;
color = shadowValue * tex1 ;
}
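For what it's worth, the shadow comparison at the end of this shader reduces to a tiny pure function, which makes the role of the 0.005 bias easy to see in isolation (plain C++ sketch of the same logic):

```cpp
// The shadow test from the fragment shader as a pure function: the fragment
// is lit (1.0) unless the depth map stores something closer than the
// fragment's own light-space depth, minus a bias to suppress acne.
float shadowFactor(float mapDepth, float shadowCoordZ, float bias) {
    return (mapDepth < shadowCoordZ - bias) ? 0.5f : 1.0f;
}
```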