I need to output 24 indices per fragment in a shader. I have already reached the maximum number of render targets, because I'm using four other render targets for my G-buffer. So I tried to output the data through an SSBO, indexed by the gl_FragCoord of the pixel. The problem is that the output needs to be depth-correct, so I added layout(early_fragment_tests) in; and inspected the indices. I now see strange per-pixel errors in some spots: it looks like the indices from the triangles behind are coming through, and the errors disappear when I move the camera closer to those spots.
I double-checked the SSBO indexing and it is correct. The indices should also be the same across a whole triangle, yet the flickering is per pixel. So I suspect the depth test only applies to the rasterized per-pixel framebuffer output and not to the rest of the fragment shader code. Could that be the problem, or does somebody know whether the depth test should stop all processing of the fragment? If it doesn't, even a separate depth pre-pass wouldn't help me.
Here is a fragment shader example:
#version 440 core
layout(early_fragment_tests) in;
layout(location = 0) uniform sampler2D texSampler;
#include "../Header/MaterialData.glslh"
#include "../Header/CameraUBO.glslh"
layout(location = 3) uniform uint screenWidth;//horizontal screen resolution in pixel
layout(location = 0) out vec4 fsout_color;
layout(location = 1) out vec4 fsout_normal;
layout(location = 2) out vec4 fsout_material;
coherent layout(std430, binding = 3) buffer frameCacheIndexBuffer
{
uvec4 globalCachesIndices[];
};
in vec3 gsout_normal;
in vec2 gsout_texCoord;
flat in uvec4 gsout_cacheIndices[6];
flat in uint gsout_instanceIndex;
void main()
{
uint frameBufferIndex = 6 * (uint(gl_FragCoord.x) + uint(gl_FragCoord.y) * screenWidth);
for(uint i = 0; i < 6; i++)
{
globalCachesIndices[frameBufferIndex + i] = gsout_cacheIndices[i];//only the closest fragment should output
}
fsout_normal = vec4(gsout_normal * 0.5f + 0.5f, 0);
fsout_color = vec4(texture(texSampler, gsout_texCoord));
MaterialData thisMaterial = material[materialIndex[gsout_instanceIndex]];
fsout_material = vec4(thisMaterial.diffuseStrength,
thisMaterial.specularStrength,
thisMaterial.ambientStrength,
thisMaterial.specularExponent);
}
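For reference, the per-pixel addressing used in the shader can be reproduced on the CPU to check the indexing and the buffer size. This is only a minimal sketch: the names firstSlot and requiredBytes are hypothetical helpers, not part of the original code, and it assumes the SSBO holds tightly packed uvec4 entries (16 bytes each under std430).

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Number of uvec4 entries stored per pixel (24 indices = 6 * 4), as in the shader.
constexpr std::uint32_t kVec4sPerPixel = 6;

// First uvec4 slot for the pixel at (x, y). Mirrors the shader expression
// 6 * (uint(gl_FragCoord.x) + uint(gl_FragCoord.y) * screenWidth).
std::uint32_t firstSlot(std::uint32_t x, std::uint32_t y, std::uint32_t screenWidth) {
    assert(x < screenWidth);  // otherwise the slot would alias a pixel in the next row
    return kVec4sPerPixel * (x + y * screenWidth);
}

// Minimum SSBO size in bytes so that every pixel owns its 6 slots.
// Assumption: one uvec4 occupies 16 bytes in a std430 array.
std::size_t requiredBytes(std::uint32_t screenWidth, std::uint32_t screenHeight) {
    return static_cast<std::size_t>(screenWidth) * screenHeight * kVec4sPerPixel * 16u;
}
```

At 1920x1080 this comes to roughly 199 MB, so the buffer allocation is worth checking alongside the indexing.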
UPDATE: It turns out this was due to a bug on the C side of things, which caused one of the matrices to become malformed. The shaders are all fine. So if adding uniforms causes weird things to happen, my advice is to use a debugger to check the value of ALL uniforms and make sure that they are all being set correctly.
So I am trying to render depth to a cube map to use as a shadow map, but when I add and use a uniform in the fragment shader everything becomes white as if the shader isn't being used. No warnings or errors are generated when compiling/linking the shader.
The shader program I am using to render the depth map (setting the depth simply to the fragment z position as a test) is as follows:
//vertex shader
#version 430
in layout(location=0) vec4 vertexPositionModel;
uniform mat4 modelToWorldMatrix;
void main() {
gl_Position = modelToWorldMatrix * vertexPositionModel;
}
//geometry shader
#version 430
layout (triangles) in;
layout (triangle_strip, max_vertices=18) out;
out vec4 fragPositionWorld;
uniform mat4 projectionMatrices[6];
void main() {
for (int face = 0; face < 6; face++) {
gl_Layer = face;
for (int i = 0; i < 3; i++) {
fragPositionWorld = gl_in[i].gl_Position;
gl_Position = projectionMatrices[face] * fragPositionWorld;
EmitVertex();
}
EndPrimitive();
}
}
//Fragment shader
#version 430
in vec4 fragPositionWorld;
void main() {
gl_FragDepth = abs(fragPositionWorld.z);
}
The main shader samples from the cubemap and simply renders the depth as greyscale colour:
vec3 lightDirection = fragPositionWorld - pointLight.position;
float closestDepth = texture(shadowMap, lightDirection).r;
finalColour = vec4(vec3(closestDepth), 1.0);
The scene is a small cube in a larger cubic room, and renders as expected, dark near z = 0 and the cube projected back onto the wall (The depth map is being rendered from the centre of the room):
Good:
I can move the small cube around and the projection projects correctly onto all the sides of the cubemap. All good so far.
The problem is when I add a uniform to the fragment shader, i.e:
#version 430
in vec4 fragPositionWorld;
uniform vec3 lightPos;
void main() {
gl_FragDepth = min(lightPos.y, 0.5);
}
Everything renders as white, the same as if the shader had failed to compile:
Bad:
gDEBugger reports that the uniform is set correctly (0,4,0), but regardless of what lightPos is, gl_FragDepth should be set to a value no greater than 0.5 and appear as a shade of grey (which is what happens if I set gl_FragDepth = 0.5 directly). So I can only conclude that the fragment shader is not being used for some reason and the default one is being used instead. Unfortunately I have no idea why.
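The advice from the update (check every uniform, not only the new one) can be partly automated before uploading. The following is a minimal sketch, assuming matrices are kept as 16 floats on the CPU side; Mat4, identity and allFinite are hypothetical names, not part of the original program. It only catches NaN and infinity, so a matrix that is finite but wrong still needs a debugger.

```cpp
#include <array>
#include <cmath>

// Assumed CPU-side representation of a mat4: 16 floats.
using Mat4 = std::array<float, 16>;

// Builds an identity matrix (the diagonal sits at 0, 5, 10, 15 in either
// row-major or column-major storage).
Mat4 identity() {
    Mat4 m{};  // zero-initialised
    m[0] = m[5] = m[10] = m[15] = 1.0f;
    return m;
}

// Returns false if any element is NaN or +/-infinity, which is a typical
// symptom of a matrix that was computed from uninitialised or invalid data.
bool allFinite(const Mat4& m) {
    for (float v : m) {
        if (!std::isfinite(v)) {
            return false;
        }
    }
    return true;
}
```

A check like this could run on every matrix right before the corresponding glUniformMatrix4fv call in a debug build.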
I'm trying to render a shadow cubemap in one pass, using layered rendering.
I've tried to be as thorough as possible :
I have bound a cubemap to both the depth attachment (GL_DEPTH_ATTACHMENT_32F) and color attachment 0 (GL_R32F) using glFramebufferTexture
I made sure to check the framebuffer's completeness once the textures were attached to the FBO: it is complete
I have tried geometry shader instancing using "layout(triangles, invocations=6) in;" and also done without it (resorting to a for(int layer=0;layer<6;++layer) loop, setting gl_Layer = layer, first once per primitive, then for each vertex)
Long story short, the first layer (ie. X+ in this case) gets rendered, but none of the others do, be it in the depth or color attachment.
It seems documentation on layered rendering is pretty sparse, even the red book spends at most half a page on it... Anyway :
The code :
Shaders :
Vertex :
#version 440 core
layout(location = 0) in vec3 attrPosition;
void main()
{
gl_Position = vec4(attrPosition, 1.0);
}
Geometry :
#version 440 core
layout(triangles, invocations = 6) in;
layout(triangle_strip, max_vertices = 18) out;
uniform mat4 dkModelMatrix;
uniform mat4 dkViewMatrices[6];
uniform mat4 dkProjectionMatrix;
void main()
{
gl_Layer = gl_InvocationID;
for(int i = 0; i < 3; ++i)
{
gl_Layer = gl_InvocationID;
gl_Position = dkProjectionMatrix * dkViewMatrices[gl_InvocationID] * dkModelMatrix * gl_in[i].gl_Position;
EmitVertex();
}
EndPrimitive();
}
Fragment :
#version 440 core
layout(location = 0) out vec4 dkFragCoord;
void main()
{
dkFragCoord = vec4( vec3(float(gl_Layer) * 0.1 + 0.5) , 1.0);
}
C++ (mostly using my engine's classes, which actually do the bare minimum and have already been tested, in the case of FBOs, with 2D (spot) shadowmaps) :
Shadowmap-related variables creation : https://gist.github.com/xtrium-lnx/77d8989b3c2370607cfc
Shadowmap rendering : https://gist.github.com/xtrium-lnx/387b97c077525be60bb4
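For context, in a layered cubemap attachment layer i corresponds to face GL_TEXTURE_CUBE_MAP_POSITIVE_X + i, so gl_Layer values 0 to 5 map to +X, -X, +Y, -Y, +Z, -Z. The table below is a minimal sketch of the forward and up vectors commonly used to build the six view matrices in that order. The struct and function names are hypothetical, and the up vectors follow the usual cubemap convention rather than anything taken from the linked gists.

```cpp
#include <array>
#include <cassert>

struct Vec3 { float x, y, z; };

// Forward and up direction for one cube face, to be fed into a lookAt-style
// view matrix builder (assumed to exist elsewhere in the engine).
struct FaceBasis { Vec3 forward; Vec3 up; };

// Index i matches gl_Layer == i, i.e. GL_TEXTURE_CUBE_MAP_POSITIVE_X + i.
const std::array<FaceBasis, 6> kCubeFaces = {{
    {{ 1.0f,  0.0f,  0.0f}, {0.0f, -1.0f,  0.0f}},  // layer 0: +X
    {{-1.0f,  0.0f,  0.0f}, {0.0f, -1.0f,  0.0f}},  // layer 1: -X
    {{ 0.0f,  1.0f,  0.0f}, {0.0f,  0.0f,  1.0f}},  // layer 2: +Y
    {{ 0.0f, -1.0f,  0.0f}, {0.0f,  0.0f, -1.0f}},  // layer 3: -Y
    {{ 0.0f,  0.0f,  1.0f}, {0.0f, -1.0f,  0.0f}},  // layer 4: +Z
    {{ 0.0f,  0.0f, -1.0f}, {0.0f, -1.0f,  0.0f}},  // layer 5: -Z
}};

// Returns the basis for a given layer index in [0, 6).
FaceBasis faceForLayer(int layer) {
    assert(layer >= 0 && layer < 6);
    return kCubeFaces[static_cast<std::size_t>(layer)];
}

float dot(const Vec3& a, const Vec3& b) {
    return a.x * b.x + a.y * b.y + a.z * b.z;
}
```

Each entry would be turned into one element of dkViewMatrices, in this order, so that the matrix index and gl_InvocationID agree with the face being written.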
I am trying to get a hold of how memoryBarrier() works in OpenGL 4.4
I tried the following once with a texture image and once with Shader Storage Buffer Object (SSBO).
The basic idea is to create an array of flags for however many objects that need to be rendered in my scene and then perform a simple test in the geometry shader.
For each primitive in GS, if at least one vertex passes the test, it
sets the corresponding flag in the array at the location specified
by this primitive's object ID (Object IDs are passed to GS as vertex
attributes).
I then perform a memoryBarrier() to make sure all threads have written their values.
Next, I have all primitives read from the flags array and only emit a vertex if the flag is set.
Here is some code from my shaders to explain:
// Vertex Shader:
#version 440
uniform mat4 model_view;
uniform mat4 projection;
layout(location = 0) in vec3 in_pos;
layout(location = 1) in vec3 in_color;
layout(location = 2) in int lineID;
out VS_GS_INTERFACE
{
vec4 position;
vec4 color;
int lineID;
} vs_out;
void main(void) {
vec4 pos = vec4(in_pos, 1.0);
vs_out.position = pos;
vs_out.color = vec4(in_color, 1.0);
vs_out.lineID = lineID;
gl_Position = projection * model_view * pos;
}
and here is a simple Geometry shader in which I use only a simple test based on lineID ( I realize this test doesn't need a shared data structure but this is just to test program behavior)
#version 440
layout (lines) in;
layout (line_strip, max_vertices = 2) out;
layout (std430, binding = 0) buffer BO {
int IDs[];
};
in VS_GS_INTERFACE
{
vec4 position;
vec4 color;
int lineID;
} gs_in[];
out vec4 f_color;
void main()
{
if(gs_in[0].lineID < 500)
{
IDs[gs_in[0].lineID] = 1;
}
else
{
IDs[gs_in[0].lineID] = -1;
}
memoryBarrier();
// read back the flag value
int flag = IDs[gs_in[0].lineID];
if ( flag > 0)
{
int n;
for( n = 0; n < gl_in.length(); n++)
{
f_color = gs_in[n].color;
gl_Position = gl_in[n].gl_Position;
EmitVertex();
}
}
}
No matter what value I put in place of 500, this code always renders only 2 objects. If I change the rendering condition in the GS to if (flag >= 0), all objects seem to be rendered, which suggests that the -1 has never been written by the time the IDs are read back by the shader.
Can someone please explain why the writes are not coherently visible to all shader invocations despite the memoryBarrier() and what would be the most efficient work around to get this to work?
Thanks.
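As a point of comparison, the intended algorithm (set flags, then emit only flagged primitives) can be modelled on the CPU as two separate loops. This is a sketch of one possible restructuring, not a diagnosis of the shader above: it assumes the flag writes and the flag reads are split into two passes, which in OpenGL would correspond to two draw calls separated by glMemoryBarrier(GL_SHADER_STORAGE_BARRIER_BIT). The GLSL memoryBarrier() function only orders the memory accesses of the invocation that calls it and does not wait for other invocations. The function names are hypothetical.

```cpp
#include <cstddef>
#include <vector>

// Pass 1: one flag per object ID. A flag becomes 1 if the ID passes the test
// (here: id < threshold, as in the geometry shader) and -1 otherwise.
std::vector<int> writeFlags(const std::vector<int>& lineIDs,
                            int threshold,
                            std::size_t objectCount) {
    std::vector<int> flags(objectCount, 0);
    for (int id : lineIDs) {
        flags[static_cast<std::size_t>(id)] = (id < threshold) ? 1 : -1;
    }
    return flags;
}

// Pass 2: runs only after pass 1 has finished for ALL primitives, and keeps
// the primitives whose flag is set.
std::vector<int> emitVisible(const std::vector<int>& lineIDs,
                             const std::vector<int>& flags) {
    std::vector<int> emitted;
    for (int id : lineIDs) {
        if (flags[static_cast<std::size_t>(id)] > 0) {
            emitted.push_back(id);
        }
    }
    return emitted;
}
```

The point of the split is that pass 2 never starts until every flag has been written, which a barrier inside a single shader invocation cannot guarantee.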
I am writing some font drawing shaders in OpenGL 3.3. I will render my font into a texture atlas and then generate some display lists for some text I want to draw. I would like the rendering of text to consume the least amount of resources (CPU, GPU memory, GPU time). How can I accomplish this?
Looking at Freetype-gl, I noticed that the author generates 6 indices and 4 vertices per character.
Since I am using OpenGL 3.3, I have some additional freedom. My plan was to generate 1 vertex per character plus one integer "code" per character. The character code can be used in texelFetch operations to retrieve texture coördinates and character size information. A geometry shader turns the size information and vertex into a triangle strip.
Is texelFetch going to be slower than sending more vertices/texture coördinates? Is this worth doing, or is there a reason why it's not done in the font libraries I looked at?
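To put rough numbers on the two layouts, the per-character buffer footprint can be estimated as below. This is a back-of-the-envelope sketch under assumed layouts: QuadVertex (position plus texture coordinate, 16-bit indices) and PointGlyph (position plus a 32-bit code) are hypothetical structs, and the real freetype-gl vertex format may carry more attributes.

```cpp
#include <cstddef>
#include <cstdint>

// Assumed vertex layout for the quad approach: 2D position + texture coordinate.
struct QuadVertex { float x, y, u, v; };

// Assumed per-character record for the point approach: 2D position + glyph code.
struct PointGlyph { float x, y; std::int32_t code; };

// Quad approach: 4 vertices and 6 indices per character (16-bit indices assumed).
constexpr std::size_t bytesPerCharQuads() {
    return 4 * sizeof(QuadVertex) + 6 * sizeof(std::uint16_t);
}

// Point approach: one record per character, expanded by the geometry shader.
constexpr std::size_t bytesPerCharPoints() {
    return sizeof(PointGlyph);
}

// Total buffer size for a string of charCount characters.
constexpr std::size_t bufferBytes(std::size_t charCount, std::size_t bytesPerChar) {
    return charCount * bytesPerChar;
}
```

Under these assumptions the point layout is several times smaller per character, though buffer size alone says nothing about how the texelFetch and geometry shader costs compare.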
Final code:
Vertex shader:
#version 330
uniform sampler2D font_atlas;
uniform sampler1D code_to_texture;
uniform mat4 projection;
uniform vec2 vertex_offset; // in view space.
uniform vec4 color;
uniform float gamma;
in vec2 vertex; // vertex in view space of each character adjusted for kerning, etc.
in int code;
out vec4 v_uv;
void main()
{
v_uv = texelFetch(
code_to_texture,
code,
0);
gl_Position = projection * vec4(vertex_offset + vertex, 0.0, 1.0);
}
Geometry shader:
#version 330
layout (points) in;
layout (triangle_strip, max_vertices = 4) out;
uniform sampler2D font_atlas;
uniform mat4 projection;
in vec4 v_uv[];
out vec2 g_uv;
void main()
{
vec4 pos = gl_in[0].gl_Position;
vec4 uv = v_uv[0];
vec2 size = vec2(textureSize(font_atlas, 0)) * (uv.zw - uv.xy);
vec2 pos_opposite = pos.xy + (mat2(projection) * size);
gl_Position = vec4(pos.xy, 0, 1);
g_uv = uv.xy;
EmitVertex();
gl_Position = vec4(pos.x, pos_opposite.y, 0, 1);
g_uv = uv.xw;
EmitVertex();
gl_Position = vec4(pos_opposite.x, pos.y, 0, 1);
g_uv = uv.zy;
EmitVertex();
gl_Position = vec4(pos_opposite.xy, 0, 1);
g_uv = uv.zw;
EmitVertex();
EndPrimitive();
}
Fragment shader:
#version 330
uniform sampler2D font_atlas;
uniform vec4 color;
uniform float gamma;
in vec2 g_uv;
layout (location = 0) out vec4 fragment_color;
void main()
{
float a = texture(font_atlas, g_uv).r;
fragment_color.rgb = color.rgb;
fragment_color.a = color.a * pow(a, 1.0 / gamma);
}
I wouldn't expect a significant performance difference between your proposed method and storing the quad vertex positions and texture coordinates in a vertex buffer. On the one hand, your method requires a smaller vertex buffer and less work for the CPU. On the other hand, the texelFetch calls will hit more-or-less random locations and not make the best use of the cache. This last point may not be very significant, as I guess that texture won't be very large. Also, the execution model of geometry shaders means they can quickly become the bottleneck of the pipeline.
To answer "is this worth doing?" - I suspect not for performance reasons. Unfortunately you can't tell until you implement it and measure the performance. I think it's quite a cool idea though, so I don't think you'd be wasting your time trying it out.
Maybe you can use an atomic counter to track the current position in the text.
Here is an interesting paper on memory bandwidth
GPU perf...
You can cache the result in an FBO.
For really fast rendering, as you said, you may build a geometry shader that takes points as input and outputs quads, and sample a texture to get the additional glyph info.
This appears to be the best solution...
I just started with OpenGL tessellation and have run into a bit a trouble. I am tessellating series of patches formed by one vertex each. These vertices/patches are structured in a gridlike fashion to later form a terrain generated by Perlin Noise.
The problem I have run into is that the second patch, and every fifth patch after it, is sometimes heavily tessellated (not the way I configured it), but most of the time is not tessellated at all.
Like so:
The two white circles mark the highly/over tessellated patches. Also note the pattern of untessellated patches.
The strange thing is that it works on my Surface Pro 2 (Intel HD4400 graphics) but bugs on my main desktop computer (AMD HD6950 graphics). Is it possible the hardware is bad?
The patches are generated with the code:
vec4* patches = new vec4[m_patchesWidth * m_patchesDepth];
int c = 0;
for (unsigned int z = 0; z < m_patchesDepth; ++z) {
for (unsigned int x = 0; x < m_patchesWidth; ++x) {
patches[c] = vec4(x * 1.5f, 0, z * 1.5f, 1.0f);
c++;
}
}
m_fxTerrain->Apply();
glGenBuffers(1, &m_planePatches);
glBindBuffer(GL_ARRAY_BUFFER, m_planePatches);
glBufferData(GL_ARRAY_BUFFER, m_patchesWidth * m_patchesDepth * sizeof(vec4), patches, GL_STATIC_DRAW);
GLuint loc = m_fxTerrain->GetAttrib("posIn");
glEnableVertexAttribArray(loc);
glVertexAttribPointer(0, 4, GL_FLOAT, GL_FALSE, sizeof(vec4), nullptr);
delete[] patches; // allocated with new[], so it must be released with delete[]
And drawn with:
glPatchParameteri(GL_PATCH_VERTICES, 1);
glBindVertexArray(patches);
glPolygonMode(GL_FRONT_AND_BACK, GL_LINE);
glDrawArrays(GL_PATCHES, 0, nrOfPatches);
Vertex Shader:
#version 430 core
in vec4 posIn;
out gl_PerVertex {
vec4 gl_Position;
};
void main() {
gl_Position = posIn;
}
Control shader:
#version 430
#extension GL_ARB_tessellation_shader : enable
layout (vertices = 1) out;
uniform float OuterTessFactor;
uniform float InnerTessFactor;
out gl_PerVertex {
vec4 gl_Position;
} gl_out[];
void main() {
if (gl_InvocationID == 0) {
gl_TessLevelOuter[0] = OuterTessFactor;
gl_TessLevelOuter[1] = OuterTessFactor;
gl_TessLevelOuter[2] = OuterTessFactor;
gl_TessLevelOuter[3] = OuterTessFactor;
gl_TessLevelInner[0] = InnerTessFactor;
gl_TessLevelInner[1] = InnerTessFactor;
}
gl_out[gl_InvocationID].gl_Position = gl_in[gl_InvocationID].gl_Position;
}
Evaluation shader:
#version 430
#extension GL_ARB_tessellation_shader : enable
layout (quads, equal_spacing, ccw) in;
uniform mat4 ProjView;
uniform sampler2D PerlinNoise;
out vec3 PosW;
out vec3 Normal;
out vec4 ColorFrag;
out gl_PerVertex {
vec4 gl_Position;
};
void main() {
vec4 pos = gl_in[0].gl_Position;
pos.xz += gl_TessCoord.xy;
pos.y = texture2D(PerlinNoise, pos.xz / vec2(8, 8)).x * 10.0f - 10.0f;
Normal = vec3(0, 1, 0);
gl_Position = ProjView * pos;
PosW = pos.xyz;
ColorFrag = vec4(pos.x / 64.0f, 0.0f, pos.z / 64.0f, 1.0f);
}
Fragment shader:
#version 430 core
in vec3 PosW;
in vec3 Normal;
in vec4 ColorFrag;
in vec4 PosH;
out vec3 FragColor;
out vec3 FragNormal;
void main() {
FragNormal = Normal;
FragColor = ColorFrag.xyz;
}
I have tried hardcoding the different tessellation levels, but that did not help. I recently started out with OpenGL, so please let me know if I am doing something stupid.
So does anyone have any idea what could be causing this "flickering" of certain patches?
Update: I had a friend run the project and he got the same pattern of flickering tessellation but the failing patches were not drawn at all except when being overly tessellated. He has the same graphics card as I do (AMD HD6950).
You should use triangle/quad tessellation, in which each patch has 3 or 4 vertices. As I can see, you use quads (I use them too). In that case, you can set it like this:
glPatchParameteri(GL_PATCH_VERTICES,4);
glBindVertexArray(VertexArray);
(TIP: use drawelements for your terrain, much better performance for 2D-displacement based mesh.)
In the control shader, use
layout (vertices = 4) out;
since your patch has 4 control points. The ordering is still important (CCW/CW).
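A grid of 4-vertex patches with an index buffer could be generated on the CPU roughly as follows. This is a minimal sketch, not the asker's code: buildQuadPatches, GridVertex and PatchMesh are hypothetical names, only x and z are stored (the height is assumed to be displaced in the shader), and the corner order is one consistent winding around each cell. Whether that winding counts as CCW depends on the view direction and handedness in use.

```cpp
#include <cstdint>
#include <vector>

struct GridVertex { float x, z; };  // y is assumed to be displaced in the shader

struct PatchMesh {
    std::vector<GridVertex> vertices;    // (patchesWidth + 1) * (patchesDepth + 1) entries
    std::vector<std::uint32_t> indices;  // 4 indices per patch
};

// Builds a regular grid of quad patches with shared corner vertices.
PatchMesh buildQuadPatches(unsigned patchesWidth, unsigned patchesDepth, float spacing) {
    PatchMesh mesh;
    const unsigned vertsPerRow = patchesWidth + 1;

    for (unsigned z = 0; z <= patchesDepth; ++z) {
        for (unsigned x = 0; x <= patchesWidth; ++x) {
            mesh.vertices.push_back({x * spacing, z * spacing});
        }
    }

    for (unsigned z = 0; z < patchesDepth; ++z) {
        for (unsigned x = 0; x < patchesWidth; ++x) {
            const std::uint32_t i0 = z * vertsPerRow + x;  // corner (x, z)
            mesh.indices.push_back(i0);                    // (x,     z)
            mesh.indices.push_back(i0 + 1);                // (x + 1, z)
            mesh.indices.push_back(i0 + vertsPerRow + 1);  // (x + 1, z + 1)
            mesh.indices.push_back(i0 + vertsPerRow);      // (x,     z + 1)
        }
    }
    return mesh;
}
```

With GL_PATCH_VERTICES set to 4, such a mesh would be drawn with glDrawElements(GL_PATCHES, indexCount, GL_UNSIGNED_INT, nullptr), where indexCount is the size of the index vector.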
Personally I don't like to use built-in variables, so for the vertex shader you can send your vertex data to the tesscontrol like this:
layout (location = 0) out vec3 outPos;
....
outPos.xz = grid.xy;
outPos.y = noise(outPos.xz);
Tess control:
layout (location = 0) in vec3 inPos[]; //outPos (location = 0) from vertex shader
//'collects' the 4 control points into an array in the order they're sent
layout (location = 0) out vec3 outPos[]; //send the c.points to the ev. shader
...
gl_TessLevelOuter[0] = outt[0];
gl_TessLevelOuter[1] = outt[1];
gl_TessLevelOuter[2] = outt[2];
gl_TessLevelOuter[3] = outt[3];
gl_TessLevelInner[0] = inn[0];
gl_TessLevelInner[1] = inn[1];
outPos[ID] = inPos[ID];//gl_invocationID = ID
Note that both the in and out vertex data are arrays.
The tessev is simple:
layout (location = 0) in vec3 inPos[]; //the 4 control points
layout (location = 0) out vec3 outPos; //this is no longer array, next is the fragment shader
...
//edit: do not forget to add the next line
layout (quads) in;
vec3 interpolate3D(vec3 v0, vec3 v1, vec3 v2, vec3 v3) //linear interpolation for x,y,z coords on the quad
{
return mix(mix(v0,v1,gl_TessCoord.x),mix(v3,v2,gl_TessCoord.x),gl_TessCoord.y);
}
...main{...
outPos = interpolate3D(inPos[0],inPos[1],inPos[2],inPos[3]); //the four control points of the quad. Every other point is linearly interpolated between them according to the TessCoord.
gl_Position = mvp * vec4(outPos,1.0f);
A good representation of the quad domain: http://ogldev.atspace.co.uk/www/tutorial30/tutorial30.html.
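The bilinear interpolation performed by interpolate3D can be checked on the CPU with a direct port. This is a sketch for illustration: Vec3 and the explicit u and v parameters (standing in for gl_TessCoord.x and gl_TessCoord.y) are assumptions of the port, and mix follows the GLSL definition x * (1 - a) + y * a.

```cpp
struct Vec3 { float x, y, z; };

// GLSL-style mix: a * (1 - t) + b * t, applied per component.
Vec3 mix(const Vec3& a, const Vec3& b, float t) {
    return {a.x * (1.0f - t) + b.x * t,
            a.y * (1.0f - t) + b.y * t,
            a.z * (1.0f - t) + b.z * t};
}

// Port of interpolate3D: v0..v3 are the four control points in the order they
// were submitted; (u, v) plays the role of gl_TessCoord.xy in the quad domain.
Vec3 interpolate3D(const Vec3& v0, const Vec3& v1, const Vec3& v2, const Vec3& v3,
                   float u, float v) {
    // Interpolate along the v0-v1 edge and the v3-v2 edge, then between them.
    return mix(mix(v0, v1, u), mix(v3, v2, u), v);
}
```

The corners of the domain map back to the control points: (0,0) gives v0, (1,0) gives v1, (1,1) gives v2 and (0,1) gives v3, which is why the submission order of the four vertices matters.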
I think the problem is your one-vertex patch. I cannot imagine how a one-vertex patch can be divided into triangles, and I don't know how it behaves on other hardware. Tessellation divides primitives into simpler primitives (triangles, in the case of OpenGL), since the GPU can handle those easily (3 points always lie in a plane). So, as I see it, the minimum number of patch vertices should be 3, for a triangle. I like quads because they are simpler to index and cost less memory; they are divided into triangles during tessellation too. http://www.informit.com/articles/article.aspx?p=2120983
Also, there is another type, the isoline tessellation. (check out the links, the second is pretty good.)
All in all, try it with quads or triangles, and set the number of control vertices to 4 (or 3). My (pretty complex) terrain shader is here; it does frustum culling and tessellation-shader culling for a geoclipmap-based terrain. Without tessellation, it also works with vertex morphing in the vertex shader. Maybe some part of this code will be useful. http://speedy.sh/TAvPR/gshader.txt
A scene with tessellation at about 4 pixels/triangle runs at 75 FPS (with fraps) with runtime normal calculation and bicubic smoothing and other things. I'm using AMD HD 5750. It still could be much faster with better code and pre-baked normals:D. (runs at max 120 w/o normal calc.)
Oh, and you can send only the x and z coords if you displace the vertex in the shader. It will be faster too.
Lots of vertices.