Is it faster to use texelFetch when rendering fonts? - opengl

I am writing some font drawing shaders in OpenGL 3.3. I will render my font into a texture atlas and then generate some display lists for some text I want to draw. I would like the rendering of text to consume the least amount of resources (CPU, GPU memory, GPU time). How can I accomplish this?
Looking at Freetype-gl, I noticed that the author generates 6 indices and 4 vertices per character.
Since I am using OpenGL 3.3, I have some additional freedom. My plan was to generate 1 vertex per character plus one integer "code" per character. The character code can be used in texelFetch operations to retrieve texture coördinates and character size information. A geometry shader turns the size information and vertex into a triangle strip.
Is texelFetch going to be slower than sending more vertices/texture coördinates? Is this worth doing, or is there a reason it's not done in the font libraries I looked at?
Final code:
Vertex shader:
#version 330
uniform sampler2D font_atlas;
uniform sampler1D code_to_texture;
uniform mat4 projection;
uniform vec2 vertex_offset; // in view space.
uniform vec4 color;
uniform float gamma;
in vec2 vertex; // vertex in view space of each character adjusted for kerning, etc.
in int code;
out vec4 v_uv;
void main()
{
v_uv = texelFetch(
code_to_texture,
code,
0);
gl_Position = projection * vec4(vertex_offset + vertex, 0.0, 1.0);
}
Geometry shader:
#version 330
layout (points) in;
layout (triangle_strip, max_vertices = 4) out;
uniform sampler2D font_atlas;
uniform mat4 projection;
in vec4 v_uv[];
out vec2 g_uv;
void main()
{
vec4 pos = gl_in[0].gl_Position;
vec4 uv = v_uv[0];
vec2 size = vec2(textureSize(font_atlas, 0)) * (uv.zw - uv.xy);
vec2 pos_opposite = pos.xy + (mat2(projection) * size);
gl_Position = vec4(pos.xy, 0, 1);
g_uv = uv.xy;
EmitVertex();
gl_Position = vec4(pos.x, pos_opposite.y, 0, 1);
g_uv = uv.xw;
EmitVertex();
gl_Position = vec4(pos_opposite.x, pos.y, 0, 1);
g_uv = uv.zy;
EmitVertex();
gl_Position = vec4(pos_opposite.xy, 0, 1);
g_uv = uv.zw;
EmitVertex();
EndPrimitive();
}
Fragment shader:
#version 330
uniform sampler2D font_atlas;
uniform vec4 color;
uniform float gamma;
in vec2 g_uv;
layout (location = 0) out vec4 fragment_color;
void main()
{
float a = texture(font_atlas, g_uv).r;
fragment_color.rgb = color.rgb;
fragment_color.a = color.a * pow(a, 1.0 / gamma);
}

I wouldn't expect there to be a significant performance difference between your proposed method and storing the quad vertex positions and texture coordinates in a vertex buffer. On the one hand your method requires a smaller vertex buffer and less work for the CPU. On the other hand the texelFetch calls will be at more-or-less random locations and won't make the best use of the cache. This last point may not be very significant, as I guess that texture won't be very large. Also, the execution model of geometry shaders means they can quickly become the bottleneck of the pipeline.
To answer "is this worth doing?" - I suspect not for performance reasons. Unfortunately you can't tell until you implement it and measure the performance. I think it's quite a cool idea though, so I don't think you'd be wasting your time trying it out.

Maybe you can use an atomic counter to handle the current position in the text.
There is an interesting paper on memory bandwidth and GPU performance on this topic.
You can cache the result in an FBO.
For really fast rendering, as you said, you may build a geometry shader that takes points as input and outputs quads, sampling a texture to get the additional per-glyph info.
That effectively appears to be the best solution...
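To illustrate the FBO-caching idea, a rough sketch (text_tex, text_fbo, text_width and text_height are placeholder names, and error checking is omitted): render the text once into a texture, then reuse that texture as long as the text does not change.
GLuint text_tex, text_fbo;
glGenTextures(1, &text_tex);
glBindTexture(GL_TEXTURE_2D, text_tex);
glTexImage2D(GL_TEXTURE_2D, 0, GL_RGBA8, text_width, text_height, 0, GL_RGBA, GL_UNSIGNED_BYTE, NULL);
glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_MIN_FILTER, GL_LINEAR); // no mipmaps, so use a non-mipmap filter
glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_MAG_FILTER, GL_LINEAR);
glGenFramebuffers(1, &text_fbo);
glBindFramebuffer(GL_FRAMEBUFFER, text_fbo);
glFramebufferTexture2D(GL_FRAMEBUFFER, GL_COLOR_ATTACHMENT0, GL_TEXTURE_2D, text_tex, 0);
// ... render the glyph quads once into the FBO ...
glBindFramebuffer(GL_FRAMEBUFFER, 0);
// afterwards, draw a single textured quad with text_tex whenever the text is unchanged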

Related

How to change gl_PointSize in the vertex shader?

I'm optimizing my particle renderer to work with GL_POINTS, and now I need to adjust the size of the points using gl_PointSize in the vertex shader so that the particles are scaled by the right amount.
This is the vertex shader I have now:
#version 330 core
layout (location = 0) in vec3 position;
layout (location = 1) in uint uv;
uniform mat4 projection;
uniform mat4 view;
void main(){
gl_PointSize = 10; // No difference with gl_PointSize = 1000
gl_Position = projection * view * vec4(position, 1.0);
}
Changing gl_PointSize in the vertex shader doesn't seem to make any difference.
You have to enable GL_PROGRAM_POINT_SIZE (see glEnable and gl_PointSize):
glEnable(GL_PROGRAM_POINT_SIZE);
OpenGL Reference - gl_PointSize
Description:
In the vertex, tessellation evaluation and geometry languages, a single global instance of the gl_PerVertex named block is available and its gl_PointSize member is an output that receives the intended size of the point to be rasterized, in pixels. It may be written at any time during shader execution. If GL_PROGRAM_POINT_SIZE is enabled, gl_PointSize is used to determine the size of rasterized points, otherwise it is ignored by the rasterization stage.

OpenGL - uniform presence causing shader to be bypassed

UPDATE: So it turns out this was due to a bug on the C side of things, causing some of the matrix data to become malformed. The shaders are all fine. So if adding uniforms causes weird things to happen, my advice would be to use a debugger to check the values of ALL uniforms and make sure that they are all being set correctly.
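If you don't have a GPU debugger handy, you can also dump every active uniform from the C side. A rough sketch (program is assumed to be your linked program object; the glGetUniformfv call is only meaningful for float-typed uniforms):
GLint uniform_count = 0;
glGetProgramiv(program, GL_ACTIVE_UNIFORMS, &uniform_count);
for (GLint i = 0; i < uniform_count; ++i) {
    char name[256];
    GLint size = 0;
    GLenum type = 0;
    glGetActiveUniform(program, i, sizeof(name), NULL, &size, &type, name);
    GLint location = glGetUniformLocation(program, name);
    GLfloat value[16] = {0};                   // big enough for a mat4
    glGetUniformfv(program, location, value);  // only valid for float-typed uniforms
    printf("%s (location %d) = %f %f %f %f ...\n", name, location, value[0], value[1], value[2], value[3]);
}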
So I am trying to render depth to a cube map to use as a shadow map, but when I add and use a uniform in the fragment shader everything becomes white as if the shader isn't being used. No warnings or errors are generated when compiling/linking the shader.
The shader program I am using to render the depth map (setting the depth simply to the fragment z position as a test) is as follows:
//vertex shader
#version 430
in layout(location=0) vec4 vertexPositionModel;
uniform mat4 modelToWorldMatrix;
void main() {
gl_Position = modelToWorldMatrix * vertexPositionModel;
}
//geometry shader
#version 430
layout (triangles) in;
layout (triangle_strip, max_vertices=18) out;
out vec4 fragPositionWorld;
uniform mat4 projectionMatrices[6];
void main() {
for (int face = 0; face < 6; face++) {
gl_Layer = face;
for (int i = 0; i < 3; i++) {
fragPositionWorld = gl_in[i].gl_Position;
gl_Position = projectionMatrices[face] * fragPositionWorld;
EmitVertex();
}
EndPrimitive();
}
}
//Fragment shader
#version 430
in vec4 fragPositionWorld;
void main() {
gl_FragDepth = abs(fragPositionWorld.z);
}
The main shader samples from the cubemap and simply renders the depth as greyscale colour:
vec3 lightDirection = fragPositionWorld - pointLight.position;
float closestDepth = texture(shadowMap, lightDirection).r;
finalColour = vec4(vec3(closestDepth), 1.0);
The scene is a small cube in a larger cubic room, and renders as expected, dark near z = 0 and the cube projected back onto the wall (The depth map is being rendered from the centre of the room):
Good:
I can move the small cube around and it is projected correctly onto all the sides of the cubemap. All good so far.
The problem is when I add a uniform to the fragment shader, e.g.:
#version 430
in vec4 fragPositionWorld;
uniform vec3 lightPos;
void main() {
gl_FragDepth = min(lightPos.y, 0.5);
}
Everything renders as white, the same as if the shader had failed to compile:
Bad:
gDEBugger reports that the uniform is set correctly (0, 4, 0), but regardless of what lightPos is, gl_FragDepth should be set to a value less than 0.5 and appear as a shade of grey (which is what happens if I set gl_FragDepth = 0.5 directly), so I can only conclude that the fragment shader is not being used for some reason and the default one is being used instead. Unfortunately I have no idea why.

DirectX11 / OpenGL only renders half of the texture

This is how it should look. It uses the same vertices/UV coordinates that are used for DX11 and OpenGL. This scene was rendered in DirectX 10.
This is how it looks in DirectX 11 and OpenGL.
I don't know how this can happen. I am using the same code on top for both DX10 and DX11, and they both handle things very similarly. Do you have an idea what the problem may be and how to fix it?
I can send code if needed.
The same issue also appears with another texture.
Here I changed the transparent part of the texture to red.
Fragment Shader GLSL
#version 330 core
in vec2 UV;
in vec3 Color;
uniform sampler2D Diffuse;
void main()
{
//color = texture2D( Diffuse, UV ).rgb;
gl_FragColor = texture2D( Diffuse, UV );
//gl_FragColor = vec4(Color,1);
}
Vertex Shader GLSL
#version 330 core
layout(location = 0) in vec3 vertexPosition;
layout(location = 1) in vec2 vertexUV;
layout(location = 2) in vec3 vertexColor;
layout(location = 3) in vec3 vertexNormal;
uniform mat4 Projection;
uniform mat4 View;
uniform mat4 World;
out vec2 UV;
out vec3 Color;
void main()
{
mat4 MVP = Projection * View * World;
gl_Position = MVP * vec4(vertexPosition,1);
UV = vertexUV;
Color = vertexColor;
}
Quickly said, it looks like you are using back-face culling (which is good), and the other side of your model is wound the wrong way. You can confirm that this is the problem by turning back-face culling off (OpenGL: glDisable(GL_CULL_FACE)).
The real fix (if this is the problem) is to have the faces wound correctly, which is usually counter-clockwise. This depends on where you get the model from: if you generate it yourself, correct the winding in your model-generation routine; model files created by 3D modeling software usually have correct face winding.
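For reference, a typical culling setup looks roughly like this (only a sketch; adjust glFrontFace to whatever winding your model data actually uses):
// standard back-face culling setup (GL_CCW is already the default front face)
glEnable(GL_CULL_FACE);
glCullFace(GL_BACK);
glFrontFace(GL_CCW);
// to verify the diagnosis, temporarily disable culling and see if the missing faces reappear:
// glDisable(GL_CULL_FACE);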
This is just a guess, but are you telling the system the correct number of polygons to draw? Calls like glBufferData() take the size in bytes of the data, not the number of vertices or polygons. (Maybe they should have named the parameter numBytes instead of size?) Also, the size has to contain the size of all the data. If you have color, normals, texture coordinates and vertices all interleaved, it needs to include the size of all of that.
This is made more confusing by the fact that glDrawElements() and similar calls take the number of vertices as their size argument. The argument is named count, but it's not obvious that it is a vertex count, not a polygon count.
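To make the distinction concrete, here is a hedged sketch with made-up names (the interleaved Vertex layout and the counts are just for illustration, not taken from the question):
struct Vertex { float pos[3]; float normal[3]; float uv[2]; }; // hypothetical interleaved layout
const size_t vertexCount = 4, indexCount = 6;                  // e.g. one textured quad
std::vector<Vertex> vertices(vertexCount);                     // filled elsewhere
std::vector<unsigned int> indices(indexCount);                 // filled elsewhere
// glBufferData wants the size in BYTES, covering every interleaved attribute
glBufferData(GL_ARRAY_BUFFER, vertices.size() * sizeof(Vertex), vertices.data(), GL_STATIC_DRAW);
glBufferData(GL_ELEMENT_ARRAY_BUFFER, indices.size() * sizeof(unsigned int), indices.data(), GL_STATIC_DRAW);
// glDrawElements wants the number of indices (vertices to draw), not bytes and not triangles
glDrawElements(GL_TRIANGLES, (GLsizei)indices.size(), GL_UNSIGNED_INT, nullptr);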
I found the error.
The reason is that I forgot to set the texture sampler state to wrap/repeat.
It was set to clamp, so the UV coordinates were clamped to 1.
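For anyone hitting the same symptom on the OpenGL side, the equivalent fix is setting the wrap mode on the texture object (a sketch; diffuseTexture stands in for whatever texture object you bind):
glBindTexture(GL_TEXTURE_2D, diffuseTexture); // diffuseTexture = your texture object
glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_WRAP_S, GL_REPEAT);
glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_WRAP_T, GL_REPEAT);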
A few things that you could try:
Is the depth test enabled? It seems that the inner faces of the polygons on the 'other' side are being rendered over the polygons that are closer to the viewpoint. This could happen if the depth test is disabled. Enable it just in case (see the snippet after this list).
Is lighting enabled? If so, turn it off. There seem to be some flashes of white in the rotating image; that could be because of incorrect normals...
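For the depth-test suggestion, the relevant calls are roughly the following (and make sure the context actually has a depth buffer):
glEnable(GL_DEPTH_TEST);
glDepthFunc(GL_LESS); // the default, shown only for clarity
// and remember to clear the depth buffer every frame:
glClear(GL_COLOR_BUFFER_BIT | GL_DEPTH_BUFFER_BIT);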
HTH

GLSL/OpenGL shader tessellation flickering and failure

I just started with OpenGL tessellation and have run into a bit of trouble. I am tessellating a series of patches formed by one vertex each. These vertices/patches are laid out in a grid-like fashion to later form a terrain generated with Perlin noise.
The problem I have run into is that, starting from the second patch and then every 5th patch after that, a patch sometimes gets a lot of tessellation (not the way I configured it), but most of the time it doesn't get tessellated at all.
Like so:
The two white circles mark the highly/over tessellated patches. Also note the pattern of untessellated patches.
The strange thing is that it works on my Surface Pro 2 (Intel HD4400 graphics) but misbehaves on my main desktop computer (AMD HD6950 graphics). Is it possible the hardware is at fault?
The patches are generated with the code:
vec4* patches = new vec4[m_patchesWidth * m_patchesDepth];
int c = 0;
for (unsigned int z = 0; z < m_patchesDepth; ++z) {
for (unsigned int x = 0; x < m_patchesWidth; ++x) {
patches[c] = vec4(x * 1.5f, 0, z * 1.5f, 1.0f);
c++;
}
}
m_fxTerrain->Apply();
glGenBuffers(1, &m_planePatches);
glBindBuffer(GL_ARRAY_BUFFER, m_planePatches);
glBufferData(GL_ARRAY_BUFFER, m_patchesWidth * m_patchesDepth * sizeof(vec4), patches, GL_STATIC_DRAW);
GLuint loc = m_fxTerrain->GetAttrib("posIn");
glEnableVertexAttribArray(loc);
glVertexAttribPointer(0, 4, GL_FLOAT, GL_FALSE, sizeof(vec4), nullptr);
delete(patches);
And drawn with:
glPatchParameteri(GL_PATCH_VERTICES, 1);
glBindVertexArray(patches);
glPolygonMode(GL_FRONT_AND_BACK, GL_LINE);
glDrawArrays(GL_PATCHES, 0, nrOfPatches);
Vertex Shader:
#version 430 core
in vec4 posIn;
out gl_PerVertex {
vec4 gl_Position;
};
void main() {
gl_Position = posIn;
}
Control shader:
#version 430
#extension GL_ARB_tessellation_shader : enable
layout (vertices = 1) out;
uniform float OuterTessFactor;
uniform float InnerTessFactor;
out gl_PerVertex {
vec4 gl_Position;
} gl_out[];
void main() {
if (gl_InvocationID == 0) {
gl_TessLevelOuter[0] = OuterTessFactor;
gl_TessLevelOuter[1] = OuterTessFactor;
gl_TessLevelOuter[2] = OuterTessFactor;
gl_TessLevelOuter[3] = OuterTessFactor;
gl_TessLevelInner[0] = InnerTessFactor;
gl_TessLevelInner[1] = InnerTessFactor;
}
gl_out[gl_InvocationID].gl_Position = gl_in[gl_InvocationID].gl_Position;
}
Evaluation shader:
#version 430
#extension GL_ARB_tessellation_shader : enable
layout (quads, equal_spacing, ccw) in;
uniform mat4 ProjView;
uniform sampler2D PerlinNoise;
out vec3 PosW;
out vec3 Normal;
out vec4 ColorFrag;
out gl_PerVertex {
vec4 gl_Position;
};
void main() {
vec4 pos = gl_in[0].gl_Position;
pos.xz += gl_TessCoord.xy;
pos.y = texture2D(PerlinNoise, pos.xz / vec2(8, 8)).x * 10.0f - 10.0f;
Normal = vec3(0, 1, 0);
gl_Position = ProjView * pos;
PosW = pos.xyz;
ColorFrag = vec4(pos.x / 64.0f, 0.0f, pos.z / 64.0f, 1.0f);
}
Fragment shader:
#version 430 core
in vec3 PosW;
in vec3 Normal;
in vec4 ColorFrag;
in vec4 PosH;
out vec3 FragColor;
out vec3 FragNormal;
void main() {
FragNormal = Normal;
FragColor = ColorFrag.xyz;
}
I have tried to hardcode the different tessellation levels, but that did not help. I recently started out with OpenGL, so please let me know if I am doing something stupid.
So does anyone have any idea what could be causing this "flickering" of certain patches?
Update: I had a friend run the project and he got the same pattern of flickering tessellation, but the failing patches were not drawn at all except when they were overly tessellated. He has the same graphics card as I do (AMD HD6950).
You should use triangle/quad tessellation, in which each patch has 3 or 4 vertices. As far as I can see, you use quads (I use them too). In that case, you can set it like this:
glPatchParameteri(GL_PATCH_VERTICES,4);
glBindVertexArray(VertexArray);
(Tip: use glDrawElements for your terrain; it gives much better performance for a 2D displacement-based mesh. A rough sketch follows below.)
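A sketch of that tip, reusing m_patchesWidth/m_patchesDepth from the question and assuming the grid vertices are shared between neighbouring 4-vertex quad patches (only the index generation and draw call are shown; the corner order is one possibility and must match your winding setup):
std::vector<unsigned int> indices;
for (unsigned int z = 0; z + 1 < m_patchesDepth; ++z) {
    for (unsigned int x = 0; x + 1 < m_patchesWidth; ++x) {
        unsigned int i = z * m_patchesWidth + x;
        // four shared corners of one quad patch
        indices.push_back(i);
        indices.push_back(i + 1);
        indices.push_back(i + 1 + m_patchesWidth);
        indices.push_back(i + m_patchesWidth);
    }
}
// upload 'indices' into a GL_ELEMENT_ARRAY_BUFFER bound to the VAO, then:
glPatchParameteri(GL_PATCH_VERTICES, 4);
glDrawElements(GL_PATCHES, (GLsizei)indices.size(), GL_UNSIGNED_INT, nullptr);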
In the control shader, use
layout (vertices = 4) out;
since your patch has 4 control points. The ordering is still important (CCW/CW).
Personally I don't like to use built-in variables, so for the vertex shader you can send your vertex data to the tessellation control shader like this:
layout (location = 0) out vec3 outPos;
....
outPos.xz = grid.xy;
outPos.y = noise(outPos.xz);
Tess control:
layout (location = 0) in vec3 inPos[]; // outPos (location = 0) from the vertex shader;
// 'collects' the 4 control points into an array, in the order they are sent
layout (location = 0) out vec3 outPos[]; // passes the control points on to the evaluation shader
...
// outt[] and inn[] hold the desired outer/inner tessellation factors (e.g. uniforms)
gl_TessLevelOuter[0] = outt[0];
gl_TessLevelOuter[1] = outt[1];
gl_TessLevelOuter[2] = outt[2];
gl_TessLevelOuter[3] = outt[3];
gl_TessLevelInner[0] = inn[0];
gl_TessLevelInner[1] = inn[1];
outPos[gl_InvocationID] = inPos[gl_InvocationID];
Note that both the input and output vertex data are arrays.
The tessellation evaluation shader is simple:
layout (location = 0) in vec3 inPos[]; //the 4 control points
layout (location = 0) out vec3 outPos; // no longer an array; the next stage is the fragment shader
...
//edit: do not forget to add the next line
layout (quads) in;
vec3 interpolate3D(vec3 v0, vec3 v1, vec3 v2, vec3 v3) //linear interpolation for x,y,z coords on the quad
{
return mix(mix(v0,v1,gl_TessCoord.x),mix(v3,v2,gl_TessCoord.x),gl_TessCoord.y);
}
...main{...
outPos = interpolate3D(inPos[0],inPos[1],inPos[2],inPos[3]); //the four control points of the quad. Every other point is linearly interpolated between them according to the TessCoord.
gl_Position = mvp * vec4(outPos,1.0f);
A good representation of the quad domain: http://ogldev.atspace.co.uk/www/tutorial30/tutorial30.html.
I think the problem is with your one-vertex patch. I cannot imagine how a one-vertex patch can be divided into triangles, and I don't know how it behaves on other hardware. Tessellation is meant to divide primitives into simpler primitives, triangles in the case of OpenGL, since those can be handled easily by the GPU (3 points always lie in a plane). So the minimum number of patch vertices should be 3, for a triangle. I like quads because they are simpler to index and the memory cost is lower; they get divided into triangles during tessellation anyway. http://www.informit.com/articles/article.aspx?p=2120983
Also, there is another type, isoline tessellation. (Check out the links; the second one is pretty good.)
All in all, try it with quads or triangles and set the number of control vertices to 4 (or 3). My (pretty complex) terrain shader, with frustum culling and tessellation-shader culling for a geoclipmap-based terrain, is here; without tessellation it also works with vertex morphing in the vertex shader. Maybe some part of this code will be useful: http://speedy.sh/TAvPR/gshader.txt
A scene with tessellation at about 4 pixels/triangle runs at 75 FPS (measured with Fraps), with runtime normal calculation, bicubic smoothing and other things, on an AMD HD 5750. It could still be much faster with better code and pre-baked normals :D (it runs at up to 120 FPS without the normal calculation).
Oh, and you can send only the x and z coordinates if you displace the vertex in the shader. That will be faster too.
Lots of vertices.

Point Sprites for particle system

Are point sprites the best choice to build a particle system?
Are point sprites present in the newer versions of OpenGL and drivers of the latest graphics cards? Or should I do it using vbo and glsl?
Point sprites are indeed well suited for particle systems. But they don't have anything to do with VBOs and GLSL; they are a completely orthogonal feature. Whether or not you use point sprites, you always have to use VBOs for uploading the geometry, be it just points, pre-made sprites or whatever, and you always have to put this geometry through a set of shaders (in modern OpenGL, of course).
That being said, point sprites are very well supported in modern OpenGL, just not as automatically as with the old fixed-function approach. What is not supported are the point attenuation features that let you scale a point's size based on its distance to the camera; you have to do this manually inside the vertex shader. In the same way you have to do the texturing of the point manually in an appropriate fragment shader, using the special input variable gl_PointCoord (which says where inside the [0,1]-square of the whole point the current fragment is). For example, a basic point sprite pipeline could look like this:
...
glPointSize(whatever); //specify size of points in pixels
glDrawArrays(GL_POINTS, 0, count); //draw the points
vertex shader:
uniform mat4 mvp;
layout(location = 0) in vec4 position;
void main()
{
gl_Position = mvp * position;
}
fragment shader:
uniform sampler2D tex;
layout(location = 0) out vec4 color;
void main()
{
color = texture(tex, gl_PointCoord);
}
And that's all. Of course those shaders just do the most basic drawing of textured sprites, but are a starting point for further features. For example to compute the sprite's size based on its distance to the camera (maybe in order to give it a fixed world-space size), you have to glEnable(GL_PROGRAM_POINT_SIZE) and write to the special output variable gl_PointSize in the vertex shader:
uniform mat4 modelview;
uniform mat4 projection;
uniform vec2 screenSize;
uniform float spriteSize;
layout(location = 0) in vec4 position;
void main()
{
vec4 eyePos = modelview * position;
vec4 projVoxel = projection * vec4(spriteSize,spriteSize,eyePos.z,eyePos.w);
vec2 projSize = screenSize * projVoxel.xy / projVoxel.w;
gl_PointSize = 0.25 * (projSize.x+projSize.y);
gl_Position = projection * eyePos;
}
This would make all point sprites have the same world-space size (and thus a different screen-space size in pixels).
But point sprites, while still being perfectly supported in modern OpenGL, have their disadvantages. One of the biggest is their clipping behaviour. Points are clipped at their center coordinate (because clipping is done before rasterization and thus before the point gets "enlarged"). So if the center of the point is outside of the screen, the rest of it that might still reach into the viewing area is not shown; at worst, once the point is half-way out of the screen, it will suddenly disappear. This is however only noticeable (or annoying) if the point sprites are too large. If they are very small particles that don't cover much more than a few pixels each anyway, then this won't be much of a problem, and I would still regard particle systems as the canonical use-case for point sprites; just don't use them for large billboards.
But if this is a problem, then modern OpenGL offers many other ways to implement point sprites, apart from the naive way of pre-building all the sprites as individual quads on the CPU. You can still render them just as a buffer full of points (and thus in the way they are likely to come out of your GPU-based particle engine). To actually generate the quad geometry then, you can use the geometry shader, which lets you generate a quad from a single point. First you do only the modelview transformation inside the vertex shader:
uniform mat4 modelview;
layout(location = 0) in vec4 position;
void main()
{
gl_Position = modelview * position;
}
Then the geometry shader does the rest of the work. It combines the point position with the 4 corners of a generic [0,1]-quad and completes the transformation into clip-space:
const vec2 corners[4] = {
vec2(0.0, 1.0), vec2(0.0, 0.0), vec2(1.0, 1.0), vec2(1.0, 0.0) };
layout(points) in;
layout(triangle_strip, max_vertices = 4) out;
uniform mat4 projection;
uniform float spriteSize;
out vec2 texCoord;
void main()
{
for(int i=0; i<4; ++i)
{
vec4 eyePos = gl_in[0].gl_Position; //start with point position
eyePos.xy += spriteSize * (corners[i] - vec2(0.5)); //add corner position
gl_Position = projection * eyePos; //complete transformation
texCoord = corners[i]; //use corner as texCoord
EmitVertex();
}
}
In the fragment shader you would then of course use the custom texCoord varying instead of gl_PointCoord for texturing, since we're no longer drawing actual points.
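For completeness, the matching fragment shader might look like this; it is just the earlier textured-sprite fragment shader with gl_PointCoord replaced by the interpolated texCoord:
uniform sampler2D tex;
in vec2 texCoord;
layout(location = 0) out vec4 color;
void main()
{
    color = texture(tex, texCoord);
}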
Or another possibility (and maybe faster, since I remember geometry shaders having a reputation for being slow) would be to use instanced rendering. This way you have an additional VBO containing the vertices of just a single generic 2D quad (i.e. the [0,1]-square) and your good old VBO containing just the point positions. What you then do is draw this single quad multiple times (instanced), while sourcing the individual instances' positions from the point VBO:
glVertexAttribPointer(0, ...points...);
glVertexAttribPointer(1, ...quad...);
glVertexAttribDivisor(0, 1); //advance only once per instance
...
glDrawArraysInstanced(GL_TRIANGLE_STRIP, 0, 4, count); //draw #count quads
And in the vertex shader you then assemble the per-point position with the actual corner/quad-position (which is also the texture coordinate of that vertex):
uniform mat4 modelview;
uniform mat4 projection;
uniform float spriteSize;
layout(location = 0) in vec4 position;
layout(location = 1) in vec2 corner;
out vec2 texCoord;
void main()
{
vec4 eyePos = modelview * position; //transform to eye-space
eyePos.xy += spriteSize * (corner - vec2(0.5)); //add corner position
gl_Position = projection * eyePos; //complete transformation
texCoord = corner;
}
This achieves the same as the geometry shader based approach: properly clipped point sprites with a consistent world-space size. If you actually want to mimic the screen-space pixel size of actual point sprites, you need to put some more computational effort into it. But this is left as an exercise, and would be quite the opposite of the world-to-screen transformation from the point sprite shader.