GLSL stops rendering - c++

I want to write a signed distance representation. For that I am creating a voxel grid, 100x100x100 for example (the size will increase once it is working).
Now my plan is to load a point cloud into a 1D texture:
glEnable(GL_TEXTURE_1D);
glGenTextures(1, &_texture);
glBindTexture(GL_TEXTURE_1D, _texture);
glTexParameteri(GL_TEXTURE_1D, GL_TEXTURE_MAG_FILTER, GL_NEAREST);
glTexParameteri(GL_TEXTURE_1D, GL_TEXTURE_MIN_FILTER, GL_NEAREST);
glTexImage1D(GL_TEXTURE_1D, 0, GL_RGBA, pc->pc.size(), 0, GL_RGBA, GL_FLOAT, &pc->pc.front());
glBindTexture(GL_TEXTURE_1D, 0);
'pc' is just a class which holds a vector of the struct Point, which has only the floats x, y, z, w.
Then I want to render the whole 100x100x100 grid, i.e. each voxel, and iterate through all points of that texture, calculate the distance to my current voxel and store that distance in a new texture (1000x1000). For the moment the texture I am creating holds only color values which store the distance in the red and green components, with blue set to 1.0, so I can see the result on screen.
My problem is now that when I have about 500,000 points in my point cloud, it seems to stop rendering after a few voxels (less than 50,000). My guess is that if it takes too long, it stops and just throws out the buffer that it has.
I don't know if that can be the case, but if it is, is there something I can do against it, or maybe something I can do to make this procedure better/faster?
My second guess is that there is something I am not considering with the 1D texture. Is there a better way to pass in a large amount of data? I will surely need the data of a few hundred thousand points.
I don't know if it helps to show the full fragment shader, so I will only show the parts which I think are important for this problem:
Distance calculation and iteration through all points:
for(int i = 0; i < points; ++i) {
    vec4 texInfo = texture(textureImg, getTextCoord(i));
    vec4 pos = position;
    pos.z /= rows * rows;
    vec4 distVector = texInfo - pos;
    float dist = sqrt(distVector.x * distVector.x + distVector.y * distVector.y + distVector.z * distVector.z);
    if(dist < minDist) {
        minDist = dist;
    }
}
Function getTextCoord:
float getTextCoord(float a)
{
    return (a * 2.0f + 1.0f) / (2.0f * points);
}
Edit:
vec4 newPos = vec4(makeCoord(position.x + Col()) - 1,
                   makeCoord(position.y + Row()) - 1,
                   0,
                   1.0);

float makeCoord(float a) {
    return (a / rows) * 2;
}

int Col() {
    float a = mod(position.z, rows);
    return int(a);
}

int Row() {
    float a = position.z / rows;
    return int(a);
}

You absolutely shouldn't be looping through all of your points in a fragment shader, as it gets calculated N times per frame (where N equals the number of pixels), which effectively gives you O(N²) computational complexity.
All textures have limits on how much data they can hold per dimension. Two most important values here are GL_MAX_TEXTURE_SIZE and GL_MAX_3D_TEXTURE_SIZE. As stated in official docs,
Texture sizes have a limit based on the GL implementation. For 1D and 2D textures (and any texture types that use similar dimensionality, like cubemaps) the max size of either dimension is GL_MAX_TEXTURE_SIZE. For array textures, the maximum array length is GL_MAX_ARRAY_TEXTURE_LAYERS. For 3D textures, no dimension can be greater than GL_MAX_3D_TEXTURE_SIZE in size.
Within these limits, the size of a texture can be any value. It is advised however, that you stick to powers-of-two for texture sizes, unless you have a significant need to use arbitrary sizes.
The most typical values are listed here and here.
If you really have to use large data amounts inside your frag shader, consider a 2D or 3D texture with known power-of-2 dimensions and GL_NEAREST / GL_REPEAT coordinates. This will enable you to compute 2D texture coords just by multiplying the source offset by a precomputed 1/width value (Y coord; the remainder is by definition smaller than 1 texel and can be safely ignored in the presence of GL_NEAREST) and using it as-is for X coord (GL_REPEAT guarantees that only the remainder gets used). Personally I implemented this approach when I needed to pass 128 MB of data to a GLSL 1.20 shader.
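To make the index math concrete, here is a minimal host-side sketch (plain C++; the helper name is made up) of mapping a linear point index to normalized texel-center coordinates of a width x height texture sampled with GL_NEAREST. In a shader the same arithmetic would simply run on floats:

// Hypothetical helper: map a linear texel index to normalized (s, t)
// texel-center coordinates of a width x height texture sampled with GL_NEAREST.
// With GL_REPEAT on the S axis you could also pass s = index / float(width)
// directly and let the wrap mode discard the integer part, as described above.
void indexToTexCoord(int index, int width, int height, float &s, float &t)
{
    int row = index / width;            // which texture row the texel lives in
    int col = index - row * width;      // column inside that row

    s = (col + 0.5f) / float(width);    // +0.5 puts the sample on the texel center
    t = (row + 0.5f) / float(height);
}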
If you are targeting a recent enough OpenGL (≥ 3.0), you also can use buffer textures.
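For reference, a minimal sketch of such a buffer texture setup (the buffer and texture handle names are assumptions, and Point is the 4-float struct from the question). In the shader the data would then be declared as a samplerBuffer and read with texelFetch, which avoids the 1D texture size limit and the coordinate math entirely:

// Buffer texture holding the point cloud (core since OpenGL 3.1,
// earlier via ARB/EXT_texture_buffer_object).
GLuint pointBuffer = 0, pointTexture = 0;

glGenBuffers(1, &pointBuffer);
glBindBuffer(GL_TEXTURE_BUFFER, pointBuffer);
glBufferData(GL_TEXTURE_BUFFER,
             pc->pc.size() * sizeof(Point),   // Point = 4 floats (x, y, z, w)
             pc->pc.data(),
             GL_STATIC_DRAW);

glGenTextures(1, &pointTexture);
glBindTexture(GL_TEXTURE_BUFFER, pointTexture);
glTexBuffer(GL_TEXTURE_BUFFER, GL_RGBA32F, pointBuffer);   // exposes the buffer as one vec4 per point

// In the fragment shader:  uniform samplerBuffer points;  vec4 p = texelFetch(points, i);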
And last, but not least: you cannot pass integer-precision values greater than 2^24 through standard IEEE single-precision floats.
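A two-line check of that limit (2^24 = 16777216 is the last point where consecutive integers are still exactly representable in a 32-bit float):

#include <cstdio>

int main()
{
    float a = 16777216.0f;   // 2^24
    float b = a + 1.0f;      // 16777217 is not representable as a 32-bit float and rounds back to 2^24

    std::printf("%.1f %.1f equal=%d\n", a, b, a == b);   // prints: 16777216.0 16777216.0 equal=1
    return 0;
}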

Related

OpenGL: why does repeating a texture over the same number of fragments cost performance?

I have a quad of unit size rendered with a seamless texture and with texture repetition enabled.
For the texture coordinates I get it from the position.
So I can artificially raise the texture resolution by multiplying the texture coordinates by a factor of N, which will make the texture repeat over the quad N*N times.
In my code the texture is a normal map, and specular lighting is calculated with it in the fragment shader. I noticed that the more I multiply the texture coordinates, the more my performance drops; however, the drop seems to be constant after a certain value (no difference between N = 1024 and N = 10240, for example).
What I do not understand is that the quad size stays the same and the texture size is the same, so why does multiplying the texture coordinates cost me performance over the same number of fragments?
No mipmapping; I use GL_LINEAR for both the min and mag filters.
When scale increases, adjacent pixels in your fragment shader correspond to texels in your texture that are far apart. With GL_LINEAR, this means that the texels are not only far apart in the texture, but they are also far apart in memory.
With scale closer to 1:1, adjacent pixels in your fragment shader will be taken from texels that are also close together. This means they will be close together in memory, which means better memory locality. This requires fewer fetches from memory.
Mipmapped textures do not have this problem, and they often look better too because they don't have the aliasing problems you see with GL_LINEAR minification.
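For completeness, enabling mipmapping for an already-uploaded texture usually only takes a few calls (a sketch; the texture handle name is made up):

glBindTexture(GL_TEXTURE_2D, normalMapTexture);    // hypothetical handle of the normal map
glGenerateMipmap(GL_TEXTURE_2D);                   // build the full mip chain from level 0
glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_MIN_FILTER, GL_LINEAR_MIPMAP_LINEAR);  // trilinear minification
glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_MAG_FILTER, GL_LINEAR);                // magnification has no mip levels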
Simulating it on the CPU
The CPU has the same problem with memory fetches.
float sum(float *arr, int size, int stride, int count) {
    int pos = 0;
    float sum = 0.0f;
    for (int i = 0; i < count; i++) {
        pos = (pos + stride) % size;
        sum += arr[pos];
    }
    return sum;
}
As stride increases, the performance gets worse. This happens for the same reason that your fragment shader performance gets worse, even though it's happening on the CPU.
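A small timing harness for the function above (the sizes are arbitrary; the exact numbers depend on the machine, but the trend of larger strides getting slower should be reproducible):

#include <chrono>
#include <cstdio>
#include <vector>

float sum(float *arr, int size, int stride, int count);   // the function shown above

int main()
{
    const int size  = 1 << 24;                 // ~16M floats (~64 MB), larger than typical caches
    const int count = 1 << 24;                 // same number of accesses for every stride
    std::vector<float> data(size, 1.0f);

    for (int stride : {1, 16, 256, 4096})
    {
        auto t0 = std::chrono::steady_clock::now();
        volatile float s = sum(data.data(), size, stride, count);   // volatile keeps the call from being optimized away
        auto t1 = std::chrono::steady_clock::now();
        std::printf("stride %5d: %8.1f ms (sum=%f)\n", stride,
                    std::chrono::duration<double, std::milli>(t1 - t0).count(), (double)s);
    }
    return 0;
}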

OpenGL compute shader normal map generation poor performance

I have a height cube map and I want to generate a normal cube map texture from it. My height cube map is just a 2048x2048 image that I load at the beginning of the application for each face of the cube, and I can modify in real time a "maximum height" value which is used as a multiplier when retrieving a pixel from the height map.
Initially I was calculating the normals in the vertex shader, but it gave me bad lighting results so I decided to move the calculations in the fragment shader.
As the height map does not change every frame (only when I modify the "maximum height" value), I want to generate a normal map texture from it using a compute shader, because I don't need any rasterization, but it gives me very poor performance.
With the fragment shader I ran at 200 FPS, but using the compute shader I run at 40 FPS.
Here is how I bind my images and start the compute work:
_computeShaderProgram.use();
glUniform1f(_computeShaderProgram.getUniformLocation("maxHeight"), maxHeight);

glBindImageTexture(0, static_cast<GLuint>(heightMap), 0, GL_TRUE, 0, GL_READ_ONLY,  GL_RGBA32F);
glBindImageTexture(1, static_cast<GLuint>(normalMap), 0, GL_TRUE, 0, GL_WRITE_ONLY, GL_RGBA32F);

// Start compute work
// I only compute for one face of the cube map
glDispatchCompute(normalMap.getWidth() / 16, normalMap.getWidth() / 16, 1);
glMemoryBarrier(GL_SHADER_IMAGE_ACCESS_BARRIER_BIT);
And the compute shader:
#version 430 core
#extension GL_ARB_compute_shader : enable

layout(local_size_x = 16, local_size_y = 16, local_size_z = 1) in;

layout(rgba32f, binding = 0) readonly uniform imageCube heightMap;
layout(rgba32f, binding = 1) writeonly uniform imageCube normalMap;

uniform float maxHeight;

float getHeight(ivec3 heightMapCoord) {
    vec4 heightMapValue = imageLoad(heightMap, heightMapCoord);
    return heightMapValue.r * maxHeight;
}

void main() {
    ivec3 textCoord = ivec3(gl_GlobalInvocationID);

    // Calculate height of neighbors
    float leftCubePosHeight   = getHeight(textCoord + ivec3(-1,  0, 0));
    float rightCubePosHeight  = getHeight(textCoord + ivec3( 1,  0, 0));
    float topCubePosHeight    = getHeight(textCoord + ivec3( 0, -1, 0));
    float bottomCubePosHeight = getHeight(textCoord + ivec3( 0,  1, 0));

    // Calculate normal using central differences method
    vec3 horizontal = vec3(2.0, rightCubePosHeight - leftCubePosHeight, 0.0);
    vec3 vertical   = vec3(0.0, bottomCubePosHeight - topCubePosHeight, 2.0);
    vec3 normal     = normalize(cross(vertical, horizontal));

    imageStore(normalMap, textCoord, vec4(normal, 1.0));
}
I tried different work group counts (width, width / 8, width / 16, width / 32) and local sizes (1, 8, 16, 32), but the performance is always poor, around 40 FPS, or 20 FPS for a work group the size of the full width.
I know I can use shared memory for threads in the same work group to prevent fetching the same texture coordinate 4 times, but later the height map will be generated procedurally and will be larger than 2048x2048, I think.
What is the difference between the fragment shader and the compute shader that makes it so slow? Am I doing something wrong?
Are there any other solutions to generate this normal map?
EDIT:
The FPS I gave above are not right, because I was generating only 1/16 of the normal map (when I had 40 FPS), and I also used the central differences technique to calculate the normals, which is cheap but does not give good lighting results, so I switched to the Sobel technique, which is a little more expensive.
I made some tests to know which technique could give the best performance.
Each frame I generate the normal map (this will not be the case later, but it's just to test the performance). Here are my tests:
CPU side, single thread: 1.5 FPS
Compute shader with a local size of 1 and one work group for each image pixel: 4 FPS
Compute shader with a local size of 16 and one work group for each 16x16 pixel block: 11 FPS
Fragment shader using a framebuffer and MRT with 6 color attachments (one for each face of the normal map): 12.5 FPS
This is a little laggy when I modify the max height (which generates the normal map again), but I think it's okay as I won't modify it a lot.

Deferred Rendering - Is it valid to reconstruct position for a point light using a light volume?

Position reconstruction
I want to verify that this is a valid method and I'm not overlooking something.
I am using a spherical mesh to render only the portion of the screen that the light overlaps. I render only the back-faces when the depth is greater than or equal to the depth buffer value, as suggested here.
To reconstruct the camera space position of a fragment I am taking the vector from the camera space fragment on the light volume, normalizing it, and scaling it by the linear depth from my gbuffer (which is stored as a 32 bit float). This is sort of a hybrid of the methods discussed here (using linear depth) and here (spherical light volumes).
Banding
The reason I ask is because the results I get from deferred vs forward for light attenuation are different.
Deferred
Forward
Attenuation is linked to my camera space position as I calculate attenuation as follows:
vec3 light_dir_to = curr_light.camera_space_position - surface_pos_cam;
float light_dist_sq = dot(light_dir_to, light_dir_to);
float light_attenuation_factor = 1.0f - ((1.0f / (curr_light.radius * curr_light.radius)) * light_dist_sq);
light_attenuation_factor = clamp(light_attenuation_factor, 0.0f, 1.0f);
light_attenuation_factor = pow(light_attenuation_factor, curr_light.falloff);
The difference isn't super noticeable in these instances, but the instant I try to scale the light (e.g. raise it to a power to make it fade out faster), the effects become immediately apparent.
light_atten = pow(light_atten, 2.0f)
My problem may lie elsewhere, but I want to verify that my position reconstruction method isn't flawed in some way I'm overlooking.
EDIT
Posting my gbuffer setup as requested.
enum render_targets { e_dist_32f = 0, e_diffuse_rgb8, e_norm_xyz10, e_spec_intens_b8_spec_pow_a8, e_light_rgb8, num_rt };
//...
GLint internal_formats[num_rt] = { GL_R32F, GL_RGBA8, GL_RGB10_A2, GL_RGBA8, GL_RGBA8 };
GLint formats[num_rt] = { GL_RED, GL_RGBA, GL_RGBA, GL_RGBA, GL_RGBA };
GLint types[num_rt] = { GL_FLOAT, GL_FLOAT, GL_FLOAT, GL_FLOAT, GL_FLOAT };
for(uint i = 0; i < num_rt; ++i)
{
    glBindTexture(GL_TEXTURE_2D, _render_targets[i]);
    glTexImage2D(GL_TEXTURE_2D, 0, internal_formats[i], _width, _height, 0, formats[i], types[i], nullptr);
}
// Separate non-linear depth buffer used for depth testing
glBindTexture(GL_TEXTURE_2D, _depth_tex_id);
glTexImage2D(GL_TEXTURE_2D, 0, GL_DEPTH_COMPONENT32, _width, _height, 0, GL_DEPTH_COMPONENT, GL_FLOAT, nullptr);
NOTE: This issue occurs on planar surfaces that have one normal for the whole surface, thus this cannot be a loss of precision with normals.
FINAL EDIT - SOLUTION
It appears as though this method is in fact valid (as mentioned by GuyRT). The banding issue appears to be coming from how I am doing gamma correction.
For my forward renderer I only have one loop over 8 lights (I don't do multiple passes, 1 pass only), and I apply gamma correction right after the lighting calculations.
For my deferred renderer I do all lighting calculations, post-processing, etc., then convert to gamma. The issue here is that I:
Do my lighting calculations in linear RGB space
Store it in a texture in RGB space (with only 8 bits of precision)
When lighting is done, gamma correct the value and copy it to the back buffer.
For example, let's say the lighting calculations for two fragments have the final values 1/255 (~0.003) and 2/255 (~0.007) in sRGB space (as presented in the end). These values in RGB space are (1/255)^2.2 = ~0.000006 and (2/255)^2.2 = ~0.00002. When these values are stored to my lighting accumulating texture, they are both stored as the same value, 0. This is the cause of the banding.
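The arithmetic can be checked in a few lines of C++ (using a plain 2.2 gamma as in the example; the exact sRGB transfer curve differs slightly, but the quantization result is the same):

#include <cmath>
#include <cstdio>

int main()
{
    // Two neighbouring sRGB output levels converted to linear space.
    float linear1 = std::pow(1.0f / 255.0f, 2.2f);   // ~0.000006
    float linear2 = std::pow(2.0f / 255.0f, 2.2f);   // ~0.00002

    // An 8-bit UNORM render target stores round(x * 255).
    long stored1 = std::lround(linear1 * 255.0f);    // 0
    long stored2 = std::lround(linear2 * 255.0f);    // 0 -- both levels collapse to the same value

    std::printf("linear: %.7f %.7f -> stored: %ld %ld\n", linear1, linear2, stored1, stored2);
    return 0;
}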
Converting my lighting accumulation texture to GL_R11F_G11F_B10F has yielded results that are very close to my forward renderer. The answers for these two questions helped me once I found that gamma was the issue: sRGB textures. Is this correct? and When to call glEnable(GL_FRAMEBUFFER_SRGB)?.
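For reference, the only change needed on the allocation side is the internal format of the light accumulation target; here is a sketch reusing the names from the g-buffer snippet above (not necessarily the author's exact code):

// Light accumulation target with a small-float format instead of 8-bit UNORM,
// so linear-space lighting keeps enough precision near black.
glBindTexture(GL_TEXTURE_2D, _render_targets[e_light_rgb8]);
glTexImage2D(GL_TEXTURE_2D, 0, GL_R11F_G11F_B10F, _width, _height, 0,
             GL_RGB, GL_FLOAT, nullptr);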
The final result with a "falloff" of 4.0
EXTRA RESOURCE
I just found out this effect is called "Gamma Banding", which makes sense. This website has some useful charts and this video has a nice numerical walkthrough.
With a bit of tweaking, I think your method is valid and feasible.
This looks very much like the same artefact discussed here. It is caused by a loss of precision in your g-buffer normals. The solution in that case was to use the GL_RGB10_A2 format to store normals.
If you're interested, there is quite a thorough discussion of alternative representations for g-buffer normals here: http://aras-p.info/texts/CompactNormalStorage.html, although it is a bit old, so ALU/bandwidth trade-offs might be different today. Also, I think he makes a (quite common) mistake in his discussion of view-space normals, the z-component of which can be negative.

OpenGL Texture sampling different depending on camera position

I am rendering a point-based terrain from loaded heightmap data - but the points change their texturing depending on where the camera position is. To demonstrate the bug (and the fact that this isn't occurring from a z-buffering problem) I have taken screenshots with the points rendered at a fixed 5-pixel size from very slightly different camera positions (same angle), shown below:
State 1:
State 2:
The code to generate the points is relatively simple, so I'm posting it merely to rule out that option - mapArray is a single-dimensional float array that is copied to a VBO:
for(j = 0; j < mHeight; j++)
{
    for(i = 0; i < mWidth; i++)
    {
        height = bitmapImage[k];
        mapArray[k++] = 5 * i;
        mapArray[k++] = height;
        mapArray[k++] = 5 * j;
    }
}
I find it more likely that I need to adjust my fragment shader, because I'm not great with shaders - although I'm unsure where I could have gone wrong with such simple code, and I guess it's probably just not fit for purpose (with point-based rendering). Below is my fragment shader:
in varying vec2 TexCoordA;
uniform sampler2D myTextureSampler;

void main(){
    gl_FragColor = texture2D(myTextureSampler, TexCoordA.st) * gl_Color;
}
Edit (requested info):
OpenGL version 4.4, no texture flags used.
TexCoordA is passed into the shader directly from my vertex shader with no alterations at all. Self-calculated UVs using this:
float* UVs = new float[mNumberPoints * 2];
k = 0;
for(j = 0; j < mHeight; j++)
{
    for(i = 0; i < mWidth; i++)
    {
        UVs[k++] = (1.0f / (float)mWidth) * i;
        UVs[k++] = (1.0f / (float)mHeight) * j;
    }
}
This looks just like a subpixel-accurate texture mapping side-effect. The problem with the texture mapping implementation is that it needs to interpolate the texture coordinates on the actual rasterized pixels (fragments). When your camera is moving, the round-off error from the real position to the integer pixel position affects texture mapping, and is normally required for jitter-free animation (otherwise all the textures would jump by seemingly random subpixel amounts as the camera moves). There was a great tutorial on this topic by Paul Nettle.
You can try to fix this by not sampling texel corners but trying to sample texel centers (add half size of the texel to your point texture coordinates).
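Applied to the UV generation from the question, that half-texel offset would look something like this (a sketch, not tested against the original code):

// Texel-centre variant of the UV loop: offsetting by half a texel keeps
// GL_LINEAR from sampling right on a texel border.
for (j = 0; j < mHeight; j++)
{
    for (i = 0; i < mWidth; i++)
    {
        UVs[k++] = (i + 0.5f) / (float)mWidth;
        UVs[k++] = (j + 0.5f) / (float)mHeight;
    }
}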
Another thing you can try is to compensate for the subpixel accurate rendering by calculating the difference between the rasterized integer coordinate (which you need to calculate yourself in a shader) and the real position. That could be enough to make the sampled texels more stable.
Finally, size matters. If your texture is large, the errors in the interpolation of the finite-precision texture coordinates can introduce these kinds of artifacts. Why not use GL_TEXTURE_2D_ARRAY with a separate layer for each color tile? You could also clamp the S and T texcoords to the edge of the texture to avoid this more elegantly.
Just a guess: how are your point rendering parameters set? Perhaps the distance attenuation (GL_POINT_DISTANCE_ATTENUATION) along with GL_POINT_SIZE_MIN and GL_POINT_SIZE_MAX are causing different fragment sizes depending on the camera position. On the other hand, I think I remember that when using a vertex shader this functionality is disabled and the vertex shader must decide the size. I did it once using:
//point size calculation based on z-value as done by distance attenuation
float psFactor = sqrt( 1.0 / (pointParam[0] + pointParam[1] * camDist + pointParam[2] * camDist * camDist) );
gl_PointSize = pointParam[3] * psFactor;
where pointParam holds the three coefficients and the min point size:
uniform vec4 pointParam; // parameters for point size calculation [a b c min]
You may play around by setting your point size in the vertex shader directly with gl_PointSize = [value].
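One detail worth adding: writing gl_PointSize in the vertex shader only has an effect if program point size is enabled on the application side (the program handle, uniform name and values below are just placeholders):

glEnable(GL_PROGRAM_POINT_SIZE);   // same enable as GL_VERTEX_PROGRAM_POINT_SIZE; without it the glPointSize() value is used

// Hypothetical upload of the attenuation coefficients and minimum size
glUniform4f(glGetUniformLocation(program, "pointParam"), a, b, c, minPointSize);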

Omnidirectional shadow mapping with depth cubemap

I'm working with omnidirectional point lights. I already implemented shadow mapping using a cubemap texture as the color attachment of 6 framebuffers, encoding the light-to-fragment distance in each pixel of it.
Now I would like, if this is possible, to change my implementation this way:
1) attach a depth cubemap texture to the depth buffer of my framebuffers, instead of colors.
2) render depth only, do not write color in this pass
3) in the main pass, read the depth from the cubemap texture, convert it to a distance, and check whether the current fragment is occluded by the light or not.
My problem comes when converting back a depth value from the cubemap into a distance. I use the light-to-fragment vector (in world space) to fetch my depth value in the cubemap. At this point, I don't know which of the six faces is being used, nor what 2D texture coordinates match the depth value I'm reading. Then how can I convert that depth value to a distance?
Here are snippets of my code to illustrate:
Depth texture:
glGenTextures(1, &TextureHandle);
glBindTexture(GL_TEXTURE_CUBE_MAP, TextureHandle);

for (int i = 0; i < 6; ++i)
    glTexImage2D(GL_TEXTURE_CUBE_MAP_POSITIVE_X + i, 0, GL_DEPTH_COMPONENT,
                 Width, Height, 0, GL_DEPTH_COMPONENT, GL_FLOAT, 0);

glTexParameteri(GL_TEXTURE_CUBE_MAP, GL_TEXTURE_MAG_FILTER, GL_NEAREST);
glTexParameteri(GL_TEXTURE_CUBE_MAP, GL_TEXTURE_MIN_FILTER, GL_NEAREST);
glTexParameteri(GL_TEXTURE_CUBE_MAP, GL_TEXTURE_WRAP_S, GL_CLAMP_TO_EDGE);
glTexParameteri(GL_TEXTURE_CUBE_MAP, GL_TEXTURE_WRAP_T, GL_CLAMP_TO_EDGE);
Framebuffers construction:
for (int i = 0; i < 6; ++i)
{
    glGenFramebuffers(1, &FBO->FrameBufferID);
    glBindFramebuffer(GL_FRAMEBUFFER, FBO->FrameBufferID);
    glFramebufferTexture2D(GL_FRAMEBUFFER, GL_DEPTH_ATTACHMENT,
                           GL_TEXTURE_CUBE_MAP_POSITIVE_X + i, TextureHandle, 0);
    glDrawBuffer(GL_NONE);
}
The piece of the fragment shader I'm trying to write to achieve this:
float ComputeShadowFactor(samplerCubeShadow ShadowCubeMap, vec3 VertToLightWS)
{
    float ShadowVec = texture(ShadowCubeMap, vec4(VertToLightWS, 1.0));
    ShadowVec = DepthValueToDistance(ShadowVec);
    if (ShadowVec * ShadowVec > dot(VertToLightWS, VertToLightWS))
        return 1.0;
    return 0.0;
}
The DepthValueToDistance function being my actual problem.
So, the solution was to convert the light-to-fragment vector to a depth value, instead of converting the depth read from the cubemap into a distance.
Here is the modified shader code:
float VectorToDepthValue(vec3 Vec)
{
    vec3 AbsVec = abs(Vec);
    float LocalZcomp = max(AbsVec.x, max(AbsVec.y, AbsVec.z));

    const float f = 2048.0;
    const float n = 1.0;
    float NormZComp = (f + n) / (f - n) - (2 * f * n) / (f - n) / LocalZcomp;

    return (NormZComp + 1.0) * 0.5;
}

float ComputeShadowFactor(samplerCubeShadow ShadowCubeMap, vec3 VertToLightWS)
{
    float ShadowVec = texture(ShadowCubeMap, vec4(VertToLightWS, 1.0));
    if (ShadowVec + 0.0001 > VectorToDepthValue(VertToLightWS))
        return 1.0;
    return 0.0;
}
Explanation of VectorToDepthValue(vec3 Vec):
LocalZcomp corresponds to what would be the Z-component of the given Vec in the matching frustum of the cubemap. It's actually the largest absolute component of Vec (for instance, if Vec.y is the biggest component, we will look either at the Y+ or the Y- face of the cubemap).
If you look at this Wikipedia article, you will understand the math just after it (I kept it in a formal form for understanding), which simply converts LocalZcomp into a normalized Z value (in [-1..1]) and then maps it into [0..1], which is the actual range for depth buffer values (assuming you didn't change it). n and f are the near and far values of the frustums used to generate the cubemap.
ComputeShadowFactor then just compares the depth value from the cubemap with the depth value computed from the fragment-to-light vector (named VertToLightWS here), also adds a small depth bias (which was missing in the question), and returns 1 if the fragment is not occluded by the light.
I would like to add more details regarding the derivation.
Let V be the light-to-fragment direction vector.
As Benlitz already said, the Z value in the respective cube side frustum/"eye space" can be calculated by taking the max of the absolute values of V's components.
Z = max(abs(V.x),abs(V.y),abs(V.z))
Then, to be precise, we should negate Z because in OpenGL, the negative Z-axis points into the screen/view frustum.
Now we want to get the depth buffer "compatible" value of that -Z.
Looking at the OpenGL perspective matrix...
http://www.songho.ca/opengl/files/gl_projectionmatrix_eq16.png
http://i.stack.imgur.com/mN7ke.png (backup link)
...we see that, for any homogeneous vector multiplied with that matrix, the resulting z value is completely independent of the vector's x and y components.
So we can simply multiply this matrix with the homogeneous vector (0,0,-Z,1) and we get the vector (components):
x = 0
y = 0
z = (-Z * -(f+n) / (f-n)) + (-2*f*n / (f-n))
w = Z
Then we need to do the perspective divide, so we divide z by w (Z) which gives us:
z' = (f+n) / (f-n) - 2*f*n / (Z* (f-n))
This z' is in OpenGL's normalized device coordinate (NDC) range [-1,1] and needs to be transformed into a depth buffer compatible range of [0,1]:
z_depth_buffer_compatible = (z' + 1.0) * 0.5
Further notes:
It might make sense to upload the results of (f+n), (f-n) and (f*n) as shader uniforms to save computation (see the sketch after these notes).
V needs to be in world space since the shadow cube map is normally axis aligned in world space thus the "max(abs(V.x),abs(V.y),abs(V.z))"-part only works if V is a world space direction vector.
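Regarding the note above about precomputing the frustum terms, a minimal sketch of the upload (the program handle and uniform names are assumptions):

const float n = 1.0f;      // near plane used when rendering the shadow cube map
const float f = 2048.0f;   // far plane (matches the constants in the shader above)

glUseProgram(shadowProgram);
glUniform1f(glGetUniformLocation(shadowProgram, "fPlusN"),  f + n);
glUniform1f(glGetUniformLocation(shadowProgram, "fMinusN"), f - n);
glUniform1f(glGetUniformLocation(shadowProgram, "fTimesN"), f * n);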