Strange behavior of length() in GLSL - glsl

Environment:
Windows 10 version 1803
nVidia GeForce GTX 780 Ti
Latest driver 398.36 installed
Visual Studio 2015 Update 3
OpenGL 4.6
GLSL Source:
#version 460 core
in vec4 vPos;
void
main()
{
float coeff[];
int i,j;
coeff[7] = 2.38;
i=coeff.length();
coeff[9] = 4.96;
j=coeff.length();
if(i<j)
gl_Position = vPos;
}
My expectation is that i is 8 and j is 10 so gl_Position = vPos; should be executed, but shader debugging using Nsight shows me that both i and j are 10 so gl_Position = vPos; is bypassed for all vertices. What is the matter? Is it related to compiler optimization? If I want GLSL to be compiled as expected (i<j is true), how to fix the code? Thanks.

This is both an incorrect use of yours, and a compiler bug (because it doesn't break when it should).
See what the specification has to say:
It is legal to declare an array without a size (unsized) and then later redeclare the same name as an array of the same type and specify a size, or index it only with integral constant expressions (implicitly sized).
OK so far, that's what you are doing. But now...
It is a compile-time error to declare an array with a size, and then later (in the same shader) index the same array with an integral constant expression greater than or equal to the declared size.
That's also what you are doing. First you set the size to 7, then to 9. That's not allowed, and it's an error to be detected at compile time. So, the fact that this "works" at all (i.e. no compiler error) is a compiler bug.
Now why do you see a size of 10 then? Don't ask me, who knows... my best guess would be the nVidia compiler works by doing "something" in such cases, whatever it is. Something, to make it work anyway, although it's wrong.

Related

Why does the value of GL_MAX_UNIFORM_BLOCK_SIZE change?

According to khronos.org,
GL_MAX_UNIFORM_BLOCK_SIZE refers to the maximum size in basic machine units of a uniform block. The value must be at least 16384.
I have a fragment shader, where I declared a uniform interface block and attached a uniform buffer object to it.
#version 460 core
layout(std140, binding=2) uniform primitives{
vec3 foo[3430];
};
...
If I query the size of GL_MAX_UNIFORM_BLOCK_SIZEwith:
GLuint info;
glGetUniformiv(shaderProgram.getShaderProgram_id(), GL_MAX_UNIFORM_BLOCK_SIZE, reinterpret_cast<GLint *>(&info));
cout << "GL_MAX_UNIFORM_BLOCK_SIZE: " << info << endl;
I get: GL_MAX_UNIFORM_BLOCK_SIZE: 22098. It is ok, but for example: when I changes the size of the array to 3000 (instead of 3430), I get GL_MAX_UNIFORM_BLOCK_SIZE: 21956
As far as I know, GL_MAX_UNIFORM_BLOCK_SIZE should be a constant depending on my GPU. Then why does it change, when I modify the size of the array?
GL_MAX_UNIFORM_BLOCK_SIZE is properly queried with glGetIntegerv. It is a constant defined by the implementation which tells you the implementation-defined maximum. glGetUniform returns the value of a uniform in the given program. You probably got an OpenGL error of some kind, since GL_MAX_UNIFORM_BLOCK_SIZE is not a valid uniform location, and therefore your integer was never written to. So you're just reading uninitialized data.

glGetnUniformfv crashes for no reason?

I tried getting an uniform vec3 from the fragment shader to my CPU using glGetnUniformfv. According to the documentation this should perfectly work. It also works when only getting a float from the shader. But when used like this,
float f[3] = {0.0f};
glGetnUniformfv(program, glGetUniformLocation(program, name.c_str()), 3, f);
my program crashes. I checked the glGetUniformLocation but it had a valid output.
The third parameter to the glGetnUniform family of functions is not actually the number of entries in the array. It is the byte size of the array pointed to by f. Which, because f is an array rather than just a pointer to an array, would be sizeof(f).
Now, your implementation shouldn't have crashed, so there's probably something else going on there. But this is the problem in the code you've provided.
Unless you're using a context that actually supports OpenGL 4.5+, get the vec3 using "the old way" like this:
float f[3] = {0.0f};
glGetUniformfv(program, glGetUniformLocation(program, name.c_str()), f);
The new desktop-only glGetnUniform entry points exist only for extra safety, similar to strncpy vs strcpy.
Also, if you do use the glGetn variant, you should pass 12 instead of 3 for bufSize since it's a byte count.

GL_SHADER_STORAGE_BUFFER memory limitations

I'm writing ray-tracing on OGL computing shaders, to pass data to and from shaders I use buffers.
When size of vec2 output buffer (which is equal to number of rays multiplied by number of faces) reaches ~30Mb attempt of mapping buffer is stable returning NULL pointer. Range mapping also fails.
I can't find any info about GL_SHADER_STORAGE_BUFFER limitations in ogl documentation, but maybe someone can help me, is ~30Mb limit or this mapping-fail may happen because of something different?
And is there any way to avoid this except for calling shader multiple times?
Data declaration in shader:
#version 440
layout(std430, binding=0) buffer rays{
vec4 r[];
};
layout(std430, binding=1) buffer faces{
vec4 f[];
};
layout(std430, binding=2) buffer outputs{
vec2 o[];
};
uniform int face_count;
uniform vec4 origin;
Calling code (using some Qt5 wrappers):
QOpenGLBuffer ray_buffer;
QOpenGLBuffer face_buffer;
QOpenGLBuffer output_buffer;
QVector<QVector2D> output;
output.resize(rays[r].size()*faces.size());
if(!ray_buffer.create()) { /*...*/ }
if(!ray_buffer.bind()) { /*...*/ }
ray_buffer.allocate(rays.data(), rays.size()*sizeof(QVector4D));
if(!face_buffer.create()) { /*...*/ }
if(!face_buffer.bind()) { /*...*/ }
face_buffer.allocate(faces.data(), faces.size()*sizeof(QVector4D));
if(!output_buffer.create()) { /*...*/ }
if(!output_buffer.bind()) { /*...*/ }
output_buffer.allocate(output.size()*sizeof(QVector2D));
ogl->glBindBufferBase(GL_SHADER_STORAGE_BUFFER, 0, ray_buffer.bufferId());
ogl->glBindBufferBase(GL_SHADER_STORAGE_BUFFER, 1, face_buffer.bufferId());
ogl->glBindBufferBase(GL_SHADER_STORAGE_BUFFER, 2, output_buffer.bufferId());
int face_count = faces.size();
compute.setUniformValue("face_count", face_count);
compute.setUniformValue("origin", pos);
ogl->glDispatchCompute(rays.size()/256, faces.size(), 1);
ray_buffer.destroy();
face_buffer.destroy();
QVector2D* data = (QVector2D*)output_buffer.map(QOpenGLBuffer::ReadOnly);
First of all, you have to understand that the OpenGL specification defines minimum maxima for a variety of values (the ones starting with a MAX_{*} prefix). That means that implementations are required to at least provide the specified amount as the maximum value, but are free to increase the limit as implementors see fit. This way, developers can at least rely on some upper bound, but can still make provisions for possibly larger values.
Section 23 - State Tables summarizes what has been previously specified in the corresponding sections. The information you were looking for is found in table 23.64 - Implementation Dependent Aggregate Shader Limits (cont.). If you want to know about which state belongs where (because there is per-object state, quasi-global state, program state and so on), you go to section 23.
The minimum maximum size of a shader storage buffer is represented by the symbolic constant MAX_SHADER_STORAGE_BLOCK_SIZE as per section 7.8 of the core OpenGL 4.5 specification.
Since their adoption into core, the required size (i.e. the minimum maximum) has been significantly increased. In core OpenGL 4.3 and 4.4, the minimum maximum was pow(2, 24) (or 16MB with 1 byte basic machine units and 1MB = 1024^2 bytes) - in core OpenGL 4.5 this value is now pow(2, 27) (or 128MB)
Summary: When in doubt about OpenGL state, refer to section 23 of the core specification.
From OpenGL Wiki:
SSBOs can be much larger. The OpenGL spec guarantees that UBOs can be
up to 16KB in size (implementations can allow them to be bigger). The
spec guarantees that SSBOs can be up to 128MB. Most implementations
will let you allocate a size up to the limit of GPU memory.
OpenGL < 4.5 guarantees only 16MiB (OpenGL 4.5 increased the minimum to 128MiB) , you can try using glGet() to query if you can bind more.
GLint64 max;
glGetInteger64v(GL_MAX_SHADER_STORAGE_BLOCK_SIZE, &max);
In fact problem seems to be in Qt wrappers. Didn't look in-depth, but when I've changed QOpenGLBuffer's create(), bind(), allocate() and map() to glCreateBuffers(), glBindBuffer(), glNamedBufferData() and glMapNamedBuffer(), all called through QOpenGLFunctions_4_5_Core, memory problem was gone until I reached 2Gb (which is GPU physical memory limit).
Second error I've made was not using glMemoryBarrier(), but it didn't help while QOpenGLBuffer was in use.

GLSL: will compiler evaluate functions with constant arguments?

If I have the following code in a GLSL fragment shader:
float r = 0.386;
float a = 26.6;
float xd = r*cos(0.0174532924*(a+0));
float yd = r*sin(0.0174532924*(a+0));
float xe = r*cos(0.0174532924*(a+90));
float ye = r*sin(0.0174532924*(a+90));
is it a sane assumption that the compiler will evaluate those trigonometric functions instead of have them be evaluated in every fragment execution?
In this case, sadly, you can't know much, since the compilation is done by the GPU. I would say it is implementation dependent, since some compilers may be better optimized.
However, as WearyWanderer sayed, you can hardcode the values or pass them through uniforms/UBO.
As you mentioned you could calculate the values and directly assign them, but want to let it for documentation purposes, I assum the values will be the same in every execution of the shader code.
Uniform Variables are variables that you can calculate once, send to a shader, and are the same for every execution, unless you change the uniform variable at some point. For example:
float r = 0.386;
float a = 26.6;
float xd_val = r*cos(0.0174532924*(a+0));
GLuint xd_id = glGetUniformLocation(pShaderProgram, "xd");
glUniform1f(xd_id, xd_val);
This calculates the value only once on the CPU, passes it to the shader program as a uniform variable, and the shader has access to the value for every execution without recaulcating it, but still leaves the code in here for your documentation that you wanted.
Uniform's are commonly used for object wide values, I.E an alpha-value, passing in scene lights for phong shader model, etc.

Issue with glBindBufferRange() OpenGL 3.1

My vertex shader is ,
uniform Block1{ vec4 offset_x1; vec4 offset_x2;}block1;
out float value;
in vec4 position;
void main()
{
value = block1.offset_x1.x + block1.offset_x2.x;
gl_Position = position;
}
The code I am using to pass values is :
GLfloat color_values[8];// contains valid values
glGenBuffers(1,&buffer_object);
glBindBuffer(GL_UNIFORM_BUFFER,buffer_object);
glBufferData(GL_UNIFORM_BUFFER,sizeof(color_values),color_values,GL_STATIC_DRAW);
glUniformBlockBinding(psId,blockIndex,0);
glBindBufferRange(GL_UNIFORM_BUFFER,0,buffer_object,0,16);
glBindBufferRange(GL_UNIFORM_BUFFER,0,buffer_object,16,16);
Here what I am expecting is, to pass 16 bytes for each vec4 uniform. I get GL_INVALID_VALUE error for offset=16 , size = 16.
I am confused with offset value. Spec says it is corresponding to "buffer_object".
There is an alignment restriction for UBOs when binding. Any glBindBufferRange/Base's offset must be a multiple of GL_UNIFORM_BUFFER_OFFSET_ALIGNMENT. This alignment could be anything, so you have to query it before building your array of uniform buffers. That means you can't do it directly in compile-time C++ logic; it has to be runtime logic.
Speaking of querying things at runtime, your code is horribly broken in many other ways. You did not define a layout qualifier for your uniform block; therefore, the default is used: shared. And you cannot use `shared* layout without querying the layout of each block's members from OpenGL. Ever.
If you had done a query, you would have quickly discovered that your uniform block is at least 32 bytes in size, not 16. And since you only provided 16 bytes in your range, undefined behavior (which includes the possibility of program termination) results.
If you want to be able to define C/C++ objects that map exactly to the uniform block definition, you need to use std140 layout and follow the rules of std140's layout in your C/C++ object.