Push constant limit in geometry shader? - glsl

I have a geometry shader with the following push constant block:
layout(push_constant) uniform Instance {
    mat4 VP;
    vec3 posCam;
    float radius;
    float curvature;
} u_instance;
The push constants are defined in the pipeline layout like this:
uint32_t offset = 0;
uint32_t size = 21 * sizeof(float);
vk::PushConstantRange range{vk::ShaderStageFlagBits::eGeometry, offset, size};
However, the Vulkan validation layers throw this error:
Push constant range covering variable starting at offset 0 not accessible from stage VK_SHADER_STAGE_GEOMETRY_BIT
What does 'not accessible' mean here? Why wouldn't they be accessible? If I move the push constants to a different stage (e.g. fragment or vertex shader), no error occurs.
Additionally, I only get this error on a Nvidia GeForce GTX 650 Ti. I've also tried it on an AMD card, which worked fine.
Is there some kind of limitation on push constants for geometry shaders? I've checked the limits for my Nvidia GPU: the total max push constant size is 256 bytes, and geometry shaders are supported. I can't find anything in the Vulkan specification either.

Can you please add some more code (or upload it somewhere)? I just tested this with your push constant block on a GTX 980, with validation layers compiled from source, and don't get any validation warnings.
Additionally, I only get this error on a Nvidia GeForce GTX 650 Ti. I've also tried it on an AMD card, which worked fine.
That's very odd, as the validation messages are not generated by the drivers and as such should not differ between implementations (unless it's a validation check related to device limits).
Is there some kind of limitation on push constants for geometry shaders? I've checked the limits for my Nvidia GPU: the total max push constant size is 256 bytes, and geometry shaders are supported. I can't find anything in the Vulkan specification either.
There is no push constant limit specific to the geometry shader. If you exceed the push constant size limit the validation layers will throw an error.
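For reference, that limit comes from the physical device limits; here is a minimal sketch in the same Vulkan-Hpp style as the posted code ('physicalDevice' is assumed to be a vk::PhysicalDevice you already have):
vk::PhysicalDeviceProperties props = physicalDevice.getProperties();
uint32_t maxPushConstantBytes = props.limits.maxPushConstantsSize; // e.g. 256 on the GTX 650 Ti mentioned above
// The block in the question needs 21 * sizeof(float) = 84 bytes, well within that limit.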
I think that std430 packing rules (or section 14.5.4, Offset and Stride Assignment, of the Vulkan spec) can mess up the sizes. E.g. the vec3 would be laid out like a vec4 (so maybe 22 * sizeof(float)? Sorry, not exactly confident in this myself).
Not sure if packing would be a problem here, but basically this should work without a validation layer message. Even if stride were a problem, the validation layers should not trigger anything as long as you start at offset 0.

I think that std430 packing rules (or section 14.5.4, Offset and Stride Assignment, of the Vulkan spec) can mess up the sizes. E.g. the vec3 would be laid out like a vec4 (so maybe 22 * sizeof(float)? Sorry, not exactly confident in this myself).
The layer code is open if you would like to investigate yourself (and I found the line producing that report): https://github.com/KhronosGroup/Vulkan-LoaderAndValidationLayers/blob/master/layers/core_validation.cpp#L1976
UPDATE:
I don't think I was right above. What the 1.0.17 SDK glslangValidator gives me is 84 bytes (21 * float, with member offsets 0, 64, 76 and 80 respectively). The whole block does get padded to a multiple of 16 for me though (to a total block size of 96 B).
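To illustrate where those numbers come from, here is the block mirrored as a C++ struct; this is only a sketch, with the offsets matching what glslangValidator reported above:
#include <cstddef>

struct InstancePushConstants {
    float VP[16];      // mat4  -> offset 0,  64 bytes
    float posCam[3];   // vec3  -> offset 64, 12 bytes
    float radius;      // float -> offset 76, sits in the vec3's padding
    float curvature;   // float -> offset 80
};                     // 84 bytes of members (the SPIR-V block itself is padded to 96)

static_assert(offsetof(InstancePushConstants, posCam)    == 64, "reported offset");
static_assert(offsetof(InstancePushConstants, radius)    == 76, "reported offset");
static_assert(offsetof(InstancePushConstants, curvature) == 80, "reported offset");
static_assert(sizeof(InstancePushConstants) == 21 * sizeof(float), "84 bytes");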
Besides, the message is tied to the supplied stage enum (the layer compares it with the stage of the shader module). It is very strange that the error message would differ between implementations. Make sure you have an updated SDK and drivers, be suspicious of the vkcpp wrapper (or whatever that posted code is), and check the pStages[n].stage member of the used VkGraphicsPipelineCreateInfo against the push constant range stage value (which is what the layer compares).
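For concreteness, these are the two stage fields being compared; a minimal Vulkan-Hpp sketch ('geometryShaderModule' is assumed to exist):
uint32_t size = 21 * sizeof(float);
vk::PushConstantRange range{vk::ShaderStageFlagBits::eGeometry, 0, size};

vk::PipelineShaderStageCreateInfo geomStage{};
geomStage.stage  = vk::ShaderStageFlagBits::eGeometry; // this is pStages[n].stage
geomStage.module = geometryShaderModule;
geomStage.pName  = "main";
// Per the layer source linked above, the layer compares geomStage.stage with the
// stageFlags of the push constant range; a mismatch produces the
// "not accessible from stage" message quoted in the question.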

Related

Intel OpenGL Driver bug?

Whenever I try to render my terrain with point lights, it only works on my Nvidia GPU and driver, and not on the Intel integrated GPU and driver. I believe the problem is in my code and that the Nvidia driver is letting it slide, since I've heard Nvidia's OpenGL implementations are lax and will let you get away with things you're not supposed to do. And since I get no errors, I need help debugging my shaders.
Link:
http://pastebin.com/sgwBatnw
Note:
I use OpenGL 2 and GLSL Version 120
Edit:
I was able to fix the problem on my own. To anyone with similar problems: it's not because I used the regular transformation matrix, because when I did that I set the normal's w value to 0.0. The problem was that with the Intel integrated graphics there is apparently a maximum number of arrays in a uniform (or a maximum uniform size in general), and I was going over that limit, but the driver was deciding not to report it. Another thing wrong with this code was that I was doing implicit type conversion (dividing vec3s by floats), so I corrected those things and it started to work. Here's my updated code.
Link: http://pastebin.com/zydK7YMh
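A quick way to see the uniform limit mentioned above is to query it at runtime; a small sketch using standard desktop GL 2.0 enums (the variable names are mine):
GLint maxVertexUniforms = 0, maxFragmentUniforms = 0;
glGetIntegerv(GL_MAX_VERTEX_UNIFORM_COMPONENTS,   &maxVertexUniforms);
glGetIntegerv(GL_MAX_FRAGMENT_UNIFORM_COMPONENTS, &maxFragmentUniforms);
// These limits are counted in components; on some implementations an array of vec3s
// is padded to vec4-sized slots, so large uniform arrays can hit the limit faster
// than a naive count would suggest.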

Does a conversion take place when uploading a texture of GL_ALPHA format (glTexImage2D)?

The documentation for glTexImage2D says
For GL_RED (desktop GL) / GL_ALPHA (GL ES): "The GL converts it to floating point and assembles it into an RGBA element by attaching 0 for green and blue, and 1 for alpha. Each component is clamped to the range [0,1]."
I've read through the GL ES specs to see if it specifies whether the GPU memory is actually 32bit vs 8bit, but it seems rather vague. Can anyone confirm whether uploading a texture as GL_RED / GL_ALPHA gets converted from 8bit to 32bit on the GPU?
I'm interested in answers for GL and GL ES.
I've read through the GL ES specs to see if it specifies whether the GPU memory is actually 32bit vs 8bit, but it seems rather vague.
Well, that's exactly the point: the details are left for the implementation to decide. Giving such liberties in the specification allows implementations to contain optimizations tightly tailored to the target system. For example, a certain GPU may cope better with a 10-bits-per-channel format, so it's then at liberty to convert to such a format.
So it's impossible to say in general, but for a specific implementation (i.e. GPU + driver) a certain format will likely be chosen. Which one depends on the GPU and driver.
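On desktop GL (this query is not available in GL ES 2.0) you can at least ask the driver what it actually picked; a rough sketch, with width/height/pixels assumed to be your own data:
GLint chosenFormat = 0, redBits = 0;
glTexImage2D(GL_TEXTURE_2D, 0, GL_RED, width, height, 0,
             GL_RED, GL_UNSIGNED_BYTE, pixels);
glGetTexLevelParameteriv(GL_TEXTURE_2D, 0, GL_TEXTURE_INTERNAL_FORMAT, &chosenFormat);
glGetTexLevelParameteriv(GL_TEXTURE_2D, 0, GL_TEXTURE_RED_SIZE, &redBits);
// chosenFormat and redBits report what this particular implementation selected;
// another GPU or driver may legitimately report something different.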
Following on from what datenwolf has said, I found the following in the "POWERVR SGX OpenGL ES 2.0 Application Development Recommendations" document:
6.3. Texture Upload
When you upload textures to the OpenGL ES driver via glTexImage2D, the input data is usually in linear scanline format. Internally, though, POWERVR SGX uses a twiddled layout (i.e. following a plane-filling curve) to greatly improve memory access locality when texturing. Because of this different layout, uploading textures will always require a somewhat expensive reformatting operation, regardless of whether the input pixel format exactly matches the internal pixel format or not.
For this reason we recommend that you upload all required textures at application or level start-up time in order to not cause any framerate dips when additional textures are uploaded later on.
You should especially avoid uploading texture data mid-frame to a texture object that has already been used in the frame.
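In practice the recommendation boils down to doing all uploads at level-load time, something along these lines (TextureAsset and levelTextures are made-up names standing in for whatever your asset code uses):
for (const TextureAsset& asset : levelTextures) {
    glBindTexture(GL_TEXTURE_2D, asset.id);
    // Pay the driver's reformatting cost here, during loading, not mid-frame.
    glTexImage2D(GL_TEXTURE_2D, 0, GL_RGBA, asset.width, asset.height, 0,
                 GL_RGBA, GL_UNSIGNED_BYTE, asset.pixels);
}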

Which version of GLSL supports Indexing in Fragment Shader?

I have a fragment shader that iterates over some input data, and on old hardware I get:
error C6013: Only arrays of texcoords may be indexed in this profile, and only with a loop index variable
Googling around, I saw a lot of statements like "hardware prior to XX doesn't support indexing in the fragment shader".
I was wondering if this behaviour is standardized across GLSL versions, something like "GLSL versions prior to XX don't support indexing in the fragment shader", and if so, which version starts supporting it.
What is your exact hardware?
Old ATI cards (below the X1600) and their drivers have such issues. Most likely, Intel cards that are not the most recent also suffer from this.
"Do you have any sugestion on how to detect if my hardware is capable of indexing in fragment shader?"
The only reliable yet not-so-beautiful way is to get the Renderer information:
glGetString(GL_RENDERER)
and check if this renderer occurs in the list of unsupported ones.
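A rough sketch of that check (the substrings in the unsupported list are placeholders; you would have to maintain the real list yourself):
#include <cstring>

bool rendererSupportsFragmentIndexing() {
    const char* renderer = reinterpret_cast<const char*>(glGetString(GL_RENDERER));
    const char* unsupported[] = { "GeForce 6", "GeForce 7", "RADEON X1" }; // placeholder entries
    for (const char* name : unsupported) {
        if (renderer && std::strstr(renderer, name) != nullptr)
            return false; // renderer string matches a known-problematic GPU
    }
    return true;
}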
That particular error comes from the Nvidia compiler for nv4x (GeForce 6/7 cards), and is a limitation of the hardware. Any workaround would require disabling the hardware completely and using pure software rendering.
All versions of GLSL support indexing in the language -- this error falls under the catch-all of exceeding the hardware resource limits.

Do some GPUs or drivers lack some of the glsl built-in functions?

I'm trying to perform a log during rendering on an Intel X3100 under Linux (using the default Ubuntu driver). The code looks something like the following:
vec4 frag_color;
frag_color.rgb = log(frag_value.rgb); // frag_value comes from a texture lookup
frag_color.a = frag_value.a;
gl_FragColor = frag_color;
where frag_value is derived from a texture lookup. Now, I can set the texture such that the log of frag_value should give a sensible answer (i.e. it's in a sensible range to give a frag_color of 0.0->1.0), but it always renders as black (so I assume it's just setting it to zero). Of course, I can verify I'm sensibly setting frag_value by removing the log (and setting the frag_value texture to be in the range 0.0->1.0), which does what I expect, and multiplication and other trivial operations work fine.
My question is: is this expected behaviour? Am I missing something? Are some GPUs or drivers lacking some of the built-in functions (e.g. sqrt seems not to work either)?
I'm pretty sure color components are in the 0.0-1.0 range for regular non-float textures.
I'm also pretty sure gl_FragColor components are clamped to the 0.0-1.0 range by default.
log(x) for 0 < x < 1 is negative.
Do some GPUs or drivers lack some of the glsl built-in functions?
Yes. The noise functions aren't properly implemented anywhere: non-functional on NVidia, and either a massive performance drop or non-functional on ATI (at least they were that way last time I checked).
Solved it:
Texture uploads are usually clamped to the range 0.0->1.0 (despite what might be inferred from the internal format type name), so of course log is not going to give anything useful. Full-range floats were introduced with ARB_texture_float, which extends the internal formats to include full-range float types such as LUMINANCE_ALPHA32F_ARB. Using that solves the problem.
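For reference, the fix looks roughly like this (width/height/data are assumed to be your own texture data; requires the ARB_texture_float extension):
glTexImage2D(GL_TEXTURE_2D, 0, GL_LUMINANCE_ALPHA32F_ARB, width, height, 0,
             GL_LUMINANCE_ALPHA, GL_FLOAT, data);
// With a full-range float internal format the uploaded values are no longer clamped
// to [0, 1], so log() in the shader can produce useful results.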

DirectX 9 HLSL vs. DirectX 10 HLSL: syntax the same?

For the past month or so, I have been busting my behind trying to learn DirectX, so I've been switching back and forth between DirectX 9 and 10. One of the major changes I've seen between the two is how you process vectors on the graphics card.
One of the drastic changes I notice is how you get the GPU to recognize your structs. In DirectX 9, you define a Flexible Vertex Format (FVF).
Your typical setup would look like this:
#define CUSTOMFVF (D3DFVF_XYZRHW | D3DFVF_DIFFUSE)
In DirectX 10, I believe the equivalent is the input vertex description:
D3D10_INPUT_ELEMENT_DESC layout[] = {
    { "POSITION", 0, DXGI_FORMAT_R32G32B32_FLOAT,    0,  0, D3D10_INPUT_PER_VERTEX_DATA, 0 },
    { "COLOR",    0, DXGI_FORMAT_R32G32B32A32_FLOAT, 0, 12, D3D10_INPUT_PER_VERTEX_DATA, 0 }
};
I notice in DirectX 10 that it is more descriptive. Besides this, what are some of the drastic changes made, and is the HLSL syntax the same for both?
I would say there are no radical changes in the HLSL syntax itself between DX9 and DX10 (and, by extension, DX11).
As codeka said, the changes are more a matter of cleaning up the API and a road toward generalization (for the sake of GPGPU). But there are indeed noticeable differences:
To pass constants to the shaders, you now have to go through Constant Buffers (see the API-side sketch after this list).
A Common-Shader Core: all types of shaders have access to the same set of intrinsic functions (with some exceptions, e.g. for the GS stage). Integer and bitwise operations are now fully IEEE-compliant (and not emulated via floating point). You now have access to binary casts to interpret an int as a float, a float as a uint, etc.
Textures and Samplers have been dissociated. You now use the syntax g_myTexture.Sample( g_mySampler, texCoord ) instead of tex2D( g_mySampledTexture, texCoord ).
Buffers: a new kind of resource for accessing data that needs no filtering, in a random-access way, using the new Object.Load function.
System-Value Semantics: a generalization and extension of the POSITION, DEPTH, COLOR semantics, which are now SV_Position, SV_Depth, SV_Target, plus new per-stage semantics like SV_InstanceID, SV_VertexID, etc.
That's all I can see for now. If something new comes to mind, I will update my answer.
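As an API-side sketch of the first point (constant buffers), something along these lines replaces the DX9-style SetVertexShaderConstantF calls; 'device' is an assumed ID3D10Device*:
struct PerObjectConstants { float worldViewProj[16]; }; // size must be a multiple of 16 bytes

D3D10_BUFFER_DESC desc = {};
desc.ByteWidth      = sizeof(PerObjectConstants);
desc.Usage          = D3D10_USAGE_DYNAMIC;
desc.BindFlags      = D3D10_BIND_CONSTANT_BUFFER;
desc.CPUAccessFlags = D3D10_CPU_ACCESS_WRITE;

ID3D10Buffer* constantBuffer = nullptr;
device->CreateBuffer(&desc, nullptr, &constantBuffer);
device->VSSetConstantBuffers(0, 1, &constantBuffer); // bound to register b0 in the shader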
The biggest change I've noticed between DX9 and DX10 is the fact that under DX10 you need to set an entire render state block, whereas in DX9 you could change individual states. This broke my architecture somewhat because I was relying on being able to make a small change and leave all the rest of the states the same (this only really becomes a problem when you set states from a shader).
The other big change is the fact that under DX10 vertex declarations are tied to a compiled shader (via CreateInputLayout). Under DX9 this wasn't the case: you just set a declaration and set a shader. Under DX10 you need to create a shader, then create an input layout attached to that shader.
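A rough sketch of that DX10 flow, reusing the layout[] array from the question ('device' is an assumed ID3D10Device* and 'vsBlob' stands in for the compiled vertex shader bytecode):
ID3D10InputLayout* inputLayout = nullptr;
device->CreateInputLayout(layout, 2,
                          vsBlob->GetBufferPointer(), vsBlob->GetBufferSize(),
                          &inputLayout);
device->IASetInputLayout(inputLayout);
// The layout is validated against the shader's input signature at creation time,
// which is why it has to be created after (and together with) a compiled shader.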
As codeka points out, D3DVERTEXELEMENT9 has been the recommended way to create shader signatures since DX9 was introduced. FVF was already deprecated, and through FVF you are unable to do things like set up a tangent basis. Vertex layouts are far, far more powerful and don't tie you to a fixed layout; you can put the vertex elements wherever you like.
If you want to know more about DX9 input layouts, I suggest you start with MSDN.
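For comparison, a sketch of the DX9 vertex declaration equivalent to the D3D10_INPUT_ELEMENT_DESC array in the question ('device' is an assumed IDirect3DDevice9*):
D3DVERTEXELEMENT9 decl[] = {
    { 0,  0, D3DDECLTYPE_FLOAT3, D3DDECLMETHOD_DEFAULT, D3DDECLUSAGE_POSITION, 0 },
    { 0, 12, D3DDECLTYPE_FLOAT4, D3DDECLMETHOD_DEFAULT, D3DDECLUSAGE_COLOR,    0 },
    D3DDECL_END()
};
IDirect3DVertexDeclaration9* vertexDecl = nullptr;
device->CreateVertexDeclaration(decl, &vertexDecl);
device->SetVertexDeclaration(vertexDecl);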
FVFs were (kind of) deprecated in favour of D3DVERTEXELEMENT9 (aka vertex declarations), which is remarkably similar to D3D10_INPUT_ELEMENT_DESC, anyway. In fact, most of what's in DirectX 10 is remarkably similar to what was in DirectX 9, minus the fixed-function pipeline.
The biggest change between DirectX 9 and DirectX 10 was the cleaning up of the API (in terms of the separation of concerns, making it much clearer what goes with what stage of the pipeline, etc.).