Choosing between multiple shaders based on uniform variable - opengl

I want to choose between two fragment shaders based on the value of a uniform variable, and I want to know how to do that.
My onSurfaceCreated function compiles and links the shaders to create program1 and calls glGetAttribLocation on program1.
In onDrawFrame, which runs for every frame, I call glUseProgram(program1).
My problem is that I only get the value of my uniform variable in onDrawFrame(), and that is where I have to choose between program1 and program2. But program1 has already been compiled and linked in onSurfaceCreated. How do I switch programs at that point?

It looks like you need to prepare both programs in your onSurfaceCreated function. I'll try to illustrate that with some sample code; please organize it more carefully in your project:
// onSurfaceCreated function:
glCompileShader(/*shader1 for prog1*/);
glCompileShader(/*shader2 for prog1*/);
//...
glCompileShader(/*shadern for prog1*/);
glCompileShader(/*shader1 for prog2*/);
glCompileShader(/*shader2 for prog2*/);
//...
glCompileShader(/*shadern for prog2*/);
glLinkProgram(/*prog1*/);
glLinkProgram(/*prog2*/);
u1 = glGetUniformLocation(/*uniform in prog1*/);
u2 = glGetUniformLocation(/*uniform in prog2*/);
// onDrawFrame
if (I_need_prog1_condition) {
    glUseProgram(prog1);
    glUniform(/*set uniform using u1*/);
} else {
    glUseProgram(prog2);
    glUniform(/*set uniform using u2*/);
}
If you want to use the same set of uniforms from different programs (like in the code above), there is a more elegant and up-to-date solution: uniform buffer objects. For example, you can create a buffer object with all the variables that any of your shaders may need, while each of your shader programs uses only a subset of them. Moreover, you can detect unneeded (optimized-out) uniforms using glGetActiveUniform.
Also please note that the title of your question is a bit misleading. It reads as if you want to choose an execution branch not in your host code (i.e. the onDrawFrame function) but in your shader code. That approach is known as the uber-shader technique. There are lots of discussions about it on the Internet, like these:
http://www.gamedev.net/topic/659145-what-is-a-uber-shader/
http://www.shawnhargreaves.com/hlsl_fragments/hlsl_fragments.html
If you decide to do so, remember that GPUs are not really good at handling if statements and other branching.
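The host-side selection described above can be sketched in C++ roughly as follows, assuming both programs were already compiled and linked at startup. ShaderVariant and bindVariant are hypothetical names, the GLuint/GLint aliases are stand-ins so the snippet compiles without GL headers, and the two callbacks are assumed to wrap glUseProgram and glUniform1f in a real renderer.

```cpp
#include <cassert>
#include <functional>

// Stand-ins for the OpenGL typedefs so the sketch compiles without GL headers.
using GLuint = unsigned int;
using GLint = int;

// One entry per program that was compiled and linked at startup.
struct ShaderVariant {
    GLuint program;  // handle of the linked program
    GLint uniform;   // uniform location queried once after linking
};

// Hypothetical helper: activates one of two ready-made variants and sets its uniform.
// In a real renderer useProgram would call glUseProgram and setUniform glUniform1f.
inline const ShaderVariant& bindVariant(
    const ShaderVariant& first,
    const ShaderVariant& second,
    bool wantSecond,
    float value,
    const std::function<void(GLuint)>& useProgram,
    const std::function<void(GLint, float)>& setUniform)
{
    assert(useProgram && setUniform);
    const ShaderVariant& chosen = wantSecond ? second : first;
    useProgram(chosen.program);          // switch programs first...
    setUniform(chosen.uniform, value);   // ...then set the uniform on the active one
    return chosen;
}
```

The point of the sketch is only the ordering: nothing is compiled per frame, and the uniform is set after the matching program has been made current.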

Related

GLSL dynamic looping not working on Intel UHD Graphics [duplicate]

I asked for help about an OpenGL ES 2.0 Problem in this question.
What seems to be the answer is very odd to me.
Therefore I decided to ask this question in hope of being able to understand what is going on.
Here is the piece of faulty vertex-shader code:
// a bunch of uniforms and stuff...
uniform int u_lights_active;
void main()
{
    // some code...
    for ( int i = 0; i < u_lights_active; ++i )
    {
        // do some stuff using u_lights_active
    }
    // some other code...
}
I know this looks odd but this is really all code that is needed to explain the problem / faulty behavior.
My question is: Why is the loop not getting executed when I pass in some value greater than 0 for u_lights_active?
When I hardcode some integer e.g. 4, instead of using the uniform u_lights_active, it is working just fine.
One more thing, this only appears on Android but not on the Desktop. I use LibGDX to run the same code on both platforms.
If more information is needed you can look at the original question but I didn't want to copy and paste all the stuff here.
I hope that this approach of keeping it short is appreciated, otherwise I will copy all the stuff over.
Basically, GLSL ES 1.00 (the shading language of OpenGL ES 2.0) specifies that implementations may restrict loops to have "constant" bounds. This makes it simpler to optimize the code to run in parallel (different loop counts for different pixels would be complex). I believe on some implementations the constants even have to be small. Note that the spec only specifies the "minimum" behavior, so some devices might support more complex loop controls than the spec requires.
Here's a nice summary of the constraints:
http://www.khronos.org/webgl/public-mailing-list/archives/1012/msg00063.html
Here's the GLSL spec (look at section 4 of Appendix A):
http://www.khronos.org/registry/gles/specs/2.0/GLSL_ES_Specification_1.0.17.pdf
http://www.opengl.org/discussion_boards/showthread.php/171437-can-for-loops-terminate-with-a-uniform
http://www.opengl.org/discussion_boards/showthread.php/177051-GLSL-loop-problem-on-Radeon-HD-cards
http://www.opengl.org/discussion_boards/showthread.php/162722-Problem-when-using-uniform-variable-as-a-loop-count-in-fragment-shader
https://www.opengl.org/discussion_boards/showthread.php/162535-variable-controlled-for-loops
If you have a static loop it can be unrolled and made into static constant lookups. If you absolutely need to make it dynamic, you'll need to store indexed data into a 1D texture and sample that instead.
I'm guessing that the hardware on the desktop is more advanced than on the tablet. Hope this helps!
Kind of a fun half-answer, and/or the solution to the underlying problem that I have chosen.
The following function is called with 'id' set to the ID of the shader's script block and 'swaps' set to an array of two-element arrays of strings in the format [[ThingToReplace, ReplaceWith],]. It is called before the shader is created.
In the JavaScript:
var ReplaceWith = 6;
function replaceinID(id, swaps) {
    var thingy = document.getElementById(id);
    for (var i = 0; i < swaps.length; i++) {
        // Note: replace() with a string pattern swaps only the first occurrence.
        thingy.innerHTML = thingy.innerHTML.replace(swaps[i][0], swaps[i][1]);
    }
}
replaceinID("My_Shader", [['ThingToReplace', ReplaceWith],]);
Coming from C, this is a very macro-like approach, in that it simulates a preprocessor.
In the GLSL:
for (int i = 0; i < ThingToReplace; i++) {
    ; // whatever goes here
}
Or:
const int val = ThingToReplace;
for (int i = 0; i < val; i++) {
    ; // whatever goes here
}
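For a native host application, the same placeholder trick can be sketched in C++ before the source is handed to the shader compiler. This is only an illustration under assumptions: substitutePlaceholder and withLoopBound are hypothetical helper names, and the placeholder string ThingToReplace is taken from the answer above. Unlike the JavaScript version, this sketch replaces every occurrence.

```cpp
#include <cassert>
#include <string>

// Replaces every occurrence of `placeholder` in `source` with `value`.
inline std::string substitutePlaceholder(std::string source,
                                         const std::string& placeholder,
                                         const std::string& value)
{
    assert(!placeholder.empty());  // an empty placeholder would never advance
    std::string::size_type pos = 0;
    while ((pos = source.find(placeholder, pos)) != std::string::npos) {
        source.replace(pos, placeholder.size(), value);
        pos += value.size();  // continue after the text that was just inserted
    }
    return source;
}

// Bakes a loop bound into the shader text so the compiler sees a constant.
inline std::string withLoopBound(const std::string& shaderSource, int lightCount)
{
    return substitutePlaceholder(shaderSource, "ThingToReplace",
                                 std::to_string(lightCount));
}
```

The trade-off is that the shader has to be recompiled whenever the bound changes, so this suits values that change rarely.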

RenderPass Dependency and (transition) memory barrier

I am facing a comprehension issue.
Let's say I have an image in a TRANSFER_LAYOUT layout. Going that way, the memory is already made available (not visible).
Let's say I update a uniform buffer (via vkCmdCopyBuffer).
Now let's say I have a render pass (with an "empty framebuffer", so there is no color attachment, to make things simpler) that uses the prior image in SHADER_READ_OPTIMAL layout and the uniform buffer we just updated. The image and the buffer are both used inside the fragment shader.
Is it correct to do the following?
Transition the image to SHADER_READ_LAYOUT
srcAccess = 0; // the layers will report an error: it must be TRANSFER_READ
dstAccess = 0; // The visibility will be made in the render pass dependency (however, the layers say it should be SHADER_READ, I think)
srcPipe = TOP_OF_PIPE;
dstPipe = BOTTOM_OF_PIPE;
It is, in my understanding, meaningless to use an access mask other than 0 here because TOP_OF_PIPE and BOTTOM_OF_PIPE do not access memory.
In the renderpass dependency from VK_EXTERNAL_SUBPASS :
srcAccess = TRANSFER_WRITE; // for the uniformBuffer
dstAccess = SHADER_READ; // for the uniform and the image
srcPipeline = TRANSFER; // For the uniformBuffer
dstPipeline = FRAGMENT_SHADER; // They are used here
Going that way, we are sure that the uniform buffer will not have any problems: the data are made both available and visible thanks to the render pass dependency. The memory should also be made visible for the image (also thanks to the dependency). However, the transition is written here to happen "not before" the bottom stage. Since I am using the image in FRAGMENT_STAGE, is that a mistake? Or does the "end of the renderPass dependency" behave like a bottom stage?
This code works on NVIDIA and on AMD, but I am not sure it is really correct.
To understand synchronization perfectly, one must simply read the specification, especially the Execution and Memory Dependencies theory chapter. Let's analyze your situation in terms of what is written in the specification.
You have (only) three synchronization commands: S1 (image transition to TRANSFER_LAYOUT and availability operation), S2 (image transition to SHADER_READ_LAYOUT), and S3 (render pass VK_EXTERNAL dependency).
Your command buffer is an ordered list like: [Cmds0, S1, Cmds1, S2, Cmds2, S3, Cmds3].
For S1 let's assume you did the first part of a dependency (i.e. src part) correctly. You only said you made the image available from.
You also said you did not make it visible to, so let's assume dstAccess was 0 and dstStage was probably BOTTOM_OF_PIPE.
S2 has no execution dependency and no memory dependency, only a layout transition. There is a layout transition synchronization exception in the spec saying that layout transitions are performed in full in submission order (i.e. an implicit execution dependency is automagically added). I personally would not be comfortable relying on it (and I would not trust drivers to implement it correctly on the first try). But let's assume it is valid, and assume the image will be correctly transitioned and made available from (and not visible to) at some point after S1.
The S3 is an external subpass dependency on non-attachment resource, but the spec reassures us it is no different than vkCmdPipelineBarrier with a VkMemoryBarrier.
The second part of the dependency (i.e. dst) in S3 seems correct for your needs.
TL;DR, so far so good.
The first part of the dependency (i.e. src) in S3 would indeed be the problematic one.
There are no automatic layout transitions for non-attachment resources, so we cannot rely on that crutch as above.
The set of commands A3 is all the commands before the render pass.
The synchronization scope A3S covers only those kinds of operations that are in the srcStage pipeline stage or any logically earlier stage (i.e. TOP_OF_PIPE up to the specified STAGE_TRANSFER).
The execution dependency is made between A3' and B3'. Above we agreed the B3' half of the dependency is correct. The A3' half is the intersection of A3 and A3S.
The layout transition in S2 is made between srcPipe = TOP_OF_PIPE and dstPipe = BOTTOM_OF_PIPE, so basically can be anywhere. It can be as late as in BOTTOM_OF_PIPE (to be precise, happens-before BOTTOM_OF_PIPE of commands recorded after S2 is executed).
So the layout transition is part of A3, but there is no guarantee it is part of A3S; so there would not be guarantee the transition is part of the intersection A3'.
That means there is no guarantee the layout transition to SHADER_READ_LAYOUT happens-before the image reading in the first subpass in STAGE_FRAGMENT_SHADER.
There is no correct memory dependency either because that is defined in terms of A3' too.
EDIT: Somehow missed this, which is probably the problem:
"Or does the "end of the renderPass dependency" behave like a bottom stage?"
The beginning and end of a render pass do not behave like anything. They only affect submission order. In the presence of a VK_EXTERNAL dependency, only that applies (and of course any other previous explicit synchronization commands). What happens without an explicit VK_EXTERNAL dependency is described in the spec too, below the Valid Usage sections of VkSubpassDependency (basically all memory that is available before TOP_OF_PIPE is made visible to the whole first subpass for attachment use).

Does IASetInputLayout check to see if you pass an already set input layout?

I am designing a game engine in DirectX 11 and I have a question about the ID3D11DeviceContext::IASetInputLayout function. From what I can find in the documentation, there is no mention of what the function does if you set an input layout that is already set on the device. In context, if I were to do the following:
//this assumes dc is a valid ID3D11DeviceContext* and that
//ia is a valid ID3D11InputLayout*.
dc->IASetInputLayout(ia);
//other program lines: drawing, setting vertex shaders/pixel shaders, etc.
dc->IASetInputLayout(ia);
//continue execution
would this incur a performance penalty through device state switching, or would the runtime recognize the input layout as being equivalent to the one already set and return?
I also cannot find any documentation on what happens if the input layout is already set. However, you can get a pointer to the currently bound input layout by calling ID3D11DeviceContext::IAGetInputLayout, or do the check yourself by keeping your own reference to the last layout you set, which avoids a call into the ID3D11DeviceContext object.
As far as I know, it should detect that there are no changes, so the call should be ignored. But this can easily be tested: just call the method 10000 times each frame and see how badly the FPS drops :)
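The "keep your own reference" idea can be sketched as a small cache that forwards a state change only when the handle actually differs. This is an assumption-laden illustration rather than anything from the Direct3D API: CachedBinding is a hypothetical class, and the callback is assumed to wrap a call such as dc->IASetInputLayout(layout) in real code.

```cpp
#include <cassert>
#include <functional>
#include <utility>

// Hypothetical helper: remembers the last handle that was applied and skips
// the underlying API call when the same handle is requested again.
template <typename Handle>
class CachedBinding {
public:
    explicit CachedBinding(std::function<void(Handle)> apply)
        : apply_(std::move(apply))
    {
        assert(static_cast<bool>(apply_));
    }

    // Returns true if the underlying call was made, false if it was skipped.
    bool set(Handle handle)
    {
        if (hasValue_ && current_ == handle) {
            return false;  // already bound, nothing to do
        }
        apply_(handle);
        current_ = handle;
        hasValue_ = true;
        return true;
    }

    // Forget the cached handle, e.g. after code outside the cache changed the state.
    void invalidate() { hasValue_ = false; }

private:
    std::function<void(Handle)> apply_;
    Handle current_{};
    bool hasValue_ = false;
};
```

A cache like this is only valid as long as every state change goes through it; anything that resets the device context behind its back would need a matching invalidate() call.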

How to create a CgFx like effect system?

Serious graphics engines like CryEngine3 and Unreal Engine 3 have their own customized shader languages and effect systems. While trying to find an effect system for my small graphics framework, it looks like NVIDIA CgFx is the only choice (it seems Khronos had a project called glFx, but the project page is 404 now).
I have several reasons to make an effect system of my own:
1. I need more control over how and when the shader parameters are passed.
2. In order to reuse shader snippets, I want to create some C++ macro-like mechanism. Macros are also useful for conditional compilation, and that's the way CryEngine produces various effects.
3. It looks like GLSL doesn't have such an effect system.
So I am wondering how to create an effect system. Do I need to write a grammar parser from scratch, or is there already some code or tooling able to do this?
PS: I am using OpenGL with both GLSL and Cg.
Back in the day when I was using HLSL, I developed a little shader system which allowed me to specify all my parameters through data, so that I could just edit a sort of XML file containing the list of parameters and the shader code, and after saving, the engine would automatically reload it, rebind all parameters, etc.
It's nothing compared to what's found in the UDK, but it's pretty convenient, and I guess you're trying to implement something like that?
If it is, then here are a few things to do. First, you need to create a class to abstract shader parameter handling (binding, setting, etc.). Something along these lines:
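#include <string>
// Also include your OpenGL header here for the GLint/GLuint types.

class IShaderParameter
{
protected:
    IShaderParameter(const std::string & name)
        : m_Uniform(-1)
        , m_Name(name)
    {}
    GLint m_Uniform;     // glGetUniformLocation returns a GLint; -1 means "no location"
    std::string m_Name;
public:
    virtual ~IShaderParameter() {}  // the class is deleted through base pointers
    virtual void Set(GLuint program) = 0;
};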
Then, for static parameters, you can simply create a derived class template like this:
template < typename Type >
class StaticParameter
    : public IShaderParameter
{
public:
    StaticParameter(const std::string & name, const Type & value)
        : IShaderParameter(name)
        , m_Value(value)
    {}
    virtual void Set(GLuint program)
    {
        // Look the location up lazily, the first time the parameter is set.
        if (m_Uniform == -1)
            m_Uniform = glGetUniformLocation(program, m_Name.c_str());
        this->SetUniform(m_Value);
    }
protected:
    Type m_Value;
    void SetUniform(float value) { glUniform1f(m_Uniform, value); }
    // write all SetUniform specializations that you need here
    // ...
};
And along the same idea, you can create a "dynamic shader parameter" type. For example, if you want to be able to bind a light's parameter to your shader, create a specialized parameter's type. In its constructor, pass the light's id so that it will know how to retrieve the light in the Set method. With a little work, you can have a whole bunch of parameters that you can then automatically bind to an entity of your engine (material parameters, light parameters, etc.)
The last thing to do is create a little custom file format (I used xml) to define your various parameters, and a loader. For example, in my case, it looked like this :
<shader>
    <param type="vec3" name="lightPos">light_0_position</param>
    <param type="vec4" name="diffuse">material_10_diffuse</param>
    <vertexShader>
        ... a CDATA containing your shader code
    </vertexShader>
</shader>
In my engine, "light_0_position" would mean a light parameter, 0 is the light's ID, and position is the parameter to get. Binding between the parameter and the actual value was done during loading so there was not much overhead.
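Resolving such a binding string at load time could look roughly like the C++ sketch below. It assumes the key always has the shape kind_id_field with a non-negative decimal ID, which is inferred from the two examples above rather than from the original engine; BindingKey and parseBindingKey are hypothetical names.

```cpp
#include <cassert>
#include <string>

// Parsed form of a key such as "light_0_position" (assumed shape: kind_id_field).
struct BindingKey {
    std::string kind;   // e.g. "light" or "material"
    int id;             // e.g. 0 or 10
    std::string field;  // e.g. "position" or "diffuse"
};

// Returns true and fills `out` when `text` matches kind_id_field; false otherwise.
inline bool parseBindingKey(const std::string& text, BindingKey& out)
{
    const std::string::size_type first = text.find('_');
    if (first == std::string::npos || first == 0) {
        return false;  // no separator, or empty kind
    }
    const std::string::size_type second = text.find('_', first + 1);
    if (second == std::string::npos || second == first + 1 ||
        second + 1 >= text.size()) {
        return false;  // missing ID or missing field
    }
    const std::string digits = text.substr(first + 1, second - first - 1);
    if (digits.size() > 9 ||
        digits.find_first_not_of("0123456789") != std::string::npos) {
        return false;  // ID must be a short run of decimal digits
    }
    assert(!digits.empty());
    out.kind = text.substr(0, first);
    out.id = std::stoi(digits);
    out.field = text.substr(second + 1);
    return true;
}
```

Doing this parsing once in the loader, and storing the resolved parameter object, keeps string handling out of the per-frame path, which matches the "not much overhead" remark above.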
Anyway, I don't know if that answers your question, and don't take these code samples too seriously (HLSL and OpenGL shaders work quite differently, and I'm no OpenGL expert ^^), but hopefully it'll give you a few leads :)
1. Could you elaborate on this? By working with OpenGL directly you have full control over the parameters being passed to the GPU. What exactly are you missing?
2. (and 3.) GLSL does support reusing code. You can have a library of shaders providing different functions. In order to use a function you just need to pre-declare it in the client shader (vec4 get_diffuse();) and attach the shader object implementing the function to the shader program before linking.

OpenGL Multitexturing - glActiveTexture is NULL

I have started a new project in which I want to use multitexturing.
I have done multitexturing before, and it is supported by my version of OpenGL.
In the header I have:
GLuint m_TerrainTexture[3];//heightmap, texture map and detail map
GLuint m_SkyboxTexture[5]; //left, front, right, back and top textures
PFNGLMULTITEXCOORD2FARBPROC glMultiTexCoord2fARB;
PFNGLACTIVETEXTUREARBPROC glActiveTexture;
In the constructor I have:
glActiveTexture = (PFNGLACTIVETEXTUREARBPROC) wglGetProcAddress((LPCSTR)"glActiveTextureARB");
glMultiTexCoord2fARB = (PFNGLMULTITEXCOORD2FARBPROC) wglGetProcAddress((LPCSTR)"glMultiTexCoord2fARB");
if(!glActiveTexture || !glMultiTexCoord2fARB)
{
    MessageBox(NULL, "multitexturing failed", "OGL_D3D Error", MB_OK);
}
glActiveTexture( GL_TEXTURE0_ARB );
...
This shows the message box "multitexturing failed", and the contents of glActiveTexture are 0x00000000.
When it gets to glActiveTexture( GL_TEXTURE0_ARB ); I get an access violation error.
I am implementing the MVC pattern, so this is all in my terrain view class.
You quoted your code to load the extensions as follows:
PFNGLMULTITEXCOORD2FARBPROC glMultiTexCoord2fARB;
PFNGLACTIVETEXTUREARBPROC glActiveTexture;
glActiveTexture = (PFNGLACTIVETEXTUREARBPROC) wglGetProcAddress((LPCSTR)"glActiveTextureARB");
glMultiTexCoord2fARB = (PFNGLMULTITEXCOORD2FARBPROC) wglGetProcAddress((LPCSTR)"glMultiTexCoord2fARB");
This is very problematic, since it possibly redefines already existing symbols. The (dynamic) linker will eventually trip over this. For example it might happen that the assignment to the pointer variable glActiveTexture goes into some place, but whenever a function of the same name is called it calls something linked in from somewhere else.
In C you usually use a combination of preprocessor macros and custom prefix to avoid this problem, without having to adjust large portions of code.
PFNGLMULTITEXCOORD2FARBPROC myglMultiTexCoord2fARB;
#define glMultiTexCoord2fARB myglMultiTexCoord2fARB
PFNGLACTIVETEXTUREARBPROC myglActiveTexture;
#define glActiveTexture myglActiveTexture
glActiveTexture = (PFNGLACTIVETEXTUREARBPROC) wglGetProcAddress((LPCSTR)"glActiveTextureARB");
glMultiTexCoord2fARB = (PFNGLMULTITEXCOORD2FARBPROC) wglGetProcAddress((LPCSTR)"glMultiTexCoord2fARB");
I really don't know of any other reason why things should fail if you have a valid render context active and the extensions supported.
GLEE is a dead library; it hasn't been updated in a long time.
GLEW is a fine extension loading library, but it has some issues working with core 3.2 and above.
I would suggest GL3W. The beauty of it is that it is self-updating; it downloads and parses the headers by itself. The downside is that you need a Python 2.6 installation to generate the loader. But it provides reasonably good results otherwise.
I recommend GLEW/GLEE for extension management.
The Rastertek tutorial has the complete setup required to make wglGetProcAddress work. GLEW doesn't work for me either: I've tried everything I could think of and asked many people about it, but it simply doesn't work in VS 2012, not to mention the enormous frustration I experienced when I wanted to compile a shader.