I can work with code in C++, but it's not where I spend most of my time. I usually work in another language, where, over the course of my career, I have put together a well-defined architecture for building predictor/corrector (e.g., Kalman filter) type algorithms that are easily maintained and modified. For the sake of a ground-up deployment of a recently designed filter, I am hoping to replicate this architecture within a C++ framework. Hopefully we can get the same level of extensibility built into the deployed product, so I don't need to keep jumping back and forth to another language whenever I want to modify the model being used by the filter.
The idea here is that we're going to have an array that contains a bunch of different information about the state of a given system. Let's say, for example, we have an object with a position and orientation in 3D... We'll use a quaternion for the orientation, but the specifics of that aren't super important.
Here's some pseudo-code to demonstrate what I'm trying to accomplish:
function build_model()
model.add_state('quaternion',[0;0;0;1],[1;1;1]);
model.add_state('position',[0;0;0],[10;10;10]);
model.add_input('velocity',[0;0;0]);
model.add_input('angular_rate',[0;0;0]);
model.add_noise('velocity_noise',[1;1;1]);
model.add_noise('angular_rate_noise',0.01*[1;1;1]);
end
where the above have the form:
add_state(state_name, initial_state_estimate, init_error_std_deviation_estimate)
add_input(input_name, initial_input_value)
add_noise(noise_name, noise_std_deviation)
After build_model() is called, I end up with a bunch of information about the estimator.
The state space is of dimension 7
The state error space is of dimension 6
The input vector is of dimension 6
The "process noise" vector is of dimension 6
Further (indexed from 0), I have some arrays, such that:
state[0:3] holds the quaternion
state[4:6] holds the position
state_err[0:2] holds quaternion error
state_err[3:5] holds position error
input[0:2] holds velocity
input[3:5] holds angular_rate
process_noise[0:2] holds velocity noise
process_noise[3:5] holds angular rate noise
... but, I don't want a bunch of hard-coded indices... in fact, once the model is built, the rest of the code should be designed to be completely agnostic to the positions/dimensions/etc of the variables/model/state/error-space etc.
Since the estimator and the model don't really care about each other, I try to keep them encapsulated... i.e. the estimator just has state/error/noise of known dimensions and processes it with functions of a generic format, and then the model specific stuff is presented in the appropriate format. This, unfortunately, makes using an indexed array (rather than a struct or something) preferable.
Essentially, what I'm looking for is a pre-compiler way to associate names (like a structure) and indices (like an array) with the same data... ideally building it up piece by piece using simple language as shown above, to a final dimension, determined by the pre-compiler based on the model definition, to be used for defining the size of various arrays within the estimator runtime algorithm.
I'm not looking for someone to do this for me, but I'd love a push in the right direction. Good architecture early pays dividends in the long run, so I'm willing to invest some time to get it right.
So, a couple of things I've thought about:
There are definitely ways to do this at run-time with dynamic memory and things like std::vector, structures, enums, and so forth. But, since the deployed version of this is going to be running in real-time, performance is an issue... besides, all of this stuff shouldn't need to happen at run-time anyway. If we had a sufficiently sophisticated pre-compiler, it could just calculate all of this out and define some constants/macros/whatever to manipulate the model by name while using indices behind the scenes... unfortunately, fancy pre-compiler stuff is a pretty niche area that I have little experience with.
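For illustration, a bare-bones run-time version might look something like this (all names here are placeholders I just made up, not an actual proposal):

#include <cstddef>
#include <map>
#include <string>
#include <vector>

// Run-time model registry sketch: each named block records its offset and
// length within one flat state vector, so the estimator can stay index-based
// while model code refers to blocks by name.
struct Model {
    struct Block { std::size_t offset, length; };
    std::map<std::string, Block> blocks;
    std::vector<double> state;

    void add_state(const std::string& name, const std::vector<double>& x0) {
        blocks[name] = Block{state.size(), x0.size()};
        state.insert(state.end(), x0.begin(), x0.end());
    }
    // Name-based lookup; indices stay hidden behind the map.
    double* get(const std::string& name) { return &state[blocks[name].offset]; }
};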
It seems like template meta-programming and/or macros might be a way to go, but I'm hesitant to dive head-first into that without guidance, and I recognize that this is shady at best in terms of modern software design.
I could always write code to write the C++ code for me... i.e. spit out a bunch of #defines or enums for the indices by name, as well as the dimensionality of the model/estimator components, and just copy paste this into the C++ code... but that feels wrong for different reasons. On the other hand, that's one way to get a "sufficiently sophisticated pre-compiler".
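For instance, the generated header might look something like this (the values follow from the model above; the naming scheme is purely illustrative):

// Hypothetical output of a model-definition code generator.
enum model_layout {
    I_QUATERNION = 0, M_QUATERNION = 4,     // state[0:3]
    I_POSITION   = 4, M_POSITION   = 3,     // state[4:6]
    M_STATE      = 7,                       // full state dimension
    I_DQUATERNION = 0, M_DQUATERNION = 3,   // state_err[0:2]
    I_DPOSITION   = 3, M_DPOSITION   = 3,   // state_err[3:5]
    M_STATE_ERR   = 6                       // full error dimension
};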
Giving up on the compile-time dimensioning of my arrays would also solve the problem, but since all of this is constant once computed, run-time seems like the wrong place for it...
So, is there an elegant solution out there? I'd hate to just brute force this, but I don't see a clear alternative. Also, much of the above may be WAY OFF for any number of reasons... apologies if so, and I appreciate any input you might have :-)
I ended up getting most of the way there using template meta-programming... [see below]
I'd like to find a way to add the state to state_enum and define its corresponding set_state struct at the same time, i.e.:
add_state(quaternion,{0,0,0,1},{1,1,1})
just for cleanliness and to prevent one happening without the other... if anyone has ideas on how to do this (preferably without using __COUNTER__ or boost), let me know. Thanks!
#include <cstdio>   // std::printf
#include <iostream>

// Names of the model's states; 'last' is a sentinel used to read off totals.
struct state_enum{
    enum{quaternion,position,last};
};
// The primary template is declared but never defined: every real state must
// provide a specialization, so a missing definition is a compile-time error.
template <int state_num> struct set_state;
template <> struct set_state<state_enum::quaternion>{
    static constexpr double x0[] = {0,0,0,1};  // initial state estimate
    static constexpr double sx0[] = {1,1,1};   // initial error std deviation
};
template <> struct set_state<state_enum::position>{
    static constexpr double x0[] = {0,0,0};
    static constexpr double sx0[] = {2,2,2};
};
// Recursively accumulates dimensions and index ranges at compile time.
template <int state_num> struct state{
    enum{
        // Per-state dimensions, deduced from the initializer lists above.
        m_x = sizeof(set_state<state_num>::x0)/sizeof(set_state<state_num>::x0[0]),
        m_dx = sizeof(set_state<state_num>::sx0)/sizeof(set_state<state_num>::sx0[0])
    };
    enum{
        // Running totals, plus first/last indices into the state vector (i_x*)
        // and the error vector (i_dx*).
        m_x_cumulative = state<state_num-1>::m_x_cumulative+m_x,
        m_dx_cumulative = state<state_num-1>::m_dx_cumulative+m_dx,
        i_x0 = state<state_num-1>::m_x_cumulative,
        i_dx0 = state<state_num-1>::m_dx_cumulative,
        i_x1 = state<state_num-1>::m_x_cumulative+m_x-1,
        i_dx1 = state<state_num-1>::m_dx_cumulative+m_dx-1
    };
};
// Recursion terminator: nothing is accumulated before the first state.
template <> struct state<-1>{
    enum{m_x = 0, m_dx = 0};
    enum{m_x_cumulative = 0, m_dx_cumulative = 0, i_x0 = 0, i_dx0 = 0, i_x1 = 0, i_dx1 = 0};
};
// The sentinel contributes no state of its own; it just exposes the totals.
template <> struct state<state_enum::last>{
    enum{m_x = 0, m_dx = 0};
    enum{
        m_x_cumulative = state<state_enum::last-1>::m_x_cumulative,
        m_dx_cumulative = state<state_enum::last-1>::m_dx_cumulative
    };
};
int main(int argc, const char * argv[]) {
    std::cout << "Summary of model indexing and dimensions...\n\n";
    std::printf("%-32s %02i\n","quaternion first state index",state<state_enum::quaternion>::i_x0);
    std::printf("%-32s %02i\n","quaternion final state index",state<state_enum::quaternion>::i_x1);
    std::printf("%-32s %02i\n","position first state index",state<state_enum::position>::i_x0);
    std::printf("%-32s %02i\n","position final state index",state<state_enum::position>::i_x1);
    std::printf("%-32s %02i\n","full state vector dimensionality",state<state_enum::last>::m_x_cumulative);
    std::cout << "\n";
    std::printf("%-32s %02i\n","quaternion first error index",state<state_enum::quaternion>::i_dx0);
    std::printf("%-32s %02i\n","quaternion final error index",state<state_enum::quaternion>::i_dx1);
    std::printf("%-32s %02i\n","position first error index",state<state_enum::position>::i_dx0);
    std::printf("%-32s %02i\n","position final error index",state<state_enum::position>::i_dx1);
    std::printf("%-32s %02i\n","full error vector dimensionality",state<state_enum::last>::m_dx_cumulative);
    std::cout << "\n\n";
    return 0;
}
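One pre-processor idea that avoids both __COUNTER__ and Boost is the classic X-macro: write the state list once and expand it twice, so the enum and the specializations cannot drift apart. A rough sketch of how the declarations above could be generated (the macro names are made up):

// The state list replaces the hand-written state_enum and set_state
// specializations above; entries are (name, x0, sx0), with parentheses
// guarding the commas inside the initializer lists.
#define STRIP(...) __VA_ARGS__   // removes the guarding parentheses

#define STATE_LIST(X)                    \
    X(quaternion, (0,0,0,1), (1,1,1))    \
    X(position,   (0,0,0),   (2,2,2))

template <int state_num> struct set_state;   // primary template, as above

// Pass 1: build the enumerators.
#define AS_ENUM(name, init, stdev) name,
struct state_enum{
    enum{ STATE_LIST(AS_ENUM) last };
};

// Pass 2: build one set_state specialization per entry.
#define AS_SPEC(name, init, stdev)                       \
    template <> struct set_state<state_enum::name>{      \
        static constexpr double x0[]  = { STRIP init };  \
        static constexpr double sx0[] = { STRIP stdev }; \
    };
STATE_LIST(AS_SPEC)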
Suppose that I have one shader storage buffer and want to have several views into it, e.g. like this:
layout(std430,binding=0) buffer FloatView { float floats[]; };
layout(std430,binding=0) buffer IntView { int ints[]; };
Is this legal GLSL?
opengl.org says no:
Two blocks cannot use the same index.
However, I could not find such a statement in the GL 4.5 Core Spec or GLSL 4.50 Spec (or the ARB_shader_storage_buffer_object extension description), and my NVIDIA driver seems to compile such code without errors or warnings.
Does the OpenGL specification expressly forbid this? Apparently not. Or at least, if it does, I can't see where.
But that doesn't mean that it will work cross-platform. When dealing with OpenGL, it's always best to take the conservative path.
If you need to "cast" memory from one representation to another, you should just use separate binding points. It's safer.
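That is, something like this, which is unambiguous everywhere; the same buffer object can then be attached to both binding points with glBindBufferBase or glBindBufferRange:

layout(std430, binding = 0) buffer FloatView { float floats[]; };
layout(std430, binding = 1) buffer IntView   { int   ints[];   };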
There is some official word on this now. I filed a bug on this issue, and they've read it and decided some things. Specifically, the conclusion was:
There are separate binding namespaces for: atomic counters, images, textures, uniform buffers, and SSBOs.
We don't want to allow aliasing on any of them except atomic counters, where aliasing with different offsets (e.g. sharing a binding) is allowed.
In short, don't do this. Hopefully, the GLSL specification will be clarified in this regard.
This was "fixed" in the revision 7 of GLSL 4.5:
It is a compile-time or link-time error to use the same binding number for more than one uniform block or for more than one buffer block.
I say "fixed" because you can still perform aliasing manually via glUniform/ShaderStorageBlockBinding. And the specification doesn't say how this will work exactly.
Ok, this is probably an easy one for the pros out there. I want to use an enum in GLSL in order to do a bitwise-AND check on it in an if, like in C++.
Pseudo C++ code:
enum PolyFlags
{
Invisible = 0x00000001,
Masked = 0x00000002,
Translucent = 0x00000004,
...
};
...
if ( Flag & Masked)
Alphathreshold = 0.5;
But I am already lost at the beginning, because it fails to compile with:
'enum' : Reserved word
I read that enums in GLSL are supposed to work, as is the bitwise AND, but I can't find a working example.
So, is it actually working/supported, and if so, how? I have already tried different #version directives in the shader, but no luck so far.
The OpenGL Shading Language does not have enumeration types. However, enum is a reserved keyword, which is why you got that particular compiler error.
C enums are really just syntactic sugar for a value (C++ gives them some type-safety, with enum classes having much more). So you can emulate them in a number of ways. Perhaps the most traditional (and dangerous) is with #defines:
#define Invisible 0x00000001u
#define Masked 0x00000002u
#define Translucent 0x00000004u
A more reasonable way is to declare const-qualified global variables, which are compile-time constants. Any GLSL compiler worth using will optimize them away to nothingness, so they won't take up any more resources than the #defines would. And they don't have any of the drawbacks of the #define.
const uint Invisible = 0x00000001u;
const uint Masked = 0x00000002u;
const uint Translucent = 0x00000004u;
Obviously, you need to be using a version of GLSL that supports unsigned integers and bitwise operations (aka: GLSL 1.30+, or GLSL ES 3.00+).
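A minimal fragment-shader sketch putting it together (the Flags uniform is hypothetical; note that GLSL has no implicit integer-to-bool conversion, so the test needs an explicit comparison, unlike the C++ if (Flag & Masked)):

#version 330

// const-qualified globals standing in for the C++ enum values.
const uint Invisible   = 0x00000001u;
const uint Masked      = 0x00000002u;
const uint Translucent = 0x00000004u;

uniform uint Flags;   // hypothetical: flag bits supplied by the application

out vec4 fragColor;

void main()
{
    float alphaThreshold = 0.0;
    if ((Flags & Masked) != 0u)   // explicit comparison required in GLSL
        alphaThreshold = 0.5;
    fragColor = vec4(alphaThreshold);
}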
I've written my first couple of GLSL programs for Processing (a visual language similar to Java that can load shaders) recently that make fractals. In the loop that handles the fractal code, I have an escape conditional that breaks if a point would tend to infinity.
It works fine, and it is similar to how I generally write the code for non-GLSL. However, someone told me that both paths are calculated every time a conditional is executed. I've had a hard time finding exactly how much of a penalty is caused by conditionals in GLSL.
Edit: To the best of my understanding, in non-GLSL code, when an if is encountered a path is assumed (branch prediction). If the "correct" path was assumed, everything is great. If the "wrong" path was assumed, then the "bad" work is discarded and instructions continue along the "correct" path. The penalty might be, say, 3 (or whatever number) instructions. I want to know if there is some fixed instruction penalty like that, or if both paths are calculated all the way through.
Here is the code if the explanation is not clear enough:
// Mandelbrot Set code
int i = 0;
float zr = x;
float zi = y;
for (; i < maxIterations; i++) {
float sqZr = zr*zr;
float sqZi = zi*zi;
float twoZri = 2.0*zr*zi;
zr = sqZr-sqZi+x;
zi = twoZri+y;
if (sqZr+sqZi > 16.0) break;
}
On old GPUs, both sides of an if() clause were executed and the correct result chosen at the end. On newer ones, this is only the case if the compiler thinks it would be more efficient. if() clauses are not free: the generic rule of thumb I have used for some time is: "if() costs 14 clock cycles" though the latest GPUs may be cheaper.
Why is this so? Because GPUs are stream processors, they want to have identical data-loading profiles for all pixels (especially for gradient values like texture colors or values from vertex registers). The principle of SIMD -- even when the devices are not strictly SIMD -- is usually the way to get the most performance from such devices.
When in doubt, see if you can use one of the NVIDIA perf analysis tools on your code, or just try writing the code (it's short!) a few different ways and comparing your performance for your specific GPU.
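For instance, one variant worth timing against the break version folds the escape test into the loop condition (a sketch only; the escape-time iteration count can differ by one from the original, since the test here uses the previous iteration's magnitude):

// Same Mandelbrot iteration, with the escape test folded into the loop
// condition instead of a divergent break inside the body.
int i = 0;
float zr = x;
float zi = y;
float sq = 0.0;   // |z|^2 from the previous iteration
for (; i < maxIterations && sq <= 16.0; i++) {
    float sqZr = zr*zr;
    float sqZi = zi*zi;
    float twoZri = 2.0*zr*zi;
    zr = sqZr - sqZi + x;
    zi = twoZri + y;
    sq = sqZr + sqZi;
}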
(BTW Processing is not Java-like: it's Java)
I'm looking at the source of an OpenGL application that uses shaders. One particular shader looks like this:
uniform float someConstantValue;
void main()
{
// Use someConstantValue
}
The uniform is set once from code and never changes throughout the application run-time.
In what cases would I want to declare someConstantValue as a uniform and not as const float?
Edit:
Just to clarify, the constant value is a physical constant.
Huge reason:
Error: Loop index cannot be compared with non-constant expression.
If I use:
uniform float myfloat;
...
for (float i = 0.0; i < myfloat; i++)
I get an error because myfloat isn't a constant expression.
However this is perfectly valid:
const float myfloat = 10.0;
...
for (float i = 0.0; i < myfloat; i++)
Why?
When GLSL (and HLSL for that matter) is compiled to GPU assembly instructions, loops are unrolled in a very verbose (yet optimized, using jumps, etc.) way. This means the myfloat value is used at compile time to unroll the loop; if that value is a uniform (i.e., it can change on each render call), then that loop cannot be unrolled until run time (and GPUs don't do that kind of just-in-time compilation, at least not in WebGL).
First off, the performance difference between using a uniform or a constant is probably negligible. Secondly, just because a value is always constant in nature doesn't mean that you will always want it to be constant in your program. Programmers will often tweak physical values to produce the best-looking result, even when that doesn't match reality. For instance, the acceleration due to gravity is often increased in certain types of games to make them more fast paced.
If you don't want to have to set the uniform in your code you could provide a default value in GLSL:
uniform float someConstantValue = 12.5;
That said, there is no reason not to use const for something like pi where there would be little value in modifying it....
I can think of two reasons:
The developer reuses a library of shaders in several applications. So instead of customizing each shader for every app, the developer tries to keep them general.
The developer anticipates this variable will later be a user-controlled setting. So declaring it as uniform is preparation for that upcoming feature.
If I was the developer and none of the above applies then I would declare it as "const" instead because it can give a performance benefit and I wouldn't have to set the uniform from my code.