Recently I've been messing around a fair amount with OpenGL, and I have come across the split between letting OpenGL manage the view/model/projection matrices and managing them yourself, either with your own matrix implementation or a library such as GLM. I've seen that a lot of large projects have their own camera management (i.e. they handle their own translations, rotations, etc.). I can see how it gives you full control of the system, but besides that it seems like a lot of work for a marginal gain.
Why is it better to do your own management than to use the built-in OpenGL functions? Obviously this is in the context of a shader pipeline, not the fixed function default.
(This would apply to any 3D library).
(As an aside, OpenGL ES 2 has no transform management facility, so in some cases you have no choice.)
More on point, I've found managing matrices via OpenGL's built-in matrix stacks to be a real pain at times, forcing me to push and pop rather copiously in the more intricate portions of my rendering code, even reordering the rendering at times just to simplify stack management. I also wrote a C++ pusher-popper class that uses RAII to automatically manage all this, but it requires careful scoping of local variables.
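For illustration, a minimal sketch of such an RAII pusher-popper (the class name here is just a placeholder, not the one from my code) could look like this:

    #include <GL/gl.h>

    void drawWheel();  // declared elsewhere

    // Saves the current matrix on construction and restores it on destruction,
    // so every scope that creates one is automatically balanced.
    class MatrixScope {
    public:
        MatrixScope()  { glPushMatrix(); }
        ~MatrixScope() { glPopMatrix(); }
        MatrixScope(const MatrixScope&) = delete;            // copying would
        MatrixScope& operator=(const MatrixScope&) = delete; // pop twice
    };

    void drawCar() {
        MatrixScope scope;                   // push happens here...
        glTranslatef(10.0f, 0.0f, 0.0f);
        drawWheel();
    }                                        // ...pop happens automatically here

The catch, as noted, is that the lifetime of each helper has to be scoped carefully to match the intended nesting.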
When I switched to ES 2, I was dismayed to learn that all that functionality was gone. However, I found that switching to my own matrices actually simplified my code, because I could work with multiple transforms using a combination of local and member variables (with meaningful names) without getting lost in space, and the transform stack was replaced mainly by using the call stack — i.e., the current transform is just a local matrix variable that gets passed as a parent transform parameter to the next function down — but with the flexibility to do it differently at other times.
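Concretely, a rough sketch of that call-stack style, assuming GLM mat4s and placeholder function names, might look like:

    #include <glm/glm.hpp>
    #include <glm/gtc/matrix_transform.hpp>

    void drawWheel(const glm::mat4& parent);   // declared elsewhere

    // The "stack" is just local variables: each function combines its own
    // transform with the parent it receives and passes the result down.
    void drawCar(const glm::mat4& parent) {
        glm::mat4 car = glm::translate(parent, glm::vec3(10.0f, 0.0f, 0.0f));
        // ... upload `car` as a uniform and draw the body ...
        glm::mat4 frontWheel = glm::translate(car, glm::vec3(1.5f, -0.5f, 0.0f));
        drawWheel(frontWheel);                 // no global stack involved
    }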
It is better for a large list of reasons. Apple's recent presentation on the OpenGL improvements in OSX Lion says it best: the newer OpenGL specs (primarily 3.2 on up) focus better on representing what the GPU is actually doing. In OpenGL 2.1, all of the matrix operations take place on the CPU. So, not only is there no magical accelerated benefit to using GL's matrices, you are locked into a completely arbitrary model of matrix management: projection & model-view matrices only (for vertices), matrix stack size limits, a limited set of matrix operations, etc.
When you start managing your own matrices, you start to see why it is so much better. As your scenes grow more complex, you start seeing the need for more matrix caches (beyond just "projection" and "model view"). You discover opportunities to build more useful matrix functions. For instance, which sounds more pleasant to use? glRotatef(90.0f, 1.0f, 0.0f, 0.0f); or matrix.rotateX(90.0f); ? It always bothered me that I had to specify the axis of rotation every single time!
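Just as a hedged illustration of that kind of convenience function (this is not any particular library's API), rotateX boils down to filling in the standard rotation matrix for you:

    #include <cmath>

    // Column-major 4x4 rotation about the X axis, in degrees, matching the
    // layout OpenGL expects. A matrix class would wrap this as
    // matrix.rotateX(angle) and multiply it onto its current value.
    void makeRotationX(float degrees, float out[16]) {
        const float r = degrees * 3.14159265f / 180.0f;
        const float c = std::cos(r), s = std::sin(r);
        const float m[16] = { 1, 0, 0, 0,
                              0, c, s, 0,
                              0,-s, c, 0,
                              0, 0, 0, 1 };
        for (int i = 0; i < 16; ++i) out[i] = m[i];
    }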
As you start to recognize the divide between CPU operations and GPU operations, you will come to appreciate managing your own matrices.
The GL-managed matrix stack is deprecated in recent revisions of the OpenGL spec, so going forward, managing the matrices yourself is the only option.
I'm building a high-performance UI layout engine on top of Direct3D 11. The application is being developed using Visual Studio 2013, targeting x64 and is intended for Windows 7 (with Platform Update) and up.
I need to do matrix transformations on 2D elements in the visual tree and I am wondering whether using DirectXMath's built-in (SIMD-optimized) XMMATRIX and its related functions is efficient for 2D use (as that only requires a 3x3 matrix, while XMMATRIX et al. are 4x4), or whether I should roll my own matrix class / functions (probably without any SIMD-specific code, though).
It seems to me that a 4x4 matrix throughout would mean a lot of redundant calculations being performed, but then again that might be offset by SIMD instructions when compared to non-SIMD 3x3 matrix work.
Edit: Comments about how "premature optimization is the root of all evil" (and derivatives thereof) are superfluous here (and ironically premature, since you know nothing about the project - or me). The question sums up what I am interested in some viewpoints on / knowing more about.
Layout engines tend to have a lot of chained transformations, so using (and keeping for the duration of the chain) your data in SSE registers is likely to improve performance (even more so than typical game scenarios which usually only have a handful of chained transformations). If you are specifically not going to use SSE in your custom class, then XMMATRIX will probably be faster. The column difference shouldn't really matter much since each row fits in an SSE register, but the row difference will mean an extra load. Still, the benefit of SSE is probably worth it.
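To give a (hedged) sense of what the chained case looks like with DirectXMath — the function and parameters below are just an example, and whether it wins in your engine is something to profile:

    #include <DirectXMath.h>
    using namespace DirectX;

    // Chained 2D layout transforms using the 4x4 SIMD types: the intermediates
    // stay in XMMATRIX values that the compiler can keep in SSE registers, and
    // only the final result needs to be stored back out.
    XMMATRIX ComposeElementTransform(float scale, float offsetX, float offsetY,
                                     FXMMATRIX parentWorld)
    {
        XMMATRIX s = XMMatrixScaling(scale, scale, 1.0f);
        XMMATRIX t = XMMatrixTranslation(offsetX, offsetY, 0.0f);
        return XMMatrixMultiply(XMMatrixMultiply(s, t), parentWorld);
    }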
That said, many modern compilers auto-vectorize now, so a custom class you write in vanilla C++ might end up getting SSE-optimized behind the scenes anyway.
Either way, you probably won't see any difference in the performance if you haven't already optimized your engine for caching behavior. For example, if your engine represents the hierarchy using pointers, and you just allocate new elements on the heap whenever you need them, you'll thrash the cache and have plenty of time to calculate transformations while you wait for memory, SSE or not.
I am reading through the OpenGL Superbible Fifth Edition and they discuss using stacks via their own class. That's all great but they mention that matrix stacks were deprecated. Why were they deprecated and what do people use instead of them?
The reason(s) are political, not technical, and date back to the early 2000s.
OpenGL 3 was the first ever version willing to break backwards compatibility. The designers wanted to create an API for the expert users, the game programmers and high end visualization coders who knew all about shaders and wrote their own matrix code. The intent was that the OpenGL 3 API should match the actual hardware quite closely. (Even in OpenGL 1/2, the matrix stack was usually implemented on the CPU side, not the GPU.)
From a game engine programmer point of view, this was better. And hey, if you have to develop a new game engine every couple of years anyway, what's the big deal about throwing away the old code?
The result of this design process is the OpenGL 3/4 core profile.
Once the "new generation" OpenGL was announced, all the not-so-expert coders in universities and companies realized they would be screwed. These are the people (like me) who teach 3D graphics or write utility programs for research or design. We don't need any more advanced lighting than plain ambient-diffuse-specular. We often have to mix code from different sources together, and that is only easy if everyone is using exactly the same matrix, lighting, and texturing conventions - like those supplied by OpenGL 2.
Also, I've heard but cannot verify, the big CAD/CAM companies realized that they'd be screwed as well. Throwing away two million lines of code from ten years of development is not an option when you've got paying (and well-paying: compare prices for Quadro vs GeForce, or FireGL vs Radeon) customers.
So both NVIDIA and ATI announced they'd support the old API for as long as they could.
The result of this pressure is the compatibility profiles. And the OpenGL ARB now seems to have realized that while they'd like everyone to switch to core profile it just isn't going to happen: read the extension spec for tessellation shaders in OpenGL 4 and it mentions that GL_PATCHES will work with glBegin.
The matrix stack (and the rest of the matrix functions) was deprecated only in the core profile. In the compatibility profile you should still be able to use them.
From my point of view it was removed because most engines/frameworks have their own math code and their own style of sending matrices to shaders as uniforms.
For simple programs and tutorials, though, it is very inconvenient to have to search for and use something else.
I suggest using:
glm (http://glm.g-truc.net/)
very simple math lib (vsml)
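For reference, a minimal sketch of the GLM route (the program handle and the uniform name "uMvp" are just examples):

    #include <glm/glm.hpp>
    #include <glm/gtc/matrix_transform.hpp>
    #include <glm/gtc/type_ptr.hpp>
    // plus your GL loader of choice (GLEW, glad, ...) for the gl* calls

    // Build the matrices on the CPU and hand the product to the shader as a
    // single uniform; the vertex shader then does gl_Position = uMvp * position.
    void uploadMvp(GLuint program, float aspect, float angleRadians) {
        glm::mat4 projection = glm::perspective(glm::radians(45.0f), aspect, 0.1f, 100.0f);
        glm::mat4 view       = glm::lookAt(glm::vec3(0, 0, 5),   // eye
                                           glm::vec3(0, 0, 0),   // target
                                           glm::vec3(0, 1, 0));  // up
        glm::mat4 model      = glm::rotate(glm::mat4(1.0f), angleRadians, glm::vec3(0, 1, 0));
        glm::mat4 mvp        = projection * view * model;

        glUniformMatrix4fv(glGetUniformLocation(program, "uMvp"), 1, GL_FALSE,
                           glm::value_ptr(mvp));
    }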
Why were they deprecated
Because nobody actually used it in real world OpenGL programs. Take a physics simulation for example: You'd have all the object placement being stored in the physics system as a 4×4 matrix anyway. So you'd just use that. Same goes for visible object determination and animation systems. All those need to implement the matrix math anyway, so having this in OpenGL is rather redundant, as most of the time the already existing matrices were simply put into glLoadMatrix.
and what do people use instead of them?
What they used before: Their animation systems, physics simulators, scene graphs, etc.
Well, the first and main reason, for me, is that with the rise of programmable shaders (mandatory since the third version of OpenGL), the built-in variables that tracked GL_PROJECTION and GL_MODELVIEW and were automatically transferred to the shaders have been removed, so the user has to define their own matrices to use in the shader. Since you have to send the matrices manually using the glUniform* functions anyway, you don't really need the fixed variables anymore.
Why do people tend to mix deprecated fixed-function pipeline features like the matrix stack, gluPerspective(), glMatrixMode() and what not with shaders, when this is meant to be done manually and shoved into GLSL as a uniform?
Are there any benefits to this approach?
There is a legitimate reason to do this, in terms of user sanity. Fixed-function matrices (and other fixed-function state tracked in GLSL) are global state, shared among all programs. If you want to change the projection matrix for every shader, you can do that by simply changing it in one place.
Doing this in GLSL without fixed function requires the use of uniform buffers. Either that, or you have to build some system that will farm state information to every shader that you want to use. The latter is perfectly doable, but a huge hassle. The former is relatively new, only introduced in 2009, and it requires DX10-class hardware.
It's much simpler to just use fixed-function and GLSL state tracking.
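For what it's worth, here is a rough sketch of the uniform-buffer version of "change it in one place" (the block name, binding point and GLSL layout are just examples, assuming a GL 3.1+ context):

    // GLSL side, in every shader that needs it:
    //   layout(std140) uniform Camera { mat4 uProjection; };

    GLuint cameraUbo;
    const GLuint kCameraBinding = 0;

    // One buffer, bound once to a shared binding point; updating it updates
    // the projection for every program whose Camera block uses that binding.
    void createCameraUbo() {
        glGenBuffers(1, &cameraUbo);
        glBindBuffer(GL_UNIFORM_BUFFER, cameraUbo);
        glBufferData(GL_UNIFORM_BUFFER, sizeof(float) * 16, nullptr, GL_DYNAMIC_DRAW);
        glBindBufferBase(GL_UNIFORM_BUFFER, kCameraBinding, cameraUbo);
    }

    void attachProgram(GLuint program) {
        GLuint idx = glGetUniformBlockIndex(program, "Camera");
        glUniformBlockBinding(program, idx, kCameraBinding);
    }

    void setProjection(const float projection[16]) {
        glBindBuffer(GL_UNIFORM_BUFFER, cameraUbo);
        glBufferSubData(GL_UNIFORM_BUFFER, 0, sizeof(float) * 16, projection);
    }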
No benefits as far as I'm aware of (unless you consider not having to recode the functionality a benefit).
Most likely just laziness, or a lack of knowledge of the alternative method.
Essentially because those applications require shaders to run, but the programmers are too lazy/stressed to re-implement the features that are already available in the OpenGL compatibility profile.
Notable features that are "difficult" to replace are line widths greater than 1, line stipple, and separate front and back polygon modes.
Most tutorials teach deprecated OpenGL, so maybe people don't know better.
The benefit is that you are using well-known, thoroughly tested and reliable code. If it's on MS Windows or Linux with proprietary drivers, it was written by the people who built your GPU, who can therefore be assumed to know how to make it really fast.
An additional benefit for group projects is that There Is Only One Way To Do It. No arguments about whether you should be writing your own C++ matrix class and what it should be called and which operators to overload and whether the internal implementation should be a 1D or 2D array...
In newer OpenGL specifications, matrix manipulation functions are removed. You need to calculate the transformation matrices by hand and pass them to the shaders. Although glRotate, glScale, etc. disappeared, I didn't see anything in exchange...
My question:
how do you handle the transformations? Do you dig the theory and implement all by hand, or use some predefined libraries? Is there any "official" OpenGL solution?
For example, datenwolf points to his hand-made C library in this post. For Java users (Android) there is the AffineTransform class, but it applies to 3x3 matrices, so it needs extra effort to apply it to OpenGL's mat4.
What is your solution?
how do you handle the transformations? Do you dig the theory and implement all by hand, or use some predefined libraries?
Either way goes. But the thing is: In a real program that deals with 3D geometry you need those transformation matrices for a lot more than just rendering stuff. Say you have some kind of physics simulation running. The position of rigid objects is usually represented by their transformation matrix. So if doing a physics sim, you've got that transformation matrix lying around somewhere anyway, so you just use that.
In fully integrated simulation engines you'll also want to avoid redundancies, so you take some physics simulation library like ODE, Bullet or so, and modify it in a way that it can work directly on the structures representing your objects, without copying the data into library-specific records for processing and then back.
So you usually end up with some mixture. Some of the math comes in preexisting libraries, others you implement yourself.
I agree with datenwolf, but to give an example I use Eigen, which is a fantastic general purpose matrix math library.
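Purely as an illustration (the helper below and its names are mine, not from Eigen's docs), composing a transform with Eigen and handing it to GL can be as simple as:

    #include <Eigen/Geometry>
    // plus your GL loader header for glUniformMatrix4fv

    // Affine3f composes rotations/translations much like a hand-rolled matrix
    // class; Eigen's storage is column-major by default, so the raw data can
    // go straight to glUniformMatrix4fv without transposing.
    void uploadModel(GLint location, float angleRadians) {
        Eigen::Affine3f model = Eigen::Affine3f::Identity();
        model.translate(Eigen::Vector3f(0.0f, 1.0f, -5.0f));
        model.rotate(Eigen::AngleAxisf(angleRadians, Eigen::Vector3f::UnitY()));
        glUniformMatrix4fv(location, 1, GL_FALSE, model.matrix().data());
    }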
Above OpenGL 3.0, functions like glTranslate(), glRotate() and (in GLSL) ftransform() are deprecated, but they can still be used.
A better way is to use a math library like GLM (http://glm.g-truc.net/), which follows the GLSL specification.
The projection matrix, model matrix and view matrix are passed to the shader as uniform variables.
I have a device to acquire X-ray images. Due to some technical constraints, the detector is made of multiple tilted and partially overlapping tiles with heterogeneous pixel sizes. The image is thus distorted. The detector geometry is known precisely.
I need a function converting these distorted images into a flat image with homogeneous pixel size. I have already done this on the CPU, but I would like to give it a try with OpenGL in order to use the GPU in a portable way.
I have no experience with OpenGL programming, and most of the information I could find on the web was useless for this use case. How should I proceed? How do I do this?
Images are 560x860 pixels and we have batches of 720 images to process. I'm on Ubuntu.
OpenGL is for rendering polygons. You might be able to do multiple passes and use shaders to get what you want, but you are better off rewriting the algorithm in OpenCL. The bonus then is that you have something portable that will even use multi-core CPUs if no graphics accelerator card is available.
Rather than OpenGL, this sounds like a CUDA, or more generally GPGPU problem.
If you have C or C++ code to do it already, CUDA should be little more than figuring out the types you want to use on the GPU and how the algorithm can be tiled.
If you want to do this with OpenGL, you'd normally do it by supplying the current data as a texture, writing a fragment shader that processes that data, and setting it up to render to a texture. Once the output texture is fully rendered, you can retrieve it back to the CPU and write it out as a file.
I'm afraid it's hard to do much more than a very general sketch of the overall flow without knowing more about what you're doing -- but if (as you said) you've already done this on the CPU, you apparently already have a pretty fair idea of most of the details.
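A very rough outline of that texture-in, render-to-texture, read-back flow (a sketch only, assuming a GL 3.x context and a single-channel float image; error checking, shader compilation and the full-screen quad are omitted):

    void remapOnGpu(const float* srcPixels, float* outPixels) {
        const int W = 560, H = 860;    // image size from the question

        // 1. Upload the distorted source image as a texture.
        GLuint srcTex, dstTex, fbo;
        glGenTextures(1, &srcTex);
        glBindTexture(GL_TEXTURE_2D, srcTex);
        glTexImage2D(GL_TEXTURE_2D, 0, GL_R32F, W, H, 0, GL_RED, GL_FLOAT, srcPixels);
        glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_MIN_FILTER, GL_LINEAR);
        glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_MAG_FILTER, GL_LINEAR);

        // 2. Create an output texture and attach it to a framebuffer object.
        glGenTextures(1, &dstTex);
        glBindTexture(GL_TEXTURE_2D, dstTex);
        glTexImage2D(GL_TEXTURE_2D, 0, GL_R32F, W, H, 0, GL_RED, GL_FLOAT, nullptr);
        glGenFramebuffers(1, &fbo);
        glBindFramebuffer(GL_FRAMEBUFFER, fbo);
        glFramebufferTexture2D(GL_FRAMEBUFFER, GL_COLOR_ATTACHMENT0,
                               GL_TEXTURE_2D, dstTex, 0);

        // 3. Draw a full-screen quad with a fragment shader that, for each
        //    output pixel, samples the source texture at the position given by
        //    the known detector geometry.
        glViewport(0, 0, W, H);
        // ... bind program, bind srcTex, draw the quad ...

        // 4. Read the corrected image back to the CPU.
        glReadPixels(0, 0, W, H, GL_RED, GL_FLOAT, outPixels);
    }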
At heart what you are asking here is "how can I use a GPU to solve this problem?"
Modern GPUs are essentially linear algebra engines, so your first step would be to define your problem as a matrix that transforms an input coordinate < x, y > to its output in homogeneous space.
For example, you would represent a transformation of scaling x by ½, scaling y by 1.2, and translating up and left by two units (taking +x as right and +y as up) as:

    | 0.5  0.0  -2.0 |
    | 0.0  1.2   2.0 |
    | 0.0  0.0   1.0 |

and you can work out analogous transforms for rotation, shear, etc., as well.
Once you've got your transform represented as a matrix-vector multiplication, all you need to do is load your source data into a texture, specify your transform as the projection matrix, and render it to the result. The GPU performs the multiplication per pixel. (You can also write shaders, etc, that do more complicated math, factor in multiple vectors and matrices and what-not, but this is the basic idea.)
That said, once you have got your problem expressed as a linear transform, you can make it run a lot faster on the CPU too by leveraging e.g. SIMD or one of the many linear algebra libraries out there. Unless you need real-time performance or have a truly immense amount of data to process, using CUDA/GL/shaders etc. may be more trouble than it's strictly worth, as there's a bit of clumsy machinery involved in initializing the libraries, setting up render targets, learning the details of graphics development, etc.
Simply converting your inner loop from ad-hoc math to a well-optimized linear algebra subroutine may give you enough of a performance boost on the CPU that you're done right there.
You might find this tutorial useful (it's a bit old, but note that it does contain some OpenGL 2.x GLSL after the Cg section). I don't believe there are any shortcuts to image processing in GLSL, if that's what you're looking for... you do need to understand a lot of the 3D rasterization aspect and historical baggage to use it effectively, although once you do have a framework for inputs and outputs set up you can forget about that and play around with your own algorithms in shader code relatively easily.
Having been doing this sort of thing for years (initially using Direct3D shaders, but more recently with CUDA), I have to say that I entirely agree with the posts here recommending CUDA/OpenCL. It makes life much simpler, and generally runs faster. I'd have to be pretty desperate to go back to a graphics API implementation of non-graphics algorithms now.