Can I use OpenGL for general purpose matrix multiplication?

The MultMatrix appears to only multiply 4x4 matrices, which makes sense for OpenGL purposes, but I was wondering if a more general matrix multiplication function existed within OpenGL.

No, as can be easily verified by looking at the documentation, including the OpenGL Shading Language (GLSL). The largest matrix data type is a 4x4.
It is very true that there is a whole craft of getting GPUs to do more general purpose math, including string and text manipulation for e.g. cryptographic purposes, by using OpenGL primitives in very tricky ways. However, you asked for a general purpose matrix multiplication function.
OpenCL is a somewhat different story. It doesn't have a built-in matrix multiply primitive, but it's designed for general numeric computation, so examples and libraries are very common.
You can also easily code a general matrix multiply for NVIDIA processors in CUDA. Their tutorials include the design of such a routine.

A lot of people think that legacy OpenGL's (up to OpenGL 2.1) matrix multiplication is somehow faster. This is not the case. The fixed-function pipeline matrix manipulation functions are all executed on the CPU and only update the GPU's matrix registers on demand before a draw call.
There's no benefit in using OpenGL for doing matrix multiplication. If you want to do GPGPU computing you must do this using either OpenCL or compute shaders, and to actually benefit from it, it must be applied to a well-parallelized problem.
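For the record, if all you need is a one-off general-size multiply, a CPU linear algebra library such as Eigen (which comes up in the answers below) already handles arbitrary dimensions; a minimal sketch:

```cpp
#include <Eigen/Dense>
#include <iostream>

int main() {
    // General (non-4x4) matrices: a 200x300 times a 300x100 multiply.
    Eigen::MatrixXd A = Eigen::MatrixXd::Random(200, 300);
    Eigen::MatrixXd B = Eigen::MatrixXd::Random(300, 100);

    Eigen::MatrixXd C = A * B;  // runs on the CPU, vectorized where possible

    std::cout << C.rows() << "x" << C.cols() << "\n";  // prints 200x100
}
```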

Related

GPGPU for 3d math

I am reading a lot about GPGPU and I am currently learning OpenGL. Now that I have to write all the math myself (or use an existing third-party library), I had the idea of using the GPU instead of the CPU for creating my own math library (matrices, vectors, etc.).
But I haven't found any 3D math library that utilizes the GPU.
Is there a specific reason?
Maybe the CPU is better at those tasks?
It depends on how many vectors or matrices you want to work on at a time, and whether you want to draw the results or not.
GLSL (OpenGL Shading Language) already has a maths library built in. It has functions and operators for matrix maths, transpose, inverse; vector dot and cross products; multiplying a vector by a matrix, etc.
When you're drawing geometry or whatever with OpenGL, you use these built-in functions in your shaders on the GPU. No point in a 3d math library replicating what is already there.
If you want to do small scale vector/matrix maths without drawing anything, for instance a ray-plane intersection test, then the CPU is better. Copying the values to the GPU and copying the result back would take much longer than just doing the math on the CPU. (Even if the GPU were actually faster - typical clock speeds today are 2 GHz+ for CPUs, under 1 GHz for GPUs.) This is why math libraries just use the CPU.
If you want to do "industrial scale" matrix/vector math without drawing, then yes it is worth considering the GPU. (This is why CUDA and OpenCL exist.) With a modern version of OpenGL that supports transform feedback and texture buffer objects (usually V3+) you can do maths on hundreds to thousands of matrices/vectors on the GPU, and OpenGL 4.3 makes it even easier with compute shaders. It isn't quite as convenient or efficient as CUDA/OpenCL, but if you already know OpenGL it is much easier.
Hope this helps.
Look at CUDA Thrust as a starting point. I think GPUs will be good for this task. SIMD on CPUs can be something to look into as well, but it will not give as much parallelism as you'd be hoping for.
You can try ArrayFire. It supports up to 4 dimensions and has a lot of support for commonly used functions. Currently only CUDA is supported, but OpenCL support will be added shortly with the same interface (I work at AccelerEyes, so I know this).
What kind of operations do you want to do? You can use the OpenCL built-in float4 and its default operators (+,-,*,/, dot, sqrt) for Vector3 or Vector4. You can easily extend this with Quaternions and Matrices, that's what we did.
See http://github.com/erwincoumans/experiments
The code can help you learning OpenCL and also OpenGL and OpenCL-OpenGL interop.
My GitHub repository contains simple 3D math functions for quaternions, 3D vectors and 3x3 matrices for the OpenCL version of our Bullet 3D game physics library. It also has a fast radix sort, prefix scan, collision detection algorithms and rigid body dynamics, 100% running on the GPU. It runs on NVIDIA, AMD and Intel hardware, on Windows and Mac OS X.
https://github.com/erwincoumans/experiments/blob/master/opencl/primitives/AdlPrimitives/Math/MathCL.h

How do you calculate the transformation matrix for a shader in OpenGL?

In newer OpenGL specifications, the matrix manipulation functions have been removed. You need to calculate the transformation matrices by hand and pass them to the shaders. Although glRotate, glScale, etc. disappeared, nothing was provided in exchange...
My question:
how do you handle the transformations? Do you dig the theory and implement all by hand, or use some predefined libraries? Is there any "official" OpenGL solution?
For example, datenwolf points to his hand-made C library in this post. For Java users (Android) there is the AffineTransform class, but it applies to 3x3 matrices, so extra effort is needed to use it with OpenGL's mat4.
What is your solution?
how do you handle the transformations? Do you dig the theory and implement all by hand, or use some predefined libraries?
Either way goes. But the thing is: in a real program that deals with 3D geometry, you need those transformation matrices for a lot more than just rendering. Say you have some kind of physics simulation running. The position of rigid objects is usually represented by their transformation matrix. So if you're doing a physics sim, you've got that transformation matrix lying around somewhere anyway, so you just use that.
In fully integrated simulation engines you'll also want to avoid redundancies, so you take some physics simulation library like ODE, Bullet or so, and modify it so that it can work directly on the structures representing your objects, without copying the data into library-specific records for processing and then back.
So you usually end up with some mixture. Some of the math comes in preexisting libraries, others you implement yourself.
I agree with datenwolf, but to give an example I use Eigen, which is a fantastic general purpose matrix math library.
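For example, building a 4x4 model matrix with Eigen's Geometry module looks roughly like this (the angle and translation are made-up values):

```cpp
#include <Eigen/Geometry>

// Build a model matrix: translate back 5 units, then rotate ~45 degrees around Y.
Eigen::Affine3f model = Eigen::Translation3f(0.0f, 0.0f, -5.0f)
                      * Eigen::AngleAxisf(0.785398f, Eigen::Vector3f::UnitY());

// model.matrix() is a column-major 4x4, which is what OpenGL expects, e.g.:
//   glUniformMatrix4fv(location, 1, GL_FALSE, model.matrix().data());
```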
Above OpenGL 3.0 the glTranslate(), glRotate(), ftransform(), etc. functions are deprecated, but they can still be used in the compatibility profile.
A better way is to use a math library like GLM (http://glm.g-truc.net/), which is compatible with the GLSL specifications.
The projection matrix, model matrix and view matrix are passed to the shader as uniform variables.
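A minimal sketch of that with GLM (the uniform name u_mvp, the camera values and the angle are placeholders; a GL loader and a linked shader program are assumed):

```cpp
#include <glm/glm.hpp>
#include <glm/gtc/matrix_transform.hpp>
#include <glm/gtc/type_ptr.hpp>
// (GL headers / a loader such as GLEW or glad assumed to be included already)

// Compute and upload a model-view-projection matrix to the shader.
void uploadMvp(GLuint program, float aspectRatio, float angleRadians) {
    glm::mat4 projection = glm::perspective(glm::radians(45.0f),  // vertical FOV
                                            aspectRatio, 0.1f, 100.0f);
    glm::mat4 view = glm::lookAt(glm::vec3(0.0f, 0.0f, 5.0f),   // eye
                                 glm::vec3(0.0f, 0.0f, 0.0f),   // look-at target
                                 glm::vec3(0.0f, 1.0f, 0.0f));  // up
    glm::mat4 model = glm::rotate(glm::mat4(1.0f), angleRadians,
                                  glm::vec3(0.0f, 1.0f, 0.0f));

    glm::mat4 mvp = projection * view * model;
    glUniformMatrix4fv(glGetUniformLocation(program, "u_mvp"),
                       1, GL_FALSE, glm::value_ptr(mvp));
}
```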

Why is it better to explicitly manage matrices in OpenGL?

Recently I've been messing around a fair amount with OpenGL, and I have come across the split between allowing OpenGL to manage the view/model/projection matrices or managing them yourself, either with your own matrix implementation or a library such as GLM. I've seen that a lot of large projects have their own camera management (i.e. manage their own translations, rotations etc.). I can see why it would help for making sure you have full control of the system, but besides this it seems like a lot of work for a marginal gain.
Why is it better to do your own management than to use the built-in OpenGL functions? Obviously this is in the context of a shader pipeline, not the fixed function default.
(This would apply to any 3D library).
(As an aside, OpenGL ES 2 has no transform management facility, so in some cases you have no choice.)
More on point, I've found managing matrices via OpenGL's built-in matrix stacks to be a real pain at times, forcing me to push and pop rather copiously in the more intricate portions of my rendering code, even reordering the rendering at times just to simplify stack management. I also wrote a C++ pusher-popper class that uses RAII to automatically manage all this, but it requires careful scoping of local variables.
When I switched to ES 2, I was dismayed to learn that all that functionality was gone. However, I found that switching to my own matrices actually simplified my code, because I could work with multiple transforms using a combination of local and member variables (with meaningful names) without getting lost in space, and the transform stack was replaced mainly by using the call stack - i.e., the current transform is just a local matrix variable that gets passed as a parent-transform parameter to the next function down - but with the flexibility to do it differently at other times.
It is better for a large list of reasons. Apple's recent presentation on the OpenGL improvements in OS X Lion says it best: the newer OpenGL specs (primarily 3.2 on up) focus better on representing what the GPU is actually doing. In OpenGL 2.1, all of the matrix operations take place on the CPU. So, not only is there no magical accelerated benefit to using GL's matrices, you are locked into a completely arbitrary model of matrix management: projection and model-view matrices only (for vertices), matrix stack size limits, a limited set of matrix operations, etc.
When you start managing your own matrices, you start to see why it is so much better. As your scenes grow more complex, you start seeing the need for more matrix caches (beyond just "projection" and "model view"). You discover opportunities to build more useful matrix functions. For instance, which sounds more pleasant to use? glRotatef(90.0f, 1.0f, 0.0f, 0.0f); or matrix.rotateX(90.0f); ? It always bothered me that I had to specify the axis of rotation every single time!
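Such a wrapper is only a few lines on top of a library like GLM; Mat4 and rotateX here are hypothetical names, just to show the idea:

```cpp
#include <glm/glm.hpp>
#include <glm/gtc/matrix_transform.hpp>

// Hypothetical convenience wrapper: keeps a matrix and offers axis-specific
// rotations so you don't have to spell out the axis every time.
struct Mat4 {
    glm::mat4 m = glm::mat4(1.0f);

    Mat4& rotateX(float degrees) {
        m = glm::rotate(m, glm::radians(degrees), glm::vec3(1.0f, 0.0f, 0.0f));
        return *this;
    }
    Mat4& rotateY(float degrees) {
        m = glm::rotate(m, glm::radians(degrees), glm::vec3(0.0f, 1.0f, 0.0f));
        return *this;
    }
};

// usage: matrix.rotateX(90.0f);
```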
As you start to recognize the divide between CPU operations and GPU operations, you will come to appreciate managing your own matrices.
The GL-managed matrix stack is deprecated in recent revs. of the OpenGL spec. So going forward managing them yourself is the only option.

Large matrix inversion methods

Hi, I've been doing some research about matrix inversion (linear algebra) and I want to use C++ template programming for the algorithm. What I found out is that there are a number of methods, like Gauss-Jordan elimination or LU decomposition, and I found the function LU_factorize in the C++ Boost library.
I want to know if there are other methods, and which one is better (advantages/disadvantages), from the perspective of programmers or mathematicians.
If there are no other faster methods, is there already a (matrix) inversion function in the Boost library? I've searched a lot and didn't find one.
As you mention, the standard approach is to perform an LU factorization and then solve for the identity. This can be implemented using the LAPACK library, for example, with dgetrf (factor) and dgetri (compute inverse). Most other linear algebra libraries have roughly equivalent functions.
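A minimal sketch of that route through LAPACK's C interface (LAPACKE), assuming LAPACKE is installed; dgetrf factors the matrix in place and dgetri overwrites the factors with the inverse:

```cpp
#include <lapacke.h>
#include <vector>
#include <stdexcept>

// Invert a dense n x n matrix stored in row-major order, in place.
void invert(std::vector<double>& a, int n) {
    std::vector<lapack_int> ipiv(n);  // pivot indices from the LU factorization
    lapack_int info = LAPACKE_dgetrf(LAPACK_ROW_MAJOR, n, n, a.data(), n, ipiv.data());
    if (info != 0) throw std::runtime_error("dgetrf failed: matrix is singular");

    info = LAPACKE_dgetri(LAPACK_ROW_MAJOR, n, a.data(), n, ipiv.data());
    if (info != 0) throw std::runtime_error("dgetri failed");
}
```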
There are some slower methods that degrade more gracefully when the matrix is singular or nearly singular, and are used for that reason. For example, the Moore-Penrose pseudoinverse is equal to the inverse if the matrix is invertible, and often useful even if the matrix is not invertible; it can be calculated using a Singular Value Decomposition.
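For example, with a library like Eigen the pseudoinverse via SVD is only a few lines; the tolerance below is one common heuristic, not the only choice:

```cpp
#include <Eigen/Dense>
#include <algorithm>

// Moore-Penrose pseudoinverse via singular value decomposition.
// Singular values below a tolerance are treated as zero.
Eigen::MatrixXd pseudoInverse(const Eigen::MatrixXd& A) {
    Eigen::JacobiSVD<Eigen::MatrixXd> svd(A, Eigen::ComputeThinU | Eigen::ComputeThinV);
    const Eigen::VectorXd& s = svd.singularValues();
    double tol = 1e-12 * s(0) * std::max(A.rows(), A.cols());

    Eigen::VectorXd sInv = Eigen::VectorXd::Zero(s.size());
    for (int i = 0; i < s.size(); ++i)
        if (s(i) > tol) sInv(i) = 1.0 / s(i);

    return svd.matrixV() * sInv.asDiagonal() * svd.matrixU().transpose();
}
```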
I'd suggest you take a look at the Eigen source code.
Please Google or Wikipedia for the buzzwords below.
First, make sure you really want the inverse. Solving a system does not require inverting a matrix. Matrix inversion can be performed by solving n systems, with unit basis vectors as right hand sides. So I'll focus on solving systems, because it is usually what you want.
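In code the difference is just which call you make; for example with Eigen (function names here are my own):

```cpp
#include <Eigen/Dense>

// Prefer solving A x = b directly over forming inverse(A) * b:
// it is cheaper and numerically more stable.
Eigen::VectorXd solveSystem(const Eigen::MatrixXd& A, const Eigen::VectorXd& b) {
    return A.partialPivLu().solve(b);  // LU with partial pivoting, square A
}

// If you really need the inverse, it is "solve against the identity":
Eigen::MatrixXd invertViaSolve(const Eigen::MatrixXd& A) {
    int n = static_cast<int>(A.rows());
    return A.partialPivLu().solve(Eigen::MatrixXd::Identity(n, n));
}
```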
It depends on what "large" means. Methods based on decomposition must generally store the entire matrix. Once you have decomposed the matrix, you can solve for multiple right hand sides at once (and thus invert the matrix easily). I won't discuss factorization methods here, as you're likely to know them already.
Please note that when a matrix is large, its condition number is very likely to be huge (equivalently, its reciprocal condition number is close to zero), which means that the matrix is "numerically non-invertible". Remedy: preconditioning. Check Wikipedia for this; the article is well written.
If the matrix is large, you don't want to store it. If it has a lot of zeros, it is a sparse matrix. Either it has structure (e.g. band diagonal, block matrix, ...) and you have specialized methods for solving systems involving such matrices, or it does not.
When you're faced with a sparse matrix with no obvious structure, or with a matrix you don't want to store, you must use iterative methods. They only involve matrix-vector multiplications, which don't require a particular form of storage: you can compute the coefficients when you need them, or store non-zero coefficients the way you want, etc.
The methods are:
For symmetric positive definite matrices: the conjugate gradient method. In short, solving Ax = b amounts to minimizing 1/2 x^T A x - x^T b (a short code sketch follows this list).
Biconjugate gradient method for general matrices. Unstable though.
Minimum residual methods, or best, GMRES. Please check the wikipedia articles for details. You may want to experiment with the number of iterations before restarting the algorithm.
And finally, you can perform some sort of factorization with sparse matrices, with specially designed algorithms to minimize the number of non-zero elements to store.
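Here is the conjugate gradient sketch promised above, using Eigen's iterative sparse solver on a made-up symmetric positive definite system (a 1D Laplacian); the size and tolerance are arbitrary:

```cpp
#include <Eigen/Sparse>
#include <Eigen/IterativeLinearSolvers>
#include <vector>
#include <iostream>

int main() {
    const int n = 10000;

    // Example SPD system: the 1D discrete Laplacian (tridiagonal).
    std::vector<Eigen::Triplet<double>> entries;
    for (int i = 0; i < n; ++i) {
        entries.emplace_back(i, i, 2.0);
        if (i + 1 < n) {
            entries.emplace_back(i, i + 1, -1.0);
            entries.emplace_back(i + 1, i, -1.0);
        }
    }
    Eigen::SparseMatrix<double> A(n, n);
    A.setFromTriplets(entries.begin(), entries.end());
    Eigen::VectorXd b = Eigen::VectorXd::Ones(n);

    // Conjugate gradient only needs matrix-vector products; A^-1 is never formed.
    Eigen::ConjugateGradient<Eigen::SparseMatrix<double>,
                             Eigen::Lower | Eigen::Upper> cg;
    cg.setTolerance(1e-8);
    cg.compute(A);
    Eigen::VectorXd x = cg.solve(b);

    std::cout << "iterations: " << cg.iterations()
              << ", estimated error: " << cg.error() << "\n";
}
```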
Depending on how large the matrix actually is, you probably need to keep only a small subset of the columns in memory at any given time. This might require overriding the low-level read and write operations to the matrix elements, which I'm not sure Eigen, an otherwise pretty decent library, will allow you to do.
For those very narrow cases where the matrix is really big, there is the STXXL library, designed for accessing arrays that are mostly stored on disk.
EDIT: To be more precise, if you have a matrix that does not fit in the available RAM, the preferred approach is to do blockwise inversion. The matrix is split recursively until each block does fit in RAM (this is a tuning parameter of the algorithm, of course). The tricky part here is to avoid starving the CPU of matrices to invert while they are pulled in and out of disk. This might require investigating appropriate parallel filesystems, since even with STXXL this is likely to be the main bottleneck. Although, let me repeat the mantra: premature optimization is the root of all programming evil. This evil can only be banished by the cleansing ritual of Code, Execute and Profile.
You might want to use a C++ wrapper around LAPACK. LAPACK is very mature code: well-tested, optimized, etc.
One such option is the Intel Math Kernel Library, which provides optimized LAPACK routines behind a C/C++ interface.

Modifying an image with OpenGL?

I have a device to acquire X-ray images. Due to some technical constraints, the detector is made of heterogeneous pixel sizes and multiple tilted and partially overlapping tiles. The image is thus distorted. The detector geometry is known precisely.
I need a function converting these distorted images into a flat image with homogeneous pixel size. I have already done this on the CPU, but I would like to give it a try with OpenGL to use the GPU in a portable way.
I have no experience with OpenGL programming, and most of the information I could find on the web was useless for this use case. How should I proceed? How do I do this?
Image size are 560x860 pixels and we have batches of 720 images to process. I'm on Ubuntu.
OpenGL is for rendering polygons. You might be able to do multiple passes and use shaders to get what you want, but you are better off rewriting the algorithm in OpenCL. The bonus then would be that you have something portable that will even use multi-core CPUs if no graphics accelerator card is available.
Rather than OpenGL, this sounds like a CUDA, or more generally GPGPU problem.
If you have C or C++ code to do it already, CUDA should be little more than figuring out the types you want to use on the GPU and how the algorithm can be tiled.
If you want to do this with OpenGL, you'd normally do it by supplying the current data as a texture, writing a fragment shader that processes that data, and setting it up to render to a texture. Once the output texture is fully rendered, you can retrieve it back to the CPU and write it out as a file.
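Roughly, that flow looks like this (just a fragment, not a complete program: shader compilation, the full-screen quad and error handling are omitted, and srcW/srcH, dstW/dstH, srcPixels and undistortProgram are placeholders you would supply; GL_RGBA32F needs OpenGL 3.0+ or the float-texture extensions):

```cpp
// 1. Upload the distorted source image as a texture.
GLuint srcTex, dstTex, fbo;
glGenTextures(1, &srcTex);
glBindTexture(GL_TEXTURE_2D, srcTex);
glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_MIN_FILTER, GL_LINEAR);
glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_MAG_FILTER, GL_LINEAR);
glTexImage2D(GL_TEXTURE_2D, 0, GL_RGBA32F, srcW, srcH, 0, GL_RGBA, GL_FLOAT, srcPixels);

// 2. Create the destination texture and attach it to a framebuffer object.
glGenTextures(1, &dstTex);
glBindTexture(GL_TEXTURE_2D, dstTex);
glTexImage2D(GL_TEXTURE_2D, 0, GL_RGBA32F, dstW, dstH, 0, GL_RGBA, GL_FLOAT, nullptr);
glGenFramebuffers(1, &fbo);
glBindFramebuffer(GL_FRAMEBUFFER, fbo);
glFramebufferTexture2D(GL_FRAMEBUFFER, GL_COLOR_ATTACHMENT0, GL_TEXTURE_2D, dstTex, 0);

// 3. Run the fragment shader over every destination pixel by drawing a
//    full-screen quad; the shader samples srcTex with your undistortion math.
glViewport(0, 0, dstW, dstH);
glUseProgram(undistortProgram);
glBindTexture(GL_TEXTURE_2D, srcTex);
// ... draw the full-screen quad here (VAO/VBO setup not shown) ...

// 4. Read the corrected image back to the CPU.
std::vector<float> result(dstW * dstH * 4);
glReadPixels(0, 0, dstW, dstH, GL_RGBA, GL_FLOAT, result.data());
glBindFramebuffer(GL_FRAMEBUFFER, 0);
```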
I'm afraid it's hard to do much more than a very general sketch of the overall flow without knowing more about what you're doing - but if (as you said) you've already done this on the CPU, you apparently already have a pretty fair idea of most of the details.
At heart what you are asking here is "how can I use a GPU to solve this problem?"
Modern GPUs are essentially linear algebra engines, so your first step would be to define your problem as a matrix that transforms an input coordinate <x, y> to its output in homogeneous space:
For example, you would represent a transformation of scaling x by ½, scaling y by 1.2, and translating up and left by two units as:
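```
| 0.5  0.0  -2.0 |   | x |
| 0.0  1.2   2.0 | * | y |
| 0.0  0.0   1.0 |   | 1 |
```
(here the usual y-up convention is assumed, so "up and left by two units" is a translation of -2 in x and +2 in y; with y-down image coordinates the sign of the y translation flips)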
and you can work out analogous transforms for rotation, shear, etc, as well.
Once you've got your transform represented as a matrix-vector multiplication, all you need to do is load your source data into a texture, specify your transform as the projection matrix, and render it to the result. The GPU performs the multiplication per pixel. (You can also write shaders, etc, that do more complicated math, factor in multiple vectors and matrices and what-not, but this is the basic idea.)
That said, once you have got your problem expressed as a linear transform, you can make it run a lot faster on the CPU too by leveraging e.g. SIMD or one of the many linear algebra libraries out there. Unless you need real-time performance or have a truly immense amount of data to process, using CUDA/GL/shaders etc. may be more trouble than it's strictly worth, as there's a bit of clumsy machinery involved in initializing the libraries, setting up render targets, learning the details of graphics development, etc.
Simply converting your inner loop from ad-hoc math to a well-optimized linear algebra subroutine may give you enough of a performance boost on the CPU that you're done right there.
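For example, applying one homogeneous 2D transform to a whole batch of pixel coordinates with a library like Eigen is a single, well-vectorized product (names and values below are illustrative):

```cpp
#include <Eigen/Dense>

// Apply one homogeneous 2D transform to many points at once.
Eigen::Matrix<float, 3, Eigen::Dynamic>
transformPoints(const Eigen::Matrix3f& T,
                const Eigen::Matrix<float, 3, Eigen::Dynamic>& points) {
    return T * points;  // one matrix-matrix product over the whole batch
}

// usage sketch:
//   Eigen::Matrix3f T;
//   T << 0.5f, 0.0f, -2.0f,
//        0.0f, 1.2f,  2.0f,
//        0.0f, 0.0f,  1.0f;
//   Eigen::Matrix<float, 3, Eigen::Dynamic> pts(3, numPixels);  // rows: x, y, 1
//   auto out = transformPoints(T, pts);
```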
You might find this tutorial useful (it's a bit old, but note that it does contain some OpenGL 2.x GLSL after the Cg section). I don't believe there are any shortcuts to image processing in GLSL, if that's what you're looking for... you do need to understand a lot of the 3D rasterization aspect and historical baggage to use it effectively, although once you do have a framework for inputs and outputs set up you can forget about that and play around with your own algorithms in shader code relatively easily.
Having been doing this sort of thing for years (initially using Direct3D shaders, but more recently with CUDA), I have to say that I entirely agree with the posts here recommending CUDA/OpenCL. It makes life much simpler, and generally runs faster. I'd have to be pretty desperate to go back to a graphics API implementation of non-graphics algorithms now.