I know OpenCL gives control of the GPU's memory architecture and thus allows better optimization, but, leaving this aside, can we use Compute Shaders for vector operations (addition, multiplication, inversion, etc.)?
In contrast to the other OpenGL shader types, compute shaders are not directly related to computer graphics; they provide a much more direct abstraction of the underlying hardware, similar to CUDA and OpenCL. They give you configurable work group sizes, shared memory, intra-group synchronization and all those things known and loved from CUDA and OpenCL.
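As a rough illustration of those features (the buffer names and bindings here are just placeholders, and the input length is assumed to be a multiple of 256), a minimal compute shader that sums an input buffer using shared memory and barriers might look like this:

    #version 430
    // Sketch: reduce (sum) blocks of 256 floats, one partial sum per work group.
    layout(local_size_x = 256) in;

    layout(std430, binding = 0) readonly  buffer InputBuf  { float data[]; };
    layout(std430, binding = 1) writeonly buffer OutputBuf { float partialSums[]; };

    shared float tile[256];                 // work-group shared memory

    void main()
    {
        uint lid = gl_LocalInvocationID.x;
        tile[lid] = data[gl_GlobalInvocationID.x];
        memoryBarrierShared();
        barrier();                          // intra-group synchronization

        // Parallel reduction within the work group.
        for (uint stride = 128u; stride > 0u; stride >>= 1) {
            if (lid < stride)
                tile[lid] += tile[lid + stride];
            memoryBarrierShared();
            barrier();
        }

        if (lid == 0u)
            partialSums[gl_WorkGroupID.x] = tile[0];
    }

On the OpenGL side you launch it with glDispatchCompute, giving the number of work groups in each dimension.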
The main differences are basically:
It uses GLSL instead of OpenCL C. While there isn't such a huge difference between those programming languages, you do get access to all the graphics-related GLSL functions not available to OpenCL, like advanced texture types (e.g. cube map arrays), advanced filtering (e.g. mipmapping, though you will probably need to compute the mip level yourself), and little convenience things like 4x4 matrices or geometric functions.
It is an OpenGL shader program like any other GLSL shader. This means accessing OpenGL data (like buffers, textures, images) is trivial, while interfacing between OpenGL and OpenCL/CUDA can get tedious, with possible manual synchronization effort on your part. In the same way, integrating it into an existing OpenGL workflow is trivial, while setting up OpenCL is a book of its own, not to speak of its integration into an existing graphics pipeline.
So what this comes down to is that compute shaders are really intended for use within existing OpenGL applications, while exhibiting the usual (OpenCL/CUDA-like) compute approach to GPU programming, in contrast to the graphics approach of the other shader stages, which never had the compute flexibility of OpenCL/CUDA (while offering other advantages, of course). Doing compute tasks this way is more flexible, direct and easy than either squeezing them into other shader stages not intended for general computing, or introducing an additional computing framework you have to synchronize with.
Compute shaders should be able to do nearly anything achievable with OpenCL, with the same flexibility and control over hardware resources and with the same programming approach. So if you have a good GPU-suitable algorithm (one that would work well with CUDA or OpenCL) for the task you want to do, then yes, you can do it with compute shaders, too. But it wouldn't make that much sense to use OpenGL (which still is, and will probably always be, a framework for real-time computer graphics first and foremost) only because of its compute shaders; for that you can just use OpenCL or CUDA. The real strength of compute shaders comes into play when mixing graphics and compute capabilities.
Look here for another perspective.
Summarizing:
Yes, OpenCL already existed, but it targets heavyweight applications (think CFD, FEM, etc.), and it is much more universal than OpenGL (think beyond GPUs... Intel's Xeon Phi architecture supports >50 x86 cores).
Also, sharing buffers between OpenGL/CUDA and OpenCL is not fun.
The OpenGL graphics pipeline changes every year, and the programmable stages keep growing. In the end, as OpenGL programmers we create many little programs (vertex, fragment, geometry, tessellation, ...).
Why is there such high specialization between the stages? Do they all run on different parts of the hardware? Why not just write one block of code to describe what should come out at the end, instead of juggling between the stages?
http://www.g-truc.net/doc/OpenGL%204.3%20Pipeline%20Map.pdf
In this pipeline PDF you can see the beast.
In the days of "Quake" (the game), developers had the freedom to do anything with their CPU rendering implementations, they were in control of everything in the "pipeline".
With the introduction of the fixed pipeline and GPUs, you get "better" performance, but lose a lot of that freedom. Graphics developers are pushing to get that freedom back, hence the ever more customizable pipeline. GPUs are even "fully" programmable now using tech such as CUDA/OpenCL, even if it's not strictly about graphics.
On the other hand, GPU vendors cannot replace the whole pipeline with a fully programmable one overnight. In my opinion, this boils down to multiple reasons:
GPU capabilities and cost: GPUs evolve with each iteration; it's nonsense to throw away all the architecture you have and replace it overnight. Instead you add new features and enhancements every iteration, especially when developers ask for them (example: the tessellation stage). Think of CPUs: Intel tried to replace the x86 architecture with Itanium, losing backward compatibility, and having failed, they eventually copied what AMD did with the AMD64 architecture.
Vendors also can't fully replace the pipeline because of legacy application support, which is more widespread than one might expect.
Historically, there were actually different processing units for the different programmable parts - there were Vertex Shader processors and Fragment Shader processors, for example. Nowadays, GPUs employ a "unified shader architecture" where all types of shaders are executed on the same processing units. That's why non-graphic use of GPUs such as CUDA or OpenCL is possible (or at least easy).
Notice that the different shaders have different inputs/outputs - a vertex shader is executed for each vertex, a geometry shader for each primitive, a fragment shader for each fragment. I don't think this could be easily captured in one big block of code.
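To make that concrete, here is a deliberately minimal (hypothetical) vertex/fragment shader pair; note how the two stages have completely different inputs, outputs and invocation frequencies:

    #version 330 core
    // Vertex shader: runs once per vertex, must output a clip-space position.
    layout(location = 0) in vec3 position;
    uniform mat4 mvp;                // model-view-projection matrix
    out vec3 localPos;               // interpolated across the primitive

    void main()
    {
        localPos    = position;
        gl_Position = mvp * vec4(position, 1.0);
    }

    #version 330 core
    // Fragment shader: runs once per fragment, must output a colour.
    in  vec3 localPos;               // interpolated value from the vertex stage
    out vec4 fragColor;

    void main()
    {
        fragColor = vec4(abs(localPos), 1.0);
    }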
And last but definitely far from least, performance. There are still fixed-function stages between the programmable parts (such as rasterisation). And for some of these, it's simply impossible to make them programmable (or callable outside of their specific time in the pipeline) without reducing performance to a crawl.
Because each stage has a different purpose:
The vertex shader transforms the points to where they should be on the screen.
The fragment shader runs for each fragment (read: pixel of the triangles) and applies lighting and color.
Geometry and tessellation shaders both do things the classic vertex and fragment shaders cannot (replacing the drawn primitives with other primitives), and both are optional.
If you look carefully at that PDF you'll see different inputs and outputs for each shader.
Separating each shader stage also allows you to mix and match shaders beginning with OpenGL 4.1. For example, you can use one vertex shader with multiple different fragment shaders, and swap out the fragment shaders as needed. Doing that when shaders are specified as a single code block would be tricky, if not impossible.
More info on the feature: http://www.opengl.org/wiki/GLSL_Object#Program_separation
Mostly because nobody wants to re-invent the wheel if they do not have to.
Many of the specialized things that are still fixed-function would simply make life more difficult for developers if they had to be programmed from scratch to draw a single triangle. Rasterization, for instance, would truly suck if you had to implement primitive coverage yourself or handle attribute interpolation. It might add some novel flexibility, but the vast majority of software does not require that flexibility and developers benefit tremendously from never thinking about this sort of stuff unless they have some specialized application in mind.
Truth be told, you can implement the entire graphics pipeline yourself using compute shaders if you are so inclined. Performance generally will not be competitive with pushing vertices through the traditional render pipeline and the amount of work necessary would be quite daunting, but it is doable on existing hardware. Realistically, this approach does not offer a lot of benefits for rasterized graphics, but implementing a ray-tracing based pipeline using compute shaders could be a worthwhile use of time.
I have used OpenGL for a semester, but in the traditional way, like glBegin...glEnd.
I heard someone say that GLSL is the future of OpenGL, so I was just wondering: do I need to jump into GLSL instead of the traditional OpenGL?
Moreover, does GLSL only work well on a good GPU?
Short answer: Yes, you do need to update your OpenGL usage as you will generally get lousy performance from glBegin/glEnd and limit what you can do by constraining yourself to the old fixed pipe behavior.
Long answer:
You're mixing up two different problems. One of them is immediate mode (glBegin, glVertex ... glEnd, etc.) vs. batched mode (glVertexPointer, etc.). To get full performance out of modern GPUs you need to use batches. See this SO discussion: When are VBOs faster than "simple" OpenGL primitives (glBegin())?
The other one is fixed pipe vs. programmable shaders (glEnable states, etc., vs. GLSL). This can be a performance issue in many cases, but more importantly it's a flexibility issue. With GLSL you have far more control over how things are rendered, so you can accomplish things that weren't really possible using the fixed pipe -- at least not at a usable frame rate. Programmable shaders are also a better reflection of how modern GPUs really work -- in fact if you use the fixed pipe it is probably just being emulated with a shader under the hood.
GLSL is not the future of OpenGL, it's the current way of programming it. As Aeluned states, glBegin and glEnd are deprecated (and not even supported in OpenGL ES).
And what do you mean by a good GPU? Even Intel integrated graphics cards support shaders; using GLSL is not slower just for being GLSL. You might get slow performance when doing heavy stuff, but if you implemented the fixed pipeline yourself in GLSL I think you would get about the same performance.
I'd say learning GLSL is the way to go.
I am reading a lot about GPGPU, and I am currently learning OpenGL. Since I have to write all the math myself (or use an existing third-party library), I had the idea of using the GPU instead of the CPU to build my own math library (matrices, vectors, etc.).
But I couldn't find any 3D math library that utilizes the GPU.
Is there a specific reason?
Maybe the CPU is better at those tasks?
It depends on how many vectors or matrices you want to work on at a time, and whether you want to draw the results or not.
GLSL (OpenGL Shading Language) already has a maths library built in. It has functions and operators for matrix maths, transpose, inverse; vector dot and cross products; multiplying a vector by a matrix, etc.
When you're drawing geometry or whatever with OpenGL, you use these built-in functions in your shaders on the GPU. No point in a 3d math library replicating what is already there.
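For instance, a typical vertex shader leans on those built-ins directly (a hypothetical snippet; the uniform and attribute names are just illustrative):

    #version 330 core
    // Uses GLSL's built-in matrix/vector maths: inverse, transpose,
    // matrix * vector, normalize, dot and cross.
    uniform mat4 mvp;
    uniform mat4 model;
    in  vec3 position;
    in  vec3 normal;
    out float diffuse;

    void main()
    {
        mat3 normalMat = mat3(transpose(inverse(model)));   // normal matrix
        vec3 n         = normalize(normalMat * normal);
        vec3 lightDir  = normalize(vec3(1.0, 1.0, 0.5));
        diffuse        = max(dot(n, lightDir), 0.0);         // dot product
        vec3 tangent   = cross(n, lightDir);                 // cross product (just for show)
        gl_Position    = mvp * vec4(position, 1.0);          // matrix * vector
    }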
If you want to do small-scale vector/matrix maths without drawing anything, for instance a ray-plane intersection test, then the CPU is better. Copying the values to the GPU and copying the result back would take much longer than just doing the math on the CPU. (Even if the GPU were actually faster - typical clock speeds today are 2 GHz+ for a CPU, < 1 GHz for a GPU.) This is why math libraries just use the CPU.
If you want to do "industrial scale" matrix/vector math without drawing, then yes it is worth considering the GPU. (This is why CUDA and OpenCL exist.) With a modern version of OpenGL that supports transform feedback and texture buffer objects (usually V3+) you can do maths on hundreds to thousands of matrices/vectors on the GPU, and OpenGL 4.3 makes it even easier with compute shaders. It isn't quite as convenient or efficient as CUDA/OpenCL, but if you already know OpenGL it is much easier.
Hope this helps.
Look at CUDA Thrust as a starting point. I think GPUs will be good for this task. SIMD on CPUs can be something to look into as well, but it will not give as much parallelism as you'd be hoping for.
You can try ArrayFire. It supports up to 4 dimensions and has a lot of support for commonly used functions. Currently only CUDA is supported, but OpenCL support will be added shortly with the same interface (I work at AccelerEyes, so I know this).
What kind of operations do you want to do? You can use the OpenCL built-in float4 and its default operators (+, -, *, /, dot, sqrt) for Vector3 or Vector4. You can easily extend this with quaternions and matrices; that's what we did.
See http://github.com/erwincoumans/experiments
The code can help you learn OpenCL, as well as OpenGL and OpenCL-OpenGL interop.
My github repository contains simple 3D math functions for quaternions, 3D vectors and 3x3 matrices for the OpenCL version of our 3D Bullet game physics library. It also has a fast radix sort, prefix scan, collision detection algorithms and rigid body dynamics, 100% running on the GPU. It runs on NVIDIA, AMD and Intel hardware, on Windows and Mac OS X.
https://github.com/erwincoumans/experiments/blob/master/opencl/primitives/AdlPrimitives/Math/MathCL.h
I'm just learning about them, and find it discouraging that they have been deprecated. Should I keep investing into learning them? Would I learn something useful for the current model?
I think, though I may be wrong, that since most high-performance graphics apps (mostly games) pretty much only used vertex buffers and the like (in order to squeeze every drop of performance out of the card), there was pressure to stop worrying about "frivolous" items such as display lists (and even good old glVertex calls). IMHO, this provides a huge barrier to people learning to write OpenGL code, and (for my own purposes) is a big impediment to whipping up some quick, legible, and reasonably well performing code.
Note that these features were deprecated in 3.0, and actually removed in 3.1 (but still provided compatibility via an ARB extension). In OpenGL 3.2, they moved these features into a 'compatibility' profile that is optional for driver writers to implement.
So what does this mean? NVIDIA, at least, has vowed to continue support for the old-school compatibility mode for the foreseeable future - there is a large wealth of legacy code out there that they need to support. You can find the discussion of their support in a slideshow at:
http://www.slideshare.net/Mark_Kilgard/opengl-32-and-more
starting at about slide #32. I don't know ATI/AMD's stance on this, but I would assume that it would be similar.
So, while display lists are technically removed from the required portion of the OpenGL 3.2 standard, I think that you are safe using them for quite a while. Eventually, you may wish to learn the buffer/shader-centric interface to OpenGL, especially if your end-goal is envelope-pushing game writing, but it really is a lot less intuitive (no glRotate, even!), so I would recommend starting with good old OpenGL 2.x.
-matt
Display lists were removed because with OpenGL 3+ all vertex, texture and lighting data are stored on the graphics card, in what is called retained-mode rendering (the data is retained, allowing you to send a single command to the card to draw a mesh, rather than sending vertex data to the card every frame). A major bottleneck in computer graphics is the data bandwidth between RAM and GPU RAM. By generating meshes once and retaining that data, we can transform it using homogeneous transform matrices and draw it easily. This effectively reduces the bottleneck, with the drawback of longer loading times.
Immediate mode, however (pre-3.0), uses massive amounts of graphics bandwidth to send vertex data every frame, pre-transformed, with recalculated normals, etc.
The problems with this approach are twofold:
1) Excessive bandwidth use and too much GPU idle time.
2) Excessive use of CPU time for calculations that could be done in parallel on 100+ cores on the GPU.
The simple solution to this is retained mode.
With retained mode, display lists are no longer necessary. Hence their removal from the core profile.
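To make the retained-mode idea concrete, here is a minimal sketch (names are illustrative): the mesh data stays in a VBO on the GPU, and per frame the application only updates a single homogeneous transform that a vertex shader like this applies:

    #version 330 core
    // The vertex data lives in a VBO on the card; only this 4x4 homogeneous
    // transform is uploaded per frame, not the vertices themselves.
    uniform mat4 modelViewProjection;
    layout(location = 0) in vec3 position;

    void main()
    {
        gl_Position = modelViewProjection * vec4(position, 1.0);
    }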
Immediate mode is still very good for learning the theory of computer graphics (and it's loads of fun, to boot); it just suffers in terms of maximum possible performance.
VBOs & VAOs may be, at first, less intuitive, but in terms of speed, it is far superior.
There are several easy-to-understand OpenGL 3.0 tutorials on the internet. Once you have OpenGL 2.0 down, you should consider moving on to 3.0+, as it allows you to build very fast 3D graphics applications.
While Matthew Hall has a good answer and covers most things, there are a few things I'll add.
If you look at what's been deprecated, you'll see it's a lot of client-side and fixed functionality. So it's obvious that they're trying to move people away from client-side-centered code and have people do everything possible server side, on the GPU, instead.
When it comes to which context to use, well, that's up to you. Though if performance is a major concern, then 3.x is probably the way to go. I personally definitely want to learn OpenGL 3.x, but I doubt I'll be giving up 1.x/2.x. It's just so much easier to put together a quick app with what's available in a 1.x or 2.x context.
If you want a list of what's been deprecated, download the 3.0 specification and look under "The Deprecation Model".
A note from the future: the latest DirectX, Metal, and Vulkan APIs have command buffers and command queues, which let you record commands on the CPU and then send them to the GPU to be executed there. So perhaps display lists were not such an old-fashioned idea after all. In fact, compiling a display list is orthogonal to the use of shaders and VBOs, and display lists can improve performance further... I wonder if a Vulkan- or Metal-to-OpenGL translator could use display lists for command buffers.
Because VBOs (vertex buffer objects) are much more efficient and can do everything display lists can do. They're not really any more complex, either, just a little different. Unless you're already more familiar with the old style glBegin/glEnd stuff, you're probably best off learning about buffers from the get go.
I have a device to acquire X-ray images. Due to some technical constraints, the detector is made of multiple tilted, partially overlapping tiles with heterogeneous pixel sizes. The image is thus distorted. The detector geometry is known precisely.
I need a function converting these distorted images into a flat image with a homogeneous pixel size. I have already done this on the CPU, but I would like to give it a try with OpenGL to use the GPU in a portable way.
I have no experience with OpenGL programming, and most of the information I could find on the web was useless for this purpose. How should I proceed? How do I do this?
Images are 560x860 pixels and we have batches of 720 images to process. I'm on Ubuntu.
OpenGL is for rendering polygons. You might be able to do multiple passes and use shaders to get what you want, but you are better off re-writing the algorithm in OpenCL. The bonus then is that you have something portable that will even use multi-core CPUs if no graphics accelerator card is available.
Rather than OpenGL, this sounds like a CUDA, or more generally GPGPU problem.
If you have C or C++ code to do it already, CUDA should be little more than figuring out the types you want to use on the GPU and how the algorithm can be tiled.
If you want to do this with OpenGL, you'd normally do it by supplying the current data as a texture and writing a fragment shader that processes that data, set up to render to a texture. Once the output texture is fully rendered, you can retrieve it back to the CPU and write it out as a file.
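For illustration, such a fragment shader might look roughly like this (the remap-table texture is just one assumed way of encoding the known detector geometry, not something prescribed by OpenGL):

    #version 330 core
    // Hypothetical resampling shader: for every output pixel, a precomputed
    // lookup texture stores where in the distorted source image to sample.
    uniform sampler2D sourceImage;   // the distorted detector image
    uniform sampler2D remapTable;    // RG texture: source coordinates per output pixel
    in  vec2 texCoord;               // output-image coordinate from the vertex stage
    out vec4 corrected;

    void main()
    {
        vec2 srcCoord = texture(remapTable, texCoord).rg;  // where to read from
        corrected     = texture(sourceImage, srcCoord);    // bilinear resample
    }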
I'm afraid it's hard to do much more than a very general sketch of the overall flow without knowing more about what you're doing -- but since (as you said) you've already done this on the CPU, you apparently already have a pretty fair idea of most of the details.
At heart what you are asking here is "how can I use a GPU to solve this problem?"
Modern GPUs are essentially linear algebra engines, so your first step would be to define your problem as a matrix that transforms an input coordinate < x, y > to its output in homogeneous space.
For example, you would represent a transformation of scaling x by ½, scaling y by 1.2, and translating up and left by two units as:
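\[
\begin{bmatrix} x' \\ y' \\ 1 \end{bmatrix} =
\begin{bmatrix} 0.5 & 0 & -2 \\ 0 & 1.2 & 2 \\ 0 & 0 & 1 \end{bmatrix}
\begin{bmatrix} x \\ y \\ 1 \end{bmatrix}
\]
(taking "up" as +y and "left" as -x; the exact signs of the translation column depend on your coordinate convention)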
and you can work out analogous transforms for rotation, shear, etc., as well.
Once you've got your transform represented as a matrix-vector multiplication, all you need to do is load your source data into a texture, specify your transform as the projection matrix, and render it to the result. The GPU performs the multiplication per pixel. (You can also write shaders, etc, that do more complicated math, factor in multiple vectors and matrices and what-not, but this is the basic idea.)
That said, once you have got your problem expressed as a linear transform, you can make it run a lot faster on the CPU too, by leveraging e.g. SIMD or one of the many linear algebra libraries out there. Unless you need real-time performance or have a truly immense amount of data to process, using CUDA/GL/shaders etc. may be more trouble than it's strictly worth, as there's a bit of clumsy machinery involved in initializing the libraries, setting up render targets, learning the details of graphics development, etc.
Simply converting your inner loop from ad-hoc math to a well-optimized linear algebra subroutine may give you enough of a performance boost on the CPU that you're done right there.
You might find this tutorial useful (it's a bit old, but note that it does contain some OpenGL 2.x GLSL after the Cg section). I don't believe there are any shortcuts to image processing in GLSL, if that's what you're looking for... you do need to understand a lot of the 3D rasterization aspect and historical baggage to use it effectively, although once you do have a framework for inputs and outputs set up you can forget about that and play around with your own algorithms in shader code relatively easily.
Having been doing this sort of thing for years (initially using Direct3D shaders, but more recently with CUDA), I have to say that I entirely agree with the posts here recommending CUDA/OpenCL. It makes life much simpler, and generally runs faster. I'd have to be pretty desperate to go back to a graphics API implementation of non-graphics algorithms now.