I read that a lot of raytracers use CUDA or OpenCL. However, I don't know why modern (version 4.0+) OpenGL is not used.
I know that CUDA and OpenCL have more features, and I think they are closer to the hardware, but... is that really useful for this purpose? If so, why?
OpenGL's design is all about getting points, lines, or triangles rasterized to a framebuffer. Shaders are used to control this process, but ultimately it's just rasterization. When rasterizing, you take each triangle, one by one, determine where it is going to land in the framebuffer, and then manipulate those specific pixels.
Raytracing or path tracing is something entirely different. Instead of starting from the triangles and determining which pixels they touch, you start from the pixels and, for each pixel, trace into the scene to find which geometry is relevant to it. In other words, it is roughly the complement of what OpenGL does, so trying to fit it into OpenGL is barking up the wrong tree. You need to work with completely different data structures, and your programs are structured very differently from rasterization shaders. OpenCL and CUDA are much better suited for programming ray or path tracing algorithms.
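To make that inversion concrete, here is a minimal sketch (not from the answer itself; Vec3, Ray, and the one-sphere trace() are stand-ins) of the pixel-first loop a ray tracer is built around:

    #include <cstddef>
    #include <vector>

    struct Vec3 { float x, y, z; };
    struct Ray  { Vec3 o, d; };

    static float dot(Vec3 a, Vec3 b) { return a.x*b.x + a.y*b.y + a.z*b.z; }

    // Stand-in scene: a unit sphere at the origin; hit = white, miss = black.
    // A real tracer would walk an acceleration structure and shade here.
    static Vec3 trace(const Ray& r) {
        float a = dot(r.d, r.d), b = 2*dot(r.o, r.d), c = dot(r.o, r.o) - 1;
        return (b*b - 4*a*c >= 0) ? Vec3{1, 1, 1} : Vec3{0, 0, 0};
    }

    std::vector<Vec3> render(std::size_t w, std::size_t h) {
        std::vector<Vec3> img(w * h);
        for (std::size_t y = 0; y < h; ++y)
            for (std::size_t x = 0; x < w; ++x) {
                // One ray per pixel: camera at z = 2 looking down -z.
                // This pixel-first loop is the complement of rasterization's
                // triangle-first loop, and maps directly onto one OpenCL
                // work-item or CUDA thread per pixel.
                Ray r{{0, 0, 2}, {2.0f*x/w - 1, 2.0f*y/h - 1, -1}};
                img[y*w + x] = trace(r);
            }
        return img;
    }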
Both OpenGL and more generic parallel computing mechanisms such as OpenCL or CUDA can be used to implement all kinds of ray casting, including raytracing. In fact, if you go to Shadertoy you'll find that a great many of the shaders there produce interesting 3D scenes and effects using only a single fragment shader, though mostly using ray marching rather than ray tracing.
With any kind of ray casting, you typically perform a lot of computation for every single pixel you're rendering. Since no pixel depends on the output of any other pixel, and since the algorithm is the same for each pixel, this kind of problem is ideal for parallel processing, which is at the heart of what OpenCL is designed for. You can also use OpenGL for parallel computing. The main advantage of OpenGL here is that if you're attempting to render something in real time, the results of the computation can stay on the video card and be displayed directly.
On the other hand, if you're doing non-realtime rendering, then OpenCL probably has more functionality and less overhead required to accomplish the task at hand.
In either case, the biggest problem is probably not the implementation of the renderer, but figuring out a way to express the scene description, either directly in the rendering code or by loading it from some scene description file and encoding it in a form that the rendering code can interpret within the OpenCL or OpenGL framework. You cannot, for instance, simply load some XML or JSON scene description file and pass it to an OpenCL kernel or OpenGL fragment shader. Ultimately it has to be expressed in terms of whatever structures and primitives you can express in the language you choose.
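As a hypothetical illustration of that last point (none of these names come from the question), flattening a scene for the device usually means reducing it to plain, fixed-layout records that a buffer can carry:

    #include <vector>

    // A layout the device code can reinterpret: plain floats, fixed order,
    // explicit padding -- no pointers, no strings, no XML/JSON.
    struct GpuSphere {
        float center[3];
        float radius;
        float albedo[3];
        float pad;        // keep the struct 16-byte aligned for the device
    };

    std::vector<GpuSphere> flattenScene(/* parsed scene description */) {
        std::vector<GpuSphere> buf;
        buf.push_back({{0, 0, -5}, 1.0f, {0.8f, 0.2f, 0.2f}, 0});
        buf.push_back({{2, 0, -6}, 0.5f, {0.2f, 0.8f, 0.2f}, 0});
        // ... one entry per primitive; the kernel then receives buf.data()
        // as a __global buffer (OpenCL) or an SSBO/texture (OpenGL), plus
        // a primitive count.
        return buf;
    }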
I'm reading through the OpenGL and OpenCL specifications to find some information about the memory model and how exactly the two correspond to each other.
I'm aware that OpenGL and OpenCL use essentially the same memory model. What I struggle to understand, however, given that the names don't map one-to-one (at least it seems that way to me), is what exactly maps to what (in terms of terminology) between the two.
Any reference would be appreciated.
Assuming the same GPU serves as the device for both OpenCL and OpenGL, the specific questions are:
How, for example, does a VBO map to OpenCL? Does a VBO essentially correspond to a chunk of global memory in OpenCL terminology?
What about an OpenGL texture object? My understanding is that it corresponds exactly to an image object in OpenCL, and that both map to texture memory.
What about a Shader Storage Buffer Object (specifically in the context of compute shaders)? What does that correspond to?
Also, even on this site, I find a few debates about which one is more performant (OpenCL or OpenGL). It seems to me that OpenGL compute shaders should be preferred over OpenCL kernels only if the problem maps nicely to something graphics related, whereas OpenCL is preferred for heavily numerical workloads that are not necessarily graphics related (such as a heavy simulation, for example).
What I struggle a bit to understand is why this is the case, given that the memory model and resources are essentially the same. Apart from experimenting, I wonder what actual difference justifies the performance gap. With specific reference to compute shaders, I'm aware that they let you implement in OpenGL any algorithm you could implement in OpenCL, so why is there a performance difference?
The kind of problem I'm thinking of would be some relatively heavy optimisation based on level-2/3 BLAS routines (such as GEMV or GEMM).
How well would OpenGL and OpenCL scale for these kinds of problems?
The reason I'm asking is that I struggle to find relatively recent information and benchmarks that might answer the question.
How, for example, does a VBO map to OpenCL?
I have limited experience with OpenGL, but in my understanding, many OpenGL objects don't map to OpenCL objects at all. OpenGL in general works at a much higher level of abstraction and does a whole lot of things in the background for you. OpenCL is significantly simpler and more low-level (which might also explain why OpenCL can sometimes be faster). There are chunks of memory (cl_mem) and code (cl_kernel); you launch the kernels that work on the memory, and that's pretty much it. There's no complicated internal state machine like in OpenGL.
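For the VBO question specifically, the closest concrete mapping I know of is OpenCL's OpenGL interop, where an existing VBO is wrapped in a cl_mem and then behaves like any other chunk of global memory inside a kernel. A hedged sketch (context/queue/kernel creation and error handling omitted; it assumes the cl_context was created with sharing against the GL context):

    #include <CL/cl.h>
    #include <CL/cl_gl.h>
    #include <GL/gl.h>

    void runKernelOnVbo(cl_context ctx, cl_command_queue queue,
                        cl_kernel kernel, GLuint vbo, size_t vertexCount) {
        cl_int err = CL_SUCCESS;

        // The VBO becomes a plain global-memory buffer from OpenCL's view.
        cl_mem buf = clCreateFromGLBuffer(ctx, CL_MEM_READ_WRITE, vbo, &err);

        // GL must be finished with the buffer before CL touches it.
        glFinish();
        clEnqueueAcquireGLObjects(queue, 1, &buf, 0, nullptr, nullptr);

        clSetKernelArg(kernel, 0, sizeof(buf), &buf);
        clEnqueueNDRangeKernel(queue, kernel, 1, nullptr,
                               &vertexCount, nullptr, 0, nullptr, nullptr);

        // Hand the buffer back to OpenGL before it is drawn again.
        clEnqueueReleaseGLObjects(queue, 1, &buf, 0, nullptr, nullptr);
        clFinish(queue);
        clReleaseMemObject(buf);
    }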
With specific reference to compute shaders, I'm aware that they let you implement in OpenGL any algorithm you could implement in OpenCL
Actually, I think that might be incorrect. OpenCL lets you do almost everything with pointers that you can do in C (arithmetic, reinterpret casts, etc.), while GLSL is much more limited (AFAIK).
what actual difference justifies the performance gap
One huge difference (again, AFAIK) is the built-in library of math functions (sin, cos, etc.). OpenGL has them too, but in OpenCL their precision is guaranteed by the standard. This makes a huge difference for scientific applications; OTOH, it means an OpenCL kernel can be significantly slower (because an implementation of sin() with high precision over the full input range is much more code than some crappy implementation that just gives you reasonably precise values over a very limited input range).
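OpenCL even exposes that trade-off directly: the standard math functions carry spec-mandated error bounds, while the native_* variants have implementation-defined precision and range. A small illustrative kernel (the buffer names are mine):

    // OpenCL C kernel source, embedded as a C++ string literal.
    const char* kernelSrc = R"CLC(
    __kernel void precision_demo(__global const float* in,
                                 __global float*       out_precise,
                                 __global float*       out_fast) {
        size_t i = get_global_id(0);
        out_precise[i] = sin(in[i]);        // error bound mandated by the spec
        out_fast[i]    = native_sin(in[i]); // fast, implementation-defined precision
    }
    )CLC";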
As I understand it, there were no modules in early Qt versions; there were separate classes with different functions, including graphics classes. OpenGL support arrived in Qt 1.2. However, QPainter and QImage existed in the early versions.
So, is it correct to say that these classes are native (in other words, the primordial classes), and that the OpenGL classes are non-native (a separate branch, after all)?
I'd like to learn about the further evolution of QtOpenGL as a non-native, alternative way of creating 2D graphics in Qt, and about the influence of this module on the evolution of the native methods for creating 2D graphics.
So, is it correct to say that these classes are native?
No, it is not.
The reason is that "native" means different things to different people; it is a matter of interpretation. See how confused we got over your other question.
By now, I think that by "native" you mean "non-OpenGL" 2D/3D. That probably means software rasterization, as opposed to going through the display driver directly; so, still at the Qt level, but without the OpenGL classes in Qt.
Now, this is the point where we can come back to QImage and QPainter. Yes, QPainter is basically the initial generation of software rasterization, from the times when GPUs were not as common and cheap as they are today.
They do the rendering purely with software techniques. That is more limited, but it worked without the more expensive and then-less-common hardware.
(Those were the times of Quake and other software renderers; fun times, looking at it from today's perspective...)
If by "native" you mean "hardware assisted", then the line isn't all so clear anymore.
Note that QPainter can use various paint engines to do the painting, so merely using a QPainter doesn't mean anything by itself.
If by "hardware assisted" one merely means using something more than legacy integer or floating point execution units of the CPU, then yes, the raster paint engine does use various SIMD/vectored operations where available. The raster paint engine is the engine used to paint on QImage, QPixmap and non-GL QWidget.
If by "hardware assistance" you mean "rendered by the graphics card hardware", then you need to use an OpenGL paint engine. It's used when you paint on a QGLWidget or in a QQuickPaintedItem. Of course the painting is still defined by the software - geometry setup and shaders are just code! This software runs on hardware that can execute it much faster than general purpose CPUs can.
Given that the fixed-function OpenGL pipeline is more or less a historical artifact these days, it's not incorrect to state that all rendering in Qt is done using purely software techniques; that software can run on a general-purpose CPU, leverage SIMD/vector execution units on a general-purpose CPU, or run on a GPU.
It should also be said that typical Windows drivers these days do not accelerate GDI/gdiplus drawing other than blits. Thus when doing 2D drawing using the raster engine, especially on older Windows versions like XP, Qt can be faster than platform-native 2D drawing.
I have the need to replicate GPU tessellation on the CPU (i.e. get the same uvw coordinates on the CPU side as I would get from the tessellator on the GPU).
The reason for this is rather complicated, but simply put I have an algorithm that stores data per tessellation point, and to calculate it in the first place I need the uvw coordinates on the CPU.
I have googled a lot for the exact details of the tessellation pattern, but I only find very vague texts speaking about it in a general nature, the best one being this one: http://fgiesen.wordpress.com/2011/09/06/a-trip-through-the-graphics-pipeline-2011-part-12/
Is the reason for the lack of texts on this that it's vendor dependent, or have I simply not found the right page?
I'm interested in texts both on the OpenGL and DX11 implementation, if they differ.
I was also very interested in tessellation, and specifically subdivision surfaces, some time ago. This is a very complicated topic; it has been researched since the early '70s and is still an active research area.
It isn't clear whether you want to re-implement the whole shader tessellation pipeline (which I think would take years for a single programmer) or just a single subdivision algorithm (or even an algorithm that isn't subdivision).
Anyway, here are some links about subdivision:
Theory
Typically we implement subdivision surfaces in tessellation shaders using the Catmull–Clark subdivision surface algorithm. You can find the original authors' papers via Google. The main one is:
"Recursively generated B-spline surfaces on arbitrary topological meshes" (year 1978, PDF)
Closer to code
Check papers of those cool guys from Microsoft Research:
Charles Loop (BTW, the author of another subdivision algorithm) - purely mathematical stuff
Hugues Hoppe - look for "progressive meshes", much closer to software
Even closer to code
You can find some libraries on the web. When I searched a while ago, there were a dozen libs that implemented subdivision on the CPU. I didn't look at them much, because I was interested in a GPU implementation. The keyword to search for is "subdivision lib" =)
The most interesting one is Pixar's OpenSubdiv. Take a look at their code.
Also look at "NVIDIA Instanced Tessellation" Sample (DirectX, OpenGL). They've implemented tessellation pipelne in Vertex and Geometry shaders.
Hope it helps!
I'm pretty new to GPU programming and I haven't heard of any way other than CUDA and OpenCL to use the video board, so I'm wondering if there are other ways to use it. Can anybody point me to some examples?
And one more question :D:
Do OpenGL and DirectX take advantage of the graphics card? If yes, can you please tell me how?
P.S: I use C++.
Edit: Thank you for your fast replies.
As far as I know, the graphics card is used for parallel processing (Single Instruction, Multiple Data especially). I would also like to know whether there are other kinds of uses for the graphics card (especially for older ones: since they don't support OpenCL and CUDA, I guess they can't do parallel processing).
Additional GPGPU Languages/Standards
Brook
OpenACC
OpenMP
And one more question :D: Do OpenGL and DirectX take advantage of the graphics card? If yes, can you please tell me how?
Before the introduction of shaders, the rendering process was implemented sequentially by fixed-function circuits. After the Shader Model was introduced, vertex shader processors and pixel shader processors were added to replace the fixed-function blocks.
Now each stream core is a unified shader processor, which can run vertex/geometry/pixel shaders, as well as the hull and domain shaders introduced in DirectX 11, and also compute shaders (kernels). Thus the full capacity of the shaders can be used at all times, maximizing shader performance efficiency.
I'm pretty new to GPU programming and I haven't heard of any way other than CUDA and OpenCL to use the video board
There's OpenCL and OpenMP. ATI also had some kind of API (I think), but I forget what it is called.
Do OpenGL and DirectX take advantage of the graphics card?
Of course they do. DirectX in particular was designed to provide more-or-less easy access to hardware acceleration. You can't render on the CPU at the same speed.
I would also like to know whether there are other kinds of uses for the graphics card
If the card supports floating-point textures, you can use fragment/pixel shaders in conjunction with those textures to perform some calculations on the GPU, even if the GPU doesn't support CUDA/OpenCL/whatever. I think the technique was called GPGPU. The DirectX SDK had a "GPU cloth" example a while ago, and in the NVIDIA OpenGL SDK there are "GPGPU fluid" and "GPU particles" samples. Please note that trying to perform an overly expensive computation this way (like an exhaustive search over a 4096x4096 texture for every pixel rendered) can bluescreen the system on certain hardware.
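A hedged sketch of what that technique looks like (the host-side FBO and full-screen-quad setup are omitted, and the shader body is a made-up smoothing step, not from any SDK sample): the floating-point texture holds the data, the fragment shader is the "kernel", and drawing a full-screen quad into a texture-backed FBO is the "launch":

    // Old-style GLSL fragment shader, embedded as a C++ string literal.
    const char* gpgpuShader = R"GLSL(
    #version 120
    uniform sampler2D data;    // input: floating-point texture
    uniform vec2 texelSize;    // 1.0 / texture dimensions

    void main() {
        // Example computation: average a texel with its four neighbours
        // (one Jacobi-style smoothing step), one output texel per fragment.
        vec2 uv = gl_TexCoord[0].xy;
        float c = texture2D(data, uv).r;
        float n = texture2D(data, uv + vec2(0.0,  texelSize.y)).r;
        float s = texture2D(data, uv - vec2(0.0,  texelSize.y)).r;
        float e = texture2D(data, uv + vec2(texelSize.x, 0.0)).r;
        float w = texture2D(data, uv - vec2(texelSize.x, 0.0)).r;
        gl_FragColor = vec4((c + n + s + e + w) / 5.0);
    }
    )GLSL";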
OpenCL, OpenGL, DirectX and, to a certain extent, CUDA are just interface specifications which a hardware vendor can implement in a driver. These interfaces can then be used to utilize whatever is driven by the driver; in most cases, for these interfaces, that is a GPU. There are other interfaces for talking to a GPU (WDDM specifies more of them), but they all have particular use cases.
You might want to clarify what you mean by "using the graphics card".
I have a device that acquires X-ray images. Due to some technical constraints, the detector has a heterogeneous pixel size and is made of multiple tilted, partially overlapping tiles. The image is thus distorted. The detector geometry is known precisely.
I need a function converting these distorted images into a flat image with a homogeneous pixel size. I have already done this on the CPU, but I would like to give OpenGL a try in order to use the GPU in a portable way.
I have no experience with OpenGL programming, and most of the information I could find on the web was useless for this use case. How should I proceed? How do I do this?
Images are 560x860 pixels, and we have batches of 720 images to process. I'm on Ubuntu.
OpenGL is for rendering polygons. You might be able to do multiple passes and use shaders to get what you want, but you are better off rewriting the algorithm in OpenCL. The bonus is that you would then have something portable that will even use multi-core CPUs when no graphics accelerator is available.
Rather than OpenGL, this sounds like a CUDA, or more generally GPGPU problem.
If you have C or C++ code to do it already, CUDA should be little more than figuring out the types you want to use on the GPU and how the algorithm can be tiled.
If you want to do this with OpenGL, you'd normally do it by supplying the current data as a texture, writing a fragment shader that processes that data, and setting it up to render to a texture. Once the output texture is fully rendered, you can retrieve it back to the CPU and write it out as a file.
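As a rough illustration of that pipeline (not the poster's actual geometry; undistort() is a placeholder for whatever mapping the known detector layout defines): render a full-screen quad into the flat output image, and for each output pixel sample the distorted input where it actually landed on the detector:

    // Old-style GLSL fragment shader, embedded as a C++ string literal.
    const char* undistortShader = R"GLSL(
    #version 120
    uniform sampler2D detectorImage;   // raw, distorted input

    // Placeholder: the real function would pick the tile containing outUV
    // and apply that tile's tilt, offset, and pixel-pitch correction.
    vec2 undistort(vec2 outUV) {
        return outUV;
    }

    void main() {
        vec2 srcUV = undistort(gl_TexCoord[0].xy);
        gl_FragColor = texture2D(detectorImage, srcUV);
    }
    )GLSL";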
I'm afraid it's hard to do much more than a very general sketch of the overall flow without knowing more about what you're doing -- but if (as you said) you've already done this on the CPU, you apparently already have a pretty fair idea of most of the details.
At heart what you are asking here is "how can I use a GPU to solve this problem?"
Modern GPUs are essentially linear algebra engines, so your first step would be to define your problem as a matrix that transforms an input coordinate <x, y> to its output in homogeneous space:

    | m00  m01  m02 |   | x |   | x' |
    | m10  m11  m12 | * | y | = | y' |
    | m20  m21  m22 |   | 1 |   | w  |
For example, you would represent a transformation that scales x by ½, scales y by 1.2, and translates up and left by two units (taking up as +y) as:

    | 0.5  0.0  -2.0 |
    | 0.0  1.2   2.0 |
    | 0.0  0.0   1.0 |
and you can work out analogous transforms for rotation, shear, etc, as well.
Once you've got your transform represented as a matrix-vector multiplication, all you need to do is load your source data into a texture, specify your transform as the projection matrix, and render it to the result. The GPU performs the multiplication per pixel. (You can also write shaders, etc, that do more complicated math, factor in multiple vectors and matrices and what-not, but this is the basic idea.)
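As a sanity check, the matrix above can be applied on the CPU first; this is exactly the multiply the GPU performs per pixel (all names here are illustrative):

    #include <cstdio>

    // Row-major 3x3 homogeneous 2D transform: scale x by 0.5, y by 1.2,
    // translate left (-2) and up (+2) -- the example matrix from above.
    static const float M[3][3] = {
        {0.5f, 0.0f, -2.0f},
        {0.0f, 1.2f,  2.0f},
        {0.0f, 0.0f,  1.0f},
    };

    void apply(float x, float y, float* ox, float* oy) {
        float w = M[2][0]*x + M[2][1]*y + M[2][2];   // stays 1 for affine maps
        *ox = (M[0][0]*x + M[0][1]*y + M[0][2]) / w;
        *oy = (M[1][0]*x + M[1][1]*y + M[1][2]) / w;
    }

    int main() {
        float ox, oy;
        apply(4.0f, 5.0f, &ox, &oy);
        std::printf("(4, 5) -> (%g, %g)\n", ox, oy);  // prints (0, 8)
    }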
That said, once you have your problem expressed as a linear transform, you can make it run a lot faster on the CPU too by leveraging e.g. SIMD or one of the many linear algebra libraries out there. Unless you need real-time performance or have a truly immense amount of data to process, using CUDA/GL/shaders etc. may be more trouble than it's strictly worth, as there's a fair bit of clumsy machinery involved in initializing the libraries, setting up render targets, learning the details of graphics development, and so on.
Simply converting your inner loop from ad-hoc math to a well-optimized linear algebra subroutine may give you enough of a performance boost on the CPU that you're done right there.
You might find this tutorial useful (it's a bit old, but note that it does contain some OpenGL 2.x GLSL after the Cg section). I don't believe there are any shortcuts to image processing in GLSL, if that's what you're looking for... you do need to understand a lot of the 3D rasterization aspect and historical baggage to use it effectively, although once you do have a framework for inputs and outputs set up you can forget about that and play around with your own algorithms in shader code relatively easily.
Having been doing this sort of thing for years (initially using Direct3D shaders, but more recently with CUDA), I have to say that I entirely agree with the posts here recommending CUDA/OpenCL. It makes life much simpler, and generally runs faster. I'd have to be pretty desperate to go back to a graphics API implementation of non-graphics algorithms now.