Confusion on CUDA/OpenCL and C++ AMP - c++

I read that Microsoft is working closely with NVIDIA to improve C++ AMP performance.
But my question is: is AMP a CUDA replacement from Microsoft? Or does AMP use the CUDA driver when an NVIDIA CUDA video card is available? Is AMP an OpenCL substitute?
I'm still pretty confused.

C++ AMP is a library (and, as part of it, a key language extension was also introduced). Since C++ AMP is an open specification, it can be implemented on top of other low-level technologies. Microsoft’s implementation builds on DirectCompute (and hence on HLSL), but that is completely hidden from you when you are using C++ AMP (which is why C++ AMP can be an open specification; it does not expose DirectX in the API surface). For more on C++ AMP, please follow the resources on the right of our blog (we’ll keep adding to that):
http://blogs.msdn.com/b/nativeconcurrency/
You made a statement about Microsoft working with NVIDIA to improve C++ AMP performance – that is not true. Microsoft has worked with NVIDIA and AMD and other partners to create the C++ AMP open specification. Microsoft also works with hardware vendors to make sure they ship stable video card drivers, which are required for any GPU compute technology to work correctly.
You also expressed confusion and threw some terms out. OpenCL is an approach to GPU computing (by Khronos), as is DirectCompute (by Microsoft), as is CUDA (by NVIDIA). These are all separate technologies, each with its own path to the GPU (always via a driver of some sort), each with its own merits, strengths, and disadvantages. One does not replace the other, and one is not universally better than the other. You now also have C++ AMP in that mix, as one more choice, and the same statements apply to that. The choice is yours as to which you decide to use.
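To make that concrete, here is a minimal C++ AMP sketch against Microsoft's implementation in <amp.h> (the function and variable names are just illustrative). The restrict(amp) clause is the key language extension referred to above; the runtime maps the lambda onto whatever DirectCompute-capable accelerator is present, with no DirectX visible in the code.

    #include <amp.h>
    #include <vector>

    // Adds two vectors on the default accelerator (GPU if available).
    void add_on_gpu(const std::vector<float>& a,
                    const std::vector<float>& b,
                    std::vector<float>& result)
    {
        using namespace concurrency;

        array_view<const float, 1> av_a(static_cast<int>(a.size()), a);
        array_view<const float, 1> av_b(static_cast<int>(b.size()), b);
        array_view<float, 1>       av_r(static_cast<int>(result.size()), result);
        av_r.discard_data();  // result is write-only, so skip copying it to the GPU

        parallel_for_each(av_r.extent, [=](index<1> i) restrict(amp) {
            av_r[i] = av_a[i] + av_b[i];
        });

        av_r.synchronize();   // copy the result back to host memory
    }

Note that this only builds with a compiler that implements the C++ AMP specification (in practice, Visual C++).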

C++ AMP is a set of language extensions and APIs from Microsoft to support data-parallel programming on GPUs, the same space that CUDA targets.
Since Microsoft also has a direct competitor to CUDA (DirectCompute) and has generally preferred its own proprietary graphics standards, we will have to see what actually happens with it.
For Microsoft's view on it, see these lectures.

Related

Best way to leverage the GPU

I have a relatively small section of code that deals with huge datasets which I've already parallelized using OpenMP, and I am keen to increase performance further using the GPU. The program is C++, developed under VS2015, runs exclusively on Windows, and will need to support 64-bit versions of Windows from 7 upwards on as wide a variety of GPUs as is feasible. Technologies I've been looking at so far include AMP, OpenCL, HLSL, and CUDA. Questions already asked, such as this one with an informative answer by Ade Miller, make me question whether AMP is the way to go, although it looks like the easiest option. I'm dismissing CUDA as it limits me in terms of hardware supported, and am tending towards OpenCL while currently working my way through the following book. As such, I have the following questions:
Is OpenCL a good approach here, as other posts suggest it may also be on the way out?
If I go for OpenCL while wanting to support the widest range of GPUs, am I better off with a 1.x version of OpenCL? The reason I ask is that the OpenCL.DLL downloaded with the latest version of the CUDA SDK is 1.9. I had to download the Intel SDK for OpenCL to get a 2.x version.
If I go with OpenCL, what do I have to distribute with my application (assuming OpenCL.DLL as a minimum) and are there any licensing issues? Are default drivers for most cards going to support OpenCL and if so which versions?
With respect to the above, am I actually better off with AMP, as it works with anything that has DirectX 11 or better?
(Apologies if the above is slightly off topic, if anyone believes that it is perhaps they could point me to a better forum to ask these questions)
Is OpenCL a good approach here, as other posts suggest it may also be on the way out?
OpenCL seems to be the most widely supported GPU computing platform. It is supported by NVIDIA, AMD, and Intel, and works on most mobile platforms as well. There is also a large set of libraries available: ViennaCL, clBLAS, CLBlast, Boost.Compute, and so on.
If I go for OpenCL while wanting to support the widest range of GPUs, am I better off with a 1.x version of OpenCL?
Yes, currently the safest is to stick with 1.2 - and actually it is more than enough.
All major desktop GPU vendors (Intel, AMD, NVIDIA) support at least OpenCL 1.2.
Actually, only NVIDIA hasn't released official 2.0 support - it is still in beta stage.
Also note that some older GPUs only support OpenCL 1.2 as well.
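If it helps, a small sketch like the following (plain OpenCL C API, callable from C++) lists what the installed drivers actually report; the CL_PLATFORM_VERSION string begins with e.g. "OpenCL 1.2 ...", which tells you the highest version each platform supports.

    #include <CL/cl.h>
    #include <cstdio>
    #include <vector>

    int main()
    {
        // Find out how many OpenCL platforms (driver stacks) are installed.
        cl_uint num_platforms = 0;
        clGetPlatformIDs(0, nullptr, &num_platforms);

        std::vector<cl_platform_id> platforms(num_platforms);
        clGetPlatformIDs(num_platforms, platforms.data(), nullptr);

        // Print the name and supported OpenCL version of each platform.
        for (cl_platform_id p : platforms) {
            char name[256]    = {0};
            char version[256] = {0};
            clGetPlatformInfo(p, CL_PLATFORM_NAME, sizeof(name), name, nullptr);
            clGetPlatformInfo(p, CL_PLATFORM_VERSION, sizeof(version), version, nullptr);
            std::printf("%s: %s\n", name, version);
        }
        return 0;
    }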

Is it possible to write OpenCL kernels in C++ rather than C?

I understand there's an OpenCL C++ API, but I'm having trouble compiling my kernels... do the kernels have to be written in C? And then it's just the host code that's allowed to be written in C++? Or is there some way to write the kernels in C++ that I'm not finding? Specifically, I'm trying to compile my kernels using pyopencl, and it seems to be failing because it's compiling them as C code.
OpenCL C is based on a subset of C99.
There is also OpenCL C++ (part of the OpenCL 2.1 and OpenCL 2.2 specs), which is based on a subset of C++14, but it is not implemented by any vendor yet (OpenCL 2.1 is partially implemented by Intel, but not the C++ kernels).
Host code can be written in C, C++, Python, etc.
In short, you can read about OpenCL on Wikipedia; there is a description of each OpenCL version. In pyopencl you can use OpenCL 1.2 (as far as I'm aware there isn't support for OpenCL 2.0 yet).
More details about OpenCL are on the Khronos website.
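To make the split concrete: under OpenCL 1.2 the kernel itself is written in OpenCL C (C99-based) and is typically kept as a string in the host program, whether that host is C++ or Python via pyopencl, and handed to the driver for compilation. A minimal, purely illustrative example of such a kernel embedded in a C++ host file:

    // The host file is C++, but the kernel source it carries is OpenCL C.
    const char* kernel_source = R"CLC(
    __kernel void vec_add(__global const float* a,
                          __global const float* b,
                          __global float* out)
    {
        size_t i = get_global_id(0);
        out[i] = a[i] + b[i];
    }
    )CLC";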
I would add SYCL, via ComputeCpp from Codeplay. They have been very active at IWOCL.org promoting the use of single-source C++ host and kernel code. SYCL has the OpenCL execution model "under the hood": https://en.wikipedia.org/wiki/SYCL. Wikipedia does have this statement about SYCL: "The open standards SYCL and OpenCL are similar to vendor-specific CUDA from Nvidia", which could not be further from the intent of portable (though not necessarily performance-portable) code in SYCL and OpenCL.
You can find information, news, blogs, videos and resources on SYCL on the sycl.tech website.
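For a rough idea of what single-source means in practice, here is a hedged sketch in SYCL 1.2.1 style (the version ComputeCpp implements; the kernel name scale_kernel and the sizes are just illustrative): host and kernel code live in the same C++ file, the kernel is the lambda passed to parallel_for, and execution is dispatched through OpenCL underneath.

    #include <CL/sycl.hpp>
    #include <vector>

    int main()
    {
        std::vector<float> data(1024, 1.0f);

        {
            cl::sycl::queue q;  // picks a default device (GPU if one is available)
            cl::sycl::buffer<float, 1> buf(data.data(),
                                           cl::sycl::range<1>(data.size()));

            q.submit([&](cl::sycl::handler& cgh) {
                auto acc = buf.get_access<cl::sycl::access::mode::read_write>(cgh);
                cgh.parallel_for<class scale_kernel>(
                    cl::sycl::range<1>(data.size()),
                    [=](cl::sycl::id<1> i) { acc[i] *= 2.0f; });
            });
        }   // the buffer's destructor waits and copies the results back into 'data'

        return 0;
    }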
For reference, there's also Boost.Compute. It doesn't help you with pyopencl, but it addresses many of the issues that pyopencl does, and has some metaprogramming magic that facilitates writing OpenCL kernels in C++.
This SO question (referenced in the Boost.Compute FAQ) also contains a nice discussion of some of the relevant design constraints that OpenCL poses to devs.
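To give a taste of that metaprogramming magic, here is a small Boost.Compute sketch (the function name square and the sizes are just for illustration): the BOOST_COMPUTE_FUNCTION macro generates the OpenCL source for the function from C++-side code, so no separate kernel string is written by hand.

    #include <boost/compute/core.hpp>
    #include <boost/compute/algorithm/copy.hpp>
    #include <boost/compute/algorithm/transform.hpp>
    #include <boost/compute/container/vector.hpp>
    #include <boost/compute/function.hpp>
    #include <vector>

    namespace compute = boost::compute;

    // Expands into an OpenCL function usable by Boost.Compute algorithms.
    BOOST_COMPUTE_FUNCTION(float, square, (float x),
    {
        return x * x;
    });

    int main()
    {
        compute::device device = compute::system::default_device();
        compute::context context(device);
        compute::command_queue queue(context, device);

        std::vector<float> host(256, 3.0f);
        compute::vector<float> dev(host.begin(), host.end(), queue);

        // Apply 'square' to every element on the device, then copy back.
        compute::transform(dev.begin(), dev.end(), dev.begin(), square, queue);
        compute::copy(dev.begin(), dev.end(), host.begin(), queue);
        return 0;
    }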
This is an old question, and the work to "solve" it has been ongoing for some time...
There is a community-driven C++ for OpenCL kernel language, implemented by Clang, and there is a Khronos extension, cl_ext_cxx_for_opencl, that adds online compilation of this language to OpenCL drivers too. Arm has just announced support for this extension. It is also possible to compile kernels in this language offline using upstream tools into a machine binary, SPIR-V, or any other IR, and then load the precompiled code in OpenCL drivers without any extension.
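As a short, hedged taste of that kernel language (the function names are purely illustrative): C++ for OpenCL keeps the familiar OpenCL C address spaces and built-ins, but allows C++ features such as templates inside kernel code.

    // A template helper, which plain OpenCL C would not allow.
    template <typename T>
    T twice(T x) { return x + x; }

    __kernel void double_elems(__global float* buf)
    {
        size_t i = get_global_id(0);
        buf[i] = twice(buf[i]);
    }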

GPU DirectX vs OpenGL support

As I understand it, GPU vendors define a standard interface that OS developers use to communicate with their specific drivers, so DirectX and OpenGL are just wrappers around that interface. When OS developers decide to create a new version of a graphics API, GPU vendors expand their interface (the new routines are faster and the older ones are left for compatibility), and OS developers use this new part of the interface.
So, when it is said that GPU vendors' support for DirectX is better than for OpenGL, does it simply mean that GPU vendors primarily take into account Microsoft's future plans for the DirectX API and adjust the future development of this interface to those plans? Or are there technical reasons behind this?
As I understand it, GPU vendors define a standard interface that OS developers use to communicate with their specific drivers, so DirectX and OpenGL are just wrappers around that interface.
No, not really. DirectX and OpenGL are just specifications that define APIs. But a specification is nothing more than a document, not software. The OpenGL API specification is controlled by Khronos; the DirectX API specification is controlled by Microsoft. Each OS then defines a so-called ABI (Application Binary Interface) that specifies which system-level APIs are supported by the OS (OpenGL and DirectX are system-level APIs) and what rules an actual implementation must adhere to when being run on the OS in question.
The actual OpenGL or Direct3D implementation happens in the hardware's drivers (and in fact the hardware itself is part of the implementation as well).
When OS developers decide to create a new version of a graphics API, GPU vendors expand their interface
In fact it's the other way round: most of the graphics API specifications are laid out by the graphics hardware vendors. After all, they are close to where the rubber hits the road. In the case of Khronos, the GPU makers are part of the controlling group of Khronos. In the case of DirectX, the hardware makers submit drafts to Microsoft and review the changes and suggestions Microsoft makes. But in the end each new API release reflects the common denominator of the capabilities of the next hardware generation in development.
So, when it is said that GPU vendors' support for DirectX is better than for OpenGL, does it simply mean that GPU vendors primarily take into account Microsoft's future plans for the DirectX API and adjust the future development of this interface to those plans?
No, it means that each GPU vendor implements its own version of the OpenGL and Direct3D backends, which is where all the magic happens. However, OpenGL puts a lot of emphasis on backward compatibility and ease of transition to newer functionality. Direct3D development, OTOH, is quick in cutting the ties with earlier versions. This also means that full-blown compatibility-profile OpenGL implementations are quite complex beasts. That's also the reason why recent versions of the OpenGL core profile did (overdue) work in cutting down support for legacy features; this reduction of API complexity is also quite a liberating thing for developers. If you develop purely for a core profile it simplifies a lot of things; for example, you no longer have to worry about a plethora of internal state when writing a plugin.
Another factor is that for Direct3D there's exactly one shader compiler, which is not part of the driver infrastructure / implementation itself, but gets run at program build time. OpenGL implementations, however, must each implement their own GLSL shader compiler, which complicates things. IMHO the lack of a unified AST or intermediate shader code is one of the major shortcomings of OpenGL.
There is not a 1:1 correspondence between the graphics hardware abstraction and graphics APIs like OpenGL and Direct3D. WDDM, which is Windows Vista's driver model, defines things like common scheduling, memory management, etc., so that DirectX and OpenGL applications work interoperably, but very little of the design of DirectX, OpenGL or GPUs in general has to do with this. Think of it like an OS kernel: nobody creates a CPU specifically to run it, and you do not have to re-compile the kernel every time a new iteration of a processor architecture comes out that adds a new subset of instructions.
Application developers and IHVs (GPU vendors, as you call them) are the ones who primarily deal with changes to GPU architecture. It may appear that the operating system has more to do with the equation than it actually does, because Microsoft (more so) and Apple, who both maintain their own proprietary operating systems, are influential in the design of DirectX and OpenGL. These days OpenGL closely follows the development of commodity desktop GPU hardware, but this was not always the case: it contains baggage from the days of custom SGI workstations, and lots of things in compatibility profiles have not been hardware-native on desktop GPUs in decades. DirectX, on the other hand, has always followed desktop hardware. It used to be that if you wanted an indication of where desktop GPUs were headed, D3D was a good marker.
OpenGL is arguably more complicated than DirectX because until recently it never let go of anything, whereas DirectX radically redefined the API and stripped legacy support with every iteration. Both APIs have settled down in recent years, but D3D still maintains a bit of an edge considering it only has to be implemented on a single platform and Microsoft writes the one and only shader compiler. If anything, the shader compiler and minimal feature set (void of legacy baggage) in D3D is probably why you get the impression that vendors support it better.
With the emergence of AMD Mantle, the desktop picture might change again (think back to the days of 3Dfx and Glide)... it certainly goes to show that OS developers have very little to do with graphics API design. NV and AMD both have proprietary APIs on the PS3, GameCube/Wii/WiiU, and PS4 that they have to implement in addition to D3D and OpenGL on the desktop, so the overall picture is much broader than you think.

C/C++ cross-platform library allowing the utilisation of the GPU for floating point calculations

Does anyone know of any cross-platform C/C++ libraries which will utilise the GPU for the purposes of floating point calculations, not specifically graphics-oriented calcs? Which ones are in common use, which ones are recommended, and which ones have you had experience of? Specifically, it should be open source with a GPL license.
Addendum: any libraries you know of that are not GPU-manufacturer specific.
Addendum: OpenCL has been brought up in a few answers as having cross-GPU compatibility. Does anyone have experience using it and can vouch for its maturity? I'm guessing that if it's Khronos it'll be pretty good.
I would very much doubt that you have a reasonable chance of finding something like this as open source, as "utilise GPU" usually implies "heftily hardware specific, top secret NDA driver stuff".
However, OpenCL is as cross platform as you can get (works with every major vendor and even has at least one software fallback implementation) and it is reasonably free insofar as there are no fees and no restrictions on how you may use it. The only non-free thing is that it's not open source and you can't modify it.
nVidia and ATI/AMD have been offering OpenCL on the G80 and Radeon HD families, respectively, for some time, and ATI/AMD has also been offering a software (CPU) implementation for a good while. As for Intel, I remember reading that they were working on OpenCL for the Sandy Bridge generation about a year or so ago, so it should probably be finished by now as well.
How about OpenCL?
Here is the project page at the Khronos Group.
It all depends on the chip you are targeting, but NVIDIA offers an SDK in the form of CUDA for Windows, Mac, and Linux. The license is not open source, but depending on what you need, that might not actually be a big hurdle.

OpenCL or CUDA: Which way to go?

I'm investigating ways of using the GPU to process streaming data. I have two choices but can't decide which way to go.
My criteria are as follows:
Ease of use (good API)
Community and Documentation
Performance
Future
I'll code in C and C++ under Linux.
OpenCL
interfaced from your production code
portable between different graphics hardware
limited operations, but pre-prepared shortcuts
CUDA
separate language (CUDA C)
nVidia hardware only
almost full control over the code (coding in a C-like language)
lots of profiling and debugging tools
Bottom line -- OpenCL is portable, CUDA is nVidia only. However, being an independent language, CUDA is much more powerful and has a bunch of really good tools.
Ease of use -- OpenCL is easier to use out of the box, but once you set up the CUDA coding environment it's almost like coding in C.
Community and Documentation -- both have extensive documentation and examples, however I think CUDA's are better.
Performance -- CUDA allows for greater control, hence can be better fine-tuned for higher performance.
Future -- hard to say really.
My personal experiences were:
API: OpenCL has a slightly more complex API. However, most of your time will be spent writing kernel code, and there the two are almost identical (see the host-side sketch after this answer).
Community: CUDA has had a much bigger community than OpenCL up to now, but this will probably even out.
Documentation: Both are very well documented.
Performance: Our experience was that OpenCL drivers are not yet fully optimized.
Future: The future lies with OpenCL as it is an open standard, not restricted to a vendor or specific hardware!
This assessment is from 2010, so it is probably outdated.
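To give a feel for the host-side difference mentioned under "API" above, here is a stripped-down, hedged sketch using the Khronos OpenCL C++ wrapper (cl.hpp); error handling is omitted, and the kernel name scale and the sizes are just for illustration. The kernel string in the middle is the part that looks essentially like its CUDA C counterpart; the surrounding setup is the extra host boilerplate OpenCL asks for.

    #include <CL/cl.hpp>
    #include <string>
    #include <vector>

    int main()
    {
        // The kernel: this part looks almost the same as CUDA C device code.
        const std::string src = R"CLC(
            __kernel void scale(__global float* data, float factor) {
                size_t i = get_global_id(0);
                data[i] *= factor;
            }
        )CLC";

        // Host-side setup: platform, device, context, queue, program, kernel.
        std::vector<cl::Platform> platforms;
        cl::Platform::get(&platforms);

        std::vector<cl::Device> devices;
        platforms[0].getDevices(CL_DEVICE_TYPE_GPU, &devices);

        cl::Context context(devices);
        cl::CommandQueue queue(context, devices[0]);

        cl::Program program(context, src);
        program.build(devices);
        cl::Kernel kernel(program, "scale");

        // Move data over, run the kernel, read the results back.
        std::vector<float> host(1024, 1.0f);
        const size_t bytes = host.size() * sizeof(float);
        cl::Buffer buf(context, CL_MEM_READ_WRITE, bytes);

        queue.enqueueWriteBuffer(buf, CL_TRUE, 0, bytes, host.data());
        kernel.setArg(0, buf);
        kernel.setArg(1, 2.0f);
        queue.enqueueNDRangeKernel(kernel, cl::NullRange,
                                   cl::NDRange(host.size()), cl::NullRange);
        queue.enqueueReadBuffer(buf, CL_TRUE, 0, bytes, host.data());
        return 0;
    }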
OpenCL all the way unless you have a specific reason to use CUDA. OpenCL runs well on multicore CPUs like the Intel i7 in addition to running on GPUs. By using OpenCL you can run your code on a much wider range of hardware, from Droid cell phones to the IBM POWER7 compute nodes of the world's largest supercomputer, Blue Waters, which is supposed to come online next year.