Best way to leverage the GPU - c++

I have a relatively small section of code that deals with huge datasets which I've already parallelized using openmp and am keen to increase performance further using the GPU. The program is C++, developed under VS2015, runs exclusively on Windows and will need to support 64 bit versions from 7 upwards on as wide a variety of GPUs as is feasible. Technologies I've been looking at so far include AMP, OpenCL, HLSL, and CUDA. Questions already asked, such as this with an informative answer by Ade Miller, make me question whether AMP is the way to go although it looks like the easiest option. I'm dismissing CUDA as it limits me in terms of hardware supported, and am tending towards OpenCL while currently working my way through the following book. As such, I've the following questions;
Is OpenCL a good approach here, as other posts suggest it may also be on the way out?
If I go for OpenCL while wanting to support the widest range of GPUs, am I better off with a 1.x version of OpenCL? Reason I ask this is that the OpenCL.DLL downloaded with the latest version of the CUDA SDK is 1.9. I had to download the Intel SDK for OpenCL to get a 2.x version.
If I go with OpenCL, what do I have to distribute with my application (assuming OpenCL.DLL as a minimum) and are there any licensing issues? Are default drivers for most cards going to support OpenCL and if so which versions?
With respect to the above, am I actually better of with AMP, as it works with anything that has DirectX 11 or better?
(Apologies if the above is slightly off topic, if anyone believes that it is perhaps they could point me to a better forum to ask these questions)

Is OpenCL a good approach here, as other posts suggest it may also be on the way out?
OpenCL seems to be most widely supported GPU computing platform. Supported by nVidia, AMD and Intel. Works on most mobile platforms as well. It is also large set of libraries available: ViennaCL, clBLast, clBlast, Boost-Compute and so on.
If I go for OpenCL while wanting to support the widest range of GPUs, am I better off with a 1.x version of OpenCL?
Yes, currently the safest is to stick with 1.2 - and actually it is more then enough.
All major desktop GPU vendors (Intel, AMD, nVidia) support at least OpenCL 1.2.
Actually only nVidia didn't released official 2.0 support - it is still in beta stage.
Also note that some older GPUs will support OpenCL 1.2 only as well.

Related

Is it possible to emulate a GPU for CUDA/OpenCL unit testing purposes?

I would like to develop a library with an algorithm that can run on the CPU or the GPU. The GPU can be Nvidia (then the algorithm will use CUDA) or not (then the algorithm will use OpenCL).
I would like to emulate a GPU in this project because maybe:
I will use different computer to develop the software and some of them don't have a GPU.
The software will be finally executed in servers that can have a GPU or not and the unit test must be executed and passed.
Is there a way to emulate a GPU for unit testing purposes?
In the following link:
GPU Emulator for CUDA programming without the hardware
They show a solution but only for CUDA, not for OpenCL and the software they propose "GPUOcelot" is no longer actively maintained.
It depends on what you mean on emulation. You cannot emulate the speed of GPUs.
The GPU is architecturally very different from the CPU, with a lot of working threads (1000s, 10000s, ...), that's why we use it. The CPU can have only a few threads, even when you parallelize the code. They also have different instruction sets.
You can however emulate the execution using special softwares, like NVEmulate for NVIDIA GPUs and OpenCL Emulator-Debugger for AMD.
A related question: GPU Emulator for CUDA programming without the hardware, where the accepted answer recommends gpuocelot for CUDA emulation.
I don't know the full state of the art but I can provide a very limited set of things to look at which may be useful.
The accepted answer for this question is now out of date.
The question of compiling and runnning GPU code for CUDA or OpenCL on a machine that does not natively support it has come up on here several times (but sadly its often taken as off-topic). This answer is for those questions too.
Many of the answers refer to software solutions that have not been maintained. There seem to be only two answers which stand the test of time which treat this as a mu question.
Use a real GPU - i.e. buy a cheap cuda card if you don't already have one.
Rent someone elses GPU in the cloud
However emulators do exist.
Also GPU virtualization is well covered by the wikipedia page. There is strong support for getting virtual machines to use the hosts hardware.
Docker and virtualbox both for example support GPU passthough.
Reasons to emulate
To learn and keep up to date with changes to CUDA and OpenCL
To estimate the effect of the various APIs on performance.
To test that your code works on a variety of different platforms.
As a proxy for hardware you don't have access to (as per this question)
Kind of emulation
For testing you might accept a slow implementation as long as it is compliant and reliable.
For production running on different hardware you would more likely accept similar, but not 100% equivalent constructs but (e.g. different warp size, different high-level libraries for FFT, ...) and much more complicated performance-optimized implementations of primitives. You would probaly demand at least 80% of the Cuda speed for comparable hardware.
(Thanks to https://stackoverflow.com/users/13130048/sebastian for those two points)
For the second case you would likely need not just GPU virtualisation but additional optimisation passes.
Why are there less emulators and why don't they survive the test of time?
GPUs are affordable. It is only high performance that costs.
GPUs (not to mention TPUs and FPGAs) are developing rapidly.
Some hardware tricks are kept secret from competitors so emulating actual hardware is difficult.
The CUDA and openCL standards are changing too but less quickly.
There is arguably a need for more programmers that understand them. Compiling your code without running and testing it would simply be unprofessional. There would seem to be an obvious need for emulation where you don't have all the possible or interesting hardware combinations physical available.
That being the case its surprising that so many of these emulation projects have not stood the test of time or been endorsed/provided by GPU manufacturers.
There are some active emulation projects however.
Active GPU EMulation Projects
There are at least two active emulation projects maintained as of October 2022:
gpgpu-sim
oclgrind - openCL device simulator
I cannot speak to how good these are and how commonly they are used compared to using real GPUs (either your own or rented).
Honorable mentions
Cuda to OpenCL source to source transpilers.
These appear to be maintained but are not themselves emulators.
CU2CL
coriander
Why is this not a solved problem?
There are a number of challengs to overcome. My take on these would be something like:
provide a runtime emulating a particular version of the CUDA or openCL standard
provide a compiler targeting this runtime (ideally gcc or clang)
get the backing of a vendor (e.g. Nvidia or the kronos group)
get the backing of a community (i.e. a decent userbase and set of contributors)
build support into a popular emulation environment (e.g. virtualbox)
You could also argue the case that almost all people working in this area have access to real GPUs so this is not necessary at all.
The vendors of point 3 are doing well with points 1 and 2 and 4.
An emulator has to both build on that and take some mindshare of its own.
This is an uphill struggle. I hope and believe there will be success in the future.
Looking at virtualbox the last discussion I can find is from 2011.
https://forums.virtualbox.org/viewtopic.php?f=9&t=41155
Seemingly retired projects
These have been mentioned in answers to previous other attempts to ask and answer this kind of question.
gpuocelot - no longer maintained
mcuda - looks unmaintained
cuda-waste - on google code which was frozen long ago
nvemulate - cude emulator Nvidia - retired a while back
Other seemingly retired projects of interest:
openTPU - a Tensor PU emulator from 2017
gdev - 2010
Implementing Open-Source CUDA Runtime - paper from 2013
Earlier (out of date) questions:
GPU Emulator for CUDA programming without the hardware
Asked 2010 - most recent answer 2016
CUDA without CUDA enabled gpu
Asked 2010
How can I emulate a GPU for testing code written in Pytorch?
Asked 2021 - pytorch specific
CUDA code without a GPU
Asked 2014
CUDA on a system that has no GPU
Asked 2013
Using the built-in graphics cards without a NVIDIA graphics card, Can I use the CUDA and Caffe library?
Asked 2016

Which OpenGL version is most stable and currently used?

I've been thinking of making an additional wrapper for my project to use OpenGL rather then Allegro. I was not sure which OpenGL version to go for since I know that some computers cannot run recent versions such as v4.4. Also, I need a version which compiles no problem in Linux, Windows, Mac.
You'll want to look at what kinds of graphics cards will be available on your target systems and bear some details in mind:
OpenGL up to 1.5 can be completely emulated in software in real time on most systems. You don't necessarily need hardware support for good performance.
OpenGL 1.4 has universal support. Virtually all hardware supports it.
Mac OS X only supports up to OpenGL 2.0 and OpenGL 2.1, depending on OS version and hardware. Systems using GMA950 have only OGL1.4 support. Mac OS X Lion 10.7 supports OpenGL 3.2 Core profile on supported hardware.
On Linux, it's not unusual for users to specifically prefer open source drivers over the alternative "binary blobs," so bear in mind that the version of Mesa that most people have supports only up to about OpenGL 2.1 compatibility. Upcoming versions have support for OpenGL 3.x. Closed-source "binary blobs" will generally support the highest OpenGL version for the hardware, including up to OpenGL 4.2 Core.
When considering what hardware is available to your users, the Steam hardware Survey may help. Note that most users have DirectX 9-compatible hardware, which is roughly feature-equivalent to OpenGL 2.0. Wikipedia's OpenGL article also specifies what hardware came with initial support for which versions.
If you use a library like GLEW or GLEE or any toolkit that depends on them or offers similar functionality (like SFML, or even Allegro since 4.3), then you'll not need to concern yourself with whether your code will compile. These toolkits will take care of the details of enabling extensions and providing all of the symbols you need.
Given all of this, I'd suggest targeting OpenGL 2.1 to get the widest audience possible with the best feature support.
Your safe bet is OpenGL 2.1, it needs to be supported by the driver on your target system though. OpenGL ES, used on several mobile platforms, is basically a simplified OpenGL 2, so even porting to those would be fairly easy. I highly recommend using libGlew as VJo said.
It's less about operating systems, and more about video card drivers.
I think 1.4 is the highest version which enjoys support by all consumer graphics systems: ATI (AMD), nVidia, and Intel IGP. Intel is definitely the limiting factor here, even when ATI or nVidia doesn't have hardware support, they release OpenGL 4.1 drivers which use software to emulate the missing features. Not so with Intel.
OpenGL is not a library you usually compile and ship yourself (unless you're a Linux distributor and are packaging X.Org/Mesa). Your program just dynamically links against libGL.so (Linux/BSD), opengl32.dll (Windows, on 64 Bit systems, it's also calles opengl32.dll, but it's in fact a 64 Bit DLL) or the OpenGL Framework (MacOS X). This gives your program access to the system's OpenGL installation. The version/profile you want to use has no influence on the library you link!
Then after your program has been initialized you can test, which OpenGL version is available. If you want to use OpenGL-3 or 4 you'll have to jump a few additional hoops in Windows to make full use of it, but normally some kind of wrapper helps you with context creation anyway, boiling it down to only a few lines.
Then in the program you can implement multiple code paths for the various versions. Usually lower OpenGL verion codepaths share a large subset with higher version codepaths. I recommend writing new code in the highest version available, then adding additional code paths (oftenly just substitutions which can be done by C preprocessor macros or similar) for lower versions until you reach the lowest common denominator of features you really need.
Then you need to use OpenGL 1.1, and use needed (and supported) functions through the use of wglGetProcAddress (on windows) or glXGetProcAddress (on linux).
Instead of using those two functions, you can use GLEW library, which does that for you and is cross platform.

C/C++ Cross platform Library allowing the utilisation of GPU for floating point calculations

Does any one know of any cross platform c/c++ libraries which will utilise the GPU for the purposes of floating point calculations, not specifically graphics oriented calcs. Which ones are in common use, which ones recommended , which ones have you had experience of. Specifically it should be open source with a GPL license.
addendum:- Any libraries you know of that are not GPU manufacturer specific.
addendum:- OpenCL has been brought up in a few answers as having cross GPU compatability. Does anyone have experience using it and can vouch for it's maturity? I'm guessing that if it's Kronos it'll be pretty good.
I would very much doubt that you have a reasonable chance of finding something like this as open source, as "utilise GPU" usually implies "heftily hardware specific, top secret NDA driver stuff".
However, OpenCL is as cross platform as you can get (works with every major vendor and even has at least one software fallback implementation) and it is reasonably free insofar as there are no fees and no restrictions on how you may use it. The only non-free thing is that it's not open source and you can't modify it.
ATI/AMD and nVidia have been offering OpenCL working on G80 and RHD, respectively, for some time, also ATI/AMD has been offering a software implementaion for a good time. As for Intel, I remember reading that they were working for OpenCL for Sandy Bridge generation about a year or so ago, so it should probably be finished by now as well.
How about OpenCL?
Here is the project page at the Kronos Group.
It all depends on the chip you are targeting but NVIDIA offers an SDK in the form of CUDA for Windows, Mac, and Linux. The license is not opensource but depending on what you need that might not actually be a big hurdle.

Porting a project to OpenGL3

I'm working on a C++ cross-platform OpenGL application (Windows, Linux and MacOS) and I am wondering if some of you could share some advices on porting a large application to OpenGL 3. The reason I am looking into OpenGL 3 is because I think we could benefit a lot from using the new "Sync objects". Nvidia has supported such an extension since the Geforce 256 days (gl_nv_fences) but there seems to be no equivalent functionality on ATI hardware before OpenGL 3.0+...
Our code makes quite heavy use of glut/freeglut, glu functions, OpenGL 2 extensions and CUDA (on supported hardware). The problem I am now facing is that "gl3.h" and "gl.h" are mutually incompatible (as stated in gl3.h). Do you guys know if there is a GL3 glut equivalent ? Also, looking at the CUDA-toolkit header files, it seems that GL-CUDA interoperability is only available when using older versions of OpenGL... (cuda_gl_interop.h includes gl.h...). Am I missing something ?
Thanks a lot for your help.
The last update to glut was version 3.7, roughly 10 years ago. Taking that into account, I doubt that it'll ever support OpenGL 3.x (or 4.x).
The people working on OpenGlut seem to be considering the possibility of OpenGL 3.x support, but haven't done anything with it yet.
FLTK has a (partial) glut simulation, but it's partial enough that a program that "makes heavy use of glut" may not work with it in the first place. Since FLTK is in active development, I'd guess it'll eventually support OpenGL 3.x (or 4.x), but I don't believe it's provided yet, and it may be open to question how soon it will be either.
Edit: As far as CUDA goes, the obvious (though certainly non-trivial) answer would be to use OpenCL instead. This is considerably more compatible both with hardware (e.g., with ATI/AMD boards) and with newer versions of OpenGL.
That leaves glu. Frankly, I don't think there is a clear or obvious answer to this. OpenGL is moving away from supporting things like glu, and instead dropping support for even more of the vaguely glu-like functionality that used to be part of the core OpenGL spec (e.g., all the matrix manipulation primitives). Personally, I think this is a mistake, but whether it's good or bad, it's how things are. Unfortunately, glu is a bit like glut -- the last update to the spec was in 1998, and corresponds to OpenGL 1.2. That doesn't make an update seem at all likely. Unfortunately, I don't know of any really direct replacements for it either. There are clearly other graphics libraries that provide (at least some) similar capabilities, but all of them I can think of would require substantial rewriting.

OpenCL or CUDA Which way to go?

I'm investigating ways of using GPU in order to process streaming data. I had two choices but couldn't decide which way to go?
My criterias are as follows:
Ease of use (good API)
Community and Documentation
Performance
Future
I'll code in C and C++ under linux.
OpenCL
interfaced from your production code
portable between different graphics hardware
limited operations but preprepared shortcuts
CUDA
separate language (CUDA C)
nVidia hardware only
almost full control over the code (coding in a C-like language)
lot of profiling and debugging tools
Bottom line -- OpenCL is portable, CUDA is nVidia only. However, being an independent language, CUDA is much more powerful and has a bunch of really good tools.
Ease of use -- OpenCL is easier to use out of the box, but once you setup the CUDA coding environment it's almost like coding in C.
Community and Documentation -- both have extensive documentation and examples, however I think CUDA has better.
Performance -- CUDA allows for greater control, hence can be better fine-tuned for higher performance.
Future -- hard to say really.
My personal experiences were:
API: OpenCL has slightly more complex api. However most time you will spent with writing kernel code, and here both are almost identical.
Community: CUDA has a much bigger community then OpenCL up til now, but this will probably about to even out.
Documentation: Both are very well documented.
Performance: We made the experience, that OpenCL drivers are not yet fully optimized.
Future: The future lies with OpenCL as it is an open standard, not restricted to a vendor or specific hardware!
This assessment is from 2010, so probably out-dated.
OpenCL all the way unless you have a specific reason to use CUDA. OpenCL runs well on multicores like Intel i7 in addition to running on GPUs. By using OpenCL you can run it on a much wider range of hardware from Droid cell phones to the IBM Power7 compute nodes of the world's largest supercomputer, Blue Waters, which is supposed to come online next year.