In order to save power it is common in recent graphics architectures to dynamically switch between a discrete high-performance and an integrated lower-performance GPU, where the high-performance GPU is only enabled when the need for extra performance is present.
This technology is branded as Nvidia Optimus and AMD Enduro by the two main GPU vendors.
However, due to the non-standardized way in which these technologies work, managing them from a developer's perspective can be a nightmare. For example, in this PDF from Nvidia on the subject, they explain the many intricacies, limitations and pitfalls that you have to worry about as a developer just to manage Nvidia Optimus on a single platform.
As an example, in the linked PDF above, the following is a tip for selecting GPU on Windows:
extern "C" {
__declspec(dllexport) DWORD NvOptimusEnablement = 0x00000001;
}
However, that only works for Nvidia GPUs on Windows. What would be the equivalent on OSX/Linux, and on AMD hardware?
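For AMD there is apparently an analogous exported symbol, AmdPowerXpressRequestHighPerformance, so a sketch covering both vendors on Windows might look like this (assuming the respective drivers honor these exports):

extern "C" {
// Nvidia Optimus: request the high-performance GPU (documented by Nvidia)
__declspec(dllexport) DWORD NvOptimusEnablement = 0x00000001;
// AMD PowerXpress/Enduro: the analogous request for AMD drivers
__declspec(dllexport) int AmdPowerXpressRequestHighPerformance = 1;
}

But that still leaves OSX and Linux uncovered.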
So, in more detail, my question is: how can I
Detect the presence of Optimus/Enduro and possibly other dynamically-switching GPU architectures programmatically?
Select which of the GPUs should be enabled programmatically?
Do so in a manner that is cross-platform over all relevant platforms?
Do so in a manner that works together with all technologies that might use GPU such as DX/OpenGL/Vulkan/OpenCL/CUDA/Qt?
I am working with C++14/Qt5.7 codebase under Ubuntu 16.04-amd64 using nVidia hardware.
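For reference, the closest thing to cross-platform GPU detection I can think of is enumerating devices through a vendor-neutral API such as OpenCL; a minimal sketch (assuming an OpenCL ICD loader is installed, which is itself not guaranteed):

// Sketch: enumerate GPU devices via the OpenCL platform API.
#include <CL/cl.h>
#include <cstdio>
#include <vector>

int main() {
    cl_uint numPlatforms = 0;
    clGetPlatformIDs(0, nullptr, &numPlatforms);
    std::vector<cl_platform_id> platforms(numPlatforms);
    clGetPlatformIDs(numPlatforms, platforms.data(), nullptr);

    for (cl_platform_id p : platforms) {
        cl_uint numDevices = 0;
        if (clGetDeviceIDs(p, CL_DEVICE_TYPE_GPU, 0, nullptr, &numDevices) != CL_SUCCESS)
            continue;  // platform without a GPU (e.g. a CPU-only runtime)
        std::vector<cl_device_id> devices(numDevices);
        clGetDeviceIDs(p, CL_DEVICE_TYPE_GPU, numDevices, devices.data(), nullptr);
        for (cl_device_id d : devices) {
            char name[256] = {};
            char vendor[256] = {};
            clGetDeviceInfo(d, CL_DEVICE_NAME, sizeof(name), name, nullptr);
            clGetDeviceInfo(d, CL_DEVICE_VENDOR, sizeof(vendor), vendor, nullptr);
            std::printf("GPU: %s (%s)\n", name, vendor);
        }
    }
}

Seeing both an integrated and a discrete GPU listed is a hint, though not proof, of a switchable setup - and this says nothing about selecting which GPU an OpenGL context ends up on, which is the part I cannot find a portable answer to.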
Related
I have a relatively small section of code that deals with huge datasets, which I've already parallelized using OpenMP, and I am keen to increase performance further using the GPU. The program is C++, developed under VS2015, runs exclusively on Windows, and will need to support 64-bit versions from Windows 7 upwards on as wide a variety of GPUs as is feasible. Technologies I've been looking at so far include AMP, OpenCL, HLSL, and CUDA. Questions already asked, such as this one with an informative answer by Ade Miller, make me question whether AMP is the way to go, although it looks like the easiest option. I'm dismissing CUDA as it limits me in terms of hardware supported, and am tending towards OpenCL while currently working my way through the following book. As such, I have the following questions:
Is OpenCL a good approach here, as other posts suggest it may also be on the way out?
If I go for OpenCL while wanting to support the widest range of GPUs, am I better off with a 1.x version of OpenCL? The reason I ask is that the OpenCL.DLL downloaded with the latest version of the CUDA SDK is 1.9. I had to download the Intel SDK for OpenCL to get a 2.x version.
If I go with OpenCL, what do I have to distribute with my application (assuming OpenCL.DLL as a minimum) and are there any licensing issues? Are default drivers for most cards going to support OpenCL and if so which versions?
With respect to the above, am I actually better off with AMP, as it works with anything that has DirectX 11 or better?
(Apologies if the above is slightly off topic, if anyone believes that it is perhaps they could point me to a better forum to ask these questions)
Is OpenCL a good approach here, as other posts suggest it may also be on the way out?
OpenCL seems to be the most widely supported GPU computing platform, supported by Nvidia, AMD and Intel, and it works on most mobile platforms as well. There is also a large set of libraries available: ViennaCL, clBLAS, CLBlast, Boost.Compute and so on.
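To give an impression of how little host code these libraries need, here is a minimal Boost.Compute sketch that sorts a vector on the default OpenCL device (names as in the Boost.Compute documentation):

// Sketch: sort a vector on the default OpenCL device with Boost.Compute.
#include <boost/compute/core.hpp>
#include <boost/compute/algorithm/copy.hpp>
#include <boost/compute/algorithm/sort.hpp>
#include <boost/compute/container/vector.hpp>
#include <vector>

int main() {
    namespace compute = boost::compute;
    compute::device dev = compute::system::default_device();  // GPU if available
    compute::context ctx(dev);
    compute::command_queue queue(ctx, dev);

    std::vector<float> host = { 3.f, 1.f, 2.f };
    compute::vector<float> device_vec(host.begin(), host.end(), queue);
    compute::sort(device_vec.begin(), device_vec.end(), queue);  // runs on the device
    compute::copy(device_vec.begin(), device_vec.end(), host.begin(), queue);
}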
If I go for OpenCL while wanting to support the widest range of GPUs, am I better off with a 1.x version of OpenCL?
Yes, currently the safest choice is to stick with 1.2 - and actually it is more than enough.
All major desktop GPU vendors (Intel, AMD, nVidia) support at least OpenCL 1.2.
Actually, only Nvidia hasn't released official 2.0 support - it is still in beta.
Also note that some older GPUs will never support more than OpenCL 1.2.
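If you want to verify the 1.2 feature level at run time rather than just assume it, the platform version string can be queried; a small sketch:

// Sketch: check each platform's reported OpenCL version against 1.2.
#include <CL/cl.h>
#include <cstdio>
#include <vector>

int main() {
    cl_uint n = 0;
    clGetPlatformIDs(0, nullptr, &n);
    std::vector<cl_platform_id> platforms(n);
    clGetPlatformIDs(n, platforms.data(), nullptr);
    for (cl_platform_id p : platforms) {
        // Returns e.g. "OpenCL 1.2 CUDA 8.0.0"; parse out major.minor.
        char ver[128] = {};
        clGetPlatformInfo(p, CL_PLATFORM_VERSION, sizeof(ver), ver, nullptr);
        int major = 0, minor = 0;
        std::sscanf(ver, "OpenCL %d.%d", &major, &minor);
        std::printf("%s -> %s\n", ver,
                    (major > 1 || (major == 1 && minor >= 2)) ? "ok" : "too old");
    }
}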
I would like to develop a library with an algorithm that can run on the CPU or the GPU. The GPU can be Nvidia (then the algorithm will use CUDA) or not (then the algorithm will use OpenCL).
I would like to emulate a GPU in this project because maybe:
I will use different computers to develop the software, and some of them don't have a GPU.
The software will ultimately be executed on servers that may or may not have a GPU, and the unit tests must be executed and pass there.
Is there a way to emulate a GPU for unit testing purposes?
In the following link:
GPU Emulator for CUDA programming without the hardware
They show a solution, but only for CUDA, not for OpenCL, and the software they propose, GPUOcelot, is no longer actively maintained.
It depends on what you mean by emulation. You cannot emulate the speed of GPUs.
The GPU is architecturally very different from the CPU, with a lot of working threads (thousands, tens of thousands, ...), which is why we use it. The CPU can run only a few threads, even when you parallelize the code. They also have different instruction sets.
You can, however, emulate the execution using special software, like NVEmulate for NVIDIA GPUs and the OpenCL Emulator-Debugger for AMD.
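If the goal is unit testing rather than faithful emulation, another pragmatic option is to install a CPU OpenCL runtime (e.g. Intel's runtime or POCL) and let the tests fall back to a CPU device; a sketch of such device selection:

// Sketch: prefer a real GPU, otherwise fall back to a CPU OpenCL device so
// unit tests still run on machines without a GPU. Requires some CPU runtime
// (Intel's OpenCL runtime, POCL, ...) to be installed.
#include <CL/cl.h>
#include <vector>

cl_device_id pick_test_device() {
    cl_uint n = 0;
    clGetPlatformIDs(0, nullptr, &n);
    std::vector<cl_platform_id> platforms(n);
    clGetPlatformIDs(n, platforms.data(), nullptr);
    for (cl_device_type type : { CL_DEVICE_TYPE_GPU, CL_DEVICE_TYPE_CPU }) {
        for (cl_platform_id p : platforms) {
            cl_device_id dev = nullptr;
            if (clGetDeviceIDs(p, type, 1, &dev, nullptr) == CL_SUCCESS)
                return dev;  // first GPU if any, otherwise first CPU device
        }
    }
    return nullptr;  // no OpenCL device at all
}

The kernels then really execute, just on the CPU, so the tests exercise the code without saying anything about GPU performance.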
A related question: GPU Emulator for CUDA programming without the hardware, where the accepted answer recommends gpuocelot for CUDA emulation.
I don't know the full state of the art but I can provide a very limited set of things to look at which may be useful.
The accepted answer for this question is now out of date.
The question of compiling and running GPU code for CUDA or OpenCL on a machine that does not natively support it has come up here several times (but sadly it's often taken as off-topic). This answer is for those questions too.
Many of the answers refer to software solutions that have not been maintained. There seem to be only two answers which stand the test of time, and they treat this as a mu question:
Use a real GPU - i.e. buy a cheap CUDA card if you don't already have one.
Rent someone else's GPU in the cloud.
However emulators do exist.
Also, GPU virtualization is well covered by the Wikipedia page. There is strong support for getting virtual machines to use the host's hardware.
Docker and VirtualBox, for example, both support GPU passthrough.
Reasons to emulate
To learn and keep up to date with changes to CUDA and OpenCL
To estimate the effect of the various APIs on performance.
To test that your code works on a variety of different platforms.
As a proxy for hardware you don't have access to (as per this question)
Kinds of emulation
For testing you might accept a slow implementation as long as it is compliant and reliable.
For production running on different hardware you would more likely accept similar, but not 100% equivalent, constructs (e.g. different warp size, different high-level libraries for FFT, ...) and much more complicated performance-optimized implementations of primitives. You would probably demand at least 80% of the CUDA speed for comparable hardware.
(Thanks to https://stackoverflow.com/users/13130048/sebastian for those two points)
For the second case you would likely need not just GPU virtualisation but additional optimisation passes.
Why are there fewer emulators, and why don't they survive the test of time?
GPUs are affordable. It is only high performance that costs.
GPUs (not to mention TPUs and FPGAs) are developing rapidly.
Some hardware tricks are kept secret from competitors so emulating actual hardware is difficult.
The CUDA and OpenCL standards are changing too, but less quickly.
There is arguably a need for more programmers who understand them. Compiling your code without running and testing it would simply be unprofessional. There would seem to be an obvious need for emulation where you don't have all the possible or interesting hardware combinations physically available.
That being the case, it's surprising that so many of these emulation projects have not stood the test of time or been endorsed/provided by GPU manufacturers.
There are some active emulation projects however.
Active GPU emulation projects
There are at least two active emulation projects maintained as of October 2022:
gpgpu-sim
oclgrind - an OpenCL device simulator
I cannot speak to how good these are and how commonly they are used compared to using real GPUs (either your own or rented).
Honorable mentions
CUDA-to-OpenCL source-to-source transpilers.
These appear to be maintained but are not themselves emulators.
CU2CL
coriander
Why is this not a solved problem?
There are a number of challenges to overcome. My take on these would be something like:
1. provide a runtime emulating a particular version of the CUDA or OpenCL standard
2. provide a compiler targeting this runtime (ideally gcc or clang)
3. get the backing of a vendor (e.g. Nvidia or the Khronos Group)
4. get the backing of a community (i.e. a decent userbase and set of contributors)
5. build support into a popular emulation environment (e.g. VirtualBox)
You could also argue the case that almost all people working in this area have access to real GPUs so this is not necessary at all.
The vendors of point 3 are doing well with points 1, 2 and 4.
An emulator has to both build on that and take some mindshare of its own.
This is an uphill struggle. I hope and believe there will be success in the future.
Looking at VirtualBox, the last discussion I can find is from 2011:
https://forums.virtualbox.org/viewtopic.php?f=9&t=41155
Seemingly retired projects
These have been mentioned in answers to previous other attempts to ask and answer this kind of question.
gpuocelot - no longer maintained
mcuda - looks unmaintained
cuda-waste - on google code which was frozen long ago
nvemulate - CUDA emulator from Nvidia - retired a while back
Other seemingly retired projects of interest:
openTPU - a TPU (tensor processing unit) emulator from 2017
gdev - 2010
Implementing Open-Source CUDA Runtime - paper from 2013
Earlier (out of date) questions:
GPU Emulator for CUDA programming without the hardware
Asked 2010 - most recent answer 2016
CUDA without CUDA enabled gpu
Asked 2010
How can I emulate a GPU for testing code written in Pytorch?
Asked 2021 - pytorch specific
CUDA code without a GPU
Asked 2014
CUDA on a system that has no GPU
Asked 2013
Using the built-in graphics cards without a NVIDIA graphics card, Can I use the CUDA and Caffe library?
Asked 2016
I understand that AMD created an alternative implementation of OpenCL that runs on x86 CPUs. This is very useful from the standpoint of simplified debugging. Unfortunately, OpenCL isn't an option for me.
Are there any OpenGL x86 implementations in existence? This would greatly ease my development process, at the cost of some CPU time, of course. I would then run the same code on a GPU later, with no changes necessary.
Mesa might be an option for you.
From their website:
Mesa is the OpenGL implementation for several types of hardware made by Intel, AMD and NVIDIA, plus the VMware virtual GPU. There's also several software-based renderers: swrast (the legacy Mesa rasterizer), softpipe (a gallium reference driver) and llvmpipe (LLVM/JIT-based high-speed rasterizer).
When using Mesa you can set the LIBGL_ALWAYS_SOFTWARE environment variable, which will cause Mesa to "always use software rendering".
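If you don't want to rely on the user's shell, the variable can also be set from inside the program before libGL is first used; a tiny sketch (POSIX setenv, so Linux/OS X):

#include <cstdlib>

int main() {
    // Must be set before libGL is first used / before the context is created.
    setenv("LIBGL_ALWAYS_SOFTWARE", "1", /*overwrite=*/1);
    // ... create the OpenGL context and run the application as usual ...
    return 0;
}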
OpenGL is not an instruction set, nor is it a library. It's a drawing API for interfacing with GPUs (yes, there are software-based rasterizers like Mesa softpipe). Most computers you can find these days support OpenGL.
When you use the OpenGL API it's not like your OpenGL calls get "translated" into a special instruction set for the GPU that's then part of your program. OpenGL operations will just create calls that eventually end up in a device driver, just like reading or writing to a file.
As I understand it, GPU vendors defined a standard interface to be used by OS developers to communicate with their specific drivers. So DirectX and OpenGL are just wrappers for that interface. When OS developers decide to create a new version of the graphics API, GPU vendors expand their interface (new routines are faster and older ones are kept for compatibility), and OS developers use this new part of the interface.
So, when it is said that GPU vendors' support for DirectX is better than for OpenGL, does it simply mean that GPU vendors primarily take into account Microsoft's future plans for the DirectX API structure and adjust the future development of this interface to their needs? Or are there technical reasons behind this?
As I understand it, GPU vendors defined a standard interface to be used by OS developers to communicate with their specific drivers. So DirectX and OpenGL are just wrappers for that interface.
No, not really. DirectX and OpenGL are just specifications that define APIs. But a specification is nothing more than a document, not software. The OpenGL API specification is controlled by Khronos; the DirectX API specification is controlled by Microsoft. Each OS then defines a so-called ABI (Application Binary Interface) that specifies which system-level APIs are supported by the OS (OpenGL and DirectX are system-level APIs) and what rules an actual implementation must adhere to when being run on the OS in question.
The actual OpenGL or Direct3D implementation happens in the hardware's drivers (and in fact the hardware itself is part of the implementation as well).
When OS developers decide to create a new version of the graphics API, GPU vendors expand their interface
In fact, it's the other way round: most of the graphics API specifications are laid out by the graphics hardware vendors. After all, they are closest to where the rubber hits the road. In the case of Khronos, the GPU makers are part of the controlling group of Khronos. In the case of DirectX, the hardware makers submit drafts to, and review the changes and suggestions made by, Microsoft. But in the end each new API release reflects the common denominator of the capabilities of the next hardware generation in development.
So, when it is said that GPU vendors' support for DirectX is better than for OpenGL, does it simply mean that GPU vendors primarily take into account Microsoft's future plans for the DirectX API structure and adjust the future development of this interface to their needs?
No, it means that each GPU vendor implements its own version of OpenGL and of the Direct3D backend, which is where all the magic happens. However, OpenGL puts a lot of emphasis on backward compatibility and ease of transition to newer functionality, whereas Direct3D development is quick to cut the ties with earlier versions. This also means that full-blown compatibility-profile OpenGL implementations are quite complex beasts. That's also the reason why recent versions of the OpenGL core profile did (overdue) work in cutting down support for legacy features; this reduction of API complexity is quite a liberating thing for developers. If you develop purely for a core profile, it simplifies a lot of things; for example, you no longer have to worry about a plethora of internal state when writing a plugin.
Another factor is that for Direct3D there's exactly one shader compiler, which is not part of the driver infrastructure/implementation itself but is run at program build time. OpenGL implementations, however, must each provide their own GLSL shader compiler, which complicates things. IMHO the lack of a unified AST or intermediate shader representation is one of the major shortcomings of OpenGL.
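This difference is visible in application code: with OpenGL, plain GLSL source is handed to the driver and compiled at run time by that vendor's own compiler. A minimal sketch of that runtime compilation (assuming a current GL context and GL 2.0+ entry points):

// Each driver's own GLSL compiler runs inside glCompileShader, which is
// why shader behavior and error messages differ between vendors.
GLuint compileShader(GLenum stage, const char* source) {
    GLuint shader = glCreateShader(stage);
    glShaderSource(shader, 1, &source, nullptr);
    glCompileShader(shader);  // the vendor's compiler runs here, at run time
    GLint ok = GL_FALSE;
    glGetShaderiv(shader, GL_COMPILE_STATUS, &ok);
    if (ok != GL_TRUE) {
        char log[1024] = {};
        glGetShaderInfoLog(shader, sizeof(log), nullptr, log);
        // the log text is vendor-specific
    }
    return shader;
}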
There is not a 1:1 correspondence between the graphics hardware abstraction and a graphics API like OpenGL or Direct3D. WDDM, which is Windows Vista's driver model, defines things like common scheduling, memory management, etc., so that DirectX and OpenGL applications work interoperably, but very little of the design of DirectX, OpenGL or GPUs in general has to do with this. Think of it like the kernel: nobody creates a CPU specifically to run it, and you do not have to re-compile the kernel every time a new iteration of a processor architecture comes out that adds a new subset of instructions.
Application developers and IHVs (GPU vendors, as you call them) are the ones who primarily deal with changes to GPU architecture. It may appear that the operating system has more to do with the equation than it actually does because Microsoft (more so) and Apple--who both maintain their own proprietary operating systems--are influential in the design of DirectX and OpenGL. These days OpenGL closely follows the development of commodity desktop GPU hardware, but this was not always the case - it contains baggage from the days of custom SGI workstations and lots of things in compatibility profiles have not been hardware native on desktop GPUs in decades. DirectX, on the other hand, has always followed desktop hardware. It used to be if you wanted an indication of where desktop GPUs were headed, D3D was a good marker.
OpenGL is arguably more complicated than DirectX because until recently it never let go of anything, whereas DirectX radically redefined the API and stripped legacy support with every iteration. Both APIs have settled down in recent years, but D3D still maintains a bit of an edge considering it only has to be implemented on a single platform and Microsoft writes the one and only shader compiler. If anything, the shader compiler and minimal feature set (void of legacy baggage) in D3D is probably why you get the impression that vendors support it better.
With the emergence of AMD Mantle, the desktop picture might change again (think back to the days of 3Dfx and Glide)... it certainly goes to show that OS developers have very little to do with graphics API design. NV and AMD both have proprietary APIs on the PS3, GameCube/Wii/WiiU, and PS4 that they have to implement in addition to D3D and OpenGL on the desktop, so the overall picture is much broader than you think.
I've been thinking of making an additional wrapper for my project to use OpenGL rather than Allegro. I was not sure which OpenGL version to go for, since I know that some computers cannot run recent versions such as v4.4. Also, I need a version which compiles without problems on Linux, Windows, and Mac.
You'll want to look at what kinds of graphics cards will be available on your target systems and bear some details in mind:
OpenGL up to 1.5 can be completely emulated in software in real time on most systems. You don't necessarily need hardware support for good performance.
OpenGL 1.4 has universal support. Virtually all hardware supports it.
Mac OS X only supports up to OpenGL 2.0 and OpenGL 2.1, depending on OS version and hardware. Systems using GMA950 have only OGL1.4 support. Mac OS X Lion 10.7 supports OpenGL 3.2 Core profile on supported hardware.
On Linux, it's not unusual for users to specifically prefer open source drivers over the alternative "binary blobs," so bear in mind that the version of Mesa that most people have supports only up to about OpenGL 2.1 compatibility. Upcoming versions have support for OpenGL 3.x. Closed-source "binary blobs" will generally support the highest OpenGL version for the hardware, including up to OpenGL 4.2 Core.
When considering what hardware is available to your users, the Steam Hardware Survey may help. Note that most users have DirectX 9-compatible hardware, which is roughly feature-equivalent to OpenGL 2.0. Wikipedia's OpenGL article also specifies what hardware came with initial support for which versions.
If you use a library like GLEW or GLEE or any toolkit that depends on them or offers similar functionality (like SFML, or even Allegro since 4.3), then you'll not need to concern yourself with whether your code will compile. These toolkits will take care of the details of enabling extensions and providing all of the symbols you need.
Given all of this, I'd suggest targeting OpenGL 2.1 to get the widest audience possible with the best feature support.
Your safe bet is OpenGL 2.1, it needs to be supported by the driver on your target system though. OpenGL ES, used on several mobile platforms, is basically a simplified OpenGL 2, so even porting to those would be fairly easy. I highly recommend using libGlew as VJo said.
It's less about operating systems, and more about video card drivers.
I think 1.4 is the highest version which enjoys support across all consumer graphics systems: ATI (AMD), Nvidia, and Intel IGP. Intel is definitely the limiting factor here; even when ATI or Nvidia hardware lacks support for a feature, they release OpenGL 4.1 drivers which use software to emulate the missing functionality. Not so with Intel.
OpenGL is not a library you usually compile and ship yourself (unless you're a Linux distributor and are packaging X.Org/Mesa). Your program just dynamically links against libGL.so (Linux/BSD), opengl32.dll (Windows; on 64-bit systems it's also called opengl32.dll, but it's in fact a 64-bit DLL) or the OpenGL framework (Mac OS X). This gives your program access to the system's OpenGL installation. The version/profile you want to use has no influence on the library you link against!
Then, after your program has been initialized, you can test which OpenGL version is available. If you want to use OpenGL 3 or 4 you'll have to jump through a few additional hoops on Windows to make full use of it, but normally some kind of wrapper helps you with context creation anyway, boiling it down to only a few lines.
Then in the program you can implement multiple code paths for the various versions. Usually lower-OpenGL-version code paths share a large subset with higher-version code paths. I recommend writing new code in the highest version available, then adding additional code paths (often just substitutions which can be done by C preprocessor macros or similar) for lower versions until you reach the lowest common denominator of features you really need.
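A sketch of such a runtime check and dispatch, where use_gl3_path and friends are placeholders for your own code paths (assuming the usual GL headers and a current context):

#include <cstdio>

// Placeholders for the application's own rendering code paths.
void use_gl3_path();
void use_gl2_path();
void use_fixed_function_path();

// Call with a current OpenGL context.
void choose_code_path() {
    // glGetString(GL_VERSION) returns e.g. "2.1 Mesa 8.0.4"; only the
    // leading "major.minor" matters here.
    const char* ver = reinterpret_cast<const char*>(glGetString(GL_VERSION));
    int major = 1, minor = 1;
    if (ver) std::sscanf(ver, "%d.%d", &major, &minor);
    if (major >= 3)
        use_gl3_path();             // VAOs, GLSL 1.30+
    else if (major == 2)
        use_gl2_path();             // shaders and VBOs
    else
        use_fixed_function_path();  // OpenGL 1.x fallback
}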
Then you need to use OpenGL 1.1 and load the needed (and supported) functions through wglGetProcAddress (on Windows) or glXGetProcAddress (on Linux).
Instead of using those two functions directly, you can use the GLEW library, which does that for you and is cross-platform.
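A sketch of what that looks like with GLEW (glewInit must be called after a context has been made current):

#include <GL/glew.h>

// Resolves all available entry points via wglGetProcAddress /
// glXGetProcAddress internally; needs a current OpenGL context.
bool initGLFunctions() {
    if (glewInit() != GLEW_OK)
        return false;
    return GLEW_VERSION_2_1 != 0;  // true if OpenGL 2.1 is usable
}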