Use CUDA without a CUDA-enabled GPU - ROCm or OpenCL - C++

I'm doing academic robotics research, so we need to integrate several libraries covering vision, sensing, and actuation.
It's a huge problem to use these libraries and to integrate them with each other, since some use CUDA, others ROCm, and others OpenCL, and I don't have NVIDIA hardware in my host machine.
I'm starting to research how to be somewhat independent of this (I'm willing to sacrifice some performance), and there are several projects that compile CUDA to portable C++ or translate CUDA to OpenCL, so in my opinion having either NVIDIA or AMD hardware shouldn't be a blocker.
I currently have these libraries in mind:
https://github.com/hughperkins/coriander (converts CUDA to OpenCL, so it can run on other cards)
https://github.com/ROCm-Developer-Tools/HIP (converts CUDA to portable C++).
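To make the HIP route concrete, here is a minimal sketch of what hipified code looks like (my own toy kernel, not from any of the libraries above; the names and sizes are placeholders). The CUDA runtime calls map one-to-one onto HIP equivalents, and the same source builds for both AMD and NVIDIA back ends:

```cpp
// CUDA original:
//   cudaMalloc(&d_buf, n * sizeof(float));
//   myKernel<<<blocks, threads>>>(d_buf, n);
//   cudaMemcpy(h_buf, d_buf, n * sizeof(float), cudaMemcpyDeviceToHost);
//
// The same code after hipification:
#include <hip/hip_runtime.h>

__global__ void myKernel(float* buf, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) buf[i] *= 2.0f;  // trivial placeholder work
}

int main() {
    const int n = 1024;
    float* d_buf = nullptr;
    hipMalloc(&d_buf, n * sizeof(float));                               // was cudaMalloc
    hipLaunchKernelGGL(myKernel, dim3(4), dim3(256), 0, 0, d_buf, n);   // was <<<...>>>
    float h_buf[n];
    hipMemcpy(h_buf, d_buf, n * sizeof(float), hipMemcpyDeviceToHost);  // was cudaMemcpy
    hipFree(d_buf);
    return 0;
}
```

As I understand it, hipify-perl/hipify-clang perform this rewrite mechanically for most of the CUDA runtime API.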
Can you suggest alternatives to these? There may be better ways to use CUDA-enabled libraries on a non-NVIDIA host.
The specific case is running the PoseCNN library (it was built with CUDA) without CUDA or NVIDIA hardware on an Ubuntu machine. https://github.com/yuxng/PoseCNN

Related

Loading clang-compiled OpenCL kernels into OpenCL programs

Using clang, I am able to compile OpenCL-C++ kernels (using clang -c). I am trying to load these compiled kernels into my OpenCL application, but I am at a loss as to how to achieve that. I am using Ubuntu 22.04, with an Intel CPU and an NVIDIA GPU. The GPU unfortunately does not support SPIR-V injection via clCreateProgramWithIL - if it did, I would happily take that route. I also cannot use clCreateProgramWithSource, because that unfortunately does not support C++ features inside the kernels.
Is there any way I can compile OpenCL-C++ kernels using clang and then load them into my OpenCL application? Or is there a way I can still use clCreateProgramWithSource with C++ features inside the kernels? Either way would work! (There has been a similar question here, but it focuses on macOS, which has its own OpenCL implementation and compiler, as far as I know.)
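For reference, this is the SPIR-V route I would take if my driver supported it - a minimal sketch, where the file name kernel.spv and the exact clang invocation (something like clang --target=spirv64 -c kernel.clcpp -o kernel.spv on a recent clang) are assumptions on my part; clCreateProgramWithIL itself is core OpenCL 2.1+:

```cpp
#define CL_TARGET_OPENCL_VERSION 300
#include <CL/cl.h>
#include <fstream>
#include <iterator>
#include <vector>

int main() {
    cl_platform_id platform;
    clGetPlatformIDs(1, &platform, nullptr);
    cl_device_id device;
    clGetDeviceIDs(platform, CL_DEVICE_TYPE_GPU, 1, &device, nullptr);
    cl_int err;
    cl_context ctx = clCreateContext(nullptr, 1, &device, nullptr, nullptr, &err);

    // Read the clang-produced SPIR-V module from disk (hypothetical name).
    std::ifstream f("kernel.spv", std::ios::binary);
    std::vector<char> il((std::istreambuf_iterator<char>(f)),
                         std::istreambuf_iterator<char>());

    // Hand the IL to the driver -- this is the call my GPU rejects.
    cl_program prog = clCreateProgramWithIL(ctx, il.data(), il.size(), &err);
    clBuildProgram(prog, 1, &device, nullptr, nullptr, nullptr);
    // ... create kernels with clCreateKernel as usual ...

    clReleaseProgram(prog);
    clReleaseContext(ctx);
    return 0;
}
```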

x86 32-bit Support for CUDA

I am working on a vision system, using OpenCV for image processing, and I have to deliver the whole system as a 32-bit ActiveX control to be integrated into an InduSoft Web Studio (IWS) application, as IWS is 32-bit.
How can I do that? I would need a 32-bit OpenCV build with CUDA support, and there is no 32-bit CUDA toolkit any more.
Can anyone please clarify the following from NVIDIA?
Native development using the CUDA Toolkit on x86_32 is unsupported. Deployment and execution of CUDA applications on x86_32 is still supported, but is limited to use with GeForce GPUs. To create 32-bit CUDA applications, use the cross-development capabilities of the CUDA Toolkit on x86_64.
Support for developing and running x86 32-bit applications on x86_64 Windows is limited to use with: GeForce GPUs, CUDA Driver, CUDA Runtime (cudart), CUDA Math Library (math.h), CUDA C++ Compiler (nvcc), CUDA Development Tools.
I can see the point but I can't find any direction on how to use the cross-development capabilities of the CUDA Toolkit on x86_64.
Echoing a comment into an answer -- yes, you can cross-compile to 32-bit output using a 64-bit CUDA toolchain on Windows. However, NVIDIA ceased delivering 32-bit CUDA application libraries many years ago. Quoting Robert Crovella:
This means that CUFFT, CUBLAS, NPP, and other such libraries are only provided for use when the x64 platform is selected. If OpenCV had any dependency on NPP, for example, you would be out of luck.
Given that OpenCV has dependencies on CUFFT, CUBLAS, and NPP, it is extremely unlikely that you can build and run a 32-bit OpenCV version using a modern CUDA toolkit, because of the lack of libraries.
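For completeness, the "cross-development capability" NVIDIA refers to is just nvcc's -m32 switch, which tells a 64-bit toolkit to emit a 32-bit binary. A minimal sketch (the file names are placeholders, and this only works on toolkits that still ship a 32-bit cudart):

```cpp
// kernel.cu -- cross-compile to a 32-bit Windows binary from a 64-bit host:
//
//   nvcc -m32 kernel.cu -o app32.exe
//
// -m32 selects the x86_32 target. The 64-bit toolkit supplies a 32-bit
// cudart for this, but NOT 32-bit CUFFT/CUBLAS/NPP, which is exactly the
// OpenCV problem described above.
#include <cstdio>

__global__ void hello() {
    printf("hello from a 32-bit CUDA build\n");  // device-side printf (cc2.0+)
}

int main() {
    hello<<<1, 1>>>();
    cudaDeviceSynchronize();  // wait so the device-side printf is flushed
    return 0;
}
```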

Setting up OpenCL SDKs

I have a university assignment that starts with setting up the Visual Studio environment for the following:
OpenCL SDKs:
AMD – AMD APP (Accelerated Parallel Processing)
NVIDIA – CUDA (Compute Unified Device Architecture)
Intel – Intel SDK for OpenCL Applications
OpenCL uses an "Installable Client Driver" (ICD) model
To allow platforms from different vendors to co-exist
Applications can choose a platform at runtime
And I don't know how to do it.
I need help, thanks!
I checked the settings by running Regedit, but I only found the defaults.
To make OpenCL available to pre-compiled programs, you simply need to install the NVIDIA, AMD, or Intel GPU drivers, depending on which GPU you have (note that some older Intel integrated GPUs don't support OpenCL).
For CPU OpenCL support, you can install the Intel runtime (Intel CPUs only) or POCL (open source, all modern CPUs supported, but you need to compile it from source). Unfortunately, AMD does not provide the APP SDK with CPU support anymore (although a simple web search will still get you the executables).
All of the above automatically register the respective ICD, so you don't have to do anything special about it; on Windows the registered ICDs appear in the registry under HKEY_LOCAL_MACHINE\SOFTWARE\Khronos\OpenCL\Vendors, which is the key you were looking for in Regedit.
For developing OpenCL applications, you need a standalone OpenCL ICD loader (.lib/.a and .dll) and the OpenCL headers (.h), which you can get from those links, though you need to compile the former yourself. These are also provided in ready-to-use binary form in OpenCL SDKs such as the ones provided by Intel (which includes Intel's OpenCL CPU runtime) or AMD.
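Once a runtime is installed, a quick sanity check is to enumerate the registered platforms from a tiny host program (a minimal sketch; link against OpenCL.lib on Windows or -lOpenCL on Linux):

```cpp
#include <CL/cl.h>
#include <cstdio>
#include <vector>

int main() {
    cl_uint count = 0;
    clGetPlatformIDs(0, nullptr, &count);               // how many ICDs are registered?
    std::vector<cl_platform_id> platforms(count);
    clGetPlatformIDs(count, platforms.data(), nullptr);
    for (cl_uint i = 0; i < count; ++i) {
        char name[256] = {};
        clGetPlatformInfo(platforms[i], CL_PLATFORM_NAME, sizeof(name), name, nullptr);
        printf("platform %u: %s\n", i, name);           // e.g. "NVIDIA CUDA"
    }
    return 0;
}
```

Each installed vendor runtime should show up here as one platform; if the list is empty, no ICD is registered.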

How to run a compiled CUDA code on a machine that doesn't have the CUDA toolkit installed?

Will any memory-bound application benefit more from the high memory throughput of a Tesla (cc2.0) than from the high number of CUDA cores of a GeForce (cc5.0)?
How can I run an exe file compiled on a machine with a GeForce card on another machine with a Tesla card, without installing VS2010 and CUDA on the Tesla machine (i.e. I want this exe file to be a standalone application)?
Will any memory-bound application benefit more from the high memory throughput of a Tesla (cc2.0) than from the high number of CUDA cores of a GeForce (cc5.0)?
A memory bound CUDA application will likely run fastest on whichever GPU has higher memory bandwidth. There are certainly other factors that could affect this, but this is a reasonable general principle. I'm not sure which 2 cards you are referring to, but it's entirely possible that a particular GeForce GPU could have higher memory bandwidth than a particular Tesla GPU. The cc2.0 Tesla GPUs (e.g. M2050, C/M2070, C/M2075, M2090) probably do have higher memory bandwidth (over 100GB/s) than the cc5.0 GeForce GPUs I am aware of (e.g. GeForce GTX 750/750Ti -- less than 90GB/s).
How can I run an exe file compiled on a machine with a GeForce card on another machine with a Tesla card, without installing VS2010 and CUDA on the Tesla machine (i.e. I want this exe file to be a standalone application)?
There are a few things that are pretty easy to do, which will make it easier to move a compiled CUDA code from one machine to another.
Make sure the CUDART library is statically linked. This should be the default setting for recent CUDA versions. You can read more about it here. If you are using other libraries (e.g. CUBLAS, etc.), you will want to make sure those other libraries are statically linked as well (if possible), or bundle the library (.so file on Linux, .dll on Windows) with your application.
Compile for a range of compute architectures (see the example command after this list). If you know, for example, that you only want to target cc2.0 and cc5.0, then make sure your nvcc compile command line contains switches that target both cc2.0 and cc5.0. This is a fairly complicated topic, but if you review the CUDA sample codes (makefiles or VS projects) you will find examples of projects that build for a wide variety of architectures. For maximum compatibility, you probably want to make sure you are including both PTX and SASS in your executables. You can read more about it here and here.
Make sure the machines have compatible drivers. For example, if you compile a CUDA code with the CUDA 7.0 toolkit, you will only be able to run it on a machine that has a compatible GPU driver installed (the driver is a separate item from the toolkit; a GPU driver is required to make use of a GPU, whereas the CUDA toolkit is not). For CUDA 7, this roughly means you want an r346 or newer driver installed on any machine on which you want to run CUDA 7-compiled code. Other CUDA toolkit versions have other associated minimum driver versions. For reference, this answer gives an idea of the approximate minimum GPU driver versions needed for some recent CUDA toolkit versions.
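Putting the first two points together, a representative build might look like this (a sketch only; the source file name is a placeholder, and the -gencode pairs are just the cc2.0/cc5.0 targets from the question):

```cpp
// main.cu -- representative build line combining points 1 and 2:
//
//   nvcc main.cu -o app -cudart=static \
//        -gencode arch=compute_20,code=sm_20 \
//        -gencode arch=compute_50,code=sm_50 \
//        -gencode arch=compute_50,code=compute_50
//
// -cudart=static links the CUDA runtime into the binary (the default on
// recent toolkits). Each -gencode pair embeds SASS for one architecture;
// the final compute_50/compute_50 entry embeds PTX so that newer, unknown
// GPUs can still JIT-compile the code at load time.
#include <cstdio>

__global__ void probe() {
    printf("kernel ran\n");  // trivial placeholder kernel
}

int main() {
    probe<<<1, 1>>>();
    cudaDeviceSynchronize();  // wait so the device-side printf is flushed
    return 0;
}
```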

Can you begin programming OpenCL without downloading an SDK?

I am trying to write a program that will run on both ATI and NVIDIA cards, and as such, I want to avoid using either vendor's SDK. Is it possible to do this without an SDK, using only VS2010 and Windows (XP or 7)?
If so, how can I go about configuring the VS2010 linker so that it will work?
Strictly speaking, no SDK is needed. In fact, no SDK is desired, as both the NVIDIA and AMD/ATI SDKs tie the code to their environments and, by extension, their hardware. What you do need is:
1) A GPU that will run OpenCL code. See this Question: List of OpenCl Compliant CPU/GPU
2) The OpenCL library (libOpenCL.so on Linux); this is usually included and installed with the Graphics driver, which may be downloaded from AMD or NVIDIA.
3) The OpenCL header files. These may be obtained from Khronos.org, but are included with all OpenCL SDKs that I am aware of. On a Linux system these typically go in the directory /usr/include/CL
The NVIDIA and AMD SDKs provide a number of utilities and wrappers that make using the OpenCL API easier, but they are not required for writing OpenCL code or for making API calls. These wrappers and utilities are not portable. If you're interested in writing portable code, stick to the OpenCL spec, also available from Khronos.org.
To write code, all that you need to do is include opencl.h in your host program, and then make the API calls that are necessary to set up the OpenCL environment and run your OpenCL program. Also, don't forget to link against the OpenCL library (give gcc the -lOpenCL flag under Linux).
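To illustrate, here is a minimal, self-contained host program of that shape - a sketch only, using nothing beyond the Khronos headers and the driver-supplied library (build with e.g. g++ main.cpp -lOpenCL on Linux, or link OpenCL.lib in VS2010):

```cpp
#define CL_TARGET_OPENCL_VERSION 120
#include <CL/cl.h>
#include <cstdio>

// A trivial kernel, passed to the driver as source at runtime.
static const char* kSource =
    "__kernel void twice(__global float* v) {"
    "    size_t i = get_global_id(0);"
    "    v[i] *= 2.0f;"
    "}";

int main() {
    cl_platform_id platform;
    clGetPlatformIDs(1, &platform, nullptr);
    cl_device_id device;
    clGetDeviceIDs(platform, CL_DEVICE_TYPE_DEFAULT, 1, &device, nullptr);
    cl_int err;
    cl_context ctx = clCreateContext(nullptr, 1, &device, nullptr, nullptr, &err);
    cl_command_queue q = clCreateCommandQueue(ctx, device, 0, &err);

    cl_program prog = clCreateProgramWithSource(ctx, 1, &kSource, nullptr, &err);
    clBuildProgram(prog, 1, &device, nullptr, nullptr, nullptr);
    cl_kernel k = clCreateKernel(prog, "twice", &err);

    float data[4] = {1, 2, 3, 4};
    cl_mem buf = clCreateBuffer(ctx, CL_MEM_READ_WRITE | CL_MEM_COPY_HOST_PTR,
                                sizeof(data), data, &err);
    clSetKernelArg(k, 0, sizeof(buf), &buf);
    size_t global = 4;
    clEnqueueNDRangeKernel(q, k, 1, nullptr, &global, nullptr, 0, nullptr, nullptr);
    clEnqueueReadBuffer(q, buf, CL_TRUE, 0, sizeof(data), data, 0, nullptr, nullptr);
    printf("%g %g %g %g\n", data[0], data[1], data[2], data[3]);  // expect 2 4 6 8
    return 0;
}
```

(Error checking is omitted for brevity; in real code you would check every returned cl_int.)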
OpenCL is a standard; it only defines conventions. To use it, you need a driver for your graphics card. NVIDIA, AMD (ATI), and Apple all provide such drivers. You definitely need an SDK.
@virtuallinux alludes to the right answer: if you're worried about accidentally using some vendor-specific extensions, get the Khronos SDK.