Running a LAPACK build on different processors - c++

If I build LAPACK on Windows / VS2010 on a machine with an Intel processor, will I be able to run the compiled code on another processor? I ask because I see that there are different instructions for compiling LAPACK on Intel and non-Intel processors.
I am thinking of using LAPACK as part of my project. I was going to build the project on my computer, which has an Intel processor, and then take the executable and simply run it on any other PC, regardless of the processor it has. I have already considered packages such as Eigen and MKL, but MKL is not cheap and Eigen was not fast enough for my application, so I decided to go with LAPACK / LAPACKE.

Related

How to locate and link with the exact same Intel MKL that MATLAB is shipped with?

The question in the title summarizes what I aim to achieve; the details are below.
The goal is to compile C++ based mex files that rely on Intel MKL function calls (e.g. matrix inverse calculation).
In order to do so, I would like to ensure that I use the exact same Intel MKL libraries which MATLAB is shipped with, so as to avoid any compatibility issues. In this particular case, this is:
>> version('-blas')
ans =
'Intel(R) Math Kernel Library Version 2018.0.3 Product Build 20180406 for Intel(R) 64 architecture applications, CNR branch AVX
'
>> version('-lapack')
ans =
'Intel(R) Math Kernel Library Version 2018.0.3 Product Build 20180406 for Intel(R) 64 architecture applications, CNR branch AVX
Linear Algebra PACKage Version 3.7.0
'
Warning: the Intel MKL BLAS & LAPACK above are not the same as the ones available for download from Intel's official website. I would prefer not to use the latter, for the compatibility reasons mentioned above.
In which MATLAB folder(s) are the above static/dynamic Intel MKL libraries located?
I have searched for them extensively in the many MATLAB folders, but unfortunately I could not find them. It seems that they are 'buried' somewhere deep inside MATLAB.
How is it possible to do this at all?
My setup: Windows 10, MATLAB R2019b, Intel MKL.
I am very grateful for any help. Thank you in advance.
On my Win64 machine I find them here:
[matlabroot '/extern/lib/win64/microsoft']
and here:
[matlabroot '/extern/lib/win64/mingw64']
The BLAS library is named libmwblas.lib, and the LAPACK library is named libmwlapack.lib.
For reference, note that in R2007a and earlier, The MathWorks shipped BLAS and LAPACK as a single combined library; they have been shipped as two separate libraries since R2007b.
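To show how they are used, here is a minimal sketch of a C++ MEX file that computes a matrix inverse through these shipped libraries (the dgetrf/dgetri signatures come from MATLAB's own extern/include/lapack.h; the file name and error messages are made up for illustration). Build it with something like: mex inv_lapack.cpp -lmwlapack
// inv_lapack.cpp, a hypothetical example: Y = inverse of a real square matrix X
#include "mex.h"
#include "lapack.h"   // shipped in [matlabroot '/extern/include']
#include <vector>

void mexFunction(int nlhs, mxArray *plhs[], int nrhs, const mxArray *prhs[])
{
    if (nrhs != 1 || !mxIsDouble(prhs[0]) || mxIsComplex(prhs[0]) ||
        mxGetM(prhs[0]) != mxGetN(prhs[0]))
        mexErrMsgTxt("Expected one real, square, double matrix.");

    mwSignedIndex n = (mwSignedIndex)mxGetM(prhs[0]);
    plhs[0] = mxDuplicateArray(prhs[0]);          // invert a copy, in place
    double *A = mxGetPr(plhs[0]);

    std::vector<mwSignedIndex> ipiv(n);
    std::vector<double> work(n * n);
    mwSignedIndex lwork = n * n, info = 0;

    dgetrf(&n, &n, A, &n, ipiv.data(), &info);    // LU factorization via libmwlapack
    if (info == 0)
        dgetri(&n, A, &n, ipiv.data(), work.data(), &lwork, &info);
    if (info != 0)
        mexErrMsgTxt("Matrix is singular to working precision.");
}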

How can we distribute compiled source code if it is specific to the hardware it was compiled on?

Suppose we take a compiled language, for example C++, and an example framework, say Qt. Qt's source code is publicly available, and users can also download binary files and use the API directly. My question is: when the Qt developers compiled their code, it was compiled for their specific hardware and operating system. I understand that much software requires recompilation for different operating systems (including 32- vs 64-bit) and offers multiple downloads on its website. But why does this not go even further? Is the build not also hardware specific, which would make the redistribution of compiled executables extremely frustrating to produce?
Code gets compiled to a target base CPU (e.g. 32-bit x86, x86_64, or ARM), but not necessarily a specific processor like the Core i9-10900K. By default, the compiler typically generates the code to run on the widest range of processors. And Intel and AMD guarantee forward compatibility for running that code on newer processors. Compilers often offer switches for optimizing to run on newer processors with new instruction sets, but you rarely do that since not all your customers have that config. Or perhaps you build your code twice (once for older processors, and an optimized build for newer processors).
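To make the two-builds idea concrete, here is a minimal sketch of runtime dispatch (assuming GCC or Clang on x86; __builtin_cpu_supports and the target attribute are compiler-specific extensions) that picks an AVX2 path only when the customer's processor supports it:
#include <cstdio>

__attribute__((target("avx2")))
static void kernel_avx2()     { std::puts("using the AVX2 code path"); }

static void kernel_baseline() { std::puts("using the baseline x86-64 code path"); }

int main() {
    __builtin_cpu_init();                      // required before __builtin_cpu_supports on GCC
    if (__builtin_cpu_supports("avx2"))
        kernel_avx2();                         // newer processors with the AVX2 instruction set
    else
        kernel_baseline();                     // widest-compatibility path
    return 0;
}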
There's also a concept called cross-compiling. That's where the compiler generates code for a completely different processor than the one it runs on. Such is the case when you build your iOS app on a Mac: the compiler itself is an x86_64 program, but it generates ARM instructions to run on the iPhone.
Code gets compiled and linked with a certain set of OS APIs and external runtime libraries (including the C/C++ runtime). If you want your code to run on Windows 7 or OS X Mavericks, you wouldn't statically link to an API that only exists on Windows 10 or macOS Big Sur. The code would compile, but it wouldn't run on the older operating systems. Instead, you'd use a workaround or conditionally load the API if it is available. Microsoft and Apple provide forward compatibility by keeping those same runtime library APIs available on later OS releases.
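A small sketch of that conditional-loading workaround on Windows, using GetTickCount64 (an API that only exists on Vista and later) as the stand-in example:
// Resolve the newer API at runtime so the same binary still starts on older Windows.
#include <windows.h>
#include <cstdio>

typedef ULONGLONG (WINAPI *GetTickCount64Fn)(void);

int main() {
    HMODULE kernel32 = GetModuleHandleW(L"kernel32.dll");
    GetTickCount64Fn pGetTickCount64 = reinterpret_cast<GetTickCount64Fn>(
        GetProcAddress(kernel32, "GetTickCount64"));

    if (pGetTickCount64)
        std::printf("uptime: %llu ms\n", pGetTickCount64());  // newer OS path
    else
        std::printf("uptime: %lu ms\n", GetTickCount());      // fallback for older OS
    return 0;
}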
Additionally, Windows supports running 32-bit processes on 64-bit chips and OSes. Mac can even emulate x86_64 on their new ARM-based devices coming out later this year. But I digress.
As for Qt, they actually offer several pre-built configurations for their reference binary downloads, because, at least on Windows, the MSVCRT (the C runtime from Visual Studio) is closely tied to the compiler version. So they offer various downloads to match the configuration you want to build your code for (32-bit, 64-bit, VS2017, VS2019, etc.). When you put together a complete application with third-party dependencies, these build, linkage, and CPU/OS configurations all have to be accounted for.

x86 32-bit Support for CUDA

I am working on a vision system and using OpenCV for image processing. I have to present the whole system as a 32-bit ActiveX control to be integrated into an InduSoft Web Studio (IWS) application, as IWS is 32-bit.
How can I do that, given that I would need a 32-bit OpenCV build with CUDA support and there is no 32-bit CUDA toolkit any more?
Can anyone please clarify the following from NVIDIA?
Native development using the CUDA Toolkit on x86_32 is unsupported. Deployment and execution of CUDA applications on x86_32 is still supported, but is limited to use with GeForce GPUs. To create 32-bit CUDA applications, use the cross-development capabilities of the CUDA Toolkit on x86_64.
Support for developing and running x86 32-bit applications on x86_64 Windows is limited to use with: GeForce GPUs, CUDA Driver, CUDA Runtime (cudart), CUDA Math Library (math.h), CUDA C++ Compiler (nvcc), CUDA Development Tools
I can see the point but I can't find any direction on how to use the cross-development capabilities of the CUDA Toolkit on x86_64.
Echoing a comment into an answer: yes, you can cross-compile to 32-bit output using a 64-bit CUDA toolchain on Windows. However, NVIDIA ceased delivering 32-bit CUDA application libraries many years ago. Quoting Robert Crovella:
This means that CUFFT, CUBLAS, NPP, and other such libraries are only provided for use when the x64 platform is selected. If OpenCV had any dependency on NPP, for example, you would be out of luck
Given OpenCV has dependencies on CUFFT, CUBLAS, and NPP, it is extremely unlikely that you can build and run a 32 bit OpenCV version using a modern CUDA toolkit because of the lack of libraries.
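For the cross-development question itself: the source of a 32-bit CUDA application is ordinary code; what changes is the build invocation, e.g. nvcc -m32 (the --machine 32 switch of the 64-bit toolkit). Below is a hypothetical host-only sketch that sticks to the CUDA runtime API, which NVIDIA does list as available for x86_32 deployment:
// query32.cpp, a hypothetical example; build with e.g.:  nvcc -m32 query32.cpp -o query32.exe
#include <cuda_runtime_api.h>   // CUDA runtime (cudart), one of the few 32-bit-supported pieces
#include <cstdio>

int main() {
    int n = 0;
    cudaError_t err = cudaGetDeviceCount(&n);
    if (err != cudaSuccess) {
        std::printf("CUDA error: %s\n", cudaGetErrorString(err));
        return 1;
    }
    std::printf("found %d CUDA device(s)\n", n);   // GeForce GPUs only, per the quote above
    return 0;
}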

TBB Intel Threading Building Blocks for Raspberry Pi 3

So I am trying to compile Intel's TBB C++ library, which enables parallelism in programs. I particularly need it for C++React, a library that provides reactive programming facilities (e.g. asynchronous loops) for a project I am doing.
I have figured out how to compile it for the Raspberry Pi 2, but my problem is that the guides I have seen only cover the ARMv7-A architecture.
Currently, when I try to build a project that uses TBB as a dependency, I get this error:
In file included from /home/pi/tbb43_20150611oss/include/tbb/tbb_machine.h:247:0,
from /home/pi/tbb43_20150611oss/include/tbb/task.h:25,
from /home/pi/tbb43_20150611oss/include/tbb/task_group.h:24,
from /home/pi/cpp.react-master/include/react/engine/PulsecountEngine.h:18,
from /home/pi/cpp.react-master/src/engine/PulsecountEngine.cpp:7:
/home/pi/tbb43_20150611oss/include/tbb/machine/gcc_armv7.h:31:2: error: #error compilation requires an ARMv7-a architecture.
#error compilation requires an ARMv7-a architecture.
I just want to know how I can port TBB to the ARMv8 (Cortex-A53) used by the new Raspberry Pi.
An easy solution such as replacing __ARM_ARCH_7A__ in gcc_armv7.h would be nice, but how do people go about porting TBB to other architectures?
Thank you
If you want to contribute to TBB (e.g. to port it for some other architecture), you can go to "submit contribution" page on the open source site and send your patch.
To port TBB to ARMv8, you have at least several options:
If ARMv8 and ARMv7 are very similar, you can try to extend the check on line 30 of gcc_armv7.h to accept ARMv8 as well (see the sketch after this list);
If ARMv8 and ARMv7 are quite different, you can create a gcc_armv8.h (or a gcc_arm header supporting both v7 and v8) and improve the logic in tbb_machine.h near lines 246-248;
Theoretically, if gcc on ARMv8 supports built-in atomics, you can use gcc_generic.h on ARMv8 (see tbb_machine.h:249).
It looks like you do not need to change the makefiles, but I'd recommend running make test to be sure that the modified TBB works correctly on your system.
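As a concrete illustration of the first option, the widened guard in gcc_armv7.h might look like the sketch below (the ARMv8 macro names are taken from GCC's predefined macros, not from any official TBB patch; whether the ARMv7 atomics really behave on your chip is exactly what make test should confirm):
/* tbb/machine/gcc_armv7.h, around line 30: hypothetical widened check */
#if !(defined(__ARM_ARCH_7A__) || defined(__ARM_ARCH_8A__) || defined(__aarch64__))
#error compilation requires an ARMv7-a architecture.
#endif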
[UPDATE] TBB has been ported to ARMv8 since version 2018 U5.
Latest update, August 2018:
Check out my git: https://github.com/abhiTronix/TBB_Raspberry_pi
The latest binary (2018 Update 4) of TBB for the Raspberry Pi, packaged as a .deb file, compiled for a Raspberry Pi 2/3 Model B/B+ running Raspbian Stretch.
Enjoy ;)

Use Intel OpenCL.dll alongside an NVIDIA CUDA installation

I have a computer that has an Intel CPU and an NVIDIA GPU, running Windows 7. I have a software module that is written in NVIDIA CUDA, and another module written in OpenCL. I would like to run the OpenCL module on the CPU, using the Intel implementation of OpenCL, and at the same time, use the CUDA module.
In my system I installed first the CUDA SDK, and then the SDK from Intel.
I've compiled the program in Visual Studio 2012, instructing the linker to use Intel's library (and I compiled against the OpenCL headers provided by Intel).
However, when I run a simple program to query the hardware, I'm only able to see the NVIDIA card.
I've tried modifying the Windows Registry and the PATH variable, with no luck. When I query the dependencies with Dependency Walker, I see that the program depends on a DLL located in c:\windows\system32, which is not the folder where the Intel DLL is. I've tried deleting this DLL, but I still see this dependency, and I'm only able to access the GPU.
Any idea about what could be happening?
On Windows, "OpenCL.dll" is the ICD provided by Khronos and redistributed by AMD, NVIDIA and Intel.
The actual drivers are referenced by the Registry, and the ICD enumerates them all.
When you query the OpenCL platforms, you'll see one for each installed driver (AMD, NVIDIA, Intel).
Within each platform there will be one or more devices; for example, in the NVIDIA platform you'll find your NVIDIA GPU, and under the Intel platform you'll find your CPU.
Don't replace OpenCL.dll.
Run clinfo or GPU-Z to see which OpenCL platforms and devices are visible.
Re-install the Intel CPU driver (a new one was posted just two days ago) to make sure their driver is installed.
Note: your CPU needs to have SSE 4.2 for the Intel CPU driver to work.
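If you'd rather check from code than with clinfo, here is a minimal sketch that asks the ICD for every installed platform (plain OpenCL 1.x API; link against OpenCL.lib):
// platforms.cpp, a hypothetical example: lists every platform the ICD enumerates
#include <CL/cl.h>
#include <cstdio>

int main() {
    cl_uint count = 0;
    clGetPlatformIDs(0, nullptr, &count);        // ask how many platforms exist
    if (count == 0) { std::puts("no OpenCL platforms found"); return 1; }

    cl_platform_id platforms[16];
    if (count > 16) count = 16;
    clGetPlatformIDs(count, platforms, nullptr);

    for (cl_uint i = 0; i < count; ++i) {
        char name[256] = {0};
        clGetPlatformInfo(platforms[i], CL_PLATFORM_NAME, sizeof name, name, nullptr);
        std::printf("platform %u: %s\n", i, name);  // expect NVIDIA and Intel entries
    }
    return 0;
}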
You could try the Installable Client Driver (ICD) Loader; however, I have no experience with whether it works on Windows.
Or:
Since you don't want to use the GPU with OpenCL, you can simply copy the Intel OpenCL.dll into your working directory. The working directory is searched first when DLLs are loaded, so even if the NVIDIA OpenCL.dll is installed in your windows/system32 directory, the Intel library is found first and therefore loaded. There may be better solutions, such as loading the DLL on demand as discussed in "Dynamically load a function from a DLL", but as a quick solution it should work.
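A sketch of the load-on-demand variant, which bypasses the DLL search order entirely by loading the Intel runtime from an explicit path (the path below is an assumption; point it at wherever the Intel DLL actually lives) and resolving clGetPlatformIDs by hand:
// loadondemand.cpp, a hypothetical example
#include <windows.h>
#include <CL/cl.h>
#include <cstdio>

typedef cl_int (CL_API_CALL *PFN_clGetPlatformIDs)(cl_uint, cl_platform_id *, cl_uint *);

int main() {
    // Assumed install location of the Intel CPU runtime; adjust as needed.
    HMODULE lib = LoadLibraryW(L"C:\\Program Files (x86)\\Intel\\OpenCL SDK\\bin\\x64\\OpenCL.dll");
    if (!lib) { std::puts("could not load the Intel OpenCL.dll"); return 1; }

    PFN_clGetPlatformIDs pGetPlatformIDs = reinterpret_cast<PFN_clGetPlatformIDs>(
        GetProcAddress(lib, "clGetPlatformIDs"));

    cl_uint count = 0;
    if (pGetPlatformIDs && pGetPlatformIDs(0, nullptr, &count) == CL_SUCCESS)
        std::printf("Intel runtime reports %u platform(s)\n", count);

    FreeLibrary(lib);
    return 0;
}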