MAGMA library: difference between magma_dgemm and magmablas_dgemm - c++

In the most recent MAGMA linear algebra library (version 1.6.1), http://icl.cs.utk.edu/magma/software/, the testing code that exercises dgemm functionality (source code: testing_dgemm.cpp) calls both magma_dgemm and magmablas_dgemm. Can someone clarify the difference between the two? Which one is more general (not tied to just the GPU)?
Wirawan

An inspection of the source code reveals that magmablas_Xgemm is actually a C function that launches an appropriate gemm kernel on the GPU. Thus magmablas_Xgemm is a GPU-specific routine. On the other hand, magma_Xgemm is intended to be an accelerator-agnostic routine that (currently) can be used for either GPU (NVIDIA/AMD, ...) or MIC.
Reference files, relative to the MAGMA source directory (the CUDA edition):
./magmablas/dgemm_fermi.cu
./interface_cuda/blas_d.cpp

So, basically, MAGMA includes two gemm variants: magma_gemm, which wraps cublasgemm, and magmablas_*gemm, which is MAGMA's own open-source implementation.
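Whichever variant is called, the operation is the standard BLAS GEMM update C = alpha*A*B + beta*C on column-major matrices with explicit leading dimensions. As a rough sketch for checking results from either routine, here is a plain CPU reference for the no-transpose case. The name reference_dgemm is a hypothetical helper and is not part of MAGMA or cuBLAS.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical CPU reference (not part of MAGMA): C = alpha*A*B + beta*C,
// no-transpose case, column-major storage with leading dimensions lda/ldb/ldc.
// A is m x k, B is k x n, C is m x n.
void reference_dgemm(std::size_t m, std::size_t n, std::size_t k,
                     double alpha,
                     const std::vector<double>& A, std::size_t lda,
                     const std::vector<double>& B, std::size_t ldb,
                     double beta,
                     std::vector<double>& C, std::size_t ldc)
{
    assert(lda >= m && ldb >= k && ldc >= m);
    for (std::size_t j = 0; j < n; ++j) {
        for (std::size_t i = 0; i < m; ++i) {
            double sum = 0.0;
            for (std::size_t p = 0; p < k; ++p) {
                // Element (i, p) of A times element (p, j) of B.
                sum += A[i + p * lda] * B[p + j * ldb];
            }
            // As in BLAS, C is not read when beta is zero.
            const double old = (beta == 0.0) ? 0.0 : beta * C[i + j * ldc];
            C[i + j * ldc] = alpha * sum + old;
        }
    }
}
```

Comparing the output of magma_dgemm and magmablas_dgemm against such a reference on small matrices is one way to confirm that both compute the same result.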

Related

Compile code based on your library support

A portion of my C++ code depends on GPUs, so one of my colleagues who works on the same project cannot compile it.
For example, in one file there is this line:
#include "opencv2/xfeatures2d/cuda.hpp"
Or in another file there is this line of code:
cv::cuda::GpuMat imgGpu, descriptorsGpu, keypoints;
imgGpu.upload(img);
These compile only with CUDA (and GPU) support.
How can we solve this? My only solution was to introduce a macro in every source file containing this code, wrap each such section with the macro, and edit its value depending on whether the library support is available, but this is a kind of nightmare.
Any better solution?
PS: our project is makefile-based
A preferred approach is to isolate all GPU-dependent code in a separate library. It may be worth building a mock or dummy substitute library that exposes the same API but does not require CUDA. This separation of responsibilities may prove invaluable if one day you need to replace CUDA with Vulkan or some other framework.
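A minimal sketch of that separation, under stated assumptions: the interface FeatureExtractor, the stub CpuFeatureExtractor, the factory make_extractor and the macro USE_CUDA are all hypothetical names, and the stub returns placeholder data rather than real descriptors. The point is that the macro appears in exactly one place, and the rest of the project only sees a CUDA-free header.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <vector>

// Hypothetical interface: the only thing the rest of the project includes.
// It must not mention any CUDA or cv::cuda types.
class FeatureExtractor {
public:
    virtual ~FeatureExtractor() = default;
    virtual std::string backend() const = 0;
    // Stand-in for descriptor computation on an image buffer.
    virtual std::vector<float> extract(const std::vector<float>& image) const = 0;
};

// Dummy substitute: same API, no GPU needed. Returns zeroed placeholders.
class CpuFeatureExtractor : public FeatureExtractor {
public:
    std::string backend() const override { return "cpu-stub"; }
    std::vector<float> extract(const std::vector<float>& image) const override {
        return std::vector<float>(image.size(), 0.0f);
    }
};

#ifdef USE_CUDA
// Assumed to be defined in a separate GPU-only library that wraps the
// cv::cuda calls; it is compiled and linked only when CUDA is available.
std::unique_ptr<FeatureExtractor> make_gpu_extractor();
#endif

// Single place where the build configuration is consulted.
std::unique_ptr<FeatureExtractor> make_extractor() {
#ifdef USE_CUDA
    return make_gpu_extractor();
#else
    return std::unique_ptr<FeatureExtractor>(new CpuFeatureExtractor());
#endif
}
```

In a makefile-based project, the GPU source file and the -DUSE_CUDA flag would be added only on machines where CUDA is present, so a colleague without a GPU builds and links against the stub.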

How to tell if SuiteSparse/CHOLMOD is using GPU?

I built Julia, which incorporates SuiteSparse, from scratch. When building the SuiteSparse dependency I ensured the instructions were followed for setting the relevant parts of the SuiteSparse_config.mk file.
However, having completed the build the execution time for c = A\b with 220k unknowns (very regular structure for A) isn't changed.
How can I test whether CHOLMOD is actively using the GPU or not?
I did notice that something similar was asked here. It was for a C/CUDA environment, but perhaps it applies.
From that answer:
Only the long integer version of CHOLMOD can leverage GPU acceleration.
The long integer version is distinguished by api calls like cholmod_l_start instead of cholmod_start.
It may be the case that Julia does not use the "long integer" version of CHOLMOD calls. I see no evidence for it in cholmod.jl.
As I said earlier, perhaps one of the Julia Language developers will pipe up if you file the issue in the repo. Otherwise, you may need to build Julia after changing cholmod.jl first.

GNU scientific lib : gsl_blas_dcopy vs gsl_vector_memcpy

I am using the GNU Scientific Library and I was wondering what the differences are between these two functions for copying one vector to another:
gsl_blas_dcopy and gsl_vector_memcpy
Do you have any idea which one would be faster?
In the GSL manual, section 8.3.6, it says
However, it is useful to have a small number of utility functions
which do not require the full blas code. The following functions fall
into this category.
int gsl_vector_memcpy
So both are basically the same. If you already need BLAS functionality, use gsl_blas_dcopy.
Reportedly, a BLAS implementation tuned for your specific CPU might be the fastest.

C++ and Matlab combination

I am writing a simulation of some differential equation. My idea was the following:
1. Write the core simulation (moving forward in time, takes a lot of time) in C++
2. Do initialisation and the analysis of the results with a program like Matlab/Scilab
The reason for (1) is that C++ is faster if implemented correctly.
The reason for (2) is that for me it is easier to make analysis, like plotting etc..., with a program like Matlab.
Is it possible to do it like this, how do I call C++ from Matlab?
Or do you have some suggestions to do it in a different way?
You could certainly do as you suggest. But I suggest instead that you start by developing your entire solution in Matlab and only then, if its performance is genuinely holding your work back, consider translating key elements into C++. This will optimise the use of your time, possibly at the cost of your computer's time. But a computer is a modern donkey without a humane society to intervene when you flog it to death.
As you suggest, well-written C++ can be expected to be faster than interpreted Matlab. But ask yourself: what is Matlab written in? For much of its computationally intensive core functionality, Matlab calls libraries written in C++ (or whatever). Your task would be not to write code faster than interpreted Matlab, but faster than C++ (or whatever) written by specialists urged on by a huge market of installed software.
Yes, Matlab has a C/C++ API.
This API lets you:
Write C++ functions which can be invoked from Matlab
Read/Write data from a .mat file
Invoke the Matlab engine from C++
I am working on something similar to what you are trying to do. My approach is:
Import in C++ the input data from a .mat file
Run the simulation
Export the results back in a .mat file
The Matlab API is in C, and I suggest you write a C++ wrapper for your convenience.
To work with a Matlab mxArray, I suggest taking a look at the boost::multi_array library.
In particular you can initialize an object of type multi_array_ref from a Matlab mxArray like this:
boost::multi_array_ref<double,2> vec ( mxGetPr (p), boost::extents[10][10], boost::fortran_storage_order() );
This approach made the code much more readable.
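MATLAB stores array data in column-major order, which is why fortran_storage_order is passed above. To illustrate the index mapping such a view performs, here is a small sketch that needs neither Boost nor the MATLAB headers. ColMajorView is a hypothetical stand-in written for this illustration; it is not the Boost class and is far less general.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical minimal 2-D view over a raw column-major buffer, such as the
// pointer returned by mxGetPr. It does not own or copy the data.
template <typename T>
class ColMajorView {
public:
    ColMajorView(T* data, std::size_t rows, std::size_t cols)
        : data_(data), rows_(rows), cols_(cols) {}

    // Element (i, j) lives at offset i + j * rows in column-major storage.
    T& operator()(std::size_t i, std::size_t j) const {
        assert(i < rows_ && j < cols_);
        return data_[i + j * rows_];
    }

    std::size_t rows() const { return rows_; }
    std::size_t cols() const { return cols_; }

private:
    T* data_;
    std::size_t rows_;
    std::size_t cols_;
};
```

For a 3 x 2 buffer holding 1 to 6, the view reads the first column as 1, 2, 3 and the second as 4, 5, 6, and writes through the view modify the underlying buffer directly.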
You can call your own C, C++, or Fortran subroutines from the MATLAB command line as if they were built-in functions. These programs, called binary MEX-files, are dynamically-linked subroutines that the MATLAB interpreter loads and executes.
You need to set up a compiler first; see Setting up mex to use the Visual Studio 2010 compiler.
All about MEX-files here: http://www.mathworks.com/help/matlab/matlab_external/using-mex-files-to-call-c-c-and-fortran-programs.html.

Share Fortran library without revealing source code

I have software developed in-house. It is written in Fortran and consists of 3 kinds of files: 1) the solver files, 2) the models' files and 3) a file where the models used are defined. The solver also uses some libraries, namely LAPACK and HSL MA41. Usually, I select the needed models for the user, compile everything together and provide an executable.
I want to allow users to add their own models or modify the existing ones without being able to change/modify/see the solver source code.
One thought was to compile the solver into an object file. Then the user would compile the definition file and their models and link them together with the libraries. Is that possible? I guess the user must then have the same platform as the one the solver was compiled on (i.e., Intel compiler on Windows 64-bit)? So I'll need to build a library for every possible combination of OS/hardware/compiler?
Another idea is to send the solver source as well, but obfuscated. I can't find any tested/reliable solutions for that online. Is it a good option?
Thanks in advance.
You can distribute the object code in a library, as you propose. If the entry points for your code are in Fortran modules, then you also need to distribute the mod files (or equivalent for your compiler) that also result from compilation of the modules.
(If any of the entry points for your library code are external procedures then it is a convenience for your users if you provide interface blocks for those external procedures. These interface blocks can be in source form (the interface block contains no information beyond what your library's documentation would have to provide), or again could be pre-compiled into a module.)
Object code may be platform (architecture) specific, compiler specific, compiler version specific and in some cases compile option specific. Careful design and specification of the interface between your solver and the client models can mitigate some of the potential variation. For example, many platforms have a well-defined (perhaps through explicit specification or near-ubiquitous convention) C application binary interface, so interfaces described using the C equivalent are typically robust, at the cost of significant loss of capability compared with a same-processor Fortran-to-Fortran interface.
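As a rough illustration of what a C-level boundary between a closed solver and user-supplied models could look like, here is a sketch seen from the C++ side. Every name in it (model_rhs_fn, solver_integrate, constant_rate_model) is hypothetical, and the explicit Euler loop is only a placeholder for the real solver. On the Fortran side, the matching procedures would be declared with bind(C) using the ISO_C_BINDING module.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

extern "C" {
// Hypothetical model callback: writes dy/dt for state y (length n) into dydt.
typedef void (*model_rhs_fn)(double t, const double* y, double* dydt,
                             int n, void* user_data);

// Hypothetical solver entry point. Only this declaration would be shipped to
// users; the definition below stands in for the pre-compiled library.
int solver_integrate(model_rhs_fn rhs, double t0, double t1, int steps,
                     double* y, int n, void* user_data);
}

// Placeholder solver body (explicit Euler), standing in for the closed code.
// Returns 0 on success and 1 on invalid arguments.
extern "C" int solver_integrate(model_rhs_fn rhs, double t0, double t1,
                                int steps, double* y, int n, void* user_data)
{
    if (rhs == nullptr || y == nullptr || steps <= 0 || n <= 0) {
        return 1;
    }
    std::vector<double> dydt(static_cast<std::size_t>(n));
    const double dt = (t1 - t0) / steps;
    for (int s = 0; s < steps; ++s) {
        rhs(t0 + s * dt, y, dydt.data(), n, user_data);
        for (int i = 0; i < n; ++i) {
            y[i] += dt * dydt[i];
        }
    }
    return 0;
}

// Example user model: dy/dt equals a constant rate passed via user_data.
extern "C" void constant_rate_model(double /*t*/, const double* /*y*/,
                                    double* dydt, int n, void* user_data)
{
    const double rate = *static_cast<const double*>(user_data);
    for (int i = 0; i < n; ++i) {
        dydt[i] = rate;
    }
}
```

Because only plain C types cross the boundary (doubles, ints, raw pointers), the user's model can be built with a different compiler than the solver, which is the robustness the answer describes. The price is that Fortran features such as assumed-shape arrays or derived types with type-bound procedures cannot appear in the interface.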