Using OpenBLAS with GSL - C++

I compiled GSL and OpenBLAS from source with all default options in both cases. My GSL libraries are installed in /usr/local/lib and OpenBLAS in /opt/OpenBLAS/lib. How do I use OpenBLAS with GSL in C++?
The main reason I am doing this is that OpenBLAS utilizes all cores, which ATLAS does not in its default configuration. My main aim is to multiply two large matrices (10000 x 10000) and perform 2D convolutions. Is there a better alternative to OpenBLAS or GSL for this?
I am using:
Linux Mint 17.2
GCC version 4.8.4
20 Core Intel CPU
I have been experimenting with the same thing in Octave with OpenBLAS. Will I get a significant performance improvement by using C++?
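For what it's worth, the GSL manual documents that the reference CBLAS (-lgslcblas) can be replaced by any other CBLAS-conforming library at link time, which is exactly how OpenBLAS slots in. A hedged sketch using the install paths mentioned above (the source file name matmul.cpp is hypothetical):

```shell
# Link GSL against OpenBLAS's CBLAS instead of the reference -lgslcblas.
# Paths follow the install locations mentioned in the question.
g++ matmul.cpp -o matmul \
    -I/usr/local/include -L/usr/local/lib -lgsl \
    -L/opt/OpenBLAS/lib -lopenblas
# At run time, make sure the loader can find both libraries:
export LD_LIBRARY_PATH=/usr/local/lib:/opt/OpenBLAS/lib:$LD_LIBRARY_PATH
# OpenBLAS's thread count can be capped if needed:
export OPENBLAS_NUM_THREADS=20
```

The key point is simply omitting -lgslcblas and supplying -lopenblas in its place; the gsl_blas_* routines then dispatch into OpenBLAS's multi-threaded kernels.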

I would use an existing linear algebra library like Armadillo. AFAIK it wraps your BLAS implementation for matrix multiplications. I like it because it provides a syntax very similar to the one in Matlab or Octave.
Other linear algebra libraries like Eigen will also do the job.
But I do not expect them to perform (much) better than Octave or Matlab as long as the call to the underlying BLAS library remains the same. Also check out why Matlab is so fast and how Armadillo is parallelized.

Related

Multithreaded MKL + OpenMP compiled with GCC

My understanding, from reading the Intel MKL documentation and posts such as this--
Calling multithreaded MKL in from openmp parallel region --
is that combining OpenMP parallelization in your own code AND MKL's internal OpenMP threading for MKL functions such as DGESVD or DPOTRF is impossible unless you build with the Intel compiler. For example, I have a large linear system I'd like to solve using MKL, but I'd also like to take advantage of parallelization to build the system matrix (my own code, independent of MKL), in the same binary executable.
Intel states in the MKL documentation that 3rd party compilers "may have to disable multithreading" for MKL functions. So the options are:
OpenMP parallelization of your own code (standard #pragma omp ..., etc.) and single-threaded calls to MKL
multi-threaded calls to MKL functions ONLY, and single-threaded code everywhere else
use the Intel compiler (I would like to use gcc, so not an option for me)
parallelize both your code and MKL with Intel TBB? (not sure whether this would work)
Of course, MKL ships with its own OpenMP runtime, libiomp*, which gcc can link against. Is it possible to use this library to achieve parallelization of your own code in addition to the MKL functions? I assume some direct management of threads would be involved. However, as far as I can tell there are no iomp dev headers included with MKL, which probably answers that question (--> NO).
So it seems at this point like the only answer is Intel TBB (Thread Building Blocks). Just wondering if I'm missing something or if there's a clever workaround.
(Edit:) Another solution might be if MKL has an interface to accept custom C++11 lambda functions or other arbitrary code (e.g., containing nested for loops) for parallelization via whatever internal threading scheme is being used. So far I haven't seen anything like this.
Intel TBB will also enable better nested parallelism, which might help in some cases. If you want to use GNU OpenMP with MKL, there are the following options:
Dynamically selecting the interface and threading layer: link against the mkl_rt library and then
set the environment variable MKL_THREADING_LAYER=GNU prior to loading MKL,
or call mkl_set_threading_layer(MKL_THREADING_GNU);
Linking with the threading libraries directly (though the linked page does not mention GNU OpenMP explicitly): link against mkl_gnu_thread. This is not recommended when you are building a library, a plug-in, or an extension module (e.g. a Python package) that can be mixed with other components that might use MKL differently.
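A hedged sketch of the first (mkl_rt) option with gcc; the source file name is hypothetical, and the MKL library path is assumed to be visible to the linker:

```shell
# Build your own OpenMP-parallel code with gcc against the
# single dynamic MKL library:
g++ -fopenmp solver.cpp -o solver -lmkl_rt
# Tell MKL to use the GNU OpenMP runtime, so its internal threading
# shares the same runtime as the #pragma omp regions in your code:
export MKL_THREADING_LAYER=GNU
./solver
```

With both sides on libgomp, an OpenMP region that assembles the system matrix and a subsequent multi-threaded DPOTRF call can coexist in one binary.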

C++ templates and OpenBLAS

There exist C++ libraries such as Eigen or Boost::uBlas that implement matrix types and computations.
There also exist libraries such as LAPACK, Goto-BLAS, OpenBLAS and ATLAS that implement highly optimized dense matrix computations over floating-point types.
I was wondering whether some C++ libraries, perhaps through specialization, call OpenBLAS for the types supported by OpenBLAS. It would seem the best of both worlds.
I don't know about Boost::uBlas, but using the current version (3.3 or higher) of Eigen it is possible to link to "any F77 compatible BLAS or LAPACK libraries" so assuming OpenBLAS is F77 compatible, yes. See this for details.
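For reference, the Eigen mechanism is a compile-time switch rather than template specialization in user code; a hedged sketch (the source file name is hypothetical):

```shell
# Define EIGEN_USE_BLAS so Eigen routes its dense products through the
# external F77-compatible BLAS you link against, here OpenBLAS:
g++ -O2 -DEIGEN_USE_BLAS app.cpp -o app -L/opt/OpenBLAS/lib -lopenblas
# Add -DEIGEN_USE_LAPACKE (and -llapacke) to offload decompositions too.
```

So the Eigen types and expression templates are kept, while the supported floating-point kernels are delegated to OpenBLAS, which is the "best of both worlds" the question asks about.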

What is Armadillo+Atlas , Armadillo+OpenBLAS, Armadillo+uBLAS, Armadillo+MKL?

On many websites they talk about Armadillo+something else. What do they mean?
I use Armadillo library in form of
#include <armadillo>
in Linux environment.
On this website
http://nghiaho.com/?p=1726
Armadillo+OpenBLAS is mentioned. What does that mean? How do I use Armadillo+OpenBLAS?
UPDATE
It is now more than a year later. I will just add the point that Armadillo is a wrapper over implementations such as BLAS or OpenBLAS; it is not itself a matrix operation implementation.
Instead of linking Armadillo based code with BLAS, you link with OpenBLAS. This can be done manually, or the Armadillo installer can figure out that OpenBLAS is present. See the FAQ for details.
Basically you need to install OpenBLAS first, then install Armadillo (not from a Linux repository, but the downloaded version).
Armadillo can do its own math or it can call 3rd party libraries to do the math. ATLAS, BLAS, OpenBLAS, uBLAS, LAPACK and MKL are examples of such 3rd party libraries. If Armadillo does its own math, it will be single-threaded. Some of these 3rd party libraries can run multi-threaded, e.g. OpenBLAS. Some can use the GPU, e.g. nvBLAS from Nvidia. Note that nvBLAS implements only part of BLAS, and you still need another BLAS library for what nvBLAS does not cover.
You can control Armadillo by editing armadillo_bits/config.hpp or by using the -D compiler option to set the relevant preprocessor defines for your needs.
Something that might save you time: the order in which you link Armadillo and the 3rd party libraries is important. Armadillo calls into, say, LAPACK, and LAPACK calls into BLAS, so the order should be:
-larmadillo -llapack -lblas; otherwise you will get link errors.
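Putting the pieces of this answer together, a hedged sketch of the two link lines (the source file name is hypothetical):

```shell
# Default backends, in dependency order (Armadillo -> LAPACK -> BLAS):
g++ prog.cpp -o prog -larmadillo -llapack -lblas
# With OpenBLAS, which by default bundles its own LAPACK symbols,
# a single -lopenblas replaces both -llapack and -lblas:
g++ prog.cpp -o prog -larmadillo -lopenblas
```

The left-to-right order matters with traditional linkers: a library must appear after the objects or libraries that reference its symbols.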
Be careful with the OpenBLAS version, i.e. you should install version 0.2.14.
Otherwise you will have problems if you want to use multiple threads.
So:
1 - Remove everything that you have already installed (Armadillo or OpenBLAS).
2 - Install OpenBLAS ver 0.2.14.
3 - Install Armadillo (if you use the repository you probably will not have access to the latest version).
4 - Enjoy it!
In addition, you should use the flag -lopenblas instead of -lblas. Also, you must add the paths to the folders (include, lib) of the OpenBLAS package (previously downloaded and built). In my experience, the order and number of installed packages doesn't matter: I experimented with different versions of OpenBLAS packages without reinstalling Armadillo.

how to use lapack under windows

I want to use LAPACK and make a C++ matrix wrapper for it, but LAPACK is written in Fortran. There are some CLAPACK builds, but I want to build it from source: first compile the *.f and *.cpp files to object files, then link them into an application.
These are the apps and sources that I have:
Visual Studio Professional edition, Dev-C++, Ultimate++, MinGW, whatever
g95 and gfortran (under MinGW) compilers
LAPACK (latest source)
BLAS (included in LAPACK)
How can I build an application? Please help.
My operating system is Windows 7, my CPU is a Core2Duo, and I don't have the Intel Math Kernel Library.
You can use the official C bindings for LAPACK (LAPACKE), and then build your C++ wrapper around that. That avoids the issue of having to worry about the Fortran calling conventions, and the C bindings are a bit friendlier for C/C++ programmers than calling the Fortran routines directly.
Additionally, you could use one of the C++ matrix libraries that are already available, instead of rolling your own. I recommend Eigen.
PS: Eigen matrices/vectors have a data() member that allows calling LAPACK without having to make temporary copies.

GSL blas routines slow in Visual Studio

I just installed GSL and BLAS on Visual Studio 2010 successfully using this guide:
However, the matrix multiplications using cblas are ridiculously slow. A friend on Linux had the same problem. Instead of linking to BLAS via GSL, he linked directly to CBLAS (I don't exactly understand what this means, but maybe you do?) and it got about ten times as fast.
How can I do this in Visual Studio? In the file I downloaded I couldn't find any more files that I could build with Visual Studio.
BLAS was the original Fortran library of simple linear-algebra operations, such as multiplying or adding vectors and matrices. It implements the vector-vector, matrix-vector, and matrix-matrix operations.
Later, different libraries were created that do the same as the original BLAS but with better performance. The interface was preserved, so you can use any BLAS-compatible library, e.g. one from your CPU vendor.
This FAQ http://www.netlib.org/blas/faq.html has some libraries listed; wikipedia has another list: http://en.wikipedia.org/wiki/Basic_Linear_Algebra_Subprograms
The only complication with GSL is that it uses the C language. The Fortran BLAS interface may be mapped to C in various ways (the problem is how Fortran function names are translated to C symbol names; e.g. Fortran DGEMM may appear as dgemm_ or DGEMM in C). GSL uses the CBLAS convention: a cblas_ prefix, e.g. DGEMM is named cblas_dgemm.
So try some libraries from the lists and check whether the library provides the cblas_-prefixed aliases. If it does, GSL can use that library.