C++ libraries for dealing with distributed matrices on a grid

Which C++ libraries (or Fortran libraries with C++ interfaces) do you recommend for doing BLAS or Sparse BLAS operations on distributed matrices, in terms of speed and ease of use?

Use PBLAS directly. I don't happen to know of any C++ library that eases this. There is http://cppscalapack.sourceforge.net/, but it seems to be unmaintained (last update in 2004) and still in an alpha stage.
Distributed linear algebra is quite cumbersome to do, and it will involve a lot of work whatever library you happen to use. I therefore think that using PBLAS directly, abstracting the computations into classes as you go, is quite a sensible thing to do: understanding the Fortran interface is not the hard part of the problem.
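To illustrate, here is a minimal sketch of such a class-style wrapper around a direct PBLAS call. It assumes Fortran symbols with a trailing underscore (toolchain-dependent) and ScaLAPACK array descriptors that were set up beforehand with descinit_ on a BLACS process grid; dist_gemm is just a hypothetical name.

```c++
// Hypothetical thin C++ wrapper around the PBLAS pdgemm routine.
// Assumes Fortran symbols use a trailing underscore (toolchain-dependent).
extern "C" {
    void pdgemm_(const char* transa, const char* transb,
                 const int* m, const int* n, const int* k,
                 const double* alpha,
                 const double* a, const int* ia, const int* ja, const int* desca,
                 const double* b, const int* ib, const int* jb, const int* descb,
                 const double* beta,
                 double* c, const int* ic, const int* jc, const int* descc);
}

// C = alpha * A * B + beta * C on matrices distributed over a BLACS grid.
// 'a', 'b', 'c' are the local pieces owned by this process; the desc*
// arrays are ScaLAPACK array descriptors created earlier with descinit_.
void dist_gemm(int m, int n, int k, double alpha,
               const double* a, const int* desca,
               const double* b, const int* descb,
               double beta, double* c, const int* descc) {
    const int one = 1;  // PBLAS uses 1-based global indices, Fortran style
    pdgemm_("N", "N", &m, &n, &k, &alpha,
            a, &one, &one, desca,
            b, &one, &one, descb,
            &beta, c, &one, &one, descc);
}
```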

Related

Using MPI with Armadillo C++ for parallel diagonalization

There has been a post regarding usage of MPI with Armadillo in C++: here
My question is: have LAPACK and OpenBLAS implemented MPI support?
I could not find anything about this in their documentation so far.
There seems to be a different library called ScaLAPACK, which uses MPI. Is this library compatible with Armadillo, and is it any different from LAPACK in use?
I need to diagonalise extremely large matrices (over 1 TB of memory), so I need to spread the memory over multiple nodes of a cluster using MPI, but I don't know how to deal with that using Armadillo.
Do you have any useful tips/references on how to do that?
Any BLAS is single-process; some BLAS implementations are multi-threaded. So MPI has nothing to do with this: in an MPI run, each process calls a non-distributed BLAS routine.
ScaLAPACK is distributed-memory, based on MPI. It is very different from LAPACK: matrix handling is considerably more complicated. Some applications/libraries are able to use ScaLAPACK, but you cannot simply swap LAPACK out for ScaLAPACK: support for ScaLAPACK needs to be added explicitly.
Armadillo mentions threading support through OpenMP, and there is no mention of MPI. Therefore, you cannot use Armadillo across multiple nodes.
If you want to do distributed eigenvalue calculations, take a look at the PETSc library and the SLEPc package on top of it. Those are written in C, so they can easily (though not entirely idiomatically) be used from C++.
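For a rough idea of what that looks like in practice, here is a minimal SLEPc sketch for a distributed symmetric eigenproblem; the matrix assembly is elided and the global size is purely illustrative, so treat it as a starting point rather than definitive usage.

```c++
#include <slepceps.h>

// Minimal sketch: solve a distributed symmetric eigenproblem with SLEPc.
// Each MPI rank owns a block of rows of the matrix A.
int main(int argc, char** argv) {
    SlepcInitialize(&argc, &argv, NULL, NULL);

    Mat A;
    PetscInt n = 1000;                      // global size (illustrative)
    MatCreate(PETSC_COMM_WORLD, &A);
    MatSetSizes(A, PETSC_DECIDE, PETSC_DECIDE, n, n);
    MatSetFromOptions(A);
    MatSetUp(A);
    // ... each rank inserts its local rows with MatSetValues ...
    MatAssemblyBegin(A, MAT_FINAL_ASSEMBLY);
    MatAssemblyEnd(A, MAT_FINAL_ASSEMBLY);

    EPS eps;                                // eigenproblem solver context
    EPSCreate(PETSC_COMM_WORLD, &eps);
    EPSSetOperators(eps, A, NULL);
    EPSSetProblemType(eps, EPS_HEP);        // Hermitian (symmetric) problem
    EPSSetFromOptions(eps);                 // choose solver etc. at run time
    EPSSolve(eps);

    PetscInt nconv;
    EPSGetConverged(eps, &nconv);
    for (PetscInt i = 0; i < nconv; ++i) {
        PetscScalar kr, ki;
        EPSGetEigenpair(eps, i, &kr, &ki, NULL, NULL);
        PetscPrintf(PETSC_COMM_WORLD, "lambda = %g\n", (double)PetscRealPart(kr));
    }

    EPSDestroy(&eps);
    MatDestroy(&A);
    SlepcFinalize();
    return 0;
}
```

A typical run would be something like mpiexec -n 64 ./solver -eps_nev 10, with the number of requested eigenvalues chosen on the command line.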

Generalizing to multiple BLAS/LAPACK Libraries

I am developing a linear algebra tool in C++, which relies heavily on matrix multiplication and decompositions (like LU, SVD), and is meant to be applied to large matrices. I developed it using Intel MKL for peak performance, but I don't want to release an MKL-only version, as I assume it will not work for people without Intel hardware or who don't want to install MKL. Instead, I should release a more general code that is not MKL-specific, but rather allows the user to specify which implementation of BLAS and LAPACK they would like to use (e.g. OpenBLAS or ATLAS).
Although the function prototypes seem to be the same across implementations, there are several (helper?) functions and types that are specific to Intel MKL. For example, there is the MKL_INT type that I use, and also mkl_malloc. This article suggests using macros to redefine the types, which was also my first thought. I assume I would also then have macros for the headers as well.
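For reference, the macro approach mentioned above might look roughly like the following sketch; USE_MKL, blas_int, blas_alloc and blas_free are all hypothetical names, not part of any library.

```c++
// Hypothetical compile-time switch: hide the implementation-specific
// integer type and allocator behind neutral names.
#ifdef USE_MKL
  #include <mkl.h>
  typedef MKL_INT blas_int;
  #define blas_alloc(size) mkl_malloc((size), 64)
  #define blas_free(ptr)   mkl_free(ptr)
#else
  #include <stdlib.h>
  typedef int blas_int;               // assumes an LP64 BLAS build
  #define blas_alloc(size) malloc(size)
  #define blas_free(ptr)   free(ptr)
#endif
```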
I believe it is standard for code to be written such that it is agnostic to the BLAS/LAPACK implementation, and I wanted to know if there was a cleaner way than relying on macros--particularly since the latter would require recompiling the code to switch, which does not seem to be necessary for other tools I have used.
Most scientific codes that rely on BLAS/LAPACK calls are implementation-agnostic. They usually just require that the appropriate library is linked.
You've commented that the function prototypes are the same across implementations. This allows you to just have the prototypes in some myblas.h and mylapack.h headers then link whichever library you'd like to use.
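A minimal myblas.h along those lines might look like this sketch, assuming the common Fortran name mangling with a trailing underscore (some toolchains differ):

```c++
// myblas.h -- sketch of an implementation-agnostic BLAS prototype header.
#ifndef MYBLAS_H
#define MYBLAS_H

#ifdef __cplusplus
extern "C" {
#endif

/* C = alpha*op(A)*op(B) + beta*C (double-precision general matrix multiply) */
void dgemm_(const char* transa, const char* transb,
            const int* m, const int* n, const int* k,
            const double* alpha, const double* a, const int* lda,
            const double* b, const int* ldb,
            const double* beta, double* c, const int* ldc);

#ifdef __cplusplus
}
#endif

#endif /* MYBLAS_H */
```

Linking then selects the implementation (e.g. -lopenblas versus MKL's link line) with no source changes.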
It sounds like your primary concern is the implementation-specific stuff that you've utilized from MKL. The solution is simply not to use that stuff. For example, the MKL types like MKL_INT are not special. They are C typedefs defined so that the same code can build against the LP64 and ILP64 interface layers that MKL provides. See this table.
Likewise, stuff like mkl_malloc isn't special. It was introduced before the C standard had a thread-safe aligned allocator; in fact, that is essentially all mkl_malloc is. So instead just use aligned_alloc, or, if you don't want to commit to C11, use _mm_malloc, memalign, etc.
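For example, a portable stand-in for mkl_malloc(bytes, 64) might look like the following sketch; note that aligned_alloc requires the requested size to be a multiple of the alignment.

```c++
#include <cstdlib>

// Sketch: 64-byte-aligned allocation via C11/C++17 aligned_alloc.
double* alloc_aligned(std::size_t count) {
    std::size_t bytes = count * sizeof(double);
    bytes = (bytes + 63) & ~std::size_t{63};   // round up to multiple of 64
    return static_cast<double*>(std::aligned_alloc(64, bytes));
}
// Release with std::free(), just as mkl_free() pairs with mkl_malloc().
```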
On the other hand, MKL does provide some useful extensions to BLAS/LAPACK which aren't standardized (transposition routines, for example). However, this type of functionality is usually easy to reproduce with a special-case BLAS/LAPACK call, or simple enough to implement yourself. MKL also has internal threading if you choose to use it, but many other BLAS/LAPACK libraries offer that too.

Libraries for parallel distributed Cholesky decomposition in C/C++ in an MPI environment?

What libraries are available for parallel distributed Cholesky decomposition of dense matrices in C/C++ in an MPI environment?
I've found the ScaLAPACK library, which might be the solution I'm looking for. It seems a bit fiddly to call though, with lots of Fortran <-> C conversions to do, which makes me think that maybe it is not widely used, and that there are perhaps other libraries used instead?
Alternatively, are there some wrappers for ScaLAPACK that make it not too painful to use from C or C++, when one is already using MPI and MPI has already been initialized in the program?
Are these dense or sparse matrices?
Trilinos is a huge library for parallel scientific computation. The sub-package Amesos can link to Scalapack for parallel, direct solution of dense systems and to UMFPACK, SuperLU or MUMPS for sparse systems. Trilinos is mostly in C++, but there are Python bindings if that's your taste. It might be overkill, but it'll get the job done.
Intel MKL might also be a choice, since it calls ScaLAPACK internally. Note that Intel supports student use of this library, but in that case you have to use an open-source MPI implementation. The Intel forum is also very helpful.
Elemental is also an option. It is written in C++, which is surely a big advantage when you want to integrate it with your C/C++ application, and the project leader, Jack Poulson, is very friendly and helps a lot.
OpenBLAS, SuperLU and PETSc are also interesting and you may want to read more in my answer.
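To give a feel for the "fiddliness" the question mentions, a raw ScaLAPACK Cholesky call from C++ looks roughly like this sketch; the BLACS grid setup and the descinit_ call that builds the descriptor are elided, and Fortran symbols with a trailing underscore are assumed.

```c++
// Sketch: in-place Cholesky factorization of a distributed matrix
// via ScaLAPACK's pdpotrf. 'a_local' is this rank's local block of the
// globally distributed matrix; 'desca' is its ScaLAPACK descriptor.
extern "C" {
    void pdpotrf_(const char* uplo, const int* n, double* a,
                  const int* ia, const int* ja, const int* desca, int* info);
}

int distributed_cholesky(double* a_local, const int* desca, int n) {
    const int one = 1;   // 1-based global indices, Fortran style
    int info = 0;
    pdpotrf_("L", &n, a_local, &one, &one, desca, &info);
    return info;         // 0 on success; > 0 means not positive definite
}
```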

(Re)Starting with C++ (for scientific computing)

I have a fair hang of programming in various languages. I have been implementing my research code in MATLAB during the past few months, and for the first time really noticed the difference in execution speed between MATLAB and C (much as I love MATLAB's blazingly fast prototyping capabilities).
I am looking to pick up C++ and start using it in my research. I am aware of OOP and have programmed a fair bit of Java (quite a while back) and C++ (even longer back). I would like to really get deep into C++ now, and hence need suggestions for resources on the same:
What C++ things do I need to pick up (the STL and so on) to really make good use of C++?
What is a good tutorial/manual to get started with?
What are the numerical/scientific libraries for C++? GSL? Is there an equivalent (feature-wise) of SciPy/NumPy for C++?
I shall be programming on Linux, so I shall be using g++.
Any pointers to previous SO questions are also appreciated.
You'll want to get to grips with parallel programming as quickly as possible. For message-passing I like this book by Karniadakis and Kirby. For shared-memory programming with OpenMP, this one is the best of the books available.
If you can get access to them, then Intel's Threading Building Blocks, Math Kernel Library, and Integrated Performance Primitives are good to have. If not, there are plenty of open-source alternatives; start looking at Netlib.
Oh, I almost forgot Boost, which is a must.
Regarding numerical stuff like NumPy, you should have a look at both:
Blitz++ http://www.oonumerics.org/blitz/
and
Jama/TNT http://math.nist.gov/tnt/download.html
On the library side, check out Armadillo. It gives you almost the full extent of MATLAB's array-manipulation syntax and uses LAPACK and BLAS (ATLAS) under the hood.
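As a small taste of that syntax, here is a minimal (single-node) sketch, assuming Armadillo is installed and linked against a BLAS/LAPACK:

```c++
#include <armadillo>

// Sketch: MATLAB-like syntax in Armadillo; eig_sym dispatches to LAPACK.
int main() {
    arma::mat A = arma::randu<arma::mat>(4, 4);
    arma::mat B = A.t() * A;              // symmetric positive semi-definite

    arma::vec eigval;
    arma::mat eigvec;
    arma::eig_sym(eigval, eigvec, B);     // like MATLAB's [V, D] = eig(B)

    eigval.print("eigenvalues:");
    return 0;
}
```

Compile with something like g++ example.cpp -o example -larmadillo.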
This tutorial absolutely rocks, but you may not want to tackle it initially.
http://www.parashift.com/c++-faq/
Make sure to read up on the STL (Standard Template Library) and other topics, using sites like:
http://cplusplus.com/
And, check out the Boost library:
http://www.boost.org/
To make really good use of C++ you need to learn at least the STL; that alone will save you lots of time. But, as parashift mentions, C++ "OOP" without dynamic binding is merely programming with objects.
TRNG is a parallel random number generation library. It allows you to create multiple independent streams and was designed for use on clusters.
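A minimal sketch of the stream-splitting idea, based on TRNG's documented split/leapfrog interface (check the manual of your TRNG version, as the exact API is assumed here):

```c++
#include <trng/yarn2.hpp>
#include <trng/uniform01_dist.hpp>

// Sketch: give each of 'size' parallel processes an independent substream
// by leapfrogging, as described in the TRNG documentation.
double sample_on_rank(int rank, int size) {
    trng::yarn2 rng;                 // a parallelizable TRNG generator
    rng.split(size, rank);           // this process takes every size-th draw
    trng::uniform01_dist<> u;        // uniform distribution on [0, 1)
    return u(rng);
}
```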

Package for distributing calculations

Do you know of any package for distributing calculations over several computers and/or several cores on each computer? The calculation code is in C++; the package needs to cope with data over 2 GB and work on a Windows x64 machine. Shareware would be nice, but isn't a requirement.
A suitable solution would depend on the type of calculation and data you wish to process, the granularity of parallelism you wish to achieve, and how much effort you are willing to invest in it.
The simplest would be to use a suitable solver/library that supports parallelism (e.g. ScaLAPACK). Or, if you wish to roll your own solvers, you can squeeze some parallelisation out of your current code using OpenMP, or compilers that provide automatic parallelisation (e.g. the Intel C/C++ compiler). All of these will give you a reasonable performance boost without requiring massive restructuring of your code.
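For instance, loop-level OpenMP parallelisation of an existing kernel can be as small as this sketch:

```c++
// Sketch: parallelise a dense matrix-vector product y = A*x with OpenMP.
// Build with -fopenmp (g++/clang) or /openmp (MSVC); A is row-major, n-by-n.
void matvec(const double* A, const double* x, double* y, int n) {
    #pragma omp parallel for
    for (int i = 0; i < n; ++i) {
        double sum = 0.0;
        for (int j = 0; j < n; ++j)
            sum += A[(long)i * n + j] * x[j];
        y[i] = sum;                  // each iteration writes a distinct y[i]
    }
}
```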
At the other end of the spectrum, you have the MPI option. It can afford you the biggest performance boost if your algorithm parallelises well, but it will require a fair bit of re-engineering.
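As a tiny illustration of the MPI style, where each process owns a slice of the data and partial results are combined explicitly, consider this sketch of a distributed dot product:

```c++
#include <mpi.h>

// Sketch: a distributed dot product. Each rank holds a contiguous chunk
// of both vectors; a collective reduction combines the partial sums.
double distributed_dot(const double* x, const double* y, int local_n) {
    double local = 0.0;
    for (int i = 0; i < local_n; ++i)
        local += x[i] * y[i];
    double global = 0.0;
    MPI_Allreduce(&local, &global, 1, MPI_DOUBLE, MPI_SUM, MPI_COMM_WORLD);
    return global;                   // every rank receives the full result
}
```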
Another alternative would be to go down the threading route. There are libraries and tools out there that will make this less of a nightmare. These are worth a look: the Boost C++ parallel programming libraries and Threading Building Blocks.
You may want to look at OpenMP.
There's an MPI library and the DVM system working on top of MPI. These are generic tools widely used for parallelizing a variety of tasks.