Generalizing to multiple BLAS/LAPACK libraries - C++

I am developing a linear algebra tool in C++ which relies heavily on matrix multiplication and decompositions (like LU and SVD) and is meant to be applied to large matrices. I developed it using Intel MKL for peak performance, but I don't want to release an Intel MKL-only version, as I assume it will not work for people without Intel hardware or who don't want to install MKL. Instead, I should release more general code that is not Intel MKL-specific, but rather allows the user to specify which implementation of BLAS and LAPACK they would like to use (e.g. OpenBLAS or ATLAS).
Although the function prototypes seem to be the same across implementations, there are several helper functions and types that are specific to Intel MKL. For example, there is the MKL_INT type that I use, and also mkl_malloc. This article suggests using macros to redefine the types, which was also my first thought. I assume I would then need macros to select the headers as well.
I believe it is standard for code to be written such that it is agnostic to the BLAS/LAPACK implementation, and I wanted to know if there was a cleaner way than relying on macros--particularly since the latter would require recompiling the code to switch, which does not seem to be necessary for other tools I have used.

Most scientific codes that rely on BLAS/LAPACK calls are implementation-agnostic. They usually require that the library is just linked as appropriate.
You've commented that the function prototypes are the same across implementations. This allows you to just put the prototypes in some myblas.h and mylapack.h headers and then link whichever library you'd like to use.
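For instance, a minimal myblas.h could just declare the Fortran symbols directly (a sketch, assuming the common trailing-underscore name mangling that MKL's LP64 interface, OpenBLAS, and gfortran-built reference BLAS all expose; the exact mangling is toolchain-dependent):

    // myblas.h -- implementation-agnostic declarations of the Fortran BLAS/LAPACK symbols.
    // Link against whichever conforming library you like (MKL, OpenBLAS, ATLAS, ...).
    #pragma once

    extern "C" {
        // C := alpha*op(A)*op(B) + beta*C (double-precision matrix multiply)
        void dgemm_(const char* transa, const char* transb,
                    const int* m, const int* n, const int* k,
                    const double* alpha, const double* a, const int* lda,
                    const double* b, const int* ldb,
                    const double* beta, double* c, const int* ldc);

        // LU factorization with partial pivoting (LAPACK)
        void dgetrf_(const int* m, const int* n, double* a, const int* lda,
                     int* ipiv, int* info);
    }

Switching implementations is then a pure link-time decision (e.g. -lopenblas versus MKL's link line), with no recompilation needed.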
It sounds like your primary concern is the implementation-specific stuff that you've utilized for MKL. The solution is to just not use this stuff. For example, the MKL types like MKL_INT are not special. They are C datatypes that have been defined to generalize across the LP32/LP64/ILP64 interfaces that MKL provides. See this table.
Also, stuff like mkl_malloc isn't special. It was introduced before the C standard had a thread-safe aligned allocation function; in fact, that is all mkl_malloc is. So instead, just use aligned_alloc, or, if you don't want to commit to C11, use _mm_malloc, memalign, etc.
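Concretely, both MKL-isms can be replaced with a few lines of portable C++ (a sketch; BLAS_ILP64 is a build flag of our own invention that you'd set to match how the linked library was built, and note that C11/C++17 aligned_alloc requires the size to be a multiple of the alignment):

    #include <cstdint>
    #include <cstdlib>

    // Integer type passed to BLAS/LAPACK: 32-bit for LP64 builds, 64-bit for ILP64.
    #ifdef BLAS_ILP64
    using blas_int = std::int64_t;
    #else
    using blas_int = std::int32_t;
    #endif

    // Drop-in replacement for mkl_malloc/mkl_free.
    inline double* alloc_matrix(std::size_t n_elems, std::size_t alignment = 64) {
        std::size_t bytes = n_elems * sizeof(double);
        bytes = (bytes + alignment - 1) / alignment * alignment; // aligned_alloc needs
                                                                 // size % alignment == 0
        return static_cast<double*>(std::aligned_alloc(alignment, bytes));
    }

    inline void free_matrix(double* p) { std::free(p); }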
On the other hand, MKL does provide some useful extensions to BLAS/LAPACK which aren't standardized (transpositions, for example). However, this sort of thing is usually easy to reproduce with a special-case BLAS/LAPACK call, or easy enough to implement yourself. MKL also has internal threading if you choose to use it, but many other BLAS/LAPACK libraries offer this as well.
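For example, MKL's mkl_domatcopy (an out-of-place scaled transpose) has no portable equivalent, but a plain loop does the job and is rarely the bottleneck next to the O(n^3) factorizations (a simple sketch, not tuned for cache blocking):

    // B := alpha * A^T, where A is m x n and column-major (so B is n x m).
    // Hand-rolled stand-in for MKL's mkl_domatcopy.
    void transpose(int m, int n, double alpha,
                   const double* a, int lda, double* b, int ldb) {
        for (int j = 0; j < n; ++j)
            for (int i = 0; i < m; ++i)
                b[j + i * ldb] = alpha * a[i + j * lda];
    }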

Related

How to find Boost libraries that do not contain any platform-specific code

For our current project, we are thinking of using the Boost framework.
However, the project should be truly cross-platform and might be shipped to some exotic platforms. Therefore, we would like to use only Boost packages (libraries) that do not contain any platform-specific code: pure C++ and that's all.
Boost has the idea of header-only packages (libraries).
Can one assume that these packages (libraries) are free from platform-specific code?
If not, is there a way to identify such packages in Boost?
All C++ code is platform-specific to some extent. On the one side, there is this ideal concept of "pure standard C++ code", and on the other side, there is reality. Most of the Boost libraries are designed to maintain the ideal situation on the user-side, meaning that you, as the user of Boost, can write platform-agnostic standard C++ code, while all the underlying platform-specific code is hidden away in the guts of those Boost libraries (for those that need them).
But at the core of this issue is the problem of how to define platform-specific code versus standard C++ code in the real world. You can, of course, look at the standard document and say that anything outside of it is platform-specific, but that's nothing more than an academic discussion.
If we start from this scenario: assume we have a platform that only has a C++ compiler and a C++ standard library implementation, and no other OS or OS-specific API to rely on for other things that aren't covered by the standard library. Well, at that point, you still have to ask yourself:
What compiler is this? What version?
Is the standard library implementation correct? Bug-free?
Are those two entirely standard-compliant?
As far as I know, there is essentially no universal answer to this and there are no realistic guarantees. Most exotic platforms rely on exotic (or old) compilers with partial or non-compliant standard library implementations, and sometimes have self-imposed restrictions (e.g., no exceptions, no RTTI, etc.). An enormous amount of "pure standard C++ code" would never compile on these platforms.
Then, there is also the reality that most platforms today, even really small embedded systems, have an operating system. The vast majority of them are POSIX-compliant to some level (except for Windows, but Windows doesn't run on any exotic platforms anyway). So, in effect, platform-specific code that relies on POSIX functions is not really that bad, since it is likely that most exotic platforms have them, for the most part.
I guess what I'm really getting at here is that this pure dividing line that you have in your mind about "pure C++" versus platform-specific code is really just an imaginary one. Every platform (compiler + std-lib + OS + ext-libs) lies somewhere along a continuum of level of support for standard language features, standard library features, OS API functions, and so on. And by that measure, all C++ code is platform-specific.
The only real question is how wide of a net it casts. For example, most Boost libraries (except for recent "flimsy" ones) generally support compilers down to a reasonable level of C++98 support, and many even try to support as far back as early 90s compilers and std-libs.
To know if a library, part of Boost or not, has wide enough support for your intended applications or platforms, you have to define the boundaries of that support. Just saying "pure C++" is not enough; it means nothing in the real world. You cannot say that you will be using C++11 compilers right after pointing to Boost.Thread as an example of a library with platform-specific code. Many C++11 implementations have very flimsy support for std::thread, while others do better, and that issue is as much of a "platform-specific" issue as using Boost.Thread will ever be.
The only real way to ever be sure about your platform support envelope is to actually set up machines (e.g., virtual machines, emulators, or real hardware) that provide representative worst cases. You have to select those worst-case machines based on a realistic assessment of what your clients may be using, and you have to keep that assessment up to date. You can create a regression test suite for your particular project, using the particular (Boost) libraries you rely on, and run that suite on all your worst-case test environments. Whatever doesn't pass the test doesn't pass the test; it's that simple.
And yes, you might find out in the future that some Boost library won't work under some new exotic platform. If that happens, you need to either get the Boost dev-team to add code to support it, or you have to rewrite your code to get around it. But that's what software maintenance is all about, and it's a cost you have to anticipate; such problems will come not only from Boost, but from the OS and from the compiler vendors too! At least with Boost you can fix the code yourself and contribute it back, which you can't always do with OS or compiler vendors.
We had "Boost or not" discussion too. We decided not to use it.
We had some atypical hardware platforms to support with one code base. In particular, running Boost on AVR was simply impossible, because RTTI and exceptions, which Boost requires for a lot of things, aren't available there.
There are parts of Boost which use compiler-specific "hacks" to, for example, get information about class structure.
We tried splitting out individual packages, but the interdependency was quite high (at least as of 3 or 4 years ago).
In the meantime, C++11 was underway and GCC started supporting more and more of it. With that, many reasons to use Boost faded (see: Which Boost features overlap with C++11?). We implemented the rest of our needs from scratch (with relatively low effort, thanks to variadic templates and other TMP features in C++11).
After a steep learning curve, we had all we needed without external libraries.
At the same time, we pondered the future of Boost. We expected the newly standardized C++11 features to be removed from Boost. I don't know Boost's current roadmap, but at the time that uncertainty made us vote against it.
This is not a real answer to your question, but it may help you decide whether to use Boost. (And sorry, it was too large for a comment.)

Libraries for parallel distributed Cholesky decomposition in C/C++ in an MPI environment?

What libraries are available for parallel distributed Cholesky decomposition of dense matrices in C/C++ in an MPI environment?
I've found the ScaLAPACK library, and this might be the solution I'm looking for. It seems a bit fiddly to call, though: lots of Fortran <-> C conversions to do, which makes me think that maybe it is not widely used, and therefore maybe there are some other libraries that are used instead?
Alternatively, are there some wrappers for ScaLAPACK that make it relatively painless to use in a C or C++ environment, when one is already using MPI and MPI has already been initialized in the program?
Are these dense or sparse matrices?
Trilinos is a huge library for parallel scientific computation. The sub-package Amesos can link to Scalapack for parallel, direct solution of dense systems and to UMFPACK, SuperLU or MUMPS for sparse systems. Trilinos is mostly in C++, but there are Python bindings if that's your taste. It might be overkill, but it'll get the job done.
Intel MKL might also be a choice, since it calls ScaLAPACK on the inside. Note that Intel supports student use of this library, but in this case you have to use an open source MPI version. Also the Intel forum is very helpful.
Elemental is also an option. It is written in C++, which is surely a big advantage when you want to integrate it with your C/C++ application, and the project leader, Jack Poulson, is very friendly and helps a lot.
OpenBLAS, SuperLU and PETSc are also interesting and you may want to read more in my answer.
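Regarding the Fortran <-> C friction: most of it boils down to declaring the Fortran symbols and passing every argument by pointer. A minimal sketch of the glue for the Cholesky routine (assuming trailing-underscore name mangling, and a BLACS process grid and array descriptor already set up elsewhere with Cblacs_gridinit/descinit_):

    extern "C" {
        // ScaLAPACK distributed Cholesky factorization of an n x n matrix.
        // desca is the 9-element ScaLAPACK array descriptor built by descinit_.
        void pdpotrf_(const char* uplo, const int* n, double* a,
                      const int* ia, const int* ja, const int* desca, int* info);
    }

    // Thin wrapper hiding the pass-by-pointer convention.
    inline int cholesky_lower(int n, double* a_local, const int* desca) {
        const char uplo = 'L';
        const int one = 1; // factor the whole matrix, starting at element (1,1)
        int info = 0;
        pdpotrf_(&uplo, &n, a_local, &one, &one, desca, &info);
        return info; // 0 on success
    }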

C++ libraries for dealing with distributed matrices on a grid

Which C++ libraries (or Fortran libraries with C++ interfaces) do you recommend for doing BLAS or sparse BLAS operations on distributed matrices, in terms of speed and ease of use?
Use PBLAS directly. I don't happen to know of any C++ library that eases this. There is http://cppscalapack.sourceforge.net/, but it seems unmaintained (last update in 2004) and still in an alpha stage.
Distributed linear algebra is quite cumbersome, and it will involve a lot of work whatever library you happen to use. I therefore think that using PBLAS directly, abstracting the computations into classes as you go, is quite a sensible thing to do: understanding the Fortran interface is not the hard part of the problem.
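To illustrate what "abstracting the computations into classes" can look like, here is a sketch of a thin wrapper over the PBLAS matrix multiply (it assumes trailing-underscore Fortran name mangling; DistMatrix is a hypothetical type of your own carrying the local block and the 9-element array descriptor from descinit_):

    extern "C" {
        // PBLAS distributed GEMM: C := alpha*op(A)*op(B) + beta*C
        void pdgemm_(const char* transa, const char* transb,
                     const int* m, const int* n, const int* k,
                     const double* alpha,
                     const double* a, const int* ia, const int* ja, const int* desca,
                     const double* b, const int* ib, const int* jb, const int* descb,
                     const double* beta,
                     double* c, const int* ic, const int* jc, const int* descc);
    }

    // Hypothetical distributed-matrix handle: local block plus PBLAS descriptor.
    struct DistMatrix {
        double* local;   // locally owned block
        int     desc[9]; // array descriptor (filled in by descinit_)
        int     rows, cols;
    };

    // C := A*B, hiding the pass-by-pointer noise in one place.
    inline void multiply(const DistMatrix& A, const DistMatrix& B, DistMatrix& C) {
        const char no = 'N';
        const int one = 1;
        const double alpha = 1.0, beta = 0.0;
        pdgemm_(&no, &no, &C.rows, &C.cols, &A.cols, &alpha,
                A.local, &one, &one, A.desc,
                B.local, &one, &one, B.desc,
                &beta, C.local, &one, &one, C.desc);
    }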

Is it possible to use SystemC data types in C++ without the entire SystemC kernel?

SystemC provides arbitrary-length integer types that can be manipulated either as numbers (i.e. with support for arithmetic) or as bit-vectors (i.e. with support for logic operations and working with sub-vectors).
SystemC also provides support for all sorts of other things I don't want, such as clocks, flip-flops, and so on, as well as its own runtime. I'm picky: I want the datatypes without the overhead.
Can these data types be used independently of the rest of the SystemC kernel? If so, how?
At least to the best of my knowledge, no. There are quite a few libraries that support arbitrary-length integers in C++ without the hardware-design "stuff" in SystemC, though (e.g., NTL, GMP, MIRACL). Some of them do add more than just plain arbitrary-precision arithmetic (e.g., various functions used heavily in number theory).
OTOH, given the typical implementations, at least if you use them as static libraries, only what you actually use will be linked into your executable.
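For example, with GMP's C++ interface the "numbers plus bit-vectors" use case looks like this (a sketch; build with -lgmpxx -lgmp):

    #include <gmpxx.h>
    #include <iostream>

    int main() {
        mpz_class a = 1;
        a <<= 200;                // 2^200: arbitrary-length arithmetic
        mpz_class b = a - 1;      // 200 one-bits
        mpz_class low = b & 0xFF; // bitwise logic works on big integers too
        std::cout << low.get_str(16) << '\n'; // prints "ff"
        return 0;
    }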
I'm not familiar with SystemC, but I do always like to point out that in open source projects, you can get the answer from the horse's mouth.
Browsing the CPP files which implement the integer type, it seems to depend on things in datatypes/, utils/, and kernel/:
http://github.com/systemc/systemc-2.2.0/tree/master/src/sysc/datatypes/int/
If the static linking that Jerry suggests doesn't pare it down enough to what seems reasonable (due to some kind of unnecessary global or subsystem inits), you could fork it off GitHub for your minimalist version if it's important to do so...but there's always a cost to maintaining your own branch.
(Or you could contribute a meta-system for paring down bits of system-C people don't need which might get incorporated into the main distribution!)
I had this same need: I wanted the flexible length integers and bit-vectors in C++. I did not want the rest of SystemC, particularly the runtime.
I investigated GMP and similar and found them overly large and complex for my needs.
I found a very similar set of datatypes that may provide exactly what you want: the AC Datatypes available at https://hlslibs.org under the Apache 2.0 license. The integer datatypes are very similar to the SystemC integer/bit-vector datatypes.
However, they are very lightweight: just include the appropriate header file. There is no runtime component as there is in SystemC.
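A minimal sketch of what that looks like (assuming the ac_int.h header from the hlslibs AC Datatypes distribution; there is nothing to link):

    #include <ac_int.h>
    #include <iostream>

    int main() {
        ac_int<37, false> a = 123456789;   // 37-bit unsigned integer
        ac_int<37, false> b = a * 3 + 7;   // ordinary arithmetic

        auto low = b.slc<8>(0);            // read an 8-bit slice starting at bit 0
        b.set_slc(0, ac_int<8, false>(0)); // zero the low byte in place

        std::cout << b.to_string(AC_HEX) << ' '
                  << low.to_string(AC_HEX) << '\n';
        return 0;
    }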

Does Blitz++ use BLAS routines when it is possible and appropriate to do so?

I know that Blitz++ gets its performance edge from extensive use of expression templates and template metaprogramming. But at some point you can't get more out of your code with these techniques: you have to multiply some floats and sum them up. At that point you can get a final performance kick by using the highly optimized (especially for particular architectures) BLAS routines. Does the current implementation of Blitz++ use BLAS routines whenever it is possible?
Blitz++ does not use BLAS routines. You can specify a BLAS library when you configure Blitz++ (./configure --with-blas=...), but it is only used for the benchmarks.