Matrix handler large sequences - c++

I'm going to implement an algorithm like Needleman-Wunsch or Smith-Waterman for large sequences, so I need a way to create matrices of different sizes. My question is: which library gives the best performance for that and is easy to use?
P.S.: I know that OpenCV and Boost can handle matrices, but I don't know whether they are good for doing operations on them.

If “the best performance” is the requirement, then you have to look at NVIDIA CUDA or Intel MKL. Libraries like C++ boost uBLAS concentrate not on performance, but on usability.
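For what it's worth, the dynamic-programming table for this kind of alignment does not strictly need a matrix library. Below is a minimal sketch, assuming a linear gap penalty and illustrative scoring values (match = 1, mismatch = -1, gap = -1) that are not from the question. `ScoreMatrix` and `needleman_wunsch_score` are hypothetical names made up for this example.

```cpp
#include <algorithm>
#include <cstddef>
#include <string>
#include <vector>

// Row-major score matrix kept in one contiguous buffer (hypothetical helper).
class ScoreMatrix {
public:
    ScoreMatrix(std::size_t rows, std::size_t cols)
        : cols_(cols), data_(rows * cols, 0) {}
    int& operator()(std::size_t r, std::size_t c) { return data_[r * cols_ + c]; }

private:
    std::size_t cols_;
    std::vector<int> data_;
};

// Global alignment score (Needleman-Wunsch) with a linear gap penalty.
// The default scores are illustrative assumptions only.
int needleman_wunsch_score(const std::string& a, const std::string& b,
                           int match = 1, int mismatch = -1, int gap = -1) {
    ScoreMatrix m(a.size() + 1, b.size() + 1);
    // First column and first row: aligning a prefix against nothing costs gaps.
    for (std::size_t i = 1; i <= a.size(); ++i) m(i, 0) = static_cast<int>(i) * gap;
    for (std::size_t j = 1; j <= b.size(); ++j) m(0, j) = static_cast<int>(j) * gap;

    for (std::size_t i = 1; i <= a.size(); ++i) {
        for (std::size_t j = 1; j <= b.size(); ++j) {
            const int diag = m(i - 1, j - 1) + (a[i - 1] == b[j - 1] ? match : mismatch);
            const int up = m(i - 1, j) + gap;    // gap in b
            const int left = m(i, j - 1) + gap;  // gap in a
            m(i, j) = std::max({diag, up, left});
        }
    }
    return m(a.size(), b.size());
}
```

Note that the full table needs memory proportional to the product of the two sequence lengths. If only the score is needed (no traceback), keeping just two rows is enough.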

Related

Fixed size SVD and solver in CUDA (in the device)

I implemented a program on the GPU (CUDA) which only uses the host (in C++) to start new kernels. During the calculation on the device I need SVD and solving systems of 3x3 (dense) matrices, fixed size.
I've got my own SVD and solver implementation but it is not numerically stable (thus not usable). Since I'm rather new to C++ and CUDA, I would prefer to use a library instead. (Numerical stuff is very tricky.)
Now I have trouble finding that library:
cuSOLVER is not callable from the device
cuLA is not callable from the device (and abandoned, so it seems)
Eigen looks promising (should be callable from device?) but it is unclear what the status is on CUDA support (it says experimental). I find people saying it works, others got compile errors?
Preferably I would also be able to do general matrix operations with the library (transpose, inversion, sum, multiply, ...), as my own implementations will likely be less efficient and less numerically stable for those.
Any ideas on how to achieve this?
UPDATE:
It seems Eigen supports basic functions like *, +, transpose and even eigenvalues, but SVD, inverse, etc. are not yet supported. This is at the time of writing.
According to the website, a subset of features works for fixed size matrices (3x3 in your case) from Eigen 3.3. The current stable release is 3.2.6 while 3.3 is in alpha. I don't know if specifically SVD is supported in CUDA. I would recommend trying a small MCVE to see if it works (as well as the other functions you require), and if so, implementing it in your project.
I'm having a similar problem: I want to generate random vectors within a kernel function, which requires performing Cholesky/eigenvalue decompositions of NxN (N<=5) covariance matrices. Since, as you noted, the MAGMA and CULA libraries are not available from the device, and there seems to be no cuSOLVER device API yet, I've resorted to implementing these myself following algorithms outlined in, for example, Numerical Recipes in C. As for solving linear systems, I'd suggest checking out cuBLAS (level 2 functions), as it provides some basic functionality. If you want to invert matrices, I'd suggest cublas<t>matinvBatched(). I haven't used it myself and will give it a try during the weekend, but from the description it sounds promising. I hope others will chime in on this thread with better solutions...
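If you end up rolling your own fixed-size solver as described above, partial pivoting is usually the first thing to add for stability. Here is a minimal sketch in plain host C++. `Mat3`, `Vec3` and `solve3x3` are hypothetical names for this example, and the tolerance is an assumed value. It uses no dynamic allocation, but whether it compiles for the device as written depends on your toolchain and is not tested here.

```cpp
#include <array>
#include <cmath>
#include <cstddef>
#include <utility>

using Mat3 = std::array<std::array<double, 3>, 3>;
using Vec3 = std::array<double, 3>;

// Solves A x = b for a dense 3x3 system by Gaussian elimination with
// partial pivoting. Returns false if a pivot is smaller than tol
// (matrix treated as singular). A and b are copied because they are
// modified during elimination.
bool solve3x3(Mat3 A, Vec3 b, Vec3& x, double tol = 1e-12) {
    for (std::size_t k = 0; k < 3; ++k) {
        // Choose the row with the largest absolute entry in column k.
        std::size_t p = k;
        for (std::size_t i = k + 1; i < 3; ++i) {
            if (std::fabs(A[i][k]) > std::fabs(A[p][k])) p = i;
        }
        if (std::fabs(A[p][k]) < tol) return false;
        if (p != k) {
            std::swap(A[p], A[k]);
            std::swap(b[p], b[k]);
        }
        // Eliminate column k from the rows below the pivot.
        for (std::size_t i = k + 1; i < 3; ++i) {
            const double f = A[i][k] / A[k][k];
            for (std::size_t j = k; j < 3; ++j) A[i][j] -= f * A[k][j];
            b[i] -= f * b[k];
        }
    }
    // Back substitution on the upper-triangular system.
    for (std::size_t i = 3; i-- > 0;) {
        double s = b[i];
        for (std::size_t j = i + 1; j < 3; ++j) s -= A[i][j] * x[j];
        x[i] = s / A[i][i];
    }
    return true;
}
```

This only covers the linear solve, not the SVD part of the question.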

Matrix classes in c++

I'm doing some linear algebra math, and was looking for some really lightweight and simple to use matrix class that could handle different dimensions: 2x2, 2x1, 3x1 and 1x2 basically.
I presume such a class could be implemented with templates, using specialization in some cases for performance.
Does anybody know of a simple implementation available for use? I don't want "bloated" implementations, as I'll be running this in an embedded environment where memory is constrained.
Thanks
You could try Blitz++ -- or Boost's uBLAS
I've recently looked at a variety of C++ matrix libraries, and my vote goes to Armadillo.
The library is heavily templated and header-only.
Armadillo also leverages templates to implement a delayed evaluation framework (resolved at compile time) to minimize temporaries in the generated code (resulting in reduced memory usage and increased performance).
However, these advanced features are only a burden to the compiler and not your implementation running in the embedded environment, because most Armadillo code 'evaporates' during compilation due to its design approach based on templates.
And despite all that, one of its main design goals has been ease of use - the API is deliberately similar in style to Matlab syntax (see the comparison table on the site).
Additionally, although Armadillo can work standalone, you might want to consider using it with LAPACK (and BLAS) implementations available to improve performance. A good option would be for instance OpenBLAS (or ATLAS). Check Armadillo's FAQ, it covers some important topics.
A quick search on Google dug up this presentation showing that Armadillo has already been used in embedded systems.
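To make the "delayed evaluation" idea a bit more concrete, here is a minimal sketch of the general expression-template technique. This is not Armadillo's actual code. `Vec` and `Sum` are hypothetical types invented for the illustration, and only `+` is handled.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Expression node for lhs + rhs. Nothing is computed until an element is read.
template <typename L, typename R>
struct Sum {
    const L& lhs;
    const R& rhs;
    double operator[](std::size_t i) const { return lhs[i] + rhs[i]; }
    std::size_t size() const { return lhs.size(); }
};

struct Vec {
    std::vector<double> v;
    explicit Vec(std::size_t n, double x = 0.0) : v(n, x) {}
    double operator[](std::size_t i) const { return v[i]; }
    double& operator[](std::size_t i) { return v[i]; }
    std::size_t size() const { return v.size(); }

    // Assigning an expression runs one loop over the elements;
    // no intermediate Vec is created for the partial sums.
    template <typename L, typename R>
    Vec& operator=(const Sum<L, R>& e) {
        assert(e.size() == size());
        for (std::size_t i = 0; i < size(); ++i) v[i] = e[i];
        return *this;
    }
};

// Building an expression only stores references to the operands.
inline Sum<Vec, Vec> operator+(const Vec& a, const Vec& b) { return {a, b}; }

template <typename L, typename R>
Sum<Sum<L, R>, Vec> operator+(const Sum<L, R>& a, const Vec& b) { return {a, b}; }
```

With this, `r = a + b + c;` compiles down to a single loop that adds three elements at a time. The expression nodes hold references, so they should only be used within the statement that creates them.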
std::valarray is pretty lightweight.
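As a rough sketch of how that could look: `std::valarray` has no notion of rows and columns, but `std::slice(start, size, stride)` can pick them out of a flat buffer. `VMatrix` and `multiply` are hypothetical names for this example.

```cpp
#include <cassert>
#include <cstddef>
#include <valarray>

// Minimal row-major matrix on top of std::valarray (hypothetical helper).
struct VMatrix {
    std::size_t rows, cols;
    std::valarray<double> data;

    VMatrix(std::size_t r, std::size_t c) : rows(r), cols(c), data(0.0, r * c) {}

    double& operator()(std::size_t i, std::size_t j) { return data[i * cols + j]; }

    // Copies of row i and column j, selected with std::slice(start, size, stride).
    std::valarray<double> row(std::size_t i) const {
        return data[std::slice(i * cols, cols, 1)];
    }
    std::valarray<double> col(std::size_t j) const {
        return data[std::slice(j, rows, cols)];
    }
};

// Matrix product: each entry is the sum of an element-wise row * column product.
VMatrix multiply(const VMatrix& a, const VMatrix& b) {
    assert(a.cols == b.rows);
    VMatrix c(a.rows, b.cols);
    for (std::size_t i = 0; i < a.rows; ++i) {
        for (std::size_t j = 0; j < b.cols; ++j) {
            const std::valarray<double> prod = a.row(i) * b.col(j);
            c(i, j) = prod.sum();
        }
    }
    return c;
}
```

Note that `std::valarray` allocates its storage dynamically and `row()`/`col()` make copies, which may or may not be acceptable on a memory-constrained target.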
I use the Newmat library for matrix computations. It's open source and easy to use, although I'm not sure it fits your definition of lightweight (it includes over 50 source files, which Visual Studio compiles into a 1.8MB static library).
CML matrix is pretty good, but may not be lightweight enough for an embedded environment. Check it out anyway: http://cmldev.net/?p=418
Another option, although it may be too late, is:
https://launchpad.net/lwmatrix
I for one wasn't able to find a simple enough library, so I wrote one myself: http://koti.welho.com/aarpikar/lib/
I think it should be able to handle different matrix dimensions (2x2, 3x3, 3x1, etc.) by simply setting some rows or columns to zero. It won't be the fastest approach, since internally all operations are done with 4x4 matrices, although in theory there might be processors that can handle 4x4 operations in one tick. At least I would much rather believe in the existence of such processors than go optimizing those low-level matrix calculations. :)
How about just storing the matrix in an array, like
2x3 matrix = {2,3,val1,val2,...,val6}
This is really simple, and addition operations are trivial. However, you need to write your own multiplication function.
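A minimal sketch of that layout, with the first two entries holding the dimensions and the values following in row-major order. `FlatMatrix`, `rows`, `cols`, `at`, `add` and `multiply` are hypothetical names for this example. `std::vector` is used for brevity, and on an embedded target a fixed-size array may be a better fit.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Layout: {rows, cols, val1, ..., valN}, values stored row by row.
using FlatMatrix = std::vector<double>;

std::size_t rows(const FlatMatrix& m) { return static_cast<std::size_t>(m[0]); }
std::size_t cols(const FlatMatrix& m) { return static_cast<std::size_t>(m[1]); }

// Element (i, j), skipping the two header entries.
double at(const FlatMatrix& m, std::size_t i, std::size_t j) {
    return m[2 + i * cols(m) + j];
}

// Addition really is trivial: add everything after the header.
FlatMatrix add(const FlatMatrix& a, const FlatMatrix& b) {
    assert(rows(a) == rows(b) && cols(a) == cols(b));
    FlatMatrix out = a;
    for (std::size_t k = 2; k < out.size(); ++k) out[k] += b[k];
    return out;
}

// Multiplication has to be written by hand with the usual triple loop.
FlatMatrix multiply(const FlatMatrix& a, const FlatMatrix& b) {
    assert(cols(a) == rows(b));
    const std::size_t r = rows(a), n = cols(a), c = cols(b);
    FlatMatrix out(2 + r * c, 0.0);
    out[0] = static_cast<double>(r);
    out[1] = static_cast<double>(c);
    for (std::size_t i = 0; i < r; ++i)
        for (std::size_t j = 0; j < c; ++j)
            for (std::size_t k = 0; k < n; ++k)
                out[2 + i * c + j] += at(a, i, k) * at(b, k, j);
    return out;
}
```

Keeping the dimensions in the same array as the values is compact, but it mixes metadata and data. A small struct with separate `rows`/`cols` fields would be the more conventional choice.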

ublas vs. matrix template library (MTL4)

I'm writing software for hyperbolic partial differential equations in C++. Almost all notations are vector and matrix ones. On top of that, I need a linear algebra solver. And yes, the vector and matrix sizes can vary considerably (from, say, 1000 to sizes that can only be solved by distributed-memory computing, e.g. clusters or similar architectures). If I lived in a utopia, I'd have a linear solver that scales well on clusters, GPUs and multicores.
When thinking about the data structure that should represent the variables, I came across Boost.uBLAS and MTL4.
Both libraries are BLAS level 3 compatible; MTL4 implements a sparse solver and is much faster than uBLAS. Neither has implemented support for multicore processors, not to mention parallelization for distributed-memory computations. On the other hand, the development of MTL4 depends on the sole effort of 2 developers (at least as I understood), and I'm sure there is a reason that uBLAS is in the Boost library. Furthermore, Intel's MKL library includes an example for binding their structures with uBLAS.
I'd like to bind my data and software to a data structure that will be rock solid, developed and maintained for a long period of time.
Finally, the question. What is your experience with the use of ublas and/or mtl4, and what would you recommend?
thanx,
mightydodol
With your requirements, I would probably go for BOOST::uBLAS. Indeed, a good deployment of uBLAS should be roughly on par with MTL4 regarding speed.
The reason is that there exist bindings for ATLAS (hence shared-memory parallelization that you can efficiently optimize for your computer), and also vendor-tuned implementations like the Intel Math Kernel Library or HP MLIB.
With these bindings, uBLAS with a well-tuned ATLAS / BLAS library doing the math should be fast enough. If you link against a given BLAS / ATLAS, you should be roughly on par with MTL4 linked against the same BLAS / ATLAS using the compiler flag -DMTL_HAS_BLAS, and most likely faster than the MTL4 without BLAS according to their own observation (example see here, where GotoBLAS outperforms MTL4).
To sum up, speed should not be your decisive factor as long as you are willing to use some BLAS library. Usability and support are more important. You have to decide whether MTL or uBLAS is better suited for you. I tend towards uBLAS given that it is part of BOOST, and MTL4 currently only supports BLAS selectively. You might also find this slightly dated comparison of scientific C++ packages interesting.
One big BUT: for your requirements (extremely big matrices), I would probably skip the "syntactic sugar" of uBLAS or MTL, and call the "metal" C interface of BLAS / LAPACK directly. But that's just me... Another advantage is that it should then be easier to switch to ScaLAPACK (distributed memory LAPACK, have never used it) for bigger problems. Just to be clear: for household problems, I would not suggest calling a BLAS library directly.
If you're programming vectors, matrices, and linear algebra in C++, I'd look at Eigen:
http://eigen.tuxfamily.org/
It's faster than uBLAS (not sure about MTL4) and has a much cleaner syntax.
For new projects, it's probably best to stay away from Boost's uBlas. The uBlas FAQ even has this warning since late 2012:
Q: Should I use uBLAS for new projects?
... the last major improvement of uBLAS was in 2008 and no significant change was committed since 2009. ... Performance? There are faster alternatives. Cutting edge? uBLAS is more than 10 years old and missed all new stuff from C++11.
There is one C++ library missing in this list: FLENS
http://flens.sf.net
Disclaimer: Yes, this is my baby
It is header only
Comes with a simple, non-performant, generic (i.e. templated) C++ reference implementation of BLAS.
If available, you can use an optimized BLAS implementation as the backend. In this case it's like using BLAS directly (some benchmarks I should update).
You can use overloaded operators instead of calling BLAS functions.
It comes with its own, stand-alone, generic re-implementation of a bunch of LAPACK functions. We call this port FLENS-LAPACK.
FLENS-LAPACK has exactly the same accuracy and performance as Netlib's LAPACK. And in my experience (FLENS-)LAPACK+ATLAS or (FLENS-)LAPACK+OpenBLAS gives you the same performance as ACML or MKL.
FLENS has a different policy regarding the creation of temporary vector/matrices in the evaluation of linear algebra expressions. The FLENS policy is: Never create them!!!. However, in a special debug-mode we allow the creation of temporaries "when necessary". This "when necessary" policy thing is the default in other libraries like Eigen or Armadillo or in Matlab.
You can see the performance differences directly here:
http://www.osl.iu.edu/research/mtl/mtl4/doc/performance.php3
Both are reasonable libraries to use in terms of their interfaces. I don't think that because uBLAS got through the BOOST review process it's necessarily way more robust. I've had my share of nightmares with unobvious side effects and unintended consequences from uBLAS implementations.
That's not to say uBLAS is bad, it's really good, but I think given the dramatic performance differences for MTL these days, it's worth using it instead of uBLAS even though it's arguably a bit more risky because of its "only 2 developer" support group.
At the end of the day, it's about speed with a matrix library, go with MTL4.
From my own experience, MTL4 is much faster than uBLAS and it is also faster than Eigen.
There is a parallel version of MTL4. Just have a look at simunova

best lib for vector array in c++

I have to do calculation on array of 1,2,3...9 dimensional vectors, and the number of those vectors varies significantly (say from 100 to up to couple of millions). Of course, it would be great if the data container can be easily decomposed to enable parallel algorithms.
I came across Blitz++ (almost impossible for me to compile), but are there any other fast libs that manipulate arrays of vector data? Is boost::fusion worth a look? Furthermore, VTK's vtkDoubleArray seems nice, but VTK is a lib used only for visualization. I must admit that having an array of tuples is a tempting idea, but I didn't see any benchmarks regarding boost::fusion and/or vtkDoubleArray. It looks as if they were not built with speed in mind. Any thoughts?
best regards,
mightydodol
Eigen supports auto-vectorisation on certain compilers (GCC 4, VC++ 2008).
For linear algebra, you probably want to evaluate Boost uBLAS, which is a subset of the full BLAS package. As you mention, Boost Fusion may also be appropriate, depending on the algorithms you are implementing.
I believe you can use the non-GUI parts of VTK such as vtkDoubleArray without linking in the visualisation libraries if you don't need them. Note that VTK is designed for efficiency of rendering, not of calculations. If you don't want to render the results, you might as well use one of the scientific packages that provide optimized algorithms.
There is a Parallel flavour of BLAS called (strangely enough) PBLAS. I don't think this is available through the Boost wrapping, so you would use the C interface directly.
Without knowing what you want to do with your arrays, it's hard to give really firm advice. If high performance manipulation of the arrays is needed then Blitz++ is probably your best bet. If you are having trouble compiling it then perhaps you need to change your compiler or system. They do support g++, so a recent version on just about anything should get you going.
I haven't used Boost::fusion, but a quick read of the manual suggests that its major goal is just to make heterogeneous containers. I don't think that's what you want.
I have tried to use the GSL but find it hopelessly awkward for anything I have wanted to do.
I'm no expert, but you might want to consider using a MATLAB API.
There is the GNU Scientific Library for operations on vectors or matrices.
I would try using Blitz++, it will give you a really good performance. Armadillo is also quite efficient.
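Whatever library you pick, the "easily decomposed" part of the question mostly comes down to contiguous storage plus index ranges. Here is a minimal sketch, assuming all vectors in one array share the same dimension (1 to 9) chosen at run time. `VectorArray` and `split_ranges` are hypothetical names for this example.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// `count` vectors of dimension `dim`, stored back to back in one buffer.
class VectorArray {
public:
    VectorArray(std::size_t count, std::size_t dim)
        : dim_(dim), data_(count * dim, 0.0) {
        assert(dim > 0);
    }
    std::size_t size() const { return data_.size() / dim_; }
    std::size_t dim() const { return dim_; }
    // Pointer to the first component of vector i.
    double* operator[](std::size_t i) { return data_.data() + i * dim_; }
    const double* operator[](std::size_t i) const { return data_.data() + i * dim_; }

private:
    std::size_t dim_;
    std::vector<double> data_;
};

// Splits [0, n) into `parts` contiguous half-open ranges of near-equal size,
// e.g. one range per worker thread or process.
std::vector<std::pair<std::size_t, std::size_t>> split_ranges(std::size_t n,
                                                              std::size_t parts) {
    assert(parts > 0);
    std::vector<std::pair<std::size_t, std::size_t>> out;
    const std::size_t base = n / parts;
    const std::size_t extra = n % parts;  // the first `extra` ranges get one more
    std::size_t begin = 0;
    for (std::size_t p = 0; p < parts; ++p) {
        const std::size_t len = base + (p < extra ? 1 : 0);
        out.emplace_back(begin, begin + len);
        begin += len;
    }
    return out;
}
```

Each worker can then loop over its own `[begin, end)` range of vector indices without touching the others' data.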

Open source C++ library for vector mathematics

I would need some basic vector mathematics constructs in an application. Dot product, cross product. Finding the intersection of lines, that kind of stuff.
I can do this by myself (in fact, have already) but isn't there a "standard" to use so bugs and possible optimizations would not be on me?
Boost does not have it. Their mathematics part is about statistical functions, as far as I was able to see.
Addendum:
Boost 1.37 indeed seems to have this. They also graciously introduce a number of other solutions in the field, and explain why they still went and did their own. I like that.
Re-check that good old friend of C++ programmers called Boost. It has a linear algebra package that may well suit your needs.
I've not tested it, but the C++ Eigen library is becoming increasingly popular these days. According to them, they are on par with the fastest libraries out there, and their API looks quite neat to me.
Armadillo
Armadillo employs a delayed evaluation approach to combine several operations into one and reduce (or eliminate) the need for temporaries. Where applicable, the order of operations is optimised. Delayed evaluation and optimisation are achieved through recursive templates and template meta-programming.
While chained operations such as addition, subtraction and multiplication (matrix and element-wise) are the primary targets for speed-up opportunities, other operations, such as manipulation of submatrices, can also be optimised. Care was taken to maintain efficiency for both "small" and "big" matrices.
I would stay away from using NRC code for anything other than learning the concepts.
I think what you are looking for is Blitz++
Check www.netlib.org, which is maintained by Oak Ridge National Lab and the University of Tennessee. You can search for numerical packages there. There's also Numerical Recipes in C++, which has code that goes with it, but the C++ version of the book is somewhat expensive and I've heard the code described as "terrible." The C and FORTRAN versions are free, and the associated code is quite good.
There is a nice Vector library for 3d graphics in the prophecy SDK:
Check out http://www.twilight3d.com/downloads.html
For linear algebra: try JAMA/TNT. That would cover dot products (plus matrix factoring and other stuff). As for vector cross products (really valid only for 3D, otherwise I think you get into tensors), I'm not sure.
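For reference, the operations named in the question are small enough to write out. This is a minimal sketch and not taken from any of the libraries mentioned. `Vec3d`, `Vec2d`, `dot`, `cross` and `intersect_lines` are hypothetical names, and the parallelism tolerance is an assumed value.

```cpp
#include <cmath>

struct Vec3d { double x, y, z; };
struct Vec2d { double x, y; };

// Dot product of two 3D vectors.
double dot(const Vec3d& a, const Vec3d& b) {
    return a.x * b.x + a.y * b.y + a.z * b.z;
}

// Cross product of two 3D vectors (right-handed).
Vec3d cross(const Vec3d& a, const Vec3d& b) {
    return {a.y * b.z - a.z * b.y,
            a.z * b.x - a.x * b.z,
            a.x * b.y - a.y * b.x};
}

// Intersection of two infinite 2D lines given in point + direction form:
// p + t*d and q + s*e. Returns false if the lines are (nearly) parallel.
bool intersect_lines(const Vec2d& p, const Vec2d& d,
                     const Vec2d& q, const Vec2d& e,
                     Vec2d& out, double eps = 1e-12) {
    const double denom = d.x * e.y - d.y * e.x;  // 2D cross product d x e
    if (std::fabs(denom) < eps) return false;
    // Solve p + t*d = q + s*e for t by crossing both sides with e.
    const double t = ((q.x - p.x) * e.y - (q.y - p.y) * e.x) / denom;
    out = {p.x + t * d.x, p.y + t * d.y};
    return true;
}
```

The fixed `eps` is only a placeholder. For badly scaled inputs, a tolerance relative to the lengths of `d` and `e` would be more robust.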
For an extremely lightweight (single .h file) library, check out CImg. It's geared towards image processing, but has no problem handling vectors.