NumPy style arrays for C++? [closed] - c++

Closed. This question does not meet Stack Overflow guidelines. It is not currently accepting answers.
We don’t allow questions seeking recommendations for books, tools, software libraries, and more. You can edit the question so it can be answered with facts and citations.
Closed 2 years ago.
Improve this question
Are there any C++ (or C) libs that have NumPy-like arrays with support for slicing, vectorized operations, adding and subtracting contents element-by-element, etc.?

Here are several free software that may suit your needs.
The GNU Scientific Library is a GPL software written in C. Thus, it has a C-like allocation and way of programming (pointers, etc.). With the GSLwrap, you can have a C++ way of programming, while still using the GSL. GSL has a BLAS implementation, but you can use ATLAS instead of the default CBLAS, if you want even more performances.
The boost/uBLAS library is a BSL library, written in C++ and distributed as a boost package. It is a C++-way of implementing the BLAS standard. uBLAS comes with a few linear algebra functions, and there is an experimental binding to ATLAS.
eigen is a linear algebra library written in C++, distributed under the MPL2 license (starting from version 3.1.1) or LGPL3/GPL2 (older versions). It's a C++ way of programming, but more integrated than the two others (more algorithms and data structures are available). Eigen claims to be faster than the BLAS implementations above, while not following the de-facto standard BLAS API. Eigen does not seem to put a lot of effort on parallel implementation.
Armadillo is LGPL3 library for C++. It has binding for LAPACK (the library used by numpy). It uses recursive templates and template meta-programming, which is a good point (I don't know if other libraries are doing it also?).
xtensor is a C++ library that is BSD licensed. It offers A C++ API very similar to that of NumPy. See https://xtensor.readthedocs.io/en/latest/numpy.html for a cheat sheet.
These alternatives are really good if you just want to get data structures and basic linear algebra. Depending on your taste about style, license or sysadmin challenges (installing big libraries like LAPACK may be difficult), you may choose the one that best suits your needs.

Try out xtensor. (See the NumPy to Xtensor Cheat Sheet).
xtensor is a C++ library meant for numerical analysis with multi-dimensional array expressions.
xtensor provides
an extensible expression system enabling numpy-style broadcasting.
an API following the idioms of the C++ standard library.
tools to manipulate array expressions and build upon xtensor.
Example
Initialize a 2-D array and compute the sum of one of its rows and a 1-D array.
#include <iostream>
#include "xtensor/xarray.hpp"
#include "xtensor/xio.hpp"
xt::xarray<double> arr1
{{1.0, 2.0, 3.0},
{2.0, 5.0, 7.0},
{2.0, 5.0, 7.0}};
xt::xarray<double> arr2
{5.0, 6.0, 7.0};
xt::xarray<double> res = xt::view(arr1, 1) + arr2;
std::cout << res;
Outputs
{7, 11, 14}
Initialize a 1-D array and reshape it inplace.
#include <iostream>
#include "xtensor/xarray.hpp"
#include "xtensor/xio.hpp"
xt::xarray<int> arr
{1, 2, 3, 4, 5, 6, 7, 8, 9};
arr.reshape({3, 3});
std::cout << arr;
Outputs
{{1, 2, 3},
{4, 5, 6},
{7, 8, 9}}

DyND is designed to be, among other things, a NumPy-like library for C++. Things like broadcasting, arithmetic operators, and slicing all work fine. On the other hand, it is still very experimental and many features haven't been implemented yet.
Here's a simple implementation of the de Casteljau algorithm in C++ using DyND arrays:
#include <iostream>
#include <dynd/array.hpp>
using namespace dynd;
nd::array decasteljau(nd::array a, double t){
size_t e = a.get_dim_size();
for(size_t i=0; i < e-1; i++){
a = (1.-t) * a(irange()<(e-i-1)) + t * a(0<irange());
}
return a;
}
int main(){
nd::array a = {1., 2., 2., -1.};
std::cout << decasteljau(a, .25) << std::endl;
}
I wrote a blog post a little while back with more examples and side-by-side comparisons of the syntax for Fortran 90, DyND in C++, and NumPy in Python.
Disclaimer: I'm one of the current DyND developers.

This is an old question. Still felt like answering. Thought might help many, Especially pydevs coding in C++.
If you have already worked with python numpy, then NumCpp is a great choice. It's minimalistic in syntax and has got similar functions or methods as py numpy.
The comparison part in the readme doc is also very very cool.
NumCpp
nc::NdArray<int> arr = {{4, 2}, {9, 4}, {5, 6}};
arr.reshape(5, 3);
arr.astype<double>();

Use LibTorch (PyTorch frontend for C++) and be happy.

If you want to use multidimensional array(like numpy) for image processing or neural network, you can use OpenCV cv::Mat along with tons of image processing algorithms. In case you just want to use it for matrix operations ONLY, you just have to compile respective opencv modules to reduce the size and have tiny OpenCV library.
cv::Mat(Matrix) is an n-dimensional array that can be used to store various type of data, such as RGB, HSV or grayscale images, vectors with real or complex values, other matrices etc.
A Mat contains the following information: width, height, type, channels, data, flags, datastart, dataend and so on.
It has several methods for matrix manipulation. Bonus you can create then on CUDA cores as well as cv::cuda::GpuMat.
Consider I want to create a matrix with 10 rows, 20 columns, type CV_32FC3:
int R = 10, C = 20;
Mat m1;
m1.create(R, C, CV_32FC3); //creates empty matrix
Mat m2(cv::Size(R, C), CV_32FC3); // creates a matrix with R rows, C columns with data type T where R and C are integers,
Mat m3(R, C, CV_32FC3); // same as m2
BONUS:
Compile tiny and compact opencv library for just matrix operations. One of the ways is like as mentioned in this article.
OR
compile opencv source code using following cmake command:
$ git clone https://github.com/opencv/opencv.git
$ cd opencv
$ git checkout <version you want to checkout>
$ mkdir build
$ cd build
$ cmake -D WITH_CUDA=OFF -D WITH_MATLAB=OFF -D BUILD_ANDROID_EXAMPLES=OFF -D BUILD_DOCS=OFF -D BUILD_PERF_TESTS=OFF -D BUILD_TESTS=OFF -DANDROID_STL=c++_shared -DBUILD_SHARED_LIBS=ON -D BUILD_opencv_objdetect=OFF -D BUILD_opencv_video=OFF -D BUILD_opencv_videoio=OFF -D BUILD_opencv_features2d=OFF -D BUILD_opencv_flann=OFF -D BUILD_opencv_highgui=OFF -D BUILD_opencv_ml=OFF -D BUILD_opencv_photo=OFF -D BUILD_opencv_python=OFF -D BUILD_opencv_shape=OFF -D BUILD_opencv_stitching=OFF -D BUILD_opencv_superres=OFF -D BUILD_opencv_ts=OFF -D BUILD_opencv_videostab=OFF -D BUILD_opencv_dnn=OFF -D BUILD_opencv_imgproc=OFF ..
$ make -j $nproc
$ sudo make install
Try this example:
#include "opencv2/core.hpp"
#include<iostream>
int main()
{
std::cout << "OpenCV Version " << CV_VERSION << std::endl;
int R = 2, C = 4;
cv::Mat m1;
m1.create(R, C, CV_32FC1); //creates empty matrix
std::cout << "My Mat : \n" << m1 << std::endl;
}
Compile the code with following command:
$ g++ -std=c++11 opencv_mat.cc -o opencv_mat `pkg-config --libs opencv` `pkg-config --cflags opencv`
Run the executable:
$ ./opencv_mat
OpenCV Version 3.4.2
My Mat :
[0, 0, 0, 0;
0, 0, 0, 0]

Eigen is a good linear algebra library.
http://eigen.tuxfamily.org/index.php?title=Main_Page
It is quite easy to install since it's a header-only library. It relies on template in order to to generate well optimized code. It vectorizes automatically the matrix operations.
It also fully support coefficient wise operations, such as the "per element multiplication" between two matrices for instance. It is what you need?

xtensor is good, but I ended up writing a mini-library myself as a toy project with c++20, while trying to keep the interface as simple as possible. Here it is: https://github.com/gbalduzz/NDArray
Example code:
using namespace nd;
NDArray<int, 2> m(3, 3); // 3x3 matrix
m = 2; // assign 2 to all
m(-1, all) = 1; // assign 1 to the last row.
auto tile = m(range{1, end}, range{1, end}); // 2x2 tile
std::sort(tile.begin(), tile.end());
std::cout << m; // prints [[2, 2, 2], [2, 1, 1], [1, 2, 2]]
It does not provide fancy arithmetic operators collapsing multiple operations together, yet, but you can broadcast arbitrary lambdas to a set of tensors with the same shape, or use lazily evaluated arithmetic operators.
Let me know what do you think about the interface and how it compares with the other options, and if this has any hope, what sort of operations you would like to see implemented.
Free license and no dependency!
Addendum: I managed to properly compile and run xtensor, and the result is that my library is significantly faster when iterating over views (2 to 3X)

Blitz++ supports arrays with an arbitrary number of axes, whereas Armadillo only supports up to three (vectors, matrices, and cubes). Eigen only supports vectors and matrices (not cubes). The downside is that Blitz++ doesn't have linear algebra functions beyond the basic entrywise operations and tensor contractions. Development seems to have slowed down quite some time ago, but perhaps that's just because the library does what it does and not many changes need to be made.

VIGRA contains a good N-dimensional array implementation:
http://ukoethe.github.io/vigra/doc/vigra/Tutorial.html
I use it extensively, and find it very simple and effective. It's also header only, so very easy to integrate into your development environment. It's the closest thing I've come across to using NumPy in terms of it's API.
The main downside is that it isn't so widely used as the others, so you won't find much help online. That, and it's awkwardly named (try searching for it!)

While GLM is designed to mesh easily with OpenGL and GLSL, it is a fully functional header only math library for C++ with a very intuitive set of interfaces.
It declares vector & matrix types as well as various operations on them.
Multiplying two matrices is a simple as (M1 * M2). Subtracting two vectors (V1- V2).
Accessing values contained in vectors or matrices is equally simple. After declaring a vec3 vector for example, one can access its first element with vector.x. Check it out.

Eigen is a template library for linear algebra (matrices, vectors…). It is header only and free to use (LGPL).

The GSL is great, it does all of what you're asking and much more. It is licensed under the GPL though.

Related

Element-wise operations in C++

Is there a preexisting library that will let me create array-like objects which have the following properties:
Run time size specification (chosen at instantition, not grown or shrunk afterwards)
Operators overloaded to perform element wise operations (i.e. c=a+b will result in a vector c with c[i]=a[i]+b[i] for all i, and similarly for *, -, /, etc)
A good set of functions which act elementwise, for example x=sqrt(vec) will have elements x[i]=sqrt(vec[i])
Provide "summarising" functions such as sum(vec), mean(vec) etc
(Optional) Operations can be sent to a GPU for processing.
Basically something like the way arrays work in Fortran, with all of the implementation hidden. Currently I am using vector from the STL and manually overloading the operators, but I feel like this is probably a solved problem.
In the dusty corners of standard library, long forgotten by everyone, sits a class called valarray. Look it up and see if it suits your needs.
From manual page at cppreference.com:
std::valarray is the class for representing and manipulating arrays of values. It supports element-wise mathematical operations and various forms of generalized subscript operators, slicing and indirect access.
A code snippet for illustration:
#include <valarray>
#include <algorithm>
#include <iterator>
#include <iostream>
int main()
{
std::valarray<int> a { 1, 2, 3, 4, 5};
std::valarray<int> b = a;
std::valarray<int> c = a + b;
std::copy(begin(c), end(c),
std::ostream_iterator<int>(std::cout, " "));
}
Output: 2 4 6 8 10
You can use Cilk Plus Extentions (https://www.cilkplus.org/) that provides array notation by applying element-wise operations to arrays of the same shape for C/C++. It explores the vector parallelism from your processor as well co-processor.
Example:
Standard C code:
for (i=0; i<MAX; i++)
c[i]=a[i]+b[i];
Cilk Plus - Array notation:
c[i:MAX]=a[i:MAX]+b[i:MAX];
Stride sections like:
float d[10] = {0,1,2,3,4,5,6,7,8,9};
float x[3];
x[:] = d[0:3:2]; //x contains 0,2,4 values
You can use reductions on arrays sections:
_sec_reduce_add(a[0:n]);
Interest reading:
http://software.intel.com/en-us/articles/getting-started-with-intel-cilk-plus-array-notations
The Thrust library, which is part of the CUDA toolkit, provides an STL-like interface for vector operations on GPUs. It also has an OpenMP back end, however the GPU support utilizes CUDA, so you are limited to NVIDIA GPUs. You will have to do your own wrapping (say with expression templates) if you want to have expressions like c=a+b work for vectors
https://code.google.com/p/thrust/
The VienaCL library takes a more high level approach, providing vector and matrix operations like you want. It has both CUDA and OpenCL back ends, so you can use GPUs (and multi-core CPUs) from different vendors.
http://viennacl.sourceforge.net/
The vexcl library also looks very promising (again with support for both OpenCL and CUDA)
https://github.com/ddemidov/vexcl

Is there a `numpy.minimum` equivalent in GSL?

I'm working on porting a complex data analysis routine I "prototyped" in Python to C++. I used Numpy extensively throughout the Python code. I'm looking at employing the GSL in the C++ port since it implements all of the various numerical routines I require (whereas Armadillo, Eigen, etc. only have a subset of what I need, though their APIs are closer to what I am looking for).
Is there an equivalent to numpy.minimum in the GSL (i.e., element-wise minimum of two matrices)? This is just one example of the abstractions from Numpy that I am looking for. Do things like this simply have to be reimplemented manually when using the GSL? I note that the GSL provides for things like:
double gsl_matrix_min (const gsl_matrix * m)
But that simply provides the minimum value of the entire matrix. Regardless of element-wise comparisons, it doesn't even seem possible to report the minimum along a particular axis of a single matrix using the GSL. That surprises me.
Are my expectations misplaced?
You can implement an element-wise minimum easily in Armadillo, via the find() and .elem() functions:
mat A; A.randu(5,5);
mat B; B.randu(5,5);
umat indices = find(B < A);
mat C = A;
C.elem(indices) = B.elem(indices);
For other functions that are not present in Armadillo, it might be possible to interface Armadillo matrices with GSL functions, through the .memptr() function.

What is the best way to create a pentadiagonal sparse matrix for finite difference method in c/c++?

In MATLAB, it is very convenient to create a pentadiagonal sparse matrix using commands like this:
I = eye(m); % create identity matrix
e = ones(m,1); % create an array of all 1's
T = spdiags([e -4*e e],[-1 0 1],m,m);
S = spdiags([e e],[-1 1],m,m);
A = (kron(I,T) + kron(S,I))/hˆ2;
I was wondering if there is any neat trick to do the same in c/c++.
There is no sparse Matrix type in C++. But there are a lot of open source algebra libraries around the web (or you can write your own).
Boost uBLAS supports sparse matrices, and it's probably the best choice if you want just to "experiment" finite differences.
If you need more advanced solvers, you should take a look at GSL, or consider the C version of LAPACK.
As for your original question, as far as i know none of those libraries implements a kron function, since it is only a "convenience" routine.

Worse performance using Eigen than using my own class

A couple of weeks ago I asked a question about the performance of matrix multiplication.
I was told that in order to enhance the performance of my program I should use some specialised matrix classes rather than my own class.
StackOverflow users recommended:
uBLAS
EIGEN
BLAS
At first I wanted to use uBLAS however reading documentation it turned out that this library doesn't support matrix-matrix multiplication.
After all I decided to use EIGEN library. So I exchanged my matrix class to Eigen::MatrixXd - however it turned out that now my application works even slower than before.
Time before using EIGEN was 68 seconds and after exchanging my matrix class to EIGEN matrix program runs for 87 seconds.
Parts of program which take the most time looks like that
TemplateClusterBase* TemplateClusterBase::TransformTemplateOne( vector<Eigen::MatrixXd*>& pointVector, Eigen::MatrixXd& rotation ,Eigen::MatrixXd& scale,Eigen::MatrixXd& translation )
{
for (int i=0;i<pointVector.size();i++ )
{
//Eigen::MatrixXd outcome =
Eigen::MatrixXd outcome = (rotation*scale)* (*pointVector[i]) + translation;
//delete prototypePointVector[i]; // ((rotation*scale)* (*prototypePointVector[i]) + translation).ConvertToPoint();
MatrixHelper::SetX(*prototypePointVector[i],MatrixHelper::GetX(outcome));
MatrixHelper::SetY(*prototypePointVector[i],MatrixHelper::GetY(outcome));
//assosiatedPointIndexVector[i] = prototypePointVector[i]->associatedTemplateIndex = i;
}
return this;
}
and
Eigen::MatrixXd AlgorithmPointBased::UpdateTranslationMatrix( int clusterIndex )
{
double membershipSum = 0,outcome = 0;
double currentPower = 0;
Eigen::MatrixXd outcomePoint = Eigen::MatrixXd(2,1);
outcomePoint << 0,0;
Eigen::MatrixXd templatePoint;
for (int i=0;i< imageDataVector.size();i++)
{
currentPower =0;
membershipSum += currentPower = pow(membershipMatrix[clusterIndex][i],m);
outcomePoint.noalias() += (*imageDataVector[i] - (prototypeVector[clusterIndex]->rotationMatrix*prototypeVector[clusterIndex]->scalingMatrix* ( *templateCluster->templatePointVector[prototypeVector[clusterIndex]->assosiatedPointIndexVector[i]]) ))*currentPower ;
}
outcomePoint.noalias() = outcomePoint/=membershipSum;
return outcomePoint; //.ConvertToMatrix();
}
As You can see, these functions performs a lot of matrix operations. That is why I thought using Eigen would speed up my application. Unfortunately (as I mentioned above), the program works slower.
Is there any way to speed up these functions?
Maybe if I used DirectX matrix operations I would get better performance ?? (however I have a laptop with integrated graphic card).
If you're using Eigen's MatrixXd types, those are dynamically sized. You should get much better results from using the fixed size types e.g Matrix4d, Vector4d.
Also, make sure you're compiling such that the code can get vectorized; see the relevant Eigen documentation.
Re your thought on using the Direct3D extensions library stuff (D3DXMATRIX etc): it's OK (if a bit old fashioned) for graphics geometry (4x4 transforms etc), but it's certainly not GPU accelerated (just good old SSE, I think). Also, note that it's floating point precision only (you seem to be set on using doubles). Personally I'd much prefer to use Eigen unless I was actually coding a Direct3D app.
Make sure to have compiler optimization switched on (e.g. at least -O2 on gcc). Eigen is heavily templated and will not perform very well if you don't turn on optimization.
Which version of Eigen are you using? They recently released 3.0.1, which is supposed to be faster than 2.x. Also, make sure you play a bit with the compiler options. For example, make sure SSE is being used in Visual Studio:
C/C++ --> Code Generation --> Enable Enhanced Instruction Set
You should profile and then optimize first the algorithm, then the implementation. In particular, the posted code is quite innefficient:
for (int i=0;i<pointVector.size();i++ )
{
Eigen::MatrixXd outcome = (rotation*scale)* (*pointVector[i]) + translation;
I don't know the library, so I won't even try to guess the number of unnecessary temporaries that you are creating, but a simple refactor:
Eigen::MatrixXd tmp = rotation*scale;
for (int i=0;i<pointVector.size();i++ )
{
Eigen::MatrixXd outcome = tmp*(*pointVector[i]) + translation;
Can save you a good amount of expensive multiplications (and again, probably new temporary matrices that get discarded right away.
A couple of points.
Why are you multiplying rotation*scale inside of the loop when that product will have the same value each iteration? That is a lot of wasted effort.
You are using dynamically sized matrices rather than fixed sized matrices. Someone else mentioned this already, and you said you shaved off 2 sec.
You are passing arguments as a vector of pointers to matrices. This adds an extra pointer indirection and destroys any guarantee of data locality, which will give poor cache performance.
I hope this isn't insulting, but are you compiling in Release or Debug? Eigen is very slow in debug builds, because it uses lots of trivial templated functions that are optimized out of release but remain in debug.
Looking at your code, I am hesitant to blame Eigen for performance problems. However, most linear algebra libraries (including Eigen) are not really designed for your use case of lots of tiny matrices. In general, Eigen will be better optimized for 100x100 or larger matrices. You very well may be better off using your own matrix class or the DirectX math helper classes. The DirectX math classes are completely independent from your video card.
Looking back at your previous post and the code in there, my suggestion would be to use your old code, but improve its efficiency by moving things around. I'm posting on that previous question to keep the answers separate.

How to optimize matrix multiplication operation [duplicate]

This question already has answers here:
Optimized matrix multiplication in C
(14 answers)
Closed 4 years ago.
I need to perform a lot of matrix operations in my application. The most time consuming is matrix multiplication. I implemented it this way
template<typename T>
Matrix<T> Matrix<T>::operator * (Matrix& matrix)
{
Matrix<T> multipliedMatrix = Matrix<T>(this->rows,matrix.GetColumns(),0);
for (int i=0;i<this->rows;i++)
{
for (int j=0;j<matrix.GetColumns();j++)
{
multipliedMatrix.datavector.at(i).at(j) = 0;
for (int k=0;k<this->columns ;k++)
{
multipliedMatrix.datavector.at(i).at(j) += datavector.at(i).at(k) * matrix.datavector.at(k).at(j);
}
//cout<<(*multipliedMatrix)[i][j]<<endl;
}
}
return multipliedMatrix;
}
Is there any way to write it in a better way?? So far matrix multiplication operations take most of time in my application. Maybe is there good/fast library for doing this kind of stuff ??
However I rather can't use libraries which uses graphic card for mathematical operations, because of the fact that I work on laptop with integrated graphic card.
Eigen is by far one of the fastest, if not the fastest, linear algebra libraries out there. It is well written and it is of high quality. Also, it uses expression template which makes writing code that is more readable. Version 3 just released uses OpenMP for data parallelism.
#include <iostream>
#include <Eigen/Dense>
using Eigen::MatrixXd;
int main()
{
MatrixXd m(2,2);
m(0,0) = 3;
m(1,0) = 2.5;
m(0,1) = -1;
m(1,1) = m(1,0) + m(0,1);
std::cout << m << std::endl;
}
Boost uBLAS I think is definitely the way to go with this sort of thing. Boost is well designed, well tested and used in a lot of applications.
Consider GNU Scientific Library, or MV++
If you're okay with C, BLAS is a low-level library that incorporates both C and C-wrapped FORTRAN instructions and is used a huge number of higher-level math libraries.
I don't know anything about this, but another option might be Meschach which seems to have decent performance.
Edit: With respect to your comment about not wanting to use libraries that use your graphics card, I'll point out that in many cases, the libraries that use your graphics card are specialized implementations of standard (non-GPU) libraries. For example, various implementations of BLAS are listed on it's Wikipedia page, only some are designed to leverage your GPU.
There is a book called Introduction to Algorithms. You may like to check the chapter of Dynamic Programming. It has an excellent matrix multiplication algo using dynamic programming. Its worth a read. Well, this info was in case you want to write your own logic instead of using a library.
There are plenty of algorithms for efficient matrix multiplication.
Algorithms for efficient matrix multiplication
Look at the algorithms, find an implementations.
You can also make a multi-threaded implementation for it.