Element-wise operations in C++

Is there a preexisting library that will let me create array-like objects which have the following properties:
Run-time size specification (chosen at instantiation, not grown or shrunk afterwards)
Operators overloaded to perform element-wise operations (i.e. c=a+b will result in a vector c with c[i]=a[i]+b[i] for all i, and similarly for *, -, /, etc.)
A good set of functions which act element-wise, for example x=sqrt(vec) will have elements x[i]=sqrt(vec[i])
Provide "summarising" functions such as sum(vec), mean(vec), etc.
(Optional) Operations can be sent to a GPU for processing.
Basically something like the way arrays work in Fortran, with all of the implementation hidden. Currently I am using vector from the STL and manually overloading the operators, but I feel like this is probably a solved problem.

In the dusty corners of the standard library, long forgotten by everyone, sits a class called valarray. Look it up and see if it suits your needs.
From the manual page at cppreference.com:
std::valarray is the class for representing and manipulating arrays of values. It supports element-wise mathematical operations and various forms of generalized subscript operators, slicing and indirect access.
A code snippet for illustration:
#include <valarray>
#include <algorithm>
#include <iterator>
#include <iostream>
int main()
{
    std::valarray<int> a { 1, 2, 3, 4, 5 };
    std::valarray<int> b = a;
    std::valarray<int> c = a + b;
    std::copy(begin(c), end(c),
              std::ostream_iterator<int>(std::cout, " "));
}
Output: 2 4 6 8 10
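The element-wise math functions and "summarising" reductions the question asks for are also built in (there is no built-in mean, but sum()/size() does the job). A minimal sketch:
#include <valarray>
#include <iostream>

int main()
{
    std::valarray<double> v { 1.0, 4.0, 9.0, 16.0 };
    std::valarray<double> r = std::sqrt(v); // element-wise sqrt: {1, 2, 3, 4}
    double total = v.sum();                 // "summarising" reduction: 30
    double mean = total / v.size();         // no built-in mean, but trivial to compute
    std::cout << r[0] << " " << total << " " << mean << "\n";
}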

You can use the Cilk Plus extensions (https://www.cilkplus.org/), which provide array notation for applying element-wise operations to arrays of the same shape in C/C++. They exploit the vector parallelism of your processor as well as co-processors.
Example:
Standard C code:
for (i=0; i<MAX; i++)
    c[i]=a[i]+b[i];
Cilk Plus - Array notation:
c[0:MAX]=a[0:MAX]+b[0:MAX];
It also supports strided sections, like:
float d[10] = {0,1,2,3,4,5,6,7,8,9};
float x[3];
x[:] = d[0:3:2]; // x contains the values 0, 2, 4
You can use reductions on array sections:
__sec_reduce_add(a[0:n]);
Interesting reading:
http://software.intel.com/en-us/articles/getting-started-with-intel-cilk-plus-array-notations

The Thrust library, which is part of the CUDA toolkit, provides an STL-like interface for vector operations on GPUs. It also has an OpenMP back end; however, the GPU support uses CUDA, so you are limited to NVIDIA GPUs. You will have to do your own wrapping (say, with expression templates) if you want expressions like c=a+b to work for vectors.
https://code.google.com/p/thrust/
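For a flavor of what the raw Thrust API looks like before any such wrapping, here is a minimal sketch (vector sizes and values chosen arbitrarily):
#include <thrust/device_vector.h>
#include <thrust/transform.h>
#include <thrust/functional.h>
#include <iostream>

int main()
{
    thrust::device_vector<float> a(4, 1.0f); // {1, 1, 1, 1} on the GPU
    thrust::device_vector<float> b(4, 2.0f); // {2, 2, 2, 2} on the GPU
    thrust::device_vector<float> c(4);
    // c[i] = a[i] + b[i], computed on the device
    thrust::transform(a.begin(), a.end(), b.begin(), c.begin(),
                      thrust::plus<float>());
    std::cout << c[0] << "\n"; // prints 3
}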
The ViennaCL library takes a more high-level approach, providing vector and matrix operations like the ones you want. It has both CUDA and OpenCL back ends, so you can use GPUs (and multi-core CPUs) from different vendors.
http://viennacl.sourceforge.net/
The VexCL library also looks very promising (again, with support for both OpenCL and CUDA).
https://github.com/ddemidov/vexcl

Related

Is there a `numpy.minimum` equivalent in GSL?

I'm working on porting a complex data analysis routine I "prototyped" in Python to C++. I used Numpy extensively throughout the Python code. I'm looking at employing the GSL in the C++ port since it implements all of the various numerical routines I require (whereas Armadillo, Eigen, etc. only have a subset of what I need, though their APIs are closer to what I am looking for).
Is there an equivalent to numpy.minimum in the GSL (i.e., element-wise minimum of two matrices)? This is just one example of the abstractions from Numpy that I am looking for. Do things like this simply have to be reimplemented manually when using the GSL? I note that the GSL provides for things like:
double gsl_matrix_min (const gsl_matrix * m)
But that simply provides the minimum value of the entire matrix. Element-wise comparisons aside, it doesn't even seem possible to report the minimum along a particular axis of a single matrix using the GSL. That surprises me.
Are my expectations misplaced?
You can implement an element-wise minimum easily in Armadillo, via the find() and .elem() functions:
mat A; A.randu(5,5);
mat B; B.randu(5,5);
umat indices = find(B < A);
mat C = A;
C.elem(indices) = B.elem(indices);
For other functions that are not present in Armadillo, it might be possible to interface Armadillo matrices with GSL functions, through the .memptr() function.
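A hedged sketch of what that interfacing might look like. Note that Armadillo stores matrices column-major while gsl_matrix is row-major, so a view over the raw memory effectively sees the transpose; that is harmless for order-independent reductions like the minimum:
#include <armadillo>
#include <gsl/gsl_matrix.h>

int main()
{
    arma::mat A(5, 5, arma::fill::randu);
    // Wrap Armadillo's buffer in a GSL view without copying; the swapped
    // dimensions account for the column-major vs row-major difference.
    gsl_matrix_view v = gsl_matrix_view_array(A.memptr(), A.n_cols, A.n_rows);
    double m = gsl_matrix_min(&v.matrix); // fine for order-independent reductions
    (void)m;
}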

NumPy style arrays for C++? [closed]

Are there any C++ (or C) libs that have NumPy-like arrays with support for slicing, vectorized operations, adding and subtracting contents element-by-element, etc.?
Here are several free software packages that may suit your needs.
The GNU Scientific Library is GPL software written in C. Thus, it has a C-like allocation and way of programming (pointers, etc.). With GSLwrap, you can have a C++ way of programming, while still using the GSL. GSL has a BLAS implementation, but you can use ATLAS instead of the default CBLAS if you want even more performance.
The boost/uBLAS library is a BSL-licensed library, written in C++ and distributed as a Boost package. It is a C++ way of implementing the BLAS standard. uBLAS comes with a few linear algebra functions, and there is an experimental binding to ATLAS.
Eigen is a linear algebra library written in C++, distributed under the MPL2 license (starting from version 3.1.1) or LGPL3/GPL2 (older versions). It's a C++ way of programming, but more integrated than the other two (more algorithms and data structures are available). Eigen claims to be faster than the BLAS implementations above, while not following the de facto standard BLAS API. Eigen does not seem to put a lot of effort into parallel implementation.
Armadillo is an LGPL3 library for C++. It has bindings for LAPACK (the library used by NumPy). It uses recursive templates and template metaprogramming, which is a good point (I don't know if other libraries do this too?).
xtensor is a C++ library that is BSD licensed. It offers a C++ API very similar to that of NumPy. See https://xtensor.readthedocs.io/en/latest/numpy.html for a cheat sheet.
These alternatives are really good if you just want data structures and basic linear algebra. Depending on your taste regarding style, licensing, or sysadmin challenges (installing big libraries like LAPACK may be difficult), you may choose the one that best suits your needs.
Try out xtensor. (See the NumPy to Xtensor Cheat Sheet).
xtensor is a C++ library meant for numerical analysis with multi-dimensional array expressions.
xtensor provides
an extensible expression system enabling numpy-style broadcasting.
an API following the idioms of the C++ standard library.
tools to manipulate array expressions and build upon xtensor.
Example
Initialize a 2-D array and compute the sum of one of its rows and a 1-D array.
#include <iostream>
#include "xtensor/xarray.hpp"
#include "xtensor/xio.hpp"

int main()
{
    xt::xarray<double> arr1
      {{1.0, 2.0, 3.0},
       {2.0, 5.0, 7.0},
       {2.0, 5.0, 7.0}};

    xt::xarray<double> arr2
      {5.0, 6.0, 7.0};

    xt::xarray<double> res = xt::view(arr1, 1) + arr2;

    std::cout << res;
}
Outputs
{7, 11, 14}
Initialize a 1-D array and reshape it in place.
#include <iostream>
#include "xtensor/xarray.hpp"
#include "xtensor/xio.hpp"

int main()
{
    xt::xarray<int> arr
      {1, 2, 3, 4, 5, 6, 7, 8, 9};

    arr.reshape({3, 3});

    std::cout << arr;
}
Outputs
{{1, 2, 3},
{4, 5, 6},
{7, 8, 9}}
DyND is designed to be, among other things, a NumPy-like library for C++. Things like broadcasting, arithmetic operators, and slicing all work fine. On the other hand, it is still very experimental and many features haven't been implemented yet.
Here's a simple implementation of the de Casteljau algorithm in C++ using DyND arrays:
#include <iostream>
#include <dynd/array.hpp>
using namespace dynd;
nd::array decasteljau(nd::array a, double t){
    size_t e = a.get_dim_size();
    for(size_t i=0; i < e-1; i++){
        // Blend adjacent control points: the left slice drops the last
        // element, the right slice drops the first.
        a = (1.-t) * a(irange()<(e-i-1)) + t * a(0<irange());
    }
    return a;
}

int main(){
    nd::array a = {1., 2., 2., -1.};
    std::cout << decasteljau(a, .25) << std::endl;
}
I wrote a blog post a little while back with more examples and side-by-side comparisons of the syntax for Fortran 90, DyND in C++, and NumPy in Python.
Disclaimer: I'm one of the current DyND developers.
This is an old question, but I still felt like answering; it might help many, especially Python devs coding in C++.
If you have already worked with Python's NumPy, then NumCpp is a great choice. It's minimalistic in syntax and has functions and methods similar to those of NumPy.
The comparison section in the README doc is also very cool.
NumCpp
nc::NdArray<int> arr = {{4, 2}, {9, 4}, {5, 6}};
arr.reshape(2, 3); // 6 elements: 3x2 -> 2x3 (the element count must match)
arr.astype<double>();
Use LibTorch (PyTorch frontend for C++) and be happy.
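For illustration, a minimal LibTorch sketch (assuming a stock libtorch install; the GPU line only applies to a CUDA-enabled build):
#include <torch/torch.h>
#include <iostream>

int main()
{
    torch::Tensor a = torch::tensor({1.0, 2.0, 3.0});
    torch::Tensor b = torch::tensor({4.0, 5.0, 6.0});
    torch::Tensor c = a + b;           // element-wise add
    torch::Tensor s = torch::sqrt(a);  // element-wise sqrt
    std::cout << c << "\n" << s.sum() << "\n";
    // With a CUDA build, the same ops run on the GPU, e.g. a.to(torch::kCUDA)
}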
If you want a multidimensional array (like NumPy) for image processing or neural networks, you can use OpenCV's cv::Mat along with tons of image processing algorithms. If you just want to use it for matrix operations ONLY, you only have to compile the respective OpenCV modules to reduce the size and have a tiny OpenCV library.
cv::Mat(Matrix) is an n-dimensional array that can be used to store various type of data, such as RGB, HSV or grayscale images, vectors with real or complex values, other matrices etc.
A Mat contains the following information: width, height, type, channels, data, flags, datastart, dataend and so on.
It has several methods for matrix manipulation. Bonus: you can create them on CUDA cores as well, as cv::cuda::GpuMat.
Say I want to create a matrix with 10 rows, 20 columns, of type CV_32FC3:
int R = 10, C = 20;
Mat m1;
m1.create(R, C, CV_32FC3); // creates an empty matrix with R rows, C columns
Mat m2(cv::Size(C, R), CV_32FC3); // cv::Size is (width, height), i.e. (columns, rows)
Mat m3(R, C, CV_32FC3); // same as m2
BONUS:
Compile a tiny and compact OpenCV library for just matrix operations. One way is as mentioned in this article.
OR
compile the OpenCV source code using the following cmake commands:
$ git clone https://github.com/opencv/opencv.git
$ cd opencv
$ git checkout <version you want to checkout>
$ mkdir build
$ cd build
$ cmake -D WITH_CUDA=OFF -D WITH_MATLAB=OFF -D BUILD_ANDROID_EXAMPLES=OFF -D BUILD_DOCS=OFF -D BUILD_PERF_TESTS=OFF -D BUILD_TESTS=OFF -DANDROID_STL=c++_shared -DBUILD_SHARED_LIBS=ON -D BUILD_opencv_objdetect=OFF -D BUILD_opencv_video=OFF -D BUILD_opencv_videoio=OFF -D BUILD_opencv_features2d=OFF -D BUILD_opencv_flann=OFF -D BUILD_opencv_highgui=OFF -D BUILD_opencv_ml=OFF -D BUILD_opencv_photo=OFF -D BUILD_opencv_python=OFF -D BUILD_opencv_shape=OFF -D BUILD_opencv_stitching=OFF -D BUILD_opencv_superres=OFF -D BUILD_opencv_ts=OFF -D BUILD_opencv_videostab=OFF -D BUILD_opencv_dnn=OFF -D BUILD_opencv_imgproc=OFF ..
$ make -j $(nproc)
$ sudo make install
Try this example:
#include "opencv2/core.hpp"
#include <iostream>

int main()
{
    std::cout << "OpenCV Version " << CV_VERSION << std::endl;
    int R = 2, C = 4;
    cv::Mat m1;
    m1.create(R, C, CV_32FC1); // creates an empty matrix
    std::cout << "My Mat : \n" << m1 << std::endl;
}
Compile the code with the following command:
$ g++ -std=c++11 opencv_mat.cc -o opencv_mat `pkg-config --libs opencv` `pkg-config --cflags opencv`
Run the executable:
$ ./opencv_mat
OpenCV Version 3.4.2
My Mat :
[0, 0, 0, 0;
0, 0, 0, 0]
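Since the question asks for NumPy-style operations, here is a minimal sketch of element-wise work with cv::Mat (my own example; values chosen arbitrarily):
#include "opencv2/core.hpp"
#include <iostream>

int main()
{
    cv::Mat a = cv::Mat::ones(2, 2, CV_32F) * 3; // all elements 3
    cv::Mat b = cv::Mat::ones(2, 2, CV_32F);     // all elements 1
    cv::Mat c = a + b;        // element-wise addition
    cv::Mat d = a.mul(b);     // element-wise multiplication
    cv::Mat e;
    cv::sqrt(a, e);           // element-wise square root
    double s = cv::sum(a)[0]; // sum over the first (only) channel
    std::cout << c << "\n" << d << "\n" << e << "\n" << s << "\n";
}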
Eigen is a good linear algebra library.
http://eigen.tuxfamily.org/index.php?title=Main_Page
It is quite easy to install, since it's a header-only library. It relies on templates to generate well-optimized code. It automatically vectorizes matrix operations.
It also fully supports coefficient-wise operations, such as the "per element multiplication" between two matrices. Is that what you need?
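A minimal sketch of that coefficient-wise interface (my own example):
#include <Eigen/Dense>
#include <iostream>

int main()
{
    Eigen::MatrixXd a(2, 2), b(2, 2);
    a << 1, 2,
         3, 4;
    b << 5, 6,
         7, 8;
    Eigen::MatrixXd c = a.cwiseProduct(b);                // per-element multiplication
    Eigen::MatrixXd d = (a.array() * b.array()).matrix(); // same, via the array interface
    std::cout << c << "\n\n" << d << "\n";
}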
xtensor is good, but I ended up writing a mini-library myself as a toy project with c++20, while trying to keep the interface as simple as possible. Here it is: https://github.com/gbalduzz/NDArray
Example code:
using namespace nd;
NDArray<int, 2> m(3, 3); // 3x3 matrix
m = 2; // assign 2 to all
m(-1, all) = 1; // assign 1 to the last row.
auto tile = m(range{1, end}, range{1, end}); // 2x2 tile
std::sort(tile.begin(), tile.end());
std::cout << m; // prints [[2, 2, 2], [2, 1, 1], [1, 2, 2]]
It does not yet provide fancy arithmetic operators collapsing multiple operations together, but you can broadcast arbitrary lambdas to a set of tensors with the same shape, or use lazily evaluated arithmetic operators.
Let me know what you think about the interface and how it compares with the other options, and, if this has any hope, what sort of operations you would like to see implemented.
Free license and no dependencies!
Addendum: I managed to properly compile and run xtensor, and the result is that my library is significantly faster when iterating over views (2 to 3x).
Blitz++ supports arrays with an arbitrary number of axes, whereas Armadillo only supports up to three (vectors, matrices, and cubes). Eigen only supports vectors and matrices (not cubes). The downside is that Blitz++ doesn't have linear algebra functions beyond the basic entrywise operations and tensor contractions. Development seems to have slowed down quite some time ago, but perhaps that's just because the library does what it does and not many changes need to be made.
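For reference, a minimal sketch of Blitz++'s element-wise style (my own toy example):
#include <blitz/array.h>
#include <iostream>

int main()
{
    blitz::Array<double, 1> a(5), b(5), c(5);
    a = 1.0;            // assign to every element
    b = 2.0;
    c = a + b;          // element-wise addition over the whole array
    c = blitz::sqrt(c); // element-wise square root
    std::cout << c << std::endl;
}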
VIGRA contains a good N-dimensional array implementation:
http://ukoethe.github.io/vigra/doc/vigra/Tutorial.html
I use it extensively, and find it very simple and effective. It's also header-only, so it's very easy to integrate into your development environment. It's the closest thing I've come across to using NumPy in terms of its API.
The main downside is that it isn't as widely used as the others, so you won't find much help online. That, and it's awkwardly named (try searching for it!).
While GLM is designed to mesh easily with OpenGL and GLSL, it is a fully functional header-only math library for C++ with a very intuitive set of interfaces.
It declares vector and matrix types as well as various operations on them.
Multiplying two matrices is as simple as (M1 * M2); subtracting two vectors, (V1 - V2).
Accessing the values contained in vectors or matrices is equally simple. After declaring a vec3 vector, for example, one can access its first element with vector.x. Check it out.
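For example (a small sketch; GLM's vec and mat types are fixed-size, aimed at graphics math):
#include <glm/glm.hpp>
#include <iostream>

int main()
{
    glm::vec3 v1(1.0f, 2.0f, 3.0f);
    glm::vec3 v2(0.5f, 0.5f, 0.5f);
    glm::vec3 diff = v1 - v2;     // element-wise subtraction
    glm::vec3 prod = v1 * v2;     // GLM's vec * vec is also element-wise
    glm::mat4 m1(1.0f), m2(1.0f); // identity matrices
    glm::mat4 mm = m1 * m2;       // matrix multiplication
    std::cout << diff.x << " " << prod.y << " " << mm[0][0] << "\n";
}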
Eigen is a template library for linear algebra (matrices, vectors…). It is header only and free to use (LGPL).
The GSL is great, it does all of what you're asking and much more. It is licensed under the GPL though.

Worse performance using Eigen than using my own class

A couple of weeks ago I asked a question about the performance of matrix multiplication.
I was told that in order to enhance the performance of my program I should use some specialised matrix classes rather than my own class.
StackOverflow users recommended:
uBLAS
EIGEN
BLAS
At first I wanted to use uBLAS; however, reading the documentation, it turned out that this library doesn't support matrix-matrix multiplication.
In the end I decided to use the Eigen library. So I exchanged my matrix class for Eigen::MatrixXd, but it turned out that now my application runs even slower than before.
The time before using Eigen was 68 seconds; after exchanging my matrix class for an Eigen matrix, the program runs for 87 seconds.
The parts of the program which take the most time look like this:
TemplateClusterBase* TemplateClusterBase::TransformTemplateOne( vector<Eigen::MatrixXd*>& pointVector, Eigen::MatrixXd& rotation, Eigen::MatrixXd& scale, Eigen::MatrixXd& translation )
{
    for (int i=0; i<pointVector.size(); i++)
    {
        Eigen::MatrixXd outcome = (rotation*scale) * (*pointVector[i]) + translation;
        MatrixHelper::SetX(*prototypePointVector[i], MatrixHelper::GetX(outcome));
        MatrixHelper::SetY(*prototypePointVector[i], MatrixHelper::GetY(outcome));
    }
    return this;
}
and
Eigen::MatrixXd AlgorithmPointBased::UpdateTranslationMatrix( int clusterIndex )
{
    double membershipSum = 0, outcome = 0;
    double currentPower = 0;
    Eigen::MatrixXd outcomePoint = Eigen::MatrixXd(2,1);
    outcomePoint << 0, 0;
    Eigen::MatrixXd templatePoint;
    for (int i=0; i<imageDataVector.size(); i++)
    {
        currentPower = 0;
        membershipSum += currentPower = pow(membershipMatrix[clusterIndex][i], m);
        outcomePoint.noalias() += (*imageDataVector[i] - (prototypeVector[clusterIndex]->rotationMatrix * prototypeVector[clusterIndex]->scalingMatrix * (*templateCluster->templatePointVector[prototypeVector[clusterIndex]->assosiatedPointIndexVector[i]]))) * currentPower;
    }
    outcomePoint.noalias() = outcomePoint /= membershipSum;
    return outcomePoint;
}
As you can see, these functions perform a lot of matrix operations. That is why I thought using Eigen would speed up my application. Unfortunately (as I mentioned above), the program runs slower.
Is there any way to speed up these functions?
Maybe if I used DirectX matrix operations I would get better performance? (However, I have a laptop with an integrated graphics card.)
If you're using Eigen's MatrixXd types, those are dynamically sized. You should get much better results from using the fixed-size types, e.g. Matrix4d, Vector4d.
Also, make sure you're compiling such that the code can get vectorized; see the relevant Eigen documentation.
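A hedged sketch of what that change looks like, assuming (as the posted code suggests) that the points are 2x1 and the transforms 2x2:
#include <Eigen/Dense>

void transform(const Eigen::Matrix2d& rotation, const Eigen::Matrix2d& scale,
               const Eigen::Vector2d& translation, Eigen::Vector2d& point)
{
    // Fixed-size types live on the stack: no heap allocation per operation,
    // and the compiler can fully unroll and vectorize the 2x2 math.
    Eigen::Matrix2d rotScale = rotation * scale;
    point = rotScale * point + translation;
}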
Re your thought on using the Direct3D extensions library (D3DXMATRIX etc.): it's OK (if a bit old-fashioned) for graphics geometry (4x4 transforms etc.), but it's certainly not GPU-accelerated (just good old SSE, I think). Also, note that it's single-precision floats only (you seem to be set on using doubles). Personally I'd much prefer to use Eigen unless I was actually coding a Direct3D app.
Make sure to have compiler optimization switched on (e.g. at least -O2 on gcc). Eigen is heavily templated and will not perform very well if you don't turn on optimization.
Which version of Eigen are you using? They recently released 3.0.1, which is supposed to be faster than 2.x. Also, make sure you play a bit with the compiler options. For example, make sure SSE is being used in Visual Studio:
C/C++ --> Code Generation --> Enable Enhanced Instruction Set
You should profile and then first optimize the algorithm, then the implementation. In particular, the posted code is quite inefficient:
for (int i=0; i<pointVector.size(); i++)
{
    Eigen::MatrixXd outcome = (rotation*scale) * (*pointVector[i]) + translation;
I don't know the library, so I won't even try to guess the number of unnecessary temporaries that you are creating, but a simple refactor:
Eigen::MatrixXd tmp = rotation*scale;
for (int i=0; i<pointVector.size(); i++)
{
    Eigen::MatrixXd outcome = tmp * (*pointVector[i]) + translation;
can save you a good number of expensive multiplications (and, again, probably new temporary matrices that get discarded right away).
A couple of points.
Why are you multiplying rotation*scale inside of the loop when that product will have the same value each iteration? That is a lot of wasted effort.
You are using dynamically sized matrices rather than fixed sized matrices. Someone else mentioned this already, and you said you shaved off 2 sec.
You are passing arguments as a vector of pointers to matrices. This adds an extra pointer indirection and destroys any guarantee of data locality, which will give poor cache performance.
I hope this isn't insulting, but are you compiling in Release or Debug? Eigen is very slow in debug builds, because it uses lots of trivial templated functions that are optimized out of release but remain in debug.
Looking at your code, I am hesitant to blame Eigen for performance problems. However, most linear algebra libraries (including Eigen) are not really designed for your use case of lots of tiny matrices. In general, Eigen will be better optimized for 100x100 or larger matrices. You very well may be better off using your own matrix class or the DirectX math helper classes. The DirectX math classes are completely independent from your video card.
Looking back at your previous post and the code in there, my suggestion would be to use your old code, but improve its efficiency by moving things around. I'm posting on that previous question to keep the answers separate.

How to optimize matrix multiplication operation [duplicate]

This question already has answers here: Optimized matrix multiplication in C (14 answers). Closed 4 years ago.
I need to perform a lot of matrix operations in my application. The most time consuming is matrix multiplication. I implemented it this way
template<typename T>
Matrix<T> Matrix<T>::operator * (Matrix& matrix)
{
    Matrix<T> multipliedMatrix = Matrix<T>(this->rows, matrix.GetColumns(), 0);
    for (int i=0; i<this->rows; i++)
    {
        for (int j=0; j<matrix.GetColumns(); j++)
        {
            multipliedMatrix.datavector.at(i).at(j) = 0;
            for (int k=0; k<this->columns; k++)
            {
                multipliedMatrix.datavector.at(i).at(j) += datavector.at(i).at(k) * matrix.datavector.at(k).at(j);
            }
        }
    }
    return multipliedMatrix;
}
Is there any way to write it better? So far, matrix multiplication takes most of the time in my application. Is there a good/fast library for doing this kind of thing?
However, I can't really use libraries which use the graphics card for mathematical operations, because I work on a laptop with an integrated graphics card.
Eigen is by far one of the fastest, if not the fastest, linear algebra libraries out there. It is well written and of high quality. Also, it uses expression templates, which makes the code more readable. Version 3, just released, uses OpenMP for data parallelism.
#include <iostream>
#include <Eigen/Dense>
using Eigen::MatrixXd;
int main()
{
    MatrixXd m(2,2);
    m(0,0) = 3;
    m(1,0) = 2.5;
    m(0,1) = -1;
    m(1,1) = m(1,0) + m(0,1);
    std::cout << m << std::endl;
}
Boost uBLAS I think is definitely the way to go with this sort of thing. Boost is well designed, well tested and used in a lot of applications.
Consider the GNU Scientific Library, or MV++.
If you're okay with C, BLAS is a low-level library that incorporates both C and C-wrapped FORTRAN routines and is used by a huge number of higher-level math libraries.
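For instance, the standard BLAS matrix-multiply call through the C interface looks like this (a sketch, assuming square n x n row-major matrices):
#include <cblas.h>

// Computes C = 1.0 * A * B + 0.0 * C for row-major n x n matrices.
void multiply(const double* A, const double* B, double* C, int n)
{
    cblas_dgemm(CblasRowMajor, CblasNoTrans, CblasNoTrans,
                n, n, n,   // M, N, K
                1.0, A, n, // alpha, A, lda
                B, n,      // B, ldb
                0.0, C, n); // beta, C, ldc
}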
I don't know anything about this, but another option might be Meschach, which seems to have decent performance.
Edit: With respect to your comment about not wanting to use libraries that use your graphics card, I'll point out that in many cases, the libraries that use your graphics card are specialized implementations of standard (non-GPU) libraries. For example, various implementations of BLAS are listed on its Wikipedia page; only some are designed to leverage your GPU.
There is a book called Introduction to Algorithms. You may like to check the chapter on Dynamic Programming. It has an excellent matrix multiplication algorithm using dynamic programming. It's worth a read. Well, this info was in case you want to write your own logic instead of using a library.
There are plenty of algorithms for efficient matrix multiplication.
Algorithms for efficient matrix multiplication
Look at the algorithms and find an implementation.
You can also make a multi-threaded implementation, and even a single-threaded one benefits from a cache-friendly loop ordering, as in the sketch below.
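As a concrete illustration (a sketch using flat row-major std::vector storage rather than the asker's Matrix class): reordering the loops to i-k-j keeps the inner loop streaming over contiguous rows of b and c, which is typically much friendlier to the cache than the naive i-j-k order.
#include <vector>

std::vector<double> multiply(const std::vector<double>& a,
                             const std::vector<double>& b, int n)
{
    std::vector<double> c(n * n, 0.0);
    for (int i = 0; i < n; ++i)
        for (int k = 0; k < n; ++k)
        {
            const double aik = a[i * n + k]; // hoisted: constant over the inner loop
            for (int j = 0; j < n; ++j)
                c[i * n + j] += aik * b[k * n + j]; // contiguous access to b and c
        }
    return c;
}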

C++ valarray vs. vector

I like vectors a lot. They're nifty and fast. But I know this thing called a valarray exists. Why would I use a valarray instead of a vector? I know valarrays have some syntactic sugar, but other than that, when are they useful?
valarray is kind of an orphan that was born in the wrong place at the wrong time. It's an attempt at optimization, fairly specifically for the machines that were used for heavy-duty math when it was written -- specifically, vector processors like the Crays.
For a vector processor, what you generally wanted to do was apply a single operation to an entire array, then apply the next operation to the entire array, and so on until you'd done everything you needed to do.
Unless you're dealing with fairly small arrays, however, that tends to work poorly with caching. On most modern machines, what you'd generally prefer (to the extent possible) would be to load part of the array, do all the operations on it you're going to, then move on to the next part of the array.
valarray is also supposed to eliminate any possibility of aliasing, which (at least theoretically) lets the compiler improve speed because it's more free to store values in registers. In reality, however, I'm not at all sure that any real implementation takes advantage of this to any significant degree. I suspect it's rather a chicken-and-egg sort of problem -- without compiler support it didn't become popular, and as long as it's not popular, nobody's going to go to the trouble of working on their compiler to support it.
There's also a bewildering (literally) array of ancillary classes to use with valarray. You get slice, slice_array, gslice and gslice_array to play with pieces of a valarray, and make it act like a multi-dimensional array. You also get mask_array to "mask" an operation (e.g. add items in x to y, but only at the positions where z is non-zero). To make more than trivial use of valarray, you have to learn a lot about these ancillary classes, some of which are pretty complex and none of which seems (at least to me) very well documented.
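For a taste of those ancillary classes, a minimal sketch:
#include <valarray>
#include <iostream>

int main()
{
    std::valarray<int> v { 0, 1, 2, 3, 4, 5, 6, 7, 8, 9 };
    // slice(start, size, stride): the even-indexed elements 0, 2, 4, 6, 8
    std::valarray<int> evens = v[std::slice(0, 5, 2)];
    // mask_array: assign only at positions where the condition holds
    v[v > 5] = 0;
    std::cout << evens[2] << " " << v[9] << "\n"; // prints 4 0
}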
Bottom line: while it has moments of brilliance, and can do some things pretty neatly, there are also some very good reasons that it is (and will almost certainly remain) obscure.
Edit (eight years later, in 2017): Some of the preceding has become obsolete to at least some degree. For one example, Intel has implemented an optimized version of valarray for their compiler. It uses the Intel Integrated Performance Primitives (Intel IPP) to improve performance. Although the exact performance improvement undoubtedly varies, a quick test with simple code shows around a 2:1 improvement in speed, compared to identical code compiled with the "standard" implementation of valarray.
So, while I'm not entirely convinced that C++ programmers will be starting to use valarray in huge numbers, there are at least some circumstances in which it can provide a speed improvement.
Valarrays (value arrays) are intended to bring some of the speed of Fortran to C++. You wouldn't make a valarray of pointers so the compiler can make assumptions about the code and optimise it better. (The main reason that Fortran is so fast is that there is no pointer type so there can be no pointer aliasing.)
Valarrays also have classes which allow you to slice them up in a reasonably easy way, although that part of the standard could use a bit more work. Resizing them is destructive, and they lacked iterators until C++11.
So, if it's numbers you are working with and convenience isn't all that important use valarrays. Otherwise, vectors are just a lot more convenient.
During the standardization of C++98, valarray was designed to allow some sort of fast mathematical computations. However, around that time Todd Veldhuizen invented expression templates and created blitz++, and similar template-meta techniques were invented, which made valarrays pretty much obsolete before the standard was even released. IIRC, the original proposer(s) of valarray abandoned it halfway into the standardization, which (if true) didn't help it either.
ISTR that the main reason it wasn't removed from the standard is that nobody took the time to evaluate the issue thoroughly and write a proposal to remove it.
Please keep in mind, however, that all this is vaguely remembered hearsay. Take this with a grain of salt and hope someone corrects or confirms this.
I know valarrays have some syntactic sugar
I have to say that I don't think std::valarrays have much in the way of syntactic sugar. The syntax is different, but I wouldn't call the difference "sugar." The API is weird. The section on std::valarrays in The C++ Programming Language mentions this unusual API and the fact that, since std::valarrays are expected to be highly optimized, any error messages you get while using them will probably be non-intuitive.
Out of curiosity, about a year ago I pitted std::valarray against std::vector. I no longer have the code or the precise results (although it shouldn't be hard to write your own). Using GCC I did get a little performance benefit when using std::valarray for simple math, but not for my implementations to calculate standard deviation (and, of course, standard deviation isn't that complex, as far as math goes). I suspect that operations on each item in a large std::vector play better with caches than operations on std::valarrays. (NOTE, following advice from musiphil, I've managed to get almost identical performance from vector and valarray).
In the end, I decided to use std::vector while paying close attention to things like memory allocation and temporary object creation.
Both std::vector and std::valarray store the data in a contiguous block. However, they access that data using different patterns, and more importantly, the API for std::valarray encourages different access patterns than the API for std::vector.
For the standard deviation example, at a particular step I needed to find the collection's mean and the difference between each element's value and the mean.
For the std::valarray, I did something like:
std::valarray<double> original_values = ... // obviously I put something here
double mean = original_values.sum() / original_values.size();
std::valarray<double> temp(mean, original_values.size());
std::valarray<double> differences_from_mean = original_values - temp;
I may have been more clever with std::slice or std::gslice. It's been over five years now.
For std::vector, I did something along the lines of:
std::vector<double> original_values = ... // obviously, I put something here
double mean = std::accumulate(original_values.begin(), original_values.end(), 0.0) / original_values.size();
std::vector<double> differences_from_mean;
differences_from_mean.reserve(original_values.size());
std::transform(original_values.begin(), original_values.end(), std::back_inserter(differences_from_mean), std::bind1st(std::minus<double>(), mean));
Today I would certainly write that differently. If nothing else, I would take advantage of C++11 lambdas.
It's obvious that these two snippets of code do different things. For one, the std::vector example doesn't make an intermediate collection like the std::valarray example does. However, I think it's fair to compare them because the differences are tied to the differences between std::vector and std::valarray.
When I wrote this answer, I suspected that subtracting the value of elements from two std::valarrays (last line in the std::valarray example) would be less cache-friendly than the corresponding line in the std::vector example (which happens to also be the last line).
It turns out, however, that
std::valarray<double> original_values = ... // obviously I put something here
double mean = original_values.sum() / original_values.size();
std::valarray<double> differences_from_mean = original_values - mean;
Does the same thing as the std::vector example, and has almost identical performance. In the end, the question is which API you prefer.
valarray was supposed to let some FORTRAN vector-processing goodness rub off on C++. Somehow the necessary compiler support never really happened.
The Josuttis book contains some interesting (somewhat disparaging) commentary on valarray (here and here).
However, Intel now seems to be revisiting valarray in their recent compiler releases (e.g. see slide 9); this is an interesting development given that their 4-way SIMD SSE instruction set is about to be joined by 8-way AVX and 16-way Larrabee instructions, and in the interests of portability it'll likely be much better to code with an abstraction like valarray than with (say) intrinsics.
I found one good usage for valarray: using it just like NumPy arrays.
auto x = linspace(0, 2 * 3.14, 100);
plot(x, sin(x) + sin(3.f * x) / 3.f + sin(5.f * x) / 5.f);
We can implement the above with valarray.
#include <valarray>
#include <string>
#include <cstdio>     // popen, pclose, fgets
#include <cassert>
#include <fcntl.h>    // O_CREAT, O_RDWR
#include <sys/mman.h> // shm_open, mmap, shm_unlink
#include <unistd.h>   // ftruncate
using namespace std;

valarray<float> linspace(float start, float stop, int size)
{
    valarray<float> v(size);
    for(int i=0; i<size; i++) v[i] = start + i * (stop-start)/size;
    return v;
}

valarray<float> arange(float start, float step, float stop)
{
    int size = (stop - start) / step;
    valarray<float> v(size);
    for(int i=0; i<size; i++) v[i] = start + step * i;
    return v;
}
string psstm(string command)
{   // return system call output as a string
    string s;
    char tmp[1000];
    FILE* f = popen(command.c_str(), "r");
    while(fgets(tmp, sizeof(tmp), f)) s += tmp;
    pclose(f);
    return s;
}
string plot(const valarray<float>& x, const valarray<float>& y)
{
    int sz = x.size();
    assert(sz == y.size());
    int bytes = sz * sizeof(float) * 2;
    const char* name = "plot1";
    // Pass the interleaved (x, y) pairs to the Python process via POSIX shared memory.
    int shm_fd = shm_open(name, O_CREAT | O_RDWR, 0666);
    ftruncate(shm_fd, bytes);
    float* ptr = (float*)mmap(0, bytes, PROT_WRITE, MAP_SHARED, shm_fd, 0);
    for(int i=0; i<sz; i++) {
        *ptr++ = x[i];
        *ptr++ = y[i];
    }
    string command = "python plot.py ";
    string s = psstm(command + to_string(sz));
    shm_unlink(name);
    return s;
}
Also, we need a Python script (plot.py):
import sys, posix_ipc, os, struct
import matplotlib.pyplot as plt

sz = int(sys.argv[1])
f = posix_ipc.SharedMemory("plot1")
x = [0] * sz
y = [0] * sz
for i in range(sz):
    x[i], y[i] = struct.unpack('ff', os.read(f.fd, 8))
os.close(f.fd)
plt.plot(x, y)
plt.show()
The C++11 standard says:
The valarray array classes are defined to be free of certain forms of
aliasing, thus allowing operations on these classes to be optimized.
See C++11 26.6.1-2.
With std::valarray you can use the standard mathematical notation like v1 = a*v2 + v3 out of the box. This is not possible with vectors unless you define your own operators.
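For example (assuming three valarrays of equal size):
#include <valarray>

int main()
{
    std::valarray<double> v1(100), v2(100), v3(100);
    double a = 2.0;
    v1 = a * v2 + v3; // element-wise, no user-defined operators needed
}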
std::valarray is intended for heavy numeric tasks, such as Computational Fluid Dynamics or Computational Structural Dynamics, in which you have arrays with millions, sometimes tens of millions, of items, and you iterate over them in a loop with millions of timesteps. Maybe today std::vector has comparable performance, but, some 15 years ago, valarray was almost mandatory if you wanted to write an efficient numeric solver.