Can I concatenate a std::tuple of Eigen::Vectors? - c++

If I have a std::tuple of statically allocated Eigen::Vectors (from the popular Eigen library), for example
std::tuple<Eigen::Vector2f, Eigen::Vector3f, Eigen::Vector2f>
Is there a way I can turn this into a single Eigen::Vector7f (i.e., Eigen::Matrix<float, 7, 1>) of the three vectors concatenated? It feels like I should be able to do this at compile time, given that the sizes and types of everything are known.

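You can write:
#include <iostream>
#include <tuple>
#include <Eigen/Dense>

template<int R1, int R2, int R3>
auto foo(std::tuple<
        Eigen::Matrix<float,R1,1>,
        Eigen::Matrix<float,R2,1>,
        Eigen::Matrix<float,R3,1> > t)
{
    Eigen::Matrix<float,R1+R2+R3,1> res;
    // Copy each fixed-size part into its slot of the result.
    res.template block<R1,1>(0,0) = std::get<0>(t);
    res.template block<R2,1>(R1,0) = std::get<1>(t);
    res.template block<R3,1>(R1+R2,0) = std::get<2>(t);
    return res;
}

int main() {
    Eigen::Vector2f v1;
    v1 << 1,2;
    Eigen::Vector3f v2;
    v2 << 3,4,5;
    std::cout << foo(std::make_tuple(v1,v2,v1)) << std::endl;
}
This prints the seven coefficients 1, 2, 3, 4, 5, 1, 2, one per line.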
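Below is a more generic version that takes a tuple with any number of vectors (the fold expression (R + ...) requires C++17, and std::index_sequence needs <utility>):
template<class RES, int ... R, size_t ... Indices>
void concatenateHelper(RES& res,
        const std::tuple< Eigen::Matrix<float,R,1>... >& t,
        std::index_sequence<Indices...>)
{
    int idx = 0;
    // Elements of a braced initializer list are evaluated left to right,
    // so idx advances by each vector's size in order.
    int fakeArray [] = {(res.template block<R,1>(idx,0) = std::get<Indices>(t),idx += R,0)...};
    static_cast<void>(fakeArray);
}

template<int ... R>
auto concatenate(const std::tuple< Eigen::Matrix<float,R,1> ... >& t)
{
    Eigen::Matrix<float, (R + ...),1> res;
    concatenateHelper(res, t, std::make_index_sequence<sizeof...(R)>{});
    return res;
}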
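The same index-sequence and fold-expression technique can be tried without Eigen. The following is only a sketch under that assumption: it swaps Eigen::Matrix for std::array<float, N>, and the names concat_arrays and concat_arrays_impl are hypothetical, not part of any library.

```cpp
#include <array>
#include <cstddef>
#include <tuple>
#include <utility>

// Copies every tuple element into `out`, one after another.
// The fold over the comma operator is sequenced left to right (C++17).
template <class Out, std::size_t... N, std::size_t... I>
void concat_arrays_impl(Out& out,
                        const std::tuple<std::array<float, N>...>& t,
                        std::index_sequence<I...>)
{
    std::size_t offset = 0;
    auto copy_one = [&](const auto& part) {
        for (float x : part) out[offset++] = x;
    };
    (copy_one(std::get<I>(t)), ...);
}

// The result size is the compile-time sum of the part sizes.
template <std::size_t... N>
std::array<float, (N + ...)>
concat_arrays(const std::tuple<std::array<float, N>...>& t)
{
    std::array<float, (N + ...)> out{};
    concat_arrays_impl(out, t, std::make_index_sequence<sizeof...(N)>{});
    return out;
}
```

Calling concat_arrays(std::make_tuple(a, b, a)) with a of size 2 and b of size 3 yields a std::array<float, 7>, mirroring the Vector2f/Vector3f/Vector2f case in the question.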


C++: function that works with container and container of pointers as well

I think I'm facing something that I imagine is a quite common problem here.
I'd like to write a function that would be able to accept both a container (let's say std::vector) of objects, and a container of pointers to those objects.
What would be the proper way to do so?
Right now, I'm thinking
int sum(std::vector<int *> v)
{
    int s = 0;
    for (int * i : v) s += *i;
    return s;
}
int sum(std::vector<int> v)
{
    std::vector<int *> vp(v.size()); // sized up front so vp[i] is valid
    for (size_t i = 0; i < v.size(); ++i)
        vp[i] = &v[i];
    return sum(vp);
}
But it doesn't seem quite right, does it?
Consider the standard algorithm library where the problem you see has a solution.
Most algorithms have some default behavior but often allow you to customize that behavior via functor parameters.
For your specific case the algorithm of choice is std::accumulate.
Because this algorithm already exists I can restrict to a rather simplified illustration here:
#include <iostream>
#include <functional>
#include <vector>

template <typename T,typename R,typename F = std::plus<>>
R sum(const std::vector<T>& v,R init,F f = std::plus<>{})
{
    for (auto& e : v) init = f(init,e);
    return init;
}

int main() {
    std::vector<int> x{1,2,3,4};
    std::vector<int*> y;
    for (auto& e : x ) y.push_back(&e);
    std::cout << sum(x,0) << "\n";
    // The custom functor dereferences each element before adding it.
    std::cout << sum(y,0,[](auto a, auto b) {return a + *b;});
}
std::plus is a functor that adds two values. Because the return type may differ from the vector's element type, an additional template parameter R is used. As with std::accumulate, it is deduced from the initial value passed as a parameter. When adding int values, the default std::plus<> is fine. When adding integers through pointers, the functor adds the dereferenced vector element to the accumulator. As already mentioned, this is just a simple toy example. The linked reference shows a possible implementation of std::accumulate (which uses iterators rather than the container directly).
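For comparison, here is how the same two sums might look with std::accumulate itself. This is a small sketch; the function names sum_values and sum_pointees are made up for illustration.

```cpp
#include <numeric>
#include <vector>

// Vector of values: the default binary operation of std::accumulate is operator+.
inline int sum_values(const std::vector<int>& v)
{
    return std::accumulate(v.begin(), v.end(), 0);
}

// Vector of pointers: pass a binary operation that dereferences each element.
inline int sum_pointees(const std::vector<int*>& v)
{
    return std::accumulate(v.begin(), v.end(), 0,
                           [](int acc, const int* p) { return acc + *p; });
}
```

Only the binary operation changes between the two cases; the iteration logic stays inside the standard algorithm.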
With C++20 (or another ranges library), you can easily add or remove pointerness:
#include <concepts>
#include <ranges>
#include <vector>

template <typename R, typename T>
concept range_of = std::ranges::range<R> && std::same_as<std::ranges::range_value_t<R>, T>;

template <range_of<int *> IntPointers>
int sum_pointers(IntPointers int_pointers)
{
    int result = 0;
    for (int * p : int_pointers) result += *p;
    return result;
}

void call_adding_pointer()
{
    std::vector<int> v;
    sum_pointers(v | std::ranges::views::transform([](int & i){ return &i; }));
}

template <range_of<int> Ints>
int sum(Ints ints)
{
    int result = 0;
    for (int i : ints) result += i;
    return result;
}

void call_removing_pointer()
{
    std::vector<int *> v;
    sum(v | std::ranges::views::transform([](int * p){ return *p; }));
}
You can write a function template that behaves differently for pointer and non-pointer element types:
#include <iostream>
#include <type_traits>
#include <vector>
using namespace std;

template <class T>
auto sum(const std::vector<T> &vec)
{
    if constexpr (std::is_pointer_v<T>)
    {
        typename std::remove_pointer<T>::type sum = 0;
        for (const auto & value : vec) sum += *value;
        return sum;
    }
    if constexpr (!std::is_pointer_v<T>)
    {
        T sum = 0;
        for (const auto & value : vec) sum += value;
        return sum;
    }
}

int main(){
    std::vector<int> a{3, 4, 5, 8, 10};
    std::vector<int*> b{&a[0], &a[1], &a[2], &a[3], &a[4]};
    cout << sum(a) << endl;
    cout << sum(b) << endl;
}
You can move almost everything out of the if constexpr to reduce code duplication:
template <class T>
auto sum(const std::vector<T> &vec)
{
    typename std::remove_pointer<T>::type sum = 0;
    for (const auto & value : vec)
    {
        if constexpr (std::is_pointer_v<T>)
            sum += *value;
        if constexpr (!std::is_pointer_v<T>)
            sum += value;
    }
    return sum;
}
Based on @mch's solution:
template<typename T>
std::array<double, 3> center(const std::vector<T> & particles)
{
    if (particles.empty())
        return {0, 0, 0};
    std::array<double, 3> cumsum = {0, 0, 0};
    if constexpr (std::is_pointer_v<T>)
    {
        for (const auto p : particles)
        {
            cumsum[0] += p->getX();
            cumsum[1] += p->getY();
            cumsum[2] += p->getZ();
        }
    }
    if constexpr (not std::is_pointer_v<T>)
    {
        for (const auto & p : particles) // by reference, to avoid copying each particle
        {
            cumsum[0] += p.getX();
            cumsum[1] += p.getY();
            cumsum[2] += p.getZ();
        }
    }
    double f = 1.0 / particles.size();
    cumsum[0] *= f;
    cumsum[1] *= f;
    cumsum[2] *= f;
    return cumsum;
}
A much cleaner solution uses std::invoke (from <functional>, C++17), which calls a member function on either an object or a pointer to one:
template<typename T>
std::array<double, 3> centroid(const std::vector<T> & particles)
{
    if (particles.empty())
        return {0, 0, 0};
    std::array<double, 3> cumsum{0.0, 0.0, 0.0};
    for (auto && p : particles)
    {
        cumsum[0] += std::invoke(&topology::Particle::getX, p);
        cumsum[1] += std::invoke(&topology::Particle::getY, p);
        cumsum[2] += std::invoke(&topology::Particle::getZ, p);
    }
    double f = 1.0 / particles.size();
    cumsum[0] *= f;
    cumsum[1] *= f;
    cumsum[2] *= f;
    return cumsum;
}
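To see the std::invoke approach work for both element kinds, here is a self-contained sketch. The Particle struct below is a hypothetical stand-in for topology::Particle, whose real definition is not shown in the thread.

```cpp
#include <array>
#include <functional>
#include <vector>

// Hypothetical stand-in for topology::Particle.
struct Particle {
    double x, y, z;
    double getX() const { return x; }
    double getY() const { return y; }
    double getZ() const { return z; }
};

// Works for std::vector<Particle> and std::vector<Particle*> alike:
// std::invoke applies a member-function pointer to an object or to a pointer.
template <typename T>
std::array<double, 3> centroid(const std::vector<T>& particles)
{
    std::array<double, 3> c{0.0, 0.0, 0.0};
    if (particles.empty())
        return c;
    for (const auto& p : particles) {
        c[0] += std::invoke(&Particle::getX, p);
        c[1] += std::invoke(&Particle::getY, p);
        c[2] += std::invoke(&Particle::getZ, p);
    }
    for (double& v : c)
        v /= static_cast<double>(particles.size());
    return c;
}
```

The same template body compiles unchanged for smart pointers too, since std::invoke dereferences anything that is not the class type itself or a std::reference_wrapper.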

Eigen::Ref for concatenating matrices

If I want to concatenate two matrices A and B, I would do
using Eigen::MatrixXd;
const MatrixXd A(n, p);
const MatrixXd B(n, q);
MatrixXd X(n, p+q);
X << A, B;
Now if n, p, q are large, defining X in this way would mean creating copies of A and B. Is it possible to define X as an Eigen::Ref<MatrixXd> instead?
No, Ref is not designed for that. You would need to define a new expression type for it, which could be called Cat. If you only need to concatenate two matrices horizontally, then in Eigen 3.3 this can be implemented in fewer than a dozen lines of code as a nullary expression; see the linked example.
Edit: here is a self-contained example showing that one can mix matrices and expressions:
#include <iostream>
#include <Eigen/Core>
using namespace Eigen;

// Computes the plain matrix type that describes the concatenated result.
template<typename Arg1, typename Arg2>
struct horizcat_helper {
  typedef Matrix<typename Arg1::Scalar,
    Arg1::RowsAtCompileTime,
    Arg1::ColsAtCompileTime==Dynamic || Arg2::ColsAtCompileTime==Dynamic
    ? Dynamic : Arg1::ColsAtCompileTime+Arg2::ColsAtCompileTime,
    ColMajor,
    Arg1::MaxRowsAtCompileTime,
    Arg1::MaxColsAtCompileTime==Dynamic || Arg2::MaxColsAtCompileTime==Dynamic
    ? Dynamic : Arg1::MaxColsAtCompileTime+Arg2::MaxColsAtCompileTime> MatrixType;
};

// Returns the coefficient of whichever operand the requested column falls in.
template<typename Arg1, typename Arg2>
class horizcat_functor
{
  const typename Arg1::Nested m_mat1;
  const typename Arg2::Nested m_mat2;
public:
  horizcat_functor(const Arg1& arg1, const Arg2& arg2)
   : m_mat1(arg1), m_mat2(arg2)
  {}
  const typename Arg1::Scalar operator() (Index row, Index col) const {
    if (col < m_mat1.cols())
      return m_mat1(row,col);
    return m_mat2(row, col - m_mat1.cols());
  }
};

template <typename Arg1, typename Arg2>
CwiseNullaryOp<horizcat_functor<Arg1,Arg2>, typename horizcat_helper<Arg1,Arg2>::MatrixType>
horizcat(const Eigen::MatrixBase<Arg1>& arg1, const Eigen::MatrixBase<Arg2>& arg2)
{
  typedef typename horizcat_helper<Arg1,Arg2>::MatrixType MatrixType;
  return MatrixType::NullaryExpr(arg1.rows(), arg1.cols()+arg2.cols(),
                                 horizcat_functor<Arg1,Arg2>(arg1.derived(),arg2.derived()));
}

int main()
{
  MatrixXd mat(3, 3);
  mat << 0, 1, 2, 3, 4, 5, 6, 7, 8;
  auto example1 = horizcat(mat,2*mat);
  std::cout << example1 << std::endl;
  auto example2 = horizcat(VectorXd::Ones(3),mat);
  std::cout << example2 << std::endl;
  return 0;
}
I'll add a C++14 version of @ggael's horizcat as an answer. The implementation is a bit sloppy in that it does not consider the Eigen compile-time constants, but in return it is only a few lines:
auto horizcat = [](auto expr1, auto expr2)
{
  // Read the sizes before the operands are moved into the inner lambda.
  const auto rows = expr1.rows();
  const auto cols = expr1.cols() + expr2.cols();
  auto get = [expr1=std::move(expr1),expr2=std::move(expr2)](auto row, auto col)
        { return col<expr1.cols() ? expr1(row, col) : expr2(row, col - expr1.cols());};
  return Eigen::Matrix<decltype(get(0,0)), Eigen::Dynamic, Eigen::Dynamic>::NullaryExpr(rows, cols, get);
};

int main()
{
  Eigen::MatrixXd mat(3, 3);
  mat << 0, 1, 2, 3, 4, 5, 6, 7, 8;
  auto example1 = horizcat(mat,2*mat);
  std::cout << example1 << std::endl;
  auto example2 = horizcat(Eigen::MatrixXd::Identity(3,3), mat);
  std::cout << example2 << std::endl;
  return 0;
}
Note that the code is untested.
That should be appropriate for most applications. However, if you are using compile-time matrix dimensions and require maximum performance, prefer ggael's answer. In all other cases, also prefer ggael's answer, because he is the developer of Eigen :-)
I expanded ggael's answer to Array types, vertical concatenation, and more than two arguments:
#include <iostream>
#include <type_traits>
#include <utility>
#include <Eigen/Core>

namespace EigenCustom
{
    using namespace Eigen;

    constexpr Index dynamicOrSum( const Index& a, const Index& b ){
        return a == Dynamic || b == Dynamic ? Dynamic : a + b;
    }

    enum class Direction { horizontal, vertical };

    template<Direction direction, typename Arg1, typename Arg2>
    struct ConcatHelper {
        static_assert( std::is_same_v<
            typename Arg1::Scalar, typename Arg2::Scalar
        > );
        using Scalar = typename Arg1::Scalar;
        using D = Direction;
        static constexpr Index
            RowsAtCompileTime { direction == D::horizontal ?
                Arg1::RowsAtCompileTime :
                dynamicOrSum( Arg1::RowsAtCompileTime, Arg2::RowsAtCompileTime )
            },
            ColsAtCompileTime { direction == D::horizontal ?
                dynamicOrSum( Arg1::ColsAtCompileTime, Arg2::ColsAtCompileTime ) :
                Arg1::ColsAtCompileTime
            },
            MaxRowsAtCompileTime { direction == D::horizontal ?
                Arg1::MaxRowsAtCompileTime :
                dynamicOrSum( Arg1::MaxRowsAtCompileTime, Arg2::MaxRowsAtCompileTime )
            },
            MaxColsAtCompileTime { direction == D::horizontal ?
                dynamicOrSum( Arg1::MaxColsAtCompileTime, Arg2::MaxColsAtCompileTime ) :
                Arg1::MaxColsAtCompileTime
            };

        // Both arguments must be matrices, or both must be arrays.
        static_assert(
            (std::is_base_of_v<MatrixBase<Arg1>, Arg1> &&
             std::is_base_of_v<MatrixBase<Arg2>, Arg2> ) ||
            (std::is_base_of_v<ArrayBase<Arg1>, Arg1> &&
             std::is_base_of_v<ArrayBase<Arg2>, Arg2> )
        );

        using DenseType = std::conditional_t<
            std::is_base_of_v<MatrixBase<Arg1>, Arg1>,
            Matrix<
                Scalar, RowsAtCompileTime, ColsAtCompileTime,
                ColMajor, MaxRowsAtCompileTime, MaxColsAtCompileTime
            >,
            Array<
                Scalar, RowsAtCompileTime, ColsAtCompileTime,
                ColMajor, MaxRowsAtCompileTime, MaxColsAtCompileTime
            >
        >;
    };

    template<Direction direction, typename Arg1, typename Arg2>
    class ConcatFunctor
    {
        using Scalar = typename ConcatHelper<direction, Arg1, Arg2>::Scalar;
        const typename Arg1::Nested m_mat1;
        const typename Arg2::Nested m_mat2;

    public:
        ConcatFunctor(const Arg1& arg1, const Arg2& arg2)
            : m_mat1(arg1), m_mat2(arg2)
        {}

        const Scalar operator() (Index row, Index col) const {
            if constexpr (direction == Direction::horizontal){
                if (col < m_mat1.cols())
                    return m_mat1(row,col);
                return m_mat2(row, col - m_mat1.cols());
            } else {
                if (row < m_mat1.rows())
                    return m_mat1(row,col);
                return m_mat2(row - m_mat1.rows(), col);
            }
        }
    };

    template<Direction direction, typename Arg1, typename Arg2>
    using ConcatReturnType = CwiseNullaryOp<
        ConcatFunctor<direction,Arg1,Arg2>,
        typename ConcatHelper<direction,Arg1,Arg2>::DenseType
    >;

    template<Direction direction, typename Arg1, typename Arg2>
    ConcatReturnType<direction, Arg1, Arg2>
    concat(
        const Eigen::DenseBase<Arg1>& arg1,
        const Eigen::DenseBase<Arg2>& arg2
    ){
        using DenseType = typename ConcatHelper<direction,Arg1,Arg2>::DenseType;
        using D = Direction;
        return DenseType::NullaryExpr(
            direction == D::horizontal ? arg1.rows() : arg1.rows() + arg2.rows(),
            direction == D::horizontal ? arg1.cols() + arg2.cols() : arg1.cols(),
            ConcatFunctor<direction,Arg1,Arg2>( arg1.derived(), arg2.derived() )
        );
    }

    // More than two arguments: concatenate the first two, then recurse.
    template<Direction direction, typename Arg1, typename Arg2, typename ... Ts>
    auto concat(
        const Eigen::DenseBase<Arg1>& arg1,
        const Eigen::DenseBase<Arg2>& arg2,
        Ts&& ... rest
    ){
        return concat<direction>(
            concat<direction>(arg1, arg2),
            std::forward<Ts>(rest) ...
        );
    }

    template<typename Arg1, typename Arg2, typename ... Ts>
    auto concat_horizontal(
        const Eigen::DenseBase<Arg1>& arg1,
        const Eigen::DenseBase<Arg2>& arg2,
        Ts&& ... rest
    ){
        return concat<Direction::horizontal>(
            arg1, arg2, std::forward<Ts>(rest) ...
        );
    }

    template<typename Arg1, typename Arg2, typename ... Ts>
    auto concat_vertical(
        const Eigen::DenseBase<Arg1>& arg1,
        const Eigen::DenseBase<Arg2>& arg2,
        Ts&& ... rest
    ){
        return concat<Direction::vertical>(
            arg1, arg2, std::forward<Ts>(rest) ...
        );
    }
} // namespace EigenCustom

int main()
{
    using namespace Eigen;
    using namespace EigenCustom;
    MatrixXd mat(3, 3);
    mat << 0, 1, 2, 3, 4, 5, 6, 7, 8;
    auto example1 = concat_horizontal(mat,2*mat);
    std::cout << "example1:\n" << example1 << '\n';
    auto example2 = concat_horizontal(VectorXd::Ones(3),mat);
    std::cout << "example2:\n" << example2 << '\n';
    auto example3 = concat_vertical(mat,RowVectorXd::Zero(3));
    std::cout << "example3:\n" << example3 << '\n';
    ArrayXXi arr (2,2);
    arr << 0, 1, 2, 3;
    auto example4 = concat_vertical(arr,Array2i{4,5}.transpose());
    std::cout << "example4:\n" << example4 << '\n';
    /* concatenating more than two arguments */
    auto example5 = concat_horizontal(mat, mat, mat);
    std::cout << "example5:\n" << example5 << '\n';
    using RowArray2i = Array<int, 1, 2>;
    auto example6 = concat_vertical( arr, RowArray2i::Zero(), RowArray2i::Ones() );
    std::cout << "example6:\n" << example6 << '\n';
    return 0;
}

Sorting vector based on tuple value

I have the data structure below:
typedef vector< tuple<int,int,int> > vector_tuple;
In the vector I am storing tuple<value,count,position>.
I want to sort my vector based on count. If the counts are equal, the vector should be sorted based on position.
struct ordering
{
    bool operator()(....)
    {
        return /// ?
    }
};

int main()
{
    std::vector<int> v1{1,1,1,6,6,5,4,4,5,5,5};
    std::vector<int> v2(v1);
    vector_tuple vt;
    std::tuple<int,int,int> t1;
    std::vector<int>::iterator iter;
    int sizev=v1.size();
    for(int i=0; i < sizev ; i++)
    {
        auto countnu = count(begin(v2),end(v2),v1[i]);
        if(countnu > 0)
        {
            // Remove the value from v2 so each distinct value is recorded once.
            v2.erase(std::remove(begin(v2),end(v2),v1[i]),end(v2));
            auto t = std::make_tuple(v1[i], countnu, i);
            vt.push_back(t);
        }
    }
    sort(begin(vt),end(vt),ordering(get<1>vt); // I need to sort based on count and if count is same, sort based on position.
    for (int i=0; i < vt.size(); i++)
    {
        cout << get<0>(vt[i]) << " " ;
        cout << get<1>(vt[i]) << " " ;
        cout << get<2>(vt[i]) << " \n" ;
    }
}
Your compare method should look like:
auto ordering = [](const std::tuple<int, int, int>& lhs,
                   const std::tuple<int, int, int>& rhs) {
    return std::tie(std::get<1>(lhs), std::get<2>(lhs))
         < std::tie(std::get<1>(rhs), std::get<2>(rhs));
};
std::sort(std::begin(vt), std::end(vt), ordering);
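As a quick check of the std::tie comparator, the sketch below applies it to the four tuples that appear in the question's expected output. The alias Entry and the function name sort_by_count_then_position are introduced here only for illustration.

```cpp
#include <algorithm>
#include <tuple>
#include <vector>

using Entry = std::tuple<int, int, int>;  // (value, count, position)

// Orders by count first; ties are broken by position.
// std::tie builds tuples of references, which compare lexicographically.
inline void sort_by_count_then_position(std::vector<Entry>& entries)
{
    std::sort(entries.begin(), entries.end(),
              [](const Entry& lhs, const Entry& rhs) {
                  return std::tie(std::get<1>(lhs), std::get<2>(lhs))
                       < std::tie(std::get<1>(rhs), std::get<2>(rhs));
              });
}
```

Starting from (1,3,0), (6,2,3), (5,4,5), (4,2,6), the two entries with count 2 come first, ordered by position 3 before 6, followed by the entries with counts 3 and 4.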
All credit to Jarod42 for the std::tie answer.
Now let's make it generic by creating a variadic template predicate:
template<std::size_t...Is>
struct tuple_parts_ascending
{
    template<class...Ts>
    static auto sort_order(const std::tuple<Ts...>& t)
    {
        return std::tie(std::get<Is>(t)...);
    }

    template<class...Ts>
    bool operator()(const std::tuple<Ts...>& l,
                    const std::tuple<Ts...>& r) const
    {
        return sort_order(l) < sort_order(r);
    }
};
which we can invoke thus:
sort(begin(vt), end(vt), tuple_parts_ascending<1,2>());
Full code:
#include <vector>
#include <algorithm>
#include <tuple>
#include <iostream>
using namespace std;

typedef std::vector< std::tuple<int,int,int> > vector_tuple;

template<std::size_t...Is>
struct tuple_parts_ascending
{
    template<class...Ts>
    static auto sort_order(const std::tuple<Ts...>& t)
    {
        return std::tie(std::get<Is>(t)...);
    }

    template<class...Ts>
    bool operator()(const std::tuple<Ts...>& l,
                    const std::tuple<Ts...>& r) const
    {
        return sort_order(l) < sort_order(r);
    }
};

int main()
{
    std::vector<int> v1{1,1,1,6,6,5,4,4,5,5,5};
    std::vector<int> v2(v1);
    vector_tuple vt;
    std::tuple<int,int,int> t1;
    std::vector<int>::iterator iter;
    int sizev=v1.size();
    for(int i=0; i < sizev ; i++)
    {
        auto countnu = count(begin(v2),end(v2),v1[i]);
        if(countnu > 0)
        {
            // Remove the value from v2 so each distinct value is recorded once.
            v2.erase(std::remove(begin(v2),end(v2),v1[i]),end(v2));
            auto t = std::make_tuple(v1[i], countnu, i);
            vt.push_back(t);
        }
    }
    sort(begin(vt), end(vt), tuple_parts_ascending<1,2>());
    for (int i=0; i < vt.size(); i++)
    {
        cout << get<0>(vt[i]) << " " ;
        cout << get<1>(vt[i]) << " " ;
        cout << get<2>(vt[i]) << " \n" ;
    }
}
Expected results:
6 2 3
4 2 6
1 3 0
5 4 5
Going further, we could make this operation a little more generic and 'library-worthy' by allowing the ordering predicate and the indices to be passed as parameters (this solution requires C++14):
namespace detail {
    template<class Pred, std::size_t...Is>
    struct order_by_parts
    {
        // Taken by value so that both lvalue and rvalue predicates are accepted.
        order_by_parts(Pred pred)
        : _pred(std::move(pred))
        {}

        template<class...Ts>
        static auto sort_order(const std::tuple<Ts...>& t)
        {
            return std::tie(std::get<Is>(t)...);
        }

        template<class...Ts>
        bool operator()(const std::tuple<Ts...>& l,
                        const std::tuple<Ts...>& r) const
        {
            return _pred(sort_order(l), sort_order(r));
        }

    private:
        Pred _pred;
    };
}

template<class Pred, size_t...Is>
auto order_by_parts(Pred&& pred, std::index_sequence<Is...>)
{
    using pred_type = std::decay_t<Pred>;
    using functor_type = detail::order_by_parts<pred_type, Is...>;
    return functor_type(std::forward<Pred>(pred));
}
Now we can sort like so:
sort(begin(vt), end(vt),
     order_by_parts(std::less<>(),                  // use predicate less<void>
                    std::index_sequence<1, 2>()));  // pack indices into a sequence
Using lambdas:
std::sort(std::begin(vt),std::end(vt),[](const auto& l,const auto& r){
    if(std::get<1>(l)== std::get<1>(r)){
        return std::get<2>(l) < std::get<2>(r);
    }
    return std::get<1>(l) < std::get<1>(r);
});

What is the optimal way to concatenate two vectors whilst transforming elements of one vector?

Suppose I have
std::vector<T1> vec1 {/* filled with T1's */};
std::vector<T2> vec2 {/* filled with T2's */};
and some function T1 f(T2) which could of course be a lambda. What is the optimal way to concatenate vec1 and vec2 whilst applying f to each T2 in vec2?
The apparently obvious solution is std::transform, i.e.
vec1.reserve(vec1.size() + vec2.size());
std::transform(vec2.begin(), vec2.end(), std::back_inserter(vec1), f);
but I say this is not optimal as std::back_inserter must make an unnecessary capacity check on each inserted element. What would be optimal is something like
vec1.insert(vec1.end(), vec2.begin(), vec2.end(), f);
which could get away with a single capacity check. Sadly this is not valid C++. Essentially this is the same reason why std::vector::insert is optimal for vector concatenation, see this question and the comments in this question for further discussion on this point.
Is std::transform the optimal method using the STL?
If so, can we do better?
Is there a good reason why the insert function described above was left out of the STL?
I've had a go at verifying if the multiple capacity checks do have any noticeable cost. To do this I basically just pass the id function (f(x) = x) to the std::transform and push_back methods discussed in the answers. The full code is:
#include <iostream>
#include <vector>
#include <iterator>
#include <algorithm>
#include <cstdint>
#include <chrono>
#include <numeric>
#include <random>

using std::size_t;

std::vector<int> generate_random_ints(size_t n)
{
    std::default_random_engine generator;
    auto seed1 = std::chrono::system_clock::now().time_since_epoch().count();
    generator.seed((unsigned) seed1);
    std::uniform_int_distribution<int> uniform {};
    std::vector<int> v(n);
    std::generate_n(v.begin(), n, [&] () { return uniform(generator); });
    return v;
}

// Runs f num_tests times and returns the mean duration of one run.
template <typename D=std::chrono::nanoseconds, typename F>
D benchmark(F f, unsigned num_tests)
{
    D total {0};
    for (unsigned i = 0; i < num_tests; ++i) {
        auto start = std::chrono::system_clock::now();
        f();
        auto end = std::chrono::system_clock::now();
        total += std::chrono::duration_cast<D>(end - start);
    }
    return D {total / num_tests};
}

// vec1 is taken by value on purpose: every run starts from a fresh copy.
template <typename T>
void std_insert(std::vector<T> vec1, const std::vector<T> &vec2)
{
    vec1.insert(vec1.end(), vec2.begin(), vec2.end());
}

template <typename T1, typename T2, typename UnaryOperation>
void push_back_concat(std::vector<T1> vec1, const std::vector<T2> &vec2, UnaryOperation op)
{
    vec1.reserve(vec1.size() + vec2.size());
    for (const auto& x : vec2) {
        vec1.push_back(op(x));
    }
}

template <typename T1, typename T2, typename UnaryOperation>
void transform_concat(std::vector<T1> vec1, const std::vector<T2> &vec2, UnaryOperation op)
{
    vec1.reserve(vec1.size() + vec2.size());
    std::transform(vec2.begin(), vec2.end(), std::back_inserter(vec1), op);
}

int main(int argc, char **argv)
{
    unsigned num_tests {1000};
    size_t vec1_size {10000000};
    size_t vec2_size {10000000};
    auto vec1 = generate_random_ints(vec1_size);
    auto vec2 = generate_random_ints(vec2_size);
    auto f_std_insert = [&vec1, &vec2] () {
        std_insert(vec1, vec2);
    };
    auto f_push_back_id = [&vec1, &vec2] () {
        push_back_concat(vec1, vec2, [] (int i) { return i; });
    };
    auto f_transform_id = [&vec1, &vec2] () {
        transform_concat(vec1, vec2, [] (int i) { return i; });
    };
    auto std_insert_time = benchmark<std::chrono::milliseconds>(f_std_insert, num_tests).count();
    auto push_back_id_time = benchmark<std::chrono::milliseconds>(f_push_back_id, num_tests).count();
    auto transform_id_time = benchmark<std::chrono::milliseconds>(f_transform_id, num_tests).count();
    std::cout << "std_insert: " << std_insert_time << "ms" << std::endl;
    std::cout << "push_back_id: " << push_back_id_time << "ms" << std::endl;
    std::cout << "transform_id: " << transform_id_time << "ms" << std::endl;
    return 0;
}
Compiled with:
g++ vector_insert_demo.cpp -std=c++11 -O3 -o vector_insert_demo
Output:
std_insert: 44ms
push_back_id: 61ms
transform_id: 61ms
The compiler will have inlined the lambda, so that cost can safely be discounted. Unless anyone else has a viable explanation for these results (or is willing to check the assembly), I think it's reasonable to conclude that the multiple capacity checks have a noticeable cost.
UPDATE: The performance difference is due to the reserve() calls, which, in libstdc++ at least, make the capacity be exactly what you request instead of using the exponential growth factor.
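One detail of the timing harness worth noting: std::chrono::system_clock may be adjusted while the program runs, whereas std::chrono::steady_clock is monotonic. The sketch below is a variant of the benchmark helper under that assumption; the name benchmark_steady is made up here and the rest follows the helper from the question.

```cpp
#include <chrono>

// Mean duration of one call to f over num_tests runs, measured with a
// monotonic clock so that wall-clock adjustments cannot skew the result.
template <typename D = std::chrono::nanoseconds, typename F>
D benchmark_steady(F f, unsigned num_tests)
{
    D total{0};
    for (unsigned i = 0; i < num_tests; ++i) {
        const auto start = std::chrono::steady_clock::now();
        f();
        const auto end = std::chrono::steady_clock::now();
        total += std::chrono::duration_cast<D>(end - start);
    }
    // Guard against division by zero when no runs were requested.
    return num_tests ? D{total / num_tests} : D{0};
}
```

It is called exactly like the original, for example benchmark_steady<std::chrono::milliseconds>(f_std_insert, num_tests).count().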
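I did some timing tests, with interesting results. Using std::vector::insert along with boost::transform_iterator was the fastest way I found by a large margin:
Version 1:
void
  appendTransformed1(
    std::vector<int> &vec1,
    const std::vector<float> &vec2
  )
{
  auto v2begin = boost::make_transform_iterator(vec2.begin(),f);
  auto v2end = boost::make_transform_iterator(vec2.end(),f);
  vec1.insert(vec1.end(),v2begin,v2end);
}
Version 2:
void
  appendTransformed2(
    std::vector<int> &vec1,
    const std::vector<float> &vec2
  )
{
  vec1.reserve(vec1.size()+vec2.size());
  for (auto x : vec2) {
    vec1.push_back(f(x));
  }
}
Version 3:
void
  appendTransformed3(
    std::vector<int> &vec1,
    const std::vector<float> &vec2
  )
{
  vec1.reserve(vec1.size()+vec2.size());
  std::transform(vec2.begin(),vec2.end(),std::back_inserter(vec1),f);
}
Timings:
Version 1: 0.59s
Version 2: 8.22s
Version 3: 8.42s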
#include <algorithm>
#include <cassert>
#include <chrono>
#include <cstdlib>
#include <iterator>
#include <iostream>
#include <numeric>
#include <random>
#include <vector>
#include "appendtransformed.hpp"

using std::cerr;

template <typename Engine>
static std::vector<int> randomInts(Engine &engine,size_t n)
{
  auto distribution = std::uniform_int_distribution<int>(0,999);
  auto generator = [&]{return distribution(engine);};
  auto vec = std::vector<int>();
  std::generate_n(std::back_inserter(vec),n,generator);
  return vec;
}

template <typename Engine>
static std::vector<float> randomFloats(Engine &engine,size_t n)
{
  auto distribution = std::uniform_real_distribution<float>(0,1000);
  auto generator = [&]{return distribution(engine);};
  auto vec = std::vector<float>();
  std::generate_n(std::back_inserter(vec),n,generator);
  return vec;
}

static auto
  appendTransformedFunction(int version) ->
    void(*)(std::vector<int>&,const std::vector<float> &)
{
  switch (version) {
    case 1: return appendTransformed1;
    case 2: return appendTransformed2;
    case 3: return appendTransformed3;
    default:
      cerr << "Unknown version: " << version << "\n";
      std::exit(EXIT_FAILURE);
  }
  return 0;
}

int main(int argc,char **argv)
{
  if (argc!=2) {
    cerr << "Usage: appendtest (1|2|3)\n";
    return EXIT_FAILURE;
  }
  auto version = atoi(argv[1]);
  auto engine = std::default_random_engine();
  auto vec1_size = 1000000u;
  auto vec2_size = 1000000u;
  auto count = 100;
  auto vec1 = randomInts(engine,vec1_size);
  auto vec2 = randomFloats(engine,vec2_size);
  namespace chrono = std::chrono;
  using chrono::system_clock;
  auto appendTransformed = appendTransformedFunction(version);
  auto start_time = system_clock::now();
  for (auto i=0; i!=count; ++i) {
    appendTransformed(vec1,vec2);
  }
  auto end_time = system_clock::now();
  assert(vec1.size() == vec1_size+count*vec2_size);
  auto sum = std::accumulate(vec1.begin(),vec1.end(),0u);
  auto elapsed_seconds = chrono::duration<float>(end_time-start_time).count();
  cerr << "Using version " << version << ":\n";
  cerr << " sum=" << sum << "\n";
  cerr << " elapsed: " << elapsed_seconds << "s\n";
}
Compiler: g++ 4.9.1
Options: -std=c++11 -O2
Is std::transform the optimal method using the STL?
I can't say that. If you reserve space, the difference should be ephemeral because the check might be optimized out by either the compiler or the CPU. The only way to find out is to measure your real code.
If you don't have a particular need, you should go for std::transform.
If so, can we do better?
What you want to have:
Reduce length checks
Take advantage of move semantics when calling push_back
You might also want to create a binary function, if needed.
template <typename InputIt, typename OutputIt, typename UnaryCallable>
bool move_append(InputIt first, InputIt last, OutputIt firstOut, OutputIt lastOut, UnaryCallable fn)
{
    // Refuse to write past the end of the output range.
    if (std::distance(firstOut, lastOut) < std::distance(first, last))
        return false;
    while (first != last) {
        *firstOut++ = fn(*first++); // fn returns a temporary, so this is a move assignment
    }
    return true;
}
A call could be:
std::vector<T1> vec1 {/* filled with T1's */};
std::vector<T2> vec2 {/* filled with T2's */};
// ...
const auto old_size = vec1.size();
vec1.resize( old_size + vec2.size() );
move_append( vec2.begin(), vec2.end(), vec1.begin() + old_size, vec1.end(), f );
I'm not sure you can do this with plain algorithms, because back_inserter calls Container::push_back, which checks for reallocation in any case. Also, the element won't be able to benefit from move semantics.
Note: the safety check depends on your usage, based on how you pass the elements to append.
Some measurements here. I can't explain that big discrepancy.
I do not get the same results as @VaughnCato, although I do a slightly different test of std::string to int. According to my tests the push_back and std::transform methods are equally good, while the boost::transform_iterator method is slightly worse.
I included another test case that, instead of using reserve and back_inserter, just uses resize. This is essentially the same method as in @black's answer, and also the method suggested by @ChrisDrew in the question comments. I also performed the test both ways, that is std::string -> int and int -> std::string. Here is my full code:
#include <iostream>
#include <vector>
#include <string>
#include <iterator>
#include <algorithm>
#include <cstdint>
#include <chrono>
#include <numeric>
#include <random>
#include <boost/iterator/transform_iterator.hpp>

using std::size_t;

std::vector<int> generate_random_ints(size_t n)
{
    std::default_random_engine generator;
    auto seed1 = std::chrono::system_clock::now().time_since_epoch().count();
    generator.seed((unsigned) seed1);
    std::uniform_int_distribution<int> uniform {};
    std::vector<int> v(n);
    std::generate_n(v.begin(), n, [&] () { return uniform(generator); });
    return v;
}

std::vector<std::string> generate_random_strings(size_t n)
{
    std::default_random_engine generator;
    auto seed1 = std::chrono::system_clock::now().time_since_epoch().count();
    generator.seed((unsigned) seed1);
    std::uniform_int_distribution<int> uniform {};
    std::vector<std::string> v(n);
    std::generate_n(v.begin(), n, [&] () { return std::to_string(uniform(generator)); });
    return v;
}

// Runs f num_tests times and returns the mean duration of one run.
template <typename D=std::chrono::nanoseconds, typename F>
D benchmark(F f, unsigned num_tests)
{
    D total {0};
    for (unsigned i = 0; i < num_tests; ++i) {
        auto start = std::chrono::system_clock::now();
        f();
        auto end = std::chrono::system_clock::now();
        total += std::chrono::duration_cast<D>(end - start);
    }
    return D {total / num_tests};
}

// vec1 is taken by value on purpose: every run starts from a fresh copy.
template <typename T1, typename T2, typename UnaryOperation>
void push_back_concat(std::vector<T1> vec1, const std::vector<T2> &vec2, UnaryOperation op)
{
    vec1.reserve(vec1.size() + vec2.size());
    for (const auto& x : vec2) {
        vec1.push_back(op(x));
    }
}

template <typename T1, typename T2, typename UnaryOperation>
void transform_concat_reserve(std::vector<T1> vec1, const std::vector<T2> &vec2, UnaryOperation op)
{
    vec1.reserve(vec1.size() + vec2.size());
    std::transform(vec2.begin(), vec2.end(), std::back_inserter(vec1), op);
}

template <typename T1, typename T2, typename UnaryOperation>
void transform_concat_resize(std::vector<T1> vec1, const std::vector<T2> &vec2, UnaryOperation op)
{
    auto vec1_size = vec1.size();
    vec1.resize(vec1.size() + vec2.size());
    std::transform(vec2.begin(), vec2.end(), vec1.begin() + vec1_size, op);
}

template <typename T1, typename T2, typename UnaryOperation>
void boost_transform_concat(std::vector<T1> vec1, const std::vector<T2> &vec2, UnaryOperation op)
{
    auto v2_begin = boost::make_transform_iterator(vec2.begin(), op);
    auto v2_end = boost::make_transform_iterator(vec2.end(), op);
    vec1.insert(vec1.end(), v2_begin, v2_end);
}

int main(int argc, char **argv)
{
    unsigned num_tests {1000};
    size_t vec1_size {1000000};
    size_t vec2_size {1000000};
    // Switch the variable names to inverse test
    auto vec1 = generate_random_ints(vec1_size);
    auto vec2 = generate_random_strings(vec2_size);
    auto op = [] (const std::string& str) { return std::stoi(str); };
    //auto op = [] (int i) { return std::to_string(i); };
    auto f_push_back_concat = [&vec1, &vec2, &op] () {
        push_back_concat(vec1, vec2, op);
    };
    auto f_transform_concat_reserve = [&vec1, &vec2, &op] () {
        transform_concat_reserve(vec1, vec2, op);
    };
    auto f_transform_concat_resize = [&vec1, &vec2, &op] () {
        transform_concat_resize(vec1, vec2, op);
    };
    auto f_boost_transform_concat = [&vec1, &vec2, &op] () {
        boost_transform_concat(vec1, vec2, op);
    };
    auto push_back_concat_time = benchmark<std::chrono::milliseconds>(f_push_back_concat, num_tests).count();
    auto transform_concat_reserve_time = benchmark<std::chrono::milliseconds>(f_transform_concat_reserve, num_tests).count();
    auto transform_concat_resize_time = benchmark<std::chrono::milliseconds>(f_transform_concat_resize, num_tests).count();
    auto boost_transform_concat_time = benchmark<std::chrono::milliseconds>(f_boost_transform_concat, num_tests).count();
    std::cout << "push_back: " << push_back_concat_time << "ms" << std::endl;
    std::cout << "transform_reserve: " << transform_concat_reserve_time << "ms" << std::endl;
    std::cout << "transform_resize: " << transform_concat_resize_time << "ms" << std::endl;
    std::cout << "boost_transform: " << boost_transform_concat_time << "ms" << std::endl;
    return 0;
}
Compiled using:
g++ vector_concat.cpp -std=c++11 -O3 -o vector_concat_test
The results (mean user-times) are :
| Method | std::string -> int (ms) | int -> std::string (ms) |
| push_back | 68 | 206 |
| std::transform (reserve) | 68 | 202 |
| std::transform (resize) | 67 | 218 |
| boost::transform | 70 | 238 |
The std::transform method using resize is likely optimal (using STL) for trivial to default-construct types.
The std::transform method using reserve and back_inserter is most likely the best we can do otherwise.
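The two idioms summarised above can be wrapped as small helpers that append in place. This is a sketch; the function names append_transformed_reserve and append_transformed_resize are hypothetical.

```cpp
#include <algorithm>
#include <cstddef>
#include <iterator>
#include <vector>

// reserve + back_inserter: works for any T1, but push_back still checks
// the capacity once per appended element.
template <typename T1, typename T2, typename F>
void append_transformed_reserve(std::vector<T1>& dst, const std::vector<T2>& src, F f)
{
    dst.reserve(dst.size() + src.size());
    std::transform(src.begin(), src.end(), std::back_inserter(dst), f);
}

// resize + transform: requires T1 to be default-constructible and pays for
// value-initialising the new elements before they are overwritten.
template <typename T1, typename T2, typename F>
void append_transformed_resize(std::vector<T1>& dst, const std::vector<T2>& src, F f)
{
    const auto old_size = static_cast<std::ptrdiff_t>(dst.size());
    dst.resize(dst.size() + src.size());
    std::transform(src.begin(), src.end(), dst.begin() + old_size, f);
}
```

Note that if such a helper is called repeatedly on the same destination, an exact-fit reserve may force a reallocation on every call, which is the effect described in the UPDATE to the question.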

Slicing std::array

Is there an easy way to get a slice of an array in C++?
I.e., I've got
array<double, 10> arr10;
and want to get array consisting of five first elements of arr10:
array<double, 5> arr5 = arr10.???
(other than populating it by iterating through first array)
The constructors for std::array are implicitly defined, so you can't initialize it from another container or from an iterator range. The closest you can get is a helper function that takes care of the copying during construction. This allows single-phase initialization, which is what I believe you're trying to achieve.
template<class X, class Y>
X CopyArray(const Y& src, const size_t size)
{
    X dst;
    std::copy(src.begin(), src.begin() + size, dst.begin());
    return dst;
}
std::array<double, 5> arr5 = CopyArray<decltype(arr5)>(arr10, 5);
You can also use something like std::copy or iterate through the copy yourself.
std::copy(arr10.begin(), arr10.begin() + 5, arr5.begin());
Sure. Wrote this:
#include <array>
#include <cstddef>
#include <type_traits>

template<int...> struct seq {};
template<typename Seq> struct seq_len;
// one element plus the length of the rest
template<int s0,int...s>
struct seq_len<seq<s0,s...>>:
  std::integral_constant<std::size_t,1+seq_len<seq<s...>>::value> {};
template<>
struct seq_len<seq<>>:std::integral_constant<std::size_t,0> {};
template<int Min, int Max, int... s>
struct make_seq: make_seq<Min, Max-1, Max-1, s...> {};
template<int Min, int... s>
struct make_seq<Min, Min, s...> {
  typedef seq<s...> type;
};
template<int Max, int Min=0>
using MakeSeq = typename make_seq<Min,Max>::type;
template<std::size_t src, typename T, int... indexes>
std::array<T, sizeof...(indexes)> get_elements( seq<indexes...>, std::array<T, src > const& inp ) {
  return { inp[indexes]... };
}
template<int len, typename T, std::size_t src>
auto first_elements( std::array<T, src > const& inp )
  -> decltype( get_elements( MakeSeq<len>{}, inp ) )
{
  return get_elements( MakeSeq<len>{}, inp );
}
Where the compile time indexes... does the remapping, and MakeSeq makes a seq from 0 to n-1.
This supports both an arbitrary set of indexes (via get_elements) and the first n (via first_elements).
std::array< int, 10 > arr = {0,1,2,3,4,5,6,7,8,9};
std::array< int, 6 > slice = get_elements(seq<2,0,7,3,1,0>(), arr);
std::array< int, 5 > start = first_elements<5>(arr);
which avoids all loops, either explicit or implicit.
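To make the recursion behind MakeSeq concrete, the sketch below repeats just the sequence machinery and checks its result at compile time. For MakeSeq<3> the inheritance chain is make_seq<0,3>, then make_seq<0,2,2>, then make_seq<0,1,1,2>, then make_seq<0,0,0,1,2>, which matches the partial specialization and yields seq<0,1,2>. The static_assert lines are illustrative additions, not part of the answer.

```cpp
#include <type_traits>

template <int...> struct seq {};

// Each step lowers Max by one and prepends the new Max to the pack.
template <int Min, int Max, int... s>
struct make_seq : make_seq<Min, Max - 1, Max - 1, s...> {};

// The recursion ends once Max has reached Min; the accumulated pack is the result.
template <int Min, int... s>
struct make_seq<Min, Min, s...> {
    typedef seq<s...> type;
};

template <int Max, int Min = 0>
using MakeSeq = typename make_seq<Min, Max>::type;

// Compile-time checks of the half-open range [Min, Max).
static_assert(std::is_same<MakeSeq<3>, seq<0, 1, 2>>::value, "MakeSeq<3> is seq<0,1,2>");
static_assert(std::is_same<MakeSeq<5, 2>, seq<2, 3, 4>>::value, "MakeSeq<5,2> is seq<2,3,4>");
static_assert(std::is_same<MakeSeq<0>, seq<>>::value, "an empty range gives an empty seq");
```

Because the second template argument sets the lower bound, the same machinery could in principle select a range that does not start at index 0.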
2018 update, if all you need is first_elements:
A solution with less boilerplate using C++14 (building on Yakk's pre-C++14 answer and borrowing from the question "unpacking" a tuple to call a matching function pointer):
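#include <array>
#include <utility> // std::index_sequence, std::make_index_sequence
// std::size_t... I matches the value type of std::index_sequence; with int... I deduction can fail on some compilers.
template < std::size_t src, typename T, std::size_t... I >
std::array< T, sizeof...(I) > get_elements(std::index_sequence< I... >, std::array< T, src > const& inp)
{
    return { inp[I]... };
}
template < std::size_t N, typename T, std::size_t src >
auto first_elements(std::array<T, src > const& inp)
    -> decltype(get_elements(std::make_index_sequence<N>{}, inp))
{
    return get_elements(std::make_index_sequence<N>{}, inp);
}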
Still cannot explain why this works, but it does (for me on Visual Studio 2017).
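One way to see why this works: std::make_index_sequence<N> is just the type std::index_sequence<0, 1, ..., N-1>. Passing an object of that type lets the compiler deduce the pack I, and the expression inp[I]... then expands to one element access per index. The sketch below shows this with explicit indices; the helper name pick is hypothetical, and it is marked constexpr only so that the result can be inspected at compile time (C++14 or later).

```cpp
#include <array>
#include <cstddef>
#include <type_traits>
#include <utility>

// std::make_index_sequence<3> is another name for std::index_sequence<0, 1, 2>.
static_assert(std::is_same<std::make_index_sequence<3>,
                           std::index_sequence<0, 1, 2>>::value,
              "make_index_sequence<N> counts from 0 to N-1");

// Hypothetical helper: the pack I is deduced from the otherwise unused first
// argument, and `inp[I]...` expands to one element access per index.
template <typename T, std::size_t Src, std::size_t... I>
constexpr std::array<T, sizeof...(I)> pick(std::index_sequence<I...>,
                                           const std::array<T, Src>& inp) {
    return {{inp[I]...}};
}
```

Calling `pick(std::index_sequence<2, 0, 7>{}, arr)` is therefore equivalent to writing `{{arr[2], arr[0], arr[7]}}` by hand, with no loop involved.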
This answer might be late... but I was just toying around with slices - so here is my little home brew of std::array slices.
Of course, this comes with a few restrictions and is not ultimately general:
The source array from which a slice is taken must not go out of scope. We store a reference to the source.
I was looking for constant array slices first and did not try to expand this code to both const and non const slices.
But one nice feature of the code below is that you can take slices of slices.
// Requires C++17: main() relies on class template argument deduction for ArraySliceC.
#include <cstdint>
#include <iostream>
#include <array>
#include <stdexcept>
#include <sstream>
#include <functional>
template <class A>
class ArraySliceC
{
public:
    using Array_t = A;
    using value_type = typename A::value_type;
    using const_iterator = typename A::const_iterator;
    ArraySliceC(const A & source, size_t ifirst, size_t length)
        : m_ifirst{ ifirst }
        , m_length{ length }
        , m_source{ source }
    {
        if (source.size() < (ifirst + length))
        {
            std::ostringstream os;
            os << "ArraySliceC::ArraySliceC(<source>,"
                << ifirst << "," << length
                << "): out of bounds. (ifirst + length > <source>.size())";
            throw std::invalid_argument( os.str() );
        }
    }
    size_t size() const
    {
        return m_length;
    }
    // bounds-checked against the source, not against the slice length
    const value_type& at( size_t index ) const
    {
        return m_source.at( m_ifirst + index );
    }
    const value_type& operator[]( size_t index ) const
    {
        return m_source[m_ifirst + index];
    }
    const_iterator cbegin() const
    {
        return m_source.cbegin() + m_ifirst;
    }
    const_iterator cend() const
    {
        return m_source.cbegin() + m_ifirst + m_length;
    }
private:
    size_t m_ifirst;
    size_t m_length;
    const Array_t& m_source; // the source must outlive the slice
};
template <class T, size_t SZ>
std::ostream& operator<<( std::ostream& os, const std::array<T,SZ>& arr )
{
    if (arr.size() == 0)
        os << "[||]";
    else
    {
        os << "[| " << *(arr.cbegin());
        for (auto it = arr.cbegin() + 1; it != arr.cend(); it++)
            os << "," << (*it);
        os << " |]";
    }
    return os;
}
template<class A>
std::ostream& operator<<( std::ostream& os, const ArraySliceC<A> & slice )
{
    if (slice.size() == 0)
        os << "^[||]";
    else
    {
        os << "^[| " << *(slice.cbegin());
        for (auto it = slice.cbegin() + 1; it != slice.cend(); it++)
            os << "," << (*it);
        os << " |]";
    }
    return os;
}
template<class A>
A unfoldArray( std::function< typename A::value_type( size_t )> producer )
{
    A result;
    for (size_t i = 0; i < result.size(); i++)
        result[i] = producer( i );
    return result;
}
int main()
{
    using A = std::array<float, 10>;
    auto idf = []( size_t i ) -> float { return static_cast<float>(i); };
    const auto values = unfoldArray<A>(idf);
    std::cout << "values = " << values << std::endl;
    // zero copy slice of values array.
    auto sl0 = ArraySliceC( values, 2, 4 );
    std::cout << "sl0 = " << sl0 << std::endl;
    // zero copy slice of the sl0 (the slice of values array)
    auto sl01 = ArraySliceC( sl0, 1, 2 );
    std::cout << "sl01 = " << sl01 << std::endl;
    return 0;
}