I have custom array views (rtc::ArrayView) like the ones below:
rtc::ArrayView<const uint8_t> frame,
rtc::ArrayView<uint8_t> encrypted_frame,
uint8_t unencrypted_bytes = 10;
How can I efficiently loop over these frames and process them? Is a for loop the only option? If we just want to copy the frame without processing it, I know we could use std::copy. Is there any way to make this iterator processing more efficient?
// // Copy rest of frame
// std::copy(frame.begin() + unencrypted_bytes, frame.begin() +
// (encrypted_frame.size() - 41),
// encrypted_frame.begin() + unencrypted_bytes);
// Doing XOR for Frame
for (size_t i = unencrypted_bytes; i < encrypted_frame.size() - 41; i++) {
// encrypted_frame[i] = i;
RTC_LOG(LS_INFO) << "Ivan, original frame Before XOR : " << i << " "
<< frame[i];
encrypted_frame[i] = frame[i] ^ fake_key_;
RTC_LOG(LS_INFO) << "Ivan, encrypted frame After XOR : " << i << " "
<< encrypted_frame[i];
}
Below is the array view header I'm using:
/*
* Copyright 2015 The WebRTC Project Authors. All rights reserved.
* Use of this source code is governed by a BSD-style license
* that can be found in the LICENSE file in the root of the source
* tree. An additional intellectual property rights grant can be found
* in the file PATENTS. All contributing project authors may
* be found in the AUTHORS file in the root of the source tree.
*/
#ifndef API_ARRAY_VIEW_H_
#define API_ARRAY_VIEW_H_
#include <algorithm>
#include <array>
#include <iterator>
#include <type_traits>
#include "rtc_base/checks.h"
#include "rtc_base/type_traits.h"
namespace rtc {
// tl;dr: rtc::ArrayView is the same thing as gsl::span from the Guideline
// Support Library.
// Many functions read from or write to arrays. The obvious way to do this is
// to use two arguments, a pointer to the first element and an element count:
// bool Contains17(const int* arr, size_t size) {
// for (size_t i = 0; i < size; ++i) {
// if (arr[i] == 17)
// return true;
// }
// return false;
// }
// This is flexible, since it doesn't matter how the array is stored (C array,
// std::vector, rtc::Buffer, ...), but it's error-prone because the caller has
// to correctly specify the array length:
// Contains17(arr, arraysize(arr)); // C array
// Contains17(arr.data(), arr.size()); // std::vector
// Contains17(arr, size); // pointer + size
// ...
// It's also kind of messy to have two separate arguments for what is
// conceptually a single thing.
// Enter rtc::ArrayView<T>. It contains a T pointer (to an array it doesn't
// own) and a count, and supports the basic things you'd expect, such as
// indexing and iteration. It allows us to write our function like this:
// bool Contains17(rtc::ArrayView<const int> arr) {
// for (auto e : arr) {
// if (e == 17)
// return true;
// }
// return false;
// }
// And even better, because a bunch of things will implicitly convert to
// ArrayView, we can call it like this:
// Contains17(arr); // C array
// Contains17(arr); // std::vector
// Contains17(rtc::ArrayView<int>(arr, size)); // pointer + size
// Contains17(nullptr); // nullptr -> empty ArrayView
// ...
// ArrayView<T> stores both a pointer and a size, but you may also use
// ArrayView<T, N>, which has a size that's fixed at compile time (which means
// it only has to store the pointer).
// One important point is that ArrayView<T> and ArrayView<const T> are
// different types, which allow and don't allow mutation of the array elements,
// respectively. The implicit conversions work just like you'd hope, so that
// e.g. vector<int> will convert to either ArrayView<int> or ArrayView<const
// int>, but const vector<int> will convert only to ArrayView<const int>.
// (ArrayView itself can be the source type in such conversions, so
// ArrayView<int> will convert to ArrayView<const int>.)
// Note: ArrayView is tiny (just a pointer and a count if variable-sized, just
// a pointer if fix-sized) and trivially copyable, so it's probably cheaper to
// pass it by value than by const reference.
namespace impl {
// Magic constant for indicating that the size of an ArrayView is variable
// instead of fixed.
enum : std::ptrdiff_t { kArrayViewVarSize = -4711 };
// Base class for ArrayViews of fixed nonzero size.
template <typename T, std::ptrdiff_t Size>
class ArrayViewBase {
static_assert(Size > 0, "ArrayView size must be variable or non-negative");
public:
ArrayViewBase(T* data, size_t size) : data_(data) {}
static constexpr size_t size() { return Size; }
static constexpr bool empty() { return false; }
T* data() const { return data_; }
protected:
static constexpr bool fixed_size() { return true; }
private:
T* data_;
};
// Specialized base class for ArrayViews of fixed zero size.
template <typename T>
class ArrayViewBase<T, 0> {
public:
explicit ArrayViewBase(T* data, size_t size) {}
static constexpr size_t size() { return 0; }
static constexpr bool empty() { return true; }
T* data() const { return nullptr; }
protected:
static constexpr bool fixed_size() { return true; }
};
// Specialized base class for ArrayViews of variable size.
template <typename T>
class ArrayViewBase<T, impl::kArrayViewVarSize> {
public:
ArrayViewBase(T* data, size_t size)
: data_(size == 0 ? nullptr : data), size_(size) {}
size_t size() const { return size_; }
bool empty() const { return size_ == 0; }
T* data() const { return data_; }
protected:
static constexpr bool fixed_size() { return false; }
private:
T* data_;
size_t size_;
};
} // namespace impl
template <typename T, std::ptrdiff_t Size = impl::kArrayViewVarSize>
class ArrayView final : public impl::ArrayViewBase<T, Size> {
public:
using value_type = T;
using const_iterator = const T*;
// Construct an ArrayView from a pointer and a length.
template <typename U>
ArrayView(U* data, size_t size)
: impl::ArrayViewBase<T, Size>::ArrayViewBase(data, size) {
RTC_DCHECK_EQ(size == 0 ? nullptr : data, this->data());
RTC_DCHECK_EQ(size, this->size());
RTC_DCHECK_EQ(!this->data(), this->size() == 0); // data is null iff size == 0.
}
// Construct an empty ArrayView. Note that fixed-size ArrayViews of size > 0
// cannot be empty.
ArrayView() : ArrayView(nullptr, 0) {}
ArrayView(std::nullptr_t) // NOLINT
: ArrayView() {}
ArrayView(std::nullptr_t, size_t size)
: ArrayView(static_cast<T*>(nullptr), size) {
static_assert(Size == 0 || Size == impl::kArrayViewVarSize, "");
RTC_DCHECK_EQ(0, size);
}
// Construct an ArrayView from a C-style array.
template <typename U, size_t N>
ArrayView(U (&array)[N]) // NOLINT
: ArrayView(array, N) {
static_assert(Size == N || Size == impl::kArrayViewVarSize,
"Array size must match ArrayView size");
}
// (Only if size is fixed.) Construct a fixed size ArrayView<T, N> from a
// non-const std::array instance. For an ArrayView with variable size, the
// used ctor is ArrayView(U& u) instead.
template <typename U,
size_t N,
typename std::enable_if<
Size == static_cast<std::ptrdiff_t>(N)>::type* = nullptr>
ArrayView(std::array<U, N>& u) // NOLINT
: ArrayView(u.data(), u.size()) {}
// (Only if size is fixed.) Construct a fixed size ArrayView<T, N> where T is
// const from a const(expr) std::array instance. For an ArrayView with
// variable size, the used ctor is ArrayView(U& u) instead.
template <typename U,
size_t N,
typename std::enable_if<
Size == static_cast<std::ptrdiff_t>(N)>::type* = nullptr>
ArrayView(const std::array<U, N>& u) // NOLINT
: ArrayView(u.data(), u.size()) {}
// (Only if size is fixed.) Construct an ArrayView from any type U that has a
// static constexpr size() method whose return value is equal to Size, and a
// data() method whose return value converts implicitly to T*. In particular,
// this means we allow conversion from ArrayView<T, N> to ArrayView<const T,
// N>, but not the other way around. We also don't allow conversion from
// ArrayView<T> to ArrayView<T, N>, or from ArrayView<T, M> to ArrayView<T,
// N> when M != N.
template <
typename U,
typename std::enable_if<Size != impl::kArrayViewVarSize &&
HasDataAndSize<U, T>::value>::type* = nullptr>
ArrayView(U& u) // NOLINT
: ArrayView(u.data(), u.size()) {
static_assert(U::size() == Size, "Sizes must match exactly");
}
template <
typename U,
typename std::enable_if<Size != impl::kArrayViewVarSize &&
HasDataAndSize<U, T>::value>::type* = nullptr>
ArrayView(const U& u) // NOLINT(runtime/explicit)
: ArrayView(u.data(), u.size()) {
static_assert(U::size() == Size, "Sizes must match exactly");
}
// (Only if size is variable.) Construct an ArrayView from any type U that
// has a size() method whose return value converts implicitly to size_t, and
// a data() method whose return value converts implicitly to T*. In
// particular, this means we allow conversion from ArrayView<T> to
// ArrayView<const T>, but not the other way around. Other allowed
// conversions include
// ArrayView<T, N> to ArrayView<T> or ArrayView<const T>,
// std::vector<T> to ArrayView<T> or ArrayView<const T>,
// const std::vector<T> to ArrayView<const T>,
// rtc::Buffer to ArrayView<uint8_t> or ArrayView<const uint8_t>, and
// const rtc::Buffer to ArrayView<const uint8_t>.
template <
typename U,
typename std::enable_if<Size == impl::kArrayViewVarSize &&
HasDataAndSize<U, T>::value>::type* = nullptr>
ArrayView(U& u) // NOLINT
: ArrayView(u.data(), u.size()) {}
template <
typename U,
typename std::enable_if<Size == impl::kArrayViewVarSize &&
HasDataAndSize<U, T>::value>::type* = nullptr>
ArrayView(const U& u) // NOLINT(runtime/explicit)
: ArrayView(u.data(), u.size()) {}
// Indexing and iteration. These allow mutation even if the ArrayView is
// const, because the ArrayView doesn't own the array. (To prevent mutation,
// use a const element type.)
T& operator[](size_t idx) const {
RTC_DCHECK_LT(idx, this->size());
return this->data()[idx];
}
T* begin() const { return this->data(); }
T* end() const { return this->data() + this->size(); }
const T* cbegin() const { return this->data(); }
const T* cend() const { return this->data() + this->size(); }
std::reverse_iterator<T*> rbegin() const {
return std::make_reverse_iterator(end());
}
std::reverse_iterator<T*> rend() const {
return std::make_reverse_iterator(begin());
}
std::reverse_iterator<const T*> crbegin() const {
return std::make_reverse_iterator(cend());
}
std::reverse_iterator<const T*> crend() const {
return std::make_reverse_iterator(cbegin());
}
ArrayView<T> subview(size_t offset, size_t size) const {
return offset < this->size()
? ArrayView<T>(this->data() + offset,
std::min(size, this->size() - offset))
: ArrayView<T>();
}
ArrayView<T> subview(size_t offset) const {
return subview(offset, this->size());
}
};
// Comparing two ArrayViews compares their (pointer,size) pairs; it does *not*
// dereference the pointers.
template <typename T, std::ptrdiff_t Size1, std::ptrdiff_t Size2>
bool operator==(const ArrayView<T, Size1>& a, const ArrayView<T, Size2>& b) {
return a.data() == b.data() && a.size() == b.size();
}
template <typename T, std::ptrdiff_t Size1, std::ptrdiff_t Size2>
bool operator!=(const ArrayView<T, Size1>& a, const ArrayView<T, Size2>& b) {
return !(a == b);
}
// Variable-size ArrayViews are the size of two pointers; fixed-size ArrayViews
// are the size of one pointer. (And as a special case, fixed-size ArrayViews
// of size 0 require no storage.)
static_assert(sizeof(ArrayView<int>) == 2 * sizeof(int*), "");
static_assert(sizeof(ArrayView<int, 17>) == sizeof(int*), "");
static_assert(std::is_empty<ArrayView<int, 0>>::value, "");
template <typename T>
inline ArrayView<T> MakeArrayView(T* data, size_t size) {
return ArrayView<T>(data, size);
}
// Only for primitive types that have the same size and alignment.
// Allow reinterpret cast of the array view to another primitive type of the
// same size.
// Template arguments order is (U, T, Size) to allow deduction of the template
// arguments in client calls: reinterpret_array_view<target_type>(array_view).
template <typename U, typename T, std::ptrdiff_t Size>
inline ArrayView<U, Size> reinterpret_array_view(ArrayView<T, Size> view) {
static_assert(sizeof(U) == sizeof(T) && alignof(U) == alignof(T),
"ArrayView reinterpret_cast is only supported for casting "
"between views that represent the same chunk of memory.");
static_assert(
std::is_fundamental<T>::value && std::is_fundamental<U>::value,
"ArrayView reinterpret_cast is only supported for casting between "
"fundamental types.");
return ArrayView<U, Size>(reinterpret_cast<U*>(view.data()), view.size());
}
} // namespace rtc
#endif // API_ARRAY_VIEW_H_
This looks like WebRTC code. And if I had to guess, you're encrypting the media bytes of an RTP packet (just a guess). And so you probably want that to be fast.
I'm going to assume you recognize that the RTC_LOG statements in your main loop are likely a bigger performance killer than anything you could change to optimize the XOR encryption. If you log each individual byte, that will negate whatever other optimizations you make. So let's start with the loop without logging:
for (size_t i = unencrypted_bytes; i < encrypted_frame.size() - 41; i++) {
encrypted_frame[i] = frame[i] ^ fake_key_;
}
The operator overload for [] looks like this:
T& operator[](size_t idx) const {
RTC_DCHECK_LT(idx, this->size());
return this->data()[idx];
}
That means every iteration makes a call to data() for both the source and the destination arrays, and the operator[] overload also performs an additional validation check (RTC_DCHECK_LT). In a release build, the compiler may be able to optimize most of that away, but I don't know that for a fact, because I don't know whether the compiler will optimize your ArrayView as well as it would operations on a std:: collection class. Nor do I know whether those RTC_DCHECK macros are no-ops in a release build.
But in a Debug build, it will be really slow. So if we can make debug fast, we can assume it carries over to your release build.
We can make sure that our primary loop, which iterates over the bytes, doesn't make any function calls. That's going to be your biggest speed-up. Hence, this will be much faster than what you have:
const uint8_t* frame_data = frame.data();
uint8_t* encrypted_data = encrypted_frame.data();
const size_t stop = encrypted_frame.size() - 41;
for (size_t i = unencrypted_bytes; i < stop; i++) {
  encrypted_data[i] = frame_data[i] ^ fake_key_;
}
You could optionally use std::transform instead of a for-loop, but I think that will be nearly equivalent.
Again, it's entirely possible the compiler will optimize the original function to be as good as what I just produced. But since ArrayView doesn't compile locally for me (I don't have the WebRTC sources handy), I don't know. If I could compile it, I'd validate all my assumptions on godbolt.
But I do know from experience that a function call per element in a really tight loop iterating over bytes or words, even if declared inline, is never as fast as manually inlining all the code you need directly into the loop.
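For reference, here is a self-contained sketch of that hoisted-pointer loop that compiles without the WebRTC sources. It assumes std::vector<std::uint8_t> as a stand-in for rtc::ArrayView; the names xor_payload, key, skip and trailer are hypothetical stand-ins for the question's fake_key_, unencrypted_bytes and the 41-byte tail.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper: XOR in[skip .. size - trailer) with `key` into the same
// positions of `out`. Bytes outside that range are left untouched in `out`.
void xor_payload(const std::vector<std::uint8_t>& in,
                 std::vector<std::uint8_t>& out, std::uint8_t key,
                 std::size_t skip, std::size_t trailer) {
  assert(out.size() == in.size());
  // Guard against `size - trailer` wrapping around; the question's
  // `encrypted_frame.size() - 41` has the same hazard for short frames.
  assert(trailer <= in.size());

  // Hoist the raw pointers and the bound out of the loop so the body is
  // plain pointer indexing with no function calls.
  const std::uint8_t* src = in.data();
  std::uint8_t* dst = out.data();
  const std::size_t stop = in.size() - trailer;
  for (std::size_t i = skip; i < stop; ++i) {
    // `^` promotes to int, so narrow back explicitly.
    dst[i] = static_cast<std::uint8_t>(src[i] ^ key);
  }
}
```

Whether this actually beats the operator[] version in a release build is exactly what the answer says it cannot confirm; comparing both on godbolt or in a benchmark would settle it.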
For xor-ing a source range to a destination range, I'd use std::transform:
std::transform(
    frame.cbegin() + unencrypted_bytes,
    frame.cend() - 41,
    encrypted_frame.begin() + unencrypted_bytes,
    [=] (const auto byte) -> std::uint8_t { return byte ^ fake_key_; });
In C++, it is often the case that clearly expressing the intention of the program with high-level abstractions using the standard library is the best choice. Source code should be used to express what the program needs to do, and avoid dictating how to do it as much as possible*, because that would constrain compiler optimizations from coming up with the best possible approach.
* Unless you really know what you're doing and you have the benchmarks to prove it
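A self-contained sketch of the std::transform approach, again using std::vector<std::uint8_t> as a stand-in for rtc::ArrayView. Since the question also mentions std::copy, this version copies the unencrypted prefix with std::copy and XORs the rest with std::transform. encrypt_frame, key, skip and trailer are hypothetical names, and leaving the trailing bytes unwritten is an assumption about what the 41-byte tail is for.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper: copy in[0 .. skip) unchanged, then XOR
// in[skip .. size - trailer) with `key`. The last `trailer` bytes of `out`
// are not written.
void encrypt_frame(const std::vector<std::uint8_t>& in,
                   std::vector<std::uint8_t>& out, std::uint8_t key,
                   std::size_t skip, std::size_t trailer) {
  assert(out.size() == in.size());
  assert(skip + trailer <= in.size());  // keeps both ranges valid

  // Unencrypted header: a plain copy.
  std::copy(in.cbegin(), in.cbegin() + skip, out.begin());
  // Payload: state "apply this to every byte" and leave the loop to the
  // library and the optimizer.
  std::transform(in.cbegin() + skip, in.cend() - trailer, out.begin() + skip,
                 [key](std::uint8_t byte) {
                   return static_cast<std::uint8_t>(byte ^ key);
                 });
}
```

Capturing `key` by value instead of `[=]` also avoids the implicit capture of `this` that `[=]` performs when the key is a member such as fake_key_.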
It would also be nice to make use of std::bit_xor, but that would require two input ranges in order to invoke the overload for std::transform that accepts a binary operator. Assuming the constant fake_key_ is a std::uint8_t, here's the definition for an iterator to model an infinite, filled range:
#include <cstddef>
#include <iterator>
#include <memory>
template <class T>
struct filled {
using value_type = T;
using difference_type = std::ptrdiff_t;
using reference = const T &;
using pointer = const T *;
using iterator_category = std::input_iterator_tag;
constexpr filled() noexcept = default;
constexpr filled(const filled &) = default;
constexpr filled(filled &&) = default;
constexpr filled(reference value) : value{value} {}
constexpr filled &operator=(const filled &) = default;
constexpr filled &operator=(filled &&) = default;
constexpr ~filled() = default;
constexpr bool operator==(const filled &) const noexcept { return false; }
constexpr reference operator*() const noexcept { return value; }
constexpr pointer operator->() const noexcept {
return std::addressof(value);
}
constexpr filled &operator++() noexcept { return *this; }
constexpr filled operator++(int) { return *this; }
value_type value;
};
Enabling the use of this overload:
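std::transform(
    frame.cbegin() + unencrypted_bytes,
    frame.cend() - 41,
    filled<std::uint8_t>{fake_key_},
    encrypted_frame.begin() + unencrypted_bytes,
    std::bit_xor<>{});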
There is a good talk by Jason Turner and Ben Deane from C++Now 2017 called "Constexpr all the things", which also gives a constexpr vector implementation. I was dabbling with the idea myself, for educational purposes. My constexpr vector was pure in the sense that pushing back to it would return a new vector with the added element.
During the talk, I saw a push_back implementation that looks more or less like the following:
constexpr void push_back(T const& e) {
if(size_ >= Size)
throw std::range_error("can't use more than Size");
else {
storage_[size_++] = e;
}
}
They were taking the element by value and moving it, but I don't think that is the source of my problem. What I want to know is how this function could be used in a constexpr context. It is not a const member function; it modifies the state. I don't think it is possible to do something like:
constexpr cx::vector<int> v1{};
v1.push_back(4);
And if this is not possible, how could we use this thing in a constexpr context and achieve the goal of the talk with this vector, namely compile-time JSON parsing?
Here is my version, so that you can see both my version that returns a new vector and the version from the talk. (Performance, perfect forwarding, etc. are deliberately ignored.)
#include <array>
#include <cstddef>
#include <initializer_list>
#include <stdexcept>
#include <type_traits>
namespace cx {
template <typename T, std::size_t Size = 10>
struct vector {
    using iterator = typename std::array<T, Size>::iterator;
    using const_iterator = typename std::array<T, Size>::const_iterator;
    constexpr vector(std::initializer_list<T> const& l) {
        for (auto& t : l) {
            if (size_ < Size)
                storage_[size_++] = t;  // store first, then bump the size
        }
    }
    // My "pure" version: copy o and append t, producing a new vector.
    constexpr vector(vector const& o, T const& t) {
        storage_ = o.storage_;
        size_ = o.size_;
        storage_[size_++] = t;
    }
    constexpr auto begin() const { return storage_.begin(); }
    constexpr auto end() const { return storage_.begin() + size_; }
    constexpr auto size() const { return size_; }
    // The version from the talk: mutates *this.
    constexpr void push_back(T const& e) {
        if (size_ >= Size)
            throw std::range_error("can't use more than Size");
        else {
            storage_[size_++] = e;
        }
    }
    std::array<T, Size> storage_{};
    std::size_t size_{};
};
}  // namespace cx
template <typename T>
constexpr auto make_vector(std::initializer_list<T> const& l) {
    return cx::vector<int>{l};
}
template <typename T>
constexpr auto push_back(cx::vector<T> const& o, T const& t) {
    return cx::vector<int>{o, t};
}
int main() {
    constexpr auto v1 = make_vector({1, 2, 3});
    static_assert(v1.size() == 3);
    constexpr auto v2 = push_back(v1, 4);
    static_assert(v2.size() == 4);
    static_assert(std::is_same_v<decltype(v1), decltype(v2)>);
    // v1.push_back(4); fails in a constexpr context
}
So, this made me realize there is probably something deep that I don't know about constexpr. To recap the question: how could such a constexpr vector offer a mutating push_back like that in a constexpr context? It does not seem to work in a constexpr context right now. If push_back in a constexpr context is not intended to begin with, how can you call it a constexpr vector and use it for compile-time JSON parsing?
Your definition of vector is correct, but you can't modify constexpr objects. They are well and truly constant. Instead, do compile-time calculations inside constexpr functions (the output of which can then be assigned to constexpr objects).
For example, we can write a function range, which produces a vector of the numbers from 0 to n - 1. It uses push_back, and we can assign the result to a constexpr vector in main.
constexpr cx::vector<int> range(int n) {
    cx::vector<int> v{};
    for (int i = 0; i < n; i++) {
        v.push_back(i);
    }
    return v;
}
int main() {
    constexpr cx::vector<int> v = range(10);
}
Your return cx::vector<int>{o, t}; will produce a compilation error when o and t are of types cx::vector<T> and T respectively, because those are different types, while all elements of std::initializer_list<T> should be of same type (o is not expanded into a list of its elements).
If you're merely after your 'pure' implementation of push_back, then you can make do with standard arrays:
#include <algorithm>
#include <array>
#include <cstddef>
#include <type_traits>
// Requires C++20: std::copy is constexpr only since C++20, and std::to_array was added in C++20.
template <typename T, std::size_t N>
constexpr auto push_back(std::array<T, N> const& oldArr, T const& el) {
    std::array<T, N+1> newArr{};
    std::copy(begin(oldArr), end(oldArr), begin(newArr));
    newArr[N] = el;
    return newArr;
}
int main() {
    constexpr auto a1 = std::to_array({1, 2, 3});
    static_assert(a1.size() == 3);
    constexpr auto a2 = push_back(a1, 4);
    static_assert(a2.size() == 4);
    // This assert will still fail, though, because push_back's implementation
    // above not only returns a new array, but also a new type.
    // For example, std::array<int, 3> is not the same type as std::array<int, 4>
    //static_assert(std::is_same_v<decltype(a1), decltype(a2)>);
}
Declaring multidimensional arrays with static size is quite easy in C++, and the array is then stored in one contiguous block of memory (row-major layout).
However, declaring dynamically allocated multidimensional arrays (size known only at runtime) in C++ is quite tricky, as discussed in another SO thread about arrays. To preserve the same syntax with multiple square brackets (in the case of a 2D array), you need to create an array of pointers, which point to another set of arrays (rows). With more dimensions this adds more (unnecessary) levels of indirection and memory fragmentation, and with small array sizes the pointers can take more memory than the actual data.
One of the solutions is to use 1D array and then recalculate the indices.
For example, take a 3D array with sizes 10, 3 and 5. If I want the element at position (3, 1, 4), instead of writing 3darray[3][1][4] I would write 3darray[index], where index is calculated as 3*(y_dym_size*z_dym_size) + 1*(z_dym_size) + 4, which, when substituted, gives 3*(3*5) + 1*(5) + 4 = 54.
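Written generally, for extents d_0, ..., d_{n-1} the row-major offset is the sum below (the second part is the question's example). Note that the first extent d_0 never appears in it, which is why the formula above has no factor of 10.

```latex
\text{index}(i_0, \dots, i_{n-1}) = \sum_{k=0}^{n-1} i_k \prod_{m=k+1}^{n-1} d_m,
\qquad
\text{index}(3, 1, 4) = 3 \cdot (3 \cdot 5) + 1 \cdot 5 + 4 = 54
\quad \text{for } (d_0, d_1, d_2) = (10, 3, 5).
```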
I can easily write a class that encapsulates a dynamically allocated array and recomputes the indices in this manner, but this is not practical, as it needs to be written for every number of dimensions.
I would like to create a template that works for an arbitrary number of dimensions with zero overhead (which is the spirit of modern C++: reusable code/classes where more work is shifted to the compiler). I have the following code that works for an n-dimensional array, but it does not have zero overhead: it contains a for loop and also an array that is used in the 1D index resolution:
#include <algorithm>
#include <array>
#include <cstddef>
#include <type_traits>
#include <vector>
template <class T, size_t DIM>
class arrayND {
    std::array<size_t, DIM> sizes;
    std::array<size_t, DIM-1> access_multiplier;
    std::vector<T> data;
public:
    using iterator = typename std::vector<T>::iterator;
    using const_iterator = typename std::vector<T>::const_iterator;
    template <typename... Args, typename std::enable_if_t<sizeof...(Args) == DIM, int> = 0>
    arrayND(Args&&... args) {
        std::array<size_t, DIM> temp{static_cast<size_t>(args)...};
        sizes = temp;
        size_t mult = 1;
        for (int i = static_cast<int>(DIM) - 2; i >= 0; --i) {
            mult *= sizes[i+1];
            access_multiplier[i] = mult;
        }
        data.resize(mult * sizes[0]);  // allocate storage for all elements
    }
    template <typename... Args, typename std::enable_if_t<sizeof...(Args) == DIM, int> = 0>
    T& get(Args&&... args) {
        std::array<size_t, DIM> idx_copy{static_cast<size_t>(args)...};
        size_t index = idx_copy[DIM-1];
        for (int i = static_cast<int>(DIM) - 2; i >= 0; --i) {
            index += idx_copy[i]*access_multiplier[i];
        }
        return data[index];
    }
    template <typename... Args, typename std::enable_if_t<sizeof...(Args) == DIM, int> = 0>
    T& operator()(Args&&... args) {
        return get(args...);
    }
    void set(const T& elem) {
        std::fill(data.begin(), data.end(), elem);
    }
    // Member begin()/end() hide the free functions, so call the vector's members directly.
    iterator begin() { return data.begin(); }
    iterator end() { return data.end(); }
    const_iterator begin() const { return data.cbegin(); }
    const_iterator end() const { return data.cend(); }
};
Another approach I was thinking of was to use variadic templates, which would hopefully, after compiler optimization, be identical to code written specifically for a given number of dimensions:
size_t getIndex(size_t index) {
    return index;
}
template<typename... Args>
size_t getIndex(size_t index, Args... args) {
    return access_multiplier[DIM-sizeof...(Args)-1]*index + getIndex(args...);
}
template <typename... Args, typename std::enable_if_t<sizeof...(Args) == DIM, int> = 0>
T& get(Args&&... args) {
    return data[getIndex(args...)];
    /*std::array<size_t, DIM> idx_copy{args...};
    size_t index = idx_copy[DIM-1];
    for(int i = DIM-2; i >= 0; --i){
        index += idx_copy[i]*access_multiplier[i];
    }
    return data[index];*/
}
Is there a way, in the current version of C++ (C++17) or in the language in general, to obtain both flexibility (an arbitrary number of dimensions) and performance (zero overhead compared to code written specifically for a given number of dimensions)? If there has to be overhead, then it makes more sense to hardcode it for, let's say, up to 5 dimensions.
Is there already an implementation of dynamic multidimensional arrays in some existing library?
Split the view from the storage.
An n-dimensional array view of T is a class with a pointer to T and some way of getting n-1 stride sizes. [] returns an n-1 dimensional array view.
There are two different flavours of such views. The first stores the strides, the second a pointer to a contiguous buffer of strides. Both have their advantages; the first with care can even optimize when some or all dimensions are fixed. But I'll do the 2nd.
template<class T, std::size_t N>
struct slice {
    T* ptr=0;
    std::size_t const* strides=0;
    slice<T,N-1> operator[]( std::size_t i )const{
        return { ptr + i * *strides, strides+1 };
    }
};
template<class T>
struct slice<T,1> {
    T* ptr=0;
    std::size_t const* strides=0;
    T& operator[]( std::size_t i )const{
        return *(ptr + i * *strides);
    }
};
This one permits per-element strides.
Now you just have to expose a slice<T,N> to do chained [] on. This is similar to how I'd write it for 3 dimensions.
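A possible owning container that exposes such a slice, as a sketch: ndarray, view() and the row-major stride computation are hypothetical choices, not part of the answer, and slice is repeated so the block compiles on its own. The container owns both the element buffer and the stride table the slices point into.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <vector>

template <class T, std::size_t N>
struct slice {
  T* ptr = nullptr;
  const std::size_t* strides = nullptr;
  // Peel off one dimension: move by i strides, hand on the remaining strides.
  slice<T, N - 1> operator[](std::size_t i) const {
    return {ptr + i * *strides, strides + 1};
  }
};

template <class T>
struct slice<T, 1> {
  T* ptr = nullptr;
  const std::size_t* strides = nullptr;
  T& operator[](std::size_t i) const { return ptr[i * *strides]; }
};

// Hypothetical owning N-dimensional array: one contiguous buffer plus a
// row-major stride table (the last stride is 1).
template <class T, std::size_t N>
class ndarray {
 public:
  explicit ndarray(const std::array<std::size_t, N>& dims) : dims_(dims) {
    std::size_t total = 1;
    for (std::size_t k = N; k-- > 0;) {  // walk dimensions from last to first
      strides_[k] = total;
      total *= dims[k];
    }
    data_.resize(total);
  }

  slice<T, N> view() { return {data_.data(), strides_.data()}; }

  // Returns slice<T, N - 1>, or T& when N == 1.
  decltype(auto) operator[](std::size_t i) {
    assert(i < dims_[0]);  // only the outermost index is bounds-checked here
    return view()[i];
  }

 private:
  std::array<std::size_t, N> dims_;
  std::array<std::size_t, N> strides_{};
  std::vector<T> data_;
};
```

Usage: `ndarray<int, 3> a({2, 3, 4}); a[1][2][3] = 42;`. The slices hold raw pointers into the container, so they must not outlive it or survive a copy or move of it.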
If you prefer (x,y,z) syntax, your only problem is the for loop, and you are afraid the compiler did not flatten it, you can force it to be flattened using pack expansion. But profile and examine the optimized assembly first.
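One way to do that with pack expansion, as a C++17 sketch (flat_index and flat_index_impl are hypothetical names): pair each index with its dimension through a std::index_sequence and combine them with a comma fold, so no loop appears in the source and every dims[K] subscript is a compile-time constant. Whether it is actually faster than the loop still needs the profiling mentioned above.

```cpp
#include <array>
#include <cstddef>
#include <utility>

// Row-major offset via a fold expression: for indices (i0, i1, ..., i{N-1})
// this expands to offset = ((0 * d0 + i0) * d1 + i1) * d2 + ... .
template <std::size_t N, std::size_t... K, class... Idx>
constexpr std::size_t flat_index_impl(const std::array<std::size_t, N>& dims,
                                      std::index_sequence<K...>, Idx... idx) {
  std::size_t offset = 0;
  // K and idx are expanded in lockstep; the comma fold evaluates left to right.
  ((offset = offset * dims[K] + static_cast<std::size_t>(idx)), ...);
  return offset;
}

template <std::size_t N, class... Idx>
constexpr std::size_t flat_index(const std::array<std::size_t, N>& dims,
                                 Idx... idx) {
  static_assert(sizeof...(Idx) == N, "need exactly one index per dimension");
  return flat_index_impl(dims, std::make_index_sequence<N>{}, idx...);
}
```

With dims = {10, 3, 5}, flat_index(dims, 3, 1, 4) yields 54, matching the hand computation in the question.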
Consider an STL container C that is forward-iterable. I need to access every step-th element, starting from idx. If C is a vector (i.e. has a random-access iterator), I can just use index arithmetic:
template <class Container>
void go(const Container& C) {
    for(size_t i = idx; i<C.size(); i+=step) {
        /* do something with C[i] */
    }
}
However, if C does not support that, e.g. C is a list, one needs to rewrite the above solution. A quick attempt would be:
template <class Container>
void go(const Container& C) {
    size_t max = C.size();
    size_t i = idx;
    for(auto it = std::next(C.begin(),idx); i < max; i+=step, it+=step) {
        /* do something with *it */
    }
}
Not much longer, and it works... except that it will most likely trigger undefined behavior. Both std::next and it+=step can potentially step well beyond C.end() before the i < max check is performed.
The solution I am currently using (not shown) is really bloated when compared to the initial for. I have separate check for the first iteration and those that follows. A lot of boilerplate code...
So, my question is: can the above pattern be written in a safe and succinct way? Imagine you want to nest these loops 2 or 3 times. You don't want a whole page of code for that!
The code should be reasonably short
The code should have no overhead. Doing std::next(C.begin(), i) in each iteration over i is unnecessarily long if you can just do std::advance(it, step) instead.
The code should benefit from the case when it is indeed a random-access iterator when std::advance can be performed in constant time.
C is constant. I do not insert, erase or modify C within the loop.
You might use helper functions:
template <typename IT>
IT secure_next(IT it, std::size_t step, IT end, std::input_iterator_tag)
{
    while (it != end && step--) {
        ++it;
    }
    return it;
}
template <typename IT>
IT secure_next(IT it, std::size_t step, IT end, std::random_access_iterator_tag)
{
    return static_cast<std::size_t>(end - it) < step ? end : it + step;
}
template <typename IT>
IT secure_next(IT it, std::size_t step, IT end)
{
    return secure_next(it, step, end, typename std::iterator_traits<IT>::iterator_category{});
}
And then:
for (auto it = secure_next(C.begin(), idx, C.end());
     it != C.end();
     it = secure_next(it, step, C.end())) {
    /* do something with *it */
}
Alternatively, with range-v3, you could do something like:
for (const auto& e : C | ranges::view::drop(idx) | ranges::view::stride(step)) {
    /* do something with e */
}
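A self-contained check of the secure_next approach on both a std::list and a std::vector. secure_next is repeated from above so the block compiles on its own, and strided is a hypothetical helper that just collects the visited elements.

```cpp
#include <cassert>
#include <cstddef>
#include <iterator>
#include <list>
#include <vector>

// Advance by at most `step`, stopping at `end` (linear for non-random-access).
template <typename It>
It secure_next(It it, std::size_t step, It end, std::input_iterator_tag) {
  while (it != end && step--) ++it;
  return it;
}

// O(1) version, chosen by tag dispatch for random-access iterators.
template <typename It>
It secure_next(It it, std::size_t step, It end,
               std::random_access_iterator_tag) {
  return static_cast<std::size_t>(end - it) < step ? end : it + step;
}

template <typename It>
It secure_next(It it, std::size_t step, It end) {
  return secure_next(it, step, end,
                     typename std::iterator_traits<It>::iterator_category{});
}

// Hypothetical helper: collect every step-th element of c, starting at idx.
template <class Container>
std::vector<typename Container::value_type> strided(const Container& c,
                                                    std::size_t idx,
                                                    std::size_t step) {
  assert(step > 0);  // step == 0 would never reach end()
  std::vector<typename Container::value_type> out;
  for (auto it = secure_next(c.begin(), idx, c.end()); it != c.end();
       it = secure_next(it, step, c.end())) {
    out.push_back(*it);
  }
  return out;
}
```

Nesting is then just nesting the for statements; only the `it != c.end()` test guards each level.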
The comment in the question about the requirements inspired me to implement this in terms of k * step instead of some other mechanism controlling the number of iterations over the container.
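template <class Container>
void go(const Container& C)
{
    const size_t sz = C.size();
    if(idx >= sz) return;
    // Number of elements visited: idx, idx + step, ..., up to sz - 1.
    const size_t k_max = (sz - idx - 1) / step + 1;
    size_t k = 0;
    // Advance only between visits (k > 0), so 'it' never moves past the last visited element.
    for(auto it = std::next(C.begin(), idx); k < k_max && (k == 0 || (std::advance(it, step), true)); ++k) {
        /* do something with *it */
    }
}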
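The counted idea can also be checked in isolation. This sketch takes idx and step as parameters instead of from the surrounding scope, and strided_counted is a hypothetical name. The element count is computed up front and the iterator is only advanced between visits, so it never moves past the last element it dereferences.

```cpp
#include <cassert>
#include <cstddef>
#include <iterator>
#include <list>
#include <vector>

template <class Container>
std::vector<typename Container::value_type> strided_counted(
    const Container& c, std::size_t idx, std::size_t step) {
  assert(step > 0);
  std::vector<typename Container::value_type> out;
  const std::size_t sz = c.size();
  if (idx >= sz) return out;
  // Visited positions are idx, idx + step, ..., the largest one <= sz - 1.
  const std::size_t count = (sz - 1 - idx) / step + 1;
  auto it = std::next(c.begin(), static_cast<std::ptrdiff_t>(idx));
  for (std::size_t k = 0; k < count; ++k) {
    if (k > 0) std::advance(it, static_cast<std::ptrdiff_t>(step));
    out.push_back(*it);
  }
  return out;
}
```

For a random-access container both std::next and std::advance are constant time, which covers the third requirement in the question.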
One option is to adapt the iterator so that it is safe to advance past the end. Then you can use stock std::next(), std::advance(), pass it to functions expecting an iterator, and so on. Then the strided iteration can look almost exactly like you want:
template<class Container, class Size>
void iterate(const Container& c, Size idx, Size step)
{
    if (unlikely(idx < 0 || step <= 0))  // unlikely(): branch-hint macro, assumed defined elsewhere
        return;
    bounded_iterator it{begin(c), c};
    for (std::advance(it, idx); it != end(c); std::advance(it, step)) {
        /* do something with *it */
    }
}
This is not dissimilar from the secure_next() suggestion. It is a little more flexible, but also more work. The range-v3 solution looks even nicer but may or may not be an option for you.
Boost.Iterator has facilities for adapting iterators like this, and it's also straightforward to do it directly. This is how an incomplete sketch might look for iterators not supporting random access:
template<class Iterator, class Sentinel, class Size>
class bounded_iterator
{
public:
    using difference_type = typename std::iterator_traits<Iterator>::difference_type;
    using value_type = typename std::iterator_traits<Iterator>::value_type;
    using pointer = typename std::iterator_traits<Iterator>::pointer;
    using reference = typename std::iterator_traits<Iterator>::reference;
    using iterator_category = typename std::iterator_traits<Iterator>::iterator_category;
    template<class Container>
    constexpr explicit bounded_iterator(Iterator begin, const Container& c)
        : begin_{begin}, end_{end(c)}
    {
    }
    constexpr auto& operator++()
    {
        if (begin_ != end_)
            ++begin_;
        return *this;
    }
    constexpr reference operator*() const
    {
        return *begin_;
    }
    friend constexpr bool operator!=(const bounded_iterator& i, Sentinel s)
    {
        return i.begin_ != s;
    }
    // and the rest...
private:
    Iterator begin_;
    Sentinel end_;
};
template<class Iterator, class Container>
bounded_iterator(Iterator, const Container&) -> bounded_iterator<Iterator, decltype(end(std::declval<const Container&>())), typename size_type<Container>::type>;
And for random access iterators:
template<RandomAccessIterator Iterator, class Sentinel, class Size>
class bounded_iterator<Iterator, Sentinel, Size>
{
public:
    using difference_type = typename std::iterator_traits<Iterator>::difference_type;
    using value_type = typename std::iterator_traits<Iterator>::value_type;
    using pointer = typename std::iterator_traits<Iterator>::pointer;
    using reference = typename std::iterator_traits<Iterator>::reference;
    using iterator_category = typename std::iterator_traits<Iterator>::iterator_category;
    template<class Container>
    constexpr explicit bounded_iterator(Iterator begin, const Container& c)
        : begin_{begin}, size_{std::size(c)}, index_{0}
    {
    }
    constexpr auto& operator+=(difference_type n)
    {
        index_ += n;
        return *this;
    }
    constexpr reference operator*() const
    {
        return begin_[index_];
    }
    friend constexpr bool operator!=(const bounded_iterator& i, Sentinel)
    {
        return i.index_ < i.size_;
    }
    // and the rest...
private:
    const Iterator begin_;
    const Size size_;
    Size index_;
};
As an aside, it seems GCC produces slightly better code with this form than with my attempts at something like secure_next(). Can its optimizer reason better about indices than pointer arithmetic?
This example is also shared via gist and godbolt.
I have a std::vector<int> with contiguous shuffled values from 0 to N and want to swap, as efficiently as possible, each value with its position in the vector.
For example, if
v[6] = 3;
then after the swap
v[3] = 6;
This is a simple problem, but I do not know how to handle it in order to make it trivial and, above all, very fast. Thank you very much for your suggestions.
Given N at compile time, and given that the array contains each index in [0,N) exactly once, it's relatively straightforward (as long as it doesn't have to be in-place, as mentioned in the comments above):
Construct a new array so that v'[n] = find_index(v, n) and assign it to the old one.
Here I used variadic templates with std::index_sequence to roll it into a single assignment:
#include <algorithm>
#include <array>
#include <cstddef>
#include <iterator>
#include <utility>
template<typename T, std::size_t N>
std::size_t find_index(const std::array<T,N>& arr, std::size_t index) {
    return static_cast<std::size_t>(std::distance(arr.begin(), std::find(arr.begin(), arr.end(), index)));
}
template<typename T, std::size_t N, std::size_t... Index>
void swap_index_value(std::array<T,N>& arr, std::index_sequence<Index...>) {
    // static_cast avoids a narrowing error when T is narrower than std::size_t
    arr = { static_cast<T>(find_index(arr, Index))... };
}
template<typename Integer, std::size_t N>
void swap_index_value(std::array<Integer,N>& arr) {
    swap_index_value(arr, std::make_index_sequence<N>{});
}
The complexity of this does not look great, though. Calling find_index(arr, n) for each n in [0,N) takes N * (N+1) / 2 comparisons in total (std::sort would only take about N * log(N)).
However, since we know each index is present in the array, we could just fill out an array of indices as we walk over the original array, and, assuming T is an integral type, we can skip some std::size_t <-> T conversions, too:
template<typename T, std::size_t N>
void swap_index_value(std::array<T,N>& arr){
    std::array<T, N> indices;
    for (T i = 0; i < N; ++i) {
        indices[arr[i]] = i;
    }
    arr = indices;
}
We're still using twice the space and doing some randomly ordered writes to our array, but essentially we're down to 2*N assignments, and the code is simpler than before.
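Since the question is about a std::vector<int> whose size is only known at runtime, the same indices approach can be sketched without std::array (invert_permutation is a hypothetical name). It assumes v holds each value 0 .. v.size() - 1 exactly once.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// If value x sits at position i, then after the call position x holds i.
void invert_permutation(std::vector<int>& v) {
  std::vector<int> inverse(v.size());
  for (std::size_t i = 0; i < v.size(); ++i) {
    const auto x = static_cast<std::size_t>(v[i]);
    assert(v[i] >= 0 && x < v.size());  // v must be a permutation of 0..n-1
    inverse[x] = static_cast<int>(i);
  }
  v.swap(inverse);  // O(1): swaps buffers instead of copying elements
}
```

This keeps the 2*N assignments of the array version; the scratch buffer is simply allocated at runtime, and v.swap avoids the final element-by-element copy.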
Alternatively, we could also std::sort if we keep a copy to do lookups in:
template<typename T, std::size_t N>
void swap_index_value(std::array<T,N>& arr){
    std::sort(arr.begin(), arr.end(), [copy = arr](const T& lhs, const T& rhs) {
        return copy[lhs] < copy[rhs];
    });
}
Benchmarking which one is faster is left as an exercise to the reader ;)