Dijkstra's algorithm for matrices - c++

I've been trying to implement Dijkstra's algorithm in C++11 to work on matrices of arbitrary size. Specifically, I am interested in solving question 83 on Project Euler.
I always seem to run into a situation where every node neighboring the current node has already been visited, which, if I understand the algorithm correctly, should never happen.
I've tried poking around in a debugger, and I've re-read the code several times, but I have no idea where I am going wrong.
Here is what I have done so far:
#include <iostream>
#include <fstream>
#include <cstdlib>
#include <vector>
#include <set>
#include <tuple>
#include <cstdint>
#include <cinttypes>
#include <string>
#include <algorithm>
#include <cstdio>
typedef std::tuple<size_t, size_t> Index;
std::ostream& operator<<(std::ostream& os, Index i)
{
    os << "(" << std::get<0>(i) << ", " << std::get<1>(i) << ")";
    return os;
}
template<typename T>
class Matrix
{
public:
    Matrix(size_t i, size_t j):
        n(i),
        m(j),
        xs(i * j)
    {}
    // Reads an n-by-m matrix of comma-separated values from the file at path.
    Matrix(size_t n, size_t m, const std::string& path):
        n(n),
        m(m),
        xs(n * m)
    {
        std::ifstream mat_in {path};
        char c;
        for (size_t i = 0; i < n; ++i) {
            for (size_t j = 0; j < m - 1; ++j) {
                mat_in >> (*this)(i,j);
                mat_in >> c;
            }
            mat_in >> (*this)(i,m - 1);
        }
    }
    T& operator()(size_t i, size_t j)
    {
        return xs[n * i + j];
    }
    T& operator()(Index i)
    {
        return xs[n * std::get<0>(i) + std::get<1>(i)];
    }
    T operator()(Index i) const
    {
        return xs[n * std::get<0>(i) + std::get<1>(i)];
    }
    // Indices of the (up to four) cells adjacent to ind.
    std::vector<Index> surrounding(Index ind) const
    {
        size_t i = std::get<0>(ind);
        size_t j = std::get<1>(ind);
        std::vector<Index> is;
        if (i > 0)
            is.push_back(Index(i - 1, j));
        if (i < n - 1)
            is.push_back(Index(i + 1, j));
        if (j > 0)
            is.push_back(Index(i, j - 1));
        if (j < m - 1)
            is.push_back(Index(i, j + 1));
        return is;
    }
    size_t rows() const { return n; }
    size_t cols() const { return m; }
private:
    size_t n;
    size_t m;
    std::vector<T> xs;
};
/* Finds the minimum sum of the weights of the nodes along a path from 1,1 to n,m using Dijkstra's algorithm modified for matrices */
int64_t shortest_path(const Matrix<int>& m)
{
    Index origin(0,0);
    Index current { m.rows() - 1, m.cols() - 1 };
    Matrix<int64_t> nodes(m.rows(), m.cols());
    std::set<Index> in_path;
    for (size_t i = 0; i < m.rows(); ++i)
        for (size_t j = 0; j < m.cols(); ++j)
            nodes(i,j) = INTMAX_MAX;
    nodes(current) = m(current);
    while (1) {
        auto is = m.surrounding(current);
        Index next = origin;
        for (auto i : is) {
            if (in_path.find(i) == in_path.end()) {
                nodes(i) = std::min(nodes(i), nodes(current) + m(i));
                if (nodes(i) < nodes(next))
                    next = i;
            }
        }
        in_path.insert(current);
        current = next;
        if (current == origin)
            return nodes(current);
    }
}
int64_t at(const Matrix<int64_t>& m, const Index& i) { return m(i); }
int at(const Matrix<int>& m, const Index& i) { return m(i); }
int main()
{
    Matrix<int> m(80,80,"mat.txt");
    printf("%" PRIi64 "\n", shortest_path(m));
    return 0;
}

You do not understand the algorithm correctly. There is nothing stopping you from running into dead ends. As long as there are other options you have not yet explored, just mark the node as a dead end and move on.
BTW, I agree with the commenters who say that you are overcomplicating the solution. It suffices to create a matrix of "cost to get to here" and keep a queue of points to explore paths from. Initialize every entry of the cost matrix to a NOT_VISITED sentinel (-1 would work), set the cost of the starting cell to its own value and put that cell on the queue. For each point taken from the queue, look at its neighbors. If a neighbor has not been visited yet, or you just found a cheaper path to it, update the cost matrix and add that neighbor to the queue.
Keep going until the queue is empty. At that point you have the guaranteed lowest cost everywhere.
A* is a lot more efficient than this naive approach, but what I just described is more than efficient enough to solve the problem.
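The queue-based approach described above could be sketched roughly as follows. This is only an illustration under some assumptions: the grid is a std::vector of rows rather than the Matrix class from the question, all weights are non-negative (so that -1 is safe as the NOT_VISITED sentinel), and the function name min_path_sum is made up for this example.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <queue>
#include <utility>
#include <vector>

// Hypothetical helper (not from the question): minimal path sum from the
// top-left to the bottom-right cell, moving up, down, left or right.
std::int64_t min_path_sum(const std::vector<std::vector<int>>& grid)
{
    assert(!grid.empty() && !grid[0].empty());
    const std::size_t rows = grid.size();
    const std::size_t cols = grid[0].size();
    const std::int64_t NOT_VISITED = -1;  // sentinel; assumes non-negative weights

    std::vector<std::vector<std::int64_t>> cost(
        rows, std::vector<std::int64_t>(cols, NOT_VISITED));
    std::queue<std::pair<std::size_t, std::size_t>> todo;

    cost[0][0] = grid[0][0];
    todo.push({0, 0});

    const int di[] = {-1, 1, 0, 0};
    const int dj[] = {0, 0, -1, 1};

    while (!todo.empty()) {
        const auto cell = todo.front();
        todo.pop();
        const std::size_t i = cell.first, j = cell.second;
        for (int d = 0; d < 4; ++d) {
            // Unsigned wrap-around turns "-1" into a huge value, which the
            // bounds check below rejects.
            const std::size_t ni = i + di[d];
            const std::size_t nj = j + dj[d];
            if (ni >= rows || nj >= cols)
                continue;
            const std::int64_t candidate = cost[i][j] + grid[ni][nj];
            if (cost[ni][nj] == NOT_VISITED || candidate < cost[ni][nj]) {
                cost[ni][nj] = candidate;  // first path found, or a cheaper one
                todo.push({ni, nj});       // its neighbors must be re-examined
            }
        }
    }
    return cost[rows - 1][cols - 1];
}
```

A cell can be queued more than once, which is what makes this slower than Dijkstra with a priority queue, but every re-queue strictly lowers a cost, so the loop terminates.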


SIMD matrix multiplication causing segfault or SIGABRT

I took inspiration from this link to write a multiplication routine for matrices whose sizes are multiples of 4: SSE matrix-matrix multiplication
I came up with something similar, but I observed that if the j loop increases by 4, as in the suggested code, it only fills 1 column out of every 4 (which makes sense). If I step the loop by 2 instead, only half of the columns are filled.
So logically, the solution should be to increase the loop by 1, but when I make that change I get either a segfault if I use _mm_store_ps, or a "corrupted size vs. prev_size" error if I use _mm_storeu_ps, which makes me believe that the data is simply not aligned.
What should I align, and how, so that this error goes away and the resulting matrix is filled?
Here is the code I have so far:
void mat_mult(Matrix A, Matrix B, Matrix C, int n) {
    for(int i = 0; i < n; ++i) {
        for(int j = 0; j < n; j+=1) {
            __m128 vR = _mm_setzero_ps();
            for(int k = 0; k < n; k++) {
                __m128 vA = _mm_set1_ps(A(i,k));
                __m128 vB = _mm_loadu_ps(&B(k,j));
                vR = _mm_add_ss(vR,vA*vB);
            }
            _mm_storeu_ps(&C(i,j), vR);
        }
    }
}
I corrected your code and also wrote quite a lot of supplementary code to run tests and print outputs, including a Matrix class implemented from scratch. The following code compiles under the C++11 standard.
The main corrections to your function are: the case where the number of B columns is not a multiple of 4 must be handled separately, with the uneven tail processed by its own loop; the j loop should run in steps of 4 (a 128-bit SSE register holds four 32-bit floats); and you should use _mm_mul_ps(vA, vB) instead of vA * vB, which relies on a GCC/Clang vector extension and is not portable.
The main bug in your code is that you use _mm_add_ss() where you need _mm_add_ps(): you have to add all four lanes, not just the lowest one. Because of _mm_add_ss(), you observed that only 1 out of every 4 columns was filled (the other 3 were zeros).
Alternatively, you can make your code work by using _mm_load_ss() instead of _mm_loadu_ps() and _mm_store_ss() instead of _mm_storeu_ps(). With only this fix your code gives the correct result, but it is slow, no faster than a regular non-SSE solution. To actually gain speed you have to use the ..._ps() instructions everywhere and handle the non-multiple-of-4 case correctly.
Your program segfaults because you don't handle the case where the number of B columns is not a multiple of 4. Whenever fewer than four columns remain to the right of j, the four-float store writes memory outside the bounds of matrix C.
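As a rough sketch of the corrections listed above (4-wide steps using _mm_add_ps and _mm_mul_ps, plus a scalar loop for the leftover columns), here is a standalone version. It assumes flat row-major std::vector<float> storage instead of a Matrix class, and the name mat_mult_sse and its size parameters are made up for this illustration. It needs an x86 target with SSE.

```cpp
#include <cassert>
#include <cstddef>
#include <immintrin.h>
#include <vector>

// Hypothetical flat, row-major layout (not the Matrix class from the answer):
// A is n x m, B is m x p, C is n x p.
void mat_mult_sse(const std::vector<float>& A, const std::vector<float>& B,
                  std::vector<float>& C,
                  std::size_t n, std::size_t m, std::size_t p)
{
    assert(A.size() == n * m && B.size() == m * p && C.size() == n * p);
    const std::size_t p4 = p - p % 4;  // columns covered by full 4-wide steps

    for (std::size_t i = 0; i < n; ++i) {
        // Vector part: four output columns per iteration.
        for (std::size_t j = 0; j < p4; j += 4) {
            __m128 sum = _mm_setzero_ps();
            for (std::size_t k = 0; k < m; ++k) {
                const __m128 a = _mm_set1_ps(A[i * m + k]);    // broadcast A(i,k)
                const __m128 b = _mm_loadu_ps(&B[k * p + j]);  // B(k,j..j+3)
                sum = _mm_add_ps(sum, _mm_mul_ps(a, b));       // all four lanes
            }
            _mm_storeu_ps(&C[i * p + j], sum);
        }
        // Scalar tail: the remaining p % 4 columns, so nothing is stored
        // past the end of the row.
        for (std::size_t j = p4; j < p; ++j) {
            float sum = 0.0f;
            for (std::size_t k = 0; k < m; ++k)
                sum += A[i * m + k] * B[k * p + j];
            C[i * p + j] = sum;
        }
    }
}
```

Splitting the column range at p - p % 4 keeps every 4-wide load and store inside its row, which is the property the original loop with a step of 1 was missing.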
You also asked about alignment. I suggest not using the aligned load/store intrinsics _mm_load_ps()/_mm_store_ps() at all and always using _mm_loadu_ps()/_mm_storeu_ps(). On modern CPUs the unaligned instructions run at the same speed as the aligned ones when the pointer happens to be aligned, while the aligned instructions fault on a misaligned pointer. So the unaligned versions are never slower and never segfault. On old CPUs the aligned instructions used to be faster, but that is no longer the case. The only reason left to use the aligned instructions is to fault on purpose, as a check that your program's pointers really are aligned.
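If you do want to verify alignment explicitly rather than rely on a fault, a minimal sketch could look like this. The helper is_aligned and the struct Float4 are hypothetical names introduced only for this example; the 16-byte figure is the alignment that _mm_load_ps and _mm_store_ps require of their pointer argument.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical helper: true if p sits on an `alignment`-byte boundary.
inline bool is_aligned(const void* p, std::size_t alignment)
{
    return reinterpret_cast<std::uintptr_t>(p) % alignment == 0;
}

// alignas requests 16-byte alignment for every object of this type, so the
// address of v is suitable for the aligned SSE load/store intrinsics.
struct alignas(16) Float4 {
    float v[4];
};
```

A check like is_aligned(ptr, 16) can be placed in an assert in debug builds while the code itself keeps using the unaligned intrinsics.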
I also implemented a separate function with a slow reference multiplication, in order to check the correctness of the fast (SSE) multiplication.
As pointed out by @АлексейНеудачин, my previous version of the Matrix class allocated unaligned memory for its array. I have now implemented a helper class, AlignmentAllocator, which ensures that Matrix allocates aligned memory; this allocator is used by the std::vector<> that stores the underlying Matrix data.
The full code with all the corrections, tests and supplementary code is below, followed by its console output. I print the two matrices produced by the two different multiplication functions so that they can be compared visually. All test cases are generated randomly. Scroll down the code a bit to see your fixed function mat_mult().
#include <cmath>
#include <iostream>
#include <vector>
#include <random>
#include <stdexcept>
#include <string>
#include <iomanip>
#include <cstdlib>
#include <new>
#include <malloc.h>
#include <immintrin.h>
using FloatT = float;
// Allocator that hands out memory aligned to N bytes.
template <typename T, std::size_t N>
class AlignmentAllocator {
public:
    typedef T value_type;
    typedef std::size_t size_type;
    typedef std::ptrdiff_t difference_type;
    typedef T * pointer;
    typedef const T * const_pointer;
    typedef T & reference;
    typedef const T & const_reference;
    inline AlignmentAllocator() throw() {}
    template <typename T2> inline AlignmentAllocator(const AlignmentAllocator<T2, N> &) throw() {}
    inline ~AlignmentAllocator() throw() {}
    inline pointer adress(reference r) { return &r; }
    inline const_pointer adress(const_reference r) const { return &r; }
    inline pointer allocate(size_type n);
    inline void deallocate(pointer p, size_type);
    inline void construct(pointer p, const value_type & v) { new (p) value_type(v); }
    inline void destroy(pointer p) { p->~value_type(); }
    inline size_type max_size() const throw() { return size_type(-1) / sizeof(value_type); }
    template <typename T2> struct rebind { typedef AlignmentAllocator<T2, N> other; };
    bool operator!=(const AlignmentAllocator<T, N> & other) const { return !(*this == other); }
    bool operator==(const AlignmentAllocator<T, N> & other) const { return true; }
};
template <typename T, std::size_t N>
inline typename AlignmentAllocator<T, N>::pointer AlignmentAllocator<T, N>::allocate(size_type n) {
#ifdef _MSC_VER
    auto p = (pointer)_aligned_malloc(n * sizeof(value_type), N);
#else
    auto p = (pointer)aligned_alloc(N, n * sizeof(value_type));
#endif
    if (!p)
        throw std::bad_alloc();
    return p;
}
template <typename T, std::size_t N>
inline void AlignmentAllocator<T, N>::deallocate(pointer p, size_type) {
#ifdef _MSC_VER
    _aligned_free(p);
#else
    std::free(p);
#endif
}
static size_t constexpr MatrixAlign = 64;
template <typename T, size_t Align = MatrixAlign>
using AlignedVector = std::vector<T, AlignmentAllocator<T, Align>>;
class Matrix {
public:
    Matrix(size_t rows, size_t cols)
        : rows_(rows), cols_(cols) {
        // Pad each row so that its size in bytes is a multiple of MatrixAlign.
        cols_aligned_ = (sizeof(FloatT) * cols_ + MatrixAlign - 1)
            / MatrixAlign * MatrixAlign / sizeof(FloatT);
        Clear();
        if (size_t(m_.data()) % 64 != 0 ||
            (cols_aligned_ * sizeof(FloatT)) % 64 != 0)
            throw std::runtime_error("Matrix was allocated unaligned!");
    }
    Matrix & Clear() {
        m_.clear();
        m_.resize(rows_ * cols_aligned_);
        return *this;
    }
    FloatT & operator() (size_t i, size_t j) {
        if (i >= rows_ || j >= cols_)
            throw std::runtime_error("Matrix index (" +
                std::to_string(i) + ", " + std::to_string(j) + ") out of bounds (" +
                std::to_string(rows_) + ", " + std::to_string(cols_) + ")!");
        return m_[i * cols_aligned_ + j];
    }
    FloatT const & operator() (size_t i, size_t j) const {
        return const_cast<Matrix &>(*this)(i, j);
    }
    size_t Rows() const { return rows_; }
    size_t Cols() const { return cols_; }
    // Element-wise comparison with a tolerance of 10^-round.
    bool Equal(Matrix const & b, int round = 7) const {
        if (Rows() != b.Rows() || Cols() != b.Cols())
            return false;
        FloatT const eps = std::pow(FloatT(10), -round);
        for (size_t i = 0; i < Rows(); ++i)
            for (size_t j = 0; j < Cols(); ++j)
                if (std::fabs((*this)(i, j) - b(i, j)) > eps)
                    return false;
        return true;
    }
private:
    size_t rows_ = 0, cols_ = 0, cols_aligned_ = 0;
    AlignedVector<FloatT> m_;
};
void mat_print(Matrix const & A, int round = 7, size_t width = 0) {
    FloatT const pow10 = std::pow(FloatT(10), round);
    for (size_t i = 0; i < A.Rows(); ++i) {
        for (size_t j = 0; j < A.Cols(); ++j)
            std::cout << std::setprecision(round) << std::fixed << std::setw(width)
                << std::right << (std::round(A(i, j) * pow10) / pow10) << " ";
        std::cout << std::endl;
    }
}
void mat_mult(Matrix const & A, Matrix const & B, Matrix & C) {
    if (A.Cols() != B.Rows())
        throw std::runtime_error("Number of A.Cols and B.Rows don't match!");
    if (A.Rows() != C.Rows() || B.Cols() != C.Cols())
        throw std::runtime_error("Wrong C rows, cols!");
    // SSE part: four columns of C per iteration.
    for (size_t i = 0; i < A.Rows(); ++i)
        for (size_t j = 0; j < B.Cols() - B.Cols() % 4; j += 4) {
            auto sum = _mm_setzero_ps();
            for (size_t k = 0; k < A.Cols(); ++k)
                sum = _mm_add_ps(
                    sum,
                    _mm_mul_ps(
                        _mm_set1_ps(A(i, k)),
                        _mm_loadu_ps(&B(k, j))
                    )
                );
            _mm_storeu_ps(&C(i, j), sum);
        }
    if (B.Cols() % 4 == 0)
        return;
    // Scalar tail: the last B.Cols() % 4 columns.
    for (size_t i = 0; i < A.Rows(); ++i)
        for (size_t j = B.Cols() - B.Cols() % 4; j < B.Cols(); ++j) {
            FloatT sum = 0;
            for (size_t k = 0; k < A.Cols(); ++k)
                sum += A(i, k) * B(k, j);
            C(i, j) = sum;
        }
}
void mat_mult_slow(Matrix const & A, Matrix const & B, Matrix & C) {
    if (A.Cols() != B.Rows())
        throw std::runtime_error("Number of A.Cols and B.Rows don't match!");
    if (A.Rows() != C.Rows() || B.Cols() != C.Cols())
        throw std::runtime_error("Wrong C rows, cols!");
    for (size_t i = 0; i < A.Rows(); ++i)
        for (size_t j = 0; j < B.Cols(); ++j) {
            FloatT sum = 0;
            for (size_t k = 0; k < A.Cols(); ++k)
                sum += A(i, k) * B(k, j);
            C(i, j) = sum;
        }
}
void mat_fill_random(Matrix & A) {
    std::mt19937_64 rng{std::random_device{}()};
    std::uniform_real_distribution<FloatT> distr(-9.99, 9.99);
    for (size_t i = 0; i < A.Rows(); ++i)
        for (size_t j = 0; j < A.Cols(); ++j)
            A(i, j) = distr(rng);
}
int main() {
    try {
        {
            // Correctness test: compare the SSE result against the reference.
            Matrix a(17, 23), b(23, 19), c(17, 19), d(c.Rows(), c.Cols());
            mat_fill_random(a);
            mat_fill_random(b);
            mat_mult_slow(a, b, c);
            mat_mult(a, b, d);
            if (!c.Equal(d, 5))
                throw std::runtime_error("Test failed, c != d.");
        }
        {
            // Small example whose two results are printed for visual comparison.
            Matrix a(3, 7), b(7, 5), c(3, 5), d(c.Rows(), c.Cols());
            mat_fill_random(a);
            mat_fill_random(b);
            mat_mult_slow(a, b, c);
            mat_mult(a, b, d);
            mat_print(c, 3, 8);
            std::cout << std::endl;
            mat_print(d, 3, 8);
        }
        return 0;
    } catch (std::exception const & ex) {
        std::cout << "Exception: " << ex.what() << std::endl;
        return -1;
    }
}
Output:
-37.177 -114.438 36.094 -49.689 -139.857
22.113 -127.210 -94.434 -14.363 -6.336
71.878 94.234 33.372 32.573 73.310
-37.177 -114.438 36.094 -49.689 -139.857
22.113 -127.210 -94.434 -14.363 -6.336
71.878 94.234 33.372 32.573 73.310

Matrix multiplication (with different dimensions)

For math class in school I need to create an application that does something (just anything) with matrices. I decided to create a matrix calculator. I have a Matrix class which contains a 2D array, a row integer and a column integer. I created the following function to multiply two matrices:
public: Matrix* multiply(Matrix* other)
{
    Matrix* temp = new Matrix(other->r, other->c);
    for(int i = 0; i < this->r; i++)
    {
        for(int j = 0; j < this->c; j++)
        {
            for(int k = 0; k < other->c; k++)
                temp->mat[i][j] += this->mat[i][k] * other->mat[k][j];
        }
    }
    return temp;
}
This works perfectly, but only if I multiply matrices with the same dimensions (e.g. Mat4x4*Mat4x4 or Mat2x4*Mat2x4). I understand I can't just multiply a Mat4x4 with a Mat9x2 or anything like that, but I do know the second matrix's columns should be equal to the first matrix's rows (so a Mat2x2 should be able to multiply with a Mat2x1) and that the answer will have the dimensions of the second matrix. How could (or should) I write the function so that it multiplies matrices with the same and with different dimensions?
Thanks in advance
A fix for your program would be to give temp the dimensions this->r by other->c instead of other's dimensions, so that it matches the shape of the matrix product. The loop bounds need the same treatment: j should run up to other->c and k up to this->c (which must equal other->r).
Hope this helps.
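A minimal sketch of the corrected function, under the assumption that the class stores r, c and a 2D array mat as in the question. The struct Mat below is a hypothetical stand-in for the question's Matrix class, and it returns by value instead of through a raw pointer.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for the question's Matrix class: r rows, c columns.
struct Mat {
    std::size_t r, c;
    std::vector<std::vector<double>> mat;
    Mat(std::size_t rows, std::size_t cols)
        : r(rows), c(cols), mat(rows, std::vector<double>(cols, 0.0)) {}
};

// (a.r x a.c) times (b.r x b.c) gives an (a.r x b.c) result and
// requires a.c == b.r.
Mat multiply(const Mat& a, const Mat& b)
{
    assert(a.c == b.r);
    Mat out(a.r, b.c);  // rows of the first operand, columns of the second
    for (std::size_t i = 0; i < a.r; ++i)
        for (std::size_t j = 0; j < b.c; ++j)
            for (std::size_t k = 0; k < a.c; ++k)
                out.mat[i][j] += a.mat[i][k] * b.mat[k][j];
    return out;
}
```

With this shape rule a 2x3 matrix times a 3x1 matrix gives a 2x1 result, while a 2x4 times a 2x4 is rejected by the assert.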
The following code contains a Matrix class implementation meant to show a few features of C++ (like unique pointers, random numbers, and stream formatting). I often use it when I want to explain a little bit about the language. Maybe it can help you.
#include <cassert>
#include <iostream>
#include <iomanip>
#include <memory>
#include <random>
// Pedagogical implementation of matrix type.
class Matrix {
public:
    // Create a rows-by-cols matrix filled with random numbers in (-1, 1).
    static Matrix Random(std::size_t rows, std::size_t cols) {
        Matrix m(rows, cols);
        std::random_device rd;
        std::mt19937 gen(rd());
        std::uniform_real_distribution<double> dis(-1, 1);
        for (std::size_t row = 0; row < rows; ++row) {
            for (std::size_t col = 0; col < cols; ++col) {
                m(row, col) = dis(gen);
            }
        }
        return m;
    }
    // Build a rows-by-cols matrix; make_unique<double[]> zero-initializes it.
    Matrix(std::size_t rows, std::size_t cols)
        : m_data { std::make_unique<double[]>(rows * cols) },
          m_rows { rows },
          m_cols { cols }
    {
        assert(m_rows > 0);
        assert(m_cols > 0);
    }
    // Return number of rows
    std::size_t rows() const { return m_rows; }
    // Return number of columns
    std::size_t cols() const { return m_cols; }
    // Value at (row, col)
    double operator()(std::size_t row, std::size_t col) const {
        assert(row < rows());
        assert(col < cols());
        return m_data[row * cols() + col];
    }
    // Reference to value at (row, col)
    double& operator()(std::size_t row, std::size_t col) {
        assert(row < rows());
        assert(col < cols());
        return m_data[row * cols() + col];
    }
    // Matrix multiply
    Matrix operator*(const Matrix& other) const {
        assert(cols() == other.rows());
        Matrix out(rows(), other.cols());
        for (std::size_t i = 0; i < rows(); ++i) {
            for (std::size_t j = 0; j < other.cols(); ++j) {
                double sum { 0 };
                for (std::size_t k = 0; k < cols(); ++k) {
                    sum += (*this)(i, k) * other(k, j);
                }
                out(i, j) = sum;
            }
        }
        return out;
    }
private:
    std::unique_ptr<double[]> m_data; // will cleanup after itself
    const std::size_t m_rows;
    const std::size_t m_cols;
};
// Pretty-print a matrix
std::ostream& operator<<(std::ostream& os, const Matrix& m) {
    os << std::scientific << std::setprecision(16);
    for (std::size_t row = 0; row < m.rows(); ++row) {
        for (std::size_t col = 0; col < m.cols(); ++col) {
            os << std::setw(23) << m(row, col) << " ";
        }
        os << "\n";
    }
    return os;
}
int main() {
    Matrix A = Matrix::Random(3, 4);
    Matrix B = Matrix::Random(4, 2);
    std::cout << "A\n" << A
              << "B\n" << B
              << "A * B\n" << (A * B);
}
Possible output:
$ clang++ matmul.cpp -std=c++17 -Ofast -march=native -Wall -Wextra
$ ./a.out
A
1.0367049464391398e-01 7.4917987082978588e-03 -2.7966084757805687e-01 -7.2325095373639048e-01
2.2478938813996119e-01 8.4194832286446353e-01 5.3602376615184033e-01 7.1132727553003439e-01
1.9608747339865196e-01 -6.4829263198209253e-01 -2.7477471919710350e-01 1.2721104074473044e-01
B
-8.5938605801284385e-01 -6.2981285198013204e-01
-6.0333085647033191e-01 -6.8234173530317577e-01
-1.2614486249714407e-01 -3.3875904433100934e-01
-6.9618174970366520e-01 6.6785401241316045e-01
A * B
4.4517888255515814e-01 -4.5869338680118737e-01
-1.2639839804611623e+00 -4.2259184895688506e-01
1.6871952235091500e-01 4.9689953389829533e-01
It turns out the order of the rows and columns got me heckin' bamboozled. The formula was correct. Sorry for the unnecessary post.

Can't understand why my program throws error

My code is below:
#include <iostream>
#include <string>
#include <algorithm>
#include <climits>
#include <vector>
#include <cmath>
using namespace std;
struct State {
    int v;
    const State *rest;
    void dump() const {
        if(rest) {
            cout << ' ' << v;
            rest->dump();
        } else {
            cout << endl;
        }
    }
    State() : v(0), rest(0) {}
    State(int _v, const State &_rest) : v(_v), rest(&_rest) {}
};
void ss(int *ip, int *end, int target, const State &state) {
    if(target < 0) return; // assuming we don't allow any negatives
    if(ip==end && target==0) {
        state.dump();
        return;
    }
    if(ip==end)
        return;
    { // without the first one
        ss(ip+1, end, target, state);
    }
    { // with the first one
        int first = *ip;
        ss(ip+1, end, target-first, State(first, state));
    }
}
vector<int> get_primes(int N) {
    int size = floor(0.5 * (N - 3)) + 1;
    vector<int> primes;
    primes.push_back(2);
    vector<bool> is_prime(size, true);
    for(long i = 0; i < size; ++i) {
        if(is_prime[i]) {
            int p = (i << 1) + 3;
            primes.push_back(p);
            // sieving from p^2, whose index is 2i^2 + 6i + 3
            for (long j = ((i * i) << 1) + 6 * i + 3; j < size; j += p) {
                is_prime[j] = false;
            }
        }
    }
}
int main() {
    int N;
    cin >> N;
    vector<int> primes = get_primes(N);
    int a[primes.size()];
    for (int i = 0; i < primes.size(); ++i) {
        a[i] = primes[i];
    }
    int * start = &a[0];
    int * end = start + sizeof(a) / sizeof(a[0]);
    ss(start, end, N, State());
}
It takes one input N (int), and gets the vector of all prime numbers smaller than N.
Then it finds the number of unique sets from the vector that add up to N.
The get_primes(N) works, but the other one doesn't.
I borrowed the other code from
How to find all matching numbers, that sums to 'N' in a given array
Please help me.. I just want the number of unique sets.
You've forgotten to return primes; at the end of your get_primes() function.
I'm guessing the problem is:
vector<int> get_primes(int N) {
    // ...
    return primes; // missing this line
}
As-is, whatever ends up in primes here is junk:
vector<int> primes = get_primes(N);
Flowing off the end of a non-void function is undefined behavior, which in this case manifests itself as a crash.
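For comparison, here is a small self-contained version that does return its result. Note that this is a plain sieve of Eratosthenes swapped in for brevity, not the odd-only sieve from the question, and it returns the primes up to and including n. GCC and Clang can diagnose a missing return in a non-void function (for example when compiling with -Wall), which is a cheap way to catch this class of bug.

```cpp
#include <cstddef>
#include <vector>

// A plain sieve of Eratosthenes (simpler than the odd-only sieve in the
// question); returns all primes <= n in increasing order.
std::vector<int> get_primes(int n)
{
    std::vector<int> primes;
    if (n < 2)
        return primes;
    std::vector<bool> is_prime(static_cast<std::size_t>(n) + 1, true);
    for (long long p = 2; p <= n; ++p) {
        if (!is_prime[p])
            continue;
        primes.push_back(static_cast<int>(p));
        // long long avoids overflow when computing p * p for large p
        for (long long q = p * p; q <= n; q += p)
            is_prime[q] = false;
    }
    return primes;  // the statement that was missing in the question
}
```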

Variadic nested loops

I am working on an N-dimensional grid.
I would like to generate nested loops for any number of dimensions (2D, 3D, 4D, etc.).
How can I do that in an elegant and fast way? Below is a simple illustration of my problem.
I am writing in C++ but I think this kind of question can be useful for other languages.
I need to know the indices (i,j,k...) in my do stuff part.
Edit: lower_bound and upper_bound represent indices in the grid, so they are always positive.
#include <vector>
int main()
{
    // Dimension here is 3D
    std::vector<size_t> lower_bound({4,2,1});
    std::vector<size_t> upper_bound({16,47,9});
    for (size_t i = lower_bound[0]; i < upper_bound[0]; i ++)
        for (size_t j = lower_bound[1]; j < upper_bound[1]; j ++)
            for (size_t k = lower_bound[2]; k < upper_bound[2]; k ++)
                // for (size_t l = lower_bound[3]; l < upper_bound[3]; l ++)
                // ...
                {
                    // Do stuff such as
                    grid({i,j,k}) = 2 * i + 3 *j - 4 * k;
                    // where grid size is the total number of vertices
                }
}
Following may help:
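// Advances v to the next index tuple (last dimension fastest); upper is
// exclusive. Returns false once every combination has been visited.
bool increment(
    std::vector<int>& v,
    const std::vector<int>& lower,
    const std::vector<int>& upper)
{
    assert(v.size() == lower.size());
    assert(v.size() == upper.size());
    for (auto i = v.size(); i-- != 0; ) {
        ++v[i];
        if (v[i] != upper[i]) {
            return true;
        }
        v[i] = lower[i];
    }
    return false;
}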
And use it that way:
int main() {
    const std::vector<int> lower_bound({4,2,1});
    const std::vector<int> upper_bound({6,7,4});
    std::vector<int> current = lower_bound;
    do {
        std::copy(current.begin(), current.end(), std::ostream_iterator<int>(std::cout, " "));
        std::cout << std::endl;
    } while (increment(current, lower_bound, upper_bound));
}
An iterative approach could look like this:
#include <iostream>
#include <vector>
int main()
{
    std::vector<int> lower_bound({-4, -5, -6});
    std::vector<int> upper_bound({ 6, 7, 4});
    auto increase_counters = [&](std::vector<int> &c) {
        for(std::size_t i = 0; i < c.size(); ++i) {
            // This bit could be made to look prettier if the indices are counted the
            // other way around. Not that it really matters.
            int &ctr = c .rbegin()[i];
            int top = upper_bound.rbegin()[i];
            int bottom = lower_bound.rbegin()[i];
            // count up the innermost counter
            if(ctr + 1 < top) {
                ++ctr;
                return;
            }
            // if it flows over the upper bound, wrap around and continue with
            // the next.
            ctr = bottom;
        }
        // end condition. If we end up here, loop's over.
        c = upper_bound;
    };
    for(std::vector<int> counters = lower_bound; counters != upper_bound; increase_counters(counters)) {
        for(int i : counters) {
            std::cout << i << ", ";
        }
        std::cout << "\n";
    }
}
...although whether this or a recursive approach is more elegant rather depends on the use case.
#include <iostream>
#include <vector>
// Recursive worker: fixes one index per level, then calls f on the full tuple.
template <typename Func>
void process(const std::vector<int>& lower, const std::vector<int>& upper, Func f,
             std::size_t index, std::vector<int>& current)
{
    if (index == lower.size())
    {
        f(current);
        return;
    }
    for (int i = lower[index]; i < upper[index]; ++i)
    {
        current.push_back(i);
        process(lower, upper, f, index + 1, current);
        current.pop_back();
    }
}
template <typename Func>
void process(const std::vector<int>& lower, const std::vector<int>& upper, Func f)
{
    std::vector<int> temp;
    process(lower, upper, f, 0, temp);
}
int main()
{
    // Dimension here is 3D
    std::vector<int> lower_bound({-4, -5, -6});
    std::vector<int> upper_bound({6, 7, 4});
    // Replace the lambda below with whatever code you want to process
    // the resulting permutations.
    process(lower_bound, upper_bound, [](const std::vector<int>& values)
    {
        for (std::vector<int>::const_iterator it = values.begin(); it != values.end(); ++it)
            std::cout << *it << " ";
        std::cout << std::endl;
    });
}
Probably some typos and whatnot, but I'd flatten the whole range.
This is based on the idea that a position in the range can be described by a single flat index
idx = x_0 + d_0*(x_1 + d_1*(x_2 + d_2*...))
where d_i is the extent of dimension i and x_i is the offset from its lower bound.
So we can roll our own that way:
std::vector<int> lower_bound{-4,-5,-6};
std::vector<int> upper_bound{6,7,4};
// extent of each dimension and the total number of elements
std::vector<int> ranges;
int numel = 1;
for (size_t i = 0; i < lower_bound.size(); i++) {
    ranges.push_back(upper_bound[i] - lower_bound[i]);
    numel *= ranges[i];
}
for (int idx = 0; idx < numel; idx++) {
    //if you don't need the actual indices, you're done
    //extract indexes
    int idx2 = idx;
    std::vector<int> indexes;
    for (size_t i = 0; i < ranges.size(); i++) {
        indexes.push_back(idx2 % ranges[i] + lower_bound[i]);
        idx2 = idx2/ranges[i];
    }
    //do stuff
    grid[idx] = 2 * indexes[0] + 3 *indexes[1] - 4 * indexes[2];
}
Edit: to be more generic (this needs <numeric> and <functional>):
template <typename D>
void multi_for(const std::vector<int>& lower_bound, const std::vector<int>& upper_bound, D d) {
    std::vector<int> ranges;
    for (size_t i = 0; i < lower_bound.size(); i++) {
        ranges.push_back(upper_bound[i] - lower_bound[i]);
    }
    size_t numel = std::accumulate(ranges.begin(), ranges.end(), size_t{1}, std::multiplies<size_t>{});
    for (size_t idx = 0; idx < numel; idx++) {
        //if you don't need the actual indices, you're done
        //extract indexes
        size_t idx2 = idx;
        std::vector<int> indexes;
        for (size_t i = 0; i < ranges.size(); i++) {
            indexes.push_back(static_cast<int>(idx2 % ranges[i]) + lower_bound[i]);
            idx2 = idx2/ranges[i];
        }
        //do stuff
        d(idx, indexes);
    }
}
int* grid;//initialize to whatever
std::vector<int> lower_bound{-4,-5,-6};
std::vector<int> upper_bound{6,7,4};
auto do_stuff = [grid](size_t idx, const std::vector<int>& indexes) {
    grid[idx] = 2 * indexes[0] + 3 *indexes[1] - 4 * indexes[2];
};
multi_for(lower_bound, upper_bound, do_stuff);
A recursive function may help you achieve what you want.
void Recursive( int comp )
{
    if(comp == dimension)
    {
        // Do stuff
    }
    else
    {
        for (int e = lower_bound[comp]; e < upper_bound[comp]; e++)
            Recursive(comp + 1);
    }
}
Some additions to the function signature may be necessary if you need to know the current indices (i,j,k,...) in your "Do stuff" section.
This is a clean way to get access to these indices:
void Recursive( int comp, int dimension )
{
    static std::vector<int> indices;
    if( comp == 0 ) // initialize indices
        indices.resize(dimension, 0);
    if(comp == dimension)
    {
        // Do stuff with indices[0], indices[1], ...
    }
    else
    {
        int& e = indices[comp];
        for (e = lower_bound[comp]; e < upper_bound[comp]; e++)
            Recursive(comp + 1, dimension);
    }
}
This is, however, not usable from multiple threads, because of the shared static vector.
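One way around the shared static vector, sketched here as an assumption rather than as part of the answer above, is to keep the index vector local to each call and hand it to a callback. The name for_each_index and its signature are hypothetical, and this version iterates instead of recursing.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper: calls f(indices) for every index tuple with
// lower[d] <= indices[d] < upper[d]; the last dimension varies fastest.
template <typename Func>
void for_each_index(const std::vector<int>& lower,
                    const std::vector<int>& upper, Func f)
{
    assert(lower.size() == upper.size());
    // An empty range in any dimension means there is nothing to visit.
    for (std::size_t d = 0; d < lower.size(); ++d)
        if (lower[d] >= upper[d])
            return;

    std::vector<int> indices = lower;  // local state, so calls are independent
    while (true) {
        f(indices);
        // Odometer step: bump the last dimension, carrying to the left.
        std::size_t d = indices.size();
        while (d > 0) {
            --d;
            if (++indices[d] < upper[d])
                break;
            indices[d] = lower[d];
            if (d == 0)
                return;  // carried out of the first dimension: done
        }
        if (indices.empty())
            return;  // zero dimensions: a single call with an empty tuple
    }
}
```

Because all state lives in the local vector, several threads can each run their own call (for example over disjoint sub-ranges of the first dimension) without interfering with one another.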

Setting pointer to arbitrary dimension array?

When I want to create a multidimensional array, I usually just use pointers. For example, for two dimensions I use:
double **array
and for three I use:
double ***array
However, I'd like to set up a multidimensional array based on a command line argument indicating the dimension. Is there a way to create an array with an arbitrary number of dimensions once you have a variable holding the number of dimensions you'd like?
Even though this whole question is an indication of a design flaw, you can (sort of) accomplish this:
template<typename T>
class MultiArray
{
public:
    MultiArray(std::size_t dimen, std::size_t dimen_size)
        : _dimensions(dimen), _dimen_size(dimen_size)
    {
        _data = new T[dimen * dimen_size];
    }
    // implement copy constructor, copy-assignment operator, destructor, and move constructors as well
    T* operator[](std::size_t i)
    {
        assert(i < _dimensions); // bounds check for your dimension
        return &_data[i * _dimen_size]; // start of the i-th block of dimen_size elements
    }
private:
    T* _data;
    std::size_t _dimensions;
    std::size_t _dimen_size;
};
int main()
{
    MultiArray<int> a(5, 2);
    a[4][1] = 3;
    std::cout << a[4][1] << std::endl;
    return 0;
}
If you want it jagged, you would have to do more math and maintenance regarding the bounds for each "dimension".
The problem you run into is making the dimensions mean something for your application. Typically, a multi-dimensional array represents something (e.g. a 2D vector can represent Cartesian space, a 3D or 4D vector can be used for manipulating data for 3D graphics). Beyond the 4th dimension, finding a valid meaning for the array becomes murky, and maintaining the logic behind it becomes increasingly complex with each new dimension.
You may be interested in the following code, which allows you to use any number of "dynamic" dimensions:
#include <cassert>
#include <cstddef>
#include <vector>
template<typename T>
class MultiArray
{
public:
    explicit MultiArray(const std::vector<size_t>& dimensions) :
        dimensions(dimensions),
        values(computeTotalSize(dimensions))
    {}
    const T& get(const std::vector<size_t>& indexes) const
    {
        return values[computeIndex(indexes)];
    }
    T& get(const std::vector<size_t>& indexes)
    {
        return values[computeIndex(indexes)];
    }
    // Maps an index tuple to a position in the flat storage.
    size_t computeIndex(const std::vector<size_t>& indexes) const
    {
        assert(indexes.size() == dimensions.size());
        size_t index = 0;
        size_t mul = 1;
        for (size_t i = 0; i != dimensions.size(); ++i) {
            assert(indexes[i] < dimensions[i]);
            index += indexes[i] * mul;
            mul *= dimensions[i];
        }
        assert(index < values.size());
        return index;
    }
    // Inverse of computeIndex: recovers the index tuple from a flat position.
    std::vector<size_t> computeIndexes(size_t index) const
    {
        assert(index < values.size());
        std::vector<size_t> res(dimensions.size());
        size_t mul = values.size();
        for (size_t i = dimensions.size(); i != 0; --i) {
            mul /= dimensions[i - 1];
            res[i - 1] = index / mul;
            assert(res[i - 1] < dimensions[i - 1]);
            index -= res[i - 1] * mul;
        }
        return res;
    }
private:
    size_t computeTotalSize(const std::vector<size_t>& dimensions) const
    {
        size_t totalSize = 1;
        for (auto i : dimensions) {
            totalSize *= i;
        }
        return totalSize;
    }
    std::vector<size_t> dimensions;
    std::vector<T> values;
};
int main()
{
    MultiArray<int> m({3, 2, 4});
    m.get({0, 0, 3}) = 42;
    m.get({2, 1, 3}) = 42;
    for (size_t i = 0; i != 24; ++i) {
        assert(m.computeIndex(m.computeIndexes(i)) == i);
    }
    return 0;
}