I inspired myself from this link to code a multiplicator of matrix which are multiple of 4: SSE matrix-matrix multiplication
I came up with something somewhat similar, but I observed that if the for loop with j increase by 4 like in the suggest code, it only fill 1 column each 4 column ( which make sense). I can decrease the for loop by 2, and the result is that only half of the column are filled.
So logically, the solution should be to only increase the loop by 1, but when I make the change in the code, I get either segfault error if I use_mm_store_ps or data corrupted size vs. prev_size if I use _mm_storeu_ps, which makes me believe that the data is simply not align.
What and how should I align the data to not cause such error and fill the resulting matrix?
Here is the code I have so far:
void mat_mult(Matrix A, Matrix B, Matrix C, n) {
for(int i = 0; i < n; ++i) {
for(int j = 0; j < n; j+=1) {
__m128 vR = _mm_setzero_ps();
for(int k = 0; k < n; k++) {
__m128 vA = _mm_set1_ps(A(i,k));
__m128 vB = _mm_loadu_ps(&B(k,j));
vR = _mm_add_ss(vR,vA*vB);
}
_mm_storeu_ps(&C(i,j), vR);
}
}
}
I corrected your code, also implemented quite a lot of other supplementary code to fully run tests and print outputs, including that I needed to implement Matrix class from scratch. Following code can be compiled in C++11 standard.
Main corrections to your function are: you should handle separately a case when number of B columns is not multiple of 4, this uneven tail case should be handled by separate loop, you should actually run j loop in steps of 4 (as 128-bit SSE float-32 register contains 4 floats), you should use _mm_mul_ps(vA, vB) instead of vA * vB.
Main bug of your code is that instead of yours _mm_add_ss() you should use _mm_add_ps() because you need to add not single value but 4 of them separately. Only due to usage of _mm_add_ss() you were observed that only 1 out of 4 columns was filled (the rest 3 were zeros).
Alternatively you can fix work of your code by using _mm_load_ss() instead of _mm_loadu_ps() and _mm_store_ss() instead _mm_storeu_ps(). After only this fix your code will give correct result, but will be slow, it will be not faster than regular non-SSE solution. To actually gain speed you have to use only ..._ps() instructions everywhere, also handle correctly case of non-multiple of 4.
Because you don't handle case of B columns being non-multiple of 4, because of this your program segfaults, you just store memory out of bounds of matrix C.
Also you asked a question about alignment. Don't ever use aligned store/load like _mm_store_ps()/_mm_load_ps(), always use _mm_storeu_ps()/_mm_loadu_ps(). Because unaligned access instructions are guaranteed to be of same speed as aligned access instructions for same memory pointers values. But aligned instructions may segfault. So unaligned is always better, same speed and never segfault. It used to be in old time on old CPUs that aligned instructions where faster, but right now they are implemented in CPU with exactly same speed. Aligned instructions don't give any profit, only segfaults. But still you may want to use aligned instructions to intentionally segfault if you want to make sure that your program's memory pointers are always aligned.
I implemented also a separate function with reference slow multiplication of matrices, in order to run a reference test to check the correctness of fast (SSE) multiplication.
As commented out by #АлексейНеудачин, my previous version of Matrix class was allocating unaligned memory for array, now I implemented new helper class AlignmentAllocator which ensures that Matrix is allocating aligned memory, this allocator is used by std::vector<> that stores underlying Matrix's data.
Full code with all the corrections, tests and console outputs plus all the extra supplementary code is below. See also console output after the code, I do print two matrices produced by two different multiplication functions, so that two matrices can be compared visually. All test cases are generated randomly. Scroll down my code a bit to see your fixed function mat_mult(). Also click on Try it online! link if you want to see/run my code online.
Try it online!
#include <cmath>
#include <iostream>
#include <vector>
#include <random>
#include <stdexcept>
#include <string>
#include <iomanip>
#include <cstdlib>
#include <malloc.h>
#include <immintrin.h>
using FloatT = float;
template <typename T, std::size_t N>
class AlignmentAllocator {
public:
typedef T value_type;
typedef std::size_t size_type;
typedef std::ptrdiff_t difference_type;
typedef T * pointer;
typedef const T * const_pointer;
typedef T & reference;
typedef const T & const_reference;
public:
inline AlignmentAllocator() throw() {}
template <typename T2> inline AlignmentAllocator(const AlignmentAllocator<T2, N> &) throw() {}
inline ~AlignmentAllocator() throw() {}
inline pointer adress(reference r) { return &r; }
inline const_pointer adress(const_reference r) const { return &r; }
inline pointer allocate(size_type n);
inline void deallocate(pointer p, size_type);
inline void construct(pointer p, const value_type & v) { new (p) value_type(v); }
inline void destroy(pointer p) { p->~value_type(); }
inline size_type max_size() const throw() { return size_type(-1) / sizeof(value_type); }
template <typename T2> struct rebind { typedef AlignmentAllocator<T2, N> other; };
bool operator!=(const AlignmentAllocator<T, N> & other) const { return !(*this == other); }
bool operator==(const AlignmentAllocator<T, N> & other) const { return true; }
};
template <typename T, std::size_t N>
inline typename AlignmentAllocator<T, N>::pointer AlignmentAllocator<T, N>::allocate(size_type n) {
#ifdef _MSC_VER
auto p = (pointer)_aligned_malloc(n * sizeof(value_type), N);
#else
auto p = (pointer)aligned_alloc(N, n * sizeof(value_type));
#endif
if (!p)
throw std::bad_alloc();
return p;
}
template <typename T, std::size_t N>
inline void AlignmentAllocator<T, N>::deallocate(pointer p, size_type) {
#ifdef _MSC_VER
_aligned_free(p);
#else
std::free(p);
#endif
}
static size_t constexpr MatrixAlign = 64;
template <typename T, size_t Align = MatrixAlign>
using AlignedVector = std::vector<T, AlignmentAllocator<T, Align>>;
class Matrix {
public:
Matrix(size_t rows, size_t cols)
: rows_(rows), cols_(cols) {
cols_aligned_ = (sizeof(FloatT) * cols_ + MatrixAlign - 1)
/ MatrixAlign * MatrixAlign / sizeof(FloatT);
Clear();
if (size_t(m_.data()) % 64 != 0 ||
(cols_aligned_ * sizeof(FloatT)) % 64 != 0)
throw std::runtime_error("Matrix was allocated unaligned!");
}
Matrix & Clear() {
m_.clear();
m_.resize(rows_ * cols_aligned_);
return *this;
}
FloatT & operator() (size_t i, size_t j) {
if (i >= rows_ || j >= cols_)
throw std::runtime_error("Matrix index (" +
std::to_string(i) + ", " + std::to_string(j) + ") out of bounds (" +
std::to_string(rows_) + ", " + std::to_string(cols_) + ")!");
return m_[i * cols_aligned_ + j];
}
FloatT const & operator() (size_t i, size_t j) const {
return const_cast<Matrix &>(*this)(i, j);
}
size_t Rows() const { return rows_; }
size_t Cols() const { return cols_; }
bool Equal(Matrix const & b, int round = 7) const {
if (Rows() != b.Rows() || Cols() != b.Cols())
return false;
FloatT const eps = std::pow(FloatT(10), -round);
for (size_t i = 0; i < Rows(); ++i)
for (size_t j = 0; j < Cols(); ++j)
if (std::fabs((*this)(i, j) - b(i, j)) > eps)
return false;
return true;
}
private:
size_t rows_ = 0, cols_ = 0, cols_aligned_ = 0;
AlignedVector<FloatT> m_;
};
void mat_print(Matrix const & A, int round = 7, size_t width = 0) {
FloatT const pow10 = std::pow(FloatT(10), round);
for (size_t i = 0; i < A.Rows(); ++i) {
for (size_t j = 0; j < A.Cols(); ++j)
std::cout << std::setprecision(round) << std::fixed << std::setw(width)
<< std::right << (std::round(A(i, j) * pow10) / pow10) << " ";
std::cout << std::endl;;
}
}
void mat_mult(Matrix const & A, Matrix const & B, Matrix & C) {
if (A.Cols() != B.Rows())
throw std::runtime_error("Number of A.Cols and B.Rows don't match!");
if (A.Rows() != C.Rows() || B.Cols() != C.Cols())
throw std::runtime_error("Wrong C rows, cols!");
for (size_t i = 0; i < A.Rows(); ++i)
for (size_t j = 0; j < B.Cols() - B.Cols() % 4; j += 4) {
auto sum = _mm_setzero_ps();
for (size_t k = 0; k < A.Cols(); ++k)
sum = _mm_add_ps(
sum,
_mm_mul_ps(
_mm_set1_ps(A(i, k)),
_mm_loadu_ps(&B(k, j))
)
);
_mm_storeu_ps(&C(i, j), sum);
}
if (B.Cols() % 4 == 0)
return;
for (size_t i = 0; i < A.Rows(); ++i)
for (size_t j = B.Cols() - B.Cols() % 4; j < B.Cols(); ++j) {
FloatT sum = 0;
for (size_t k = 0; k < A.Cols(); ++k)
sum += A(i, k) * B(k, j);
C(i, j) = sum;
}
}
void mat_mult_slow(Matrix const & A, Matrix const & B, Matrix & C) {
if (A.Cols() != B.Rows())
throw std::runtime_error("Number of A.Cols and B.Rows don't match!");
if (A.Rows() != C.Rows() || B.Cols() != C.Cols())
throw std::runtime_error("Wrong C rows, cols!");
for (size_t i = 0; i < A.Rows(); ++i)
for (size_t j = 0; j < B.Cols(); ++j) {
FloatT sum = 0;
for (size_t k = 0; k < A.Cols(); ++k)
sum += A(i, k) * B(k, j);
C(i, j) = sum;
}
}
void mat_fill_random(Matrix & A) {
std::mt19937_64 rng{std::random_device{}()};
std::uniform_real_distribution<FloatT> distr(-9.99, 9.99);
for (size_t i = 0; i < A.Rows(); ++i)
for (size_t j = 0; j < A.Cols(); ++j)
A(i, j) = distr(rng);
}
int main() {
try {
{
Matrix a(17, 23), b(23, 19), c(17, 19), d(c.Rows(), c.Cols());
mat_fill_random(a);
mat_fill_random(b);
mat_mult_slow(a, b, c);
mat_mult(a, b, d);
if (!c.Equal(d, 5))
throw std::runtime_error("Test failed, c != d.");
}
{
Matrix a(3, 7), b(7, 5), c(3, 5), d(c.Rows(), c.Cols());
mat_fill_random(a);
mat_fill_random(b);
mat_mult_slow(a, b, c);
mat_mult(a, b, d);
mat_print(c, 3, 8);
std::cout << std::endl;
mat_print(d, 3, 8);
}
return 0;
} catch (std::exception const & ex) {
std::cout << "Exception: " << ex.what() << std::endl;
return -1;
}
}
Output:
-37.177 -114.438 36.094 -49.689 -139.857
22.113 -127.210 -94.434 -14.363 -6.336
71.878 94.234 33.372 32.573 73.310
-37.177 -114.438 36.094 -49.689 -139.857
22.113 -127.210 -94.434 -14.363 -6.336
71.878 94.234 33.372 32.573 73.310
Related
I'm continually irritated that matrix dimensions are accessed with getters. Sure one can have public rows and cols but these should be const so that users of the Matrix class can't directly modify them. This is a natural way of coding but, because using const members is regarded, apparently incorrectly, as not possible without UB, results in people using non-const fields when they really should be consts that are mutable when needed such as resizing, or assigning.
What I'd like is something that can be used like this:
Matrix2D<double> a(2, 3);
int index{};
for (int i = 0; i < a.rows; i++)
for (int ii = 0; ii < a.cols; ii++)
a(i, ii) = index++;
but where a.rows=5; isn't compilable because it's const. And it would be great if the class can be included in vectors and other containers.
Now comes: Implement C++20's P0784 (More constexpr containers)
https://reviews.llvm.org/D68364?id=222943
It should be doable without casts or UB and can even be done when evaluating const expressions. I believe c++20 has made it doable by providing the functions std::destroy_at and std::construct_at. I have attached an answer using c++20 that appears to show it is indeed possible, and easily done, to provide const member objects in classes without an undue burden.
So the question is do the new constexpr functions in c++20, which provide the ability to destroy and then construct an object with different consts valid? It certainly appears so and it passes the consexpr lack of UB test.
If you want to have const properties you can try this :
class Matrix {
public:
const int rows;
const int cols;
Matrix(int _rows, int _cols) : rows(_rows), cols(_cols)
{
}
};
The rows and cols properties of a Matrix instance are initialized once in the constructor.
Well, here's my pass at using const rows and cols in a 2D matrix. It passes UB tests and can be used in collections like vector w/o weird restrictions. It's a partial implementation. No bounds checking and such but to provide a possible approach to the problem.
matrix2d.h
#pragma once
#include <memory>
#include <vector>
#include <initializer_list>
#include <stdexcept>
#include <array>
template<class T>
class Matrix2D
{
std::vector<T> v;
public:
const size_t rows;
const size_t cols;
constexpr Matrix2D(const Matrix2D& m) = default;
constexpr Matrix2D(Matrix2D&& m) noexcept = default;
explicit constexpr Matrix2D(size_t a_rows=0, size_t a_cols=0) : v(a_rows* a_cols), rows(a_rows), cols(a_cols) {}
constexpr Matrix2D(std::initializer_list<std::initializer_list<T>> list) : rows(list.size()), cols((list.begin())->size())
{
for (auto& p1 : list)
for (auto p2 : p1)
v.push_back(p2);
}
constexpr Matrix2D& operator=(const Matrix2D& mat)
{
std::vector<T> tmp_v = mat.v;
std::construct_at(&this->rows, mat.rows);
std::construct_at(&this->cols, mat.cols);
this->v.swap(tmp_v);
return *this;
}
constexpr Matrix2D& operator=(Matrix2D&& mat) noexcept
{
std::construct_at(&this->rows, mat.rows);
std::construct_at(&this->cols, mat.cols);
this->v.swap(mat.v);
return *this;
}
// user methods
constexpr T* operator[](size_t row) { // alternate bracket indexing
return &v[row * cols];
};
constexpr T& operator()(size_t row, size_t col)
{
return v[row * cols + col];
}
constexpr const T& operator()(size_t row, size_t col) const
{
return v[row * cols + col];
}
constexpr Matrix2D operator+(Matrix2D const& v1) const
{
if (rows != v1.rows || cols != v1.cols) throw std::range_error("cols and rows must be the same");
Matrix2D ret = *this;
for (size_t i = 0; i < ret.v.size(); i++)
ret.v[i] += v1.v[i];
return ret;
}
constexpr Matrix2D operator-(Matrix2D const& v1) const
{
if (rows != v1.rows || cols != v1.cols) throw std::range_error("cols and rows must be the same");
Matrix2D ret = *this;
for (size_t i = 0; i < ret.v.size(); i++)
ret.v[i] -= v1.v[i];
return ret;
}
constexpr Matrix2D operator*(Matrix2D const& v1) const
{
if (cols != v1.rows) throw std::range_error("cols of first must == rows of second");
Matrix2D v2 = v1.transpose();
Matrix2D ret(rows, v1.cols);
for (size_t row = 0; row < rows; row++)
for (size_t col = 0; col < v1.cols; col++)
{
T tmp{};
for (size_t row_col = 0; row_col < cols; row_col++)
tmp += this->operator()(row, row_col) * v2(col, row_col);
ret(row, col) = tmp;
}
return ret;
}
constexpr Matrix2D transpose() const
{
Matrix2D ret(cols, rows);
for (size_t r = 0; r < rows; r++)
for (size_t c = 0; c < cols; c++)
ret(c, r) = this->operator()(r, c);
return ret;
}
// Adds rows to end
constexpr Matrix2D add_rows(const Matrix2D& new_rows)
{
if (cols != new_rows.cols) throw std::range_error("cols cols must match");
Matrix2D ret = *this;
std::construct_at(&ret.rows, rows + new_rows.rows);
ret.v.insert(ret.v.end(), new_rows.v.begin(), new_rows.v.end());
return ret;
}
constexpr Matrix2D add_cols(const Matrix2D& new_cols)
{
if (rows != new_cols.rows) throw std::range_error("rows must match");
Matrix2D ret(rows, cols + new_cols.cols);
for (size_t row = 0; row < rows; row++)
{
for (size_t col = 0; col < cols; col++)
ret(row, col) = this->operator()(row, col);
for (size_t col = cols; col < ret.cols; col++)
ret(row, col) = new_cols(row, col - cols);
}
return ret;
}
constexpr bool operator==(Matrix2D const& v1) const
{
if (rows != v1.rows || cols != v1.cols || v.size() != v1.v.size())
return false;
for (size_t i = 0; i < v.size(); i++)
if (v[i] != v1.v[i])
return false;
return true;
}
constexpr bool operator!=(Matrix2D const& v1) const
{
return !(*this == v1);
}
// friends
template <size_t a_rows, size_t a_cols>
friend constexpr auto make_std_array(const Matrix2D& v)
{
if (a_rows != v.rows || a_cols != v.cols) throw std::range_error("template cols and rows must be the same");
std::array<std::array<T, a_cols>, a_rows> ret{};
for (size_t r = 0; r < a_rows; r++)
for (size_t c = 0; c < a_cols; c++)
ret[r][c] = v(r, c);
return ret;
}
};
Source.cpp
#include <vector>
#include <algorithm>
#include <iostream>
#include "matrix2d.h"
constexpr std::array<std::array<int, 3>, 4> foo()
{
Matrix2D<double> a(2, 3), b;
Matrix2D d(a);
int index{};
for (int i = 0; i < a.rows; i++)
for (int ii = 0; ii < a.cols; ii++)
a(i, ii) = index++;
b = a + a;
Matrix2D<int> z1{ {1},{3},{5} };
z1 = z1.add_cols({{2}, {4}, {6}});
z1 = z1.add_rows({{7,8}});
// z1 is now {{1,2},{3,4},{5,6},{7,8}}
Matrix2D<int> z2{ {1,2,3},{4,5,6} };
Matrix2D<int> z3 = z1 * z2; // z3: 4 rows, 3 cols
// from separate math program product of z1 and z2
Matrix2D<int> ref{ {9,12,15},{19,26,33},{29,40,51},{39,54,69} };
// test transpose
Matrix2D<int> tmp = ref.transpose(); // tmp: now 3 rows, 4 cols
if (ref == tmp) throw; // verify not equal
tmp = tmp.transpose(); // now ref==tmp==z3
if (ref != tmp || ref != z3) throw;
// Check using vector of matrixes, verify sort on row size works
std::vector<Matrix2D<int>> v;
v.push_back(z1);
v.push_back(z2);
v.push_back(z1);
v.push_back(z2);
std::sort(v.begin(), v.end(), [](auto const& m1, auto const& m2) {return m1.rows < m2.rows; });
for (size_t i = 0; i < v.size() - 1; i++)
if (v[i].rows > v[i + 1].rows) throw;
// return constexpr of 2D std::array
std::array<std::array<int, 3>, 4> array = make_std_array<4,3>(ref); // ref must have 4 rows, 3 cols
return array;
}
int main()
{
auto print = [](auto& a) {
for (auto& x : a)
{
for (auto y : x)
std::cout << y << " ";
std::cout << '\n';
}
std::cout << '\n';
};
// run time
auto zz = foo();
print(zz);
// validating no UB in Matrix2D;
// and creating a nested, std::array initialized at compile time
//constexpr std::array<std::array<int, 3>, 4> x = foo();
constexpr std::array<std::array<int, 3>, 4> x = foo(); // compile time
print(x);
return 0;
}
The issue I am having is how to get the correct number columns to go through for the inner most loop of K.
An example is a 2x3 matrix and a 3x2 matrix being multiplied.
The result should be a 2x2 matrix, but currently I dont know how to send the value of 2 to the operator overloaded function.
It should be
int k = 0; k < columns of first matrix;k++
Matrix::Matrix(int row, int col)
{
rows = row;
cols = col;
cx = (float**)malloc(rows * sizeof(float*)); //initialize pointer to pointer matrix
for (int i = 0; i < rows; i++)
*(cx + i) = (float*)malloc(cols * sizeof(float));
}
Matrix Matrix::operator * (Matrix dx)
{
Matrix mult(rows, cols);
for (int i = 0; i < rows; i++)
{
for (int j = 0; j < cols; j++)
{
mult.cx[i][j] = 0;
for (int k = 0; k < ?;k++) //?????????????
{
mult.cx[i][j] += cx[i][k] * dx.cx[k][j];
}
}
}
mult.print();
return mult;
//calling
Matrix mult(rowA, colB);
mult = mat1 * mat2;
}
Linear algebra rules say the result should have dimensions rows x dx.cols
Matrix Matrix::operator * (Matrix dx)
{
Matrix mult(rows, dx.cols);
for (int i = 0; i < rows; i++)
{
for (int j = 0; j < cols; j++)
{
mult.cx[i][j] = 0;
for (int k = 0; k < cols;k++) //?????????????
{
mult.cx[i][j] += cx[i][k] * dx.cx[k][j];
}
}
}
mult.print();
return mult;
A few random hints:
Your code is basically C; it doesn’t use (e.g.) important memory-safety features from C++. (Operator overloading is the only C++-like feature in use.) I suggest that you take advantage of C++ a bit more.
Strictly avoid malloc() in C++. Use std::make_unique(...) or, if there is no other way, a raw new operator. (BTW, there is always another way.) In the latter case, make sure there is a destructor with a delete or delete[]. The use of malloc() in your snippet smells like a memory leak.
What can be const should be const. Initialize as many class members as possible in the constructor’s initializer list and make them const if appropriate. (For example, Matrix dimensions don’t change and should be const.)
When writing a container-like class (which a Matrix may be, in a sense), don’t restrict it to a single data type; your future self will thank you. (What if you need a double instead of a float? Is it going to be a one-liner edit or an all-nighter spent searching where a forgotten float eats away your precision?)
Here’s a quick and dirty runnable example showing matrix multiplication:
#include <cstddef>
#include <iomanip>
#include <iostream>
#include <memory>
namespace matrix {
using std::size_t;
template<typename Element>
class Matrix {
class Accessor {
public:
Accessor(const Matrix& mat, size_t m) : data_(&mat.data_[m * mat.n_]) {}
Element& operator [](size_t n) { return data_[n]; }
const Element& operator [](size_t n) const { return data_[n]; }
private:
Element *const data_;
};
public:
Matrix(size_t m, size_t n) : m_(m), n_(n),
data_(std::make_unique<Element[]>(m * n)) {}
Matrix(Matrix &&rv) : m_(rv.m_), n_(rv.n_), data_(std::move(rv.data_)) {}
Matrix operator *(const Matrix& right) {
Matrix result(m_, right.n_);
for (size_t i = 0; i < m_; ++i)
for (size_t j = 0; j < right.n_; ++j) {
result[i][j] = Element{};
for (size_t k = 0; k < n_; ++k) result[i][j] +=
(*this)[i][k] * right[k][j];
}
return result;
}
Accessor operator [](size_t m) { return Accessor(*this, m); }
const Accessor operator [](size_t m) const { return Accessor(*this, m); }
size_t m() const { return m_; }
size_t n() const { return n_; }
private:
const size_t m_;
const size_t n_;
std::unique_ptr<Element[]> data_;
};
template<typename Element>
std::ostream& operator <<(std::ostream &out, const Matrix<Element> &mat) {
for (size_t i = 0; i < mat.m(); ++i) {
for (size_t j = 0; j < mat.n(); ++j) out << std::setw(4) << mat[i][j];
out << std::endl;
}
return out;
}
} // namespace matrix
int main() {
matrix::Matrix<int> m22{2, 2};
m22[0][0] = 0; // TODO: std::initializer_list
m22[0][1] = 1;
m22[1][0] = 2;
m22[1][1] = 3;
matrix::Matrix<int> m23{2, 3};
m23[0][0] = 0; // TODO: std::initializer_list
m23[0][1] = 1;
m23[0][2] = 2;
m23[1][0] = 3;
m23[1][1] = 4;
m23[1][2] = 5;
matrix::Matrix<int> m32{3, 2};
m32[0][0] = 5; // TODO: std::initializer_list
m32[0][1] = 4;
m32[1][0] = 3;
m32[1][1] = 2;
m32[2][0] = 1;
m32[2][1] = 0;
std::cout << "Original:\n\n";
std::cout << m22 << std::endl << m23 << std::endl << m32 << std::endl;
std::cout << "Multiplied:\n\n";
std::cout << m22 * m22 << std::endl
<< m22 * m23 << std::endl
<< m32 * m22 << std::endl
<< m23 * m32 << std::endl
<< m32 * m23 << std::endl;
}
Possible improvements and other recommendations:
Add consistency checks. throw, for example, a std::invalid_argument when dimensions don’t match on multiplication, i.e. when m_ != right.n_, and a std::range_error when the operator [] gets an out-of-bounds argument. (The checks may be optional, activated (e.g.) for debugging using an if constexpr.)
Use a std::initializer_list or the like for initialization, so that you can have (e.g.) a const Matrix initialized in-line.
Always check your code using valgrind. (Tip: Buliding with -g lets valgrind print also the line numbers where something wrong happened (or where a relevant preceding (de)allocation had happened).)
The code could me made shorter and more elegant (not necessarily more efficient; compiler optimizations are magic nowadays) by not using operator [] everywhere and having some fun with pointer arithmetics instead.
Make the type system better, so that (e.g.) Matrix instances with different types can play well with each other. Perhaps a Matrix<int> multiplied by a Matrix<double> could yield a Matrix<double> etc. One could also support multiplication between a scalar value and a Matrix. Or between a Matrix and a std::array, std::vector etc.
I've been trying to implement Dijkstra's algorithm in C++11 to work on matrices of arbitrary size. Specifically, I am interested in solving question 83 on Project Euler.
I appear to always run in to a situation where every node neighboring the current node has already been visited, which, if I understand the algorithm correctly, should never happen.
I've tried poking around in a debugger, and I've re-read the code several times, but I have no idea where I am going wrong.
Here is what I have done so far:
#include <iostream>
#include <fstream>
#include <cstdlib>
#include <vector>
#include <set>
#include <tuple>
#include <cstdint>
#include <cinttypes>
typedef std::tuple<size_t, size_t> Index;
std::ostream& operator<<(std::ostream& os, Index i)
{
os << "(" << std::get<0>(i) << ", " << std::get<1>(i) << ")";
return os;
}
template<typename T>
class Matrix
{
public:
Matrix(size_t i, size_t j):
n(i),
m(j),
xs(i * j)
{}
Matrix(size_t n, size_t m, const std::string& path):
n(n),
m(m),
xs(n * m)
{
std::ifstream mat_in {path};
char c;
for (size_t i = 0; i < n; ++i) {
for (size_t j = 0; j < m - 1; ++j) {
mat_in >> (*this)(i,j);
mat_in >> c;
}
mat_in >> (*this)(i,m - 1);
}
}
T& operator()(size_t i, size_t j)
{
return xs[n * i + j];
}
T& operator()(Index i)
{
return xs[n * std::get<0>(i) + std::get<1>(i)];
}
T operator()(Index i) const
{
return xs[n * std::get<0>(i) + std::get<1>(i)];
}
std::vector<Index> surrounding(Index ind) const
{
size_t i = std::get<0>(ind);
size_t j = std::get<1>(ind);
std::vector<Index> is;
if (i > 0)
is.push_back(Index(i - 1, j));
if (i < n - 1)
is.push_back(Index(i + 1, j));
if (j > 0)
is.push_back(Index(i, j - 1));
if (j < m - 1)
is.push_back(Index(i, j + 1));
return is;
}
size_t rows() const { return n; }
size_t cols() const { return m; }
private:
size_t n;
size_t m;
std::vector<T> xs;
};
/* Finds the minimum sum of the weights of the nodes along a path from 1,1 to n,m using Dijkstra's algorithm modified for matrices */
int64_t shortest_path(const Matrix<int>& m)
{
Index origin(0,0);
Index current { m.rows() - 1, m.cols() - 1 };
Matrix<int64_t> nodes(m.rows(), m.cols());
std::set<Index> in_path;
for (size_t i = 0; i < m.rows(); ++i)
for (size_t j = 0; j < m.cols(); ++j)
nodes(i,j) = INTMAX_MAX;
nodes(current) = m(current);
while (1) {
auto is = m.surrounding(current);
Index next = origin;
for (auto i : is) {
if (in_path.find(i) == in_path.end()) {
nodes(i) = std::min(nodes(i), nodes(current) + m(i));
if (nodes(i) < nodes(next))
next = i;
}
}
in_path.insert(current);
current = next;
if (current == origin)
return nodes(current);
}
}
int64_t at(const Matrix<int64_t>& m, const Index& i) { return m(i); }
int at(const Matrix<int>& m, const Index& i) { return m(i); }
int main()
{
Matrix<int> m(80,80,"mat.txt");
printf("%" PRIi64 "\n", shortest_path(m));
return 0;
}
You do not understand the algorithm correctly. There is nothing stopping you from running into dead ends. As long as there are other options you have not yet explored, just mark it as a dead end and move on.
BTW I agree with commentators who say that you are overcomplicating the solution. It suffices to create a matrix of "cost to get to here" and have a queue of points to explore paths from. Initialize the total cost matrix to a value for NOT_VISITED, -1 would work. For each point, you look at the neighbors. If the neighbor either has not been visited, or you just found a cheaper path to it, then adjust the cost matrix and add the point to the queue.
Keep going until the queue is empty. And then you have guaranteed lowest costs everywhere.
A* is a lot more efficient than this naive approach, but what I just described is more than efficient enough to solve the problem.
How do you create a multidimensional array (matrix) whose dimensions are determined at runtime.
The best way seems to be to take a vector of dimensions for construction and also a vector of offsets to access individual elements
This will also allow using initializer lists:
This should take a matrix of types determined at compile time, so templates make sense
C++11 features should be used as appropriate, bonus points for lambdas
Example usage:
int main(int, char **)
{
static const std::size_t d1{2};
static const std::size_t d2{3};
static const std::size_t d3{4};
multi_vec<std::size_t> q({d1,d2,d3});
for (std::size_t i1=0; i1 < d1; ++i1)
for (std::size_t i2=0; i2 < d2; ++i2)
for (std::size_t i3=0; i3 < d3; ++i3)
q[{i1,i2,i3}]=foo(i1,i2,i3);
for (std::size_t i1=0; i1 < d1; ++i1)
for (std::size_t i2=0; i2 < d2; ++i2)
for (std::size_t i3=0; i3 < d3; ++i3)
std::cout << "{"
<< i1 << ","
<< i2 << ","
<< i3 << "}=> "
<< foo(i1,i2,i3) << "=="
<< q[{i1,i2,i3}]
<< ((foo(i1,i2,i3) == q[{i1,i2,i3}])
? " good" : " ERROR")
<< std::endl;
};
This is actually a good place to use lambdas with std::for_each and unique_ptr for memory management:
template <class T>
class multi_vec
{
public:
using param=std::vector<size_t>;
explicit multi_vec(const param& dimensions)
: dim{dimensions}, prod {1}
{
std::for_each(dim.begin(), dim.end(), [this] (std::size_t val)
{
mult.emplace_back(prod);
prod *= val;
} );
ptr.reset(new T[prod]);
}
std::size_t capacity() const { return prod; }
// undefined if elements in lookup != elemenets in dim
// undefined if any element in lookup
// is greater than or equal to corresponding dim element
T& operator[](const param& lookup)
{
return ptr[get_offset(lookup)];
}
const T operator[](const param& lookup) const
{
return ptr[get_offset(lookup)];
}
private:
std::size_t get_offset(const param& lookup) const
{
std::size_t offset=0;
auto mit=mult.begin();
std::for_each(lookup.begin(), lookup.end(), [&offset, &mit] (std::size_t val)
{
offset+=*mit * val;
++mit;
} );
return offset;
}
param dim;
param mult;
std::size_t prod;
std::unique_ptr<T[]> ptr;
};
This uses a single dimensional array for actual storage and caches multipliers for reuse
Since a normal multidimension array e.g. x[2][3][4] does not bounds check, going out of bounds is deemed undefined instead of constantly checking values that are probably valid. In the same way diminsionality is not checked for lookups, if required, a static member can be added to check if a parameter is valid, in terms of bounds or dimensionality.
As answered here https://stackoverflow.com/a/19725907/2684539 :
#include <cstddef>
#include <vector>
template<typename T>
class MultiArray
{
public:
explicit MultiArray(const std::vector<size_t>& dimensions) :
dimensions(dimensions),
values(computeTotalSize(dimensions))
{
assert(!dimensions.empty());
assert(!values.empty());
}
const T& get(const std::vector<size_t>& indexes) const
{
return values[computeIndex(indexes)];
}
T& get(const std::vector<size_t>& indexes)
{
return values[computeIndex(indexes)];
}
size_t computeIndex(const std::vector<size_t>& indexes) const
{
assert(indexes.size() == dimensions.size());
size_t index = 0;
size_t mul = 1;
for (size_t i = 0; i != dimensions.size(); ++i) {
assert(indexes[i] < dimensions[i]);
index += indexes[i] * mul;
mul *= dimensions[i];
}
assert(index < values.size());
return index;
}
std::vector<size_t> computeIndexes(size_t index) const
{
assert(index < values.size());
std::vector<size_t> res(dimensions.size());
size_t mul = values.size();
for (size_t i = dimensions.size(); i != 0; --i) {
mul /= dimensions[i - 1];
res[i - 1] = index / mul;
assert(res[i - 1] < dimensions[i - 1]);
index -= res[i - 1] * mul;
}
return res;
}
private:
size_t computeTotalSize(const std::vector<size_t>& dimensions) const
{
size_t totalSize = 1;
for (auto i : dimensions) {
totalSize *= i;
}
return totalSize;
}
private:
std::vector<size_t> dimensions;
std::vector<T> values;
};
int main()
{
MultiArray<int> m({3, 2, 4});
m.get({0, 0, 3}) = 42;
m.get({2, 1, 3}) = 42;
for (size_t i = 0; i != 24; ++i) {
assert(m.computeIndex(m.computeIndexes(i)) == i);
}
return 0;
}
You use C. Honestly, C's dynamic arrays are superior to anything C++ offers. Here is the code for a 3D array:
//Allocation, the type needs some effort to understand,
//but once you do understand it, this is easy:
int width = 7, height = 8, depth = 9;
int (*array)[height][width] = malloc(depth*sizeof(*array));
//Filling, completely natural...
for(int z = 0; z < depth; z++) {
for(int y = 0; y < height; y++) {
for(int x = 0; x < width; x++) {
array[z][y][x] = 42;
}
}
}
//Deallocation, trivial...
free(array);
A 3D array is nothing more or less than a 1D array of 2D arrays (which are 1D arrays of 1D arrays in turn), so you declare a pointer *array to a 2D array of integers int (...)[heigh][width], and allocate space for depth such elements malloc(depth*sizeof(*array)). This is fully analogous to creating a 1D array with int *array = malloc(length*sizeof(*array));. The rest is array-pointer-decay magic.
This works is C, because C allows array types to contain dynamic sizes (since C99). It's like defining two constants with the declared sizes at the point of the declaration. C++, on the other hand, still insists on array sizes being compile-time constants, and thus fails to allow you multi-dimensional arrays of dynamic size.
When I want to initiate a multidimensional array, I usually just use pointers. For example, for two dimensions I use:
double **array
and for three I use:
double ***array
However, I'd like to set a multidimensional array based on a command line argument indicating the dimension. Is there are way to set an array of arbitrary size once you have a variable with the number of dimensions you'd like?
Even though this whole question is an indication of a design flaw, you can (sort of) accomplish this:
template<typename T>
class MultiArray
{
public:
MultiArray(std::size_t dimen, std::size_t dimen_size) : _dimensions(dimen)
{
_data = new T[dimen * dimen_size];
}
// implment copy constructor, copy-assignment operator, destructor, and move constructors as well
T* operator[](int i)
{
assert(0 <= i && i < _dimensions); // bounds check for your dimension
return &_data[i];
}
private:
T* _data;
std::size_t _dimensions;
};
int main()
{
MultiArray<int> a(5, 2);
a[4][1] = 3;
std::cout << a[4][1] << std::endl;
return 0;
}
If you want it jagged, you would have to do more math and maintenance regarding the bounds for each "dimension".
The problem you run into has making the dimensions mean something for your application. Typically, a multi-dimensional array represents something (e.g. a 2D vector can represent Cartesian space, a 3D or 4D vector can be used for manipulating data for 3D graphics). Beyond the 4th dimension, finding a valid meaning for the array becomes murky and maintaining the logic behind it becomes increasingly complex with each new dimension.
you may be interested in the following code which allow you to use any "dynamic" dimension:
#include <cassert>
#include <cstddef>
#include <vector>
template<typename T>
class MultiArray
{
public:
explicit MultiArray(const std::vector<size_t>& dimensions) :
dimensions(dimensions),
values(computeTotalSize(dimensions))
{
assert(!dimensions.empty());
assert(!values.empty());
}
const T& get(const std::vector<size_t>& indexes) const
{
return values[computeIndex(indexes)];
}
T& get(const std::vector<size_t>& indexes)
{
return values[computeIndex(indexes)];
}
size_t computeIndex(const std::vector<size_t>& indexes) const
{
assert(indexes.size() == dimensions.size());
size_t index = 0;
size_t mul = 1;
for (size_t i = 0; i != dimensions.size(); ++i) {
assert(indexes[i] < dimensions[i]);
index += indexes[i] * mul;
mul *= dimensions[i];
}
assert(index < values.size());
return index;
}
std::vector<size_t> computeIndexes(size_t index) const
{
assert(index < values.size());
std::vector<size_t> res(dimensions.size());
size_t mul = values.size();
for (size_t i = dimensions.size(); i != 0; --i) {
mul /= dimensions[i - 1];
res[i - 1] = index / mul;
assert(res[i - 1] < dimensions[i - 1]);
index -= res[i - 1] * mul;
}
return res;
}
private:
size_t computeTotalSize(const std::vector<size_t>& dimensions) const
{
size_t totalSize = 1;
for (auto i : dimensions) {
totalSize *= i;
}
return totalSize;
}
private:
std::vector<size_t> dimensions;
std::vector<T> values;
};
int main()
{
MultiArray<int> m({3, 2, 4});
m.get({0, 0, 3}) = 42;
m.get({2, 1, 3}) = 42;
for (size_t i = 0; i != 24; ++i) {
assert(m.computeIndex(m.computeIndexes(i)) == i);
}
return 0;
}