I have read some posts about dynamic growing arrays in C, but I can't see how to create a 2D growing array (like in MATLAB).
I have a function that constructs an array for some image processing, but I don't know in advance what the size of this array will be (rows and columns). How can I create it?
I have read something about malloc and realloc. Are these functions portable or useful for this problem?
EDIT: SOLVED, using the Armadillo library, a C++ linear algebra library.
Simplest is with pointers
int nrows = 10;
int ncols = 5;
double* matrix = new double[nrows*ncols];
You can then access it as if it were a 2D array: if you want matrix[row][col], you'd do
int offset = row*ncols+col;
double value = matrix[offset];
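For completeness, here is a minimal self-contained sketch of the same idea, including cleanup (the sizes and the stored value are just placeholders):

#include <cstdio>

int main()
{
    int nrows = 10;
    int ncols = 5;

    // one contiguous block holding all nrows*ncols elements
    double* matrix = new double[nrows * ncols];

    // element (row, col) lives at index row*ncols + col
    int row = 3, col = 2;
    matrix[row * ncols + col] = 1.5;
    std::printf("%f\n", matrix[row * ncols + col]);

    delete[] matrix;   // free the block when done
    return 0;
}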
Also, if you want the comfort of MATLAB-like matrices in C++, look into Armadillo.
If you're doing image processing, you might want to use the matrix and array types from OpenCV.
By growing an array like MATLAB, I'm assuming you mean doing things like:
mat = [mat; col]
You can resize a matrix in C++, but not with a clean syntax like the one above.
For example, you can use std::vector<std::vector<T>> to represent your matrix.
std::vector<std::vector<int> > mat;
Then to add a column:
for (int i=0; i<mat.size(); i++) mat[i].push_back(col[i]);
or to add a row
mat.push_back(row); // row is a std::vector<int>
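Putting the two together, a small self-contained sketch (the starting contents are arbitrary):

#include <iostream>
#include <vector>

int main()
{
    // start with a 2x2 matrix
    std::vector<std::vector<int> > mat = { {1, 2}, {3, 4} };

    // append a row, like mat = [mat; row] in MATLAB
    std::vector<int> row = {5, 6};
    mat.push_back(row);

    // append a column, like mat = [mat, col]; col needs one entry per row
    std::vector<int> col = {7, 8, 9};
    for (std::size_t i = 0; i < mat.size(); ++i)
        mat[i].push_back(col[i]);

    for (const auto& r : mat) {
        for (int v : r) std::cout << v << ' ';
        std::cout << '\n';
    }
    return 0;
}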
+1 for OpenCV, especially useful if you are doing image analysis, as it abstracts the underlying data type (GRAYSCALE, RGB, etc.).
C++ doesn't have a standard matrix class per se. I think there were too many different uses of such a class, which made a one-size-fits-all solution impossible. There is an example and discussion in Stroustrup's book (The C++ Programming Language (Third Edition)) of a simple implementation of a numerical matrix.
However, for image processing it's much better to use an existing library.
You might have a look at CImg. I've used it before and found it quick and well documented.
If you are on an AMD machine, I know there is an optimised library for image processing from AMD: the Framewave project.
Also, if you are used to MATLAB-style code then you may want to look at IT++.
I think the project's aim is to be as similar to MATLAB as possible.
I'm writing up an implementation of backpropagation for a feedforward neural network in C++ and I'm using the Armadillo library. Right now, I'm loading training data with the load method of Armadillo's matrix class. Two questions:
1) Is this a reasonable choice for storing pre-formatted (CSV), numeric data that fits into main memory (<2GB)? Certainly some ways of doing this are better than others, and it'd be nice to know if this is not good practice. Part of me feels this isn't a good choice for holding the data, as there are likely more appropriate data structures/frameworks (like I should be accessing some SQL database or something). Another part of me feels that numeric data is by definition just matrices, so this should be wonderful.
2) I need to sample without replacement from a data set in my implementation, and I see two routes: either I could shuffle the rows of the data set, or shuffle an array that indexes the data set. There is a shuffle method for the matrix class in the Armadillo library, and I suspect that what is shuffled is addresses and not the rows themselves. Wouldn't that be just as efficient as shuffling an indexing array?
1) Yes, this is fine and it's how I would do it, but note that Armadillo matrices are column-major and thus you may need to transpose the CSV that you load. If your data is sufficiently large that it won't fit in main memory, you could consider writing a custom CSV parser that looks at the data in a streaming sense (i.e. one point at a time), thus reducing your RAM footprint, or you could even use mmap() to map a file full of packed doubles as your matrix and let the kernel work out what needs to be swapped in when.
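As a concrete (untested) sketch of point 1, assuming the CSV stores one sample per row and using a placeholder file name:

#include <armadillo>

int main()
{
    arma::mat data;
    data.load("training.csv", arma::csv_ascii);   // "training.csv" is a placeholder

    // Armadillo is column-major, so if each CSV row is one sample,
    // transpose so that each column becomes one sample.
    data = data.t();

    return 0;
}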
2) Because all matrix data is stored contiguously (i.e. double* not double**), shuffle() will be moving the elements in the matrix. What I generally do in this type of situation is create a vector of indices and shuffle it:
uvec indices = linspace<uvec>(0, n - 1, n);
indices = shuffle(indices);
// Now loop over each shuffled point...
for (uword i = 0; i < n; ++i)
{
// access the point with data.col(indices[i]) and do whatever
}
(The above code isn't tested, but it should work or easily be adapted into something that works.)
For what it's worth, mlpack (http://www.mlpack.org/) does have a not-yet-stable neural network infrastructure that uses Armadillo, and it may be worth your time to check out; the link below is to the relevant source directly, but poking around on GitHub and the mlpack website should reveal better documentation.
https://github.com/mlpack/mlpack/tree/master/src/mlpack/methods/ann
NOTE: I already asked this question, but it was closed as "too broad" without much explanation. I can't see how this question could be more specific (it deals with a specific class of a specific library for a specific usage...), so I assume it was something like a "moderator's mistake" and am asking it again...
I would like to perform sparse matrix/matrix multiplication using Eigen on sparse matrices. These matrices are already defined in the code I am working on in standard 3-array compressed row/column storage.
Then I would like to use the Eigen::SparseMatrix class as a wrapper around these arrays (assuming that internally Eigen uses such a 3-array storage) in order to avoid duplicating the matrices in memory. I would like to do something like the following:
Eigen::SparseMatrix<double> smin0(n,m);
Eigen::SparseMatrix<double> smin1(m,l);
Eigen::SparseMatrix<double> smout(n,l);
smin0.set_innerPtr(myInnerPtr0);
smin0.set_outerPtr(myOuterPtr0);
smin0.set_valuePtr(myValuePtr0);
smin1.set_innerPtr(myInnerPtr1);
smin1.set_outerPtr(myOuterPtr1);
smin1.set_valuePtr(myValuePtr1);
smout=smin0*smin1;
int *myOutInnerPtr=smout.innerIndexPtr();
int *myOutOuterPtr=smout.outerIndexPtr();
double *myOutValuePtr=smout.valuePtr();
Is this possible, and if so, how?
Many thanks.
As ggael pointed out, you can use Eigen::MappedSparseMatrix for that.
The reason you can't just overwrite the internal pointers of a SparseMatrix is that this would cause problems when the SparseMatrix deallocates them, but you allocated them in a different way than Eigen does (and how Eigen internally allocates memory is an implementation detail you should not really rely on in your code).
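An untested sketch of how the wrapping could look, assuming a reasonably recent Eigen (3.3 or later), where Eigen::Map<Eigen::SparseMatrix<double>> plays the role of MappedSparseMatrix; the tiny arrays here just stand in for the 3-array storage that already exists in your code:

#include <Eigen/Sparse>
#include <iostream>

int main()
{
    // Two 2x2 diagonal matrices in standard 3-array compressed-column storage.
    int    outer0[] = {0, 1, 2};   // column start offsets (size cols+1)
    int    inner0[] = {0, 1};      // row indices (size nnz)
    double value0[] = {1.0, 2.0};

    int    outer1[] = {0, 1, 2};
    int    inner1[] = {0, 1};
    double value1[] = {3.0, 4.0};

    // Wrap the raw arrays without copying them.
    Eigen::Map<Eigen::SparseMatrix<double> > smin0(2, 2, 2, outer0, inner0, value0);
    Eigen::Map<Eigen::SparseMatrix<double> > smin1(2, 2, 2, outer1, inner1, value1);

    // The product is an ordinary (owning) SparseMatrix, whose arrays can then
    // be read back via outerIndexPtr(), innerIndexPtr() and valuePtr().
    Eigen::SparseMatrix<double> smout = smin0 * smin1;

    std::cout << Eigen::MatrixXd(smout) << std::endl;   // prints the dense 2x2 result
    return 0;
}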
I'm using the Armadillo library in C++ for storing / calculating large matrices. It is my understanding that one should store large arrays / matrices dynamically (on the heap).
Suppose I declare a matrix
mat X;
and set the size to be (say) 500 rows, 500 columns with random entries:
X.randn(500,500);
Does Armadillo store X dynamically (i.e. on the heap) despite not using new or delete? The reason I ask is that it seems Armadillo allows me to declare a variable as:
mat::fixed<n_rows, n_cols>
which, I quote: "is generally faster than dynamic memory allocation, but the size of the matrix can't be changed afterwards (directly or indirectly)".
Regardless of the above -- should I use this:
mat A;
A.set_size(n-1,n-1);
or this:
mat *A = new mat;
(*A).set_size(n-1,n-1);
where n is between 1000 and 100000 and not known in advance.
Does Armadillo store X dynamically (i.e. on the heap) despite not using new or delete?
Yes. There will be some form of new or delete in the library code. You just don't notice it from the outside.
The reason I ask is that it seems Armadillo allows me to declare a variable as (mat::fixed ...)
You'd have to look into the source code to see what's going on exactly here. My guess is that it has some kind of internal logic that decides how to deal with things based on size. You would normally use mat::fixed for small matrices, though.
Following that, you should use
mat A(n-1,n-1);
if you know the size at that point already. In some cases,
mat A;
A.set_size(n-1,n-1);
might also be okay.
I can't think of a good reason to use your second option with the mat * pointer. First of all, libraries like armadillo handle their memory allocations internally, and developers take great care to get it right. Also, even if the memory code in the library was broken, your idea new mat wouldn't fix it: You would allocate memory for a mat object, but that object is certainly rather small. The big part is probably hidden behind something like a member variable T* data in the class mat, and you cannot influence how this is allocated from the outside.
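To make that concrete, a small illustrative sketch of the two forms (n stands for whatever size you computed; nothing here changes how Armadillo allocates the element buffer internally):

#include <armadillo>

void recommended(arma::uword n)
{
    arma::mat A(n - 1, n - 1);   // A itself is a small object on the stack;
    A.randn();                   // its element buffer is heap-allocated by Armadillo
}                                // the buffer is released when A goes out of scope

void not_recommended(arma::uword n)
{
    arma::mat* A = new arma::mat(n - 1, n - 1);   // only adds indirection and a chance to leak
    A->randn();
    delete A;
}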
I initially missed your comment on the size of n. As Mikhail says, dealing with 100000x100000 matrices will require much more care than simply thinking about the way you instantiate them.
I am wondering if there is any way to hold (or store) images of different sizes in a single data structure using OpenCV (C++). For example, in MATLAB I can do it by using "cell".
Specifically, I am generating results which are images of different sizes, and it would be great if I could store them in a single data structure so that I can use them later on.
Please note, this has to be done with C++ and OpenCV.
I am thinking of giving std::vector a try. Thanks a lot.
Yeah, you can try this:
std::vector<cv::Mat> ImageDataBase;
for(int i=0;i<length_of_imageDataBase;i++)
{
cv::Mat img = cv::imread("Address of the images");
ImageDataBase.push_back(img);
}
I think the problem lies in the way you think about objects in C++ generally. MATLAB requires objects to be of the same size in one vector/array/matrix/however it should be called, because it is optimised to operate on matrices, and those operations are very dependent on the dimensions of a matrix.
In C++ the main entity is an object. The most similar thing to a MATLAB vector is an array, like cv::Mat potatoes[30]. Yet even this only requires the elements to be objects of the same class, disregarding the size of those cv::Mat contents.
So, to wrap it all up, you have a couple of choices:
an array, like cv::Mat crazySocks[42] - you need to be careful here, because you need to know how many socks there will be, and you might get a segmentation fault if you go out of the array bounds
a vector, as suggested by Vinoj John Hosan, like std::vector<cv::Mat> jaguars - this is a fine idea, because STL containers can do some nice tricks with their contents, and you may easily modify the size of the vector.
a list, like std::list<cv::Mat> toFind - this is better than a vector if you plan to modify the size of your container often.
any of the previously mentioned, but with pointers, like cv::Mat *crazyPointers[33] - when you have some big objects to move, it's better to move only the information about where they are than the objects themselves. cv::Mat does some tricks internally with its data, so this shouldn't be necessary here.
I am a maths student and quite new to C++, and to help my learning I want to create a matrix class (I don't want to use a library class). I was thinking of doing something like
int iRows = 5;
int iColumns = 6;
double** pMatrix = new double*[iRows];
for (int i = 0; i < iRows; ++i) {
pMatrix[i] = new double[iColumns];
}
(I am not sure if this is the right syntax - I wanted to get advice here before trying) but I see here on Stack Overflow that using raw pointers rather than something like shared_ptr is not recommended. Is it better to use vector<vector<double>> so that I do not have to worry about deleting the memory? I am worried that vector is not a good choice because the length can be changed with push_back, and I want the matrix to be fixed in size. I cannot use
double dMatrix[iRows][iColumns];
because the dimensions are not constant. What would be the best choice for me to use?
Probably
std::vector<double> matrix(rows * columns); // ditch the prefixes
// indexing: matrix[row * columns + column];
As each row will have the same number of columns anyway.
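A small self-contained sketch of that layout (the sizes are placeholders):

#include <iostream>
#include <vector>

int main()
{
    std::size_t rows = 5, columns = 6;
    std::vector<double> matrix(rows * columns);   // fixed size, zero-initialised

    // element (row, column) lives at index row * columns + column
    std::size_t row = 2, column = 3;
    matrix[row * columns + column] = 1.0;

    std::cout << matrix[row * columns + column] << '\n';
    return 0;
}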
I would ask yourself first: what are you trying to achieve? Are you wanting to create something as a learning exercise or do you want a decent matrix implementation?
If you are wanting to do this as a learning exercise then I would suggest using only a 1-D vector of doubles internally, with MxN elements. Create a class that stores this internally but hides the implementation from callers -- they shouldn't know or care how it's stored. As part of the interface you would typically want to access it via operator()(m, n), e.g.
double& MyMatrix::operator()(int m, int n) {
return m_Array[m*numColumns + n];
}
As soon as you try to do more interesting things with it such as addition and multiplication, you'll realise that you'll have to overload the arithmetic operators. Not just operator+ and operator-, but also operators *, /, *=, +=, -=, /=, ++, --. When you implement multiplication you may find that your implementation is too slow to be useful, as you may be making lots of redundant copies. YMMV.
So if you want a fast matrix library then you'll want a library that uses BLAS internally, such as Boost's uBLAS (Basic Linear Algebra) library.
Perhaps try it yourself first to get an idea of the problems in getting a good design, then take a look at Boost, as you will learn a lot by studying it.
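As a starting point, here is a minimal, untested sketch of the kind of class described above: a 1-D std::vector stored internally, with operator() hiding the row-major layout. All names are illustrative.

#include <cstddef>
#include <vector>

class MyMatrix {
public:
    MyMatrix(std::size_t rows, std::size_t cols)
        : m_rows(rows), m_cols(cols), m_Array(rows * cols, 0.0) {}

    // element access; callers never see the 1-D layout
    double& operator()(std::size_t m, std::size_t n)       { return m_Array[m * m_cols + n]; }
    double  operator()(std::size_t m, std::size_t n) const { return m_Array[m * m_cols + n]; }

    std::size_t rows() const { return m_rows; }
    std::size_t cols() const { return m_cols; }

private:
    std::size_t m_rows, m_cols;
    std::vector<double> m_Array;   // row-major: element (m, n) at index m * m_cols + n
};

Arithmetic operators (operator+, operator*=, and so on) can then be layered on top of this interface.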
No, definitely not. Neither
vector<vector<double>> matrix;
nor
double** matrix;
are good layouts for a matrix class. Although your class may work well to get acquainted with programming, the performance of such a matrix class will be inferior. The problem is that you lose data locality. Just consider your code
for (int i = 0; i < iRows; ++i) {
pMatrix[i] = new double[iColumns];
}
For an efficient matrix-vector multiplication you should have as many matrix values as possible in the cache; otherwise the memory transfers will just take too much time.
As you are acquiring one block of memory per row, nothing guarantees that these data chunks are close together in memory. For a simple matrix-vector multiplication this might not be too bad, because the row elements are still stored contiguously and "only" the jump from one row to the next leads to a cache miss.
However, operating on the transposed matrix is really a problem because the values along a column might be stored anywhere in the memory and there is no reasonable way to determine the stride between those elements which could be used for cache prefetching.
Thus, as suggested by the other authors, use one large block of memory for your matrix. This requires a bit more effort from your side but it will pay off for the users of your matrix class.
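For illustration, a matrix-vector multiply over one contiguous row-major block; the inner loop walks memory sequentially, which is exactly the locality argument made above (a sketch, not a tuned implementation):

#include <cstddef>
#include <vector>

// y = A * x, with A stored row-major in one contiguous block of rows*cols doubles
std::vector<double> matvec(const std::vector<double>& A,
                           const std::vector<double>& x,
                           std::size_t rows, std::size_t cols)
{
    std::vector<double> y(rows, 0.0);
    for (std::size_t i = 0; i < rows; ++i) {
        const double* rowPtr = &A[i * cols];   // row i is contiguous in memory
        double sum = 0.0;
        for (std::size_t j = 0; j < cols; ++j)
            sum += rowPtr[j] * x[j];
        y[i] = sum;
    }
    return y;
}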