Storing big matrices in C++ (Armadillo)

I'm using the Armadillo library in C++ for storing / calculating large matrices. It is my understanding that one should store large arrays / matrices dynamically (on the heap).
Suppose I declare a matrix
mat X;
and set the size to be (say) 500 rows, 500 columns with random entries:
X.randn(500,500);
Does Armadillo store X dynamically (i.e. on the heap) despite not using new or delete? The reason I ask is that it seems Armadillo allows me to declare a variable as:
mat::fixed<n_rows, n_cols>
which, I quote: "is generally faster than dynamic memory allocation, but the size of the matrix can't be changed afterwards (directly or indirectly)".
Regardless of the above -- should I use this:
mat A;
A.set_size(n-1,n-1);
or this:
mat *A = new mat;
(*A).set_size(n-1,n-1);
where n is between 1000 and 100000 and is not known in advance.

Does Armadillo store X dynamically (i.e. on the heap) despite not using new or delete?
Yes. There will be some form of new or delete in the library code. You just don't notice it from the outside.
The reason I ask is that it seems Armadillo allows me to declare a variable as (mat::fixed ...)
You'd have to look into the source code to see what's going on exactly here. My guess is that it has some kind of internal logic that decides how to deal with things based on size. You would normally use mat::fixed for small matrices, though.
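For illustration, a minimal sketch (sizes chosen arbitrarily) of how the two forms are typically used:
#include <armadillo>
using namespace arma;

int main() {
    // Fixed-size matrix: dimensions are template parameters, so - per the
    // documentation quote above - no dynamic memory allocation is needed.
    // Intended for small matrices whose size never changes.
    mat::fixed<3,3> R;
    R.eye();

    // Dynamically sized matrix: element storage is managed by the library
    // and can be resized later.
    mat X;
    X.randn(500, 500);

    return 0;
}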
Following that, you should use
mat A(n-1,n-1);
if you know the size at that point already. In some cases,
mat A;
A.set_size(n-1,n-1);
might also be okay.
I can't think of a good reason to use your second option with the mat* pointer. First of all, libraries like Armadillo handle their memory allocations internally, and the developers take great care to get it right. Also, even if the memory code in the library were broken, your new mat idea wouldn't fix it: you would allocate memory for a mat object, but that object itself is rather small. The big part is probably hidden behind something like a member variable T* data in the class mat, and you cannot influence how that is allocated from the outside.
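To make that concrete, here is a deliberately simplified toy sketch (not Armadillo's actual code) of how a matrix class typically hides its heap allocation behind a small handle object:
#include <cstddef>

// Toy illustration only; Armadillo's real mat is far more sophisticated.
// Copy/assignment handling is omitted for brevity.
class toy_mat {
public:
    toy_mat(std::size_t rows, std::size_t cols)
        : rows_(rows), cols_(cols), data_(new double[rows * cols]) {}

    ~toy_mat() { delete[] data_; }

    double& operator()(std::size_t r, std::size_t c) {
        return data_[c * rows_ + r];   // column-major, as Armadillo uses
    }

private:
    std::size_t rows_, cols_;
    double* data_;   // the big allocation lives on the heap either way
};

// Whether you write toy_mat A(n, n); or toy_mat* A = new toy_mat(n, n);,
// the n*n element block is heap-allocated in both cases; the second form
// only moves the small handle object onto the heap as well.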
I initially missed your comment on the size of n. As Mikhail says, dealing with 100000x100000 matrices will require much more care than simply thinking about the way you instantiate them.

Related

Creating nd matrices in Eigen using arrays or vectors

I've encountered a problem when working with Eigen in C++. Eigen doesn't support n-dimensional matrices (besides the unsupported Tensor module, which is not an option here). What I need is a dynamically allocated rank-4 tensor. Now I have two options:
Using a std::vector<std::vector<Eigen::MatrixXd>>, which seems like a bad idea because every vector will allocate its own memory (somewhere) and hence it will not really be efficient.
Using a dynamically allocated 2D array inside a std::unique_ptr, because I don't want to free the pointer manually. The downside is that one usually shouldn't wrap plain arrays inside a std::unique_ptr nowadays, because for dynamically allocated arrays we have std::vector.
Could someone give me a hint towards the right direction or suggest another approach?

Declaring 3D array structure in c++ using vector

Hi, I am a graduate student studying scientific computing using C++. Some of our research focuses on the speed of an algorithm, so it is important to construct array structures that are fast enough.
I've seen two ways of constructing 3D arrays.
The first one is to use the vector library.
vector<vector<vector<double>>> a(isize, vector<vector<double>>(jsize, vector<double>(ksize, 0)));
This gives a 3D array structure of size isize x jsize x ksize.
The other one is to construct a structure containing a 1D array of size isize * jsize * ksize using
new double[isize*jsize*ksize]. To access a specific location (i,j,k) easily, operator overloading is necessary (am I right?).
From what I have experienced, the first one is much faster since it can access location (i,j,k) directly, while the latter has to compute the location and return the value. But I have seen some people prefer the latter over the first. Why do they prefer the latter setup, and is there any disadvantage to using the first one?
Thanks in advance.
The main difference between those will be the layout:
vector<vector<vector<T>>>
This will get you a 1D array of vector<vector<T>>.
Each item will be a 1D array of vector<T>.
And each item of those 1D arrays will be a 1D array of T.
The point is, vector itself does not store its content. It manages a chunk of memory, and stores the content there. This has a number of bad consequences:
For a matrix of dimension X·Y·Z, you will end up allocating 1 + X + X·Y memory chunks. That's horribly slow, and will trash the heap. Imagine: a cube matrix of size 20 would trigger 421 calls to new!
To access a cell, you have 3 levels of indirection:
You must access the vector<vector<vector<T>>> object to get pointer to top-level memory chunk.
You must then access the vector<vector<T>> object to get pointer to second-level memory chunk.
You must then access the vector<T> object to get pointer to the leaf memory chunk.
Only then can you access the T data.
Those memory chunks will be spread around the heap, causing a lot of cache misses and slowing the overall computation.
Should you get it wrong at some point, it is possible to end up with some lines in your matrix having different lengths. After all, they're independent 1-d arrays.
Having a contiguous memory block (like new T[X * Y * Z]) on the other hand gives:
You allocate 1 memory chunk. No heap trashing, O(1).
You only need to access the pointer to the memory chunk, then can go straight for desired element.
The whole matrix is contiguous in memory, which is cache-friendly.
These days a single cache miss means dozens or hundreds of lost computing cycles; do not underestimate the cache-friendliness aspect.
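To make the index arithmetic concrete, here is a minimal sketch (the layout convention, with k varying fastest, is just one common choice):
#include <cstddef>

int main() {
    const std::size_t X = 20, Y = 20, Z = 20;

    // One single allocation for the whole X*Y*Z block.
    double* data = new double[X * Y * Z];

    // Element (i, j, k) lives at a computed offset in the flat block.
    std::size_t i = 3, j = 5, k = 7;
    data[(i * Y + j) * Z + k] = 1.0;

    delete[] data;
    return 0;
}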
By the way, there is probably a better way you didn't mention: using one of the numerous matrix libraries that handle this for you automatically and provide nice support tools (like SSE-accelerated matrix operations). One such library is Eigen, but there are plenty of others.
→ You want to do scientific computing? Let a lib handle the boilerplate and the basics so you can focus on the scientific computing part.
In my point of view, there are simply too many advantages of std::vector over plain arrays.
In short, here are some:
It is much harder to create memory leaks with std::vector. This point alone is one of the biggest advantages. It has nothing to do with performance, but should be considered all the time.
std::vector is part of the STL, one of the most heavily used parts of C++. Thousands of people use the STL, so it gets "tested" every day. Over the years it has been optimized so thoroughly that it no longer lacks performance. (Please correct me if I'm wrong about this.)
Handling a std::vector is as easy as 1, 2, 3. No pointer handling, no nothing... just access it via its methods, the [] operator, and so on.
First of all, the idea that you access (i,j,k) in your vec^3 directly is somewhat flawed. What you have is a structure of pointers where you need to dereference three pointers along the way. Note that I have no idea whether that is faster or slower than computing the position within a one-dimensional array, though. You'd need to test that and it might depend on the size of your data (especially whether it fits in a chunk).
Second, the vector^3 requires storing pointers and vector sizes, which takes more memory. In many cases this will be irrelevant (as the image grows cubically but the memory difference only quadratically), but if your algorithm is really going to fill out all available memory, it can matter.
Third, the raw array stores everything in consecutive memory, which is good for streaming and can be good for certain algorithms because of quick cache accesses. For example when you add one 3D image to another.
Note that all of this is about hyper-optimization that you might not need. The advantages of vectors that skratchi.at pointed out in his answer are quite strong, and I add the advantage that vectors usually increase readability. If you do not have very good reasons not to use vectors, then use them.
If you do decide on the raw array, make sure that you wrap it well and keep the class small and simple, in order to counter problems such as leaks.
Welcome to SO.
If those two alternatives are all you have, then the first one could be better.
Prefer using STL array or vector instead of a C array
You should avoid using plain C arrays, since you need to manage the memory yourself (allocating/deallocating with new/delete) along with other boilerplate like keeping track of the size and checking bounds. In other words: "C arrays are less safe, and have no advantages over array and vector."
However, there are some important drawbacks in the first alternative. Something I would like to highlight is that:
std::vector<std::vector<std::vector<T>>>
is not a 3-D matrix. In a matrix, all the rows must have the same size. On the other hand, in a "vector of vectors" there is no guarantee that all the nested vectors have the same length. The reason is that a vector is a linear 1-D structure, as pointed out in @spectras' answer. Hence, to avoid all sorts of bad or unexpected behaviour, you must include guards in your code to ensure the rectangular invariant is maintained.
Luckily, the first alternative is not the only one at hand.
For example, you can replace the C-style array with a std::array:
constexpr int n = i_size * j_size * k_size; // the dimensions must be compile-time constants here
std::array<int, n> myFlattenMatrix;
or use a std::vector if the matrix dimensions are only known at run time or can change.
Accessing an element by its 3 coordinates
Regarding your question
To access a specific location (i,j,k) easily, operator overloading is necessary (am I right?).
Not exactly. Since there is no operator taking 3 parameters for either std::vector or std::array, you can't overload it that way. But you can create a template class or function to wrap it for you. In any case you will have to either dereference the 3 nested vectors or compute the flattened index of the element in the linear storage.
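For instance, a minimal sketch of such a wrapper (the class name and layout are made up for illustration):
#include <vector>
#include <cstddef>

template <typename T>
class Flat3D {
public:
    Flat3D(std::size_t ni, std::size_t nj, std::size_t nk)
        : ni_(ni), nj_(nj), nk_(nk), data_(ni * nj * nk) {}

    // operator() can take three indices even though operator[] cannot.
    T& operator()(std::size_t i, std::size_t j, std::size_t k) {
        return data_[(i * nj_ + j) * nk_ + k];
    }
    const T& operator()(std::size_t i, std::size_t j, std::size_t k) const {
        return data_[(i * nj_ + j) * nk_ + k];
    }

private:
    std::size_t ni_, nj_, nk_;
    std::vector<T> data_;   // single contiguous storage
};

// Usage: Flat3D<double> m(isize, jsize, ksize); m(i, j, k) = 42.0;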
Consider not using a third-party matrix library like Eigen for your experiments
You aren't coding this for production but for research purposes. In particular, your research is precisely about the performance of algorithms. In that case, I would not recommend using a third-party library such as Eigen at all. Of course it depends a lot on what kind of "speed of an algorithm" metrics you are willing to gather, but Eigen, for instance, will do a lot of things under the hood (like vectorization) that will have a tremendous influence on your experiments. Since it will be hard for you to control those unseen optimizations, the library's features may lead you to wrong conclusions about your algorithms.
Algorithm performance and big-O notation
Usually, the performance of algorithms is analysed using the big-O approach, in which factors like the actual time spent, hardware speed, or programming language traits aren't taken into account. The book "Data Structures and Algorithms in C++" by Adam Drozdek provides more details about it.

C++ Eigen Matrix Operations vs. Memory Allocation Performance

I have an algorithm that requires the construction of an NxN matrix inside a function that will return the product of this matrix with an Nx1 vector that's also built on the fly.
(N is usually 8 or 9, but must be generalized for values greater than that).
I'm using the Eigen library for performing algebraic operations that are even more complex (least squares and several other constrained problems), so switching it isn't an option.
I've benchmarked the functions, and there's a huge bottleneck due to the intensive memory allocations. I aim to build a thread-safe application, so for some cases I replaced these matrices and vectors with references to elements of a global vector that serves as a provider for objects that cannot be stored on the stack. This avoids calling the constructors/destructors of the Eigen matrices and vectors, but it's not an elegant solution and it can lead to huge problems if considerable care is not taken.
As such, does Eigen offer a workaround (I don't see an option to pass an allocator as a template argument for these objects), or is there a more obvious thing to do?
You can manage your own memory in a way that fits your needs and use Eigen::Map instead of Eigen::Matrix to perform calculations with it. Just make sure the data is aligned properly or notify Eigen if it isn't.
See the reference Eigen::Map for details.
Here is a short example:
#include <iostream>
#include <Eigen/Core>

int main() {
    int mydata[3 * 4]; // Manage your own memory as you see fit
    int* data_ptr = mydata;
    Eigen::Map<Eigen::MatrixXi, Eigen::Unaligned> mymatrix(data_ptr, 3, 4);
    // use mymatrix like you would any other matrix
    mymatrix = Eigen::MatrixXi::Zero(3, 4);
    std::cout << mymatrix << '\n';
    // This line will trigger a failed assertion in debug mode.
    // To change that behaviour, see
    // http://eigen.tuxfamily.org/dox-devel/TopicAssertions.html
    mymatrix = Eigen::MatrixXi::Ones(3, 6);
    std::cout << mymatrix << '\n';
}
To gather my comments into a full idea, here is how I would try to do it.
Memory allocation in Eigen is pretty advanced stuff IMO, and it does not expose many places to tap into it. The best bet is to wrap the Eigen objects themselves in some kind of resource manager, like the OP did.
I would make it a simple bin that holds Matrix<Scalar, Dynamic, Dynamic> objects. This way you template on the Scalar type and have a manager for matrices of generalized size.
Whenever you ask for an object, you check whether you have a free object of the desired size and return a reference to it. If not, you allocate a new one. Simple. When you want to release the object, you mark it free in the resource manager. I don't think anything more complicated is needed, but of course it's possible to implement more sophisticated logic.
To ensure thread safety I would put a lock in the manager. Initialize it in the constructor if needed. Of course, locking on acquire and release would be needed.
However, it depends on the work schedule. If the threads work on their own arrays, I would consider making one resource manager instance for each thread so they don't block each other. The thing is, a global lock or a global manager could become a point of contention if you have, say, 12 cores working heavily on allocations/deallocations, effectively serializing your app through this one lock.
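Here is a rough, hypothetical sketch of such a manager (names are invented; nothing like this ships with Eigen), with a lock guarding acquire and release. A std::deque is used so that references handed out stay valid when the pool grows:
#include <Eigen/Core>
#include <deque>
#include <mutex>

template <typename Scalar>
class MatrixPool {
public:
    using Matrix = Eigen::Matrix<Scalar, Eigen::Dynamic, Eigen::Dynamic>;

    // Return a free matrix of the requested size, or allocate a new one.
    Matrix& acquire(int rows, int cols) {
        std::lock_guard<std::mutex> guard(mutex_);
        for (Entry& e : entries_) {
            if (e.free && e.matrix.rows() == rows && e.matrix.cols() == cols) {
                e.free = false;
                return e.matrix;
            }
        }
        entries_.push_back(Entry{Matrix(rows, cols), false});
        return entries_.back().matrix;
    }

    // Mark the matrix as reusable; the caller must stop using it afterwards.
    void release(Matrix& m) {
        std::lock_guard<std::mutex> guard(mutex_);
        for (Entry& e : entries_) {
            if (&e.matrix == &m) {
                e.free = true;
                return;
            }
        }
    }

private:
    struct Entry {
        Matrix matrix;
        bool free;
    };
    std::deque<Entry> entries_;
    std::mutex mutex_;
};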
You can try replacing your default memory allocator with jemalloc or tcmalloc. It's pretty easy to try out thanks to the LD_PRELOAD mechanism.
https://github.com/jemalloc/jemalloc/wiki/Getting-Started
http://goog-perftools.sourceforge.net/doc/tcmalloc.html
C++ memory allocation mechanism performance comparison (tcmalloc vs. jemalloc)
what're the differences between tcmalloc/jemalloc and memory pool
I think it works for most C++ projects as well.
You could allocate memory for some common matrix sizes before calling that function with operator new or operator new[], store the void* pointers somewhere and let the function itself retrieve a memory block with the right size. After that, you can use placement new for matrix construction. Details are given in More Effective C++, item 8.
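A minimal placement-new sketch along those lines (the buffer and type names are illustrative, and a fixed-size matrix type is used so the coefficients live inside the object itself):
#include <Eigen/Core>
#include <new>   // placement new

using Mat8 = Eigen::Matrix<double, 8, 8>;

int main() {
    // A block reserved up front; alignas keeps it suitable for Eigen's
    // fixed-size (possibly vectorized) types.
    alignas(Mat8) static unsigned char buffer[sizeof(Mat8)];

    // Construct the matrix in the pre-allocated block: no allocation here.
    Mat8* m = new (buffer) Mat8(Mat8::Zero());
    (*m)(0, 0) = 1.0;

    // Objects built with placement new must be destroyed explicitly;
    // the storage itself is reused or released separately.
    m->~Mat8();
    return 0;
}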

Storing images of different sizes in a data structure, OpenCV

I am wondering if there is any way to hold (or store) images of different sizes in a single data structure using OpenCV (C++). For example, in MATLAB I can do it by using "cell".
Specifically, I am generating results which are images of different sizes, and it would be great if I could store them in a single data structure so that I can use them later on.
Please note, this has to be done with C++ and OpenCV.
I am thinking of giving std::vector a try. Thanks a lot.
Yeah, you can try this:
std::vector<cv::Mat> ImageDataBase;
for(int i = 0; i < length_of_imageDataBase; i++)
{
    cv::Mat img = cv::imread("Address of the images");
    ImageDataBase.push_back(img);
}
I think the problem lies in the way you think about objects in C++ generally. MATLAB requires objects to be of the same size in one vector/array/matrix/however it should be called, because it is optimised to operate on matrices, and those operations depend heavily on the dimensions of a matrix.
In C++ the main entity is an object. The thing most similar to a MATLAB vector is an array, like cv::Mat potatoes[30]. Yet even this only demands to be filled with objects of the same class, disregarding the size of each cv::Mat's contents.
So, to wrap it all up, you have a couple of choices:
an array, like cv::Mat crazySocks[42] - you need to be careful here, because you need to know how many socks there will be, and you might get a segmentation error if you go out of the array bounds
a vector, as suggested by Vinoj John Hosan, like std::vector<cv::Mat> jaguars - this is a fine idea, because STL containers can do some nice tricks with their content, and you may easily modify the size of the vector.
a list, like std::list<cv::Mat> toFind - this is better than a vector if you plan to modify the size of your container often.
any of the previously mentioned, but with pointers, like cv::Mat *crazyPointers[33] - when you have some big objects to move, it's better to move only the information about where they are than the objects themselves. cv::Mat does some tricks internally with its data, so this shouldn't be necessary here.

How to create an array with size more than C++ limits

I have a little problem here. I write C++ code to create an array, but when I want to set the array size to 100,000,000 or more I get an error.
This is my code:
int i=0;
double *a = new double[n*n];
This part is very important for my project.
When you think you need an array of 100,000,000 elements, what you actually need is a different data structure that you probably have never heard of before. Maybe a hash map, or maybe a sparse matrix.
If you tell us more about the actual problem you are trying to solve, we can provide better help.
In general, the only reason that would fail would be due to lack of memory/memory fragmentation/available address space. That is, trying to allocate 800MB of memory. Granted, I have no idea why your system's virtual memory can't handle that, but maybe you allocated a bunch of other stuff. It doesn't matter.
Your alternatives are tricks like memory-mapped files, sparse arrays, and so forth, instead of an explicit C-style array.
If you do not have sufficient memory, you may need to use a file to store your data and process it in smaller chunks.
I don't know if IMSL provides what you are looking for; however, if you want to work on smaller chunks, you might devise an algorithm that calls IMSL functions on these small chunks and later merges the results. For example, you can do matrix multiplication by combining multiplications of sub-matrices, as sketched below.
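For illustration, a sketch of the block decomposition idea for matrix multiplication, written here with plain loops on flat row-major arrays rather than IMSL calls; how each block is loaded (from memory or from a file) is up to the surrounding code:
#include <algorithm>
#include <cstddef>
#include <vector>

// C += A * B for n-by-n row-major matrices, processed block by block so
// only small sub-matrices need to be "hot" at any one time.
// Assumes C starts out zero-filled if a plain product is wanted.
void blocked_multiply(const std::vector<double>& A,
                      const std::vector<double>& B,
                      std::vector<double>& C,
                      std::size_t n, std::size_t blk)
{
    for (std::size_t ib = 0; ib < n; ib += blk)
        for (std::size_t kb = 0; kb < n; kb += blk)
            for (std::size_t jb = 0; jb < n; jb += blk)
                for (std::size_t i = ib; i < std::min(ib + blk, n); ++i)
                    for (std::size_t k = kb; k < std::min(kb + blk, n); ++k)
                        for (std::size_t j = jb; j < std::min(jb + blk, n); ++j)
                            C[i * n + j] += A[i * n + k] * B[k * n + j];
}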