Efficiant multidimensional data storage in C++

Efficiant multidimensional data storage in C++ - c++

I'm trying to write a C++ program that needs to store and adjust data in a 3D array. The size is given by the user and doesn't change throughout the run, and I don't need to perform any complicated matrix operations on it. I just need it to be optimized to set and get from given 3D coordinates (I do quite some iterations over all the members, and it's a big array). What's the best way to go about defining that array? Vector of vector of vector? Arrays of vectors? CvMat/IplImage with multi channels? Should I even keep it as 3D or just turn it into one very long interleaved vector and calculate indexes accordingly?
Thanks!

I would go with your last option, a single large array with transformed indices. If all you want to do is read and write known indices, this is probably the most efficient structure, both in terms of storage and speed. You can also wrap this in a class and overload operator () to make it easy to access 3D coordinates, for eg. you could write a(1,2,3) = 10; and the overloaded operator could take care transforming the 3D coordinates into a linear index. Iterating over such an array would also be quite simple since there's only one dimension.

It depends on what you mean by efficient, but have you looked at KD Trees?

Related

Advantages of a bit matrix over a bitmap

I want to create a simple representation of an environment that basically just represents if at a certain position is an object or is not.
I would thus only need a big matrix filled with 1's and 0'. It is important to work effectively on this matrix, since I am going to have random positioned get and set operations on it, but also iterate over the whole matrix.
What would be the best solution for this?
My approach would be to create a vector of vectors containing bit elements. Otherwise, would there be an advantage of using a bitmap?

Note that while std::vector<bool> may consume less memory it is also slower than std::vector<char> (depending on the use case), because of all the bitwise operations. As with any optimization questions, there is only one answer: try different solutions and profile properly.

Traverse of multidimensional Array in any axis

I have a (kind of) performance problem in my code, that roots in the chosen architecture.
I will use multidimensional tensors (basically matrices with more dimensions) in the form of cubes to store my data.
Since the dimension is not known at compile-time, I can't use Boost's MultidimensionalArray (IIRC), but have to come up, with my own solution.
Right now, I save each dimension, on it's own. I have a Tensor of dimension (let's say 3), that holds a lot of tensors of dimension 2 (in an std::vector), that each have a std::vector with tensors of dimension 1, that each holds a std::vector of (numerical) data. I use an abstract base-class for my tensor, so everything in there is a pointer to the abstract class, while beeing (secretly) multi- or one-dimensional.
I extract a single numerical data-point by giving a std::list of indices to a tensor, that get's the first element, searches for the according tensor and passes the rest of the list to that tensor in a (kind of) recursive call.
I now have to do a multi-dimensional Fast-Fourier Transformation on that data. I use a Threadpool and Job-Objects, that works on copying data from an Tensor along one dimension, doing an FFT and writes that data back.
I already have logic to implement ThreadPool and organize the dimensions to FFT along, but there is one problem:
My data-structure is the cache-unfriendliest beast, one can think of... While the Data-Copying along the first dimension (that, with it's data in a single 1D-Tensor) is reasonable fast, but in other directions, I need to copy my data from all over the place.
Since there are no race-conditions (I make sure every concurrent FFT is on distinct data-points), I thought, I would not use a Mutex-Guard to let everybody copy at the same time. However this heavily slows down the process ("I copy my data now!" - "No, I copy my data now!"- "But it's my turn now!"...)
Guarding the copy-Process with a mutex, does not increase speed. The FFT of a vector with 1024 elements is way faster, then the copy-process to get these elements, resulting in nearly all of my threads waiting, while one is copying.
Long story short:
Is there any kind of multi-dimensional data-structure, that does not need to set the dimension at compile-time, that allows me to traverse fast along all axis? I searched for a while now, by nothing came up besides Boost MultiArray. Vectorization also does not work since the indices would grow too fast to hold in usual int-types.
I can't think of how to present code-examples here, since most of that code is rather simple, but If needed, I can get that in.

Eigen has multi-dimensional tensor support (nominally unsupported, but written by the DeepMind people, so "somewhat" supported?), and FFTW has 1d to 3d FFTs. Using external libraries with a set of 1D to 3D FFTs would outsource most of the hard work.
Edit: Actually, FFTW has support for threaded n-dimensional FFTs

Representation of a symmetric diagonal matrix

Lets assume we have a huge symmetric diagonal matrix. What is the efficient way to implement this?
The only way that i could think of is that by using the symmetric property where Xij = Xji, we can reduce the size of this matrix by half. But then representing this matrix using a 2D array would be inefficient, since we cant reduce the matrix size by using arrays.
Another thing representing this matrix using adjacency list also would be inefficient, because relating this matrix to a graph. It would be a density graph. And the operation of adj list takes lots of time such as removing, inserting and searching.
But what about using heaps?

There is no one answer until you decide what you are going to do with this matrix (or maybe matrices?).
If you are just going to store and remember it, then just store it sequentially, leaving out the redundant entries. (Your code knows how to access it, because that is all it does, right?)
More probably, you want to do normal matrix operations on it. In that case, are you trying to make the storage efficient, or the execution? In the later case, I don't see many opportunities based on it being symmetric--the multiplies are the expensive thing and you probably still need all of those. If it is the storage, then are you limiting yourself to operations that only take symmetric in and symmetric out? Sounds awfully specific. If so, then you only need to do the calculations for the part you are storing, because, by definition the other entries are symmetric, so just write your code to generate that part of the matrix and you are done.

Making an Eigen::Vector look like a vector of points

I want to represent a 2D shape in such a way that it can be interacted with as if it were a vector of points, in particular I want to be able to call operator[] and at() on it and return references to things that act like 2D points. Currently I just use a class whose only member variable is a vector of points and that has various arithmetic and geometric operations defined pointwise on its elements.
However, in other parts of my code I need to treat a vector of n points as an element of 2n dimensional space and perform basic linear algebra on it (e.g. projecting the vector onto a given subspace of R^2n). Currently I'm creating an Eigen::VectorXd object every time I want to do this, and then converting back after performing these operations. I don't want to do this, as I make the conversion often enough that all the copying is a noticeable source of inefficiency.
If I was storing the data as a flat array of doubles/floats/ints, I could cast a pointer to its nth element to a pointer to a Point (whose members would just be a pair of doubles/floats/ints). However, as I don't know the internal representation that Eigen uses for vectors (and it may well change), this isn't possible.
Is there a sensible way of solving this? I could just use Eigen::Vectors everywhere, but I really want most of the code to be able to pretend that it is dealing with a set of points.

However, as I don't know the internal representation that Eigen uses for vectors (and it may well change), this isn't possible.
Eigen offers the Map classes that allow mapping plain arrays to Eigen structures. For example:
double numbers[2];
Eigen::Vector2f::Map( numbers ).dot( Eigen::Vector2f::Constant(1) );

Storing Matrix information in STL vector. Which is better vector or vector of vectors?

I've created my own Matrix class were inside the class the information regarding the Matrix is stored in a STL vector. I've notice that while searching the web some people work with a vector of vectors to represent the Matrix information. My best guess tells me that so long as the matrix is small or skinny (row_num >> column_num) the different should be small, but what about if the matrix is square or fat (row_num << column_num)? If I were to create a very large matrix would I see a difference a run time? Are there other factors that need to be considered?
Thanks

Have you considered using an off-the-shelf matrix representation such as boost's instead of reinventing the wheel?
If you have a lot of empty rows for example, using the nested representation could save a lot of space. Unless you have specific information in actual use cases showing one way isn't meeting your requirements, code the way that's easiest to maintain and implement properly.

There are too many variables to answer your question.
Create an abstraction so that your code does not care how the matrix is represented. Then write your code using any implementation. Then profile it.
If your matrix is dense, the "vector of vectors" is very unlikely to be faster than a single big memory block and could be slower. (Chasing two pointers for random access + worse locality.)
If your matrices are large and sparse, the right answer to your question is probably "neither".
So create an abstract interface, code something up, and profile it. (And as #Mark says, there are lots of third-party libraries you should probably consider.)

If you store everything in a single vector, an iterator will traverse the entire matrix. If you use a vector of vectors, an iterator will only traverse a single dimension.

We Keep Coding

c++ django amazon-web-services regex python-2.7 google-cloud-platform list unit-testing opengl ember.js