Let's assume we have a huge symmetric diagonal matrix. What is an efficient way to implement this?
The only way that I could think of is to use the symmetric property, where Xij = Xji, to reduce the size of this matrix by half. But representing this matrix with a 2D array would be inefficient, since a 2D array still allocates every entry and so cannot take advantage of the reduction.
Another option, representing this matrix with an adjacency list, would also be inefficient: viewed as a graph, the matrix would be a dense graph, and adjacency-list operations such as removing, inserting, and searching take a lot of time.
But what about using heaps?
There is no one answer until you decide what you are going to do with this matrix (or maybe matrices?).
If you are just going to store and remember it, then just store it sequentially, leaving out the redundant entries. (Your code knows how to access it, because that is all it does, right?)
More probably, you want to do normal matrix operations on it. In that case, are you trying to make the storage efficient, or the execution? In the latter case, I don't see many opportunities based on it being symmetric: the multiplies are the expensive thing and you probably still need all of those. If it is the storage, then are you limiting yourself to operations that only take symmetric in and symmetric out? Sounds awfully specific. If so, then you only need to do the calculations for the part you are storing, because, by definition, the other entries are symmetric, so just write your code to generate that part of the matrix and you are done.
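The "store it sequentially, leaving out the redundant entries" idea can be sketched as follows. This is only an illustration, not a complete matrix type: the class name PackedSymmetric and its members are hypothetical, and it assumes the lower triangle (including the diagonal) is the part that gets stored.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical helper: stores only the lower triangle (diagonal included)
// of an n x n symmetric matrix in one contiguous vector of n*(n+1)/2 values.
class PackedSymmetric {
public:
    explicit PackedSymmetric(std::size_t n)
        : n_(n), data_(n * (n + 1) / 2, 0.0) {}

    // Element (i, j) and element (j, i) resolve to the same storage slot.
    double& operator()(std::size_t i, std::size_t j) {
        return data_[index(i, j)];
    }
    double operator()(std::size_t i, std::size_t j) const {
        return data_[index(i, j)];
    }

    std::size_t size() const { return n_; }            // matrix dimension
    std::size_t stored() const { return data_.size(); } // values actually kept

private:
    std::size_t index(std::size_t i, std::size_t j) const {
        assert(i < n_ && j < n_);
        if (i < j) std::swap(i, j);  // fold the upper triangle onto the lower
        return i * (i + 1) / 2 + j;  // row-major offset inside the lower triangle
    }

    std::size_t n_;
    std::vector<double> data_;
};
```

With this layout, writing m(1, 3) also changes what m(3, 1) reads, and a 4 x 4 matrix keeps 10 values instead of 16. Whether the extra index arithmetic is worth the memory saved depends on the operations you run on it.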
So in my nonlinear finite element solver I use Eigen3 sparse matrices and the LDLT factorization.
The thing is, this factorization needs to be performed many times during a dynamic simulation, and a lot of time is spent inserting the coefficients in the iteration matrix based on triplets (storage is reserved).
Are there any good strategies for exploiting the fact that the sparsity is unchanged and the order of insertions is the same? When forming this matrix, looping over the elements, couplings, etc. in the model, the order of insertion is the same at every time step during the simulation.
Using coeffRef increased the simulation time by about 10x.
I've been thinking of making a single pass of the model and forming pointers directly to the respective location in the coefficient matrix, but this seems a bit dangerous, especially since the LDLT factorisation is run in between.
If the sparsity pattern of your matrix is not changing each time step, then you can directly change the values of the raw data array with valuePtr(). This is extremely simple and can be done in parallel if needed. If you can figure out how to do this in a linear fashion, i.e.
SparseMatrix<double> A;
// assumes A is already filled and in compressed mode
for (int i = 0; i < A.nonZeros(); i++)
    A.valuePtr()[i] = ...
then it will be stupid fast (something to do with avoiding cache misses and other black magic). As for the previous comment that the LDLT factorization will not change, that is true from a theoretical standpoint. However, according to the Eigen documentation:
"In factorize(), the factors of the coefficient matrix are computed. This step should be called each time the values of the matrix change. However, the structural pattern of the matrix should not change between multiple calls."
https://eigen.tuxfamily.org/dox/group__TopicSparseSystems.html
I think this is because the factors are stored within the solver object, though I could be wrong. A test should be pretty easy to confirm one way or another. That said, I think you have to call factorize() after you change the values. Still though, you can save considerable time by only calling the analyzePattern() routine once.
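The "single pass over the model" idea from the question can be done with indices into the value array rather than raw pointers, which sidesteps the worry about dangling pointers. The sketch below is library-agnostic and entirely hypothetical (FixedPatternCsr, recordSlots and assemble are made-up names); with Eigen, the analogous array would be the one returned by valuePtr() on a compressed matrix, but that mapping is an assumption you would need to verify against your own setup.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical minimal compressed-row matrix whose sparsity pattern is fixed.
struct FixedPatternCsr {
    std::vector<std::size_t> rowStart;  // size rows + 1
    std::vector<std::size_t> col;       // column index of each stored entry
    std::vector<double> value;          // one value per stored entry

    // Linear search within a row; only needed during the one-time setup pass.
    std::size_t slot(std::size_t r, std::size_t c) const {
        for (std::size_t k = rowStart[r]; k < rowStart[r + 1]; ++k)
            if (col[k] == c) return k;
        assert(false && "entry is not part of the sparsity pattern");
        return value.size();
    }
};

// One-time pass: translate the assembly order (row, col) into value slots.
std::vector<std::size_t> recordSlots(
    const FixedPatternCsr& a,
    const std::vector<std::pair<std::size_t, std::size_t>>& order) {
    std::vector<std::size_t> slots;
    slots.reserve(order.size());
    for (const auto& rc : order) slots.push_back(a.slot(rc.first, rc.second));
    return slots;
}

// Every time step: reset the values, then accumulate contributions by slot.
// contributions[k] must follow the same order that was used in recordSlots.
void assemble(FixedPatternCsr& a,
              const std::vector<std::size_t>& slots,
              const std::vector<double>& contributions) {
    assert(slots.size() == contributions.size());
    std::fill(a.value.begin(), a.value.end(), 0.0);
    for (std::size_t k = 0; k < slots.size(); ++k)
        a.value[slots[k]] += contributions[k];
}
```

Because the slots are plain offsets, they stay valid as long as the pattern and the storage order do not change. The values still have to be refactorized after each assemble call, as the documentation quoted above says.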
I want to create a simple representation of an environment that basically just records whether or not there is an object at a given position.
I would thus only need a big matrix filled with 1s and 0s. It is important to work efficiently on this matrix, since I am going to have randomly positioned get and set operations on it, but also iterate over the whole matrix.
What would be the best solution for this?
My approach would be to create a vector of vectors containing bit elements. Otherwise, would there be an advantage of using a bitmap?
Note that while std::vector<bool> may consume less memory it is also slower than std::vector<char> (depending on the use case), because of all the bitwise operations. As with any optimization questions, there is only one answer: try different solutions and profile properly.
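For comparison when profiling, here is a rough sketch of a flat, bit-packed grid as an alternative to a vector of vectors. The class name OccupancyGrid and its methods are hypothetical. Whether it beats a byte-per-cell std::vector<char> depends on the access pattern, so treat it as one candidate to measure, not as the answer.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical flat occupancy grid: one bit per cell, packed into 64-bit words.
class OccupancyGrid {
public:
    OccupancyGrid(std::size_t rows, std::size_t cols)
        : rows_(rows), cols_(cols), words_((rows * cols + 63) / 64, 0) {}

    bool get(std::size_t r, std::size_t c) const {
        const std::size_t i = index(r, c);
        return ((words_[i / 64] >> (i % 64)) & 1u) != 0;
    }

    void set(std::size_t r, std::size_t c, bool occupied) {
        const std::size_t i = index(r, c);
        const std::uint64_t mask = std::uint64_t{1} << (i % 64);
        if (occupied) words_[i / 64] |= mask;
        else          words_[i / 64] &= ~mask;
    }

    // Whole-grid pass: count occupied cells one word at a time.
    // Padding bits in the last word are never set, so they do not count.
    std::size_t countOccupied() const {
        std::size_t total = 0;
        for (std::uint64_t w : words_)
            for (; w != 0; w &= w - 1) ++total;  // clears the lowest set bit
        return total;
    }

private:
    std::size_t index(std::size_t r, std::size_t c) const {
        assert(r < rows_ && c < cols_);
        return r * cols_ + c;  // row-major cell number
    }

    std::size_t rows_, cols_;
    std::vector<std::uint64_t> words_;
};
```

Random get and set calls pay for a shift and a mask, while whole-grid passes can process 64 cells per word, which is the trade-off mentioned above.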
I've created my own Matrix class where, inside the class, the information regarding the Matrix is stored in an STL vector. I've noticed while searching the web that some people work with a vector of vectors to represent the Matrix information. My best guess tells me that so long as the matrix is small or skinny (row_num >> column_num) the difference should be small, but what about if the matrix is square or fat (row_num << column_num)? If I were to create a very large matrix, would I see a difference at run time? Are there other factors that need to be considered?
Thanks
Have you considered using an off-the-shelf matrix representation such as boost's instead of reinventing the wheel?
If you have a lot of empty rows for example, using the nested representation could save a lot of space. Unless you have specific information in actual use cases showing one way isn't meeting your requirements, code the way that's easiest to maintain and implement properly.
There are too many variables to answer your question.
Create an abstraction so that your code does not care how the matrix is represented. Then write your code using any implementation. Then profile it.
If your matrix is dense, the "vector of vectors" is very unlikely to be faster than a single big memory block and could be slower. (Chasing two pointers for random access + worse locality.)
If your matrices are large and sparse, the right answer to your question is probably "neither".
So create an abstract interface, code something up, and profile it. (And as @Mark says, there are lots of third-party libraries you should probably consider.)
If you store everything in a single vector, an iterator will traverse the entire matrix. If you use a vector of vectors, an iterator will only traverse a single dimension.
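A minimal sketch of the "code that does not care how the matrix is represented" advice: two hypothetical classes, FlatMatrix and NestedMatrix, expose the same rows(), cols() and at() members, and a template function works with either. The names and the choice of a template over a virtual interface are assumptions made for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Dense storage in one contiguous block, row-major.
class FlatMatrix {
public:
    FlatMatrix(std::size_t rows, std::size_t cols)
        : rows_(rows), cols_(cols), data_(rows * cols, 0.0) {}
    std::size_t rows() const { return rows_; }
    std::size_t cols() const { return cols_; }
    double& at(std::size_t r, std::size_t c) {
        assert(r < rows_ && c < cols_);
        return data_[r * cols_ + c];
    }
    double at(std::size_t r, std::size_t c) const {
        assert(r < rows_ && c < cols_);
        return data_[r * cols_ + c];
    }
private:
    std::size_t rows_, cols_;
    std::vector<double> data_;
};

// Same interface, but each row is its own separately allocated vector.
class NestedMatrix {
public:
    NestedMatrix(std::size_t rows, std::size_t cols)
        : data_(rows, std::vector<double>(cols, 0.0)) {}
    std::size_t rows() const { return data_.size(); }
    std::size_t cols() const { return data_.empty() ? 0 : data_[0].size(); }
    double& at(std::size_t r, std::size_t c) { return data_.at(r).at(c); }
    double at(std::size_t r, std::size_t c) const { return data_.at(r).at(c); }
private:
    std::vector<std::vector<double>> data_;
};

// Client code written once against the shared interface.
template <class Matrix>
double sumAll(const Matrix& m) {
    double total = 0.0;
    for (std::size_t r = 0; r < m.rows(); ++r)
        for (std::size_t c = 0; c < m.cols(); ++c)
            total += m.at(r, c);
    return total;
}
```

Swapping one class for the other then requires no change to sumAll, which makes it straightforward to profile both on your own matrix shapes.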
I'm trying to write a C++ program that needs to store and adjust data in a 3D array. The size is given by the user and doesn't change throughout the run, and I don't need to perform any complicated matrix operations on it. I just need it to be optimized for setting and getting values at given 3D coordinates (I do quite a few iterations over all the members, and it's a big array). What's the best way to go about defining that array? Vector of vector of vector? Arrays of vectors? CvMat/IplImage with multiple channels? Should I even keep it as 3D or just turn it into one very long interleaved vector and calculate indexes accordingly?
Thanks!
I would go with your last option, a single large array with transformed indices. If all you want to do is read and write known indices, this is probably the most efficient structure, both in terms of storage and speed. You can also wrap this in a class and overload operator() to make it easy to access 3D coordinates, e.g. you could write a(1,2,3) = 10; and the overloaded operator could take care of transforming the 3D coordinates into a linear index. Iterating over such an array would also be quite simple since there's only one dimension.
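A rough sketch of such a wrapper, assuming the dimensions are fixed at construction. The class name Grid3D and the choice of making z the fastest-varying coordinate are illustrative assumptions, not something the question prescribes.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical 3D array stored as one contiguous vector.
// Layout assumption: z varies fastest, then y, then x.
template <class T>
class Grid3D {
public:
    Grid3D(std::size_t nx, std::size_t ny, std::size_t nz, const T& init = T())
        : nx_(nx), ny_(ny), nz_(nz), data_(nx * ny * nz, init) {}

    // a(x, y, z) hides the index arithmetic from the caller.
    T& operator()(std::size_t x, std::size_t y, std::size_t z) {
        return data_[index(x, y, z)];
    }
    const T& operator()(std::size_t x, std::size_t y, std::size_t z) const {
        return data_[index(x, y, z)];
    }

    // Iterating over every element needs only the underlying 1D range.
    typename std::vector<T>::iterator begin() { return data_.begin(); }
    typename std::vector<T>::iterator end() { return data_.end(); }
    typename std::vector<T>::const_iterator begin() const { return data_.begin(); }
    typename std::vector<T>::const_iterator end() const { return data_.end(); }

    std::size_t size() const { return data_.size(); }

private:
    std::size_t index(std::size_t x, std::size_t y, std::size_t z) const {
        assert(x < nx_ && y < ny_ && z < nz_);
        return (x * ny_ + y) * nz_ + z;
    }

    std::size_t nx_, ny_, nz_;
    std::vector<T> data_;
};
```

With this in place, Grid3D<int> a(2, 3, 4); a(1, 2, 3) = 10; works as described, and a range-based for loop over a visits all 24 elements in storage order.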
It depends on what you mean by efficient, but have you looked at KD Trees?
The background for asking this question is that I am solving a linearized equation system (Ax=b), where A is a matrix (typically of dimension less than 100x100) and x and b are vectors. I am using a direct method, meaning that I first invert A, then find the solution by x=A^(-1)b. This step is repeated in an iterative process until convergence.
The way I'm doing it now, using a matrix library (MTL4):
For every iteration I copy all coefficients of A (values) into the matrix object, then invert. This is the easiest and safest option.
Using an array of pointers instead:
For my particular case, the coefficients of A happen to be updated between each iteration. These coefficients are stored in different variables (some are arrays, some are not). Would there be a potential for performance gain if I set up A as an array containing pointers to these coefficient variables, then inverting A in-place?
The nice thing about the last option is that once I have set up the pointers in A before the first iteration, I would not need to copy any values between successive iterations. The values which are pointed to in A would automatically be updated between iterations.
So the performance question boils down to this, as I see it:
- The matrix inversion process takes roughly the same amount of time, assuming dereferencing of pointers is inexpensive.
- The array of pointers does not need the extra memory for matrix A containing values.
- The array of pointers option does not have to copy all NxN values of A between each iteration.
- The values pointed to in the array-of-pointers option are generally NOT ordered in memory. Hopefully, all values lie relatively close in memory, but *A[0][1] is generally not next to *A[0][0] etc.
Any comments on this? Will the last point affect performance negatively, thus offsetting the positive performance effects?
Test, test, test.
Especially in the field of Numerical Linear Algebra. There are many effects in play, which is why there are a number of optimized libraries that have solved that burden for you.
Some effects to consider:
Memory locality and cache effects
Multithreading effects (some algorithms that are optimal while running single-core, cause memory collision/cache eviction when more than one core is utilized).
There is no substitute for testing.
Here are some comments:
Is the function you use for the inversion capable of handling a matrix of pointers instead of values? If it does not realise it has to do an indirection, all kinds of strange effects could happen.
When doing an in-place matrix inversion (meaning the inverted matrix overwrites the input matrix), all input coefficients will get overwritten with new values, because matrix inversion cannot be done by re-ordering the elements of the matrix.
During the inversion process, none of the input coefficients may be changed by an outside process. All such updates have to be performed between iterations.
So, you get the following set of trade-offs when you choose the pointer solution:
The coefficients making up matrix A can no longer be calculated asynchronously with the matrix inversion.
Either all coefficients must be recalculated for each iteration (when you use in-place inversion, meaning the inverted matrix uses the same memory as the input matrix), or you still have to use a matrix of N x N values to hold the result of the inversion.
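One possible middle ground, sketched here under assumptions that go beyond the question, is to keep the array of pointers only as a description of where the coefficients live, and gather the values into a contiguous buffer right before each inversion. The function name gather is hypothetical. The copy costs N x N loads per iteration, whereas a dense inversion costs on the order of N^3 operations, so for N near 100 the copy is unlikely to dominate, but only a measurement can confirm that for your code.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper: 'sources' holds N*N pointers, in row-major order,
// to coefficient variables that live elsewhere in the program.
// 'dense' is the contiguous N*N buffer handed to the inversion routine.
void gather(const std::vector<const double*>& sources,
            std::vector<double>& dense) {
    assert(sources.size() == dense.size());
    for (std::size_t k = 0; k < sources.size(); ++k)
        dense[k] = *sources[k];  // one indirection per coefficient, once per iteration
}
```

The pointers are set up once before the first iteration. The inversion can then overwrite dense in place without touching the original coefficient variables, and the scattered memory accesses happen only in this one loop, not throughout the inversion.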
You're getting good answers here. The only thing I would add is some general experience with performance.
You are thinking about performance a-priori. That's reasonable, but the real payoff is a-posteriori. In other words, you don't know for certain where the real optimization opportunities are, until the running code tells you.
You don't know if the bulk of the time will be spent in matrix inversion, multiplication, copying the matrix, dereferencing, or what. People can guess. If I had to guess, it would be matrix inversion, because it's 100x100.
However, something else I can't guess might be even bigger.
Guessing has a very poor track record, especially when you can just find out.
Here's an example of what I mean.