What are the Issues with a vector-of-vectors? - c++

I've read that a vector-of-vectors is bad given a fixed 2nd dimension, but I cannot find a clear explanation of the problems on http://www.stackoverflow.com.
Could someone give an explanation of why using 2D indexing on a single vector is preferable to using a vector-of-vectors for a fixed 2nd dimension?
Also, I'm assuming that a vector-of-vectors is the preferable data structure for a 2D array with a variable 2nd dimension? If there is any proof to the contrary I'd love to see that.

For a std::vector the underlying array is dynamically allocated from the heap. If you have e.g. std::vector<std::vector<double>>, then your outer vector would look like
{v1, v2, v3, v4, ... vn}
This looks like each of the inner vectors will be in contiguous memory, and they will, but their underlying arrays will not be contiguous. See the diagram of the memory layout in this post. In other words the you cannot say that
&(v1.back()) + 1 == &(v2.front()) // not necessarily true!
Instead if you used a single vector with striding then you would gain data locality, and it would inherently be more cache friendly as all your data is contiguous.
For the sake of completeness, I would use neither of these methods if your matrix was sparse, as there are more elegant and efficient storage schemes than straight 1D or 2D arrays. Though since you mentioned you have a "fixed 2nd dimension" I will assume that is not the case here.

I shall answer with a simple analogy.
What is "better" in general out of the two things?
A telephone book where each entry is a code referring to a different book that you have to find and read to discover someone's telephone number
A telephone book that lists people's telephone numbers
Keeping all your data in a single big blob is more simple, more sensible, and easier on your computer's cache. A vector with N vectors inside it is much more operationally complex (remember, each of those requires a dynamic allocation and size management operations!); one vector is, well, one vector. You haven't multiplied the workload by N.
The only downside really is that to simulate 2D array access with a 1D underlying data store, you need to write a facade. Fortunately, this is very easy.
Now for the subjective part: on balance I'd say that it's worthwhile unless you're really in a rush and your code quality doesn't particularly matter.

Using a vector of vectors:
Is inefficient in terms of memory allocation, due to multiple blocks being allocated.
Models a jagged right hand edge, so bugs can creep in.
Using a single vector is, in general, better as the memory management is simpler. But you can encounter problems if your matrix is large as it can be difficult to acquire a large contiguous block.
If your array is resizeable, then I'd still stick to a single vector: the resize complexity can be isolated in a single function that you can optimise.
The best solution of all is, of course, to use something like the linear algebra library (BLAS), available in Boost. That also handles large sparse matrices beautifully.

Related

What are the Disadvantages of Nested Vectors?

I'm still fairly new to C++ and have a lot left to learn, but something that I've become quite attached to recently is using nested (multidimensional) vectors. So I may typically end up with something like this:
std::vector<std::vector<std::string> > table;
Which I can then easily access elements of like this:
std::string data = table[3][5];
However, recently I've been getting the impression that it's better (in terms of performance) to have a single-dimensional vector and then just use "index arithmetic" to access elements correspondingly. I assume this performance impact is significant for much larger or higher dimensional vectors, but I honestly have no idea and haven't been able to find much information about it so far.
While, intuitively, it kind of makes sense that a single vector would have better performance than a a higher dimensional one, I honestly don't understand the actual reasons why. Furthermore, if I were to just use single-dimensional vectors, I would lose the intuitive syntax I have for accessing elements of multidimensional ones. So here are my questions:
Why are multidimensional vectors inefficient? If I were to only use a single-dimensional vector instead (to represent data in higher dimensions), what would be the best, most intuitive way to access its elements?
It depends on the exact conditions. I'll talk about the case, when the nested version is a true 2D table (i.e., all rows have equal length).
A 1D vector usually will be faster on every usage patterns. Or, at least, it won't be slower than the nested version.
Nested version can be considered worse, because:
it needs to allocate number-of-rows times, instead of one.
accessing an element takes an additional indirection, so it is slower (additional indirection is usually slower than the multiply needed in the 1D case)
if you process your data sequentially, then it could be much slower, if the 2D data is scattered around the memory. It is because there could be a lot of cache misses, depending how the memory allocator returns memory areas of different rows.
So, if you go for performance, I'd recommend you to create a 2D-wrapper class for 1D vector. This way, you could get as simple API as the nested version, and you'll get the best performance too. And even, if for some cause, you decide to use the nested version instead, you can just change the internal implementation of this wrapper class.
The most intuitive way to access 1D elements is y*width+x. But, if you know your access patterns, you can choose a different one. For example, in a painting program, a tile based indexing could be better for storing and manipulating the image. Here, data can be indexed like this:
int tileMask = (1<<tileSizeL)-1; // tileSizeL is log of tileSize
int tileX = x>>tileSizeL;
int tileY = y>>tileSizeL;
int tileIndex = tileY*numberOfTilesInARow + tileX;
int index = (tileIndex<<(tileSizeL*2)) + ((y&tileMask)<<tileSizeL) + (x&tileMask);
This method has a better spatial locality in memory (pixels near to each other tend to have a near memory address). Index calculation is slower than a simple y*width+x, but this method could have much less cache misses, so in the end, it could be faster.

2 d array, 2 d vector, which is more efficient?

I remember c++ primer said that 2d vector is very inefficient and should be avoided.
but 2d array seems to be rather inconvenient in terms of both creating and deleting.
is there any other way to do it? or 2d vector still competable against 2d array?
I doubt that it matters. There are other factors that will matter more.
I would question the starting premise:
2d vector is inefficient
Sometimes we trade off pure speed for better abstraction. I'll bet the std::string class can be considered inefficient by some measures when compared to raw byte or character array, but I'd still use it.
You'll have a better case if you stop worrying about broad statements and focus on your use case.
The most common application of 2D arrays I know of is for vectors, matricies, and linear algebra. There are other factors for that problem that will be far more important than the choice of underlying data structure.
Since C++ is an object-oriented language, you can solve this easily by starting with an interface and creating implementations that use vector and array. Test them against a meaningful data set and measure.

Concatenate 2 STL vectors in constant O(1) time

I'll give some context as to why I'm trying to do this, but ultimately the context can be ignored as it is largely a classic Computer Science and C++ problem (which must surely have been asked before, but a couple of cursory searches didn't turn up anything...)
I'm working with (large) real time streaming point clouds, and have a case where I need to take 2/3/4 point clouds from multiple sensors and stick them together to create one big point cloud. I am in a situation where I do actually need all the data in one structure, whereas normally when people are just visualising point clouds they can get away with feeding them into the viewer separately.
I'm using Point Cloud Library 1.6, and on closer inspection its PointCloud class (under <pcl/point_cloud.h> if you're interested) stores all data points in an STL vector.
Now we're back in vanilla CS land...
PointCloud has a += operator for adding the contents of one point cloud to another. So far so good. But this method is pretty inefficient - if I understand it correctly, it 1) resizes the target vector, then 2) runs through all Points in the other vector, and copies them over.
This looks to me like a case of O(n) time complexity, which normally might not be too bad, but is bad news when dealing with at least 300K points per cloud in real time.
The vectors don't need to be sorted or analysed, they just need to be 'stuck together' at the memory level, so the program knows that once it hits the end of the first vector it just has to jump to the start location of the second one. In other words, I'm looking for an O(1) vector merging method. Is there any way to do this in the STL? Or is it more the domain of something like std::list#splice?
Note: This class is a pretty fundamental part of PCL, so 'non-invasive surgery' is preferable. If changes need to be made to the class itself (e.g. changing from vector to list, or reserving memory), they have to be considered in terms of the knock on effects on the rest of PCL, which could be far reaching.
Update: I have filed an issue over at PCL's GitHub repo to get a discussion going with the library authors about the suggestions below. Once there's some kind of resolution on which approach to go with, I'll accept the relevant suggestion(s) as answers.
A vector is not a list, it represents a sequence, but with the additional requirement that elements must be stored in contiguous memory. You cannot just bundle two vectors (whose buffers won't be contiguous) into a single vector without moving objects around.
This problem has been solved many times before such as with String Rope classes.
The basic approach is to make a new container type that stores pointers to point clouds. This is like a std::deque except that yours will have chunks of variable size. Unless your clouds chunk into standard sizes?
With this new container your iterators start in the first chunk, proceed to the end then move into the next chunk. Doing random access in such a container with variable sized chunks requires a binary search. In fact, such a data structure could be written as a distorted form of B+ tree.
There is no vector equivalent of splice - there can't be, specifically because of the memory layout requirements, which are probably the reason it was selected in the first place.
There's also no constant-time way to concatenate vectors.
I can think of one (fragile) way to concatenate raw arrays in constant time, but it depends on them being aligned on page boundaries at both the beginning and the end, and then re-mapping them to be adjacent. This is going to be pretty hard to generalise.
There's another way to make something that looks like a concatenated vector, and that's with a wrapper container which works like a deque, and provides a unified iterator and operator[] over them. I don't know if the point cloud library is flexible enough to work with this, though. (Jamin's suggestion is essentially to use something like this instead of the vector, and Zan's is roughly what I had in mind).
No, you can't concatenate two vectors by a simple link, you actually have to copy them.
However! If you implement move-semantics in your element type, you'd probably get significant speed gains, depending on what your element contains. This won't help if your elements don't contain any non-trivial types.
Further, if you have your vector reserve way in advance the memory needed, then that'd also help speed things up by not requiring a resize (which would cause an undesired huge new allocation, possibly having to defragment at that memory size, and then a huge memcpy).
Barring that, you might want to create some kind of mix between linked-lists and vectors, with each 'element' of the list being a vector with 10k elements, so you only need to jump list links once every 10k elements, but it allows you to dynamically grow much easier, and make your concatenation breeze.
std::list<std::vector<element>> forIllustrationOnly; //Just roll your own custom type.
index = 52403;
listIndex = index % 1000
vectorIndex = index / 1000
forIllustrationOnly[listIndex][vectorIndex] = still fairly fast lookups
forIllustrationOnly[listIndex].push_back(vector-of-points) = much faster appending and removing of blocks of points.
You will not get this scaling behaviour with a vector, because with a vector, you do not get around the copying. And you can not copy an arbitrary amount of data in fixed time.
I do not know PointCloud, but if you can use other list types, e.g. a linked list, this behaviour is well possible. You might find a linked list implementation which works in your environment, and which can simply stick the second list to the end of the first list, as you imagined.
Take a look at Boost range joint at http://www.boost.org/doc/libs/1_54_0/libs/range/doc/html/range/reference/utilities/join.html
This will take 2 ranges and join them. Say you have vector1 and vector 2.
You should be able to write
auto combined = join(vector1,vector2).
Then you can use combined with algorithms, etc as needed.
No O(1) copy for vector, ever, but, you should check:
Is the element type trivially copyable? (aka memcpy)
Iff, is my vector implementation leveraging this fact, or is it stupidly looping over all 300k elements executing a trivial assignment (or worse, copy-ctor-call) for each element?
What I have seen is that, while both memcpyas well as an assignment-for-loop have O(n) complexity, a solution leveraging memcpy can be much, much faster.
So, the problem might be that the vector implementation is suboptimal for trivial types.

Storing Matrix information in STL vector. Which is better vector or vector of vectors?

I've created my own Matrix class were inside the class the information regarding the Matrix is stored in a STL vector. I've notice that while searching the web some people work with a vector of vectors to represent the Matrix information. My best guess tells me that so long as the matrix is small or skinny (row_num >> column_num) the different should be small, but what about if the matrix is square or fat (row_num << column_num)? If I were to create a very large matrix would I see a difference a run time? Are there other factors that need to be considered?
Thanks
Have you considered using an off-the-shelf matrix representation such as boost's instead of reinventing the wheel?
If you have a lot of empty rows for example, using the nested representation could save a lot of space. Unless you have specific information in actual use cases showing one way isn't meeting your requirements, code the way that's easiest to maintain and implement properly.
There are too many variables to answer your question.
Create an abstraction so that your code does not care how the matrix is represented. Then write your code using any implementation. Then profile it.
If your matrix is dense, the "vector of vectors" is very unlikely to be faster than a single big memory block and could be slower. (Chasing two pointers for random access + worse locality.)
If your matrices are large and sparse, the right answer to your question is probably "neither".
So create an abstract interface, code something up, and profile it. (And as #Mark says, there are lots of third-party libraries you should probably consider.)
If you store everything in a single vector, an iterator will traverse the entire matrix. If you use a vector of vectors, an iterator will only traverse a single dimension.

Choice of the most performant container (array)

This is my little big question about containers, in particular, arrays.
I am writing a physics code that mainly manipulates a big (> 1 000 000) set of "particles" (with 6 double coordinates each). I am looking for the best way (in term of performance) to implement a class that will contain a container for these data and that will provide manipulation primitives for these data (e.g. instantiation, operator[], etc.).
There are a few restrictions on how this set is used:
its size is read from a configuration file and won't change during execution
it can be viewed as a big two dimensional array of N (e.g. 1 000 000) lines and 6 columns (each one storing the coordinate in one dimension)
the array is manipulated in a big loop, each "particle / line" is accessed and computation takes place with its coordinates, and the results are stored back for this particle, and so on for each particle, and so on for each iteration of the big loop.
no new elements are added or deleted during the execution
First conclusion, as the access on the elements is essentially done by accessing each element one by one with [], I think that I should use a normal dynamic array.
I have explored a few things, and I would like to have your opinion on the one that can give me the best performances.
As I understand there is no advantage to use a dynamically allocated array instead of a std::vector, so things like double** array2d = new ..., loop of new, etc are ruled out.
So is it a good idea to use std::vector<double> ?
If I use a std::vector, should I create a two dimensional array like std::vector<std::vector<double> > my_array that can be indexed like my_array[i][j], or is it a bad idea and it would be better to use std::vector<double> other_array and acces it with other_array[6*i+j].
Maybe this can gives better performance, especially as the number of columns is fixed and known from the beginning.
If you think that this is the best option, would it be possible to wrap this vector in a way that it can be accessed with a index operator defined as other_array[i,j] // same as other_array[6*i+j] without overhead (like function call at each access) ?
Another option, the one that I am using so far is to use Blitz, in particular blitz::Array:
typedef blitz::Array<double,TWO_DIMENSIONS> store_t;
store_t my_store;
Where my elements are accessed like that: my_store(line, column);.
I think there are not much advantage to use Blitz in my case because I am accessing each element one by one and that Blitz would be interesting if I was using operations directly on array (like matrix multiplication) which I am not.
Do you think that Blitz is OK, or is it useless in my case ?
These are the possibilities I have considered so far, but maybe the best one I still another one, so don't hesitate to suggest me other things.
Thanks a lot for your help on this problem !
Edit:
From the very interesting answers and comments bellow a good solution seems to be the following:
Use a structure particle (containing 6 doubles) or a static array of 6 doubles (this avoid the use of two dimensional dynamic arrays)
Use a vector or a deque of this particle structure or array. It is then good to traverse them with iterators, and that will allow to change from one to another later.
In addition I can also use a Blitz::TinyVector<double,6> instead of a structure.
So is it a good idea to use std::vector<double> ?
Usually, a std::vector should be the first choice of container. You could use either std::vector<>::reserve() or std::vector<>::resize() to avoid reallocations while populating the vector. Whether any other container is better can be found by measuring. And only by measuring. But first measure whether anything the container is involved in (populating, accessing elements) is worth optimizing at all.
If I use a std::vector, should I create a two dimensional array like std::vector<std::vector<double> > [...]?
No. IIUC, you are accessing your data per particle, not per row. If that's the case, why not use a std::vector<particle>, where particle is a struct holding six values? And even if I understood incorrectly, you should rather write a two-dimensional wrapper around a one-dimensional container. Then align your data either in rows or columns - what ever is faster with your access patterns.
Do you think that Blitz is OK, or is it useless in my case?
I have no practical knowledge about blitz++ and the areas it is used in. But isn't blitz++ all about expression templates to unroll loop operations and optimizing away temporaries when doing matrix manipulations? ICBWT.
First of all, you don't want to scatter the coordinates of one given particle all over the place, so I would begin by writing a simple struct:
struct Particle { /* coords */ };
Then we can make a simple one dimensional array of these Particles.
I would probably use a deque, because that's the default container, but you may wish to try a vector, it's just that 1.000.000 of particles means about a single chunk of a few MBs. It should hold but it might strain your system if this ever grows, while the deque will allocate several chunks.
WARNING:
As Alexandre C remarked, if you go the deque road, refrain from using operator[] and prefer to use iteration style. If you really need random access and it's performance sensitive, the vector should prove faster.
The first rule when choosing from containers is to use std::vector. Then, only after your code is complete and you can actually measure performance, you can try other containers. But stick to vector first. (And use reserve() from the start)
Then, you shouldn't use an std::vector<std::vector<double> >. You know the size of your data: it's 6 doubles. No need for it to be dynamic. It is constant and fixed. You can define a struct to hold you particle members (the six doubles), or you can simply typedef it: typedef double particle[6]. Then, use a vector of particles: std::vector<particle>.
Furthermore, as your program uses the particle data contained in the vector sequentially, you will take advantage of the modern CPU cache read-ahead feature at its best performance.
You could go several ways. But in your case, don't declare astd::vector<std::vector<double> >. You're allocating a vector (and you copy it around) for every 6 doubles. Thats way too costly.
If you think that this is the best option, would it be possible to wrap this vector in a way that it can be accessed with a index operator defined as other_array[i,j] // same as other_array[6*i+j] without overhead (like function call at each access) ?
(other_array[i,j] won't work too well, as i,j employs the comma operator to evaluate the value of "i", then discards that and evaluates and returns "j", so it's equivalent to other_array[i]).
You will need to use one of:
other_array[i][j]
other_array(i, j) // if other_array implements operator()(int, int),
// but std::vector<> et al don't.
other_array[i].identifier // identifier is a member variable
other_array[i].identifier() // member function getting value
other_array[i].identifier(double) // member function setting value
You may or may not prefer to put get_ and set_ or similar on the last two functions should you find them useful, but from your question I think you won't: functions are prefered in APIs between parts of large systems involving many developers, or when the data items may vary and you want the algorithms working on the data to be independent thereof.
So, a good test: if you find yourself writing code like other_array[i][3] where you've decided "3" is the double with the speed in it, and other_array[i][5] because "5" is the the acceleration, then stop doing that and give them proper identifiers so you can say other_array[i].speed and .acceleration. Then other developers can read and understand it, and you're much less likely to make accidental mistakes. On the other hand, if you are iterating over those 6 elements doing exactly the same things to each, then you probably do want Particle to hold a double[6], or to provide an operator[](int). There's no problem doing both:
struct Particle
{
double x[6];
double& speed() { return x[3]; }
double speed() const { return x[3]; }
double& acceleration() { return x[5]; }
...
};
BTW / the reason that vector<vector<double> > may be too costly is that each set of 6 doubles will be allocated on the heap, and for fast allocation and deallocation many heap implementations use fixed-size buckets, so your small request will be rounded up t the next size: that may be a significant overhead. The outside vector will also need to record a extra pointer to that memory. Further, heap allocation and deallocation is relatively slow - in you're case, you'd only be doing it at startup and shutdown, but there's no particular point in making your program slower for no reason. Even more importantly, the areas on the heap may just around in memory, so your operator[] may have cache-faults pulling in more distinct memory pages than necessary, slowing the entire program. Put another way, vectors store elements contiguously, but the pointed-to-vectors may not be contiguous.