std::vector vs std::array vs normal array - c++

I take part in online coding contests, where the speed of the program is everything. Over time I have come across three ways to use the concept of an array in C++. The contest problems usually require us to create a dynamic array of a given size, so it is a one-time creation of the dynamic array from the input, and we never resize it afterwards.
std::vector
Vectors look the fanciest and everyone tends to love them. But a few days back one of the problems gave me a TIME_LIMIT_EXCEEDED error when I solved it with vectors. When I implemented the same logic with normal arrays, the program was accepted. On researching, I found out that calling push_back() takes a long time compared to a plain arr[i]=x;.
std::array
I don't have much knowledge about its performance. But it looks like a nicer way to handle arrays.
default arrays in C++
I do the dynamic allocation with int *arr=new int[given_size]; and then use the array normally. Passing an array as an argument is not as simple as with vectors, but it's not a big deal.
Apart from this, there are also times when I have to work with 2D arrays, and I am always unsure about what the fastest way could be. vector<vector<int>> is regarded as slow on some forums, and so are multidimensional pointers. So I like to use a 1D array with a small helper to compute the index, but it gets complicated when I have to pass a row to a function.
Most of the answers on forums depend on what the OP was trying to do, which gives different answers. What I want to know is which approach is the best in the long term for maximum speed/efficiency.

push_back takes a long time compared to arr[i]=x;.
Sorry but you are showing your lack of experience with vectors here, because your examples do two different things.
You are comparing something like this code
vector<int> vec; // vector created with size zero
for (...)
vec.push_back(x); // vector size increases
with this code
int arr[N];
for (...)
arr[i] = x;
The difference is that in the first case the vector has size 0 and its size increases as you add items to it (this takes extra time), but in the second case the array starts out at its final size. With an array this is how it must be, but with vectors you have a choice. If you know what the final size of the vector is, you should code it like this
vector<int> vec(N); // vector created at size N, note use () not []
for (...)
vec[i] = x;
That is the code you should be comparing with the array code for efficiency.
You might also want to research the resize and reserve methods of a vector. Vectors (if nothing else) are much more flexible than arrays.
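To illustrate the difference, here is a minimal sketch of the two idioms (N and x are placeholders for your own size and values):
vector<int> vec1(N);      // resize up front: N elements exist immediately
for (int i = 0; i < N; ++i)
    vec1[i] = x;          // plain indexed assignment, like an array

vector<int> vec2;
vec2.reserve(N);          // allocate capacity N, but size stays 0
for (int i = 0; i < N; ++i)
    vec2.push_back(x);    // no reallocation happens until size exceeds N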

Related

What are the Disadvantages of Nested Vectors?

I'm still fairly new to C++ and have a lot left to learn, but something that I've become quite attached to recently is using nested (multidimensional) vectors. So I may typically end up with something like this:
std::vector<std::vector<std::string> > table;
Which I can then easily access elements of like this:
std::string data = table[3][5];
However, recently I've been getting the impression that it's better (in terms of performance) to have a single-dimensional vector and then just use "index arithmetic" to access elements correspondingly. I assume this performance impact is significant for much larger or higher dimensional vectors, but I honestly have no idea and haven't been able to find much information about it so far.
While, intuitively, it kind of makes sense that a single vector would have better performance than a higher-dimensional one, I honestly don't understand the actual reasons why. Furthermore, if I were to just use single-dimensional vectors, I would lose the intuitive syntax I have for accessing elements of multidimensional ones. So here are my questions:
Why are multidimensional vectors inefficient? If I were to only use a single-dimensional vector instead (to represent data in higher dimensions), what would be the best, most intuitive way to access its elements?
It depends on the exact conditions. I'll talk about the case when the nested version is a true 2D table (i.e., all rows have equal length).
A 1D vector will usually be faster for every usage pattern; at the very least, it won't be slower than the nested version.
The nested version can be considered worse because:
it needs to allocate number-of-rows times instead of once.
accessing an element takes an additional indirection, so it is slower (the extra indirection is usually more expensive than the multiplication needed in the 1D case).
if you process your data sequentially, it can be much slower when the 2D data is scattered around memory, because there can be many cache misses depending on where the memory allocator placed the different rows.
So, if you go for performance, I'd recommend creating a 2D wrapper class around a 1D vector. This way you get an API as simple as the nested version's, and the best performance too. And if for some reason you later decide to use the nested version instead, you can just change the internal implementation of this wrapper class.
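A minimal sketch of such a wrapper (the name Matrix2D and its interface are just illustrative, not a standard class):
#include <vector>

template <typename T>
class Matrix2D {
public:
    Matrix2D(std::size_t rows, std::size_t cols)
        : cols_(cols), data_(rows * cols) {}

    // operator() computes the 1D index y*width+x internally.
    T&       operator()(std::size_t y, std::size_t x)       { return data_[y * cols_ + x]; }
    const T& operator()(std::size_t y, std::size_t x) const { return data_[y * cols_ + x]; }

private:
    std::size_t cols_;
    std::vector<T> data_;   // one contiguous allocation for the whole table
};
With this, Matrix2D<std::string> table(10, 10); table(3, 5) = "data"; gives you the same convenience as table[3][5] while keeping a single allocation.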
The most intuitive way to access 1D elements is y*width+x. But if you know your access patterns, you can choose a different one. For example, in a painting program, tile-based indexing can be better for storing and manipulating the image. Here, data can be indexed like this:
int tileMask = (1<<tileSizeL)-1;   // tileSizeL is log2 of tileSize
int tileX = x>>tileSizeL;          // which tile column x falls in
int tileY = y>>tileSizeL;          // which tile row y falls in
int tileIndex = tileY*numberOfTilesInARow + tileX;
// linear index: start of the tile, plus the (y, x) offset within the tile
int index = (tileIndex<<(tileSizeL*2)) + ((y&tileMask)<<tileSizeL) + (x&tileMask);
This method has better spatial locality in memory (pixels near each other tend to have nearby memory addresses). Index calculation is slower than a simple y*width+x, but this method can have far fewer cache misses, so in the end it can be faster.

Create matrix of random numbers in C++ without looping

I need to create a multidimensional matrix of randomly distributed numbers using a Gaussian distribution, and am trying to keep the program as optimized as possible. Currently I am using Boost matrices, but I can't seem to find anything that accomplishes this without manually looping. Ideally, I would like something similar to Python's numpy.random.randn() function, but this must be done in C++. Is there another way to accomplish this that is faster than manually looping?
You're going to have to loop anyway, but you can eliminate the array lookup inside your loop. True N-dimensional array indexing is going to be expensive, so your best option is any library (or one you write yourself) that also gives you an underlying linear data store.
You can then loop over the entire n-dimensional array as if it was linear, avoiding many multiplications of the indexes by the dimensions.
Another optimization is to do away with the index altogether and take a pointer to the first element, then increment the pointer itself. This removes a whole variable from the loop, which can give the compiler more room for other things. For example, if you had 1000 elements in a vector:
vector<int> data;
data.resize(1000);
int *intPtr = &data[0];          // pointer to the first element
int *endPtr = &data[0] + 1000;   // one past the last element, precomputed
while(intPtr != endPtr)
{
    *intPtr = rand_function();   // store a random value (rand_function is a placeholder)
    ++intPtr;
}
Here, two tricks have been used: pre-calculating the end condition outside the loop itself (this avoids calling something like vector::size() 1000 times), and working with pointers to the data in memory rather than indexes. An index gets internally converted to a pointer every time it is used to access the array. By storing the "current" pointer and adding 1 to it each time, the cost of computing the pointer from an index 1000 times is eliminated.
This can be faster but it depends on the implementation. Compilers can do some of the same hand-optimizations, but not all of them. The rand_function should also be inline to avoid the function call overhead.
A warning, however: if you use std::vector with the pointer trick, it's not thread safe; if another thread changes the vector's length during the loop, the vector can get reallocated to a different place in memory. Don't do pointer tricks unless you'd be perfectly comfortable writing your own vector, array, and table classes as needed.
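As for the Gaussian part of the question: since C++11 you can let <random> do the distribution work and std::generate drive the loop, which is about as close to numpy.random.randn() as the standard library gets (a sketch; make_randn is just an illustrative name):
#include <algorithm>
#include <random>
#include <vector>

std::vector<double> make_randn(std::size_t count)
{
    std::mt19937 gen(std::random_device{}());        // seeded Mersenne Twister engine
    std::normal_distribution<double> dist(0.0, 1.0); // mean 0, standard deviation 1
    std::vector<double> values(count);               // linear store for an N-D matrix
    std::generate(values.begin(), values.end(),
                  [&] { return dist(gen); });        // the loop is hidden, not gone
    return values;
}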

How to speed std::vector access time

How to improve std::vector access time?
Hi, I am making software for multivariable fuzzy k-means clustering.
It works on big matrices: 50,000 observations by 10 variables.
The matrix does not need to grow, shrink, or be bounds-checked.
I only resize it once to the needed size, load the items, and then do a lot of accesses.
First use:
`std::vector< std::vector<double> > matrix(NumClusters, std::vector<double>(NumObs,0.0));`
To get an element I do double A=matrix[i][j];, but the processing time was 20 minutes.
Then make:
std::vector<double> U(NumClusters *NumObs,0.0);
To get an element I do double A=U[i*NumObs+j];, and the time was better.
Now I want to ask a question:
Which will be faster for access:
iterator+int
std::vector<double>::const_iterator Uit = U.begin();
double A= *(Uit+index)
pointer[int]
std::vector<double>::const_pointer Upt = U.data();
double A= Upt[index];
Or normal index access[int]
double A= U[index];
Greetings
One thing you could try is switch rows and columns. If you have a 10 × 50,000 matrix and you lay it down one row after another, then operations on rows will be more efficient than operations on columns because they'll have better locality. You might also want to consider std::valarray as that container should optimize certain math operations on vector data.
As has been said, using indices vs. pointers shouldn't matter as far as efficiency is concerned. Indices could be more readable.
A very C++ thing you might want to do (which shouldn't have any effects on efficiency, just code readability) is wrap the vector in a container that makes it behave like a 2D matrix but uses a contiguous 1D vector underneath. Take a look at How can I use a std::valarray to store/manipulate a contiguous 2D array? for inspiration.
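As a taste of what std::valarray offers for this layout, std::slice lets you address one row of the flat buffer (a sketch reusing NumClusters and NumObs from the question):
#include <valarray>

const std::size_t NumClusters = 10, NumObs = 50000;   // sizes from the question
std::valarray<double> U(0.0, NumClusters * NumObs);   // flat row-major storage

// Row i as a slice: start at i*NumObs, NumObs elements, stride 1.
std::size_t i = 3;
U[std::slice(i * NumObs, NumObs, 1)] = 1.0;           // fill row i in place
std::valarray<double> row = U[std::slice(i * NumObs, NumObs, 1)]; // copy of row i
double mean = row.sum() / NumObs;                     // built-in math helpers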
When you store a 2D matrix as vector<vector<int>>, you have to dereference two pointers sequentially to access an element (double indirection). That's why most libraries store matrices as a linear array, vector<int>, as you do now. In this case only a single indirection is used, and all the data is stored in a more compact layout in memory.
Now regarding the fastest access: ideally, all three forms of access you mention are equally fast. However, no compiler is perfect; some may have issues with inlining deep calls (at least MSVC seems to have such issues occasionally). That's why, if you want to ensure maximal speed, you should avoid using any C++ abstractions inside your inner loop. Use only pointers and indices, and that will indeed be the fastest way. Note, however, that most likely there will be no speedup over the other methods (the generated assembly may well be absolutely equal).
As a conclusion, this way is the fastest for me:
auto ptr = matrix.data();
auto num = matrix.size();
for (size_t i = 0; i < num; i++)
ptr[i] = ...; //do whatever complex math you have

Bitshifting elements in an array

I have an assignment in which I must read a list of 4000 names from a text file and sort them into a C-style array as they're being read in (rather than reading them all and then sorting). Since this is going to involve a lot of elements changing indices, would it be possible to use bitshifting to rearrange large quantities of elements simultaneously? For example,
declare a heap based array of 20 size
place variable x index 10
perform a bitshift on index 9 with the size of the array data type so that x is now in index 11
Also, if you have any tips on the task in general I'd appreciate it.
No, that doesn't sound at all like something you'd use bitshifting for.
You will have distinct elements (the names) stored in an array, and you need to change the order of entire elements. This is not what bitshifting is used for; it is used to move the bits in a single integer to the left or to the right.
You should just learn qsort().
Not sure about the "sort as they're being read in" requirement, but the easiest solution would be to just call qsort() as each name is added. If that's not allowed or deemed too expensive, think about how to do a "sorted insert" against an array.
By the way, a typical approach in C would be to work with an array of pointers to strings, rather than an array of actual strings. This is good, since sorting an array of pointers is much easier.
So you would have:
char *names[4000];
instead of
char names[4000][64 /* or whatever */];
This would require you to dynamically allocate space for each name as it's loaded, though, which isn't too hard. Especially not if you have strdup(). :)
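For reference, sorting such an array of pointers with qsort() looks roughly like this (a sketch; names and count come from the example above):
#include <stdlib.h>
#include <string.h>

/* qsort passes pointers to the elements; each element is itself a char*. */
int compare_names(const void *a, const void *b)
{
    return strcmp(*(const char **)a, *(const char **)b);
}

/* somewhere after loading: */
qsort(names, count, sizeof(char *), compare_names);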
If using qsort() is not allowed (it would be pretty wasteful to call it after every insert anyway), you could write your own insertion sort. It's not a very efficient way of sorting large arrays, but I suppose it's what your teacher is expecting.
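A sorted insert against the pointer array could look like this (a sketch; count is the number of names stored so far):
#include <string.h>

/* Insert name into names[0..count-1], which is kept sorted at all times. */
void sorted_insert(char *names[], int *count, char *name)
{
    int i = *count;
    /* Shift larger entries one slot right until name's position is found. */
    while (i > 0 && strcmp(names[i - 1], name) > 0) {
        names[i] = names[i - 1];
        --i;
    }
    names[i] = name;
    ++(*count);
}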

Choice of the most performant container (array)

This is my little big question about containers, in particular, arrays.
I am writing a physics code that mainly manipulates a big (> 1 000 000) set of "particles" (with 6 double coordinates each). I am looking for the best way (in terms of performance) to implement a class that will contain a container for these data and provide manipulation primitives for them (e.g. instantiation, operator[], etc.).
There are a few restrictions on how this set is used:
its size is read from a configuration file and won't change during execution
it can be viewed as a big two dimensional array of N (e.g. 1 000 000) lines and 6 columns (each one storing the coordinate in one dimension)
the array is manipulated in one big loop: each "particle / line" is accessed, computation takes place with its coordinates, the results are stored back for this particle, and so on for each particle and for each iteration of the big loop.
no new elements are added or deleted during the execution
First conclusion: as the elements are essentially accessed one by one with [], I think I should use a normal dynamic array.
I have explored a few options, and I would like to have your opinion on the one that can give me the best performance.
As I understand it, there is no advantage to using a manually allocated array instead of a std::vector, so things like double** array2d = new ..., loops of new, etc. are ruled out.
So is it a good idea to use std::vector<double> ?
If I use a std::vector, should I create a two-dimensional array like std::vector<std::vector<double> > my_array that can be indexed like my_array[i][j], or is that a bad idea and would it be better to use std::vector<double> other_array and access it with other_array[6*i+j]?
Maybe this can give better performance, especially as the number of columns is fixed and known from the beginning.
If you think that this is the best option, would it be possible to wrap this vector so that it can be accessed with an index operator defined as other_array[i,j] // same as other_array[6*i+j], without overhead (like a function call at each access)?
Another option, the one that I am using so far is to use Blitz, in particular blitz::Array:
typedef blitz::Array<double,TWO_DIMENSIONS> store_t;
store_t my_store;
Where my elements are accessed like that: my_store(line, column);.
I think there is not much advantage to using Blitz in my case, because I am accessing each element one by one, and Blitz would be interesting if I were using operations directly on arrays (like matrix multiplication), which I am not.
Do you think that Blitz is OK, or is it useless in my case?
These are the possibilities I have considered so far, but maybe the best one is still another one, so don't hesitate to suggest other options.
Thanks a lot for your help with this problem!
Edit:
From the very interesting answers and comments below, a good solution seems to be the following:
Use a particle structure (containing 6 doubles) or a static array of 6 doubles (this avoids the use of two-dimensional dynamic arrays).
Use a vector or a deque of this particle structure or array. It is then good to traverse them with iterators, which will allow switching from one to the other later.
In addition I can also use a Blitz::TinyVector<double,6> instead of a structure.
So is it a good idea to use std::vector<double> ?
Usually, a std::vector should be the first choice of container. You could use either std::vector<>::reserve() or std::vector<>::resize() to avoid reallocations while populating the vector. Whether any other container is better can be found by measuring. And only by measuring. But first measure whether anything the container is involved in (populating, accessing elements) is worth optimizing at all.
If I use a std::vector, should I create a two dimensional array like std::vector<std::vector<double> > [...]?
No. IIUC, you are accessing your data per particle, not per row. If that's the case, why not use a std::vector<particle>, where particle is a struct holding six values? And even if I understood incorrectly, you should rather write a two-dimensional wrapper around a one-dimensional container. Then align your data either in rows or columns - whatever is faster with your access patterns.
Do you think that Blitz is OK, or is it useless in my case?
I have no practical knowledge about blitz++ and the areas it is used in. But isn't blitz++ all about expression templates that unroll loop operations and optimize away temporaries when doing matrix manipulations? ICBWT.
First of all, you don't want to scatter the coordinates of one given particle all over the place, so I would begin by writing a simple struct:
struct Particle { /* coords */ };
Then we can make a simple one dimensional array of these Particles.
I would probably use a deque, because that's the default container, but you may wish to try a vector: 1 000 000 particles at 6 doubles each is roughly a single contiguous chunk of about 48 MB. It should hold, but it might strain your system if this ever grows, while the deque will allocate several smaller chunks.
WARNING:
As Alexandre C remarked, if you go the deque route, refrain from using operator[] and prefer iterator-style traversal. If you really need random access and it's performance sensitive, the vector should prove faster.
The first rule when choosing a container is to use std::vector. Then, only after your code is complete and you can actually measure performance, you can try other containers. But stick to vector first. (And use reserve() from the start.)
Then, you shouldn't use a std::vector<std::vector<double> >. You know the size of your data: it's 6 doubles. There is no need for it to be dynamic; it is constant and fixed. You can define a struct to hold your particle members (the six doubles), or use std::array<double, 6>; note that a plain typedef double particle[6] won't do here, because raw array types are not copyable and therefore cannot be vector elements. Then use a vector of particles, std::vector<particle>, as sketched below.
Furthermore, as your program uses the particle data contained in the vector sequentially, you will take full advantage of the modern CPU's cache read-ahead.
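A sketch of that with std::array (C++11), which is copyable and so works as a vector element:
#include <array>
#include <vector>

typedef std::array<double, 6> particle;   // fixed-size, contiguous, copyable

std::vector<particle> particles(1000000); // one contiguous block, zero-initialized
particles[0][3] = 1.5;                    // coordinate 3 of the first particle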
You could go several ways. But in your case, don't declare a std::vector<std::vector<double> >. You'd be allocating a vector (and copying it around) for every 6 doubles. That's way too costly.
If you think that this is the best option, would it be possible to wrap this vector so that it can be accessed with an index operator defined as other_array[i,j] // same as other_array[6*i+j], without overhead (like a function call at each access)?
(other_array[i,j] won't work too well, as i,j employs the comma operator to evaluate the value of "i", then discards that and evaluates and returns "j", so it's equivalent to other_array[i]).
You will need to use one of:
other_array[i][j]
other_array(i, j) // if other_array implements operator()(int, int),
// but std::vector<> et al don't.
other_array[i].identifier // identifier is a member variable
other_array[i].identifier() // member function getting value
other_array[i].identifier(double) // member function setting value
You may or may not prefer to put get_ and set_ or similar on the last two functions should you find them useful, but from your question I think you won't: functions are preferred in APIs between parts of large systems involving many developers, or when the data items may vary and you want the algorithms working on the data to be independent thereof.
So, a good test: if you find yourself writing code like other_array[i][3] where you've decided "3" is the double with the speed in it, and other_array[i][5] because "5" is the acceleration, then stop doing that and give them proper identifiers so you can say other_array[i].speed and .acceleration. Then other developers can read and understand it, and you're much less likely to make accidental mistakes. On the other hand, if you are iterating over those 6 elements doing exactly the same things to each, then you probably do want Particle to hold a double[6], or to provide an operator[](int). There's no problem doing both:
struct Particle
{
    double x[6];
    double& speed()        { return x[3]; }
    double  speed() const  { return x[3]; }
    double& acceleration() { return x[5]; }
    ...
};
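Used like this (a sketch; the timestep update is just an example):
std::vector<Particle> particles(1000000);
const double dt = 0.01;                    // illustrative timestep
for (Particle& p : particles)
    p.speed() += p.acceleration() * dt;    // named access instead of x[3], x[5]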
BTW / the reason that vector<vector<double> > may be too costly is that each set of 6 doubles will be allocated on the heap, and for fast allocation and deallocation many heap implementations use fixed-size buckets, so your small request will be rounded up to the next bucket size: that may be a significant overhead. The outer vector will also need to record an extra pointer to that memory. Further, heap allocation and deallocation are relatively slow - in your case, you'd only be doing it at startup and shutdown, but there's no particular point in making your program slower for no reason. Even more importantly, the areas on the heap may jump around in memory, so your operator[] may have cache faults pulling in more distinct memory pages than necessary, slowing the entire program. Put another way, vectors store their elements contiguously, but the pointed-to vectors may not be contiguous with each other.