Vector optimization - simple method - C++

I have a doubt about the following case:
Suppose I want to define a vector of vectors to accommodate a set of elements, add data to it, and use those elements to compute something else. After that I don't need the vector anymore. Later, if I want to accommodate another set of data as a vector of vectors, I can reuse the previously created variable. Then:
(1) If I create my vector of vectors with dynamic memory and delete it:
vector<vector<double> > *myvector = new vector<vector<double> >;
// push back and use it
delete myvector;
and then reuse it again, or
(2) If I create my vector of vectors simply as
vector<vector<double> > myvector;
//do push back and use it
myvector.clear();
and then reuse it again.
But I guess in both methods a little memory remains allocated even though we have removed the elements. So I would like to know what the efficient way to define this vector of vectors is.
(3) If the size of the inner vector is always 2, is it still efficient to use a vector of vectors, defined as
vector<vector<double> > myvector(my_list.size(), vector<double>(2));
rather than another container type?
(4) What if I use another predefined class to hold the 2 inner elements and then take a vector of objects of that type (for example, XY is a class that can hold 2 elements, perhaps as an array):
vector<XY> myvector;
I hope someone can comment on which would be the most efficient method (from 1-4) in terms of speed and memory. If there are better ways, please suggest them too. Thanks.

If you only need two elements the most efficient way is probably:
std::vector<std::tr1::array<double, 2> > myvector;
// or std::vector<std::pair<double, double> > myvector;
// use it
myvector.clear(); // This will not deallocate any memory, so what has already been allocated will be reused for future push_backs

If this is homework (or not) and you want to know which is faster, then you should try it: run the "push back and clear" or "new and delete" code in a loop a few million times and time the execution. I suspect (2) will be a bit faster.
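For reference, a minimal timing sketch of that comparison (this uses C++11 <chrono>; the element and iteration counts are arbitrary assumptions):
#include <chrono>
#include <iostream>
#include <vector>

int main() {
    const int iterations = 1000;   // assumed workload, adjust as needed
    const int elements   = 1000;

    // Method (2): one vector reused with clear(); capacity is kept between iterations.
    std::vector<std::vector<double> > v;
    const std::chrono::steady_clock::time_point t0 = std::chrono::steady_clock::now();
    for (int iter = 0; iter < iterations; ++iter) {
        for (int i = 0; i < elements; ++i)
            v.push_back(std::vector<double>(2, 1.0));
        v.clear();
    }
    const std::chrono::steady_clock::time_point t1 = std::chrono::steady_clock::now();

    // Method (1): new/delete every time; all memory is released and re-acquired.
    for (int iter = 0; iter < iterations; ++iter) {
        std::vector<std::vector<double> >* p = new std::vector<std::vector<double> >;
        for (int i = 0; i < elements; ++i)
            p->push_back(std::vector<double>(2, 1.0));
        delete p;
    }
    const std::chrono::steady_clock::time_point t2 = std::chrono::steady_clock::now();

    std::cout << "clear():    " << std::chrono::duration<double>(t1 - t0).count() << " s\n"
              << "new/delete: " << std::chrono::duration<double>(t2 - t1).count() << " s\n";
}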
You say you want to accommodate a set of elements, and that the size of the inner vector is 2.
That is a bit ambiguous, but you can further examine 2 things:
1. std::pair, if your "element" contains 2 other "things"
2. std::map, if you want to reference one of the "elements" based on the value of another "element"
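For illustration, a rough sketch of both options (what the "elements" mean here is only an assumption):
#include <map>
#include <utility>
#include <vector>

int main() {
    // 1. Each element holds two related values, e.g. an (x, y) point.
    std::vector<std::pair<double, double> > points;
    points.push_back(std::make_pair(1.0, 2.0));

    // 2. Look one value up by another, e.g. y by x.
    std::map<double, double> y_by_x;
    y_by_x[1.0] = 2.0;
    double y = y_by_x[1.0]; // y == 2.0
    (void)y;
}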

Related

Overwriting vector data

I've got a function that uses a 2D string vector. The function should read data from a file, store it into vector A, and push back A into B. Then the loop gets new values for A, which should overwrite the previous ones. How am I supposed to do this? Is there any way to "reset" the vector, so that vector.push_back() inserts data from the beginning instead of appending it to the end?
Simply use a single-dimension vector of std::string and call std::vector::clear to remove the old content before inserting the next.
Notice that clear just removes all the elements, but leaves the capacity unchanged.
If you really have to keep the two-dimensional vector, then just apply what I said to vector[0], or to whichever vector[i] you need to work on.
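A minimal sketch of that pattern (the file name and the one-token-per-line parsing are assumptions, not from the question):
#include <fstream>
#include <string>
#include <vector>

int main() {
    std::ifstream in("data.txt");            // assumed input file
    std::vector<std::string> a;              // reused buffer
    std::vector<std::vector<std::string> > b;

    std::string line;
    while (std::getline(in, line)) {
        a.clear();                           // drop old content, keep capacity
        a.push_back(line);                   // placeholder for real parsing of 'line'
        b.push_back(a);                      // B stores a copy of the current A
    }
}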

Empty vector in large array of objects

I have a large array of objects (tens of millions) of class A, and I want to add a vector as a member to class A. This vector is needed for just a few percent of the objects in the array. Would it be a wise choice to add the vector to the class? How much memory does an empty vector take?
Now, an empty vector is not very "big" (in VC2012 x64, IntelliSense shows sizeof(std::vector<int>) as 16 bytes). If sizeof(A) is much bigger than the size of a vector, adding a vector member to A could be a good solution for you. But if it is not, and it would add too much memory while only a few A objects actually need the vector, I'd create a second container to hold the vectors. For example:
#include <unordered_map>
std::unordered_map<std::size_t, std::vector<T> > VectorForA;
where size_t is meant to be the index type of the big array of A, and vector<T> is the type of the vector you want to add to A. This works well for a "big" array with fixed indices. If the A's in the big array don't have fixed positions, making the value of A the key could lead to simpler code (but only if the values of A don't repeat).
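A rough sketch of how the lookup could then work (the names and element type are made up for illustration):
#include <cstddef>
#include <unordered_map>
#include <vector>

struct A { /* existing members, no vector inside */ };

int main() {
    std::vector<A> big_array(1000);                        // the big, fixed-index array of A
    std::unordered_map<std::size_t, std::vector<int> > vector_for_a;

    vector_for_a[42].push_back(7);                         // only element 42 gets a vector

    auto it = vector_for_a.find(42);
    if (it != vector_for_a.end()) {
        // this A has an associated vector; use it->second here
    }
}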
NOTE: I was (I'm) waiting to see a full answer from @Andy Prowl or @Tony D, which I think will be very useful.

Is it okay to use constructors to initialize a 2D Vector as a one-liner in C++?

Is it okay to initialize a 2D vector like this (here all values in a 5x4 2D vector are initialized to 3)?
std::vector<std::vector<int> > foo(5, std::vector<int>(4, 3));
This seems to behave okay, but everywhere I look on the web people seem to recommend initializing such a vector with for loops and push_back(). I was initially afraid that all rows here would point to the same vector, but that doesn't seem to be the case. Am I missing something?
This is perfectly valid - You'll get a 2D vector ([5, 4] elements) with every element initialized to 3.
For most other cases (where you e.g. want different values in different elements) you cannot use any one-liner - and therefore need loops.
Well, the code is valid and it indeed does what you want it to do (assuming I understood your intent correctly).
However, doing it that way is generally inefficient (at least in the current version of the language/library). The above initialization creates a temporary vector and then initializes the individual sub-vectors by copying that original, which can be rather costly. For this reason, in many cases it is preferable to construct the top-level vector by itself
std::vector<std::vector<int> > foo(5);
and then iterate over it and build its individual sub-vectors in-place by doing something like
foo[i].resize(4, 3);
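Putting the two steps together, a minimal sketch (the 5x4 size and fill value 3 are taken from the question):
#include <cstddef>
#include <vector>

int main() {
    std::vector<std::vector<int> > foo(5);   // construct the outer vector first
    for (std::size_t i = 0; i < foo.size(); ++i)
        foo[i].resize(4, 3);                 // build each row in place: {3, 3, 3, 3}
}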

Choice of the most performant container (array)

This is my little big question about containers, in particular, arrays.
I am writing a physics code that mainly manipulates a big (> 1 000 000) set of "particles" (with 6 double coordinates each). I am looking for the best way (in terms of performance) to implement a class that will contain a container for these data and that will provide manipulation primitives for them (e.g. instantiation, operator[], etc.).
There are a few restrictions on how this set is used:
its size is read from a configuration file and won't change during execution
it can be viewed as a big two dimensional array of N (e.g. 1 000 000) lines and 6 columns (each one storing the coordinate in one dimension)
the array is manipulated in a big loop: each "particle / line" is accessed, computation takes place with its coordinates, the results are stored back for this particle, and so on for each particle and each iteration of the big loop
no new elements are added or deleted during the execution
First conclusion: as access to the elements is essentially done by accessing each element one by one with [], I think that I should use a normal dynamic array.
I have explored a few things, and I would like to have your opinion on the one that can give me the best performances.
As I understand it, there is no advantage to using a dynamically allocated array instead of a std::vector, so things like double** array2d = new ..., loops of new, etc. are ruled out.
So is it a good idea to use std::vector<double> ?
If I use a std::vector, should I create a two-dimensional array like std::vector<std::vector<double> > my_array that can be indexed like my_array[i][j], or is that a bad idea and would it be better to use std::vector<double> other_array and access it with other_array[6*i+j]?
Maybe this can give better performance, especially as the number of columns is fixed and known from the beginning.
If you think that this is the best option, would it be possible to wrap this vector so that it can be accessed with an index operator defined as other_array[i,j] // same as other_array[6*i+j], without overhead (like a function call at each access)?
Another option, the one that I am using so far is to use Blitz, in particular blitz::Array:
typedef blitz::Array<double,TWO_DIMENSIONS> store_t;
store_t my_store;
Where my elements are accessed like that: my_store(line, column);.
I think there is not much advantage to using Blitz in my case because I am accessing each element one by one, and Blitz would be interesting if I were using operations directly on arrays (like matrix multiplication), which I am not.
Do you think that Blitz is OK, or is it useless in my case ?
These are the possibilities I have considered so far, but maybe the best one is still another, so don't hesitate to suggest other things.
Thanks a lot for your help on this problem !
Edit:
From the very interesting answers and comments below, a good solution seems to be the following:
Use a particle structure (containing 6 doubles) or a static array of 6 doubles (this avoids the use of two-dimensional dynamic arrays).
Use a vector or a deque of this particle structure or array. It is then good to traverse them with iterators, which will make it possible to switch from one to the other later.
In addition, I can also use a blitz::TinyVector<double,6> instead of a structure.
So is it a good idea to use std::vector<double> ?
Usually, a std::vector should be the first choice of container. You could use either std::vector<>::reserve() or std::vector<>::resize() to avoid reallocations while populating the vector. Whether any other container is better can be found by measuring. And only by measuring. But first measure whether anything the container is involved in (populating, accessing elements) is worth optimizing at all.
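For example, a minimal reserve() sketch (here n would come from your configuration file):
#include <cstddef>
#include <vector>

void fill(std::size_t n) {
    std::vector<double> coords;
    coords.reserve(n * 6);                   // one allocation up front, no reallocation below
    for (std::size_t i = 0; i < n * 6; ++i)
        coords.push_back(0.0);               // placeholder for the real coordinate values
}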
If I use a std::vector, should I create a two dimensional array like std::vector<std::vector<double> > [...]?
No. IIUC, you are accessing your data per particle, not per row. If that's the case, why not use a std::vector<particle>, where particle is a struct holding six values? And even if I understood incorrectly, you should rather write a two-dimensional wrapper around a one-dimensional container (see the sketch below). Then align your data either in rows or columns, whichever is faster with your access patterns.
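As a rough illustration of such a wrapper (the class and member names here are made up):
#include <cstddef>
#include <vector>

// Minimal sketch of a 2D view over one flat, contiguous buffer.
class Grid2D {
public:
    Grid2D(std::size_t rows, std::size_t cols)
        : cols_(cols), data_(rows * cols) {}

    double& operator()(std::size_t i, std::size_t j)       { return data_[i * cols_ + j]; }
    double  operator()(std::size_t i, std::size_t j) const { return data_[i * cols_ + j]; }

private:
    std::size_t cols_;
    std::vector<double> data_;
};

// Usage: Grid2D particles(1000000, 6); then particles(i, j) reads or writes one coordinate.
Since the accessor is defined in the class body, the compiler will normally inline it, so there is no function-call overhead per access.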
Do you think that Blitz is OK, or is it useless in my case?
I have no practical knowledge about blitz++ and the areas it is used in. But isn't blitz++ all about expression templates that unroll loop operations and optimize away temporaries when doing matrix manipulations? ICBWT.
First of all, you don't want to scatter the coordinates of one given particle all over the place, so I would begin by writing a simple struct:
struct Particle { /* coords */ };
Then we can make a simple one dimensional array of these Particles.
I would probably use a deque, because that's the default container, but you may wish to try a vector; it's just that 1,000,000 particles means a single contiguous chunk of several tens of MB. It should hold, but it might strain your system if this ever grows, while the deque will allocate several chunks.
WARNING:
As Alexandre C remarked, if you go the deque road, refrain from using operator[] and prefer to use iteration style. If you really need random access and it's performance sensitive, the vector should prove faster.
The first rule when choosing a container is to use std::vector. Then, only after your code is complete and you can actually measure performance, you can try other containers. But stick to vector first. (And use reserve() from the start.)
Then, you shouldn't use a std::vector<std::vector<double> >. You know the size of your data: it's 6 doubles. There is no need for it to be dynamic; it is constant and fixed. You can define a struct to hold your particle members (the six doubles), or use a fixed-size array wrapper such as std::tr1::array<double, 6> (a plain typedef double particle[6] cannot be stored directly in a vector, because built-in arrays are not copy-assignable). Then use a vector of particles: std::vector<particle>.
Furthermore, as your program uses the particle data contained in the vector sequentially, you will take advantage of the modern CPU cache read-ahead feature at its best performance.
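A minimal sketch of that layout and the sequential pass (the member names and the update are placeholders):
#include <cstddef>
#include <vector>

struct particle {
    double x, y, z, vx, vy, vz;              // six coordinates per particle
};

void step(std::vector<particle>& particles, double dt) {
    // One contiguous, sequential pass: friendly to the CPU cache and prefetcher.
    for (std::size_t i = 0; i < particles.size(); ++i) {
        particles[i].x += particles[i].vx * dt;
        particles[i].y += particles[i].vy * dt;
        particles[i].z += particles[i].vz * dt;
    }
}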
You could go several ways. But in your case, don't declare a std::vector<std::vector<double> >. You would be allocating a vector (and copying it around) for every 6 doubles. That's way too costly.
If you think that this is the best option, would it be possible to wrap this vector so that it can be accessed with an index operator defined as other_array[i,j] // same as other_array[6*i+j], without overhead (like a function call at each access)?
(other_array[i,j] won't work too well: i,j employs the comma operator, which evaluates "i", discards that value, then evaluates and returns "j", so the expression is equivalent to other_array[j].)
You will need to use one of:
other_array[i][j]
other_array(i, j) // if other_array implements operator()(int, int),
// but std::vector<> et al don't.
other_array[i].identifier // identifier is a member variable
other_array[i].identifier() // member function getting value
other_array[i].identifier(double) // member function setting value
You may or may not prefer to put get_ and set_ (or similar) on the last two functions should you find them useful, but from your question I think you won't: accessor functions are preferred in APIs between parts of large systems involving many developers, or when the data items may vary and you want the algorithms working on the data to be independent of them.
So, a good test: if you find yourself writing code like other_array[i][3] where you've decided "3" is the double with the speed in it, and other_array[i][5] because "5" is the acceleration, then stop doing that and give them proper identifiers so you can say other_array[i].speed and .acceleration. Then other developers can read and understand it, and you're much less likely to make accidental mistakes. On the other hand, if you are iterating over those 6 elements doing exactly the same thing to each, then you probably do want Particle to hold a double[6], or to provide an operator[](int). There's no problem doing both:
struct Particle
{
double x[6];
double& speed() { return x[3]; }
double speed() const { return x[3]; }
double& acceleration() { return x[5]; }
...
};
BTW, the reason that vector<vector<double> > may be too costly is that each set of 6 doubles will be allocated on the heap, and for fast allocation and deallocation many heap implementations use fixed-size buckets, so your small request will be rounded up to the next size: that may be a significant overhead. The outer vector will also need to record an extra pointer to that memory. Further, heap allocation and deallocation are relatively slow; in your case you'd only be doing it at startup and shutdown, but there's no particular point in making your program slower for no reason. Even more importantly, the areas on the heap may jump around in memory, so your operator[] may suffer cache misses, pulling in more distinct memory pages than necessary and slowing the entire program. Put another way, vectors store their elements contiguously, but the pointed-to vectors may not be contiguous with each other.

Passing multidimensional array back through access members

I have a class "foo" that has a multi-dimensional array and I need to provide a copy of the array through a getArray member. Is there a nice way of doing this when the array is dynamically created? I cannot pass the array back as a const reference, because the array is always being deleted, recreated, etc. I thought about creating a new dynamic array to pass back, but is that acceptable, given that the calling code would need to know to delete it?
Return an object, not a naked array. The object can have a copy constructor, destructor etc. which will do the copying, deletion etc. for the user.
class Matrix {
// handle creation and access to your multidim array
// including copying, deletion etc.
};
class A { // your class
Matrix m; // the class's matrix
Matrix getArray() {
return m;
}
};
The easy answer to your question is: no, this is not a good design, as the creating class should be the one that handles the deletion/release of the array.
The main point is: why do you keep deleting/recreating this multi-dimensional array? Can you not create one instance and then just modify it when need be?
Personally, I would return the array as it is, iterate over it, and do any calculations/functions on it during the loop, thereby saving resources by not creating/deleting the array.
Neil's probably the best answer. The second best will be not to use an array. In C++, when you talk about dynamic array, it means vector.
There are two possibilities:
nested vectors: std::vector<std::vector<int> >(10, std::vector<int>(20))
simple vector: std::vector<int>(200)
Both will have 200 items. The first is clearly multi-dimensional, while the second leaves you the task of computing offsets.
The second asks for more work but performs better memory-wise, since a single big chunk is allocated instead of one small chunk pointing to ten medium ones...
But as Neil said, a proper class of your own to define the exact set of operations is better :)
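For illustration, a rough sketch of such a class over the flat layout (all names here are made up):
#include <cstddef>
#include <vector>

// Minimal sketch: a fixed-size 2D matrix backed by one contiguous vector.
class Matrix {
public:
    Matrix(std::size_t rows, std::size_t cols)
        : rows_(rows), cols_(cols), data_(rows * cols) {}

    int& at(std::size_t i, std::size_t j)       { return data_[i * cols_ + j]; }
    int  at(std::size_t i, std::size_t j) const { return data_[i * cols_ + j]; }

    std::size_t rows() const { return rows_; }
    std::size_t cols() const { return cols_; }

private:
    std::size_t rows_, cols_;
    std::vector<int> data_;
};
Copying, assignment and destruction come for free from std::vector, so a getArray() member can safely return a Matrix by value, and the caller never has to delete anything.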