How to pre-allocate memory for a growing Eigen::MatrixXd

How to pre-allocate memory for a growing Eigen::MatrixXd - c++

I have a growing database in form of an Eigen::MatrixXd. My matrix starts empty and gets rows added one by one until it reaches a maximum predefined (known at compile time) number of rows.
At the moment I grow it like that (from the Eigen docs and many posts here and elsewhere):
MatrixXd new_database(database.rows()+1, database.cols());
new_database << database, new_row;
database = new_database;
But this seems way more inefficient than it needs to be, since it makes a lot of useless memory reallocation and data copying every time a new row is added... Seems like I should be able to pre-allocate a bunch of memory of size MAX_ROWS*N_COLS and let the matrix grow in it, however I can't find an equivalent of std::vector's capacity with Eigen.
Note: I may need to use the matrix at anytime before it is actually full. So I do need a distinction between what would be its size and what would be its capacity.
How can I do this?
EDIT 1: I see there is a MaxSizeAtCompileTime but I find the doc rather unclear with no examples. Anyone knows if this is the way to go, how to use this parameter and how it would interact with the resize and conservativeResize?
EDIT 2: C++: Eigen conservativeResize too expensive? provides another interesting approach while raising question regarding non contiguous data... Anyone has some good insight on that matter?

First thing I want to mention before I forget, is that you may want to consider using a row major matrix for storage.
The simplest (and probably best) solution to your question would be to use block operations to access the top rows.
#include <Eigen/Core>
#include <iostream>
using namespace Eigen;
int main(void)
{
const int rows = 5;
const int cols = 6;
MatrixXd database(rows, cols);
database.setConstant(-1.0);
std::cout << database << "\n\n";
for (int i = 0; i < rows; i++)
{
database.row(i) = VectorXd::Constant(cols, i);
// Use block operations instead of the full matrix
std::cout << database.topRows(i+1) << "\n\n";
}
std::cout << database << "\n\n";
return 0;
}
Instead of just printing the matrix, you could do whatever operations you require.

Related

Copying vector elements to a vector pair

In my C++ code,
vector <string> strVector = GetStringVector();
vector <int> intVector = GetIntVector();
So I combined these two vectors into a single one,
void combineVectors(vector<string>& strVector, vector <int>& intVector, vector < pair <string, int>>& pairVector)
{
for (int i = 0; i < strVector.size() || i < intVector.size(); ++i )
{
pairVector.push_back(pair<string, int> (strVector.at(i), intVector.at(i)));
}
}
Now this function is called like this,
vector <string> strVector = GetStringVector();
vector <int> intVector = GetIntVector();
vector < pair <string, int>> pairVector
combineVectors(strVector, intVector, pairVector);
//rest of the implementation
The combineVectors function uses a loop to add the elements of other 2 vectors to the vector pair. I doubt this is a efficient way as this function gets called hundrands of times passing different data. This might cause a performance issue because everytime it goes through the loop.
My goal is to copy both the vectors in "one go" to the vector pair. i.e., without using a loop. Am not sure whether that's even possible.
Is there a better way of achieving this without compromising the performance?

You have clarified that the arrays will always be of equal size. That's a prerequisite condition.
So, your situation is as follows. You have vector A over here, and vector B over there. You have no guarantees whether the actual memory that vector A uses and the actual memory that vector B uses are next to each other. They could be anywhere.
Now you're combining the two vectors into a third vector, C. Again, no guarantees where vector C's memory is.
So, you have really very little to work with, in terms of optimizations. You have no additional guarantees whatsoever. This is pretty much fundamental: you have two chunks of bytes, and those two chunks need to be copied somewhere else. That's it. That's what has to be done, that's what it all comes down to, and there is no other way to get it done, other than doing exactly that.
But there is one thing that can be done to make things a little bit faster. A vector will typically allocate memory for its values in incremental steps, reserving some extra space, initially, and as values get added to the vector, one by one, and eventually reach the vector's reserved size, the vector has to now grab a new larger block of memory, copy everything in the vector to the larger memory block, then delete the older block, and only then add the next value to the vector. Then the cycle begins again.
But you know, in advance, how many values you are about to add to the vector, so you simply instruct the vector to reserve() enough size in advance, so it doesn't have to repeatedly grow itself, as you add values to it. Before your existing for loop, simply:
pairVector.reserve(pairVector.size()+strVector.size());
Now, the for loop will proceed and insert new values into pairVector which is guaranteed to have enough space.
A couple of other things are possible. Since you have stated that both vectors will always have the same size, you only need to check the size of one of them:
for (int i = 0; i < strVector.size(); ++i )
Next step: at() performs bounds checking. This loop ensures that i will never be out of bounds, so at()'s bound checking is also some overhead you can get rid of safely:
pairVector.push_back(pair<string, int> (strVector[i], intVector[i]));
Next: with a modern C++ compiler, the compiler should be able to optimize away, automatically, several redundant temporaries, and temporary copies here. It's possible you may need to help the compiler, a little bit, and use emplace_back() instead of push_back() (assuming C++11, or later):
pairVector.emplace_back(strVector[i], intVector[i]);
Going back to the loop condition, strVector.size() gets evaluated on each iteration of the loop. It's very likely that a modern C++ compiler will optimize it away, but just in case you can also help your compiler check the vector's size() only once:
int i=strVector.size();
for (int i = 0; i < n; ++i )
This is really a stretch, but it might eke out a few extra quantums of execution time. And that pretty much all obvious optimizations here. Realistically, the most to be gained here is by using reserve(). The other optimizations might help things a little bit more, but it all boils down to moving a certain number of bytes from one area in memory to another area. There aren't really special ways of doing that, that's faster than other ways.

We can use std:generate() to achieve this:
#include <bits/stdc++.h>
using namespace std;
vector <string> strVector{ "hello", "world" };
vector <int> intVector{ 2, 3 };
pair<string, int> f()
{
static int i = -1;
++i;
return make_pair(strVector[i], intVector[i]);
}
int main() {
int min_Size = min(strVector.size(), intVector.size());
vector< pair<string,int> > pairVector(min_Size);
generate(pairVector.begin(), pairVector.end(), f);
for( int i = 0 ; i < 2 ; i++ )
cout << pairVector[i].first <<" " << pairVector[i].second << endl;
}

I'll try and summarize what you want with some possible answers depending on your situation. You say you want a new vector that is essentially a zipped version of two other vectors which contain two heterogeneous types. Where you can access the two types as some sort of pair?
If you want to make this more efficient, you need to think about what you are using the new vector for? I can see three scenarios with what you are doing.
The new vector is a copy of your data so you can do stuff with it without affecting the original vectors. (ei you still need the original two vectors)
The new vector is now the storage mechanism for your data. (ei you
no longer need the original two vectors)
You are simply coupling the vectors together to make use and representation easier. (ei where they are stored doesn't actually matter)
1) Not much you can do aside from copying the data into your new vector. Explained more in Sam Varshavchik's answer.
3) You do something like Shakil's answer or here or some type of customized iterator.
2) Here you make some optimisations here where you do zero coping of the data with the use of a wrapper class. Note: A wrapper class works if you don't need to use the actual std::vector < std::pair > class. You can make a class where you move the data into it and create access operators for it. If you can do this, it also allows you to decompose the wrapper back into the original two vectors without copying. Something like this might suffice.
class StringIntContainer {
public:
StringIntContaint(std::vector<std::string>& _string_vec, std::vector<int>& _int_vec)
: string_vec_(std::move(_string_vec)), int_vec_(std::move(_int_vec))
{
assert(string_vec_.size() == int_vec_.size());
}
std::pair<std::string, int> operator[] (std::size_t _i) const
{
return std::make_pair(string_vec_[_i], int_vec_[_i]);
}
/* You may want methods that return reference to data so you can edit it*/
std::pair<std::vector<std::string>, std::vector<int>> Decompose()
{
return std::make_pair(std::move(string_vec_), std::move(int_vec_[_i])));
}
private:
std::vector<std::string> _string_vec_;
std::vector<int> int_vec_;
};

Why a program starts slow but later gains the full speed?

I noticed that sometimes a program runs very slow but later the performance is good. For example, I have some code which I run in a loop and the first iteration takes ages but other iterations of the same code runs pretty fast. It's hard to name the circumstances because I can't figure it out and it seems that even single literal can affect this behavior. I prepared a small code snippet:
#include <chrono>
#include <vector>
#include <iostream>
using namespace std;
int main()
{
const int num{ 100000 };
vector<vector<int>> octs;
for (int i{ 0 }; i < num; ++i)
{
octs.emplace_back(vector<int>{ 42 });
}
vector<int> datas;
for (int i{ 0 }; i < num; ++i)
{
datas.push_back(42);
}
for (int n{ 0 }; n < 10; ++n)
{
cout << "start" << '\n';
//cout << 0 << "start" << '\n';
auto start = chrono::high_resolution_clock::now();
for (int i{ 0 }; i < num; ++i)
{
vector<int> points{ 42 };
}
auto end = chrono::high_resolution_clock::now();
auto time = chrono::duration_cast<chrono::milliseconds>(end - start);
cout << time.count() << '\n';
}
cin.get();
return 0;
}
The first two vectors are essential. At least with Visual Studio. Thought they're not in use they affect the performance a lot. Moreover, tweaking them also give performance effect (like change the order of initialization, remove push_back and allocate the necessary size in constructor). But this code as it is gives me the following results:
with gcc there're no problems at all
with clang the first iteration takes two times longer than the others
with vs2013 the first iteration is 100 (yes, one hundred) times slower.
Moreover, with vs2013 if I uncomment the line cout << 0 << "start" << '\n'; the performance problem goes away and all iterations are equal!
What's going on?

For your first two loops, probably the biggest performance consideration is going to be the allocation of memory, and the copying of the vector contents to the larger buffer. In this case, the fact that the loops appear to be 'gaining speed' is not surprising.
This is due to the implementation details of the vector class. Let's look at the documentation:
Internally, vectors use a dynamically allocated array to store their
elements. This array may need to be reallocated in order to grow in
size when new elements are inserted, which implies allocating a new
array and moving all elements to it. This is a relatively expensive
task in terms of processing time, and thus, vectors do not reallocate
each time an element is added to the container.
Instead, vector containers may allocate some extra storage to
accommodate for possible growth, and thus the container may have an
actual capacity greater than the storage strictly needed to contain
its elements (i.e., its size). Libraries can implement different
strategies for growth to balance between memory usage and
reallocations, but in any case, reallocations should only happen at
logarithmically growing intervals of size so that the insertion of
individual elements at the end of the vector can be provided with
amortized constant time complexity (see push_back).
So under the hood, the actual memory allocated for your vector might be much more than what you are actually using. So the vector only needs to do the costly re-allocation and copy when you add a new element to the vector which wouldn't fit into its current buffer. Moreover, since it says that re-allocations should only happen at logarithmically growing intervals, you can expect that the vector class is roughly doubling the buffer size every time it needs to re-allocate. But note that the vector implementations on various platforms are highly tuned to be optimal for the most common usage patterns for the class, which could be one factor in the different performance you are seeing across tool chains and platforms.
So you should see the loops be slow on the first several executions, and then gain more speed as push_back and emplace operations need to do fewer re-allocations and copies to accommodate the new elements.
So I think this is the main fact you can use to reason about how long your first two loops should take to execute. But for your specific examples, due to the simplicity of the program, the compiler may be taking some liberties with what code it generates. So we could imagine that a sufficiently clever optimizing compiler might be able to see that your vectors will only be growing to a size which it knows at compile time, num. And this is the biggest issue I suspect with your last loop, which seems like an arbitrary and useless test. For example, the nested loop in loop 3 can be optimized away entirely. I think this is the main reason why you are seeing such different run-time behavior across the different compilers.
If you want to get the real story, take a look at the assembly code that your compiler is generating.

C++ vectorizing a vector of vectors

I have some code that uses a vector<vector<>> to store calculation results.
Through benchmarking, I have found that this is preventing my code from vectorizing, even though I am accessing the elements with the appropriate C-stride.
I am trying to come up with a data structure that will vectorize and improve my code's performance.
I read a few posts on here, and several of them mentioned creating a class that has 2 separate vectors inside: 1 for storing the data contiguously, and another for storing indices marking the beginning of a new column/row from the original 2D vector<vector>. Essentially, it would decompose the 2D array into a 1D, and use the "helper" vector to allow for proper indexing.
My concern is that I have also read that vectorization doesn't usually happen with indirect indexing like this, such as in common compressed row storage scheme for sparse matrices.
Before I go through all the work of implementing this, has anybody run into this problem before and solved it? Any other suggestions or resources that could help?

I wrote a small matrix class based on std::vector:
#include <vector>
template <typename T>
class MyMatrix {
public:
typedef T value_type;
struct RowPointer {
int row_index;
MyMatrix* parent;
RowPointer(int r,MyMatrix* p) : row_index(r),parent(p) {}
T& operator[](int col_index) {
return parent->values[row_index*parent->cols+col_index];
}
};
MyMatrix() : rows(0),cols(0),size(0),values(){}
MyMatrix(int r,int c) : rows(r),cols(c),size(r*c),values(std::vector<T>(size)){}
RowPointer operator[](int row_index){return RowPointer(row_index,this);}
private:
size_t rows;
size_t cols;
size_t size;
std::vector<T> values;
};
It can be used like this:
MyMatrix<double> mat = MyMatrix<double>(4,6);
mat[1][2] = 3;
std::cout << mat[0][0] << " " << mat[1][2] << std::endl;
It still misses lots of stuff, but I think it is enough to illustrate the idea of flattening the matrix. From your question it was not 100% clear, if your rows have different sizes, then the access pattern is a little bit more complicated.
PS: I dont want to change the answer anymore, but I would never use a std::vector to construct a matrix again. Vectors offer flexibility that is not required for a matrix, which usually has same and fixed number of entries in each row.

std::bad_alloc at transpose of Eigen::SparseMatrix

I'm trying to calculate the following:
A = X^t * X
I'm using the Eigen::SparseMatrix and get a std::bad_alloc error on the transpose() operation:
Eigen::SparseMatrix<double> trans = sp.transpose();
sp is also a Eigen::SparseMatrix Matrix, but it is very big, on one of the smaller datasets, the commands
std::cout << "Rows: " << sp.rows() << std::endl;
std::cout << "Rows: " << sp.cols() << std::endl;
give the following result:
Rows: 2061565968
Cols: 600
(I precompute the sizes of this matrix before I start to fill it)
Is there a limit on how many entries such a matrix can hold?
I'm using a 64bit Linux system with g++
Thanks in advance
Alex

The answer from ggael worked with a slight modification:
In the definition of the SparseMatrix one cannot ommit the options, so the correct typedef is
typedef SparseMatrix<double, 0, std::ptrdiff_t> SpMat;
The 0 can also be exchanged for a 1, 0 means column-major and 1 means RowMajor
Thank your for your help

By default Eigen::SparseMatrix uses int to stores sizes and indices (for compactness). However, with that huge amount of rows, you need to use 64 integers for both sp and sp.transpose():
typedef SparseMatrix<double, 0, std::ptrdiff_t> SpMat;
Note that you can directly write:
SpMat sp, sp2;
sp2 = sp.transpose() * sp;
even though sp.transpose() will have to be evaluated into a temporary anyway.

I think it is impossible to answer your question in its current state.
There are two things. The size of the matrix - the mathematical object, and the size understood as memory it occupies. In dense matrices the are pretty much the same (linear dependence). But in sparse case the memory occupation is not tied to the size of the matrix, but to the number of non-zero elements.
So, technically, you have pretty much unlimited size constraints - equal to the Size type. However, you are, of course, still bound by memory when it comes to the number of (non-zero) elements.
You make a copy of a matrix obviously. So you could try calculating the size of the data the matrix object need to hold, and see if it fits within your memory.
This is not very trivial, but docs say that the storage is a list of non-zero elements. So a good estimate would probably be (2*sizeof(Index)+sizeof(Scalar))*sp.nonZeros() - for (x,y,value).
You could also monitor RAM usage before calling the transpose, and see if it stays within the limit if you double it.
Note: The transposition is probably not the culprit there, but operator=. Maybe you can avoid making the copy.

If I want maximum speed, should I just use an array over a std::vector? [closed]

As it currently stands, this question is not a good fit for our Q&A format. We expect answers to be supported by facts, references, or expertise, but this question will likely solicit debate, arguments, polling, or extended discussion. If you feel that this question can be improved and possibly reopened, visit the help center for guidance.
Closed 9 years ago.
I am writing some code that needs to be as fast as possible without sucking up all of my research time (in other words, no hand optimized assembly).
My systems primarily consist of a bunch of 3D points (atomic systems) and so the code I write does lots of distance comparisons, nearest-neighbor searches, and other types of sorting and comparisons. These are large, million or billion point systems, and the naive O(n^2) nested for loops just won't cut it.
It would be easiest for me to just use a std::vector to hold point coordinates. And at first I thought it will probably be about as fast an array, so that's great! However, this question (Is std::vector so much slower than plain arrays?) has left me with a very uneasy feeling. I don't have time to write all of my code using both arrays and vectors and benchmark them, so I need to make a good decision right now.
I am sure that someone who knows the detailed implementation behind std::vector could use those functions with very little speed penalty. However, I primarily program in C, and so I have no clue what std::vector is doing behind the scenes, and I have no clue if push_back is going to perform some new memory allocation every time I call it, or what other "traps" I could fall into that make my code very slow.
An array is simple though; I know exactly when memory is being allocated, what the order of all my algorithms will be, etc. There are no blackbox unknowns that I may have to suffer through. Yet so often I see people criticized for using arrays over vectors on the internet that I can't but help wonder if I am missing some more information.
EDIT: To clarify, someone asked "Why would you be manipulating such large datasets with arrays or vectors"? Well, ultimately, everything is stored in memory, so you need to pick some bottom layer of abstraction. For instance, I use kd-trees to hold the 3D points, but even so, the kd-tree needs to be built off an array or vector.
Also, I'm not implying that compilers cannot optimize (I know the best compilers can outperform humans in many cases), but simply that they cannot optimize better than what their constraints allow, and I may be unintentionally introducing constraints simply due to my ignorance of the implementation of vectors.

all depends on this how you implement your algorithms. std::vector is such general container concept that gives us flexibility but leaves us with freedom and responsibility of structuring implementation of algorithm deliberately. Most of the efficiency overhead that we will observe from std::vector comes from copying. std::vector provides a constructor which lets you initialize N elements with value X, and when you use that, the vector is just as fast as an array.
I did a tests std::vector vs. array described here,
#include <cstdlib>
#include <vector>
#include <iostream>
#include <string>
#include <boost/date_time/posix_time/ptime.hpp>
#include <boost/date_time/microsec_time_clock.hpp>
class TestTimer
{
public:
TestTimer(const std::string & name) : name(name),
start(boost::date_time::microsec_clock<boost::posix_time::ptime>::local_time())
{
}
~TestTimer()
{
using namespace std;
using namespace boost;
posix_time::ptime now(date_time::microsec_clock<posix_time::ptime>::local_time());
posix_time::time_duration d = now - start;
cout << name << " completed in " << d.total_milliseconds() / 1000.0 <<
" seconds" << endl;
}
private:
std::string name;
boost::posix_time::ptime start;
};
struct Pixel
{
Pixel()
{
}
Pixel(unsigned char r, unsigned char g, unsigned char b) : r(r), g(g), b(b)
{
}
unsigned char r, g, b;
};
void UseVector()
{
TestTimer t("UseVector");
for(int i = 0; i < 1000; ++i)
{
int dimension = 999;
std::vector<Pixel> pixels;
pixels.resize(dimension * dimension);
for(int i = 0; i < dimension * dimension; ++i)
{
pixels[i].r = 255;
pixels[i].g = 0;
pixels[i].b = 0;
}
}
}
void UseVectorPushBack()
{
TestTimer t("UseVectorPushBack");
for(int i = 0; i < 1000; ++i)
{
int dimension = 999;
std::vector<Pixel> pixels;
pixels.reserve(dimension * dimension);
for(int i = 0; i < dimension * dimension; ++i)
pixels.push_back(Pixel(255, 0, 0));
}
}
void UseArray()
{
TestTimer t("UseArray");
for(int i = 0; i < 1000; ++i)
{
int dimension = 999;
Pixel * pixels = (Pixel *)malloc(sizeof(Pixel) * dimension * dimension);
for(int i = 0 ; i < dimension * dimension; ++i)
{
pixels[i].r = 255;
pixels[i].g = 0;
pixels[i].b = 0;
}
free(pixels);
}
}
void UseVectorCtor()
{
TestTimer t("UseConstructor");
for(int i = 0; i < 1000; ++i)
{
int dimension = 999;
std::vector<Pixel> pixels(dimension * dimension, Pixel(255, 0, 0));
}
}
int main()
{
TestTimer t1("The whole thing");
UseArray();
UseVector();
UseVectorCtor();
UseVectorPushBack();
return 0;
}
and here are results (compiled on Ubuntu amd64 with g++ -O3):
UseArray completed in 0.325 seconds
UseVector completed in 1.23 seconds
UseConstructor completed in 0.866 seconds
UseVectorPushBack completed in 8.987 seconds
The whole thing completed in 11.411 seconds
clearly push_back wasn't good choice here, using constructor is still 2 times slower than array.
Now, providing Pixel with empty copy Ctor:
Pixel(const Pixel&) {}
gives us following results:
UseArray completed in 0.331 seconds
UseVector completed in 0.306 seconds
UseConstructor completed in 0 seconds
UseVectorPushBack completed in 2.714 seconds
The whole thing completed in 3.352 seconds
So in summary: re-think your algorithm, otherwise, perhaps resort to a custom wrapper around New[]/Delete[]. In any case, the STL implementation isn't slower for some unknown reason, it just does exactly what you ask; hoping you know better.
In the case when you just started with vectors it might be surprising how they behave, for example this code:
class U{
int i_;
public:
U(){}
U(int i) : i_(i) {cout << "consting " << i_ << endl;}
U(const U& ot) : i_(ot.i_) {cout << "copying " << i_ << endl;}
};
int main(int argc, char** argv)
{
std::vector<U> arr(2,U(3));
arr.resize(4);
return 0;
}
results with:
consting 3
copying 3
copying 3
copying 548789016
copying 548789016
copying 3
copying 3

Vectors guarantee that the underlying data is a contiguous block in memory. The only sane way to guarantee this is by implementing it as an array.
Memory reallocation on pushing new elements can happen, because the vector can't know in advance how many elements you are going to add to it. But when you know it in advance, you can call reserve with the appropriate number of entries to avoid reallocation when adding them.
Vectors are usually preferred over arrays because they allow bound-checking when accessing elements with .at(). That means accessing indices outside of the vector doesn't cause undefined behavior like in an array. This bound-checking does however require additional CPU cycles. When you use the []-operator to access elements, no bound-checking is done and access should be as fast as an array. This however risks undefined behavior when your code is buggy.

People who invented STL, and then made it into the C++ standard library, are expletive deleted smart. Don't even let yourself imagine for one little moment you can outperform them because of your superior knowledge of legacy C arrays. (You would have a chance if you knew some Fortran though).
With std::vector, you can allocate all memory in one go, just like with C arrays. You can also allocate incrementally, again just like with C arrays. You can control when each allocation happens, just like with C arrays. Unlike with C arrays, you can also forget about it all and let the system manage the allocations for you, if that's what you want. This is all absolutely necessary, basic functionality. I'm not sure why anyone would assume it is missing.
Having said all that, go with arrays if you find them easier to understand.

I am not really advising you to go either for arrays or vectors, because I think that for your needs they may not be totally fit.
You need to be able to organize your data efficiently, so that queries would not need to scan the whole memory range to get the relevant data. So you want to group the points which are more likely to be selected together close to each other.
If your dataset is static, then you can do that sorting offline, and make your array nice and tidy to be loaded up in memory at your application start up time, and either vector or array would work (provided you do the reserve call up front for vector, since the default allocation growth scheme double up the size of the underlying array whenever it gets full, and you wouldn't want to use up 16Gb of memory for only 9Gb worth of data).
But if your dataset is dynamic, it will be difficult to do efficient inserts in your set with a vector or an array. Recall that each insert within the array would create a shift of all the successor elements of one place. Of course, an index, like the kd tree you mention, will help by avoiding a full scan of the array, but if the selected points are scattered accross the array, the effect on memory and cache will essentially be the same. The shift would also mean that the index needs to be updated.
My solution would be to cut the array in pages (either list linked or array indexed) and store data in the pages. That way, it would be possible to group relevant elements together, while still retaining the speed of contiguous memory access within pages. The index would then refer to a page and an offset in that page. Pages wouldn't be filled automatically, which leaves rooms to insert related elements, or make shifts really cheap operations.
Note that if pages are always full (excepted for the last one), you still have to shift every single one of them in case of an insert, while if you allow incomplete pages, you can limit a shift to a single page, and if that page is full, insert a new page right after it to contain the suplementary element.
Some things to keep in mind:
array and vector allocation have upper limits, which is OS dependent (these limits might be different)
On my 32bits system, the maximum allowed allocation for a vector of 3D points is at around 180 millions entries, so for larger datasets, on would have to find a different solution. Granted, on 64bits OS, that amount might be significantly larger (On windows 32bits, the maximum memory space for a process is 2Gb - I think they added some tricks on more advanced versions of the OS to extend that amount). Admittedly memory will be even more problematic for solutions like mine.
resizing of a vector requires allocating the new size of the heap, copy the elements from the old memory chunck to the new one.
So for adding just one element to the sequence, you will need twice the memory during the resizing. This issue may not come up with plain arrays, which can be reallocated using the ad hoc OS memory functions (realloc on unices for instance, but as far as I know that function doesn't make any guarantee that the same memory chunck will be reused). The problem might be avoided in vector as well if a custom allocator which would use the same functions is used.
C++ doesn't make any assumption about the underlying memory architecture.
vectors and arrays are meant to represent contiguous memory chunks provided by an allocator, and wrap that memory chunk with an interface to access it. But C++ doesn't know how the OS is managing that memory. In most modern OS, that memory is actually cut in pages, which are mapped in and out of physical memory. So my solution is somehow to reproduce that mechanism at the process level. In order to make the paging efficient, it is necessary to have our page fit the OS page, so a bit of OS dependent code will be necessary. On the other hand, this is not a concern at all for a vector or array based solution.
So in essence my answer is concerned by the efficiency of updating the dataset in a manner which will favor clustering points close to each others. It supposes that such clustering is possible. If not the case, then just pushing a new point at the end of the dataset would be perfectly alright.

Although I do not know the exact implementation of std:vector, most list systems like this are slower than arrays as they allocate memory when they are resized, normally double the current capacity although this is not always the case.
So if the vector contains 16 items and you added another, it needs memory for another 16 items. As vectors are contiguous in memory, this means that it will allocate a solid block of memory for 32 items and update the vector. You can get some performance improvements by constructing the std:vector with an initial capacity that is roughly the size you think your data set will be, although this isn't always an easy number to arrive at.

For operation that are common between vectors and arrays (hence not push_back or pop_back, since array are fixed in size) they perform exactly the same, because -by specification- they are the same.
vector access methods are so trivial that the simpler compiler optimization will wipe them out.
If you know in advance the size of a vector, just construct it by specifyinfg the size or just call resize, and you will get the same you can get with a new [].
If you don't know the size, but you know how much you will need to grow, just call reserve, and you get no penality on push_back, since all the required memory is already allocated.
In any case, relocation are not so "dumb": the capacity and the size of a vector are two distinct things, and the capacity is typically doubled upon exhaustion, so that relocation of big amounts are less and less frequent.
Also, in case you know everything about sizes, and you need no dynamic memory and want the same vector interface, consider also std::array.

Sounds like you need gigs of RAM so you're not paging. I tend to go along with #Philipp's answer, because you really really want to make sure it's not re-allocating under the hood
but
what's this about a tree that needs rebalancing?
and you're even thinking about compiler optimization?
Please take a crash course in how to optimize software.
I'm sure you know all about Big-O, but I bet you're used to ignoring the constant factors, right? They might be out of whack by 2 to 3 orders of magnitude, doing things you never would have thought costly.
If that translates to days of compute time, maybe it'll get interesting.
And no compiler optimizer can fix these things for you.
If you're academically inclined, this post goes into more detail.

We Keep Coding

c++ django amazon-web-services regex python-2.7 google-cloud-platform list unit-testing opengl ember.js