Multi-dimensional array allocation in chunks - C++

I was poking around with multidimensional arrays today, and I came across a blog which distinguishes between rectangular arrays and jagged arrays; usually I would do this for both jagged and rectangular:
Object** obj = new Object*[5];
for (int i = 0; i < 5; ++i)
{
    obj[i] = new Object[10];
}
but the blog said that if I know the 2D array is rectangular, I'm better off allocating the entire thing as a 1D array and using an improvised way of accessing the elements, something like this:
Object* obj = new Object[rows * cols];
obj[x * cols + y];
// which would have been obj[x][y] in the previous implementation
I have a vague sense that allocating one contiguous memory chunk would be good, but I don't really understand how big a difference it makes. Can somebody explain?

First and less important, when you allocate and free your object you only need to do a single allocation/deallocation.
More important: when you use the array you basically get to trade a multiplication against a memory access. On modern computers, memory access is much much much slower than arithmetic.
That's a bit of a lie, because much of the slowness of memory accesses gets hidden by caches -- regions of memory that are being accessed frequently get stored in fast memory inside, or very near to, the CPU and can be accessed faster. But these caches are of limited size, so (1) if your array isn't being used all the time then the row pointers may not be in the cache and (2) if it is being used all the time then they may be taking up space that could otherwise be used by something else.
Exactly how it works out will depend on the details of your code, though. In many cases it will make no discernible difference one way or the other to the speed of your program. You could try it both ways and benchmark.
[EDITED to add, after being reminded of it by Peter Schneider's comment:] Also, if you allocate each row separately they may end up in different parts of memory, which may make your caches a bit less effective. Data gets pulled into the cache in chunks, so if you often go from the end of one row to the start of the next, you'll benefit when the rows are adjacent. But this is a subtle one: in some cases having your rows equally spaced in memory may actually make the cache perform worse; if you allocate several rows in succession they may well end up (almost) next to one another in memory anyway; and in any case it probably doesn't matter much unless your rows are quite short.
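A minimal sketch of the suggested benchmark (my own illustration, not from the blog; the dimensions and the timing method are arbitrary choices):

#include <chrono>
#include <cstdio>

int main()
{
    const int rows = 1000, cols = 1000;

    // Jagged layout: one allocation per row.
    int** jagged = new int*[rows];
    for (int i = 0; i < rows; ++i)
        jagged[i] = new int[cols]();   // zero-initialized

    // Flat layout: one contiguous block.
    int* flat = new int[rows * cols]();

    using clk = std::chrono::steady_clock;
    long long sum1 = 0, sum2 = 0;

    auto t0 = clk::now();
    for (int i = 0; i < rows; ++i)
        for (int j = 0; j < cols; ++j)
            sum1 += jagged[i][j];       // pointer fetch, then element fetch
    auto t1 = clk::now();

    for (int i = 0; i < rows; ++i)
        for (int j = 0; j < cols; ++j)
            sum2 += flat[i * cols + j]; // multiply-add, single fetch
    auto t2 = clk::now();

    auto us = [](clk::duration d) {
        return std::chrono::duration_cast<std::chrono::microseconds>(d).count();
    };
    std::printf("jagged: %lld us, flat: %lld us (sums: %lld %lld)\n",
                (long long)us(t1 - t0), (long long)us(t2 - t1), sum1, sum2);

    for (int i = 0; i < rows; ++i) delete[] jagged[i];
    delete[] jagged;
    delete[] flat;
    return 0;
}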

Allocating a 2D array as one big chunk permits the compiler to generate more efficient code than doing it in multiple chunks. At the very least, the multiple-chunk approach costs one extra pointer dereference per access. BTW, declaring the 2D array like this:
Object obj[rows][cols];
obj[x][y];
is equivalent to:
Object* obj = new Object[rows * cols];
obj[x * cols + y];
in terms of speed. But the first one is not dynamic (you need to specify the values of "rows" and "cols" at compile time).
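A small sketch (my own addition, not from the answer) showing that the built-in 2D indexing is exactly the x * cols + y address arithmetic:

#include <cassert>

int main()
{
    constexpr int rows = 3, cols = 4;
    int obj[rows][cols] = {};
    int* flat = &obj[0][0];

    int x = 1, y = 2;
    // The compiler computes obj[x][y] as *(&obj[0][0] + x * cols + y).
    assert(&obj[x][y] == flat + x * cols + y);
    return 0;
}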

By having one large contiguous chunk of memory, you may get improved performance because there is more chance that memory accesses are already in the cache. This idea is called cache locality. We say the large array has better cache locality. Modern processors have several levels of cache. The lowest level is generally the smallest and the fastest.
It still pays to access the array in meaningful ways. For example, if data is stored in row-major order and you access it in column-major order, you are scattering your memory accesses. At certain sizes, this access pattern will negate the advantages of caching.
Having good cache performance is far preferable to any concerns you may have about multiplying values for indexing.
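To make the access-pattern point concrete, here is a small sketch (my own, with arbitrary dimensions): the first loop walks a flat array in row-major order, the second in column-major order; only the traversal order differs.

#include <cstdio>

int main()
{
    const int rows = 4096, cols = 4096;
    float* a = new float[rows * cols]();   // one contiguous, zeroed block
    float sum = 0;

    // Row-major traversal: consecutive addresses, every cache line fully used.
    for (int r = 0; r < rows; ++r)
        for (int c = 0; c < cols; ++c)
            sum += a[r * cols + c];

    // Column-major traversal: a stride of 'cols' floats between accesses,
    // so at this size each access touches a different cache line.
    for (int c = 0; c < cols; ++c)
        for (int r = 0; r < rows; ++r)
            sum += a[r * cols + c];

    std::printf("%f\n", sum);   // use the result so the loops are not optimized away
    delete[] a;
    return 0;
}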

If one of the dimensions of your array is a compile time constant you can allocate a "truly 2-dimensional array" in one chunk dynamically as well and then index it the usual way. Like all dynamic allocations of arrays, new returns a pointer to the element type. In this case of a 2-dimensional array the elements are in turn arrays -- 1-dimensional arrays. The syntax of the resulting element pointer is a bit cumbersome, mostly because the dereferencing operator*() has a lower precedence than the indexing operator[](). One possible allocation statement could be int (*arr7x11)[11] = new int[7][11];.
Below is a complete example. As you see, the leftmost index in the allocation can be a run-time value; it determines the number of elements in the allocated array. The other indices determine the element type (and hence element size, as well as overall size) of the dynamically allocated array, which of course must be known to perform the allocation. As discussed above, the elements are themselves arrays, here 1-dimensional arrays of 11 ints.
#include <cstdio>
using namespace std;

int main(int argc, char **argv)
{
    constexpr int cols = 11;
    int rows = 7;

    // overwrite with cmd line arg if present.
    // if sscanf fails, default is retained.
    if (argc >= 2) { sscanf(argv[1], "%d", &rows); }

    // The actual allocation of "rows" elements of
    // type "array of 'cols' ints". Note the parentheses
    // around *arr7x11 in order to force operator
    // evaluation order. arr7x11 is a pointer to array,
    // not an array of pointers.
    int (*arr7x11)[cols] = new int[rows][cols];

    for (int row = 0; row < rows; row++)
    {
        for (int col = 0; col < cols; col++)
        {
            arr7x11[row][col] = (row+1)*1000 + col+1;
        }
    }

    for (int row = 0; row < rows; row++)
    {
        for (int col = 0; col < cols; col++)
        {
            printf("%6d", arr7x11[row][col]);
        }
        putchar('\n');
    }

    delete[] arr7x11;   // matching single deallocation
    return 0;
}
A sample session:
g++ -std=c++14 -Wall -o 2darrdecl 2darrdecl.cpp && ./2darrdecl 3
1001 1002 1003 1004 1005 1006 1007 1008 1009 1010 1011
2001 2002 2003 2004 2005 2006 2007 2008 2009 2010 2011
3001 3002 3003 3004 3005 3006 3007 3008 3009 3010 3011

Related

How Physically are Arrays Stored (Specifically with dimensions greater than 2)?

What I Know
I know that an array int ary[] can be expressed in the equivalent "pointer-to" format: int* ary. However, what I would like to know is: if these two are the same, how are arrays physically stored?
I used to think that the elements are stored next to each other in the ram like so for the array ary:
int size = 5;
int* ary = new int[size];
for (int i = 0; i < size; i++) { ary[i] = i; }
This (I believe) is stored in RAM like: ...[0][1][2][3][4]...
This means we can subsequently replace ary[i] with *(ary + i), just incrementing the pointer's location by the index.
The Issue
The issue comes in when I am to define a 2D array in the same way:
int width = 2, height = 2;
Vector** array2D = new Vector*[height];
for (int i = 0; i < height; i++) {
    array2D[i] = new Vector[width];
    for (int j = 0; j < width; j++) { array2D[i][j] = Vector(i, j); }
}
Given that the class Vector is for me to store both x and y in a single fundamental unit: (x, y).
So how exactly would the above be stored?
It cannot logically be stored like ...[(0, 0)][(1, 0)][(0, 1)][(1, 1)]... as this would mean that the (1, 0)th element is the same as the (0, 1)th.
Nor can it be stored in a 2D grid like below, as physical RAM is a single 1D array of 8-bit numbers:
...[(0, 0)][(1, 0)]...
...[(0, 1)][(1, 1)]...
Neither can it be stored like ...[&(0, 0)][&(1, 0)][&(0, 1)][&(1, 1)]..., given &(x, y) is a pointer to the location of (x, y). This would just mean each memory location would just point to another one, and the value could not be stored anywhere.
Thank you in advance.
What the OP is struggling with is a dynamically allocated array of pointers to dynamically allocated arrays. Each of these allocations is its own block of memory sitting somewhere in storage. There is no connection between them other than the logical connection established by the pointers in the outer array.
To try to visualize this say we make
int** twodee;
twodee = new int*[4];
for (int i = 0; i < 4; i++)
{
    twodee[i] = new int[4];
}
and then
int count = 1;
for (int i = 0; i < 4; i++)
{
    for (int j = 0; j < 4; j++)
    {
        twodee[i][j] = count++;
    }
}
so we should wind up with twodee looking something like
1 2 3 4
5 6 7 8
9 10 11 12
13 14 15 16
right?
Logically, yes. But laid out in memory, twodee might look like a scattered mess, with each of the four row blocks sitting at some unrelated address.
You can't really predict where your memory will be; you're at the mercy of whatever memory manager handles the allocations and of whatever is already in storage where it might have been efficient for your memory to go. This makes laying out dynamically-allocated multi-dimensional arrays in your head almost a waste of time.
And there are a whole lot of things wrong with this when you get down into the guts of what a modern CPU can do for you. The CPU has to hop around a lot, and when it's hopping, its ability to predict and preload the cache with memory you're likely to need in the near future is compromised. This means your gigahertz computer has to sit around and wait on your megahertz RAM a lot more than it should have to.
Try to avoid this whenever possible by allocating single, contiguous blocks of memory. You may pick up a bit of extra code mapping one dimensional memory over to other dimensions, but you don't lose any CPU time. C++ will have generated all of that mapping math for you as soon as you compiled [i][j] anyway.
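A minimal sketch of that mapping (my own illustration): one contiguous block plus the index math the compiler would otherwise generate for [i][j].

#include <cstdio>

int main()
{
    const int rows = 4, cols = 4;
    int* twodee = new int[rows * cols];   // one contiguous block

    int count = 1;
    for (int i = 0; i < rows; i++)
        for (int j = 0; j < cols; j++)
            twodee[i * cols + j] = count++;   // the mapping the compiler does for [i][j]

    std::printf("%d\n", twodee[2 * cols + 3]);   // logical twodee[2][3], prints 12
    delete[] twodee;
    return 0;
}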
The short answer to your question is: It is compiler dependent.
A more helpful answer (I hope) is that you can create 2D arrays that are laid out directly in memory, or you can create "2D arrays" that are actually 1D arrays, some with data, some with pointers to arrays.
There is a convention that the compiler is happy to generate the right kind of code to dereference and/or calculate the address of an element within an array when you use brackets to access an element in the array.
Generally, arrays that are known to be 2D at compile time (eg int array2D[a][b]) will be laid out in memory without extra pointers, and the compiler knows to multiply AND add to get an address on each access. If your compiler isn't good at optimizing out the multiply, repeated accesses are much slower than they could be, so in the old days we often did the pointer math ourselves to avoid the multiply where possible.
There is the issue that a compiler might optimize by rounding the lower dimension size up to a power of two, so a shift can be used instead of multiply, which would then require padding the locations (then even though they are all in one memory block, there are meaningless holes).
(Also, I'm pretty sure I've run into the problem that, within a procedure, the compiler needs to know which kind of 2D array it is really dealing with, so you may need to declare parameters in a way that lets the compiler generate the right code for the procedure; eg a[][] is different from *a[].) And obviously you can actually get the pointer from the array of pointers, if that is what you want -- which isn't the same thing as the array it points to, of course.
In your code, you have clearly declared a full set of the lower-dimension 1D arrays (inside the loop), and you have ALSO declared another 1D array of pointers you use to get to each one without a multiply -- instead by a dereference. So all those things will be in memory. Each 1D array will surely be laid out sequentially in a contiguous block of memory. It is just that it is entirely up to the memory manager where those 1D arrays are relative to each other. (I doubt a compiler is smart enough to actually do the "new" ops at compile time, but it is theoretically possible, and would obviously affect/control the behavior if it did.)
Using the extra array of pointers clearly avoids the multiply ever and always. But it takes more space, and for sequential access actually makes the accesses slower and bigger (the extra dereference) versus maintaining a single pointer and one dereference.
Even if the 1D arrays DO end up contiguous sometimes, you might break it with another thread using the same memory manager, running a "new" while your "new" inside the loop is repeating.

trim array to elements between i and j

A classic. I'm looking for optimisation here: I have an array of things, and after some processing I know I'm only interested in elements i to j. How do I trim my array in the fastest, lightest way, with complete deletion/freeing of memory of the elements before i and after j?
I'm doing embedded C++, so I may not be able to compile all sorts of libraries, let's say. But std or vector things are welcome in a first phase!
I've tried, for an array A to be trimmed between i and j, with a variable numElms telling me the number of elements in A:
A = &A[i];
numElms = j-i+1;
As it is this yields an incompatibility error. Can that be fixed, and even when fixed, does that free the memory at all for now-unused elements?
A little context: this array is the central data set of my module, and it can be heavy. It will live as long as the module lives, and there's no need to carry dead weight all that time. This is the very first thing that is done - figuring out which segment of the data set has to be analyzed at all, then trimming and dumping the rest forever, never to be used again (until the next cycle, where we get a fresh array with possibly a completely different size).
When asking questions about speed, your mileage may vary based on the size of the array you're working with, but:
Your fastest way will be to not trim the array, just use A[index + i] to find the elements you want.
The lightest way to do this would be to:
Allocate a dynamic array with malloc
Once i and j are found copy that range to the head of the dynamic array
Use realloc to resize the dynamic array to the size j - i + 1 (see the sketch just below)
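A sketch of those three steps (my own; Thing is a placeholder element type, and this C-style approach assumes the elements are POD):

#include <cstdlib>
#include <cstring>

struct Thing { int payload; };   // placeholder element type

// Keep elements [i, j] of a malloc'ed array and free the rest.
Thing* trim(Thing* a, std::size_t i, std::size_t j)
{
    std::size_t kept = j - i + 1;
    std::memmove(a, a + i, kept * sizeof(Thing));            // copy the range to the head
    Thing* shrunk = (Thing*)std::realloc(a, kept * sizeof(Thing));
    return shrunk ? shrunk : a;   // a shrinking realloc rarely fails; keep the old block if it does
}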
However you have this tagged as C++ not C, so I believe that you're also interested in readability and the required programming investment, not raw speed or weight. If this is true then I would suggest use of a vector or deque.
Given vector<thing> A or a deque<thing> A you could do:
A.erase(cbegin(A), next(cbegin(A), i));
A.resize(j - i + 1);
There is no way to change an allocated memory block's size in standard C++ (unless you have POD data, in which case C facilities like realloc could be used). The only way to trim an array is to allocate a new array, copy/move the needed elements, and destroy the old array.
You can do it manually, or using vectors:
int* array = new int[10]{0,1,2,3,4,5,6,7,8,9};
std::vector<int> vec {0,1,2,3,4,5,6,7,8,9};

// We want only elements 3-5
{
    int* new_array = new int[3];
    std::copy(array + 3, array + 6, new_array);
    delete[] array;
    array = new_array;
}
vec = std::vector<int>(vec.begin()+3, vec.begin()+6);
If you are using C++11, both approaches should have the same performance.
If you only want to remove the extra elements and do not really want to release the memory (for example, you might want to add more elements later) you can follow NathanOliver's link.
However, you should consider: do you really need that memory freed immediately? Do you need to move the elements right now? Will your array live so long that this memory would be lost to your program completely? Maybe you need a range, or perhaps a view of the array's content? In many cases you can store two pointers (or a pointer and a size) to denote your "new" array, while keeping the old one to be released all at once.
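A sketch of that last idea (a hand-rolled, non-owning view; in C++20 one would reach for std::span instead):

#include <cstddef>

// Non-owning view: two members denote the live segment of a bigger buffer.
template <typename T>
struct range_view {
    T*          first;   // points at element i of the original array
    std::size_t count;   // j - i + 1
    T& operator[](std::size_t n) { return first[n]; }
};

// Usage sketch, assuming 'buffer' is the original allocation:
//   range_view<Thing> live{ buffer + i, j - i + 1 };
//   ... work through live[0] .. live[live.count - 1], free 'buffer' once at the end.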

Performance comparison of STL sort on vector of strings vs. vector of string pointers

I tried to compare the performance of STL sort on vector of strings and vector of pointers to strings.
I expected the pointers version to outperform, but the actual results for 5 million randomly generated strings are
vector of strings : 12.06 seconds
vector of pointers to strings : 16.75 seconds
What explains this behavior? I expected swapping pointers to strings should be faster than swapping string objects.
The 5 million strings were generated by converting random integers.
Compiled with (gcc 4.9.3): g++ -std=c++11 -Wall
CPU: Xeon X5650
// sort vector of strings
#include <algorithm>
#include <cstdlib>
#include <ctime>
#include <iostream>
#include <string>
#include <vector>
using namespace std;

int main(int argc, char *argv[])
{
    const int numElements = 5000000;
    srand(time(NULL));

    vector<string> vec(numElements);
    for (int i = 0; i < numElements; i++)
        vec[i] = to_string(rand() % numElements);

    unsigned before = clock();
    sort(vec.begin(), vec.end());
    cout << "Time to sort: " << clock() - before << endl;

    for (int i = 0; i < numElements; i++)
        cout << vec[i] << endl;
    return 0;
}
// sort vector of pointers to strings
#include <algorithm>
#include <cstdlib>
#include <ctime>
#include <iostream>
#include <string>
#include <vector>
using namespace std;

bool comparePtrToString(string *s1, string *s2)
{
    return (*s1 < *s2);
}

int main(int argc, char *argv[])
{
    const int numElements = 5000000;
    srand(time(NULL));

    vector<string *> vec(numElements);
    for (int i = 0; i < numElements; i++)
        vec[i] = new string(to_string(rand() % numElements));

    unsigned before = clock();
    sort(vec.begin(), vec.end(), comparePtrToString);
    cout << "Time to sort: " << clock() - before << endl;

    for (int i = 0; i < numElements; i++)
        cout << *vec[i] << endl;
    return 0;
}
This is because all the operations that sort performs on strings are moves and swaps. Both move and swap for an std::string are constant-time operations, meaning that they only involve changing some pointers.
Therefore, for both sorts moving of the data has the same performance overhead. However, in case of pointers to strings you pay some extra cost to dereference the pointers on each comparison, which causes it to be noticeably slower.
In the first case the strings' internal pointers to their representations are swapped, and the complete data is not copied.
You should not expect any benefit from the implementation with pointers, which in fact is slower, since the pointers have to be dereferenced additionally, to perform the comparison.
What explains this behavior? I expected swapping pointers to strings
should be faster than swapping string objects.
There's various things going on here which could impact performance.
Swapping is relatively cheap both ways. Swapping strings tends to always be a shallow operation (just swapping PODs like pointers and integrals) for large strings and possibly deep for small strings (but still quite cheap -- implementation-dependent). So swapping strings tends to be pretty cheap overall, and typically not much more expensive than simply swapping pointers to them*.
[sizeof(string) is certainly bigger than sizeof(string*), but it's not an astronomical difference, basically, as the operation still occurs in constant time; and it's quite a bit cheaper in this context when the string fields already have to be fetched into a faster form of memory for the comparator, giving us temporal locality with respect to its fields.]
String contents must be accessed anyway both ways. Even the pointer version of your comparator has to examine the string contents (including the fields designating size and capacity). As a result, we end up paying the memory cost of fetching the data for the string contents regardless. Naturally if you just sorted the strings by pointer address (ex: without using a comparator) instead of a lexicographical comparison of the string contents, the performance edge should shift towards the pointer version since that would reduce the amount of data accessed considerably while improving spatial locality (more pointers can fit in a cache line than strings, e.g.).
The pointer version is scattering (or at least increasing the stride of) the string fields in memory. For the pointer version, you're allocating each string on the free store (in addition to the string contents which may or may not be allocated on the free store). That can disperse the memory and reduce locality of reference, so you're potentially incurring a greater cost in the comparator that way with increased cache misses. Even if a sequential allocation of this sort results in a very contiguous set of pages being allocated (ideal scenario), the stride to get from one string's fields to the next would tend to get at least a little larger because of the allocation metadata/alignment overhead (not all allocators require metadata to be stored directly in a chunk, but typically they will at least add some small overhead to the chunk size).
It might be simpler to attribute this to the cost of dereferencing the pointer but it's not so much the cost of the mov/load instruction doing the memory addressing that's expensive (in this relative context) as loading from slower/bigger forms of memory that aren't already cached/paged to faster, smaller memory. Allocating each string individually on the free store will typically increase this cost whether it's due to a loss of contiguity or a larger constant stride between each string entry (in an ideal case).
Even at a basic level without trying too hard to diagnose what's happening at the memory level, this increases the total size of the data that the machine has to look at (string contents/fields + pointer address) in addition to reduced locality/larger or variable strides (typically if you increase the amount of data accessed, it has to at least have improved locality to have a good chance of being beneficial). You might start to see more comparable times if you just sorted pointers to strings that were allocated contiguously (not in terms of the string contents which we have no control over, but just contiguous in terms of the adjacent string objects themselves -- effectively pointers to strings stored in an array). Then you'd get back the spatial locality at least for the string fields in addition to packing the data associated more tightly within a contiguous space.
Swapping smaller data types like indices or pointers can sometimes offer a benefit but they typically need to avoid examining the original contents of the data they refer to or provide a significantly cheaper swap/move behavior (in this case string is already cheap and becomes cheaper in this context considering temporal locality) or both.
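A hedged sketch of that contiguous variant (my own illustration): the string objects themselves live contiguously in one vector, and we sort pointers into it, so the string fields keep their spatial locality.

#include <algorithm>
#include <string>
#include <vector>
using namespace std;

int main()
{
    vector<string> storage = {"pear", "apple", "plum"};

    vector<const string*> ptrs;
    ptrs.reserve(storage.size());
    for (const string& s : storage)
        ptrs.push_back(&s);   // pointers into one contiguous array of string objects

    // Sorting moves only the pointers; the string objects stay put.
    sort(ptrs.begin(), ptrs.end(),
         [](const string* a, const string* b) { return *a < *b; });
    return 0;
}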
Well, a std::string is typically about 3-4 times as big as a std::string*.
So just straight-up swapping two of the former shuffles that much more memory around.
But that is dwarfed by the following effects:
Locality of reference. You need to follow one more pointer to a random position to read the string.
More memory-usage: A pointer plus bookkeeping per allocation of each std::string.
Both put extra demand on caching, and the former cannot even be prefetched.
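You can check the size ratio on your own implementation (the exact numbers vary by standard library; 32 vs. 8 bytes is typical on 64-bit platforms):

#include <cstdio>
#include <string>

int main()
{
    // Typical 64-bit libstdc++ result: string: 32 bytes, string*: 8 bytes.
    std::printf("string: %zu bytes, string*: %zu bytes\n",
                sizeof(std::string), sizeof(std::string*));
    return 0;
}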
Swapping containers changes just the containers' contents; in the string case that is the pointer to the first character of the string, not the whole string.
In the case of vectors of pointers to strings you performed one additional step on every comparison - dereferencing the pointers.

Speed difference of dynamic and classical multi-dimensional arrays

Are the usage (not creation) speeds of dynamic and classical multi-dimensional arrays different?
I mean, for example, when I try to access all values in a three-dimensional array with the help of loops, is there any speed difference between arrays created by the dynamic and the classical methods?
When I say "dynamic three-dimensional array", I mean matris_cos[kuanta][d][angle_scale] is created like this.
matris_cos = new float**[kuanta];
for (int i = 0; i < kuanta; ++i) {
    matris_cos[i] = new float*[d];
    for (int j = 0; j < d; ++j)
        matris_cos[i][j] = new float[angle_scale];
}
When I say "classical three-dimensional array", I mean matris_cos[kuanta][d][angle_scale] is simply created like this.
float matris_cos[kuanta][d][angle_scale];
But please note: I am not asking about the creation speed of these arrays. I want to access the values of these arrays via some loops. Is there any speed difference when I try to access the values?
An array of pointers (to arrays of pointers) will require extra levels of indirection to access a random element, while a multi-dimensional array will require basic arithmetic (multiplication and pointer addition). On most modern platforms, indirection is likely to be slower unless you use cache-friendly access patterns. Also, all the elements of the multi-dimensional array will be contiguous, which could help caching if you iterate over the whole array.
Whether this difference is measurable or not is something you can only tell by measuring it.
If the extra indirection does prove to be a bottleneck, you could replace the array-of-pointers with a class to represent the multi-dimensional array with a flat array:
class array_3d {
    size_t d1, d2, d3;
    std::vector<float> flat;
public:
    array_3d(size_t d1, size_t d2, size_t d3) :
        d1(d1), d2(d2), d3(d3), flat(d1*d2*d3)
    {}
    float& operator()(size_t x, size_t y, size_t z) {
        return flat[x*d2*d3 + y*d3 + z];
    }
    // and a similar const overload
};
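Usage would then look something like this (dimension values picked arbitrarily, assuming the array_3d class above is in scope):

array_3d matris_cos(10, 5, 36);   // kuanta, d, angle_scale
matris_cos(2, 3, 4) = 1.0f;       // instead of matris_cos[2][3][4]
float v = matris_cos(2, 3, 4);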
I believe that the next C++ standard (due next year) will include dynamically sized arrays, so you should be able to use the multi-dimensional form in all cases.
You won't be able to spot any difference between them in a typical application unless your arrays are pretty huge and you spend a lot of time reading/writing to them, but nonetheless, there is a difference.
float matris_cos[kuanta][d][angle_scale];
1) The memory for this multidimensional array will be contiguous. There will be less cache misses as a result.
2) The array will require space only for the floats themselves.
matris_cos = new float**[kuanta];
for (int i = 0; i < kuanta; ++i) {
    matris_cos[i] = new float*[d];
    for (int j = 0; j < d; ++j)
        matris_cos[i][j] = new float[angle_scale];
}
1) The memory for this multidimensional array is allocated in blocks and is thus much less likely to be contiguous. This may result in cache misses.
2) This method requires space for the pointers as well as the floats themselves.
Since there's indirection in the second case, you can expect a tiny speed difference when attempting to access or change values.
To recap:
Second case uses more memory
Second case involves indirection
Second case does not have guaranteed cache locality.

Why Maintaining Sorted Array is faster than Vector in C++

I am creating an array and a vector of size 100, generating random values, and trying to maintain both the array and the vector as sorted.
Here is my code for the same
const int SIZE = 100;    // both containers hold 100 elements, per the question text
const int MAX = 100000;  // number of random insertions; assumed here, the question does not give it
vector<int> myVector;
int arr[SIZE];
clock_t start, finish;
int random;
for (int i = 0; i < SIZE; i++)
{
    myVector.push_back(0);
    arr[i] = 0;
}
//testing for Array
start = clock();
for (int i = 0; i < MAX; ++i)
{
    random = getRandom(); //returns rand() % 100
    for (int j = 0; j < SIZE; ++j) {
        if (random > arr[j])
        {
            for (int k = SIZE - 1; k > j; --k)
            {
                arr[k] = arr[k-1];
            }
            arr[j] = random;
            break;
        }
    }
}
finish = clock();
cout << "Array Time " << finish - start << endl;

//Vector Processing
start = clock();
for (int i = 0; i < MAX; ++i)
{
    random = getRandom(); //returns rand() % 100
    for (int j = 0; j < SIZE; ++j) {
        if (random > myVector[j])
        {
            for (int k = SIZE - 1; k > j; --k)
            {
                myVector[k] = myVector[k-1];
            }
            myVector[j] = random;
            break;
        }
    }
}
finish = clock();
cout << "Vector Time " << finish - start << endl;
The output is as follows:
Array Time : 5
Vector Time: 83
I am not able to understand why the vector is so slow compared to the array in this case.
Doesn't this contradict the rule of thumb of preferring vector over array?
Please help!
First of all: many rules of thumb in programming are not about gaining some milliseconds in performance, but about managing complexity, and therefore avoiding bugs. In this case, it's about performing range checks, which most vector implementations do in debug mode and which arrays don't. It's also about memory management for dynamic arrays - a vector manages its memory itself, while you have to do it manually with arrays, at the risk of introducing memory leaks (ever forgotten a delete[] or used delete instead? I bet you have!). And it's about ease of use, e.g. resizing the vector or inserting an element in the middle, which is tedious work with manually managed arrays.
In other words, performance measurements can never ever contradict a rule of thumb, because a rule of thumb never targets performance. Performance measurements can only be one of the few possible reasons to not obey a coding guideline.
At first sight I'd guess you have not enabled optimizations. The main source of performance loss for the vector would then be index checks that many vector implementations have enabled for debug builds. Those won't kick in in optimized builds, so that should be your first concern. Rule of thumb: performance measurements without optimizations enabled are meaningless
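For example, with GCC that means building something like this (the file name is my own placeholder):

g++ -O2 -DNDEBUG -std=c++11 -Wall sortbench.cpp -o sortbench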
If enabling optimizations still does show a better performance for the array, there's another difference:
The array is stored on the stack, so the compiler can directly use the addresses and calculate address offsets at compile time, while the vector elements are stored on the heap and the compiler will have to dereference the pointer stored in the vector. I'd expect the optimizer to dereference the pointer once and calculate the address offsets from that point on. Still, there might be a small performance penalty compared to compile-time-calculated address offsets, especially if the optimizer can unroll the loop a bit. This still does not contradict the rule of thumb, because you are comparing apples with pears here. The rule of thumb says:
Prefer std::vector over dynamic arrays, and prefer std::array over fixed arrays.
So either use a dynamically allocated array (including some kind of delete[], please) or compare the fixed size array to a std::array. In C++14, you'll have to consider new candidates in the game, namely std::dynarray and C++14 VLAs, non-resizable, runtime length arrays comparable to C's VLAs.
Update:
As was pointed out in the comments, optimizers are good at identifying code that has no side effects, like the operations on an array that you never read from. std::vector implementations are complicated enough that optimizers typically won't see through those several layers of indirection and optimize away all the inserts, so you'll get zero time for the array compared to some time for the vector. Reading the array contents after the loop will disable such rude optimizations.
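A minimal way to force those reads, building on the question's own code (my sketch; SIZE, arr, and myVector are from the question):

// After both timing loops: read the contents so the optimizer
// cannot drop the insertion work as dead code.
long long checksum = 0;
for (int i = 0; i < SIZE; i++)
    checksum += arr[i] + myVector[i];
cout << "checksum: " << checksum << endl;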
The vector class has to grow its memory dynamically, which may involve copying the whole thing from time to time.
Also it has to call internal functions for many operations - like reallocating.
Also it may have security functionality like boundary checks.
Meanwhile your array is preallocated and all your operations probably do not call any internal functions.
That is the overhead price for more functionality.
And who said that vectors should be faster than arrays in all cases?
Your array does not need to grow; that's a special case where arrays are indeed faster!
Because arrays are native data types, the compiler can manipulate them directly in memory; they are managed internally by the compiled executable.
On the other hand, a vector is more like a class (a template, as I read), and it needs some management going through other header files and libraries.
Essentially, native data types can be managed without including any headers, which makes them easier for the program to manipulate without having to use external code. The overhead in the vector's time comes from the program having to go through that code and use the methods related to the vector data type.
Every time you need to add more code to your app and operate through it, your app's performance will drop.