How does the capacity of std::vector grow automatically? What is the rate? - c++

I have been going through the book C++ Primer, Third Edition by Stanley B. Lippman and Josée Lajoie, and found one mistake in the program given under Article 6.3, "How a vector Grows Itself": the program is missing a '<' in each of its couts (it uses '<' where '<<' is needed):
#include <vector>
#include <iostream>
using namespace std;

int main() {
    vector<int> ivec;
    cout < "ivec: size: " < ivec.size() < " capacity: " < ivec.capacity() < endl;
    for (int ix = 0; ix < 24; ++ix) {
        ivec.push_back(ix);
        cout < "ivec: size: " < ivec.size()
             < " capacity: " < ivec.capacity() < endl;
    }
}
Later within that article:
"Under the Rogue Wave implementation, both the size and the capacity
of ivec after its definition are 0. On inserting the first element,
however, ivec's capacity is 256 and its size is 1."
But on correcting and running the code, I get the following output:
ivec: size: 0 capacity: 0
ivec[0]=0 ivec: size: 1 capacity: 1
ivec[1]=1 ivec: size: 2 capacity: 2
ivec[2]=2 ivec: size: 3 capacity: 4
ivec[3]=3 ivec: size: 4 capacity: 4
ivec[4]=4 ivec: size: 5 capacity: 8
ivec[5]=5 ivec: size: 6 capacity: 8
ivec[6]=6 ivec: size: 7 capacity: 8
ivec[7]=7 ivec: size: 8 capacity: 8
ivec[8]=8 ivec: size: 9 capacity: 16
ivec[9]=9 ivec: size: 10 capacity: 16
ivec[10]=10 ivec: size: 11 capacity: 16
ivec[11]=11 ivec: size: 12 capacity: 16
ivec[12]=12 ivec: size: 13 capacity: 16
ivec[13]=13 ivec: size: 14 capacity: 16
ivec[14]=14 ivec: size: 15 capacity: 16
ivec[15]=15 ivec: size: 16 capacity: 16
ivec[16]=16 ivec: size: 17 capacity: 32
ivec[17]=17 ivec: size: 18 capacity: 32
ivec[18]=18 ivec: size: 19 capacity: 32
ivec[19]=19 ivec: size: 20 capacity: 32
ivec[20]=20 ivec: size: 21 capacity: 32
ivec[21]=21 ivec: size: 22 capacity: 32
ivec[22]=22 ivec: size: 23 capacity: 32
ivec[23]=23 ivec: size: 24 capacity: 32
Is the capacity growing as 2^N (doubling each time it is exceeded)? Please explain.

The rate at which the capacity of a vector grows is not spelled out by the standard; what the standard requires is that push_back take amortized constant time, and exponential (geometric) growth is how implementations meet that requirement. What amortized constant time means and how exponential growth achieves this is interesting.
Every time a vector's capacity grows, the existing elements have to be copied to the new storage. If you 'amortize' this cost over the lifetime of the vector, it turns out that increasing the capacity by an exponential factor gives you an amortized constant cost per insertion.
This probably seems a bit odd, so let me explain how this works...
size: 1 capacity 1 - No elements have been copied, the cost per element for copies is 0.
size: 2 capacity 2 - When the vector's capacity was increased to 2, the first element had to be copied. Average copies per element is 0.5
size: 3 capacity 4 - When the vector's capacity was increased to 4, the first two elements had to be copied. Average copies per element is (2 + 1 + 0) / 3 = 1.
size: 4 capacity 4 - Average copies per element is (2 + 1 + 0 + 0) / 4 = 3 / 4 = 0.75.
size: 5 capacity 8 - Average copies per element is (3 + 2 + 1 + 1 + 0) / 5 = 7 / 5 = 1.4
...
size: 8 capacity 8 - Average copies per element is (3 + 2 + 1 + 1 + 0 + 0 + 0 + 0) / 8 = 7 / 8 = 0.875
size: 9 capacity 16 - Average copies per element is (4 + 3 + 2 + 2 + 1 + 1 + 1 + 1 + 0) / 9 = 15 / 9 = 1.67
...
size 16 capacity 16 - Average copies per element is 15 / 16 = 0.938
size 17 capacity 32 - Average copies per element is 31 / 17 = 1.82
As you can see, every time the capacity jumps, the number of copies goes up by the previous size of the array. But because the array has to double in size before the capacity jumps again, the number of copies per element always stays less than 2.
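If you want to see this in practice, here is a small instrumented example (my own sketch, not part of the original answer): an element type that counts how many times existing elements are transferred (copied or moved) when the vector reallocates. With a doubling implementation the printed ratio stays below 2; with a 1.5x implementation it stays below 3.
#include <iostream>
#include <vector>

// Instrumented element type: counts transfers (copies/moves) caused by reallocation.
struct Counted {
    static long transfers;
    Counted() = default;
    Counted(const Counted&) { ++transfers; }
    Counted(Counted&&) noexcept { ++transfers; }
};
long Counted::transfers = 0;

int main() {
    std::vector<Counted> v;
    for (int i = 0; i < 1000000; ++i)
        v.emplace_back();              // constructed in place, so transfers come only from growth
    std::cout << "elements: " << v.size()
              << ", transfers per element: "
              << static_cast<double>(Counted::transfers) / v.size() << '\n';
}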
If you grew the capacity to 1.5 * N instead of 2 * N, you would end up with a very similar effect, except that the upper bound on the copies per element would be higher (it would be 3).
I suspect an implementation would choose 1.5 over 2 both to save a bit of space and because 1.5 is closer to the golden ratio. I have an intuition (currently not backed up by any hard data) that a growth rate in line with the golden ratio (because of its relationship to the Fibonacci sequence) will prove to be the most efficient for real-world loads, in terms of minimizing both extra space used and time.

To be able to provide amortized constant time insertions at the end of the std::vector, the implementation must grow the size of the vector (when needed) by a factor K > 1 (*), such that when trying to append to a full vector of size N, the vector grows to K*N.
Different implementations use different constants K that provide different benefits; in particular, most implementations go for either K = 2 or K = 1.5. A higher K makes growth faster, as it requires fewer reallocations, but it also has a greater memory impact. As an example, in gcc K = 2, while in VS (Dinkumware) K = 1.5.
(*) If the vector grew by a constant quantity instead, the complexity of push_back would become linear rather than amortized constant. For example, if the vector grew by 10 elements whenever it is full, the cost of growing (copying all elements to the new memory location) would be spread over only 10 insertions, i.e. O(N/10) = O(N) per insertion.
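A quick way to check which factor K your own standard library uses (my own sketch, not part of this answer) is to print the capacity each time it changes:
#include <cstddef>
#include <iostream>
#include <vector>

int main() {
    std::vector<int> v;
    std::size_t last_capacity = v.capacity();
    for (int i = 0; i < 1000; ++i) {
        v.push_back(i);
        if (v.capacity() != last_capacity) {   // capacity changed: a reallocation happened
            std::cout << "size " << v.size() << " -> capacity " << v.capacity() << '\n';
            last_capacity = v.capacity();
        }
    }
}
With libstdc++ (gcc) this prints capacities 1, 2, 4, 8, ..., while with the Dinkumware library (VS) the steps are roughly 1.5x apart.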

Just to add some mathematical proof of the time complexity of vector::push_back. Say the size of the vector is n; what we care about here is the total number of copies made so far, say y. Note that a copy of every existing element is made each time the vector grows.

Grow by a factor of K

y = K^1 + K^2 + K^3 + ... + K^log_K(n)
K*y =     K^2 + K^3 + ... + K^log_K(n) + K*K^log_K(n)
K*y - y = K*K^log_K(n) - K = K*n - K
y = K(n-1)/(K-1) = (K/(K-1))(n-1)

T(n) = y/n = (K/(K-1)) * (n-1)/n < K/(K-1) = O(1)

K/(K-1) is a constant; for the most common cases:

K = 2: T(n) < 2/(2-1) = 2
K = 1.5: T(n) < 1.5/(1.5-1) = 3

There is also a reason for choosing K as 1.5 or 2 in different implementations: once K is around 2, making it larger buys very little (the bound K/(K-1) is already close to 1) while each allocation wastes more memory.

Grow by a constant quantity of C

y = C + 2*C + 3*C + ... + (m-1)*C,  where m = n/C
  = C * (1 + 2 + 3 + ... + (m-1))
  = C * m*(m-1)/2
  = n*(m-1)/2

T(n) = y/n = (m-1)/2 = n/(2C) - 1/2 = O(n)

As we can see, it is linear.
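The same result can be checked numerically; here is a small simulation (my own sketch, not real vector code) that counts total element copies under each policy:
#include <cstdio>

// Count total element copies caused by n push_backs under the two growth policies above.
long copies_geometric(long n, double k) {
    long cap = 1, copies = 0;
    for (long size = 0; size < n; ++size) {
        if (size == cap) {                       // full: grow by factor k
            copies += size;                      // every existing element is copied
            cap = static_cast<long>(cap * k) + 1;
        }
    }
    return copies;
}

long copies_constant(long n, long c) {
    long cap = c, copies = 0;
    for (long size = 0; size < n; ++size) {
        if (size == cap) {                       // full: grow by c more slots
            copies += size;
            cap += c;
        }
    }
    return copies;
}

int main() {
    const long sizes[] = {1000, 10000, 100000};
    for (long n : sizes)
        std::printf("n=%6ld   K=2: %.2f copies/element   C=10: %.2f copies/element\n",
                    n, copies_geometric(n, 2.0) / double(n),
                    copies_constant(n, 10) / double(n));
}
The first column stays bounded by K/(K-1), while the second grows roughly like n/(2C), matching the two derivations.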

The capacity of the vector is completely implementation-dependent; no one can tell how it's growing.

Are you using the "Rogue Wave" implementation?
How capacity grows is up to the implementation. Yours uses 2^N (doubling).

Yes, the capacity doubles each time it is exceeded. This is implementation dependent.

Before pushing back an element, the vector checks whether the new size would exceed its capacity, roughly like below. I will explain it with the reserve function:
void push_back(const value_type &val) // push_back prototype
{
    if (_size < 10)
        reserve(_size + 1);
    else if (_size > (_capacity / 4 * 3))
        reserve(_capacity + (this->_capacity / 4));
    // then the vector gets filled with the value
}
_size : the vector's current size.
_capacity : the vector's capacity.

Related

For finding top K elements using heap, which approach is better - NlogK or KLogN?

For finding top K elements using heap, which approach is better?
NlogK: use a min-heap of size K and remove the minimum element whenever a larger one arrives, so the top K elements remain in the heap.
KlogN: use a max-heap, store all elements and then extract the top K elements.
I did some calculations and at no point do I see that NlogK is better than KlogN.
N = 16 (2^4), K = 8 (2^3)
O(N log K) = 16 * 3 = 48
O(K log N) = 8 * 4 = 32

N = 16 (2^4), K = 12 (log2(12) = 3.5849)
O(N log K) = 16 * 3.5849 = 57.3584
O(K log N) = 12 * 4 = 48

N = 256 (2^8), K = 4 (2^2)
O(N log K) = 256 * 2 = 512
O(K log N) = 4 * 8 = 32

N = 1048576 (2^20), K = 16 (2^4)
O(N log K) = 1048576 * 4 = 4194304
O(K log N) = 16 * 20 = 320

N = 1048576 (2^20), K = 1024 (2^10)
O(N log K) = 1048576 * 10 = 10485760
O(K log N) = 1024 * 20 = 20480

N = 1048576 (2^20), K = 524288 (2^19)
O(N log K) = 1048576 * 19 = 19922944
O(K log N) = 524288 * 20 = 10485760
I just wanted to confirm that my approach is correct and that adding all elements to a heap and then extracting the top K elements is always the best approach (and also the simpler one).
"I did some calculations and at no point do I see that NlogK is better than KlogN."
Since K <= N, NlogK will always be greater than or equal to KlogN.
That does not mean the min heap approach is going to take more time than the max heap approach.
You need to consider the following:
1) In the min heap approach, we update the heap only if the next value is larger than the head. If the array is in ascending order, we will do this (N-K) times; if it is in descending order, we will not update it at all. On average, the number of times the tree gets updated is considerably less than N (see the sketch after these points).
2) In the max heap approach, you need to heapify the tree of size N. If K is negligibly small compared to N, this time can become the dominant factor, while in the case of the min heap, heapify works on the smaller set of K elements. Also, as mentioned in point 1, most of the N values will not trigger an update of the tree.
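Here is a minimal sketch of that min-heap loop (my own illustration, not the answerer's benchmark code), using std::priority_queue with std::greater as the comparator:
#include <cstddef>
#include <functional>
#include <iostream>
#include <queue>
#include <vector>

// Keep a min-heap of size K; only push when the new value beats the current minimum.
std::vector<int> topK(const std::vector<int>& data, std::size_t k) {
    std::priority_queue<int, std::vector<int>, std::greater<int>> heap;   // min-heap
    for (int x : data) {
        if (heap.size() < k) {
            heap.push(x);
        } else if (x > heap.top()) {      // at most N-K of these updates ever happen
            heap.pop();
            heap.push(x);
        }
    }
    std::vector<int> result;
    while (!heap.empty()) { result.push_back(heap.top()); heap.pop(); }
    return result;                        // the top K elements, smallest first
}

int main() {
    std::vector<int> data{5, 1, 9, 3, 14, 4, 8, 7, 10, 2};
    for (int x : topK(data, 3)) std::cout << x << ' ';   // prints 9 10 14
    std::cout << '\n';
}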
I wrote a small program to compare both approaches. The source can be accessed here.
Results for an array ranging from 0 to 1M in random order are below:
Test for Array size: 1000000
      k  MinHeapIterations  MinHeapTime(ms)  MaxHeapTime(ms)  MaxTime/MinTime
      1                 15             6.07            72.03            11.88
     10                114             3.85            70.09            18.19
    100                913             4.11            69.60            16.93
   1000               6874             5.32            72.94            13.71
  10000              46123            16.52            79.89             4.83
 100000             230385            78.19           132.27             1.69
1000000                  0            35.86           453.57            12.65
As you can see:
Min heap outperforms max heap for all values of K.
The number of tree updates in the case of min heap (MinHeapIterations) is much less than (N-K).

What is the maximum number of comparisons to heapify an array?

Is there a general formula to calculate the maximum number of comparisons to heapify n elements?
If not, is 13 the max number of comparisons to heapify an array of 8 elements?
My reasoning is as follows:
at h = 0, 1 node, 0 comparisons, 1* 0 = 0 comparisons
at h = 1, 2 nodes, 1 comparison each, 2*1 = 2 comparisons
at h = 2, 4 nodes, 2 comparisons each, 4*2 = 8 comparisons
at h = 3, 1 node, 3 comparisons each, 1*3 = 3 comparisons
Total = 0 + 2 + 8 + 3 =13
Accepted theory is that build-heap requires at most (2N - 2) comparisons. So the maximum number of comparisons required should be 14. We can confirm that easily enough by examining a heap of 8 elements:
            7
         /     \
        3       1
       / \     / \
      5   4   8   2
     /
    6
Here, the 4 leaf nodes will never move down. The nodes 5 and 1 can move down 1 level. 3 could move down two levels. And 7 could move down 3 levels. So the maximum number of level moves is:
(0*4)+(1*2)+(2*1)+(3*1) = 7
Every level move requires 2 comparisons, so the maximum number of comparisons would be 14.
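To check the bound empirically, here is a small bottom-up build-heap that counts comparisons (my own sketch, not from the answer); for the example arrangement above it reports 10 comparisons, within the 2N - 2 = 14 bound:
#include <cstddef>
#include <iostream>
#include <utility>
#include <vector>

long comparisons = 0;

// Standard sift-down for a max-heap, counting every comparison it makes.
void siftDown(std::vector<int>& a, std::size_t i) {
    std::size_t n = a.size();
    while (2 * i + 1 < n) {
        std::size_t child = 2 * i + 1;
        if (child + 1 < n) {                    // pick the larger of the two children
            ++comparisons;
            if (a[child + 1] > a[child]) child = child + 1;
        }
        ++comparisons;                          // compare the parent with that child
        if (a[i] >= a[child]) break;
        std::swap(a[i], a[child]);
        i = child;
    }
}

int main() {
    std::vector<int> a{7, 3, 1, 5, 4, 8, 2, 6};          // the example tree above, level by level
    for (std::size_t i = a.size() / 2; i-- > 0; )        // bottom-up build-heap
        siftDown(a, i);
    std::cout << "comparisons: " << comparisons << '\n'; // 10 here, within 2N - 2 = 14
}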

Downscale array for decimal factor

Is there an efficient way to downscale the number of elements in an array by a non-integer (decimal) factor?
I want to downsize one array by a certain factor.
Example:
If I have 10 elements and need to scale down by a factor of 2:
1 2 3 4 5 6 7 8 9 10
scaled to
1.5 3.5 5.5 7.5 9.5
Grouping elements 2 by 2 and taking the arithmetic mean.
My problem is: what if I need to downsize an array of 10 elements to 6 elements? In theory I should group about 1.67 (= 10/6) elements and find their arithmetic mean, but how do I do that?
Before suggesting a solution, let's define "downsize" in a more formal way. I would suggest this definition:
Downsizing starts with an array a[N] and produces an array b[M] such that the following is true:
M <= N - otherwise it would be upsizing, not downsizing
SUM(b) = (M/N) * SUM(a) - The sum is reduced proportionally to the number of elements
Elements of a participate in computation of b in the order of their occurrence in a
Let's consider your example of downsizing 1, 2, 3, 4, 5, 6, 7, 8, 9, 10 to six elements. The total for your array is 55, so the total for the new array would be (6/10)*55 = 33. We can achieve this total in two steps:
Walk the array a, totaling its elements, until we have taken as many whole elements as the integer part of the N/M fraction (it must be an improper fraction by rule 1 above)
Let's say that a[i] was the last element of a that we could take as a whole in the current iteration. Take the fraction of a[i+1] equal to the fractional part of N/M
Continue to the next number starting with the remaining fraction of a[i+1]
Once you are done, your array b will contain M numbers totaling SUM(a). Walk the array once more and divide each element by N/M (i.e. scale by M/N).
Here is how it works with your example:
b[0] = a[0] + (2/3)*a[1] = 2.33333
b[1] = (1/3)*a[1] + a[2] + (1/3)*a[3] = 5
b[2] = (2/3)*a[3] + a[4] = 7.66666
b[3] = a[5] + (2/3)*a[6] = 10.6666
b[4] = (1/3)*a[6] + a[7] + (1/3)*a[8] = 13.3333
b[5] = (2/3)*a[8] + a[9] = 16
--------
Total = 55
Scaling down by 6/10 produces the final result:
1.4 3 4.6 6.4 8 9.6 (Total = 33)
Here is a simple implementation in C++:
// a is the input vector; b must already be sized to M elements, all zero.
double need = ((double)a.size()) / b.size();  // N/M input elements per output element
double have = 0;                              // how much of the current b[pos] is filled
size_t pos = 0;
for (size_t i = 0 ; i != a.size() ; i++) {
    if (need >= have+1) {
        b[pos] += a[i];                       // a[i] fits entirely in the current bucket
        have++;
    } else {
        double frac = (need-have);            // frac is less than 1 because of the "if" condition
        b[pos++] += frac * a[i];              // frac of a[i] goes to the current element of b
        have = 1 - frac;
        b[pos] += have * a[i];                // (1-frac) of a[i] goes to the next position of b
    }
}
for (size_t i = 0 ; i != b.size() ; i++) {
    b[i] /= need;                             // scale the sums back down by N/M
}
You will need to resort to some form of interpolation, as the number of elements to average isn't an integer.
You can consider computing the prefix sum of the array, i.e.
0 1 2 3 4 5 6 7 8 9
1 2 3 4 5 6 7 8 9 10
yields by summation
0 1 2 3 4 5 6 7 8 9
1 3 6 10 15 21 28 36 45 55
Then perform linear interpolation to get the intermediate values that you are lacking, like at 0*, 10/6, 20/6, 30/6*, 40/6, 50/6, 60/6*. (Those with an asterisk are readily available.)
0 1 10/6 2 3 20/6 4 5 6 40/6 7 8 50/6 9
1 3 15/3 6 10 35/3 15 21 28 100/3 36 45 145/3 55
Now you get fractional sums by subtracting values in pairs. The first average is
(15/3-1)/(10/6) = 12/5
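Here is one way this prefix-sum idea could be coded (my own sketch, not the answerer's; it prepends a 0 to the prefix array so each output is a plain window average, which reproduces the 1.4 3 4.6 6.4 8 9.6 result from the previous answer):
#include <cstddef>
#include <iostream>
#include <vector>

// Downscale a to m elements using an interpolated prefix sum.
std::vector<double> downscale(const std::vector<double>& a, std::size_t m) {
    std::vector<double> prefix(a.size() + 1, 0.0);     // prefix[i] = sum of the first i elements
    for (std::size_t i = 0; i < a.size(); ++i)
        prefix[i + 1] = prefix[i] + a[i];

    auto S = [&](double x) -> double {                 // prefix sum at fractional position x
        std::size_t lo = static_cast<std::size_t>(x);
        if (lo >= a.size()) return prefix.back();
        return prefix[lo] + (x - lo) * (prefix[lo + 1] - prefix[lo]);
    };

    double window = static_cast<double>(a.size()) / m; // N/M input elements per output
    std::vector<double> b(m);
    for (std::size_t k = 0; k < m; ++k)
        b[k] = (S((k + 1) * window) - S(k * window)) / window;
    return b;
}

int main() {
    std::vector<double> a{1, 2, 3, 4, 5, 6, 7, 8, 9, 10};
    for (double v : downscale(a, 6)) std::cout << v << ' ';  // 1.4 3 4.6 6.4 8 9.6
    std::cout << '\n';
}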
I can't think of anything in the C++ library that will crank out something like this, all fully cooked and ready to go.
So you'll have to, pretty much, roll up your sleeves and go to work. At this point, the question of what's the "efficient" way of doing it boils down to its very basics. Which means:
1) Calculate how big the output array should be. Based on the description of the issue, you should be able to make that calculation even before looking at the values in the input array. You know the input array's size(), you can calculate the size() of the destination array.
2) So, you resize() the destination array up front. Now, you no longer need to worry about the time wasted in growing the size of the dynamic output array, incrementally, as you go through the input array, making your calculations.
3) So what's left is the actual work: iterating over the input array, and calculating the downsized values.
auto b = input_array.begin();
auto e = input_array.end();
auto p = output_array.begin();
Don't see many other options here, besides brute force iteration and calculations. Iterate from b to e, getting your samples, calculating each downsized value, and saving the resulting value into *p++.

How to balance between two arrays such as the difference is minimized?

I have an array A[] = {3, 2, 5, 11, 17} and B[] = {2, 3, 6}; the size of B is always less than the size of A. Now I have to map every element of B to a distinct element of A such that the total difference sum(abs(Bi - Aj)) is minimized (where Bi has been mapped to Aj). What type of algorithm is this?
For the example input I could select 2->2 = 0, 3->3 = 0 and then 6->5 = 1, so the total cost is 0 + 0 + 1 = 1. I have been thinking of sorting both arrays and then taking the first size-of-B elements from A. Will this work?
It can be thought of as an unbalanced Assignment Problem.
The cost matrix is the absolute difference between the values of B[i] and A[j]. You can add dummy elements to B so that the problem becomes balanced, and give them very high associated costs.
Then the Hungarian Algorithm can be applied to solve it.
For the example case A[]={3,2,5,11,17} and B[]={2,3,6} the cost matrix shall be:
       3   2   5  11  17
 2     1   0   3   9  15
 3     0   1   2   8  14
 6     3   4   1   5  11
d1    16  16  16  16  16
d2    16  16  16  16  16
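For illustration, here is a small sketch (mine, not the answerer's) that builds this padded cost matrix; solving it with the Hungarian algorithm is left out:
#include <cstddef>
#include <cstdlib>
#include <iostream>
#include <vector>

int main() {
    std::vector<int> A{3, 2, 5, 11, 17};
    std::vector<int> B{2, 3, 6};

    const int BIG = 16;   // dummy cost, larger than any real |B[i] - A[j]| here
    std::vector<std::vector<int>> cost(A.size(), std::vector<int>(A.size(), BIG));
    for (std::size_t i = 0; i < B.size(); ++i)           // rows for the real elements of B
        for (std::size_t j = 0; j < A.size(); ++j)
            cost[i][j] = std::abs(B[i] - A[j]);
    // the remaining rows stay at BIG: the dummy rows d1 and d2

    for (const auto& row : cost) {
        for (int c : row) std::cout << c << '\t';
        std::cout << '\n';
    }
}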

In Doom3's source code, why did they use bitshift to generate the number instead of hardcoding it?

Why did they do this:
Sys_SetPhysicalWorkMemory( 192 << 20, 1024 << 20 ); //Min = 201,326,592 Max = 1,073,741,824
Instead of this:
Sys_SetPhysicalWorkMemory( 201326592, 1073741824 );
The article I got the code from
A neat property is that shifting a value << 10 is the same as multiplying it by 1024 (1 KiB), and << 20 is multiplying by 1024*1024 (1 MiB).
Shifting by successive multiples of 10 yields all of our standard units of computer storage:
1 << 10 = 1 KiB (Kibibyte)
1 << 20 = 1 MiB (Mebibyte)
1 << 30 = 1 GiB (Gibibyte)
...
So that function is expressing its arguments to Sys_SetPhysicalWorkMemory(int minBytes, int maxBytes) as 192 MB (min) and 1024 MB (max).
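These equivalences can be checked at compile time; a tiny snippet of my own (not from the Doom 3 source):
// Compile-time checks of the equivalences mentioned above.
static_assert((192 << 20) == 201326592,   "192 MiB in bytes");
static_assert((1024 << 20) == 1073741824, "1 GiB in bytes");
static_assert((1 << 10) == 1024 && (1 << 20) == 1048576 && (1 << 30) == 1073741824,
              "1 KiB / 1 MiB / 1 GiB");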
Self commenting code:
192 << 20 means 192 * 2^20 = 192 * 2^10 * 2^10 = 192 * 1024 * 1024 = 192 MByte
1024 << 20 means 1024 * 2^20 = 1 GByte
Computations on constants are optimized away so nothing is lost.
I might be wrong (and I didn't study the source), but I guess it's just for readability reasons.
I think the point (not mentioned yet) is that all but the most basic compilers will do the shift at compilation time. Whenever you use operators with constant expressions, the compiler will be able to do this before the code is even generated.
Note that before constexpr and C++11, this did not extend to functions.