I am bulk loading an R-Tree with the spatialindex (http://libspatialindex.github.com/) library:
string baseName = "streets";
size_t capacity = 10 * 1024 * 1024;
bool bWriteThrough = false;
indexIdentifier = 0;
IStorageManager *disk = StorageManager::createNewDiskStorageManager(baseName, 512);
fileInMem = StorageManager::createNewRandomEvictionsBuffer(*disk, capacity, bWriteThrough);
// bulk loads my tree
bulkLoadRTree();
cout << "tree info:" << endl;
cout << *tree << endl;
delete disk;
The following info is output about the built tree:
Dimension: 2
Fill factor: 0.7
Index capacity: 100
Leaf capacity: 100
Tight MBRs: enabled
Near minimum overlap factor: 32
Reinsert factor: 0.3
Split distribution factor: 0.4
Utilization: 69%
Reads: 1
Writes: 35980
Hits: 0
Misses: 0
Tree height: 4
Number of data: 2482376
Number of nodes: 35979
Level 0 pages: 35463
Level 1 pages: 507
Level 2 pages: 8
Level 3 pages: 1
Splits: 0
Adjustments: 0
Query results: 0
Now I am trying to load what I have saved on disk:
IStorageManager *ldisk = StorageManager::loadDiskStorageManager(baseName);
SpatialIndex::StorageManager::IBuffer* fileLoadBuffer = StorageManager
::createNewRandomEvictionsBuffer(*ldisk, capacity, bWriteThrough);
id_type id = 1;
tree = RTree::loadRTree(*fileLoadBuffer, id);
cout << *tree << endl;
and the tree has only one node (the output for the tree is):
Dimension: 2
Fill factor: 0.7
Index capacity: 100
Leaf capacity: 100
Tight MBRs: enabled
Near minimum overlap factor: 32
Reinsert factor: 0.3
Split distribution factor: 0.4
Utilization: 0%
Reads: 0
Writes: 0
Hits: 0
Misses: 0
Tree height: 1
Number of data: 0
Number of nodes: 1
Level 0 pages: 1
Splits: 0
Adjustments: 0
Query results: 0
What am I doing wrong? Why isn't the whole tree loaded from disk?
Did you maybe not sync your changes to disk?
Also, usually one would implement the tree on-disk and not read it completely on first access, so at this point it cannot report accurate statistics.
Or maybe your bulkLoadRTree does not use fileInMem.
One has to delete fileInMem so that its buffered pages are flushed back to *disk before delete disk runs. This line needs to be added before delete disk:
delete fileInMem;
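For reference, here is a minimal sketch of the whole save/reload sequence under that fix (bulkLoadRTree is the question's own helper, not library API; it is also worth checking that the id passed to loadRTree matches the indexIdentifier the tree was created with, since the loader looks the header up by that id):

#include <spatialindex/SpatialIndex.h>

using namespace SpatialIndex;

// Sketch only: the buffer wraps the disk manager and the tree wraps the
// buffer, so teardown must run tree -> buffer -> disk for all pages
// (including the tree header) to reach the file.
void saveAndReload(std::string baseName, uint32_t capacity, bool bWriteThrough)
{
    id_type indexIdentifier = 0;

    IStorageManager* disk =
        StorageManager::createNewDiskStorageManager(baseName, 512);
    StorageManager::IBuffer* fileInMem =
        StorageManager::createNewRandomEvictionsBuffer(*disk, capacity, bWriteThrough);

    // The question's bulkLoadRTree() would build the tree through *fileInMem
    // here (e.g. via RTree::createAndBulkLoadNewRTree, which fills indexIdentifier).
    ISpatialIndex* tree = nullptr;

    delete tree;       // flushes the tree's header page
    delete fileInMem;  // flushes buffered pages back to *disk
    delete disk;       // closes the underlying files

    IStorageManager* ldisk = StorageManager::loadDiskStorageManager(baseName);
    StorageManager::IBuffer* fileLoadBuffer =
        StorageManager::createNewRandomEvictionsBuffer(*ldisk, capacity, bWriteThrough);
    tree = RTree::loadRTree(*fileLoadBuffer, indexIdentifier);
}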
I am currently developing a chess engine in C++, and I am in the process of debugging my move generator. For this purpose, I wrote a simple perft() function:
int32_t Engine::perft(GameState game_state, int32_t depth)
{
    int32_t last_move_nodes = 0;
    int32_t all_nodes = 0;

    Timer timer;
    timer.start();

    int32_t output_depth = depth;

    if (depth == 0)
    {
        return 1;
    }

    std::vector<Move> legal_moves = generator.generate_legal_moves(game_state);

    for (Move move : legal_moves)
    {
        game_state.make_move(move);
        last_move_nodes = perft_no_print(game_state, depth - 1);
        all_nodes += last_move_nodes;
        std::cout << index_to_square_name(move.get_from_index()) << index_to_square_name(move.get_to_index()) << ": " << last_move_nodes << "\n";
        game_state.unmake_move(move);
    }

    std::cout << "\nDepth: " << output_depth << "\nTotal nodes: " << all_nodes << "\nTotal time: " << timer.get_milliseconds() << "ms/" << timer.get_milliseconds()/1000.0f << "s\n\n";

    return all_nodes;
}
int32_t Engine::perft_no_print(GameState game_state, int32_t depth)
{
    int32_t nodes = 0;

    if (depth == 0)
    {
        return 1;
    }

    std::vector<Move> legal_moves = generator.generate_legal_moves(game_state);

    for (Move move : legal_moves)
    {
        game_state.make_move(move);
        nodes += perft_no_print(game_state, depth - 1);
        game_state.unmake_move(move);
    }

    return nodes;
}
Its results for the initial chess position (FEN: rnbqkbnr/pppppppp/8/8/8/8/PPPPPPPP/RNBQKBNR w KQkq - 0 1) at depths 1 and 2 match the results of Stockfish's perft command, so I assume they are correct:
h2h3: 1
h2h4: 1
g2g3: 1
g2g4: 1
f2f3: 1
f2f4: 1
e2e3: 1
e2e4: 1
d2d3: 1
d2d4: 1
c2c3: 1
c2c4: 1
b2b3: 1
b2b4: 1
a2a3: 1
a2a4: 1
g1h3: 1
g1f3: 1
b1c3: 1
b1a3: 1
Depth: 1
Total nodes: 20
Total time: 1ms/0.001s
h2h3: 20
h2h4: 20
g2g3: 20
g2g4: 20
f2f3: 20
f2f4: 20
e2e3: 20
e2e4: 20
d2d3: 20
d2d4: 20
c2c3: 20
c2c4: 20
b2b3: 20
b2b4: 20
a2a3: 20
a2a4: 20
g1h3: 20
g1f3: 20
b1c3: 20
b1a3: 20
Depth: 2
Total nodes: 400
Total time: 1ms/0.001s
The results stop matching at depth 3, though:
Stockfish:
go perft 3
a2a3: 380
b2b3: 420
c2c3: 420
d2d3: 539
e2e3: 599
f2f3: 380
g2g3: 420
h2h3: 380
a2a4: 420
b2b4: 421
c2c4: 441
d2d4: 560
e2e4: 600
f2f4: 401
g2g4: 421
h2h4: 420
b1a3: 400
b1c3: 440
g1f3: 440
g1h3: 400
Nodes searched: 8902
My engine:
h2h3: 361
h2h4: 380
g2g3: 340
g2g4: 397
f2f3: 360
f2f4: 436
e2e3: 380
e2e4: 437
d2d3: 380
d2d4: 437
c2c3: 399
c2c4: 326
b2b3: 300
b2b4: 320
a2a3: 280
a2a4: 299
g1h3: 281
g1f3: 280
b1c3: 357
b1a3: 320
Depth: 3
Total nodes: 7070
Total time: 10ms/0.01s
I figured that my move generator was just buggy, and tried to track down the bugs by playing a move the engine gives incorrect values for on the board, and then calling perft() with depth = 2 on the resulting position to find out which moves are missing. But for every move I tried this with, the engine suddenly starts to output the correct results I expected to get earlier!
Here is an example for the move a2a3:
When calling perft() on the initial position in Stockfish, it calculates 380 subnodes for a2a3 at depth 3.
When calling perft() on the initial position in my engine, it calculates 280 subnodes for a2a3 at depth 3.
When calling perft() on the position you get after making the move a2a3 in the initial position in my engine, it calculates the correct number of total nodes at depth 2, 380:
h7h5: 19
h7h6: 19
g7g5: 19
g7g6: 19
f7f5: 19
f7f6: 19
e7e5: 19
e7e6: 19
d7d5: 19
d7d6: 19
c7c5: 19
c7c6: 19
b7b5: 19
b7b6: 19
a7a5: 19
a7a6: 19
g8h6: 19
g8f6: 19
b8c6: 19
b8a6: 19
Depth: 2
Total nodes: 380
Total time: 1ms/0.001s
If you have any idea what the problem could be here, please help me out. Thank you!
EDIT:
I discovered some interesting new facts that might help to solve the problem, but I don't know what to do with them:
For some reason, using std::sort() like this in perft():
std::sort(legal_moves.begin(), legal_moves.end(), [](auto first, auto second){ return first.get_from_index() % 8 > second.get_from_index() % 8; });
to sort the vector of legal moves causes the found number of total nodes for the initial position (for depth 3) to change from the wrong 7070 to the (also wrong) 7331.
When I print the game state after calling game_state.make_move() in perft(), the move seems to have had no effect on the position bitboards (the other properties change like they are supposed to). This is very strange, because in isolation, the make_move() method works just fine.
I'm unsure if you were able to pin down the issue, but from the limited information available in the question, the best I can assume (and something I faced myself earlier) is that there is a problem in your unmake_move() function when it comes to captures, since:
Your perft fails only at depth 3 - this is when the first legal capture is possible; at depths 1 and 2 there can be no legal captures.
Your perft works fine when it searches the position after a2a3 directly, rather than reaching it while searching at depth 3 from the start.
This probably means that your unmake_move() fails at a depth greater than 1, where you need to restore some of the board's state that cannot be derived from just the move parameter you are passing in (e.g. en passant and castling rights before you made the move).
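To illustrate the idea (with hypothetical names; the question's actual GameState API will differ), the usual fix is to snapshot the irreversible parts of the state in make_move() and restore them in unmake_move():

#include <cstdint>

using Piece = int;  // illustrative: 0 = empty square, otherwise a piece code

// Everything a move can destroy that cannot be recomputed from the move itself.
struct UndoInfo
{
    Piece   captured_piece;     // contents of the target square before the move
    uint8_t castling_rights;    // castling rights before the move
    int8_t  en_passant_square;  // en passant target before the move (-1 if none)
};

struct GameState
{
    Piece   board[64] = {};
    uint8_t castling_rights = 0xF;
    int8_t  en_passant_square = -1;

    UndoInfo make_move(int from, int to)
    {
        UndoInfo undo{ board[to], castling_rights, en_passant_square };
        board[to] = board[from];  // a capture silently disappears here...
        board[from] = 0;
        // ... update castling_rights / en_passant_square for this move ...
        return undo;
    }

    void unmake_move(int from, int to, const UndoInfo& undo)
    {
        board[from] = board[to];
        board[to] = undo.captured_piece;           // ...and reappears here
        castling_rights = undo.castling_rights;    // restore irreversible state
        en_passant_square = undo.en_passant_square;
    }
};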
This is how you would debug your move generator using perft:
1. Given startpos as p1, generate perft(3) for your engine and sf. (you did that)
2. Now check any move that has a different node count; you picked a2a3. (you did that)
3. Given startpos + a2a3 as p2, generate perft(2) for your engine and sf. (you partially did this)
4. Now check any move that has a different node count in step 3. Let's say move x.
5. Given startpos + a2a3 + x as p3, generate perft(1) for your engine and sf.
Since that is only perft(1), by this point you will be able to figure out the wrong or missing move from your generator. Set up that last position p3 on the board and compare your engine's moves against sf's perft(1) result.
I have the following piece of (pseudo) code:
static void ConvertBuffer( unsigned char * buffer, const int width )
{
#pragma omp parallel for
    for ( int x = 0; x < width; ++x ) // one image row
    {
        RGB rgb = {0,0,0}; HSB hsb;
        rgb.red = (float)buffer[x] / 255.;
        RGBToHSB(rgb, hsb);
        buffer[x] = hsb.brightness * 255;
    }
}
This is a very naive implementation of an RGB → HSB conversion algorithm.
The first implementation would pull a single scanline (= one row of the image) at a time, in my case 65536 bytes. However, after trial and error on my particular system, I discovered that I could cut the total computation time in half if I instead processed 16 scanlines at a time (= 1048576 bytes).
What tools are available to help me determine that magic number, ideally at runtime, so that I do not need to hard-code a magic value of 16 somewhere in my code?
If I know that RGBToHSB is embarrassingly parallel and cache friendly, can I just completely fill the L3 cache, and should that be close to the maximum possible speed?
For reference, my system is described by:
$ sudo likwid-topology
-------------------------------------------------------------
CPU type: Intel Core SandyBridge processor
*************************************************************
Hardware Thread Topology
*************************************************************
Sockets: 1
Cores per socket: 4
Threads per core: 1
-------------------------------------------------------------
HWThread Thread Core Socket
0 0 0 0
1 0 1 0
2 0 2 0
3 0 3 0
-------------------------------------------------------------
Socket 0: ( 0 1 2 3 )
-------------------------------------------------------------
*************************************************************
Cache Topology
*************************************************************
Level: 1
Size: 32 kB
Cache groups: ( 0 ) ( 1 ) ( 2 ) ( 3 )
-------------------------------------------------------------
Level: 2
Size: 256 kB
Cache groups: ( 0 ) ( 1 ) ( 2 ) ( 3 )
-------------------------------------------------------------
Level: 3
Size: 6 MB
Cache groups: ( 0 1 2 3 )
-------------------------------------------------------------
*************************************************************
NUMA Topology
*************************************************************
NUMA domains: 1
-------------------------------------------------------------
Domain 0:
Processors: 0 1 2 3
Relative distance to nodes: 10
Memory: 122.332 MB free of total 5898.17 MB
-------------------------------------------------------------
You can't really define a 'right size' for buffering. My answer would be to set it as big as reasonably possible. I would say somewhere between 10MB and 100MB, but you can set it higher if you can afford it, or lower if you are short on RAM.
If you are reading a file and writing to a file (same or another), you should consider using memory mapped files. This way you get rid of the buffering (managed by the OS), and you can call your function once for the whole image. Note that this is probably not a good idea on a 32-bit system if your image is bigger than 4GB.
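As a minimal POSIX sketch of the memory-mapped variant (assuming the file is raw pixel data small enough to fit in an int, and reusing the question's ConvertBuffer):

#include <fcntl.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

static void ConvertBuffer(unsigned char *buffer, const int width); // from above

// Map the whole file read/write and convert it in place; the OS handles
// paging, so no explicit buffering or magic chunk size is needed.
static void ConvertFile(const char *path)
{
    int fd = open(path, O_RDWR);
    if (fd < 0) return;

    struct stat st;
    if (fstat(fd, &st) == 0)
    {
        void *p = mmap(nullptr, st.st_size, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
        if (p != MAP_FAILED)
        {
            ConvertBuffer(static_cast<unsigned char *>(p), static_cast<int>(st.st_size));
            munmap(p, st.st_size);
        }
    }
    close(fd);
}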
I am creating my own classifier for face detection, and now I want to train it. When I run the command 'opencv_haartraining -data facehaar -vec vecfile.vec -bg negatives.txt -npos 3 -nneg 5 -nstages 30 -w 30 -h 32', it shows the output below. What is this error? I don't understand it. Could anyone help me?
Data dir name: facehaar
Vec file name: vecfile.vec
BG file name: negatives.txt, is a vecfile: no
Num pos: 3
Num neg: 5
Num stages: 30
Num splits: 1 (stump as weak classifier)
Mem: 200 MB
Symmetric: TRUE
Min hit rate: 0.995000
Max false alarm rate: 0.500000
Weight trimming: 0.950000
Equal weights: FALSE
Mode: BASIC
Width: 30
Height: 32
Applied boosting algorithm: GAB
Error (valid only for Discrete and Real AdaBoost): misclass
Max number of splits in tree cascade: 0
Min number of positive samples per cluster: 500
Required leaf false alarm rate: 9.31323e-10
Tree Classifier
Stage
+---+
| 0|
+---+
Number of features used : 234720
Parent node: NULL
*** 1 cluster ***
POS: 3 3 1.000000
Invalid background description file.
I am using a d_ary_heap_indirect as a priority queue (to process items with the highest priority first), with a property map to store the priorities. However, when I change the values in the priority property map and push vertices that are already in the queue into the queue again, it results in an invalid state where a vertex appears in the queue twice, at different positions.
Here is a demo:
#include <iostream>
#include <iomanip>
#include <boost/graph/grid_graph.hpp>
#include <boost/graph/detail/d_ary_heap.hpp>
#include <boost/property_map/property_map.hpp>
#include <cstdlib>
#include <ctime> // for time(), used to seed the random generators
template <typename TQueue>
static void OutputQueue(TQueue queue);
int main(int, char*[])
{
    srand((unsigned int)time(NULL));
    srand48((unsigned int)time(NULL));

    boost::array<std::size_t, 2> lengths = { { 2,2 } };
    typedef boost::grid_graph<2> GraphType;
    GraphType graph(lengths);

    typedef boost::graph_traits<GraphType>::vertex_descriptor Vertex;
    typedef boost::property_map<GraphType, boost::vertex_index_t>::const_type GridIndexMapType;
    GridIndexMapType gridIndexMap(get(boost::vertex_index, graph));

    typedef boost::vector_property_map<std::size_t, GridIndexMapType> IndexInHeapMap;
    IndexInHeapMap index_in_heap(gridIndexMap);

    typedef boost::graph_traits<GraphType>::vertex_iterator VertexIteratorType;
    typedef boost::vector_property_map<float, GridIndexMapType> PriorityMapType;
    PriorityMapType priorityMap(gridIndexMap);

    VertexIteratorType vertexIterator, vertexIteratorEnd;

    typedef std::greater<float> ComparisonFunctor;
    typedef boost::d_ary_heap_indirect<Vertex, 4, IndexInHeapMap, PriorityMapType, ComparisonFunctor> MutableQueueType;
    ComparisonFunctor comparisonFunctor;
    MutableQueueType mutableQueue(priorityMap, index_in_heap, comparisonFunctor);

    std::cout << "There are " << mutableQueue.size() << " items in the queue." << std::endl;

    // Add random values to the vertices and add them to the queue
    for( tie(vertexIterator, vertexIteratorEnd) = vertices(graph); vertexIterator != vertexIteratorEnd; ++vertexIterator)
    {
        put(priorityMap, *vertexIterator, rand() % 1000);
    }

    for( tie(vertexIterator, vertexIteratorEnd) = vertices(graph); vertexIterator != vertexIteratorEnd; ++vertexIterator)
    {
        mutableQueue.push(*vertexIterator);
    }

    std::cout << "There are " << mutableQueue.size() << " items in the queue." << std::endl;
    std::cout << "The priority queue is: " << std::endl;
    OutputQueue(mutableQueue);

    // Insert another set of random values for each vertex
    for( tie(vertexIterator, vertexIteratorEnd) = vertices(graph); vertexIterator != vertexIteratorEnd; ++vertexIterator)
    {
        float newPriority = rand() % 1000;
        std::cout << "New priority for " << vertexIterator->operator[](0) << ", " << vertexIterator->operator[](1) << " " << newPriority << std::endl;
        put(priorityMap, *vertexIterator, newPriority);
    }

    for( tie(vertexIterator, vertexIteratorEnd) = vertices(graph); vertexIterator != vertexIteratorEnd; ++vertexIterator)
    {
        //mutableQueue.push(*vertexIterator); // This makes sense that the queue would not end up sorted
        mutableQueue.push_or_update(*vertexIterator); // I thought this one should work
        //mutableQueue.update(*vertexIterator); // This one actually seems to UNsort the queue?
    }

    std::cout << "There are " << mutableQueue.size() << " items in the queue." << std::endl;
    std::cout << "The priority queue is: " << std::endl;
    OutputQueue(mutableQueue);

    std::cout << std::endl;
    return 0;
}
template <typename TQueue>
static void OutputQueue(TQueue queue) // passed by value so popping does not disturb the caller's queue
{
    while( ! queue.empty() )
    {
        typename TQueue::value_type u = queue.top();
        std::cout << "vertex: " << u[0] << " " << u[1] << " priority: " << get(queue.keys(), u) << std::endl;
        queue.pop();
    }
}
And a demo output:
There are 0 items in the queue.
There are 4 items in the queue.
The priority queue is:
vertex: 1 1 priority: 445
vertex: 0 0 priority: 150
vertex: 0 1 priority: 84
vertex: 1 0 priority: 0
New priority for 0, 0 769
New priority for 1, 0 870
New priority for 0, 1 99
New priority for 1, 1 211
There are 8 items in the queue.
The priority queue is:
vertex: 0 0 priority: 769
vertex: 1 0 priority: 870
vertex: 1 0 priority: 870
vertex: 0 0 priority: 769
vertex: 1 1 priority: 211
vertex: 1 1 priority: 211
vertex: 0 1 priority: 99
vertex: 0 1 priority: 99
The demo simply sets random priority values for every vertex, and pushes them all into the queue. It then does exactly the same thing again. You can see in the output that some of the items appear in the queue at different positions (not back-to-back, as I would expect, since they reference the same priority value in the PriorityMap).
The problem is that item (0,0) (with new priority 769) appears above vertex (1,0) with priority 870. This would cause the items to be processed in the wrong order.
Is there a way to replace an item in the queue when it is pushed, instead of adding a second copy (i.e. behave like a std::set instead of the current behavior, which is like a std::multiset)?
--------- Edit ------------
In the "// Insert another set of random values for each vertex" loop, I replaced the 'mutableQueue.push(*vertexIterator)' with:
mutableQueue.push_or_update(*vertexIterator);
Unfortunately it doesn't do what I'd expect - the output is now:
There are 0 items in the queue.
New priority for 0, 0 150
New priority for 1, 0 522
New priority for 0, 1 27
New priority for 1, 1 883
There are 4 items in the queue.
The priority queue is:
vertex: 1 1 priority: 883
vertex: 1 0 priority: 522
vertex: 0 0 priority: 150
vertex: 0 1 priority: 27
New priority for 0, 0 658
New priority for 1, 0 591
New priority for 0, 1 836
New priority for 1, 1 341
There are 7 items in the queue.
The priority queue is:
vertex: 0 1 priority: 836
vertex: 0 1 priority: 836
vertex: 0 0 priority: 658
vertex: 0 0 priority: 658
vertex: 1 0 priority: 591
vertex: 1 0 priority: 591
vertex: 1 1 priority: 341
Further, replacing the push() with just update() produces:
There are 0 items in the queue.
New priority for 0, 0 806
New priority for 1, 0 413
New priority for 0, 1 592
New priority for 1, 1 861
There are 4 items in the queue.
The priority queue is:
vertex: 1 1 priority: 861
vertex: 0 0 priority: 806
vertex: 0 1 priority: 592
vertex: 1 0 priority: 413
New priority for 0, 0 175
New priority for 1, 0 642
New priority for 0, 1 991
New priority for 1, 1 462
There are 4 items in the queue.
The priority queue is:
vertex: 1 1 priority: 462
vertex: 0 1 priority: 991
vertex: 1 0 priority: 642
vertex: 0 0 priority: 175
There are now only 4 items (like I would expect), but they are not sorted!
----------- Edit - more information --------------
I think there is something going wrong with the index_in_heap map. I added:
std::cout << "Index added: " << get(index_in_heap, v) << std::endl;
after this line:
put(index_in_heap, v, index);
in d_ary_heap_indirect::push(Value).
I also added
std::cout << "Index added caller: " << get(index_in_heap, v) << std::endl;
after the first round of adding values to the queue (i.e. after this line):
mutableQueue.push(*vertexIterator);
The output is:
Original priority for 0, 0 641
Index added: 0
Index added caller: 0
Original priority for 1, 0 40
Index added: 1
Index added caller: 1
Original priority for 0, 1 400
Index added: 2
Index added caller: 2
Original priority for 1, 1 664
Index added: 3
Index added caller: 0
I don't understand why this last index is 3 inside the push() function, but 0 when I query it from the caller.
When I look at the same things inside the update() function, the index_in_heap just seems to return garbage. That is, I look at the value of size_type index = get(index_in_heap, v); in update(), and when it is called with vertex (0,0), the value of 'index' is 4294967295 (when I would expect it to be in the range [0,3]).
Can anyone explain this? Perhaps I am setting up the index_in_heap map incorrectly?
The priority queue won't update its structure when you just change the priorities of the nodes. Once a node is inserted, you need to consider its priority constant. If you need to update the priorities, you need to tell the priority queue about it, and to this end you need to tell it which node gets what new priority.
Unfortunately, tracking some sort of node identification and priority makes priority queues slow: for a d-heap it is necessary to track where each node moved, making updates relatively expensive. For node-based heaps, e.g. Fibonacci heaps, the node stays put, but they tend to be more expensive to maintain (Fibonacci heaps have interesting theoretical complexity which, however, only matters for impractically sized problems). I haven't come up with any middle ground, although I have implemented all approaches to priority queues I could find described in books.
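To make that bookkeeping concrete, here is a minimal sketch (my own illustration, not Boost's code) of a binary max-heap that tracks each element's position so that a changed priority can be repaired by re-sifting just that element:

#include <cstddef>
#include <unordered_map>
#include <utility>
#include <vector>

// Minimal indexed binary max-heap: pos tracks where each key currently
// lives in the array so update() can re-sift just that element.
struct IndexedHeap
{
    std::vector<std::pair<int, double>> heap;  // (key, priority)
    std::unordered_map<int, std::size_t> pos;  // key -> index in heap

    void swap_at(std::size_t a, std::size_t b)
    {
        std::swap(heap[a], heap[b]);
        pos[heap[a].first] = a;  // every swap must also fix the position map
        pos[heap[b].first] = b;
    }

    void sift_up(std::size_t i)
    {
        while (i > 0 && heap[(i - 1) / 2].second < heap[i].second)
        {
            swap_at(i, (i - 1) / 2);
            i = (i - 1) / 2;
        }
    }

    void sift_down(std::size_t i)
    {
        for (;;)
        {
            std::size_t largest = i, l = 2 * i + 1, r = 2 * i + 2;
            if (l < heap.size() && heap[l].second > heap[largest].second) largest = l;
            if (r < heap.size() && heap[r].second > heap[largest].second) largest = r;
            if (largest == i) return;
            swap_at(i, largest);
            i = largest;
        }
    }

    void push(int key, double priority)
    {
        heap.push_back({key, priority});
        pos[key] = heap.size() - 1;
        sift_up(heap.size() - 1);
    }

    // The essential step the question was missing: the heap must be told
    // about the new priority so it can restore the heap property.
    void update(int key, double priority)
    {
        heap[pos.at(key)].second = priority;
        sift_up(pos.at(key));
        sift_down(pos.at(key));  // re-read the position: sift_up may have moved it
    }
};

This position tracking is exactly what makes updatable d-heaps slower than plain ones: every swap also has to maintain the position map.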
The d_ary_heap_indirect is designed to only allow priorities to increase. If in the update() and push_or_update() functions you change:
preserve_heap_property_up(index);
to
preserve_heap_property_up(index);
preserve_heap_property_down();
it seems to allow increasing or decreasing the priorities while keeping the queue sorted.
I have been going through the book C++ Primer, Third Edition, by Stanley B. Lippman and Josée Lajoie, and found one mistake in the program given under Article 6.3, How a vector Grows Itself: this program is missing a "<" in each of the couts:
#include <vector>
#include <iostream>
using namespace std;

int main() {
    vector<int> ivec;
    cout < "ivec: size: " < ivec.size() < " capacity: " < ivec.capacity() < endl;
    for (int ix = 0; ix < 24; ++ix) {
        ivec.push_back(ix);
        cout < "ivec: size: " < ivec.size()
             < " capacity: " < ivec.capacity() < endl;
    }
}
Later within that article:
"Under the Rogue Wave implementation, both the size and the capacity
of ivec after its definition are 0. On inserting the first element,
however, ivec's capacity is 256 and its size is 1."
But on correcting and running the code, I get the following output:
ivec: size: 0 capacity: 0
ivec[0]=0 ivec: size: 1 capacity: 1
ivec[1]=1 ivec: size: 2 capacity: 2
ivec[2]=2 ivec: size: 3 capacity: 4
ivec[3]=3 ivec: size: 4 capacity: 4
ivec[4]=4 ivec: size: 5 capacity: 8
ivec[5]=5 ivec: size: 6 capacity: 8
ivec[6]=6 ivec: size: 7 capacity: 8
ivec[7]=7 ivec: size: 8 capacity: 8
ivec[8]=8 ivec: size: 9 capacity: 16
ivec[9]=9 ivec: size: 10 capacity: 16
ivec[10]=10 ivec: size: 11 capacity: 16
ivec[11]=11 ivec: size: 12 capacity: 16
ivec[12]=12 ivec: size: 13 capacity: 16
ivec[13]=13 ivec: size: 14 capacity: 16
ivec[14]=14 ivec: size: 15 capacity: 16
ivec[15]=15 ivec: size: 16 capacity: 16
ivec[16]=16 ivec: size: 17 capacity: 32
ivec[17]=17 ivec: size: 18 capacity: 32
ivec[18]=18 ivec: size: 19 capacity: 32
ivec[19]=19 ivec: size: 20 capacity: 32
ivec[20]=20 ivec: size: 21 capacity: 32
ivec[21]=21 ivec: size: 22 capacity: 32
ivec[22]=22 ivec: size: 23 capacity: 32
ivec[23]=23 ivec: size: 24 capacity: 32
Is the capacity increasing with the formula 2^N where N is the initial capacity? Please explain.
The rate at which the capacity of a vector grows is required by the standard to be exponential (which, IMHO, is over-specification). The standard specifies this in order to meet the amortized constant time requirement for the push_back operation. What amortized constant time means and how exponential growth achieves this is interesting.
Every time a vector's capacity is grown the elements need to be copied. If you 'amortize' this cost out over the lifetime of the vector, it turns out that if you increase the capacity by an exponential factor you end up with an amortized constant cost.
This probably seems a bit odd, so let me explain to you how this works...
size: 1 capacity 1 - No elements have been copied, the cost per element for copies is 0.
size: 2 capacity 2 - When the vector's capacity was increased to 2, the first element had to be copied. Average copies per element is 0.5
size: 3 capacity 4 - When the vector's capacity was increased to 4, the first two elements had to be copied. Average copies per element is (2 + 1 + 0) / 3 = 1.
size: 4 capacity 4 - Average copies per element is (2 + 1 + 0 + 0) / 4 = 3 / 4 = 0.75.
size: 5 capacity 8 - Average copies per element is (3 + 2 + 1 + 1 + 0) / 5 = 7 / 5 = 1.4
...
size: 8 capacity 8 - Average copies per element is (3 + 2 + 1 + 1 + 0 + 0 + 0 + 0) / 8 = 7 / 8 = 0.875
size: 9 capacity 16 - Average copies per element is (4 + 3 + 2 + 2 + 1 + 1 + 1 + 1 + 0) / 9 = 15 / 9 = 1.67
...
size 16 capacity 16 - Average copies per element is 15 / 16 = 0.938
size 17 capacity 32 - Average copies per element is 31 / 17 = 1.82
As you can see, every time the capacity jumps, the number of copies goes up by the previous size of the array. But because the array has to double in size before the capacity jumps again, the number of copies per element always stays less than 2.
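The same bound can be written out directly (a compact restatement of the walkthrough above, assuming the capacity starts at 1 and doubles):

total copies to reach capacity 2^k = 1 + 2 + 4 + ... + 2^(k-1) = 2^k - 1
copies per element = (2^k - 1) / n < 2^k / 2^(k-1) = 2,  since n > 2^(k-1) elements have been inserted by then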
If you increased the capacity by 1.5 * N instead of by 2 * N, you would end up with a very similar effect, except the upper bound on the copies per element would be higher (I think it would be 3).
I suspect an implementation would choose 1.5 over 2 both to save a bit of space, but also because 1.5 is closer to the golden ratio. I have an intuition (that is currently not backed up by any hard data) that a growth rate in line with the golden ratio (because of its relationship to the fibonacci sequence) will prove to be the most efficient growth rate for real-world loads in terms of minimizing both extra space used and time.
To be able to provide amortized constant time insertions at the end of the std::vector, the implementation must grow the size of the vector (when needed) by a factor K>1 (*), such that when trying to append to a vector of size N that is full, the vector grows to be K*N.
Different implementations use different constants K that provide different benefits, in particular most implementations go for either K = 2 or K = 1.5. A higher K will make it faster as it will require less grows, but it will at the same time have a greater memory impact. As an example, in gcc K = 2, while in VS (Dinkumware) K = 1.5.
(*) If the vector grew by a constant quantity, then the complexity of push_back would become linear instead of amortized constant. For example, if the vector grew by 10 elements when needed, the cost of growing (copy of all element to the new memory address) would be O( N / 10 ) (every 10 elements, move everything) or O( N ).
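If you want to see which constant K your own standard library uses, a small probe like the following (a sketch; the printed ratios are implementation-dependent) reports the capacity ratio at every reallocation:

#include <iostream>
#include <vector>

int main()
{
    std::vector<int> v;
    std::size_t last = v.capacity();

    for (int i = 0; i < 10000; ++i)
    {
        v.push_back(i);
        if (v.capacity() != last) // a reallocation just happened
        {
            if (last != 0)
                std::cout << last << " -> " << v.capacity()
                          << "  (K = " << double(v.capacity()) / last << ")\n";
            last = v.capacity();
        }
    }
}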
Just to add some mathematical proof of the time complexity of vector::push_back: say the size of the vector is n. What we care about here is the number of copies that have happened so far, call it y; notice that a copy happens every time you grow the vector.
Grow by factor of K
y = K^1 + K^2 + K^3 + ... + K^log_K(n)
K*y =     K^2 + K^3 + ... + K^log_K(n) + K*K^log_K(n)
K*y - y = K*K^log_K(n) - K = K*n - K
y = K(n-1)/(K-1) = (K/(K-1))(n-1)
T(n) = y/n = (K/(K-1)) * (n-1)/n < K/(K-1) = O(1)
K/(K-1) is a constant; see the most common cases:
K=2, T(n) = 2/(2-1) = 2
K=1.5, T(n) = 1.5/(1.5-1) = 3
and there is actually a reason for choosing K as 1.5 or 2 in different implementations: if you plot T(n) = K/(K-1) against K, the curve flattens out once K is around 2, so there is not much benefit in using a larger K, at the cost of allocating more memory.
Grow by constant quantity of C
y = C + 2*C + 3*C + ... + (n/C)*C
  = C * (1 + 2 + 3 + ... + n/C),    say m = n/C
  = C * m(m+1)/2
  = n(m+1)/2
T(n) = y/n = (m+1)/2 = n/(2C) + 1/2 = O(n)
As we can see, it is linear.
The capacity of the vector is completely implementation-dependent; no one can tell how it's growing.
Are you using the "Rogue Wave" implementation?
How capacity grows is up to the implementation. Yours uses 2^N.
Yes, the capacity doubles each time it is exceeded. This is implementation dependent.
Before pushing back an element, the vector checks whether the new size would exceed its capacity, as below. I will explain it with the reserve function:
void push_back(const value_type &val) // push_back actual prototype
{
    if (size_type < 10)
        reserve(size_type + 1);
    else if (size_type > (_capacity / 4 * 3))
        reserve(_capacity + (this->_capacity / 4));
    // then the vector gets filled with the value
}
size_type: the vector's current size.
_capacity: the vector's capacity.