Dynamic storage in C++ for chess move generation - c++

I'm coding a chess engine in C++ and I'm currently working on move generation. I'm confused as to how I should be storing moves as they are generated. I'm relatively new to C++, but is there some some of dynamic object that I can use to store moves as they come (since I cannot know how many there are).

There are many containers in C++, depending of situation you can use an std::vector, or something else.
As to choose a container would require more information from your chess engine (like how many times would it be resized, does movements can be added at front and back of your container, etc), we cannot give you a direct answer with data that you gave.
Please take a look at this question to define which one would be the most adapted for your case.

You're looking for something like an std::vector - a template that represents a collection whose size changes dynamically:
Vectors are sequence containers representing arrays that can change in size.
Just like arrays, vectors use contiguous storage locations for their elements, which means that their elements can also be accessed using offsets on regular pointers to its elements, and just as efficiently as in arrays. But unlike arrays, their size can change dynamically, with their storage being handled automatically by the container.

Many chess engines heavily use recursion. When your think say 5 moves (5 ply) ahead you're actually getting into 5 recursive calls. If you enter a call, the local variables of that function invocation are stored on the stack. So it theoretically it would suffice to have a local "chessboard", e.g. an array of fields, each holding a piece (or being empty) since all the chessboards would be retainend on the stack automatically until their function invocation returns. Since stack space is usually limited, you could also have each invocation (stack frame) only hold a pointer to a piece of heap memory, allocate it when you enter the function, dealloc when you leave it again. Each function invocation returns to it's caller the "score" (cumulative value) of the combinations at a "deeper" recursion level (using the value of pieces as usual (pawn = 1, queen = 9 etc.). Instead of allocating separate boards you could store them in a vector. The advantage is that your memory is less likely to get fragmented. Each invocation can then e.g. hold the index of its chess board state in the vector.

Related

In C++, adding element to dynamic array is done by creating a new array. In languages like Python, does it do similar thing at the assembly level?

I used many programming languages that had something like .push() or .append() to add an object to the end of a dynamic array. Now, I learnt some C++ and noticed that it does not support that, but I instead have to implement a function that manually does this, and this function creates a new temporary array that is one element longer, loops through the old array and copies each element, adds the new object, deletes the old array, and copies the temporary array to the old array. Does the languages I already worked with that allow functions like .push() do the same thing at the assembly level?
In C++, we also have constructs that do their own memory management: std::vector<> comes to mind. It is advisable to be used over plain arrays when appropriate. std::vector<> maintains a buffer and resizes + moves only when necessary (i.e., when the buffer is too small to hold the newly inserted element); then it expands usually to double or similarly.
In other languages, you also need to implement some kind of memory management. However, it's not trivial whether this kind of resizing is necessary: for most scripting languages are actually implemented in a lower-level language, very usually C/C++/C--. So it's possible that std::vector<> (or even std::map<>) is used in the background. However, at the end of the day, someone must implement resizing 1, as memory is usually viewed as a one-dimensional array from which you allocate; you can't necessarily extend an allocation infinitely.
1 Note: theoretically, on some architectures it'd be possible to delegate this to virtual memory management and use segment registers for pointers, index registers for offsets, and let paging associate segment:index to physical addresses; however, that's rarely used nowadays.
Yes they do. For example python list (which are arrays, not lists) is doing exactly the same.
Note: You do not have to implement this yourself. That's what the STL is for in C++ and the relevant data structure for a dynamic array is std::vector. Except std::vector is smarter about growing the array, growing it by more than one element so not every push has to copy all the previous data. In fact it has ammortised cost of O(1) per push. If you know how much data you have it's still best to reserve the right amount of space from the start though.

Why do we prefer vectors instead of arrays in C++? Because of core dump? [duplicate]

I'm new to C++. I'm reading "Beginning C++ Through Game Programming" by Michael Dawson. However, I'm not new to programming in general. I just finished a chapter that dealt with vectors, so I've got a question about their use in the real world (I'm a computer science student, so I don't have much real-world experience yet).
The author has a Q/A at the end of each chapter, and one of them was:
Q: When should I use a vector instead of an array?
A: Almost always. Vectors are efficient and flexible. They do require a little more memory than arrays, but this tradeoff is almost always worth the benefits.
What do you guys think? I remember learning about vectors in a Java book, but we didn't cover them at all in my Intro to Comp. Sci. class, nor my Data Structures class at college. I've also never seen them used in any programming assignments (Java and C). This makes me feel like they're not used very much, although I know that school code and real-world code can be extremely different.
I don't need to be told about the differences between the two data structures; I'm very aware of them. All I want to know is if the author is giving good advice in his Q/A, or if he's simply trying to save beginner programmers from destroying themselves with complexities of managing fixed-size data structures. Also, regardless of what you think of the author's advice, what do you see in the real-world more often?
A: Almost always [use a vector instead of an array]. Vectors are efficient and flexible. They do require a little more memory than arrays, but this tradeoff is almost always worth the benefits.
That's an over-simplification. It's fairly common to use arrays, and can be attractive when:
the elements are specified at compile time, e.g. const char project[] = "Super Server";, const Colours colours[] = { Green, Yellow };
with C++11 it will be equally concise to initialise std::vectors with values
the number of elements is inherently fixed, e.g. const char* const bool_to_str[] = { "false", "true" };, Piece chess_board[8][8];
first-use performance is critical: with arrays of constants the compiler can often write a memory snapshot of the fully pre-initialised objects into the executable image, which is then page-faulted directly into place ready for use, so it's typically much faster that run-time heap allocation (new[]) followed by serialised construction of objects
compiler-generated tables of const data can always be safely read by multiple threads, whereas data constructed at run-time must complete construction before other code triggered by constructors for non-function-local static variables attempts to use that data: you end up needing some manner of Singleton (possibly threadsafe which will be even slower)
In C++03, vectors created with an initial size would construct one prototypical element object then copy construct each data member. That meant that even for types where construction was deliberately left as a no-operation, there was still a cost to copy the data elements - replicating their whatever-garbage-was-left-in-memory values. Clearly an array of uninitialised elements is faster.
One of the powerful features of C++ is that often you can write a class (or struct) that exactly models the memory layout required by a specific protocol, then aim a class-pointer at the memory you need to work with to conveniently interpret or assign values. For better or worse, many such protocols often embed small fixed sized arrays.
There's a decades-old hack for putting an array of 1 element (or even 0 if your compiler allows it as an extension) at the end of a struct/class, aiming a pointer to the struct type at some larger data area, and accessing array elements off the end of the struct based on prior knowledge of the memory availability and content (if reading before writing) - see What's the need of array with zero elements?
classes/structures containing arrays can still be POD types
arrays facilitate access in shared memory from multiple processes (by default vector's internal pointers to the actual dynamically allocated data won't be in shared memory or meaningful across processes, and it was famously difficult to force C++03 vectors to use shared memory like this even when specifying a custom allocator template parameter).
embedding arrays can localise memory access requirement, improving cache hits and therefore performance
That said, if it's not an active pain to use a vector (in code concision, readability or performance) then you're better off doing so: they've size(), checked random access via at(), iterators, resizing (which often becomes necessary as an application "matures") etc.. It's also often easier to change from vector to some other Standard container should there be a need, and safer/easier to apply Standard algorithms (x.end() is better than x + sizeof x / sizeof x[0] any day).
UPDATE: C++11 introduced a std::array<>, which avoids some of the costs of vectors - internally using a fixed-sized array to avoid an extra heap allocation/deallocation - while offering some of the benefits and API features: http://en.cppreference.com/w/cpp/container/array.
One of the best reasons to use a vector as opposed to an array is the RAII idiom. Basically, in order for c++ code to be exception-safe, any dynamically allocated memory or other resources should be encapsulated within objects. These objects should have destructors that free these resources.
When an exception goes unhandled, the ONLY things that are gaurenteed to be called are the destructors of objects on the stack. If you dynamically allocate memory outside of an object, and an uncaught exception is thrown somewhere before it is deleted, you have a memory leak.
It's also a nice way to avoid having to remember to use delete.
You should also check out std::algorithm, which provides a lot of common algorithms for vector and other STL containers.
I have on a few occasions written code with vector that, in retrospect, probably would have been better with a native array. But in all of these cases, either a Boost::multi_array or a Blitz::Array would have been better than either of them.
A std::vector is just a resizable array. It's not much more than that. It's not something you would learn in a Data Structures class, because it isn't an intelligent data structure.
In the real world, I see a lot of arrays. But I also see a lot of legacy codebases that use "C with Classes"-style C++ programming. That doesn't mean that you should program that way.
I am going to pop my opinion in here for coding large sized array/vectors used in science and engineering.
The pointer based arrays in this case can be quite a bit faster especially for standard types. But the pointers add the danger of possible memory leaks. These memory leaks can lead to longer debug cycle. Additionally if you want to make the pointer based array dynamic you have to code this by hand.
On the other hand vectors are slower for standard types. They also are both dynamic and memory safe as long as you are not storing dynamically allocated pointers in the stl vector.
In science and engineering the choice depends on the project. how important is speed vs debug time? For example LAAMPS which is a simulation software uses raw pointers that are handled through their memory management class. Speed is priority for this software. A software I am building, i have to balance speed, with memory footprint and debug time. I really dont want to spend a lot of time debugging so i am using the STL vector.
I wanted to add some more information to this answer that I discovered from extensive testing of large scale arrays and lots of reading the web. So, another problem with stl vector and large sized arrays (one million +) occurs in how memory gets allocated for these arrays. Stl vector uses the std::allocator class for handling memory. This class is a pool based memory allocator. Under small scale loading the pool based allocation is extremely efficient in terms of speed and memory use. As the size of the vector gets into the millions, the pool based strategy becomes a memory hog. This happens because the pools tendency is to always hold more space than is being currently used by the stl vector.
For large scale vectors you are either better off writing your own vector class or using pointers (raw or some sort of memory management system from boost or the c++ library). There are advantages and disadvantages to both approaches. The choice really depends on the exact problem you are tackling (too many variables to add in here). If you do happen to write your own vector class make sure to allow the vector an easy way to clear its memory. Currently for the Stl vector you need to use swap operations to do something that really should have been built into the class in the first place.
Rule of thumb: if you don't know the number of elements in advance, or if the number of elements is expected to be large (say, more than 10), use vector. Otherwise, you could also use an array. For example, I write a lot of geometry-processing code and I define a line as an ARRAY of 2 coordinates. A line is defined by two points, and it will ALWAYS be defined by exactly two points. Using a vector instead of an array would be overkill in many ways, also performance-wise.
Another thing: when I say "array" I really DO MEAN array: a variable declared using an array syntax, such as int evenOddCount[2]; If you consider choosing between a vector and a dynamically-allocated block of memory, such as int *evenOddCount = new int[2];, the answer is clear: USE VECTOR!
It's a rare case in the real world where you deal with fixed collections, of a known size. In almost all cases there is a degree of the unknown in exactly what size of data set you will be accommodating in your program. Indeed it is the hallmark of a good program that it can accomodate a wide range of possible scenarios.
Take these (trivial) scenarios as examples:
You have implemented a view
controller to track AI combatants in
a FPS. The game logic spawns a random
number of combatants in various zones
every couple of seconds. The player
is downing AI combatants at a rate
known only at run time.
A lawyer has accessed the Municipal
Court website in his state and is
querying the number of new DUI cases
that came in over the night. He
chooses to filter the list by a set
of variables including time the
accident occurred, zip code, and
arresting officer.
The operating system needs to
maintain a list of memory addresses
in use by the various programs
running on it. The number of programs
and their memory usage changes in
unpredictable ways.
In any of these cases a good argument can be made that a variable size list (that accommodates dynamic inserts and deletes) will perform better than a simple array. With the main benefits coming from reduced need to alloc/dealloc memory space for the fixed array as you add or remove elements from it.
As far as arrays are considered, simple integer or string arrays are very easy to use. On the other hand, for common functions like searching,sorting,insertion,removal, you can achieve much faster speed using standard algorithms (built in library functions) on vectors. Specially if you are using vectors of objects.
Secondly there is this huge difference that vectors can grow in size dynamically as more objects are inserted. Hope that helps.

C++ Vector vs Array [duplicate]

I'm new to C++. I'm reading "Beginning C++ Through Game Programming" by Michael Dawson. However, I'm not new to programming in general. I just finished a chapter that dealt with vectors, so I've got a question about their use in the real world (I'm a computer science student, so I don't have much real-world experience yet).
The author has a Q/A at the end of each chapter, and one of them was:
Q: When should I use a vector instead of an array?
A: Almost always. Vectors are efficient and flexible. They do require a little more memory than arrays, but this tradeoff is almost always worth the benefits.
What do you guys think? I remember learning about vectors in a Java book, but we didn't cover them at all in my Intro to Comp. Sci. class, nor my Data Structures class at college. I've also never seen them used in any programming assignments (Java and C). This makes me feel like they're not used very much, although I know that school code and real-world code can be extremely different.
I don't need to be told about the differences between the two data structures; I'm very aware of them. All I want to know is if the author is giving good advice in his Q/A, or if he's simply trying to save beginner programmers from destroying themselves with complexities of managing fixed-size data structures. Also, regardless of what you think of the author's advice, what do you see in the real-world more often?
A: Almost always [use a vector instead of an array]. Vectors are efficient and flexible. They do require a little more memory than arrays, but this tradeoff is almost always worth the benefits.
That's an over-simplification. It's fairly common to use arrays, and can be attractive when:
the elements are specified at compile time, e.g. const char project[] = "Super Server";, const Colours colours[] = { Green, Yellow };
with C++11 it will be equally concise to initialise std::vectors with values
the number of elements is inherently fixed, e.g. const char* const bool_to_str[] = { "false", "true" };, Piece chess_board[8][8];
first-use performance is critical: with arrays of constants the compiler can often write a memory snapshot of the fully pre-initialised objects into the executable image, which is then page-faulted directly into place ready for use, so it's typically much faster that run-time heap allocation (new[]) followed by serialised construction of objects
compiler-generated tables of const data can always be safely read by multiple threads, whereas data constructed at run-time must complete construction before other code triggered by constructors for non-function-local static variables attempts to use that data: you end up needing some manner of Singleton (possibly threadsafe which will be even slower)
In C++03, vectors created with an initial size would construct one prototypical element object then copy construct each data member. That meant that even for types where construction was deliberately left as a no-operation, there was still a cost to copy the data elements - replicating their whatever-garbage-was-left-in-memory values. Clearly an array of uninitialised elements is faster.
One of the powerful features of C++ is that often you can write a class (or struct) that exactly models the memory layout required by a specific protocol, then aim a class-pointer at the memory you need to work with to conveniently interpret or assign values. For better or worse, many such protocols often embed small fixed sized arrays.
There's a decades-old hack for putting an array of 1 element (or even 0 if your compiler allows it as an extension) at the end of a struct/class, aiming a pointer to the struct type at some larger data area, and accessing array elements off the end of the struct based on prior knowledge of the memory availability and content (if reading before writing) - see What's the need of array with zero elements?
classes/structures containing arrays can still be POD types
arrays facilitate access in shared memory from multiple processes (by default vector's internal pointers to the actual dynamically allocated data won't be in shared memory or meaningful across processes, and it was famously difficult to force C++03 vectors to use shared memory like this even when specifying a custom allocator template parameter).
embedding arrays can localise memory access requirement, improving cache hits and therefore performance
That said, if it's not an active pain to use a vector (in code concision, readability or performance) then you're better off doing so: they've size(), checked random access via at(), iterators, resizing (which often becomes necessary as an application "matures") etc.. It's also often easier to change from vector to some other Standard container should there be a need, and safer/easier to apply Standard algorithms (x.end() is better than x + sizeof x / sizeof x[0] any day).
UPDATE: C++11 introduced a std::array<>, which avoids some of the costs of vectors - internally using a fixed-sized array to avoid an extra heap allocation/deallocation - while offering some of the benefits and API features: http://en.cppreference.com/w/cpp/container/array.
One of the best reasons to use a vector as opposed to an array is the RAII idiom. Basically, in order for c++ code to be exception-safe, any dynamically allocated memory or other resources should be encapsulated within objects. These objects should have destructors that free these resources.
When an exception goes unhandled, the ONLY things that are gaurenteed to be called are the destructors of objects on the stack. If you dynamically allocate memory outside of an object, and an uncaught exception is thrown somewhere before it is deleted, you have a memory leak.
It's also a nice way to avoid having to remember to use delete.
You should also check out std::algorithm, which provides a lot of common algorithms for vector and other STL containers.
I have on a few occasions written code with vector that, in retrospect, probably would have been better with a native array. But in all of these cases, either a Boost::multi_array or a Blitz::Array would have been better than either of them.
A std::vector is just a resizable array. It's not much more than that. It's not something you would learn in a Data Structures class, because it isn't an intelligent data structure.
In the real world, I see a lot of arrays. But I also see a lot of legacy codebases that use "C with Classes"-style C++ programming. That doesn't mean that you should program that way.
I am going to pop my opinion in here for coding large sized array/vectors used in science and engineering.
The pointer based arrays in this case can be quite a bit faster especially for standard types. But the pointers add the danger of possible memory leaks. These memory leaks can lead to longer debug cycle. Additionally if you want to make the pointer based array dynamic you have to code this by hand.
On the other hand vectors are slower for standard types. They also are both dynamic and memory safe as long as you are not storing dynamically allocated pointers in the stl vector.
In science and engineering the choice depends on the project. how important is speed vs debug time? For example LAAMPS which is a simulation software uses raw pointers that are handled through their memory management class. Speed is priority for this software. A software I am building, i have to balance speed, with memory footprint and debug time. I really dont want to spend a lot of time debugging so i am using the STL vector.
I wanted to add some more information to this answer that I discovered from extensive testing of large scale arrays and lots of reading the web. So, another problem with stl vector and large sized arrays (one million +) occurs in how memory gets allocated for these arrays. Stl vector uses the std::allocator class for handling memory. This class is a pool based memory allocator. Under small scale loading the pool based allocation is extremely efficient in terms of speed and memory use. As the size of the vector gets into the millions, the pool based strategy becomes a memory hog. This happens because the pools tendency is to always hold more space than is being currently used by the stl vector.
For large scale vectors you are either better off writing your own vector class or using pointers (raw or some sort of memory management system from boost or the c++ library). There are advantages and disadvantages to both approaches. The choice really depends on the exact problem you are tackling (too many variables to add in here). If you do happen to write your own vector class make sure to allow the vector an easy way to clear its memory. Currently for the Stl vector you need to use swap operations to do something that really should have been built into the class in the first place.
Rule of thumb: if you don't know the number of elements in advance, or if the number of elements is expected to be large (say, more than 10), use vector. Otherwise, you could also use an array. For example, I write a lot of geometry-processing code and I define a line as an ARRAY of 2 coordinates. A line is defined by two points, and it will ALWAYS be defined by exactly two points. Using a vector instead of an array would be overkill in many ways, also performance-wise.
Another thing: when I say "array" I really DO MEAN array: a variable declared using an array syntax, such as int evenOddCount[2]; If you consider choosing between a vector and a dynamically-allocated block of memory, such as int *evenOddCount = new int[2];, the answer is clear: USE VECTOR!
It's a rare case in the real world where you deal with fixed collections, of a known size. In almost all cases there is a degree of the unknown in exactly what size of data set you will be accommodating in your program. Indeed it is the hallmark of a good program that it can accomodate a wide range of possible scenarios.
Take these (trivial) scenarios as examples:
You have implemented a view
controller to track AI combatants in
a FPS. The game logic spawns a random
number of combatants in various zones
every couple of seconds. The player
is downing AI combatants at a rate
known only at run time.
A lawyer has accessed the Municipal
Court website in his state and is
querying the number of new DUI cases
that came in over the night. He
chooses to filter the list by a set
of variables including time the
accident occurred, zip code, and
arresting officer.
The operating system needs to
maintain a list of memory addresses
in use by the various programs
running on it. The number of programs
and their memory usage changes in
unpredictable ways.
In any of these cases a good argument can be made that a variable size list (that accommodates dynamic inserts and deletes) will perform better than a simple array. With the main benefits coming from reduced need to alloc/dealloc memory space for the fixed array as you add or remove elements from it.
As far as arrays are considered, simple integer or string arrays are very easy to use. On the other hand, for common functions like searching,sorting,insertion,removal, you can achieve much faster speed using standard algorithms (built in library functions) on vectors. Specially if you are using vectors of objects.
Secondly there is this huge difference that vectors can grow in size dynamically as more objects are inserted. Hope that helps.

Linked list vs dynamic array for implementing a stack using vector class

I was reading up on the two different ways of implementing a stack: linked list and dynamic arrays. The main advantage of a linked list over a dynamic array was that the linked list did not have to be resized while a dynamic array had to be resized if too many elements were inserted hence wasting alot of time and memory.
That got me wondering if this is true for C++ (as there is a vector class which automatically resizes whenever new elements are inserted)?
It's difficult to compare the two, because the patterns of their memory usage are quite different.
Vector resizing
A vector resizes itself dynamically as needed. It does that by allocating a new chunk of memory, moving (or copying) data from the old chunk to the new chunk, the releasing the old one. In a typical case, the new chunk is 1.5x the size of the old (contrary to popular belief, 2x seems to be quite unusual in practice). That means for a short time while reallocating, it needs memory equal to roughly 2.5x as much as the data you're actually storing. The rest of the time, the "chunk" that's in use is a minimum of 2/3rds full, and a maximum of completely full. If all sizes are equally likely, we can expect it to average about 5/6ths full. Looking at it from the other direction, we can expect about 1/6th, or about 17% of the space to be "wasted" at any given time.
When we do resize by a constant factor like that (rather than, for example, always adding a specific size of chunk, such as growing in 4Kb increments) we get what's called amortized constant time addition. In other words, as the array grows, resizing happens exponentially less often. The average number of times items in the array have been copied tends to a constant (usually around 3, but depends on the growth factor you use).
linked list allocations
Using a linked list, the situation is rather different. We never see resizing, so we don't see extra time or memory usage for some insertions. At the same time, we do see extra time and memory used essentially all the time. In particular, each node in the linked list needs to contain a pointer to the next node. Depending on the size of the data in the node compared to the size of a pointer, this can lead to significant overhead. For example, let's assume you need a stack of ints. In a typical case where an int is the same size as a pointer, that's going to mean 50% overhead -- all the time. It's increasingly common for a pointer to be larger than an int; twice the size is fairly common (64-bit pointer, 32-bit int). In such a case, you have ~67% overhead -- i.e., obviously enough, each node devoting twice as much space to the pointer as the data being stored.
Unfortunately, that's often just the tip of the iceberg. In a typical linked list, each node is dynamically allocated individually. At least if you're storing small data items (such as int) the memory allocated for a node may be (usually will be) even larger than the amount you actually request. So -- you ask for 12 bytes of memory to hold an int and a pointer -- but the chunk of memory you get is likely to be rounded up to 16 or 32 bytes instead. Now you're looking at overhead of at least 75% and quite possibly ~88%.
As far as speed goes, the situation is rather similar: allocating and freeing memory dynamically is often quite slow. The heap manager typically has blocks of free memory, and has to spend time searching through them to find the block that's most suited to the size you're asking for. Then it (typically) has to split that block into two pieces, one to satisfy your allocation, and another of the remaining memory it can use to satisfy other allocations. Likewise, when you free memory, it typically goes back to that same list of free blocks and checks whether there's an adjoining block of memory already free, so it can join the two back together.
Allocating and managing lots of blocks of memory is expensive.
cache usage
Finally, with recent processors we run into another important factor: cache usage. In the case of a vector, we have all the data right next to each other. Then, after the end of the part of the vector that's in use, we have some empty memory. This leads to excellent cache usage -- the data we're using gets cached; the data we're not using has little or no effect on the cache at all.
With a linked list, the pointers (and probable overhead in each node) are distributed throughout our list. I.e., each piece of data we care about has, right next to it, the overhead of the pointer, and the empty space allocated to the node that we're not using. In short, the effective size of the cache is reduced by about the same factor as the overall overhead of each node in the list -- i.e., we might easily see only 1/8th of the cache storing the date we care about, and 7/8ths devoted to storing pointers and/or pure garbage.
Summary
A linked list can work well when you have a relatively small number of nodes, each of which is individually quite large. If (as is more typical for a stack) you're dealing with a relatively large number of items, each of which is individually quite small, you're much less likely to see a savings in time or memory usage. Quite the contrary, for such cases, a linked list is much more likely to basically waste a great deal of both time and memory.
Yes, what you say is true for C++. For this reason, the default container inside std::stack, which is the standard stack class in C++, is neither a vector nor a linked list, but a double ended queue (a deque). This has nearly all the advantages of a vector, but it resizes much better.
Basically, an std::deque is a linked list of arrays of sorts internally. This way, when it needs to resize, it just adds another array.
First, the performance trade-offs between linked-lists and dynamic arrays are a lot more subtle than that.
The vector class in C++ is, by requirement, implemented as a "dynamic array", meaning that it must have an amortized-constant cost for inserting elements into it. How this is done is usually by increasing the "capacity" of the array in a geometric manner, that is, you double the capacity whenever you run out (or come close to running out). In the end, this means that a reallocation operation (allocating a new chunk of memory and copying the current content into it) is only going to happen on a few occasions. In practice, this means that the overhead for the reallocations only shows up on performance graphs as little spikes at logarithmic intervals. This is what it means to have "amortized-constant" cost, because once you neglect those little spikes, the cost of the insert operations is essentially constant (and trivial, in this case).
In a linked-list implementation, you don't have the overhead of reallocations, however, you do have the overhead of allocating each new element on freestore (dynamic memory). So, the overhead is a bit more regular (not spiked, which can be needed sometimes), but could be more significant than using a dynamic array, especially if the elements are rather inexpensive to copy (small in size, and simple object). In my opinion, linked-lists are only recommended for objects that are really expensive to copy (or move). But at the end of the day, this is something you need to test in any given situation.
Finally, it is important to point out that locality of reference is often the determining factor for any application that makes extensive use and traversal of the elements. When using a dynamic array, the elements are packed together in memory one after the other and doing an in-order traversal is very efficient as the CPU can preemptively cache the memory ahead of the reading / writing operations. In a vanilla linked-list implementation, the jumps from one element to the next generally involves a rather erratic jumps between wildly different memory locations, which effectively disables this "pre-fetching" behavior. So, unless the individual elements of the list are very big and operations on them are typically very long to execute, this lack of pre-fetching when using a linked-list will be the dominant performance problem.
As you can guess, I rarely use a linked-list (std::list), as the number of advantageous applications are few and far between. Very often, for large and expensive-to-copy objects, it is often preferable to simply use a vector of pointers (you get basically the same performance advantages (and disadvantages) as a linked list, but with less memory usage (for linking pointers) and you get random-access capabilities if you need it).
The main case that I can think of, where a linked-list wins over a dynamic array (or a segmented dynamic array like std::deque) is when you need to frequently insert elements in the middle (not at either ends). However, such situations usually arise when you are keeping a sorted (or ordered, in some way) set of elements, in which case, you would use a tree structure to store the elements (e.g., a binary search tree (BST)), not a linked-list. And often, such trees store their nodes (elements) using a semi-contiguous memory layout (e.g., a breadth-first layout) within a dynamic array or segmented dynamic array (e.g., a cache-oblivious dynamic array).
Yes, it's true for C++ or any other language. Dynamic array is a concept. The fact that C++ has vector doesn't change the theory. The vector in C++ actually does the resizing internally so this task isn't the developers' responsibility. The actual cost doesn't magically disappear when using vector, it's simply offloaded to the standard library implementation.
std::vector is implemented using a dynamic array, whereas std::list is implemented as a linked list. There are trade-offs for using both data structures. Pick the one that best suits your needs.
As you indicated, a dynamic array can take a larger amount of time adding an item if it gets full, as it has to expand itself. However, it is faster to access since all of its members are grouped together in memory. This tight grouping also usually makes it more cache-friendly.
Linked lists don't need to resize ever, but traversing them takes longer as the CPU must jump around in memory.
That got me wondering if this is true for c++ as there is a vector class which automatically resizes whenever new elements are inserted.
Yes, it still holds, because a vector resize is a potentially expensive operation. Internally, if the pre-allocated size for the vector is reached and you attempt to add new elements, a new allocation takes place and the old data is moved to the new memory location.
From the C++ documentation:
vector::push_back - Add element at the end
Adds a new element at the end of the vector, after its current last element. The content of val is copied (or moved) to the new element.
This effectively increases the container size by one, which causes an automatic reallocation of the allocated storage space if -and only if- the new vector size surpasses the current vector capacity.
http://channel9.msdn.com/Events/GoingNative/GoingNative-2012/Keynote-Bjarne-Stroustrup-Cpp11-Style
Skip to 44:40. You should prefer std::vector whenever possible over a std::list, as explained in the video, by Bjarne himself. Since std::vector stores all of it's elements next to each other, in memory, and because of that it will have the advantage of being cached in memory. And this is true for adding and removing elements from std::vector and also searching. He states that std::list is 50-100x slower than a std::vector.
If you really want a stack, you should really use std::stack instead of making your own.

C++ std::vector vs array in the real world

I'm new to C++. I'm reading "Beginning C++ Through Game Programming" by Michael Dawson. However, I'm not new to programming in general. I just finished a chapter that dealt with vectors, so I've got a question about their use in the real world (I'm a computer science student, so I don't have much real-world experience yet).
The author has a Q/A at the end of each chapter, and one of them was:
Q: When should I use a vector instead of an array?
A: Almost always. Vectors are efficient and flexible. They do require a little more memory than arrays, but this tradeoff is almost always worth the benefits.
What do you guys think? I remember learning about vectors in a Java book, but we didn't cover them at all in my Intro to Comp. Sci. class, nor my Data Structures class at college. I've also never seen them used in any programming assignments (Java and C). This makes me feel like they're not used very much, although I know that school code and real-world code can be extremely different.
I don't need to be told about the differences between the two data structures; I'm very aware of them. All I want to know is if the author is giving good advice in his Q/A, or if he's simply trying to save beginner programmers from destroying themselves with complexities of managing fixed-size data structures. Also, regardless of what you think of the author's advice, what do you see in the real-world more often?
A: Almost always [use a vector instead of an array]. Vectors are efficient and flexible. They do require a little more memory than arrays, but this tradeoff is almost always worth the benefits.
That's an over-simplification. It's fairly common to use arrays, and can be attractive when:
the elements are specified at compile time, e.g. const char project[] = "Super Server";, const Colours colours[] = { Green, Yellow };
with C++11 it will be equally concise to initialise std::vectors with values
the number of elements is inherently fixed, e.g. const char* const bool_to_str[] = { "false", "true" };, Piece chess_board[8][8];
first-use performance is critical: with arrays of constants the compiler can often write a memory snapshot of the fully pre-initialised objects into the executable image, which is then page-faulted directly into place ready for use, so it's typically much faster that run-time heap allocation (new[]) followed by serialised construction of objects
compiler-generated tables of const data can always be safely read by multiple threads, whereas data constructed at run-time must complete construction before other code triggered by constructors for non-function-local static variables attempts to use that data: you end up needing some manner of Singleton (possibly threadsafe which will be even slower)
In C++03, vectors created with an initial size would construct one prototypical element object then copy construct each data member. That meant that even for types where construction was deliberately left as a no-operation, there was still a cost to copy the data elements - replicating their whatever-garbage-was-left-in-memory values. Clearly an array of uninitialised elements is faster.
One of the powerful features of C++ is that often you can write a class (or struct) that exactly models the memory layout required by a specific protocol, then aim a class-pointer at the memory you need to work with to conveniently interpret or assign values. For better or worse, many such protocols often embed small fixed sized arrays.
There's a decades-old hack for putting an array of 1 element (or even 0 if your compiler allows it as an extension) at the end of a struct/class, aiming a pointer to the struct type at some larger data area, and accessing array elements off the end of the struct based on prior knowledge of the memory availability and content (if reading before writing) - see What's the need of array with zero elements?
classes/structures containing arrays can still be POD types
arrays facilitate access in shared memory from multiple processes (by default vector's internal pointers to the actual dynamically allocated data won't be in shared memory or meaningful across processes, and it was famously difficult to force C++03 vectors to use shared memory like this even when specifying a custom allocator template parameter).
embedding arrays can localise memory access requirement, improving cache hits and therefore performance
That said, if it's not an active pain to use a vector (in code concision, readability or performance) then you're better off doing so: they've size(), checked random access via at(), iterators, resizing (which often becomes necessary as an application "matures") etc.. It's also often easier to change from vector to some other Standard container should there be a need, and safer/easier to apply Standard algorithms (x.end() is better than x + sizeof x / sizeof x[0] any day).
UPDATE: C++11 introduced a std::array<>, which avoids some of the costs of vectors - internally using a fixed-sized array to avoid an extra heap allocation/deallocation - while offering some of the benefits and API features: http://en.cppreference.com/w/cpp/container/array.
One of the best reasons to use a vector as opposed to an array is the RAII idiom. Basically, in order for c++ code to be exception-safe, any dynamically allocated memory or other resources should be encapsulated within objects. These objects should have destructors that free these resources.
When an exception goes unhandled, the ONLY things that are gaurenteed to be called are the destructors of objects on the stack. If you dynamically allocate memory outside of an object, and an uncaught exception is thrown somewhere before it is deleted, you have a memory leak.
It's also a nice way to avoid having to remember to use delete.
You should also check out std::algorithm, which provides a lot of common algorithms for vector and other STL containers.
I have on a few occasions written code with vector that, in retrospect, probably would have been better with a native array. But in all of these cases, either a Boost::multi_array or a Blitz::Array would have been better than either of them.
A std::vector is just a resizable array. It's not much more than that. It's not something you would learn in a Data Structures class, because it isn't an intelligent data structure.
In the real world, I see a lot of arrays. But I also see a lot of legacy codebases that use "C with Classes"-style C++ programming. That doesn't mean that you should program that way.
I am going to pop my opinion in here for coding large sized array/vectors used in science and engineering.
The pointer based arrays in this case can be quite a bit faster especially for standard types. But the pointers add the danger of possible memory leaks. These memory leaks can lead to longer debug cycle. Additionally if you want to make the pointer based array dynamic you have to code this by hand.
On the other hand vectors are slower for standard types. They also are both dynamic and memory safe as long as you are not storing dynamically allocated pointers in the stl vector.
In science and engineering the choice depends on the project. how important is speed vs debug time? For example LAAMPS which is a simulation software uses raw pointers that are handled through their memory management class. Speed is priority for this software. A software I am building, i have to balance speed, with memory footprint and debug time. I really dont want to spend a lot of time debugging so i am using the STL vector.
I wanted to add some more information to this answer that I discovered from extensive testing of large scale arrays and lots of reading the web. So, another problem with stl vector and large sized arrays (one million +) occurs in how memory gets allocated for these arrays. Stl vector uses the std::allocator class for handling memory. This class is a pool based memory allocator. Under small scale loading the pool based allocation is extremely efficient in terms of speed and memory use. As the size of the vector gets into the millions, the pool based strategy becomes a memory hog. This happens because the pools tendency is to always hold more space than is being currently used by the stl vector.
For large scale vectors you are either better off writing your own vector class or using pointers (raw or some sort of memory management system from boost or the c++ library). There are advantages and disadvantages to both approaches. The choice really depends on the exact problem you are tackling (too many variables to add in here). If you do happen to write your own vector class make sure to allow the vector an easy way to clear its memory. Currently for the Stl vector you need to use swap operations to do something that really should have been built into the class in the first place.
Rule of thumb: if you don't know the number of elements in advance, or if the number of elements is expected to be large (say, more than 10), use vector. Otherwise, you could also use an array. For example, I write a lot of geometry-processing code and I define a line as an ARRAY of 2 coordinates. A line is defined by two points, and it will ALWAYS be defined by exactly two points. Using a vector instead of an array would be overkill in many ways, also performance-wise.
Another thing: when I say "array" I really DO MEAN array: a variable declared using an array syntax, such as int evenOddCount[2]; If you consider choosing between a vector and a dynamically-allocated block of memory, such as int *evenOddCount = new int[2];, the answer is clear: USE VECTOR!
It's a rare case in the real world where you deal with fixed collections, of a known size. In almost all cases there is a degree of the unknown in exactly what size of data set you will be accommodating in your program. Indeed it is the hallmark of a good program that it can accomodate a wide range of possible scenarios.
Take these (trivial) scenarios as examples:
You have implemented a view
controller to track AI combatants in
a FPS. The game logic spawns a random
number of combatants in various zones
every couple of seconds. The player
is downing AI combatants at a rate
known only at run time.
A lawyer has accessed the Municipal
Court website in his state and is
querying the number of new DUI cases
that came in over the night. He
chooses to filter the list by a set
of variables including time the
accident occurred, zip code, and
arresting officer.
The operating system needs to
maintain a list of memory addresses
in use by the various programs
running on it. The number of programs
and their memory usage changes in
unpredictable ways.
In any of these cases a good argument can be made that a variable size list (that accommodates dynamic inserts and deletes) will perform better than a simple array. With the main benefits coming from reduced need to alloc/dealloc memory space for the fixed array as you add or remove elements from it.
As far as arrays are considered, simple integer or string arrays are very easy to use. On the other hand, for common functions like searching,sorting,insertion,removal, you can achieve much faster speed using standard algorithms (built in library functions) on vectors. Specially if you are using vectors of objects.
Secondly there is this huge difference that vectors can grow in size dynamically as more objects are inserted. Hope that helps.