C++ std::vector vs array in the real world - c++

I'm new to C++. I'm reading "Beginning C++ Through Game Programming" by Michael Dawson. However, I'm not new to programming in general. I just finished a chapter that dealt with vectors, so I've got a question about their use in the real world (I'm a computer science student, so I don't have much real-world experience yet).
The author has a Q/A at the end of each chapter, and one of them was:
Q: When should I use a vector instead of an array?
A: Almost always. Vectors are efficient and flexible. They do require a little more memory than arrays, but this tradeoff is almost always worth the benefits.
What do you guys think? I remember learning about vectors in a Java book, but we didn't cover them at all in my Intro to Comp. Sci. class, nor my Data Structures class at college. I've also never seen them used in any programming assignments (Java and C). This makes me feel like they're not used very much, although I know that school code and real-world code can be extremely different.
I don't need to be told about the differences between the two data structures; I'm very aware of them. All I want to know is if the author is giving good advice in his Q/A, or if he's simply trying to save beginner programmers from destroying themselves with complexities of managing fixed-size data structures. Also, regardless of what you think of the author's advice, what do you see in the real-world more often?

A: Almost always [use a vector instead of an array]. Vectors are efficient and flexible. They do require a little more memory than arrays, but this tradeoff is almost always worth the benefits.
That's an over-simplification. It's fairly common to use arrays, and can be attractive when:
the elements are specified at compile time, e.g. const char project[] = "Super Server";, const Colours colours[] = { Green, Yellow };
with C++11 it will be equally concise to initialise std::vectors with values
the number of elements is inherently fixed, e.g. const char* const bool_to_str[] = { "false", "true" };, Piece chess_board[8][8];
first-use performance is critical: with arrays of constants the compiler can often write a memory snapshot of the fully pre-initialised objects into the executable image, which is then page-faulted directly into place ready for use, so it's typically much faster that run-time heap allocation (new[]) followed by serialised construction of objects
compiler-generated tables of const data can always be safely read by multiple threads, whereas data constructed at run-time must complete construction before other code triggered by constructors for non-function-local static variables attempts to use that data: you end up needing some manner of Singleton (possibly threadsafe which will be even slower)
In C++03, vectors created with an initial size would construct one prototypical element object then copy construct each data member. That meant that even for types where construction was deliberately left as a no-operation, there was still a cost to copy the data elements - replicating their whatever-garbage-was-left-in-memory values. Clearly an array of uninitialised elements is faster.
One of the powerful features of C++ is that often you can write a class (or struct) that exactly models the memory layout required by a specific protocol, then aim a class-pointer at the memory you need to work with to conveniently interpret or assign values. For better or worse, many such protocols often embed small fixed sized arrays.
There's a decades-old hack for putting an array of 1 element (or even 0 if your compiler allows it as an extension) at the end of a struct/class, aiming a pointer to the struct type at some larger data area, and accessing array elements off the end of the struct based on prior knowledge of the memory availability and content (if reading before writing) - see What's the need of array with zero elements?
classes/structures containing arrays can still be POD types
arrays facilitate access in shared memory from multiple processes (by default vector's internal pointers to the actual dynamically allocated data won't be in shared memory or meaningful across processes, and it was famously difficult to force C++03 vectors to use shared memory like this even when specifying a custom allocator template parameter).
embedding arrays can localise memory access requirement, improving cache hits and therefore performance
That said, if it's not an active pain to use a vector (in code concision, readability or performance) then you're better off doing so: they've size(), checked random access via at(), iterators, resizing (which often becomes necessary as an application "matures") etc.. It's also often easier to change from vector to some other Standard container should there be a need, and safer/easier to apply Standard algorithms (x.end() is better than x + sizeof x / sizeof x[0] any day).
UPDATE: C++11 introduced a std::array<>, which avoids some of the costs of vectors - internally using a fixed-sized array to avoid an extra heap allocation/deallocation - while offering some of the benefits and API features: http://en.cppreference.com/w/cpp/container/array.

One of the best reasons to use a vector as opposed to an array is the RAII idiom. Basically, in order for c++ code to be exception-safe, any dynamically allocated memory or other resources should be encapsulated within objects. These objects should have destructors that free these resources.
When an exception goes unhandled, the ONLY things that are gaurenteed to be called are the destructors of objects on the stack. If you dynamically allocate memory outside of an object, and an uncaught exception is thrown somewhere before it is deleted, you have a memory leak.
It's also a nice way to avoid having to remember to use delete.
You should also check out std::algorithm, which provides a lot of common algorithms for vector and other STL containers.
I have on a few occasions written code with vector that, in retrospect, probably would have been better with a native array. But in all of these cases, either a Boost::multi_array or a Blitz::Array would have been better than either of them.

A std::vector is just a resizable array. It's not much more than that. It's not something you would learn in a Data Structures class, because it isn't an intelligent data structure.
In the real world, I see a lot of arrays. But I also see a lot of legacy codebases that use "C with Classes"-style C++ programming. That doesn't mean that you should program that way.

I am going to pop my opinion in here for coding large sized array/vectors used in science and engineering.
The pointer based arrays in this case can be quite a bit faster especially for standard types. But the pointers add the danger of possible memory leaks. These memory leaks can lead to longer debug cycle. Additionally if you want to make the pointer based array dynamic you have to code this by hand.
On the other hand vectors are slower for standard types. They also are both dynamic and memory safe as long as you are not storing dynamically allocated pointers in the stl vector.
In science and engineering the choice depends on the project. how important is speed vs debug time? For example LAAMPS which is a simulation software uses raw pointers that are handled through their memory management class. Speed is priority for this software. A software I am building, i have to balance speed, with memory footprint and debug time. I really dont want to spend a lot of time debugging so i am using the STL vector.
I wanted to add some more information to this answer that I discovered from extensive testing of large scale arrays and lots of reading the web. So, another problem with stl vector and large sized arrays (one million +) occurs in how memory gets allocated for these arrays. Stl vector uses the std::allocator class for handling memory. This class is a pool based memory allocator. Under small scale loading the pool based allocation is extremely efficient in terms of speed and memory use. As the size of the vector gets into the millions, the pool based strategy becomes a memory hog. This happens because the pools tendency is to always hold more space than is being currently used by the stl vector.
For large scale vectors you are either better off writing your own vector class or using pointers (raw or some sort of memory management system from boost or the c++ library). There are advantages and disadvantages to both approaches. The choice really depends on the exact problem you are tackling (too many variables to add in here). If you do happen to write your own vector class make sure to allow the vector an easy way to clear its memory. Currently for the Stl vector you need to use swap operations to do something that really should have been built into the class in the first place.

Rule of thumb: if you don't know the number of elements in advance, or if the number of elements is expected to be large (say, more than 10), use vector. Otherwise, you could also use an array. For example, I write a lot of geometry-processing code and I define a line as an ARRAY of 2 coordinates. A line is defined by two points, and it will ALWAYS be defined by exactly two points. Using a vector instead of an array would be overkill in many ways, also performance-wise.
Another thing: when I say "array" I really DO MEAN array: a variable declared using an array syntax, such as int evenOddCount[2]; If you consider choosing between a vector and a dynamically-allocated block of memory, such as int *evenOddCount = new int[2];, the answer is clear: USE VECTOR!

It's a rare case in the real world where you deal with fixed collections, of a known size. In almost all cases there is a degree of the unknown in exactly what size of data set you will be accommodating in your program. Indeed it is the hallmark of a good program that it can accomodate a wide range of possible scenarios.
Take these (trivial) scenarios as examples:
You have implemented a view
controller to track AI combatants in
a FPS. The game logic spawns a random
number of combatants in various zones
every couple of seconds. The player
is downing AI combatants at a rate
known only at run time.
A lawyer has accessed the Municipal
Court website in his state and is
querying the number of new DUI cases
that came in over the night. He
chooses to filter the list by a set
of variables including time the
accident occurred, zip code, and
arresting officer.
The operating system needs to
maintain a list of memory addresses
in use by the various programs
running on it. The number of programs
and their memory usage changes in
unpredictable ways.
In any of these cases a good argument can be made that a variable size list (that accommodates dynamic inserts and deletes) will perform better than a simple array. With the main benefits coming from reduced need to alloc/dealloc memory space for the fixed array as you add or remove elements from it.

As far as arrays are considered, simple integer or string arrays are very easy to use. On the other hand, for common functions like searching,sorting,insertion,removal, you can achieve much faster speed using standard algorithms (built in library functions) on vectors. Specially if you are using vectors of objects.
Secondly there is this huge difference that vectors can grow in size dynamically as more objects are inserted. Hope that helps.

Related

Why do we prefer vectors instead of arrays in C++? Because of core dump? [duplicate]

I'm new to C++. I'm reading "Beginning C++ Through Game Programming" by Michael Dawson. However, I'm not new to programming in general. I just finished a chapter that dealt with vectors, so I've got a question about their use in the real world (I'm a computer science student, so I don't have much real-world experience yet).
The author has a Q/A at the end of each chapter, and one of them was:
Q: When should I use a vector instead of an array?
A: Almost always. Vectors are efficient and flexible. They do require a little more memory than arrays, but this tradeoff is almost always worth the benefits.
What do you guys think? I remember learning about vectors in a Java book, but we didn't cover them at all in my Intro to Comp. Sci. class, nor my Data Structures class at college. I've also never seen them used in any programming assignments (Java and C). This makes me feel like they're not used very much, although I know that school code and real-world code can be extremely different.
I don't need to be told about the differences between the two data structures; I'm very aware of them. All I want to know is if the author is giving good advice in his Q/A, or if he's simply trying to save beginner programmers from destroying themselves with complexities of managing fixed-size data structures. Also, regardless of what you think of the author's advice, what do you see in the real-world more often?
A: Almost always [use a vector instead of an array]. Vectors are efficient and flexible. They do require a little more memory than arrays, but this tradeoff is almost always worth the benefits.
That's an over-simplification. It's fairly common to use arrays, and can be attractive when:
the elements are specified at compile time, e.g. const char project[] = "Super Server";, const Colours colours[] = { Green, Yellow };
with C++11 it will be equally concise to initialise std::vectors with values
the number of elements is inherently fixed, e.g. const char* const bool_to_str[] = { "false", "true" };, Piece chess_board[8][8];
first-use performance is critical: with arrays of constants the compiler can often write a memory snapshot of the fully pre-initialised objects into the executable image, which is then page-faulted directly into place ready for use, so it's typically much faster that run-time heap allocation (new[]) followed by serialised construction of objects
compiler-generated tables of const data can always be safely read by multiple threads, whereas data constructed at run-time must complete construction before other code triggered by constructors for non-function-local static variables attempts to use that data: you end up needing some manner of Singleton (possibly threadsafe which will be even slower)
In C++03, vectors created with an initial size would construct one prototypical element object then copy construct each data member. That meant that even for types where construction was deliberately left as a no-operation, there was still a cost to copy the data elements - replicating their whatever-garbage-was-left-in-memory values. Clearly an array of uninitialised elements is faster.
One of the powerful features of C++ is that often you can write a class (or struct) that exactly models the memory layout required by a specific protocol, then aim a class-pointer at the memory you need to work with to conveniently interpret or assign values. For better or worse, many such protocols often embed small fixed sized arrays.
There's a decades-old hack for putting an array of 1 element (or even 0 if your compiler allows it as an extension) at the end of a struct/class, aiming a pointer to the struct type at some larger data area, and accessing array elements off the end of the struct based on prior knowledge of the memory availability and content (if reading before writing) - see What's the need of array with zero elements?
classes/structures containing arrays can still be POD types
arrays facilitate access in shared memory from multiple processes (by default vector's internal pointers to the actual dynamically allocated data won't be in shared memory or meaningful across processes, and it was famously difficult to force C++03 vectors to use shared memory like this even when specifying a custom allocator template parameter).
embedding arrays can localise memory access requirement, improving cache hits and therefore performance
That said, if it's not an active pain to use a vector (in code concision, readability or performance) then you're better off doing so: they've size(), checked random access via at(), iterators, resizing (which often becomes necessary as an application "matures") etc.. It's also often easier to change from vector to some other Standard container should there be a need, and safer/easier to apply Standard algorithms (x.end() is better than x + sizeof x / sizeof x[0] any day).
UPDATE: C++11 introduced a std::array<>, which avoids some of the costs of vectors - internally using a fixed-sized array to avoid an extra heap allocation/deallocation - while offering some of the benefits and API features: http://en.cppreference.com/w/cpp/container/array.
One of the best reasons to use a vector as opposed to an array is the RAII idiom. Basically, in order for c++ code to be exception-safe, any dynamically allocated memory or other resources should be encapsulated within objects. These objects should have destructors that free these resources.
When an exception goes unhandled, the ONLY things that are gaurenteed to be called are the destructors of objects on the stack. If you dynamically allocate memory outside of an object, and an uncaught exception is thrown somewhere before it is deleted, you have a memory leak.
It's also a nice way to avoid having to remember to use delete.
You should also check out std::algorithm, which provides a lot of common algorithms for vector and other STL containers.
I have on a few occasions written code with vector that, in retrospect, probably would have been better with a native array. But in all of these cases, either a Boost::multi_array or a Blitz::Array would have been better than either of them.
A std::vector is just a resizable array. It's not much more than that. It's not something you would learn in a Data Structures class, because it isn't an intelligent data structure.
In the real world, I see a lot of arrays. But I also see a lot of legacy codebases that use "C with Classes"-style C++ programming. That doesn't mean that you should program that way.
I am going to pop my opinion in here for coding large sized array/vectors used in science and engineering.
The pointer based arrays in this case can be quite a bit faster especially for standard types. But the pointers add the danger of possible memory leaks. These memory leaks can lead to longer debug cycle. Additionally if you want to make the pointer based array dynamic you have to code this by hand.
On the other hand vectors are slower for standard types. They also are both dynamic and memory safe as long as you are not storing dynamically allocated pointers in the stl vector.
In science and engineering the choice depends on the project. how important is speed vs debug time? For example LAAMPS which is a simulation software uses raw pointers that are handled through their memory management class. Speed is priority for this software. A software I am building, i have to balance speed, with memory footprint and debug time. I really dont want to spend a lot of time debugging so i am using the STL vector.
I wanted to add some more information to this answer that I discovered from extensive testing of large scale arrays and lots of reading the web. So, another problem with stl vector and large sized arrays (one million +) occurs in how memory gets allocated for these arrays. Stl vector uses the std::allocator class for handling memory. This class is a pool based memory allocator. Under small scale loading the pool based allocation is extremely efficient in terms of speed and memory use. As the size of the vector gets into the millions, the pool based strategy becomes a memory hog. This happens because the pools tendency is to always hold more space than is being currently used by the stl vector.
For large scale vectors you are either better off writing your own vector class or using pointers (raw or some sort of memory management system from boost or the c++ library). There are advantages and disadvantages to both approaches. The choice really depends on the exact problem you are tackling (too many variables to add in here). If you do happen to write your own vector class make sure to allow the vector an easy way to clear its memory. Currently for the Stl vector you need to use swap operations to do something that really should have been built into the class in the first place.
Rule of thumb: if you don't know the number of elements in advance, or if the number of elements is expected to be large (say, more than 10), use vector. Otherwise, you could also use an array. For example, I write a lot of geometry-processing code and I define a line as an ARRAY of 2 coordinates. A line is defined by two points, and it will ALWAYS be defined by exactly two points. Using a vector instead of an array would be overkill in many ways, also performance-wise.
Another thing: when I say "array" I really DO MEAN array: a variable declared using an array syntax, such as int evenOddCount[2]; If you consider choosing between a vector and a dynamically-allocated block of memory, such as int *evenOddCount = new int[2];, the answer is clear: USE VECTOR!
It's a rare case in the real world where you deal with fixed collections, of a known size. In almost all cases there is a degree of the unknown in exactly what size of data set you will be accommodating in your program. Indeed it is the hallmark of a good program that it can accomodate a wide range of possible scenarios.
Take these (trivial) scenarios as examples:
You have implemented a view
controller to track AI combatants in
a FPS. The game logic spawns a random
number of combatants in various zones
every couple of seconds. The player
is downing AI combatants at a rate
known only at run time.
A lawyer has accessed the Municipal
Court website in his state and is
querying the number of new DUI cases
that came in over the night. He
chooses to filter the list by a set
of variables including time the
accident occurred, zip code, and
arresting officer.
The operating system needs to
maintain a list of memory addresses
in use by the various programs
running on it. The number of programs
and their memory usage changes in
unpredictable ways.
In any of these cases a good argument can be made that a variable size list (that accommodates dynamic inserts and deletes) will perform better than a simple array. With the main benefits coming from reduced need to alloc/dealloc memory space for the fixed array as you add or remove elements from it.
As far as arrays are considered, simple integer or string arrays are very easy to use. On the other hand, for common functions like searching,sorting,insertion,removal, you can achieve much faster speed using standard algorithms (built in library functions) on vectors. Specially if you are using vectors of objects.
Secondly there is this huge difference that vectors can grow in size dynamically as more objects are inserted. Hope that helps.

Declaring 3D array structure in c++ using vector

Hi I am a graduate student studying scientific computing using c++. Some of our research focus on speed of an algorithm, therefore it is important to construct array structure that is fast enough.
I've seen two ways of constructing 3D Arrays.
First one is to use vector liblary.
vector<vector<vector<double>>> a (isize,vector<double>(jsize,vector<double>(ksize,0)))
This gives 3D array structure of size isize x jsize x ksize.
The other one is to construct a structure containing 1d array of size isize* jsize * ksize using
new double[isize*jsize*ksize]. To access the specific location of (i,j,k) easily, operator overloading is necessary(am I right?).
And from what I have experienced, first one is much faster since it can access to location (i,j,k) easily while latter one has to compute location and return the value. But I have seen some people preferring latter one over the first one. Why do they prefer the latter setting? and is there any disadvantage of using the first one?
Thanks in adavance.
Main difference between those will be the layout:
vector<vector<vector<T>>>
This will get you a 1D array of vector<vector<T>>.
Each item will be a 1D array of vector<T>.
And each item of those 1D array will be a 1D array of T.
The point is, vector itself does not store its content. It manages a chunk of memory, and stores the content there. This has a number of bad consequences:
For a matrix of dimension X·Y·Z, you will end up allocating 1 + X + X·Y memory chunks. That's horribly slow, and will trash the heap. Imagine: a cube matrix of size 20 would trigger 421 calls to new!
To access a cell, you have 3 levels of indirection:
You must access the vector<vector<vector<T>>> object to get pointer to top-level memory chunk.
You must then access the vector<vector<T>> object to get pointer to second-level memory chunk.
You must then access the vector<T> object to get pointer to the leaf memory chunk.
Only then you can access the T data.
Those memory chunks will be spread around the heap, causing a lot of cache misses and slowing the overall computation.
Should you get it wrong at some point, it is possible to end up with some lines in your matrix having different lengths. After all, they're independent 1-d arrays.
Having a contiguous memory block (like new T[X * Y * Z]) on the other hand gives:
You allocate 1 memory chunk. No heap trashing, O(1).
You only need to access the pointer to the memory chunk, then can go straight for desired element.
All matrix is contiguous in memory, which is cache-friendly.
Those days, a single cache miss means dozens or hundreds lost computing cycles, do not underestimate the cache-friendliness aspect.
By the way, there is a probably better way you didn't mention: using one of the numerous matrix libraries that will handle this for you automatically and provide nice support tools (like SSE-accelerated matrix operations). One such library is Eigen, but there are plenty others.
→ You want to do scientific computing? Let a lib handle the boilerplate and the basics so you can focus on the scientific computing part.
In my point of view, there are too much advantages std::vector's have over normal plain arrays.
In short here are some:
It is much harder to create memory leaks with std::vector. This point alone is one of the biggest advantages. This has nothing to do with performance, but should be considered all the time.
std::vector is part of the STL. This part of C++ is one of the most used one. Thousands of people use the STL and so they get "tested" every day. Over the last years they got optimized so radically, they don't lack any performance anymore. (pls correct me if i see this wrong)
Handling with std::vector is easy as 1, 2, 3. No pointer handling no nothing... Just accessing it via methods or with []-operator and more other methods.
First of all, the idea that you access (i,j,k) in your vec^3 directly is somewhat flawed. What you have is a structure of pointers where you need to dereference three pointers along the way. Note that I have no idea whether that is faster or slower than computing the position within a one-dimensional array, though. You'd need to test that and it might depend on the size of your data (especially whether it fits in a chunk).
Second, the vector^3 requires pointers and vector sizes, which require more memory. In many cases, this will be irrelevant (as the image grows cubically but the memory difference only quadratically) but if your algoritm is really going to fill out any memory available, that can matter.
Third, the raw array stores everything in consecutive memory, which is good for streaming and can be good for certain algorithms because of quick cache accesses. For example when you add one 3D image to another.
Note that all of this is about hyper-optimization that you might not need. The advantages of vectors that skratchi.at pointed out in his answer are quite strong, and I add the advantage that vectors usually increase readability. If you do not have very good reasons not to use vectors, then use them.
If you should decide for the raw array, in any case, make sure that you wrap it well and keep the class small and simple, in order to counter problems regarding leaks and such.
Welcome to SO.
If everything what you have are the two alternatives, then the first one could be better.
Prefer using STL array or vector instead of a C array
You should avoid to use C++ plain arrays since you need to manage yourself the memory allocating/deallocating with new/delete and other boilerplate code like keep track of the size/check bounds. In clearly words "C arrays are less safe, and have no advantages over array and vector."
However, there are some important drawbacks in the first alternative. Something I would like to highlight is that:
std::vector<std::vector<std::vector<T>>>
is not a 3-d matrix. In a matrix, all the rows must have the same size. On the other hand, in a "vector of vectors" there is no guarantee that all the nested vectors have the same length. The reason is that a vector is a linear 1-D structure as pointed out in the #spectras answer. Hence, to avoid all sort of bad or unexpected behaviours, you must to include guards in your code to obtain the rectangular invariant guaranteed.
Luckily, the first alternative is not the only one you may have in hands.
For example, you can replace the c-style array by a std::array:
const int n = i_size * j_size * k_size;
std::array<int, n> myFlattenMatrix;
or use std::vector in case your matrix dimensions can change.
Accessing element by its 3 coordinates
Regarding your question
To access the specific location of (i,j,k) easily, operator
overloading is necessary(am I right?).
Not exactly. Since there isn't a 3-parameter operator for neither std::vector nor array, you can't overload it. But you can create a template class or function to wrap it for you. In any case you will must to deference the 3 vectors or calculate the flatten index of the element in the linear storage.
Considering do not use a third part matrix library like Eigen for your experiments
You aren't coding it for production but for research purposes instead. Particularly, your research is exactly regarding the performance of algorithms. In that case, I prefer do not recommend to use a third part library, like Eigen, absolutely. Of course it depends a lot of what kind of "speed of an algorithm" metrics are you willing to gather, but Eigen, for instance, will do a lot of things under the hood (like vectorization) which will have a tremendous influence on your experiments. Since it will be hard for you to control those unseen optimizations, these library's features may lead you to wrong conclusions about your algorithms.
Algorithm's performance and big-o notation
Usually, the performance of algorithms are analysed by using the big-O approach where factors like the actual time spent, hardware speed or programming language traits aren't taken in account. The book "Data Structures and Algorithms in C++" by Adam Drozdek can provide more details about it.

What advantages do arrays hold over vectors?

Well, after a full year of programming and only knowing of arrays, I was made aware of the existence of vectors (by some members of StackOverflow on a previous post of mine). I did a load of researching and studying them on my own and rewrote an entire application I had written with arrays and linked lists, with vectors. At this point, I'm not sure if I'll still use arrays, because vectors seem to be more flexible and efficient. With their ability to grow and shrink in size automatically, I don't know if I'll be using arrays as much. At this point, the only advantage I personally see is that arrays are much easier to write and understand. The learning curve for arrays is nothing, where there is a small learning curve for vectors. Anyway, I'm sure there's probably a good reason for using arrays in some situation and vectors in others, I was just curious what the community thinks. I'm an entirely a novice, so I assume that I'm just not well-informed enough on the strict usages of either.
And in case anyone is even remotely curious, this is the application I'm practicing using vectors with. Its really rough and needs a lot of work: https://github.com/JosephTLyons/Joseph-Lyons-Contact-Book-Application
A std::vector manages a dynamic array. If your program need an array that changes its size dynamically at run-time then you would end up writing code to do all the things a std::vector does but probably much less efficiently.
What the std::vector does is wrap all that code up in a single class so that you don't need to keep writing the same code to do the same stuff over and over.
Accessing the data in a std::vector is no less efficient than accessing the data in a dynamic array because the std::vector functions are all trivial inline functions that the compiler optimizes away.
If, however, you need a fixed size then you can get slightly more efficient than a std::vector with a raw array. However you won't loose anything using a std::array in those cases.
The places I still use raw arrays are like when I need a temporary fixed-size buffer that isn't going to be passed around to other functions:
// some code
{ // new scope for temporary buffer
char buffer[1024]; // buffer
file.read(buffer, sizeof(buffer)); // use buffer
} // buffer is destroyed here
But I find it hard to justify ever using a raw dynamic array over a std::vector.
This is not a full answer, but one thing I can think of is, that the "ability to grow and shrink" is not such a good thing if you know what you want. For example: assume you want to save memory of 1000 objects, but the memory will be filled at a rate that will cause the vector to grow each time. The overhead you'll get from growing will be costly when you can simply define a fixed array
Generally speaking: if you will use an array over a vector - you will have more power at your hands, meaning no "background" function calls you don't actually need (resizing), no extra memory saved for things you don't use (size of vector...).
Additionally, using memory on the stack (array) is faster than heap (vector*) as shown here
*as shown here it's not entirely precise to say vectors reside on the heap, but they sure hold more memory on the heap than the array (that holds none on the heap)
One reason is that if you have a lot of really small structures, small fixed length arrays can be memory efficient.
compare
struct point
{
float coords[4]
}
with
struct point
{
std::vector<float> coords;
}
Alternatives include std::array for cases like this. Also std::vector implementations will over allocate, meaning that if you want resize to 4 slots, you might have memory allocated for 16 slots.
Furthermore, the memory locations will be scattered and hard to predict, killing performance - using an exceptionally larger number of std::vectors may also need to memory fragmentation issues, where new starts failing.
I think this question is best answered flipped around:
What advantages does std::vector have over raw arrays?
I think this list is more easily enumerable (not to say this list is comprehensive):
Automatic dynamic memory allocation
Proper stack, queue, and sort implementations attached
Integration with C++ 11 related syntactical features such as iterator
If you aren't using such features there's not any particular benefit to std::vector over a "raw array" (though, similarly, in most cases the downsides are negligible).
Despite me saying this, for typical user applications (i.e. running on windows/unix desktop platforms) std::vector or std::array is (probably) typically the preferred data structure because even if you don't need all these features everywhere, if you're already using std::vector anywhere else you may as well keep your data types consistent so your code is easier to maintain.
However, since at the core std::vector simply adds functionality on top of "raw arrays" I think it's important to understand how arrays work in order to be fully take advantage of std::vector or std::array (knowing when to use std::array being one example) so you can reduce the "carbon footprint" of std::vector.
Additionally, be aware that you are going to see raw arrays when working with
Embedded code
Kernel code
Signal processing code
Cache efficient matrix implementations
Code dealing with very large data sets
Any other code where performance really matters
The lesson shouldn't be to freak out and say "must std::vector all the things!" when you encounter this in the real world.
Also: THIS!!!!
One of the powerful features of C++ is that often you can write a class (or struct) that exactly models the memory layout required by a specific protocol, then aim a class-pointer at the memory you need to work with to conveniently interpret or assign values. For better or worse, many such protocols often embed small fixed sized arrays.
There's a decades-old hack for putting an array of 1 element (or even 0 if your compiler allows it as an extension) at the end of a struct/class, aiming a pointer to the struct type at some larger data area, and accessing array elements off the end of the struct based on prior knowledge of the memory availability and content (if reading before writing) - see What's the need of array with zero elements?
embedding arrays can localise memory access requirement, improving cache hits and therefore performance

C++ Vector vs Array [duplicate]

I'm new to C++. I'm reading "Beginning C++ Through Game Programming" by Michael Dawson. However, I'm not new to programming in general. I just finished a chapter that dealt with vectors, so I've got a question about their use in the real world (I'm a computer science student, so I don't have much real-world experience yet).
The author has a Q/A at the end of each chapter, and one of them was:
Q: When should I use a vector instead of an array?
A: Almost always. Vectors are efficient and flexible. They do require a little more memory than arrays, but this tradeoff is almost always worth the benefits.
What do you guys think? I remember learning about vectors in a Java book, but we didn't cover them at all in my Intro to Comp. Sci. class, nor my Data Structures class at college. I've also never seen them used in any programming assignments (Java and C). This makes me feel like they're not used very much, although I know that school code and real-world code can be extremely different.
I don't need to be told about the differences between the two data structures; I'm very aware of them. All I want to know is if the author is giving good advice in his Q/A, or if he's simply trying to save beginner programmers from destroying themselves with complexities of managing fixed-size data structures. Also, regardless of what you think of the author's advice, what do you see in the real-world more often?
A: Almost always [use a vector instead of an array]. Vectors are efficient and flexible. They do require a little more memory than arrays, but this tradeoff is almost always worth the benefits.
That's an over-simplification. It's fairly common to use arrays, and can be attractive when:
the elements are specified at compile time, e.g. const char project[] = "Super Server";, const Colours colours[] = { Green, Yellow };
with C++11 it will be equally concise to initialise std::vectors with values
the number of elements is inherently fixed, e.g. const char* const bool_to_str[] = { "false", "true" };, Piece chess_board[8][8];
first-use performance is critical: with arrays of constants the compiler can often write a memory snapshot of the fully pre-initialised objects into the executable image, which is then page-faulted directly into place ready for use, so it's typically much faster that run-time heap allocation (new[]) followed by serialised construction of objects
compiler-generated tables of const data can always be safely read by multiple threads, whereas data constructed at run-time must complete construction before other code triggered by constructors for non-function-local static variables attempts to use that data: you end up needing some manner of Singleton (possibly threadsafe which will be even slower)
In C++03, vectors created with an initial size would construct one prototypical element object then copy construct each data member. That meant that even for types where construction was deliberately left as a no-operation, there was still a cost to copy the data elements - replicating their whatever-garbage-was-left-in-memory values. Clearly an array of uninitialised elements is faster.
One of the powerful features of C++ is that often you can write a class (or struct) that exactly models the memory layout required by a specific protocol, then aim a class-pointer at the memory you need to work with to conveniently interpret or assign values. For better or worse, many such protocols often embed small fixed sized arrays.
There's a decades-old hack for putting an array of 1 element (or even 0 if your compiler allows it as an extension) at the end of a struct/class, aiming a pointer to the struct type at some larger data area, and accessing array elements off the end of the struct based on prior knowledge of the memory availability and content (if reading before writing) - see What's the need of array with zero elements?
classes/structures containing arrays can still be POD types
arrays facilitate access in shared memory from multiple processes (by default vector's internal pointers to the actual dynamically allocated data won't be in shared memory or meaningful across processes, and it was famously difficult to force C++03 vectors to use shared memory like this even when specifying a custom allocator template parameter).
embedding arrays can localise memory access requirement, improving cache hits and therefore performance
That said, if it's not an active pain to use a vector (in code concision, readability or performance) then you're better off doing so: they've size(), checked random access via at(), iterators, resizing (which often becomes necessary as an application "matures") etc.. It's also often easier to change from vector to some other Standard container should there be a need, and safer/easier to apply Standard algorithms (x.end() is better than x + sizeof x / sizeof x[0] any day).
UPDATE: C++11 introduced a std::array<>, which avoids some of the costs of vectors - internally using a fixed-sized array to avoid an extra heap allocation/deallocation - while offering some of the benefits and API features: http://en.cppreference.com/w/cpp/container/array.
One of the best reasons to use a vector as opposed to an array is the RAII idiom. Basically, in order for c++ code to be exception-safe, any dynamically allocated memory or other resources should be encapsulated within objects. These objects should have destructors that free these resources.
When an exception goes unhandled, the ONLY things that are gaurenteed to be called are the destructors of objects on the stack. If you dynamically allocate memory outside of an object, and an uncaught exception is thrown somewhere before it is deleted, you have a memory leak.
It's also a nice way to avoid having to remember to use delete.
You should also check out std::algorithm, which provides a lot of common algorithms for vector and other STL containers.
I have on a few occasions written code with vector that, in retrospect, probably would have been better with a native array. But in all of these cases, either a Boost::multi_array or a Blitz::Array would have been better than either of them.
A std::vector is just a resizable array. It's not much more than that. It's not something you would learn in a Data Structures class, because it isn't an intelligent data structure.
In the real world, I see a lot of arrays. But I also see a lot of legacy codebases that use "C with Classes"-style C++ programming. That doesn't mean that you should program that way.
I am going to pop my opinion in here for coding large sized array/vectors used in science and engineering.
The pointer based arrays in this case can be quite a bit faster especially for standard types. But the pointers add the danger of possible memory leaks. These memory leaks can lead to longer debug cycle. Additionally if you want to make the pointer based array dynamic you have to code this by hand.
On the other hand vectors are slower for standard types. They also are both dynamic and memory safe as long as you are not storing dynamically allocated pointers in the stl vector.
In science and engineering the choice depends on the project. how important is speed vs debug time? For example LAAMPS which is a simulation software uses raw pointers that are handled through their memory management class. Speed is priority for this software. A software I am building, i have to balance speed, with memory footprint and debug time. I really dont want to spend a lot of time debugging so i am using the STL vector.
I wanted to add some more information to this answer that I discovered from extensive testing of large scale arrays and lots of reading the web. So, another problem with stl vector and large sized arrays (one million +) occurs in how memory gets allocated for these arrays. Stl vector uses the std::allocator class for handling memory. This class is a pool based memory allocator. Under small scale loading the pool based allocation is extremely efficient in terms of speed and memory use. As the size of the vector gets into the millions, the pool based strategy becomes a memory hog. This happens because the pools tendency is to always hold more space than is being currently used by the stl vector.
For large scale vectors you are either better off writing your own vector class or using pointers (raw or some sort of memory management system from boost or the c++ library). There are advantages and disadvantages to both approaches. The choice really depends on the exact problem you are tackling (too many variables to add in here). If you do happen to write your own vector class make sure to allow the vector an easy way to clear its memory. Currently for the Stl vector you need to use swap operations to do something that really should have been built into the class in the first place.
Rule of thumb: if you don't know the number of elements in advance, or if the number of elements is expected to be large (say, more than 10), use vector. Otherwise, you could also use an array. For example, I write a lot of geometry-processing code and I define a line as an ARRAY of 2 coordinates. A line is defined by two points, and it will ALWAYS be defined by exactly two points. Using a vector instead of an array would be overkill in many ways, also performance-wise.
Another thing: when I say "array" I really DO MEAN array: a variable declared using an array syntax, such as int evenOddCount[2]; If you consider choosing between a vector and a dynamically-allocated block of memory, such as int *evenOddCount = new int[2];, the answer is clear: USE VECTOR!
It's a rare case in the real world where you deal with fixed collections, of a known size. In almost all cases there is a degree of the unknown in exactly what size of data set you will be accommodating in your program. Indeed it is the hallmark of a good program that it can accomodate a wide range of possible scenarios.
Take these (trivial) scenarios as examples:
You have implemented a view
controller to track AI combatants in
a FPS. The game logic spawns a random
number of combatants in various zones
every couple of seconds. The player
is downing AI combatants at a rate
known only at run time.
A lawyer has accessed the Municipal
Court website in his state and is
querying the number of new DUI cases
that came in over the night. He
chooses to filter the list by a set
of variables including time the
accident occurred, zip code, and
arresting officer.
The operating system needs to
maintain a list of memory addresses
in use by the various programs
running on it. The number of programs
and their memory usage changes in
unpredictable ways.
In any of these cases a good argument can be made that a variable size list (that accommodates dynamic inserts and deletes) will perform better than a simple array. With the main benefits coming from reduced need to alloc/dealloc memory space for the fixed array as you add or remove elements from it.
As far as arrays are considered, simple integer or string arrays are very easy to use. On the other hand, for common functions like searching,sorting,insertion,removal, you can achieve much faster speed using standard algorithms (built in library functions) on vectors. Specially if you are using vectors of objects.
Secondly there is this huge difference that vectors can grow in size dynamically as more objects are inserted. Hope that helps.

Are there any practical limitations to only using std::string instead of char arrays and std::vector/list instead of arrays in c++?

I use vectors, lists, strings and wstrings obsessively in my code. Are there any catch 22s involved that should make me more interested in using arrays from time to time, chars and wchars instead?
Basically, if working in an environment which supports the standard template library is there any case using the primitive types is actually better?
For 99% of the time and for 99% of Standard Library implementations, you will find that std::vectors will be fast enough, and the convenience and safety you get from using them will more than outweigh any small performance cost.
For those very rare cases when you really need bare-metal code, you can treat a vector like a C-style array:
vector <int> v( 100 );
int * p = &v[0];
p[3] = 42;
The C++ standard guarantees that vectors are allocated contiguously, so this is guaranteed to work.
Regarding strings, the convenience factor becomes almnost overwhelming, and the performance issues tend to go away. If you go beack to C-style strings, you are also going back to the use of functions like strlen(), which are inherently very inefficent themselves.
As for lists, you should think twice, and probably thrice, before using them at all, whether your own implementation or the standard. The vast majority of computing problems are better solved using a vector/array. The reason lists appear so often in the literature is to a large part because they are a convenient data structure for textbook and training course writers to use to explain pointers and dynamic allocation in one go. I speak here as an ex training course writer.
I would stick to STL classes (vectors, strings, etc). They are safer, easier to use, more productive, with less probability to have memory leaks and, AFAIK, they make some additional, run-time checking of boundaries, at least at DEBUG time (Visual C++).
Then, measure the performance. If you identify the bottleneck(s) is on STL classes, then move to C style strings and arrays usage.
From my experience, the chances to have the bottleneck on vector or string usage are very low.
One problem is the overhead when accessing elements. Even with vector and string when you access an element by index you need to first retrieve the buffer address, then add the offset (you don't do it manually, but the compiler emits such code). With raw array you already have the buffer address. This extra indirection can lead to significant overhead in certain cases and is subject to profiling when you want to improve performance.
If you don't need real time responses, stick with your approach. They are safer than chars.
You can occasionally encounter scenarios where you'll get better performance or memory usage from doing some stuff yourself (example, std::string typically has about 24 bytes of overhead, 12 bytes for the pointers in the std::string itself, and a header block on its dynamically allocated piece).
I have worked on projects where converting from std::string to const char* saved noticeable memory (10's of MB). I don't believe these projects are what you would call typical.
Oh, using STL will hurt your compile times, and at some point that may be an issue. When your project results in over a GB of object files being passed to the linker, you might want to consider how much of that is template bloat.
I've worked on several projects where the memory overhead for strings has become problematic.
It's worth considering in advance how your application needs to scale. If you need to be storing an unbounded number of strings, using const char*s into a globally managed string table can save you huge amounts of memory.
But generally, definitely use STL types unless there's a very good reason to do otherwise.
I believe the default memory allocation technique is a buffer for vectors and strings is one that allocates double the amount of memory each time the currently allocated memory gets used up. This can be wasteful. You can provide a custom allocator of course...
The other thing to consider is stack vs. heap. Staticly sized arrays and strings can sit on the stack, or at least the compiler handles the memory management for you. Newer compilers will handle dynamically sized arrays for you too if they provide the relevant C99/C++0x feature. Vectors and strings will always use the heap, and this can introduce performance issues if you have really tight constraints.
As a rule of thumb use whats already there unless it hurts your project with its speed/memory overhead... you'll probably find that for 99% of stuff the STL provided classes save you time and effort with little to no impact on your applications performance. (i.e. "avoid premature optimisation")