In our C++ course they suggest not to use C++ arrays on new projects anymore. As far as I know Stroustroup himself suggests not to use arrays. But are there significant performance differences?
Using C++ arrays with new (that is, using dynamic arrays) should be avoided. There is the problem that you have to keep track of the size, and you need to delete them manually and do all sorts of housekeeping.
Using arrays on the stack is also discouraged because you don't have range checking, and passing the array around will lose any information about its size (array to pointer conversion). You should use std::array in that case, which wraps a C++ array in a small class and provides a size function and iterators to iterate over it.
Now, std::vector vs. native C++ arrays (taken from the internet):
// Comparison of assembly code generated for basic indexing, dereferencing,
// and increment operations on vectors and arrays/pointers.
// Assembly code was generated by gcc 4.1.0 invoked with g++ -O3 -S on a
// x86_64-suse-linux machine.
#include <vector>
struct S
{
int padding;
std::vector<int> v;
int * p;
std::vector<int>::iterator i;
};
int pointer_index (S & s) { return s.p[3]; }
// movq 32(%rdi), %rax
// movl 12(%rax), %eax
// ret
int vector_index (S & s) { return s.v[3]; }
// movq 8(%rdi), %rax
// movl 12(%rax), %eax
// ret
// Conclusion: Indexing a vector is the same damn thing as indexing a pointer.
int pointer_deref (S & s) { return *s.p; }
// movq 32(%rdi), %rax
// movl (%rax), %eax
// ret
int iterator_deref (S & s) { return *s.i; }
// movq 40(%rdi), %rax
// movl (%rax), %eax
// ret
// Conclusion: Dereferencing a vector iterator is the same damn thing
// as dereferencing a pointer.
void pointer_increment (S & s) { ++s.p; }
// addq $4, 32(%rdi)
// ret
void iterator_increment (S & s) { ++s.i; }
// addq $4, 40(%rdi)
// ret
// Conclusion: Incrementing a vector iterator is the same damn thing as
// incrementing a pointer.
Note: If you allocate arrays with new and allocate non-class objects (like plain int) or classes without a user defined constructor and you don't want to have your elements initialized initially, using new-allocated arrays can have performance advantages because std::vector initializes all elements to default values (0 for int, for example) on construction (credits to #bernie for reminding me).
Preamble for micro-optimizer people
Remember:
"Programmers waste enormous amounts of time thinking about, or worrying about, the speed of noncritical parts of their programs, and these attempts at efficiency actually have a strong negative impact when debugging and maintenance are considered. We should forget about small efficiencies, say about 97% of the time: premature optimization is the root of all evil. Yet we should not pass up our opportunities in that critical 3%".
(Thanks to metamorphosis for the full quote)
Don't use a C array instead of a vector (or whatever) just because you believe it's faster as it is supposed to be lower-level. You would be wrong.
Use by default vector (or the safe container adapted to your need), and then if your profiler says it is a problem, see if you can optimize it, either by using a better algorithm, or changing container.
This said, we can go back to the original question.
Static/Dynamic Array?
The C++ array classes are better behaved than the low-level C array because they know a lot about themselves, and can answer questions C arrays can't. They are able to clean after themselves. And more importantly, they are usually written using templates and/or inlining, which means that what appears to a lot of code in debug resolves to little or no code produced in release build, meaning no difference with their built-in less safe competition.
All in all, it falls on two categories:
Dynamic arrays
Using a pointer to a malloc-ed/new-ed array will be at best as fast as the std::vector version, and a lot less safe (see litb's post).
So use a std::vector.
Static arrays
Using a static array will be at best:
as fast as the std::array version
and a lot less safe.
So use a std::array.
Uninitialized memory
Sometimes, using a vector instead of a raw buffer incurs a visible cost because the vector will initialize the buffer at construction, while the code it replaces didn't, as remarked bernie by in his answer.
If this is the case, then you can handle it by using a unique_ptr instead of a vector or, if the case is not exceptional in your codeline, actually write a class buffer_owner that will own that memory, and give you easy and safe access to it, including bonuses like resizing it (using realloc?), or whatever you need.
Vectors are arrays under the hood.
The performance is the same.
One place where you can run into a performance issue, is not sizing the vector correctly to begin with.
As a vector fills, it will resize itself, and that can imply, a new array allocation, followed by n copy constructors, followed by about n destructor calls, followed by an array delete.
If your construct/destruct is expensive, you are much better off making the vector the correct size to begin with.
There is a simple way to demonstrate this. Create a simple class that shows when it is constructed/destroyed/copied/assigned. Create a vector of these things, and start pushing them on the back end of the vector. When the vector fills, there will be a cascade of activity as the vector resizes. Then try it again with the vector sized to the expected number of elements. You will see the difference.
To respond to something Mehrdad said:
However, there might be cases where
you still need arrays. When
interfacing with low level code (i.e.
assembly) or old libraries that
require arrays, you might not be able
to use vectors.
Not true at all. Vectors degrade nicely into arrays/pointers if you use:
vector<double> vector;
vector.push_back(42);
double *array = &(*vector.begin());
// pass the array to whatever low-level code you have
This works for all major STL implementations. In the next standard, it will be required to work (even though it does just fine today).
You have even fewer reasons to use plain arrays in C++11.
There are 3 kind of arrays in nature from fastest to slowest, depending on the features they have (of course the quality of implementation can make things really fast even for case 3 in the list):
Static with size known at compile time. --- std::array<T, N>
Dynamic with size known at runtime and never resized. The typical optimization here is, that if the array can be allocated in the stack directly. -- Not available. Maybe dynarray in C++ TS after C++14. In C there are VLAs
Dynamic and resizable at runtime. --- std::vector<T>
For 1. plain static arrays with fixed number of elements, use std::array<T, N> in C++11.
For 2. fixed size arrays specified at runtime, but that won't change their size, there is discussion in C++14 but it has been moved to a technical specification and made out of C++14 finally.
For 3. std::vector<T> will usually ask for memory in the heap. This could have performance consequences, though you could use std::vector<T, MyAlloc<T>> to improve the situation with a custom allocator. The advantage compared to T mytype[] = new MyType[n]; is that you can resize it and that it will not decay to a pointer, as plain arrays do.
Use the standard library types mentioned to avoid arrays decaying to pointers. You will save debugging time and the performance is exactly the same as with plain arrays if you use the same set of features.
There is definitely a performance impact to using an std::vector vs a raw array when you want an uninitialized buffer (e.g. to use as destination for memcpy()). An std::vector will initialize all its elements using the default constructor. A raw array will not.
The c++ spec for the std:vector constructor taking a count argument (it's the third form) states:
`Constructs a new container from a variety of data sources, optionally using a user supplied allocator alloc.
Constructs the container with count default-inserted instances of T. No copies are made.
Complexity
2-3) Linear in count
A raw array does not incur this initialization cost.
Note that with a custom allocator, it is possible to avoid "initialization" of the vector's elements (i.e. to use default initialization instead of value initialization). See these questions for more details:
Is this behavior of vector::resize(size_type n) under C++11 and Boost.Container correct?
How can I avoid std::vector<> to initialize all its elements?
Go with STL. There's no performance penalty. The algorithms are very efficient and they do a good job of handling the kinds of details that most of us would not think about.
STL is a heavily optimized library. In fact, it's even suggested to use STL in games where high performance might be needed. Arrays are too error prone to be used in day to day tasks. Today's compilers are also very smart and can really produce excellent code with STL. If you know what you are doing, STL can usually provide the necessary performance. For example by initializing vectors to required size (if you know from start), you can basically achieve the array performance. However, there might be cases where you still need arrays. When interfacing with low level code (i.e. assembly) or old libraries that require arrays, you might not be able to use vectors.
About duli's contribution with my own measurements.
The conclusion is that arrays of integers are faster than vectors of integers (5 times in my example). However, arrays and vectors are arround the same speed for more complex / not aligned data.
If you compile the software in debug mode, many compilers will not inline the accessor functions of the vector. This will make the stl vector implementation much slower in circumstances where performance is an issue. It will also make the code easier to debug since you can see in the debugger how much memory was allocated.
In optimized mode, I would expect the stl vector to approach the efficiency of an array. This is since many of the vector methods are now inlined.
The performance difference between the two is very much implementation dependent - if you compare a badly implemented std::vector to an optimal array implementation, the array would win, but turn it around and the vector would win...
As long as you compare apples with apples (either both the array and the vector have a fixed number of elements, or both get resized dynamically) I would think that the performance difference is negligible as long as you follow got STL coding practise. Don't forget that using standard C++ containers also allows you to make use of the pre-rolled algorithms that are part of the standard C++ library and most of them are likely to be better performing than the average implementation of the same algorithm you build yourself.
That said, IMHO the vector wins in a debug scenario with a debug STL as most STL implementations with a proper debug mode can at least highlight/cathc the typical mistakes made by people when working with standard containers.
Oh, and don't forget that the array and the vector share the same memory layout so you can use vectors to pass data to legacy C or C++ code that expects basic arrays. Keep in mind that most bets are off in that scenario, though, and you're dealing with raw memory again.
If you're using vectors to represent multi-dimensional behavior, there is a performance hit.
Do 2d+ vectors cause a performance hit?
The gist is that there's a small amount of overhead with each sub-vector having size information, and there will not necessarily be serialization of data (as there is with multi-dimensional c arrays). This lack of serialization can offer greater than micro optimization opportunities. If you're doing multi-dimensional arrays, it may be best to just extend std::vector and roll your own get/set/resize bits function.
If you do not need to dynamically adjust the size, you have the memory overhead of saving the capacity (one pointer/size_t). That's it.
There might be some edge case where you have a vector access inside an inline function inside an inline function, where you've gone beyond what the compiler will inline and it will force a function call. That would be so rare as to not be worth worrying about - in general I would agree with litb.
I'm surprised nobody has mentioned this yet - don't worry about performance until it has been proven to be a problem, then benchmark.
I'd argue that the primary concern isn't performance, but safety. You can make a lot of mistakes with arrays (consider resizing, for example), where a vector would save you a lot of pain.
Vectors use a tiny bit more memory than arrays since they contain the size of the array. They also increase the hard disk size of programs and probably the memory footprint of programs. These increases are tiny, but may matter if you're working with an embedded system. Though most places where these differences matter are places where you would use C rather than C++.
The following simple test:
C++ Array vs Vector performance test explanation
contradicts the conclusions from "Comparison of assembly code generated for basic indexing, dereferencing, and increment operations on vectors and arrays/pointers."
There must be a difference between the arrays and vectors. The test says so... just try it, the code is there...
Sometimes arrays are indeed better than vectors. If you are always manipulating
a fixed length set of objects, arrays are better. Consider the following code snippets:
int main() {
int v[3];
v[0]=1; v[1]=2;v[2]=3;
int sum;
int starttime=time(NULL);
cout << starttime << endl;
for (int i=0;i<50000;i++)
for (int j=0;j<10000;j++) {
X x(v);
sum+=x.first();
}
int endtime=time(NULL);
cout << endtime << endl;
cout << endtime - starttime << endl;
}
where the vector version of X is
class X {
vector<int> vec;
public:
X(const vector<int>& v) {vec = v;}
int first() { return vec[0];}
};
and the array version of X is:
class X {
int f[3];
public:
X(int a[]) {f[0]=a[0]; f[1]=a[1];f[2]=a[2];}
int first() { return f[0];}
};
The array version will of main() will be faster because we are avoiding the
overhead of "new" everytime in the inner loop.
(This code was posted to comp.lang.c++ by me).
For fixed-length arrays the performance is the same (vs. vector<>) in release build, but in debug build low-level arrays win by a factor of 20 in my experience (MS Visual Studio 2015, C++ 11).
So the "save time debugging" argument in favor of STL might be valid if you (or your coworkers) tend to introduce bugs in your array usage, but maybe not if your debugging time is mostly waiting on your code to run to the point you are currently working on so that you can step through it.
Experienced developers working on numerically intensive code sometimes fall into the second group (especially if they use vector :) ).
Assuming a fixed-length array (e.g. int* v = new int[1000]; vs std::vector<int> v(1000);, with the size of v being kept fixed at 1000), the only performance consideration that really matters (or at least mattered to me when I was in a similar dilemma) is the speed of access to an element. I looked up the STL's vector code, and here is what I found:
const_reference
operator[](size_type __n) const
{ return *(this->_M_impl._M_start + __n); }
This function will most certainly be inlined by the compiler. So, as long as the only thing that you plan to do with v is access its elements with operator[], it seems like there shouldn't really be any difference in performance.
There is no argument about which of them is the best or good to use.They both have there own use cases,they both have their pros and cons.The behavior of both containers are different in different places.One of the main difficulty with arrays is that they are fixed in size if once they are defined or initialized then you can not change values and on the other side vectors are flexible, you can change vectors value whenever you want it's not fixed in size like arrays,because array has static memory allocation and vector has dynamic memory or heap memory allocation(we can push and pop elements into/from vector) and the creator of c++ Bjarne Stroustrup said that vectors are flexible to use more than arrays.
Using C++ arrays with new (that is, using dynamic arrays) should be avoided. There is the problem you have to keep track of the size, and you need to delete them manually and do all sort of housekeeping.
We can also insert, push and pull values easily in vectors which is not easily possible in arrays.
If we talk about performance wise then if you are working with small values then you should use arrays and if you are working with big scale code then you should go with vector(vectors are good at handling big values then arrays).
Related
I am porting some C99 code that makes heavy use of variable length arrays (VLA) to C++.
I replaced the VLAs (stack allocation) with an array class that allocates memory on the heap. The performance hit was huge, a slowdown of a factor of 3.2 (see benchmarks below). What fast VLA replacement can I use in C++? My goal is to minimize performance hit when rewriting the code for C++.
One idea that was suggested to me was to write an array class that contains a fixed-size storage within the class (i.e. can be stack-allocated) and uses it for small arrays, and automatically switches to heap allocation for larger arrays. My implementation of this is at the end of the post. It works fairly well, but I still cannot reach the performance of the original C99 code. To come close to it, I must increase this fixed-size storage (MSL below) to sizes which I am not comfortable with. I don't want to allocate too-huge arrays on the stack even for the many small arrays that don't need it because I worry that it will trigger a stack overflow. A C99 VLA is actually less prone to this because it will never use more storage than needed.
I came upon std::dynarray, but my understanding is that it was not accepted into the standard (yet?).
I know that clang and gcc support VLAs in C++, but I need it to work with MSVC too. In fact better portability is one of the main goals of rewriting as C++ (the other goal being making the program, which was originally a command line tool, into a reusable library).
Benchmark
MSL refers to the array size above which I switch to heap-allocation. I use different values for 1D and 2D arrays.
Original C99 code: 115 seconds.
MSL = 0 (i.e. heap allocation): 367 seconds (3.2x).
1D-MSL = 50, 2D-MSL = 1000: 187 seconds (1.63x).
1D-MSL = 200, 2D-MSL = 4000: 143 seconds (1.24x).
1D-MSL = 1000, 2D-MSL = 20000: 131 (1.14x).
Increasing MSL further improves performance more, but eventually the program will start returning wrong results (I assume due to stack overflow).
These benchmarks are with clang 3.7 on OS X, but gcc 5 shows very similar results.
Code
This is the current "smallvector" implementation I use. I need 1D and 2D vectors. I switch to heap-allocation above size MSL.
template<typename T, size_t MSL=50>
class lad_vector {
const size_t len;
T sdata[MSL];
T *data;
public:
explicit lad_vector(size_t len_) : len(len_) {
if (len <= MSL)
data = &sdata[0];
else
data = new T[len];
}
~lad_vector() {
if (len > MSL)
delete [] data;
}
const T &operator [] (size_t i) const { return data[i]; }
T &operator [] (size_t i) { return data[i]; }
operator T * () { return data; }
};
template<typename T, size_t MSL=1000>
class lad_matrix {
const size_t rows, cols;
T sdata[MSL];
T *data;
public:
explicit lad_matrix(size_t rows_, size_t cols_) : rows(rows_), cols(cols_) {
if (rows*cols <= MSL)
data = &sdata[0];
else
data = new T[rows*cols];
}
~lad_matrix() {
if (rows*cols > MSL)
delete [] data;
}
T const * operator[] (size_t i) const { return &data[cols*i]; }
T * operator[] (size_t i) { return &data[cols*i]; }
};
Create a large buffer (MB+) in thread-local storage. (Actual memory on heap, management in TLS).
Allow clients to request memory from it in FILO manner (stack-like). (this mimics how it works in C VLAs; and it is efficient, as each request/return is just an integer addition/subtraction).
Get your VLA storage from it.
Wrap it pretty, so you can say stack_array<T> x(1024);, and have that stack_array deal with construction/destruction (note that ->~T() where T is int is a legal noop, and construction can similarly be a noop), or make stack_array<T> wrap a std::vector<T, TLS_stack_allocator>.
Data will be not as local as the C VLA data is because it will be effectively on a separate stack. You can use SBO (small buffer optimization), which is when locality really matters.
A SBO stack_array<T> can be implemented with an allocator and a std vector unioned with a std array, or with a unique ptr and custom destroyer, or a myriad of other ways. You can probably retrofit your solution, replacing your new/malloc/free/delete with calls to the above TLS storage.
I say go with TLS as that removes need for synchronization overhead while allowing multi-threaded use, and mirrors the fact that the stack itself is implicitly TLS.
Stack-buffer based STL allocator? is a SO Q&A with at least two "stack" allocators in the answers. They will need some adaption to automatically get their buffer from TLS.
Note that the TLS being one large buffer is in a sense an implementation detail. You could do large allocations, and when you run out of space do another large allocation. You just need to keep track each "stack page" current capacity and a list of stack pages, so when you empty one you can move onto an earlier one. That lets you be a bit more conservative in your TLS initial allocation without worrying about running OOM; the important part is that you are FILO and allocate rarely, not that the entire FILO buffer is one contiguous one.
I think you have already enumerated most options in your question and the comments.
Use std::vector. This is the most obvious, most hassle-free but maybe also the slowest solution.
Use platform-specific extensions on those platforms that provide them. For example, GCC supports variable-length arrays in C++ as an extension. POSIX specifies alloca which is widely supported to allocate memory on the stack. Even Microsoft Windows provides _malloca, as a quick web search told me.
In order to avoid maintenance nightmares, you'll really want to encapsulate these platform dependencies into an abstract interface that automatically and transparently chooses the appropriate mechanism for the current platform. Implementing this for all platforms will be a bit of work but if this single feature accounts for 3 × speed differences as you're reporting, it might be worth it. As a fallback for unknown platforms, I'd keep std::vector in reserve as a last resort. It is better to run slow but correctly than to behave erratic or not run at all.
Build your own variable-sized array type that implements a “small array” optimization embedded as a buffer inside the object itself as you have shown in your question. I'll just note that I'd rather try using a union of a std::array and a std::vector instead of rolling my own container.
Once you have a custom type in place, you can do interesting profiling such as maintaining a global hash table of all occurrences of this type (by source-code location) and recording each allocation size during a stress test of your program. You can then dump the hash table at program exit and plot the distributions in allocation sizes for the individual arrays. This might help you to fine-tune the amount of storage to reserve for each array individually on the stack.
Use a std::vector with a custom allocator. At program startup, allocate a few megabytes of memory and give it to a simple stack allocator. For a stack allocator, allocation is just comparing and adding two integers and deallocation is simply a subtraction. I doubt that the compiler-generated stack allocation can be much faster. Your “array stack” would then pulsate correlated to your “program stack”. This design would also have the advantage that accidental buffer overruns – while still invoking undefined behavior, trashing random data and all that bad stuff – wouldn't as easily corrupt the program stack (return addresses) as they would with native VLAs.
Custom allocators in C++ are a somewhat dirty business but some people do report they're using them successfully. (I don't have much experience with using them myself.) You might want to start looking at cppreference. Alisdair Meredith who is one of those people that promote the usage of custom allocators gave a double-session talk at CppCon'14 titled “Making Allocators Work” (part 1, part 2) that you might find interesting as well. If the std::allocator interface it too awkward to use for you, implementing your own variable (as opposed to dynamically) sized array class with your own allocator should be doable as well.
Regarding support for MSVC:
MSVC has _alloca which allocates stack space. It also has _malloca which allocates stack space if there is enough free stack space, otherwise falls back to dynamic allocation.
You cannot take advantage of the VLA type system, so you would have to change your code to work based in a pointer to first element of such an array.
You may end up needing to use a macro which has different definitions depending on the platform. E.g. invoke _alloca or _malloca on MSVC, and on g++ or other compilers, either calls alloca (if they support it), or makes a VLA and a pointer.
Consider investigating ways to rewrite the code without needing to allocate an unknown amount of stack. One option is to allocate a fixed-size buffer that is the maximum you will need. (If that would cause stack overflow it means your code is bugged anyway).
The book C++ Primer, 5th edition by Stanley B. Lippman (ISBN 0-321-71411-3/978-0-321-71411-4) mentions:
An [std::]array is a safer, easier-to-use alternative to built-in arrays.
What's wrong with built-in arrays?
A built-in array is a contiguous block of bytes, usually on the stack. You really have no decent way to keep useful information about the array, its boundaries or its state. std::array keeps this information.
Built-in arrays are decayed into pointers when passed from/to functions. This may cause:
When passing a built-in array, you pass a raw pointer. A pointer doesn't keep any information about the size of the array. You will have to pass along the size of the array and thus uglify the code. std::array can be passed as reference, copy or move.
There is no way of returning a built-in array, you will eventually return a pointer to local variable if the array was declared in that function scope.
std::array can be returned safely, because it's an object and its lifetime is managed automatically.
You can't really do useful stuff on built-in arrays such as assigning, moving or copying them. You'll end writing a customized function for each built-in array (possibly using templates). std::array can be assigned.
By accessing an element which is out of the array boundaries, you are triggering undefined behaviour. std::array::at will preform boundary checking and throw a regular C++ exception if the check fails.
Better readability: built in arrays involves pointers arithmetic. std::array implements useful functions like front, back, begin and end to avoid that.
Let's say I want to sort a built-in array, the code could look like:
int arr[7] = {/*...*/};
std::sort(arr, arr+7);
This is not the most robust code ever. By changing 7 to a different number, the code breaks.
With std::array:
std::array<int,7> arr{/*...*/};
std::sort(arr.begin(), arr.end());
The code is much more robust and flexible.
Just to make things clear, built-in arrays can sometimes be easier. For example, many Windows as well as UNIX API functions/syscalls require some (small) buffers to fill with data. I wouldn't go with the overhead of std::array instead of a simple char[MAX_PATH] that I may be using.
It's hard to gauge what the author meant, but I would guess they are referring to the following facts about native arrays:
they are raw
There is no .at member function you can use for element access with bounds checking, though I'd counter that you usually don't want that anyway. Either you're accessing an element you know exists, or you're iterating (which you can do equally well with std::array and native arrays); if you don't know the element exists, a bounds-checking accessor is already a pretty poor way to ascertain that, as it is using the wrong tool for code flow and it comes with a substantial performance penalty.
they can be confusing
Newbies tend to forget about array name decay, passing arrays into functions "by value" then performing sizeof on the ensuing pointer; this is not generally "unsafe", but it will create bugs.
they can't be assigned
Again, not inherently unsafe, but it leads to silly people writing silly code with multiple levels of pointers and lots of dynamic allocation, then losing track of their memory and committing all sorts of UB crimes.
Assuming the author is recommending std::array, that would be because it "fixes" all of the above things, leading to generally better code by default.
But are native arrays somehow inherently "unsafe" by comparison? No, I wouldn't say so.
How is std::array safer and easier-to-use than a built-in array?
It's easy to mess up with built-in arrays, especially for programmers who aren't C++ experts and programmers who sometimes make mistakes. This causes many bugs and security vulnerabilities.
With a std::array a1, you can access an element with bounds checking a.at(i) or without bounds checking a[i]. With a built-in array, it's always your responsibility to diligently avoid out-of-bounds accesses. Otherwise the code can smash some memory that goes unnoticed for a long time and becomes very difficult to debug. Even just reading outside an array's bounds can be exploited for security holes like the Heartbleed bug that divulges private encryption keys.
C++ tutorials may pretend that array-of-T and pointer-to-T are the same thing, then later tell you about various exceptions where they are not the same thing. E.g. an array-of-T in a struct is embedded in the struct, while a pointer-to-T in a struct is a pointer to memory that you'd better allocate. Or consider an array of arrays (such as a raster image). Does auto-increment the pointer to the next pixel or the next row? Or consider an array of objects where an object pointer coerces to its base class pointer. All this is complicated and the compiler doesn't catch mistakes.
With a std::array a1, you can get its size a1.size(), compare its contents to another std::array a1 == a2, and use other standard container methods like a1.swap(a2). With built-in arrays, these operations take more programming work and are easier to mess up. E.g. given int b1[] = {10, 20, 30}; to get its size without hard-coding 3, you must do sizeof(b1) / sizeof(b1[0]). To compare its contents, you must loop over those elements.
You can pass a std::array to a function by reference f(&a1) or by value f(a1) [i.e. by copy]. Passing a built-in array only goes by reference and confounds it with a pointer to the first element. That's not the same thing. The compiler doesn't pass the array size.
You can return a std::array from a function by value, return a1. Returning a built-in array return b1 returns a dangling pointer, which is broken.
You can copy a std::array in the usual way, a1 = a2, even if it contains objects with constructors. If you try that with built-in arrays, b1 = b2, it'll just copy the array pointer (or fail to compile, depending on how b2 is declared). You can get around that using memcpy(b1, b2, sizeof(b1) / sizeof(b1[0])), but this is broken if the arrays have different sizes or if they contain elements with constructors.
You can easily change code that uses std::array to use another container like std::vector or std::map.
See the C++ FAQ Why should I use container classes rather than simple arrays? to learn more, e.g. the perils of built-in arrays containing C++ objects with destructors (like std::string) or inheritance.
Don't Freak Out About Performance
Bounds-checking access a1.at(i) requires a few more instructions each time you fetch or store an array element. In some inner loop code that jams through a large array (e.g. an image processing routine that you call on every video frame), this cost might add up enough to matter. In that rare case it makes sense to use unchecked access a[i] and carefully ensure that the loop code takes care with bounds.
In most code you're either offloading the image processing code to the GPU, or the bounds-checking cost is a tiny fraction of the overall run time, or the overall run time is not at issue. Meanwhile the risk of array access bugs is high, starting with the hours it takes you to debug it.
The only benefit of a built-in array would be slightly more concise declaration syntax. But the functional benefits of std::array blow that out of the water.
I would also add that it really doesn't matter that much. If you have to support older compilers, then you don't have a choice, of course, since std::array is only for C++11. Otherwise, you can use whichever you like, but unless you make only trivial use of the array, you should prefer std::array just to keep things in line with other STL containers (e.g., what if you later decide to make the size dynamic, and use std::vector instead, then you will be happy that you used std::array because all you will have to change is probably the array declaration itself, and the rest will be the same, especially if you use auto and other type-inference features of C++11.
std::array is a template class that encapsulate a statically-sized array, stored inside the object itself, which means that, if you instantiate the class on the stack, the array itself will be on the stack. Its size has to be known at compile time (it's passed as a template parameter), and it cannot grow or shrink.
Arrays are used to store a sequence of objects
Check the tutorial: http://www.cplusplus.com/doc/tutorial/arrays/
A std::vector does the same but it's better than built-in arrays (e.g: in general, vector hasn't much efficiency difference than built
in arrays when accessing elements via operator[]): http://www.cplusplus.com/reference/stl/vector/
The built-in arrays are a major source of errors – especially when they are used to build multidimensional arrays.
For novices, they are also a major source of confusion. Wherever possible, use vector, list, valarray, string, etc.
STL containers don't have the same problems as built in arrays
So, there is no reason in C++ to persist in using built-in arrays. Built-in arrays are in C++ mainly for backwards compatibility with C.
If the OP really wants an array, C++11 provides a wrapper for the built-in array, std::array. Using std::array is very similar to using the built-in array has no effect on their run-time performance, with much more features.
Unlike with the other containers in the Standard Library, swapping two array containers is a linear operation that involves swapping all the elements in the ranges individually, which generally is a considerably less efficient operation. On the other side, this allows the iterators to elements in both containers to keep their original container association.
Another unique feature of array containers is that they can be treated as tuple objects: The header overloads the get function to access the elements of the array as if it was a tuple, as well as specialized tuple_size and tuple_element types.
Anyway, built-in arrays are all ways passed by reference. The reason for this is when you pass an array to a function as a argument, pointer to it's first element is passed.
when you say void f(T[] array) compiler will turn it into void f(T* array)
When it comes to strings. C-style strings (i.e. null terminated character sequences) are all ways passed by reference since they are 'char' arrays too.
STL strings are not passed by reference by default. They act like normal variables.
There are no predefined rules for making parameter pass by reference. Even though the arrays are always passed by reference automatically.
vector<vector<double>> G1=connectivity( current_combination,M,q2+1,P );
vector<vector<double>> G2=connectivity( circshift_1_dexia(current_combination),M,q1+1,P );
This could also be copying vectors since connectivity returns a vector by value. In some cases, the compiler will optimize this out. To avoid this for sure though, you can pass the vector as non-const reference to connectivity rather than returning them. The return value of maxweight is a 3-dimensional vector returned by value (which may make a copy of it).
Vectors are only efficient for insert or erase at the end, and it is best to call reserve() if you are going to push_back a lot of values. You may be able to re-write it using list if you don't really need random access; with list you lose the subscript operator, but you can still make linear passes through, and save iterators to elements, rather than subscripts.
With some compilers, it can be faster to use pre-increment, rather than post-increment. Prefer ++i to i++ unless you actually need to use the post-increment. They are not the same.
Anyway, vector is going to be horribly slow if you are not compiling with optimization on. With optimization, it is close to built-in arrays. Built-in arrays can be quite slow without optimization on also, but not as bad as vector.
std::array has the at member function which is safe. It also have begin, end, size which you can use to make your code safer.
Raw arrays don't have that. (In particular, when raw arrays are decayed to pointers -e.g. when passed as arguments-, you lose any size information, which is kept in the std::array type since it is a template with the size as argument)
And a good optimizing C++11 compiler will handle std::array (or references to them) as efficiently as raw arrays.
Built-in arrays are not inherently unsafe - if used correctly. But it is easier to use built-in arrays incorrectly than it is to use alternatives, such as std::array, incorrectly and these alternatives usually offer better debugging features to help you detect when they have been used incorrectly.
Built-in arrays are subtle. There are lots of aspects that behave unexpectedly, even to experienced programmers.
std::array<T, N> is truly a wrapper around a T[N], but many of the already mentioned aspects are sorted out for free essentially, which is very optimal and something you want to have.
These are some I haven't read:
Size: N shall be a constant expression, it cannot be variable, for both. However, with built-in arrays, there's VLA(Variable Length Array) that allows that as well.
Officially, only C99 supports them. Still, many compilers allow that in previous versions of C and C++ as extensions. Therefore, you may have
int n; std::cin >> n;
int array[n]; // ill-formed but often accepted
that compiles fine. Had you used std::array, this could never work because N is required and checked to be an actual constant expression!
Expectations for arrays: A common flawed expectation is that the size of the array is carried along with the array itself when the array isn't really an array anymore, due to the kept misleading C syntax as in:
void foo(int array[])
{
// iterate over the elements
for (int i = 0; i < sizeof(array); ++i)
array[i] = 0;
}
but this is wrong because array has already decayed to a pointer, which has no information about the size of the pointed area. That misconception triggers undefined behavior if array has less than sizeof(int*) elements, typically 8, apart from being logically erroneous.
Crazy uses: Even further on that, there are some quirks arrays have:
Whether you have array[i] or i[array], there is no difference. This is not true for array, because calling an overloaded operator is effectively a function call and the order of the parameters matters.
Zero-sized arrays: N shall be greater than zero but it is still allowed as an extensions and, as before, often not warned about unless more pedantry is required. Further information here.
array has different semantics:
There is a special case for a zero-length array (N == 0). In that
case, array.begin() == array.end(), which is some unique value. The
effect of calling front() or back() on a zero-sized array is
undefined.
Between std::vector and std::array in TR1 and C++11, there are safe alternatives for both dynamic and fixed-size arrays which know their own length and don't exhibit horrible pointer/array duality.
So my question is, are there any circumstances in C++ when C arrays must be used (other than calling C library code), or is it reasonable to "ban" them altogether?
EDIT:
Thanks for the responses everybody, but it turns out this question is a duplicate of
Now that we have std::array what uses are left for C-style arrays?
so I'll direct everybody to look there instead.
[I'm not sure how to close my own question, but if a moderator (or a few more people with votes) wander past, please feel free to mark this as a dup and delete this sentence.]
I didnt want to answer this at first, but Im already getting worried that this question is going to be swamped with C programmers, or people who write C++ as object oriented C.
The real answer is that in idiomatic C++ there is almost never ever a reason to use a C style array. Even when using a C style code base, I usually use vectors. How is that possible, you say? Well, if you have a vector v and a C style function requires a pointer to be passed in, you can pass &v[0] (or better yet, v.data() which is the same thing).
Even for performance, its very rare that you can make a case for a C style array. A std::vector does involve a double indirection but I believe this is generally optimized away. If you dont trust the compiler (which is almost always a terrible move), then you can always use the same technique as above with v.data() to grab a pointer for your tight loop. For std::array, I believe the wrapper is even thinner.
You should only use one if you are an awesome programmer and you know exactly why you are doing it, or if an awesome programmer looks at your problem and tells you to. If you arent awesome and you are using C style arrays, the chances are high (but not 100%) that you are making a mistake,
Foo data[] = {
is a pretty common pattern. Elements can be added to it easily, and the size of the data array grows based on the elements added.
With C++11 you can replicate this with a std::array:
template<class T, class... Args>
auto make_array( Args&&... args )
-> std::array< T, sizeof...(Args) >
{
return { std::forward<Args>(args)... };
}
but even this isn't as good as one might like, as it does not support nested brackets like a C array does.
Suppose Foo was struct Foo { int x; double y; };. Then with C style arrays we can:
Foo arr[] = {
{1,2.2},
{3,4.5},
};
meanwhile
auto arr = make_array<Foo>(
{1,2.2},
{3,4.5}
};
does not compile. You'd have to repeat Foo for each line:
auto arr = make_array<Foo>(
Foo{1,2.2},
Foo{3,4.5}
};
which is copy-paste noise that can get in the way of the code being expressive.
Finally, note that "hello" is a const array of size 6. Code needs to know how to consume C-style arrays.
My typical response to this situation is to convert C-style arrays and C++ std::arrays into array_views, a range that consists of two pointers, and operate on them. This means I do not care if I was fed an array based on C or C++ syntax: I just care I was fed a packed sequence of data elements. These can also consume std::dynarrays and std::vectors with little work.
It did require writing an array_view, or stealing one from boost, or waiting for it to be added to the standard.
Sometimes an exsisting code base can force you to use them
The last time I needed to use them in new code was when I was doing embedded work and the standard library just didn't have an implementation of std::vector or std::array. In some older code bases you have to use arrays because of design decisions made by the previous developers.
In most cases if you are starting a new project with C++11 the old C style arrays are a fairly poor choice. This is because relative to std::array they are difficult to get correct and this difficulty is a direct expense when developing. This C++ FAQ entry sums up my thoughts on the matter fairly well: http://www.parashift.com/c++-faq/arrays-are-evil.html
Pre-C++14: In some (rare) cases, the missing initialization of types like int can improve the execution speed notably. Especially if some algorithm needs many short-lived arrays during his execution and the machine has not enough memory for pre-allocating making sense and/or the sizes could not be known first
C-style arrays are very useful in embedded system where memory is constrained (and severely limited).
The arrays allow for programming without dynamic memory allocation. Dynamic memory allocation generates fragmented memory and at some point in run-time, the memory has to be defragmented. In safety critical systems, defragmentation cannot occur during the periods that have critical timing.
The const arrays allow for data to be put into Read Only Memory or Flash memory, out of the precious RAM area. The data can be directly accessed and does not require any additional initialization time, as with std::vector or std::array.
The C-style array is a convenient tool to place raw data into a program. For example, bitmap data for images or fonts. In smaller embedded systems with no hard drives or flash drives, the data must directly accessed. C-style arrays allow for this.
Edit 1:
Also, std::array cannot be used with compiler that don't support C++11 or afterwards.
Many companies do not want to switch compilers once a project has started. Also, they may need to keep the compiler version around for maintenance fixes, and when Agencies require the company to reproduce an issue with a specified software version of the product.
I found just one reason today : when you want to know preciselly the size of the data block and control it for aligning in a giant data block .
This is usefull when your are dealing with stream processors or Streaming extensions like AVX or SSE.
Control the data block allocation to a huge single aligned block in memory is usefull. Your objects can manipulate the segments they are responsible and, when they finished , you can move and/or process the huge vector in an aligned way .
In a comparison on performance between Java and C++ our professor claimed, that theres no benefit in choosing a native array (dynamic array like int * a = new int[NUM] ) over std::vector. She claimed too, that the optimized vector can be faster, but didn't tell us why. What is the magic behind the vector?
PLEASE just answers on performance, this is not a general poll!
Any super optimized code with lower level stuff like raw arrays can beat or tie the performance of an std::vector. But the benefits of having a vector far outweigh any small performance gains you get from low level code.
vector manages it's own memory, array you have to remember to manage the memory
vector allows you to make use of stdlib more easily (with dynamic arrays you have to keep track of the end ptr yourself) which is all very well written well tested code
sometimes vector can even be faster, like qsort v std::sort, read this article, keep in mind the reason this can happen is because writing higher level code can often allow you to reason better about the problem at hand.
Since the std containers are good to use for everything else, that dynamic arrays don't cover, keeping code consistent in style makes it more readable and less prone to errors.
One thing I would keep in mind is that you shouldn't compare a dynamic array to a std::vector they are two different things. It should be compared to something like std::dynarray which unfortunately probably won't make it into c++14 (boost prolly has one, and I'm sure there are reference implementations lying around). A std::dynarray implementation is unlikely to have any performance difference from a native array.
There is benefit of using vector instead of array when the number of elements needs to be changed.
Neither optimized vector can be faster than array.
The only performance optimization a std::vector can offer over a plain array is, that it can request more memory than currently needed. std::vector::reserve does just that. Adding elements up to its capacity() will not involve any more allocations.
This can be implemented with plain arrays as well, though.
I know that manual dynamic memory allocation is a bad idea in general, but is it sometimes a better solution than using, say, std::vector?
To give a crude example, if I had to store an array of n integers, where n <= 16, say. I could implement it using
int* data = new int[n]; //assuming n is set beforehand
or using a vector:
std::vector<int> data;
Is it absolutely always a better idea to use a std::vector or could there be practical situations where manually allocating the dynamic memory would be a better idea, to increase efficiency?
It is always better to use std::vector/std::array, at least until you can conclusively prove (through profiling) that the T* a = new T[100]; solution is considerably faster in your specific situation. This is unlikely to happen: vector/array is an extremely thin layer around a plain old array. There is some overhead to bounds checking with vector::at, but you can circumvent that by using operator[].
I can't think of any case where dynamically allocating a C style
vector makes sense. (I've been working in C++ for over 25
years, and I've yet to use new[].) Usually, if I know the
size up front, I'll use something like:
std::vector<int> data( n );
to get an already sized vector, rather than using push_back.
Of course, if n is very small and is known at compile time,
I'll use std::array (if I have access to C++11), or even
a C style array, and just create the object on the stack, with
no dynamic allocation. (Such cases seem to be rare in the
code I work on; small fixed size arrays tend to be members of
classes. Where I do occasionally use a C style array.)
If you know the size in advance (especially at compile time), and don't need the dynamic re-sizing abilities of std::vector, then using something simpler is fine.
However, that something should preferably be std::array if you have C++11, or something like boost::scoped_array otherwise.
I doubt there'll be much efficiency gain unless it significantly reduces code size or something, but it's more expressive which is worthwhile anyway.
You should try to avoid C-style-arrays in C++ whenever possible. The STL provides containers which usually suffice for every need. Just imagine reallocation for an array or deleting elements out of its middle. The container shields you from handling this, while you would have to take care of it for yourself, and if you haven't done this a hundred times it is quite error-prone.
An exception is of course, if you are adressing low-level-issues which might not be able to cope with STL-containers.
There have already been some discussion about this topic. See here on SO.
Is it absolutely always a better idea to use a std::vector or could there be practical situations where manually allocating the dynamic memory would be a better idea, to increase efficiency?
Call me a simpleton, but 99.9999...% of the times I would just use a standard container. The default choice should be std::vector, but also std::deque<> could be a reasonable option sometimes. If the size is known at compile-time, opt for std::array<>, which is a lightweight, safe wrapper of C-style arrays which introduces zero overhead.
Standard containers expose member functions to specify the initial reserved amount of memory, so you won't have troubles with reallocations, and you won't have to remember delete[]ing your array. I honestly do not see why one should use manual memory management.
Efficiency shouldn't be an issue, since you have throwing and non-throwing member functions to access the contained elements, so you have a choice whether to favor safety or performance.
std::vector could be constructed with an size_type parameter that instantiate the vector with the specified number of elements and that does a single dynamic allocation (same as your array) and also you can use reserve to decrease the number of re-allocations over the usage time.
In n is known at compile-time, then you should choose std::array as:
std::array<int, n> data; //n is compile-time constant
and if n is not known at compile-time, OR the array might grow at runtime, then go for std::vector:
std::vector<int> data(n); //n may be known at runtime
Or in some cases, you may also prefer std::deque which is faster than std::vector in some scenario. See these:
C++ benchmark – std::vector VS std::list VS std::deque
Using Vector and Deque by Herb Sutter
Hope that helps.
From a perspective of someone who often works with low level code with C++, std vectors are really just helper methods with a safety net for a classic C style array. The only overheads you'd experience realistically are memory allocations and safety checks for boundaries. If you're writing a program which needs performance and are going to be using vectors as a regular array I'd recommend to just use C style arrays instead of vectors. You should realistically be vetting the data that comes into the application and check the boundaries yourself to avoid checks on every memory access to the array.
It's good to see that others are checking the differences of the C ways and the C++ ways. More often than not C++ standard methods have significantly worse performance and uglier syntax than their C counterparts and is generally the reason people call C++ bloated. I think C++ focuses more on safety and making the language more like JavaScript/C# even though the language fundamentally lacks the foundation to be one.