C++ inherited arrays from C where they are used virtually everywhere. C++ provides abstractions that are easier to use and less error-prone (std::vector<T> since C++98 and std::array<T, n> since C++11), so the need for arrays does not arise quite as often as it does in C. However, when you read legacy code or interact with a library written in C, you should have a firm grasp on how arrays work.
This FAQ is split into five parts:
arrays on the type level and accessing elements
array creation and initialization
assignment and parameter passing
multidimensional arrays and arrays of pointers
common pitfalls when using arrays
If you feel something important is missing in this FAQ, write an answer and link it here as an additional part.
In the following text, "array" means "C array", not the class template std::array. Basic knowledge of the C declarator syntax is assumed. Note that the manual usage of new and delete as demonstrated below is extremely dangerous in the face of exceptions, but that is the topic of another FAQ.
(Note: This is meant to be an entry to Stack Overflow's C++ FAQ. If you want to critique the idea of providing an FAQ in this form, then the posting on meta that started all this would be the place to do that. Answers to that question are monitored in the C++ chatroom, where the FAQ idea started out in the first place, so your answer is very likely to get read by those who came up with the idea.)
Arrays on the type level
An array type is denoted as T[n] where T is the element type and n is a positive size, the number of elements in the array. The array type is a product type of the element type and the size. If one or both of those ingredients differ, you get a distinct type:
#include <type_traits>
static_assert(!std::is_same<int[8], float[8]>::value, "distinct element type");
static_assert(!std::is_same<int[8], int[9]>::value, "distinct size");
Note that the size is part of the type, that is, array types of different size are incompatible types that have absolutely nothing to do with each other. sizeof(T[n]) is equivalent to n * sizeof(T).
Array-to-pointer decay
The only "connection" between T[n] and T[m] is that both types can implicitly be converted to T*, and the result of this conversion is a pointer to the first element of the array. That is, anywhere a T* is required, you can provide a T[n], and the compiler will silently provide that pointer:
+---+---+---+---+---+---+---+---+
the_actual_array: | | | | | | | | | int[8]
+---+---+---+---+---+---+---+---+
^
|
|
|
| pointer_to_the_first_element int*
This conversion is known as "array-to-pointer decay", and it is a major source of confusion. The size of the array is lost in this process, since it is no longer part of the type (T*). Pro: Forgetting the size of an array on the type level allows a pointer to point to the first element of an array of any size. Con: Given a pointer to the first (or any other) element of an array, there is no way to detect how large that array is or where exactly the pointer points to relative to the bounds of the array. Pointers are extremely stupid.
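A minimal sketch of the decay described above. The helper first_value and the two local arrays are made up for illustration. Arrays of different sizes are distinct types, yet both can be handed to a function that expects an int*:

```cpp
#include <cassert>

// Hypothetical helper: it accepts a pointer, so any int array decays at the call site.
int first_value(const int* p)
{
    return *p;
}

// Returns true if both arrays decayed to pointers to their first elements.
bool decay_demo()
{
    int two[2] = {10, 20};
    int eight[8] = {30, 31, 32, 33, 34, 35, 36, 37};

    // The array types still carry their sizes...
    static_assert(sizeof(two) == 2 * sizeof(int), "int[2]");
    static_assert(sizeof(eight) == 8 * sizeof(int), "int[8]");

    // ...but after decay, both are plain pointers without any size information.
    const int* p = two;
    const int* q = eight;
    assert(p == &two[0] && q == &eight[0]);

    return first_value(two) == 10 && first_value(eight) == 30;
}
```

Inside first_value there is no way to tell whether the argument came from the 2-element or the 8-element array.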
Arrays are not pointers
The compiler will silently generate a pointer to the first element of an array whenever it is deemed useful, that is, whenever an operation would fail on an array but succeed on a pointer. This conversion from array to pointer is trivial, since the resulting pointer value is simply the address of the array. Note that the pointer is not stored as part of the array itself (or anywhere else in memory). An array is not a pointer.
static_assert(!std::is_same<int[8], int*>::value, "an array is not a pointer");
One important context in which an array does not decay into a pointer to its first element is when the & operator is applied to it. In that case, the & operator yields a pointer to the entire array, not just a pointer to its first element. Although in that case the values (the addresses) are the same, a pointer to the first element of an array and a pointer to the entire array are completely distinct types:
static_assert(!std::is_same<int*, int(*)[8]>::value, "distinct element type");
The following ASCII art explains this distinction:
+-----------------------------------+
| +---+---+---+---+---+---+---+---+ |
+---> | | | | | | | | | | | int[8]
| | +---+---+---+---+---+---+---+---+ |
| +---^-------------------------------+
| |
| |
| |
| | pointer_to_the_first_element int*
|
| pointer_to_the_entire_array int(*)[8]
Note how the pointer to the first element only points to a single integer (depicted as a small box), whereas the pointer to the entire array points to an array of 8 integers (depicted as a large box).
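One way to make the difference between the two pointer types observable is to increment them and measure how far they move. The following is only a sketch with hypothetical function names; it assumes nothing beyond the int[8] array from the picture:

```cpp
#include <cassert>
#include <cstddef>

// Bytes advanced by incrementing a pointer to the first element.
std::ptrdiff_t element_pointer_step()
{
    int a[8] = {};
    int* first = a;                       // decay: points to a[0]
    return reinterpret_cast<char*>(first + 1) - reinterpret_cast<char*>(first);
}

// Bytes advanced by incrementing a pointer to the entire array.
std::ptrdiff_t array_pointer_step()
{
    int a[8] = {};
    int* first = a;
    int (*whole)[8] = &a;                 // no decay: points to the whole int[8]

    // Same address, different types.
    assert(static_cast<void*>(first) == static_cast<void*>(whole));

    return reinterpret_cast<char*>(whole + 1) - reinterpret_cast<char*>(whole);
}
```

The first function yields sizeof(int), the second yields 8 * sizeof(int), because pointer arithmetic always moves in units of the pointed-to type.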
The same situation arises in classes and is maybe more obvious. A pointer to an object and a pointer to its first data member have the same value (the same address), yet they are completely distinct types.
If you are unfamiliar with the C declarator syntax, note that the parentheses in the type int(*)[8] are essential:
int(*)[8] is a pointer to an array of 8 integers.
int*[8] is an array of 8 pointers, each element of type int*.
Accessing elements
C++ provides two syntactic variations to access individual elements of an array.
Neither of them is superior to the other, and you should familiarize yourself with both.
Pointer arithmetic
Given a pointer p to the first element of an array, the expression p+i yields a pointer to the i-th element of the array. By dereferencing that pointer afterwards, one can access individual elements:
std::cout << *(x+3) << ", " << *(x+7) << std::endl;
If x denotes an array, then array-to-pointer decay will kick in, because adding an array and an integer is meaningless (there is no plus operation on arrays), but adding a pointer and an integer makes sense:
+---+---+---+---+---+---+---+---+
x: | | | | | | | | | int[8]
+---+---+---+---+---+---+---+---+
^ ^ ^
| | |
| | |
| | |
x+0 | x+3 | x+7 | int*
(Note that the implicitly generated pointer has no name, so I wrote x+0 in order to identify it.)
If, on the other hand, x denotes a pointer to the first (or any other) element of an array, then array-to-pointer decay is not necessary, because the pointer on which i is going to be added already exists:
+---+---+---+---+---+---+---+---+
| | | | | | | | | int[8]
+---+---+---+---+---+---+---+---+
^ ^ ^
| | |
| | |
+-|-+ | |
x: | | | x+3 | x+7 | int*
+---+
Note that in the depicted case, x is a pointer variable (discernible by the small box next to x), but it could just as well be the result of a function returning a pointer (or any other expression of type T*).
Indexing operator
Since the syntax *(x+i) is a bit clumsy, C++ provides the alternative syntax x[i]:
std::cout << x[3] << ", " << x[7] << std::endl;
Because addition is commutative, the following code does exactly the same:
std::cout << 3[x] << ", " << 7[x] << std::endl;
The definition of the indexing operator leads to the following interesting equivalence:
&x[i] == &*(x+i) == x+i
However, &x[0] is generally not equivalent to x. The former is a pointer, the latter an array. Only when the context triggers array-to-pointer decay can x and &x[0] be used interchangeably. For example:
T* p = &array[0]; // rewritten as &*(array+0), decay happens due to the addition
T* q = array; // decay happens due to the assignment
On the first line, the compiler detects an assignment from a pointer to a pointer, which trivially succeeds. On the second line, it detects an assignment from an array to a pointer. Since this is meaningless (but pointer to pointer assignment makes sense), array-to-pointer decay kicks in as usual.
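The equivalences from this section can be checked directly. This is a small sketch with hypothetical function names and sample data, not part of any library:

```cpp
#include <cassert>

// Checks the equivalences for a pointer x into an array and a valid index i.
bool indexing_equivalences_hold(const int* x, int i)
{
    return x[i] == *(x + i)      // indexing is defined via pointer arithmetic
        && i[x] == x[i]          // i[x] means *(i + x)
        && &x[i] == x + i;       // taking the address undoes the dereference
}

// Shows that &array[0] and the decayed array yield the same pointer.
bool decay_contexts_agree()
{
    int array[8] = {2, 3, 5, 7, 11, 13, 17, 19};
    int* p = &array[0];
    int* q = array;              // decay happens here
    assert(p == q);
    return indexing_equivalences_hold(array, 3);
}
```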
Ranges
An array of type T[n] has n elements, indexed from 0 to n-1; there is no element n. And yet, to support half-open ranges (where the beginning is inclusive and the end is exclusive), C++ allows the computation of a pointer to the (non-existent) n-th element, but it is illegal to dereference that pointer:
+---+---+---+---+---+---+---+---+....
x: | | | | | | | | | . int[8]
+---+---+---+---+---+---+---+---+....
^ ^
| |
| |
| |
x+0 | x+8 | int*
For example, if you want to sort an array, both of the following would work equally well:
std::sort(x + 0, x + n);
std::sort(&x[0], &x[0] + n);
Note that it is illegal to provide &x[n] as the second argument since this is equivalent to &*(x+n), and the sub-expression *(x+n) technically invokes undefined behavior in C++ (but not in C99).
Also note that you could simply provide x as the first argument. That is a little too terse for my taste, and it also makes template argument deduction a bit harder for the compiler, because in that case the first argument is an array but the second argument is a pointer. (Again, array-to-pointer decay kicks in.)
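Here is a compact, self-contained version of the half-open range idea. The function names and the sample data are assumptions made for this sketch:

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>

// Sorts the half-open range [x, x + n); x + n is computed but never dereferenced.
void sort_first_n(int* x, std::size_t n)
{
    std::sort(x, x + n);
}

// Sorts some sample data and returns the smallest element.
int smallest_after_sort()
{
    int x[8] = {19, 2, 17, 3, 13, 5, 11, 7};
    sort_first_n(x, 8);
    assert(std::is_sorted(x + 0, x + 8));
    return x[0];
}
```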
Programmers often confuse multidimensional arrays with arrays of pointers.
Multidimensional arrays
Most programmers are familiar with named multidimensional arrays, but many are unaware of the fact that multidimensional arrays can also be created anonymously. Multidimensional arrays are often referred to as "arrays of arrays" or "true multidimensional arrays".
Named multidimensional arrays
When using named multidimensional arrays, all dimensions must be known at compile time:
int H = read_int();
int W = read_int();
int connect_four[6][7]; // okay
int connect_four[H][7]; // ISO C++ forbids variable length array
int connect_four[6][W]; // ISO C++ forbids variable length array
int connect_four[H][W]; // ISO C++ forbids variable length array
This is what a named multidimensional array looks like in memory:
+---+---+---+---+---+---+---+
connect_four: | | | | | | | |
+---+---+---+---+---+---+---+
| | | | | | | |
+---+---+---+---+---+---+---+
| | | | | | | |
+---+---+---+---+---+---+---+
| | | | | | | |
+---+---+---+---+---+---+---+
| | | | | | | |
+---+---+---+---+---+---+---+
| | | | | | | |
+---+---+---+---+---+---+---+
Note that 2D grids such as the above are merely helpful visualizations. From the point of view of C++, memory is a "flat" sequence of bytes. The elements of a multidimensional array are stored in row-major order. That is, connect_four[0][6] and connect_four[1][0] are neighbors in memory. In fact, connect_four[0][7] and connect_four[1][0] denote the same element! This means that you can take multi-dimensional arrays and treat them as large, one-dimensional arrays:
int* p = &connect_four[0][0];
int* q = p + 42;
some_int_sequence_algorithm(p, q);
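The row-major layout can be verified by measuring byte offsets inside the array. This sketch uses a hypothetical helper and a 6x7 array like connect_four; it only inspects addresses and never reads elements:

```cpp
#include <cassert>
#include <cstddef>

// Byte offset of board[row][col] from the start of a 6x7 array.
std::size_t byte_offset(std::size_t row, std::size_t col)
{
    static int board[6][7];
    assert(row < 6 && col < 7);
    const char* base = reinterpret_cast<const char*>(&board[0][0]);
    const char* cell = reinterpret_cast<const char*>(&board[row][col]);
    return static_cast<std::size_t>(cell - base);
}
```

The last element of row 0 sits exactly one int before the first element of row 1, and the last element overall sits 41 ints after the first.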
Anonymous multidimensional arrays
With anonymous multidimensional arrays, all dimensions except the first must be known at compile time:
int (*p)[7] = new int[6][7]; // okay
int (*p)[7] = new int[H][7]; // okay
int (*p)[W] = new int[6][W]; // ISO C++ forbids variable length array
int (*p)[W] = new int[H][W]; // ISO C++ forbids variable length array
This is what an anonymous multidimensional array looks like in memory:
+---+---+---+---+---+---+---+
+---> | | | | | | | |
| +---+---+---+---+---+---+---+
| | | | | | | | |
| +---+---+---+---+---+---+---+
| | | | | | | | |
| +---+---+---+---+---+---+---+
| | | | | | | | |
| +---+---+---+---+---+---+---+
| | | | | | | | |
| +---+---+---+---+---+---+---+
| | | | | | | | |
| +---+---+---+---+---+---+---+
|
+-|-+
p: | | |
+---+
Note that the array itself is still allocated as a single block in memory.
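As a rough sketch of how such an array might be used, the following hypothetical function allocates a board whose height is only known at runtime, fills it, and releases it again with a single delete[]:

```cpp
#include <cassert>
#include <cstddef>

// Allocates a rows x 7 board, sets every cell to 1, and returns the sum of all cells.
int filled_cells(std::size_t rows)
{
    int (*board)[7] = new int[rows][7];   // rows is a runtime value, 7 is fixed
    assert(sizeof(*board) == 7 * sizeof(int));   // each row is an int[7]

    for (std::size_t r = 0; r < rows; ++r)
        for (std::size_t c = 0; c < 7; ++c)
            board[r][c] = 1;

    int total = 0;
    for (std::size_t r = 0; r < rows; ++r)
        for (std::size_t c = 0; c < 7; ++c)
            total += board[r][c];

    delete[] board;                       // one block, one delete[]
    return total;
}
```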
Arrays of pointers
You can overcome the restriction of fixed width by introducing another level of indirection.
Named arrays of pointers
Here is a named array of five pointers which are initialized with anonymous arrays of different lengths:
int* triangle[5];
for (int i = 0; i < 5; ++i)
{
triangle[i] = new int[5 - i];
}
// ...
for (int i = 0; i < 5; ++i)
{
delete[] triangle[i];
}
And here is what it looks like in memory:
+---+---+---+---+---+
| | | | | |
+---+---+---+---+---+
^
| +---+---+---+---+
| | | | | |
| +---+---+---+---+
| ^
| | +---+---+---+
| | | | | |
| | +---+---+---+
| | ^
| | | +---+---+
| | | | | |
| | | +---+---+
| | | ^
| | | | +---+
| | | | | |
| | | | +---+
| | | | ^
| | | | |
| | | | |
+-|-+-|-+-|-+-|-+-|-+
triangle: | | | | | | | | | | |
+---+---+---+---+---+
Since each line is allocated individually now, viewing 2D arrays as 1D arrays does not work anymore.
Anonymous arrays of pointers
Here is an anonymous array of 5 (or any other number of) pointers which are initialized with anonymous arrays of different lengths:
int n = calculate_five(); // or any other number
int** p = new int*[n];
for (int i = 0; i < n; ++i)
{
p[i] = new int[n - i];
}
// ...
for (int i = 0; i < n; ++i)
{
delete[] p[i];
}
delete[] p; // note the extra delete[] !
And here is what it looks like in memory:
+---+---+---+---+---+
| | | | | |
+---+---+---+---+---+
^
| +---+---+---+---+
| | | | | |
| +---+---+---+---+
| ^
| | +---+---+---+
| | | | | |
| | +---+---+---+
| | ^
| | | +---+---+
| | | | | |
| | | +---+---+
| | | ^
| | | | +---+
| | | | | |
| | | | +---+
| | | | ^
| | | | |
| | | | |
+-|-+-|-+-|-+-|-+-|-+
| | | | | | | | | | |
+---+---+---+---+---+
^
|
|
+-|-+
p: | | |
+---+
Conversions
Array-to-pointer decay naturally extends to arrays of arrays and arrays of pointers:
int array_of_arrays[6][7];
int (*pointer_to_array)[7] = array_of_arrays;
int* array_of_pointers[6];
int** pointer_to_pointer = array_of_pointers;
However, there is no implicit conversion from T[h][w] to T**. If such an implicit conversion did exist, the result would be a pointer to the first element of an array of h pointers to T (each pointing to the first element of a line in the original 2D array), but that pointer array does not exist anywhere in memory yet. If you want such a conversion, you must create and fill the required pointer array manually:
int connect_four[6][7];
int** p = new int*[6];
for (int i = 0; i < 6; ++i)
{
p[i] = connect_four[i];
}
// ...
delete[] p;
Note that this generates a view of the original multidimensional array. If you need a copy instead, you must create extra arrays and copy the data yourself:
int connect_four[6][7];
int** p = new int*[6];
for (int i = 0; i < 6; ++i)
{
p[i] = new int[7];
std::copy(connect_four[i], connect_four[i + 1], p[i]);
}
// ...
for (int i = 0; i < 6; ++i)
{
delete[] p[i];
}
delete[] p;
Assignment
For no particular reason, arrays cannot be assigned to one another. Use std::copy instead:
#include <algorithm>
// ...
int a[8] = {2, 3, 5, 7, 11, 13, 17, 19};
int b[8];
std::copy(a + 0, a + 8, b);
This is more flexible than what true array assignment could provide because it is possible to copy slices of larger arrays into smaller arrays.
std::copy is usually specialized for primitive types to give maximum performance. It is unlikely that std::memcpy performs better. If in doubt, measure.
Although you cannot assign arrays directly, you can assign structs and classes which contain array members. That is because array members are copied memberwise by the assignment operator which is provided as a default by the compiler. If you define the assignment operator manually for your own struct or class types, you must fall back to manual copying for the array members.
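To illustrate the point about array members, here is a minimal sketch. The struct Primes and the function copy_by_assignment are made-up names:

```cpp
#include <cassert>

// A struct that does nothing but wrap an array.
struct Primes
{
    int values[8];
};

// Copies an array by assigning the struct that wraps it.
Primes copy_by_assignment(const Primes& source)
{
    Primes target;
    target = source;   // the implicitly generated assignment copies the array member
    assert(&target.values[0] != &source.values[0]);   // two separate arrays
    return target;
}
```

This is essentially what std::array does: it wraps a C array in a class so that copying and assignment work.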
Parameter passing
Arrays cannot be passed by value. You can either pass them by pointer or by reference.
Pass by pointer
Since arrays themselves cannot be passed by value, usually a pointer to their first element is passed by value instead. This is often called "pass by pointer". Since the size of the array is not retrievable via that pointer, you have to pass a second parameter indicating the size of the array (the classic C solution) or a second pointer pointing after the last element of the array (the C++ iterator solution):
#include <numeric>
#include <cstddef>
int sum(const int* p, std::size_t n)
{
return std::accumulate(p, p + n, 0);
}
int sum(const int* p, const int* q)
{
return std::accumulate(p, q, 0);
}
As a syntactic alternative, you can also declare parameters as T p[], and it means the exact same thing as T* p in the context of parameter lists only:
int sum(const int p[], std::size_t n)
{
return std::accumulate(p, p + n, 0);
}
You can think of the compiler as rewriting T p[] to T *p in the context of parameter lists only. This special rule is partly responsible for the whole confusion about arrays and pointers. In every other context, declaring something as an array or as a pointer makes a huge difference.
Unfortunately, you can also provide a size in an array parameter which is silently ignored by the compiler. That is, the following three signatures are exactly equivalent, as indicated by the compiler errors:
int sum(const int* p, std::size_t n)
// error: redefinition of 'int sum(const int*, size_t)'
int sum(const int p[], std::size_t n)
// error: redefinition of 'int sum(const int*, size_t)'
int sum(const int p[8], std::size_t n) // the 8 has no meaning here
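The rewriting can also be observed from inside the function. In this sketch (hypothetical names), the parameter is declared as an array of 8, yet its type turns out to be const int*, and an array of 3 is accepted without complaint:

```cpp
#include <cassert>
#include <type_traits>

// The parameter is written as an array of 8, but its type is adjusted to const int*.
bool parameter_is_pointer(const int p[8])
{
    return std::is_same<decltype(p), const int*>::value;
}

// Passes an array of only 3 elements; the declared bound of 8 is not checked.
bool ignored_bound_demo()
{
    int three[3] = {1, 2, 3};
    assert(parameter_is_pointer(three));
    return parameter_is_pointer(three);
}
```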
Pass by reference
Arrays can also be passed by reference:
int sum(const int (&a)[8])
{
return std::accumulate(a + 0, a + 8, 0);
}
In this case, the array size is significant. Since writing a function that only accepts arrays of exactly 8 elements is of little use, programmers usually write such functions as templates:
template <std::size_t n>
int sum(const int (&a)[n])
{
return std::accumulate(a + 0, a + n, 0);
}
Note that you can only call such a function template with an actual array of integers, not with a pointer to an integer. The size of the array is automatically inferred, and for every size n, a different function is instantiated from the template. You can also write quite useful function templates that abstract from both the element type and from the size.
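A function template that abstracts from both the element type and the size might look like the following sketch. The names sum_of and sum_demo are chosen for illustration:

```cpp
#include <cassert>
#include <cstddef>
#include <numeric>

// Sums an array of any element type T and any size n; both are deduced from the argument.
template <typename T, std::size_t n>
T sum_of(const T (&a)[n])
{
    return std::accumulate(a + 0, a + n, T());
}

// Calls the template with two different element types and sizes.
int sum_demo()
{
    int primes[4] = {2, 3, 5, 7};
    double halves[2] = {0.5, 1.5};
    assert(sum_of(halves) == 2.0);
    return sum_of(primes);
}
```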
5. Common pitfalls when using arrays.
5.1 Pitfall: Trusting type-unsafe linking.
OK, you’ve been told, or have found out yourself, that globals (namespace
scope variables that can be accessed outside the translation unit) are
Evil™. But did you know how truly Evil™ they are? Consider the
program below, consisting of two files [main.cpp] and [numbers.cpp]:
// [main.cpp]
#include <iostream>
extern int* numbers;
int main()
{
using namespace std;
for( int i = 0; i < 42; ++i )
{
cout << (i > 0? ", " : "") << numbers[i];
}
cout << endl;
}
// [numbers.cpp]
int numbers[42] = {1, 2, 3, 4, 5, 6, 7, 8, 9};
In Windows 7 this compiles and links fine with both MinGW g++ 4.4.1 and
Visual C++ 10.0.
Since the types don't match, the program crashes when you run it.
In-the-formal explanation: the program has Undefined Behavior (UB), and instead
of crashing it can therefore just hang, or perhaps do nothing, or it
can send threatening e-mails to the presidents of the USA, Russia, India,
China and Switzerland, and make Nasal Daemons fly out of your nose.
In-practice explanation: in main.cpp the array is treated as a pointer, placed
at the same address as the array. For a 32-bit executable this means that the first
int value in the array is treated as a pointer. I.e., in main.cpp the
numbers variable contains, or appears to contain, (int*)1. This causes the
program to access memory down at the very bottom of the address space, which is
conventionally reserved and trap-causing. Result: you get a crash.
The compilers are fully within their rights to not diagnose this error,
because C++11 §3.5/10 says, about the requirement of compatible types
for the declarations,
[N3290 §3.5/10]
A violation of this rule on type identity does not require a diagnostic.
The same paragraph details the variation that is allowed:
… declarations for an array object can specify array types that
differ by the presence or absence of a major array bound (8.3.4).
This allowed variation does not include declaring a name as an array in one
translation unit, and as a pointer in another translation unit.
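One common way to avoid this pitfall is to put the declaration into a header that both source files include, so the compiler sees declaration and definition together and can complain about a mismatch. The sketch below folds the three files into one listing; the header name numbers.h and the function number_at are assumptions:

```cpp
#include <cassert>

// [numbers.h] (hypothetical shared header, included by both source files)
extern int numbers[42];     // declared as an array, matching the definition

// [numbers.cpp]
int numbers[42] = {1, 2, 3, 4, 5, 6, 7, 8, 9};

// [main.cpp] would then index the array directly.
int number_at(int i)
{
    assert(0 <= i && i < 42);
    return numbers[i];
}
```

Had the header declared extern int* numbers instead, compiling numbers.cpp with that header included would fail with a conflicting-declaration error rather than producing a program that crashes at runtime.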
5.2 Pitfall: Doing premature optimization (memset & friends).
Not written yet
5.3 Pitfall: Using the C idiom to get number of elements.
With deep C experience it’s natural to write …
#define N_ITEMS( array ) (sizeof( array )/sizeof( array[0] ))
Since an array decays to a pointer to its first element where needed, the
expression sizeof(a)/sizeof(a[0]) can also be written as
sizeof(a)/sizeof(*a). It means the same, and no matter how it’s
written it is the C idiom for finding the number of elements of an array.
Main pitfall: the C idiom is not typesafe. For example, the code
…
#include <stdio.h>
#define N_ITEMS( array ) (sizeof( array )/sizeof( *array ))
void display( int const a[7] )
{
int const n = N_ITEMS( a ); // Oops.
printf( "%d elements.\n", n );
}
int main()
{
int const moohaha[] = {1, 2, 3, 4, 5, 6, 7};
printf( "%d elements, calling display...\n", (int) N_ITEMS( moohaha ) );
display( moohaha );
}
passes a pointer to N_ITEMS, and therefore most likely produces a wrong
result. Compiled as a 32-bit executable in Windows 7 it produces …
7 elements, calling display...
1 elements.
The compiler rewrites int const a[7] to just int const a[].
The compiler rewrites int const a[] to int const* a.
N_ITEMS is therefore invoked with a pointer.
For a 32-bit executable sizeof(array) (size of a pointer) is then 4.
sizeof(*array) is equivalent to sizeof(int), which for a 32-bit executable is also 4.
In order to detect this error at run time you can do …
#include <assert.h>
#include <typeinfo>
#define N_ITEMS( array ) ( \
assert(( \
"N_ITEMS requires an actual array as argument", \
typeid( array ) != typeid( &*array ) \
)), \
sizeof( array )/sizeof( *array ) \
)
7 elements, calling display...
Assertion failed: ( "N_ITEMS requires an actual array as argument", typeid( a ) != typeid( &*a ) ), file runtime_detection.cpp, line 16
This application has requested the Runtime to terminate it in an unusual way.
Please contact the application's support team for more information.
The runtime error detection is better than no detection, but it wastes a little
processor time, and perhaps much more programmer time. Better with detection at
compile time! And if you're happy to not support arrays of local types with C++98,
then you can do that:
#include <stddef.h>
typedef ptrdiff_t Size;
template< class Type, Size n >
Size n_items( Type (&)[n] ) { return n; }
#define N_ITEMS( array ) n_items( array )
Compiling this definition substituted into the first complete program, with g++,
I got …
M:\count> g++ compile_time_detection.cpp
compile_time_detection.cpp: In function 'void display(const int*)':
compile_time_detection.cpp:14: error: no matching function for call to 'n_items(const int*&)'
M:\count> _
How it works: the array is passed by reference to n_items, and so it does
not decay to pointer to first element, and the function can just return the
number of elements specified by the type.
With C++11 you can use this also for arrays of local type, and it's the type safe
C++ idiom for finding the number of elements of an array.
5.4 C++11 & C++14 pitfall: Using a constexpr array size function.
With C++11 and later it's natural, but as you'll see dangerous, to
replace the C++03 function
typedef ptrdiff_t Size;
template< class Type, Size n >
Size n_items( Type (&)[n] ) { return n; }
with
using Size = ptrdiff_t;
template< class Type, Size n >
constexpr auto n_items( Type (&)[n] ) -> Size { return n; }
where the significant change is the use of constexpr, which allows
this function to produce a compile time constant.
For example, in contrast to the C++03 function, such a compile time constant
can be used to declare an array of the same size as another:
// Example 1
void foo()
{
int const x[] = {3, 1, 4, 1, 5, 9, 2, 6, 5, 4};
constexpr Size n = n_items( x );
int y[n] = {};
// Using y here.
}
But consider this code using the constexpr version:
// Example 2
template< class Collection >
void foo( Collection const& c )
{
constexpr int n = n_items( c ); // Not in C++14!
// Use c here
}
auto main() -> int
{
int x[42];
foo( x );
}
The pitfall: as of July 2015 the above compiles with MinGW-64 5.1.0 with
-pedantic-errors, and,
testing with the online compilers at gcc.godbolt.org/, also with clang 3.0
and clang 3.2, but not with clang 3.3, 3.4.1, 3.5.0, 3.5.1, 3.6 (rc1) or
3.7 (experimental). And important for the Windows platform, it does not compile
with Visual C++ 2015. The reason is a C++11/C++14 statement about use of
references in constexpr expressions:
C++11 and C++14 §5.19/2, ninth dash
A conditional-expression e is a core constant expression unless the evaluation
of e, following the rules of the abstract machine (1.9), would evaluate one of the
following expressions:
⋮
an id-expression that refers to a variable or data member of reference type
unless the reference has a preceding initialization and either
it is initialized with a constant expression or
it is a non-static data member of an object whose lifetime began within
the evaluation of e;
One can always write the more verbose
// Example 3 -- limited
using Size = ptrdiff_t;
template< class Collection >
void foo( Collection const& c )
{
constexpr Size n = std::extent< Collection >::value; // not decltype( c ): the extent of a reference type is 0
// Use c here
}
… but this fails when Collection is not a raw array.
To deal with collections that can be non-arrays one needs the overloadability of an
n_items function, but also, for compile time use one needs a compile time
representation of the array size. And the classic C++03 solution, which works fine
also in C++11 and C++14, is to let the function report its result not as a value
but via its function result type. For example like this:
// Example 4 - OK (not ideal, but portable and safe)
#include <array>
#include <stddef.h>
using Size = ptrdiff_t;
template< Size n >
struct Size_carrier
{
char sizer[n];
};
template< class Type, Size n >
auto static_n_items( Type (&)[n] )
-> Size_carrier<n>;
// No implementation, is used only at compile time.
template< class Type, size_t n > // size_t for g++
auto static_n_items( std::array<Type, n> const& )
-> Size_carrier<n>;
// No implementation, is used only at compile time.
#define STATIC_N_ITEMS( c ) \
static_cast<Size>( sizeof( static_n_items( c ).sizer ) )
template< class Collection >
void foo( Collection const& c )
{
constexpr Size n = STATIC_N_ITEMS( c );
// Use c here
(void) c;
}
auto main() -> int
{
int x[42];
std::array<int, 43> y;
foo( x );
foo( y );
}
About the choice of return type for static_n_items: this code doesn't use std::integral_constant
because with std::integral_constant the result is represented
directly as a constexpr value, reintroducing the original problem. Instead
of a Size_carrier class one can let the function directly return a
reference to an array. However, not everybody is familiar with that syntax.
About the naming: part of this solution to the constexpr-invalid-due-to-reference
problem is to make the choice of compile time constant explicit.
Hopefully the oops-there-was-a-reference-involved-in-your-constexpr issue will be fixed with
C++17, but until then a macro like the STATIC_N_ITEMS above yields portability,
e.g. to the clang and Visual C++ compilers, retaining type safety.
Related: macros do not respect scopes, so to avoid name collisions it can be a
good idea to use a name prefix, e.g. MYLIB_STATIC_N_ITEMS.
Array creation and initialization
As with any other kind of C++ object, arrays can be stored either directly in named variables (then the size must be a compile-time constant; C++ does not support VLAs), or they can be stored anonymously on the heap and accessed indirectly via pointers (only then can the size be computed at runtime).
Automatic arrays
Automatic arrays (arrays living "on the stack") are created each time the flow of control passes through the definition of a non-static local array variable:
void foo()
{
int automatic_array[8];
}
Initialization is performed in ascending order. Note that the initial values depend on the element type T:
If T is a POD (like int in the above example), no initialization takes place.
Otherwise, the default-constructor of T initializes all the elements.
If T provides no accessible default-constructor, the program does not compile.
Alternatively, the initial values can be explicitly specified in the array initializer, a comma-separated list surrounded by curly brackets:
int primes[8] = {2, 3, 5, 7, 11, 13, 17, 19};
Since in this case the number of elements in the array initializer is equal to the size of the array, specifying the size manually is redundant. It can automatically be deduced by the compiler:
int primes[] = {2, 3, 5, 7, 11, 13, 17, 19}; // size 8 is deduced
It is also possible to specify the size and provide a shorter array initializer:
int fibonacci[50] = {0, 1, 1}; // 47 trailing zeros are deduced
In that case, the remaining elements are zero-initialized. Note that C++ allows an empty array initializer (all elements are zero-initialized), whereas C89 does not (at least one value is required). Also note that array initializers can only be used to initialize arrays; they cannot later be used in assignments.
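The zero-filling rule is easy to check. A small sketch with a hypothetical function name:

```cpp
#include <cassert>

// Counts the zero elements that follow the three explicit initializers.
int trailing_zeros()
{
    int fibonacci[50] = {0, 1, 1};
    int all_zero[4] = {};          // empty initializer: every element is zero
    assert(all_zero[0] == 0 && all_zero[3] == 0);

    int zeros = 0;
    for (int i = 3; i < 50; ++i)
    {
        if (fibonacci[i] == 0) ++zeros;
    }
    return zeros;
}
```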
Static arrays
Static arrays (arrays living "in the data segment") are local array variables defined with the static keyword and array variables at namespace scope ("global variables"):
int global_static_array[8];
void foo()
{
static int local_static_array[8];
}
(Note that variables at namespace scope are implicitly static. Adding the static keyword to their definition has a completely different, deprecated meaning.)
Here is how static arrays behave differently from automatic arrays:
Static arrays without an array initializer are zero-initialized prior to any further potential initialization.
Static POD arrays are initialized exactly once, and the initial values are typically baked into the executable, in which case there is no initialization cost at runtime. This is not always the most space-efficient solution, however, and it is not required by the standard.
Static non-POD arrays are initialized the first time the flow of control passes through their definition. In the case of local static arrays, that may never happen if the function is never called.
(None of the above is specific to arrays. These rules apply equally well to other kinds of static objects.)
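The following sketch, with made-up names, shows the two properties that matter most in practice: static arrays start out zeroed, and a local static array keeps its contents between calls:

```cpp
#include <cassert>

int global_counts[4];              // namespace scope: zero-initialized

// Returns a reference into a local static array that persists across calls.
int& local_slot(int i)
{
    static int local_counts[4];    // zero-initialized once, not on every call
    assert(0 <= i && i < 4);
    return local_counts[i];
}
```

Writing local_slot(2) = 5 and then reading local_slot(2) again yields 5, which would not be guaranteed for an automatic array.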
Array data members
Array data members are created when their owning object is created. Unfortunately, C++03 provides no means to initialize arrays in the member initializer list, so initialization must be faked with assignments:
class Foo
{
int primes[8];
public:
Foo()
{
primes[0] = 2;
primes[1] = 3;
primes[2] = 5;
// ...
}
};
Alternatively, you can define an automatic array in the constructor body and copy the elements over:
class Foo
{
int primes[8];
public:
Foo()
{
int local_array[] = {2, 3, 5, 7, 11, 13, 17, 19};
std::copy(local_array + 0, local_array + 8, primes + 0);
}
};
In C++11, arrays can be initialized in the member initializer list thanks to uniform initialization:
class Foo
{
int primes[8];
public:
Foo() : primes { 2, 3, 5, 7, 11, 13, 17, 19 }
{
}
};
This is the only solution that works with element types that have no default constructor.
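To make the last remark concrete, here is a sketch with a hypothetical element type Id that has no default constructor. Assigning to the elements in the constructor body would not compile, because the elements would have to be default-constructed first; the member initializer list avoids that:

```cpp
#include <cassert>

// Hypothetical element type without a default constructor.
struct Id
{
    int value;
    explicit Id(int v) : value(v) {}
};

class Pair
{
    Id ids[2];
public:
    Pair() : ids{ Id(1), Id(2) } {}   // C++11: elements are initialized directly
    int second() const { return ids[1].value; }
};

// Constructs a Pair and reads back its second element.
int pair_demo()
{
    Pair p;
    assert(p.second() == 2);
    return p.second();
}
```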
Dynamic arrays
Dynamic arrays have no names, hence the only means of accessing them is via pointers. Because they have no names, I will refer to them as "anonymous arrays" from now on.
In C, anonymous arrays are created via malloc and friends. In C++, anonymous arrays are created using the new T[size] syntax which returns a pointer to the first element of an anonymous array:
std::size_t size = compute_size_at_runtime();
int* p = new int[size];
The following ASCII art depicts the memory layout if the size is computed as 8 at runtime:
+---+---+---+---+---+---+---+---+
(anonymous) | | | | | | | | |
+---+---+---+---+---+---+---+---+
^
|
|
+-|-+
p: | | | int*
+---+
Obviously, anonymous arrays require more memory than named arrays due to the extra pointer that must be stored separately. (There is also some additional overhead on the free store.)
Note that there is no array-to-pointer decay going on here. Although evaluating new int[size] does in fact create an array of integers, the result of the expression new int[size] is already a pointer to a single integer (the first element), not an array of integers or a pointer to an array of integers of unknown size. That would be impossible, because the static type system requires array sizes to be compile-time constants. (Hence, I did not annotate the anonymous array with static type information in the picture.)
Concerning default values for elements, anonymous arrays behave similarly to automatic arrays.
Normally, anonymous POD arrays are not initialized, but there is a special syntax that triggers value-initialization:
int* p = new int[some_computed_size]();
(Note the trailing pair of parentheses right before the semicolon.) Again, C++11 simplifies the rules and allows specifying initial values for anonymous arrays thanks to uniform initialization:
int* p = new int[8] { 2, 3, 5, 7, 11, 13, 17, 19 };
If you are done using an anonymous array, you have to release it back to the system:
delete[] p;
You must release each anonymous array exactly once and then never touch it again afterwards. Not releasing it at all results in a memory leak (or more generally, depending on the element type, a resource leak), and trying to release it multiple times results in undefined behavior. Using the non-array form delete (or free) instead of delete[] to release the array is also undefined behavior.
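The life cycle described above can be sketched as follows. The helper name all_zero_after_value_initialization is an assumption made for illustration; the function creates an anonymous array with the value-initialization syntax, inspects the elements, and releases the array exactly once with the array form of delete.

```cpp
#include <cstddef>

// Hypothetical helper: creates an anonymous array of the given size with
// value-initialized (zero) elements, checks them, and releases the array.
bool all_zero_after_value_initialization(std::size_t size)
{
    int* p = new int[size]();      // trailing () requests value-initialization
    bool all_zero = true;
    for (std::size_t i = 0; i < size; ++i)
    {
        if (p[i] != 0) all_zero = false;
    }
    delete[] p;                    // array form, exactly once
    return all_zero;
}
```

After the delete[], the pointer p must not be used again; here it simply goes out of scope.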
Arrays on the type level
An array type is denoted as T[n] where T is the element type and n is a positive size, the number of elements in the array. The array type is a product type of the element type and the size. If one or both of those ingredients differ, you get a distinct type:
#include <type_traits>
static_assert(!std::is_same<int[8], float[8]>::value, "distinct element type");
static_assert(!std::is_same<int[8], int[9]>::value, "distinct size");
Note that the size is part of the type, that is, array types of different size are incompatible types that have absolutely nothing to do with each other. sizeof(T[n]) is equivalent to n * sizeof(T).
Array-to-pointer decay
The only "connection" between T[n] and T[m] is that both types can implicitly be converted to T*, and the result of this conversion is a pointer to the first element of the array. That is, anywhere a T* is required, you can provide a T[n], and the compiler will silently provide that pointer:
+---+---+---+---+---+---+---+---+
the_actual_array: | | | | | | | | | int[8]
+---+---+---+---+---+---+---+---+
^
|
|
|
| pointer_to_the_first_element int*
This conversion is known as "array-to-pointer decay", and it is a major source of confusion. The size of the array is lost in this process, since it is no longer part of the type (T*). Pro: Forgetting the size of an array on the type level allows a pointer to point to the first element of an array of any size. Con: Given a pointer to the first (or any other) element of an array, there is no way to detect how large that array is or where exactly the pointer points to relative to the bounds of the array. Pointers are extremely stupid.
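A minimal sketch of what is lost in the decay, assuming two hypothetical helper functions that are not part of the original text: one receives a pointer, the other a reference to the whole array.

```cpp
#include <cstddef>

// Hypothetical helper: the parameter is a pointer, so sizeof sees only the pointer.
std::size_t size_seen_through_pointer(const int* p)
{
    return sizeof(p);   // size of an int*, whatever array p points into
}

// Passing the array by reference keeps the full type int[8] intact.
std::size_t size_seen_through_reference(const int (&a)[8])
{
    return sizeof(a);   // 8 * sizeof(int)
}
```

Calling both with the same int[8] yields sizeof(int*) for the first and 8 * sizeof(int) for the second, because only the second still knows the size.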
Arrays are not pointers
The compiler will silently generate a pointer to the first element of an array whenever it is deemed useful, that is, whenever an operation would fail on an array but succeed on a pointer. This conversion from array to pointer is trivial, since the resulting pointer value is simply the address of the array. Note that the pointer is not stored as part of the array itself (or anywhere else in memory). An array is not a pointer.
static_assert(!std::is_same<int[8], int*>::value, "an array is not a pointer");
One important context in which an array does not decay into a pointer to its first element is when the & operator is applied to it. In that case, the & operator yields a pointer to the entire array, not just a pointer to its first element. Although in that case the values (the addresses) are the same, a pointer to the first element of an array and a pointer to the entire array are completely distinct types:
static_assert(!std::is_same<int*, int(*)[8]>::value, "distinct pointer types");
The following ASCII art explains this distinction:
+-----------------------------------+
| +---+---+---+---+---+---+---+---+ |
+---> | | | | | | | | | | | int[8]
| | +---+---+---+---+---+---+---+---+ |
| +---^-------------------------------+
| |
| |
| |
| | pointer_to_the_first_element int*
|
| pointer_to_the_entire_array int(*)[8]
Note how the pointer to the first element only points to a single integer (depicted as a small box), whereas the pointer to the entire array points to an array of 8 integers (depicted as a large box).
The same situation arises in classes and is maybe more obvious. A pointer to an object and a pointer to its first data member have the same value (the same address), yet they are completely distinct types.
If you are unfamiliar with the C declarator syntax, the parentheses in the type int(*)[8] are essential:
int(*)[8] is a pointer to an array of 8 integers.
int*[8] is an array of 8 pointers, each element of type int*.
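The distinction between the two pointer types can be checked by the compiler. The following is only a sketch; the variable names are borrowed from the pictures above, and the array x is an assumption made for illustration.

```cpp
#include <type_traits>

int x[8];

int* pointer_to_the_first_element = x;        // array-to-pointer decay
int (*pointer_to_the_entire_array)[8] = &x;   // no decay under the & operator

static_assert(std::is_same<decltype(&x), int(*)[8]>::value, "&x is a pointer to the entire array");
static_assert(std::is_same<decltype(&x[0]), int*>::value, "&x[0] is a pointer to a single element");

// The pointed-to objects differ in size, even though the addresses are the same.
static_assert(sizeof(*pointer_to_the_entire_array) == 8 * sizeof(int), "points to 8 integers");
static_assert(sizeof(*pointer_to_the_first_element) == sizeof(int), "points to 1 integer");
```

Converting both pointers to void* and comparing them shows that they hold the same address, as stated above.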
Accessing elements
C++ provides two syntactic variations to access individual elements of an array.
Neither of them is superior to the other, and you should familiarize yourself with both.
Pointer arithmetic
Given a pointer p to the first element of an array, the expression p+i yields a pointer to the i-th element of the array. By dereferencing that pointer afterwards, one can access individual elements:
std::cout << *(x+3) << ", " << *(x+7) << std::endl;
If x denotes an array, then array-to-pointer decay will kick in, because adding an array and an integer is meaningless (there is no plus operation on arrays), but adding a pointer and an integer makes sense:
+---+---+---+---+---+---+---+---+
x: | | | | | | | | | int[8]
+---+---+---+---+---+---+---+---+
^ ^ ^
| | |
| | |
| | |
x+0 | x+3 | x+7 | int*
(Note that the implicitly generated pointer has no name, so I wrote x+0 in order to identify it.)
If, on the other hand, x denotes a pointer to the first (or any other) element of an array, then array-to-pointer decay is not necessary, because the pointer on which i is going to be added already exists:
+---+---+---+---+---+---+---+---+
| | | | | | | | | int[8]
+---+---+---+---+---+---+---+---+
^ ^ ^
| | |
| | |
+-|-+ | |
x: | | | x+3 | x+7 | int*
+---+
Note that in the depicted case, x is a pointer variable (discernible by the small box next to x), but it could just as well be the result of a function returning a pointer (or any other expression of type T*).
Indexing operator
Since the syntax *(x+i) is a bit clumsy, C++ provides the alternative syntax x[i]:
std::cout << x[3] << ", " << x[7] << std::endl;
Due to the fact that addition is commutative, the following code does exactly the same:
std::cout << 3[x] << ", " << 7[x] << std::endl;
The definition of the indexing operator leads to the following interesting equivalence:
&x[i] == &*(x+i) == x+i
However, &x[0] is generally not equivalent to x. The former is a pointer, the latter an array. Only when the context triggers array-to-pointer decay can x and &x[0] be used interchangeably. For example:
T* p = &array[0]; // rewritten as &*(array+0), decay happens due to the addition
T* q = array; // decay happens due to the assignment
On the first line, the compiler detects an assignment from a pointer to a pointer, which trivially succeeds. On the second line, it detects an assignment from an array to a pointer. Since this is meaningless (but pointer to pointer assignment makes sense), array-to-pointer decay kicks in as usual.
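The equivalences of the indexing operator can be put to the test with a small function. This is a sketch under the assumption that x points into an array with more than i elements; the name all_spellings_agree is made up for this example.

```cpp
#include <cstddef>

// Hypothetical helper: checks that the different spellings of "element i"
// all designate the same object.
bool all_spellings_agree(const int* x, std::size_t i)
{
    return &x[i] == x + i        // x[i] is defined as *(x + i)
        && &i[x] == x + i        // i[x] is *(i + x), and addition commutes
        && &*(x + i) == &x[i];
}
```

When called with an array argument, decay happens at the call, so inside the function x is already a pointer.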
Ranges
An array of type T[n] has n elements, indexed from 0 to n-1; there is no element n. And yet, to support half-open ranges (where the beginning is inclusive and the end is exclusive), C++ allows the computation of a pointer to the (non-existent) n-th element, but it is illegal to dereference that pointer:
+---+---+---+---+---+---+---+---+....
x: | | | | | | | | | . int[8]
+---+---+---+---+---+---+---+---+....
^ ^
| |
| |
| |
x+0 | x+8 | int*
For example, if you want to sort an array, both of the following would work equally well:
std::sort(x + 0, x + n);
std::sort(&x[0], &x[0] + n);
Note that it is illegal to provide &x[n] as the second argument since this is equivalent to &*(x+n), and the sub-expression *(x+n) technically invokes undefined behavior in C++ (but not in C99).
Also note that you could simply provide x as the first argument. That is a little too terse for my taste, and it also makes template argument deduction a bit harder for the compiler, because in that case the first argument is an array but the second argument is a pointer. (Again, array-to-pointer decay kicks in.)
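A half-open range in action might look like the following sketch. The function name sort_ints is an assumption; the point is that x + n is computed and compared against, but never dereferenced.

```cpp
#include <algorithm>
#include <cstddef>

// Hypothetical helper: sorts the n elements starting at x.
// x + n is the one-past-the-end pointer of the half-open range [x, x + n).
void sort_ints(int* x, std::size_t n)
{
    std::sort(x + 0, x + n);
}
```

The same pair of pointers works with every standard algorithm that takes an iterator range.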
Programmers often confuse multidimensional arrays with arrays of pointers.
Multidimensional arrays
Most programmers are familiar with named multidimensional arrays, but many are unaware of the fact that multidimensional arrays can also be created anonymously. Multidimensional arrays are often referred to as "arrays of arrays" or "true multidimensional arrays".
Named multidimensional arrays
When using named multidimensional arrays, all dimensions must be known at compile time:
int H = read_int();
int W = read_int();
int connect_four[6][7]; // okay
int connect_four[H][7]; // ISO C++ forbids variable length array
int connect_four[6][W]; // ISO C++ forbids variable length array
int connect_four[H][W]; // ISO C++ forbids variable length array
This is what a named multidimensional array looks like in memory:
+---+---+---+---+---+---+---+
connect_four: | | | | | | | |
+---+---+---+---+---+---+---+
| | | | | | | |
+---+---+---+---+---+---+---+
| | | | | | | |
+---+---+---+---+---+---+---+
| | | | | | | |
+---+---+---+---+---+---+---+
| | | | | | | |
+---+---+---+---+---+---+---+
| | | | | | | |
+---+---+---+---+---+---+---+
Note that 2D grids such as the above are merely helpful visualizations. From the point of view of C++, memory is a "flat" sequence of bytes. The elements of a multidimensional array are stored in row-major order. That is, connect_four[0][6] and connect_four[1][0] are neighbors in memory. In fact, connect_four[0][7] would designate the same memory location as connect_four[1][0], although indexing past the end of a row is, strictly speaking, out of bounds. In practice, this layout means that you can take multi-dimensional arrays and treat them as large, one-dimensional arrays:
int* p = &connect_four[0][0];
int* q = p + 42;
some_int_sequence_algorithm(p, q);
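The row-major layout can be made visible by measuring byte offsets. This is a sketch; byte_offset is a hypothetical helper, and the expected offset of element [row][column] is (row * 7 + column) * sizeof(int).

```cpp
#include <cstddef>

int connect_four[6][7];

// Hypothetical helper: distance in bytes from the start of the whole array
// to the element in the given row and column.
std::ptrdiff_t byte_offset(int row, int column)
{
    const char* base = reinterpret_cast<const char*>(&connect_four[0][0]);
    const char* elem = reinterpret_cast<const char*>(&connect_four[row][column]);
    return elem - base;
}
```

The last element of row 0 sits at offset 6 * sizeof(int) and the first element of row 1 directly behind it at 7 * sizeof(int), with no padding between the rows.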
Anonymous multidimensional arrays
With anonymous multidimensional arrays, all dimensions except the first must be known at compile time:
int (*p)[7] = new int[6][7]; // okay
int (*p)[7] = new int[H][7]; // okay
int (*p)[W] = new int[6][W]; // ISO C++ forbids variable length array
int (*p)[W] = new int[H][W]; // ISO C++ forbids variable length array
This is what an anonymous multidimensional array looks like in memory:
+---+---+---+---+---+---+---+
+---> | | | | | | | |
| +---+---+---+---+---+---+---+
| | | | | | | | |
| +---+---+---+---+---+---+---+
| | | | | | | | |
| +---+---+---+---+---+---+---+
| | | | | | | | |
| +---+---+---+---+---+---+---+
| | | | | | | | |
| +---+---+---+---+---+---+---+
| | | | | | | | |
| +---+---+---+---+---+---+---+
|
+-|-+
p: | | |
+---+
Note that the array itself is still allocated as a single block in memory.
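A minimal sketch of an anonymous multidimensional array whose height is only known at runtime. The function name sum_of_grid and the fill pattern are assumptions made for illustration.

```cpp
#include <cstddef>

// Hypothetical helper: allocates h rows of 7 ints each as one block,
// fills them with consecutive numbers and returns the sum.
int sum_of_grid(std::size_t h)
{
    int (*p)[7] = new int[h][7];    // only the first dimension is a runtime value
    int next = 0;
    int total = 0;
    for (std::size_t row = 0; row < h; ++row)
    {
        for (std::size_t column = 0; column < 7; ++column)
        {
            p[row][column] = next++;
            total += p[row][column];
        }
    }
    delete[] p;                     // one allocation, one delete[]
    return total;
}
```

Because the whole grid is a single block, a single delete[] releases it, in contrast to the arrays of pointers discussed next.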
Arrays of pointers
You can overcome the restriction of fixed width by introducing another level of indirection.
Named arrays of pointers
Here is a named array of five pointers which are initialized with anonymous arrays of different lengths:
int* triangle[5];
for (int i = 0; i < 5; ++i)
{
triangle[i] = new int[5 - i];
}
// ...
for (int i = 0; i < 5; ++i)
{
delete[] triangle[i];
}
And here is what it looks like in memory:
+---+---+---+---+---+
| | | | | |
+---+---+---+---+---+
^
| +---+---+---+---+
| | | | | |
| +---+---+---+---+
| ^
| | +---+---+---+
| | | | | |
| | +---+---+---+
| | ^
| | | +---+---+
| | | | | |
| | | +---+---+
| | | ^
| | | | +---+
| | | | | |
| | | | +---+
| | | | ^
| | | | |
| | | | |
+-|-+-|-+-|-+-|-+-|-+
triangle: | | | | | | | | | | |
+---+---+---+---+---+
Since each line is allocated individually now, viewing 2D arrays as 1D arrays does not work anymore.
Anonymous arrays of pointers
Here is an anonymous array of 5 (or any other number of) pointers which are initialized with anonymous arrays of different lengths:
int n = calculate_five(); // or any other number
int** p = new int*[n];
for (int i = 0; i < n; ++i)
{
p[i] = new int[n - i];
}
// ...
for (int i = 0; i < n; ++i)
{
delete[] p[i];
}
delete[] p; // note the extra delete[] !
And here is what it looks like in memory:
+---+---+---+---+---+
| | | | | |
+---+---+---+---+---+
^
| +---+---+---+---+
| | | | | |
| +---+---+---+---+
| ^
| | +---+---+---+
| | | | | |
| | +---+---+---+
| | ^
| | | +---+---+
| | | | | |
| | | +---+---+
| | | ^
| | | | +---+
| | | | | |
| | | | +---+
| | | | ^
| | | | |
| | | | |
+-|-+-|-+-|-+-|-+-|-+
| | | | | | | | | | |
+---+---+---+---+---+
^
|
|
+-|-+
p: | | |
+---+
Conversions
Array-to-pointer decay naturally extends to arrays of arrays and arrays of pointers:
int array_of_arrays[6][7];
int (*pointer_to_array)[7] = array_of_arrays;
int* array_of_pointers[6];
int** pointer_to_pointer = array_of_pointers;
However, there is no implicit conversion from T[h][w] to T**. If such an implicit conversion did exist, the result would be a pointer to the first element of an array of h pointers to T (each pointing to the first element of a line in the original 2D array), but that pointer array does not exist anywhere in memory yet. If you want such a conversion, you must create and fill the required pointer array manually:
int connect_four[6][7];
int** p = new int*[6];
for (int i = 0; i < 6; ++i)
{
p[i] = connect_four[i];
}
// ...
delete[] p;
Note that this generates a view of the original multidimensional array. If you need a copy instead, you must create extra arrays and copy the data yourself:
int connect_four[6][7];
int** p = new int*[6];
for (int i = 0; i < 6; ++i)
{
p[i] = new int[7];
std::copy(connect_four[i], connect_four[i] + 7, p[i]); // copy one line of 7 elements; avoids forming connect_four[6] when i == 5
}
// ...
for (int i = 0; i < 6; ++i)
{
delete[] p[i];
}
delete[] p;
Assignment
For no particular reason, arrays cannot be assigned to one another. Use std::copy instead:
#include <algorithm>
// ...
int a[8] = {2, 3, 5, 7, 11, 13, 17, 19};
int b[8];
std::copy(a + 0, a + 8, b);
This is more flexible than what true array assignment could provide because it is possible to copy slices of larger arrays into smaller arrays.
std::copy is usually specialized for primitive types to give maximum performance. It is unlikely that std::memcpy performs better. If in doubt, measure.
Although you cannot assign arrays directly, you can assign structs and classes which contain array members. That is because array members are copied memberwise by the assignment operator which is provided as a default by the compiler. If you define the assignment operator manually for your own struct or class types, you must fall back to manual copying for the array members.
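The struct workaround can be sketched like this. The type name Primes and the function copy_of are assumptions made for this example.

```cpp
// Hypothetical wrapper type around an array member.
struct Primes
{
    int values[8];
};

// The implicitly defined copy assignment operator copies values element by element.
Primes copy_of(const Primes& source)
{
    Primes target = {};   // all elements zero
    target = source;      // whole-array copy via the struct
    return target;
}
```

The result is an independent copy: changing an element of the copy leaves the source untouched.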
Parameter passing
Arrays cannot be passed by value. You can either pass them by pointer or by reference.
Pass by pointer
Since arrays themselves cannot be passed by value, usually a pointer to their first element is passed by value instead. This is often called "pass by pointer". Since the size of the array is not retrievable via that pointer, you have to pass a second parameter indicating the size of the array (the classic C solution) or a second pointer pointing after the last element of the array (the C++ iterator solution):
#include <numeric>
#include <cstddef>
int sum(const int* p, std::size_t n)
{
return std::accumulate(p, p + n, 0);
}
int sum(const int* p, const int* q)
{
return std::accumulate(p, q, 0);
}
As a syntactic alternative, you can also declare parameters as T p[], and it means the exact same thing as T* p in the context of parameter lists only:
int sum(const int p[], std::size_t n)
{
return std::accumulate(p, p + n, 0);
}
In other words, the compiler rewrites T p[] to T* p, but only in parameter lists. This special rule is partly responsible for the whole confusion about arrays and pointers. In every other context, declaring something as an array or as a pointer makes a huge difference.
Unfortunately, you can also provide a size in an array parameter which is silently ignored by the compiler. That is, the following three signatures are exactly equivalent, as indicated by the compiler errors:
int sum(const int* p, std::size_t n)
// error: redefinition of 'int sum(const int*, size_t)'
int sum(const int p[], std::size_t n)
// error: redefinition of 'int sum(const int*, size_t)'
int sum(const int p[8], std::size_t n) // the 8 has no meaning here
Pass by reference
Arrays can also be passed by reference:
int sum(const int (&a)[8])
{
return std::accumulate(a + 0, a + 8, 0);
}
In this case, the array size is significant. Since writing a function that only accepts arrays of exactly 8 elements is of little use, programmers usually write such functions as templates:
template <std::size_t n>
int sum(const int (&a)[n])
{
return std::accumulate(a + 0, a + n, 0);
}
Note that you can only call such a function template with an actual array of integers, not with a pointer to an integer. The size of the array is automatically inferred, and for every size n, a different function is instantiated from the template. You can also write quite useful function templates that abstract from both the element type and from the size.
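One possible shape of a function template that abstracts from both the element type and the size is sketched below. The name sum_all is an assumption, chosen to avoid a clash with the sum functions above.

```cpp
#include <cstddef>
#include <numeric>

// A sketch of a template that deduces both the element type T and the size n
// from the array argument.
template <typename T, std::size_t n>
T sum_all(const T (&a)[n])
{
    return std::accumulate(a + 0, a + n, T());
}
```

Called with an int[3], the compiler instantiates sum_all<int, 3>; called with a double[2], it instantiates sum_all<double, 2>. A plain pointer argument is rejected at compile time.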
5. Common pitfalls when using arrays.
5.1 Pitfall: Trusting type-unsafe linking.
OK, you’ve been told, or have found out yourself, that globals (namespace
scope variables that can be accessed outside the translation unit) are
Evil™. But did you know how truly Evil™ they are? Consider the
program below, consisting of two files [main.cpp] and [numbers.cpp]:
// [main.cpp]
#include <iostream>
extern int* numbers;
int main()
{
using namespace std;
for( int i = 0; i < 42; ++i )
{
cout << (i > 0? ", " : "") << numbers[i];
}
cout << endl;
}
// [numbers.cpp]
int numbers[42] = {1, 2, 3, 4, 5, 6, 7, 8, 9};
In Windows 7 this compiles and links fine with both MinGW g++ 4.4.1 and
Visual C++ 10.0.
Since the types don't match, the program crashes when you run it.
In-the-formal explanation: the program has Undefined Behavior (UB), and instead
of crashing it can therefore just hang, or perhaps do nothing, or it
can send threatening e-mails to the presidents of the USA, Russia, India,
China and Switzerland, and make Nasal Daemons fly out of your nose.
In-practice explanation: in main.cpp the array is treated as a pointer, placed
at the same address as the array. For a 32-bit executable this means that the first
int value in the array is treated as a pointer. I.e., in main.cpp the
numbers variable contains, or appears to contain, (int*)1. This causes the
program to access memory down at the very bottom of the address space, which is
conventionally reserved and trap-causing. Result: you get a crash.
The compilers are fully within their rights to not diagnose this error,
because C++11 §3.5/10 says, about the requirement of compatible types
for the declarations,
[N3290 §3.5/10]
A violation of this rule on type identity does not require a diagnostic.
The same paragraph details the variation that is allowed:
… declarations for an array object can specify array types that
differ by the presence or absence of a major array bound (8.3.4).
This allowed variation does not include declaring a name as an array in one
translation unit, and as a pointer in another translation unit.
5.2 Pitfall: Doing premature optimization (memset & friends).
Not written yet
5.3 Pitfall: Using the C idiom to get number of elements.
With deep C experience it’s natural to write …
#define N_ITEMS( array ) (sizeof( array )/sizeof( array[0] ))
Since an array decays to a pointer to its first element where needed, the
expression sizeof(a)/sizeof(a[0]) can also be written as
sizeof(a)/sizeof(*a). It means the same, and no matter how it’s
written it is the C idiom for finding the number of elements of an array.
Main pitfall: the C idiom is not typesafe. For example, the code
…
#include <stdio.h>
#define N_ITEMS( array ) (sizeof( array )/sizeof( *array ))
void display( int const a[7] )
{
int const n = N_ITEMS( a ); // Oops.
printf( "%d elements.\n", n );
}
int main()
{
int const moohaha[] = {1, 2, 3, 4, 5, 6, 7};
printf( "%d elements, calling display...\n", (int) N_ITEMS( moohaha ) ); // cast: sizeof yields size_t, but %d expects int
display( moohaha );
}
passes a pointer to N_ITEMS, and therefore most likely produces a wrong
result. Compiled as a 32-bit executable in Windows 7 it produces …
7 elements, calling display...
1 elements.
The compiler rewrites int const a[7] to just int const a[].
The compiler rewrites int const a[] to int const* a.
N_ITEMS is therefore invoked with a pointer.
For a 32-bit executable sizeof(array) (size of a pointer) is then 4.
sizeof(*array) is equivalent to sizeof(int), which for a 32-bit executable is also 4.
In order to detect this error at run time you can do …
#include <assert.h>
#include <typeinfo>
#define N_ITEMS( array ) ( \
assert(( \
"N_ITEMS requires an actual array as argument", \
typeid( array ) != typeid( &*array ) \
)), \
sizeof( array )/sizeof( *array ) \
)
7 elements, calling display...
Assertion failed: ( "N_ITEMS requires an actual array as argument", typeid( a ) != typeid( &*a ) ), file runtime_detection.cpp, line 16
This application has requested the Runtime to terminate it in an unusual way.
Please contact the application's support team for more information.
The runtime error detection is better than no detection, but it wastes a little
processor time, and perhaps much more programmer time. Better with detection at
compile time! And if you're happy to not support arrays of local types with C++98,
then you can do that:
#include <stddef.h>
typedef ptrdiff_t Size;
template< class Type, Size n >
Size n_items( Type (&)[n] ) { return n; }
#define N_ITEMS( array ) n_items( array )
Compiling this definition substituted into the first complete program, with g++,
I got …
M:\count> g++ compile_time_detection.cpp
compile_time_detection.cpp: In function 'void display(const int*)':
compile_time_detection.cpp:14: error: no matching function for call to 'n_items(const int*&)'
M:\count> _
How it works: the array is passed by reference to n_items, and so it does
not decay to pointer to first element, and the function can just return the
number of elements specified by the type.
With C++11 you can use this also for arrays of local type, and it's the type safe
C++ idiom for finding the number of elements of an array.
5.4 C++11 & C++14 pitfall: Using a constexpr array size function.
With C++11 and later it's natural, but as you'll see dangerous, to
replace the C++03 function
typedef ptrdiff_t Size;
template< class Type, Size n >
Size n_items( Type (&)[n] ) { return n; }
with
using Size = ptrdiff_t;
template< class Type, Size n >
constexpr auto n_items( Type (&)[n] ) -> Size { return n; }
where the significant change is the use of constexpr, which allows
this function to produce a compile time constant.
For example, in contrast to the C++03 function, such a compile time constant
can be used to declare an array of the same size as another:
// Example 1
void foo()
{
int const x[] = {3, 1, 4, 1, 5, 9, 2, 6, 5, 4};
constexpr Size n = n_items( x );
int y[n] = {};
// Using y here.
}
But consider this code using the constexpr version:
// Example 2
template< class Collection >
void foo( Collection const& c )
{
constexpr int n = n_items( c ); // Not in C++14!
// Use c here
}
auto main() -> int
{
int x[42];
foo( x );
}
The pitfall: as of July 2015 the above compiles with MinGW-64 5.1.0 with
-pedantic-errors, and,
testing with the online compilers at gcc.godbolt.org/, also with clang 3.0
and clang 3.2, but not with clang 3.3, 3.4.1, 3.5.0, 3.5.1, 3.6 (rc1) or
3.7 (experimental). And important for the Windows platform, it does not compile
with Visual C++ 2015. The reason is a C++11/C++14 statement about use of
references in constexpr expressions:
C++11, C++14 §5.19/2, ninth dash
A conditional-expression e is a core constant expression unless the evaluation
of e, following the rules of the abstract machine (1.9), would evaluate one of the
following expressions:
⋮
an id-expression that refers to a variable or data member of reference type
unless the reference has a preceding initialization and either
it is initialized with a constant expression or
it is a non-static data member of an object whose lifetime began within
the evaluation of e;
One can always write the more verbose
// Example 3 -- limited
using Size = ptrdiff_t;
template< class Collection >
void foo( Collection const& c )
{
constexpr Size n = std::extent< Collection >::value; // needs <type_traits>; decltype( c ) would be a reference type, whose extent is 0
// Use c here
}
… but this fails when Collection is not a raw array.
To deal with collections that can be non-arrays one needs the overloadability of an
n_items function, but also, for compile time use one needs a compile time
representation of the array size. And the classic C++03 solution, which works fine
also in C++11 and C++14, is to let the function report its result not as a value
but via its function result type. For example like this:
// Example 4 - OK (not ideal, but portable and safe)
#include <array>
#include <stddef.h>
using Size = ptrdiff_t;
template< Size n >
struct Size_carrier
{
char sizer[n];
};
template< class Type, Size n >
auto static_n_items( Type (&)[n] )
-> Size_carrier<n>;
// No implementation, is used only at compile time.
template< class Type, size_t n > // size_t for g++
auto static_n_items( std::array<Type, n> const& )
-> Size_carrier<n>;
// No implementation, is used only at compile time.
#define STATIC_N_ITEMS( c ) \
static_cast<Size>( sizeof( static_n_items( c ).sizer ) )
template< class Collection >
void foo( Collection const& c )
{
constexpr Size n = STATIC_N_ITEMS( c );
// Use c here
(void) c;
}
auto main() -> int
{
int x[42];
std::array<int, 43> y;
foo( x );
foo( y );
}
About the choice of return type for static_n_items: this code doesn't use std::integral_constant
because with std::integral_constant the result is represented
directly as a constexpr value, reintroducing the original problem. Instead
of a Size_carrier class one can let the function directly return a
reference to an array. However, not everybody is familiar with that syntax.
About the naming: part of this solution to the constexpr-invalid-due-to-reference
problem is to make the choice of compile time constant explicit.
Hopefully the oops-there-was-a-reference-involved-in-your-constexpr issue will be fixed with
C++17, but until then a macro like the STATIC_N_ITEMS above yields portability,
e.g. to the clang and Visual C++ compilers, retaining type safety.
Related: macros do not respect scopes, so to avoid name collisions it can be a
good idea to use a name prefix, e.g. MYLIB_STATIC_N_ITEMS.
Array creation and initialization
As with any other kind of C++ object, arrays can be stored either directly in named variables (then the size must be a compile-time constant; C++ does not support VLAs), or they can be stored anonymously on the heap and accessed indirectly via pointers (only then can the size be computed at runtime).
Automatic arrays
Automatic arrays (arrays living "on the stack") are created each time the flow of control passes through the definition of a non-static local array variable:
void foo()
{
int automatic_array[8];
}
Initialization is performed in ascending order. Note that the initial values depend on the element type T:
If T is a POD (like int in the above example), no initialization takes place.
Otherwise, the default-constructor of T initializes all the elements.
If T provides no accessible default-constructor, the program does not compile.
Alternatively, the initial values can be explicitly specified in the array initializer, a comma-separated list surrounded by curly brackets:
int primes[8] = {2, 3, 5, 7, 11, 13, 17, 19};
Since in this case the number of elements in the array initializer is equal to the size of the array, specifying the size manually is redundant. It can automatically be deduced by the compiler:
int primes[] = {2, 3, 5, 7, 11, 13, 17, 19}; // size 8 is deduced
It is also possible to specify the size and provide a shorter array initializer:
int fibonacci[50] = {0, 1, 1}; // 47 trailing zeros are deduced
In that case, the remaining elements are zero-initialized. Note that C++ allows an empty array initializer (all elements are zero-initialized), whereas C89 does not (at least one value is required). Also note that array initializers can only be used to initialize arrays; they cannot later be used in assignments.
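The initializer rules above can be verified with a short sketch. Both function names are assumptions made for illustration.

```cpp
#include <cstddef>

// Hypothetical helper: counts the zero elements of an int array of any size.
template <std::size_t n>
std::size_t count_zeros(const int (&a)[n])
{
    std::size_t zeros = 0;
    for (std::size_t i = 0; i < n; ++i)
    {
        if (a[i] == 0) ++zeros;
    }
    return zeros;
}

// Returns true if the initializer rules described above hold.
bool initializer_rules_hold()
{
    int primes[] = {2, 3, 5, 7, 11, 13, 17, 19};   // size 8 is deduced
    int fibonacci[50] = {0, 1, 1};                  // 47 remaining elements are zero
    int zeros[4] = {};                              // empty initializer: all zero
    return sizeof(primes) / sizeof(primes[0]) == 8
        && count_zeros(fibonacci) == 48             // fibonacci[0] plus the 47 trailing zeros
        && count_zeros(zeros) == 4;
}
```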
Static arrays
Static arrays (arrays living "in the data segment") are local array variables defined with the static keyword and array variables at namespace scope ("global variables"):
int global_static_array[8];
void foo()
{
static int local_static_array[8];
}
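(Note that variables at namespace scope are implicitly static. Adding the static keyword to their definition has a completely different meaning: it gives the variable internal linkage. That usage was deprecated in C++03, but the deprecation was removed in C++11.)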
Here is how static arrays behave differently from automatic arrays:
Static arrays without an array initializer are zero-initialized prior to any further potential initialization.
Static POD arrays are initialized exactly once, and the initial values are typically baked into the executable, in which case there is no initialization cost at runtime. This is not always the most space-efficient solution, however, and it is not required by the standard.
Local static non-POD arrays are initialized the first time the flow of control passes through their definition, which may never happen if the function is never called. Non-POD arrays at namespace scope are typically initialized during program startup, before main is entered.
(None of the above is specific to arrays. These rules apply equally well to other kinds of static objects.)
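A small sketch of a local static array, assuming a hypothetical function count_call that counts how often it was called per slot. The array has no initializer, so it starts out zeroed, and it keeps its contents between calls.

```cpp
// Hypothetical example: the local static array is zero-initialized once,
// before the first call, and keeps its contents between calls.
int count_call(int slot)
{
    static int calls[8];      // no initializer: all elements start at zero
    return ++calls[slot];
}
```

With an automatic array instead of a static one, the elements would be uninitialized on every call and the counter would be meaningless.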
Array data members
Array data members are created when their owning object is created. Unfortunately, C++03 provides no means to initialize arrays in the member initializer list, so initialization must be faked with assignments:
class Foo
{
int primes[8];
public:
Foo()
{
primes[0] = 2;
primes[1] = 3;
primes[2] = 5;
// ...
}
};
Alternatively, you can define an automatic array in the constructor body and copy the elements over:
class Foo
{
int primes[8];
public:
Foo()
{
int local_array[] = {2, 3, 5, 7, 11, 13, 17, 19};
std::copy(local_array + 0, local_array + 8, primes + 0);
}
};
In C++11, arrays can be initialized in the member initializer list thanks to uniform initialization:
class Foo
{
int primes[8];
public:
Foo() : primes { 2, 3, 5, 7, 11, 13, 17, 19 }
{
}
};
This is the only solution that works with element types that have no default constructor.
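To illustrate that last point, here is a sketch with a hypothetical element type Prime that has no default constructor. It assumes a C++11 compiler; the member function at is only added so the result can be inspected.

```cpp
// Hypothetical element type without a default constructor.
struct Prime
{
    int value;
    explicit Prime(int v) : value(v) {}
};

class Foo
{
    Prime primes[3];
public:
    // Each element is constructed directly from its initializer.
    // Assigning in the constructor body would not compile, because
    // the elements could not be default-constructed first.
    Foo() : primes { Prime(2), Prime(3), Prime(5) }
    {
    }

    int at(int i) const { return primes[i].value; }
};
```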
Dynamic arrays
Dynamic arrays have no names, hence the only means of accessing them is via pointers. Because they have no names, I will refer to them as "anonymous arrays" from now on.
In C, anonymous arrays are created via malloc and friends. In C++, anonymous arrays are created using the new T[size] syntax which returns a pointer to the first element of an anonymous array:
std::size_t size = compute_size_at_runtime();
int* p = new int[size];
The following ASCII art depicts the memory layout if the size is computed as 8 at runtime:
             +---+---+---+---+---+---+---+---+
(anonymous)  |   |   |   |   |   |   |   |   |
             +---+---+---+---+---+---+---+---+
               ^
               |
               |
             +-|-+
          p: | | |                               int*
             +---+
Obviously, anonymous arrays require more memory than named arrays due to the extra pointer that must be stored separately. (There is also some additional overhead on the free store.)
Note that there is no array-to-pointer decay going on here. Although evaluating new int[size] does in fact create an array of integers, the result of the expression new int[size] is already a pointer to a single integer (the first element), not an array of integers or a pointer to an array of integers of unknown size. That would be impossible, because the static type system requires array sizes to be compile-time constants. (Hence, I did not annotate the anonymous array with static type information in the picture.)
Concerning default values for elements, anonymous arrays behave similarly to automatic arrays.
Normally, anonymous POD arrays are not initialized, but there is a special syntax that triggers value-initialization:
int* p = new int[some_computed_size]();
(Note the trailing pair of parentheses right before the semicolon.) Again, C++11 simplifies the rules and allows specifying initial values for anonymous arrays thanks to uniform initialization:
int* p = new int[8] { 2, 3, 5, 7, 11, 13, 17, 19 };
When you are done using an anonymous array, you have to release it back to the system:
delete[] p;
You must release each anonymous array exactly once and then never touch it again afterwards. Not releasing it at all results in a memory leak (or more generally, depending on the element type, a resource leak), and trying to release it multiple times results in undefined behavior. Using the non-array form delete (or free) instead of delete[] to release the array is also undefined behavior.
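The complete life cycle of an anonymous array might look like the following sketch. The function names sum_of_fresh_array and last_prime are hypothetical; each array is created, used, and released exactly once with delete[].

```cpp
#include <cassert>
#include <cstddef>

// Creates a value-initialized anonymous array, sums it, and releases it.
int sum_of_fresh_array(std::size_t size)
{
    int* p = new int[size]();      // trailing (): every element starts as 0
    int total = 0;
    for (std::size_t i = 0; i != size; ++i)
    {
        assert(p[i] == 0);
        total += p[i];
    }
    delete[] p;                    // array form, exactly once
    return total;
}

// C++11: explicit initial values for an anonymous array.
int last_prime()
{
    int* p = new int[8] { 2, 3, 5, 7, 11, 13, 17, 19 };
    int last = p[7];
    delete[] p;
    return last;
}
```

Note that this sketch inherits the exception-safety problem mentioned at the top of this FAQ: if anything between new and delete[] threw, the array would leak.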
Please explain to me how the b pointer shows the last element.
Every time, it prints out the last element, no matter how long the array is. If I print *b alone with cout, it shows a number from outside the array.
#include <iostream>
#include <stdio.h>
using namespace std;
int main()
{
int a[] = {1,2,3,4,5,6,7,8,9,10,11};
int *b =(int*) (&a+1);
cout << *(b-1) << endl;
return 0;
}
This expression
&a+1
has the type int ( * )[11] and points to the memory after the last element of the array a.
In this declaration
int *b =(int*) (&a+1);
you cast that expression to the pointer type int *, so b points just past the last element of the array a. Instead, you could write
int *b = a + 11;
So the expression
b - 1
points to the last element of the array a.
Thus you may imagine the expression *( b - 1 ) the following way
*( a + 11 - 1 ) => *( a + 10 ) => a[10]
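As a sketch of that equivalence (the function name last_element is hypothetical), the following checks that a + 11, the cast from the question, and std::end(a) from C++11 all denote the same one-past-the-end address:

```cpp
#include <cassert>
#include <iterator>   // std::end (C++11)

// Returns the last element of an 11-element array via a one-past-the-end pointer.
int last_element()
{
    int a[] = {1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11};

    int* b = a + 11;                 // one past the last element, no cast needed
    assert(b == std::end(a));        // same address as the standard helper
    assert(b == (int*)(&a + 1));     // same address as the cast in the question

    return *(b - 1);                 // a[10]
}
```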
Per pointer arithmetic rules, incrementing/decrementing a pointer by N elements will adjust the value of the pointer by N * sizeof(T) bytes, where T is the dereferenced type of the pointer.
&a is a pointer to the array itself, which has a type of int[11], so you have a pointer of type int(*)[11] to the beginning of the array. Let's call this A1 in the diagram below.
Adding 1 to that pointer will advance it by sizeof(int[11]) (aka sizeof(int)*11) bytes, thus producing a new int(*)[11] pointer to the memory address immediately following the entire array. Let's call this A2 in the diagram.
You are then type-casting that new pointer, so now you have a pointer of type int* to the end of the array. This is the memory address you are assigning to your int *b pointer variable. Let's call this B1 in the diagram below.
Subtracting 1 from that pointer will reduce it by sizeof(int) bytes, thus producing a new int* pointer to the memory address of the last int element in the array. Let's call this B2 in the diagram below.
So, when you dereference b - 1 to print the int that it is pointing at, you are printing the value of the last int in the array. If you dereference b without decrementing it, it is pointing past the end of the array, and you have undefined behavior. You might just print out random garbage, or you might crash your app. Anything could happen.
-------------------------------------------------------------------
|  1  |  2  |  3  |  4  |  5  |  6  |  7  |  8  |  9  | 10  | 11  |
-------------------------------------------------------------------
^                                                           ^     ^
|                                                           |     |
|_A1 -------------------------------------------------------|---->|_A2
                            +1                              |     |
                                                            |_B2<-|_B1
                                                              -1
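The two step sizes can be checked directly. This is only a sketch: byte_distance and check_steps are hypothetical helpers, and the distance is measured by viewing both addresses as char pointers.

```cpp
#include <cassert>
#include <cstddef>

// Byte distance between two addresses (hypothetical helper for illustration).
std::ptrdiff_t byte_distance(const void* from, const void* to)
{
    return static_cast<const char*>(to) - static_cast<const char*>(from);
}

void check_steps()
{
    int a[11] = {};

    // &a + 1 steps over the whole array: sizeof(int[11]) bytes.
    assert(byte_distance(&a, &a + 1) == static_cast<std::ptrdiff_t>(sizeof(int) * 11));

    // a + 1 steps over a single element: sizeof(int) bytes.
    assert(byte_distance(a, a + 1) == static_cast<std::ptrdiff_t>(sizeof(int)));
}
```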
I'm not able to understand this statement of code that I came across during my interview.
int(*(*ptr[3])(char*))[2];
I've tried looking at an IDE but all I have is that it is an array of data type
int (*(*[3])(char *))
I wasn't able to understand this.
Maybe you could break it down one step at a time to understand the syntax better. First, start with a simpler definition without the array notation
int(*(*ptr)(char*));
So ptr is a function pointer that takes a char pointer as an argument and returns a pointer to an int. Now extending it to the array notation
int(*(*ptr[3])(char*))[2];
which means you have an array of function pointers, each of which will take a char pointer argument and return a pointer to an array of two integers.
You can see this working if you make a function call using the pointers you define. Note that the functions below are for demonstration purposes only and do not serve any logical purpose
#include <iostream>
static int arr[2] = { 2, 2 };
// define 'bar' as a function that accepts char* and returns
// int(*)[2], i.e. a pointer to an array of two ints
int (*bar(char * str))[2] {
return &arr;
}
int main() {
// array of three function pointers, elements not initialized yet
int(*(*foo[3])(char*))[2];
char ch = 'f';
// as long as the signatures for the function pointer and
// bar matches, the assignment below shouldn't be a problem
foo[0] = bar;
// invoking the function by de-referencing the pointer at foo[0]
// Use 'auto' for C++11 or declare ptr as int (*ptr)[2]
auto *ptr = (*foo[0])(&ch);
return 0;
}
You have to unfold the type from the inside out and recall that [] and () (on the right) bind more tightly than * (on the left). To override this binding, parentheses () are used.
int(*(*ptr[3])(char*))[2];
^   ^ ^^  ^   ^       ^
|   | ||  |   |       |
|   | |ptr is |       |
|   | |   |   |       |
|   | |   an array of three
|   | |       |       |
|   | pointers to     |
|   |         |       |
|   |         a function taking a char* and returning
|   |                 |
|   a pointer to      |
|                     an array of two
ints
i.e. "ptr is an array of three pointers to a function taking a char* and returning a pointer to an array of two ints".
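One way to double-check such a reading is to rebuild the type step by step from C++11 alias declarations and let the compiler compare the result with the raw declarator. The names IntPair, Handler, HandlerTable, give_pair and first_via_table below are hypothetical and only serve this sketch.

```cpp
#include <cassert>
#include <type_traits>

using IntPair      = int[2];            // an array of two ints
using Handler      = IntPair* (char*);  // a function taking a char* and returning a pointer to IntPair
using HandlerTable = Handler* [3];      // an array of three pointers to such functions

int (*(*ptr[3])(char*))[2];             // the declaration from the question

static_assert(std::is_same<decltype(ptr), HandlerTable>::value,
              "the aliases spell the same type as the raw declarator");

static int two_ints[2] = { 2, 2 };

// Hypothetical function whose type matches Handler.
IntPair* give_pair(char*) { return &two_ints; }

int first_via_table()
{
    HandlerTable table = { give_pair, nullptr, nullptr };
    assert(table[0] != nullptr);        // only call slots that were filled in
    char c = 'x';
    return (*table[0](&c))[0];          // call, dereference the array pointer, index
}
```

If the reading were wrong, the static_assert would fail at compile time.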
int (*ptr)(char (*ch)[]);
What does the above declaration mean? Does it mean
ptr is a pointer to a function that accepts an argument which is an array of pointers to characters, and returns an integer?
How do I evaluate it?
ptr is a pointer to a function that accepts an argument which is a pointer to an array of characters, and returns an integer.
There's a rule for this: http://ieng9.ucsd.edu/~cs30x/rt_lt.rule.html
Briefly, you should start from the identifier, then parse everything from the identifier to the right (it can be () for a function or [] for an array), then parse everything from the identifier to the left. Parentheses change this order: you should parse everything in the innermost parentheses first and so on, just as with arithmetic expressions.
In other words, there is an order of precedence (which can be changed by parentheses), from higher to lower:
1) () - function and [] - array, from left to right;
2) * - pointer, type, type modifier, from right to left.
Your example
int (*ptr)(char (*ch)[])
We start from identifier
int (*ptr)(char (*ch)[]);  // (1)ptr
      |_|
       1
Identifier ptr is in parentheses, so we parse everything in the parentheses first
(*ptr)  // (1)ptr
  |_|
   1
There's nothing to the right, so we parse to the left
(*ptr)  // (1)ptr is (2)a pointer
 ||_|
 2 1
We have finished inside the parentheses, so now we parse to the right of them
int (*ptr)(char (*ch)[]);  // (1)ptr is (2)a pointer to (3)function
     ||_| |____________|
     2 1        3
For now we ignore the function arguments and parse to the left of the parentheses
int (*ptr)(char (*ch)[]);  // (1)ptr is (2)a pointer to (3)function which returns (4)int
|_|  ||_| |____________|
 4   2 1        3
In the same way, we parse the function's argument (I've inserted some spaces for better alignment)
char (* ch )[ ]  // (1)ch is (2)a pointer to (3)array of (4)chars
|___| | |_| |_|
  4   2  1   3
Finally, we have:
ptr is a pointer to function which returns int and accepts a pointer to array of chars as argument
As you have written, ptr is a pointer to a function that returns int, and takes as an argument a pointer to an array of char.
However, before C++17, a C++ function parameter is not allowed to have the type "pointer to an array of unknown bound" (C permits it, and so do C++17 and later). So, under the older C++ standards, your variable is incorrectly specified, and a conforming compiler will reject it. It seems you want ptr to have a type that can accept a pointer to a function that can take any size array. This requires a template construction. For a function argument, it would have the form:
template <unsigned N>
int foo (int (*ptr)(char (*)[N])) {
//...
}
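As a sketch of how such a template might be used: the array bound N is deduced from the parameter type of the function that is passed in. The names call_with_buffer, size_of_4 and size_of_9 are hypothetical and not part of the original answer.

```cpp
#include <cassert>

// N is deduced from the array bound in the callback's parameter type.
template <unsigned N>
int call_with_buffer(int (*ptr)(char (*)[N]))
{
    assert(ptr != nullptr);
    char buffer[N] = {};          // an array of exactly the size the callback expects
    return ptr(&buffer);
}

// Hypothetical callbacks: each reports the size of the array it was handed.
int size_of_4(char (*arr)[4]) { return static_cast<int>(sizeof *arr); }
int size_of_9(char (*arr)[9]) { return static_cast<int>(sizeof *arr); }
```

Here call_with_buffer(size_of_4) instantiates the template with N = 4, and call_with_buffer(size_of_9) with N = 9, without either bound being spelled out at the call site.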
Normally, the way to simplify such types is to use typedef to represent the complicated parts, so that the variable itself becomes a simple pointer to some type. This is particularly useful when trying to write a function that returns a function pointer.
void x (char *s) {}
typedef void xtype (char *);
void (* y_hard ())(char *) { return x; }
xtype * y_easy () { return x; }
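Assuming C++11 is available, the same simplification can also be written with an alias declaration or a trailing return type. This is only a sketch; XType, y_alias, y_trailing and same_target are hypothetical names.

```cpp
#include <cassert>
#include <type_traits>

void x(char*) {}

using XType = void (char*);                          // alias for the function type
XType* y_alias() { return x; }                       // alias version of y_easy
auto y_trailing() -> void (*)(char*) { return x; }   // trailing return type

static_assert(std::is_same<decltype(y_alias), decltype(y_trailing)>::value,
              "both spellings declare the same function type");

// Both variants hand back a pointer to x.
bool same_target()
{
    assert(y_alias() == y_trailing());
    return y_alias() == &x;
}
```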
However, the parameterized nature of the function argument makes that harder to achieve. Assuming C++11, you can use the following construct (thanks to this answer):
template <unsigned N>
using ArrayArg = const char [N];
template <unsigned N>
using Function = int (ArrayArg<N> *);
template <unsigned N>
int foo (Function<N> *ptr) {
//...
}
It works fine in GCC.
Yes, ptr is a function pointer. It is a pointer to a function returning an integer and accepting a pointer to a character array as argument.
Consider the function fun with the following prototype,
int fun(char (*ptr)[]);
fun() is a function accepting a pointer to a character array as argument,
and the following piece of code compiles without any error or warning:
int (*ptr)(char (*ch)[]);
ptr=fun;