What's the difference between the two lines below in the main() function? Both arrays end up filled with 0s. I expect array q to be filled with 0s since it's value-initialized. However, array p is also filled with 0s.
int main() {
int *p = new int[3];
int *q = new int[3]();
}
q is guaranteed to be filled with 0s.
p points to uninitialized memory and will therefore be filled with whatever data happened to be left in that memory location from previous use. It might be zeroes, but it might be anything else; you can't rely on it being set to any particular value, so you have to write to that memory before you read from it, or you'll invoke Undefined Behavior.
() is the initializer. It value-initializes the array, which for int sets every element to zero. If you don't provide an initializer, default initialization takes place, which, in the case of int *p = new int[3];, results in an array filled with indeterminate values ("garbage") which may or may not be 0.
I'm a programming and C++ novice. I'd appreciate some help with this.
The following C++ program doesn't run into any problem, either at compile time or at run time:
int main()
{
int b = 5;
int*a = &b;
*(a+5) = 6;
return 0;
}
But according to everything I learned it shouldn't work, because a is a pointer to a single variable. What am I missing here?
Your program should indeed not encounter any problem at compile time. It is all valid code with regards to compilation.
However, it has undefined behaviour at run time, because a+5 is not a valid address: a points to a single int, so even forming a+5 is undefined, let alone writing through it.
If you want to know why it should compile, consider code like this, where the compiler cannot know how many ints the pointer buf really refers to:
#include <cstddef>
void func( int * buf, std::size_t size )
{
for( std::size_t i = 0; i < size; ++i )
{
*(buf + i) = static_cast<int>(i); // or (int)i in C
}
}
int main()
{
int buf[ 6 ];
func( buf, 6 );
}
In your code a is a pointer to memory. a + 5 means an address 5 "ints" on from where a points. As a points at a single integer b, there are no guarantees about such an address; even computing it is undefined. Interestingly enough, it is well defined to form a+1 (for pointer arithmetic a single object behaves like an array of one element, so a+1 is its "one past the end" pointer), even though you must not read from or write to the place it points to. The pointer itself has some guarantees: it compares greater than a, subtracting 1 from it gets you back to a, and the difference (a+1) - a (a std::ptrdiff_t) is 1. But that is just a special property of "one past the end", which allows programmers to specify memory ranges.
The program does have undefined behaviour:
int main()
{
//This causes main's stack frame to reserve memory for variable b.
//It could be in a memory page that was already allocated to the program
//or in a newly allocated page.
int b = 5;
//Here a just gets the address of variable b.
int*a = &b;
//This is the undefined behavior, and in practice it usually ends up in one of two cases:
// 1. If (a+5) is in a memory area that is mapped for the application,
// no runtime error happens and the value is written there,
// probably overwriting some other value, which can cause unpredictable behavior later
// in the application's execution.
// 2. If (a+5) is in a memory area that isn't mapped for the application,
// the application will crash (e.g. with a segmentation fault).
*(a+5) = 6;
return 0;
}
Now, since a page is typically 4096 bytes and b is somewhere within a page, a+5 (only 20 bytes further, assuming 4-byte ints) will in most cases still be in the same page. If you want to challenge it more, change the 5 to 5000 or higher, and the chance of a crash will increase.
Yes, it shouldn't work when you access memory that is not in your process's address space, but the region at (a + 5) may happen to be mapped for your process, in which case no illegal memory access is detected at run time; or it may not be. Hence it's UB.
Just adding to the existing answers.
The access *(a+5) is the same as a[5].
So this is a location not allocated by you.
In the case of an array, say
int a[6];
You have a valid access from a[0] to a[5] where a[5] is the last element of the array and any further access like a[6] will lead to undefined behavior as that location is not allocated by you.
Similarly, here you just have a single integer allocated:
int b=5;
int *a = &b;
a is a pointer holding &b, i.e. the address of b.
So the valid access for this is just a[0] which is the only location allocated by you on the stack.
Any other access like a[1] a[2]... and so on will lead to undefined behavior.
The access turns out to be VALID if you have something like
int b[6];
int *a = b;
Now a[5] will give the value of the last element of the array b.
I'm trying to understand the different ways of declaring an array (of one or two dimensions) in C++ and what exactly they return (pointers, pointers to pointers, etc.)
Here are some examples:
int A[2][2] = {0,1,2,3};
int A[2][2] = {{0,1},{2,3}};
int **A = new int*[2];
int *A = new int[2][2];
In each case, what exactly is A? Is it a pointer, double pointer? What happens when I do A+1? Are these all valid ways of declaring matrices?
Also, why does the first option not need the second set of curly braces to define "columns"?
Looks like you got a plethora of answers while I was writing mine, but I might as well post my answer anyway so I don't feel like it was all for nothing...
(All sizeof results were taken from a VC2012 32-bit build; pointer sizes would, of course, double in a 64-bit build.)
#include <cstddef> // size_t
size_t f0(int* I);
size_t f1(int I[]);
size_t f2(int I[2]);
int main(int argc, char** argv)
{
// A0, A1, and A2 are local (on the stack) two-by-two integer arrays
// (they are technically not pointers)
// nested braces not needed because the array dimensions are explicit [2][2]
int A0[2][2] = {0,1,2,3};
// nested braces needed because the array dimensions are not explicit,
//so the braces let the compiler deduce that the missing dimension is 2
int A1[][2] = {{0,1},{2,3}};
// this still works, of course. Very explicit.
int A2[2][2] = {{0,1},{2,3}};
// A3 is a pointer to an integer pointer. New constructs an array of two
// integer pointers (on the heap) and returns a pointer to the first one.
int **A3 = new int*[2];
// if you wanted to access A3 with a double subscript, you would have to
// make the 2 int pointers in the array point to something valid as well
A3[0] = new int[2];
A3[1] = new int[2];
A3[0][0] = 7;
// this one doesn't compile because new int[2][2] returns int(*)[2],
// not "pointer to int", so it is commented out here
// int *A4_1 = new int[2][2];
// this edit of the above works but can be confusing
int (*A4_2)[2] = new int[2][2];
// it allocates a two-by-two array of integers and returns a pointer to
// where the first integer is, however the type of the pointer that it
// returns is "pointer to integer array"
// now it works like the 2by2 arrays from earlier,
// but A4_2 is a pointer to the **heap**
A4_2[0][0] = 6;
A4_2[0][1] = 7;
A4_2[1][0] = 8;
A4_2[1][1] = 9;
// looking at the sizes can shed some light on subtle differences here
// between pointers and arrays
A0[0][0] = sizeof(A0); // 16 // typeof(A0) is int[2][2] (2by2 int array, 4 ints total, 16 bytes)
A0[0][1] = sizeof(A0[0]); // 8 // typeof(A0[0]) is int[2] (array of 2 ints)
A1[0][0] = sizeof(A1); // 16 // typeof(A1) is int[2][2]
A1[0][1] = sizeof(A1[0]); // 8 // typeof(A1[0]) is int[2]
A2[0][0] = sizeof(A2); // 16 // typeof(A2) is int[2][2]
A2[0][1] = sizeof(A2[0]); // 8 // typeof(A2[0]) is int[2]
A3[0][0] = sizeof(A3); // 4 // typeof(A3) is int**
A3[0][1] = sizeof(A3[0]); // 4 // typeof(A3[0]) is int*
A4_2[0][0] = sizeof(A4_2); // 4 // typeof(A4_2) is int(*)[2] (pointer to array of 2 ints)
A4_2[0][1] = sizeof(A4_2[0]); // 8 // typeof(A4_2[0]) is int[2] (the first array of 2 ints)
A4_2[1][0] = sizeof(A4_2[1]); // 8 // typeof(A4_2[1]) is int[2] (the second array of 2 ints)
A4_2[1][1] = sizeof(*A4_2); // 8 // typeof(*A4_2) is int[2] (different way to reference the first array of 2 ints)
// confusion between pointers and arrays often arises from the common practice of
// allowing arrays to transparently decay (implicitly convert) to pointers
A0[1][0] = f0(A0[0]); // f0 returns 4.
// Not surprising because declaration of f0 demands int*
A0[1][1] = f1(A0[0]); // f1 returns 4.
// Still not too surprising because declaration of f1 doesn't
// explicitly specify array size
A2[1][0] = f2(A2[0]); // f2 returns 4.
// Much more surprising because declaration of f2 explicitly says
// it takes "int I[2]"
int B0[25];
B0[0] = sizeof(B0); // 100 == (sizeof(int)*25)
B0[1] = f2(B0); // also compiles and returns 4.
// Don't do this! just be aware that this kind of thing can
// happen when arrays decay.
return 0;
}
// these are always returning 4 above because, when compiled,
// all of these functions actually take int* as an argument
size_t f0(int* I)
{
return sizeof(I);
}
size_t f1(int I[])
{
return sizeof(I);
}
size_t f2(int I[2])
{
return sizeof(I);
}
// indeed, if I try to overload f0 like this, it will not compile, because
// the parameter int I[2] is adjusted to int*, making this a redefinition.
// MSVC complains that "function 'size_t f0(int *)' already has a body"
// size_t f0(int I[2])
// {
// return sizeof(I);
// }
Yes, this sample has tons of signed/unsigned int mismatches, but that part isn't relevant to the question. Also, don't forget to delete everything created with new and delete[] everything created with new[].
EDIT:
"What happens when I do A+1?" -- I missed this earlier.
Operations like this would be called "pointer arithmetic" (even though I called out toward the top of my answer that some of these are not pointers, but they can turn into pointers).
If I have a pointer P to the first element of an array of someType, then subscript access P[n] is exactly the same as using the syntax *(P + n). The compiler takes into account the size of the type being pointed to in both cases. So the resulting machine code effectively computes the byte address (char*)P + n*sizeof(someType), or equivalently (char*)P + n*sizeof(*P), because the physical CPU doesn't know or care about all our made-up "types". In the end, all pointer offsets have to be a byte count. For consistency, using array names like pointers works the same way here.
Turning back to the samples above: A0, A1, A2, and A4_2 all behave the same with pointer arithmetic.
A0[0] is the same as *(A0+0), which references the first int[2] of A0
similarly:
A0[1] is the same as *(A0+1) which offsets the "pointer" by sizeof(A0[0]) (i.e. 8, see above) and it ends up referencing the second int[2] of A0
A3 acts slightly differently. This is because A3 is the only one that doesn't store all 4 ints of the 2-by-2 array contiguously. In my example, A3 points to an array of 2 int pointers, each of which points to a completely separate array of two ints. Using A3[1] or *(A3+1) would still end up directing you to the second of the two int arrays, but it would do so by offsetting only 4 bytes (sizeof(int*) with 32-bit pointers) from the beginning of the pointer array, which gives you a pointer that tells you where to find the second two-int array. I hope that makes sense.
For the array declaration, the first specified dimension is the outermost one, an array that contains other arrays.
For the pointer declarations, each * adds another level of indirection.
The syntax was designed, for C, to let declarations mimic the use. Both the C creators and the C++ creator (Bjarne Stroustrup) have described the syntax as a failed experiment. The main problem is that it doesn't follow the usual rules of substitution in mathematics.
In C++11 you can use std::array instead of the square brackets declaration.
Also, you can define a similar pointer type builder, e.g.
template< class T >
using ptr = T*;
and then write
ptr<int> p;
ptr<ptr<int>> q;
int A[2][2] = {0,1,2,3};
int A[2][2] = {{0,1},{2,3}};
These declare A as array of size 2 of array of size 2 of int. The declarations are absolutely identical.
int **A = new int*[2];
This declares a pointer to pointer to int, initialized to point to the first of two newly allocated (and uninitialized) int pointers. You must also allocate memory for these two pointers to point to if you want to use it as a two-dimensional array.
int *A = new int[2][2];
And this doesn't compile because the type of the right-hand side is pointer to array of size 2 of int (int (*)[2]), which cannot be converted to pointer to int.
In all valid cases A + 1 is the same as &A[1], that means it points to the second element of the array, that is, in case of int A[2][2] to the second array of two ints, and in case of int **A to the second pointer in the array.
The other answers have covered the other declarations but I will explain why you don't need the braces in the first two initializations. The reason why these two initializations are identical:
int A[2][2] = {0,1,2,3};
int A[2][2] = {{0,1},{2,3}};
is because it's covered by aggregate initialization. Braces are allowed to be "elided" (omitted) in this instance.
The C++ standard provides an example in § 8.5.1:
[...]
float y[4][3] = {
{ 1, 3, 5 },
{ 2, 4, 6 },
{ 3, 5, 7 },
};
[...]
In the following example, braces in the initializer-list are elided; however the initializer-list has the same effect as the completely-braced initializer-list of the above example,
float y[4][3] = {
1, 3, 5, 2, 4, 6, 3, 5, 7
};
The initializer for y begins with a left brace, but the one for y[0] does not, therefore three elements from the list are used. Likewise the next three are taken successively for y[1] and y[2].
OK, I will try to explain it to you:
The first one is an initialization. You create a two-dimensional array with the values:
A[0][0] -> 0
A[0][1] -> 1
A[1][0] -> 2
A[1][1] -> 3
The second one is exactly the same as above, but here you use nested braces. Always do it like this; it's easier to read.
int **A means you have a pointer to a pointer to int. new int*[2] reserves memory for 2 pointers to int.
The last one, int *A = new int[2][2];, won't compile.
int A[2][2] = {0,1,2,3};
int A[2][2] = {{0,1},{2,3}};
These two are equivalent.
Both mean: "I declare a two-dimensional array of integers. The array is of size 2 by 2".
Memory, however, is not two-dimensional; it is not laid out in grids, but (conceptually) in one long line. In a multi-dimensional array, each row is just allocated in memory right after the previous one.
Because of this, we can go to the memory address pointed to by A and either store two lines of length 2, or one line of length 4, and the end result in memory will be the same.
int **A = new int*[2];
Declares a pointer to a pointer called A.
A stores the address of the first element of an array of 2 pointers to int, allocated on the heap. Those two pointers are not initialized yet, so they don't point to any ints.
int *A = new int[2][2];
A is a pointer to an int.
That int is the beginning of a 2x2 int array allocated in the heap.
Apparently this is invalid:
prog.cpp:5:23: error: cannot convert ‘int (*)[2]’ to ‘int*’ in initialization
int *A = new int[2][2];
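But due to what we saw with the first two, this will work and gives the same memory layout (4 contiguous ints), although you then index it as A[row*2 + col] rather than A[row][col]:
int *A = new int[4];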
int A[2][2] = {0,1,2,3};
A is an array of 4 ints (strictly, an array of 2 arrays of 2 ints). For the coder's convenience, he has decided to declare it as a two-dimensional array so the compiler will let the coder access it as one. The coder has initialized all elements linearly, as they are laid out in memory. In an expression, A decays to a pointer to its first row (type int (*)[2]), so A + 1 (after applying pointer math) offsets A by the size of one row, i.e. 2 ints. Since the address of a row is also the address of its first element, A + 1 points to the first element of the second row of the array, value 2.
Edit: Accessing a two-dimensional array with a single subscript operates along the first dimension and yields a whole row. So A[1] is the second row (an int[2]), which decays to a pointer to its first element; A[1] is therefore equivalent to &A[1][0] (not to A[1][0]). A + 1 is the equivalent pointer addition, giving a pointer to that same row.
int A[2][2] = {{0,1},{2,3}};
A is an array of 4 ints (again, strictly 2 arrays of 2 ints). For the coder's convenience, he has decided to declare it as a two-dimensional array so the compiler will let the coder access it as one. The coder has initialized elements by rows. For the same reasons as above, A + 1 points to the row that starts with value 2.
int **A = new int*[2];
A is a pointer to int pointer that has been initialized to point to an array of 2 int pointers. Since A is a pointer, A + 1 takes the value of A, which is the address of the pointer array (and thus of its first element), and adds 1 (pointer math, so the step is sizeof(int*)), so it now points to the second element of the array. As the array was not initialized, reading *(A + 1), or dereferencing the pointer stored there, is dangerous (who knows what value is there and what it would actually point to, if it's even a valid address).
int *A = new int[2][2];
Edit: as Jarod42 has pointed out, this is invalid. I think this may be closer to what you meant. If not, we can clarify in the comments.
int *A = new int[4];
A is a pointer to int that has been initialized to point to an anonymous array of 4 ints. Since A is a pointer, A + 1 takes the value of A, which is the address of the int array (and thus of its first element), and adds 1 (pointer math), so it now points to the second element of the array.
Some takeaways:
In the first two cases, A is the address of an array while in the last two, A is the value of the pointer which happened to be initialized to the address of an array.
In the first two, A cannot be changed once initialized. In the latter two, A can be changed after initialization and point to some other memory.
That said, you need to be careful with how you might use pointers with an array element. Consider the following:
int *a = new int(5);
int *b = new int(6);
int c[2] = {*a, *b};
int *d = a;
c+1 is not the same as d+1. In fact, dereferencing d+1 is undefined behavior. Why? Because c is an array of int that has been initialized by copying the values *a and *b. That means c names a chunk of memory where the first int holds a copy of the value pointed to by a, and the next int holds a copy of the value pointed to by b. On the other hand, d is just the address held in a, i.e. a single int allocated by new int(5). So c and d refer to different memory (c != d), and therefore there is no reason that c + 1 == d + 1.
In this code:
int * p = new int(44);
p is allocated on the heap and the value it points to is 44.
But now I can also do something like this:
p[1] = 33;
without getting an error. I always thought
int * p = new int(44);
was just another way of saying "P is allocated on the heap and points to an address containing 44", but apparently it makes p a pointer to an array of ints? Is the size of this new array 44? Or is this result unusual?
You were right: p points to a single int on the heap that contains 44. There's no array allocated. p[1] = 33; is what they call "undefined behavior". Your program might crash, but it's not guaranteed to crash every single time you do this.
int *p_scalar = new int(5); //allocates an integer, set to 5.
If you access p_scalar[n] with n != 0, the behavior is undefined; it may crash, or it may silently corrupt other memory.
In your example, the C++ language defines the subscript operator for built-in pointers in terms of pointer arithmetic. Conceptually it behaves something like this:
template <typename T>
T& subscript(T* ptr, std::ptrdiff_t index)
{
return *(ptr + index); // ptr + index is already scaled by sizeof(T)
}
You can overload operator[] for your own class types, but not for built-in types such as int*; for those, ptr[index] always means *(ptr + index). So when you say:
int * pointer = alocate_int_array();
pointer[1] = 1;
you're using the built-in subscript operator that the compiler provides for every pointer type.
int* p_bob = new int;
*p_bob = 78;
The above code makes sense to me. new allocates the memory, and I use the dereference operator to assign it a value of 78.
int* p_dynint = new int[10];
*p_dynint[2] = 12;
This, however, doesn't make sense. If I try to use the dereference operator on p_dynint[2], I get an error. Why would an array be any different?
*p_bob = 78; this assigns the value 78 to the memory pointed to by p_bob (which represents an int).
p_dynint[2] = 12; simply accesses the 3rd element.
p_dynint[2] is actually equivalent to *(p_dynint+2).
p_dynint[2] is equivalent to *(p_dynint + 2). The dereferencing is implied in the [] operator.
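It is no real problem to write elements with explicit pointer arithmetic instead:
#include <iostream>
int* p_dynint=new int[10];
//write first element
*p_dynint=10;
//write second element
*(p_dynint+1)=20;
//write three elements to std::cout
//(p_dynint[10] is past the last valid index 9: undefined behavior)
std::cout<<p_dynint[0]<<' '<<p_dynint[1]<<' '<<p_dynint[10]<<std::endl;
delete[] p_dynint;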
This example also highlights a problem with raw arrays: nothing stops you from reading and writing out of bounds. p_dynint[10] is past the last valid index (9), so reading it is undefined behavior; in practice you typically get the next few bytes after the array interpreted as an int.
Use containers (e.g. std::vector) if possible (for further reasoning read this).