C++ negative array index

I allocated an int array of 3 elements and tried the code below:
int a[3];
for(int i = -2; i < 3; ++i){
a[i] = i;
cout<<a[i]<<" ";
}
Here's its output:
-2 -1 0 1 2
It seems as if array a has space for 5 elements, with a sitting in the middle of that space.
Any ideas?

To explain how negative indexes work, you first have to learn (or remember) that for any array or pointer a and index i, the expression a[i] is equal to *(a + i).
That means you can have a pointer to the middle element of an array and use it with a positive or negative index. It's simple arithmetic, as long as the result stays inside the array.
Example:
int a[3] = { 1, 2, 3 };
int* p = &a[1]; // Make p point to the second element
std::cout << p[-1]; // Prints the first element of a, equal to *(p - 1)
std::cout << p[ 0]; // Prints the second element of a, equal to *p
std::cout << p[ 1]; // Prints the third element of a, equal to *(p + 1)
Somewhat graphically it can be seen like
+------+------+------+
| a[0] | a[1] | a[2] |
+------+------+------+
   ^      ^      ^
   |      |      |
  p-1     p     p+1
Now, when a negative index is used directly on an array, as with a in the question, the picture is more like this:
+-----+-------+-------+------+------+------+
| ... | a[-2] | a[-1] | a[0] | a[1] | a[2] |
+-----+-------+-------+------+------+------+
That is, negative indexes here will be out of bounds of the array, and will lead to undefined behavior.

That is something you should never do! C++ doesn't check the bounds of built-in arrays, so technically you can access locations outside the allocated space (which is only 3 ints, not 5), but doing so is undefined behavior and will eventually produce errors.

When I ran this in my own IDE (Visual Studio 2017), it threw an exception stating that the stack around the array had become corrupted. What is happening is that space for 3 ints is allocated, but you force the program to write to five slots, two of which lie before the allocated ones. It may appear to work, but it is undefined behavior: you are writing over whatever else happens to be stored next to the array, and that can have bad consequences.

Related

How is char* textMessages[] formatted in memory?

As we know, a multidimensional array like int array1[3][2] = {{0, 1}, {2, 3}, {4, 5}}; is contiguous, so it is laid out exactly like int array2[6] = { 0, 1, 2, 3, 4, 5 };
for (int *p = *array1, i = 0; i < 6; i++, p++)
{
std::cout << *p << std::endl;
}
0
1
2
3
4
5
Then I have this code:
char *textMessages[] = {
"Small text message",
"Slightly larger text message",
"A really large text message that ",
"is spread over multiple lines*"
};
I find that its layout is not the same as int[3][2]:
For each message, such as "Small text message", the characters are contiguous with a stride of 1 (sizeof(char) == 1).
However, the final element of "Small text message" (e) and the first element of "Slightly larger text message" (S) are not contiguous in memory:
char *textMessages[] = {
"Small text message",
"Slightly larger text message",
"A really large text message that ",
"is spread over multiple lines*"
};
char *a = *(textMessages)+17, *b = *(textMessages + 1), *c = *(textMessages + 1) + 27, *d = *(textMessages + 2), *e = *(textMessages + 2) + 31, *f = *(textMessages + 3);
std::ptrdiff_t a_to_b = b - a, c_to_d = d - c, e_to_f = f - e;
printf("they(a b c d e f) are all messages' first or final element: %c %c %c %c %c %c\n", *a, *b, *c, *d, *e, *f);
printf("\n\naddress of element above: \n%p\n%p\n%p\n%p\n%p\n%p\n", (void*)a, (void*)b, (void*)c, (void*)d, (void*)e, (void*)f);
printf("\n\nptrdiff of a to b, c to d and e to f: %td %td %td\n", a_to_b, c_to_d, e_to_f);
they(a b c d e f) are all messages' first or final element: e S e A t i
address of element above:
002F8B41
002F8B44
002F8B5F
002F8B64
002F8B83
002F8B88
ptrdiff of a to b, c to d and e to f: 3 5 5
My question is:
What does 3 5 5 mean here?
Why 3 5 5, not 5 5 5
What's the layout here?
Edit:
I don't think this question is a duplicate of "How are multi-dimensional arrays formatted in memory?", because what I am asking is different from that question, and its answers don't solve my problem.
How is char* textMessages[] formatted in memory?
Just like other single dimensional arrays. Each element is stored in a consecutive memory location. Those elements are pointers to char object.
Each of those pointers point to the beginning of a string literal. String literals have static storage duration, and their memory location is implementation defined.
What does 3 5 5 mean here?
You've done pointer subtraction between pointers that do not point to the same array (each string literal is a separate array). The behaviour of the program is technically undefined because of this.
In practice, most of the time, what you get is the distance between the pointed-to objects in memory. Since the location of those arrays is implementation defined, there is nothing meaningful about those values.
Why 3 5 5, not 5 5 5
Because the behaviour is undefined
Because that happens to be the distance between the pointed-to character objects. The distance depends on where the compiler chooses to store the string literals.
You can pick either explanation depending on your point of view.
PS. You are converting the string literals to a pointer to non-const char. This conversion has been deprecated ever since C++ was standardized and has been ill-formed since C++11.
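PPS. Accessing int *p = *array1 beyond the bounds of array1[0], which has size 2, as in your first code snippet, technically has undefined behaviour. The offsets used in the second snippet (+17, +27 and +31) do stay inside their respective string literals; the problem there is the subtraction between pointers into different literals.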
String literals have static storage duration, meaning that they are allocated in memory when the program starts, but they are not guaranteed to be contiguous with one another; they exist independently of the array that later stores pointers to them.
When the array is then constructed, the addresses of those strings are placed in contiguous memory (but, of course, not the strings themselves).
P.S.
What I refer to as "strings" above really means "string literals".
From a language-lawyer point of view, this:
int array1[3][2] = {{0, 1}, {2, 3}, {4, 5}};
for (int *p = *array1, i = 0; i < 6; i++, p++)
{
std::cout << *p << std::endl;
}
is undefined per the standard, because array1 is an array of 3 arrays of size 2. 0 and 1 are in the same inner array, but 1 and 2 are not. So the first increment of the pointer is fine, and the second makes it point one past the end of the first inner array (which is still allowed), but dereferencing it there is formally UB.
In practice, all current and past implementations accept it.
But this is a quite different animal:
char *textMessages[] = {
"Small text message",
"Slightly larger text message",
"A really large text message that ",
"is spread over multiple lines*"
};
Here textMessages is an array of pointers, not a 2D array. But it is even worse: it is an array of 4 char * pointers pointing to string literals, and modifying a string literal is undefined behaviour. That means that textMessages[0][0] = 'X'; is likely to crash the program.
But once we know we have an array of pointers to string literals, everything becomes clear: the compiler has stored the string literals in memory the way it wanted and has just given us pointers to that memory. So 3, 5 and 5 are just the result of padding, because your compiler decided to store the literals that way.
3 5 5 means nothing here, and neither would 5 5 5.
char *textMessages[] is an array of char*, so its elements are pointers, and those pointers are contiguous in the array. But the values of the pointers are not closely related: the strings in your code may exist in different places.
The result on my compiler is: 243 309 1861

Memory allocation for a 2D array in c++

I'm new to C++ and trying to understand how memory is allocated for 2D arrays. I have gone through several threads on SO and these answers:
answer1
answer2
Those answers say that 2D arrays, like ordinary arrays, are allocated in contiguous memory (correct me if I'm wrong). Answer 2 mentions that it should be possible to access the array[i][j]th element using *(array + (i * ROW_SIZE) + j).
But when I'm trying to do that it gives me an unexpected result.
Code:
#include <iostream>
#include <string>
using namespace std;
int main()
{
int array[2][2] = { {1, 2}, {3, 4} };
cout<<*((array+1*2) + 1)<<endl;
cout<<**((array+1*2) + 1)<<endl;
return 0;
}
Result:
0x76e1f4799008
2012576581
It doesn't give me the array[i][j]th element. Can someone explain what is happening here: is that solution wrong, or have I misunderstood the previous answers?
First you have to get a pointer to the first element of the whole thing, i.e. not just to the first row:
int *p = &array[0][0];
After that, you can index p like you want:
std::cout << *(p + 1*2 + 1) << std::endl;
The reason why your initial tries didn't work is related to types. What is happening is that in your first try you are trying to evaluate
*(array + 1*2 + 1)
Now, since array is of type "array of 2 arrays of 2 ints", i.e. int[2][2], it decays to a pointer to its first row (int (*)[2]), and adding 3 increments it by 3 times the size of its elements, which are int[2]. This jumps outside your array. In fact *(array+3) is equivalent to array[3]. That clearly has type int[2], which decays to a pointer when you try to print it. In your second try, you dereference this, yielding something like array[3][0], which now has the right type but again lies outside the array.
It is more like simulating a 2D array with a one-dimensional array, like this:
0 1 2
3 4 5
6 7 8
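In memory the array is 0 1 2 3 4 5 6 7 8, so to access 4 you would write *(a[0] + (1*3) + 1). Here a[0] decays to an int* pointing at the first element; it has the same address as a (the name of the array), but a different type (int* rather than int (*)[3]), which is why the arithmetic moves one int at a time.
array is a 2D array, which decays to a pointer to its first row, so: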
**array = array[0][0]
*(*array + j) = array[0][j]
*(*(array+i) + j) = array[i][j]

How are 2-Dimensional Arrays stored in memory?

#include <iostream>
using namespace std;
int main()
{
int a[101][101];
a[2][0]=10;
cout<<a+2<<endl;
cout<<*(a+2)<<endl;
cout<<*(*(a+2));
return 0;
}
Why are the values of a+2 and *(a+2) same? Thanks in advance!
a is a 2D array, that means an array of arrays. But it decays to a pointer to an array when used in appropriate context. So:
in a+2, a decays to a pointer to int arrays of size 101. When you pass it to an ostream, you get the address it holds, which is the address of a[2] and therefore of its first element, &(a[2][0])
*(a+2) is by definition a[2]: an array of size 101 that starts at a[2][0]. It decays to a pointer to int, and when you pass it to an ostream you get the address of its first element, which is still &(a[2][0])
**(a+2) is by definition a[2][0]. When you pass it to an ostream you get its int value, here 10.
But beware: a + 2 and a[2] (after decay) both point to the same address (static_cast<void *>(a+2) is the same as static_cast<void *>(a[2])), but they are pointers to different types: the first points to an int array of size 101, the latter to an int.
I'll try to explain you how the memory is mapped by the compiler:
Let's consider a more practical example, a multidimensional array:
int a[3][3] = {{1, 2, 3}, {4, 5, 6}, {7, 8, 9}};
You can execute the command
x/10w a
In GDB and look at the memory:
0x7fffffffe750: 1 2 3 4
0x7fffffffe760: 5 6 7 8
0x7fffffffe770: 9 0
Each element is stored as an int (32 bits / 4 bytes).
So the elements of the matrix are stored at:
1) a[0][0] -> 0x7fffffffe750
2) a[0][1] -> 0x7fffffffe754
3) a[0][2] -> 0x7fffffffe758
4) a[1][0] -> 0x7fffffffe75c
5) a[1][1] -> 0x7fffffffe760
6) a[1][2] -> 0x7fffffffe764
7) a[2][0] -> 0x7fffffffe768
...
The statement:
std::cout << a + 2 << '\n';
prints the address 0x7fffffffe768 because of
pointer arithmetic:
The type of a is int[3][3], not int**; in this expression it decays to int (*)[3], a pointer to its first row.
a + 2 advances that pointer by two whole rows (2 * 3 ints), so the result is a pointer
to the third row.
*(a+2) dereferences it, giving the third row, {7,8,9}.
The third row is an array of int, which decays to a pointer to int (&a[2][0]).
Then operator<< prints the value of that pointer.
A 2-dimensional array is an array of arrays, so it's stored like this in memory:
char v[2][3] = {{1,3,5},{5,10,2}};
Content: |  1  |  3  |  5  |  5  | 10  |  2  |
Address:    p    p+1   p+2   p+3   p+4   p+5     (where p = &v[0][0])
To access v[x][y], the compiler effectively computes *(p + x * M + y), where M is the second dimension specified (the row length, 3 here).
For example, to access v[1][1], it computes *(p + 1*3 + 1) => *(p + 4), which is the element 10.
Be aware that this is not the same as a pointer to a pointer (char**).
A pointer to a pointer is not an array: it contains the address of a memory cell, which contains another address.
To access a member of a 2-dimensional array using a pointer to a pointer, this is what is done:
char **p;
/* Initialize it */
char c = p[3][5];
Go to the address specified by the content of p;
Add the offset to that address (3 in our case);
Go to that address and get its content (our new address).
Add the second offset to that new address (5 in our case).
Get the content of that address.
While to access the member via a traditional 2-dimensional array, these are the steps:
char p[10][10];
char c = p[3][5];
Take the address of p and add the first offset (3) multiplied by the row length (10).
Add the second offset (5) to the result.
Get the content of that address.
If you have an array like this
T a[N];
then the name of the array is implicitly converted to pointer to its first element with rare exceptions (as for example using an array name in the sizeof operator).
So, for example, in the expression ( a + 2 ), a is converted to type T * with value &a[0].
In your example with the array
int a[101][101];
in expression
a + 2
a is converted to an rvalue of type int ( * )[101] that points to the first "row" of the array. a + 2 points to the third "row" of the array. The type of the row is int[101].
The expression *(a+2) yields this third row, which has type int[101], that is, an array. Because this array is used in an expression, it is in turn converted to a pointer to its first element, of type int *.
That pointer holds the starting address of the memory area occupied by the third row.
Only the expression ( a + 2 ) has type int ( * )[101], while the expression *( a + 2 ) has type int *. But both yield the same value: the starting address of the memory area occupied by the third row of the array a.
The first element of an array is at the same location as the array itself - there is no "empty space" in an array.
In cout << a + 2, a is implicitly converted into a pointer to its first element, &a[0], and a + 2 is the location of a's third element, &a[2].
In cout << *(a + 2), the array *(a + 2) - that is, a[2] - is converted into a pointer to its first element, &a[2][0].
Since the location of the third element of a and the location of the first element of the third element of a are the same, the output is the same.

How to use pointers in for loops to count characters?

I was looking through the forums and saw a question about counting the numbers of each letter in a string. I am teaching myself and have done some research and am now starting to do projects. Here I have printed the elements of the array. But I do so without pointers. I know I can use a pointer to an array and have it increment for each value, but I need some help doing so.
Here is my code without the pointer:
#include <iostream>
int main() {
char alph [] = {'a', 'b', 'c'};
for (int i = 0; i < 3; i++)
{ std::cout << alph[i] << ' '; }
}
Here is my bad code that doesn't work trying to get the pointer to work.
main() {
char alph [] = {'a', 'b', 'c'};
char *p;
p = alph;
for (; p<=3; p++);
cout << *p;
return 0;
};
I hope that it's not too obvious of an answer; I don't mean to waste anyone's time. Also this is my first post so if anyone wants to give me advice, thank you.
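Very good try. There are two problems. First, your for header ends with a stray semicolon, so the loop body is empty and cout << *p runs only once, after the loop. Second, and more fundamentally, this comparison is wrong: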
p <= 3
Pointers are just some number which represents a memory address. When you do p = alph, you're not setting p to 0, you're setting it to point to the same address as alph. When looping over an array with pointers, you have to compare the current pointer with a pointer that is one past the end of the array. To get a pointer to one element past the end of the array, you add the number of elements to the array:
alph + 3 // is a pointer to one past the end of the array
Then your loop becomes
for (; p < alph + 3; ++p)
You may think that getting a pointer to one past the end of the array goes out of bounds of the array. However, the language specifically allows you to form a pointer one past the last element of an array, as long as you don't dereference it (forming pointers any further out is not allowed). Since the pointer alph + 3 is never dereferenced - you only use it as a marker for the end of the array - everything is fine.
Here are some rough correlations for the different versions:
/-----------------------------------\
| Pointer Version | Index Version |
-------------------------------------
| p | i |
| p = alph | i = 0 |
| *p | alph[i] |
| alph + 3 | 3 |
| p < alph + 3 | i < 3 |
\-----------------------------------/
Also note that instead of doing alph + 3, you may want to use sizeof. sizeof gives you the number of bytes that an object occupies in memory. For arrays, it gives you the number of bytes the whole array takes up (but it doesn't work with pointers, so you can do it with alph but not with p, for example). The advantage of using sizeof to get the size of the array is that if you change the size later, you do not have to go and find all the places where you wrote alph + 3 and change them. You can do that like this:
for (; p < alph + sizeof(alph); ++p)
Additional note: this works because the size of char is defined to be 1 byte. If you change the array to an array of int, for example (or any other type bigger than 1 byte), it will no longer work. To remedy this, divide the total size of the array in bytes by the size of a single element:
for (; p < alph + sizeof(alph) / sizeof(*alph); ++p)
This may be a little complicated to understand, but all you're doing is taking the total number of bytes of the array and dividing it by the size of a single element to find the number of elements in the array. Note that you are adding the number of elements in the array, not the size in bytes of the array. This is a consequence of how pointer arithmetic works (C++ automatically multiplies the number you add to a pointer by the size of the type that the pointer points to).
For example, if you have
int alph[3];
Then sizeof(alph) == 12 on a typical platform where each int is 4 bytes, and sizeof(*alph) == 4 because *alph is the first element of the array. Then sizeof(alph) / sizeof(*alph) is equal to 12 / 4, which is 3, the number of elements in the array. Then by doing
for (; p < alph + sizeof(alph) / sizeof(*alph); ++p)
that is equivalent to
for (; p < alph + 12 / 4; ++p)
which is the same as
for (; p < alph + 3; ++p)
Which is correct.
This has the advantage that if you change the array size to 50 and change the type to short (or any other combination of type and array size), the code will still work correctly.
When you get more advanced (and hopefully understand arrays enough to stop using them...) then you will learn how to use std::end to do all this work for you:
for (; p < std::end(alph); ++p)
Or you can just use the range-based for loop introduced in C++11:
for (char c : alph)
Which is even easier. I recommend understanding pointers and pointer arithmetic well before relying on the convenient tools of the Standard Library, though.
The pointer loop should be:
for (char * p = alph; p != alph + 3; ++p)
{
std::cout << *p << std::endl;
}
You can get a bit more fancy by hoisting the end of the array out of the loop, and by inferring the array size automatically:
for (char * p = alph, * end = alph + sizeof(alph)/sizeof(alph[0]); p != end; ++p)
In C++11, you can do even better:
for (char c : alph) { std::cout << c << std::endl; }
Your problem (apart from the compilation issues noted in my comment on the question) is that you compare the pointer with 3: a pointer holds an address, not an index, and addresses are not anywhere near as small as 3. You need:
for (p = alph; p < alph + sizeof(alph); p++)
for the loop. Note that sizeof() generates a compile-time constant. (In C99 or C2011, that is not always the case; in C++, it is always the case, AFAIK). In this context, sizeof() is a fancy way of writing 3, but if you add new characters to the array, it adjusts automatically, whereas if you write 3 and change things, you have to remember to change the 3 to the new value too.
Ruminations on the use of sizeof()
As Kerrek SB points out, sizeof() returns the size of an array in bytes. By definition, sizeof(char) == 1, so in this context it was safe to use sizeof() on the array. There are also times when it is not safe, or when you have to do some extra work. If you have an array of some other type, such as:
SomeType array[] = { 2, 3, 5, 7, 11, 13 };
then the number of elements in the array is (sizeof(array)/sizeof(array[0])). That is, the number of elements is the total size of the array, in bytes, divided by the size of one element (also in bytes).
The other big gotcha is that if you 'pass an array' as a function argument, you can't use sizeof() on it to get the correct size - you get the size of a pointer instead. This is a good reason for not using C-style strings or C-style arrays: use std::string and std::vector<SomeType> instead, not least because you can find their actual size reliably with a member function call.
I recommend the following revised code.
char alph [] = {'a', 'b', 'c', 0};
char *p;
p = alph;
while(*p)
{
std::cout << *p++;
}
A 0 character ('\0') marks the end of a C-style string in both C and C++.

Help me understand this Strange C++ code

This was a question in our old C++ exam. This code is driving me crazy, could anyone explain what it does and - especially - why?
int arr[3]={10,20,30};
int *arrp = new int;
(*(arr+1)+=3)+=5;
(arrp=&arr[0])++;
std::cout<<*arrp;
Under the C++03 rules quoted below, this statement writes to the object *(arr+1) twice without an intervening sequence point, so it has undefined behavior.
(*(arr+1)+=3)+=5;
Under the same rules, this statement writes to the object arrp twice without an intervening sequence point, so it also has undefined behavior.
(arrp=&arr[0])++;
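The code could result in anything happening. (C++11 replaced sequence points with "sequenced before" rules; under those rules, both statements are well-defined.)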
Reference: ISO/IEC 14882:2003 5 [expr]/4: "Between the previous and next sequence point a scalar object shall have its stored value modified at most once by the evaluation of an expression."
(*(arr+1)+=3)+=5;
arr + 1 - pointer to the element with index 1
*(arr + 1) - that element itself
*(arr + 1) += 3 - increase it by 3 (20 -> 23)
(*(arr + 1) += 3) += 5 - increase it by another 5;
so arr[1] == 28
(arrp=&arr[0])++;
arr[0] - value of element 0
&arr[0] - address of element 0
arrp=&arr[0] - setting arrp to point to elem 0
(arrp=&arr[0])++ - increment arrp so it points to elem 1
result: 28
This line:
(*(arr+1)+=3)+=5;
produces the same result as this (see footnote):
arr[1] += 3;
arr[1] += 5;
This line:
(arrp=&arr[0])++;
produces the same result as this (see footnote):
arrp = arr + 1;
So this line:
std::cout<<*arrp;
prints out 28.
But this code leaks memory because int *arrp = new int; allocates a new int on the heap which will be lost on assignment by (arrp=&arr[0])++;
Footnote: Of course I'm assuming an absence of weirdness.
Edit: Apparently some of the lines do lead to undefined behavior under the C++03 rules (C++ Standard 5/4). So this really is a crappy exam question.
int arr[3]={10,20,30}; // obvious?
int *arrp = new int; // allocated memory for an int
(*(arr+1)+=3)+=5; // (1)
(arrp=&arr[0])++; // (2)
std::cout<<*arrp; // (3)
(1)
*(arr+1) is the same as arr[1], which means that *(arr+1)+=3 will increase arr[1] by 3, so arr[1] == 23 now.
(*(arr+1)+=3)+=5 means arr[1] is increased by another 5, so it will be 28 now.
(2)
arrp will point to the first element of arr (arr[0]). The pointer arrp will then be incremented, so it will point to the second element after the entire statement is executed.
(3)
Prints what arrp points to: the second element of arr, meaning 28.
Well, remember that in most expressions an array decays to a pointer to its first element.
int arr[3]={10,20,30};
int *arrp = new int;
creates an array arr of three integers and an int pointer that is initialized with the address of a freshly allocated int.
Since assignment operators return a reference to the object that was assigned to, in order to allow chained assignment,
(*(arr+1)+=3)+=5;
is equivalent to
*(arr+1)+=3;
*(arr+1)+=5;
*(arr + 1) refers to the second element of the array arr, so arr[1] is effectively increased by eight.
(arrp=&arr[0])++; assigns the address of the first array element to arrp and afterward increments this pointer which now points to the second element (arr[1] again).
By dereferencing it in std::cout<<*arrp, you output arr[1] which now holds the value 20 + 3 + 5 = 28.
So the code prints 28 (and furthermore creates a memory-leak since the new int initially assigned to arrp never gets deleted)
I'll try to answer you by rewriting the code in a simpler way.
int arr[3]={10,20,30};
int *arrp = new int;
(*(arr+1)+=3)+=5;
(arrp=&arr[0])++;
std::cout<<*arrp;
=== equals ===
int arr[3]={10,20,30};//create array of 3 elements and assign them
int *arrp = new int;//create an integer pointer and allocate an int to it(useless)
//(*(arr+1)+=3)+=5;
arr[1] = arr[1] + 3;//arr[1] == *(arr+1): arr decays to a pointer to arr[0], and +1 moves it to arr[1]
arr[1] = arr[1] + 5;
//(arrp=&arr[0])++;
arrp = &arr[0];//point the integer pointer to the first element in arr[]
arrp++;//increment the array pointer, so this really is now pointing to arr[1]
std::cout<<*arrp;//just print out the value, which is arr[1]
I am assuming you understand pointers and basic C.