Is out-of-range array access to calculate a pointer valid in C++?

Is the following code guaranteed to be working?
int* arr = new int[2];
std::cout << &arr[0x100];
Is this considered good practice or would it be cleaner to add an offset the regular way?
Edit: By "working" I mean that it should print the pointer to the theoretical member at 0x100. Basically if this is equivalent to "std::cout << ((unsigned int)arr + 0x100*sizeof(int));".

With my compiler (Cygwin GCC), taking the address at this index gives the same result as doing the pointer arithmetic by hand, although both are undefined behavior (UB). Following the comment below by Jens, I found the following from http://blog.llvm.org/2011/05/what-every-c-programmer-should-know.html helpful.
It is also worth pointing out that both Clang and GCC nail down a few behaviors that the C standard leaves undefined. The things I'll describe are both undefined according to the standard and treated as undefined behavior by both of these compilers in their default modes.
Dereferences of Wild Pointers and Out of Bounds Array Accesses: Dereferencing random pointers (like NULL, pointers to free'd memory, etc) and the special case of accessing an array out of bounds is a common bug in C applications which hopefully needs no explanation. To eliminate this source of undefined behavior, array accesses would have to each be range checked, and the ABI would have to be changed to make sure that range information follows around any pointers that could be subject to pointer arithmetic. This would have an extremely high cost for many numerical and other applications, as well as breaking binary compatibility with every existing C library.
The pointer arithmetic itself is also UB. So you have an address, but you cannot dereference a pointer to it, which means there is really no use in having this address at all. Merely computing it is UB, and it should not appear in code.
See this answer for out-of-bounds pointers:
Why is out-of-bounds pointer arithmetic undefined behaviour?
My sample code:
int* arr = new int[2];
std::cout << arr << std::endl;
std::cout << &(arr[0]) << std::endl;
std::cout << &(arr[1]) << std::endl;
std::cout << &arr[0x100] << std::endl; // UB, cannot be dereferenced
std::cout << &arr[256] << std::endl;   // cannot be dereferenced, so no use in having it
std::cout << arr + 0x100;              // UB here too, no use in having this address
Sample Output:
0x60003ae50
0x60003ae50
0x60003ae54
0x60003b250
0x60003b250
0x60003b250
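If what the question's edit is really after is just the numeric value of base plus offset, one well-defined route (a sketch, not a recommendation; it assumes the usual implementation-defined pointer-to-integer mapping) is to do the arithmetic on integers rather than pointers:
#include <cstdint>
#include <iostream>

int main()
{
    int* arr = new int[2];
    // Converting a valid pointer to an integer is implementation-defined,
    // not undefined, and integer arithmetic on the result is well-defined.
    // The result is just a number, though, not a dereferenceable pointer.
    std::uintptr_t base = reinterpret_cast<std::uintptr_t>(arr);
    std::cout << std::hex << base + 0x100 * sizeof(int) << '\n';
    delete[] arr;
}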

In the first line you allocate space for 2 integers. In the second line, you form a pointer to memory outside this range. This is not allowed at all.
Edit: There are some interesting comments here. But I cannot understand why the standard needs to be cited for such a simple answer, or why pointer arithmetic is being discussed here so much.
From a logical view, std::cout << &arr[0x100] consists of 3 steps:
1. access the non-existent element of the array
2. take the address of that non-existent element
3. use that address
If the first step is already invalid, aren't all the following steps undefined?

Related

Why is there a value printed and not NULL/0 after incrementing a pointer in C++?

I am fairly new to C++, so excuse me if this is quite basic.
I am trying to understand the value printed after I increment my pointer in the following piece of code
#include <iostream>
using namespace std;

int main()
{
    int i = 5;
    int* pointeri = &i;
    cout << pointeri << "\n";
    pointeri++;
    i = 7;
    cout << *pointeri << "\n";
}
When I dereference the pointer, it prints a random integer. I am trying to understand what is really happening here: why isn't the pointer pointing at NULL, and does the random integer have any significance?
The C++ language has a concept of Undefined Behavior. It means that it is possible to write code that does not constitute a valid program, and the compiler won't stop or even warn you. What such code does when executed is unknown.
Your program is a typical example. After the line int* pointeri = &i;, the pointer is pointing to the variable i. After pointeri++ it is pointing to the memory location just after i. What is stored at that location is unknown, and the behavior of such code is undefined.
Needless to say, great care should be taken when coding in C++ in order to stay in the realm of defined behavior, in order to have meaningful and predictable results when running the program.
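For contrast, here is a minimal sketch of the one neighbouring case that is well-defined: a pointer may be incremented to one past the last element of an array and compared there, as long as it is never dereferenced.
#include <iostream>
using namespace std;

int main()
{
    int a[4] = {1, 2, 3, 4};
    int* p = a;       // a decays to &a[0]
    int* end = a + 4; // one past the last element: valid to form and compare
    while (p != end)  // classic begin/end iteration
        cout << *p++ << "\n";
    // cout << *end;  // undefined behavior: nothing lives there
}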
why isn't the pointer pointing at NULL
Because you haven't assigned or initialised the pointer to null.
and does the random integer have a significance ?
No.
Why is there a value printed ...
Because the behaviour of the program is undefined.
As you know, at the machine level a "pointer" is simply an integer variable whose value is understood to be a memory address. If that value is zero, by convention we call it NULL and understand this to mean that "it doesn't point at anything." Otherwise, the value is presumed to be valid.
If you "increment" a pointer, its value is non-zero and therefore presumed to be valid. If you dereference it, you will either get "unpredictable data" or a memory-addressing fault.

Internal logic of operator [] when dealing with pointers

I've been studying C++ for couple of months now and just recently decided to look more deeply into the logic of pointers and arrays. What I've been taught in uni is pretty basic - pointers contain the address of a variable. When an array is created, basically a pointer to its first element is created.
So I started experimenting a bit. (and got to a conclusion which I need confirmation for). First of all I created
int arr[10];
int* ptr = &arr[5];
And as you would imagine
cout << ptr[3];
gave me the 8th element of the array. Next I tried
int num = 6;
int* ptr2 = &num;
cout << ptr2[5];
cout << ptr2 + 5;
which to my great delight (not irony) returned the same addresses. Even though num wasn't an array.
The conclusion I came to: an array is not something special in C++. It's just a pointer to the first element (I already typed that). More important: can I think of every pointer as an object of some class variable*? Is operator[] just overloaded in the class int*? For example, to be something along the lines of:
int operator[](int index) {
    return *(arrayFirstaddress + index);
}
What was interesting to me in these experiments is that operator [] works for EVERY pointer. (So it's exactly like overloading an operator for all instances of the said class)
Of course, I can be as wrong as possible. I couldn't find much information on the web, since I didn't know how to word my question, so I decided to ask here.
It would be extremely helpful if you explained to me if I'm right/wrong/very wrong and why.
You find the definition of subscripting, i.e. an expression like ptr2[5], in the C++ standard, e.g. in this online C++ draft standard:
5.2.1 Subscripting [expr.sub]
(1) ... The expression E1[E2] is identical (by definition) to
*((E1)+(E2))
So your "discovery" sounds correct, although your examples seem to have some bugs (e.g. ptr2[5] should not return an address but an int value, whereas ptr2+5 is an address an not an int value; I suppose you meant &ptr2[5]).
Further, your code is not a proof of this discovery, as it is based on undefined behaviour. It may yield something that supports your "discovery", but the discovery could still be invalid, and the code could also do the opposite (really!).
The reason why it is undefined behaviour is that even pointer arithmetics like ptr2+5 is undefined behaviour if the result is out of the range of the allocated memory block ptr2 points to (which is definitely the case in your example):
5.7 Additive operators
(6) ... Unless both pointers point to elements of the same array
object, or one past the last element of the array object, the behavior
is undefined.
Different compilers, different optimization settings, and even slight modifications anywhere in your program may let the compiler do other things here.
An array in C++ is a collection of objects. A pointer is a variable that can store the address of something. The two are not the same thing.
Unfortunately, your sample
int num = 6;
int* ptr2 = &num;
cout << ptr2[5];
cout << ptr2 + 5;
exhibits undefined behaviour, both in the evaluation of ptr2[5] and ptr2 + 5. Pointer expressions are special - arithmetic involving pointers only has defined behaviour if the pointer being acted on (ptr2 in this case) and the result (ptr2 + 5) are within the same object. Or one past the end (although dereferencing a "one past the end" pointer - trying to access the value it points at - also gives undefined behaviour).
Semantically, *(ptr + n) and ptr[n] are equivalent (i.e. they have the same meaning) if ptr is a pointer and n is an integral value. So if evaluating ptr + n gives undefined behaviour, so does evaluating ptr[n]. Similarly, &ptr[n] and ptr + n are equivalent.
In expressions, depending on context, the name of an array is converted to a pointer, and that pointer is equal to the address of that array's first element. So, given
int x[5];
int *p;
// the following all have the same effect
p = x + 2;
p = &x[0] + 2;
p = &x[2];
That does not mean an array is a pointer though.
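A sketch that shows the same equivalence without the undefined behaviour of the original experiment, by staying inside the array's bounds:
#include <iostream>

int main()
{
    int arr[10] = {};
    int* ptr = &arr[5];
    arr[8] = 42;
    // By [expr.sub], E1[E2] is *((E1)+(E2)), so all of these name arr[8],
    // and every intermediate pointer stays inside the array.
    std::cout << ptr[3] << '\n';                      // 42
    std::cout << *(ptr + 3) << '\n';                  // 42
    std::cout << *(arr + 8) << '\n';                  // 42; arr decays to &arr[0]
    std::cout << &ptr[3] << ' ' << (ptr + 3) << '\n'; // same address twice
}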

Why does C++ consider pointer and array of pointers as same thing?

Once I encountered following code:
int s =10;
int *p=&s;
cout << p[3] << endl;
And I can't understand why I am able to access p[3], which doesn't exist (only p exists, a single pointer, yet I can still access p[3] as if it were an array I never created).
Is it some compiler bug or it is a feature or I don't know some basics of C++ that covers this?
Thank you
Why does C++ consider pointer and array of pointers as same thing?
It doesn't. You're asking why it treats pointers and arrays as the same.
The [] operator is just an abbreviated form of pointer arithmetic. a[b] is equivalent to *(a + b). Array names can decay into pointers, and then pointer arithmetic is applied. It's the programmer's job to make sure they don't go out of bounds. The compiler can't possibly stop you from shooting your foot off.
Also, claiming to be able to "access" it is a strong assertion. That is UB, and is most likely going to either read the wrong memory or get a segfault.
No, it's not a compiler bug, it's a very useful feature... but let's not get ahead of ourselves here; the consequence of your code is called Undefined Behaviour.
So, what's the feature? A naked array used in an expression is actually treated as a pointer to its first element, except in the contexts where it does not decay (see What is array decaying?).
Consider this code:
int s = 10;
int* array = new int[12];
int* p;
p = array;        // p refers to the first element
int* x = p + 7;   // advances to the element at index 7; the compiler never checks bounds
int* y = p + 700; // ditto ...this is obviously undefined
p = &s;           // p now points to s
int* xx = p + 3;  // but s is a single element, so undefined behaviour
Once an array is decayed, it's simply a pointer... And a pointer can be incremented, decremented, dereferenced, advanced, assigned or reassigned.
So,
cout << p[7] << endl;
is a valid C++ program, but not necessarily a correct one.
It's the responsibility of the programmer to know whether a pointer points to a single element or an array. But thanks to static analyzers and the C++ Core Guidelines (https://github.com/isocpp/CppCoreGuidelines), things are changing for good.
Also see What are all the common undefined behaviours that a C++ programmer should know about?
From here, section array-to-pointer decay:
There is an implicit conversion from lvalues and rvalues of array type to rvalues of pointer type: it constructs a pointer to the first element of an array. This conversion is used whenever arrays appear in context where arrays are not expected, but pointers are.
Inherited from C, C++ allows you to treat any pointer like the first element of an array starting at that address.
That's in part because arrays are passed to functions as pointers, and for that to make sense you need to be able to treat a pointer as an array.
It also enables some quite neat and very efficient code in various circumstances.
The upshot is that p[3] is a valid construct in this context.
Obviously however it has undefined behaviour because p isn't pointing to an array! Unfortunately the language rules (and compiler) aren't smart enough to work that out.
C is a very low level language and doesn't enforce nice things like range checking either during compilation or execution.
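A short sketch of where decay does and does not happen; the sizes in the comments assume a typical platform with 4-byte int and 8-byte pointers:
#include <iostream>

// The array parameter is adjusted to a pointer: inside f, sizeof(a) is
// the size of an int*, and the element count 10 is not enforced at all.
void f(int a[10])
{
    std::cout << sizeof(a) << '\n'; // 8: size of a pointer
}

int main()
{
    int arr[10];
    std::cout << sizeof(arr) << '\n'; // 40: sizeof does not trigger decay
    f(arr);                           // here arr decays to &arr[0]
}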

Memory location of null pointer

After reading a lot of questions about null pointers, I still have confusion about memory allocation and null pointers.
If I type the following code:
#include <iostream>

int main()
{
    int a = 22;
    int* p = &a;      // now p is pointing towards a
    std::cout << *p;  // outputs 22
    std::cout << p;   // outputs memory address of object a
    int* n = nullptr; // pointer n is initialized to null
    std::cout << n;
}
After running this code, pointer n outputs the literal constant 0, and if I try this,
std::cout << *n;
this line of code compiles, but it fails to execute. What is wrong with this code? Shouldn't it print the memory location held by this pointer?
std::cout << p;
Does this output the location of the pointer in memory, or the location of the object in memory?
Many of these questions are already answered in previous questions, but somehow I am unable to understand the answers, because I am a beginner in C++.
A null pointer doesn't point to anything. It doesn't contain a valid address but a "non-address". It's conceptual; you shouldn't worry about the value it has.
The only thing that matters is that you can't dereference a null pointer, because this will cause undefined behavior, and that's why your program fails at runtime (std::cout << *n).
std::cout << p;
In general this outputs the value of the variable p; what that value means depends on p's type. In your case the type of p is pointer to int (int *), so its value is the address of an int. As the pointer itself is an lvalue, you can take its address, so if you want to see where your pointer n is located in memory, just output its address:
std::cout << &n << std::endl;
As said in many other answers, do not dereference a null pointer, as it leads to UB. So again:
std::cout << n << std::endl; // value of pointer n, ie address, in your case 0
std::cout << &n << std::endl; // address of pointer n, will be not 0
std::cout << *n << std::endl; // undefined behavior, you try to dereference nullptr
If you want to see the address of nullptr itself, you cannot - it is a constant, not an lvalue, and does not have an address:
std::cout << &nullptr << std::endl; // compile error, nullptr is not lvalue
When you compile:
std::cout << *n;
The compiler will typically build some code like this:
mov rax, qword ptr [rbp - 0x40]
mov esi, dword ptr [rax]
call cout
The first line loads the value of the pointer from its stack slot (rbp - 0x40) into the CPU register RAX. In this case the value of the null pointer is 0, so RAX now contains 0.
The second line tries to read memory from the location (0) specified by RAX. On a typical computer setup memory location 0 is protected (it isn't a valid data memory location). This causes an invalid operation and you get a crash*.
It never reaches the third line.
*However, this isn't necessary true in all circumstances: on a micro-controller where you don't have an operating system in place, this might successfully dereference and read the value of memory location 0. However *nullptr wouldn't be a good way of expressing this intention! See http://c-faq.com/null/machexamp.html for more discussion. If you want the full detail on nullptr: http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2007/n2431.pdf
nullptr is a special value, selected in such a way that no valid pointer could get this value. On many systems, the value is equal to numeric zero, but it is not a good idea to think of nullptr in terms of its numeric value.
To understand the meaning of nullptr you should first consider the meaning of a pointer: it is a variable that refers to something else, which may also refer to nothing at all. You need to be able to distinguish the state "my pointer refers to something" from the state "my pointer refers to nothing at all". This is where nullptr comes in: if a pointer is equal to nullptr, you know that it references "nothing at all".
Note: dereferencing nullptr (i.e. applying the unary asterisk operator to it) is undefined behavior. It may crash, or it may print some value, but it would be a "garbage value".
Dereferencing a null pointer is undefined behavior, so anything at all can happen. But a null pointer variable still has to have a place in memory, and what you're seeing is just that. Typically compilers implement a null pointer as a value with all bits set to 0.
Just because it's a golden quote, here's what Scott Meyers has to say about undefined behavior, from his book Effective C++, 2nd Ed.
"Nevertheless, there is something very troubling here. Your program's
behavior is undefined -- you have no way of knowing what will
happen... That means compilers may generate code to do whatever they
like: reformat your disk, send suggestive email to your boss, fax
source code to your competitors, whatever."
It is undefined behavior to dereference a null pointer. Any behavior that the compiler chooses or unintentionally happens is valid.
Changing the program in other places may also change the behavior of this code line.
I suspect that your confusion revolves around the fact that there are two memory locations involved here.
In this code:
int *n=nullptr;// pointer n is initialized to null
There is one variable, n, and that variable occupies space in memory. You can take the address of n and prove this to yourself:
std::cout << &n << "\n";
And you'll see that the address of n is something legitimate. As in, not NULL.
n happens to be of type pointer-to-int, and its value is NULL. That means it doesn't point to anything at all; it's in a state where you can't dereference it.
But "dereference it" is exactly what you are doing here:
std::cout<<*n;
n is valid, but the thing it points to is not. That is why your program has undefined behavior.
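To put the two locations side by side, here is a minimal sketch distinguishing a pointer's value from the pointer's own address, with the dereference guarded as the answers above recommend:
#include <iostream>

int main()
{
    int a = 22;
    int* p = &a;
    int* n = nullptr;
    std::cout << p << '\n';      // value of p: the address of a
    std::cout << &p << '\n';     // address of the pointer variable p itself
    std::cout << n << '\n';      // value of n: prints as 0 on typical platforms
    std::cout << &n << '\n';     // address of n itself: legitimate, non-null
    if (n != nullptr)            // guard every dereference of a possibly-null pointer
        std::cout << *n << '\n'; // never reached in this program
}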

Address held by pointer changes after pointer is deleted

In the following code, why is the address held by pointer x changing after the delete? As I understand it, the delete call should free up the allocated memory on the heap, but it shouldn't change the pointer's address.
#include <iostream>
#include <cstdlib>
using namespace std;

int main()
{
    int* x = new int;
    *x = 2;
    cout << x << endl << *x << endl;
    delete x;
    cout << x << endl;
    system("Pause");
    return 0;
}
OUTPUT:
01103ED8
2
00008123
Observations: I'm using Visual Studio 2013 and Windows 8. Reportedly this doesn't behave the same with other compilers. Also, I understand this is bad practice and that I should just reassign the pointer to NULL after its deletion; I'm simply trying to understand what is driving this weird behaviour.
As I understand, the delete call should free up allocated memory from the heap, but it shouldn't change the pointer address.
Well, why not? It's perfectly legal output -- reading a pointer after having deleted it leads to undefined behavior. And that includes the pointer's value changing. (In fact, that doesn't even need UB; a deleted pointer can really point anywhere.)
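A sketch of the defensive pattern the question already mentions: nulling the pointer right after delete makes later reads well-defined, and even a second delete becomes a harmless no-op.
#include <iostream>

int main()
{
    int* x = new int(2);
    std::cout << x << '\n';
    delete x;
    x = nullptr;            // reading x is well-defined again from here on
    std::cout << x << '\n'; // prints a null pointer value on every implementation
    delete x;               // deleting a null pointer is guaranteed to do nothing
    return 0;
}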
Having read relevant bits of both C++98 and C++11 [N3485], and all the stuff H2CO3 pointed to:
Neither edition of the standard adequately describes what an "invalid pointer" is, under what circumstances they are created, or what their semantics are. Therefore, it is unclear to me whether or not the OP's code was intended to provoke undefined behavior, but de facto it does (since anything that the standard does not clearly define is, tautologically, undefined). The text is improved in C++11 but is still inadequate.
As a matter of language design, the following program certainly does exhibit unspecified behavior as marked, which is fine. It may, but should not also exhibit undefined behavior as marked; in other words, to the extent that this program exhibits undefined behavior, that is IMNSHO a defect in the standard. Concretely, copying the value of an "invalid" pointer, and performing equality comparisons on such pointers, should not be UB. I specifically reject the argument to the contrary from hypothetical hardware that traps on merely loading a pointer to unmapped memory into a register. (Note: I cannot find text in C++11 corresponding to C11 6.5.2.3 footnote 95, regarding the legitimacy of writing one union member and reading another; this program assumes that the result of this operation is unspecified but not undefined (except insofar as it might involve a trap representation), as it is in C.)
#include <string.h>
#include <stdio.h>

union ptr {
    int *val;
    unsigned char repr[sizeof(int *)];
};

int main(void)
{
    ptr a, b, c, d, e;
    a.val = new int(0);
    b.val = a.val;
    memcpy(c.repr, a.repr, sizeof(int *));
    delete a.val;
    d.val = a.val; // copy may, but should not, provoke UB
    memcpy(e.repr, a.repr, sizeof(int *));
    // accesses to b.val and d.val may, but should not, provoke UB
    // result of comparison is unspecified (may, but should not, be undefined)
    printf("b %c= d\n", b.val == d.val ? '=' : '!');
    // result of comparison is unspecified
    printf("c %c= e\n", memcmp(c.repr, e.repr, sizeof(int *)) ? '!' : '=');
}
This is all of the relevant text from C++98:
[3.7.3.2p4] If the argument given to a deallocation function in the standard library is a pointer that is not the null pointer value (4.10), the deallocation function shall deallocate the storage referenced by the pointer, rendering invalid all pointers referring to any part of the deallocated storage. The effect of using an invalid pointer value (including passing it to a deallocation function) is undefined. [footnote: On some implementations, it causes a system-generated runtime fault.]
The problem is that there is no definition of "using an invalid pointer value", so we get to argue about what qualifies. There is a clue to the committee's intent in the discussion of iterators (a category which is defined to include bare pointers):
[24.1p5] ... Iterators can also have singular values that are not associated with any container. [Example: After the declaration of an uninitialized pointer x (as with int* x; [sic]), x must always be assumed to have a singular value of a pointer.] Results of most expressions are undefined for singular values; the only exception is an assignment of a non-singular value to an iterator that holds a singular value. In this case the singular value is overwritten the same way as any other value. Dereferenceable and past-the-end values are always non-singular.
It seems at least plausible to assume that an "invalid pointer" is also meant to be an example of a "singular iterator", but there is no text to back this up; going in the opposite direction, there is no text confirming the (equally plausible) assumption that an uninitialized pointer value is meant to be an "invalid pointer" as well as a "singular iterator". So the hair-splitters among us might not accept "results of most expressions are undefined" as clarifying what qualifies as use of an invalid pointer.
C++11 has changed the text corresponding to 3.7.3.2p4 somewhat:
[3.7.4.2p4] ... Indirection through an invalid pointer value and passing an invalid pointer value to a deallocation function have undefined behavior. Any other use of an invalid pointer value has implementation-defined behavior. [footnote: Some implementations might define that copying an invalid pointer value causes a system-generated runtime fault.]
(the text elided by the ellipsis is unchanged) We now have somewhat more clarity as to what is meant by "use of an invalid pointer value", and we can now say that the OP's code's semantics are definitely implementation-defined (but might be implementation-defined to be undefined). There is also a new paragraph in the discussion of iterators:
[24.2.1p10] An invalid iterator is an iterator that may be singular.
which confirms that "invalid pointer" and "singular iterator" are effectively the same thing. The remaining confusion in C++11 is largely about the exact circumstances that produce invalid/singular pointers/iterators; there should be a detailed chart of pointer/iterator lifecycle transitions (like there is for *values). And, as with C++98, the standard is defective to the extent it does not guarantee that copying-from and equality comparison upon such values are valid (not undefined).