#include <iostream>
using namespace std;

int main()
{
    int a[100], n;
    cout << &n << " " << &a[100] << endl;
    if (&n != &a[100])
    {
        cout << " What is wrong with C++?";
    }
}
It prints the address of n and a[100] as the same. But when I compare the two values in the if statement, it says they are not equal.
What does this mean?
When I change the value of n, a[100] also changes, so doesn't that mean n and a[100] are equal?
First, let's remember that there is no a[100]. It does not exist! If you tried to access a[100]'s "value" then, according to the abstract machine called C++, anything can happen. This includes blowing up the sun or dyeing my hair purple; now, I like my hair the way it is, so please don't!
Anyway, what you're doing is playing with the array's "one-past-the-end" pointer. You are allowed to obtain this pointer, which is a fake pointer of sorts, as long as you do not dereference it.
It is available only because the "101st element" would be "one past the end" of the array. (And there is debate as to whether you are allowed to write &a[100], rather than a+100, to get such a pointer; I am in the "no" camp.)
However, that still says nothing about comparing it to the address of some entirely different object. You cannot assume anything about the relative location of local variables in memory. You simply have no idea where n will live with respect to a.
The results you're observing are unpredictable, unreliable and meaningless side effects of the undefined behaviour exhibited by your program, caused by a combination of compiler optimisations and data locality.
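To see the distinction in well-defined code, here is a minimal sketch (the names are mine, not the OP's) that forms the one-past-the-end pointer legitimately and uses it only in ways the standard permits:

#include <iostream>

int main() {
    int a[100] = {};
    int* first = a;        // points to a[0]
    int* past  = a + 100;  // one-past-the-end: valid to form and compare,
                           // never valid to dereference
    std::cout << past - first << std::endl;  // prints 100; pointer arithmetic
                                             // within [a, a + 100] is well-defined
}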
For the array declaration int a[100], the valid indices run from 0 to 99. When you take the address of the "101st element", which is out of range, it may happen to coincide with the next object on the stack (in your case, the variable n). However, this is undefined behaviour.
For a bit of fact, here is the relevant text from the specification.
Equality operators (==, !=)
Pointers to objects of the same type can be compared for equality with the 'intuitive' expected results:
From § 5.10 of the C++11 standard:
Pointers of the same type (after pointer conversions) can be compared for equality. Two pointers of the same type compare equal if and only if they are both null, both point to the same function, or both represent the same address (3.9.2).
(leaving out details on comparison of pointers to members and/or the null pointer constants - they continue down the same line of 'Do What I Mean':)
[...] If both operands are null, they compare equal. Otherwise if only one is null, they compare unequal.[...]
The most 'conspicuous' caveat has to do with virtuals, and it does seem to be the logical thing to expect too:
[...] if either is a pointer to a virtual member function, the result is unspecified. Otherwise they compare equal if and only if
they would refer to the same member of the same most derived object
(1.8) or the same subobject if they were dereferenced with a
hypothetical object of the associated class type. [...]
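To make that caveat concrete, here is a small sketch of my own (not from the standard) where the comparison result is unspecified:

struct B {
    virtual void f() {}
    virtual void g() {}
};

int main() {
    void (B::*pf)() = &B::f;
    void (B::*pg)() = &B::g;
    // Unspecified result: pointers to virtual member functions are often
    // represented as vtable slot indices rather than code addresses.
    bool same = (pf == pg);
    (void)same;
}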
Maybe the problem is that an array of int is not the same as an int!
By writing &a[100] you're invoking undefined behaviour, since there is no element with index 100 in the array a. To get the same address safely, you could instead write a + 100 (or &a[0] + 100), in which case your program would be well-defined, but whether or not the if condition will hold cannot be predicted or relied upon on any implementation.
The working draft of the standard N4659 says:
[basic.compound]
If two objects are pointer-interconvertible, then they have the same address
and then notes that
An array object and its first element are not pointer-interconvertible, even though they have the same address
What is the rationale for making an array object and its first element non-pointer-interconvertible? More generally, what is the rationale for distinguishing the notion of pointer-interconvertibility from the notion of having the same address? Isn't there a contradiction in there somewhere?
It would appear that given this sequence of statements
int a[10];
void* p1 = static_cast<void*>(&a[0]);
void* p2 = static_cast<void*>(&a);
int* i1 = static_cast<int*>(p1);
int* i2 = static_cast<int*>(p2);
we have p1 == p2; however, using i1 is well-defined, while using i2 would result in UB.
There are apparently existing implementations that optimize based on this. Consider:
struct A {
    double x[4];
    int n;
};

void g(double* p);

int f() {
    A a { {}, 42 };
    g(&a.x[1]);
    return a.n; // optimized to return 42;
                // valid only if you can't validly obtain &a.n from &a.x[1]
}
Given p = &a.x[1];, g might attempt to obtain access to a.n by reinterpret_cast<A*>(reinterpret_cast<double(*)[4]>(p - 1))->n. If the inner cast successfully yielded a pointer to a.x, then the outer cast would yield a pointer to a, giving the class member access defined behavior and thus outlawing the optimization.
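Spelled out as code, here is a sketch of a definition of g that attempts that forbidden access (reusing struct A from the snippet above; this is deliberately undefined code, shown only to illustrate what the rule outlaws):

void g(double* p) {
    // The outer cast would be fine on its own: A is standard-layout, so A*
    // and a pointer to its first member x are pointer-interconvertible.
    // But p - 1 points to the element a.x[0], and an array element is NOT
    // pointer-interconvertible with its array object, so the inner cast does
    // not yield a usable pointer to a.x, and the member access is UB.
    A* pa = reinterpret_cast<A*>(reinterpret_cast<double(*)[4]>(p - 1));
    int k = pa->n;  // undefined behavior; this is what licenses the optimization
    (void)k;
}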
More generally, what is the rationale for distinguishing the notion of pointer-interconvertibility from the notion of having the same address?
It is hard, if not impossible, to answer why certain decisions are made by the standard, but this is my take.
Logically, pointers point to objects, not addresses. Addresses are the value representations of pointers. The distinction is particularly important when reusing the space of an object containing const members:
struct S {
    const int i;
};

S s = {42};
auto ps = &s;
new (ps) S{420};
foo(ps->i); // UB, requires std::launder
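For reference, a sketch of the C++17 fix that comment alludes to, reusing the names above (foo stands for whatever consumer the snippet had in mind):

#include <new>  // std::launder (C++17)

foo(std::launder(ps)->i);  // OK: std::launder yields a pointer known to
                           // refer to the object created by placement new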
That a pointer with the same value representation can be used as if it were the same pointer should be thought of as the special case instead of the other way round.
Practically, the standard tries to place as little restriction as possible on implementations. Pointer-interconvertibility is the condition that pointers may be reinterpret_cast and yield the correct result. Seeing as how reinterpret_cast is meant to be compiled into nothing, it also means the pointers share the same value representation. Since that places more restrictions on implementations, the condition won't be given without compelling reasons.
Because the committee wants to make clear that an array is a low-level concept and not a first-class object: you cannot return an array nor assign to it, for example. Pointer-interconvertibility is meant to be a concept between objects of the same level: only standard-layout classes or unions.
The concept is seldom used in the whole draft: in [expr.static.cast], where it appears as a special case; in [class.mem], where a note says that for standard-layout classes, pointers to an object and to its first subobject are interconvertible; in [class.union], where pointers to the union and to its non-static data members are also declared interconvertible; and in [ptr.launder].
That last occurrence separates two use cases: either the pointers are interconvertible, or one element is an array. This is stated in a remark and not in a note as it is in [basic.compound], which makes it clearer that pointer-interconvertibility deliberately does not concern arrays.
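As a sketch of the two class cases those notes describe, and the array case they exclude:

struct S { int i; double d; };  // standard-layout
union U { int n; float f; };

int main() {
    S s{1, 2.0};
    int* pi = reinterpret_cast<int*>(&s);  // OK: &s and &s.i are
                                           // pointer-interconvertible
    U u{7};
    int* pn = reinterpret_cast<int*>(&u);  // OK: &u and &u.n are
                                           // pointer-interconvertible
    int a[10] = {};
    // By contrast, &a and &a[0] share an address but are NOT
    // pointer-interconvertible, so this cast would not give a
    // pointer usable as a pointer to a[0]:
    // int* bad = reinterpret_cast<int*>(&a);
    (void)pi; (void)pn; (void)a;
}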
Having read this section of the Standard closely, my understanding is that two objects are pointer-interconvertible, as the name suggests, if:
They are “interconnected” through their class definition (note that the pointer-interconvertibility concept is defined for a class object and its first non-static data member).
They point to the same address, but because their types are different, we need to “convert” their pointer types using the reinterpret_cast operator.
For the array object mentioned in the question, the array and its first element have no interconnection in terms of a class definition, and we also don't need to convert their pointer types in order to work with them. They just share the same address.
I created the following simple test program to show that I can use the address of a public integer member to access the value of a private one:
#include <iostream>
using namespace std;

class CarType {
public:
    int year;
    CarType(int price, int year) { this->year = year; this->price = price; }
private:
    int price;
};

int main() {
    // Create new CarType object
    CarType a = CarType(15000, 1999);
    // Increment the address of the public member variable year by 1 and save the result
    int* pricePointer = &a.year + 1;
    // Dereference the memory address and output it
    cout << "PRICE: " << *pricePointer << endl;
    return 0;
}
The output shows that I can access the price variable just by knowing the address of year. Is there a way to prevent this? Is this just an edge case, or is it true for all types of objects?
This is possibly undefined (but certainly unwise) behaviour and you can prevent it in much the same way you prevent people writing to random addresses, or prevent inexperienced people from cutting off limbs with a chainsaw. In other words, not at all.
Caveat Coder.
The reason I say possibly in the above paragraph is that it's not entirely clear. In language-lawyer terms, refer to C++14 5.7 Additive operators /4:
For the purposes of these operators, a pointer to a nonarray object behaves the same as a pointer to the first element of an array of length one with the type of the object as its element type.
and /5 when discussing adding an integral value to a pointer:
If both the pointer operand and the result point to elements of the same array object, or one past the last element of the array object, the evaluation shall not produce an overflow; otherwise, the behavior is undefined.
In your case, your pointer is actually pointing "one past the last element" so the addition itself doesn't generate undefined behaviour.
You may think that dereferencing that pointer would be undefined but, as per a note in 3.9.2 Compound types /3, it appears it may be valid:
For instance, the address one past the end of an array (5.7) would be considered to point to an unrelated object of the array’s element type that might be located at that address.
However, undefined or not, it's still unwise to dereference it, since you don't actually know that a variable of the correct type lives there. Implementations are free to pad structures as they see fit, so there's no guarantee that price will be the same as *(&year + 1).
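To make the padding point concrete, here is a small sketch with a standard-layout struct (my example, not the OP's class), where offsetof shows the gap:

#include <cstddef>
#include <iostream>

struct Padded {       // standard-layout, so offsetof is well-defined here
    char tag;         // offset 0
    int  value;       // commonly at offset 4, not 1, because of alignment
};

int main() {
    std::cout << offsetof(Padded, value) << std::endl;  // typically prints 4
    // So "&tag + 1" would point into padding, not at value.
}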
In the following code, why is the address held by pointer x changing after the delete? As I understand it, the delete call should free up the allocated memory on the heap, but it shouldn't change the pointer itself.
#include <iostream>
#include <cstdlib>
using namespace std;

int main()
{
    int* x = new int;
    *x = 2;
    cout << x << endl << *x << endl;
    delete x;
    cout << x << endl;
    system("Pause");
    return 0;
}
OUTPUT:
01103ED8
2
00008123
Observations: I'm using Visual Studio 2013 and Windows 8. Reportedly this doesn't behave the same under other compilers. Also, I understand this is bad practice and that I should just reassign the pointer to NULL after its deletion; I'm simply trying to understand what is driving this weird behaviour.
As I understand it, the delete call should free up the allocated memory on the heap, but it shouldn't change the pointer itself.
Well, why not? It's perfectly legal output -- reading a pointer after having deleted it leads to undefined behavior. And that includes the pointer's value changing. (In fact, that doesn't even need UB; a deleted pointer can really point anywhere.)
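The usual defensive pattern, for what it's worth, is exactly the one the OP mentions: overwrite the pointer immediately, so that later reads are well-defined:

delete x;
x = nullptr;        // reading x is now well-defined; delete x again is a no-op
cout << x << endl;  // prints the null pointer representation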
Having read relevant bits of both C++98 and C++11 [N3485], and all the stuff H2CO3 pointed to:
Neither edition of the standard adequately describes what an "invalid pointer" is, under what circumstances they are created, or what their semantics are. Therefore, it is unclear to me whether or not the OP's code was intended to provoke undefined behavior, but de facto it does (since anything that the standard does not clearly define is, tautologically, undefined). The text is improved in C++11 but is still inadequate.
As a matter of language design, the following program certainly does exhibit unspecified behavior as marked, which is fine. It may, but should not also exhibit undefined behavior as marked; in other words, to the extent that this program exhibits undefined behavior, that is IMNSHO a defect in the standard. Concretely, copying the value of an "invalid" pointer, and performing equality comparisons on such pointers, should not be UB. I specifically reject the argument to the contrary from hypothetical hardware that traps on merely loading a pointer to unmapped memory into a register. (Note: I cannot find text in C++11 corresponding to C11 6.5.2.3 footnote 95, regarding the legitimacy of writing one union member and reading another; this program assumes that the result of this operation is unspecified but not undefined (except insofar as it might involve a trap representation), as it is in C.)
#include <string.h>
#include <stdio.h>

union ptr {
    int *val;
    unsigned char repr[sizeof(int *)];
};

int main(void)
{
    ptr a, b, c, d, e;
    a.val = new int(0);
    b.val = a.val;
    memcpy(c.repr, a.repr, sizeof(int *));
    delete a.val;
    d.val = a.val; // copy may, but should not, provoke UB
    memcpy(e.repr, a.repr, sizeof(int *));
    // accesses to b.val and d.val may, but should not, provoke UB
    // result of comparison is unspecified (may, but should not, be undefined)
    printf("b %c= d\n", b.val == d.val ? '=' : '!');
    // result of comparison is unspecified
    printf("c %c= e\n", memcmp(c.repr, e.repr, sizeof(int *)) ? '!' : '=');
}
This is all of the relevant text from C++98:
[3.7.3.2p4] If the argument given to a deallocation function in the standard library
is a pointer that is not the null pointer value (4.10), the deallocation function
shall deallocate the storage referenced by the pointer, rendering invalid all
pointers referring to any part of the deallocated storage. The effect of using
an invalid pointer value (including passing it to a deallocation function) is
undefined. [footnote: On some implementations, it causes a system-generated
runtime fault.]
The problem is that there is no definition of "using an invalid pointer value", so we get to argue about what qualifies. There is a clue to the committee's intent in the discussion of iterators (a category which is defined to include bare pointers):
[24.1p5] ... Iterators can also have singular values that are not associated with any container. [Example: After the declaration of an uninitialized pointer x (as with int* x; [sic]), x must always be assumed to have a singular value of a pointer.] Results of most expressions are undefined for singular values; the only exception is an assignment of a non-singular value to an iterator that holds a singular value. In this case the singular value is overwritten the same way as any other value. Dereferenceable and past-the-end values are always non-singular.
It seems at least plausible to assume that an "invalid pointer" is also meant to be an example of a "singular iterator", but there is no text to back this up; going in the opposite direction, there is no text confirming the (equally plausible) assumption that an uninitialized pointer value is meant to be an "invalid pointer" as well as a "singular iterator". So the hair-splitters among us might not accept "results of most expressions are undefined" as clarifying what qualifies as use of an invalid pointer.
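In code, the one permitted operation from that passage reads like this (a sketch):

int* x;       // x holds a singular (indeterminate) value
// Most expressions on x would be undefined, but assigning a
// non-singular value over the singular one is explicitly allowed:
x = nullptr;  // OK: the singular value is simply overwritten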
C++11 has changed the text corresponding to 3.7.3.2p4 somewhat:
[3.7.4.2p4] ... Indirection through an invalid pointer value and passing an invalid pointer value to a deallocation function have undefined behavior. Any other use of an
invalid pointer value has implementation-defined behavior. [footnote: Some implementations might define that copying an invalid pointer value causes a system-generated runtime fault.]
(the text elided by the ellipsis is unchanged) We now have somewhat more clarity as to what is meant by "use of an invalid pointer value", and we can now say that the OP's code's semantics are definitely implementation-defined (but might be implementation-defined to be undefined). There is also a new paragraph in the discussion of iterators:
[24.2.1p10] An invalid iterator is an iterator that may be singular.
which confirms that "invalid pointer" and "singular iterator" are effectively the same thing. The remaining confusion in C++11 is largely about the exact circumstances that produce invalid/singular pointers/iterators; there should be a detailed chart of pointer/iterator lifecycle transitions (like there is for *values). And, as with C++98, the standard is defective to the extent it does not guarantee that copying-from and equality comparison upon such values are valid (not undefined).
I want to parse UTF-8 in C++. When parsing a new character, I don't know in advance whether it is an ASCII byte or the leader of a multibyte character, nor whether my input string is long enough to contain the remaining bytes.
For simplicity, I'd like to name the next four bytes a, b, c and d, and because I am in C++, I want to do it using references.
Is it valid to define those references at the beginning of a function as long as I don't access them before I know that access is safe? Example:
void parse_utf8_character(const string s) {
    for (size_t i = 0; i < s.size();) {
        const char &a = s[i];
        const char &b = s[i + 1];
        const char &c = s[i + 2];
        const char &d = s[i + 3];
        if (is_ascii(a)) {
            i += 1;
            do_something_only_with(a);
        } else if (is_twobyte_leader(a)) {
            i += 2;
            if (is_safe_to_access_b()) {
                do_something_only_with(a, b);
            }
        }
        ...
    }
}
The above example shows what I want to do semantically. It doesn't illustrate why I want to do this, but obviously the real code will be more involved, so defining b, c, d only when I know that access is safe and I need them would be too verbose.
There are three takes on this:
Formally
well, who knows. I could find out for you by spending quite some time on it, but then, so could you. Or any reader. And it's not as if that's very useful in practice.
EDIT: OK, looking it up, since you don't seem happy about me mentioning the formal without looking it up for you. Formally you're out of luck:
N3280 (C++11) §5.7/5 “If both the pointer operand and the result point to elements of the same array object, or one past
the last element of the array object, the evaluation shall not produce an overflow; otherwise, the behavior is undefined.”
Two situations where this can produce undesired behavior: (1) computing an address beyond the end of a segment, and (2) computing an address beyond an array that the compiler knows the size of, with debug checks enabled.
Technically
you're probably OK as long as you avoid any lvalue-to-rvalue conversion, because if the references are implemented as pointers, then it's as safe as pointers, and if the compiler chooses to implement them as aliases, well, that's also ok.
Economically
relying needlessly on a subtlety wastes your time, and then also the time of others dealing with the code. So, not a good idea. Instead, declare the names when it's guaranteed that what they refer to exists.
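Applied to the OP's loop, that advice might look like the following sketch, where sequence_length is a hypothetical helper returning the byte count implied by the leading byte:

#include <cstddef>
#include <string>

std::size_t sequence_length(char lead);  // hypothetical: returns 1..4

void parse_utf8_character(const std::string& s) {
    for (std::size_t i = 0; i < s.size();) {
        const char a = s[i];
        const std::size_t len = sequence_length(a);
        if (i + len > s.size())
            break;  // truncated sequence: stop or report an error
        // Only now is it safe to name the continuation bytes:
        if (len >= 2) {
            const char& b = s[i + 1];
            // ... use a and b ...
        }
        i += len;
    }
}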
Before going into the legality of references to inaccessible memory, you have another problem in your code. Your call to s[i+x] might call string::operator[] with a parameter bigger than s.size(). The C++11 standard says about string::operator[] ([string.access], §21.4.5):
Requires: pos <= size().
Returns: *(begin()+pos) if pos < size(), otherwise a reference to an object of type charT with value charT(); the referenced value shall not be modified.
This means that calling s[x] for x > s.size() is undefined behaviour, so the implementation could very well terminate your program, e.g. by means of an assertion, for that.
Since string is now guaranteed to be contiguous, you could get around that problem by using &s[i] + x to compute an address. In practice this will probably work.
However, strictly speaking, doing this is unfortunately still illegal. The reason is that the standard allows pointer arithmetic only as long as the pointer stays inside the same array, or points one past the end of the array. The relevant part of the (C++11) standard is [expr.add], §5.7/5:
If both the pointer operand and the result point to elements of the same array object, or one past the last element of the array object, the evaluation shall not produce an overflow; otherwise, the behavior is undefined.
Therefore generating references or pointers to invalid memory locations might work on most implementations, but it is technically undefined behaviour, even if you never dereference the pointer or use the reference. Relying on UB is almost never a good idea, because even if it works on all targeted systems, there are no guarantees about it continuing to work in the future.
In principle, the idea of taking a reference to a possibly illegal memory address is itself perfectly legal. The reference is only a pointer under the hood, and pointer arithmetic is legal until dereferencing occurs.
EDIT: This claim is a practical one, not one covered by the published standard. There are many corners of the published standard which are formally undefined behaviour, but don't produce any kind of unexpected behaviour in practice.
Take for example the possibility of computing a pointer to the second item after the end of an array (as @DanielTrebbien suggests). The standard says the overflow may result in undefined behaviour. In practice, the overflow would only occur if the upper end of the array is just short of the end of the space addressable by a pointer. Not a likely scenario. Even if it does happen, nothing bad would happen on most architectures. What is violated are certain guarantees about pointer differences, which don't apply here.
@JoSo: If you were working with a character array, you could avoid some of the uncertainty about reference semantics by replacing the const references with const pointers in your code. That way you can be certain no compiler will alias the values.
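A sketch of that suggestion, reusing s and i from the OP's loop: form the pointers unconditionally, but leave them null when the byte is out of range, since a pointer (unlike a reference) has a "not available" state:

const char* base = s.data() + i;  // valid: i < s.size()
const char* b = (i + 1 < s.size()) ? base + 1 : nullptr;
const char* c = (i + 2 < s.size()) ? base + 2 : nullptr;
// null records "not available"; dereference only after checking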