Related
Regardless of how 'bad' the code is, and assuming that alignment etc are not an issue on the compiler/platform, is this undefined or broken behavior?
If I have a struct like this :-
struct data
{
int a, b, c;
};
struct data thing;
Is it legal to access a, b and c as (&thing.a)[0], (&thing.a)[1], and (&thing.a)[2]?
In every case, on every compiler and platform I tried it on, with every setting I tried it 'worked'. I'm just worried that the compiler might not realize that b and thing[1] are the same thing and stores to 'b' might be put in a register and thing[1] reads the wrong value from memory (for example). In every case I tried it did the right thing though. (I realize of course that doesn't prove much)
This is not my code; it's code I have to work with, I'm interested in whether this is bad code or broken code as the different affects my priorities for changing it a great deal :)
Tagged C and C++ . I'm mostly interested in C++ but also C if it is different, just for interest.
It is illegal 1. That's an Undefined behavior in C++.
You are taking the members in an array fashion, but here is what the C++ standard says (emphasis mine):
[dcl.array/1]: ...An object of array type contains a contiguously allocated non-empty set of N
subobjects of type T...
But, for members, there's no such contiguous requirement:
[class.mem/17]: ...;Implementation alignment requirements might cause two adjacent
members not to be allocated immediately after each other...
While the above two quotes should be enough to hint why indexing into a struct as you did isn't a defined behavior by the C++ standard, let's pick one example: look at the expression (&thing.a)[2] - Regarding the subscript operator:
[expr.post//expr.sub/1]:
A postfix expression followed by an expression in square brackets is a
postfix expression. One of the expressions shall be a glvalue of type
“array of T” or a prvalue of type “pointer to T” and the other shall
be a prvalue of unscoped enumeration or integral type. The result is
of type “T”. The type “T” shall be a completely-defined object type.66
The expression E1[E2] is identical (by definition) to ((E1)+(E2))
Digging into the bold text of the above quote: regarding adding an integral type to a pointer type (note the emphasis here)..
[expr.add/4]: When an expression that has integral type is added to or subtracted from a
pointer, the result has the type of the pointer operand. If the
expression P points to element x[i] of an array object x
with n elements, the expressions P + J and J + P (where J has
the value j) point to the (possibly-hypothetical) element x[i + j]
if 0 ≤ i + j ≤ n; otherwise, the behavior is undefined. ...
Note the array requirement for the if clause; else the otherwise in the above quote. The expression (&thing.a)[2] obviously doesn't qualify for the if clause; Hence, Undefined Behavior.
On a side note: Though I have extensively experimented the code and its variations on various compilers and they don't introduce any padding here, (it works); from a maintenance view, the code is extremely fragile. you should still assert that the implementation allocated the members contiguously before doing this. And stay in-bounds :-). But its still Undefined behavior....
Some viable workarounds (with defined behavior) have been provided by other answers.
As rightly pointed out in the comments, [basic.lval/8], which was in my previous edit doesn't apply. Thanks #2501 and #M.M.
1: See #Barry's answer to this question for the only one legal case where you can access thing.a member of the struct via this parttern.
No. In C, this is undefined behavior even if there is no padding.
The thing that causes undefined behavior is out-of-bounds access1. When you have a scalar (members a,b,c in the struct) and try to use it as an array2 to access the next hypothetical element, you cause undefined behavior, even if there happens to be another object of the same type at that address.
However you may use the address of the struct object and calculate the offset into a specific member:
struct data thing = { 0 };
char* p = ( char* )&thing + offsetof( thing , b );
int* b = ( int* )p;
*b = 123;
assert( thing.b == 123 );
This has to be done for each member individually, but can be put into a function that resembles an array access.
1 (Quoted from: ISO/IEC 9899:201x 6.5.6 Additive operators 8)
If the result points one past the last element of the array object, it
shall not be used as the operand of a unary * operator that is evaluated.
2 (Quoted from: ISO/IEC 9899:201x 6.5.6 Additive operators 7)
For the purposes of these operators, a pointer to an object that is not an element of an
array behaves the same as a pointer to the first element of an array of length one with the
type of the object as its element type.
In C++ if you really need it - create operator[]:
struct data
{
int a, b, c;
int &operator[]( size_t idx ) {
switch( idx ) {
case 0 : return a;
case 1 : return b;
case 2 : return c;
default: throw std::runtime_error( "bad index" );
}
}
};
data d;
d[0] = 123; // assign 123 to data.a
it is not only guaranteed to work but usage is simpler, you do not need to write unreadable expression (&thing.a)[0]
Note: this answer is given in assumption that you already have a structure with fields, and you need to add access via index. If speed is an issue and you can change the structure this could be more effective:
struct data
{
int array[3];
int &a = array[0];
int &b = array[1];
int &c = array[2];
};
This solution would change size of structure so you can use methods as well:
struct data
{
int array[3];
int &a() { return array[0]; }
int &b() { return array[1]; }
int &c() { return array[2]; }
};
For c++: If you need to access a member without knowing its name, you can use a pointer to member variable.
struct data {
int a, b, c;
};
typedef int data::* data_int_ptr;
data_int_ptr arr[] = {&data::a, &data::b, &data::c};
data thing;
thing.*arr[0] = 123;
In ISO C99/C11, union-based type-punning is legal, so you can use that instead of indexing pointers to non-arrays (see various other answers).
ISO C++ doesn't allow union-based type-punning. GNU C++ does, as an extension, and I think some other compilers that don't support GNU extensions in general do support union type-punning. But that doesn't help you write strictly portable code.
With current versions of gcc and clang, writing a C++ member function using a switch(idx) to select a member will optimize away for compile-time constant indices, but will produce terrible branchy asm for runtime indices. There's nothing inherently wrong with switch() for this; this is simply a missed-optimization bug in current compilers. They could compiler Slava' switch() function efficiently.
The solution/workaround to this is to do it the other way: give your class/struct an array member, and write accessor functions to attach names to specific elements.
struct array_data
{
int arr[3];
int &operator[]( unsigned idx ) {
// assert(idx <= 2);
//idx = (idx > 2) ? 2 : idx;
return arr[idx];
}
int &a(){ return arr[0]; } // TODO: const versions
int &b(){ return arr[1]; }
int &c(){ return arr[2]; }
};
We can have a look at the asm output for different use-cases, on the Godbolt compiler explorer. These are complete x86-64 System V functions, with the trailing RET instruction omitted to better show what you'd get when they inline. ARM/MIPS/whatever would be similar.
# asm from g++6.2 -O3
int getb(array_data &d) { return d.b(); }
mov eax, DWORD PTR [rdi+4]
void setc(array_data &d, int val) { d.c() = val; }
mov DWORD PTR [rdi+8], esi
int getidx(array_data &d, int idx) { return d[idx]; }
mov esi, esi # zero-extend to 64-bit
mov eax, DWORD PTR [rdi+rsi*4]
By comparison, #Slava's answer using a switch() for C++ makes asm like this for a runtime-variable index. (Code in the previous Godbolt link).
int cpp(data *d, int idx) {
return (*d)[idx];
}
# gcc6.2 -O3, using `default: __builtin_unreachable()` to promise the compiler that idx=0..2,
# avoiding an extra cmov for idx=min(idx,2), or an extra branch to a throw, or whatever
cmp esi, 1
je .L6
cmp esi, 2
je .L7
mov eax, DWORD PTR [rdi]
ret
.L6:
mov eax, DWORD PTR [rdi+4]
ret
.L7:
mov eax, DWORD PTR [rdi+8]
ret
This is obviously terrible, compared to the C (or GNU C++) union-based type punning version:
c(type_t*, int):
movsx rsi, esi # sign-extend this time, since I didn't change idx to unsigned here
mov eax, DWORD PTR [rdi+rsi*4]
In C++, this is mostly undefined behavior (it depends on which index).
From [expr.unary.op]:
For purposes of pointer
arithmetic (5.7) and comparison (5.9, 5.10), an object that is not an array element whose address is taken in
this way is considered to belong to an array with one element of type T.
The expression &thing.a is thus considered to refer to an array of one int.
From [expr.sub]:
The expression E1[E2] is identical (by definition) to *((E1)+(E2))
And from [expr.add]:
When an expression that has integral type is added to or subtracted from a pointer, the result has the type of the pointer operand. If the expression P points to element x[i] of an array object x with n elements, the expressions P + J and J + P (where J has the value j) point to the (possibly-hypothetical) element x[i + j] if 0 <= i + j <= n; otherwise, the behavior is undefined.
(&thing.a)[0] is perfectly well-formed because &thing.a is considered an array of size 1 and we're taking that first index. That is an allowed index to take.
(&thing.a)[2] violates the precondition that 0 <= i + j <= n, since we have i == 0, j == 2, n == 1. Simply constructing the pointer &thing.a + 2 is undefined behavior.
(&thing.a)[1] is the interesting case. It doesn't actually violate anything in [expr.add]. We're allowed to take a pointer one past the end of the array - which this would be. Here, we turn to a note in [basic.compound]:
A value of a pointer type that is a pointer to or past the end of an object represents the address of the
first byte in memory (1.7) occupied by the object53 or the first byte in memory after the end of the storage
occupied by the object, respectively. [ Note: A pointer past the end of an object (5.7) is not considered to
point to an unrelated object of the object’s type that might be located at that address.
Hence, taking the pointer &thing.a + 1 is defined behavior, but dereferencing it is undefined because it does not point to anything.
This is undefined behavior.
There are lots of rules in C++ that attempt to give the compiler some hope of understanding what you are doing, so it can reason about it and optimize it.
There are rules about aliasing (accessing data through two different pointer types), array bounds, etc.
When you have a variable x, the fact that it isn't a member of an array means that the compiler can assume that no [] based array access can modify it. So it doesn't have to constantly reload the data from memory every time you use it; only if someone could have modified it from its name.
Thus (&thing.a)[1] can be assumed by the compiler to not refer to thing.b. It can use this fact to reorder reads and writes to thing.b, invalidating what you want it to do without invalidating what you actually told it to do.
A classic example of this is casting away const.
const int x = 7;
std::cout << x << '\n';
auto ptr = (int*)&x;
*ptr = 2;
std::cout << *ptr << "!=" << x << '\n';
std::cout << ptr << "==" << &x << '\n';
here you typically get a compiler saying 7 then 2 != 7, and then two identical pointers; despite the fact that ptr is pointing at x. The compiler takes the fact that x is a constant value to not bother reading it when you ask for the value of x.
But when you take the address of x, you force it to exist. You then cast away const, and modify it. So the actual location in memory where x is has been modified, the compiler is free to not actually read it when reading x!
The compiler may get smart enough to figure out how to even avoid following ptr to read *ptr, but often they are not. Feel free to go and use ptr = ptr+argc-1 or somesuch confusion if the optimizer is getting smarter than you.
You can provide a custom operator[] that gets the right item.
int& operator[](std::size_t);
int const& operator[](std::size_t) const;
having both is useful.
Heres a way to use a proxy class to access elements in a member array by name. It is very C++, and has no benefit vs. ref-returning accessor functions, except for syntactic preference. This overloads the -> operator to access elements as members, so to be acceptable, one needs to both dislike the syntax of accessors (d.a() = 5;), as well as tolerate using -> with a non-pointer object. I expect this might also confuse readers not familiar with the code, so this might be more of a neat trick than something you want to put into production.
The Data struct in this code also includes overloads for the subscript operator, to access indexed elements inside its ar array member, as well as begin and end functions, for iteration. Also, all of these are overloaded with non-const and const versions, which I felt needed to be included for completeness.
When Data's -> is used to access an element by name (like this: my_data->b = 5;), a Proxy object is returned. Then, because this Proxy rvalue is not a pointer, its own -> operator is auto-chain-called, which returns a pointer to itself. This way, the Proxy object is instantiated and remains valid during evaluation of the initial expression.
Contruction of a Proxy object populates its 3 reference members a, b and c according to a pointer passed in the constructor, which is assumed to point to a buffer containing at least 3 values whose type is given as the template parameter T. So instead of using named references which are members of the Data class, this saves memory by populating the references at the point of access (but unfortunately, using -> and not the . operator).
In order to test how well the compiler's optimizer eliminates all of the indirection introduced by the use of Proxy, the code below includes 2 versions of main(). The #if 1 version uses the -> and [] operators, and the #if 0 version performs the equivalent set of procedures, but only by directly accessing Data::ar.
The Nci() function generates runtime integer values for initializing array elements, which prevents the optimizer from just plugging constant values directly into each std::cout << call.
For gcc 6.2, using -O3, both versions of main() generate the same assembly (toggle between #if 1 and #if 0 before the first main() to compare): https://godbolt.org/g/QqRWZb
#include <iostream>
#include <ctime>
template <typename T>
class Proxy {
public:
T &a, &b, &c;
Proxy(T* par) : a(par[0]), b(par[1]), c(par[2]) {}
Proxy* operator -> () { return this; }
};
struct Data {
int ar[3];
template <typename I> int& operator [] (I idx) { return ar[idx]; }
template <typename I> const int& operator [] (I idx) const { return ar[idx]; }
Proxy<int> operator -> () { return Proxy<int>(ar); }
Proxy<const int> operator -> () const { return Proxy<const int>(ar); }
int* begin() { return ar; }
const int* begin() const { return ar; }
int* end() { return ar + sizeof(ar)/sizeof(int); }
const int* end() const { return ar + sizeof(ar)/sizeof(int); }
};
// Nci returns an unpredictible int
inline int Nci() {
static auto t = std::time(nullptr) / 100 * 100;
return static_cast<int>(t++ % 1000);
}
#if 1
int main() {
Data d = {Nci(), Nci(), Nci()};
for(auto v : d) { std::cout << v << ' '; }
std::cout << "\n";
std::cout << d->b << "\n";
d->b = -5;
std::cout << d[1] << "\n";
std::cout << "\n";
const Data cd = {Nci(), Nci(), Nci()};
for(auto v : cd) { std::cout << v << ' '; }
std::cout << "\n";
std::cout << cd->c << "\n";
//cd->c = -5; // error: assignment of read-only location
std::cout << cd[2] << "\n";
}
#else
int main() {
Data d = {Nci(), Nci(), Nci()};
for(auto v : d.ar) { std::cout << v << ' '; }
std::cout << "\n";
std::cout << d.ar[1] << "\n";
d->b = -5;
std::cout << d.ar[1] << "\n";
std::cout << "\n";
const Data cd = {Nci(), Nci(), Nci()};
for(auto v : cd.ar) { std::cout << v << ' '; }
std::cout << "\n";
std::cout << cd.ar[2] << "\n";
//cd.ar[2] = -5;
std::cout << cd.ar[2] << "\n";
}
#endif
If reading values is enough, and efficiency is not a concern, or if you trust your compiler to optimize things well, or if struct is just that 3 bytes, you can safely do this:
char index_data(const struct data *d, size_t index) {
assert(sizeof(*d) == offsetoff(*d, c)+1);
assert(index < sizeof(*d));
char buf[sizeof(*d)];
memcpy(buf, d, sizeof(*d));
return buf[index];
}
For C++ only version, you would probably want to use static_assert to verify that struct data has standard layout, and perhaps throw exception on invalid index instead.
It is illegal, but there is a workaround:
struct data {
union {
struct {
int a;
int b;
int c;
};
int v[3];
};
};
Now you can index v:
Regardless of how 'bad' the code is, and assuming that alignment etc are not an issue on the compiler/platform, is this undefined or broken behavior?
If I have a struct like this :-
struct data
{
int a, b, c;
};
struct data thing;
Is it legal to access a, b and c as (&thing.a)[0], (&thing.a)[1], and (&thing.a)[2]?
In every case, on every compiler and platform I tried it on, with every setting I tried it 'worked'. I'm just worried that the compiler might not realize that b and thing[1] are the same thing and stores to 'b' might be put in a register and thing[1] reads the wrong value from memory (for example). In every case I tried it did the right thing though. (I realize of course that doesn't prove much)
This is not my code; it's code I have to work with, I'm interested in whether this is bad code or broken code as the different affects my priorities for changing it a great deal :)
Tagged C and C++ . I'm mostly interested in C++ but also C if it is different, just for interest.
It is illegal 1. That's an Undefined behavior in C++.
You are taking the members in an array fashion, but here is what the C++ standard says (emphasis mine):
[dcl.array/1]: ...An object of array type contains a contiguously allocated non-empty set of N
subobjects of type T...
But, for members, there's no such contiguous requirement:
[class.mem/17]: ...;Implementation alignment requirements might cause two adjacent
members not to be allocated immediately after each other...
While the above two quotes should be enough to hint why indexing into a struct as you did isn't a defined behavior by the C++ standard, let's pick one example: look at the expression (&thing.a)[2] - Regarding the subscript operator:
[expr.post//expr.sub/1]:
A postfix expression followed by an expression in square brackets is a
postfix expression. One of the expressions shall be a glvalue of type
“array of T” or a prvalue of type “pointer to T” and the other shall
be a prvalue of unscoped enumeration or integral type. The result is
of type “T”. The type “T” shall be a completely-defined object type.66
The expression E1[E2] is identical (by definition) to ((E1)+(E2))
Digging into the bold text of the above quote: regarding adding an integral type to a pointer type (note the emphasis here)..
[expr.add/4]: When an expression that has integral type is added to or subtracted from a
pointer, the result has the type of the pointer operand. If the
expression P points to element x[i] of an array object x
with n elements, the expressions P + J and J + P (where J has
the value j) point to the (possibly-hypothetical) element x[i + j]
if 0 ≤ i + j ≤ n; otherwise, the behavior is undefined. ...
Note the array requirement for the if clause; else the otherwise in the above quote. The expression (&thing.a)[2] obviously doesn't qualify for the if clause; Hence, Undefined Behavior.
On a side note: Though I have extensively experimented the code and its variations on various compilers and they don't introduce any padding here, (it works); from a maintenance view, the code is extremely fragile. you should still assert that the implementation allocated the members contiguously before doing this. And stay in-bounds :-). But its still Undefined behavior....
Some viable workarounds (with defined behavior) have been provided by other answers.
As rightly pointed out in the comments, [basic.lval/8], which was in my previous edit doesn't apply. Thanks #2501 and #M.M.
1: See #Barry's answer to this question for the only one legal case where you can access thing.a member of the struct via this parttern.
No. In C, this is undefined behavior even if there is no padding.
The thing that causes undefined behavior is out-of-bounds access1. When you have a scalar (members a,b,c in the struct) and try to use it as an array2 to access the next hypothetical element, you cause undefined behavior, even if there happens to be another object of the same type at that address.
However you may use the address of the struct object and calculate the offset into a specific member:
struct data thing = { 0 };
char* p = ( char* )&thing + offsetof( thing , b );
int* b = ( int* )p;
*b = 123;
assert( thing.b == 123 );
This has to be done for each member individually, but can be put into a function that resembles an array access.
1 (Quoted from: ISO/IEC 9899:201x 6.5.6 Additive operators 8)
If the result points one past the last element of the array object, it
shall not be used as the operand of a unary * operator that is evaluated.
2 (Quoted from: ISO/IEC 9899:201x 6.5.6 Additive operators 7)
For the purposes of these operators, a pointer to an object that is not an element of an
array behaves the same as a pointer to the first element of an array of length one with the
type of the object as its element type.
In C++ if you really need it - create operator[]:
struct data
{
int a, b, c;
int &operator[]( size_t idx ) {
switch( idx ) {
case 0 : return a;
case 1 : return b;
case 2 : return c;
default: throw std::runtime_error( "bad index" );
}
}
};
data d;
d[0] = 123; // assign 123 to data.a
it is not only guaranteed to work but usage is simpler, you do not need to write unreadable expression (&thing.a)[0]
Note: this answer is given in assumption that you already have a structure with fields, and you need to add access via index. If speed is an issue and you can change the structure this could be more effective:
struct data
{
int array[3];
int &a = array[0];
int &b = array[1];
int &c = array[2];
};
This solution would change size of structure so you can use methods as well:
struct data
{
int array[3];
int &a() { return array[0]; }
int &b() { return array[1]; }
int &c() { return array[2]; }
};
For c++: If you need to access a member without knowing its name, you can use a pointer to member variable.
struct data {
int a, b, c;
};
typedef int data::* data_int_ptr;
data_int_ptr arr[] = {&data::a, &data::b, &data::c};
data thing;
thing.*arr[0] = 123;
In ISO C99/C11, union-based type-punning is legal, so you can use that instead of indexing pointers to non-arrays (see various other answers).
ISO C++ doesn't allow union-based type-punning. GNU C++ does, as an extension, and I think some other compilers that don't support GNU extensions in general do support union type-punning. But that doesn't help you write strictly portable code.
With current versions of gcc and clang, writing a C++ member function using a switch(idx) to select a member will optimize away for compile-time constant indices, but will produce terrible branchy asm for runtime indices. There's nothing inherently wrong with switch() for this; this is simply a missed-optimization bug in current compilers. They could compiler Slava' switch() function efficiently.
The solution/workaround to this is to do it the other way: give your class/struct an array member, and write accessor functions to attach names to specific elements.
struct array_data
{
int arr[3];
int &operator[]( unsigned idx ) {
// assert(idx <= 2);
//idx = (idx > 2) ? 2 : idx;
return arr[idx];
}
int &a(){ return arr[0]; } // TODO: const versions
int &b(){ return arr[1]; }
int &c(){ return arr[2]; }
};
We can have a look at the asm output for different use-cases, on the Godbolt compiler explorer. These are complete x86-64 System V functions, with the trailing RET instruction omitted to better show what you'd get when they inline. ARM/MIPS/whatever would be similar.
# asm from g++6.2 -O3
int getb(array_data &d) { return d.b(); }
mov eax, DWORD PTR [rdi+4]
void setc(array_data &d, int val) { d.c() = val; }
mov DWORD PTR [rdi+8], esi
int getidx(array_data &d, int idx) { return d[idx]; }
mov esi, esi # zero-extend to 64-bit
mov eax, DWORD PTR [rdi+rsi*4]
By comparison, #Slava's answer using a switch() for C++ makes asm like this for a runtime-variable index. (Code in the previous Godbolt link).
int cpp(data *d, int idx) {
return (*d)[idx];
}
# gcc6.2 -O3, using `default: __builtin_unreachable()` to promise the compiler that idx=0..2,
# avoiding an extra cmov for idx=min(idx,2), or an extra branch to a throw, or whatever
cmp esi, 1
je .L6
cmp esi, 2
je .L7
mov eax, DWORD PTR [rdi]
ret
.L6:
mov eax, DWORD PTR [rdi+4]
ret
.L7:
mov eax, DWORD PTR [rdi+8]
ret
This is obviously terrible, compared to the C (or GNU C++) union-based type punning version:
c(type_t*, int):
movsx rsi, esi # sign-extend this time, since I didn't change idx to unsigned here
mov eax, DWORD PTR [rdi+rsi*4]
In C++, this is mostly undefined behavior (it depends on which index).
From [expr.unary.op]:
For purposes of pointer
arithmetic (5.7) and comparison (5.9, 5.10), an object that is not an array element whose address is taken in
this way is considered to belong to an array with one element of type T.
The expression &thing.a is thus considered to refer to an array of one int.
From [expr.sub]:
The expression E1[E2] is identical (by definition) to *((E1)+(E2))
And from [expr.add]:
When an expression that has integral type is added to or subtracted from a pointer, the result has the type of the pointer operand. If the expression P points to element x[i] of an array object x with n elements, the expressions P + J and J + P (where J has the value j) point to the (possibly-hypothetical) element x[i + j] if 0 <= i + j <= n; otherwise, the behavior is undefined.
(&thing.a)[0] is perfectly well-formed because &thing.a is considered an array of size 1 and we're taking that first index. That is an allowed index to take.
(&thing.a)[2] violates the precondition that 0 <= i + j <= n, since we have i == 0, j == 2, n == 1. Simply constructing the pointer &thing.a + 2 is undefined behavior.
(&thing.a)[1] is the interesting case. It doesn't actually violate anything in [expr.add]. We're allowed to take a pointer one past the end of the array - which this would be. Here, we turn to a note in [basic.compound]:
A value of a pointer type that is a pointer to or past the end of an object represents the address of the
first byte in memory (1.7) occupied by the object53 or the first byte in memory after the end of the storage
occupied by the object, respectively. [ Note: A pointer past the end of an object (5.7) is not considered to
point to an unrelated object of the object’s type that might be located at that address.
Hence, taking the pointer &thing.a + 1 is defined behavior, but dereferencing it is undefined because it does not point to anything.
This is undefined behavior.
There are lots of rules in C++ that attempt to give the compiler some hope of understanding what you are doing, so it can reason about it and optimize it.
There are rules about aliasing (accessing data through two different pointer types), array bounds, etc.
When you have a variable x, the fact that it isn't a member of an array means that the compiler can assume that no [] based array access can modify it. So it doesn't have to constantly reload the data from memory every time you use it; only if someone could have modified it from its name.
Thus (&thing.a)[1] can be assumed by the compiler to not refer to thing.b. It can use this fact to reorder reads and writes to thing.b, invalidating what you want it to do without invalidating what you actually told it to do.
A classic example of this is casting away const.
const int x = 7;
std::cout << x << '\n';
auto ptr = (int*)&x;
*ptr = 2;
std::cout << *ptr << "!=" << x << '\n';
std::cout << ptr << "==" << &x << '\n';
here you typically get a compiler saying 7 then 2 != 7, and then two identical pointers; despite the fact that ptr is pointing at x. The compiler takes the fact that x is a constant value to not bother reading it when you ask for the value of x.
But when you take the address of x, you force it to exist. You then cast away const, and modify it. So the actual location in memory where x is has been modified, the compiler is free to not actually read it when reading x!
The compiler may get smart enough to figure out how to even avoid following ptr to read *ptr, but often they are not. Feel free to go and use ptr = ptr+argc-1 or somesuch confusion if the optimizer is getting smarter than you.
You can provide a custom operator[] that gets the right item.
int& operator[](std::size_t);
int const& operator[](std::size_t) const;
having both is useful.
Heres a way to use a proxy class to access elements in a member array by name. It is very C++, and has no benefit vs. ref-returning accessor functions, except for syntactic preference. This overloads the -> operator to access elements as members, so to be acceptable, one needs to both dislike the syntax of accessors (d.a() = 5;), as well as tolerate using -> with a non-pointer object. I expect this might also confuse readers not familiar with the code, so this might be more of a neat trick than something you want to put into production.
The Data struct in this code also includes overloads for the subscript operator, to access indexed elements inside its ar array member, as well as begin and end functions, for iteration. Also, all of these are overloaded with non-const and const versions, which I felt needed to be included for completeness.
When Data's -> is used to access an element by name (like this: my_data->b = 5;), a Proxy object is returned. Then, because this Proxy rvalue is not a pointer, its own -> operator is auto-chain-called, which returns a pointer to itself. This way, the Proxy object is instantiated and remains valid during evaluation of the initial expression.
Contruction of a Proxy object populates its 3 reference members a, b and c according to a pointer passed in the constructor, which is assumed to point to a buffer containing at least 3 values whose type is given as the template parameter T. So instead of using named references which are members of the Data class, this saves memory by populating the references at the point of access (but unfortunately, using -> and not the . operator).
In order to test how well the compiler's optimizer eliminates all of the indirection introduced by the use of Proxy, the code below includes 2 versions of main(). The #if 1 version uses the -> and [] operators, and the #if 0 version performs the equivalent set of procedures, but only by directly accessing Data::ar.
The Nci() function generates runtime integer values for initializing array elements, which prevents the optimizer from just plugging constant values directly into each std::cout << call.
For gcc 6.2, using -O3, both versions of main() generate the same assembly (toggle between #if 1 and #if 0 before the first main() to compare): https://godbolt.org/g/QqRWZb
#include <iostream>
#include <ctime>
template <typename T>
class Proxy {
public:
T &a, &b, &c;
Proxy(T* par) : a(par[0]), b(par[1]), c(par[2]) {}
Proxy* operator -> () { return this; }
};
struct Data {
int ar[3];
template <typename I> int& operator [] (I idx) { return ar[idx]; }
template <typename I> const int& operator [] (I idx) const { return ar[idx]; }
Proxy<int> operator -> () { return Proxy<int>(ar); }
Proxy<const int> operator -> () const { return Proxy<const int>(ar); }
int* begin() { return ar; }
const int* begin() const { return ar; }
int* end() { return ar + sizeof(ar)/sizeof(int); }
const int* end() const { return ar + sizeof(ar)/sizeof(int); }
};
// Nci returns an unpredictible int
inline int Nci() {
static auto t = std::time(nullptr) / 100 * 100;
return static_cast<int>(t++ % 1000);
}
#if 1
int main() {
Data d = {Nci(), Nci(), Nci()};
for(auto v : d) { std::cout << v << ' '; }
std::cout << "\n";
std::cout << d->b << "\n";
d->b = -5;
std::cout << d[1] << "\n";
std::cout << "\n";
const Data cd = {Nci(), Nci(), Nci()};
for(auto v : cd) { std::cout << v << ' '; }
std::cout << "\n";
std::cout << cd->c << "\n";
//cd->c = -5; // error: assignment of read-only location
std::cout << cd[2] << "\n";
}
#else
int main() {
Data d = {Nci(), Nci(), Nci()};
for(auto v : d.ar) { std::cout << v << ' '; }
std::cout << "\n";
std::cout << d.ar[1] << "\n";
d->b = -5;
std::cout << d.ar[1] << "\n";
std::cout << "\n";
const Data cd = {Nci(), Nci(), Nci()};
for(auto v : cd.ar) { std::cout << v << ' '; }
std::cout << "\n";
std::cout << cd.ar[2] << "\n";
//cd.ar[2] = -5;
std::cout << cd.ar[2] << "\n";
}
#endif
If reading values is enough, and efficiency is not a concern, or if you trust your compiler to optimize things well, or if struct is just that 3 bytes, you can safely do this:
char index_data(const struct data *d, size_t index) {
assert(sizeof(*d) == offsetoff(*d, c)+1);
assert(index < sizeof(*d));
char buf[sizeof(*d)];
memcpy(buf, d, sizeof(*d));
return buf[index];
}
For C++ only version, you would probably want to use static_assert to verify that struct data has standard layout, and perhaps throw exception on invalid index instead.
It is illegal, but there is a workaround:
struct data {
union {
struct {
int a;
int b;
int c;
};
int v[3];
};
};
Now you can index v:
I have a method which accepts the bool variable named alarm. I need to access to some array by the index equals to alarm:
int ind = alarm; // assume here can be only '0' or '1'
But sometimes I get Access violation reading location... because of my variable equals more than 1 - 3, 5, etc:
How can this be possible?
...
Update:
the problem happens as a result of a randomize memory. I use it to simulate different input data.
Complete verifiable example on Microsoft Visual Studio 2015.
Standard Win32 console application:
#include "stdafx.h"
#include <stdint.h>
#include <stdlib.h>
static void randomize_memory(void* pointer, size_t size)
{
uint8_t* byteArray = reinterpret_cast<uint8_t*>(pointer);
for (int i = 0; i < size; i++)
{
byteArray[i] = rand() % 0xff;
}
}
struct MyStruct
{
double a = 0;
float b = 0;
bool flag = false;
};
int main()
{
while(true)
{
MyStruct st;
randomize_memory(&st, sizeof(st));
bool tmp = st.flag;
int ind = tmp;
if (ind > 1)
__debugbreak();
}
return 0;
}
The behaviour tested and occurs on compilers from Visual Studio 2010 (v100) up to Visual Studio 2015 (v140).
Looks like I just should not use this way to simulate data, but much worse is that I cannot be sure that some bool variable can be casted to 0-1.
You probably have undefined behavior somewhere in the code that is calling setAlarmSwitch.
In C++, an integer value when converted to bool takes the value false/true, which when promoted back to an integer type, become 0/1.
But that doesn't mean that bool is actually stored or passed in function calls as a single bit. In fact, most ABIs have a minimal width per argument (usually the width of a general-purpose register), and any shorter integral types are promoted to it.
To illustrate the issue:
#include <iostream>
void printBool(bool x) {
std::cout << x << std::endl;
}
int main() {
int i = 5;
printBool(i);
((void(*)(int))printBool)(i); // UB
}
Will print:
1
5
The assumption the compiler makes is that a bool argument can only contain the values 0/1 because if you follow the rules there would never be any other value passed in. It's only possible to pass another value by breaking the rules.
So then:
Using a bool value in ways described by this International Standard as “undefined,” such as by examining the value of an uninitialized automatic object, might cause it to behave as if it is neither true nor false.
Your example with reinterpret_cast on the other hand is another issue, I believe a bug in MSVC. It is working fine in GCC and clang.
In C++, bool is an integral type, that means it is possible to implicitly convert int or float data (for example) to bool type. Zero value is treated as false, and non-zero is treated as true. For example, the statements-
bool a = 0; // false
bool b = 10; // true
bool c = 3.2; // true
If you want your bool value (alarm) to be either 0(false) or 1(true), then you may revise the fuinction as below:
void StatusParameter::setAlarmSwitch(int ialarm)
{
bool alarm = (ialarm != 0); // value 5 is converted to 1
ber_print_func;
set_bold(alarm);
int ind = (int)alarm;
set_suffix(LimitSwitchSuffix[ind]);
Ber::setElementColor(m_label, LimitSwitchColor[ind]);
}
The value of a bool can only be true or false. However the representation is implementation-defined.
If you set any variable's memory to something which is not a valid representation for the variable (according to that implementation definition), and you try to read the variable, it causes undefined behaviour which is what you are observing.
You should be able to confirm it by checking the compiler documentation, but the compiler may have said something like "all bits 0 = false, all bits 1 = true, any other representation invalid".
Regardless of how 'bad' the code is, and assuming that alignment etc are not an issue on the compiler/platform, is this undefined or broken behavior?
If I have a struct like this :-
struct data
{
int a, b, c;
};
struct data thing;
Is it legal to access a, b and c as (&thing.a)[0], (&thing.a)[1], and (&thing.a)[2]?
In every case, on every compiler and platform I tried it on, with every setting I tried it 'worked'. I'm just worried that the compiler might not realize that b and thing[1] are the same thing and stores to 'b' might be put in a register and thing[1] reads the wrong value from memory (for example). In every case I tried it did the right thing though. (I realize of course that doesn't prove much)
This is not my code; it's code I have to work with, I'm interested in whether this is bad code or broken code as the different affects my priorities for changing it a great deal :)
Tagged C and C++ . I'm mostly interested in C++ but also C if it is different, just for interest.
It is illegal 1. That's an Undefined behavior in C++.
You are taking the members in an array fashion, but here is what the C++ standard says (emphasis mine):
[dcl.array/1]: ...An object of array type contains a contiguously allocated non-empty set of N
subobjects of type T...
But, for members, there's no such contiguous requirement:
[class.mem/17]: ...;Implementation alignment requirements might cause two adjacent
members not to be allocated immediately after each other...
While the above two quotes should be enough to hint why indexing into a struct as you did isn't a defined behavior by the C++ standard, let's pick one example: look at the expression (&thing.a)[2] - Regarding the subscript operator:
[expr.post//expr.sub/1]:
A postfix expression followed by an expression in square brackets is a
postfix expression. One of the expressions shall be a glvalue of type
“array of T” or a prvalue of type “pointer to T” and the other shall
be a prvalue of unscoped enumeration or integral type. The result is
of type “T”. The type “T” shall be a completely-defined object type.66
The expression E1[E2] is identical (by definition) to ((E1)+(E2))
Digging into the bold text of the above quote: regarding adding an integral type to a pointer type (note the emphasis here)..
[expr.add/4]: When an expression that has integral type is added to or subtracted from a
pointer, the result has the type of the pointer operand. If the
expression P points to element x[i] of an array object x
with n elements, the expressions P + J and J + P (where J has
the value j) point to the (possibly-hypothetical) element x[i + j]
if 0 ≤ i + j ≤ n; otherwise, the behavior is undefined. ...
Note the array requirement for the if clause; else the otherwise in the above quote. The expression (&thing.a)[2] obviously doesn't qualify for the if clause; Hence, Undefined Behavior.
On a side note: Though I have extensively experimented the code and its variations on various compilers and they don't introduce any padding here, (it works); from a maintenance view, the code is extremely fragile. you should still assert that the implementation allocated the members contiguously before doing this. And stay in-bounds :-). But its still Undefined behavior....
Some viable workarounds (with defined behavior) have been provided by other answers.
As rightly pointed out in the comments, [basic.lval/8], which was in my previous edit doesn't apply. Thanks #2501 and #M.M.
1: See #Barry's answer to this question for the only one legal case where you can access thing.a member of the struct via this parttern.
No. In C, this is undefined behavior even if there is no padding.
The thing that causes undefined behavior is out-of-bounds access1. When you have a scalar (members a,b,c in the struct) and try to use it as an array2 to access the next hypothetical element, you cause undefined behavior, even if there happens to be another object of the same type at that address.
However you may use the address of the struct object and calculate the offset into a specific member:
struct data thing = { 0 };
char* p = ( char* )&thing + offsetof( thing , b );
int* b = ( int* )p;
*b = 123;
assert( thing.b == 123 );
This has to be done for each member individually, but can be put into a function that resembles an array access.
1 (Quoted from: ISO/IEC 9899:201x 6.5.6 Additive operators 8)
If the result points one past the last element of the array object, it
shall not be used as the operand of a unary * operator that is evaluated.
2 (Quoted from: ISO/IEC 9899:201x 6.5.6 Additive operators 7)
For the purposes of these operators, a pointer to an object that is not an element of an
array behaves the same as a pointer to the first element of an array of length one with the
type of the object as its element type.
In C++ if you really need it - create operator[]:
struct data
{
int a, b, c;
int &operator[]( size_t idx ) {
switch( idx ) {
case 0 : return a;
case 1 : return b;
case 2 : return c;
default: throw std::runtime_error( "bad index" );
}
}
};
data d;
d[0] = 123; // assign 123 to data.a
it is not only guaranteed to work but usage is simpler, you do not need to write unreadable expression (&thing.a)[0]
Note: this answer is given in assumption that you already have a structure with fields, and you need to add access via index. If speed is an issue and you can change the structure this could be more effective:
struct data
{
int array[3];
int &a = array[0];
int &b = array[1];
int &c = array[2];
};
This solution would change size of structure so you can use methods as well:
struct data
{
int array[3];
int &a() { return array[0]; }
int &b() { return array[1]; }
int &c() { return array[2]; }
};
For c++: If you need to access a member without knowing its name, you can use a pointer to member variable.
struct data {
int a, b, c;
};
typedef int data::* data_int_ptr;
data_int_ptr arr[] = {&data::a, &data::b, &data::c};
data thing;
thing.*arr[0] = 123;
In ISO C99/C11, union-based type-punning is legal, so you can use that instead of indexing pointers to non-arrays (see various other answers).
ISO C++ doesn't allow union-based type-punning. GNU C++ does, as an extension, and I think some other compilers that don't support GNU extensions in general do support union type-punning. But that doesn't help you write strictly portable code.
With current versions of gcc and clang, writing a C++ member function using a switch(idx) to select a member will optimize away for compile-time constant indices, but will produce terrible branchy asm for runtime indices. There's nothing inherently wrong with switch() for this; this is simply a missed-optimization bug in current compilers. They could compiler Slava' switch() function efficiently.
The solution/workaround to this is to do it the other way: give your class/struct an array member, and write accessor functions to attach names to specific elements.
struct array_data
{
int arr[3];
int &operator[]( unsigned idx ) {
// assert(idx <= 2);
//idx = (idx > 2) ? 2 : idx;
return arr[idx];
}
int &a(){ return arr[0]; } // TODO: const versions
int &b(){ return arr[1]; }
int &c(){ return arr[2]; }
};
We can have a look at the asm output for different use-cases, on the Godbolt compiler explorer. These are complete x86-64 System V functions, with the trailing RET instruction omitted to better show what you'd get when they inline. ARM/MIPS/whatever would be similar.
# asm from g++6.2 -O3
int getb(array_data &d) { return d.b(); }
mov eax, DWORD PTR [rdi+4]
void setc(array_data &d, int val) { d.c() = val; }
mov DWORD PTR [rdi+8], esi
int getidx(array_data &d, int idx) { return d[idx]; }
mov esi, esi # zero-extend to 64-bit
mov eax, DWORD PTR [rdi+rsi*4]
By comparison, #Slava's answer using a switch() for C++ makes asm like this for a runtime-variable index. (Code in the previous Godbolt link).
int cpp(data *d, int idx) {
return (*d)[idx];
}
# gcc6.2 -O3, using `default: __builtin_unreachable()` to promise the compiler that idx=0..2,
# avoiding an extra cmov for idx=min(idx,2), or an extra branch to a throw, or whatever
cmp esi, 1
je .L6
cmp esi, 2
je .L7
mov eax, DWORD PTR [rdi]
ret
.L6:
mov eax, DWORD PTR [rdi+4]
ret
.L7:
mov eax, DWORD PTR [rdi+8]
ret
This is obviously terrible, compared to the C (or GNU C++) union-based type punning version:
c(type_t*, int):
movsx rsi, esi # sign-extend this time, since I didn't change idx to unsigned here
mov eax, DWORD PTR [rdi+rsi*4]
In C++, this is mostly undefined behavior (it depends on which index).
From [expr.unary.op]:
For purposes of pointer
arithmetic (5.7) and comparison (5.9, 5.10), an object that is not an array element whose address is taken in
this way is considered to belong to an array with one element of type T.
The expression &thing.a is thus considered to refer to an array of one int.
From [expr.sub]:
The expression E1[E2] is identical (by definition) to *((E1)+(E2))
And from [expr.add]:
When an expression that has integral type is added to or subtracted from a pointer, the result has the type of the pointer operand. If the expression P points to element x[i] of an array object x with n elements, the expressions P + J and J + P (where J has the value j) point to the (possibly-hypothetical) element x[i + j] if 0 <= i + j <= n; otherwise, the behavior is undefined.
(&thing.a)[0] is perfectly well-formed because &thing.a is considered an array of size 1 and we're taking that first index. That is an allowed index to take.
(&thing.a)[2] violates the precondition that 0 <= i + j <= n, since we have i == 0, j == 2, n == 1. Simply constructing the pointer &thing.a + 2 is undefined behavior.
(&thing.a)[1] is the interesting case. It doesn't actually violate anything in [expr.add]. We're allowed to take a pointer one past the end of the array - which this would be. Here, we turn to a note in [basic.compound]:
A value of a pointer type that is a pointer to or past the end of an object represents the address of the
first byte in memory (1.7) occupied by the object53 or the first byte in memory after the end of the storage
occupied by the object, respectively. [ Note: A pointer past the end of an object (5.7) is not considered to
point to an unrelated object of the object’s type that might be located at that address.
Hence, taking the pointer &thing.a + 1 is defined behavior, but dereferencing it is undefined because it does not point to anything.
This is undefined behavior.
There are lots of rules in C++ that attempt to give the compiler some hope of understanding what you are doing, so it can reason about it and optimize it.
There are rules about aliasing (accessing data through two different pointer types), array bounds, etc.
When you have a variable x, the fact that it isn't a member of an array means that the compiler can assume that no [] based array access can modify it. So it doesn't have to constantly reload the data from memory every time you use it; only if someone could have modified it from its name.
Thus (&thing.a)[1] can be assumed by the compiler to not refer to thing.b. It can use this fact to reorder reads and writes to thing.b, invalidating what you want it to do without invalidating what you actually told it to do.
A classic example of this is casting away const.
const int x = 7;
std::cout << x << '\n';
auto ptr = (int*)&x;
*ptr = 2;
std::cout << *ptr << "!=" << x << '\n';
std::cout << ptr << "==" << &x << '\n';
here you typically get a compiler saying 7 then 2 != 7, and then two identical pointers; despite the fact that ptr is pointing at x. The compiler takes the fact that x is a constant value to not bother reading it when you ask for the value of x.
But when you take the address of x, you force it to exist. You then cast away const, and modify it. So the actual location in memory where x is has been modified, the compiler is free to not actually read it when reading x!
The compiler may get smart enough to figure out how to even avoid following ptr to read *ptr, but often they are not. Feel free to go and use ptr = ptr+argc-1 or somesuch confusion if the optimizer is getting smarter than you.
You can provide a custom operator[] that gets the right item.
int& operator[](std::size_t);
int const& operator[](std::size_t) const;
having both is useful.
Heres a way to use a proxy class to access elements in a member array by name. It is very C++, and has no benefit vs. ref-returning accessor functions, except for syntactic preference. This overloads the -> operator to access elements as members, so to be acceptable, one needs to both dislike the syntax of accessors (d.a() = 5;), as well as tolerate using -> with a non-pointer object. I expect this might also confuse readers not familiar with the code, so this might be more of a neat trick than something you want to put into production.
The Data struct in this code also includes overloads for the subscript operator, to access indexed elements inside its ar array member, as well as begin and end functions, for iteration. Also, all of these are overloaded with non-const and const versions, which I felt needed to be included for completeness.
When Data's -> is used to access an element by name (like this: my_data->b = 5;), a Proxy object is returned. Then, because this Proxy rvalue is not a pointer, its own -> operator is auto-chain-called, which returns a pointer to itself. This way, the Proxy object is instantiated and remains valid during evaluation of the initial expression.
Contruction of a Proxy object populates its 3 reference members a, b and c according to a pointer passed in the constructor, which is assumed to point to a buffer containing at least 3 values whose type is given as the template parameter T. So instead of using named references which are members of the Data class, this saves memory by populating the references at the point of access (but unfortunately, using -> and not the . operator).
In order to test how well the compiler's optimizer eliminates all of the indirection introduced by the use of Proxy, the code below includes 2 versions of main(). The #if 1 version uses the -> and [] operators, and the #if 0 version performs the equivalent set of procedures, but only by directly accessing Data::ar.
The Nci() function generates runtime integer values for initializing array elements, which prevents the optimizer from just plugging constant values directly into each std::cout << call.
For gcc 6.2, using -O3, both versions of main() generate the same assembly (toggle between #if 1 and #if 0 before the first main() to compare): https://godbolt.org/g/QqRWZb
#include <iostream>
#include <ctime>
template <typename T>
class Proxy {
public:
T &a, &b, &c;
Proxy(T* par) : a(par[0]), b(par[1]), c(par[2]) {}
Proxy* operator -> () { return this; }
};
struct Data {
int ar[3];
template <typename I> int& operator [] (I idx) { return ar[idx]; }
template <typename I> const int& operator [] (I idx) const { return ar[idx]; }
Proxy<int> operator -> () { return Proxy<int>(ar); }
Proxy<const int> operator -> () const { return Proxy<const int>(ar); }
int* begin() { return ar; }
const int* begin() const { return ar; }
int* end() { return ar + sizeof(ar)/sizeof(int); }
const int* end() const { return ar + sizeof(ar)/sizeof(int); }
};
// Nci returns an unpredictible int
inline int Nci() {
static auto t = std::time(nullptr) / 100 * 100;
return static_cast<int>(t++ % 1000);
}
#if 1
int main() {
Data d = {Nci(), Nci(), Nci()};
for(auto v : d) { std::cout << v << ' '; }
std::cout << "\n";
std::cout << d->b << "\n";
d->b = -5;
std::cout << d[1] << "\n";
std::cout << "\n";
const Data cd = {Nci(), Nci(), Nci()};
for(auto v : cd) { std::cout << v << ' '; }
std::cout << "\n";
std::cout << cd->c << "\n";
//cd->c = -5; // error: assignment of read-only location
std::cout << cd[2] << "\n";
}
#else
int main() {
Data d = {Nci(), Nci(), Nci()};
for(auto v : d.ar) { std::cout << v << ' '; }
std::cout << "\n";
std::cout << d.ar[1] << "\n";
d->b = -5;
std::cout << d.ar[1] << "\n";
std::cout << "\n";
const Data cd = {Nci(), Nci(), Nci()};
for(auto v : cd.ar) { std::cout << v << ' '; }
std::cout << "\n";
std::cout << cd.ar[2] << "\n";
//cd.ar[2] = -5;
std::cout << cd.ar[2] << "\n";
}
#endif
If reading values is enough, and efficiency is not a concern, or if you trust your compiler to optimize things well, or if struct is just that 3 bytes, you can safely do this:
char index_data(const struct data *d, size_t index) {
assert(sizeof(*d) == offsetoff(*d, c)+1);
assert(index < sizeof(*d));
char buf[sizeof(*d)];
memcpy(buf, d, sizeof(*d));
return buf[index];
}
For C++ only version, you would probably want to use static_assert to verify that struct data has standard layout, and perhaps throw exception on invalid index instead.
It is illegal, but there is a workaround:
struct data {
union {
struct {
int a;
int b;
int c;
};
int v[3];
};
};
Now you can index v:
Is int &y=x same as int y=&x?
Are s++ and *s++ the same?
Also in the below code, why is *s++ giving me some wrong results?
I was expecting *s value to be 12
#include <iostream>
using namespace std;
int main()
{
int p=10;
int &q=p; //q is a reference variable to p
//int r=&p; //error: invalid conversion from 'int*' to 'int'
int *s=&p; //valid
q++;
*s++; //here even s++ works, and cout<<*s does not give 12 but some lengthy number
//and cout<<s gives some hexadecimal, I'm guessing thats the address
cout<<p<<endl<<q<<endl<<*s;
}
Output I'm getting:
11
11
6422280
int &y=x same as int y=&x.
In first y is an integer reference which is initialized to x . OTOH, in second you are trying to assign int a value of int * which results in a compilation error .
why is *s++ giving me some wrong results? I was expecting *s value to be 12
This is because ++ has precedence over * , therefore ,first s is incremented first (thus pointing to uninitialized memory location), and then dereferenced .
Which is leading to dereference of uninitialized memory location resulting in undefined behaviour.
If you want to expected value , you can do this -
(*s)++; //dereference first and then increment
Is int &y=x same as int y=&x?
No. They are very different.
int x;
int& y = x; // y is a reference to x.
On the other hand,
int x;
int y = &x; // Should be compiler error.
// Initializing y with a pointer.
The line:
*s++;
is equivalent to:
int* temp = s;
s++;
*temp;
You are evaluating *temp but the side effect is that you are incrementing s. After that s points to the next element. That is not a valid address though. The next time you access *s, the program exhibits undefined behavior. The output of the line:
cout<<p<<endl<<q<<endl<<*s;
can be anything.
Is int &y=x same as int int y=&x?
Not nearly. The first binds the x to the int-reference y, the second initializes the int y with the address of x.
Both can be made to compile cleanly, but the second only with severe contortions abusing operator overloading:
{
int y;
int& x = y;
}
{
// Only for demonstration, never do something this crazy for real
struct { int operator&() { return 0; }; } x;
int y = &x;
}
Are s++ and *s++ the same?
Certainly not, though they have the same effect in your example, as you discard the result:
The first increments s and returns its pre-increment value.
The second does the same, but then dereferences that.
In both cases, s thereafter points behind an object, and may not be dereferenced on pain of Undefined Behavior (UB).
Which explains the curious number you got, but might also have resulted in your program just printing 42 and then beginning to reformat your harddrive (legal but unlikely).