C++/Address Space: 2 Bytes per address? - c++

I was just trying something and I was wondering how this could be. I have the following code:
int var1 = 132;
int var2 = 200;
int *secondvariable = &var2;
cout << *(secondvariable+2) << endl << sizeof(int) << endl;
I get the output
132
4
So how is it possible that the second int is only 2 addresses higher? I mean shouldn't it be 4 addresses? I'm currently under WIN10 x64.
Regards

With cout << *(secondvariable+2) you don't print a pointer, you print the value at secondvariable[2], which is invalid indexing and leads to undefined behavior.
If you want to print a pointer then drop the dereference and print secondvariable+2.

While you already are far in the field of undefined behaviour (see Some programmer dude's answer) due to indexing an array out of bounds (a single variable is considered an array of length 1 for such matters), some technical background:
Alignment! Compilers are allowed to place variables at addresses such that they can be accessed most efficiently. As you seem to have gotten valid output by adding 2*sizeof(int) to the second variable's address, you apparently have reached the first one by accident. Apparently, the compiler decided to leave a gap in between the two variables so that both can be aligned to addresses dividable by 8.
Be aware, though, that you don't have any guarantee for such alignment, different compilers might decide differently (or same compiler on another system), and alignment even might be changed via compiler flags.
On the other hand, arrays are guaranteed to occupy contiguous memory, so you would have gotten the expected result in the following example:
int array[2];
int* a0 = &array[0];
int* a1 = &array[1];
uintptr_t diff = reinterpret_cast<uintptr_t>(a1) - reinterpret_cast<uintptr_t>(a0);
std::cout << diff;
The cast to uintptr_t (or alternatively to char*) ensures that you get the address difference in bytes, not in units of sizeof(int). Note that converting a pointer to an integer requires reinterpret_cast; static_cast will not compile here.

This is not how C++ works.
You can't "navigate" your scope like this.
Such pointer antics have completely undefined behaviour and shall not be relied upon.
You are not punching holes in tape now, you are writing a description of a program's semantics, that gets converted by your compiler into something executable by a machine.
Code to these abstractions and everything will be fine.

Related

C++ Stack around the variable x was corrupted

I am new to C++. I am learning some basics.
I tried the below program and got a run time error Stack around the variable x was corrupted.
int x = 56;
int *ptr = &x;
ptr[1]=8;
cout << *ptr << endl;
But if I update the index in line 3 to 0, say ptr[0] = 8, I am not getting any run time error and the console shows 8 as the output.
I assumed 2 digits in the integer variable x and thought the pointer index would have 0 and 1 as valid values. Why is ptr[1] causing a run time error whereas ptr[2], ptr[3] do not cause any run time error and simply show 56 as o/p?
Can anyone help me to understand what is really going on here? Maybe a better tutorial site as an add-on would help me to understand more on this subject.
ptr[1] is actually the integer next to the variable pointed to by ptr. Writing to such memory means overwriting about anything. You may overwrite other variables. Or return addresses. Or stack frames. Or about anything else. Heck, you may overwrite ptr itself, depending on how variables on the stack are arranged.
Undefined behavior. Compilers are allowed to assume it doesn't happen.
Let's see what you are doing here (let's assume int is 4 bytes):
int x = 56;
int *ptr = &x;
ptr[1]=8;
cout << *ptr << endl;
<- 4 bytes ->......(rest of the stack)
---------------------------
|     x      |            |
---------------------------
^            ^
ptr[0]       ptr[1]
So, ptr[1] is writing to a memory location that was never allocated for x. You are writing data out of bounds.
Presumably you were expecting ptr[1] to mean the second byte in x. But that's not how pointer arithmetic works. The pointer is an int*, so arithmetic is performed in "chunks" of int-sizes. Therefore, ptr[1] is the non-existent integer "next to" x.
You could probably see this working by making ptr a char* instead, but be careful because this is real hackery and probably not a good idea unless you really know what you're doing.
A further misconception is your assumption that the number of decimal digits in the human-readable representation of x's value has anything to do with the number of bytes taken up by x in memory; it doesn't.

Weird C++ array initialization behaviour

The following code
int x;
cin >> x;
int b[x];
b[5] = 8;
cout << sizeof(b)/sizeof(b[0]) << endl << b[5];
with x inputted as 10 gives the output:
10
8
which seems very weird to me because:
According to http://www.cplusplus.com/doc/tutorial/arrays/ we shouldn't even be able to initialize an array using a value obtained from cin stream.
NOTE: The elements field within square brackets [], representing the number of elements in the array, must be a constant expression, since arrays are blocks of static memory whose size must be determined at compile time, before the program runs.
But that's not the whole story! The same code with x inputted as 4 sometimes gives the output
Segmentation fault. Core dumped.
and sometimes gives the output:
4
8
What the heck is going on? Why doesn't the compiler act in a single manner? Why can I assign a value to an array index that is larger than the array? And why can we even initialize an array using variable in the first place?
I initially mentioned this as a comment, but seeing how no one has answered, I'm gonna add it here.
What you have demonstrated above is undefined behavior. It means that you can't tell what the outcome will be. As Brian adds in the comments, the variable-length array will result in a diagnostic message (which could be just a warning), since the array size must be a constant expression in standard C++; runtime-sized arrays are a compiler extension. The compiler goes ahead anyway, and accessing b[5] when x is 4 is out of bounds, which is undefined behavior since it is not defined by the standard.

Why do I get a random number when increasing the integer value of a pointer?

I am an expert C# programmer, but I am very new to C++. I get the basic idea of pointers just fine, but I was playing around. You can get the actual integer value of a pointer by casting it as an int:
int i = 5;
int* iptr = &i;
int ptrValue = (int)iptr;
Which makes sense; it's a memory address. But I can move to the next pointer, and cast it as an int:
int i = 5;
int* iptr = &i;
int ptrValue = (int)iptr;
int* jptr = (int*)((int)iptr + 1);
int j = (int)*jptr;
and I get a seemingly random number (although this is not a good PRNG). What is this number? Is it another number used by the same process? Is it possibly from a different process? Is this bad practice, or disallowed? And if not, is there a use for this? It's kind of cool.
What is this number? Is it another number used by the same process? Is it possibly from a different process?
You cannot generally cast pointers to integers and back and expect them to be dereferencable. Integers are numbers. Pointers are pointers. They are totally different abstractions and are not compatible.
If integers are not large enough to be able to store the internal representation of pointers (which is likely the case; integers are usually 32 bits long and pointers are usually 64 bits long), or if you modify the integer before casting it back to a pointer, your program exhibits undefined behaviour and as such anything can happen.
See C++: Is it safe to cast pointer to int and later back to pointer again?
Is this bad practice, or disallowed?
Disallowed? Nah.
Bad practice? Terrible practice.
You move one byte past the i pointer and print out the number, which might overlap another value stored in your program's address space. The value is unknown and this is undefined behavior. Also, there is a good chance that you might get an error (that means your program can blow up). Ever heard of SIGSEGV, the segmentation violation signal?
You are discovering that random places in memory contain "unknown" data. Not only that, but you may find yourself pointing to memory that your process does not have "rights" to so that even the act of reading the contents of an address can cause a segmentation fault.
In general, if you allocate some memory to a pointer (for example with malloc) you may take a look at these locations (which may have random data "from the last time" in them) and modify them. But data that does not belong explicitly to a pointer's block of memory can exhibit all kinds of undefined behavior.
Incidentally, if you want to look at the "next" location, just do
NextValue = *(iptr + 1);
Don't do any casting - pointer arithmetic knows (in your case) exactly what the above means: "the contents of the next int-sized location".
int i = 5;
int* iptr = &i;
int ptrValue = (int)iptr;
int* jptr = (int*)((int)iptr + 1);
int j = (int)*jptr;
You can cast an int to a pointer and back again, and it will give you the same value.
Is it possibly from a different process? No, it's not, and you can't access the memory of another process except by using ReadProcessMemory and WriteProcessMemory under the Win32 API.
You get another number because you add 1 to the pointer; try to subtract 1 and you will get the same value back.
When you define an integer by
int i = 5;
it means you allocate a space on your thread's stack and initialize it to 5. Then you get a pointer to this memory, which is actually a position in your current thread's stack.
When you increase your pointer by 1, it points to the next location on your thread's stack, and you parse it again as an integer:
int* jptr = (int*)((int)iptr + 1);
int j = (int)*jptr;
Then you will get an integer from your thread's stack which is close to where you defined your int i.
Of course this is not suggested, unless you want to become a hacker and exploit stack overflows (here it means what it says, not the site name, ha!)
Using a pointer to point to a random address is very dangerous. You must not point to an address unless you know what you're doing. You could overwrite its content or you may try to modify a constant in read-only memory which leads to an undefined behaviour...
This is, for example, how you retrieve the elements of an array; you do not cast the pointer to an integer. You just point to the start of the array and increase your pointer by 1 to get the next element.
int arr[5] = {1, 2, 3, 4, 5};
int *p = arr;
printf("%d", *p); // this will print 1
p++; // pointer arithmetics
printf("%d", *p); // this will print 2
It's not "random". It just means that there is some data at the next address.
Reading a 32-bit word from an address A will copy the 4 bytes at [A], [A+1], [A+2], [A+3] into a register. But if you dereference an int at [A+1] then the CPU will load the bytes from [A+1] to [A+4]. Since the value of [A+4] is unknown it may make you think that the number is "random"
Anyway this is EXTREMELY dangerous 💀 since
the pointer is misaligned. You may see the program runs fine because x86 allows for unaligned accesses (with some performance penalty). But most other architectures prohibit unaligned operations and your program will just end in segmentation fault. For more information read Purpose of memory alignment, Data Alignment: Reason for restriction on memory address being multiple of data type size
you may not be allowed to touch the next byte as it may be outside of your address space, is write-only, is used for another variable and you changed its value, or whatever other reasons. You'll also get a segfault in that case
the next byte may not be initialized and reading it will crash your application on some architectures
That's why the C and C++ standard state that reading memory outside an array invokes undefined behavior. See
How dangerous is it to access an array out of bounds?
Access array beyond the limit in C and C++
Is accessing a global array outside its bound undefined behavior?

Data member offset in C++

"Inside the C++ Object Model" says that the offset of a data member in a class is always 1 more than the actual offset in order to distinguish between the pointer to 0 and the pointer to the first data member, here is the example:
class Point3d {
public:
virtual ~Point3d();
public:
static Point3d origin;
float x, y, z;
};
//to be used after, ignore it for the first question
int main(void) {
/*cout << "&Point3d::x = " << &Point3d::x << endl;
cout << "&Point3d::y = " << &Point3d::y << endl;
cout << "&Point3d::z = " << &Point3d::z << endl;*/
printf("&Point3d::x = %p\n", &Point3d::x);
printf("&Point3d::y = %p\n", &Point3d::y);
printf("&Point3d::z = %p\n", &Point3d::z);
getchar();
}
So in order to distinguish the two pointers below, the offset of a data member is always 1 more.
float Point3d::*p1 = 0;
float Point3d::*p2 = &Point3d::x;
The main function above is an attempt to get the offsets of the members to verify this argument, which is supposed to output: 5, 9, 13 (considering a vptr of 4 bytes at the beginning). In MS Visual Studio 2012, however, the output is:
&Point3d::x = 00000004
&Point3d::y = 00000008
&Point3d::z = 0000000C
Question: So did the MS C++ compiler do some optimization or something to prevent this mechanism?
tl;dr
Inside the C++ Object Model is a very old book, and most of its contents are implementation details of a particular compiler anyway. Don't worry about comparing your compiler to some ancient compiler.
Full version
An answer to the question linked to in a comment on this question addresses this quite well.
The offset of something is how many units it is from the start. The first thing is at the start so its offset is zero.
[...]
Note that the ISO standard doesn't specify where the items are laid out in memory. Padding bytes to create correct alignment are certainly possible. In a hypothetical environment where ints were only two bytes but their required alignment was 256 bytes, they wouldn't be at 0, 2 and 4 but rather at 0, 256 and 512.
And, if that book you're taking the excerpt from is really Inside the C++ Object Model, it's getting a little long in the tooth.
The fact that it's from '96 and discusses the internals underneath C++ (waxing lyrical about how good it is to know where the vptr is, missing the whole point that that's working at the wrong abstraction level and you should never care) dates it quite a bit.
[...]
The author apparently led the cfront 2.1 and 3 teams and, while this book seems of historical interest, I don't think it's relevant to the modern C++ language (and implementation), at least those bits I've read.
The language doesn't specify how member-pointers are represented, so anything you read in a book will just be an example of how they might be represented.
In this case, as you say, it sounds like the vptr occupies the first four bytes of the object; again, this is not something specified by the language. If that is the case, no accessible members would have an offset of zero, so there's no need to adjust the offsets to avoid zero; a member-pointer could simply be represented by the member's offset, with zero reserved for "null". It sounds like that is what your compiler does.
You may find that the offsets for non-polymorphic types are adjusted as you describe; or you may find that the representation of "null" is not zero. Either would be valid.
class Point3d {
public:
virtual ~Point3d();
public:
static Point3d origin;
float x, y, z;
};
Since your class contains a virtual destructor, and most compilers typically put a pointer to the virtual function table as the first element in the object, it makes sense that the first of your data members is at offset 4 (I'm guessing your compiler is a 32-bit compiler).
Note however that the C++ standard does not stipulate how data members should be stored inside the class, and even less how much space, if any, the virtual function table should take up.
[And yes, it's invalid (undefined behaviour) to take the address of an element that is not a "real" member object, but I don't think this is causing an issue in this particular example - it may with a different compiler or on a different processor architecture, etc.]
Unless you specify a different alignment, your expectation of the offsets being 5, ... would be wrong anyway. Normally, the addresses of elements bigger than char are aligned on even addresses, and I guess even to the next 4-byte boundary. The reason is the efficiency of memory access in the CPU.
On some architectures, accessing an odd address could cause an exception (i.e. Motorola 68000), depending on the member, or at least a performance slowdown.
While it's true that a null pointer of type "pointer to member of a given type" must be different from any non-null value of that type, offsetting non-null pointers by one is not the only way a compiler can ensure this. For example, my compiler uses a non-zero representation for null pointers-to-members.
namespace {
struct a {
int x, y;
};
}
#include <iostream>
int main() {
int a::*p = &a::x, a::*q = &a::y, a::*r = nullptr;
std::cout << "sizeof(int a::*) = " << sizeof(int a::*)
<< ", sizeof(unsigned long) = " << sizeof(unsigned long);
std::cout << "\n&a::x = " << *reinterpret_cast<long*>(&p)
<< "\n&a::y = " << *reinterpret_cast<long*>(&q)
<< "\nnullptr = " << *reinterpret_cast<long*>(&r)
<< '\n';
}
Produces the following output:
sizeof(int a::*) = 8, sizeof(unsigned long) = 8
&a::x = 0
&a::y = 4
nullptr = -1
Your compiler is probably doing something similar, if not identical. This scheme is probably more efficient for most 'normal' use cases for the implementation because it won't have to do an extra "subtract 1" every time you use a non-null pointer-to-member.
That book (available at this link) should make it much clearer that it is just describing a particular implementation of a C++ compiler. Details like the one you mention are not part of the C++ language specification -- it's just how Stanley B. Lippman and his colleagues decided to implement a particular feature. Other compilers are free to do things a different way.

Pointer to data member address

I have read (in Inside the C++ Object Model) that the value of a pointer to data member in C++ is the offset of the data member plus 1?
I am trying this on VC++ 2005 but i am not getting exact offset values.
For example:
class X{
public:
int a;
int b;
int c;
};
void x(){
printf("Offsets of a=%d, b=%d, c=%d",&X::a,&X::b,&X::c);
}
Should print Offsets of a=1, b=5, c=9. But in VC++ 2005 it comes out to be a=0, b=4, c=8.
I am not able to understand this behavior.
Excerpt from book:
"That expectation, however, is off by one - a somewhat traditional error
for both C and C++ programmers.
The physical offset of the three coordinate members within the class
layout are, respectively, either 0, 4, and 8 if the vptr is placed at
the end or 4, 8, and 12 if the vptr is placed at the start of the
class. The value returned from taking the member's address, however,
is always bumped up by 1. Thus the actual values are 1, 5, and 9, and
so on. The problem is distinguishing between a pointer to no data
member and a pointer to the first data member. Consider for example:
float Point3d::*p1 = 0;
float Point3d::*p2 = &Point3d::x;
// oops: how to distinguish?
if ( p1 == p2 ) {
cout << " p1 & p2 contain the same value - ";
cout << " they must address the same member!" << endl;
}
To distinguish between p1 and p2, each actual member offset value is
bumped up by 1. Hence, both the compiler (and the user) must remember
to subtract 1 before actually using the value to address a member."
The offset of something is how many units it is from the start. The first thing is at the start so its offset is zero.
Think in terms of your structure being at memory location 100:
100: class X { int a;
104: int b;
108: int c;
As you can see, the address of a is the same as the address of the entire structure, so its offset (what you have to add to the structure address to get the item address) is 0.
Note that the ISO standard doesn't specify where the items are laid out in memory. Padding bytes to create correct alignment are certainly possible. In a hypothetical environment where ints were only two bytes but their required alignment was 256 bytes, they wouldn't be at 0, 2 and 4 but rather at 0, 256 and 512.
And, if that book you're taking the excerpt from is really Inside the C++ Object Model, it's getting a little long in the tooth.
The fact that it's from '96 and discusses the internals underneath C++ (waxing lyrical about how good it is to know where the vptr is, missing the whole point that that's working at the wrong abstraction level and you should never care) dates it quite a bit. In fact, the introduction even states "Explains the basic implementation of the object-oriented features ..." (my italics).
And the fact that nobody can find anything in the ISO standard saying this behaviour is required, along the fact that neither MSVC not gcc act that way, leads me to believe that, even if this was true of one particular implementation far in the past, it's not true (or required to be true) of all.
The author apparently led the cfront 2.1 and 3 teams and, while this book seems of historical interest, I don't think it's relevant to the modern C++ language (and implementation), at least those bits I've read.
Firstly, the internal representation of values of a pointer to a data member type is an implementation detail. It can be done in many different ways. You came across a description of one possible implementation, where the pointer contains the offset of the member plus 1. It is rather obvious where that "plus 1" come from: that specific implementation wants to reserve the physical zero value (0x0) for null pointer, so the offset of the first data member (which could easily be 0) has to be transformed to something else to make it different from a null pointer. Adding 1 to all such pointers solves the problem.
However, it should be noted that this is a rather cumbersome approach (i.e. the compiler always has to subtract 1 from the physical value before performing access). That implementation was apparently trying very hard to make sure that all null-pointers are represented by a physical zero-bit pattern. To tell the truth, I haven't encountered implementations that follow this approach in practice these days.
Today, most popular implementations (like GCC or MSVC++) use just the plain offset (not adding anything to it) as the internal representation of the pointer to a data member. The physical zero will, of course, no longer work for representing null pointers, so they use some other physical value to represent null pointers, like 0xFFFF... (this is what GCC and MSVC++ use).
Secondly, I don't understand what you were trying to say with your p1 and p2 example. You are absolutely wrong to assume that the pointers will contain the same value. They won't.
If we follow the approach described in your post ("offset + 1"), then p1 will receive the physical value of a null pointer (apparently a physical 0x0), while p2 will receive the physical value 0x1 (assuming x has offset 0). 0x0 and 0x1 are two different values.
If we follow the approach used by modern GCC and MSVC++ compilers, then p1 will receive the physical value of 0xFFFF.... (null pointer), while p2 will be assigned a physical 0x0. 0xFFFF... and 0x0 are again different values.
P.S. I just realized that the p1 and p2 example is actually not yours, but a quote from a book. Well, the book, once again, is describing the same problem I mentioned above - the conflict of 0 offset with 0x0 representation for null pointer, and offers one possible viable approach to solving that conflict. But, once again, there are alternative ways to do it, and many compilers today use completely different approaches.
The behavior you're getting looks quite reasonable to me. What sounds wrong is what you read.
To complement AndreyT's answer: Try running this code on your compiler.
void test()
{
using namespace std;
int X::* pm = NULL;
cout << "NULL pointer to member: "
<< " value = " << pm
<< ", raw byte value = 0x" << hex << *(unsigned int*)&pm << endl;
pm = &X::a;
cout << "pointer to member a: "
<< " value = " << pm
<< ", raw byte value = 0x" << hex << *(unsigned int*)&pm << endl;
pm = &X::b;
cout << "pointer to member b: "
<< " value = " << pm
<< ", raw byte value = 0x" << hex << *(unsigned int*)&pm << endl;
}
On Visual Studio 2008 I get:
NULL pointer to member: value = 0, raw byte value = 0xffffffff
pointer to member a: value = 1, raw byte value = 0x0
pointer to member b: value = 1, raw byte value = 0x4
So indeed, this particular compiler is using a special bit pattern to represent a NULL pointer and thus leaving an 0x0 bit pattern as representing a pointer to the first member of an object.
This also means that wherever the compiler generates code to translate such a pointer to an integer or a boolean, it must be taking care to look for that special bit pattern. Thus something like if(pm) or the conversion performed by the << stream operator is actually written by the compiler as a test against the 0xffffffff bit pattern (instead of how we typically like to think of pointer tests being a raw test against address 0x0).
I have read that address of pointer to
data member in C++ is the offset of
data member plus 1?
I have never heard that, and your own empirical evidence shows it's not the case. I think you misunderstood an odd property of structs & classes in C++: if they are completely empty, they nevertheless have a size of 1 (so that each element of an array of them has a unique address).
$9.2/12 is interesting
Nonstatic data members of a (non-union) class declared without an intervening access-specifier are allocated so that later members have higher addresses within a class object. The order of allocation of nonstatic data members separated by an access-specifier is unspecified (11.1). Implementation alignment requirements might cause two adjacent members not to be allocated immediately after each other; so might requirements for space for managing virtual functions (10.3) and virtual base classes (10.1).
This explains that such behavior is implementation-defined. However, the fact that 'a', 'b' and 'c' are at increasing addresses is in accordance with the Standard.