How long* cast works - c++

So I have this chunk of code
char buf[2];
buf[0] = 'a';
buf[1] = 'b';
std::cout << *((long *)((void*)buf) + 1) << std::endl;
When I saw that I said to myself:
We have memory address 1000 (for example) and that's the address of buf[1].
So I thought that *((long *)((void*)buf) + 1) will print out whatever is in addresses:
from 1000 until 1000 + sizeof(long)
But that's not the case. This code always prints -858993460.
What is that number and why it isn't random?
I obviously lack the knowledge to understand what is going on so I would appreciate if you could give me a hint or something!

What is that number and why it isn't random?
It is a random value. Nothing in your program suggests that value should be printed.
It happens to be consistent so far as you've run your program. Maybe you haven't run it enough. Using uninitialized memory produces undefined behavior. Programs with UB might work as intended for years and then fail to compile.
By the way, your expression doesn't have the intended meaning. Try adding more spaces.
* ( (long *) ((void*)buf) + 1 )
First you cast buf to void *, then you cast it again to long *, then you added one (to get the next long, not the next byte), and then fetched a long from memory. The bytes that got printed are entirely outside the char[2] array.

This code is reading past the end of a buffer and so is a security risk and completely undefined behaviour.
The contents of memory beyond buf could be anything.
Your compiler, architecture, and/or build settings may be such that currently that value is the same each time you run it, but that's just blind chance and could change at any point.
It will be different again on 64-bit systems where long is 64 bits wide. Alignment rules may also cause this code to fail outright.
Summary: even though this code is returning you the same result for each run right now, this is totally unsafe and undefined behaviour.
Avoid.

At a particular instant the value in a particular memory address will be constant unless other variables are created which take its place. In your program, if you output the memory address of buf, it will be the same. This means that you would be referring to the same address every time the program is run, and hence the same garbage value would be printed.

Related

Why does a wild pointer hold a zero address rather than a garbage address?

I have been trying to find the size of a particular datatype like "int" without using sizeof() and found this:
#include<stdio.h>
int main() {
int *ptr; /*Declare a pointer*/
printf("Size of ptr = %d\n",ptr);
ptr++;
printf("Size of ptr = %d\n",ptr);
return 0;
}
This returns correct size for int. How?
Isn't a wild pointer supposed to contain a garbage address rather than zero? And if it contains zero, how is it different from a NULL pointer, as NULL is (void*)0?
Since ptr is uninitialised, its value is indeterminate and accessing its value gives undefined behaviour. The meaning of "undefined", somewhat ironically, is defined by C and C++ standards to mean something like "this standard doesn't constrain what happens".
Beginners often incorrectly assume this means it must contain a "garbage value" or be a "wild pointer" or "add some colourful description here" but that is simply not the case.
The meaning of "value is indeterminate" or "the behaviour on accessing the value is undefined" is that any behaviour is permitted from code that accesses the value.
Accessing the value is necessary to print it, increment it, or (in case of a pointer) dereference it (access contents of the address identified by the pointer's value).
The behaviour of code that accesses the value is undefined. Giving a printed value of zero, 42, or a "garbage value" are all correct outcomes. Equally, however, the result could mean no output, or undesirable actions, such as reformatting a hard drive. The behaviour may even change over time if the code is executed repeatedly. Or it may be 100% repeatable (for a specific compiler, specific operating system, specific hardware, etc).
Practically, it is quite common for code with undefined behaviour to give no sign of malfunction during program testing, but to later cause some nasty and visible but unintended effect when the program is installed and executed on a customer's computer. That tends to result in grumpy customers, bug reports that the developers may be unable to replicate, and stress for developers in trying to fix the flaw.
Trying to explain why undefined behaviour results in some particular outcome (such as printing a value of zero) is therefore pointless.
The first print will show garbage or zero, depending on your compiler and on whatever value was previously in that memory location.
If it was zero, then the second print will show the size of int, because incrementing a pointer advances it by the size of the pointee.
For instance:
char *x = 0;
x++; // x == 1
int *y = 0;
y++; // y == sizeof(int), typically 4
In your case, if you got a 0 on the first print, it was the same as if you initialized it to NULL, but you can't count it to always be zero.

Is there a difference in compiling/executing code on different operating systems?

I have just found a problem and I have no idea what it could be. I started learning programming a few weeks ago and I am learning about pointers.
I compiled exactly the same code on 2 different PCs. On the first, the program runs perfectly. On the second, it stops working when it reaches a certain line.
I use 2 PCs.
The one at my workplace runs Windows XP SP3. In this one, the program worked fine.
The one at my home runs Windows 7 SP1. It compiled the code, but the program did not work.
I am writing and compiling using DEV C++ and TDM GCC 5.1.0 in both systems.
#include<iostream>
using namespace std;
int main (void) {
int* pointer;
cout << "pointer == " << pointer << "\n";
cout << "*pointer == " << *pointer << "\n"; // this is the line where the program stops.
cout << "&pointer == " << &pointer << "\n";
return 0;}
The output in the first computer was something like:
pointer == 0x000001234
*pointer == some garbage value
&pointer == 0x000007865
In the second computer, it stops at second line.
pointer == 0x1
I do understand that the pointer have not been assigned to a variable. Therefore, it does not store any correct address. Even so, it should at least show the garbage value inside it, or a "0" to indicate it has not yet an address to point to. I know the code is right because it worked fine in the first PC. But I do not understand why it failed in other computer.
I know the code is right because it worked fine in the first PC
You know no such thing.
You have undefined behaviour, and one entirely valid consequence is a program that always works. Or always works except on Saturdays, or always works until after you finished testing and shipped it to a paying customer, or always works on one machine and always fails on another.
The behaviour is undefined, not "defined to some specific consistent observable mode of failure".
Specifically, the real risk of undefined behaviour isn't simply that the result of some operation has an unspecified value, but that it may have undefined and unpredictable side-effects - on apparently-unrelated areas of your program, or on the system as a whole.
Even so, it should at least show the garbage value inside it
It did. But then you asked it to dereference that garbage value.
Reading any variable with an unspecified value is itself Undefined Behaviour, so the first piece of UB is reading the value of the pointer.
Following (dereferencing) a pointer which doesn't point to a valid object is also undefined behaviour, because you don't know whether the unspecified value you illegally interpreted as an address is correctly aligned for the type, or is mapped in your process' address space.
If you successfully load some integer from that address, that is a third piece of undefined behaviour, because again its value is unspecified.
So, the worst-case immediate pitfalls (with hardware trap values and restrictive alignment) are:
read the unspecified pointer value, get a trap representation, die with a hardware trap
OR read the unspecified pointer value, interpret it as an address which is misaligned, die with a bus error
OR follow the unspecified pointer to an unmapped address, die with a segment violation
OR survive all the previous steps - by pure chance - load some random value from some location in memory. Then die because that value is a trap representation.
But if your process just dies, reproducibly, you can easily debug and fix it with no ill effects. In that sense, crashing at the point of invoking UB is actually the best possible outcome. The alternatives are worse, less predictable, and harder to debug.
I do understand that the pointer have not been assigned to a variable. Therefore, it does not store any correct address. Even so, it should at least show the garbage value inside it, or a "0" to indicate it has not yet an address to point to.
It did! That was the 0x000001234.
Unfortunately you then tried to dereference this invalid pointer, and print the value of an int that does not exist. You cannot do that.
If you hadn't done that, we'd have made it to the third line, where the 0x000007865 would correctly represent the address of the pointer, which is an object with name pointer and type int* that does indeed exist.
I know the code is right because it worked fine in the first PC.
One of the things you'll have to get used to with C++ is that "it appears to work on one computer" is very far from proof that the code is correct. Read about undefined behaviour and weep slow tears.
But I do not understand why it failed in other computer.
Because the code isn't right, and you didn't get "lucky" this time.
We could analyse a few reasons why it appeared to work on one system and not the other, and there are reasons for that. But it's late, and you're just starting out, and since this is undefined behaviour it doesn't matter. :)

Why does using reinterpret_cast to convert from char* to a structure seem to work normally?

People say it's not good to trust reinterpret_cast to convert from raw data (like char*) to a structure. For example, for the structure
struct A
{
unsigned int a;
unsigned int b;
unsigned char c;
unsigned int d;
};
sizeof(A) = 16 and __alignof(A) = 4, exactly as expected.
Suppose I do this:
char *data = new char[sizeof(A) + 1];
A *ptr = reinterpret_cast<A*>(data + 1); // +1 is to ensure it doesn't point to 4-byte aligned data
Then copy some data to ptr:
memcpy_s(ptr, sizeof(A),
"\x01\x00\x00\x00\x02\x00\x00\x00\x03\x00\x00\x00\x04\x00\x00\x00", sizeof(A));
Then ptr->a is 1, ptr->b is 2, ptr->c is 3 and ptr->d is 4.
Okay, seems to work. Exactly what I was expecting.
But the data pointed to by ptr is not 4-byte aligned like A should be. What problems may this cause on an x86 or x64 platform? Performance issues?
For one thing, your initialization string assumes that the underlying integers are stored in little endian format. But another architecture might use big endian, in which case your string will produce garbage. (Some huge numbers.) The correct string for that architecture would be
"\x00\x00\x00\x01\x00\x00\x00\x02\x03\x00\x00\x00\x00\x00\x00\x04".
Then, of course, there is the issue of alignment.
Certain architectures won't even allow you to assign the address of data + 1 to a non-character pointer; they will issue a memory alignment trap.
But even architectures which will allow this (like x86) will perform miserably, having to perform two memory accesses for each integer in the structure. (For more information, see this excellent answer:
https://stackoverflow.com/a/381368/773113)
Finally, I am not completely sure about this, but I think that C and C++ do not even guarantee to you that an array of characters will contain characters packed in bytes. (I hope someone who knows more might clarify this.) Conceivably, there can be architectures which are completely incapable of addressing non-word-aligned data, so in such architectures each character would have to occupy an entire word. This would mean that it would be valid to take the address of data + 1, because it would still be aligned, but your initialization string would be unsuitable for the intended job, as the first 4 characters in it would cover your entire structure, producing a=1, b=0, c=0 and d=0.
The problem is that you can not be sure if this code will run on another platform, with the next version of Visual Studio, etc. When running on another processor, it may cause a hardware exception.
There was a time when you could read out arbitrary memory locations, but all those programs crash with an "access violation" exception nowadays. Something similar could happen to this program in the future.
However, what you can do, and what any compiler that calls itself "C++ standard compliant" must compile correctly, is this:
You can reinterpret_cast a pointer to something else, and then back to the original type. The pointer you get back must have its original value.
I don't know what exactly you want to do, but you might get away with, for example
allocating a struct A
reinterpret_casting it to chars
saving the memory content to a file
and restore everything later:
allocate a struct A
reinterpret_cast it to chars
load the content to memory
reinterpret_cast it back to a struct A

C++ Dereference the Non-allocated Memory but Without Segmentation Fault

I have encountered a problem which I don't understand, the following is my code:
#include <iostream>
#include <stdio.h>
#include <string.h>
#include <cstdlib>
using namespace std;
int main(int argc, char **argv)
{
char *format = "The sum of the two numbers is: %d";
char *presult;
int sum = 10;
presult = (char *)calloc(sizeof(format) + 20, 1); //allocate 24 bytes
sprintf(presult, format, sum); // after this operation,
// the length of presult is 33
cout << presult << endl;
presult[40] = 'g'; //still no segfault here...
delete(presult);
}
I compiled this code on different machines. On one machine the sizeof(format) is 4 bytes and on another, the sizeof(format) is 8 bytes; (On both machines, the char only takes one byte, which means sizeof(*format) equals 1)
However, no matter on which machine, the result is still confusing to me. Because even for the second machine, the allocated memory for use is just 20 + 8 which is 28 bytes and obviously the string has a length of 33 meaning that at least 33 bytes are needed. But there is NO segmentation fault occurring after I run this program. As you can see, even if I tried to dereference the presult at position 40, the program doesn't crash and show any segfault information.
Could anyone help to explain why? Thank you so much.
Accessing unallocated memory is undefined behavior, meaning you might get a segfault (if you're lucky) or you might not.
Or your program is free to display kittens on the screen.
Speculating on why something happens or doesn't happen in undefined behavior land is usually counter-productive, but I'd imagine what's happening to you is that the OS is actually assigning your application a larger block of memory than it's asking for. Since your application isn't trying to dereference anything outside that larger block, the OS doesn't detect the problem, and therefore doesn't kill your program with a segmentation fault.
Because undefined behavior is undefined. It's not "defined to crash".
There is no seg fault because there is no reason for there to be one. You are very likely still writing into the heap since you got memory from the heap, so the memory isn't read-only. Also, the memory there is likely to exist and be allocated for you (or at least for the program), so it's not an access violation. Normally you would get a seg fault because you might try to access memory that is not given to you, or you may be trying to write to memory that is read-only. Neither of these appears to be the case here, so nothing visibly goes wrong.
In fact, writing past the end of a buffer is a common security problem, known as a buffer overflow. It was the most common security vulnerability for some time. Many newer programs are written in higher-level languages that check for out-of-bounds indices, so it is less of a problem than it used to be.
To respond to this: "the result is still confusing to me. Because even for the second machine, the allocated memory for use is just 20 + 8 which is 28 bytes and obviously the string has a length of 33 meaning that at least 33 bytes are needed."
On most mainstream platforms sizeof(some_pointer) == sizeof(size_t), although the standard does not guarantee it. You were testing on a 32-bit machine (4 bytes) and on a 64-bit machine (8 bytes).
You have to give malloc the number of bytes to allocate; sizeof(ptr_to_char) will not give you the length of the string (the number of chars until '\0').
Btw, strlen does what you want: http://www.cplusplus.com/reference/cstring/strlen/

How internally this works int const iVal = 5; (int&)iVal = 10;

I wanted to know how the following works at the compiler level.
int const iVal = 5;
(int&)iVal = 10;
A bit of a machine-level or compiler-level answer would be appreciated.
Thanks in advance.
It is undefined behavior.
In the first line you define a constant integer. Henceforth, in your program, the compiler is permitted to just substitute iVal with the value 5. It may load it from memory instead, but probably won't, because that would bring no benefit.
The second line writes to the memory location that your compiler tells you contains the number 5. However, this is not guaranteed to have any effect, as you've already told the compiler that the value won't change.
For example, the following will define an array of 5 elements, and print an undefined value (or it can do anything it wants! it's undefined)
int const iVal = 5;
(int&)iVal = 10;
char arr[iVal];
cout << iVal;
The generated assembly might look something like:
sub ESP, 9 ; allocate mem for arr and iVal. hardcoded 5+sizeof(int) bytes
; (iVal isn't _required_ to have space allocated to it)
mov $iVal, 10 ; the compiler might do this, assuming that you know what
; you're doing. But then again, it might not.
push $cout
push 5
call $operator_ltlt__ostream_int
add ESP, 9
C-style cast acts as a const_cast. Like if you've written
const_cast<int&>( iVal ) = 10;
If you happen to do so and the compiler decides not to allocate actual memory for iVal, you run into undefined behaviour.
For example, VC7 compiles it all right. It even runs it all right in Debug mode. In Release mode the value of iVal doesn't change after the assignment; it remains 5.
So you should not do so ever.
Why not run it through your own compiler and look at the assembler output?
This is possible because the idea of "const-ness" only exists in the language/compiler. In actual computer memory, everything is variable. Once the code has been compiled, your iVal variable is simply a location in RAM.
edit: the above assumes that the constant is actually placed in memory. See sharptooth's answer.
Using the c-style cast tells the compiler to treat this memory location as if it were a simple integer variable, and set that value to 10.
Undefined behavior.
In release builds, most compilers will substitute the const value directly - but you could run into one that does a memory load. Also, the second assignment might or might not generate an access violation, depending on platform and compiler. Iirc Intel's compiler puts const data in read-only memory, and would thus generate an access violation at runtime.
If this were an embedded system, it's likely that iVal would be stored in flash, so writing to it would have no effect.
However, it is possible that the compiler would not regard this as an error, as embedded compilers generally don't keep track of whether a particular area of memory is writable or not.
I suspect that it might pass the linker too, as a linker will typically determine that iVal is a constant, so goes in flash - but it's not the linker's job to determine how iVal is used.
Incidentally, this question is tagged with "C", but the "(int&)" syntax is not (AFAIK) valid C. The same could, however, be achieved with something like:
int *Ptr = (int *) &iVal;
*Ptr = 10;