Query regarding copying expression value to pointer using memcpy - c++

I am looking to set the result of an expression(which is an int) into an int pointer. If I do the following:
int* a = new int;
memcpy(a, (int*)(3+4), sizeof(int));
I'm having trouble wrapping my head around it's expected behavior. Wll it copy the value 7 into a as expected. Or will it cause some undefined behavior

Nope, this won’t copy the value 7 in there. Instead, you get undefined behavior.
Think of writing memcpy(dest, source, size) as saying “please go to the memory address specified by source and copy size bytes from it. If you write (int *)(3 + 4), you get a pointer whose numeric value is seven, meaning that it points to memory address 7. Just as you wouldn’t expect house 137 on a block to be inhabited by people named Mr. and Mrs. 137, you shouldn’t expect the contents of memory address 7 to be the numeric value 7.
In fact, memory address 7 is unlikely to even be storage you can read or write from. You essentially “made up” an address with the typecast, and there’s no reason to believe there’s anything meaningful there. Imagine writing down a fake mailing address on a letter and dropping it in the mail. The odds that it actually reaches anyone is slim, and if it does reach them they’ll have no idea why you’re contacting them.
From a language perspective, this is undefined behavior, and on most systems it’ll segfault.

Related

why wild pointer holds zero address rather than garabge address?

I have been trying to find the size of an particular datatype like "int" without using sizeof() and found this :
#include<stdio.h>
int main() {
int *ptr; /*Declare a pointer*/
printf("Size of ptr = %d\n",ptr);
ptr++;
printf("Size of ptr = %d\n",ptr);
return 0;
}
This returns correct size for int. How?
Isn't wild pointer suppose to contain garbage address rather than zero. And if it contains zero how is it different than NULL pointer as NULL is (void*)0 ?
Since ptr is uninitialised, its value is indeterminate and accessing its value gives undefined behaviour. The meaning of "undefined", somewhat ironically, is defined by C and C++ standards to mean something like "this standard doesn't constrain what happens".
Beginners often incorrectly assume this means it must contain a "garbage value" or be a "wild pointer" or "add some colourful description here" but that is simply not the case.
The meaning of "value is indeterminate" or "the behaviour on accessing the value is undefined" is that any behaviour is permitted from code that accesses the value.
Accessing the value is necessary to print it, increment it, or (in case of a pointer) dereference it (access contents of the address identified by the pointer's value).
The behaviour of code that accesses the value is undefined. Giving a printed value of zero, 42, or a "garbage value" are all correct outcomes. Equally, however, the result could mean no output, or undesirable actions, such as reformatting a hard drive. The behaviour may even change over time if the code is executed repeatedly. Or it may be 100% repeatable (for a specific compiler, specific operating system, specific hardware, etc).
Practically, it is quite common for code with undefined behaviour to give no sign of malfunction during program testing, but to later cause some nasty and visible but unintended effect when the program is installed and executed on a customer's computer. That tends to result in grumpy customers, bug reports that the developers may be unable to replicate, and stress for developers in trying to fix the flaw.
Trying to explain why undefined behaviour results in some particular outcome (such as printing a value of zero) is therefore pointless.
the first print will have garbage or zero, depends on your compiler and previous value that was in the memory location.
If it was zero, then the second print will have the size of int, because incrementing a pointer increments with the size of the pointee.
for instance:
char *x = 0;
x++; //x=1
int *y = 0;
y++; //y=4
In your case, if you got a 0 on the first print, it was the same as if you initialized it to NULL, but you can't count it to always be zero.

Is there a difference in compiling/executing code in different Operational Systems?

I have just found a problem and I have no idea what it could be. I started learning programming a few weeks ago and I am learning about pointers.
I compiled exactly the same code in 2 different PC's. In the first, the program runs perfectly. In the second, it stops working when it reaches a certain line.
I use 2 PC's.
The one at my workplace runs Windows XP SP3. In this one, the program worked fine.
The one at my home runs Windows 7 SP1. It compiled the code, but the program did not work.
I am writing and compiling using DEV C++ and TDM GCC 5.1.0 in both systems.
#include<iostream>
using namespace std;
int main (void) {
int* pointer;
cout << "pointer == " << pointer << "\n";
cout << "*pointer == " << *pointer << "\n"; // this is the line where the program stops.
cout << "&pointer == " << &pointer << "\n";
return 0;}
The output in the first computer was something like:
pointer == 0x000001234
*pointer == some garbage value
&pointer == 0x000007865
In the second computer, it stops at second line.
pointer == 0x1
I do understand that the pointer have not been assigned to a variable. Therefore, it does not store any correct address. Even so, it should at least show the garbage value inside it, or a "0" to indicate it has not yet an address to point to. I know the code is right because it worked fine in the first PC. But I do not understand why it failed in other computer.
I know the code is right because it worked fine in the first PC
You know no such thing.
You have undefined behaviour, and one entirely valid consequence is a program that always works. Or always works except on Saturdays, or always works until after you finished testing and shipped it to a paying customer, or always works on one machine and always fails on another.
The behaviour is undefined, not "defined to some specific consistent observable mode of failure".
Specifically, the real risk of undefined behaviour isn't simply that the result of some operation has an unspecified value, but that it may have undefined and unpredictable side-effects - on apparently-unrelated areas of your program, or on the system as a whole.
Even so, it should at least show the garbage value inside it
It did. But then you asked it to dereference that garbage value.
Reading any variable with an unspecified value is itself Undefined Behaviour, so the first piece of UB is reading the value of the pointer.
Following (dereferencing) a pointer which doesn't point to a valid object is also undefined behaviour, because you don't know whether the unspecified value you illegally interpreted as an address is correctly aligned for the type, or is mapped in your process' address space.
If you successfully load some integer from that address, that is a third piece of undefined behaviour, because again its value is unspecified.
So, the worst-case immediate pitfalls (with hardware trap values and restrictive alignment) are:
read the unspecified pointer value, get a trap representation, die with a hardware trap
OR read the unspecified pointer value, interpret it as an address which is misaligned, die with a bus error
OR follow the unspecified pointer to an unmapped address, die with a segment violation
OR survive all the previous steps - by pure chance - load some random value from some location in memory. Then die because that value is a trap representation.
But your if your process just dies, reproducibly, you can easily debug and fix it with no ill effects. In that sense, crashing at the point of invoking UB is actually the best possible outcome. The alternatives are worse, less predictable, and harder to debug.
I do understand that the pointer have not been assigned to a variable. Therefore, it does not store any correct address. Even so, it should at least show the garbage value inside it, or a "0" to indicate it has not yet an address to point to.
It did! That was the 0x000001234.
Unfortunately you then tried to dereference this invalid pointer, and print the value of an int that does not exist. You cannot do that.
If you hadn't done that, we'd have made it to the third line, where the 0x000007865 would correctly represent the address of the pointer, which is an object with name pointer and type int* that does indeed exist.
I know the code is right because it worked fine in the first PC.
One of the things you'll have to get used to with C++ is that "it appears to work on one computer" is very far from proof that the code is correct. Read about undefined behaviour and weep slow tears.
But I do not understand why it failed in other computer.
Because the code isn't right, and you didn't get "lucky" this time.
We could analyse a few reasons why it appeared to work on one system and not the other, and there are reasons for that. But it's late, and you're just starting out, and since this is undefined behaviour it doesn't matter. :)

Why does an unassigned int have a value?

If i run the following code it shows a long number.
int i;
int *p;
p= &i;
cout<<*p;
Why does an unassigned int have a value? And what is that value?
The value of the pointer p is the address of the int i. You assigned it with the address-of & operator: p = &i. The int i itself is not initialized also called default initialized. When you dereference your pointer with *p you get the value of your uninitialized int i which could be anything.
The value of your int i is the uninitialized memory interpreted as int. Using uninitialized variables is undefined behaviour.
Also you would have the same behaviour without a pointer by simply doing:
int i;
cout << i;
Because this is what "undefined behavior" means in C++.
"Undefined behavior" means "anything is possible". This includes:
You getting some random value for the object. It can be always the same, or different every time you run the code.
The program crashes.
Your computer starts playing the latest Justin Bieber video, all by itself, with no way to stop it.
The universe, as you know it, comes to an end.
etc... That's what "undefined behavior" means.
Imagine you want to buy a land, where you intend to build a house. To buy a land, you contact the local land seller.
You need to tell him how much units of land you need. In return, he will tell you the location of the land.
Done - your land is ready for use. But did you notice something ? The land seller only told you the coordinates of the land. He didn't say anything about the land. On the land there could be already existing house. There could even be a Hotel, or an Airport. Who knows what is there? If you try to use land, without building your house first, you have no guarantee what will be there. It is your responsibility as a land owner, to build something on top of the land, and use it as appropriate.
C/C++ is the same as the above example. Asking for a int, is like asking for a land with size of 8 units. C/C++ will give you the land, telling you its coordinates. It won't tell you what the land contains. You're responsible for using the land to put a house on top of it. If you don't put a house, and try to enter "the house", you might end up in a Airport. Hope it's clearer now :).
Simply because the memory location where i is has some value (whatever value it is). As Sam pointed out, it is a good example of undefined (and unwanted) behavior.
Because variables can't be empty.
Every byte of computer memory always contains something.
Computer programs usually don't clean up memory when they done with it (for speed reasons), thus when you leave a variable uninitialized, it will have some random (more or less) value which was left in this place of memory by another program or by our own code.
Usually it is 0 or a value of some other variable that was recently destructed or some internal pointer.
The current contents of the memory location ( in the stack ) of the variable i.

Could I ever want to access the address zero?

The constant 0 is used as the null pointer in C and C++. But as in the question "Pointer to a specific fixed address" there seems to be some possible use of assigning fixed addresses. Is there ever any conceivable need, in any system, for whatever low level task, for accessing the address 0?
If there is, how is that solved with 0 being the null pointer and all?
If not, what makes it certain that there is not such a need?
Neither in C nor in C++ null-pointer value is in any way tied to physical address 0. The fact that you use constant 0 in the source code to set a pointer to null-pointer value is nothing more than just a piece of syntactic sugar. The compiler is required to translate it into the actual physical address used as null-pointer value on the specific platform.
In other words, 0 in the source code has no physical importance whatsoever. It could have been 42 or 13, for example. I.e. the language authors, if they so pleased, could have made it so that you'd have to do p = 42 in order to set the pointer p to null-pointer value. Again, this does not mean that the physical address 42 would have to be reserved for null pointers. The compiler would be required to translate source code p = 42 into machine code that would stuff the actual physical null-pointer value (0x0000 or 0xBAAD) into the pointer p. That's exactly how it is now with constant 0.
Also note, that neither C nor C++ provides a strictly defined feature that would allow you to assign a specific physical address to a pointer. So your question about "how one would assign 0 address to a pointer" formally has no answer. You simply can't assign a specific address to a pointer in C/C++. However, in the realm of implementation-defined features, the explicit integer-to-pointer conversion is intended to have that effect. So, you'd do it as follows
uintptr_t address = 0;
void *p = (void *) address;
Note, that this is not the same as doing
void *p = 0;
The latter always produces the null-pointer value, while the former in general case does not. The former will normally produce a pointer to physical address 0, which might or might not be the null-pointer value on the given platform.
On a tangential note: you might be interested to know that with Microsoft's C++ compiler, a NULL pointer to member will be represented as the bit pattern 0xFFFFFFFF on a 32-bit machine. That is:
struct foo
{
int field;
};
int foo::*pmember = 0; // 'null' member pointer
pmember will have the bit pattern 'all ones'. This is because you need this value to distinguish it from
int foo::*pmember = &foo::field;
where the bit pattern will indeed by 'all zeroes' -- since we want offset 0 into the structure foo.
Other C++ compilers may choose a different bit pattern for a null pointer to member, but the key observation is that it won't be the all-zeroes bit pattern you might have been expecting.
You're starting from a mistaken premise. When you assign an integer constant with the value 0 to a pointer, that becomes a null pointer constant. This does not, however, mean that a null pointer necessarily refers to address 0. Quite the contrary, the C and C++ standards are both very clear that a null pointer may refer to some address other than zero.
What it comes down to is this: you do have to set aside an address that a null pointer would refer to -- but it can be essentially any address you choose. When you convert zero to a pointer, it has to refer to that chosen address -- but that's all that's really required. Just for example, if you decided that converting an integer to a point would mean adding 0x8000 to the integer, then the null pointer to would actually refer to address 0x8000 instead of address 0.
It's also worth noting that dereferencing a null pointer results in undefined behavior. That means you can't do it in portable code, but it does not mean you can't do it at all. When you're writing code for small microcontrollers and such, it's fairly common to include some bits and pieces of code that aren't portable at all. Reading from one address may give you the value from some sensor, while writing to the same address could activate a stepper motor (just for example). The next device (even using exactly the same processor) might be connected up so both of those addresses referred to normal RAM instead.
Even if a null pointer does refer to address 0, that doesn't prevent you from using it to read and/or write whatever happens to be at that address -- it just prevents you from doing so portably -- but that doesn't really matter a whole lot. The only reason address zero would normally be important would be if it was decoded to connect to something other than normal storage, so you probably can't use it entirely portably anyway.
The compiler takes care of this for you (comp.lang.c FAQ):
If a machine uses a nonzero bit pattern for null pointers, it is the compiler's responsibility to generate it when the programmer requests, by writing "0" or "NULL," a null pointer. Therefore, #defining NULL as 0 on a machine for which internal null pointers are nonzero is as valid as on any other, because the compiler must (and can) still generate the machine's correct null pointers in response to unadorned 0's seen in pointer contexts.
You can get to address zero by referencing zero from a non-pointer context.
In practice, C compilers will happily let your program attempt to write to address 0. Checking every pointer operation at run time for a NULL pointer would be a tad expensive. On computers, the program will crash because the operating system forbids it. On embedded systems without memory protection, the program will indeed write to address 0 which will often crash the whole system.
The address 0 might be useful on an embedded systems (a general term for a CPU that's not in a computer; they run everything from your stereo to your digital camera). Usually, the systems are designed so that you wouldn't need to write to address 0. In every case I know of, it's some kind of special address. Even if the programmer needs to write to it (e.g., to set up an interrupt table), they would only need to write to it during the initial boot sequence (usually a short bit of assembly language to set up the environment for C).
Memory address 0 is also called the Zero Page. This is populated by the BIOS, and contains information about the hardware running on your system. All modern kernels protect this region of memory. You should never need to access this memory, but if you want to you need to do it from within kernel land, a kernel module will do the trick.
On the x86, address 0 (or rather, 0000:0000) and its vicinity in real mode is the location of the interrupt vector. In the bad old days, you would typically write values to the interrupt vector to install interrupt handers (or if you were more disciplined, used the MS-DOS service 0x25). C compilers for MS-DOS defined a far pointer type which when assigned NULL or 0 would recieve the bit pattern 0000 in its segment part and 0000 in its offset part.
Of course, a misbehaving program that accidentally wrote to a far pointer whose value was 0000:0000 would cause very bad things to happen on the machine, typically locking it up and forcing a reboot.
In the question from the link, people are discussing setting to fixed addresses in a microcontroller. When you program a microcontroller everything is at a much lower level there.
You even don't have an OS in terms of desktop/server PC, and you don't have virtual memory and that stuff. So there is it OK and even necessary to access memory at a specific address. On a modern desktop/server PC it is useless and even dangerous.
I compiled some code using gcc for the Motorola HC11, which has no MMU and 0 is a perfectly good address, and was disappointed to find out that to write to address 0, you just write to it. There's no difference between NULL and address 0.
And I can see why. I mean, it's not really possible to define a unique NULL on an architecture where every memory location is potentially valid, so I guess the gcc authors just said 0 was good enough for NULL whether it's a valid address or not.
char *null = 0;
; Clears 8-bit AR and BR and stores it as a 16-bit pointer on the stack.
; The stack pointer, ironically, is stored at address 0.
1b: 4f clra
1c: 5f clrb
1d: de 00 ldx *0 <main>
1f: ed 05 std 5,x
When I compare it with another pointer, the compiler generates a regular comparison. Meaning that it in no way considers char *null = 0 to be a special NULL pointer, and in fact a pointer to address 0 and a "NULL" pointer will be equal.
; addr is a pointer stored at 7,x (offset of 7 from the address in XR) and
; the "NULL" pointer is at 5,y (offset of 5 from the address in YR). It doesn't
; treat the so-called NULL pointer as a special pointer, which is not standards
; compliant as far as I know.
37: de 00 ldx *0 <main>
39: ec 07 ldd 7,x
3b: 18 de 00 ldy *0 <main>
3e: cd a3 05 cpd 5,y
41: 26 10 bne 53 <.LM7>
So to address the original question, I guess my answer is to check your compiler implementation and find out whether they even bothered to implement a unique-value NULL. If not, you don't have to worry about it. ;)
(Of course this answer is not standard compliant.)
It all depends on whether the machine has virtual memory. Systems with it will typically put an unwritable page there, which is probably the behaviour that you are used to. However in systems without it (typically microcontrollers these days, but they used to be far more common) then there's often very interesting things in that area such as an interrupt table. I remember hacking around with those things back in the days of 8-bit systems; fun, and not too big a pain when you had to hard-reset the system and start over. :-)
Yes, you might want to access memory address 0x0h. Why you would want to do this is platform-dependent. A processor might use this for a reset vector, such that writing to it causes the CPU to reset. It could also be used for an interrupt vector, as a memory-mapped interface to some hardware resource (program counter, system clock, etc), or it could even be valid as a plain old memory address. There is nothing necessarily magical about memory address zero, it is just one that was historically used for special purposes (reset vectors and the like). C-like languages follow this tradition by using zero as the address for a NULL pointer, but in reality the underlying hardware may or may not see address zero as special.
The need to access address zero usually arises only in low-level details like bootloaders or drivers. In these cases, the compiler can provide options/pragmas to compile a section of code without optimizations (to prevent the zero pointer from being extracted away as a NULL pointer) or inline assembly can be used to access the true address zero.
C/C++ don't allows you to write to any address. It is the OS that can raise a signal when a user access some forbidden address. C and C++ ensure you that any memory obtained from the heap, will be different of 0.
I have at times used loads from address zero (on a known platform where that would be guaranteed to segfault) to deliberately crash at an informatively named symbol in library code if the user violates some necessary condition and there isn't any good way to throw an exception available to me. "Segfault at someFunction$xWasnt16ByteAligned" is a pretty effective error message to alert someone to what they did wrong and how to fix it. That said, I wouldn't recommend making a habit of that sort of thing.
Writing to address zero can be done, but it depends upon several factors such as your OS, target architecture and MMU configuration. In fact, it can be a useful debugging tool (but not always).
For example, a few years ago while working on an embedded system (with few debugging tools available), we had a problem which was resulting in a warm reboot. To help locate the problem, we were debugging using sprintf(NULL, ...); and a 9600 baud serial cable. As I said--few debugging tools available. With our setup, we knew that a warm reboot would not corrupt the first 256 bytes of memory. Thus after the warm reboot we could pause the loader and dump the memory contents to find out what happened prior to reboot.
Remember that in all normal cases, you don't actually see specific addresses.
When you allocate memory, the OS supplies you with the address of that chunk of memory.
When you take the reference of a variable, the the variable has already been allocated at an address determined by the system.
So accessing address zero is not really a problem, because when you follow a pointer, you don't care what address it points to, only that it is valid:
int* i = new int(); // suppose this returns a pointer to address zero
*i = 42; // now we're accessing address zero, writing the value 42 to it
So if you need to access address zero, it'll generally work just fine.
The 0 == null thing only really becomes an issue if for some reason you're accessing physical memory directly. Perhaps you're writing an OS kernel or something like that yourself. In that case, you're going to be writing to specific memory addresses (especially those mapped to hardware registers), and so you might conceivably need to write to address zero. But then you're really bypassing C++ and relying on the specifics of your compiler and hardware platform.
Of course, if you need to write to address zero, that is possible. Only the constant 0 represents a null pointer. The non-constant integer value zero will not, if assigned to a pointer, yield a null pointer.
So you could simply do something like this:
int i = 0;
int* zeroaddr = (int*)i;
now zeroaddr will point to address zero(*), but it will not, strictly speaking, be a null pointer, because the zero value was not constant.
(*): that's not entirely true. The C++ standard only guarantees an "implementation-defined mapping" between integers and addresses. It could convert the 0 to address 0x1633de20` or any other address it likes. But the mapping is usually the intuitive and obvious one, where the integer 0 is mapped to the address zero)
It may surprise many people, but in the core C language there is no such thing as a special null pointer. You are totally free to read and write to address 0 if it's physically possible.
The code below does not even compile, as NULL is not defined:
int main(int argc, char *argv[])
{
void *p = NULL;
return 0;
}
OTOH, the code below compiles, and you can read and write address 0, if the hardware/OS allows:
int main(int argc, char *argv[])
{
int *p = 0;
*p = 42;
int x = *p; /* let's assume C99 */
}
Please note, I did not include anything in the above examples.
If we start including stuff from the standard C library, NULL becomes magically defined. As far as I remember it comes from string.h.
NULL is still not a core C feature, it's a CONVENTION of many C library functions to indicate the invalidity of pointers. The C library on the given platform will define NULL to a memory location which is not accessible anyway. Let's try it on a Linux PC:
#include <stdio.h>
int main(int argc, char *argv[])
{
int *p = NULL;
printf("NULL is address %p\n", p);
printf("Contents of address NULL is %d\n", *p);
return 0;
}
The result is:
NULL is address 0x0
Segmentation fault (core dumped)
So our C library defines NULL to address zero, which it turns out is inaccessible.
But it was not the C compiler, of not even the C-library function printf() that handled the zero address specially. They all happily tried to work with it normally. It was the OS that detected a segmentation fault, when printf tried to read from address zero.
If I remember correctly, in an AVR microcontroller the register file is mapped into an address space of RAM and register R0 is at the address 0x00. It was clearly done in purpose and apparently Atmel thinks there are situations, when it's convenient to access address 0x00 instead of writing R0 explicitly.
In the program memory, at the address 0x0000 there is a reset interrupt vector and again this address is clearly intended to be accessed when programming the chip.

Allocate room for null terminating character when copying strings in C?

const char* src = "hello";
Calling strlen(src); returns size 5...
Now say I do this:
char* dest = new char[strlen(src)];
strcpy(dest, src);
That doesn't seem like it should work, but when I output everything it looks right. It seems like I'm not allocating space for the null terminator on the end... is this right? Thanks
You are correct that you are not allocating space for the terminator, however the failure to do this will not necessarily cause your program to fail. You may be overwriting following information on the heap, or your heap manager will be rounding up allocation size to a multiple of 16 bytes or something, so you won't necessarily see any visible effect of this bug.
If you run your program under Valgrind or other heap debugger, you may be able to detect this problem sooner.
Yes, you should allocate at least strlen(src)+1 characters.
That doesn't seem like it should work, but when I output everything it looks right.
Welcome to the world of Undefined Behavior. When you do this, anything can happen. Your program can crash, your computer can crash, your computer can explode, demons can fly out of your nose.
And worst of all, your program could run just fine, inconspicuously looking like it's working correctly until one day it starts spitting out garbage because it's overwriting sensitive data somewhere due to the fact that somewhere, someone allocated one too few characters for their arrays, and now you've corrupted the heap and you get a segfault at some point a million miles away, or even worse your program happily chugs along with a corrupted heap and your functions are operating on corrupted credit card numbers and you get in huge trouble.
Even if it looks like it works, it doesn't. That's Undefined Behavior. Avoid it, because you can never be sure what it will do, and even when what it does when you try it is okay, it may not be okay on another platform.
The best description I have read (was on stackoverflow) and went like this:
If the speed limit is 50 and you drive at 60. You may get lucky and not get a ticket but one day maybe not today maybe not tomorrow but one day that cop will be waiting for you. On that day you will pay and you will pay dearly.
If somebody can find the original I would much rather point at that they were much more eloquent than my explanation.
strcpy will copy the null terminated char as well as all of the other chars.
So you are copying the length of hello + 1 which is 6 into a buffer size which is 5.
You have a buffer overflow here though, and overwiting memory that is not your own will have undefined results.
Alternatively, you could also use dest = strdup(src) which will allocate enough memory for the string + 1 for the null terminator (+1 for Juliano's answer).
This is why you should always, always, always run valgrind on any C program that appears to work.
Yeah, everyone has covered the major point; you are not guaranteed to fail. The fact is that the null terminator is usually 0 and 0 is a pretty common value to be sitting in any particular memory address. So it just happens to work. You could test this by taking a set of memory, writing a bunch of garbage to it and then writing that string there and trying to work with it.
Anyway, the major issue I see here is that you are talking about C but you have this line of code:
char* dest = new char[strlen(src)];
This won't compile in any standard C compiler. There's no new keyword in C. That is C++. In C, you would use one of the memory allocation functions, usually malloc. I know it seems nitpicy, but really, it's not.