Why does this line of code cause a computer to crash? - c++

Why does this line of code cause the computer to crash? What happens at the memory level?
for(int *p=0; ;*(p++)=0)
;
I have found the "answer" on Everything2, but I want a specific technical answer.

This code sets an integer pointer to null, then repeatedly writes a 0 to the integer it points to and increments the pointer, looping forever.
A null pointer doesn't point to anything, so writing a 0 through it is undefined behavior (i.e. the standard doesn't say what should happen). Pointer arithmetic is only defined within an array (or one past its end), so even the increment by itself is undefined behavior.
Undefined behavior means that compiler and library authors don't need to handle these cases at all, and the system is still a valid C/C++ implementation. If a programmer does anything classified as undefined behavior, whatever happens happens, and the compiler and library authors are not to blame. A programmer entering undefined-behavior territory cannot expect an error message or a crash, but also cannot complain about getting one (even a million executed instructions later).
On systems where the null pointer is represented as all zero bits and there is no memory protection, the effect of such a loop could be to start wiping all addressable memory until some vital part of memory, such as an interrupt table, is corrupted, or until the code writes zeros over itself and self-destructs. On systems with memory protection (most common desktop systems today), execution will instead simply stop at the very first write.
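For illustration, here is a minimal POSIX-specific sketch of my own (the signal handler and message are additions, not part of the question) showing that on a memory-protected system the loop never gets past its first write; with optimizations the compiler is of course free to transform this undefined behaviour differently:
#include <csignal>
#include <unistd.h>

void on_segv(int) {
    // Only async-signal-safe calls here: report the fault, then terminate.
    const char msg[] = "SIGSEGV on the very first write\n";
    write(STDERR_FILENO, msg, sizeof msg - 1);
    _exit(1);
}

int main() {
    std::signal(SIGSEGV, on_segv);
    for (int *p = 0; ; *(p++) = 0)
        ;   // on a protected system this traps before one iteration completes
}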

Undoubtedly, the cause of the problem is that p has not been assigned a reasonable address.
By not initializing the pointer to a valid address before writing through it, the program is probably going to do Bad Things™.
It could merely segfault, or it could overwrite something important, like the function's return address, in which case the segfault might not occur until the function attempts to return.
In the 1980s, a theoretician I worked with wrote a program for the 8086 to, once a second, write one word of random data at a randomly computed address. The computer was a process controller with watchdog protection and various types of output. The question was: How long would the system run before it ceased usefully functioning? The answer was hours and hours! This was a vivid demonstration that most of memory is rarely accessed.

It may cause an OS to crash, or it may do any number of other things. You are invoking undefined behavior. You don't own the memory at address 0 and you don't own the memory past it. You're just trouncing on memory that doesn't belong to you.

It works by overwriting all the memory at all the addresses, starting from 0 and going upwards. Eventually it will overwrite something important.
On any modern system, this will only crash your program, not the entire computer. CPUs designed since, oh, 1985 or so, have this feature called virtual memory which allows the OS to redirect your program's memory addresses. Most addresses aren't directed anywhere at all, which means trying to access them will just crash your program - and the ones that are directed somewhere will be directed to memory that is allocated to your program, so you can only crash your own program by messing with them.
On much older systems (older than 1985, remember!), there was no such protection and this loop could access memory addresses allocated to other programs and the OS.

The loop is not necessary to explain what's wrong. We can simply look at only the first iteration.
int *p = 0; // Declare a null pointer
*p = 0; // Write to null pointer, causing UB
The second line causes undefined behavior: writing through a null pointer is undefined.

Related

How can an operating system detect an out of range segmentation fault in C?

I encountered this problem while learning about operating systems, and I'm really interested in how the operating system detects whether an array index is out of range and therefore produces a segmentation fault.
int main(){
    char* ptr0;
    ptr0[0] = 1;
}
The code above will absolutely produce a segmentation fault, since ptr0 is not allocated with any memory.
But if I add one line, things change.
int main(){
    char* ptr0;
    char* ptr1 = ptr0;
    ptr0[0] = 1;
}
This code doesn't cause any fault; even if you change ptr0[0] to ptr0[1000], it still won't cause a segmentation fault.
I don't know why this line has such power:
char* ptr1 = ptr0;
I tried to disassemble these programs but found little information.
Could somebody explain this to me from the perspective of memory allocation? Thanks a lot.
A segmentation fault happens when a process attempts to access memory it's not supposed to, not necessarily if an array is read out of bounds.
In your particular case the variable ptr0 is uninitialized, so if you read it you may get any value, and it need not even be consistent from run to run. In the first program the value that happened to be read was an invalid memory address, and attempting to write through that address triggered a segfault, while in the second program the value read happened to be an address that was valid for the program, and so no segfault was generated.
When I ran these programs, both resulted in a segfault. This demonstrates the undefined behavior present in the program which attempts to dereference an invalid pointer.
When a program has undefined behavior, the C standard makes no guarantees regarding what the program will do. It may crash, it may output unexpected results, or it may appear to work properly.
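For contrast, here is a minimal sketch (the buffer name is my own, purely illustrative) in which the pointer is given valid storage first, so the write is well defined:
#include <stdio.h>

int main() {
    char buffer[1];       /* one byte of storage the program actually owns */
    char *ptr0 = buffer;  /* ptr0 now holds a valid address */
    ptr0[0] = 1;          /* well defined: writes into buffer */
    printf("%d\n", buffer[0]);   /* prints 1 */
}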
The operating system and the hardware maintain a map of memory in the process’ virtual address space. Each time the process accesses memory, the hardware consults the map information and decides whether or not the access is allowed. If the access is not allowed, it generates a trap to alert the operating system.
This process will catch many incorrect accesses—it will catch all those that attempt to read or write memory that is not mapped or that attempt to write memory that is mapped read-only. It will not catch all incorrect accesses—it will not catch accesses that are to a wrong location (in terms of what the program’s author or user desires) but that are within mapped memory and the appropriate permissions.
Commonly, the operating system’s map information is more complete than the hardware’s. The operating system may map only a subset of a process’s address space. This is because processes often do not use all of their address space (code and data for handling rare errors is often never executed or used) or do not use all of it all the time (a process spends some time doing one task before going on to another, perhaps returning to an earlier task later). So, when the hardware reports a memory access fault to the operating system, the operating system consults its complete information. If the process is allowed to access the attempted location, the operating system sets up physical memory as necessary, updates the map for the hardware, and resumes execution of the process. If the process is not allowed to access the attempted location, the operating system delivers a signal to the process, terminates it, or takes other appropriate action.
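As a small illustration of that trap mechanism, here is a POSIX-specific sketch of my own (not from the question): the OS maps a page without write permission, and the first write through it is caught by the hardware and reported to the process as SIGSEGV:
#include <sys/mman.h>
#include <stdio.h>

int main() {
    /* Ask the OS for one page that may be read but not written. */
    void *page = mmap(NULL, 4096, PROT_READ,
                      MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (page == MAP_FAILED) { perror("mmap"); return 1; }

    char c = *(char *)page;   /* allowed: the page is readable, reads 0 */
    printf("read %d\n", c);

    *(char *)page = 1;        /* not allowed: the hardware traps and the
                                 OS delivers SIGSEGV */
    return 0;
}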
The code above [char *ptr0; ptr0[0] = 1;] will absolutely produce a segmentation fault, since ptr0 is not allocated with any memory.
This is false, for several reasons:
Since ptr0 is not initialized, its value is indeterminate. When calculating the address for ptr0[0], the compiler will not necessarily use an address outside the process’ address space; it might use an address that is inside the address space and writable. In this case, storing 1 in that location will not generate a segmentation fault.
Due to a special rule in the C standard (C 2018 6.3.2.1 2), using the uninitialized object ptr0 in this situation results in the behavior of the program not being defined by the C standard. The compiler may transform this program into any other program.
Even if ptr0 had a defined value, the compiler may observe that the program has no defined observable behavior—it does not read any input, print anything, write anything to files, or change any volatile objects. So the compiler may optimize the program by changing it to an empty main function that does nothing.
But if I add one line, things change.
If there is any change here, using char *ptr0; char *ptr1 = ptr0;, it is mere happenstance of the compiler. This program has the same semantics as the earlier program.
From your other comments, you might have intended to write char ptr0[0]; instead of char *ptr0;. Zero-length arrays are not defined by the C standard, but some compilers allow them as an extension. In that case, what likely happens is that the compiler picks a location for the array, likely on the stack, and ptr0[0] = 1; attempts to store a byte at that location. Although the array has been assigned a location, zero bytes there are reserved for it; those bytes may be in use for something else. Possibly they hold the function's return address, or possibly they are just filler used to align the stack. In that case, ptr0[0] = 1; might overwrite data your program needs and break it, or it might overwrite unused data and have no effect. Or, again, the behavior of your program is not defined by the C standard, so the compiler might transform it in other ways.

Difference between accessing non-existent array index and existing-but-empty index

Suppose I wrote
vector<int> example(5);
example[6];
What difference would it make with the following?
vector<int> example(6);
example[5];
In the first case I'm trying to access a non-existent, non-declared index. Could that result in malicious code execution? Would it be possible to put some sort of code in the portion of memory corresponding to example[5] and have it executed by a program written like the first one above?
What about the second case? Would it still be possible to place code in the area of memory of example[5], even though it should be reserved to my program, even if I haven't written anything in it?
Could that result in malicious code execution?
No, this causes 'only' undefined behaviour.
Simple code-execution exploits usually write past the end of a stack-allocated buffer, thereby overwriting a return address. When the function returns, it jumps to the malicious code. A write is always required, because otherwise there is no malicious code in your program's address space.
With a vector the chances that this happens are low, because the storage for the elements is not allocated on the stack.
By writing to a wrong location on the heap, exploits are possible too, but they are much more complicated.
The first case reaches beyond the vector's buffer and thus invokes Undefined Behaviour. Technically, this means literally anything can happen. But it's unlikely to be directly exploitable to run malicious code—either the program will try to read the invalid memory (getting a garbage value or a memory error), or the compiler has eliminated the code path altogether (because it's allowed to assume UB doesn't happen). Depending on what's done with the result, it might potentially reveal unintended data from memory, though.
In the second case, all is well. Your program has already written into this memory: it value-initialised all 6 int objects in the vector (which happens in std::vector's constructor). So you're guaranteed to find a 0 of type int there.
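For comparison, a short sketch of the checked alternative: vector::at() validates the index and throws std::out_of_range instead of invoking undefined behaviour:
#include <iostream>
#include <stdexcept>
#include <vector>

int main() {
    std::vector<int> example(5);

    std::cout << example[4] << '\n';          // in range: prints the value-initialised 0

    try {
        std::cout << example.at(6) << '\n';   // out of range: throws instead of UB
    } catch (const std::out_of_range& e) {
        std::cout << "caught: " << e.what() << '\n';
    }
}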

Why does strcpy "work" when writing to malloc'ed memory that is not large enough? [duplicate]

This question already has answers here:
Closed 10 years ago.
Possible Duplicate:
Why does this intentionally incorrect use of strcpy not fail horribly?
See the code below:
char* stuff = (char*)malloc(2);
strcpy(stuff,"abc");
cout<<"The size of stuff is : "<<strlen(stuff);
Even though I allocated only 2 bytes for stuff, why does strcpy still work, and why is the output of strlen 3? Shouldn't this throw something like an index-out-of-bounds error?
C and C++ don't do automatic bounds checking like Java and C# do. This code will overwrite stuff in memory past the end of the string, corrupting whatever was there. That can lead to strange behavior or crashes later, so it's good to be cautious about such things.
Accessing past the end of an array is deemed "undefined behavior" by the C and C++ standards. That means the standard doesn't specify what must happen when a program does that, so a program that triggers UB is in never-never-land where anything might happen. It might continue to work with no apparent problems. It might crash immediately. It might crash later when doing something else that shouldn't have been a problem. It might misbehave but not crash. Or velociraptors might come and eat you. Anything can happen.
Writing past the end of an array is called a buffer overflow, by the way, and it's a common cause of security flaws. If that "abc" string were actually user input, a skilled attacker could put bytes into it that end up overwriting something like the function's return pointer, which can be used to make the program run different code than it should, and do different things than it should.
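For completeness, a hedged sketch of the fix (variable names are just illustrative): allocate room for the characters plus the terminating '\0' before copying:
#include <cstdlib>
#include <cstring>
#include <iostream>

int main() {
    const char* src = "abc";
    // strlen(src) counts 3 characters; the +1 reserves space for the '\0'.
    char* stuff = static_cast<char*>(std::malloc(std::strlen(src) + 1));
    if (!stuff) return 1;
    std::strcpy(stuff, src);   // now the copy fits
    std::cout << "The length of stuff is : " << std::strlen(stuff) << std::endl;   // prints 3
    std::free(stuff);
}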
You just overwrote heap memory; usually there's no crash, but bad things can happen later. C does not prevent you from shooting yourself in the foot; there is no such thing as an array-out-of-bounds check.
No, your char pointer now points to a string of length 3. Generally this will not cause any immediate problem, but you might have overwritten some critical memory region and caused the program to crash later (you can expect to see a segmentation fault then), especially when you perform such operations over a large amount of memory.
Here is a typical implementation of strcpy:
#include <cassert>

char *strcpy(char *strDestination, const char *strSource)
{
    assert(strDestination && strSource);
    char *strD = strDestination;
    // Copy every character, including the terminating '\0'; no bounds check.
    while ((*strDestination++ = *strSource++) != '\0')
        ;
    return strD;
}
You should ensure the destination has enough space. However, strcpy is what it is:
it does not check for sufficient space in strDestination before copying strSource,
and it does not perform bounds checking, so it risks overrunning either buffer. It is a common cause of buffer overruns.

segfault with array

I have two questions regarding array:
First one is regarding following code:
int a[30]; //1
a[40]=1; //2
Why isn't line 2 giving a segfault? It should, because the array has been allocated
space for only 30 ints, and any dereference outside its allocated space should give a segfault.
Second: assuming the above code works, is there any chance that a[40] will get overwritten later, since it doesn't fall within the reserved range of the array?
Thanks in advance.
That's undefined behavior - it may crash, it may silently corrupt data, it may produce no observable results, anything. Don't do it.
In your example the likely explanation is that the array is stack-allocated, so there is a wide range of addresses around the array that is writable, and hence no immediately observable result. However, depending on which direction the stack grows on your system (toward larger or smaller addresses), this might overwrite the return address or temporaries of functions further up the call stack, and that will crash your program or make it misbehave when it tries to return from the function.
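Here is an illustrative sketch of that point; the exact stack layout is compiler- and optimisation-dependent, so the canary variable and the outcome are assumptions, not guarantees:
#include <cstdio>

int main() {
    int canary = 7;
    int a[30];
    a[40] = 1;   // out of bounds: undefined behaviour; on some stack layouts
                 // this silently lands on 'canary' or other nearby data
                 // instead of faulting
    std::printf("canary = %d\n", canary);   // may still print 7, or may not
}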
For performance reasons, C will not check the array size each time you access it. You can also access elements via direct pointers, in which case there is no way to validate the access.
A SEGFAULT will happen only if you access memory outside what is allocated to your process.
For the 2nd question: yes, it can be overwritten, since this memory is allocated to your process and is possibly used by other variables.
It depends on where the system has allocated that array; if by chance position 40 falls in memory reserved by the operating system, then you will receive a segfault.
Your application will crash only if you do something illegal with respect to the rest of your system: if you try to access a virtual memory address that your program doesn't own, the hardware will notice, inform your operating system, and the OS will kill your application with a segmentation fault: you accessed a memory segment you were not supposed to.
However, if you access a random memory address (which is what you did: a[40] is certainly outside of your array a, but it could land anywhere), you may hit a valid memory cell, which is what happened to you.
This is still an error: you'll likely overwrite some memory area your program owns, thus risking breaking your program elsewhere, but the system cannot know whether you accessed it on purpose or by mistake, and it won't kill the process.
Programs written in managed languages (i.e. programs that run in a protected environment that checks everything) would notice your erroneous memory access, but C is not a managed language: you're free to do whatever you want (as long as you don't create problems for the rest of the system).
The reason line 2 works and doesn't throw a segfault is that in C/C++ an array name decays to a pointer to its first element. So your array variable a corresponds to some memory address, e.g. 1004. The indexing syntax tells your program how many elements past the location of a to look for the element.
This means that
printf("%p", (void*)a);
// prints out "1004"
and
printf("%p", (void*)&a[0]);
// prints out "1004"
should print the same value.
However,
printf("%p", (void*)&a[40]);
// prints out "1164"
prints the memory address that is sizeof(int) * 40 bytes past the address of a.
Yes, it will eventually be overwritten.
If you malloc the space, you should get a segfault (or at least I believe so), but when using an array without allocating space, you'll be able to overwrite memory for a while. It will crash eventually, possibly when the program does an array size check or maybe when you hit a memory block reserved for something else (not sure what's going on under the hood).
Funny thing is that, IIRC, efence won't catch this either :D.

Pointers to statically allocated objects

I'm trying to understand how pointers to statically allocated objects work and where they can go wrong.
I wrote this code:
int* pinf = NULL;
for (int i = 0; i < 1; i++) {
    int inf = 4;
    pinf = &inf;
}
cout << "inf" << (*pinf) << endl;
I was surprised that it worked, because I thought that inf would disappear when the program left the block and the pointer would then point to something that no longer exists. I expected a segmentation fault when trying to access pinf. At what stage in the program does inf die?
Your understanding is correct. inf disappears when you leave the scope of the loop, and so accessing *pinf yields undefined behavior. Undefined behavior means the compiler and/or program can do anything, which may be to crash, or in this case may be to simply chug along.
This is because inf is on the stack. Even when it is out of scope pinf still points to a useable memory location on the stack. As far as the runtime is concerned the stack address is fine, and the compiler doesn't bother to insert code to verify that you're not accessing locations beyond the end of the stack. That would be prohibitively expensive in a language designed for speed.
For this reason you must be very careful to avoid undefined behavior. C and C++ are not nice the way Java or C# are where illegal operations pretty much always generate an immediate exception and crash your program. You the programmer have to be vigilant because the compiler will miss all kinds of elementary mistakes you make.
You are using a so-called dangling pointer. Dereferencing it results in undefined behavior according to the C++ Standard.
It probably will never die because pinf will point to something on the stack.
Stacks don't often shrink.
Modify it and you'll pretty much be guaranteed an overwrite though.
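A sketch of that idea (the helper function and its local variable are my own illustration; the actual behaviour is undefined): a later call is likely to reuse the stack slot that held inf, so a different value shows up through the dangling pointer:
#include <iostream>

void clobber() {
    volatile int junk = 1234;   // likely reuses the stack slot that held inf
    (void)junk;
}

int main() {
    int* pinf = nullptr;
    for (int i = 0; i < 1; i++) {
        int inf = 4;
        pinf = &inf;
    }
    clobber();
    std::cout << "inf " << *pinf << std::endl;   // undefined behaviour: may print
                                                 // 4, 1234, or anything else
}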
If you are asking about this:
int main() {
    int* pinf = NULL;
    for (int i = 0; i < 1; i++) {
        int inf = 4;
        pinf = &inf;
    }
    cout << "inf" << (*pinf) << endl;
}
Then what you have is undefined behaviour. The automatically allocated (not static) object inf has gone out of scope and has notionally been destroyed by the time you access it via the pointer. In this case anything might happen, including it appearing to "work".
You won't necessarily get a SIGSEGV (segmentation fault). inf's memory is probably allocated on the stack, and the stack memory region is probably still mapped into your process at that point, so that's probably why you are not getting a segfault.
The behaviour is undefined, but in practice, "destructing" an int is a noop, so most compilers will leave the number alone on the stack until something else comes along to reuse that particular slot.
Some compilers might set the int to 0xDEADBEEF (or some such garbage) when it goes out of scope in debug mode, but that won't make the cout << ... fail; it will simply print the nonsensical value.
The memory may or may not still contain a 4 when it gets to your cout line. It might contain a 4 strictly by accident. :)
First things first: your operating system can only detect memory access gone astray on page boundaries. So, if you're off by 4k or 8k or 16k or more. (Check /proc/self/maps on a Linux system some day to see the memory layout of a process; any addresses in the listed ranges are allowed, any outside the listed ranges aren't allowed. Every modern OS on protected-memory CPUs will support a similar mechanism, so it'll be instructive even if you're just not that interested in Linux. I just know it is easy on Linux.) So, the OS can't help you when your data is so small.
Also, your int inf = 4; might very well be stashed in the .rodata, .data or .text segments of your program. Static variables may be stuffed into any of these sections (I have no idea how the compiler/linker decides; I consider it magic) and they will therefore be valid throughout the entire duration of the program. Check size /bin/sh next time you are on a Unix system for an idea of how much data gets put into which sections. (And check out readelf(1) for way too much information, or objdump(1) if you're on older systems.)
If you change inf = 4 to inf = i, then the storage will be allocated on the stack, and you stand a much better chance of having it get overwritten quickly.
A protection fault occurs when the memory page you point to is not valid anymore for the process.
Luckily most OS's don't create a separate page for each integer's worth of stack space.