segfault with array - c++

I have two questions regarding arrays.
The first concerns the following code:
int a[30]; //1
a[40]=1; //2
Why doesn't line 2 give a segfault? It should, since the array has only been allocated
space for 30 ints, and any dereference outside its allocated space should give a segfault.
Second: assuming the above code works, is there any chance that a[40] will later get overwritten, since it doesn't fall within the array's reserved range?
Thanks in advance.

That's undefined behavior - it may crash, it may silently corrupt data, it may produce no observable results, anything. Don't do it.
In your example the likely explanation is that the array is stack-allocated, so there is a wide range of writable addresses around it and no immediately observable result. However, depending on which direction the stack grows on your system (towards larger or smaller addresses), the stray write might overwrite the return address or temporaries of functions further up the call stack, and your program will then crash or misbehave when it tries to return from the function.
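As a concrete, purely illustrative sketch of that scenario (names like smash are made up, and everything past a[29] is undefined behavior, so what you actually observe depends on the compiler, optimization level, and platform):
// Illustration only: every store from i == 30 onwards is undefined behavior.
// On a typical implementation the stray stores land in padding, other locals,
// or the saved return address, so the damage often shows up only on return.
void smash() {
    int a[30];
    for (int i = 0; i < 100; ++i)
        a[i] = 0;
}   // a build with stack protection may abort here with "stack smashing detected";
    // otherwise the return itself may crash, or nothing visible may happen

int main() {
    smash();
}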

For performance reasons, C does not check the array size each time you access it. You can also access elements through raw pointers, in which case there is no way to validate the access at all.
A SEGFAULT will happen only if you touch memory outside what has been allocated to your process.
For the second question: yes, a[40] can be overwritten, because that memory belongs to your process and is possibly used by other variables.
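Here is a minimal sketch of how such an out-of-bounds element can alias another variable. Whether b actually sits right after a in memory is an assumption about stack layout that the compiler is free to break, so treat this only as an illustration of undefined behavior:
#include <cstdio>

int main() {
    int a[30] = {};
    int b[20] = {};          // may or may not be placed right after a

    a[35] = 123;             // out-of-bounds write: undefined behavior

    // If the compiler happened to place b directly after a, the stray write
    // landed inside b and this prints 123; otherwise it prints 0, or the
    // program may already have misbehaved in some other way.
    std::printf("%d\n", b[5]);
}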

It depends on where the system happened to place that array: if by chance position 40 lands in memory reserved by the operating system, you will receive a segfault.

Your application will crash only if you do something illegal as far as the rest of the system is concerned: if you try to access a virtual memory address your program doesn't own, the hardware notices, informs the operating system, and the OS kills your application with a segmentation fault: you accessed a memory segment you were not supposed to.
However, if you access a random memory address (which is what you did: a[40] is certainly outside your array a, but it could be anywhere), you may hit a valid memory cell (which is what happened to you).
This is still an error: you will likely overwrite some memory area your program owns, risking breakage elsewhere in your program, but the system cannot know whether you accessed it on purpose or by mistake, and it won't kill you.
Programs written in managed languages (i.e. programs that run in a protected environment that checks everything) would notice your erroneous memory access, but C is not a managed language: you are free to do whatever you want (as long as you don't create problems for the rest of the system).

The reason line 2 works and doesn't throw a segfault is that in C/C++ an array decays to a pointer when you index it. Your array variable a refers to some memory address, e.g. 1004, and the subscript tells your program how many elements past that address to look.
This means that
printf("%p", (void*)a);
// prints out "1004"
and
printf("%p", (void*)&a[0]);
// prints out "1004"
should print the same value.
However,
printf("%p", (void*)&a[40]);
// prints out "1164"
prints the memory address that is sizeof(int) * 40 bytes past the address of a.
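Putting those snippets together into one runnable program (the concrete addresses such as 1004 above are only an example; what you see will differ):
#include <cstdio>

int main() {
    int a[30];

    // %p expects a void*; the array name decays to a pointer to a[0].
    std::printf("a      = %p\n", static_cast<void*>(a));
    std::printf("&a[0]  = %p\n", static_cast<void*>(&a[0]));   // same address as a
    std::printf("&a[40] = %p\n", static_cast<void*>(&a[40]));  // sizeof(int) * 40 bytes further
    // Note: even forming &a[40] is undefined behavior, since it points more than
    // one element past the end of the array; in practice compilers just compute it.
}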

Yes, it will eventually be overwritten.
If you malloc the space, you should get a segfault sooner (or at least I believe so), but when using a stack array without allocating extra space, you'll be able to keep overwriting memory for a while. It will crash eventually, maybe when you hit a memory block reserved for something else (I'm not sure exactly what's going on under the hood).
Funny thing is that, IIRC, efence won't catch this either :D.
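To make the heap case concrete, here is a minimal, illustrative sketch of an over-run on a dynamically allocated block. It usually does not segfault on its own, because the stray write still lands in memory the allocator owns on behalf of the process, but tools built for this purpose will usually flag it:
#include <cstdlib>

int main() {
    int *p = static_cast<int*>(std::malloc(30 * sizeof(int)));
    if (!p) return 1;

    p[40] = 1;      // heap buffer overflow: undefined behavior, but typically
                    // no immediate segfault, since the write stays inside
                    // memory already mapped for this process

    std::free(p);
    // Building with -fsanitize=address (GCC/Clang) or running the program
    // under Valgrind will usually report this overflow explicitly.
}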

Related

How can an operating system detect an out of range segmentation fault in C?

I encountered this problem while learning about operating systems, and I'm really interested in how the operating system detects that an array index is out of range and therefore produces a segmentation fault.
int main(){
char* ptr0;
ptr0[0] = 1;
}
The code above will absolutely produce a segmentation fault, since ptr0 is not allocated with any memory.
But if I add one line, things change.
int main(){
char* ptr0;
char* ptr1 = ptr0;
ptr0[0] = 1;
}
This code won't cause any fault; even if you change ptr0[0] to ptr0[1000], it still won't cause any segmentation fault.
I don't know why this line has such power:
char* ptr1 = ptr0
I tried to disassemble the code but found little information.
Could somebody explain this to me from the perspective of memory allocation? Thanks a lot.
A segmentation fault happens when a process attempts to access memory it's not supposed to, not necessarily if an array is read out of bounds.
In your particular case the variable ptr0 is uninitialized, so if you attempt to read it, any value may be read, and it need not even be consistent. In the first program the value that was read happened to be an invalid memory address, and attempting to write through that address triggered a segfault, while in the second program the value read happened to be a valid address for the program, so no segfault was generated.
When I ran these programs, both resulted in a segfault. This demonstrates the undefined behavior present in the program which attempts to dereference an invalid pointer.
When a program has undefined behavior, the C standard makes no guarantees regarding what the program will do. It may crash, it may output unexpected results, or it may appear to work properly.
The operating system and the hardware maintain a map of memory in the process’ virtual address space. Each time the process accesses memory, the hardware consults the map information and decides whether or not the access is allowed. If the access is not allowed, it generates a trap to alert the operating system.
This process will catch many incorrect accesses—it will catch all those that attempt to read or write memory that is not mapped or that attempt to write memory that is mapped read-only. It will not catch all incorrect accesses—it will not catch accesses that are to a wrong location (in terms of what the program’s author or user desires) but that are within mapped memory and the appropriate permissions.
Commonly, the operating system's map information is more complete than the hardware's. The operating system may map only a subset of a process's address space. This is because processes often do not use all of their address space (code and data to handle rare errors is often not executed or used) or do not use all of it all the time (processes spend some time doing one task before going on to another, and perhaps later returning to an earlier task). So, when the hardware reports a memory access fault to the operating system, the operating system will consult its complete information. If the process is allowed to access the attempted location, the operating system will set up physical memory as necessary, update the map for the hardware, and resume execution of the process. If the process is not allowed to access the attempted location, the operating system will report a signal to the process or terminate it or take other appropriate action.
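As a concrete illustration of that mechanism, the following Unix-flavoured sketch asks the kernel for a page with no access permissions and then touches it; the hardware raises a fault, the kernel sees that the access really is disallowed, and the process receives SIGSEGV. (MAP_ANONYMOUS is a widespread extension rather than strict POSIX, and the 4096-byte page size is an assumption; sysconf(_SC_PAGESIZE) returns the real value.)
#include <sys/mman.h>
#include <cstdio>

int main() {
    // Map one page that the process is not allowed to read or write.
    void *page = mmap(nullptr, 4096, PROT_NONE,
                      MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (page == MAP_FAILED) {
        std::perror("mmap");
        return 1;
    }

    std::printf("about to touch %p\n", page);
    *static_cast<volatile char*>(page) = 1;   // hardware fault -> kernel delivers SIGSEGV
    std::printf("never reached\n");
}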
The code above [char *ptr0; ptr0[0] = 1;] will absolutely produce a segmentation fault, since ptr0 is not allocated with any memory.
This is false, for several reasons:
Since ptr0 is not initialized, its value is indeterminate. When calculating the address for ptr0[0], the compiler will not necessarily use an address outside the process’ address space; it might use an address that is inside the address space and writable. In this case, storing 1 in that location will not generate a segmentation fault.
Due to a special rule in the C standard (C 2018 6.3.2.1 2), using the uninitialized object ptr0 in this situation results in the behavior of the program not being defined by the C standard. The compiler may transform this program into any other program.
Even if ptr0 were defined, the compiler may observe that the program has no defined observable behavior—it does not read any input, print anything, write anything to files, or change any volatile objects. So the compiler may optimize the program by changing it to an empty main function that does nothing.
But if add one line, things change.
If there is any change here, using char *ptr0; char *ptr1 = ptr0;, it is mere happenstance of the compiler. This program has the same semantics as the earlier program.
From your other comments, you might have intended to write char ptr0[0]; instead of char *ptr0;. Zero-length arrays are not defined by the C standard. However, some compilers may allow them as an extension to the C standard. In this case, what likely happens is that the compiler picks a location for the array, likely on the stack. Then ptr0[0] = 1; attempts to store a byte at that location. Although the array has been assigned a location, zero bytes there are reserved for it. Instead, those bytes may be in use for something else. Possibly they are the function return address, or possibly they are just filler used to help align the stack. In this case, ptr0[0] = 1; might overwrite data your program needs and break it. Or it might overwrite unused data and have no effect. Or, again, the behavior of your program is not defined by the C standard, so the compiler might transform it in other ways.

What is the purpose of allocating a specific amount of memory for arrays in C++?

I'm a student taking a class on Data Structures in C++ this semester and I came across something that I don't quite understand tonight. Say I were to create a pointer to an array on the heap:
int* arrayPtr = new int [4];
I can access this array using pointer syntax
int value = *(arrayPtr + index);
But if I were to add another value to the memory position immediately after the end of the space allocated for the array, I would then be able to access it
*(arrayPtr + 4) = 0;
int nextPos = *(arrayPtr + 4);
//the value of nextPos will be 0, or whatever value I previously filled that space with
The position in memory of *(arrayPtr + 4) is past the end of the space allocated for the array. But as far as I understand, the above still would not cause any problems. So aside from it being a requirement of C++, why even give arrays a specific size when declaring them?
When you go past the end of allocated memory, you are actually accessing memory of some other object (or memory that is free right now, but that could change later). So it will cause you problems, especially if you try to write something to it.
I can access this array using pointer syntax
int value = *(arrayPtr + index);
Yeah, but don't. Use arrayPtr[index]
The position in memory of *(arrayPtr + 4) is past the end of the space allocated for the array. But as far as I understand, the above still would not cause any problems.
You understand wrong. Oh so very wrong. You're invoking undefined behavior and undefined behavior is undefined. It may work for a week, then break one day next week and you'll be left wondering why. If you don't know the collection size in advance use something dynamic like a vector instead of an array.
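A minimal sketch of the bounds-checked alternative just mentioned: std::vector's at() member throws std::out_of_range on a bad index instead of silently scribbling over memory, while operator[] still does no checking:
#include <iostream>
#include <stdexcept>
#include <vector>

int main() {
    std::vector<int> v(4);     // four value-initialized ints

    v[3] = 7;                  // operator[]: no bounds checking, same as a raw array
    try {
        v.at(4) = 0;           // at() checks the index...
    } catch (const std::out_of_range& e) {
        std::cerr << "out of range: " << e.what() << '\n';   // ...and throws instead
    }
}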
Yes, in C/C++ you can access memory outside of the space you claim to have allocated. Sometimes. This is what is referred to as undefined behavior.
Basically, you have told the compiler and the memory management system that you want space to store four integers, and the memory management system allocated space for you to store four integers. It gave you a pointer to that space. In the memory manager's internal accounting, those bytes of ram are now occupied, until you call delete[] arrayPtr;.
However, the memory manager has not allocated that next byte for you. You don't have any way of knowing, in general, what that next byte is, or who it belongs to.
In a simple example program like your example, which just allocates a few bytes, and doesn't allocate anything else, chances are, that next byte belongs to your program, and isn't occupied. If that array is the only dynamically allocated memory in your program, then it's probably, maybe safe to run over the end.
But in a more complex program, with multiple dynamic memory allocations and deallocations, especially near the edges of memory pages, you really have no good way of knowing what any bytes outside of the memory you asked for contain. So when you write to bytes outside of the memory you asked for in new you could be writing to basically anything.
This is where undefined behavior comes in. Because you don't know what's in that space you wrote to, you don't know what will happen as a result. Here's some examples of things that could happen:
The memory was not allocated when you wrote to it. In that case, the data is fine, and nothing bad seems to happen. However, if a later memory allocation uses that space, anything you tried to put there will be lost.
The memory was allocated when you wrote to it. In that case, congratulations, you just overwrote some random bytes from some other data structure somewhere else in your program. Imagine replacing a variable somewhere in one of your objects with random data, and consider what that would mean for your program. Maybe a list somewhere else now has the wrong count. Maybe a string now has some random values for the first few characters, or is now empty because you replaced those characters with zeroes.
The array was allocated at the edge of a page, so the next bytes don't belong to your program. The address is outside your program's allocation. In this case, the OS detects you accessing random memory that isn't yours, and terminates your program immediately with SIGSEGV.
Basically, undefined behavior means that you are doing something illegal, but because C/C++ is designed to be fast, the language designers don't include an explicit check to make sure you don't break the rules, like other languages (e.g. Java, C#). They just list the behavior of breaking the rules as undefined, and then the people who make the compilers can have the output be simpler, faster code, since no array bounds checks are made, and if you break the rules, it's your own problem.
So yes, this sometimes works, but don't ever rely on it.
It would not cause any problems in a purely abstract setting, where you only worry about whether the logic of the algorithm is sound. In that case there's no reason to declare the size of an array at all. However, your computer exists in the physical world and only has a limited amount of memory. When you allocate memory, you're asking the operating system to let you use some of the computer's finite memory. If you go beyond that, the operating system should stop you, usually by killing your process/program.
Yes, you should write it as arrayPtr[index], because the position in memory of *(arrayPtr + 4) is past the end of the space you allocated for the array. It's a limitation of C++ that an array's size can't be extended once it has been allocated.

Why does this line of code cause a computer to crash?

Why does this line of code cause the computer to crash? What happens on memory-specific level?
for(int *p=0; ;*(p++)=0)
;
I have found the "answer" on Everything2, but I want a specific technical answer.
This code formally sets an integer pointer to null, then writes a 0 to the integer it points to and increments the pointer, looping forever.
The null pointer is not pointing to anything, so writing a 0 to it is undefined behavior (i.e. the standard doesn't say what should happen). Also you're not allowed to use pointer arithmetic outside arrays and so even just the increment is also undefined behavior.
Undefined behavior means that the compiler and library authors don't need to care at all about these cases and still the system is a valid C/C++ implementation. If a programmer does anything classified as undefined behavior then whatever happens happens and s/he cannot blame compiler and library authors. A programmer entering the undefined behavior realm cannot expect an error message or a crash, but cannot complain if getting one (even one million executed instructions later).
On systems where the null pointer is represented as all zero bits and there is no memory protection, the effect of such a loop could be to start wiping all of addressable memory, until some vital part of memory like an interrupt table is corrupted, or until the code writes zeros over itself, self-destructing. On systems with memory protection (most desktop systems today) execution may instead simply stop at the very first write.
Undoubtedly, the cause of the problem is that p has not been assigned a reasonable address.
If you don't properly initialize a pointer before writing to where it points, Bad Things™ will probably happen.
It could merely segfault, or it could overwrite something important, like the function's return address where a segfault wouldn't occur until the function attempts to return.
In the 1980s, a theoretician I worked with wrote a program for the 8086 to, once a second, write one word of random data at a randomly computed address. The computer was a process controller with watchdog protection and various types of output. The question was: How long would the system run before it ceased usefully functioning? The answer was hours and hours! This was a vivid demonstration that most of memory is rarely accessed.
It may cause an OS to crash, or it may do any number of other things. You are invoking undefined behavior. You don't own the memory at address 0 and you don't own the memory past it. You're just trouncing on memory that doesn't belong to you.
It works by overwriting all the memory at all the addresses, starting from 0 and going upwards. Eventually it will overwrite something important.
On any modern system, this will only crash your program, not the entire computer. CPUs designed since, oh, 1985 or so, have this feature called virtual memory which allows the OS to redirect your program's memory addresses. Most addresses aren't directed anywhere at all, which means trying to access them will just crash your program - and the ones that are directed somewhere will be directed to memory that is allocated to your program, so you can only crash your own program by messing with them.
On much older systems (older than 1985, remember!), there was no such protection and this loop could access memory addresses allocated to other programs and the OS.
The loop is not necessary to explain what's wrong. We can simply look at only the first iteration.
int *p = 0; // Declare a null pointer
*p = 0; // Write to null pointer, causing UB
The second line is causing undefined behavior, since that's what happens when you write to a null pointer.
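On a system with memory protection, what the program "sees" of that first write is a signal. Here is a minimal POSIX-flavoured sketch that installs a SIGSEGV handler purely to make the fault visible before exiting; this is an illustration, not a recovery technique, and an optimizing compiler may transform the obviously undefined store into something else entirely:
#include <csignal>
#include <unistd.h>

extern "C" void on_segv(int) {
    // Only async-signal-safe calls belong in a signal handler.
    const char msg[] = "caught SIGSEGV, exiting\n";
    (void)write(STDERR_FILENO, msg, sizeof msg - 1);
    _exit(1);   // returning from the handler would just re-execute the faulting store
}

int main() {
    std::signal(SIGSEGV, on_segv);

    int *p = nullptr;
    *p = 0;     // on a protected system: hardware fault -> SIGSEGV -> handler above
}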

Why does my dynamically allocated array get initialized to 0?

I have some code that creates a dynamically allocated array with
int *Array = new int[size];
From what I understand, Array should be a pointer to the first item of Array in memory. When using gdb, I can call x Array to examine the value at the first memory location, x Array+1 to examine the second, etc. I expect to have junk values left over from whatever application was using those spots in memory prior to mine. However, using x Array returns 0x00000000 for all those spots. What am I doing wrong? Is my code initializing all of the values of the Array to zero?
EDIT: For the record, I ask because my program is an attempt to implement this: http://eli.thegreenplace.net/2008/08/23/initializing-an-array-in-constant-time/. I want to make sure that my algorithm isn't incrementing through the array to initialize every element to 0.
In most modern OSes, the OS gives zeroed pages to applications, as opposed to letting information seep between unrelated processes. That's important for security reasons, for example. Back in the old DOS days, things were a bit more casual. Today, with memory protected OSes, the OS generally gives you zeros to start with.
So, if this new happens early in your program, you're likely to get zeros. You'd be crazy to rely on that though; it's undefined behavior if you do.
If you keep allocating, filling, and freeing memory, eventually new will return memory that isn't zeroed. Rather, it'll contain remnants of your process' own earlier scribblings.
And there's no guarantee that any particular call to new, even at the beginning of your program, will return memory filled with zeros. You're just likely to see that for calls to new early in your program. Don't let that mislead you.
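Here is a minimal sketch of that reuse effect. None of the "0" or "42" outputs are guaranteed, and reading an element before writing it is itself reading an indeterminate value; it's shown only to mirror the gdb experiment above. If you actually want zeros, ask for them with new int[size]():
#include <iostream>

int main() {
    int *a = new int[4];          // fresh pages from the OS are often zero-filled
    std::cout << a[0] << '\n';    // often 0, but the value is indeterminate

    for (int i = 0; i < 4; ++i) a[i] = 42;
    delete[] a;

    int *b = new int[4];          // may reuse the block just freed
    std::cout << b[0] << '\n';    // may print 42: our own earlier scribblings, not zeros
    delete[] b;

    int *c = new int[4]();        // value-initialization: zeros are guaranteed
    std::cout << c[0] << '\n';    // always 0
    delete[] c;
}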
I expect to have junk values left over from whatever application was using those spots
It's certainly possible but by no means guaranteed. Particularly in debug builds, you're just as likely to have the runtime zero out that memory (or fill it with some recognisable bit pattern) instead, to help you debug things if you use the memory incorrectly.
And, really, "those spots" is a rather loose term, given virtual addressing.
The important thing is that, no, your code is not setting all those values to zero.

Can a C/C++ program seg-fault from reading past the end of an array (UNIX)?

I'm aware that you can read past the end of an array - I'm wondering now if you can seg-fault just by performing that reading operation though.
int someints[100];
std::cerr << someints[100] << std::endl; //This is 1 past the end of the array.
Can the second line actually cause a seg-fault or will it just print jibberish? Also, if I changed that memory, can that cause a seg-fault on that specific line, or would a fault only happen later when something else tried to use that accidentally changed memory?
This is undefined behaviour and entirely depends on the virtual memory layout the operating system has arranged for the process. Generally you can either:
access some gibberish that belongs to your virtual address space but has a meaningless value, or
attempt to access a restricted memory address in which case the memory mapping hardware invokes a page fault and the OS decides whether to spank your process or allocate more memory.
If someints is an array on the stack and is the last variable declared, you will most likely get some gibberish off the top of the stack or (very unlikely) invoke a page fault that could either let the OS resize the stack or kill your process with a SIGSEGV.
Imagine you declare a single int right after your array:
int someints[100];
int on_top_of_stack = 42;
std::cerr << someints[100] << std::endl;
Then most likely the program should print 42, unless the compiler somehow rearranges the order of declarations on the stack.
Yes, it can segfault if memory at that address is not accessible to the program. In your case it is not likely, as the array is allocated on the stack and is only 100 ints (400 bytes) long, while the stack is significantly larger (e.g. 8 MB per thread on Linux 2.4.X), so the read will just return uninitialized data. But in some cases it may crash. In either case, this code is erroneous, and tools like Valgrind should be able to help you troubleshoot it.
The second line can cause literally anything to happen and still be correct as far as the language specification is concerned. It could print gibberish, it could crash due to a segmentation fault or something else, it could cause power to go out on the entire eastern seaboard, or it could cause the canonical demons to fly out of your nose...
That's the magic of undefined behaviour.