Difference between accessing non-existent array index and existing-but-empty index - c++

Suppose I wrote
vector<int> example(5);
example[6];
What difference would it make compared with the following?
vector<int> example(6);
example[5];
In the first case I'm trying to access a non-existent, non-declared index. Could that result in malicious code execution? Would it be possible to put some sort of code in the portion of memory corresponding to example[5] and have it executed by a program written like the first above?
What about the second case? Would it still be possible to place code in the area of memory of example[5], even though it should be reserved for my program, even if I haven't written anything in it?

Could that result in malicious code execution?
No, this causes 'only' undefined behaviour.
Simple code execution exploits usually write past the end of a stack-allocated buffer, thereby overwriting a return address. When the function returns, it jumps to the malicious code. A write is always required, because otherwise there is no malicious code in your program's address space.
With a vector the chances that this happens are low, because the storage for the elements is not allocated on the stack.
By writing to a wrong location on the heap, exploits are possible too, but they are much more complicated.
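As a toy illustration of why out-of-bounds writes are dangerous (this is deliberately undefined behaviour, and the exact outcome depends on compiler, optimisation level, and memory layout; the struct and string below are made up for the demonstration):

#include <cstring>
#include <iostream>

int main() {
    struct Frame {
        char buf[8];
        int  sentinel;  // happens to sit right after buf in this struct
    } f;
    f.sentinel = 42;
    std::strcpy(f.buf, "0123456789a");  // 12 bytes into an 8-byte buffer: UB
    std::cout << f.sentinel << '\n';    // very likely no longer prints 42
}

Here the overflowing bytes land on sentinel rather than on a return address, but the mechanism is the same one exploits rely on.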

The first case reaches beyond the vector's buffer and thus invokes Undefined Behaviour. Technically, this means literally anything can happen. But it's unlikely to be directly exploitable to run malicious code: either the program will try to read the invalid memory (getting a garbage value or a memory error), or the compiler has eliminated the code path altogether (because it's allowed to assume UB doesn't happen). Depending on what's done with the result, it might reveal unintended data from memory, though.
In the second case, all is well. Your program has already written into this memory: std::vector's constructor value-initialises all six int objects. So you're guaranteed to find an int with value 0 there.
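A minimal sketch of the second case (names chosen to match the question); the constructor value-initialises all six elements, so reading index 5 is well-defined:

#include <iostream>
#include <vector>

int main() {
    std::vector<int> example(6);      // value-initialises six ints to 0
    std::cout << example[5] << '\n';  // prints 0: a valid, initialised element
}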

Related

Why does this line of code cause a computer to crash?

Why does this line of code cause the computer to crash? What happens at the memory level?
for(int *p=0; ;*(p++)=0)
;
I have found the "answer" on Everything2, but I want a specific technical answer.
Formally, this code sets an integer pointer to null, then repeatedly writes a 0 to the integer it points to and increments the pointer, looping forever.
The null pointer is not pointing to anything, so writing a 0 through it is undefined behavior (i.e. the standard doesn't say what should happen). You're also not allowed to use pointer arithmetic outside arrays, so even the increment alone is undefined behavior.
Undefined behavior means that the compiler and library authors don't need to care at all about these cases, and the system is still a valid C/C++ implementation. If a programmer does anything classified as undefined behavior, then whatever happens happens, and they cannot blame the compiler and library authors. A programmer entering the undefined-behavior realm cannot expect an error message or a crash, but cannot complain about getting one either (even one million executed instructions later).
On systems where the null pointer is represented as all zero bits and there is no memory protection, the effect of such a loop could be to start wiping all addressable memory, until some vital part of memory such as an interrupt table is corrupted, or until the code writes zeros over itself, self-destructing. On other systems with memory protection (most common desktop systems today), execution may instead simply stop at the very first write.
Undoubtedly, the cause of the problem is that p has not been assigned a reasonable address.
A pointer that has not been properly initialized before being written through is probably going to do Bad Things™.
It could merely segfault, or it could overwrite something important, such as the function's return address, in which case the crash wouldn't occur until the function attempts to return.
In the 1980s, a theoretician I worked with wrote a program for the 8086 to, once a second, write one word of random data at a randomly computed address. The computer was a process controller with watchdog protection and various types of output. The question was: How long would the system run before it ceased usefully functioning? The answer was hours and hours! This was a vivid demonstration that most of memory is rarely accessed.
It may cause an OS to crash, or it may do any number of other things. You are invoking undefined behavior. You don't own the memory at address 0 and you don't own the memory past it. You're just trampling on memory that doesn't belong to you.
It works by overwriting all the memory at all the addresses, starting from 0 and going upwards. Eventually it will overwrite something important.
On any modern system, this will only crash your program, not the entire computer. CPUs designed since, oh, 1985 or so, have this feature called virtual memory which allows the OS to redirect your program's memory addresses. Most addresses aren't directed anywhere at all, which means trying to access them will just crash your program - and the ones that are directed somewhere will be directed to memory that is allocated to your program, so you can only crash your own program by messing with them.
On much older systems (older than 1985, remember!), there was no such protection and this loop could access memory addresses allocated to other programs and the OS.
The loop is not necessary to explain what's wrong. We can simply look at only the first iteration.
int *p = 0; // Declare a null pointer
*p = 0; // Write to null pointer, causing UB
The second line is causing undefined behavior, since that's what happens when you write to a null pointer.
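For contrast, a minimal well-defined version would make the pointer refer to a real object before writing through it:

int main() {
    int value = 0;
    int *p = &value;  // p now points at an actual int, not null
    *p = 14;          // well-defined write
    return value;     // returns 14
}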

Why does strcpy "work" when writing to malloc'ed memory that is not large enough? [duplicate]

Possible Duplicate:
Why does this intentionally incorrect use of strcpy not fail horribly?
See the code below:
char* stuff = (char*)malloc(2);
strcpy(stuff,"abc");
cout<<"The size of stuff is : "<<strlen(stuff);
Even though I allocated only 2 bytes for stuff, why does strcpy still work, and why is the output of strlen 3? Shouldn't this throw something like an index-out-of-bounds error?
C and C++ don't do automatic bounds checking like Java and C# do. This code will overwrite stuff in memory past the end of the string, corrupting whatever was there. That can lead to strange behavior or crashes later, so it's good to be cautious about such things.
Accessing past the end of an array is deemed "undefined behavior" by the C and C++ standards. That means the standard doesn't specify what must happen when a program does that, so a program that triggers UB is in never-never-land where anything might happen. It might continue to work with no apparent problems. It might crash immediately. It might crash later when doing something else that shouldn't have been a problem. It might misbehave but not crash. Or velociraptors might come and eat you. Anything can happen.
Writing past the end of an array is called a buffer overflow, by the way, and it's a common cause of security flaws. If that "abc" string were actually user input, a skilled attacker could put bytes into it that end up overwriting something like the function's return address, which can be used to make the program run different code than it should, and do different things than it should.
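A minimal sketch of the fix for the code in the question: allocate room for the whole string, including its terminating null, before copying:

#include <cstdlib>
#include <cstring>
#include <iostream>

int main() {
    const char *src = "abc";
    // strlen does not count the terminating '\0', so allocate one extra byte
    char *stuff = static_cast<char*>(std::malloc(std::strlen(src) + 1));
    if (stuff != nullptr) {
        std::strcpy(stuff, src);                  // now fits, '\0' included
        std::cout << std::strlen(stuff) << '\n';  // prints 3
        std::free(stuff);
    }
}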
You just overwrite heap memory; usually there is no crash, but bad things can happen later. C does not prevent you from shooting yourself in the foot; there is no such thing as an array-out-of-bounds check.
No, your char pointer now points to a string of length 3. Generally this will not cause any immediate problems, but you might overwrite some critical memory region and cause the program to crash (expect a segmentation fault then), especially when you perform such operations over a large amount of memory.
Here is a typical implementation of strcpy:

#include <assert.h>

char *strcpy(char *strDestination, const char *strSource)
{
    assert(strDestination && strSource);  // both pointers must be non-null
    char *strD = strDestination;
    while ((*strDestination++ = *strSource++) != '\0')
        ;  // copy byte by byte, including the terminating '\0'
    return strD;
}

You should ensure the destination has enough space. However, it is what it is:
strcpy does not check for sufficient space in strDestination before copying strSource. It also does not perform bounds checking, and thus risks overrunning both the source and destination buffers; it is a frequent cause of buffer overruns.
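A sketch of a size-aware alternative (similar in spirit to BSD's strlcpy, which is not part of standard C or C++; the function name here is made up):

#include <cstddef>

// Never writes more than dstsize bytes and always null-terminates
// when dstsize > 0; truncates if the source does not fit.
std::size_t copy_bounded(char *dst, const char *src, std::size_t dstsize) {
    std::size_t i = 0;
    if (dstsize > 0) {
        for (; i + 1 < dstsize && src[i] != '\0'; ++i)
            dst[i] = src[i];
        dst[i] = '\0';
    }
    while (src[i] != '\0')  // keep counting so callers can detect truncation
        ++i;
    return i;               // full length of src
}

A caller can compare the return value against dstsize to detect truncation.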

segfault with array

I have two questions regarding array:
First one is regarding following code:
int a[30]; //1
a[40]=1; //2
Why isn't line 2 giving a segfault? It should, because the array has been allocated
space for only 30 ints, and any dereference outside its allocated space should give a segfault.
Second: assuming the above code works, is there any chance that a[40] will get overwritten later, since it doesn't fall within the reserved range of the array?
Thanks in advance.
That's undefined behavior - it may crash, it may silently corrupt data, it may produce no observable results, anything. Don't do it.
In your example the likely explanation is that the array is stack-allocated, so there's a wide range of addresses around it accessible for writing and no immediately observable results. However, depending on which direction the stack grows on your system (towards larger or smaller addresses), this might overwrite the return address and temporaries of functions up the call stack, and that will crash your program or make it misbehave when it tries to return from the function.
For performance reasons, C will not check the array size each time you access it. You could also access elements via raw pointers, in which case there is no way to validate the access anyway.
A segfault will happen only if you access memory outside what is allocated to your process.
For the second question: yes, a[40] can be overwritten, as this memory is allocated to your process and is possibly used by other variables.
It depends on where the system has allocated that array; if by chance position 40 falls in memory reserved by the operating system, you will receive a segfault.
Your application will crash only if you do something illegal for the rest of your system: if you try to access a virtual memory address that your program doesn't own, your hardware will notice, inform your operating system, and it will kill your application with a segmentation fault: you accessed a memory segment you were not supposed to.
However, if you access a random memory address (which is what you did: a[40] is certainly outside of your array a, but it could point anywhere), you could hit a valid memory cell (which is what happened to you).
This is still an error: you'll likely overwrite some memory area your program owns, thus risking breaking your program elsewhere, but the system cannot know whether you accessed it on purpose or by mistake and won't kill you.
Programs written in managed languages (i.e. programs that run in a protected environment where everything is checked) would notice your erroneous memory access, but C is not a managed language: you're free to do whatever you want (as long as you don't create problems for the rest of the system).
The reason line 2 works and doesn't throw a segfault is that in C/C++, an array decays to a pointer to its first element. So your array variable a refers to some memory address, e.g. 1004. The subscript syntax tells your program how many elements past the location of a to look.
This means that
printf("%p", (void*)a);
// prints out "1004"
and
printf("%p", (void*)&a[0]);
// prints out "1004"
should print the same value.
However,
printf("%p", (void*)&a[40]);
// prints out "1164"
yields the memory address that is sizeof(int) * 40 bytes past the address of a (assuming a 4-byte int here).
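A runnable version of the snippets above, with the caveat that even forming the pointer &a[40] for a 30-element array is itself undefined behaviour; the concrete addresses 1004 and 1164 in this answer are illustrative only:

#include <cstdio>

int main() {
    int a[30];
    std::printf("%p\n", static_cast<void*>(a));       // base address of a
    std::printf("%p\n", static_cast<void*>(&a[0]));   // same address
    std::printf("%p\n", static_cast<void*>(&a[40]));  // base + 40*sizeof(int); UB to form
}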
Yes, it will eventually be overwritten.
If you malloc the space, you should get a segfault (or at least I believe so), but when using a stack array without allocating space, you'll be able to overwrite memory for a while. It will crash eventually, possibly when you hit a memory block reserved for something else (I'm not sure exactly what's going on under the hood).
Funny thing is that, IIRC, efence won't catch this either :D.

Vector Ranges in C++

Another quick question here, I have this code:
string sa[6] = {
"Fort Sumter", "Manassas", "Perryville",
"Vicksburg", "Meridian", "Chancellorsville" };
vector<string> svec(sa, sa+6);
for (vector<string>::iterator iter = svec.begin(); iter != svec.end(); iter++)
{
std::cout << *iter << std::endl;
}
Why is it that when I do svec(sa, sa+7), the code works but prints out an empty line after the last word, while with sa+8 it crashes? Since the string array is only 6 elements big, shouldn't it crash with sa+7 as well?
Thanks.
Accessing past the end of a vector is undefined behavior. Anything could happen. You might have nasal demons.
You have an array of only six elements. When you try to access the supposed "seventh" element, you get undefined behavior. Technically, that means anything can happen, but that doesn't seem to me like a very helpful explanation, so let's take a closer look.
That array occupies memory, and when you accessed the element beyond the end, you read whatever value happened to occupy that memory. It's possible that the address doesn't belong to your process, but it probably does, and so reading the sizeof(string) bytes that reside in that space will generally succeed.
Your program read from it and, since it was reading it through a string array, it treated that memory as though it were a real string object. (Your program can't tell the difference. It doesn't know any better. It's just trying to carry out your instructions.) Apparently, whatever data happened to be there looked enough like a real string object that your program was able to treat it like one, at least long enough to make a copy of it in the vector and then print its (empty) value. It worked this time, but that doesn't mean it will work every time.
There was no such luck with the data in the "eighth" position of the array. It did not look enough like a valid string object. A string object usually contains a pointer to the character data, along with a length. Maybe the area of the object that would normally represent that pointer didn't contain a valid address for your program. Or maybe the part that represented the length field contained a value far larger than what was available at the address in the pointer.
Your application does not crash because there is some standard that specifies that it should crash. Crashing is just random (undefined) behaviour. You will not always get a crash when you exceed the bounds of an array as you have found out.
Essentially, anything could happen such as printing a blank line, crashing or even as just posted - have demons fly out of your nose.
C++ doesn't do range-checking of arrays.
Reading beyond the end of an array is what's called "undefined" behaviour: i.e. it's not guaranteed to throw an exception, it's not guaranteed to not throw an exception, and it's not guaranteed to have consistent behaviour from one run to the next.
If people say that C++ is an "unsafe" language, this is part of what they mean by that. C++ doesn't check the range at run-time, because doing so takes extra CPU instructions, and part of the design philosophy of C++ is to be no slower than C.
Your compiler might have been able to warn you at compile-time (are you using the compiler command-line options to give you the maximum possible number of warnings?), though that too isn't guaranteed/required by the language.
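As a sketch of what the checked alternative looks like with a vector like the question's: vector::at() performs the range check that operator[] skips, throwing std::out_of_range instead of invoking undefined behaviour (the contents here are placeholders):

#include <iostream>
#include <stdexcept>
#include <string>
#include <vector>

int main() {
    std::vector<std::string> svec(6, "battle");  // six valid elements: 0..5
    try {
        std::cout << svec.at(7) << '\n';         // index 7 is out of range
    } catch (const std::out_of_range &e) {
        std::cout << "caught: " << e.what() << '\n';
    }
}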

Array index out of bound behavior

Why do C/C++ differentiate between these cases of array index out of bounds?
#include <stdio.h>
int main()
{
int a[10];
a[3]=4;
a[11]=3;//does not give segmentation fault
a[25]=4;//does not give segmentation fault
a[20000]=3; //gives segmentation fault
return 0;
}
I understand that it's trying to access memory allocated to process or thread in case of a[11] or a[25] and it's going out of stack bounds in case of a[20000].
Why doesn't the compiler or linker give an error? Aren't they aware of the array size? If not, then how does sizeof(a) work correctly?
The problem is that C/C++ doesn't actually do any boundary checking with regards to arrays. It depends on the OS to ensure that you are accessing valid memory.
In this particular case, you are declaring a stack-based array. Depending upon the particular implementation, accessing outside the bounds of the array will simply access another part of the already allocated stack space (most OSs and threads reserve a certain portion of memory for the stack). As long as you just happen to be playing around in the pre-allocated stack space, everything will not crash (note I did not say work).
What's happening on the last line is that you have now accessed beyond the part of memory that is allocated for the stack. As a result you are indexing into a part of memory that is not allocated to your process, or is allocated read-only. The OS sees this and sends a segfault to the process.
This is one of the reasons that C/C++ is so dangerous when it comes to boundary checking.
The segfault is not an intended action of your C program that would tell you that an index is out of bounds. Rather, it is an unintended consequence of undefined behavior.
In C and C++, if you declare an array like
type name[size];
You are only allowed to access elements with indexes from 0 up to size-1. Anything outside that range causes undefined behavior. If the index is near the valid range, you will most probably read your own program's memory. If the index is far out of range, your program will most probably be killed by the operating system. But you can't know; anything can happen.
Why does C allow that? Well, the basic gist of C and C++ is to not provide features if they cost performance. C and C++ have been used for ages for highly performance-critical systems. C has been used as an implementation language for kernels and programs where out-of-bounds array access can be useful to get fast access to objects that lie adjacent in memory. Having the compiler forbid this would be for naught.
Why doesn't it warn about that? Well, you can raise the warning levels and hope for the compiler's mercy. This is called quality of implementation (QoI): if a compiler uses its freedom (here, undefined behavior) to do something good, it has good quality of implementation in that regard.
[js#HOST2 cpp]$ gcc -Wall -O2 main.c
main.c: In function 'main':
main.c:3: warning: array subscript is above array bounds
[js#HOST2 cpp]$
If it instead formatted your hard disk upon seeing the array accessed out of bounds - which would be legal for it - the quality of implementation would be rather bad. I enjoyed reading about that stuff in the ANSI C Rationale document.
You generally only get a segmentation fault if you try to access memory your process doesn't own.
What you're seeing in the case of a[11] (and a[10], by the way) is memory that your process does own but that doesn't belong to the a[] array. a[20000] is so far from a[] that it's probably outside your process's memory altogether.
Changing a[11] is far more insidious as it silently affects a different variable (or the stack frame which may cause a different segmentation fault when your function returns).
C isn't doing this. The OS's virtual memory subsystem is.
In the case where you are only slightly out of bounds, you are addressing memory that is allocated for your program (on the call stack in this case). In the case where you are far out of bounds, you are addressing memory not given over to your program, and the OS throws a segmentation fault.
On some systems there is also an OS-enforced concept of "writeable" memory, and you might be trying to write to memory that you own but that is marked unwriteable.
Just to add to what other people are saying: you cannot rely on the program simply crashing in these cases; there is no guarantee of what will happen if you attempt to access a memory location beyond the "bounds of the array." It's just the same as if you did something like:
int *p;
p = (int *) 135;  // an arbitrary, almost certainly invalid address
*p = 14;          // undefined behavior: writing to memory we don't own
That is just random; this might work. It might not. Don't do it. Code to prevent these sorts of problems.
As litb mentioned, some compilers can detect some out-of-bounds array accesses at compile time. But bounds checking at compile time won't catch everything:
int a[10];
int i = some_complicated_function();  // index unknown until run time
printf("%d\n", a[i]);                 // no compile-time check can catch this
To detect this, runtime checks would have to be used, and they're avoided in C because of their performance impact. Even with knowledge of a's array size at compile time, i.e. sizeof(a), it can't protect against that without inserting a runtime check.
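For illustration, a minimal sketch of the run-time check the language omits (the function name and error handling are made up, not any standard API):

#include <cstddef>
#include <cstdio>
#include <cstdlib>

int checked_read(const int *a, std::size_t n, std::size_t i) {
    if (i >= n) {  // the bounds check C chooses not to pay for
        std::fprintf(stderr, "index %zu out of bounds (size %zu)\n", i, n);
        std::abort();
    }
    return a[i];
}

int main() {
    int a[10] = {0};
    return checked_read(a, 10, 3);  // fine; checked_read(a, 10, 11) would abort
}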
As I understand the question and comments, you understand why bad things can happen when you access memory out of bounds, but you're wondering why your particular compiler didn't warn you.
Compilers are allowed to warn you, and many do at the highest warning levels. However the standard is written to allow people to run compilers for all sorts of devices, and compilers with all sorts of features so the standard requires the least it can while guaranteeing people can do useful work.
There are a few times the standard requires that a certain coding style will generate a diagnostic. There are several other times where the standard does not require a diagnostic. Even when a diagnostic is required I'm not aware of any place where the standard says what the exact wording should be.
But you're not completely out in the cold here. If your compiler doesn't warn you, Lint may. Additionally, there are a number of tools to detect such problems (at run time) for arrays on the heap, one of the more famous being Electric Fence (or DUMA). But even Electric Fence doesn't guarantee it will catch all overrun errors.
That's not a C issue, it's an operating system issue. Your program has been granted a certain memory space and anything you do inside of that is fine. The segmentation fault only happens when you access memory outside of your process space.
Not all operating systems have separate address spaces for each process, in which case you can corrupt the state of another process or of the operating system with no warning.
The C philosophy is to always trust the programmer. Also, not checking bounds allows the program to run faster.
As JaredPar said, C/C++ doesn't always perform range checking. If your program accesses a memory location outside your allocated array, your program may crash, or it may not because it is accessing some other variable on the stack.
To answer your question about the sizeof operator in C:
You can reliably use sizeof(array)/sizeof(array[0]) to determine the number of elements in an array, but using it doesn't mean the compiler will perform any range checking.
My research showed that C/C++ developers believe that you shouldn't pay for something you don't use, and they trust the programmers to know what they are doing. (see accepted answer to this: Accessing an array out of bounds gives no error, why?)
If you can use C++ instead of C, maybe use vector? You can use vector[] when you need the performance (but no range checking) or, preferably, use vector.at() (which has range checking at the cost of performance). Note that indexing never grows a vector that is full: to add elements safely, use push_back(), which increases capacity automatically if necessary.
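A minimal sketch of that advice: indexing never grows a vector, but push_back does, and at() adds the range check:

#include <iostream>
#include <vector>

int main() {
    std::vector<int> v;            // empty: v[0] here would be undefined behaviour
    v.push_back(14);               // grows the vector to hold one element
    std::cout << v[0] << '\n';     // fine now: prints 14
    std::cout << v.at(0) << '\n';  // same access, with a range check
}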
More information on vector: http://www.cplusplus.com/reference/vector/vector/