Vector Ranges in C++

Another quick question here, I have this code:
string sa[6] = {
"Fort Sumter", "Manassas", "Perryville",
"Vicksburg", "Meridian", "Chancellorsville" };
vector<string> svec(sa, sa+6);
for (vector<string>::iterator iter = svec.begin(); iter != svec.end(); iter++)
{
std::cout << *iter << std::endl;
}
Why is it that when I do svec(sa, sa+7), the code works but it prints out an empty line after the last word and when I do sa+8 instead it crashes? Because the string array is only 6 elements big, shouldn't it crash at sa+7 also?
Thanks.

Accessing past the end of a vector is undefined behavior. Anything could happen. You might have nasal demons.

You have an array of only six elements. When you try to access the supposed "seventh" element, you get undefined behavior. Technically, that means anything can happen, but that doesn't seem to me like a very helpful explanation, so let's take a closer look.
That array occupies memory, and when you accessed the element beyond the end, you were reading whatever value happened to occupy that memory. It's possible that that address doesn't belong to your process, but it probably does, and so reading the sizeof(string) bytes that reside in that space happened to succeed.
Your program read from it and, since it was reading it through a string array, it treated that memory as though it were a real string object. (Your program can't tell the difference. It doesn't know any better. It's just trying to carry out your instructions.) Apparently, whatever data happened to be there looked enough like a real string object that your program was able to treat it like one, at least long enough to make a copy of it in the vector and then print its (empty) value. It worked this time, but that doesn't mean it will work every time.
There was no such luck with the data in the "eighth" position of the array. It did not look enough like a valid string object. A string object usually contains a pointer to the character data, along with a length. Maybe the area of the object that would normally represent that pointer didn't contain a valid address for your program. Or maybe the part that represented the length field contained a value far larger than what was available at the address in the pointer.

Your application does not crash because there is some standard that specifies that it should crash. Crashing is just random (undefined) behaviour. You will not always get a crash when you exceed the bounds of an array as you have found out.
Essentially, anything could happen such as printing a blank line, crashing or even as just posted - have demons fly out of your nose.

C++ doesn't do range-checking of arrays.
Reading beyond the end of an array is what's called "undefined" behaviour: i.e. it's not guaranteed to throw an exception, it's not guaranteed to not throw an exception, and it's not guaranteed to have consistent behaviour from one run to the next.
If people say that C++ is an "unsafe" language, this is part of what they mean by that. C++ doesn't check the range at run time, because doing so takes extra CPU instructions, and part of the design philosophy of C++ is to make it no slower than C.
Your compiler might have been able to warn you at compile-time (are you using the compiler command-line options to give you the maximum possible number of warnings?), though that too isn't guaranteed/required by the language.

Related

Reverse iterators and negative strided iterators in C++, using one before beginning as sentinel

In Another way of looking at C++ reverse iterators Raymond Chen wrote:
... a quirk of the C++ language: You are allowed to have a pointer
"one past the end" of a collection, but you are not allowed to have a
pointer "one before the beginning" of a collection.
I understand that it probably means "undefined behavior" and that is pretty much a conversation ender. But I am curious what is the worst that can happen in a realistic system if one ignores this rule. A segmentation fault? An overflow in the pointer arithmetic? Unnecessary page faults?
Remember that the pointer "before" the beginning (like "end") is not supposed to be dereferenced either; the problem seems to be merely having the pointer point there.
The only situation I can imagine it can be a problem is with a system where memory location "0" is valid.
But even then, if that were so, there would be bigger problems (e.g. nullptr would be problematic in itself too, and wrapping around might still work by convention, I guess).
I am not questioning the implementation of reverse_iterator with an off-by-one special code.
The question occurred to me because a generic implementation of a "strided" iterator would require special logic for negative strides, and that has a cost (in code and at runtime).
Arbitrary strides, including negative ones could be naturally appearing with multidimensional arrays.
I have a multidimensional array library in which I have always allowed negative strides in principle, but now I realize they would need special-case code (different from the positive case) if I allow them at all, and I don't want to expose undefined behavior.
But I am curious what is the worst that can happen in a realistic system if one ignores this rule. A segmentation fault? An overflow in the pointer arithmetic? Unnecessary page faults?
Wasted space.
To be able to form the address "one before" you need to have the space available. For example, if you have a 1k struct, it must start at least 1k from the beginning of memory. On a small device with limited memory, or an x86 with segmented memory, that can be tricky.
To form a "one past" pointer/iterator you only need one byte beyond the end. As it cannot be dereferenced anyway, there is no need to have space for an entire object, just the possibility to form the address.
From the comments:
At the time the rules were defined, we had real CPUs with dedicated address registers that validated the address against the segment size on register load. That made sure that a -1 address would trap...
https://en.wikipedia.org/wiki/Motorola_68000#Architecture
This ruled out the case of having "one before" wrap around to the end of the segment: if the segment is small, the wrapped-around address might not even exist.

What is the purpose of allocating a specific amount of memory for arrays in C++?

I'm a student taking a class on Data Structures in C++ this semester and I came across something that I don't quite understand tonight. Say I were to create a pointer to an array on the heap:
int* arrayPtr = new int [4];
I can access this array using pointer syntax
int value = *(arrayPtr + index);
But if I were to add another value to the memory position immediately after the end of the space allocated for the array, I would then be able to access it
*(arrayPtr + 4) = 0;
int nextPos = *(arrayPtr + 4);
//the value of nextPos will be 0, or whatever value I previously filled that space with
The position in memory of *(arrayPtr + 4) is past the end of the space allocated for the array. But as far as I understand, the above still would not cause any problems. So aside from it being a requirement of C++, why even give arrays a specific size when declaring them?
When you go past the end of allocated memory, you are actually accessing memory of some other object (or memory that is free right now, but that could change later). So, it will cause you problems. Especially if you'll try to write something to it.
I can access this array using pointer syntax
int value = *(arrayPtr + index);
Yeah, but don't. Use arrayPtr[index]
The position in memory of *(arrayPtr + 4) is past the end of the space allocated for the array. But as far as I understand, the above still would not cause any problems.
You understand wrong. Oh so very wrong. You're invoking undefined behavior and undefined behavior is undefined. It may work for a week, then break one day next week and you'll be left wondering why. If you don't know the collection size in advance use something dynamic like a vector instead of an array.
Yes, in C/C++ you can access memory outside of the space you claim to have allocated. Sometimes. This is what is referred to as undefined behavior.
Basically, you have told the compiler and the memory management system that you want space to store four integers, and the memory management system allocated space for you to store four integers. It gave you a pointer to that space. In the memory manager's internal accounting, those bytes of ram are now occupied, until you call delete[] arrayPtr;.
However, the memory manager has not allocated that next byte for you. You don't have any way of knowing, in general, what that next byte is, or who it belongs to.
In a simple example program like your example, which just allocates a few bytes, and doesn't allocate anything else, chances are, that next byte belongs to your program, and isn't occupied. If that array is the only dynamically allocated memory in your program, then it's probably, maybe safe to run over the end.
But in a more complex program, with multiple dynamic memory allocations and deallocations, especially near the edges of memory pages, you really have no good way of knowing what any bytes outside of the memory you asked for contain. So when you write to bytes outside of the memory you asked for in new you could be writing to basically anything.
This is where undefined behavior comes in. Because you don't know what's in that space you wrote to, you don't know what will happen as a result. Here's some examples of things that could happen:
The memory was not allocated when you wrote to it. In that case, the data is fine, and nothing bad seems to happen. However, if a later memory allocation uses that space, anything you tried to put there will be lost.
The memory was allocated when you wrote to it. In that case, congratulations, you just overwrote some random bytes from some other data structure somewhere else in your program. Imagine replacing a variable somewhere in one of your objects with random data, and consider what that would mean for your program. Maybe a list somewhere else now has the wrong count. Maybe a string now has some random values for the first few characters, or is now empty because you replaced those characters with zeroes.
The array was allocated at the edge of a page, so the next bytes don't belong to your program. The address is outside your program's allocation. In this case, the OS detects you accessing random memory that isn't yours, and terminates your program immediately with SIGSEGV.
Basically, undefined behavior means that you are doing something illegal, but because C/C++ is designed to be fast, the language designers don't include an explicit check to make sure you don't break the rules, like other languages (e.g. Java, C#). They just list the behavior of breaking the rules as undefined, and then the people who make the compilers can have the output be simpler, faster code, since no array bounds checks are made, and if you break the rules, it's your own problem.
So yes, this sometimes works, but don't ever rely on it.
It would not cause any problems in a purely abstract setting, where you only worry about whether the logic of the algorithm is sound. In that case there's no reason to declare the size of an array at all. However, your computer exists in the physical world, and only has a limited amount of memory. When you're allocating memory, you're asking the operating system to let you use some of the computer's finite memory. If you go beyond that, the operating system should stop you, usually by killing your process/program.
Writing *(arrayPtr + 4) accesses a position in memory past the end of the space you allocated for the array. It's a limitation of C++ that an array's size can't be extended once allocated.

Difference between accessing non-existent array index and existing-but-empty index

Suppose I wrote
vector<int> example(5);
example[6];
What difference would it make with the following?
vector<int> example(6);
example[5];
In the first case I'm trying to access a non-existent, non-declared index. Could that result in malicious code execution? Would it be possible to put some sort of code in the portion of memory corresponding to example[5] and have it executed by a program written like the first above?
What about the second case? Would it still be possible to place code in the area of memory of example[5], even though it should be reserved to my program, even if I haven't written anything in it?
Could that result in malicious code execution?
No, this causes 'only' undefined behaviour.
Simple code execution exploits usually write past the end of a stack-allocated buffer, thereby overwriting a return adress. When the function returns, it jumps to the malicious code. A write is always required, because else there is no malicious code in your program's address space.
With a vector the chances that this happens are low, because the storage for the elements is not allocated on the stack.
By writing to a wrong location on the heap, exploits are possible too, but they are much more complicated.
The first case reaches beyond the vector's buffer and thus invokes Undefined Behaviour. Technically, this means literally anything can happen. But it's unlikely to be directly exploitable to run malicious code—either the program will try to read the invalid memory (getting a garbage value or a memory error), or the compiler has eliminated the code path altogether (because it's allowed to assume UB doesn't happen). Depending on what's done with the result, it might potentially reveal unintended data from memory, though.
In the second case, all is well. Your program has already written into this memory: it has value-initialised all 6 int objects in the vector (which happens in std::vector's constructor). So you're guaranteed to find a 0 of type int there.

How do I take the address of one past the end of an array if the last address is 0xFFFFFFFF?

If it is legal to take the address one past the end of an array, how would I do this if the address of the array's last element is 0xFFFFFFFF?
How would this code work:
for (vector<char>::iterator it = vector_.begin(); it != vector_.end(); ++it)
{
}
Edit:
I read here that it is legal before making this question: May I take the address of the one-past-the-end element of an array?
If this situation is a problem for a particular architecture (it may or may not be), then the compiler and runtime can be expected to arrange that allocated arrays never end at 0xFFFFFFFF. If they were to fail to do this, and something breaks when an array does end there, then they would not conform to the C++ standard.
Accessing out of the array boundaries is undefined behavior. You shouldn't be surprised if a demon flies out of your nose (or something like that)
What might actually happen would be an overflow in the address, which could lead to you reading address zero and hence a segmentation fault.
If you are always within the array range, and you do the last ++it, which goes one past the array, and you only compare the result against vector_.end(), then you are not really accessing anything and there should not be a problem.
I think there is a good argument for suggesting that a conformant C implementation cannot allow an array to end at (e.g.) 0xFFFFFFFF.
Let p be a pointer to one-element-off-the-end-of-the-array: if buffer is declared as char buffer[BUFFSIZE], then p = buffer+BUFFSIZE, or p = &buffer[BUFFSIZE]. (The latter means the same thing, and its validity was made explicit in the C99 standard document.)
We then expect the ordinary rules of pointer comparison to work, since the initialization of p was an ordinary bit of pointer arithmetic. (You cannot compare arbitrary pointers in standard C, but you can compare them if they are both based in a single array, memory buffer, or struct.) But if buffer ended at 0xFFFFFFFF, then p would be 0x00000000, and we would have the unlikely situation that p < buffer!
This would break a lot of existing code which assumes that, in valid pointer arithmetic done relative to an array base, the intuitive address-ordering property holds.
It's not legal to access one past the end of an array, but that code doesn't actually access that address. And you will never get an address like that on a real system for your objects.
The difference is between dereferencing that element and taking its address. In your example the element past the end won't be dereferenced, and so it is valid. Although this was not really clear in the early days of C++, it is clear now. Also, the value you pass to the subscript does not really matter.
Sometimes the best thing you can do about corner cases is forbid them. I saw this class of problem with some bit field extraction instructions of the NS32032 in which the hardware would load 32 bits starting at the byte address and extract from that datum. So even single-bit fields anywhere in the last 3 bytes of mapped memory would fail. The solution was to never allow the last 4 bytes of memory to be available for allocation.
Quite a few architectures that would be affected by this solve the problem by reserving offset 0xFFFFFFFF (and a bit more) for the OS.

Why does the compiler not complain about accessing elements beyond the bounds of a dynamic array? [duplicate]

This question already has answers here:
Accessing an array out of bounds gives no error, why?
(18 answers)
Closed 7 years ago.
I am defining an array of size 9, but when I access array index 10 it does not give any error.
int main() {
bool* isSeedPos = new bool[9];
isSeedPos[10] = true;
}
I expected to get a compiler error, because there is no array element isSeedPos[10] in my array.
Why don't I get an error?
It's not a problem.
There is no bounds checking in C++ arrays. You are able to access elements beyond the array's limit (but this will usually cause an error).
If you want to use an array, you have to check that you are not out of bounds yourself (you can keep the size in a separate variable).
Of course, a better solution would be to use the standard library containers such as std::vector.
With std::vector you can either
use the myVector.at(i) method to get the i-th element (which will throw an exception if you are out of bounds)
use myVector[i] with the same syntax as C-style arrays, but you have to do the bounds checking yourself (e.g. test if (i < myVector.size()) ... before accessing it)
Also note that in your case, std::vector<bool> is a specialized version implemented so that each bool takes only one bit of memory (it therefore uses less memory than an array of bool, which may or may not be what you want).
Use std::vector instead. Some implementations will do bounds checking in debug mode.
No, the compiler is not required to emit a diagnostic for this case. The compiler does not perform bounds checking for you.
It is your responsibility to make sure that you don't write broken code like this, because the compiler will not error on it.
Unlike in other languages like java and python, array access is not bound-checked in C or C++. That makes accessing arrays faster. It is your responsibility to make sure that you stay within bounds.
However, in such a simple case such as this, some compilers can detect the error at compile time.
Also, some tools such as valgrind can help you detect such errors at run time.
What compiler/debugger are you using?
MSVC++ would complain about it and tell you that you are writing out of bounds of an array.
But it is not required to do it by the standard.
It can crash anytime, it causes undefined behaviour.
Primitive arrays do not do bounds-checking. If you want bounds-checking, you should use std::vector instead. You are accessing invalid memory after the end of array, and purely by luck it is working.
There is no runtime checking on the index you are giving, accessing element 10 is incorrect but possible. Two things can happen:
if you are "unlucky", this will not crash and will return some data located after your array.
if you are "lucky", the data after the array is not allocated by your program, so access to the requested address is forbidden. This will be detected by the operating system and will produce a "segmentation fault".
There is no rule stating that memory accesses are checked in C, plain and simple. When you ask for an array of bools, it might be faster for the operating system to give you a 16-bit or 32-bit array instead of a 9-bit one. This means that you might not even be writing or reading into someone else's space.
C++ is fast, and one of the reasons it is fast is that there are very few checks on what you are doing. If you ask for some memory, the language will assume that you know what you are doing, and if the operating system does not complain, then everything will run.
There is no problem! You are just accessing memory that you shouldn't access. You get access to memory after the array.
isSeedPos doesn't know how big the array is. It is just a pointer to a position in memory. When you access isSeedPos[10] the behaviour is undefined. Chances are sooner or later this will cause a segfault, but there is no requirement for a crash, and there is certainly no standard error checking.
Writing to that position is dangerous.
But the compiler will let you do it: effectively you're writing one past the last byte of memory assigned to that array, which is not a good thing.
C++ isn't a lot like many other languages - It assumes that you know what you are doing!
Both C and C++ let you write to arbitrary areas of memory. This is because they originally derived from (and are still used for) low-level programming, where you may legitimately want to write to a memory-mapped peripheral or similar, and because it's more efficient to omit bounds checking when the programmer already knows the value will be within bounds (e.g. for a loop from 0 to N over an array, the programmer knows 0 and N are within the bounds, so checking each intermediate value is superfluous).
However, in truth, nowadays you rarely want to do that. If you use the arr[i] syntax, you essentially always want to write to the array declared in arr, and never do anything else. But you still can if you want to.
If you do write to arbitrary memory (as you do in this case) either it will be part of your program, and it will change some other critical data without you knowing (either now, or later when you make a change to the code and have forgotten what you were doing); or it will write to memory not allocated to your program and the OS will shut it down to prevent worse problems.
Nowadays:
Many compilers will spot it if you make an obvious mistake like this one
There are tools which will test if your program writes to unallocated memory
You can and should use std::vector instead, which is there for the 99% of the time you want bounds checking. (Check whether you're using at() or [] to access it)
This is not Java. In C or C++ there is no bounds checking; it's pure luck that you can write to that index.