Referencing invalid memory locations with C++ Iterators - c++

I am a big fan of GCC, but recently I noticed an odd anomaly. Using __gnu_cxx::__normal_iterator (i.e., the most common iterator type used in libstdc++, GCC's C++ standard library), it is possible to refer to an arbitrary memory location and even change its value without causing an exception! Is this expected behavior? If so, isn't it a security loophole?
Here's an example:
#include <iostream>
#include <string>
using namespace std;
int main() {
basic_string<char> str("Hello world!");
basic_string<char>::iterator iter = str.end();
iter += str.capacity() + 99999;
*iter = 'x';
cout << "Value: " << *iter << endl;
}

Dereferencing an iterator beyond the end of the container from which it was obtained is undefined behavior, and appearing to do nothing is just one of the possible outcomes.
Note that this is a trade-off: it is nice to have iterators check for validity during development, but that adds extra operations to the code. In MSVS, iterators are checked by default (they verify that they are valid and fail hard when they are used in a wrong way), but that also has an impact on runtime performance.
The solution that Dinkumware (the STL inside VS) provides (checked by default, can be unchecked through compiler options) is in fact a good choice: the user selects whether they want slow, safe iterators or fast, unsafe ones. But from the point of view of the language, both are valid.

No, this is not a problem. Keep in mind that typical iterator usage is:
for ( type::const_iterator it = obj.begin(); it != obj.end(); ++it ){
// Refer to element using (*it)
}
Proper iterator usage requires one to check against the end() iterator. With random access iterators such as the one you are using, you can also use < and > with the iterators against end(). C and C++ don't typically do bounds checking as in Java, and it is your place to ensure that you do so.

C++ generally has a philosophy of not making you pay for what you don't use. It is up to you to validate that you're using iterators properly. For a random-access iterator, you can always test it:
if (iter < str.begin() || iter >= str.end())
throw something;
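One caveat about the check above: by the time an iterator has been moved far past end(), the arithmetic itself has already invoked undefined behavior, so it is safer to validate the offset before applying it. A minimal sketch, assuming a hypothetical helper named advance_checked (not part of the standard library) and a caller that passes an iterator that is still valid:

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <string>

// Hypothetical helper: moves `it` by `n` positions inside `s`, and throws
// instead of leaving the range [s.begin(), s.end()].
// Assumes `it` is currently a valid iterator into `s`.
inline std::string::iterator advance_checked(std::string& s,
                                             std::string::iterator it,
                                             std::ptrdiff_t n) {
    const std::ptrdiff_t to_begin = s.begin() - it;  // zero or negative
    const std::ptrdiff_t to_end = s.end() - it;      // zero or positive
    assert(to_begin <= 0 && to_end >= 0);            // holds for any valid `it`
    if (n < to_begin || n > to_end) {
        throw std::out_of_range("advance_checked: iterator would leave the string");
    }
    return it + n;
}
```

With this, the question's `iter += str.capacity() + 99999` would become a call that throws std::out_of_range rather than silently producing a wild iterator. Note that the result may equal end(), which still must not be dereferenced.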

You got lucky. Or unlucky. Using your exact example, I segfaulted.
$ ./a.exe
11754 [main] a 4992 _cygtls::handle_exceptions: Error while dumping state (probably corrupted stack)
Segmentation fault (core dumped)
Undefined behavior can mean different things on different compilers, platforms, days. Perhaps when you ran it, the address created by all that adding ended up in some other valid memory space, just by chance. Maybe you incremented from the stack to the heap for example.

Related

Prefer Iterators Over Pointers?

This question is a bump of a question that had a comment here but was deleted as part of the bump.
For those of you who can't see deleted posts, the comment was on my use of const char*s instead of string::const_iterators in this answer: "Iterators may have been a better path from the get go, since it appears that is exactly how your pointers seems be treated."
So my question is this: do string::const_iterators hold any intrinsic value over const char*s such that switching my answer over to string::const_iterators makes sense?
Introduction
There are many perks of using iterators instead of pointers, among them are:
different code-path in release vs debug, and;
better type-safety, and;
making it possible to write generic code (iterators can be made to work with any data-structure, such as a linked-list, whereas intrinsic pointers are very limited in this regard).
Debugging
Since, among other things, dereferencing an iterator that is past the end of a range is undefined behavior, an implementation is free to do whatever it feels necessary in such a case, including raising diagnostics saying that you are doing something wrong.
libstdc++, the standard library implementation provided by gcc, will issue diagnostics when it detects something faulty (if Debug Mode is enabled).
Example
#define _GLIBCXX_DEBUG 1 /* enable debug mode */
#include <vector>
#include <iostream>
int
main (int argc, char *argv[])
{
std::vector<int> v1 {1,2,3};
for (auto it = v1.begin (); ; ++it)
std::cout << *it;
}
/usr/include/c++/4.9.2/debug/safe_iterator.h:261:error: attempt to
dereference a past-the-end iterator.
Objects involved in the operation:
iterator "this" @ 0x0x7fff828696e0 {
type = N11__gnu_debug14_Safe_iteratorIN9__gnu_cxx17__normal_iteratorIPiNSt9__cxx19986vectorIiSaIiEEEEENSt7__debug6vectorIiS6_EEEE (mutable iterator);
state = past-the-end;
references sequence with type `NSt7__debug6vectorIiSaIiEEE' @ 0x0x7fff82869710
}
123
The above would not happen if we were working with pointers, no matter if we are in debug-mode or not.
If we don't enable debug mode for libstdc++, a more performance-friendly implementation (without the added bookkeeping) will be used, and no diagnostics will be issued.
(Potentially) better Type Safety
Since the actual type of an iterator is implementation-defined, this could be used to increase type-safety, but you will have to check the documentation of your implementation to see whether this is the case.
Consider the below example:
#include <vector>
struct A { };
struct B : A { };
// .-- oops
// v
void it_func (std::vector<B>::iterator beg, std::vector<A>::iterator end);
void ptr_func (B * beg, A * end);
// ^-- oops
int
main (int argc, char *argv[])
{
std::vector<B> v1;
it_func (v1.begin (), v1.end ()); // (A)
ptr_func (v1.data (), v1.data () + v1.size ()); // (B)
}
Elaboration
(A) could, depending on the implementation, be a compile-time error, since std::vector<A>::iterator and std::vector<B>::iterator potentially aren't the same type and need not be convertible to one another.
(B) would, however, always compile since there's an implicit conversion from B* to A*.
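To see which way a particular implementation goes, the two conversions can be queried with type traits. This is only a sketch: the function names pointer_converts and iterator_converts are made up for illustration, and the result of the second one is exactly the implementation-dependent part discussed above.

```cpp
#include <type_traits>
#include <vector>

struct A { };
struct B : A { };

// Always true: the language defines an implicit derived-to-base pointer conversion.
constexpr bool pointer_converts() {
    return std::is_convertible<B*, A*>::value;
}

// Depends on the standard library in use; the standard does not say either way.
constexpr bool iterator_converts() {
    return std::is_convertible<std::vector<B>::iterator,
                               std::vector<A>::iterator>::value;
}

static_assert(pointer_converts(), "B* converts implicitly to A*");
```

On an implementation where iterator_converts() is false, call (A) is rejected at compile time; where it is true, (A) compiles just like (B).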
Iterators are intended to provide an abstraction over pointers.
For example, incrementing an iterator always manipulates the iterator so that if there's a next item in the collection, it refers to that next item. If it already referred to the last item in the collection, after the increment it'll be a unique value that can't be dereferenced, but will compare equal to another iterator pointing one past the end of the same collection (usually obtained with collection.end()).
In the specific case of an iterator into a string (or a vector), a pointer provides all the capabilities required of an iterator, so a pointer can be used as an iterator with no loss of required functionality.
For example, you could use std::sort to sort the items in a string or a vector. Since pointers provide the required capabilities, you can also use it to sort items in a native (C-style) array.
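As a small illustration of that point, here is a sketch of one function template that accepts container iterators and raw pointers alike. The name sort_descending is hypothetical; std::sort, std::greater and std::iterator_traits are the standard facilities it relies on.

```cpp
#include <algorithm>
#include <cassert>
#include <functional>
#include <iterator>
#include <string>
#include <vector>

// Hypothetical helper: sorts [first, last) in descending order.
// Works with any random-access iterator, including a plain pointer.
template <class RandomIt>
void sort_descending(RandomIt first, RandomIt last) {
    typedef typename std::iterator_traits<RandomIt>::value_type value_type;
    std::sort(first, last, std::greater<value_type>());
    assert(std::is_sorted(first, last, std::greater<value_type>()));
}
```

The same call shape works for `sort_descending(v.begin(), v.end())` on a vector or string and for `sort_descending(arr, arr + n)` on a C-style array.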
At the same time, yes, defining (or using) an iterator that's separate from a pointer can provide extra capabilities that aren't strictly required. Just for example, some iterators provide at least some degree of checking, to assure that (for example) when you compare two iterators, they're both iterators into the same collection, and that you aren't attempting an out of bounds access. A raw pointer can't (or at least normally won't) provide this kind of capability.
Much of this comes back to the "don't pay for what you don't use" mentality. If you really only need and want the capabilities of native pointers, they can be used as iterators, and you'll normally get code that's essentially identical to what you'd get by directly manipulating pointers. At the same time, for cases where you do want extra capabilities, such as traversing a threaded RB-tree or a B+ tree instead of a simple array, iterators allow you to do that while maintaining a single, simple interface. Likewise, for cases where you don't mind paying extra (in terms of storage and/or run-time) for extra safety, you can get that too (and it's decoupled from things like the individual algorithm, so you can get it where you want it without being forced to use it in other places that may, for example, have timing requirements too critical to support it).
In my opinion, many people kind of miss the point when it comes to iterators. Many people happily rewrite something like:
for (size_t i=0; i<s.size(); i++)
...into something like:
for (std::string::iterator i = s.begin(); i != s.end(); i++)
...and act as if it's a major accomplishment. I don't think it is. For a case like this, there's probably little (if any) gain from replacing an integer type with an iterator. Likewise, taking the code you posted and changing char const * to std::string::iterator seems unlikely to accomplish much (if anything). In fact, such conversions often make the code more verbose and less understandable, while gaining nothing in return.
If you were going to change the code, you should (in my opinion) do so in an attempt at making it more versatile by making it truly generic (which std::string::iterator really isn't going to do).
For example, consider your split (copied from the post you linked):
vector<string> split(const char* start, const char* finish){
const char delimiters[] = ",(";
const char* it;
vector<string> result;
do{
for (it = find_first_of(start, finish, begin(delimiters), end(delimiters));
it != finish && *it == '(';
it = find_first_of(extractParenthesis(it, finish) + 1, finish, begin(delimiters), end(delimiters)));
auto&& temp = interpolate(start, it);
result.insert(result.end(), temp.begin(), temp.end());
start = ++it;
} while (it <= finish);
return result;
}
As it stands, this is restricted to being used on narrow strings. If somebody wants to work with wide strings, UTF-32 strings, etc., it's relatively difficult to get it to do that. Likewise, if somebody wanted to match '[' or '{' instead of '(', the code would need to be rewritten for that as well.
If there were a chance of wanting to support various string types, we might want to make the code more generic, something like this:
template <class InIt, class OutIt, class charT>
void split(InIt start, InIt finish, charT paren, charT comma, OutIt result) {
typedef typename std::iterator_traits<OutIt>::value_type o_t;
charT delimiters[] = { comma, paren };
InIt it;
do{
for (it = find_first_of(start, finish, begin(delimiters), end(delimiters));
it != finish && *it == paren;
it = find_first_of(extractParenthesis(it, finish) + 1, finish, begin(delimiters), end(delimiters)));
auto&& temp = interpolate(start, it);
*result++ = o_t{temp.begin(), temp.end()};
start = ++it;
} while (it != finish);
}
This hasn't been tested (or even compiled) so it's really just a sketch of a general direction you could take the code, not actual, finished code. Nonetheless, I think the general idea should at least be apparent: we don't just change it to "use iterators". We change it to be generic, and iterators (passed as template parameters, with types not directly specified here) are only a part of that. To get very far, we also eliminated hard-coding the paren and comma characters. Although not strictly necessary, I also changed the parameters to fit more closely with the convention used by standard algorithms, so (for example) output is also written via an iterator rather than being returned as a collection.
Although it may not be immediately apparent, the latter does add quite a bit of flexibility. Just for example, if somebody just wanted to print out the strings after splitting them, he could pass an std::ostream_iterator, to have each result written directly to std::cout as it's produced, rather than getting a vector of strings, and then having to separately print them out.
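To make the output-iterator point concrete without relying on the untested split above, here is a much simpler, hypothetical splitter (split_simple is a made-up name, and it handles only a single delimiter with no parenthesis logic). It is meant only to show how the same function can feed either a container or a stream.

```cpp
#include <algorithm>
#include <cassert>
#include <iterator>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical simplified splitter: writes each delimiter-separated field of
// [first, last) to `out` as a std::basic_string<CharT>.
// FwdIt must be at least a forward iterator, since each field is visited twice.
template <class FwdIt, class OutIt, class CharT>
OutIt split_simple(FwdIt first, FwdIt last, CharT delim, OutIt out) {
    while (true) {
        FwdIt field_end = std::find(first, last, delim);
        *out++ = std::basic_string<CharT>(first, field_end);
        if (field_end == last) {
            break;               // no delimiter left, that was the final field
        }
        first = ++field_end;     // skip the delimiter and continue
    }
    return out;
}
```

Passing `std::back_inserter(parts)` collects the fields in a vector, while passing `std::ostream_iterator<std::string>(std::cout, "\n")` prints each field as soon as it is produced. The splitter itself does not change.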

issues with deques and exceeding deque size with index operator

Having a strange issue with deques in C++.
Let's say I have a deque of doubles of size 4. For some reason, when using the index operator, I seem to be able to exceed the size of the deque.
In other words, neither the compiler nor the program at execution will barf if I write the following:
for(int i = 0; i < 7; i++)
{
x[i] = (double)(i*i);
cout << x[i] << endl;
}
Where x is the deque. And I actually am able to get outputs from this.
It doesn't increase the size of the deque. If I output x.size(), I still get 4.
What gives?
I'm using Code::Blocks with the standard default gcc compiler that comes with it.
operator[] does not bounds check, just like when using a raw array. The at member function does; if you instead use
x.at(i);
you will get a std::out_of_range exception if you exceed the bounds of the deque. If you run your original code through a memory error checker (like valgrind) you will see "invalid read" and "invalid write" errors.
If you look at cppreference's docs on operator[] you'll see the note "No bounds checking is performed."
However the docs for at() say
If pos is not within the range of the container, an exception of type std::out_of_range is thrown
Going out of bounds on a container is undefined behavior. If you are accessing with an index where you aren't sure if it's in-bounds or not, it's your job to either check that it is, or use at and possibly handle the exception.
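A minimal sketch of the "use at and handle the exception" option follows. The helper name try_get is hypothetical; deque::at and std::out_of_range are the standard pieces.

```cpp
#include <cassert>
#include <cstddef>
#include <deque>
#include <stdexcept>

// Hypothetical helper: copies x[i] into `out` and returns true when i is in
// range; returns false (leaving `out` untouched) otherwise.
inline bool try_get(const std::deque<double>& x, std::size_t i, double& out) {
    try {
        out = x.at(i);  // at() throws std::out_of_range when i >= x.size()
        return true;
    } catch (const std::out_of_range&) {
        return false;
    }
}
```

For a deque of size 4, try_get succeeds for indices 0 through 3 and returns false for index 6, where operator[] would have been undefined behavior.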
Indexing out of bounds gives undefined behavior, so anything can happen.
Many containers will round the current size up to some convenient value (e.g., a power of 2), so depending on the current size you'll have some amount of memory after the last item in the collection. Indexing into that memory and attempting to read it will produce some result, but the memory is typically uninitialized, so the result will often be meaningless and invalid (and, although most don't, the container could do bounds checking, and throw an exception or almost anything else when you index out of bounds).
IMO, at is a fairly poor tool to deal with the possibility though. A better way to avoid such problems is a range-based for loop:
for (auto &d : x) {
d = d * d;
std::cout << d << "\n"; // avoid `endl`, which flushes the stream.
}
Another possibility would be to use standard algorithms:
std::transform(x.begin(), x.end(), x.begin(), [](double d) { return d*d; });
std::copy(x.begin(), x.end(), std::ostream_iterator<double>(std::cout, "\n"));
There are also range-based algorithms (e.g., one set in Boost, at least one more being suggested for a future C++ standard), that (do/would) allow something on the general order of:
copy(x, output_range<double>(std::cout, "\n"));
Since this figures out the bounds of x on its own, short of a bug in the code for the range, it's pretty much impossible to accidentally index out of bounds this way.

C++ Errors with referenced object - how to debug?

Following the help in this question, I am using a reference to my Class 'Mover' to manipulate the object (as part of a set) in a vector. I am having issues however, and I cannot seem to identify what's causing it for sure. It appears that once I've reached 30-35 objects in my vector (added at pseudo-random intervals) the program halts. No crash, just halt, and I have to manually end the task (CTRL-C doesn't work).
My problem appears to lie in these bits of code. My original:
int main() {
std::vector< Mover > allMovers;
std::vector< Mover >::iterator iter = allMovers.begin();
//This code runs to the end, but the 'do stuff' lines don't actually do anything.
Mover tempMover;
//Other code
while(iter < allMovers.end()) {
tempMover = *iter;
//Do stuff with tempMover
//Add another tempMover at a random interval
allMovers.push_back(CreateNewMover());
iter++;
}
//Other code
}
My update after the previous question linked to above:
int main() {
std::vector< Mover > allMovers;
std::vector< Mover >::iterator iter = allMovers.begin();
//This code crashes once about 30 or so items exist in the vector, but the 'do stuff' lines do work.
//Other code
while(iter < allMovers.end()) {
Mover& tempMover = *iter;
//Do stuff with tempMover
//Add another tempMover at a random interval
allMovers.push_back(CreateNewMover()); //Crashes here.
iter++;
}
//Other code
}
Any ideas of how to track this down? I have std::couts all over the place to flag where the code is for me. The crash happens at a varying number of objects, but it always occurs on the push_back(), despite that call having worked successfully multiple times in the same run before the crash.
EDIT
While I accept and (think) I understand the answer re: iterators, what I don't understand is why the code DOES work completely when I am not using a reference to the object? (First code block).
Another EDIT
In case anyone was looking for this specifically, part of my question was not addressed: "How to debug?" As a C++ newbie, I was unaware of the gdb debugger (using MinGW). Now that I've learned about it, it has been very helpful in finding the source of these issues.
When a vector reallocates its memory, all iterators are invalidated (along with any reference or pointer to any element). So sometimes your push_back will invalidate iter, and trying to use it afterwards gives undefined behaviour.
The simplest fix is to use an index rather than an iterator. Alternatively, if you can calculate an upper bound for the maximum size of the vector, you could call reserve before the loop to ensure it never reallocates. Or you could use std::list, whose iterators are preserved when new elements are inserted.
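The index-based fix could look roughly like this. It is a sketch only: Mover here is a stand-in struct rather than the asker's class, and update_all and its limit parameter are invented so the loop terminates.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for the question's Mover class.
struct Mover {
    int steps;
    Mover() : steps(0) {}
};

// Visits every element by index, appending a new Mover after each visit
// until the vector holds `limit` elements.
inline void update_all(std::vector<Mover>& movers, std::size_t limit) {
    for (std::size_t i = 0; i < movers.size(); ++i) {
        movers[i].steps += 1;           // finish with the element first
        if (movers.size() < limit) {
            movers.push_back(Mover());  // may reallocate; index i stays meaningful
        }
    }
}
```

An index survives reallocation because it is re-applied to the vector on every access. A reference such as `Mover& m = movers[i]` does not, so it should not be used after the push_back.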
UPDATE: Regarding your edit, both give undefined behaviour. It might be that, in the first case, you don't crash because you don't access a dangling reference (while accessing tempMover in the second might very well crash), and then the memory happens to be reallocated at a lower address than before, so the while condition (which uses < rather than the more conventional !=) exits the loop immediately. Or something completely different could be happening - that's the nature of undefined behaviour.
You are (probably) doing it wrong.
The thing is, mixing iteration over a container and manipulation of the container structure (here adding objects) is extremely error-prone.
Whenever you add an element in allMovers, there is a risk that iter is invalidated. Any usage of iter after it has been invalidated is Undefined Behavior.
It is possible to do it correctly:
iter = allMovers.insert(allMovers.end(), CreateNewMover());
however it's just a bad idea in general.
My advice would be to ban this kind of code from your code base altogether. Every single occurrence is a bug in the making. Find another algorithm.
From documentation for push_back():
If new size() is not larger than capacity(), no iterators or references are invalidated. Otherwise all iterators and references are invalidated.
When you reach 30 or so objects, the new size() exceeds capacity(), which invalidates the iterator iter; it is then dereferenced, causing undefined behaviour.
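The rule quoted above can be observed by watching capacity() change across push_back calls. The following is a sketch with a hypothetical function name, count_reallocations; how often an unreserved vector grows is implementation-specific, so only the reserved case has a guaranteed count.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper: pushes n ints and counts how many push_back calls
// changed capacity(), i.e. reallocated and invalidated all iterators.
inline std::size_t count_reallocations(std::size_t n, bool reserve_first) {
    std::vector<int> v;
    if (reserve_first) {
        v.reserve(n);  // capacity() >= n from here on
    }
    std::size_t reallocations = 0;
    for (std::size_t i = 0; i < n; ++i) {
        const std::size_t before = v.capacity();
        v.push_back(static_cast<int>(i));
        if (v.capacity() != before) {
            ++reallocations;
        }
    }
    return reallocations;
}
```

With reserve_first set, the count is zero, which is why calling reserve before the loop keeps iterators valid as long as the reserved size is never exceeded.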
You probably need to change the line containing the while statement:
while(iter != allMovers.end()) {
The < operator usually seems to work fine with a vector, but I had better results using !=, which also works with other containers and seems to be used in more example code out there.
Update
You may replace the while loop with an equivalent for loop like this:
for(std::vector<Mover>::iterator iter = allMovers.begin(); iter != allMovers.end(); ++iter)
{
// Do stuff with *iter
}
This has the advantage that the increment of the iterator iter "has its place" and is less likely to be forgotten.
Update 2
If I understand your example above, you'd like to fill the container with some content. I suggest (as others did) to get rid of the iterator altogether.
int main()
{
std::vector< Mover > allMovers;
//Other code
while(1) // this loop will add new movers as long as it succeeds to create one
{
Mover new_mover = CreateNewMover();
if ( IS EMPTY (new_mover) ) // pseudocode. Check if the previous
break; // CreateNewMover() succeeded.
allMovers.push_back(new_mover);
}
//Other code
}

erase in std::vector, debug, release

std::vector<int> va; // and push_back 1~100
std::vector<int>::iterator i = va.begin();
for(i; i != va.end(); )
{
if((*i) == 5) va.erase(i);
else i++;
}
This code crashes every time with the debug runtime.
But it does not crash with the release runtime.
Why does this happen?
What is the difference between debug and release mode for this code?
You have undefined behavior because you're using an invalid iterator (i is invalidated by the erase()).
Avoid the whole problem by using the Erase-remove Idiom:
va.erase(std::remove(va.begin(), va.end(), 5), va.end());
As others have pointed out, the crash is due to the invalidated iterator that you are continuing to use after calling va.erase().
Now, as to why it works in Release mode: in some cases the iterator for a std::vector<> in Release mode is a simple pointer into a dynamically allocated array. When you call erase, the iterator continues to point at the same element of the array while the contents of the array have been moved by the erase function. This is undefined behavior and Standard Library implementation specific, but very common. Under no circumstance should you rely on the behavior in portable code.
However, on some Standard Library implementations, Debug mode iterators perform checking and are more complicated than simple pointers. As such, they can detect that you are doing something that isn't legal and intentionally cause a crash, so that you can recognize your error.
vector::erase returns a new iterator, as it makes the current one invalid.
if((*i) == 5) va.erase(i);
should be
if((*i) == 5) i = va.erase(i);
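Put together, the corrected loop might look like the sketch below. It is wrapped in a hypothetical function, erase_all, so it can be tried on its own.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Hypothetical wrapper around the corrected loop: removes every element
// equal to `value` from `va`.
inline void erase_all(std::vector<int>& va, int value) {
    for (std::vector<int>::iterator i = va.begin(); i != va.end(); ) {
        if (*i == value) {
            i = va.erase(i);  // erase returns the iterator after the removed element
        } else {
            ++i;              // only advance when nothing was erased
        }
    }
    assert(std::find(va.begin(), va.end(), value) == va.end());
}
```

The loop re-reads va.end() on every pass, which matters because erase also invalidates the old end iterator.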

std::list<>: Element before l.begin()

Short question: Is the following code unsafe using other compilers than I do (mingw32), or is it valid to use?
list<int> l;
/* add elements */
list<int>::iterator i = l.begin();
i--;
i++;
cout << *i << endl;
...or in other words: is i defined to point to l.begin() after this?
Yes, the code is unsafe. Once you attempt to move before begin() you have caused undefined behavior. Attempting to move "back again" may not work.
A std::list traverses its contents via linked list pointers, so pointer arithmetic is not used to calculate a correct position. The previous position from .begin() will have no data and shouldn't provide any valid traversal mechanisms.
Containers like std::vector have random access iterators and would use pointer arithmetic under the covers, so they would probably give the right result (no problem), but it's still a bad idea.
So, it shouldn't work, it's undefined, and don't do it even if it does work somehow :)
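If stepping backwards is really needed, one way to stay within defined behavior is to compare against begin() before decrementing. A minimal sketch, assuming a hypothetical helper named step_back:

```cpp
#include <cassert>
#include <list>

// Hypothetical helper: steps back one element, but never before begin().
// Assumes `i` is a valid iterator into `l` (end() is allowed).
inline std::list<int>::iterator step_back(std::list<int>& l,
                                          std::list<int>::iterator i) {
    if (i != l.begin()) {
        --i;
    }
    return i;
}
```

Called on l.begin(), this returns l.begin() unchanged, so a following increment moves to the second element instead of relying on a round trip through an invalid position.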